# Data Analysis and Visualization in Python

## About this workshop

This workshop is based on the online tutorial
[Data Carpentry - Data Analysis and Visualization in Python for Ecologists](https://datacarpentry.org/python-ecology-lesson/).

Note: the IPython notebooks for this workshop and the associated data are [published on GitHub](https://github.com/calculquebec/data-analysis-python).

### Table of contents
**Day 1** (3 hours)
* [Starting With Data](01-data.ipynb) (notebook [`01-data.ipynb`](01-data.ipynb))
* [Indexing, Slicing and Subsetting DataFrames](02-selection.ipynb) (notebook [`02-selection.ipynb`](02-selection.ipynb))
* [Combining DataFrames with Pandas](03-combining.ipynb) (notebook [`03-combining.ipynb`](03-combining.ipynb))

**Day 2** (3 hours)
* [Data Workflows and Automation](04-workflows.ipynb) (notebook [`04-workflows.ipynb`](04-workflows.ipynb))
* [Making Plots With plotnine](05-plotnine.ipynb) (notebook [`05-plotnine.ipynb`](05-plotnine.ipynb))
* [Matplotlib and Pandas](06-matplotlib.ipynb) (notebook [`06-matplotlib.ipynb`](06-matplotlib.ipynb))

**Extras** (if time permits)
* [Accessing SQLite Databases with Pandas](extras/sqlite.ipynb) (notebook [`extras/sqlite.ipynb`](extras/sqlite.ipynb))

### Our Data

For this lesson, we will be using the Portal Teaching data, a subset of the data from Ernest *et al.*, "Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal, Arizona, USA":
https://esapubs.org/archive/ecol/E090/118/

We will be using files from the Portal Project Teaching Database:
https://figshare.com/articles/Portal_Project_Teaching_Database/1314459

This section will use the **`data/surveys.csv`** file, a simplified version of the original file, which can be downloaded here:
https://ndownloader.figshare.com/files/2292172

We are studying the species and weight of animals caught in plots (or sites) in our study area. The dataset is stored as a `.csv` file: each row holds information for a single animal, and the columns represent:

 Column           | Description
----------------- | -----------
`record_id`       | Unique ID of the observation
`month`           | Month of observation
`day`             | Day of observation
`year`            | Year of observation
`plot_id`         | ID of a particular site
`species_id`      | 2-letter species code
`sex`             | Sex of the animal (“M”, “F”)
`hindfoot_length` | Length of the hindfoot in mm
`weight`          | Weight of the animal in grams

The first few rows of `data/surveys.csv` look like this:

```
record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
1,7,16,1977,2,NL,M,32,
2,7,16,1977,3,NL,M,33,
3,7,16,1977,2,DM,F,37,
4,7,16,1977,7,DM,M,36,
5,7,16,1977,3,DM,M,35,
6,7,16,1977,1,PF,M,14,
7,7,16,1977,2,PE,F,,
8,7,16,1977,1,DM,M,37,
9,7,16,1977,1,DM,F,34,
```
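
As a minimal sketch of how this excerpt behaves once loaded, the rows above can be parsed with `pandas`. The names `csv_text` and `surveys_head` are illustrative only; with the actual file you would pass its path, `data/surveys.csv`, to `pd.read_csv()` instead of an in-memory string.

```python
import io

import pandas as pd

# The first rows of data/surveys.csv, copied from the excerpt above.
csv_text = """record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
1,7,16,1977,2,NL,M,32,
2,7,16,1977,3,NL,M,33,
3,7,16,1977,2,DM,F,37,
4,7,16,1977,7,DM,M,36,
5,7,16,1977,3,DM,M,35,
6,7,16,1977,1,PF,M,14,
7,7,16,1977,2,PE,F,,
8,7,16,1977,1,DM,M,37,
9,7,16,1977,1,DM,F,34,
"""

# Parse the text as if it were a file; with the real data this would be
# pd.read_csv("data/surveys.csv").
surveys_head = pd.read_csv(io.StringIO(csv_text))

print(surveys_head.shape)         # (number of rows, number of columns)
print(surveys_head.dtypes)        # data type inferred for each column
print(surveys_head.isna().sum())  # count of missing values per column
```

Empty fields in the CSV are read as missing values (`NaN`). In these nine rows the `weight` column is entirely missing and `hindfoot_length` has one gap, so both columns are stored as floating-point numbers rather than integers.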

### How to Use Jupyter
When a cell is in edit mode:

  Shortcut  | Description
----------- | -----------
Shift+Enter | Run the cell and go to the next one
Tab         | Indent code or trigger auto-completion
Esc         | Go to command mode

When a cell is in command mode:

  Shortcut   | Description
------------ | -----------
Shift+Enter  | Run the cell and go to the next one
Double-click | Go to edit mode
Enter        | Go to edit mode

  Shortcut   | Description
------------ | -----------
A            | Insert a cell above
B            | Insert a cell below
C            | Copy the current cell
V            | Paste the cell below
D D          | Delete the current cell
M            | Change to Markdown cell
Y            | Change to Code cell

To reset all cells:
* Go to the top menu, and select Kernel -> Restart & Clear Output

```python
1 + 2  # Press Shift+Enter to execute this cell
```

### Reference
Python modules:
* [`pandas`](https://pandas.pydata.org/docs/reference/index.html)
  * The [`melt()`](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) function
* [`plotnine`](https://plotnine.readthedocs.io/en/stable/):
  * [Gallery (examples)](https://plotnine.readthedocs.io/en/stable/gallery.html)
  * Geometric elements constructors [`geom_*()`](https://plotnine.readthedocs.io/en/stable/api.html#geoms)
  * Theme constructors [`theme*()`](https://plotnine.readthedocs.io/en/stable/api.html#themes)
  * Facet constructors [`facet_*()`](https://plotnine.readthedocs.io/en/stable/api.html#facets)
* [`bokeh`](https://docs.bokeh.org/en/latest/) and [`plotly`](https://plotly.com/python/) for interactive plots in a web page

Python development environments:
* [JupyterLab](https://docs.computecanada.ca/wiki/JupyterLab) on supercomputers
* [Jupyter Notebook](https://docs.computecanada.ca/wiki/JupyterNotebook/fr), launched locally
* [Spyder IDE](https://www.spyder-ide.org)
* [Visual Studio Code](https://code.visualstudio.com)

Training:
* Future workshops at [Calcul Québec](https://www.eventbrite.ca/o/calcul-quebec-8295332683)
  and [Compute Canada](https://www.computecanada.ca/research-portal/technical-support/training/)
* [Software Carpentry](https://software-carpentry.org/lessons/)
  and [Data Carpentry](https://datacarpentry.org/lessons/) online tutorials:
  * [The Unix Shell](https://swcarpentry.github.io/shell-novice/)
  * [Programming with Python](https://swcarpentry.github.io/python-novice-inflammation/)
  * [Data Cleaning with OpenRefine](https://datacarpentry.org/OpenRefine-ecology-lesson/)
  * [Data Management with SQL](https://datacarpentry.org/sql-ecology-lesson/)
  * [Data Analysis and Visualization in R](https://datacarpentry.org/R-ecology-lesson/)