# Data Visualization in Python

## About this workshop

* This workshop is based on the online tutorial
  [Data Carpentry - *Data Analysis and Visualization in Python for Ecologists*](https://datacarpentry.org/python-ecology-lesson/).
  * Since the data analysis techniques
    are covered in the workshop
    [*DAT201 - Data Analysis in Python*](https://github.com/calculquebec/cq-formation-dat201),
    this material focuses on the visualization examples.
* The IPython *notebooks* of this
  workshop and the associated data are
  [published on GitHub](https://github.com/calculquebec/cq-formation-dat203).

### Table of contents

1. [Visualization with Pandas and Matplotlib](en/01-matplotlib.ipynb)
2. [Introduction to Altair](en/02-altair.ipynb)
3. [Building your plots iteratively](en/03-ggaltair.ipynb)
4. [Plotting time series data](en/04-mark_line.ipynb)
5. [Faceting](en/05-facet.ipynb)
6. [Plotting distributions](en/06-distributions.ipynb)

### How to Use Jupyter
When a cell is in **edit mode**:

  Shortcut  | Description
----------- | -----------
Shift+Enter | Run the cell and move to the next one
Enter       | Insert a new line break
Tab         | Indent code or trigger auto-completion
Esc         | Go to **command mode**

When a cell is in **command mode**:

  Shortcut   | Description
------------ | -----------
Shift+Enter  | Run the cell and move to the next one
Enter        | Go to **edit mode**
A            | Insert a cell above
B            | Insert a cell below
C            | Copy the current cell
V            | Paste the copied cell below
D D          | Delete the current cell
M            | Change to Markdown cell
Y            | Change to Code cell

To restart the kernel and clear all outputs:
* In the top menu, select
  Kernel -> Restart & Clear Output
  (in JupyterLab: Kernel -> Restart Kernel and Clear Outputs of All Cells)

```python
1 + 2  # Press Shift+Enter to execute this cell
```

### About Tabular Data

Here is some vocabulary about tabular data:

* **Column** or field: one variable of the observations
* **Row** or line: one observation or record
* **Index**: list of unique identifiers for rows
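These three terms map directly onto a pandas `DataFrame`. A minimal sketch, borrowing column names from the survey data used in this workshop; the values are made up for illustration only and are not taken from the Portal data:

```python
import pandas as pd

# Made-up values for illustration only (not from the Portal data).
df = pd.DataFrame(
    {
        "species_id": ["NL", "DM", "DM"],  # column: one variable
        "weight": [150.0, 40.0, 45.0],     # column: another variable
    },
    index=pd.Index([1, 2, 3], name="record_id"),  # unique row identifiers
)

print(df.columns.tolist())  # ['species_id', 'weight']
print(df.loc[2])            # one row (a single observation), selected by its index label
print(df.index.is_unique)   # True
```

Selecting a row with `df.loc[...]` uses the index label, which is why the index values should be unique.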

### Our Data

For this lesson, we will be using simplified files from the
[*Portal Project Teaching Database*](https://figshare.com/articles/Portal_Project_Teaching_Database/1314459).
In particular, we will use [`data/surveys.csv`](data/surveys.csv),
which was derived from
[the original file](https://ndownloader.figshare.com/files/2292172).

The following command prints the first 10 lines of the CSV file:

```python
!head data/surveys.csv
```

 Column           | Description
----------------- | -----------
`record_id`       | Unique ID of the observation
`month`           | Month of observation
`day`             | Day of observation
`year`            | Year of observation
`plot_id`         | ID of the plot (site) where the animal was caught
`species_id`      | Two-letter species code
`sex`             | Sex of the animal (“M” or “F”)
`hindfoot_length` | Hindfoot length (mm)
`weight`          | Weight of the animal (g)

With this data file, we are studying the species and weight
of animals caught in plots (or sites) in our study area.
Each row holds information for a single animal.
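The `!head` command only shows the raw text of the file. A minimal pandas sketch for loading it into a `DataFrame` and summarizing weight per species follows; the helper names `load_surveys` and `mean_weight_by_species` are hypothetical, and using `record_id` as the index assumes its values are unique, as its description states. Empty cells, if any, are read as `NaN`.

```python
import pandas as pd


def load_surveys(path="data/surveys.csv"):
    """Read the survey observations, one row per captured animal."""
    # record_id is a unique identifier, so it can serve as the row index
    return pd.read_csv(path, index_col="record_id")


def mean_weight_by_species(surveys):
    """Average weight (g) of the animals for each species_id."""
    # groupby().mean() skips missing (NaN) weights by default
    return surveys.groupby("species_id")["weight"].mean()


# Usage in a notebook cell (requires data/surveys.csv):
# surveys = load_surveys()
# surveys.head()                  # first 5 rows, displayed as a table
# mean_weight_by_species(surveys)
```

A species whose weights are all missing gets `NaN` as its mean rather than an error.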

Details about each species are provided in the file
[`data/species.csv`](data/species.csv).
The following command prints the first 10 lines of that CSV file:

```python
!head data/species.csv
```

 Column      | Description
------------ | -----------
`species_id` | Two-letter species identifier
`genus`      | Genus of the species (Latin)
`species`    | Species epithet (Latin)
`taxa`       | Broad taxonomic group (e.g., rodent or bird)

Reference: Ernst *et al.*,
[Long-term monitoring and experimental manipulation
of a Chihuahuan Desert ecosystem near Portal, Arizona, USA](https://esapubs.org/archive/ecol/E090/118/).
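Both files share the `species_id` column, so each observation can be matched with its species details. A minimal sketch using a pandas left merge, which keeps every observation even when its `species_id` has no match in the species file (the species columns are then `NaN`); the function name `add_species_info` is hypothetical:

```python
import pandas as pd


def add_species_info(surveys, species):
    """Attach genus, species and taxa to each survey observation."""
    # how="left" keeps every observation, even without a matching species_id;
    # validate="many_to_one" raises MergeError if species_id is duplicated in species
    return surveys.merge(species, on="species_id", how="left", validate="many_to_one")


# Usage (requires the data files):
# surveys = pd.read_csv("data/surveys.csv")
# species = pd.read_csv("data/species.csv")
# combined = add_species_info(surveys, species)
```

The `validate` check guards against a species file that accidentally lists the same code twice, which would otherwise silently duplicate observations.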

### Reference
Python modules:
* [`pandas`](https://pandas.pydata.org/docs/reference/index.html)
* [`altair`](https://altair-viz.github.io/index.html),
  based on the [Vega-Lite](https://vega.github.io/vega-lite/)
  grammar of interactive graphics
* Other visualization modules:
  * [`plotnine`](https://plotnine.org)
  * [`bokeh`](https://docs.bokeh.org/en/latest/)
    and [`plotly`](https://plotly.com/python/)
    for interactive plots in a Web page

Python development environments:
* [JupyterLab](https://docs.alliancecan.ca/wiki/Jupyter#JupyterLab) on supercomputers
* [Spyder IDE](https://www.spyder-ide.org)
* [Visual Studio Code](https://code.visualstudio.com)

Training:
* Future workshops at
  [Calcul Québec](https://www.eventbrite.ca/o/calcul-quebec-8295332683)
  and the
  [Digital Research Alliance of Canada](https://explora.alliancecan.ca/)
* [Software Carpentry](https://software-carpentry.org/lessons/)
  and [Data Carpentry](https://datacarpentry.org/lessons/) online tutorials:
  * [The Unix Shell](https://swcarpentry.github.io/shell-novice/)
  * [Programming with Python](https://swcarpentry.github.io/python-novice-inflammation/)
  * [Data Cleaning with OpenRefine](https://datacarpentry.org/OpenRefine-ecology-lesson/)
  * [Data Management with SQL](https://datacarpentry.org/sql-ecology-lesson/)
  * [Data Analysis and Visualization in R](https://datacarpentry.org/R-ecology-lesson/)