<a href="https://colab.research.google.com/github/NIP-Data-Computation/show-and-tell/blob/master/piercel_week2_notes3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Author**: Pierce Lopez <br>
**Date Created**: August 12, 2020 <br>
**Last Updated**: August 13, 2020 <br> 
**Description**: Contains my notes on the Data Analyst lesson: _Introduction to Data Visualization with Matplotlib_.

# Introduction to Data Visualization with Matplotlib
For this chapter, we will use the `matplotlib.pyplot` functions, so do not forget to import the necessary modules!

```
# import modules
import matplotlib.pyplot as plt
```
## Chapter 1: Introduction to Matplotlib

### Section 1: Introduction to data visualization with matplotlib

1. Subplots
  * `fig` stands for the figure, which represents the whole page.
  * `ax` stands for the axes, which represent the part of that page that holds the data.

```
# create figure and axes
fig, ax = plt.subplots()

# add data on axes by calling the plot function
ax.plot(x_data, y_data)

# show plot
plt.show()
```

<br>

### Section 2: Customizing your data

1. A recap of some of the ways to customize plotted data.

* `color`
  * 'red' 
  * 'green'
  * 'blue' 
* `linewidth`
  * numeric value
* `linestyle`
  * line ('-')
  * dashed ('--')
  * dash-dotted ('-.')
  * dotted (':')
* `marker`
  * 'x'
  * '*'
  * square ('s')
  * circle ('o')
  * diamond ('d')
  * hexagon ('h')

2. Additional functions for `ax`
  * `.set_xlabel`
  * `.set_ylabel`
  * `.set_title`
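
The options above can be combined in a single call. The following is a minimal sketch; the data and the label text are made up for illustration.

```python
import matplotlib.pyplot as plt

# made-up sample data, for illustration only
months = [1, 2, 3, 4, 5, 6]
rainfall = [3.2, 2.8, 4.1, 5.0, 4.4, 3.9]

fig, ax = plt.subplots()

# customize the line through keyword arguments of plot()
ax.plot(months, rainfall, color='red', linewidth=2, linestyle='--', marker='o')

# describe the axes and the plot
ax.set_xlabel("Month")
ax.set_ylabel("Rainfall (in)")
ax.set_title("Monthly rainfall")

plt.show()
```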

<br>

### Section 3: Small multiples

1. Creating a group of subplots in one plot
  * `fig, ax = plt.subplots(rows, columns)`
  * `ax` is now an array of axes, and `ax.shape` gives its dimensions as (rows, columns).

2. Adding data to a particular subplot
  * `ax[row, column].plot(x_data, y_data)`

3. Adding descriptions to subplots
  * `ax[row, column].set_xlabel()`
  * `ax[row, column].set_ylabel()`
  * `ax[row, column].set_title()`

**Additional insights:** 
1. Passing `sharey = True` to `plt.subplots()` makes all subplots share the same y-axis range, so values can be compared directly across subplots.
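
Here is a small sketch of a 2 x 2 grid of subplots that share a y-axis range. The data values and titles are made up for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]  # made-up sample data

# 2 rows x 2 columns of subplots that share one y-axis range
fig, ax = plt.subplots(2, 2, sharey=True)
print(ax.shape)  # (2, 2)

# index the array of axes by [row, column]
ax[0, 0].plot(x, [1, 2, 3, 4])
ax[0, 1].plot(x, [10, 20, 30, 40])
ax[1, 0].plot(x, [4, 3, 2, 1])
ax[1, 1].plot(x, [40, 30, 20, 10])

# describe individual subplots
ax[0, 0].set_title("top left")
ax[0, 0].set_ylabel("y")
ax[1, 0].set_xlabel("x")

plt.show()
```

Note that when only one row or one column is requested, `ax` is a one-dimensional array and takes a single index, e.g. `ax[0]`.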

<br>


## Chapter 2: Plotting Time-Series

### Section 1: Plotting time-series data

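1. To let pandas recognize a column's values as dates and use them as the index, we pass two arguments to the `pd.read_csv()` function: `parse_dates = True` and `index_col = "date"` (where `"date"` stands for the name of the date column).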
2. We can use the indices of a DataFrame as the axis values for our plots.
3. We can slice our data so that the plots can just show a section of the whole data.
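
The three points above can be sketched as follows. To keep the example self-contained, the CSV is read from an in-memory string instead of a file; the column names `date` and `co2` and all values are made up.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# stand-in for a CSV file on disk; column names and values are made up
csv_text = """date,co2
2000-01-01,369.1
2000-02-01,369.5
2000-03-01,370.6
2000-04-01,371.8
2000-05-01,371.7
"""

# parse_dates=True parses the index, so index_col names the date column
df = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col="date")

# slice by date labels; both endpoints are included
spring = df.loc["2000-02-01":"2000-04-01"]

# use the DataFrame index as the x-axis values
fig, ax = plt.subplots()
ax.plot(spring.index, spring["co2"])
ax.set_xlabel("Time")
ax.set_ylabel("CO2 (ppm)")

plt.show()
```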

<br>

### Section 2: Plotting time-series with different variables

1. Plotting two different variables over a time-series using twin axes.

```
# create figure and axes
fig, ax = plt.subplots()

# plot first pair of data with a color
ax.plot(x_data, y1_data, color = 'r')
ax.set_xlabel("x-axis label")
ax.set_ylabel("y-axis label 1", color = 'r')

# change color of y-axis ticks and tick labels to further distinguish
ax.tick_params("y", colors = 'r')

# set axis twin
ax2 = ax.twinx()

# plot second pair of data with a different color
ax2.plot(x_data, y2_data, color = 'b')
ax2.set_ylabel("y-axis label 2", color = 'b')

# change color of y-axis ticks and tick labels to further distinguish
ax2.tick_params("y", colors = 'b')

# display plot
plt.show()
```

Typing this over and over again for several plots will be tiring, so let's make a function out of it!

2. Defining the plotting function

```
# template

def plot_timeseries(axes, x, y, color, xlabel, ylabel):
  axes.plot(x, y, color = color)
  axes.set_xlabel(xlabel)
  axes.set_ylabel(ylabel, color = color)
  axes.tick_params('y', colors = color)
```
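
As a usage sketch, the function can be called once per axes to rebuild the twin-axes plot. The function is repeated here so the block runs on its own; the data values and labels are made up.

```python
import matplotlib.pyplot as plt


def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    """Plot y against x on the given axes, coloring the line and the y-axis."""
    axes.plot(x, y, color=color)
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel, color=color)
    axes.tick_params('y', colors=color)


# made-up sample data, for illustration only
years = [2000, 2001, 2002, 2003]
co2 = [369.5, 371.1, 373.2, 375.8]
temp = [0.39, 0.52, 0.61, 0.60]

fig, ax = plt.subplots()

# first variable on the left y-axis
plot_timeseries(ax, years, co2, 'b', "Year", "CO2 (ppm)")

# second variable on a twin axes that shares the same x-axis
ax2 = ax.twinx()
plot_timeseries(ax2, years, temp, 'r', "Year", "Relative temperature (C)")

plt.show()
```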

<br>

### Section 3: Annotating time-series data

1. Annotating plots
  * `ax.annotate("annotation", xy = [xcoord, ycoord], xytext = [xcoord, ycoord], arrowprops = {})`
    * `xy` - position of the datapoint
    * `xytext` - position of the annotation text
    * `arrowprops` - a dictionary; when it is given (even as an empty `{}`), an arrow is drawn from the annotation text to the datapoint
      * `"arrowstyle": "->"` - draws a thin arrow
      * `"color"` - changes the arrow color
      * [More annotation guides!](https://matplotlib.org/users/annotations.html)
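
A minimal annotation sketch, assuming made-up numeric data with a single peak to point at:

```python
import matplotlib.pyplot as plt

# made-up sample data with a peak at (3, 9)
x = [1, 2, 3, 4, 5]
y = [2, 4, 9, 5, 3]

fig, ax = plt.subplots()
ax.plot(x, y)

# place the text away from the datapoint and connect the two with an arrow
ann = ax.annotate(
    "peak",
    xy=(3, 9),        # the datapoint being annotated
    xytext=(4, 8),    # where the annotation text is placed
    arrowprops={"arrowstyle": "->", "color": "gray"},
)

plt.show()
```

For time-series data indexed by dates, the x-coordinates in `xy` and `xytext` would be date values (for example `pd.Timestamp` objects) rather than plain numbers.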

<br>






## Chapter 3: Quantitative Comparisons and Statistical Visualizations

### Section 1: Quantitative comparisons: bar charts

1. Generating a bar chart
```
ax.bar(x_data, y_data)
```
2. Generating a stacked bar chart (2 different values)
```
ax.bar(x1_data, y1_data)
ax.bar(x1_data, y2_data, bottom = y1_data)
```
3. Generating a stacked bar chart (3 different values)
```
ax.bar(x1_data, y1_data)
ax.bar(x1_data, y2_data, bottom = y1_data)
ax.bar(x1_data, y3_data, bottom = y1_data + y2_data)
```
4. Customizing bar chart elements
  * `ax.set_xticklabels(x_data, rotation = 90)` - rotates tick labels
  * `label` - `ax.bar()` argument that adds a legend label to the plotted data
  * `ax.legend()` - shows legend labels
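
The pieces above can be put together in one sketch. The category names and counts are made up. NumPy arrays are used so that `gold + silver` adds element-wise, and the tick labels are rotated with `ax.tick_params(labelrotation=...)`, a swapped-in alternative to `set_xticklabels` that does not need the labels passed again.

```python
import matplotlib.pyplot as plt
import numpy as np

# made-up sample data, for illustration only
countries = ["A", "B", "C"]
gold = np.array([10, 7, 4])
silver = np.array([8, 9, 5])
bronze = np.array([6, 3, 11])

fig, ax = plt.subplots()

# each layer starts where the previous layers end
ax.bar(countries, gold, label="Gold")
ax.bar(countries, silver, bottom=gold, label="Silver")
ax.bar(countries, bronze, bottom=gold + silver, label="Bronze")

# rotate the x-axis tick labels and show the legend
ax.tick_params(axis="x", labelrotation=90)
ax.set_ylabel("Number of medals")
ax.legend()

plt.show()
```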

<br>

### Section 2: Quantitative comparisons: histograms

1. Generating a histogram
```
ax.hist(x_data)
```
2. Customizing a histogram
  * `bins` - `ax.hist()` argument that specifies bin number (default is 10)
    * We can set the bins to be a list of values, making those values bin dividers!
  * `histtype = 'step'` - `ax.hist()` argument that draws the histogram as an unfilled outline (thin lines) instead of solid bars.
  * `label`
  * `ax.legend()`
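
A short sketch that overlays two histograms with explicit bin edges. The data values and group names are made up.

```python
import matplotlib.pyplot as plt

# made-up sample data, for illustration only
heights_a = [150, 155, 160, 162, 165, 170, 171, 175, 180, 185]
heights_b = [160, 165, 168, 172, 175, 178, 180, 184, 190, 195]

# a list of values is interpreted as the boundaries between bins
bins = [150, 160, 170, 180, 190, 200]

fig, ax = plt.subplots()

# histtype='step' draws outlines, so overlapping histograms stay readable
counts_a, edges, _ = ax.hist(heights_a, bins=bins, histtype='step', label="Group A")
ax.hist(heights_b, bins=bins, histtype='step', label="Group B")

ax.set_xlabel("Height (cm)")
ax.set_ylabel("# of observations")
ax.legend()

plt.show()
```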

<br>

### Section 3: Statistical plotting

1. Adding error bars to bar charts
  * `yerr` - `ax.bar()` argument that adds vertical error bars to the bars

2. Adding error bars to line plots
```
ax.errorbar(x_data, y_data, yerr = y_error_data)
```
3. Generating boxplots
```
ax.boxplot([x1_data, x2_data])
```
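
The three statistical plots above, sketched side by side. The samples and group names are made up, and the constant error of `0.3` in the line plot is an arbitrary illustrative value.

```python
import matplotlib.pyplot as plt
import numpy as np

# made-up samples, for illustration only
groups = ["Group A", "Group B"]
sample_a = [4.1, 4.8, 5.0, 5.3, 6.2]
sample_b = [6.0, 6.9, 7.1, 7.4, 8.5]

means = [np.mean(sample_a), np.mean(sample_b)]
stds = [np.std(sample_a), np.std(sample_b)]

fig, ax = plt.subplots(1, 3, figsize=(12, 4))

# bar chart of the means, with the standard deviations as error bars
ax[0].bar(groups, means, yerr=stds)

# line plot with a constant (arbitrary) error on every point
ax[1].errorbar([1, 2, 3, 4, 5], sample_a, yerr=0.3)

# one box per sample
boxes = ax[2].boxplot([sample_a, sample_b])
ax[2].set_xticklabels(groups)

plt.show()
```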

### Section 4: Quantitative comparisons: scatter plots

1. Generating scatter plots
```
# c colors each point by a value (here the DataFrame index), producing a color gradient;
# do not pass color together with c, since both set the point colors
ax.scatter(x1_data, x2_data, c = climate_change.index)
```
```
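
A self-contained sketch of the same idea. Instead of a DataFrame index, it assumes a plain list of positions (`order`) as the coloring values; the data is made up. The colorbar is an optional extra that shows how colors map to values.

```python
import matplotlib.pyplot as plt

# made-up sample data, listed in chronological order
co2 = [369.5, 371.1, 373.2, 375.8, 377.5]
temp = [0.39, 0.52, 0.61, 0.60, 0.53]

# position of each observation in time, used to color the points
order = list(range(len(co2)))

fig, ax = plt.subplots()

# c maps each point's value to a color, so later points get a different shade
points = ax.scatter(co2, temp, c=order)
fig.colorbar(points, ax=ax, label="Observation order")

ax.set_xlabel("CO2 (ppm)")
ax.set_ylabel("Relative temperature (C)")

plt.show()
```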

<br>


## Chapter 4: Sharing Visualizations with Others

### Section 1: Preparing your figures to share with others

Plot aesthetics: `plt.style.use()`

<br>

### Section 2: Saving your visualizations

Saving a figure: `fig.savefig(filename)`
  * We can choose the filetype for the image.
  * We can control the resolution of the saved image with the `dpi` argument (dots per inch). Older Matplotlib versions also accepted a `quality` argument for JPEG files.
  * `fig.set_size_inches([width, height])`
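
A sketch that applies a style, sets the figure size, and saves the figure in two formats. The `"ggplot"` style, the file names, and the temporary output folder are choices made for this example only.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# choose a style before creating the figure
plt.style.use("ggplot")

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])  # made-up data

# width and height of the figure, in inches
fig.set_size_inches([5, 3])

# save into a temporary folder so the example leaves no files behind
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "figure.png")
svg_path = os.path.join(out_dir, "figure.svg")

# the file extension selects the file type; dpi sets the resolution
fig.savefig(png_path, dpi=300)
fig.savefig(svg_path)
```

`plt.style.use("default")` restores the default style afterwards.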

<br>

### Section 3: Automating figures from data

To get unique values from a column: `.unique()`
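
One way `.unique()` can drive an automated figure is to loop over the unique values and add one bar per value. This is a sketch with a made-up DataFrame; the column names `sport` and `height` are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# made-up sample data, for illustration only
df = pd.DataFrame({
    "sport": ["Rowing", "Rowing", "Judo", "Judo", "Judo"],
    "height": [190, 186, 172, 175, 169],
})

# unique values of the column, in order of first appearance
sports = df["sport"].unique()

fig, ax = plt.subplots()

# one bar per unique value, whatever values the data contains
for sport in sports:
    sport_df = df[df["sport"] == sport]
    ax.bar(sport, sport_df["height"].mean(), yerr=sport_df["height"].std())

ax.set_ylabel("Height (cm)")
ax.tick_params(axis="x", labelrotation=90)

plt.show()
```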


## Tasks from this lesson (self-assessment)

1. Introduction to Matplotlib
  * Introduction to data visualization using matplotlib
    * Did I create plots?
  * Customizing your plots
    * Did I customize my plots using the various plot methods?
  * Small multiples
    * Did I make different subplots on one plot?

2. Plotting Time-Series
  * Plotting time-series data
    * Did I plot time-series data?
  * Plotting time-series with different variables
    * Did I make twin axes to combine time-series plots?
  * Annotating time-series data
    * Did I place an annotation on any of my plots?

3. Quantitative Comparisons and Statistical Visualizations
  * Quantitative comparisons: bar charts
    * Did I make a bar chart?
    * Did I make a stacked bar chart?
  * Quantitative comparisons: histograms
    * Did I make a histogram?
  * Statistical plotting
    * Did I add error bars to my plots?
  * Quantitative comparisons: scatter plots
    * Did I make a scatter plot?

4. Sharing Visualizations with Others
  * Preparing your figures to share with others
    * Did I use a different plot style?
  * Saving your visualizations
    * Did I save any of my plots?