<a href="https://colab.research.google.com/github/rhodes-byu/cs-stat-180/blob/main/notebooks/04-matplotlib.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><p><b>After clicking the "Open in Colab" link, copy the notebook to your own Google Drive before getting started, or it will not save your work</b></p>

## Introduction

Matplotlib is the "grandfather" library of data visualization with Python. It was created to replicate the plotting capabilities of MATLAB (another programming language) in Python.  

Matplotlib was originally released in 2003!  This predates Pandas (which was originally released in 2008), so it can feel a little clunky to modern users.  Having said that, it is still a great visualization package and you really can't talk about visualization in Python without some talk about Matplotlib.  

Some of the major Pros of Matplotlib are:

* Generally easy to get started for simple plots
* Support for custom labels and texts
* Great control of every element in a figure
* High-quality output in many formats
* Very customizable in general

For more information, explore the official Matplotlib web page: https://matplotlib.org/.  There you can find [examples (with code)](https://matplotlib.org/stable/gallery/index) for almost any kind of plot you want to create, a [quick start guide](https://matplotlib.org/stable/tutorials/introductory/quick_start.html), [cheat sheets](https://matplotlib.org/cheatsheets/), detailed official [documentation](https://matplotlib.org/stable/api/index.html) of the API, and more.  
    
## Importing

It is standard to import the `matplotlib.pyplot` module with the alias `plt`:

In [None]:
import matplotlib.pyplot as plt
import matplotlib
import numpy as np

# Basic Example

### Example

Let's walk through a very simple example using two numpy arrays. You can also use lists or pandas columns.

**The data we want to plot:**

In [None]:
x = np.linspace(-np.pi, np.pi, 50)
cos_x = np.cos(x)
sin_x = np.sin(x)

## Basic Matplotlib Commands with *functions*

We can create a very simple line plot using `plt.plot()`:

In [None]:
plt.plot(x, cos_x)

We can create a very simple scatterplot using `plt.scatter()`:

In [None]:
plt.scatter(x, cos_x)

____
## Labels and titles

Now that we have covered the basics of creating a plot, let's look at how to decorate a figure with titles and axis labels.

**Figure titles**

A title can be added to each axes in a figure. To set the title of the current axes, use the `plt.title` function:

**Axis labels**

Similarly, with the functions `plt.xlabel` and `plt.ylabel`, we can set the labels of the X and Y axes:

In [None]:
# Generate basic plot
plt.plot(x, cos_x)

# Add some attributes
plt.xlabel('X Axis Title')
plt.ylabel('Y Axis Title')
plt.title('Title')
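
The same decorations can also be set through an Axes object, which is the style used for subplots later in this notebook. A minimal sketch, assuming the figure is created with `plt.subplots()`; the label strings are placeholders chosen for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 50)

# plt.subplots() with no arguments returns one Figure and one Axes
fig, ax = plt.subplots()
ax.plot(x, np.cos(x))

# Axes methods use a set_ prefix: plt.xlabel -> ax.set_xlabel, and so on
ax.set_xlabel('x (radians)')
ax.set_ylabel('cos(x)')
ax.set_title('Cosine')
```

Both styles produce the same result here; the Axes methods become more useful once a figure holds more than one plot.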

## Optional Arguments for scatterplot:

**s:**          The size of the markers. It can be a single value or an array of the same length as x and y.  
**c:**         The color of the markers. It can be a single color format string, or an array of numbers that will be mapped to colors using the cmap and norm parameters.  
**marker:**     The style of the markers. Examples include 'o' for circles, 's' for squares, etc.  
**cmap:**       A Colormap instance or a registered colormap name. This is used when c is an array of numbers.  
**norm:**      A Normalize instance to scale data values to the range [0, 1].  
**vmin:**       Minimum data value that corresponds to the colormap’s lower bound.  
**vmax:**      Maximum data value that corresponds to the colormap’s upper bound.  
**alpha:**      The alpha blending value, between 0 (transparent) and 1 (opaque).  
**linewidths:** The width of the marker edges.  
**edgecolors:** The color of the marker edges. It can be 'face', 'none', or a color.  


A list of matplotlib colors can be found [here](https://matplotlib.org/stable/gallery/color/named_colors.html).  
Axes plotting methods (e.g., plot, scatter, bar, etc.) can be found [here](https://matplotlib.org/stable/api/axes_api.html).
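
Several of the optional scatter arguments listed above can be combined in one call. The following is only a sketch: the size formula, the `'coolwarm'` colormap, and the variable names `sizes` and `points` are choices made for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 50)
sin_x = np.sin(x)

# Marker area grows with |sin(x)|; s is given in points squared
sizes = 20 + 200 * np.abs(sin_x)

points = plt.scatter(
    x, sin_x,
    s=sizes,            # one size per point
    c=sin_x,            # numbers mapped to colors through cmap
    cmap='coolwarm',
    vmin=-1, vmax=1,    # data values at the two ends of the colormap
    marker='s',         # square markers
    alpha=0.7,          # slightly transparent
    linewidths=0.5,
    edgecolors='black',
)
plt.colorbar(points, label='sin(x)')
```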

## Color Usage in Matplotlib

In Matplotlib, colors can be specified in several ways to customize plots:

- **color**: The `color` parameter is used to set the color of lines, markers, or other plot elements. It accepts named colors (e.g., `"red"`), hex codes (e.g., `"#FF5733"`), RGB(A) tuples, or grayscale strings (e.g., `"0.5"` for 50% gray).

- **c**: The `c` parameter is commonly used in functions like `scatter()` to specify the color(s) of individual points. It can take a single color (like `color`), or an array of values that will be mapped to colors using a colormap (with the `cmap` parameter).

**Summary:**  
- Use `color` for setting a single color for a plot element.  
- Use `c` for mapping data values to colors, especially in scatter plots and similar functions.
- Colormaps (cmaps) available can be found here: [Matplotlib Colormaps](https://matplotlib.org/stable/users/explain/colors/colormaps.html)

In [None]:
# Using 'color' to set a single color for all points in a scatter plot
plt.scatter(x, sin_x, color='tomato')
plt.title("Scatter with single 'color'")

In [None]:
# Using 'c' to map data values to colors in a scatter plot
plt.scatter(x, sin_x, c=cos_x, cmap='viridis')
plt.title("Scatter with 'c' mapped to cos_x")
plt.colorbar(label='cos_x value')

In [None]:
# Brigham Young University (BYU) official RGB colors

# Matplotlib expects RGB values in the range [0, 1], so we need to normalize them
byu_blue = (0/255, 46/255, 93/255)
byu_royal = (0/255, 61/255, 165/255)
byu_slate = (124/255, 135/255, 142/255)

# Example plot using BYU colors
plt.plot(x, cos_x + .5, color=byu_blue, label='BYU Blue')
plt.plot(x, sin_x, color=byu_royal, label='BYU Royal')
plt.plot(x, cos_x + sin_x, color=byu_slate, label='BYU Slate')

plt.title("Example using BYU RGB Colors")
plt.legend()
plt.show()
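
To see that the different color specifications describe the same kind of value, one option is to convert them with `matplotlib.colors.to_rgb`. This is a small sketch for illustration; the variable names are arbitrary.

```python
import matplotlib.colors as mcolors

# Each specification below is converted to an (R, G, B) tuple in [0, 1]
named = mcolors.to_rgb('red')          # named color
hex_code = mcolors.to_rgb('#FF5733')   # hex code
gray = mcolors.to_rgb('0.5')           # grayscale string, 50% gray
rgb_tuple = mcolors.to_rgb((0/255, 46/255, 93/255))  # RGB tuple (BYU blue)

for name, value in [('named', named), ('hex', hex_code),
                    ('gray', gray), ('tuple', rgb_tuple)]:
    print(name, value)
```

Any of these forms can be passed to the `color` parameter of a plotting call.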

## Line Width (`linewidth` or `lw`)

We can make lines thicker or thinner.


In [None]:
x = np.linspace(0, 10, 100)

plt.plot(x, x + 1, color="red", linewidth=0.5, label="linewidth=0.5")
plt.plot(x, x + 2, color="red", linewidth=1.5, label="linewidth=1.5")
plt.plot(x, x + 3, color="red", linewidth=3.0, label="linewidth=3.0")

plt.ylim(0, 15)
plt.legend()
plt.title("Varying Line Widths")
plt.show()


## Line Style (`linestyle` or `ls`)

Common options are:
- `'-'` solid  
- `'--'` dashed  
- `'-.'` dash-dot  
- `':'` dotted  


In [None]:
plt.plot(x, x + 1, color="green", lw=2, linestyle="-", label="solid")
plt.plot(x, x + 2, color="green", lw=2, linestyle="--", label="dashed")
plt.plot(x, x + 3, color="green", lw=2, linestyle="-.", label="dash-dot")
plt.plot(x, x + 4, color="green", lw=2, linestyle=":", label="dotted")

plt.ylim(0, 15)
plt.legend()
plt.title("Different Line Styles")
plt.show()


## Markers (`marker`)

We can add symbols at each data point.  
Examples: `'o'` (circle), `'s'` (square), `'+'` (plus), `'*'` (star), `'1'` (tri-down).

Additional markers can be found here: [Matplotlib Markers](https://matplotlib.org/stable/gallery/lines_bars_and_markers/marker_reference.html)


In [None]:
x = np.linspace(0, 10, 10)

plt.plot(x, x + 1, color="blue", lw=2, marker="+", label="plus")
plt.plot(x, x + 2, color="blue", lw=2, marker="o", label="circle")
plt.plot(x, x + 3, color="blue", lw=2, marker="s", label="square")
plt.plot(x, x + 4, color="blue", lw=2, marker="1", label="tri-down")

plt.ylim(0, 15)
plt.legend()
plt.title("Different Markers")
plt.show()


## Marker Size and Color

We can customize markers further with:
- `markersize` (or `ms`)  
- `markerfacecolor` (fill color)  
- `markeredgecolor` and `markeredgewidth`  


In [None]:
plt.plot(x, x + 1, color="purple", lw=1, ls="-", marker="o", markersize=4, label="small circle")
plt.plot(x, x + 2, color="purple", lw=1, ls="-", marker="o", markersize=8, markerfacecolor="red", label="red fill")
plt.plot(x, x + 3, color="purple", lw=1, ls="-", marker="s", markersize=10,
         markerfacecolor="yellow", markeredgewidth=2, markeredgecolor="green", label="custom square")

plt.ylim(0, 15)
plt.legend()
plt.title("Marker Size and Color")
plt.show()
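
Color, line style, and marker can also be combined in a short format string passed as the third positional argument to `plt.plot()`; this is the form behind the `'b'` and `'r'` arguments used in later cells. A minimal sketch comparing the shorthand with explicit keyword arguments:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 10)

# 'r--o' means: red ('r'), dashed line ('--'), circle markers ('o')
(line,) = plt.plot(x, x + 1, 'r--o', label="fmt='r--o'")

# Equivalent styling written with explicit keyword arguments
plt.plot(x, x + 3, color='red', linestyle='--', marker='o', label='keywords')

plt.legend()
```

The shorthand is compact for quick plots, while keyword arguments tend to be easier to read in longer scripts.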


In [None]:
# Original x
x = np.linspace(-np.pi, np.pi, 50)

## Multiple Plots (Axes) on the Same Figure

`plt.subplot(nrows, ncols, plot_number)`
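
As a rough sketch of how this function-based call is typically used: each call to `plt.subplot` selects one panel of the grid, and the `plt` functions that follow draw on that panel. The titles here are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 50)

# 1 row, 2 columns; plot_number counts from 1, left to right
plt.subplot(1, 2, 1)
plt.plot(x, np.cos(x))
plt.title('cos')

plt.subplot(1, 2, 2)   # switch to the second panel
plt.plot(x, np.sin(x))
plt.title('sin')

fig = plt.gcf()  # the figure that both panels were added to
```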

## plt.subplots for multiple axes
`plt.subplots()` is a convenient way to create a figure and a grid of subplots (axes) at once.
It returns a tuple `(fig, axes)`, where `fig` is the Figure object and `axes` is an array of Axes objects (or a single Axes object when the grid is 1 by 1).
You can specify the number of rows and columns with `nrows` and `ncols`, and other options like `figsize`.

### Common parameters:
- nrows, ncols: number of rows and columns of subplots
- figsize: size of the figure (width, height) in inches
- sharex, sharey: share x/y axes among subplots
- squeeze: if True, reduce the dimensionality of the axes array when possible
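
The effect of `squeeze` and `sharey` is easiest to see by inspecting what `plt.subplots()` returns. A small sketch, with figure sizes and limits chosen arbitrarily:

```python
import matplotlib.pyplot as plt

# Default (squeeze=True): a 1 x 2 grid comes back as a 1-D array
fig1, axes1 = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))
print(axes1.shape)

# squeeze=False: the array is always 2-D, indexed as axes[row, col]
fig2, axes2 = plt.subplots(nrows=1, ncols=2, figsize=(8, 3), squeeze=False)
print(axes2.shape)

# sharey=True: changing the y-limits of one panel changes the other too
fig3, axes3 = plt.subplots(1, 2, sharey=True)
axes3[0].set_ylim(-2, 2)
print(axes3[1].get_ylim())
```

Passing `squeeze=False` can be handy when the grid shape is computed at run time, because the indexing code then stays the same for every shape.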

In [None]:
# Empty canvas of 1 by 2 subplots
fig, axes = plt.subplots(nrows = 1, ncols = 2, sharey = True, figsize = (10, 5)) # figsize: (width, height)

In [None]:
# Axes is an array of axes to plot on
axes

In [None]:
axes[0]

We can index into this array to plot on each axes:

In [None]:
axes[0].plot(x, cos_x, 'b')
axes[0].set_xlabel('X Label 0')
axes[0].set_ylabel('Y Label 0')
axes[0].set_title('Title 0')


axes[1].plot(x, sin_x, 'r')
axes[1].set_xlabel('X Label 1')
axes[1].set_ylabel('Y Label 1')
axes[1].set_title('Title 1')


# Display the figure object
fig

In [None]:
fig, axes = plt.subplots(1, 2, sharey = True, figsize = (12, 6))

axes[0].plot(x, sin_x, color = 'darkcyan')
axes[0].set_xlabel('x')
axes[0].set_ylabel('y')
axes[0].set_title('title')

axes[1].scatter(cos_x, sin_x)

We can iterate through the axes. A common issue with matplotlib is overlapping subplots or figures. We can use the **fig.tight_layout()** or **plt.tight_layout()** method, which automatically adjusts the positions of the axes on the figure canvas so that there is no overlapping content.

In [None]:
fig, axes = plt.subplots(nrows = 1, ncols = 2, figsize = (10, 5))
colors = ['firebrick', 'blue']

for i, ax in enumerate(axes):
    ax.plot(x, cos_x, color = colors[i])
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title(colors[i])

# Adjust spacing so that labels and titles do not overlap
fig.tight_layout()

We can have a matrix of plots.

In [None]:
fig, axes = plt.subplots(2, 3)

axes[0, 1].scatter(x, sin_x)
axes[1, 2].plot(x, cos_x)

### Looping through a 2D array of axes

In [None]:
fig, axes = plt.subplots(2, 2, figsize = (12, 8))
colors = ['blue', 'green', 'purple', 'orange']
widths = [1, 1, 5, 10]

ncols = axes.shape[1]

for i, row in enumerate(axes):
    for j, ax in enumerate(row):
        k = i * ncols + j  # flat index 0..3, unique for each subplot
        ax.plot(x, cos_x, color = colors[k], linewidth = widths[k])
        ax.set_title(f'Plot {k}')

### Loop through axes with numpy.ravel()

This allows you to use a single for loop as it flattens the array used to store the individual ax objects.

In [None]:
fig, axes = plt.subplots(2, 2, figsize = (12, 8))
colors = ['blue', 'green', 'purple', 'orange']
widths = [1, 1, 5, 10]

for i, ax in enumerate(axes.ravel()):
    ax.plot(x, cos_x, x, sin_x + i, c = colors[i], linewidth = widths[i])

    # if i == 0:
    #     ax.scatter(x, sin_x)
        
    ax.set(title = colors[i].upper())

plt.tight_layout()

### Legends

You can pass the **label="label text"** keyword argument when plots or other objects are added to the figure, and then call the **legend** method to add the legend to the figure: 

In [None]:
fig = plt.figure()

ax = fig.add_axes([0, 0, 1, 1])

ax.plot(x, cos_x, label = "cos")
ax.plot(x, sin_x, label = "sin")
ax.legend(loc = 0)

The **legend** method takes an optional keyword argument **loc** that specifies where on the axes the legend is drawn. The value of **loc** can be a numerical code for one of the places the legend can be drawn. See the [documentation page](http://matplotlib.org/users/legend_guide.html#legend-location) for details. Some of the most common **loc** values are:

```
ax.legend(loc = 0) # let matplotlib decide the optimal location
ax.legend(loc = 1) # upper right corner 
ax.legend(loc = 2) # upper left corner
ax.legend(loc = 3) # lower left corner
ax.legend(loc = 4) # lower right corner
```
Many more options are available.

You can also specify these using strings, such as `'best'`, `'upper right'`, `'lower right'`, etc.
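
A short sketch of the string form of **loc**; the legend title `'Function'` is an addition made only for this illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 50)

fig, ax = plt.subplots()
ax.plot(x, np.cos(x), label='cos')
ax.plot(x, np.sin(x), label='sin')

# String form of loc=2; title is an optional heading for the legend box
legend = ax.legend(loc='upper left', title='Function')
```

String locations tend to be easier to read than the numerical codes when revisiting a notebook later.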

## Saving figures
Matplotlib can generate high-quality output in a number of formats, including PNG, JPG, EPS, SVG, PGF and PDF. 

To save a figure to a file we can use the `savefig` method of the `Figure` class:

In [None]:
fig.savefig("trigonometry.pdf")

For a tighter layout:

In [None]:
fig.savefig('trig_tight.pdf', bbox_inches = 'tight')

Here we can also optionally specify the DPI and choose between different output formats:

In [None]:
fig.savefig("filename.png", dpi = 200)
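
The options above can be combined in a single call. The sketch below saves one figure in several formats; the temporary directory and the file name `trig` are assumptions made so that the example does not clutter the working directory.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 50)
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, np.cos(x))

# Write into a temporary directory instead of the working directory
out_dir = tempfile.mkdtemp()
saved = []
for ext in ('png', 'svg', 'pdf'):
    path = os.path.join(out_dir, f'trig.{ext}')
    # The output format is inferred from the file extension
    fig.savefig(path, dpi=200, bbox_inches='tight')
    saved.append(path)

print(saved)
```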

# Other options

## Plot range

We can configure the ranges of the axes using the `set_ylim` and `set_xlim` methods of the axes object, or `axis('tight')` to automatically get "tightly fitted" axes ranges:

In [None]:
fig, axes = plt.subplots(1, 2, figsize = (12, 4))

axes[0].plot(x, cos_x, x, sin_x)
axes[0].set_title("default axes ranges")


axes[1].plot(x, cos_x, x, sin_x)
axes[1].set_ylim([-.5, .5])
axes[1].set_xlim([-2, 0])
axes[1].set_title("custom axes range")
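
The `axis('tight')` option mentioned above can be sketched alongside fixed limits. The limits here are arbitrary, and `get_xlim` and `get_ylim` are used only to read the ranges back:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 50)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Left: fixed limits, read back with get_xlim / get_ylim
axes[0].plot(x, np.cos(x))
axes[0].set_xlim(-2, 0)
axes[0].set_ylim(-0.5, 0.5)
print(axes[0].get_xlim(), axes[0].get_ylim())

# Right: 'tight' fits the limits to the plotted data
axes[1].plot(x, np.cos(x))
axes[1].axis('tight')
axes[1].set_title("axis('tight')")
```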

### Logarithmic scale

It is also possible to set a logarithmic scale for one or both axes. This functionality is in fact only one application of a more general transformation system in Matplotlib. The scale of each axis is set separately using the `set_xscale` and `set_yscale` methods, which accept one parameter (with the value "log" in this case):

In [None]:
xl = np.linspace(0, 1, 100)

fig, axes = plt.subplots(1, 2, figsize=(10,4))
      
axes[0].plot(xl, xl**2, xl, np.exp(xl))
axes[0].set_title("Normal scale")

axes[1].plot(xl, xl**2, xl, np.exp(xl))
axes[1].set_yscale("log")
axes[1].set_title("Logarithmic scale (y)");
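
Both axes can be made logarithmic at once. The following sketch uses power-law curves, with a data range chosen for illustration (starting above zero, since a log axis cannot show zero or negative values):

```python
import numpy as np
import matplotlib.pyplot as plt

# Start above zero: log axes cannot show zero or negative values
xl = np.linspace(0.1, 10, 100)

fig, ax = plt.subplots()
ax.plot(xl, xl**2, label='x^2')
ax.plot(xl, xl**3, label='x^3')

# On log-log axes a power law x^k appears as a straight line with slope k
ax.set_xscale('log')
ax.set_yscale('log')
ax.legend()
```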

### Axis grid

With the `grid` method of the axes object, we can turn grid lines on and off. We can also customize the appearance of the grid lines using the same keyword arguments as the `plot` function:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10,3))

# default grid appearance
axes[0].plot(x, x**2, x, x**3, lw=2)
axes[0].grid(True)

# custom grid appearance
axes[1].plot(x, x**2, x, x**3, lw=2)
axes[1].grid(color='b', alpha=0.5, linestyle='dashed', linewidth=0.9)

### Text annotation

Text can be added to matplotlib figures using the `text` method of the axes object. It supports LaTeX formatting just like axis label texts and titles:

In [None]:
fig, ax = plt.subplots()

xx = np.linspace(-0.75, 1., 100)
ax.plot(xx, xx**2, xx, xx**3)

ax.text(0.15, 0.2, r"$y=x^2$", fontsize=20, color="blue")
ax.text(0.65, 0.1, r"$y=x^3$", fontsize=20, color="green")
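
When a label should point at a specific location, a related option is the `annotate` method, which draws text with an optional arrow. This goes slightly beyond the `text` method shown above and is only a sketch; the coordinates and the label text are chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

xx = np.linspace(-0.75, 1.0, 100)

fig, ax = plt.subplots()
ax.plot(xx, xx**2)

# xy is the point being marked; xytext is where the label is placed
note = ax.annotate(
    r"minimum of $y=x^2$",
    xy=(0, 0),
    xytext=(0.3, 0.4),
    arrowprops=dict(arrowstyle='->'),
)
```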

# Additional Plot Types

There are many specialized plots we can create, such as barplots, histograms, scatter plots, and much more. Most of these types of plots we will create using seaborn, a statistical plotting library for Python. But here are a few examples of these types of plots:

In [None]:
from random import sample
data = sample(range(1, 1000), 100)
plt.hist(data)

In [None]:
data = [np.random.normal(0, std, 100) for std in range(1, 4)]

# rectangular box plot
plt.boxplot(data, vert = True);

In [None]:
# Violin Plot
fig, ax = plt.subplots()
ax.violinplot(data, showmeans=True, showmedians=True);

In [None]:
# Heatmap data
data = np.random.rand(10, 10)

# Create the heatmap
fig, ax = plt.subplots()
cax = ax.imshow(data, cmap='viridis')

# Add colorbar
fig.colorbar(cax)

# Add labels
ax.set_title('Heatmap Example')
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')

plt.show()
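
Barplots are mentioned above but not shown. A minimal sketch with made-up category counts, used only to illustrate the call:

```python
import matplotlib.pyplot as plt

# Made-up counts, used only to illustrate the call
categories = ['A', 'B', 'C']
counts = [5, 12, 8]

fig, ax = plt.subplots()
bars = ax.bar(categories, counts, color='steelblue')

# Add labels
ax.set_xlabel('Category')
ax.set_ylabel('Count')
ax.set_title('Bar Plot Example')
```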

## Further reading

* http://www.matplotlib.org - The project web page for matplotlib.
* https://github.com/matplotlib/matplotlib - The source code for matplotlib.
* http://matplotlib.org/gallery.html - A large gallery showcasing various types of plots matplotlib can create. Highly recommended! 
* http://www.loria.fr/~rougier/teaching/matplotlib - A good matplotlib tutorial.
* http://scipy-lectures.github.io/matplotlib/matplotlib.html - Another good matplotlib reference.
