# Matplotlib/Seaborn tutorial

Welcome to the Matplotlib and Seaborn tutorial! In this tutorial, you will learn the basics of graphing in Python. Matplotlib and Seaborn are the main libraries we will be using to do this. Both are relatively easy to use and integrate seamlessly with the other libraries we will be using. To get started, import the following libraries:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# workshop-specific
import numpy as np
import pandas as pd

# **Matplotlib**

Matplotlib is one of the easiest graphing libraries to use, but its default graphs are fairly plain, and it can take a lot of customization to make them look good! However, Matplotlib has many styling and sizing options for you, and it is well documented, with plenty of answers on Stack Overflow as well. So we can build most of our graphs in just a few lines and without much headache at all.

Most of the time, Matplotlib is used through a state-based interface: we call functions from the same module over and over again, and each call modifies the current graph. This central module is ```matplotlib.pyplot```, which we imported previously as ```plt```!

Matplotlib can interact with many different libraries easily, e.g., pandas and NumPy, which makes it a really powerful tool in our stack.

## **Important Matplotlib imports**

In [None]:
# the actual matplotlib library
import matplotlib.pyplot as plt # we did this before, but it's included again here for reference
import matplotlib as mpl

# global parameters
from matplotlib import rcParams

# for custom fonts
import matplotlib.font_manager as fm

# color mapping
from matplotlib import cm

# for gradients
from matplotlib.collections import LineCollection

## **Simple plotting**

There are several basic plotting functions that are the basis of any graph you make in matplotlib:
- ```plt.plot(x [, y, z, ...])```: plots a graph with the input provided. You can provide one or more inputs, and matplotlib will decide what to do with them.
    - if you provide 1 input (e.g., ```plt.plot(x)```), the input will be plotted on the y-axis, and the positions of its items (0, 1, 2, ...) will be used as the x-axis
    - if you provide 2 inputs (e.g., ```plt.plot(x, y)```), the first input will be the x-axis and the second will be the y-axis
    - The inputs must be finite Python sequences (lists, NumPy arrays, pandas dataframe rows/columns), and they can contain numbers or strings.
- ```plt.show()```: displays the current graph and starts a new one. You can call ```plt.plot``` as many times as you want, and all the calls made before a ```plt.show``` call will show up on 1 graph. To put each line on its own individual graph, call ```plt.show``` after each ```plt.plot``` call. If you never call ```plt.show``` in a notebook cell, all the ```plt.plot``` calls in that cell typically end up on 1 graph, which is displayed when the cell finishes.

Run the examples below to see more on how ```plt.plot``` and ```plt.show``` work!

In [None]:
# defining some arrays to test
x = [1, 3, 5]
y = [2, 4, 6]

# all on 1 plot using plt.show at the end
plt.plot(x)
plt.plot(y)
plt.plot(x, y)
plt.show()

# separate plots by calling plt.show between each plt.plot call
plt.plot(x)
plt.show()
plt.plot(y)
plt.show()
plt.plot(x, y)
plt.show()

# plt.plot can handle any finite sequence (lists, arrays, columns),
# but it cannot handle generators
z = np.arange(1, 35, 2)
a = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# numpy array plotting
print(z)
plt.plot(z)
plt.show()

# pandas dataframe plotting
print(a)
plt.plot(a["b"])
plt.show()
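
To see what ```plt.plot``` does with a single input, it can help to inspect the line object it returns. The following is a minimal sketch; the variable names (```values```, ```line```, ```x_used```, ```y_used```) are only illustrative.

```python
import matplotlib.pyplot as plt

values = [2, 4, 6]

# plt.plot returns a list of Line2D objects, one per plotted line
(line,) = plt.plot(values)

# with a single input, the values go on the y-axis and
# matplotlib uses the positions 0, 1, 2, ... as the x-axis
x_used = [float(v) for v in line.get_xdata()]
y_used = [float(v) for v in line.get_ydata()]
print(x_used)
print(y_used)

plt.show()
```

Printing the data this way is a quick check when a graph does not look the way you expected.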

## **```plt.plot``` Parameters**

```plt.plot``` will take many parameters to help you customize your graphs to your heart's desire! All the parameters are passed as keyword arguments to the ```plt.plot``` function. 
- full list of arguments: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html

The list of parameters is pretty extensive and it can usually be a bit cumbersome to enter. For essential styling, matplotlib provides an easy way of defining options, known as format strings!
- format string: easy way of defining multiple options at a time
- format strings are the first parameter you pass to ```plt.plot``` after your input data.
- syntax: "[color][marker shape][line type]" where each part is 1 or 2 characters (the parts can usually be given in a different order as well).
    - color: single character denoting a basic color (e.g., r: red, b: blue)
    - marker shape: single character denoting the shape of the marker for each data point (e.g., o: circle, +: plus)
    - line type: one or two characters denoting the line style (e.g., -: solid, --: dashed)
    - all are optional
- full list of options for format strings can be found here: https://python-course.eu/numerical-programming/formatting-plot-in-matplotlib.php

See some examples of matplotlib keyword arguments below!

In [None]:
x = [1, 3, 5]
y = [2, 4, 6]

# basic keyword arguments: color and linewidth
plt.plot(x, color='black', linewidth=10)
plt.show()

# basic format string. g represents green, o represents circles
# as the marker shape, and -- represents a dashed line between points.
plt.plot(y, "go--", markersize=12)
plt.show()

# many keyword arguments
plt.plot(x, y, color='#b0b0b0', marker='+', linestyle='dashed',
     linewidth=2, markersize=12)
plt.show()
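
A format string is shorthand for a few keyword arguments. As a rough sketch of that equivalence, the example below plots one line with a format string and one with the matching keyword arguments, then compares their style properties. The variable names (```line_fmt```, ```line_kw```) are only illustrative.

```python
import matplotlib.pyplot as plt

y = [2, 4, 6]

# format string: green, circle markers, dashed line
(line_fmt,) = plt.plot(y, "go--")

# the same styling spelled out as keyword arguments
# (the data is shifted up by 1 so both lines are visible)
(line_kw,) = plt.plot([v + 1 for v in y], color="g", marker="o", linestyle="--")

# both lines end up with the same style properties
for line in (line_fmt, line_kw):
    print(line.get_color(), line.get_marker(), line.get_linestyle())

plt.show()
```

Format strings only cover color, marker, and line style, so anything else (for example ```linewidth``` or ```markersize```) still needs a keyword argument.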

## **Titles and labels**

After creating your perfect graph using the ```plt.plot``` function, it's time to add some titles and labels to your graph.

- titles can be added using the ```plt.title``` function. ```plt.title``` has several options for customization, all of which can be found here: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.title.html
- labels can be passed in as a list of strings in place of the x or y input to ```plt.plot```. Your list of labels MUST be the same size as the other input, or else matplotlib will throw an error.
    - labels can appear on the x or y axis. They can be accessed using the ```plt.gca().get_xticklabels()``` or ```plt.gca().get_yticklabels()``` functions, and properties can be applied to them using the ```plt.setp``` function.

See examples of labels and titles below!

In [None]:
z = np.array([i for i in range(1, 35, 2)])
a = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "xlabels": ["point a", "point b", "point c"]})

# plot with a title
plt.plot(z)
plt.title("This is a title", fontdict=None, loc='center', pad=None, fontname='sans-serif')
plt.show()

# simple plot with labels on x axis
plt.plot(a["xlabels"], a["b"])
plt.show()

# plot with label properties added
plt.plot(a["xlabels"], a["a"])
plt.setp(plt.gca().get_xticklabels(), rotation=60, horizontalalignment='right')
plt.show()

# plot with labels on the y axis and properties added
plt.plot(a["a"], a["xlabels"])
plt.setp(plt.gca().get_yticklabels(), rotation=20, horizontalalignment='right')
plt.show()
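
Besides titles and tick labels, each axis can also get its own descriptive label. The sketch below assumes numeric positions on the x-axis and uses ```plt.xlabel```, ```plt.ylabel```, and ```plt.xticks``` to label them. The data and label text are made up for illustration.

```python
import matplotlib.pyplot as plt

values = [4, 5, 6]
positions = [0, 1, 2]
names = ["point a", "point b", "point c"]  # one label per tick position

plt.plot(positions, values)
plt.title("Title with axis labels")

# axis labels describe what each axis measures
plt.xlabel("sample")
plt.ylabel("value")

# tick labels: the list of labels must be the same size as the list of positions
plt.xticks(positions, names, rotation=45)

ax = plt.gca()  # keep a reference to the current axes
plt.show()
```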

## **Types of plots**

Matplotlib has many, many different types of plots. Each one has its own function similar to ```plt.plot```, and each can be customized in much the same way as a plot created using ```plt.plot``` (i.e., each one interacts with titles, labels, etc. in the same way).
- full list of different types of plots (click on the pictures for example code): https://matplotlib.org/stable/plot_types/index.html

See some quick examples of different plots below!

In [None]:
# normal plot
plt.plot([0, 1, 2])

# horizontal line
plt.axhline(y=1, color='r', linestyle='-')

# vertical line
plt.axvline(x=1, color='r', linestyle='--')

plt.show()
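
A few other common plot types follow the same pattern. This is a small sketch with made-up data, using ```plt.bar```, ```plt.scatter```, and ```plt.hist```. Each plot gets its own figure through ```plt.figure()``` so they do not overlap.

```python
import matplotlib.pyplot as plt

categories = ["a", "b", "c"]
heights = [3, 7, 5]
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# bar chart: one bar per category
plt.figure()
bars = plt.bar(categories, heights, color="g")
plt.title("bar")

# scatter plot: markers only, no connecting line
plt.figure()
points = plt.scatter([1, 2, 3], [2, 4, 1], color="r", marker="+")
plt.title("scatter")

# histogram: counts how many samples fall into each bin
plt.figure()
counts, bin_edges, patches = plt.hist(samples, bins=5)
plt.title("histogram")

plt.show()
```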

## **```rcParams```**

As you program with matplotlib, you may notice that you're setting the same parameters to the same values over and over again. However, you don't need to do this! ```rcParams``` is the global parameter set for matplotlib. It lets you set default values for parameters that you would normally pass to your plot functions, so you don't have to pass them every time. A change to ```rcParams``` is reflected in every graph generated after it is made, so make your changes before interacting with other parts of matplotlib.

```rcParams``` is already filled with defaults, so you only need to make changes to it if you have settings you want on all or most of your graphs. ```rcParams``` behaves like a Python dictionary and can be interacted with in the same way any Python dictionary can.

The full list of ```rcParams``` attributes can be found here: https://matplotlib.org/stable/tutorials/introductory/customizing.html#the-default-matplotlibrc-file

See some examples of ```rcParams``` configurations below!

In [None]:
# rcParams must be imported
from matplotlib import rcParams

# some notable rcParams settings

# line width in points
rcParams["lines.linewidth"] = 1.5 
# line style             
rcParams["lines.linestyle"] = "-" # "--": dashed line 
# line marker
rcParams["lines.marker"] = "None"
# font             
rcParams["font.family"] = "sans-serif"
# text color
rcParams["text.color"] = "black" # or any hexadecimal code
# legend location
rcParams["legend.loc"] = "best"
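
If a setting should only apply to a few graphs, a temporary override may be more convenient than a global change. The sketch below uses ```plt.rc_context``` for that. The chosen values (a linewidth of 4 and a dashed line) are arbitrary.

```python
import matplotlib.pyplot as plt
from matplotlib import rcParams

default_width = rcParams["lines.linewidth"]

# settings passed to rc_context only apply inside the with-block
with plt.rc_context({"lines.linewidth": 4, "lines.linestyle": "--"}):
    (thick_line,) = plt.plot([1, 3, 5])
    width_inside = thick_line.get_linewidth()

# outside the block, rcParams is back to its previous value
width_after = rcParams["lines.linewidth"]
print(width_inside, width_after == default_width)

plt.show()
```

To undo all global changes at once, ```matplotlib.rcdefaults()``` restores matplotlib's built-in defaults.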

## **```fig``` and ```ax```**

Matplotlib plots can be broken into several parts, which makes them easier to interact with and customize. This functionality can be accessed using functions discussed previously, but it is often easier to do this using ```fig``` and ```ax```.

- ```fig```: the actual figure of the plot
  - allows the size of the plot to be set
  - allows for macroscopic parameters of the plot to be set: main title, layout, etc.
- ```ax```: the axes of the plot
  - allows for labels, limits, and more to be set
  - allows for axis-specific settings
- advantage of ```fig``` and ```ax```: allows for easy organization of graphs. You can see this in the example below, where we have 1 group of graphs plotted on ```ax``` and another plotted on ```ax2```. The graphs within each group share a y-axis, each group receives its own y-axis scale and legend, and both groups share the same x-axis.

See some ```fig``` and ```ax``` examples below!

In [None]:
# setting figure size- very important for aspect ratio!
fig = plt.figure(figsize=(10, 6))

# using ax- first group
ax = plt.gca()

ax.set_ylim(3, 20)
ax.plot(list(range(20)), list(range(20)), label="line 1")

ax.set_xlabel('x label')
ax.set_ylabel('y label')

ax.legend(loc=0)  # loc=0 is the same as loc="best"

# creating a new group and plotting
ax2 = ax.twinx()

ax2.plot([2 * i for i in range(20)], label="line 2")
ax2.plot([3 * i for i in range(20)], label="line 3")

ax2.legend(loc=1)  # loc=1 is the same as loc="upper right"

# note that fig title must be set using suptitle
fig.suptitle('Main title')

plt.show()
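
Another common way to get ```fig``` and ```ax``` is ```plt.subplots```, which creates the figure and one or more axes in a single call. The sketch below places two independent graphs side by side. The data and titles are made up for illustration.

```python
import matplotlib.pyplot as plt

# one figure with a 1x2 grid of axes
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# each axes object is customized independently
axes[0].plot([0, 1, 2], [0, 1, 4], label="line 1")
axes[0].set_title("left plot")
axes[0].set_xlabel("x label")
axes[0].legend()

axes[1].plot([0, 1, 2], [4, 1, 0], "r--", label="line 2")
axes[1].set_title("right plot")
axes[1].set_ylim(0, 5)
axes[1].legend()

# figure-level settings apply to the whole figure
fig.suptitle("Main title")
fig.tight_layout()

plt.show()
```

Unlike ```twinx```, which overlays a second y-axis on the same plot area, ```plt.subplots``` gives each graph its own plot area.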

## Some useful skills

This section contains some especially useful pieces of code for matplotlib users! Note that the saved graph from the "saving a graph as a png" example must be uploaded to your Google Form!

In [None]:
# changing the font to a custom font

# importing matplotlib font manager
import matplotlib.font_manager as font_manager

# specifying font directories to search
# note: the trailing comma after the single list item is optional and has no effect
font_dirs = ['./tutorial_assets',]
# paths of the font files found in those directories
font_files = font_manager.findSystemFonts(fontpaths=font_dirs)
# names of the fonts matplotlib already knows about
print([f.name for f in font_manager.fontManager.ttflist])

# Your font path goes here
font_path = './tutorial_assets/atr.ttf'
font_manager.fontManager.addfont(font_path)
prop = font_manager.FontProperties(fname=font_path)

# storing font as default (optional)
rcParams['font.family'] = 'sans-serif'
rcParams['font.sans-serif'] = prop.get_name()

# alternative route without setting sans-serif:
# use the font's own name (e.g., 'American Typewriter') as the font family
print(prop.get_name())
rcParams['font.family'] = prop.get_name()

In [None]:
# saving a graph as a png
# NOTE: DO THIS INSTEAD OF SCREENSHOTTING!!!

plt.plot(["point 1", "point 2", "point 3"], [0, 2, 4], color='r')
plt.title("save this plot!")
# this function saves the graph
# you must call it before plt.show!!
plt.savefig('upload_this.png')
plt.show()

# **Seaborn**

The next section will briefly cover Seaborn. Seaborn:
- makes more aesthetically pleasing plots than matplotlib by default
- is integrated with pandas, which makes dataframes especially easy to plot
- is built on top of matplotlib (you can use any matplotlib settings/functions/etc.)

## **Basic plot**

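- uses ```sns.lmplot```, which draws a scatter plot with a fitted regression line
- integration with pandas: pass the dataframe as ```data``` and select columns by name with ```x``` and ```y```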

In [None]:
a = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "xlabels": ["point a", "point b", "point c"]})

# print is needed here: a notebook only displays the last expression in a cell
print(a.head())

sns.lmplot(x='a', y='b', data=a)

In [None]:
sns.lmplot(x='a', y='b', data=a)

plt.ylim(1, 6)
plt.xlim(1, 3)

## **Different plots**

- many different plot types, similar to matplotlib
  - full list: https://seaborn.pydata.org/tutorial/function_overview.html
- intuitive pandas integration with some dataframes
  - you don't even need to specify the columns to plot; seaborn can figure it out

In [None]:
sns.boxplot(data=a)

## **Colors and other attributes**

- can be specified similarly to matplotlib
- full list of attributes for ```lmplot```: https://seaborn.pydata.org/generated/seaborn.lmplot.html
  - some are graph-specific, so see each plotting function individually
- matplotlib integration
  - can use functions from matplotlib
    - e.g. title, legend, etc.
  - can plot multiple datasets on 1 graph
    - delimited by plt.show() just like matplotlib
  - legends can be autogenerated (usually no need to use plt.legend())
  - axes can be grabbed directly from the plotting function (no need for fig and ax unless you are doing some more matplotlib-specific things)

In [None]:
colors = ["#b0b0b0", "#0b0b0b"]

sns.violinplot(data=a, palette=colors)

In [None]:
colors = ["#ff0000", "#0000ff"]

d = pd.DataFrame({"a": np.random.rand(20), "b": np.random.rand(20), "c": np.random.rand(20)})

plt.figure(figsize=(20, 10))

ax = sns.scatterplot(x='a',
                     y='b',
                     data=d,
                     color=colors[0],
                     label="p1")

sns.scatterplot(x='b',
                y='c',
                data=d,
                color=colors[1],
                s=200,
                label="p2")

plt.title('Sample plot')

plt.xticks([0.25, 0.5, 0.75])

plt.legend()

plt.show()
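
Because functions such as ```sns.scatterplot``` return the matplotlib axes they drew on, the graph can also be customized through that object instead of through ```plt```. A minimal sketch, assuming a small random dataframe ```df``` created only for this example:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # seeded so the example is reproducible
df = pd.DataFrame({"a": rng.random(20), "b": rng.random(20)})

# axes-level functions such as scatterplot return the Axes they drew on
ax = sns.scatterplot(x="a", y="b", data=df, color="#ff0000", label="p1")

# the returned Axes accepts the usual matplotlib customizations
ax.set_title("Sample plot")
ax.set_xlabel("column a")
ax.set_xlim(0, 1)

plt.show()
```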