![Matplotlib logo](images/matplotlib_logo.png)

# Static Data Visualization with Matplotlib

Today we'll break down the components of a chart and learn how to style them using the [Matplotlib](https://matplotlib.org/#) library. Once you understand the basics of Matplotlib, other Python visualization libraries are quite easy to pick up.

## TODO:
- plot()
- show()
- savefig()
- simple plot customization
    - arguments for plot()
    - title()
    - xlabel()
    - ylabel()
- figure()
- subplot()
- make a custom plot from a Pandas DataFrame

## Resources:
- [Python Data Visualization — Comparing 5 Tools](https://codeburst.io/overview-of-python-data-visualization-tools-e32e1f716d10)
- Matplotlib [Gallery](http://matplotlib.org/gallery.html) and [Cookbook](http://scipy.github.io/old-wiki/pages/Cookbook/Matplotlib.html)


## Matplotlib Basics
---
Matplotlib is a 2D plotting library that is suitable for both Python scripts and Jupyter notebooks.  This library expands upon the simple plots that can be generated through Pandas, allowing you to customize to your heart's content.  We'll be using the [pyplot](https://matplotlib.org/api/pyplot_summary.html) module, which provides a plotting experience similar to MATLAB.

In [None]:
# these are the libraries we'll be using throughout the notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
#%matplotlib inline

In [None]:
# check matplotlib version
import matplotlib
matplotlib.__version__

In [None]:
# make a simple plot
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x,y)

In [None]:
# show the plot
plt.show()

In [None]:
# try showing the plot again
plt.show()

## Why doesn't the plot show a second time?

The pyplot module is a bit different from most Python libraries in that we never hold a reference to the plot we're creating.  Instead, the plot() function draws on a single "current" figure that pyplot tracks for us and that we can continue to modify.  Once the show() function has displayed that figure in the notebook, the figure is closed, so a second call to show() has nothing left to draw.

When working in a Jupyter notebook we can simply generate a plot in a single cell, and if we want to modify that plot we can add those modifications to that cell and run it again to see the changes.
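
As an aside, and not the approach used in the rest of this notebook, one way around this is to keep an explicit handle on the figure with `plt.subplots()`. The sketch below is a minimal illustration; the `Agg` backend line is only an assumption so that it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside Jupyter
import matplotlib.pyplot as plt

# plt.subplots() returns handles that we can hold on to
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_title("still reachable through fig")

# pyplot tracks every open figure by number
fig_number = fig.number
open_before = plt.get_fignums()

# closing the figure removes it from pyplot's bookkeeping
plt.close(fig)
open_after = plt.get_fignums()

print(open_before, open_after)
```

Because `fig` is an ordinary Python object, a later notebook cell can typically redisplay the plot by ending with `fig` on its last line.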

In [None]:
# run this cell multiple times, uncomment each line to make plot changes
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x,y)
#plt.title('what a gorgeous plot title!')
#plt.xlabel("here's my x-axis label")
#plt.ylabel("here's my y-axis label")
#plt.grid()
plt.show()

In [None]:
# and again, if we try calling the plot above we won't get anything
# because it has already been cleared from memory
plt.show()

## Jupyter Magic
Jupyter notebooks have some magic features to enhance the matplotlib experience.  You can use either of these two lines to view the plots being made.
```python
%matplotlib inline
%matplotlib notebook
```
You only need to use these lines once, and they're commonly seen at the beginning of a Jupyter notebook when matplotlib is being imported.

In [None]:
# "%matplotlib inline" allows us to view the plot without calling 
# the show() function

# note that using this magic line is just like calling the show() function
%matplotlib inline
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x,y)

In [None]:
# try calling the show() function
# this illustrates that "%matplotlib inline" acts just like show()
plt.show()

## Restart the jupyter notebook kernel if you want to turn off "%matplotlib inline"

In [None]:
# for now we'll switch back to "inline"
%matplotlib inline
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x,y)

## Saving Plots
The savefig() function saves the current plot to any folder that you specify.  If you do not specify a folder, the plot is saved in your current working directory (which is wherever this Jupyter notebook is on your machine).  You must provide a file name for the plot, and you can also specify the file format by adding an extension to the name (the default is ".png").

In [None]:
# get a list of the supported filetypes for your plots
plt.gcf().canvas.get_supported_filetypes()

In [None]:
# save your figure to your current working directory
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x,y)
plt.savefig('my_pretty_plot')

In [None]:
# now save the plot as a .pdf in the folder "images"
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x,y)
plt.savefig('images/my_pretty_plot.pdf')
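
savefig() also accepts a few optional arguments that are often useful. The sketch below shows `dpi` and `bbox_inches`; the temporary output folder is an assumption made so the example does not write into your project, and the `Agg` backend line is only needed outside a notebook.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside Jupyter
import matplotlib.pyplot as plt

# hypothetical output folder so the sketch does not touch your own files
out_dir = tempfile.mkdtemp()

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])

# dpi sets the resolution of raster formats such as .png
# bbox_inches="tight" trims extra whitespace around the plot
png_path = os.path.join(out_dir, "my_pretty_plot.png")
fig.savefig(png_path, dpi=150, bbox_inches="tight")

# the file extension selects the output format
pdf_path = os.path.join(out_dir, "my_pretty_plot.pdf")
fig.savefig(pdf_path)

plt.close(fig)
print(sorted(os.listdir(out_dir)))
```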

## Simple Plot Customizations
---
So we've covered how to make a plot with the plot() function, but we can also add some arguments to this function to change the look of our plots, such as changing the line [colors](https://matplotlib.org/examples/color/named_colors.html) or the [marker shapes](https://matplotlib.org/api/markers_api.html) using the [Line2D](https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html) arguments.  You can also use the abbreviations for [colors](https://matplotlib.org/2.0.2/api/colors_api.html) and [line styles](https://matplotlib.org/gallery/lines_bars_and_markers/line_styles_reference.html).



In [None]:
# add some marker and line changes
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x, y, color='orange', marker='s', markersize=8, linestyle='dashed')

In [None]:
# abbreviated
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x, y, c='orange', marker='s', ms=8, ls='dashed')

In [None]:
# ultra abbreviated
# "C1" = second color in style sheet color cycle
# "s" = square marker
# "--" = dashed line
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x, y, 'C1s--', ms=8)

In [None]:
# you can also add data to the plot in layers
# this is handy if you want to use different line/marker colors
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x,y,'C1s', ms=8) # markers
plt.plot(x,y,'C0-') # line

In [None]:
# add a legend
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x,y,'C1s', x,y,'C0-',ms=8)
plt.legend(['marker data','line data']) # labels are matched to the lines in plotting order
plt.show()

In [None]:
# the legend labels can also be stored in a variable first
x = [1,2,3,4]
y = [1,4,9,16]
my_legend = ['marker data','line data']
plt.plot(x,y,'C1s', ms=8) # markers
plt.plot(x,y,'C0-') # line
plt.legend(my_legend)
plt.show()

## Style Sheets
---
[Style sheets](https://matplotlib.org/users/style_sheets.html) are a quick and easy way to completely modify the look of your plots.  You can use pre-defined style sheets or create your own!

In [None]:
# get a list of style sheets that are currently available to you
for i in plt.style.available:
    print(i)

In [None]:
# bonus points to anyone who can figure out how to display the current style sheet

In [None]:
# select a style sheet from the list above to change your plot style
# note: Matplotlib 3.6 and later list this style as 'seaborn-v0_8-dark'
plt.style.use('seaborn-dark')
x = [1,2,3,4]
y = [1,4,9,16]
plt.plot(x,y)
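
Calling `plt.style.use()` changes the style for every plot that follows. If you only want a style for a single plot, `plt.style.context()` applies it temporarily. The sketch below is a minimal illustration using the `ggplot` style; the `Agg` backend line is only an assumption so that it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside Jupyter
import matplotlib.pyplot as plt

# remember one setting so we can watch it change and change back
before = plt.rcParams["axes.facecolor"]

# the style only applies to code inside the "with" block
with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
    plt.close(fig)

# once the block ends, the previous settings are restored
after = plt.rcParams["axes.facecolor"]

print(before, inside, after)
```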

## Exercise
---
1. Make a new folder in your working directory called "styles"
    * Your working directory is the folder where this jupyter notebook is
2. Write a for loop that iterates through each of the matplotlib style sheets, generates a plot for each one, saves it to the "styles" folder in your working directory, and incorporates the style sheet name in the file name.

In [None]:
# solution

# first make a folder in your working directory called "styles"
# this can be done in python using the os library
import os
os.makedirs('styles', exist_ok=True) # no error if the folder already exists

# "plt.style.available" generates a list to iterate through

for i in plt.style.available:
    # check the state of i
    print(i)
    # change the style to i
    plt.style.use(i)
    # create plot
    x = [1,2,3,4]
    y = [1,4,9,16]
    plt.plot(x,y)
    # create plot filename that changes with i
    file_name = 'styles/' + i + '.png'
    # save the plot
    plt.savefig(file_name)
    # close the figure so the next style starts from a blank plot
    plt.close()

## Let's Make Some Graphs!
---
We're going to revisit the dataset that was used for the Pandas lesson to create a graph similar to the one below made by Hans Rosling.  This will closely follow the [DataCamp matplotlib tutorial](https://campus.datacamp.com/courses/intermediate-python-for-data-science/matplotlib?ex=1).

![Hans Rosling - Gapminder](images/hans_rosling.jpg)

In [None]:
# import the "gapminder.tsv" file in the "data" folder as a Pandas DataFrame
df = pd.read_csv('data/gapminder.tsv', sep='\t')
df.sample(5)

In [None]:
# in this dataset we've got entries over multiple years
df['year'].unique()

In [None]:
# we're going to focus on the entries from 2007 for now
# let's make a dataframe of everything from 2007
df_2007 = df[df['year'] == 2007]

In [None]:
# and we'll plot gdp (x-axis) vs. life expectancy (y-axis) by country
x = df_2007['gdp per cap']
y = df_2007['life-exp']
plt.plot(x,y)

In [None]:
# gross, let's try making it a scatter plot
x = df_2007['gdp per cap']
y = df_2007['life-exp']
plt.scatter(x,y)

In [None]:
# YAS QUEEN!
# now we can begin customizing it

# first make the plot
x = df_2007['gdp per cap']
y = df_2007['life-exp']
plt.scatter(x,y)

# add some labels
plt.title('GDP vs. Life Expectancy in 2007') # graph title
plt.xlabel('GDP per Capita [USD]') # x-axis label
plt.ylabel('Life Expectancy [years]') # y-axis label

# modify the axis scale
plt.xscale('log') # change the scale of the x axis

# add some ticks
tick_val = [1000,10000,100000]
tick_lab=['1k','10k','100k']
plt.xticks(tick_val, tick_lab)

# and a grid!
plt.grid(True)

## Time to get FANCY!

In [None]:
# we're going to change the size of each dot to represent the 
# population size of the country in 2007

# change the dot size based on the population of the country
pop_series = df_2007['pop']

# convert the population series to an array
pop_array = np.array(pop_series)

# modify the scale
pop = pop_array/1000000

# make plot
x = df_2007['gdp per cap']
y = df_2007['life-exp']
plt.scatter(x,y, s=pop)

# labels
plt.title('GDP vs. Life Expectancy in 2007') # graph title
plt.xlabel('GDP per Capita [USD]') # x-axis label
plt.ylabel('Life Expectancy [years]') # y-axis label

# axis modifications
plt.xscale('log') # change the scale of the x axis

# ticks
tick_val = [1000,10000,100000]
tick_lab=['1k','10k','100k']
plt.xticks(tick_val, tick_lab)

# grid
plt.grid(True)
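
The `s` argument of scatter() is interpreted as the marker area in points squared, so dividing the population by 1,000,000 makes each dot's area proportional to the population in millions. Another option, sketched below with made-up populations and a hypothetical helper named `population_to_area`, is to rescale the values so the largest dot has a size you choose.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np


def population_to_area(pop, max_area=400.0):
    """Hypothetical helper: rescale populations so the largest marker
    has an area of max_area points squared."""
    pop = np.asarray(pop, dtype=float)
    return pop / pop.max() * max_area


# made-up populations, not taken from the gapminder file
populations = [5e6, 5e7, 1.3e9]
sizes = population_to_area(populations)

fig, ax = plt.subplots()
sc = ax.scatter([1, 2, 3], [1, 2, 3], s=sizes)
plt.close(fig)

print(sizes)
```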

## Color Them DOTS!
---
Now we're gonna get a little fancy with some color changes using the [matplotlib.cm](https://matplotlib.org/users/colormaps.html) module.  The plan is to color each marker based on the continent that the country is in.  To do this we're going to make a dictionary using the continent names as the keys and the colors as the values, then map that dictionary to our dataframe to assign a specific color to each country.  From there, we can just peel off the new color column and use it to color the markers when we make the plot.

In [None]:
df['continent'].unique().tolist()

In [None]:
# select matplotlib color palette (listed in the "matplotlib.cm" link above)
# plt.get_cmap() is used here because newer Matplotlib releases removed cm.get_cmap()
color_map = plt.get_cmap('viridis')

In [None]:
# we'll need a list of continents
continents = df['continent'].unique().tolist()

In [None]:
num_of_colors = len(continents)
colors = color_map([x/float(num_of_colors) for x in range(num_of_colors)])

color_dictionary = dict(zip(continents,colors))
color_dictionary

In [None]:
# add color column to the dataframe
df['color'] = df['continent'].map(color_dictionary)
df.head()

In [None]:
# Now we'll add those colors to the "color" argument when we make the plot

# recreate the df_2007 DataFrame
df_2007 = df[df['year'] == 2007]
x = df_2007.groupby('country')['gdp per cap'].mean()
y = df_2007.groupby('country')['life-exp'].mean()

# pull off that column to use in the figure
continent_colors = df_2007['color']

# change the marker scale based on population
pop_series = df_2007['pop']
pop_array = np.array(pop_series)
pop = pop_array/1000000

# make plot
# adding the continent colors here
# we will also make the markers transparent using alpha
plt.scatter(x,y, s=pop, color=continent_colors, alpha=.8)

# labels
plt.title('GDP vs. Life Expectancy in 2007') # graph title
plt.xlabel('GDP per Capita [USD]') # x-axis label
plt.ylabel('Life Expectancy [years]') # y-axis label

# axis modifications
plt.xscale('log') # change the scale of the x axis

# ticks
tick_val = [1000,10000,100000]
tick_lab=['1k','10k','100k']
plt.xticks(tick_val, tick_lab)

# add grid
plt.grid(True)

In [None]:
x = df_2007.groupby('country')['gdp per cap'].mean()
type(x)

In [None]:
# now that we have a plot that we like we can merge it all into a function
# we can also add a file_name argument to save it

# recreate the df_2007 DataFrame
df_2007 = df[df['year'] == 2007]
x = df_2007['gdp per cap']
y = df_2007['life-exp']

# pull off that column to use in the figure
continent_colors = df_2007['color']

# change the marker scale based on population
pop_series = df_2007['pop']
pop_array = np.array(pop_series)
pop = pop_array/1000000

def bubble_plot(x,y,file_name):
    """Generate a bubble plot from the 2007 gapminder dataset
    
    Arguments
        x(pandas.core.series.Series): values plotted on x-axis
        y(pandas.core.series.Series): values plotted on y-axis
        file_name(str): full file path and name
            avoid using spaces in path and file name
            default file format is ".png"
            append different file format to change
    """
    # make plot
    # adding the continent colors here
    # we will also make the markers transparent using alpha
    plt.scatter(x,y, s=pop, color=continent_colors, alpha=.8)

    # labels
    plt.title('GDP vs. Life Expectancy in 2007') # graph title
    plt.xlabel('GDP per Capita [USD]') # x-axis label
    plt.ylabel('Life Expectancy [years]') # y-axis label

    # axis modifications
    plt.xscale('log') # change the scale of the x axis

    # ticks
    tick_val = [1000,10000,100000]
    tick_lab=['1k','10k','100k']
    plt.xticks(tick_val, tick_lab)

    # add grid
    plt.grid(True)
    
    # save the plot
    # the file_name should include the path
    save_file = str(file_name)
    plt.savefig(save_file)

In [None]:
# test it out as a function
bubble_plot(x,y,'2007')
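
Note that bubble_plot() reads `pop` and `continent_colors` from the notebook's global variables, so it only works after the cell above has been run. A minimal sketch of an alternative design, where everything the function draws is passed in as an argument, is shown below. The name `bubble_plot_explicit`, the demo values, and the temporary output file are all hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside Jupyter
import matplotlib.pyplot as plt


def bubble_plot_explicit(x, y, sizes, colors, title, file_name):
    """Hypothetical variant of bubble_plot that receives everything it
    draws as arguments instead of reading global variables."""
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=sizes, color=colors, alpha=0.8)

    # labels
    ax.set_title(title)
    ax.set_xlabel("GDP per Capita [USD]")
    ax.set_ylabel("Life Expectancy [years]")

    # log scale first, then the custom ticks
    ax.set_xscale("log")
    ax.set_xticks([1000, 10000, 100000])
    ax.set_xticklabels(["1k", "10k", "100k"])
    ax.grid(True)

    fig.savefig(file_name)
    plt.close(fig)  # the next call starts from a clean figure
    return file_name


# made-up values standing in for three countries
out_file = os.path.join(tempfile.mkdtemp(), "demo_bubbles.png")
bubble_plot_explicit(
    x=[900, 12000, 45000],
    y=[52.0, 71.5, 80.2],
    sizes=[30, 120, 60],
    colors=["C0", "C1", "C2"],
    title="GDP vs. Life Expectancy (demo data)",
    file_name=out_file,
)
```

Passing the sizes, colors, and title explicitly means the same function can be reused for any year without redefining globals first.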

## Exercise
---
Try generating a plot for each unique year in the gapminder.tsv  
1. Include the appropriate year in the title of the graph  
2. Save the graphs in the "data" folder which is already in your current working directory and incorporate the appropriate year in the file name  

Remember, the current dataframe ("df") already has a color column appended, so you don't need to regenerate a color map on each iteration (unless you wanna get really fancy and change the color palette for each graph).

In [None]:
# solution
years = df['year'].unique().tolist()

for i in years:
    # make a year-specific dataframe
    year_df = df[df['year'] == i]
    
    # identify your x and y axis
    x = year_df['gdp per cap']
    y = year_df['life-exp']
    
    # make the list for colors
    color_list = year_df['color']
    
    # make a scale for the markers
    pop_series = year_df['pop']
    pop_array = np.array(pop_series)
    pop = pop_array/1000000
    
    # make your plot!
    plt.scatter(x,y, s=pop, color=color_list, alpha=.8) # use this year's colors
    
    plt.title(f'GDP vs. Life Expectancy in {i}') # Python3.6 string formatting
    plt.xlabel('GDP per Capita [USD]')
    plt.ylabel('Life Expectancy [years]')

    plt.xscale('log')
    
    tick_val = [1000,10000,100000]
    tick_lab=['1k','10k','100k']
    plt.xticks(tick_val, tick_lab)

    plt.grid(True)
    
    # save the file
    plt.savefig(f'data/{i}')
    plt.show()

Try plotting GDP vs. life expectancy for just China on a single plot, and color the markers along a gradient to indicate the passage of time!
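
As a starting point, here is a minimal sketch of gradient coloring with scatter(): passing numbers to `c` together with a `cmap` colors each marker by its value. The numbers below are made up for illustration and are not real gapminder values, and the `Agg` backend line is only an assumption so that the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside Jupyter
import matplotlib.pyplot as plt

# made-up numbers for a single country, NOT real gapminder values
years = [1952, 1962, 1972, 1982, 1992, 2002]
gdp = [400, 500, 700, 1000, 1700, 3100]
life = [44, 48, 60, 65, 68, 72]

fig, ax = plt.subplots()

# "c" takes one number per marker and "cmap" turns those numbers into colors
sc = ax.scatter(gdp, life, c=years, cmap="viridis")

# the colorbar acts as the legend for the gradient
fig.colorbar(sc, ax=ax, label="year")

ax.set_xscale("log")
ax.set_xlabel("GDP per Capita [USD]")
ax.set_ylabel("Life Expectancy [years]")
plt.close(fig)
```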