# Matplotlib Introduction

__Purpose:__
The purpose of this lecture is to learn how to visualize data in Python using a library called Matplotlib. We will learn how to create charts that are displayed within a Jupyter notebook, and how to customize their fonts, colors, and other properties.

__At the end of this lecture you will be able to:__
> 1. Understand how to create basic charts such as line charts, scatter plots, pie charts, histograms, etc.
> 2. Customize the charts by manipulating the size, color, fonts, etc.
> 3. Create multiple charts within one graph.
> 4. Save the charts as an image.

__Overview__

Matplotlib is a library that allows visualization within Python.  You can generate line plots, scatter plots, histograms, bar charts, etc., with just a few lines of code. 

Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.

For simple plotting, the pyplot module provides a MATLAB-like interface, particularly when combined with IPython. Power users have full control of line styles, font properties, axes properties, etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.
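
To make the two interfaces concrete, here is a minimal sketch that draws the same toy line chart both ways (the data list is a made-up example). Many pyplot functions, such as plt.title() and plt.xlabel(), have Axes method counterparts with a set_ prefix, such as ax.set_title() and ax.set_xlabel().

```python
import matplotlib.pyplot as plt

data = [1, 2, 1, 3]  # toy data for illustration

# pyplot (MATLAB-like) style: each call acts on the "current" figure and axes
plt.figure()
plt.plot(data)
plt.title('pyplot interface')
plt.xlabel('index')

# Object-oriented style: create the Figure and Axes explicitly,
# then call methods on the Axes object
fig, ax = plt.subplots()
ax.plot(data)
ax.set_title('object-oriented interface')
ax.set_xlabel('index')
```

This lecture mostly uses the pyplot style; the object-oriented style tends to pay off once a figure contains several plots.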

The [matplotlib.pyplot](https://matplotlib.org/api/pyplot_summary.html) module contains functions that allow you to generate many kinds of plots quickly. 

Matplotlib was originally written by John D. Hunter, has an active development community, and is distributed under a BSD-style license. Michael Droettboom was nominated as Matplotlib's lead developer shortly before John Hunter's death in 2012, and was later joined by Thomas Caswell.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# this statement allows the visuals to render within your Jupyter Notebook
%matplotlib inline 

# Creating charts

Matplotlib accepts several types of data, such as Python lists, NumPy arrays, and pandas Series and DataFrames.

## Creating a line chart from a list

As a starting point, let's draw a line chart from a simple list with the values 1, 2, 1, 3.

In [None]:
data_list = [1,2,1,3]
print(data_list, type(data_list))
plt.plot(data_list);

Note that when we pass a single list, the x-axis shows the index of each data point; remember that Python is 0-indexed. To use our own x values, we pass them as the first argument.

In [None]:
x = [10,20, 21, 30]
plt.plot(x,data_list);
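
To check which x values plt.plot() uses in each case, we can inspect the Line2D objects it returns with their get_xdata() method. A small sketch:

```python
import matplotlib.pyplot as plt

data_list = [1, 2, 1, 3]

fig, ax = plt.subplots()
# One argument: the x values default to the indices 0, 1, 2, ...
(line_default,) = ax.plot(data_list)
# Two arguments: the first sequence supplies the x values
(line_custom,) = ax.plot([10, 20, 21, 30], data_list)

print(line_default.get_xdata())
print(line_custom.get_xdata())
```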

## Creating a line chart from a Numpy array

Now we will create the same chart from a NumPy array, a pandas Series, and a pandas DataFrame.

In [None]:
data_numpy = np.array(data_list)
print(data_numpy, type(data_numpy))
plt.plot(data_numpy);

## Creating a line chart from a Pandas Series

In [None]:
data_pandas_series = pd.Series(data_list)
print(data_pandas_series, type(data_pandas_series))
plt.plot(data_pandas_series);

## Creating a line chart from a Pandas DataFrame

In [None]:
data_pandas = pd.DataFrame(data_list)
print(data_pandas, type(data_pandas))
plt.plot(data_pandas);
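
A DataFrame with several columns produces one line per column. The sketch below uses a hypothetical two-column DataFrame; plt.plot() returns one Line2D per column, which we can pass to plt.legend() together with the column names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical DataFrame with two numeric columns
df = pd.DataFrame({'a': [1, 2, 1, 3], 'b': [3, 2.3, 2, 0]})

plt.figure()
lines = plt.plot(df)                  # one line per column
plt.legend(lines, list(df.columns))   # label each line with its column name
```

pandas also has its own df.plot() method, which draws a similar chart with Matplotlib and adds the legend automatically.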

# Formatting charts

## Adding Titles

We can also add a title to our figure using plt.title(), and we can modify the text properties.
A full list of properties can be found here: https://matplotlib.org/users/text_props.html 

In [None]:
plt.plot(data_list)
plt.title('First Plot');

In [None]:
plt.plot(data_list)
plt.title('Cursive Title',family='cursive',fontsize = 30,loc='left',color='r');

In [None]:
plt.plot(data_list)
plt.title('Bold Italic Title',fontsize = 40,style='italic',weight='bold',rotation=10,color='purple');

## Adding X-Y Labels

Similar to the title, we can also add labels in the $x$ and $y$ axis using plt.xlabel() and plt.ylabel().

In [None]:
plt.plot(data_list)
plt.xlabel('Label on the X axis')
plt.ylabel('Label on the Y axis',fontsize = 20, weight = 'bold',color='green');

## Gridlines

We can add gridlines with plt.grid().

In [None]:
plt.plot(data_list)
plt.grid()

## Axis Ticks - Location

We can also control the location of the ticks by passing a list or array of positions to plt.xticks() and plt.yticks().

In [None]:
plt.plot(data_list)
plt.grid()
ticks_x = np.linspace(0, 3, 10)
ticks_y = np.linspace(1, 3, 3)
plt.xticks(ticks_x)
plt.yticks(ticks_y);

## Axis Ticks - Text

Also, we can replace the text of the ticks.  Note that we can also modify the font!

In [None]:
plt.plot(data_list)
plt.grid()
ticks_x = np.linspace(0, 3, 4)
months = ['Jan','Feb','Mar','Apr']
plt.xticks(ticks_x, months, fontsize=20, family='fantasy', color='orange');

## Multiple Lines
You can have more than one line in a plot by calling plt.plot() multiple times within the same cell.

In [None]:
data_list = [1,2,1,3]
data_list_2 = [3, 2.3, 2, 0]
plt.plot(data_list)
plt.plot(data_list_2);

## Legends

Legends can be added to label each line with plt.legend().

In [None]:
plt.plot(data_list)
plt.plot(data_list_2)
plt.legend(['First Line','Second Line'], shadow=True, loc='best');  # loc=0 is the numeric code for 'best'
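
An alternative that keeps each name next to the line it describes is to pass label= to every plt.plot() call and then call plt.legend() without a list of names. A minimal sketch with the same two lists:

```python
import matplotlib.pyplot as plt

data_list = [1, 2, 1, 3]
data_list_2 = [3, 2.3, 2, 0]

plt.figure()
# Attach a label to each line as it is drawn...
plt.plot(data_list, label='First Line')
plt.plot(data_list_2, label='Second Line')
# ...and legend() with no list of names collects those labels automatically
legend = plt.legend(loc='upper right')
```

With this approach, adding or reordering lines cannot leave the legend names out of step with the lines.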

## Color

Just as we modified the color of the text, we can modify the color of the line. Colors can be given in several forms:
> **Alias:** single-letter codes such as 'g' for green. For the alias colors, see: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.colors.html

> **Name:** color names such as 'Aqua'. For the list of names, see: https://www.w3schools.com/cssref/css_colors.asp

> **Hexadecimal:** The color can be represented in hexadecimal (https://en.wikipedia.org/wiki/Hexadecimal).  Every color has three components: Red, Green and Blue, aka RGB.  Each component can have a value between 0 and 255 (00 to FF in hexadecimal).  So if you want the color #3FEF0A (the # marks a hexadecimal color code), you are assigning 3F to Red, EF to Green and 0A to Blue.

### Color with alias

In [None]:
plt.plot(data_list, 'g'); 

In [None]:
plt.plot(data_list, c='g'); 

In [None]:
plt.plot(data_list, color = 'g'); 

### Color with Name
https://www.w3schools.com/cssref/css_colors.asp

In [None]:
plt.plot(data_list, c = 'Aqua');

### Color with RGB Hexadecimal
The color needs three values, one for each of the Red, Green and Blue channels, and each channel can have a value between 0 and 255.  You can use the built-in function hex() to convert a decimal number to hexadecimal.

In [None]:
print('The decimal {} is {} in hexadecimal'.format(0,hex(0)))
print('The decimal {} is {} in hexadecimal'.format(128,hex(128)))
print('The decimal {} is {} in hexadecimal'.format(36,hex(36)))
print('The decimal {} is {} in hexadecimal'.format(198,hex(198)))
print('The decimal {} is {} in hexadecimal'.format(255,hex(255)))
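
Note that hex() returns strings such as 0x0 and 0x80, with a 0x prefix and no zero-padding, while a color code needs exactly two digits per channel. A small helper (rgb_to_hex() is a hypothetical name written for this lecture, not part of Matplotlib) can build the full code with Python's format specification:

```python
def rgb_to_hex(red, green, blue):
    """Convert three 0-255 channel values into a '#RRGGBB' color string."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError('each channel must be between 0 and 255')
    # '02X' = uppercase hexadecimal, padded with a leading zero to two digits
    return '#{:02X}{:02X}{:02X}'.format(red, green, blue)

print(rgb_to_hex(63, 239, 10))   # prints #3FEF0A
print(rgb_to_hex(0, 255, 255))   # prints #00FFFF
```

The result can be passed straight to the c parameter, for example plt.plot(data_list, c=rgb_to_hex(0, 255, 255)). Matplotlib also provides matplotlib.colors.to_hex(), which expects channel values scaled to the range 0 to 1 and returns lowercase digits.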

In [None]:
plt.plot(data_list, c = '#00FFFF'); 

## Line Width

The width of the line, in points, can be modified with the linewidth parameter.

In [None]:
plt.plot(data_list, linewidth = 10)
plt.plot(data_list_2, linewidth = 1); 

## Line Style

Lines come in several styles, such as solid, dashed, dash-dot and dotted, and the style is set with the linestyle parameter.  You can find the available styles here: https://matplotlib.org/gallery/lines_bars_and_markers/line_styles_reference.html

In [None]:
plt.plot(data_list, linestyle = '--',linewidth = 3, c = 'b')
plt.plot(data_list_2, linestyle = '-.',linewidth = 5, c = 'r') 
plt.plot(np.array(data_list_2)+1, linestyle = ':',linewidth = 2, c = 'k')
plt.plot(np.array(data_list)-1, linestyle = '-',linewidth = 4, c ='g'); 
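
Color, marker and line style can also be combined into a single format string passed right after the data, a shorthand that plt.plot() supports. For example, 'ro--' means red, circle markers, dashed line. A small sketch that reads the resulting properties back:

```python
import matplotlib.pyplot as plt

data_list = [1, 2, 1, 3]

plt.figure()
# 'r' = red, 'o' = circle markers, '--' = dashed line
(line,) = plt.plot(data_list, 'ro--', linewidth=2)
print(line.get_marker(), line.get_linestyle(), line.get_linewidth())
```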

# Types of Charts

There is a wide range of charts we can create.  In this section we will cover the basics.

## Line Charts

In [None]:
new_x = np.arange(100)
new_y = np.random.randint(0,10,100) # random integer number between 0 and 9 (10 is excluded)
plt.plot(new_x,new_y);

## Bar Charts

In [None]:
new_x = np.arange(20)
new_y = np.random.randint(5,10,20)
plt.bar(new_x,new_y);

## Error Bars

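Error bars show the uncertainty or variability (for example, a standard deviation) of each data point. Here they are drawn on top of a bar chart with plt.errorbar().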

In [None]:
new_x = np.arange(1, 5, 1)
new_y = new_x**1.5

# example variable error bar values
yerr = np.sqrt(new_x)

plt.bar(new_x,new_y)
plt.errorbar(new_x, new_y,fmt='r.', yerr=yerr);
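
plt.bar() can also draw the error bars itself through its yerr parameter, and capsize adds small horizontal caps (in points) at the ends of each error bar. A sketch reusing the same example values; the gray fill and black edges are arbitrary styling choices:

```python
import numpy as np
import matplotlib.pyplot as plt

new_x = np.arange(1, 5, 1)
new_y = new_x**1.5
yerr = np.sqrt(new_x)  # example error values, as in the cell above

plt.figure()
# yerr draws one vertical error bar per bar; capsize sets the cap width in points
bars = plt.bar(new_x, new_y, yerr=yerr, capsize=5, color='lightgray', edgecolor='k')
```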

## Scatter Plot

In [None]:
new_x = np.random.randn(1000)
new_y = np.random.randn(1000)
plt.scatter(new_x,new_y);

Note that many dots overlap in the middle.  To see how densely they overlap, we can make the markers partially transparent with the alpha parameter, which ranges from 0 (fully transparent) to 1 (fully opaque).

In [None]:
plt.scatter(new_x,new_y,alpha = 0.2);

Similarly, we can change the marker type with the marker parameter (https://matplotlib.org/api/markers_api.html) and the marker size with s, which is the marker area in points squared.

In [None]:
plt.scatter(new_x,new_y,marker = '+',s = 100);
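
Both the size and the color can also be given per point: s accepts an array of areas (in points squared), and c accepts an array of numbers that are mapped to colors through a colormap. In this sketch the coloring variable, the distance from the origin, is an arbitrary choice for illustration, and the seeded generator only makes the example reproducible.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = rng.standard_normal(200)
distance = np.sqrt(x**2 + y**2)  # distance of each point from the origin

plt.figure()
# s: one marker area per point; c: one number per point, mapped through cmap
points = plt.scatter(x, y, s=20 + 40 * distance, c=distance, cmap='viridis', alpha=0.6)
plt.colorbar(points, label='distance from origin')  # legend for the color scale
```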

## Histograms

In [None]:
new_x = np.random.randn(100000)
plt.hist(new_x, bins=100);  # 100 equal-width bins
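
plt.hist() also returns what it computed: the count in each bin, the bin edges, and the drawn bars, which is handy for checking the binning. A sketch with a seeded generator and 50 bins (both arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = rng.standard_normal(10_000)

plt.figure()
# counts: samples per bin; edges: bin boundaries (one more than the number of bins)
counts, edges, patches = plt.hist(samples, bins=50, edgecolor='k')
```

Passing density=True rescales the bars so that their total area equals 1, which makes histograms with different sample sizes comparable.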

## Pie Charts

In [None]:
new_x = np.arange(1,4)
plt.pie(new_x,labels=['a','b','c']);

In [None]:
plt.pie(new_x,labels=['a','b','c'],explode = [.1,.2,0]);

# Advanced Formatting

## Figure Size

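Depending on your Matplotlib version, the pie chart above may look like an ellipse rather than a circle. Older versions stretch the pie to fill the figure, which is not square by default (6.4 x 4.8 inches since Matplotlib 2.0, and 8 x 6 inches before that), while recent versions draw pie charts with an equal aspect ratio. In any case, we can set the width and height of a figure, in inches, with the figsize parameter of plt.figure().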

In [None]:
plt.figure(figsize = [5,5]) # [width, height]
plt.pie(new_x,labels=new_x);
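
Independently of the figure size, plt.axis('equal') forces equal scaling of the x and y axes, which keeps a pie circular even in a wide figure. A sketch that also reads the figure size back with get_size_inches():

```python
import numpy as np
import matplotlib.pyplot as plt

new_x = np.arange(1, 4)

fig = plt.figure(figsize=(8, 4))   # a deliberately wide figure: width, height in inches
plt.pie(new_x, labels=['a', 'b', 'c'])
plt.axis('equal')                  # equal scaling keeps the pie circular

print(fig.get_size_inches())
```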

## Axis Range

We can also modify the range of the axis by using plt.axis([x_min, x_max, y_min, y_max]).

In [None]:
new_x = np.arange(100)
new_y = np.random.randint(0,10,100)
plt.plot(new_x,new_y)
plt.axis([-10, 110, -10, 20]); # [x_min, x_max, y_min, y_max]
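
The same limits can be set one axis at a time with plt.xlim() and plt.ylim(); called with no arguments, they return the current limits. A minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

new_x = np.arange(100)
new_y = np.random.randint(0, 10, 100)

plt.figure()
plt.plot(new_x, new_y)
plt.xlim(-10, 110)   # same x range as plt.axis([-10, 110, -10, 20])
plt.ylim(-10, 20)    # same y range

x_limits = plt.xlim()   # no arguments: read back the current limits
y_limits = plt.ylim()
```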

## Log Scale

Let's look at an example where the data covers several orders of magnitude ($e^{x}$ for $x = 0, 1, \dots, 7$).  When we plot the data on a linear scale, the small values are squeezed near zero and we lose their details.

In [None]:
new_x = np.arange(8)
new_y = np.e**new_x
plt.plot(new_x,new_y)
plt.scatter(new_x,new_y);

One solution is to plot the base-10 logarithm of the data, computed with np.log10().

In [None]:
plt.plot(new_x,np.log10(new_y));
plt.scatter(new_x,np.log10(new_y));

An alternative is to keep the original data and ask the plotting function to use a logarithmic axis.

In [None]:
plt.semilogy(new_x,new_y,c='r')
plt.semilogy(new_x,new_y,'.',ms=15,c='r'); # the log axis uses base 10

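Observe that with semilogy() the y-axis is spaced logarithmically, while its tick labels still show the original values as powers of 10.  Similarly, we can plot the x-axis on a log scale (semilogx), or both axes (loglog). Because the logarithm of 0 is undefined, the first data point (x = 0) cannot be placed correctly on a logarithmic x-axis.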

In [None]:
plt.semilogx(new_x,new_y,c='r')
plt.semilogx(new_x,new_y,'.',ms=15,c='r'); # the log axis uses base 10

In [None]:
plt.loglog(new_x,new_y,c='r')
plt.loglog(new_x,new_y,'.',ms=15,c='r'); # both log axes use base 10
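
semilogy(), semilogx() and loglog() combine plotting with a change of axis scale. The same result can be obtained by plotting normally and then calling set_yscale('log') on the Axes (plt.yscale('log') in pyplot style). A sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

new_x = np.arange(8)
new_y = np.e**new_x

fig, ax = plt.subplots()
ax.plot(new_x, new_y, 'r.-', ms=15)  # red line with large dot markers
ax.set_yscale('log')                 # same y-axis as semilogy()
```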

## Subplots

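Sometimes we want to create multiple plots together.  We can use plt.subplot(number of rows, number of columns, plot number) to divide the figure into a grid and select the plot to draw on next. Plot numbers start at 1 and run left to right, then top to bottom.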

In [None]:
new_x = np.arange(20)
new_y = np.random.randint(5,10,20)

plt.figure(figsize=[15,5])

plt.suptitle('Main Title',fontsize = 16)

plt.subplot(1,2,1) # (number of rows, number of columns, number of plot)
plt.plot(new_x,new_y)
plt.title('Line Chart')

plt.subplot(1,2,2)
plt.bar(new_x,new_y)
plt.title('Bar Chart');

In [None]:
new_x = np.arange(20)
new_y = np.random.randint(5,10,20)

plt.figure(figsize=[15,7])

plt.suptitle('Main Title',fontsize = 16)

plt.subplot(2,1,1) # (number of rows, number of columns, number of plot)
plt.plot(new_x,new_y)
plt.title('Line Chart')

plt.subplot(2,1,2)
plt.bar(new_x,new_y)
plt.title('Bar Chart');
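
The object-oriented counterpart is plt.subplots() (note the final s), which creates the figure and the whole grid of Axes in one call and returns them, so each plot is addressed by position instead of by a number. A sketch of the first example above:

```python
import numpy as np
import matplotlib.pyplot as plt

new_x = np.arange(20)
new_y = np.random.randint(5, 10, 20)

# One call creates the figure and a 1 x 2 array of Axes
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
fig.suptitle('Main Title', fontsize=16)

axes[0].plot(new_x, new_y)
axes[0].set_title('Line Chart')

axes[1].bar(new_x, new_y)
axes[1].set_title('Bar Chart')
```

plt.subplots() also accepts sharex=True or sharey=True to give the plots a common axis range.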

## Removing the Axis

In some cases we may prefer to hide the axes (axis lines, ticks, tick labels and axis labels) with plt.axis('off').

In [None]:
new_x = np.arange(100)
new_y = np.random.randint(0,10,100)

plt.figure(figsize=[15,5])

plt.subplot(1,2,1)
plt.plot(new_x,new_y)

plt.subplot(1,2,2)
plt.plot(new_x,new_y)
plt.axis('off');

## Modifying Defaults

### Axis Font Size

In [None]:
new_x = np.arange(100)
new_y = np.random.randint(0,10,100)

# plt.rc() changes the defaults for every plot created after this point in the session
plt.rc('xtick', labelsize=20)
plt.rc('ytick', labelsize=10)

plt.plot(new_x,new_y);
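
Because plt.rc() changes the defaults for the rest of the session, it is often safer to change them temporarily with plt.rc_context(), which restores the previous values when the with-block ends. In this sketch the figure is created inside the block, on the assumption that settings such as tick label size are read when the figure's elements are created. To undo earlier plt.rc() calls for good, plt.rcdefaults() restores Matplotlib's built-in defaults.

```python
import numpy as np
import matplotlib.pyplot as plt

new_x = np.arange(100)
new_y = np.random.randint(0, 10, 100)

size_before = plt.rcParams['xtick.labelsize']

# Temporary defaults: only active inside the with-block
with plt.rc_context({'xtick.labelsize': 20, 'ytick.labelsize': 10}):
    size_inside = plt.rcParams['xtick.labelsize']
    plt.figure()
    plt.plot(new_x, new_y)

size_after = plt.rcParams['xtick.labelsize']  # back to the previous value

# To undo earlier plt.rc() calls permanently:
# plt.rcdefaults()
```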

## Saving Images

Figures can be saved as image files so they can later be included in reports or slideshows.  The file format (PNG, PDF, SVG, etc.) is chosen from the file name extension.
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

In [None]:
new_x = np.arange(100)
new_y = np.random.randint(0,10,100)
plt.plot(new_x,new_y)

plt.savefig('testplot.png')
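
savefig() also takes options that matter for reports: dpi sets the resolution of raster formats such as PNG, and bbox_inches='tight' trims the empty margin around the figure. In a script (outside Jupyter), call savefig() before plt.show(), because once the window is closed the current figure may be a new, empty one. This sketch writes to a temporary folder, an arbitrary choice; use your own path in practice.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

new_x = np.arange(100)
new_y = np.random.randint(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(new_x, new_y)

out_dir = tempfile.mkdtemp()  # scratch folder for this example
png_path = os.path.join(out_dir, 'testplot.png')
pdf_path = os.path.join(out_dir, 'testplot.pdf')

# The format is inferred from the extension of each file name
fig.savefig(png_path, dpi=150, bbox_inches='tight')  # raster image, 150 dots per inch
fig.savefig(pdf_path)                                # vector image
```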

## Including Text

In [None]:
new_x = np.arange(100)
new_y = np.random.randint(0,10,100)+(0.05*new_x)**2
plt.plot(new_x,new_y)
plt.text(60,5,'Sample Text',fontsize = 20,backgroundcolor = 'k',color = 'white'); # x and y position of start of text

## Including Arrows

In [None]:
plt.plot(new_x,new_y)
plt.arrow(18,25,15,-10,width=0.5,head_width=3,color='orange'); # start point (x, y), then the arrow's extent along x and y (dx, dy)

## Annotations
https://matplotlib.org/users/annotations_intro.html

In [None]:
plt.plot(new_x,new_y)
plt.annotate('Main Point', xy=(40, 12), xytext=(5, 25),
             arrowprops=dict(facecolor='red',shrink = 0.1),fontsize = 15);

# xy is x and y location of the arrow head
# xytext is x and y location of the start of the text

## Templates
There are many design templates.  Here we present the xkcd sketch style (https://www.xkcd.com/373/).

In [None]:
with plt.xkcd():
    plt.plot(new_x,new_y)
    plt.xlabel('X Axis')
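
Besides xkcd, Matplotlib ships with a set of style sheets. plt.style.available lists their names, plt.style.use() applies one for the rest of the session, and plt.style.context() applies it only inside a with-block, like plt.xkcd(). A sketch using the built-in 'ggplot' style, with data generated the same way as in the text example above:

```python
import numpy as np
import matplotlib.pyplot as plt

print(plt.style.available)  # names of the built-in style sheets

new_x = np.arange(100)
new_y = np.random.randint(0, 10, 100) + (0.05 * new_x) ** 2

# The style only applies to figures created inside the with-block
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot(new_x, new_y)
    ax.set_xlabel('X Axis')
```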