# PytzMLS2018: Python for ML and DS - Matplotlib

<center>**Anthony Faustine (sambaiga@gmail.com)**</center>

## 3.1. Matplotlib Basics

[Matplotlib](http://matplotlib.org/) is a Python package used widely throughout the scientific Python community to produce high-quality, publication-ready 2D graphics. It supports a wide range of output formats, including PNG (and other raster formats), PostScript/EPS, PDF and SVG, and has interfaces for all of the major desktop GUI (graphical user interface) toolkits. It is a flexible package with a large number of customization options.

The package itself is imported with `import matplotlib`, but most plotting is done through its `matplotlib.pyplot` module, which provides a collection of MATLAB-like plotting commands. By convention it is imported as `plt`:

```python
import numpy as np
import matplotlib.pyplot as plt
```

In a Jupyter (IPython) notebook, run the `%matplotlib inline` magic command, usually in the first cell, so that plots are displayed directly below the code cells that create them.


```python
%matplotlib inline
```
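Besides the MATLAB-like functions, pyplot (imported above as `plt`) can also hand you explicit Figure and Axes objects. The Axes methods mirror the pyplot calls used in this notebook (`ax.set_title` instead of `plt.title`, and so on), which makes it clear which plot each command affects. A minimal sketch, with arbitrary data chosen only for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Create a figure containing a single Axes (the plotting area)
fig, ax = plt.subplots()

x = np.linspace(0, 1, 100)
ax.plot(x, x**2)            # same as plt.plot, but on this specific Axes
ax.set_title("$y = x^2$")   # counterpart of plt.title
ax.set_xlabel("x")          # counterpart of plt.xlabel
ax.set_ylabel("y")          # counterpart of plt.ylabel
```

Both styles can be mixed, but sticking to one per figure keeps the code easier to follow.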

### 3.1.1 Basic plot

The **plot()** command draws lines and markers; it is versatile and accepts an arbitrary number of arguments. In a notebook with inline plotting the figure appears automatically; in a plain Python script, call **show()** to display it.

Titles and axis labels are added with **title()**, **xlabel()** and **ylabel()**.


#### Line plot

* You provide the x and y coordinates of the points, and matplotlib connects them with straight line segments.

```python
# Generate data: 100 evenly spaced points in [0, 1] and their squares
x = np.linspace(0, 1, 100)
y = x**2
plt.plot(x, y)
```
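If only one sequence is passed, plot() treats it as the y values and uses the indices 0, 1, 2, ... as x. plot() returns a list of Line2D objects, so this can be checked directly on the returned line (the y values below are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

# plot() returns a list of Line2D objects; unpack the single line
line, = plt.plot([1, 4, 9, 16])

# x values were generated from the indices of the y values
print(np.asarray(line.get_xdata()).tolist())
print(np.asarray(line.get_ydata()).tolist())
```

Passing x explicitly, as in the cell above, is needed whenever the x coordinates are anything other than 0, 1, 2, ...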

We can add a title and axis labels to the plot as follows:

```python
plt.title("The graph of $y = x^2$")
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.plot(x, y);
```

#### Controlling line properties and formatting

The line color and style can be set with a short format string passed after the x and y data:

```python
plt.title("The graph of $y = x^2$")
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.plot(x, y, 'r--');  # dashed red line
```

**Options for the color characters are:**

 * 'r' = red
 * 'g' = green
 * 'b' = blue
 * 'c' = cyan
 * 'm' = magenta
 * 'y' = yellow
 * 'k' = black
 * 'w' = white
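The single-character codes cover only eight colors. The `color` keyword of plot() also accepts, among others, hex strings, RGB tuples of floats in [0, 1], grayscale levels given as strings, and named colors. A short sketch (the particular colors are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

x = np.linspace(0, 1, 100)
l1, = plt.plot(x, x, color="#ff7f0e")            # hex RGB string
l2, = plt.plot(x, x**2, color=(0.2, 0.4, 0.6))   # RGB tuple of floats in [0, 1]
l3, = plt.plot(x, x**3, color="0.5")             # grayscale level as a string
l4, = plt.plot(x, x**4, color="tab:green")       # named color from the 'tab' palette

# to_rgb converts any of these specifications to an (r, g, b) tuple
for line in (l1, l2, l3, l4):
    print(to_rgb(line.get_color()))
```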
 
 
**Options for line styles are:**

 * '-' = solid
 * '--' = dashed
 * ':' = dotted
 * '-.' = dash-dotted

**Some options for markers are:**

 * '.' = points
 * 'o' = filled circles
 * '^' = filled triangles (pointing up)
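A format string can combine a color character, a marker and a line style; when a marker is given without a line style, only the markers are drawn and no connecting line. A sketch that checks the resulting line properties (data are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 10)
# green triangles joined by a dotted line
tri, = plt.plot(x, x**2, "g^:")
# black circles only: a marker with no line style draws no connecting line
dots, = plt.plot(x, x**3, "ko")

print(tri.get_marker(), tri.get_linestyle())
print(dots.get_marker(), dots.get_linestyle())
```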

Line formats can also be specified with keyword arguments of plot(), such as **color**, **linewidth** and **linestyle**:

```python
plt.plot(x, y, color="green", linewidth=4.0, linestyle="dotted")
plt.title("The graph of $y = x^2$")
plt.xlabel('x-axis')
plt.ylabel('y-axis');
```
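Since plot() returns Line2D objects, the same properties can also be read or changed after the line has been drawn, through their `get_*`/`set_*` methods. This is handy when one line has to be restyled without replotting everything. A minimal sketch (values are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 100)
# a format string and keyword arguments can be mixed in one call
line, = plt.plot(x, x**2, "r--", linewidth=2.0)

line.set_linewidth(4.0)        # thicken the line after it was created
line.set_linestyle("dotted")   # switch from dashed to dotted
line.set_color("green")        # override the red from the format string
```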

### 3.1.2 Adding a legend

Legends make plots easier to read by labeling each line. To add one, pass the keyword argument **label** (the text shown in the legend box) to each plot command, then call **legend()** to draw the legend.

```python
x = np.linspace(-np.pi, np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)

plt.title("Sine and cosine")
plt.xlabel("x values")
plt.ylabel("y values")
plt.plot(x, y1, color="green", linewidth=4.0, linestyle="dotted", label="sin")
plt.plot(x, y2, '--r', linewidth=4.0, label="cos")
plt.legend(loc='upper left', frameon=False)  # try frameon=True to draw a box around the legend
```
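legend() collects every line that has a label; lines without a label, or whose label begins with an underscore, are left out. The legend's text entries can be inspected to confirm this (the extra lines below are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin")
ax.plot(x, np.cos(x), label="cos")
ax.plot(x, np.zeros_like(x), color="k", label="_zero line")  # leading underscore: excluded
ax.plot(x, x / np.pi)                                        # no label: excluded

# loc="best" lets matplotlib choose the location that overlaps the data least
leg = ax.legend(loc="best")
print([t.get_text() for t in leg.get_texts()])
```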

### 3.1.3 Saving a figure to an image file

Matplotlib can save plots directly to a file using **savefig()**. The method can be used like this:
```python
plt.savefig('plot.png')
```
The output format is inferred from the file extension, so saving as PDF only requires a different file name:

```python
plt.savefig('plot.pdf')
```
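savefig() also accepts options such as `dpi` (the resolution of raster output) and `bbox_inches="tight"` (trim the empty margins around the figure). It saves the current figure, so in a script it should be called before show(): once the window opened by show() is closed, the figure is typically closed too, and a later savefig() would write an empty image. A sketch that writes to a temporary directory (the file names are arbitrary):

```python
import os
import tempfile
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 100)
plt.plot(x, x**2)
plt.title("$y = x^2$")

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "plot.png")
pdf_path = os.path.join(out_dir, "plot.pdf")

# Raster output: dpi sets the resolution, bbox_inches="tight" trims empty margins
plt.savefig(png_path, dpi=150, bbox_inches="tight")
# Vector output: the format is inferred from the .pdf extension
plt.savefig(pdf_path)

print(os.path.getsize(png_path), os.path.getsize(pdf_path))
```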

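### 3.1.4 Subplots

**subplot(nrows, ncols, index)** divides the figure into a grid of nrows × ncols panels and makes panel number index (counted from 1, left to right and then top to bottom) the current one; subsequent plot commands draw into that panel.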

```python
plt.subplot(2, 2, 1)   # top-left panel of a 2 x 2 grid
plt.plot(x, x**2)
plt.subplot(2, 2, 2)   # top-right panel
plt.plot(x, x**3)
```
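An alternative is plt.subplots(), which creates the whole grid at once and returns the Axes in a NumPy array; this is convenient when filling many panels in a loop. A sketch (the plotted powers and figure size are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)
# 2 x 2 grid of Axes; sharex=True gives all panels the same x-axis limits
fig, axes = plt.subplots(2, 2, sharex=True, figsize=(8, 6))

# axes has shape (2, 2); .flat iterates over the panels row by row
for power, ax in zip(range(1, 5), axes.flat):
    ax.plot(x, x**power)
    ax.set_title(f"$y = x^{power}$")

fig.tight_layout()  # reduce overlap between panel titles and tick labels
```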

## 3.2. Other types of plots

### 3.2.1 Scatter plot

* **scatter()** takes x and y just like **plot()**, but draws a marker at each point without connecting the points with lines.
* It is useful for exploring the structure of data, for example the relationship between two variables.

The main arguments of **scatter()** are:

* **x, y**: sequences of n numbers of the same length, the input data.

* **s**: marker size in points², a scalar or a sequence of n numbers; optional, default: `rcParams['lines.markersize'] ** 2` (36 with the default settings).

* **c**: a single color, a sequence of colors, or a sequence of n numbers that are mapped to colors through the colormap (**cmap**); optional.

* **alpha**: the alpha blending value, between 0 (transparent) and 1 (opaque).



```python
# 100 x positions and 100 samples from a standard normal distribution
x = np.arange(100)
y = np.random.randn(100)
```

```python
plt.title("The Scatter Plot")
plt.ylabel("y")
plt.xlabel("x")
plt.scatter(x, y, c='red', s=70)
```
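Passing arrays to `s` and `c` gives each point its own size and color; when `c` holds numbers, they are mapped through the colormap named by `cmap`, and colorbar() shows that mapping. A sketch with random data (the seed, the colormap and the size scaling are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # seeded so the example is reproducible
x = np.arange(100)
y = rng.standard_normal(100)

sizes = 20 + 80 * np.abs(y)      # bigger markers for larger |y|
sc = plt.scatter(x, y, s=sizes, c=y, cmap="viridis", alpha=0.7)
plt.colorbar(sc, label="y value")  # color scale for the values in c
plt.title("Scatter plot with per-point size and color")
```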

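### 3.2.2 Bar plot

**bar()** draws one rectangular bar per category, with heights given by a sequence of values. Bar graphs are also a convenient way to compare two data series, for example the enrollment of female and male students in the top five districts.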
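`bar(x, height, width)` draws one bar per x position. To compare two series side by side, each series can be shifted by half a bar width around a shared set of positions. In the sketch below the district names and enrollment numbers are made-up placeholders, not real data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical placeholder data, for illustration only
districts = ["D1", "D2", "D3", "D4", "D5"]
female = [120, 95, 80, 70, 65]
male = [110, 100, 85, 60, 70]

pos = np.arange(len(districts))  # one group of bars per district
width = 0.4                      # width of each bar

fig, ax = plt.subplots()
bars_f = ax.bar(pos - width / 2, female, width, label="Female")
bars_m = ax.bar(pos + width / 2, male, width, label="Male")

ax.set_xticks(pos)
ax.set_xticklabels(districts)    # district names under each group of bars
ax.set_ylabel("Enrollment")
ax.set_title("Enrollment by district (hypothetical data)")
ax.legend()
```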

### 3.2.3 Histograms

A histogram gives a feel for the distribution of a dataset: the range of values is split into intervals (bins), and the height of each bar is the number of values that fall into that bin.

Histograms are drawn with the **hist()** command of pyplot, which also accepts the usual styling and formatting options.


**Note**: **bins** is an important parameter: it sets the number of bins (10 by default) or their edges, and it strongly affects how the histogram looks.
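hist() returns the bin counts, the bin edges and the drawn patches, which makes the effect of `bins` easy to check: n bins have n + 1 edges, and the counts add up to the number of data points. A sketch on synthetic normally distributed data (mean 50 and standard deviation 10 are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
values = rng.normal(loc=50, scale=10, size=200)  # synthetic data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
counts5, edges5, _ = ax1.hist(values, bins=5)     # coarse: a few wide bins
counts30, edges30, _ = ax2.hist(values, bins=30)  # fine: many narrow bins
ax1.set_title("bins=5")
ax2.set_title("bins=30")
```

Too few bins hide the shape of the distribution, while too many make it noisy; trying a few values is usually worthwhile.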

```python
data = [ 48.03014618,  55.62870238,  51.19323402,  45.85549349,
        65.52041789,  54.53224804,  50.25047032,  51.30577492,
        37.45501384,  78.03728206,  48.90642114,  61.07434741,
        46.48480404,  38.93493593,  43.59885548,  53.34441508,
        45.31284282,  59.24199798,  32.84369859,  37.07341227,
        28.55325686,  53.22702534,  49.08640992,  62.8207843 ,
        50.79301098,  62.75709926,  45.95268258,  48.42733047,
        60.76539322,  56.56223198]
```

```python
plt.hist(data, bins=10)
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency');
```
