# Data Visualisation in Python

This lesson will cover the following topics: 
* Matplotlib
* Seaborn

It helps to know some NumPy and pandas, but we will quickly review them in case you did not attend those lessons.

## Matplotlib

Matplotlib is the most popular Python data visualization and plotting library. It was created by John Hunter to replicate the plotting capabilities of MATLAB (another programming language) in Python. Matplotlib works well with pandas objects and NumPy arrays, which makes it a very useful tool for Exploratory Data Analysis.

The official website and documentation for Matplotlib:
https://matplotlib.org/


A list of different types of plots and their examples can be found in the Matplotlib Gallery. Here you can see a cornucopia of figures, shapes and statistical plots that Matplotlib is capable of generating:
https://matplotlib.org/gallery/index.html

Let's get started.

We start by importing the ```matplotlib.pyplot``` module under the name `plt` (you can use any name, but "plt" is the standard name used for matplotlib.pyplot).

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

The line ```%matplotlib inline``` allows us to render plots directly within a Jupyter Notebook. If you are using another editor, you'll need to call ```plt.show()``` at the end of all your plotting commands to have the figure pop up in another window.

Let's start with a simple example using two Python lists. Usually in data analytics you will be using NumPy arrays or pandas columns.

**Reminder:** Lists are simply an ordered collection of items (here, numbers) grouped together. More [here](https://developers.google.com/edu/python/lists).

In [None]:
# Create a list called x
x = [1, 2, 3, 4, 5]

# square each integer in list x and store in new list y (list comprehension)
y = [num**2 for num in x]

# Print each list
print(x)
print(y)
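
If list comprehensions are new to you, the sketch below shows that the one-line comprehension builds the same list as an explicit `for` loop. The names `y_comprehension` and `y_loop` are only illustrative and are not used elsewhere in this lesson.

```python
# Same starting list as in the cell above
x = [1, 2, 3, 4, 5]

# List comprehension: one expression builds the whole list
y_comprehension = [num**2 for num in x]

# Equivalent explicit loop: append each squared value in turn
y_loop = []
for num in x:
    y_loop.append(num**2)

print(y_comprehension)
print(y_comprehension == y_loop)
```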

### Basic Commands
A simple line plot is created below using the above lists. I highly encourage you to use Shift+Tab on each matplotlib function to check out its docstring.

In [None]:
# plot a line graph with x values on x-axis and y values on y-axis 
plt.plot(x, y, color="blue")

# Add titles
plt.xlabel("X axis") # title for x-axis
plt.ylabel("Y axis") # title for y-axis

#### NumPy arrays

You can also plot using NumPy arrays instead of lists.

We spoke briefly last week about [NumPy arrays](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html). As a reminder, NumPy is a Python library that provides mathematical functions, along with array objects that those functions can be applied to.

NumPy arrays are like lists (a collection of many elements). The main difference is that a NumPy array can only hold one _type_ of item, which lets it be stored more compactly in memory and processed more quickly.
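
To see the "one type" rule in action, here is a small sketch. The variable names `ints` and `mixed` are illustrative only. When integers and a float are mixed, NumPy converts every element to a single common type.

```python
import numpy as np

# All integers: the array gets an integer dtype (exact width is platform-dependent)
ints = np.array([1, 2, 3])

# Mixing integers with a float: every element is stored as a float
mixed = np.array([1, 2.5, 3])

print(ints.dtype)
print(mixed.dtype)
print(mixed)
```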

In [None]:
# Import the library "numpy" and give it a nickname "np"
import numpy as np

In [None]:
# Create 10 linearly spaced elements between the range 0 to 10
x = np.linspace(0, 10, 10)

# Raise these numbers to the third power
y = x**3

# Print
print(x)
print(y)
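
Note that `np.linspace(0, 10, 10)` includes both endpoints, so the spacing between points is 10/9 rather than 1. The sketch below (with illustrative variable names) compares it with an 11-point version, which lands on the whole numbers 0 to 10.

```python
import numpy as np

# 10 points from 0 to 10 inclusive: 9 gaps, so each gap is 10/9
ten_points = np.linspace(0, 10, 10)
step = ten_points[1] - ten_points[0]

# 11 points from 0 to 10 inclusive: 10 gaps, so each gap is exactly 1
eleven_points = np.linspace(0, 10, 11)

print(step)
print(eleven_points)
```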

In [None]:
# Plot
plt.plot(x, y, color="red")

# Add titles
plt.xlabel("X axis")
plt.ylabel("Y axis")

### Multiplots on same canvas

We can use `plt.subplot()` to draw multiple plots on the same figure.

In [None]:
# Initialise a figure
plt.figure()  

# Command: plt.subplot(num_of_rows, num_of_columns, plot_number)
# Plot with (2 rows, 1 column, plot 1)
plt.subplot(2,1,1)  # the first subplot in the figure
plt.plot([1, 2, 3])

# Plot with (2 rows, 1 column, plot 2)
plt.subplot(2,1,2)  # the second subplot in the figure
plt.plot([4, 5, 6])
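
To see how `plot_number` maps onto the grid, the following sketch fills a 2x2 grid and labels each subplot with its number. The numbering starts at 1 and runs left to right along each row, then moves down to the next row. The name `grid_fig` is illustrative only.

```python
import matplotlib.pyplot as plt

# Initialise a figure
grid_fig = plt.figure()

# Fill a 2x2 grid; plot_number runs from 1 to 4
for plot_number in range(1, 5):
    ax = plt.subplot(2, 2, plot_number)
    ax.set_title(f"plot_number = {plot_number}")

# Adjust spacing so the titles do not overlap
plt.tight_layout()
```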

---
Here is another example of two plots on the same figure.

Note how we change the line colour (using the word `color`), and the linestyle (`solid/dashed`)

In [None]:
# Plot with (1 row, 2 columns, plot 1)
plt.subplot(1,2,1) # plt.subplot(num_of_rows, num_of_columns, plot_number)
# plot x, y with red dashed line (subplot number 1)
plt.plot(x, y, color="red", linestyle="dashed")

# Plot with (1 row, 2 columns, plot 2)
plt.subplot(1,2,2)
# plot y, x with green line and * markers (subplot number 2)
plt.plot(y, x, color="green", linestyle="solid", marker="*")

### Exercise

Let's do an exercise. The code below creates a set of points `y_1` and `y_2` from a `np.array`, `x`, that lists the numbers 1-10.

Plot:

* `x vs. y_1` and `x vs. y_2` on the same plot
* Plot `y_1` with green points (`'.'`), without a line
* Plot `y_2` with a red line
* Give the axes the titles "X axis" and "Y axis"

In [None]:
# Create x, y_1 and y_2
x = np.array(range(1, 11))
y_1 = x + np.random.choice([-2, 2], size=10) * np.random.rand(10)
y_2 = x

# Insert your code here


## Object Oriented Method for plotting graphs

With Object Oriented methods, we create figure objects and then call methods and attributes off of that object. This approach is better when dealing with multiplots on the same canvas.
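
As a rough comparison of the two styles, the sketch below labels one axis with a `plt.` function and the other with a method on the axes object. The `plt.` functions act on whichever axes is "current", while the object-oriented methods act on the object you call them on. The position list passed to `add_axes` here is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt

# Create a figure and add one set of axes to it
fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])  # left, bottom, width, height

# pyplot style: acts on the current axes, which is the one we just added
plt.xlabel("X axis")

# Object-oriented style: acts on the axes object explicitly
ax.set_ylabel("Y axis")

print(plt.gca() is ax)
print(ax.get_xlabel(), ax.get_ylabel())
```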

In [None]:
# Initialise a numpy array in variable x
x = np.arange(10)

# Square each number in x
y = x**2

# Print
print(x)
print(y)

In the example below, we create an axes object, and save it as a variable called `axes` in this line...

```python
# adding axes to the figure
axes = fig.add_axes([0.2, 0.2, 1.0, 1.0]) # left, bottom, width, height
```

We can then manipulate this object.

In [None]:
# Create figure instance and add axes to it
fig = plt.figure()

# adding axes to the figure
# List is: [left, bottom, width, height], as fraction of the full figure width/height
axes = fig.add_axes([0.2, 0.2, 1.0, 1.0])

# plotting on the set of axes
axes.plot(x, y, color='green') 
axes.set_xlabel("X axis") # notice the use of set_
axes.set_ylabel("Y axis")
axes.set_title("Graph title")

# Print data type of axes
print(type(axes))

The code is slightly more complex, but now we have full control over the figure: we can control where the axes are placed, and we can add more than one set of axes to the figure.

In [None]:
# Creates blank canvas
fig = plt.figure()

axes1 = fig.add_axes([0.2, 0.2, 1.0, 1.0]) # main axes
axes2 = fig.add_axes([0.4, 0.8, 0.4, 0.3]) # inset axes

# Larger Figure Axes 1
axes1.plot(x, y, color='g') # 'g' is the shorthand for "green".
axes1.set_xlabel('X_label_axes1')
axes1.set_ylabel('Y_label_axes1')
axes1.set_title('Axes 1 Title')

# Insert Figure Axes 2
axes2.plot(y, x, color='blue')
axes2.set_xlabel('X_label_axes2')
axes2.set_ylabel('Y_label_axes2')
axes2.set_title('Axes 2 Title');
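
If you want to check where a set of axes ended up, you can read its position back from the object. This is a minimal sketch with illustrative names and arbitrary positions; the values are fractions of the figure width and height, in the same `[left, bottom, width, height]` order used by `add_axes`.

```python
import matplotlib.pyplot as plt

fig = plt.figure()

# Positions are [left, bottom, width, height] as fractions of the figure
main_axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])
inset_axes = fig.add_axes([0.2, 0.5, 0.3, 0.3])

# Read the positions back from each axes object
for name, ax in [("main", main_axes), ("inset", inset_axes)]:
    left, bottom, width, height = ax.get_position().bounds
    print(name, round(left, 2), round(bottom, 2), round(width, 2), round(height, 2))
```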

### matplotlib.pyplot.subplots
This utility wrapper makes it convenient to create common layouts of subplots, including the enclosing figure object, in a single call.

In [None]:
# similar to plt.figure() but uses tuple unpacking to create figure and axis
# For tuple unpacking and multiple assignments refer: 
# https://treyhunner.com/2018/03/tuple-unpacking-improves-python-code-readability/

fig, axes = plt.subplots(nrows=1, ncols=2)

In [None]:
# axes is an array of axes on which we can plot stuff
axes
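
With more than one row and more than one column, `plt.subplots` returns a two-dimensional array of axes. The sketch below assumes a 2x2 grid and uses the illustrative names `grid_fig` and `grid_axes` so that it does not overwrite the `fig` and `axes` created above. Looping over `grid_axes.flat` visits every axes in turn.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)

# A 2x2 grid gives a 2-D array of axes
grid_fig, grid_axes = plt.subplots(nrows=2, ncols=2)
print(grid_axes.shape)  # (2, 2)

# .flat iterates over all four axes, row by row
for power, ax in enumerate(grid_axes.flat, start=1):
    ax.plot(x, x**power)
    ax.set_title(f"x**{power}")

# Adjust spacing so titles and tick labels do not overlap
grid_fig.tight_layout()
```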

Now we can plot things on each of the axes

In [None]:
# Get each axis
axes_0 = axes[0]
axes_1 = axes[1]

# Plot on axis 0
axes_0.plot(x, y, color='blue')
axes_0.set_xlabel("X axis")
axes_0.set_ylabel("Y axis")
axes_0.set_title("Axes 0 plot")

# Plot on axis 1
axes_1.plot(y, x, color='red')
axes_1.set_xlabel("X axis")
axes_1.set_ylabel("Y axis")
axes_1.set_title("Axes 1 plot")

# Show fig
fig # We need to do this since we did not initialise the fig in this cell

### Exercise

Let's do another exercise.

The following code creates a numpy array called `x` with the numbers -10 to 10. It then creates three arrays `y_1, y_2, y_3` using `x`, and then three more arrays, `y_1_scatter, y_2_scatter, y_3_scatter`, by adding random noise to the y_arrays.

Your job is to...

* Create a subplot with one row and three columns
* On the left-hand plot, draw a scatterplot of `x` vs. `y_1_scatter`, and then the corresponding trendline, `x` vs `y_1`.

Repeat this same process on the other two plots, using `y_2` and `y_3` (with `y_2_scatter` and `y_3_scatter`).

In [None]:
# Create x, y_1, y_2, y_3
x = np.array(range(-10, 11))
y_1 = x
y_2 = x**2
y_3 = x**3

# Create y_1_scatter, y_2_scatter, y_3_scatter
y_1_scatter = y_1 + np.random.choice([-2, 2], size=21) * np.random.rand(21)
y_2_scatter = y_2 + np.random.choice([-2, 2], size=21) * np.random.rand(21)
y_3_scatter = y_3 + np.random.choice([-2, 2], size=21) * np.random.rand(21)

# Insert your code here


## Matplotlib figure size and aspect ratio
In Matplotlib we can configure the size of the figure, its aspect ratio and its dpi (dots per inch: how many pixels are drawn per inch of the figure).

While creating a figure object, we can use ```figsize``` and ```dpi``` keyword arguments.

In [None]:
fig = plt.figure(figsize=(5,5), dpi=100)
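
Since `figsize` is given in inches and `dpi` in dots per inch, multiplying the two gives the size of the figure in pixels. A minimal sketch of that arithmetic, using the same values as the cell above (the `width_px` and `height_px` names are illustrative):

```python
import matplotlib.pyplot as plt

dpi = 100
fig = plt.figure(figsize=(5, 5), dpi=dpi)

# figsize is (width, height) in inches
width_in, height_in = fig.get_size_inches()

# inches * dots per inch = pixels
width_px = width_in * dpi
height_px = height_in * dpi

print(width_px, height_px)
```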

In [None]:
# Initialise a numpy array in variable x
x = np.arange(10)

# Square each number in x
y = x**2

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,4), dpi=100)

axes_0 = axes[0]
axes_1 = axes[1]

axes_0.plot(x, y, color='blue')
axes_0.set_xlabel("X axis")
axes_0.set_ylabel("Y axis")
axes_0.set_title("Axis 0 plot")

axes_1.plot(y, x, color='red')
axes_1.set_xlabel("X axis")
axes_1.set_ylabel("Y axis")
axes_1.set_title("Axis 1 plot")


### Legends

You can add legends to a figure using the `label` parameter in `plot`:

In [None]:
fig = plt.figure()

axis = fig.add_axes([0,0,1,1])

axis.plot(x, x**2, label="x**2")
axis.plot(x, x**3, label="x**3")
axis.legend()

We can change the location of the legend inside the figure using the ```loc``` keyword in the ```legend()``` function.

In [None]:
axis.legend(loc=1) # upper right corner
axis.legend(loc=2) # upper left corner
axis.legend(loc=3) # lower left corner
axis.legend(loc=4) # lower right corner

axis.legend(loc=0) # let matplotlib decide the optimal location
fig

There are several more options for legend location. For more details see:
http://matplotlib.org/users/legend_guide.html#legend-location
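
The `loc` keyword also accepts descriptive strings, which can be easier to read than the numeric codes. A small sketch, assuming the same two curves as above; `"upper left"` is the string form of `loc=2`.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)

fig, ax = plt.subplots()
ax.plot(x, x**2, label="x**2")
ax.plot(x, x**3, label="x**3")

# String form of the location; equivalent to loc=2
legend = ax.legend(loc="upper left")

# The legend entries come from the label arguments passed to plot
print([text.get_text() for text in legend.get_texts()])
```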

## Plot styling: Color, linetypes and markers
Matplotlib gives you a plethora of customization options for the plots. 
As we have seen previously, we can change the color of a plot using the ```color``` argument. There are three ways to specify the value of the color: a full color name, a single-letter shorthand, or a hex code.

In [None]:
x = np.array([1,2,3,4,5]) # numpy array

# Print the array
print(x)

In [None]:
# Make a figure and axis object
fig, ax = plt.subplots()

# Plot the array on the axes
ax.plot(x, x+1, color='green', alpha=0.5) # alpha controls transparency
ax.plot(x, x+2, color='b') # color = 'blue'
ax.plot(x, x+3, color='#FF0000') # color = hex code  
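
To check that different spellings refer to the same color, you can convert them to RGBA values with `matplotlib.colors.to_rgba`. This sketch uses red as the example, since its name, shorthand and hex code all map to the same value.

```python
from matplotlib.colors import to_rgba

# Three ways of writing the same color: name, shorthand, hex code
for spec in ["red", "r", "#FF0000"]:
    # to_rgba returns (red, green, blue, alpha), each between 0 and 1
    print(spec, to_rgba(spec))
```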

The following arguments are used to change the plot styles:
* ```alpha```: controls the transparency of the plot
* ```linewidth``` or ```lw``` : changes the width of the line
* ```linestyle``` or ```ls```: changes the style of the line. Options include: '-', '--', '-.', ':'
* ```marker``` : specifies the style of the marker used on the line. Options include '+', 'o', '*', 's', ',', '.', '1', '2', '3', '4', ...
* ```markersize```: specifies the size of the marker
* ```markerfacecolor```: specifies the fill color of the marker

In [None]:
# Make a figure and axis object
fig, ax = plt.subplots()

# Plot
ax.plot(x, x+1, color='green', alpha=0.5)
ax.plot(x, x+2, color='b', linestyle=':') 
ax.plot(x, x+3, color='#FF0000', ls='-.', linewidth=3) 
ax.plot(x, x+4, color='black', marker='o', markersize=8, markerfacecolor='white') 
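
`ax.plot` returns a list of line objects, so you can keep a reference to a line and read its style back. The sketch below, with the illustrative name `styled_line`, also shows that the short aliases `ls` and `lw` set the same properties as `linestyle` and `linewidth`.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])

fig, ax = plt.subplots()

# plot returns a list with one line object per plotted line; unpack the single entry
(styled_line,) = ax.plot(x, x + 3, color="#FF0000", ls="-.", lw=3,
                         marker="o", markersize=8, markerfacecolor="white")

# Read the style settings back from the line object
print(styled_line.get_linestyle())
print(styled_line.get_linewidth())
print(styled_line.get_marker())
print(styled_line.get_markersize())
```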

## Challenges

### Challenge 1
Follow along with these steps:

* Create a figure object called fig using plt.figure()
* Use add_axes to add an axis to the figure canvas at [0,0,1,1]. Call this new axis ax.
* Plot (x,y) on that axes
* Plot (x,z) on the same figure
* Add a legend to differentiate which line represents 'y' and which line represents 'z'
* Set the axis labels and titles

In [None]:
# Do not change these lines
import numpy as np
x = np.arange(0,100)
y = x*2
z = x**2

In [None]:
# Insert remaining code here using 'x', 'y' and 'z' created in the last cell


### Challenge 2
Create a figure object and put two axes on it, ax1 and ax2, located at [0,0,1.2,1.2] and [0.2,0.7,.4,.4] respectively.
Plot (x,y) on both axes.

In [None]:
# Insert code here


Use the rest of the time to finish past challenges if you'd like!

## Downloading the notebook

If you would like to keep your work, please follow these directions:

* On the top of this screen, in the header menu, click "File", then "Download as" and then "Notebook".

* You will need to download [Python 3.7 with Anaconda](https://www.anaconda.com/distribution/) to use this in the future.