# Synopsis

Visualizing your data is key to both understanding its basic properties and effectively communicating its results to an outside audience. In this unit we will learn:

1. How to import `matplotlib`
2. How to make a line plot
3. How to make histograms
4. How to make a bar chart
5. How to make a scatter plot

We will also cover some basic principles for creating clean and informative visualizations.


## Heavyweight computing libraries


### `numpy` - Numerical Python
[Package documentation](http://docs.scipy.org/doc/numpy/)


###  `scipy` - Scientific Python
[Package documentation](http://docs.scipy.org/doc/scipy/reference/)

These two packages enable us to reproduce many of the capabilities of software such as *Matlab*. They contain functions for doing linear algebra, solving differential equations, generating pseudorandom numbers, and conducting statistical analysis.
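To give a flavor of those capabilities, here is a minimal sketch. The matrix, the sample size, the seed, and the toy equation dy/dt = -y are all made up for illustration; they are not part of this unit's data.

```python
import numpy as np
from scipy import integrate, stats

# Linear algebra: solve the 2x2 system A @ v = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
v = np.linalg.solve(A, b)

# Pseudorandom numbers: a seeded generator makes the draws reproducible
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)

# Statistical analysis: test whether the sample mean differs from 0
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)

# Differential equations: integrate dy/dt = -y from t = 0 to t = 2, with y(0) = 1
solution = integrate.solve_ivp(lambda t, y: -y, t_span=(0.0, 2.0), y0=[1.0])

print("v =", v)
print("sample mean =", sample.mean(), "p-value =", p_value)
print("y(2) ~", solution.y[0, -1], "exact:", np.exp(-2.0))
```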

## Heavyweight plotting library

### `matplotlib`
[Package documentation]()

A number of plotting packages for Python have been released in the last few years. Currently, we like and recommend `matplotlib`. `Matplotlib` was created in 2003 and is the oldest Python plotting library that has remained under active development.

However, that doesn't mean that it's always the best for all purposes or that it will remain our recommendation forever. Among biologists, the `seaborn` library has become popular. If you are able to make your data publicly available, there is a service called `plot.ly` with a Python library.

A problem with `matplotlib` is that its documentation is not particularly good. Typically, programmers copy the source code of visualizations they like and modify them. There are also lots and lots of **stackoverflow** answers concerning `matplotlib`. 

**`matplotlib` is completely customizable.**


**To learn more, browse the documentation.**

# Read libraries and functions

In [None]:
%matplotlib inline


In [None]:
import sys
import scipy
import random

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

from IPython.core.display import HTML
from IPython.lib.display import YouTubeVideo
from pathlib import Path

# Data visualization

Visualizing your data is a **key** component of data analysis, no matter how big or small your data is. Our minds are hardwired to process visual information. In fact, **one-third** of our brain is dedicated to image processing and **40% of all** nerve fibers connected to the brain come from the retina. Visualization is essentially a high-speed link to your cognitive systems.

Let me show you a quick example.

## Traffic accidents

Let's look at this image together. It shows the number of accidents for each month and time of day. Can you tell me when accidents are most likely to occur?

<img src='Images/visualization_raw_chart.png' width = '500px'></img>

Now let's try something different. This is the same chart, but now I've added a color scale. Each number is shaded a deeper blue the more accidents occurred during that time period.

Now tell me, when are accidents most likely to occur? How long did it take you to figure it out?

<img src = 'Images/visualization_heatmap.png' width = '500px'></img>
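A chart of this kind can be sketched with `imshow` and a sequential colormap. The counts below are synthetic placeholders drawn from a seeded random generator; they are not the numbers in the images above, and the month and time-of-day labels are likewise only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
months = ["Jan", "Feb", "Mar", "Apr"]            # illustrative labels
hours = ["0-6h", "6-12h", "12-18h", "18-24h"]    # illustrative labels

# Synthetic counts (NOT the data in the images): rows are times of day, columns are months
accidents = rng.integers(low=10, high=100, size=(len(hours), len(months)))

fig, ax = plt.subplots(figsize=(5, 4))
image = ax.imshow(accidents, cmap="Blues")

ax.set_xticks(range(len(months)))
ax.set_xticklabels(months)
ax.set_yticks(range(len(hours)))
ax.set_yticklabels(hours)

# Print each number on its cell, using white text on the darker cells
threshold = accidents.max() / 2
for row in range(accidents.shape[0]):
    for col in range(accidents.shape[1]):
        value = accidents[row, col]
        ax.text(col, row, str(value), ha="center", va="center",
                color="white" if value > threshold else "black")

fig.colorbar(image, ax=ax, label="Accidents (synthetic)")
fig.tight_layout()
```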

## Effective visualizations make a difference

Effective visualizations allow us to make accurate decisions more quickly. More importantly, they help us make **correct** decisions. When you make a visualization, you're actually trying to make a point. The entire point is to persuade your audience of some fact that you know to be true. 

Whether you accept this statement or not, it is what happens. If you don't construct your visualization in a way that informs your audience and allows them to make a correct decision, they can easily reach a different conclusion. 

### Creating a poor visualization can be disastrous.

You might remember or have read about the 1986 Challenger accident. The Challenger was a NASA Space Shuttle that broke apart shortly after liftoff, killing the entire crew, because O-ring seals in one of its solid rocket boosters failed and let hot combustion gases escape.

**Why did the O-rings fail?**

There were rumblings at NASA prior to the Challenger launch that defects in the O-rings occurred at cold temperatures. This data was looked at by a large number of people, but this is how it was presented to senior management (the people actually tasked with making the decision of **whether it was safe or not** to launch the rocket).

<img src = 'Images/challenger_original.png' width = '500px'></img>

The predicted temperature for the launch time of the Challenger was 26-29 °F. Do you think that was safe based on the data above?

**Would you have cancelled the launch based on these data?**




Really hard to say, no?


Let's use a graph to present the data as a function of launch temperature instead of launch number.

<img src='Images/challenger_remade.png' width = '500px'></img>

Would you have cancelled the launch based on these data?

## Visualizations are about making patterns visible!

However, when deceptive methods are used, visualizations can make non-existent patterns appear real. The book "How to Lie with Statistics" provides many examples of deceptive techniques. You should learn about them so that you can recognize them and avoid being duped by them.

<img src = 'Images/wh_economic_growth.png' width = '500px'></img>


**Making a good graph is not trivial. It requires thinking and the ability to summarize information.** 

In fact, **it is no different from writing a good essay or writing good code.**  

You need to know enough to have a point of view.  You need to know how to present that point of view clearly. 

From here on we will focus on the actual nuts and bolts of crafting a graph in Python, but you should keep these principles in mind. 


# Plotting with `matplotlib`

To create a customizable graph, we create a `figure` object.

We can pass arguments to the figure object when we create it. For example, we can change its size.

In [None]:
fig = plt.figure(figsize = (4, 4.5))
print(type(fig))

Those dimensions are in inches. The first one is the width and the second is the height.
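As a quick check, we can ask the figure for its size and resolution. This is a small sketch; the value of `fig.dpi` depends on your backend and settings, so no particular pixel size is assumed here.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(4, 4.5))

# The size is stored in inches as (width, height)
width_in, height_in = fig.get_size_inches()

# Pixels = inches * dots per inch (dpi)
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi

print(f"{width_in} in x {height_in} in at {fig.dpi} dpi")
print(f"{width_px:.0f} px x {height_px:.0f} px")

# Close the empty figure so that nothing is drawn
plt.close(fig)
```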

Now we have to create something to plot on. That something is a set of `axes` inside a `subplot`. Subplots let us have multiple graphs inside a single figure.

In [None]:
fig = plt.figure(figsize = (4, 4.5))

ax = fig.add_subplot(3, 1, 1)

ax.plot([1, 2, 3, 4], [1, 2, 3, 4], color = 'steelblue', 
        marker = 'o', lw = 2)
ax.plot([7, 8 , 9], [7, 8 , 9], color= 'orange', marker='^')

ax = fig.add_subplot(3, 1, 2)
ax.plot([7, 8 , 9], [7, 8 , 9], color= 'orange', marker='^')

ax = fig.add_subplot(3, 1, 3)
ax.plot([11, 12, 13], [11, 12, 13], color ='red')

plt.tight_layout()
plt.show()

When we specify subplots, the first number is the number of rows of plots. The second number is the number of columns of plots. The third number is the specific plot that you wish to populate. This number goes from 1 to the maximum plot number (num_columns * num_rows). Hopefully this image will make it clearer.

<img src='Images/matplotlib_subplots.png' width = '400px'></img>
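The numbering can also be checked directly. The sketch below uses an arbitrary 2 x 3 grid and writes each subplot's index in the middle of its panel; the indices run left to right along a row, then continue on the next row.

```python
import matplotlib.pyplot as plt

n_rows, n_cols = 2, 3   # arbitrary grid chosen for illustration
fig = plt.figure(figsize=(6, 3))

axes = []
for index in range(1, n_rows * n_cols + 1):
    ax = fig.add_subplot(n_rows, n_cols, index)
    # Write the index at the center of the panel (axes-fraction coordinates)
    ax.text(0.5, 0.5, str(index), transform=ax.transAxes,
            ha="center", va="center", fontsize=14)
    ax.set_xticks([])
    ax.set_yticks([])
    axes.append(ax)

plt.tight_layout()
```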


Let's create some fake data that we can use to understand how to customize plots.

In [None]:
x = range(5,100,5)
y = [i**2 for i in x]
z = [100*i for i in x]

rv1 = [random.random() for i in range(1000)]
rv2 = [random.random() for i in range(10000)]

Let's create a simple plot using the variables `x` and `y` from above.

In [None]:
fig = plt.figure( figsize = (5, 5/1.6) )
ax = fig.add_subplot(1, 1, 1)

ax.plot(x, y)

plt.show()

In [None]:
5/1.6

Of course, this graph is terrible.  If you showed it to me, I would be rather upset.

First of all, you have no axis labels! 

**How am I to know what you are plotting?**


In [None]:
fig = plt.figure( figsize = (5, 5/1.6) )
ax = fig.add_subplot(1,1,1)

# We should add a label to our dataset that will go into a legend
ax.plot(x, y, label = "Parabola")

# Now we can label the axes. 
# Always label your axes! Who knows what is in the graph otherwise
ax.set_xlabel("x")
ax.set_ylabel("f(x)")

# Display legend
ax.legend()

plt.tight_layout()
plt.show()

# plt.savefig()

The fonts used in a graph should be easy to read. We can change readability by playing with font style and font size.  

**Sans-serif fonts (such as Helvetica and Arial) are better for screen and poster reading.**

**Font size can help us see what is important.**  The font for the axis label should be larger than the size of the one used for the tick labels.

I like to define a default size and then adjust other sizes in relation to that one.

In [None]:
my_fontsize = 15

In [None]:
# Create the figure
fig = plt.figure( figsize = (5, 5/1.6) )
ax = fig.add_subplot(1,1,1)

# We should add a label to our dataset that will go into a legend
ax.plot(x, y, label = "Parabola", color = 'steelblue', linewidth = 3, 
        marker = 'o', markersize = 5)

# Now we can label the axes. 
# Always label your axes! Who knows what is in the graph otherwise
# Raw strings (r"...") keep Python from treating "\m" as an escape sequence
ax.set_xlabel(r"$\mu$", loc = 'right', fontsize = 1.6*my_fontsize)
ax.set_ylabel(r"$f(\mu)$", loc = 'top', fontsize = 1.6*my_fontsize)
# ax.set_xlabel(r"$\mu$", fontsize = 1.6*my_fontsize)
# ax.set_ylabel(r"$f(\mu)$", fontsize = 1.6*my_fontsize)

# Display legend
ax.legend(loc='best', fontsize = my_fontsize, 
          markerscale = 1.2)

#Adding a panel label
ax.text(90, 500, "(a)", fontsize = 1.2 * my_fontsize)

plt.tight_layout()
plt.show()
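An alternative to passing `fontsize` to every call is to set the sizes once through `matplotlib`'s `rcParams`. This is only a sketch of that option and is not used in the rest of this unit. The `rc_context` block keeps the settings local, and the three plotted points are placeholders.

```python
import matplotlib.pyplot as plt

my_fontsize = 15

# Sizes defined relative to one default, as in the cells above
font_settings = {
    "font.size": my_fontsize,
    "axes.labelsize": 1.6 * my_fontsize,
    "xtick.labelsize": my_fontsize,
    "ytick.labelsize": my_fontsize,
    "legend.fontsize": my_fontsize,
}

# The settings only apply to figures created inside this block
with plt.rc_context(font_settings):
    fig, ax = plt.subplots(figsize=(5, 5 / 1.6))
    ax.plot([1, 2, 3], [1, 4, 9], label="Parabola")   # placeholder data
    ax.set_xlabel("x")
    ax.set_ylabel("f(x)")
    ax.legend(frameon=False)
    fig.tight_layout()
```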

In [None]:
# Create the figure
fig = plt.figure( figsize = (5, 5/1.6) )
ax = fig.add_subplot(1,1,1)

# We should add a label to our dataset that will go into a legend
ax.plot(x, y, label = "Parabola", color = 'steelblue', linewidth = 3, 
        marker = 'o', markersize = 5)

# Now we can label the axes. 
ax.set_xlabel(r"$\mu$", fontsize = 1.6*my_fontsize)
ax.set_ylabel(r"$f(\mu)$", fontsize = 1.6*my_fontsize)

# Display legend (I don't like box around legend)
ax.legend(loc='best', frameon=False, fontsize = my_fontsize, 
          markerscale = 1.2)

#Adding a panel label 
ax.text(90, 500, "(a)", fontsize = 1.2 * my_fontsize)

# Thicken and offset the bottom and left spines, then hide the top and right ones
for axis in ['bottom','left']:
    ax.spines[axis].set_linewidth(1.5)
    ax.spines[axis].set_position(("axes", -0.02))
for axis in ['top','right']:
    ax.spines[axis].set_visible(False)

# We'll also need to turn off the ticks on the axes that we turned off
# and adjust the length and thickness
ax.tick_params(width = 1.5, length = 6)
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')

# Set axes limits
ax.set_xlim(0, 100)
ax.set_ylim(0, 10000)

plt.tight_layout()
plt.show()
# plt.savefig('quadratic_half_frame.png')
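A note on the commented-out `plt.savefig` lines: with many backends the figure is no longer the current one after `plt.show()`, so saving at that point can produce an empty image. A safer pattern, sketched below, is to call `savefig` on the figure object itself. The temporary output directory and the `dpi` value are assumptions made for this example.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

x = range(5, 100, 5)
y = [i**2 for i in x]

fig, ax = plt.subplots(figsize=(5, 5 / 1.6))
ax.plot(x, y, label="Parabola")
ax.set_xlabel("x")
ax.set_ylabel("f(x)")
ax.legend(frameon=False)
fig.tight_layout()

# Hypothetical output location: a temporary directory, so nothing is overwritten
output_dir = Path(tempfile.mkdtemp())
output_path = output_dir / "quadratic_half_frame.png"

# Saving through the figure object does not depend on which figure is "current"
fig.savefig(output_path, dpi=300, bbox_inches="tight")
print("Saved to", output_path)
```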

You see now that the number of options that you can configure to make a graph look **the way you want** is **enormous!** 

There's way too much to go over in this course, so we're going to stop going over more and more options now. 

If you want to learn more about the intricacies of `matplotlib` I think that this is a good [tutorial](http://www.labri.fr/perso/nrougier/teaching/matplotlib/).



# Taking advantage of pre-defined styles

A way to get around the default style of `matplotlib` is to use its built-in style sheets, available through `plt.style`. There are a number of pre-canned styles that look pretty okay. These by no means generate *publishable* figures, but they look decent enough to show a colleague.

You can actually change the default style in the entire notebook if you execute

    plt.style.use('style_name')
    
But I don't want to change the style of every plot in the notebook. When you don't want to change the style globally, you can just write a graph like this:

    with plt.style.context('style_name'):
        #Your graph code here
        
That `with` statement basically says that all of the code inside that block should use that style. Once we leave the `with` block, the style no longer applies.
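We can watch this happen by looking at one setting before, inside, and after a `with` block. The sketch below uses the `ggplot` style and the `axes.facecolor` setting as an example. The list printed first shows the style names installed with your version of `matplotlib`.

```python
import matplotlib.pyplot as plt

# Names that can be passed to plt.style.use() or plt.style.context()
print(plt.style.available)

color_before = plt.rcParams["axes.facecolor"]

with plt.style.context("ggplot"):
    # Inside the block the style's settings are active
    color_inside = plt.rcParams["axes.facecolor"]

# Outside the block the previous settings are restored
color_after = plt.rcParams["axes.facecolor"]

print(color_before, color_inside, color_after)
```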


## The `ggplot` style

In [None]:
with plt.style.context('ggplot'):
    fig = plt.figure( figsize = (5, 5/1.6) )
    ax = fig.add_subplot(1,1,1)
    
    # We should add a label to our dataset that will go into a legend
    ax.plot(x, y, label = "Parabola")
    
    # Label the axes.
    ax.set_xlabel("x", fontsize = 1.6*my_fontsize)
    ax.set_ylabel("f(x)", fontsize = 1.6*my_fontsize)
    
    # Display legend
    ax.legend(loc='best', frameon=False, fontsize = my_fontsize)

## The Nate Silver [538](http://www.fivethirtyeight.com) style

In [None]:
with plt.style.context('fivethirtyeight'):
    fig = plt.figure( figsize = (5, 5/1.6) )
    ax = fig.add_subplot(1,1,1)
    
    # We should add a label to our dataset that will go into a legend
    ax.plot(x, y, label = "Parabola")
    
    # Label the axes. 
    ax.set_xlabel("$x$", fontsize = 1.6*my_fontsize)
    ax.set_ylabel("$f(x)$", fontsize = 1.6*my_fontsize)
    
    # Display legend
    ax.legend(loc='best', frameon=False, fontsize = my_fontsize)

## The `bmh` style

In [None]:
with plt.style.context('bmh'):
    fig = plt.figure( figsize = (5, 5/1.6) )
    ax = fig.add_subplot(1,1,1)
    
    # We should add a label to our dataset that will go into a legend
    ax.plot(x, y, label = "Parabola")
    
    # Label the axes. 
    ax.set_xlabel("$x$", fontsize = 1.6*my_fontsize)
    ax.set_ylabel("$f(x)$", fontsize = 1.6*my_fontsize)
    
    # Display legend
    ax.legend(loc='best', frameon=False, fontsize = my_fontsize)

## The `xkcd` style

In [None]:
with plt.xkcd():
    fig = plt.figure( figsize = (5, 5/1.6) )
    ax = fig.add_subplot(1,1,1)

    # We should add a label to our dataset that will go into a legend
    ax.plot(x, y, label = "Parabola", linewidth = 3)
    
    # Now we can label the axes. 
    ax.set_xlabel("x", fontsize = 1.6*my_fontsize)
    ax.set_ylabel("f(x)", fontsize = 1.6*my_fontsize)
    
    # Display legend
    ax.legend(loc = 'best', frameon=False, fontsize = my_fontsize)

## Changing background color

Also, something to be aware of is that grey backgrounds typically don't print out very well unless you have a nice color printer!

So typically we'll need to change the axis background color to white before printing. We can do that when we create the axes.

In [None]:
with plt.style.context('bmh'):
    fig = plt.figure( figsize = (6, 4) )
    ax = fig.add_subplot(1,1,1, facecolor = 'orange')

    # We should add a label to our dataset that will go into a legend
    ax.plot(x, y, label = "Parabola", linewidth = 3)
    
    # Now we can label the axes. 
    # Always label your axes! Who knows what is in the graph otherwise
    ax.set_xlabel("$x$", fontsize = 1.6*my_fontsize)
    ax.set_ylabel("$f(x)$", fontsize = 1.6*my_fontsize)
    
    ax.text(10, 4500, 'JUST KIDDING', fontsize = 40)
    
    # Display legend
    ax.legend(loc = 'best', frameon=False, fontsize = my_fontsize)
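Joking aside, here is a sketch of the version you would actually print, with a white background. Setting the figure's own background as well as the axes background is an extra step assumed here, because the area around the axes has its own color.

```python
import matplotlib.pyplot as plt

x = range(5, 100, 5)
y = [i**2 for i in x]

with plt.style.context("bmh"):
    fig = plt.figure(figsize=(6, 4))
    ax = fig.add_subplot(1, 1, 1, facecolor="white")

    # The area around the axes belongs to the figure and has its own color
    fig.set_facecolor("white")

    ax.plot(x, y, label="Parabola", linewidth=3)
    ax.set_xlabel("$x$")
    ax.set_ylabel("$f(x)$")
    ax.legend(loc="best", frameon=False)
```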

# `matplotlib` has much more than just line plots

`matplotlib` enables us to make scatter plots, bar charts, histograms, heatmaps, box plots, and violin plots. 

You can see some simple examples of all of these types at the Matplotlib Gallery [page](http://matplotlib.org/gallery.html).
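As a starting point, here is a minimal sketch of a scatter plot and a bar chart side by side. The noisy values, the group names, and the counts are all made up for illustration.

```python
import random

import matplotlib.pyplot as plt

random.seed(0)   # fixed seed so the made-up data are reproducible
x = list(range(5, 100, 5))
# Made-up noisy measurements scattered around the line 100 * x
noisy = [100 * i + random.gauss(0, 500) for i in x]

fig = plt.figure(figsize=(8, 3))

# Left panel: scatter plot of the noisy data with the underlying line
ax = fig.add_subplot(1, 2, 1)
ax.scatter(x, noisy, color="steelblue", s=25, label="Noisy data")
ax.plot(x, [100 * i for i in x], color="orange", label="100 x")
ax.set_xlabel("x")
ax.set_ylabel("z")
ax.legend(frameon=False)

# Right panel: bar chart of counts per group
ax_bar = fig.add_subplot(1, 2, 2)
groups = ["A", "B", "C"]   # hypothetical category names
counts = [12, 7, 15]       # hypothetical counts
bars = ax_bar.bar(groups, counts, color="steelblue")
ax_bar.set_xlabel("Group")
ax_bar.set_ylabel("Count")

plt.tight_layout()
```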

One of the most used types of plots is a [histogram](https://en.wikipedia.org/wiki/Histogram). This is the type of plot you use when you want to examine the distribution of values in a dataset.
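Here is a minimal sketch of a histogram. It regenerates uniform random samples like the `rv1` and `rv2` lists created earlier, with a seed added so the result is reproducible. The number of bins and the use of `density = True` are choices made for this example.

```python
import random

import matplotlib.pyplot as plt

random.seed(1)   # fixed seed, added here for reproducibility
rv1 = [random.random() for _ in range(1000)]
rv2 = [random.random() for _ in range(10000)]

fig = plt.figure(figsize=(5, 5 / 1.6))
ax = fig.add_subplot(1, 1, 1)

# density=True rescales each histogram to unit area,
# so samples of different sizes can be compared directly
counts1, edges1, _ = ax.hist(rv1, bins=20, density=True, alpha=0.6,
                             label="1,000 draws")
counts2, edges2, _ = ax.hist(rv2, bins=20, density=True, alpha=0.6,
                             label="10,000 draws")

ax.set_xlabel("Value")
ax.set_ylabel("Probability density")
ax.legend(loc="best", frameon=False)

plt.tight_layout()
```

With more draws, the bar heights fluctuate less from bin to bin, which is what we expect for a uniform distribution.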

**Find time to explore!!**