## Lesson 7: Plotting using Matplotlib

Okay, there's a lot more to plotting than we can realistically cover in one lecture, but even the basics let you make really cool figures in Python. We're going to be using Matplotlib, a plotting library that took a lot of the plotting functionality from the popular MATLAB software, re-wrote it in Python, and made it a lot easier to use.

Matplotlib has a ton of features and can be incredibly powerful, so we only have time to cover the basics. Our goal is to give you a good enough understanding of how everything is set up so that you can start on your own and teach yourself the rest of what you'll need. Fortunately, the inline documentation for Matplotlib is pretty good, and you can look at the [extensive gallery](http://matplotlib.org/gallery.html#api) of examples and figure out how to make similar plots with your own data.

We will be using the pyplot style. For more information on the different levels and styles of Matplotlib, see the [usage FAQ](http://matplotlib.org/faq/usage_faq.html).

Let's make our first plot! Say we have some data that's approximately a line, but with some noise in it. Let's plot it:

In [None]:
%matplotlib inline

The above line tells the IPython (Jupyter) notebook to display the figures we create inside the notebook itself, which is handy for these demonstrations.

In [None]:
import matplotlib.pyplot as plt
import random

x = range(0,100)
y = [0.5 * i + 5 + 10*random.uniform(0,3) for i in x]

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(x, y)

plt.show()

Not too hard, but what did we just do?

### *matplotlib.pyplot.**figure()***  
This instantiates a figure object, which you can fill with one or more subplot objects (e.g. Fig. 1A, Fig. 1B, Fig. 1C, etc.). **figure()** has many optional arguments that set global properties of your figure, like the size and resolution. One of the most useful is the **figsize** kwarg.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import random

x = range(0,100)
y = [0.5 * i + 5 + 10*random.uniform(0,3) for i in x]

fig = plt.figure()
##plt.figure??
##fig = plt.figure(figsize=(6,12))
ax = fig.add_subplot(1,1,1)
ax.plot(x, y)

plt.show()
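
To make the **figsize** kwarg concrete, here is a minimal sketch (not part of the original lesson) that creates a figure with an explicit size and resolution and prints what Matplotlib stored. **figsize** is given in inches as (width, height), and **dpi** is dots per inch, so the two together determine the pixel dimensions of the rendered figure. The specific values (6 x 3 inches, 100 dpi) are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is dots (pixels) per inch.
fig = plt.figure(figsize=(6, 3), dpi=100)
ax = fig.add_subplot(1, 1, 1)
ax.plot(range(10), [i ** 2 for i in range(10)])

# The Figure object remembers its size, so we can inspect it.
width_in, height_in = fig.get_size_inches()
print("Size in inches:", width_in, "x", height_in)
print("Size in pixels:", width_in * fig.dpi, "x", height_in * fig.dpi)
```

Swapping the two numbers in **figsize** gives a tall, narrow figure instead of a wide, short one.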


### *figure.**add_subplot()***  
This method of figure objects instantiates an Axes object within the figure. In Matplotlib what we would think of as a graph, plot, or figure, is called an Axes, after the X and Y axes. Axes have many properties like x and y limits, a set of major (and sometimes minor) ticks for the x and y axes, an optional legend, and lots more data. Most of the time when you are making a figure you'll be working with the Axes. 

![Parts of a Figure](http://matplotlib.org/_images/fig_map.png "Parts of a Figure")

The [manual](http://matplotlib.org/api/figure_api.html#matplotlib.pyplot.figure) tells us to look at [matplotlib.pyplot.subplot()](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot) for an explanation of the three mandatory arguments to **add_subplot()**. We see they are **(nrows, ncols, plot_number)**. Note that plot_number starts at 1, not 0 like you would expect from Python. So to create a 1 x 4 array of plots (one row, four columns) we could do:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import random

x = range(0,100)
y = [0.5 * i + 5 + 10*random.uniform(0,3) for i in x]

fig = plt.figure(figsize=(9,3))

ax1 = fig.add_subplot(1,4,1)
ax2 = fig.add_subplot(1,4,2)
ax3 = fig.add_subplot(1,4,3)
ax4 = fig.add_subplot(1,4,4)

# God, that was tedious, let's just keep all these in a list.
#axs = [fig.add_subplot(1,4,i+1) for i in range(4)]

ax1.plot(x, y)
#ax2.plot(x, y)
#ax3.plot(x)
#ax4.plot(y)
#print(x, y)
plt.show()
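
The numbering of **plot_number** can be confusing at first, so here is a small sketch, assuming a 2 x 3 grid chosen purely for illustration. It writes each subplot's number in the middle of its Axes, showing that numbering starts at 1 in the upper left and runs left to right along each row before moving down to the next row.

```python
import matplotlib.pyplot as plt

nrows, ncols = 2, 3
fig = plt.figure(figsize=(6, 4))

axs = []
for plot_number in range(1, nrows * ncols + 1):  # 1, 2, ..., 6 (not 0-based!)
    ax = fig.add_subplot(nrows, ncols, plot_number)
    # Label the Axes with its own plot_number, centered in the panel.
    ax.text(0.5, 0.5, str(plot_number), ha="center", va="center", fontsize=20)
    axs.append(ax)

plt.tight_layout()
```

Because the Axes are kept in a list, `axs[0]` is plot number 1, `axs[1]` is plot number 2, and so on.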

### *Axes.**plot()***  
Lastly, we plot our data series using this aptly named method. We can consult the documentation to find out what other kinds of arguments we can give it.

In [None]:
ax1.plot??

You can see that there are a lot of different things you can do for something as simple as plotting: markers, colors, lines. If you keep reading, you'll see that you can even attach labels to the lines. Let's try this code, now, and see what it looks like:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import random

x = range(0,100)
y = [0.5 * i + 5 + 10*random.uniform(0,3) for i in x]

fig = plt.figure(figsize=(6,12))
axs = [fig.add_subplot(4,1,i+1) for i in range(4)]

axs[0].plot(x, y, label="Original")
axs[1].plot(x, y, 'bo', label="Blue Points")
axs[2].plot(x, 'r+', label="Red + signs, x only")
axs[3].plot(y, '--', label="Dashed line, y only")

for ax in axs:
    ax.set_xlabel("x")

#legends = [ax.legend(loc='upper left') for ax in axs]
#plt.tight_layout() ##This might make your plot look nicer!

plt.show()

Above, we did a lot of things:
1. Introduced format strings for changing the color and style of the plotted data.
2. Added a label to each line and, in the commented-out line, showed how to insert legends.
3. Introduced **tight_layout()** (also commented out), which adjusts the spacing to make your figure look nicer!
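
The format strings used above are shorthand for keyword arguments. As a rough sketch (the variable names are just for illustration), the snippet below plots the same data twice, once with the string **'r+'** and once with the equivalent **color**, **marker**, and **linestyle** kwargs, and prints the resulting styles so you can compare them. It also shows that when you pass only one sequence to **plot()**, Matplotlib uses it as the y values and fills in 0, 1, 2, ... for x.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

y = [3, 1, 4, 1, 5]

fig = plt.figure(figsize=(4, 3))
ax = fig.add_subplot(1, 1, 1)

# plot() returns a list of Line2D objects; we plotted one series, so take the first.
short = ax.plot(y, 'r+')[0]                                        # format string
longform = ax.plot(y, color='r', marker='+', linestyle='None')[0]  # spelled out

print(short.get_marker(), longform.get_marker())        # both '+'
print(short.get_linestyle(), longform.get_linestyle())  # both 'None'
print(to_rgba(short.get_color()) == to_rgba(longform.get_color()))
print(list(short.get_xdata()))  # x defaults to the position of each y value
```

The keyword form is longer, but it is easier to read and lets you use colors and markers that have no one-letter code.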

However, we could have plotted everything on a single set of axes instead!

In [None]:
x = range(0,100)
y = [0.5 * i + 5 + 10*random.uniform(0,3) for i in x]

fig = plt.figure(figsize=(6,12))
ax=fig.add_subplot(111)

ax.plot(x, y, label="Original")
ax.plot(x, y, 'bo', label="Blue Points")
ax.plot(x, 'r+', label="Red + signs, x only")
ax.plot(y, '--', label="Dashed line, y only")

ax.legend(loc='upper left')

plt.show()

##Let's also resize to make the figure more viewable - where do we do that?
##Notice any differences from the above sub-figures?


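If we decide that we don't like the labels that we gave before, we can pass a list of labels to **legend()**. The new labels are matched to the lines in the order they were plotted, so with only two labels, just the first two lines appear in the legend and the labels we gave earlier are no longer shown: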

In [None]:
x = range(0,100)
y = [0.5 * i + 5 + 10*random.uniform(0,3) for i in x]

fig = plt.figure(figsize=(12,6))
ax=fig.add_subplot(111)

ax.plot(x, y, label="Original")
ax.plot(x, y, 'bo', label="Blue Points")
ax.plot(x, 'r+', label="Red + signs, x only")
ax.plot(y, '--', label="Dashed line, y only")

ax.legend(['First Entry', '2nd Entry'], loc='upper left')

plt.show()

##Notice any differences from the above sub-figures?
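
If you only want some of the lines in the legend, or want to choose exactly which line gets which label, one option is to keep the objects that **plot()** returns and hand them to **legend()** together with the labels. This is a minimal sketch on made-up data; the variable names and label text are only for illustration.

```python
import matplotlib.pyplot as plt

x = list(range(10))
y = [2 * i + 1 for i in x]

fig = plt.figure(figsize=(6, 3))
ax = fig.add_subplot(1, 1, 1)

# plot() returns a list of lines; the trailing comma unpacks the single line.
solid, = ax.plot(x, y, label="Original")
points, = ax.plot(x, y, 'bo', label="Blue Points")
dashed, = ax.plot(y, '--', label="Dashed line, y only")

# Pick the handles explicitly: here we skip 'points' and relabel the other two.
leg = ax.legend([solid, dashed], ['Solid line', 'Dashed line'], loc='upper left')

print([t.get_text() for t in leg.get_texts()])
```

Passing the handles yourself means the legend no longer depends on the order in which the lines were plotted.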

To see what other arguments we can give the **legend()** method, let's consult IPython's help system:

In [None]:
ax.legend?

In [None]:
x = range(0,100)
y = [0.5 * i + 5 + 10*random.uniform(0,3) for i in x]

fig = plt.figure(figsize=(12,6))
ax=fig.add_subplot(111)

ax.plot(x, y, label="Original")
ax.plot(x, y, 'bo', label="Blue Points")
ax.plot(x, 'r+', label="Red + signs, x only")
ax.plot(y, '--', label="Dashed line, y only")

ax.legend(['First Entry', '2nd Entry'], loc='lower right', numpoints=1, fancybox=True, shadow=True)

plt.show()

### *matplotlib.pyplot.**savefig()***
Ok, so let's say you've spent all this time and you're reasonably satisfied with the figure you've created. To save the figure into a file, use the **savefig** function:

In [None]:
x = range(0,100)
y = [0.5 * i + 5 + 10*random.uniform(0,3) for i in x]

fig = plt.figure(figsize=(12,6))
ax=fig.add_subplot(111)

ax.plot(x, y, label="Original")
ax.plot(x, y, 'bo', label="Blue Points")
ax.plot(x, 'r+', label="Red + signs, x only")
ax.plot(y, '--', label="Dashed line, y only")

ax.legend(['First Entry', '2nd Entry'], loc='lower right', numpoints=1, fancybox=True, shadow=True)

plt.savefig('Lesson7_Fig1.png',format='png')
#OR
plt.savefig('Lesson7_Fig1.pdf',format='pdf')
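
A couple of **savefig** options are worth knowing about. The sketch below writes into a temporary directory so it does not clutter your working folder (that choice, and the file names, are just for this example). It lets Matplotlib infer the file type from the extension, sets the resolution of the PNG with **dpi**, and uses **bbox_inches='tight'** to trim extra whitespace around the figure.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 3))
ax = fig.add_subplot(1, 1, 1)
ax.plot(range(10), [i ** 2 for i in range(10)], 'bo', label="Squares")
ax.legend(loc='upper left')

outdir = tempfile.mkdtemp()  # example-only location; use any folder you like
png_path = os.path.join(outdir, 'example_fig.png')
pdf_path = os.path.join(outdir, 'example_fig.pdf')

# The format is inferred from the file extension when format= is not given.
fig.savefig(png_path, dpi=150, bbox_inches='tight')  # raster image at 150 dots per inch
fig.savefig(pdf_path, bbox_inches='tight')           # vector image

print(sorted(os.listdir(outdir)))
```

Calling **savefig** on the figure object, rather than through **plt**, makes it explicit which figure is being saved when you have several open.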

In [None]:
%%bash
ls

One useful trick in Jupyter notebooks is that you can also display images! Below are two ways to look at a saved figure directly in the notebook, so you don't have to go find the folder and open the file.

In [None]:
from IPython.display import Image
Image("Lesson7_Fig1.png") #,width=300,height=100)

In [None]:
from IPython.display import IFrame
IFrame("Lesson7_Fig1.pdf", width=600, height=300)

Pretty cool! You can load your data, graph it the way you want, and then save the figure, either ready to go or ready to import into Illustrator or any other image editor of your choice for further editing.

Below we show some more complex examples, with many different types of plots:
1. Scatter Plots
2. Histograms
3. Bar Plots

## Scatter Plots

Scatter plots are useful for looking at correlation. Usually, you have two sets of data, and you plot one on the x-axis and the other on the y-axis, so each data point in **x** is paired with a point in the other dataset **y**. We make them exactly the way we made the plots above.

Let's take the data file "Lesson7_dataset1.txt" shown below (press Shift+Enter to save it to a text file). We will make a scatter plot comparing the UstIshim individual to the Han individual.


In [None]:
%%writefile Lesson7_dataset1.txt
P1/P2	Oroqen	Daur	Hezhen	Uygur	Xibo	Japanese	Korean	Tu	Tujia	Miao	Yi	She	Naxi	Atayal	Ami	Lahu	Dai	Kinh	Burmese	Thai	Cambodian
UstIshim	-9.9	-8.3	-10.1	-3.6	-9.4	-10.5	-9.7	-8.3	-9.9	-10.2	-10.7	-10.1	-10.4	-8.8	-10.2	-10	-10.7	-9.5	-9.2	-8.9	-8.9
Kostenki14	-10.8	-9	-11.2	-4.6	-10.6	-11.2	-10.6	-9.5	-10.9	-11.7	-11.9	-11.2	-11.4	-9.1	-11.1	-10.5	-11.5	-10.4	-10	-9.8	-9.4
Loschbour	-10.1	-8.5	-10.7	-3.3	-9.9	-10.9	-10.2	-8.5	-10.7	-10.8	-11.5	-10.5	-11	-8.8	-10.6	-10.7	-11.2	-9.8	-9.6	-9	-8.8
Han	0.8	2.5	0.6	12.3	2.3	0.6	1.2	4.1	0.5	-0.2	-0.8	0.7	0.7	1.1	-0.1	0.4	0.4	2.2	3	3.2	3.5

In [None]:
#%matplotlib inline
import matplotlib.pyplot as plt

pd="/Users/melyang/Desktop/PythonBootcamp2017/lessons/"
filename=pd+"Lesson7_dataset1.txt"
datfile=open(filename,'r')

datdict={}
for line in datfile:
    x=line.strip().split()
    if x[0]=="P1/P2": continue
    else:
        datdict[x[0]] = [float(i) for i in x[1:]]
        
datfile.close()

myx=datdict['Han']
myy=datdict['UstIshim']
     
fig = plt.figure(figsize=(3,3))
ax=fig.add_subplot(111)

ax.plot(myx, myy, 'o', label="Han vs UstIshim")
plt.tight_layout()
plt.show()
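
Matplotlib also has a dedicated **scatter()** method on Axes, which draws the same kind of picture as **plot()** with the **'o'** format. Here is a small sketch using the first five Han and UstIshim values from the table above (typed in by hand, so the snippet does not need the data file). The axis labels are just the row names.

```python
import matplotlib.pyplot as plt

# First five columns (Oroqen, Daur, Hezhen, Uygur, Xibo) of two rows of the table.
han = [0.8, 2.5, 0.6, 12.3, 2.3]
ustishim = [-9.9, -8.3, -10.1, -3.6, -9.4]

fig = plt.figure(figsize=(3, 3))
ax = fig.add_subplot(1, 1, 1)

# Each (han[i], ustishim[i]) pair becomes one point.
pts = ax.scatter(han, ustishim, s=25, label="Han vs UstIshim")
ax.set_xlabel("Han")
ax.set_ylabel("UstIshim")
plt.tight_layout()

print(len(pts.get_offsets()), "points plotted")
```

For simple plots like this one the two methods look the same; **scatter()** becomes useful when you want each point to have its own size or color.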

Now let's expand and try all pairs!

In [None]:
#%matplotlib inline
import matplotlib.pyplot as plt

pd="/Users/melyang/Desktop/PythonBootcamp2017/lessons/"
filename=pd+"Lesson7_dataset1.txt"
datfile=open(filename,'r')

datdict={}
for line in datfile:
    x=line.strip().split()
    if x[0]=="P1/P2": continue
    else:
        datdict[x[0]] = [float(i) for i in x[1:]]
        
datfile.close()

fig = plt.figure(figsize=(5,5))
ax=fig.add_subplot(1,1,1)
keys=list(datdict.keys())  # make a list so it can be sliced (required in Python 3)
for indx,xkey in enumerate(keys):
    for ykey in keys[indx+1:]:
        myx=datdict[xkey]
        myy=datdict[ykey]
        ax.plot(myx, myy,'o',markersize=5,label="%s vs %s" % (xkey,ykey))


ax.legend(loc='upper right',fontsize=8)
plt.tight_layout()
plt.show()


Okay, we start to see patterns, but this figure is still very confusing. 

Let's do a few things: 
1. Put labels on the x- and y- axes.
2. Add two lines over x=0 and y=0 to orient ourselves.

In [None]:
#%matplotlib inline
import matplotlib.pyplot as plt

pd="/Users/melyang/Desktop/PythonBootcamp2017/lessons/"
filename=pd+"Lesson7_dataset1.txt"
datfile=open(filename,'r')

datdict={}
for line in datfile:
    x=line.strip().split()
    if x[0]=="P1/P2": continue
    else:
        datdict[x[0]] = [float(i) for i in x[1:]]
        
datfile.close()
     
fig = plt.figure(figsize=(5,5))
ax=fig.add_subplot(1,1,1)
keys=list(datdict.keys())  # make a list so it can be sliced (required in Python 3)
for indx,xkey in enumerate(keys):
    for ykey in keys[indx+1:]:
        myx=datdict[xkey]
        myy=datdict[ykey]
        ax.plot(myx, myy,'o',markersize=5,label="%s vs %s" % (xkey,ykey))

##1##
ax.set_ylabel("Z for D(Ind2,Asn;Tianyuan,Mbuti)")
ax.set_xlabel("Z for D(Ind1,Asn;Tianyuan,Mbuti)")
##

##2##
ax.axhline(y=0,color='black')
ax.axvline(x=0,color='black')
## How can we make these lines appear behind the dots instead of in front?

ax.legend(loc='upper right',fontsize=8)
plt.tight_layout()
plt.show()
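
One way to answer the question in the cell above is the **zorder** kwarg: artists with a lower **zorder** are drawn first, so they end up underneath artists with a higher one. Lines are drawn with a zorder of 2 by default, so in this sketch (on made-up points) giving the reference lines a **zorder** of 1 puts them behind the dots. Another way, with no extra arguments, is simply to draw the reference lines before plotting the data.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(3, 3))
ax = fig.add_subplot(1, 1, 1)

# Made-up points that straddle both reference lines.
dots, = ax.plot([-2, -1, 0, 1, 2], [2, 0, -1, 0, 2], 'o', markersize=10)

# Drawn after the dots, but a lower zorder still places them underneath.
hline = ax.axhline(y=0, color='black', zorder=1)
vline = ax.axvline(x=0, color='black', zorder=1)

print(dots.get_zorder(), hline.get_zorder(), vline.get_zorder())
```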


In the figure above, what we were plotting is a statistic called the D-statistic, which looks for relative amounts of shared alleles (actually we plotted the Z-score corresponding to each D-statistic, but our results should be qualitatively similar). 

We plotted D(X, Asian Popn; Tianyuan, Mbuti). The Tianyuan individual is a 40,000-year-old individual from outside of Beijing (see [this paper](http://www.pnas.org/content/110/6/2223.short) - these data are actually unpublished but hopefully will be published soon!)

Basically, D > 0 indicates the individual X shares ***more*** alleles with the Tianyuan individual than other Asian populations share, while D < 0 indicates the individual X shares ***fewer*** alleles with the Tianyuan individual than other Asian populations share. 

We see that the UstIshim, Kostenki14, and Loschbour individuals show very negative values when plotted against each other, indicating that none of them shares nearly as many alleles with the Tianyuan individual as other Asian populations do. 

Is there anything misleading about the figure? 
How many interesting things are going on? 
Are there two different things happening in the Han comparisons?

In [None]:
#%matplotlib inline
import matplotlib.pyplot as plt

pd="/Users/melyang/Desktop/PythonBootcamp2017/lessons/"
filename=pd+"Lesson7_dataset1.txt"
datfile=open(filename,'r')

datdict={}
for line in datfile:
    x=line.strip().split()
    if x[0]=="P1/P2": continue
    else:
        datdict[x[0]] = [float(i) for i in x[1:]]
        
datfile.close()

fig = plt.figure(figsize=(5,5))
ax=fig.add_subplot(1,1,1)

ax.axhline(y=0,color='black')
ax.axvline(x=0,color='black')

keys=list(datdict.keys())  # make a list so it can be sliced (required in Python 3)
for indx,xkey in enumerate(keys):
    for ykey in keys[indx+1:]:
        if xkey=="Han":
            myx=datdict[ykey]
            myy=datdict[xkey]
            mylabeltuple=(ykey,xkey)
        else:
            myx=datdict[xkey]
            myy=datdict[ykey]
            mylabeltuple=(xkey,ykey)
        ax.plot(myx, myy,'o',markersize=5,label="%s vs %s" % mylabeltuple)

ax.set_ylabel("Z for D(Ind2,Asn;Tianyuan,Mbuti)")
ax.set_xlabel("Z for D(Ind1,Asn;Tianyuan,Mbuti)")


ax.legend(loc='lower right',fontsize=8)
plt.tight_layout()
plt.show()


## Histograms
Histograms show the distribution of your statistics: how many values fall into each range (bin).

This might showcase our results even better than the scatter plot above.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

pd="/Users/melyang/Desktop/PythonBootcamp2017/lessons/"
filename=pd+"Lesson7_dataset1.txt"
datfile=open(filename,'r')

datdict={}
for line in datfile:
    x=line.strip().split()
    if x[0]=="P1/P2": continue
    else:
        datdict[x[0]] = [float(i) for i in x[1:]]
        
datfile.close()
     
fig = plt.figure(figsize=(5,5))
ax=fig.add_subplot(1,1,1)

for xkey in datdict.keys():
    ax.hist(datdict[xkey],label="X = %s" % xkey) #alpha=0.5
    
ax.axvline(x=0, color='r', linestyle='dashed', linewidth=2)
# A histogram puts the values (here, Z-scores) on the x-axis and counts on the y-axis.
ax.set_xlabel("Z for D(X,Asn;Tianyuan,Mbuti)")
ax.set_ylabel("Count")

ax.legend(loc='upper right',fontsize=8)
plt.tight_layout()
plt.show()
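
When several histograms share one Axes, they are easier to compare if they all use the same bin edges and are slightly transparent. The sketch below does this for two rows of the table above (typed in by hand). The bin edges, one per unit of Z from -12 to 13, and **alpha=0.5** are choices made for this example rather than anything required by Matplotlib.

```python
import matplotlib.pyplot as plt

# Two rows copied from the data file above.
zscores = {
    'UstIshim': [-9.9, -8.3, -10.1, -3.6, -9.4, -10.5, -9.7, -8.3, -9.9, -10.2, -10.7,
                 -10.1, -10.4, -8.8, -10.2, -10, -10.7, -9.5, -9.2, -8.9, -8.9],
    'Han': [0.8, 2.5, 0.6, 12.3, 2.3, 0.6, 1.2, 4.1, 0.5, -0.2, -0.8,
            0.7, 0.7, 1.1, -0.1, 0.4, 0.4, 2.2, 3, 3.2, 3.5],
}

fig = plt.figure(figsize=(5, 3))
ax = fig.add_subplot(1, 1, 1)

bins = list(range(-12, 14))  # shared bin edges: -12, -11, ..., 13
counts = {}
for name, values in zscores.items():
    # hist() returns the counts per bin, the bin edges, and the drawn bars.
    n, edges, patches = ax.hist(values, bins=bins, alpha=0.5, label="X = %s" % name)
    counts[name] = n

ax.set_xlabel("Z for D(X,Asn;Tianyuan,Mbuti)")
ax.set_ylabel("Count")
ax.legend(loc='upper right', fontsize=8)
plt.tight_layout()
```

Without shared bins, each call to **hist()** picks its own ten bins from the range of that dataset, so bars from different datasets can have very different widths.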


## Bar Plots

Here's a bar plot example!

Let's take the protein script from Exercise 2 and the modified form we used in Lesson 5, where we used the **collections** module. Perhaps we want to get a count of each amino acid we used? 

In [None]:
#%matplotlib inline
import matplotlib.pyplot as plt

pd="/Users/melyang/Desktop/PythonBootcamp2017/resources/"
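protSeq = []
f1 = open(pd+'2q6h.pdb', 'r')
for row in f1:  # 'row' rather than 'next', which would shadow a Python built-in
    if row[:6] == 'SEQRES':
        line = row.strip().split()
        del line[:4]  # drop the four fields before the residues
        for aa in line:
            protSeq.append(aa)
f1.close()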

import collections
mycounts=collections.Counter(protSeq).most_common()
mylabels=[i[0] for i in mycounts]
mycountvals=[i[1] for i in mycounts]

fig = plt.figure(figsize=(10,3))
ax = fig.add_subplot(1, 1, 1)

x = range(len(mylabels))
ax.bar(x,mycountvals,align='center')
ax.set_xlim(-1,len(mylabels))
ax.set_xticks(x)
ax.set_xticklabels(mylabels)
plt.tight_layout()
plt.show()
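
If the **Counter** step above went by quickly, here is the same idea on a tiny hand-made list of residues (a hypothetical sequence, not taken from the PDB file). **most_common()** returns (item, count) pairs sorted from most to least frequent, which we split into labels and bar heights.

```python
import collections
import matplotlib.pyplot as plt

# Hypothetical mini-sequence, just to have something to count.
residues = ['ALA', 'GLY', 'ALA', 'SER', 'ALA', 'GLY']

pairs = collections.Counter(residues).most_common()  # sorted by count, highest first
labels = [name for name, count in pairs]
heights = [count for name, count in pairs]

fig = plt.figure(figsize=(4, 3))
ax = fig.add_subplot(1, 1, 1)

positions = range(len(labels))
bars = ax.bar(positions, heights, align='center')
ax.set_xticks(positions)    # one tick per bar...
ax.set_xticklabels(labels)  # ...labeled with the residue name
plt.tight_layout()

print(pairs)
```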

One final note: the [matplotlib documentation](http://matplotlib.org/contents.html) can be immensely helpful when you need a plot that fits your science needs. Not only is there a gallery of example plots, but there are also demo scripts that give you the code for making them, and these scripts can easily be adapted for your own plotting purposes. There's no need to try to remember all of these functions and methods! Unless you make the same kind of plot many, many times, it would be virtually impossible to remember everything this plotting library contains.