## Import our modules.  Remember, it is always good to do this at the beginning of a notebook.

If you don't have seaborn, you can install it with `conda install seaborn`.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Use the notebook magic to render matplotlib figures inline with the notebook cells

In [None]:
%matplotlib inline

### Let's begin!
First, we need to load our data.

We'll use the pandas `read_csv` function.  If you have trouble opening the file, remember how we solved the problem this morning.

In [None]:
df = pd.read_csv("HCEPDB_moldata.csv")

Let's take a look at the data with `head` to make sure it looks right, and then check the `shape` of the data frame.

In [None]:
df.head()

In [None]:
df.shape

OK, that's a lot of data.  Let's take a random subsample of the full dataframe to make playing with the data faster.  This is something you may consider doing when you have large data sets and want to do data exploration.  Thankfully, pandas has a nice method called `sample` that will take a random sample from our dataframe.  With `frac=.1` we keep 10% of the rows.

In [None]:
df_sample = df.sample(frac=.1)

In [None]:
df_sample.head()

In [None]:
df_sample.shape
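
One thing to keep in mind is that `sample` draws different rows each time the cell runs. If you want the same subsample every time, you can pass `random_state`. The sketch below uses a small synthetic dataframe (`demo_df`, with made-up values) in place of the real data, so the names and numbers are only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (made-up values, not HCEPDB).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "id": np.arange(1000),
    "pce": rng.uniform(0, 10, size=1000),
})

# frac=0.1 keeps 10% of the rows; random_state fixes the seed so the
# same rows are drawn on every run.
sample_a = demo_df.sample(frac=0.1, random_state=42)
sample_b = demo_df.sample(frac=0.1, random_state=42)

print(sample_a.shape)             # (100, 2)
print(sample_a.equals(sample_b))  # True
```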

Cool. Cool, cool, cool.  Now we have a subset of data for some plotting fun.  We saw some basic plots this AM with pandas, but let's do some nicer ones.  Let's start with PCE vs HOMO energy.

In [None]:
df.plot.scatter('pce', 'e_homo_alpha')

Ooops!  We used the wrong dataframe.  That took a while, didn't it?  We can use the magic `%%timeit` to see how long that took.  By default, `%%timeit` runs the cell several times and reports a summary of the timings.  For this purpose, let's run it just once (`-n 1 -r 1`).  See the timeit docs [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).

In [None]:
%%timeit -n 1 -r 1
df.plot.scatter('pce', 'e_homo_alpha')

Now let's see for our subsampled dataframe.

In [None]:
%%timeit -n 1 -r 1
df_sample.plot.scatter('pce', 'e_homo_alpha')

Nice... A lot shorter!  Notice that using about 10% of the data took roughly 1/10 of the run time.  Makes sense.

But this thing is UGLY!  Let's see if we can't pretty it up.  First thing is that `df.plot.XXX` returns a plot object (a matplotlib `Axes`) that we can modify before it gets rendered by calling certain methods on the object.  Remember you can always use the Jupyter notebook tab completion after an object to find out what methods are available.

In [None]:
p_vs_homo_plt = df_sample.plot.scatter('pce', 'e_homo_alpha')
p_vs_homo_plt.set_xlabel('PCE')
p_vs_homo_plt.set_ylabel('HOMO')
p_vs_homo_plt.set_title('Photoconversion Efficiency vs. HOMO energy')

That's a bit better, but there are still some things we can do to make it look nicer.  Let's add a grid, make the y-axis label more accurate, and use `figsize` to make the figure larger and square.

In [None]:
p_vs_homo_plt = df_sample.plot.scatter('pce', 'e_homo_alpha', figsize=(6,6))
p_vs_homo_plt.set_xlabel('PCE')
p_vs_homo_plt.set_ylabel('$E_{HOMO}$')
p_vs_homo_plt.set_title('Photoconversion Efficiency vs. HOMO energy')
p_vs_homo_plt.grid()

Let's take a moment to figure something out.  Let's figure out how to do the following:
* How to change the x range to be 2 to 10
* How to change the y range to be -6 to -2
* How to change the font size to 18
* How to change the colors and transparency
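
Here is one possible way to do all four, as a rough sketch rather than the only answer. It uses a synthetic dataframe (`demo_df`, made-up values) so it runs on its own. The color name and alpha value are arbitrary choices, and `fontsize=18` in the plot call only affects the tick labels, so the axis labels get their own `fontsize`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_sample (made-up values, not HCEPDB).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "pce": rng.uniform(0, 11, size=500),
    "e_homo_alpha": rng.uniform(-7, -4, size=500),
})

ax = demo_df.plot.scatter(
    "pce", "e_homo_alpha",
    figsize=(6, 6),
    color="darkred",  # marker color
    alpha=0.3,        # transparency: 0 is invisible, 1 is opaque
    fontsize=18,      # tick label size
)
ax.set_xlim(2, 10)    # x range
ax.set_ylim(-6, -2)   # y range
ax.set_xlabel("PCE", fontsize=18)
ax.set_ylabel("$E_{HOMO}$", fontsize=18)
```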

### The pandas visualization tools documentation is really good:
* [docs here](https://pandas.pydata.org/pandas-docs/stable/visualization.html)

One thing that is very useful is a scatterplot matrix to show the relationship between variables.  Let's make one now.  Be patient as this makes a lot of plots!

In [None]:
from pandas.plotting import scatter_matrix
scatter_matrix(df_sample, figsize=(10,10), alpha=.2)

WOW!  That is insane!  But it does give us a quick overview of the relationship between all the variables in the data frame.  That id column plot is goofy.  The ids are the molecule ids and don't contain any molecular information.  Let's turn that column into an index and move on.

In [None]:
df_sample.set_index('id', inplace=True)

In [None]:
df_sample.head()
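
If the full scatterplot matrix is too slow or too crowded, one option is to pass only the columns you care about. The sketch below assumes a synthetic dataframe (`demo_df`, made-up values) and picks three of the column names used in this notebook. `diagonal="hist"` puts a histogram of each variable on the diagonal.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic stand-in for df_sample (made-up values, not HCEPDB).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "id": np.arange(200),
    "pce": rng.uniform(0, 10, size=200),
    "e_homo_alpha": rng.uniform(-7, -4, size=200),
    "e_gap_alpha": rng.uniform(1, 4, size=200),
})

# Select a few columns so the matrix is 3 x 3 instead of one panel
# for every pair of columns in the frame.
cols = ["pce", "e_homo_alpha", "e_gap_alpha"]
axes = scatter_matrix(demo_df[cols], figsize=(8, 8), alpha=0.2, diagonal="hist")

print(axes.shape)  # (3, 3)
```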

OK, moving on, let's look at making density plots.  These show the probability density of particular values for a variable.  Notice how we use a different way of specifying the plot type here: the `kind` argument.

In [None]:
df_sample['pce'].plot(kind='kde')

Let's plot the kde on top of the histogram (remember the histogram from this AM?).  The key here is to use a secondary axis.  First we save the plot object to `ax`, then pass that to the second plot.

In [None]:
ax = df_sample['pce'].plot(kind='hist')
df_sample['pce'].plot(kind='kde', ax=ax, secondary_y=True)

## NEAT!

What about trying other plot styles?  We can do this by calling `matplotlib.style.use(...)`.  Let's try the `ggplot` style that looks like the ggplot2 default style from R.

In [None]:
import matplotlib
matplotlib.style.use('ggplot')

In [None]:
df_sample['pce'].plot(kind='kde')

You can find the list of matplotlib styles [here](https://tonysyu.github.io/raw_content/matplotlib-style-gallery/gallery.html).
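
You can also list the styles installed with your copy of matplotlib from inside the notebook, and try one out temporarily without changing the global setting. This is a small sketch using `plt.style.available` and `plt.style.context`, with toy data chosen just for illustration.

```python
import matplotlib.pyplot as plt

# Names of all styles that this matplotlib installation knows about.
print(plt.style.available)

# Apply a style only inside the with-block; plots made afterwards go
# back to whatever style was active before.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title("Temporary ggplot style")
```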

## Seaborn for fun and pretty pictures!
Matplotlib is great for basic scatter plots, bar plots, time series, etc.  But if we want to do really fancy plots, we need to look to other tools like Seaborn.  This is a super quick intro to seaborn.

We'll make two different contour / surface plots.
* Basic contour plot
* Density plot

Examples roughly taken from [here](https://python-graph-gallery.com/1136-2/).

In [None]:
sns.set_style('white')
sns.kdeplot(x=df_sample['pce'], y=df_sample['e_homo_alpha'])

In [None]:
# fill and bw_method replace the older shade and bw arguments (seaborn 0.11 and later)
sns.kdeplot(x=df_sample['pce'], y=df_sample['e_homo_alpha'], cmap='Reds', fill=True, bw_method=.15)

### Super COOL!

Let's go back to pandas and matplotlib and look at subplots.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(6,6))
df_sample.plot(x='pce', y='e_homo_alpha', ax=axes[0])
df_sample.plot(x='pce', y='e_gap_alpha', ax=axes[1])

Ooops!  That doesn't look right at all!  What's wrong with this figure?

### In-class exercise

Fix up the above subplots so that they look like what we might expect.  Also, add titles, increase the font size, change colors and alpha, and finally figure out how to change the margins and layout so they are side by side.