# Week 7: Data Visualization
## `Seaborn` 

## Building on `matplotlib`

`matplotlib` was built to emulate `MATLAB`'s plotting functionality.  

Let's take a quick poll of the room...  
If you are a `MATLAB` user and you actually like plotting in `MATLAB`, raise your hand...

<img src="https://memegenerator.net/img/instances/60808274/crickets-chirping-awful-quiet-round-here.jpg" width="80%" style="margin-left:auto; margin-right:auto">

## [Seaborn](https://seaborn.pydata.org/)

* a Python visualization library built on top of `matplotlib`
* a 'higher-level' interface
* attractive aesthetics by default

<img src="https://pbs.twimg.com/media/EhGuwXWXgAEERcn.png" width="30%" style="margin-left:auto; margin-right:auto">

## Let's see how Seaborn makes this easy...

In [None]:
# to get started
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

## revisit the Pima Indians dataset

In [None]:
path = 'https://raw.githubusercontent.com/SmilodonCub/DS4VS/master/datasets/diabetes.csv'
diabetes_pima = pd.read_csv( path )
pimacolumns_2change = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
diabetes_pima[ pimacolumns_2change ] = diabetes_pima[ pimacolumns_2change ].replace(0, np.nan )
diabetes_pima.head()
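Why replace zeros? In these five columns a value of 0 is not physiologically plausible, so it is commonly treated as a placeholder for a missing measurement. Converting it to `NaN` keeps those fake zeros out of summary statistics and plots. Here is a minimal sketch on a hypothetical three-row frame (made-up values, not the real data). `Pregnancies` is deliberately left alone because zero is a valid count there.

```python
import numpy as np
import pandas as pd

# Hypothetical mini version of diabetes_pima (made-up values)
toy = pd.DataFrame({
    "Glucose": [148, 0, 183],
    "BMI": [33.6, 26.6, 0.0],
    "Pregnancies": [6, 0, 8],  # 0 is a real value here, so it is not converted
})

cols_2change = ["Glucose", "BMI"]
# same pattern as above: 0 -> NaN only in the selected columns
toy[cols_2change] = toy[cols_2change].replace(0, np.nan)

# count missing values per column after the conversion
print(toy.isna().sum())
```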

In [None]:
sns.scatterplot(x="Age", y="DiabetesPedigreeFunction", hue="Outcome", data=diabetes_pima)
plt.show()

A relatively simple call produces an arguably 'prettier' plot (points colored by `Outcome`, axis labels, and a legend) from fewer lines of code than plain `matplotlib`.
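For comparison, here is a sketch of roughly what the same plot takes in plain `matplotlib`: one `scatter` call per `Outcome` level, plus hand-written axis labels and legend. It uses a hypothetical synthetic frame named `toy` (random values, not the Pima data) so it runs without downloading anything. With the real data you would use `diabetes_pima` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop these two lines in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for diabetes_pima (random values, NOT the real dataset)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Age": rng.integers(21, 81, size=100),
    "DiabetesPedigreeFunction": rng.uniform(0.08, 2.4, size=100),
    "Outcome": rng.integers(0, 2, size=100),
})

fig, ax = plt.subplots()
# plain matplotlib has no `hue`: loop over groups, one scatter call each
for outcome, group in toy.groupby("Outcome"):
    ax.scatter(group["Age"], group["DiabetesPedigreeFunction"],
               label=str(outcome), alpha=0.8)

# labels and legend are also manual (seaborn fills these in from the column names)
ax.set_xlabel("Age")
ax.set_ylabel("DiabetesPedigreeFunction")
ax.legend(title="Outcome")
plt.show()
```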

## The Messidor Diabetic Retinopathy Debrecen dataset

A dataset of features extracted from the Messidor image set, used to predict whether an image contains signs of diabetic retinopathy. (This is the table of extracted features, not the image dataset itself.) For more information, see the [UCI repository page](https://archive.ics.uci.edu/ml/datasets/Diabetic+Retinopathy+Debrecen+Data+Set) and [this publication](https://arxiv.org/pdf/1410.8576v1.pdf).

In [None]:
columns = [ 'quality', 'prescreen_result', 
           'MA_05', 'MA_06', 'MA_07', 'MA_08', 'MA_09', 'MA_10',
           'EP_03', 'EP_04', 'EP_05', 'EP_06', 'EP_07', 'EP_08', 'EP_09', 'EP_10', 
           'macula2OD', 'OD_diameter', 'AM_FM_class', 'DR_classification']

# MA == microaneurysm
# EP == exudate pixels
url = 'https://raw.githubusercontent.com/SmilodonCub/DS4VS/master/datasets/messidor_features.csv'
DR_df = pd.read_csv( url, names = columns )

print( DR_df.shape )
DR_df.head()

## Let's do some light EDA on the set

In [None]:
DR_df.info()
#DR_df.describe()

### use `Seaborn` to explore features ~ `DR_classification`

This is an interesting dataset. I promise we will pick it apart when we cover machine learning; for now, let's just explore the data visually with `Seaborn`.

In [None]:
# optic disc diameter ~ DR_classification
sns.boxplot( x = 'DR_classification', y = 'OD_diameter', data = DR_df )
plt.show()

In [None]:
# distance between macula center and optic disc center ~ DR_classification
sns.boxplot( x = 'DR_classification', y = 'macula2OD', data = DR_df )
plt.show()
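Each box above summarizes one class. As a sketch of what a default box plot encodes (assuming the default `whis=1.5`, following the `matplotlib` box-plot convention): the box spans the 25th to 75th percentiles, the inner line marks the median, the whiskers reach the most extreme data points within 1.5 times the IQR of the box, and anything beyond is drawn as an individual outlier. The helper `box_stats` and the synthetic `toy` frame are hypothetical, not part of either library.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for DR_df (random values, NOT the Messidor features)
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "DR_classification": np.repeat([0, 1], 50),
    "OD_diameter": rng.normal(0.11, 0.02, size=100),
})

def box_stats(values, whis=1.5):
    """Quantities a default box plot draws for one group (illustrative sketch)."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # whiskers stop at the most extreme data points still inside the fences
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "n_outliers": int(len(values) - len(inside)),  # drawn as individual points
    }

# one row of box statistics per DR_classification level
stats = pd.DataFrame({
    label: box_stats(group.to_numpy())
    for label, group in toy.groupby("DR_classification")["OD_diameter"]
}).T
print(stats)
```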

## Pivot long: reshape the data to facilitate a faceted plot

In [None]:
DR_df['case'] = DR_df.index
melted_DR = pd.melt( DR_df, id_vars = ['case', 'DR_classification'], 
        value_vars = ['MA_05', 'MA_06', 'MA_07', 'MA_08', 'MA_09', 'MA_10'])
melted_DR.head( 15 )
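To see exactly what `pd.melt` does, here is a hypothetical two-case wide table (made-up counts) melted the same way. Every `value_vars` column becomes a block of rows, the original column name goes into `variable`, the cell goes into `value`, and the `id_vars` columns are repeated on each row.

```python
import pandas as pd

# Tiny hypothetical wide table: one row per case, one column per MA level
wide = pd.DataFrame({
    "case": [0, 1],
    "DR_classification": [0, 1],
    "MA_05": [22, 24],
    "MA_06": [21, 23],
})

# wide -> long: id_vars are kept, value_vars are stacked into variable/value
long = pd.melt(wide, id_vars=["case", "DR_classification"],
               value_vars=["MA_05", "MA_06"])
print(long)
```

Two cases times two melted columns gives four rows. By the same arithmetic, `melted_DR` has six times as many rows as `DR_df`, one per MA column.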

In [None]:
sns.catplot(data=melted_DR, x='DR_classification', y='value',col='variable', kind='box', col_wrap=2 )
plt.show()
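`sns.catplot(..., col='variable', col_wrap=2)` is a figure-level function. It builds a grid of subplots, one per level of `variable`, wrapping after two columns, and draws the same box plot in each. The rough hand-rolled `matplotlib` equivalent below, on hypothetical synthetic data in the same long format, makes that bookkeeping visible. It is a sketch of the idea, not how seaborn is implemented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop these two lines in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical long-format data shaped like melted_DR (random counts, NOT Messidor values)
rng = np.random.default_rng(2)
variables = ["MA_05", "MA_06", "MA_07", "MA_08", "MA_09", "MA_10"]
n_per_var = 40  # 20 cases per class, per variable
toy_long = pd.DataFrame({
    "DR_classification": np.tile(np.repeat([0, 1], n_per_var // 2), len(variables)),
    "variable": np.repeat(variables, n_per_var),
    "value": rng.poisson(30, size=n_per_var * len(variables)),
})

col_wrap = 2
n_rows = int(np.ceil(len(variables) / col_wrap))
fig, axes = plt.subplots(n_rows, col_wrap, sharey=True, figsize=(6, 3 * n_rows))

# one facet (subplot) per level of `variable`, filled row by row
for ax, var in zip(axes.ravel(), variables):
    facet = toy_long[toy_long["variable"] == var]
    # one box per class, in sorted class order (0 then 1)
    groups = [g["value"].to_numpy() for _, g in facet.groupby("DR_classification")]
    ax.boxplot(groups)
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["0", "1"])
    ax.set_xlabel("DR_classification")
    ax.set_title(f"variable = {var}")

fig.tight_layout()
plt.show()
```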

## Now you try!

pivoting data wide to long:

1. melt `DR_df` for the 'exudate pixels' (EP_) columns
    * `id_vars`: columns to keep as identifiers (repeated on every new row)
    * `value_vars`: columns to be 'melted' across new rows
2. visualize a faceted boxplot for each EP alpha level by the target variable, `DR_classification`

In [None]:
# use this space here. feel free to copy/pasta 

## Use `Seaborn` to explore some fMRI data

In [None]:
# this dataset is an example set that comes with your installation of Seaborn
fmri_df = sns.load_dataset( 'fmri' )
print( fmri_df.shape )
fmri_df.head()

In [None]:
# light preprocessing
# change the index to subject
fmri_df = fmri_df.set_index( 'subject' )

In [None]:
sns.relplot( x = 'timepoint', y = 'signal', hue = 'region', style = 'event', data = fmri_df )
plt.show()

In [None]:
sns.relplot( x = 'timepoint', y = 'signal', hue = 'region', style = 'event', data = fmri_df,
           kind = 'line' )
plt.show()
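Why the shaded bands? `fmri_df` has one row per subject at each `timepoint` for every `region`/`event` combination. With `kind = 'line'`, seaborn aggregates those repeated observations, by default drawing the mean and a 95% confidence interval estimated by bootstrapping. The sketch below reproduces that idea for a single line on hypothetical synthetic data. `bootstrap_ci` is an illustrative helper, not seaborn's implementation, so its numbers will not match seaborn's exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: several subjects measured at each timepoint
rng = np.random.default_rng(3)
n_subjects = 14
toy = pd.DataFrame({
    "timepoint": np.repeat(np.arange(5), n_subjects),
    "signal": rng.normal(0.0, 0.05, size=5 * n_subjects),
})

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap CI for the mean of `values` (illustrative sketch)."""
    boot_rng = np.random.default_rng(seed)
    # resample with replacement and record the mean of each resample
    boot_means = [boot_rng.choice(values, size=len(values), replace=True).mean()
                  for _ in range(n_boot)]
    tail = (100 - level) / 2
    return np.percentile(boot_means, [tail, 100 - tail])

# mean line plus CI band, one row per timepoint
rows = []
for t, group in toy.groupby("timepoint")["signal"]:
    low, high = bootstrap_ci(group.to_numpy())
    rows.append({"timepoint": t, "mean": group.mean(), "ci_low": low, "ci_high": high})
summary = pd.DataFrame(rows).set_index("timepoint")
print(summary)
```

In the plot above, each `region`/`event` combination gets its own line and band, computed the same way within that subset.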

In [None]:
sns.relplot(
    data=fmri_df, x='timepoint', y='signal',
    hue='region', col='event', kind='line', col_wrap=2
)
plt.show()

## `Seaborn` summary

Unfortunately, we cannot spend all day exploring `Seaborn`.  
However, spend some time exploring the [`Seaborn` Gallery](https://seaborn.pydata.org/examples/index.html) and the [Python Graph Gallery](https://www.python-graph-gallery.com/) for more inspiration.

## so many other libraries...

...one of them might be just right for your data:

* [Case Studies in Neural Data Analysis](https://mark-kramer.github.io/Case-Studies-Python/intro.html)
* [open-neuroscience (Python & beyond)](https://open-neuroscience.com/)
* [MNE-Python for mostly human neurophys data](https://github.com/mne-tools/mne-python)
* [Nitime: time-series analysis for neuroscience data](https://github.com/nipy/nitime)

## a parting example

the xkcd plotting mode for `matplotlib`

In [None]:
with plt.xkcd():
    
    fig = plt.figure()
    ax = fig.add_axes((0.1, 0.2, 0.8, 0.7))
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_ylim([-30, 10])

    data = np.ones(100)
    data[70:] -= np.arange(30)

    ax.annotate(
        'THE DAY I STARTED\nTHE PhD PROGRAM',
        xy=(70, 1), arrowprops=dict(arrowstyle='->'), xytext=(15, -10))

    ax.plot(data)

    ax.set_xlabel('time')
    ax.set_ylabel('my overall health')
    
plt.show()

<img src="https://imgs.xkcd.com/comics/python.png" width="45%" style="margin-left:auto; margin-right:auto">



## Next week we will build interactive visualizations
<img src="https://content.techgig.com/photo/80071467/pros-and-cons-of-python-programming-language-that-every-learner-must-know.jpg?132269" width="100%" style="margin-left:auto; margin-right:auto">