# Python for Neuroscientists Week 7: Data Visualization


# Loading and preparing our *neural data!!*
We'll begin today by loading in some (real!) data. Before we do that, let's import some packages:

In [None]:
#remember that when we use 'as', we are simply telling Python what term we want to refer to our imported packages by
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#this does some important stuff behind the scenes for plotting specifically for jupyter notebooks (same in vscode), ask us if you want to learn more
%matplotlib inline 

Now, let's load and examine our data!

In [None]:
data = pd.read_csv('SST_data.csv')
data

This dataset is borrowed from [Neuromatch Academy](https://compneuro.neuromatch.io/projects/neurons/README.html), but this is very much real data, generated by the Allen Institute! Briefly, what you see here are two-photon calcium imaging signals from a single mouse performing a visual change detection task. I've curated this dataset just a little bit, so we're only looking at SST-expressing interneurons. To better understand what's going on here, let's traverse this dataframe a little bit.

P.S. If you want to learn more about the dataset, check out the YouTube video in the NMA link!

In [None]:
#You can index dataframes using .COLUMN_NAME - Pandas is very flexible!
data.cell_id.unique()
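One caveat on dot indexing: it only works when the column name is a valid Python name. A column like `dF/F` contains a slash, so it has to be selected with brackets instead. Here's a small sketch on a toy dataframe (the values are invented; only the column names mirror our dataset):

```python
import pandas as pd

# Toy dataframe with invented values; only the column names mirror the real dataset
toy = pd.DataFrame({
    "cell_id": [1, 1, 2, 2],
    "dF/F": [0.10, 0.25, -0.05, 0.40],
})

# Dot access and bracket access return the same column
same = toy.cell_id.equals(toy["cell_id"])

# 'dF/F' is not a valid Python name, so brackets are the only option here
# (toy.dF/F would be read as "toy.dF divided by F")
trace = toy["dF/F"]
unique_cells = toy["cell_id"].unique()
```

Bracket indexing always works, so it's a safe default whenever you're unsure.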

In [None]:
singlecell_trial_data = data[(data.trial_id == 24) & (data.cell_id == 1086500633)]
singlecell_trial_data

Ok, now that we've examined the dataset a little bit, let me provide some documentation:

`dF/F` is the instantaneous calcium imaging signal <br>
`time_from_stim` is the timepoint of each row of data, aligned to an image presentation <br>
`cell_id` unique ID for each cell (self explanatory I hope) <br>
`exposure` whether the image for a trial was familiar or novel  <br>
`trial_id` each image presentation is a separate trial <br>
`omitted` whether a trial had an omitted image <br>
`pupil_area` measured 500ms after stimulus presentation <br>
`mean_response` average dF/F over the 500ms following image presentation <br>

## Problem 1
Can you determine how many trials are in this dataset?
<details>
<summary>Click here for hint</summary>
Take a look at what we did above for cell_id - how would you do the same for trial_id, and how do you count the number of values in an array?
</details>

# Plotting Simple Data with Matplotlib
Let's say we just want to plot the trace of a single cell, on a single trial. The package **Matplotlib** (which we imported as `plt`) is perfect for this type of plotting. Let's pull out some data.

In [None]:
single_trial_data = data[(data.trial_id == 605) & (data.cell_id == 1086500092)]
#'dF/F' contains a slash, so we need bracket indexing here (.dF/F would be read as division)
single_trial_trace = np.array(single_trial_data['dF/F'])

In [None]:
single_trial_trace

In [None]:
plt.plot(single_trial_trace)

Pretty simple right! We can even include the timepoints on the x-axis if we want:

In [None]:
single_trial_timepoints = np.array(single_trial_data['time_from_stim'])
plt.plot(single_trial_timepoints, single_trial_trace)

#### Labels
I mean, this noisy data is great and all, but what are we even looking at on our x and y axes? Let's add some labels and a title!

In [None]:
plt.plot(single_trial_timepoints, single_trial_trace)
plt.xlabel('Time from image presentation')
plt.ylabel('dF/F')
plt.title('Is this a publishable graph?')

#### Resizing figures
We can also resize our plot if we'd like, by calling the function `plt.figure` before plotting and passing a `figsize` of (width, height) in inches. If you're interested in what's going on here, talk to us after class.

In [None]:
plt.figure(figsize = (3, 2))
plt.plot(single_trial_timepoints, single_trial_trace)
plt.xlabel('Time from image presentation')
plt.ylabel('dF/F')
plt.title('This is a publishable graph')
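For the curious: `plt.figure` creates a new figure, and the `plt.` calls that follow draw on it. The same plot can also be written with explicit figure and axes objects from `plt.subplots`. This is just a sketch with a synthetic trace (the sine-plus-noise signal is made up, not our data):

```python
import matplotlib
matplotlib.use("Agg")  # only needed when running as a script; skip this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for a single-trial trace (invented values)
rng = np.random.default_rng(0)
timepoints = np.linspace(-1.25, 1.5, 85)
trace = np.sin(2 * np.pi * timepoints) + rng.normal(0, 0.3, timepoints.size)

# figsize is (width, height) in inches
fig, ax = plt.subplots(figsize=(3, 2))
ax.plot(timepoints, trace)
ax.set_xlabel("Time from image presentation")
ax.set_ylabel("dF/F")
ax.set_title("Same plot, explicit axes")
```

Holding on to `fig` and `ax` gets handy once you have more than one plot on the go, since you can say exactly which one you want to change.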

Ok, this data is a little underwhelming at a single-cell level, ngl. What if we want to plot a bunch of data together?

In [None]:
plt.plot(data['dF/F'])

Um, that's not very useful: every cell and every trial gets strung together into one long line. We could spend a lot of time parsing through our dataframe to organize the data correctly, but there are easier ways to plot datasets with many variables:


# Plotting data with Seaborn

One thing that's important to note about our dataset is that it's long-form, not wide (i.e. column labels don't correspond to timepoints; instead, each row represents a single observation). While this may seem confusing at first, it makes things intensely convenient when we use **Seaborn**, a plotting package that's built on top of matplotlib.
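If the long vs. wide distinction feels abstract, here's a sketch of turning a wide table into a long one with `pd.melt`. The wide table is hypothetical (invented cell IDs, column names, and values), just to show the shape change:

```python
import pandas as pd

# Hypothetical wide-form table: one row per cell, one column per timepoint
wide = pd.DataFrame({
    "cell_id": [101, 102],
    "t_0.0": [0.1, 0.3],
    "t_0.5": [0.2, 0.1],
    "t_1.0": [0.4, 0.0],
})

# melt stacks the timepoint columns into rows: one observation per row
long = wide.melt(id_vars="cell_id", var_name="timepoint", value_name="dF/F")
```

Two cells times three timepoints gives six rows in `long`, each holding one measurement along with the labels that describe it.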

Seaborn is super simple: all you need to provide is your data, and then specify what columns you want on the x and y axis. For example:

In [None]:
#ignore this for now, it just makes things look good
sns.set_style("ticks")
sns.set_context("notebook")

sns.lineplot(data = data, x = 'time_from_stim', y = 'dF/F')

What if we want our x-axis to be categorical instead? Let's google a solution!
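One solution you might land on is `sns.barplot`, which takes a categorical column on the x-axis and plots the mean of the y column for each category. Here's a sketch on synthetic data (the numbers are randomly generated; only the column names match our dataset):

```python
import matplotlib
matplotlib.use("Agg")  # only needed when running as a script; skip this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data: 50 fake "familiar" trials and 50 fake "novel" trials
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "exposure": np.repeat(["familiar", "novel"], 50),
    "mean_response": np.concatenate([
        rng.normal(0.05, 0.02, 50),
        rng.normal(0.10, 0.02, 50),
    ]),
})

# One bar per category; bar height is the mean of mean_response in that category
fig, ax = plt.subplots()
sns.barplot(data=toy, x="exposure", y="mean_response", ax=ax)
```

With our real data, the call would look the same with `data = data`.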

## Problem 2
Make a barplot, but plot pupil area (`pupil_area`) instead of mean response data.

## Splitting up data in Seaborn
What if we want to split up our lineplot by familiar and novel trials? Seaborn makes this super easy by allowing you to pass a label to `hue`.

In [None]:
sns.lineplot(data = data, x = 'time_from_stim', y = 'dF/F', hue = 'exposure')

Notice how seaborn always includes error intervals? We can turn these off if we'd like, but let's leave them be for now. Let's now check the documentation to find other ways to split up data.

## Problem 3
1) Can you figure out how to make a histogram of pupil area in Seaborn? Here's some edited data for you to use. <br>
2) Once you do this, can you figure out how to split the histogram by `exposure`? <br>
3) CHALLENGE: If you have more time, play around with your plot! Be creative and see what else you can add to your histogram, using the documentation as a guide.

In [None]:
data_sample = data.sample(1000)

## Figure aesthetics
Raw seaborn plots look ... not bad. It is nice to have more control, though. Let's start with setting styles and contexts. I'm gonna use another plot type - this one shows individual datapoints.

In [None]:
sns.set_style("whitegrid")
sns.set_context('talk')
sns.catplot(data = data_sample, x = 'exposure', y = 'mean_response')

In [None]:
sns.set_style("white")
sns.set_context('poster')
sns.catplot(data = data_sample, x = 'exposure', y = 'mean_response')

In [None]:
sns.set_style("white")
sns.set_context('poster')
#s = 1 shrinks the markers so overlapping points are easier to see
sns.catplot(data = data_sample, x = 'exposure', y = 'mean_response', s = 1)

#### Colors!
Seaborn has many options to set colors for plots. Usually, the best strategy is to use a built-in palette, but you can override this if you'd like as well.

In [None]:
sns.set_context('talk')
sns.catplot(data = data_sample, x = 'exposure', y = 'mean_response', color = 'blue')

In [None]:
sns.set_context('talk')
sns.catplot(data = data_sample, x = 'exposure', y = 'mean_response', palette = 'gray')
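If you're wondering what a palette actually is: it's just a list of colors. Here's a sketch that pulls a few colors out of a built-in palette with `sns.color_palette` (the choice of `'Set2'` is arbitrary, purely for illustration):

```python
import seaborn as sns

# Ask for the first 3 colors of a built-in palette
palette = sns.color_palette("Set2", 3)

# Each color is an (r, g, b) tuple with values between 0 and 1
first_color = palette[0]

# The same colors as hex strings, handy for reusing outside of seaborn
hex_codes = palette.as_hex()
```

A list like this can also be passed straight to the `palette` argument of a plotting function.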

#### Labels
Say we want to change the names of some of our labels. There are a few ways to do this, but the easiest is to interface with **matplotlib**.


In [None]:
sns.set_style('ticks')

sns.lineplot(data = data, x = 'time_from_stim', y = 'dF/F', hue = 'exposure', palette = 'Blues')
plt.xlabel('Delta Stim')
plt.ylabel('Neural Activity')
plt.title('My first seaborn plot title!')

#### Figure size

In [None]:
plt.figure(figsize=(10,4))
sns.lineplot(data = data, x = 'time_from_stim', y = 'dF/F', hue = 'exposure', palette = 'Blues')
plt.xlabel('Delta Stim')
plt.ylabel('Neural Activity')
plt.title('My first seaborn plot title!')

#### Multiple plots in one figure
Seaborn has a really nice function called `relplot` which makes it very easy to split data based on various features into grids of graphs. This is super useful when you have tons of categorical multivariate data to parse through.

In [None]:
sns.relplot(data = data, kind = 'line', x = 'time_from_stim', y = 'dF/F', col = 'exposure', row = 'omitted')

## Problem 4
1) Take one of the graphs we've created in class and change: <br>
    a) colors - find a colorblind friendly palette for me ;) <br>
    b) labels - what would you want the x and y axes to say? <br>
    c) title - up to you <br>
    d) CHALLENGE_1: modify the legend, if there is one on the graph - can you change the bounding box color/style? <br>
    e) CHALLENGE_2: can you figure out a way to change the font of your graphs? <br>

    Feel free to use the documentation as a guide, and google away!

### Saving Figures
Saving figures can be tricky at times. For image formats like PNG, you'll want to specify a high DPI (300 is a common choice, though the right value depends on the size of your plot) to get decent resolution, and sometimes axis labels can get lopped off.

In [None]:
plt.figure(figsize=(8,4))
sns.lineplot(data = data, x = 'time_from_stim' , y = 'dF/F', hue = 'exposure', style = 'omitted')
plt.xlabel('Delta Stim')
plt.ylabel('dF/F')
plt.title('Real Data')

plt.savefig('my_figure.png', dpi = 300)

Oh no! Our axes, it's broken!

In [None]:
plt.figure(figsize=(8,4))
sns.lineplot(data = data, x = 'time_from_stim' , y = 'dF/F', hue = 'exposure', style = 'omitted')
plt.xlabel('Delta Stim')
plt.ylabel('dF/F')
plt.title('Real Data')
#use this bbox_inches command to fix things
plt.savefig('my_figure.png', dpi = 300, bbox_inches = 'tight')

In [None]:
plt.figure(figsize=(8,4))
sns.lineplot(data = data, x = 'time_from_stim' , y = 'dF/F', hue = 'exposure', style = 'omitted')
plt.xlabel('Delta Stim')
plt.ylabel('dF/F')
plt.title('Real Data')
#the same bbox_inches fix works when saving to a vector format like PDF
plt.savefig('my_figure.pdf', dpi = 300, bbox_inches = 'tight')
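To see how figure size and DPI interact, here's a sketch that saves the same figure at two DPI settings and reads the pixel dimensions back from the PNG header. The helper `png_size` is made up for this example, and the figure is saved to memory rather than to disk:

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # only needed when running as a script; skip this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 4))  # 8 x 4 inches
ax.plot([0, 1, 2], [0, 1, 0])

def png_size(figure, dpi):
    """Save the figure as a PNG in memory and return (width, height) in pixels."""
    buffer = io.BytesIO()
    figure.savefig(buffer, format="png", dpi=dpi)
    # In a PNG file, width and height are stored as big-endian integers at bytes 16-24
    return struct.unpack(">II", buffer.getvalue()[16:24])

low_res = png_size(fig, dpi=100)   # (800, 400)
high_res = png_size(fig, dpi=300)  # (2400, 1200)
```

Pixel dimensions are just inches times DPI, so a small figure needs a higher DPI to reach the same resolution. Note that `bbox_inches = 'tight'` crops or pads the figure, so the saved size will differ a bit from this arithmetic.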