# Data visualization with Matplotlib: Scatter plots

**Created by: Kirstie Whitaker**
<br>Adapted by: Katie Bottenhorn

**Created on: 29 July 2019**
<br>Edited on: 13 August 2019

From a manuscript in preparation about how the functional organization of the brain is related to IQ in a sample of FIU students taking their first physics course.

This tutorial is going to recreate a figure that presents some of the brain-behavior results from our study of the relationship between brain organization and IQ.

![](https://raw.githubusercontent.com/62442katieb/NH19-Visualization/master/figures/fig4_small.png)
<br>**Figure 4. Post-instruction IQ is related to characteristic path length during physics reasoning.** These relationships also differ with respect to students' sex and classroom environment.

### Take what you need

The philosophy of the tutorial is that I'll start by making some very simple plots, and then enhance them up to "publication standard".

You should take _only the parts you need_ and leave the rest behind.
If you don't care about fancy legends, or setting the number of minor x ticks, then you can stop before we get to that part.

The goal is to have you leave the tutorial feeling like you know _how_ to get started writing code to visualize and customise your plots.

## There are so many good resources available online!

Seaborn has a [rich gallery](https://seaborn.pydata.org/examples/index.html) with example code & data for creating a wide variety of plots.

Nilearn provides a [similar resource](http://nilearn.github.io/plotting/index.html) for making figures with brains.

## Import modules

We're importing everything up here.
And they should all be listed in the [requirements.txt](requirements.txt) file in this repository.
Check out the [README](README.md) file for more information on installing these packages.

In [None]:
from IPython.display import display

import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import os
import pandas as pd
from statsmodels.formula.api import ols
import seaborn as sns

from nilearn import plotting
import warnings
warnings.filterwarnings("ignore")

## The task

In this study we asked introductory physics students at FIU to lie in an fMRI scanner and complete a series of memory and reasoning tasks.

The task we'll focus on here required students to engage in physics reasoning to answer conceptual questions about Newtonian mechanics from the Force Concept Inventory (FCI).

* The **FCI** questions (shown on the left in the figure below) used diagrams to pose conceptual questions about Newtonian mechanics.

* The **control** questions (shown on the right in the figure below) simply asked perceptual questions and the like, which required little to no reasoning.

![](https://raw.githubusercontent.com/62442katieb/NH19-Visualization/master/figures/FCI-example.png)

The answer options for the **FCI** questions reflect different conceptions about the laws of physics.

The answers to the **control** questions are pretty straightforward.

### Hypothesis
1. There is a relationship between IQ and brain function while students are answering physics reasoning questions.
2. These relationships are different with respect to the course type (active learning vs. lecture) in which the students were enrolled.

## With great power comes great responsibility

I've listed some hypotheses above.
We can't confirm or reject them by visualizing the data.

Just because a line _looks_ like it is going up or down, doesn't mean it is statistically significantly doing so.

You can tell many stories with a picture...including ones that mislead people very easily.

Be careful!

## The data

The data are stored in the `data/small_data.csv` file, which contains a subset of the variables from the much wider dataset of the larger study, for ease of use.

The important columns are:

* `AgeOnScanDate`
* `Strt.Level`
* `F`
* `Mod`
* `post_phys_cpl`
* `post_ctrl_cpl`
* `normalized head size`
* `post phys fci fd`
* `post ctrl fci fd`
* `post_fsiq`

The first three variables are demographic information (age, year in school, and sex). The fourth, `Mod`, is the type of course in which the student was enrolled (1: active learning class, 0: traditional lecture class).

The `post_fsiq` variable contains post-instruction IQ scores.

The `post_phys_cpl` and `post_ctrl_cpl` variables contain each participant's post-instruction characteristic path length (a measure of the ease of brain-wide functional integration), computed from fMRI data collected _while_ they performed the task described above under the FCI and control conditions.

In [None]:
# Read in the data
df = pd.read_csv('data/small_data.csv', index_col=0)

# Take a look at the first 5 rows
print('====== Here are the first 5 rows ======')
display(df.head())

# Print all of the columns - it's a big wide file 😬
print('\n\n\n====== Here are all the columns in the file ======')
display(df.columns)

# And now let's see the summary information of this subset of the data
print('\n\n\n====== Here are some summary statistics from the columns we need ======')

display(df.describe())

In [None]:
# z-score each path length column: subtract its mean, then divide by its standard deviation
df['post_phys_cpl_norm'] = (df['post_phys_cpl'] - np.mean(df['post_phys_cpl'].values))/np.std(df['post_phys_cpl'].values)
df['post_ctrl_cpl_norm'] = (df['post_ctrl_cpl'] - np.mean(df['post_ctrl_cpl'].values))/np.std(df['post_ctrl_cpl'].values)
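The cell above z-scores each path length column. Here's a minimal sketch of the same recipe on a few made-up numbers (not the study data), showing that the result has a mean of about 0 and a standard deviation of 1. One detail worth knowing: `np.std` divides by `n` (`ddof=0`), whereas pandas' `.std()` divides by `n - 1` (`ddof=1`) by default, so the two disagree slightly on small samples.

```python
import numpy as np
import pandas as pd

# Made-up path length values, for illustration only
toy = pd.DataFrame({'cpl': [2.1, 2.4, 1.9, 2.8, 2.3]})

# Same recipe as above: subtract the mean, then divide by np.std (ddof=0)
toy['cpl_norm'] = (toy['cpl'] - np.mean(toy['cpl'].values)) / np.std(toy['cpl'].values)

print(toy['cpl_norm'].mean())          # approximately 0
print(np.std(toy['cpl_norm'].values))  # approximately 1 (population SD, ddof=0)
print(toy['cpl_norm'].std())           # a bit larger than 1 (sample SD, ddof=1)
```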

## A quick scatter plot

The first thing we'll do is take a look at our first hypothesis: that brain organization during physics reasoning, but not during the control questions, is associated with IQ.

Let's start by making a scatter plot with **IQ** on the x axis and **characteristic path length during physics reasoning** on the y axis.

In [None]:
plt.scatter(df['post_fsiq'], df['post_phys_cpl_norm']) # <----- A simple scatter plot

Cool, what about characteristic path length during the control questions?

In [None]:
plt.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm']) # <----- Update to pull from a different column

That's nice, but it would probably be more useful if we put these two on the _same_ plot.

In [None]:
plt.scatter(df['post_fsiq'], df['post_phys_cpl_norm']) # <----- Both of the commands above
plt.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm']) #        on consecutive lines

Woah, that was very clever!

Matplotlib didn't make two different plots. It kept drawing on the same (current) axis, because the figure isn't shown until the end of the cell.

If I had called `plt.show()` in between the two lines above I would have ended up with two plots:

In [None]:
plt.scatter(df['post_fsiq'], df['post_phys_cpl_norm'])
plt.show() # <---------------------------------------- Show plot in the middle of the two commands
plt.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'])

## Being a little more explicit

The scatter plot above shows how easy it is to plot some data - for example to check whether you have any weird outliers or if the pattern of results generally looks the way you'd expect.

You can stop here if your goal is to explore the data ✨

But sometimes you'll want to have a bit more control over the plots, and for that we'll introduce the concepts of a matplotlib `figure` and an `axis`.

To be honest, we aren't really going to introduce them properly because that's a deeeeeep dive into the matplotlib object-orientated architecture.
There's a nice tutorial at [https://matplotlib.org/users/artists.html](https://matplotlib.org/users/artists.html), but all you need to know is that a **figure** is a figure - the canvas on which you'll make your beautiful visualisation - and it can contain multiple **axes** displaying different aspects or types of data.

That also explains why many people create a figure and an axis together with the [`subplots`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots) command.
(And here's a [stack overflow answer](https://stackoverflow.com/questions/34162443/why-do-many-examples-use-fig-ax-plt-subplots-in-matplotlib-pyplot-python) which explains it in a little more depth.)

If you run the command all by itself, you'll make an empty axis which takes up the whole figure area:

In [None]:
fig, ax = plt.subplots() # <--------- A simple set up
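`plt.subplots` hands you the figure and its axes in one go. Here's a minimal sketch with made-up points: asking for one row and two columns gives back a NumPy array of two axes, and the figure keeps track of both of them in `fig.axes`.

```python
import matplotlib.pyplot as plt

# One figure, one row, two columns of axes
fig, axes = plt.subplots(nrows=1, ncols=2)

print(type(axes))     # a NumPy array holding the two axes
print(axes.shape)     # one entry per column
print(len(fig.axes))  # the figure knows about both axes

# Each element is an ordinary axis you can plot on (made-up points)
axes[0].scatter([1, 2, 3], [3, 1, 2])
axes[1].plot([1, 2, 3], [1, 2, 3])
plt.show()
```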

Let's add our plots to a figure:

In [None]:
fig, ax = plt.subplots() # <------------------------------- A simple set up
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm']) # <------- The commands above
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'])
plt.show()

Did you see that this time we changed `plt.scatter` to `ax.scatter`? 

That's because we're being more specific about _where_ we want the data to be plotted.
Specifically, we want it on the first (only) axis in our figure.

We also got explicit about telling Jupyter to show the plot with `plt.show()`.
You don't need this, but it's good practice for when you start coding lots of plots all in one go and don't want them to all appear on the same axis 😂

In [None]:
# Sort the rows by IQ so that line plots connect the points in left-to-right order
df = df.sort_values(by='post_fsiq', axis=0)

## Let's add a regression line

I used [`statsmodels`](https://www.statsmodels.org/stable/index.html) to fit the model and to get the predicted values.

For now, IQ is the only predictor. Later on we'll throw a few more variables into the mix, to make sure that the differences in topology aren't related to head motion and that the relationship with IQ isn't confounded by head size.

In [None]:
# FCI
formula_phys = 'post_phys_cpl_norm ~ post_fsiq'
mod_phys = ols(formula=formula_phys, data=df)
results_phys = mod_phys.fit()
print(results_phys.summary())
predicted_phys = results_phys.predict()
print(predicted_phys)

# Control
formula_ctrl = 'post_ctrl_cpl_norm ~ post_fsiq'
mod_ctrl = ols(formula=formula_ctrl, data=df)
results_ctrl = mod_ctrl.fit()
print(results_ctrl.summary())
predicted_ctrl = results_ctrl.predict()
print(predicted_ctrl)

Let's plot that modelled pattern on our scatter plot as regression lines. For that, we'll use `ax.plot` instead of `ax.scatter`: the predicted values all fall on a straight line, the good ol' `y = mx + b`.

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm'])
ax.plot(df['post_fsiq'], predicted_phys)
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'])
ax.plot(df['post_fsiq'], predicted_ctrl)
plt.show()
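Under the hood, a simple regression like `post_phys_cpl_norm ~ post_fsiq` really is just `y = mx + b`. Here's a minimal sketch on simulated data (the numbers are made up, not from the study; `toy`, `m` and `b` are illustrative names): `results.params` holds the intercept `b` and the slope `m`, and computing `m * x + b` by hand reproduces `results.predict()`.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Simulated data, for illustration only
rng = np.random.default_rng(42)
iq = np.sort(rng.normal(100, 15, size=50))
cpl = 0.03 * (iq - 100) + rng.normal(0, 0.5, size=50)
toy = pd.DataFrame({'iq': iq, 'cpl': cpl})

results = ols(formula='cpl ~ iq', data=toy).fit()
b = results.params['Intercept']  # intercept
m = results.params['iq']         # slope

# y = mx + b, computed by hand for every row
manual_pred = m * toy['iq'] + b

# Compare with statsmodels' own predictions
print(np.allclose(manual_pred, results.predict()))  # True
```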

That looks fine, but the lines are a little thin... let's bump things up.

In [None]:
df['pred_phys_cpl'] = predicted_phys
df['pred_ctrl_cpl'] = predicted_ctrl

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm'])
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'])
ax.plot(df['post_fsiq'], df['pred_phys_cpl'], linewidth = 5) # <----- Add linewidth argument
ax.plot(df['post_fsiq'], df['pred_ctrl_cpl'], linewidth = 5) # <----- Add linewidth argument
plt.show()

## Add a legend

These two lines aren't labelled!
We don't know which one is which.

So lets add a legend to the plot.

The function is called `ax.legend()`. We don't pass it the labels directly: instead, we attach them to the scatter plots themselves with the `label` argument.

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm'], label = 'FCI')
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'], label = 'Control')
ax.plot(df['post_fsiq'], df['pred_phys_cpl'], linewidth = 5)
ax.plot(df['post_fsiq'], df['pred_ctrl_cpl'], linewidth = 5)
ax.legend() # <------- Add legend
plt.show()

Except I don't want the lines labeled, because that's redundant.

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm'], label = 'FCI')
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'], label = 'Control')
ax.plot(df['post_fsiq'], df['pred_phys_cpl'], linewidth = 5, 
        label='')
ax.plot(df['post_fsiq'], df['pred_ctrl_cpl'], linewidth = 5, 
        label='') # <------- Add empty label
ax.legend()
plt.show()
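`ax.legend()` only collects artists that have a label. An empty label (or one starting with an underscore) is skipped, which is why `label=''` keeps the regression lines out of the legend. Here's a minimal sketch on made-up points: `ax.get_legend_handles_labels()` shows exactly what the legend will pick up.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [1, 2, 3, 4]

# Made-up points: the scatter plots get labels, the lines get empty labels
ax.scatter(x, [1, 3, 2, 4], label='FCI')
ax.scatter(x, [2, 1, 3, 2], label='Control')
ax.plot(x, [1, 2, 3, 4], linewidth=5, label='')
ax.plot(x, [2, 2, 2, 2], linewidth=5, label='')

handles, labels = ax.get_legend_handles_labels()
print(labels)  # ['FCI', 'Control']: the unlabeled lines are left out
```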

Woah! The legend moved!

The legend positioning is very clever: matplotlib has put it in the location where it covers the fewest dots, and that location changed because the legend changed size!

Here's the [documentation](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html) that shows you how to be explicit about where to put the legend.
The default value for `loc` is `best`, and we can happily keep that for the rest of this notebook.

If you really wanted to put it somewhere else, you can set the location explicitly.
For example, in the center on the right hand side.

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm'], label = 'FCI')
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'], label = 'Control')
ax.plot(df['post_fsiq'], 
        df['pred_phys_cpl'], 
        linewidth = 5, label='')
ax.plot(df['post_fsiq'], 
        df['pred_ctrl_cpl'], 
        linewidth = 5, label='')
ax.legend(loc='center right') # <-------- Move the legend
plt.show()

Oof, that's terrible.

Where's a good place to put it?

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm'], label = 'FCI')
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'], label = 'Control')
ax.plot(df['post_fsiq'], 
        df['pred_phys_cpl'], 
        linewidth = 5, label='')
ax.plot(df['post_fsiq'], 
        df['pred_ctrl_cpl'], 
        linewidth = 5, label='')
ax.legend(loc='best')
plt.show()

## Change the colors

The fact that our physics reasoning data (dots) and regression line are both blue, and that the control data are both orange, is a consequence of the order in which we've asked matplotlib to plot the data.

At the moment, matplotlib is coloring the first scatter plot with its first default color, and the second with the second default color.

Then when we give it a different type of plot (`plot` vs `scatter`) it starts the color cycle again.
(You can see the order of the colours in the [documentation of when they were introduced](https://matplotlib.org/3.1.1/users/dflt_style_changes.html#colors-in-default-property-cycle).)
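If you're curious what that color cycle actually is, matplotlib stores it in the `axes.prop_cycle` setting. A minimal sketch (note that styles set by seaborn, or by your own matplotlibrc, can change it):

```python
import matplotlib.pyplot as plt

# The default property cycle lives in rcParams; pull out just the colors
cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

# In matplotlib's default style the first two are '#1f77b4' (blue) and '#ff7f0e' (orange)
print(cycle_colors[:2])
```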

If we move the order of the two regression lines the colours will change:

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm'], label = 'FCI')
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'], label = 'Control')
ax.plot(df['post_fsiq'], df['pred_ctrl_cpl'], 
        linewidth = 5, label='')
ax.plot(df['post_fsiq'], df['pred_phys_cpl'], 
        linewidth = 5, label='') # <----- switched the order of phys & ctrl
ax.legend(loc='best')
plt.show()

So that's no good!

Let's explicitly set the colors.

In [None]:
fig, ax = plt.subplots()
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm'],
           label = 'FCI', color = 'blue') # <--------- Add the color
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'],
           label = 'Control', color = 'red') # <--------- Add the color
ax.plot(df['post_fsiq'], df['pred_phys_cpl'], 
        linewidth = 5, label='', color = 'blue') # <--------- Add the color
ax.plot(df['post_fsiq'], df['pred_ctrl_cpl'], 
        linewidth = 5, label='', color = 'red') # <--------- Add the color
ax.legend(loc='best')
plt.show()

Cool....the colours are fixed....but wow those aren't nice to look at 😕
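The `color` argument accepts several formats: named colors like `'blue'`, hex strings like `'#ff0000'`, and RGB tuples with values between 0 and 1 (the format seaborn's palettes use). A minimal sketch using `matplotlib.colors.to_rgb` to show that they all boil down to the same thing:

```python
from matplotlib.colors import to_rgb

# Three ways of writing a color, all converted to (red, green, blue) in [0, 1]
print(to_rgb('blue'))           # (0.0, 0.0, 1.0)
print(to_rgb('#ff0000'))        # (1.0, 0.0, 0.0)
print(to_rgb((0.2, 0.5, 0.7)))  # an RGB tuple passes straight through
```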

# Introducing Seaborn!
[Seaborn](https://seaborn.pydata.org) is a Python data visualization library based on matplotlib.
It provides a high-level interface for drawing attractive and informative statistical graphics.

It does many beautiful things (a few of which we're going to explore) but it can sometimes be so clever that it becomes a little opaque.

If in doubt, remember that seaborn will almost always return an `axis` object for you, and you can change those settings just as you would in matplotlib.

In fact, all that work we just did with matplotlib could be done _very_ simply with Seaborn's `regplot`.

In [None]:
import seaborn as sns

## Color management with Seaborn

One of the very nice things that seaborn does is manage colors easily.

The red and blue that I used in the published figure came from the ["Set1" Brewer color map](http://colorbrewer2.org/#type=qualitative&scheme=Set1&n=5).

We can get the RGB values for the colors in this qualitative color map from seaborn's `color_palette` function, and visualize them using the "palette plot" (`palplot`) function.

In [None]:
color_list = sns.color_palette("Set1", n_colors=5)

for color in color_list:
    print(color)

sns.palplot(color_list)

Ok, now that we have our much nicer colors, let's change the red and blue in our path length plot.

In [None]:
fig, ax = plt.subplots()
# In the "Set1" palette, color_list[0] is red and color_list[1] is blue
ax.scatter(df['post_fsiq'], df['post_phys_cpl_norm'],
           label = 'FCI', color = color_list[1]) # <--------- Update color
ax.scatter(df['post_fsiq'], df['post_ctrl_cpl_norm'],
           label = 'Control', color = color_list[0]) # <---------- Update color
ax.plot(df['post_fsiq'], 
        df['pred_phys_cpl'], 
        linewidth = 5, label='', color = color_list[1]) # <----------------- Update color
ax.plot(df['post_fsiq'], 
        df['pred_ctrl_cpl'], 
        linewidth = 5, label='', color = color_list[0]) # <----------------- Update color

ax.legend(loc='best')
plt.show()

But, wait! There's more! Seaborn makes pretty plots.

In [None]:
fig,ax = plt.subplots()
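sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI', ax=ax)
sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control', ax=ax)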
ax.legend(loc='best')
plt.show()

## Really jazzing up the plot with seaborn

1. I don't like those colors and I really liked the Seaborn `regplot` more than these matplotlib plots.
2. Seaborn has _so many_ cooler color options. Even if we stick with red & blue (see: [crayon_palette](https://seaborn.pydata.org/generated/seaborn.crayon_palette.html#seaborn.crayon_palette)).
3. Two other really beautiful things that seaborn can do are setting the **context** of the plot and the figure **style**.

There are lots of great examples in the [aesthetics tutorial](https://seaborn.pydata.org/tutorial/aesthetics.html) which I really encourage you to have a play around with.

Let's check out the `poster` context, with a `darkgrid` background.

In [None]:
sns.set_context("poster", font_scale=1)
sns.set_style('darkgrid')
sns.set_palette('Set1') # <--- now we don't have to manually set colors anymore

Now that we've run the code above, let's re-make our scatter plot:

In [None]:
fig, ax = plt.subplots()
sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI')
sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control')
ax.legend(loc='best')
plt.show()

Wowzers trousers. 
That's no good 😬

How about `notebook` context with a `ticks` background?

In [None]:
sns.set_context("notebook")
sns.set_style('ticks')
fig, ax = plt.subplots()
sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI')
sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control')
ax.legend(loc='best')
plt.show()

Fun, we've got back to the matplotlib default!

I think my favorite style setting is `notebook` context with a `font_scale` of 1.5 and the `whitegrid` style.

In [None]:
sns.set_context("notebook", font_scale=1.5)
sns.set_style('whitegrid')

And I'm picky about colors. Luckily, Seaborn's got [options](https://seaborn.pydata.org/tutorial/color_palettes.html). Two of my favorite palette options are `crayon_palette` and `husl_palette`.

You can use Crayola crayon names to make a palette. I like Red Orange and Cerulean...

In [None]:
# Set the palette to Crayola colors because they're pretty
sns.set_palette(sns.crayon_palette(['Cerulean', 'Red Orange']))

In [None]:
fig, ax = plt.subplots()
sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI')
sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control')
ax.legend(loc='best')
plt.show()

When you run the `set_context` and `set_style` commands they become global settings for all plots that you subsequently make in the same notebook (or script).

Personally I load them in at the top of all my notebooks because I think they make the plots look nicer 💁‍♀️
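If you'd rather *not* change things globally, `sns.plotting_context` and `sns.axes_style` can also be used in a `with` statement, so their settings only apply inside that block. A minimal sketch that checks the font size matplotlib uses inside and outside two different contexts:

```python
import matplotlib.pyplot as plt
import seaborn as sns

before = plt.rcParams['font.size']

# These settings only apply inside each with-block
with sns.plotting_context('notebook'):
    notebook_size = plt.rcParams['font.size']

with sns.plotting_context('poster'):
    poster_size = plt.rcParams['font.size']

after = plt.rcParams['font.size']

print(notebook_size, poster_size)  # the poster context scales fonts up
print(before == after)             # the global setting is restored afterwards
```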

Oh, and I like the plots [despined](https://seaborn.pydata.org/tutorial/aesthetics.html#removing-axes-spines) too, so let's do that real quick:

In [None]:
fig, ax = plt.subplots()
sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI')
sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control')
ax.legend(loc='best')
sns.despine() # <------------------ Despine the plot
plt.show()

## Axis labels, limits and tick placement

### Labels

Our plots have ugly axis labels!
Nobody should have to interpret our ugly variable names.

Let's go ahead and label them 😸

In [None]:
fig, ax = plt.subplots()
sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI')
sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control')
ax.set_xlabel('Full-scale IQ') # <--------------- Set x axis label
ax.set_ylabel('Path length (normalized)') # <--------- Set y axis label
ax.legend(loc='best')
sns.despine()
plt.show()

### Tick placement

Seaborn and matplotlib have made a good guess at where to put the x and y ticks, and in this case, the ticks work.

But for the sake of illustration, let's shake things up.

In [None]:
fig, ax = plt.subplots()
sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI')
sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control')
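ax.set_xticks(np.arange(df['post_fsiq'].min(), df['post_fsiq'].max() + 1, 2)) # <----------------------- Set x tick positions (one every 2 IQ points)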
ax.set_xlabel('Full-scale IQ')
ax.set_ylabel('Path length (normalized)')
ax.legend(loc='best')
sns.despine()
plt.show()

Yeah, that looks a little busy. But now you know!

For the y ticks we can use `ax.locator_params` to put them in the best places, using at most `nbins` bins (the intervals between ticks).

This is basically what matplotlib is already doing, but I'll show you the command just in case you want to use it in the future.

For example, if you wanted at most 4 bins, you'd set `nbins` to 4:

In [None]:
fig, ax = plt.subplots()
sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI')
sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control')
ax.locator_params(nbins=4, axis='y') # <----------------------- Set y tick positions 
ax.set_xlabel('Full-scale IQ')
ax.set_ylabel('Path length (normalized)')
ax.legend(loc='best')
sns.despine()
plt.show()

And note that even when we set `nbins` to 6 (as I wrote in the original figure code), it actually only gives us 5 ticks, because matplotlib - correctly - can't find a sensible way to split the range into 6 evenly spaced bins on the y axis.

In [None]:
fig, ax = plt.subplots()
sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI')
sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control')
ax.locator_params(nbins=6, axis='y') # <----------------------- update to 6 bins 
ax.set_xlabel('Full-scale IQ')
ax.set_ylabel('Path length (normalized)')
ax.legend(loc='best')
sns.despine()
plt.show()
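What does `nbins` actually control? With default settings, `ax.locator_params` passes `nbins` on to the axis's `AutoLocator`, which is a subclass of matplotlib's `MaxNLocator`. Here's a minimal sketch that calls `MaxNLocator` directly on a made-up y range, so you can see which ticks it picks for different `nbins` values:

```python
import numpy as np
from matplotlib.ticker import MaxNLocator

# A made-up y range, roughly like a z-scored variable
ymin, ymax = -2.0, 3.0

for nbins in [4, 6]:
    ticks = MaxNLocator(nbins=nbins).tick_values(ymin, ymax)
    # Keep only the ticks that fall inside the visible range
    visible = ticks[(ticks >= ymin) & (ticks <= ymax)]
    print(nbins, visible)
```

The number of intervals between the visible ticks never exceeds `nbins`, but it can be smaller when no "nice" step size fits the range exactly.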

One quick point to remember here: the x and y axis **limits** are not the same as the **tick locations**.
The limits are the edges of the plot.
The tick locations are where the markers sit on the axes.
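A minimal sketch of that difference, with arbitrary numbers: `set_xlim` moves the edges of the plot, `set_xticks` moves the markers, and because the ticks here sit inside the limits, setting them leaves the limits untouched.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([1, 4, 7], [2, 5, 3])  # made-up points

ax.set_xlim(0, 10)        # limits: the left and right edges of the plot
ax.set_xticks([2, 5, 8])  # tick locations: where the markers sit

print(ax.get_xlim())      # still the limits we set
print(ax.get_xticks())    # the tick locations we set
plt.show()
```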

## Remember our second hypothesis, though?

Our second hypothesis was that these relationships between the brain & IQ are different with respect to the course type (active learning vs. lecture) in which the students were enrolled.

Let's beef up our regressions...

In [None]:
# FCI
formula_phys = '''post_phys_cpl_norm ~ 
                    post_fsiq + post_fsiq * modeling + post_fsiq * female + 
                    post_fsiq * female * modeling + female * modeling + 
                    head_size + post_phys_fd + age_scan + start_level'''
mod_phys = ols(formula=formula_phys, data=df)
results_phys = mod_phys.fit()
print(results_phys.summary())

# Control
formula_ctrl = '''post_ctrl_cpl_norm ~ 
                    post_fsiq + post_fsiq * modeling + post_fsiq * female + 
                    post_fsiq * female * modeling + female * modeling + 
                    head_size + post_ctrl_fd + age_scan + start_level'''
mod_ctrl = ols(formula=formula_ctrl, data=df)
results_ctrl = mod_ctrl.fit()
print(results_ctrl.summary())

Using `plt.subplots`, we can actually add panels (or subplots) to a single figure. I'm going to separate FCI and control conditions so we can dive into class effects.

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2) # <---- one row, two columns!
g = sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI', ax=ax[0])
h = sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control', ax=ax[1])

g.set_xlabel('Full-scale IQ') 
g.set_ylabel('Path length (FCI)')
h.set_xlabel('Full-scale IQ') 
h.set_ylabel('Path length (Control)')
sns.despine()
plt.show()

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(12, 5)) # Make it bigger
g = sns.regplot(x=df['post_fsiq'], y=df['post_phys_cpl_norm'], label = 'FCI', ax=ax[0])
h = sns.regplot(x=df['post_fsiq'], y=df['post_ctrl_cpl_norm'], label = 'Control', ax=ax[1])

g.set_xlabel('Full-scale IQ') 
g.set_ylabel('Path length (FCI)')
h.set_xlabel('Full-scale IQ') 
h.set_ylabel('Path length (Control)')
sns.despine()
plt.show()

🚀🌟🚀🌟🚀🌟🚀🌟🚀🌟🚀🌟

Fantastic!
We've made a two-panel plot!
Just like that.

Seaborn has really neat ways to make multi-paneled plots, though. So we're going to switch from `regplot` to `lmplot`, which further simplifies our code.

However, seaborn handles data a little differently. What we have right now is _wide_ data, with lots of columns and only one row per participant. Some Seaborn functions want _long_ data, which can get a little tricky...

In [None]:
df.head()

We can make our data long by just manipulating this dataframe a bit. 

Manipulating dataframes is a little tricky, but a valuable skill that you'll end up using much more frequently than you'd like to admit...

In [None]:
df_long = df.melt(id_vars=['modeling', 'female', 'post_fsiq'], 
                  value_vars=['post_phys_cpl_norm', 'post_ctrl_cpl_norm'],
                  var_name='condition')
df_long.replace(to_replace='post_phys_cpl_norm', value='FCI', inplace=True)
df_long.replace(to_replace='post_ctrl_cpl_norm', value='Control', inplace=True)

df_long.head()
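To see exactly what `melt` did, here's a minimal sketch of the same call on a tiny made-up table (two hypothetical participants, not study data): each participant ends up with one row per condition, the old column names go into `condition`, and the numbers go into a column called `value`.

```python
import pandas as pd

# A tiny, made-up wide table: one row per participant
wide_toy = pd.DataFrame({
    'subject': ['s1', 's2'],
    'post_fsiq': [105, 98],
    'post_phys_cpl_norm': [0.3, -0.5],
    'post_ctrl_cpl_norm': [-0.1, 0.2],
})

long_toy = wide_toy.melt(id_vars=['subject', 'post_fsiq'],
                         value_vars=['post_phys_cpl_norm', 'post_ctrl_cpl_norm'],
                         var_name='condition')
print(long_toy)  # 2 participants x 2 conditions = 4 rows
```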

And now we're replacing the `1`s and `0`s in our `modeling` and `female` variables with readable labels: we're done running stats, and now we're just on to visualizing.

In [None]:
df_long.replace({'modeling':{0: 'Lecture', 1:'Active'}, 
                 'female': {0: 'Male', 1: 'Female'}}, inplace=True)
df_long.rename({'modeling': 'Class type',
                'female': 'Sex'}, axis=1, inplace=True)
df_long.head()

Using `lmplot` instead of `regplot`, we'll plot our two conditions (FCI, Control) next to each other in one figure in only one line of code! 😱

In [None]:
sns.lmplot(x='post_fsiq', y='value', data=df_long, col='condition') # <---- plot IQ vs CPL, separated by condition
plt.show()

Now let's see what the relationships between path length and IQ look like during the FCI and control conditions in the two separate classes, using the `hue` parameter (and let's put male and female students' data in separate rows with `row`).

In [None]:
sns.lmplot(x='post_fsiq', y='value', data=df_long, col='condition', 
           hue='Class type', row='Sex') # <--- different colors for class type, male and female students' data in separate rows
plt.show()

That's all well and good, but the spacing and the axes are a little off. That's because `lmplot` and the other Seaborn grid plots create their own figure and axes (taking the place of the `fig, ax = plt.subplots()` line we had been including), and they come with their own defaults. Let's adjust some of these defaults...

In [None]:
sns.lmplot(x='post_fsiq', y='value', data=df_long, col='condition', hue='Class type', row='Sex',
           sharex=False, sharey=False) # <---- now we're not constraining the axes
plt.show()

And let's fix the labels again. It's a little trickier, now, because we're not directly manipulating the axes, we're manipulating an object that contains the axes.

In [None]:
j = sns.lmplot(x='post_fsiq', y='value', data=df_long, col='condition', hue='Class type', row='Sex',
               sharex=False, sharey=False)
j.set_titles(row_template='{row_name}', col_template='{col_name}') # <---- using the row and column names, redo the graphs' titles
j.set_axis_labels('Full-scale IQ', 'Path length (normalized)') # <---- and set the x- and y-axis labels

And, dear friends, what good would making all these pretty plots be if we couldn't save them?
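Matplotlib figures have a `savefig` method, and seaborn's grid objects (like the `j` returned by `lmplot` above) have one too. Here's a minimal sketch that saves a small made-up plot; the filename `example_scatter.png` and the `dpi` value are just example choices:

```python
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([95, 100, 110, 120], [0.4, 0.1, -0.2, -0.5])  # made-up points
ax.set_xlabel('Full-scale IQ')
ax.set_ylabel('Path length (normalized)')

# Hypothetical filename; the extension (.png, .pdf, .svg, ...) sets the file format
out = Path('example_scatter.png')
fig.savefig(out, dpi=300, bbox_inches='tight')
print(out.exists())
```

`bbox_inches='tight'` trims the extra white space around the axes, which is handy when the figure is going into a manuscript.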

## Brain plot?

We've talked a lot about brains, but... all I've made are scatter plots?

You can spend as much time making brain plots as I just have making scatter plots. I'm just going to show you a couple that make brain slices and surfaces, respectively, but there's a whole gallery of different options available from [nilearn](http://nilearn.github.io/plotting/index.html).

In [None]:
from nilearn import plotting
from nilearn import datasets

Nilearn has a bunch of useful plotting options, but it's also got a lot of good, good data you can import directly into a Jupyter notebook or Python script. We're going to use the `fetch_atlas_yeo_2011` function to grab an atlas of brain regions (of the sort I used to compute the characteristic path length values above).

In [None]:
yeo = datasets.fetch_atlas_yeo_2011(data_dir=None, url=None, resume=True, verbose=1)
yeo_17 = yeo.thick_17

Now we've got the Yeo atlas (of 17 brain networks) for use as an object in this notebook! 🎉 But what is it?

Just a path! Nilearn is amazing because wherever you need a nifti image as input, you can just use a file path and it'll read the image in for you. _**Amazing!**_

Now let's change the plot a little bit. `plot_roi` defaults to an ortho viewer, so you get coronal, sagittal, and axial plots.

You can choose one (or more) with the `display_mode` parameter.

In [None]:
plotting.plot_roi(yeo_17, display_mode='z') # <--- change display mode

And we can change the colors, but the rules here are a little stricter and there's a list of acceptable maps and only certain types of custom color maps (not _exactly_ the same as palettes).

In [None]:
plotting.plot_roi(yeo_17, display_mode='z', cmap='Paired') # <--- add colormap

But I really like visualizing things on brain surfaces. Nilearn's got a function for that!

First, we need a brain surface model. I'm going to grab FreeSurfer's average surface (`fsaverage`) and an example statistical map.

In [None]:
fsaverage = datasets.fetch_surf_fsaverage()
motor_images = datasets.fetch_neurovault_motor_task()
stat_img = motor_images.images[0]

Now we'll project the example statistical map onto the right hemisphere of this surface.

In [None]:
from nilearn import surface

texture = surface.vol_to_surf(stat_img, fsaverage.pial_right)

In [None]:
k = plotting.plot_surf_stat_map(fsaverage.infl_right, texture, hemi='right',
                            title='Surface right hemisphere', colorbar=True,
                            bg_map=fsaverage.sulc_right, threshold=1.)
k.show()

And here's an interactive version of the same surface plot that you can rotate:

In [None]:
view = plotting.view_surf(fsaverage.infl_right, texture, threshold='90%',
                          bg_map=fsaverage.sulc_right)
view

The end. Well done 💖