<img src="./img/HWNI_logo.svg"/>

# Organizing Data with Pandas

In neuroscience, we often work with complicated datasets. For example, a full "data point" from a neuroscience experiment might include several numbers (input stimulus and neural response) along with a host of metadata: subject ID, brain region, genotype, experiment date, and so on. Trying to manage this with a collection of arrays is an exercise in frustration, and dictionaries hardly improve the situation.
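
To make the problem with "a collection of arrays" concrete, here is a minimal sketch that stores each variable in its own NumPy array. The subject IDs, regions, and firing rates are made-up values for illustration only. Sorting just one array silently breaks its alignment with the others. To keep the arrays consistent, you have to apply the same permutation (from `np.argsort`) to every one of them.

```python
import numpy as np

# Hypothetical data: one entry per recording, stored as parallel arrays
subject_ids = np.array(["m01", "m02", "m03"])
regions = np.array(["V1", "CA1", "M1"])
firing_rates = np.array([12.5, 3.1, 7.8])  # spikes per second (made up)

# Sorting only the responses breaks the alignment with the metadata:
# m01 is now paired with 3.1, which was actually m02's rate
sorted_rates = np.sort(firing_rates)
print(list(zip(subject_ids.tolist(), sorted_rates.tolist())))

# Keeping the arrays aligned means applying the same permutation to every array
order = np.argsort(firing_rates)
subject_ids, regions, firing_rates = subject_ids[order], regions[order], firing_rates[order]
print(list(zip(subject_ids.tolist(), regions.tolist(), firing_rates.tolist())))
```

With three arrays this is manageable. Every new metadata field adds another array that must be reordered, filtered, and indexed in lockstep.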

Instead, we can borrow a tool from other data-heavy sciences: the _data frame_. A data frame is like a matrix, in that it contains information in rows and columns, but it is even more like a table, in that the rows and columns have names. We'll dive deeper into this below.
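
As a minimal sketch of this idea, the table below is built with `pd.DataFrame` from a dictionary of columns. The recording labels, subjects, regions, and firing rates are hypothetical values chosen for illustration. Columns are selected by name, rows are selected by label with `.loc`, and sorting reorders whole rows so that each measurement keeps its metadata.

```python
import pandas as pd

# Hypothetical recordings: each row is one data point, each column one named variable
df = pd.DataFrame(
    {
        "subject": ["m01", "m02", "m03"],
        "region": ["V1", "CA1", "M1"],
        "firing_rate": [12.5, 3.1, 7.8],  # spikes per second (made up)
    },
    index=["rec_a", "rec_b", "rec_c"],  # names for the rows
)

# Columns are accessed by name rather than by integer position
print(df["firing_rate"].mean())

# Rows (and single cells) can be accessed by label with .loc
print(df.loc["rec_b", "region"])  # CA1

# Sorting moves whole rows, so subject and region stay attached to their firing rate
df_sorted = df.sort_values("firing_rate")
print(df_sorted)
```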

The premier package implementing data frames in Python is
[pandas](http://pandas.pydata.org/pandas-docs/stable/overview.html).
If you've worked with tables in another context, e.g. in the R language,
or if you're very comfortable with Python,
you can check out the
[10 minutes to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html)
mini-tutorial on the pandas website, which covers some of the same material as this tutorial,
but at a brisker pace.

The following tutorial is based on
[Brandon Rhodes' PyCon tutorial](https://www.youtube.com/watch?v=5JnMutdy6Fw).
That tutorial runs four to six hours,
but it does an excellent job of building a foundation for working with data frames,
so it is well worth your while if you have the time.

Before we get started, we need to import the libraries used throughout this tutorial.

# makes our plots show up inside Jupyter
%matplotlib inline

# numpy - linear algebra and matrices for python
import numpy as np

# pandas - "DataFrames" to organize our data
import pandas as pd

# scipy - scientific computing toolkit for python (statistics, integration, ...)
from scipy import stats, integrate

# matplotlib - workhorse plotting library
import matplotlib.pyplot as plt

# seaborn - easy plotting for statistical visualizations
#   built on top of matplotlib
import seaborn as sns

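# apply seaborn's default styling and choose colors that work for most color-blind folks;
# sns.set() resets the palette, so the palette must be passed here rather than set beforehand
sns.set(palette="colorblind", color_codes=True)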

# helper plotting functions from this tutorial's local util package
import util.plots as plot

# this makes our tables easier to read
from IPython.display import HTML

with open('./css/style-table.css') as f:
    css = f.read()

# the HTML object is only rendered if it is the last expression in the cell
HTML('<style>{}</style>'.format(css))