# Tutorial 2a: Tidy data and split-apply-combine

In [1]:
import numpy as np
import pandas as pd

import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()

Note: The code in this tutorial comes from a tutorial by Justin Bois, which can be found [here](http://bebi103.caltech.edu.s3-website-us-east-1.amazonaws.com/2017/tutorials/t2a_tidy_data.html).

## Introduction
The data we are investigating come from David Prober's lab. A description of their work on the genetic regulation of sleep can be found on the research page of the lab website. A movie of moving and sleeping larvae, similar to the one used to produce the data set in this tutorial, is also available. The work based on this data set was published in Gandhi et al., *Neuron*, 85, 1193–1199, 2015.

## The genotype data
First, we will take a look at the genotype file by printing its first 30 lines.

In [2]:
# Print the first 30 lines of the file to inspect its layout
with open('data/130315_1A_genotypes.txt', 'r') as f:
    for _ in range(30):
        print(next(f), end='')

# Genotype data from the Gandhi, et al. experiment ending March 13, 2013
#
# The experiment was performed in a 96 well plate with zebrafish
# embryos.  Gene sequencing was later used to identify the genotype
# of the fish in each well.  Not all fish could be genotyped.
#
# The mutants being studied have deletions in the gene coding for
# arylalkylamine N-acetyltransferase (aanat), which is a key enzyme
# in the rhythmic production of melatonin.  Melatonin is a hormone
# responsible for regulation of circadian rhythms.  It is often taken
# as a drug to treat sleep disorders.  The goal of this study is to
# investigate the effects of aanat deletion on sleep pattern in
# 5+ day old zebrafish larvae.
#
# Each column lists the wells corresponding to the genotype of
# each fish.  If a number is missing (between 1 and 96), the 
# genotype of that fish is not known.
# 
# These data were kindly provided by Avni Gandhi and Audrey Chen
# from David Prober's lab.  They were part of the paper Gandh