<img src="./img/HWNI_logo.svg"/>

# Lab 03a - Unpaired t-Tests

In [1]:
# makes our plots show up inside Jupyter
%matplotlib inline

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import scipy.stats

# choose colors that work for most color-blind folks
# (done in a single call: sns.set() on its own resets the palette to the default)
sns.set(palette="colorblind", color_codes=True)

import util.lab03utils as utils 

# this makes our tables easier to read
utils.formatDataframes()

In this lab, we'll use unpaired t-tests to compare two independent groups. On the programming side, we'll review some of our plotting skills and learn more about how to load datasets into pandas and how we organize those datasets.

## Dataset Introduction

Octopamine has been implicated in modulating feeding behaviors in both vertebrates and invertebrates. Pargyline has been shown to increase the levels of octopamine in the nervous system. 

We'll look at data from two experiments on octopamine and feeding behavior.

In the first, the effect of pargyline on sucrose consumption was tested in blowflies. Two groups of blowflies were used in this study: one group was injected with pargyline (n = 295 flies) while the control group was injected with saline (n = 300 flies). The amount of sucrose consumed was then measured. [adapted from Samuels & Witmer, pg 220. Originally: Long & Murdock, PNAS 1983]

**Q1**. Why is an unpaired test appropriate for this data set?

## Loading Data

The cell below loads the data into pandas. The functions for loading external data all begin with "read", so you can check out your options by typing in `pd.read` and then hitting `Tab`. Other options include Excel files, the clipboard, and `.json` files. You can look at the documentation for each function in the usual way, with the `?` symbol.
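If tab completion isn't available, the same list can be built programmatically. This is a minimal sketch; the variable name `readers` is just illustrative, and the exact list depends on your pandas version.

```python
import pandas as pd

# collect every top-level pandas name that starts with "read_"
readers = sorted(name for name in dir(pd) if name.startswith("read_"))

print(readers)
```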

The most basic format for storing data is the "comma-separated values", or `.csv`, format. There's a brief discussion of this format in the course tutorial on pandas. You can also view comma-separated values files in Excel and even save some `.xls` files as `.csv` files.
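To make the format concrete, here is a small sketch that parses `.csv` text held in a string rather than a file. The three data rows are copied from the sample shown in this lab; the names `csv_text` and `toy` are made up for the example.

```python
import io

import pandas as pd

# a .csv file is plain text: one header row, then one row per observation,
# with the values in each row separated by commas
csv_text = """SucrConsump,Injection,Exp_Idx
15.5,Saline,1
28.1,Parg,1
18.6,Saline,2
"""

# io.StringIO lets read_csv treat the string as if it were a file
toy = pd.read_csv(io.StringIO(csv_text))

print(toy)
```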

In [2]:
flyData = pd.read_csv('data/3a.csv',index_col=None)

flyData.sample(10)

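| Unnamed: 0 | SucrConsump | Injection | Exp_Idx |
|---|---|---|---|
| 823 | 28.5 | PargYomb | 2 |
| 120 | 15.5 | Saline | 1 |
| 775 | 18.9 | PargYomb | 2 |
| 108 | 18.0 | Saline | 1 |
| 629 | 18.6 | Saline | 2 |
| 567 | 50.3 | Parg | 1 |
| 376 | 35.0 | Parg | 1 |
| 310 | 28.1 | Parg | 1 |
| 429 | 56.5 | Parg | 1 |
| 456 | 56.5 | Parg | 1 |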


Following the principles of ["tidy data"](http://www.jeannicholashould.com/tidy-data-in-python.html), we've stored each of our observations in a row. An observation includes the raw data (in this case, the amount of sugar consumed), the kind of injection the fly received, and, since we'll be looking at two experiments in this lab, an identifier for the experiment during which this datapoint was measured.

## Visualizing the Data

Begin by plotting the histograms and computing means and standard deviations for both groups of flies in experiment #1. Remember: you'll need to subset your data by experiment index and by injection.
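One way to do the subsetting is with boolean masks, sketched below on a tiny stand-in table. The column names match `flyData`, but the numbers in `toy` are invented for illustration and are not the lab's data.

```python
import pandas as pd

# toy stand-in for flyData; the values are made up
toy = pd.DataFrame({
    "SucrConsump": [14.0, 16.0, 18.0, 30.0, 40.0, 50.0, 17.0, 19.0],
    "Injection": ["Saline", "Saline", "Saline", "Parg", "Parg", "Parg",
                  "Saline", "PargYomb"],
    "Exp_Idx": [1, 1, 1, 1, 1, 1, 2, 2],
})

exp_idx = 1  # which experiment to look at

# each comparison gives a True/False column; & keeps rows where both are True
in_experiment = toy["Exp_Idx"] == exp_idx
saline = toy.loc[in_experiment & (toy["Injection"] == "Saline"), "SucrConsump"]
parg = toy.loc[in_experiment & (toy["Injection"] == "Parg"), "SucrConsump"]

# pandas .std() uses the sample standard deviation (ddof=1) by default
print(saline.mean(), saline.std())
print(parg.mean(), parg.std())
```

Note that `np.std` defaults to `ddof=0`, so it will give a slightly different answer from the pandas method unless you pass `ddof=1`.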

Think about your histogram: should the bins be the same or different for the two groups? Is a rugplot helpful? Be ready to discuss your choices in class.
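If you decide that shared bins are the right choice, one possible approach is to compute the bin edges from the pooled data and reuse them for both groups. The sketch below uses randomly generated numbers, not the fly data, and the names `group_a`, `group_b`, and `shared_bins` are illustrative.

```python
import numpy as np

# two made-up groups with different centers and spreads
rng = np.random.default_rng(0)
group_a = rng.normal(15, 5, size=200)
group_b = rng.normal(40, 10, size=200)

# compute 20 equal-width bins that span both groups together
pooled = np.concatenate([group_a, group_b])
shared_bins = np.histogram_bin_edges(pooled, bins=20)

# count each group separately, but against the same edges
counts_a, _ = np.histogram(group_a, bins=shared_bins)
counts_b, _ = np.histogram(group_b, bins=shared_bins)

print(shared_bins)
```

The same array of edges can be passed as the `bins` argument to `plt.hist`, so that both histograms line up bar for bar.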

**Q2**. Based on this visualization and these statistics, do you expect the difference of the means to be statistically significant (at the traditional/obligatory $\alpha = 0.05$)? Why or why not?

Now, visualize the data as a barplot with 95% confidence interval error bars.
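To get a feel for what those error bars represent, here is a sketch that computes a 95% confidence interval for a mean by hand using the t-distribution. The helper `mean_ci` is hypothetical (it is not part of the lab utilities), and the input numbers are invented. Seaborn's `barplot` estimates its interval by bootstrapping by default, so its bars will usually be close to, but not identical to, this t-based interval.

```python
import numpy as np
import scipy.stats


def mean_ci(values, confidence=0.95):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    # standard error of the mean, using the sample standard deviation
    sem = values.std(ddof=1) / np.sqrt(n)
    # critical value: for 95%, the 97.5th percentile of t with n - 1 df
    t_crit = scipy.stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    return mean, mean - t_crit * sem, mean + t_crit * sem


mean, low, high = mean_ci([14.0, 16.0, 18.0])
print(mean, low, high)
```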

**Q3**. Based on the barplot, what do you expect the result of your t-test to be?

Now, plot the data as a boxplot.

**Q4**. The boxplot can be used to see whether the assumptions for a t-test are met. What pieces of information would you use and what would they tell you?

## Running the Test

Now, we can use the `scipy` package to run a t-test to determine if the difference between the groups is statistically significant. The function `scipy.stats.ttest_ind` 
([docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html))
will run a t-test for you.
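As a quick illustration of the call, the sketch below runs `ttest_ind` on two randomly generated groups (not the fly data). The group names and parameters are made up for the example.

```python
import numpy as np
import scipy.stats

# two independent, made-up samples
rng = np.random.default_rng(0)
control = rng.normal(15, 5, size=50)
treated = rng.normal(20, 5, size=50)

# the two samples are passed as separate arguments;
# the result carries the t statistic and the two-sided p-value
result = scipy.stats.ttest_ind(treated, control)

print(result.statistic, result.pvalue)
```

The sign of the statistic depends on the order of the arguments: it is positive when the first sample has the larger mean.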

**Q5**. What are your results?

**Q6**. One of the keyword arguments for `ttest_ind` is `equal_var`. This lets us switch from a version of the t-test that assumes both groups have the same variance to one that does not assume this. Which version is more appropriate in this case?
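To see what the switch changes, the sketch below runs both versions on invented numbers and checks the unequal-variance statistic (often called Welch's t-test) against a hand calculation, $t = (\bar{x}_a - \bar{x}_b) / \sqrt{s_a^2/n_a + s_b^2/n_b}$. The arrays are made up for illustration and say nothing about which version suits the fly data.

```python
import numpy as np
import scipy.stats

# invented numbers: the second group is larger and much more spread out
group_a = np.array([14.0, 16.0, 18.0, 15.0, 17.0])
group_b = np.array([30.0, 40.0, 50.0, 35.0, 45.0, 60.0, 20.0])

# equal_var=True pools the two variances; equal_var=False does not
student = scipy.stats.ttest_ind(group_a, group_b, equal_var=True)
welch = scipy.stats.ttest_ind(group_a, group_b, equal_var=False)

# the unequal-variance statistic by hand: each group contributes
# its own variance divided by its own sample size
var_a = group_a.var(ddof=1)
var_b = group_b.var(ddof=1)
se = np.sqrt(var_a / group_a.size + var_b / group_b.size)
t_manual = (group_a.mean() - group_b.mean()) / se

print(student.statistic, welch.statistic, t_manual)
```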

To further confirm that octopamine positively modulates feeding behavior, a follow-up experiment was done with yohimbine (an antagonist of octopamine receptors in insects). One group was injected with pargyline and yohimbine (n = 130) while an additional control group was injected with saline (n = 100). The amount of sucrose consumed was then measured.

Repeat all of the above exercises for this experiment. There's no need to write all of your code again in new cells. If you wrote your code with good style, it should all be reusable -- just change which experiment number is used to subset the data all the way up at the top. You also don't need to add new cells for your written answers: simply include responses for both experiments in the cells above.
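One possible way to structure reusable code is to wrap the analysis in a function that takes the experiment number as an argument. The sketch below is only an illustration: `compare_to_saline` is a hypothetical helper, the numbers in `toy` are invented, and it assumes each experiment contains exactly one non-saline group.

```python
import pandas as pd
import scipy.stats


def compare_to_saline(df, exp_idx, equal_var=True):
    """Run an unpaired t-test of treated vs. saline flies in one experiment."""
    experiment = df[df["Exp_Idx"] == exp_idx]
    is_control = experiment["Injection"] == "Saline"
    control = experiment.loc[is_control, "SucrConsump"]
    # assumption: every non-saline row belongs to a single treated group
    treated = experiment.loc[~is_control, "SucrConsump"]
    return scipy.stats.ttest_ind(treated, control, equal_var=equal_var)


# toy stand-in for flyData; the values are made up
toy = pd.DataFrame({
    "SucrConsump": [14.0, 16.0, 18.0, 30.0, 40.0, 50.0,
                    15.0, 17.0, 19.0, 16.0, 18.0, 20.0],
    "Injection": ["Saline"] * 3 + ["Parg"] * 3
                 + ["Saline"] * 3 + ["PargYomb"] * 3,
    "Exp_Idx": [1] * 6 + [2] * 6,
})

# the only thing that changes between experiments is the index
result_1 = compare_to_saline(toy, exp_idx=1)
result_2 = compare_to_saline(toy, exp_idx=2)

print(result_1.statistic, result_1.pvalue)
print(result_2.statistic, result_2.pvalue)
```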