# Homework 2: Data storage formats & manipulation (`pandas` and `xarray`)

In [None]:
# Creating the data needed for this assignment (only run once)
%run "HW 2 data generator.py"

In [None]:
import h5py
import os
import numpy as np
import xarray as xr
import pandas as pd
import json
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

## Problem 2.1: Loading `hdf5` data and working with data from `xarray` 

In your `data` directory is `animal1_session1.h5`, fake experimental data from an animal running around a behavioral rig for two minutes as it is exposed to stimuli. It includes:

* Behavioral data
    * x,y body position in rig as a function of time
    * x,y whisker position relative to body as a function of time
    * stimulus onset times
    * stimulus offset times
* Ephys data
    * spike times from 2 cells
    
In this problem we'll 
1. use `h5py` to load the data, 
2. construct `xarray` objects,
3. do some math with these, 
4. subsample these xarrays,
5. and plot our results using `matplotlib`. 
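
As a rough illustration of steps 1, 2, and 4, here is a minimal sketch that uses a tiny in-memory HDF5 file rather than the assignment data. The file name `demo.h5`, the dataset names `t` and `signal`, and the attribute values are all made up for this example; the layout of `animal1_session1.h5` may differ.

```python
import h5py
import numpy as np
import xarray as xr

# Build a tiny in-memory HDF5 file (hypothetical layout, NOT the homework file).
# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    t = np.linspace(0.0, 1.0, 11)              # 0.0, 0.1, ..., 1.0 (seconds)
    f.create_dataset("t", data=t)
    dset = f.create_dataset("signal", data=2.0 * t)
    dset.attrs["long_name"] = "demo signal"
    dset.attrs["units"] = "cm"

    # Read the values into memory with [:] before the file is closed,
    # and copy the HDF5 attributes onto the DataArray.
    signal = xr.DataArray(
        f["signal"][:],
        dims=["t"],
        coords={"t": f["t"][:]},
        attrs=dict(f["signal"].attrs),
    )

# Select the sample whose time coordinate is closest to t = 0.33 s.
nearest = signal.sel(t=0.33, method="nearest")

# Select every sample whose time coordinate lies inside a window.
window = signal.sel(t=slice(0.15, 0.55))
```

Selecting with `method="nearest"` is one way to handle times that are not exactly on the `t` coordinate; interpolation is another option and is not shown here.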
   

In [None]:
### 2.1.1 load this data file as `h5data` using `h5py` 





In [None]:
### 2.1.2 Using the data in the h5py file construct two `DataArray` objects.
# one for the body position x-coordinate as `x` using `t` as a coordinate 
# one for the body position y-coordinate as `y` using `t` as a coordinate 
# make sure to include the attributes "long_name" and "units" in the `DataArray`
# (these can be found in the h5py file attributes)
#
# EXTRA CREDIT: read about xarray's `Dataset` object and combine BOTH of these
# arrays into one object called `pos`








In [None]:
### 2.1.3 Using the data in the h5py file construct two more `DataArray` objects.
# one for the relative whisker position x-coordinate as `rel_wsk_x` using `t` as a coordinate 
# one for the relative whisker position y-coordinate as `rel_wsk_y` using `t` as a coordinate 
# make sure to include the attributes "long_name" and "units" in the `DataArray`
# (these can be found in the h5py file attributes)
#
# EXTRA CREDIT: read about xarray's `Dataset` object and combine BOTH of these
# arrays into one object called `wsk_pos`
#
# combine these DataArrays with those from the previous problem to create two new
# DataArrays for absolute whisker position in the rig: `wsk_x` & `wsk_y`









In [None]:
### 2.1.4 Now use the stimulus information in `h5data` to subsample your data:
# Figure out how to grab two time periods from your data: (1) during the 1st
# stimulus presentation (2) for the entire period from the 
# 1st stimulus onset to the 2nd stimulus onset
# 
# HINT: the stimulus onset and offset times are not explicitly in your "t" coordinate.
# Use the methods we discussed in class to either grab the nearest positions to these 
# times or interpolate from the data that we do have. 









In [None]:
### 2.1.5 Let's plot our data! Make a plot of y position vs x position 
# from the beginning of stimulus 1 to the beginning of stimulus 2.
# Your plot should include 
#
# 1. Axes labels with units
# 2. The animal's body position in the rig
# 3. The animal's whisker position in the rig
# 4. Some kind of annotation (legend, label, etc.) to indicate which curve 
#    is which.  
# 5. Some sort of highlight or emphasis on the stimulus presentation period
#    between stimulus onset and offset.
#
# To the greatest extent possible, pay attention to and modify the style of 
# the graphical elements in the plot assuming the viewer is most interested in
# comparing how the whisker motion changes from during the stimulus to afterwards









## Problem 2.2: Saving data to an HDF5 file 

This one is a little more freeform. I want you to think about data from a potential experimental session and figure out how you might organize that into an `hdf5` file. 
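
To make the idea of groups, datasets, and attributes concrete, here is a minimal sketch of one possible layout, written to an in-memory file. Every name in it (`layout_demo.h5`, `behavior`, `ephys`, `position`, `spike_times`, `session_id`) and every array shape is an assumption chosen for illustration; your own experiment will likely call for a different hierarchy.

```python
import h5py
import numpy as np

# Hypothetical layout: one group per data modality, datasets inside each group.
# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("layout_demo.h5", "w", driver="core", backing_store=False) as f:
    f.attrs["session_id"] = "demo-session"      # file-level metadata

    behavior = f.create_group("behavior")
    pos = behavior.create_dataset("position", data=np.ones((1000, 2)))
    pos.attrs["long_name"] = "x,y body position"
    pos.attrs["units"] = "cm"

    ephys = f.create_group("ephys")
    spikes = ephys.create_dataset("spike_times", data=np.ones(250))
    spikes.attrs["long_name"] = "spike times"
    spikes.attrs["units"] = "s"

    # Collect the path of every group and dataset to inspect the hierarchy.
    paths = []
    f.visit(paths.append)

    # Datasets can be addressed with a slash-separated path, like files.
    position_shape = f["behavior/position"].shape
    position_units = f["behavior/position"].attrs["units"]
```

Printing `paths` afterwards lists the hierarchy, which can be a quick check that the file matches the plan you wrote down.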

### 2.2.1 

**Specify what the groups, subgroups, and arrays will be here using markdown**




In [None]:
### 2.2.2 Use the numpy array construction methods to create some fake 
# data variables (as numpy arrays) similar to what you would like to store. 

# These don't need to be realistic (they can be all ones), but as much as 
# possible the _shapes_ of the arrays should be as close to reality as you 
# can make them. 










In [None]:
### 2.2.3 Build your hdf5 file! Make sure to annotate each section and include 
# metadata (including "units" and "long_name" as a minimum)

h5data = h5py.File("your_name_here", ????)

h5data.create_group(?????)
h5data.create_dataset(??????)
#...

# Don't forget to close your data when you are done!
h5data.close()

## Problem 2.3: Building a `DataFrame` from a series of json files

In this problem we'll be loading behavioral data across animals from a set of json files and manipulating this data in a pandas `DataFrame`. Specifically, you will 

1. Load behavioral data from a set of `json` files,
2. Create a new pandas `DataFrame` using that data, 
3. Process the data in pandas to generate a new variable,
4. Select groups from your data,
5. Plot these groups.
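
Steps 2 to 4 can be sketched on a toy list of dictionaries. The keys used here (`subject`, `group`, `week`, `score_a`, `score_b`) and the values are invented for this example and are not the fields in the assignment's json files.

```python
import pandas as pd

# Toy records standing in for dictionaries loaded from json (hypothetical keys).
records = [
    {"subject": "s1", "group": "A", "week": 1, "score_a": 3, "score_b": 5},
    {"subject": "s2", "group": "B", "week": 1, "score_a": 4, "score_b": 6},
    {"subject": "s1", "group": "A", "week": 2, "score_a": 5, "score_b": 7},
    {"subject": "s2", "group": "B", "week": 2, "score_a": 6, "score_b": 8},
]

# A list of dictionaries becomes one row per dictionary, one column per key.
df = pd.DataFrame(records)

# Derive a new column from two existing columns (element-wise sum).
df["total"] = df["score_a"] + df["score_b"]

# Boolean indexing selects the rows belonging to one group.
group_a = df[df["group"] == "A"]

# groupby splits the rows by a column and aggregates each subset.
mean_by_group = df.groupby("group")["total"].mean()
```

Boolean indexing is handy when you want the rows themselves, while `groupby` is usually the shorter route when you want a summary per group.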

The experiment consisted of 6 animals (3 male, 3 female) being run through a behavioral assay which resulted in two behavioral scores out of 12: strength and dexterity. Animals were run in 4 behavioral sessions, each a week apart. 

Each session for each animal is stored as a `json` file in the directory `behavioral_exp`. The file names are `ANIMAL_NAME-pAGE.json`. Take a look in the directory to make sure this makes sense. 
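
Reading a directory of json files into a list of dictionaries can be sketched as follows. This example writes its own toy files into a temporary directory, so the file names (`alpha-p30.json`, `beta-p30.json`), the `score` key, and the added `source` key are all assumptions for illustration, not the contents of `behavioral_exp`.

```python
import json
import os
import tempfile

# Write two toy session files into a temporary directory (hypothetical contents).
demo_dir = tempfile.mkdtemp()
toy_sessions = {
    "alpha-p30.json": {"score": 7},
    "beta-p30.json": {"score": 9},
}
for fname, record in toy_sessions.items():
    with open(os.path.join(demo_dir, fname), "w") as fh:
        json.dump(record, fh)

# An unrelated file that the loader should skip.
with open(os.path.join(demo_dir, "notes.txt"), "w") as fh:
    fh.write("not a session file")

loaded = []
for fname in sorted(os.listdir(demo_dir)):    # sorted() gives a stable order
    if not fname.endswith(".json"):
        continue                              # ignore anything that is not json
    with open(os.path.join(demo_dir, fname)) as fh:
        record = json.load(fh)                # one dictionary per file
    # Keep the file stem so each record can be traced back to its file.
    record["source"] = os.path.splitext(fname)[0]
    loaded.append(record)
```

Filtering on the `.json` suffix is one simple way to skip stray files; whether you also need to pull information out of the file name depends on what each file already stores.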

In [None]:
### 2.3.1 Using the `json` package, use a for loop and any logic you need to
# load each of these session data files as a dictionary and append them 
# to a list called `behavioral_data`. 
#
# HINT: os.listdir(PATH) returns the names of all entries in PATH.
# It does not include the current directory symbol `.` or the parent
# directory `..`, but it may include entries that are not session files
# (hidden files, for example). You can write an if statement to skip these,
# like (where `fname` is your loop variable)
#
# if fname.endswith('.json'):
#      blah....


In [None]:
### 2.3.3 Construct a pandas data frame called `behavior_df` from 
# `behavioral_data`. Compare its entries to the original json files
# to make sure it looks correct. Play around with sampling various 
# parts of the data. 









In [None]:
### 2.3.4 add a new column to your `behavior_df` called `meta` 
# which is just the sum of strength and dexterity.








In [None]:
### 2.3.5 Construct a plot with 2 subplots (axes) stacked one on top 
# of the other. On the top axis plot strength vs. age with two lines,
# one for each sex. On the bottom axis plot dexterity vs. age with 
# two lines, one for each sex.
# Your plot should include
# 
# 1. Axes labels with units
# 2. Some kind of annotation (legend, label, etc.) to indicate which curve 
#    is which.  
#
# To the greatest extent possible, pay attention to and modify the style of 
# the graphical elements in the plot assuming the viewer is most interested in
# comparing how the scores differ between the sexes





