# Pandas, data frames and making nice figures with Seaborn (and Matplotlib)
[the official homepage](https://pandas.pydata.org)

## Basic data structures - start with Series then build up to DataFrames

[Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/dsintro.html)

* A **Series** is a 1D array that can hold any type of data (numeric types, non-numeric, Python objects and so forth).
    * Unlike a 1D NumPy array, each entry is **labeled** with an index that keeps track of what each entry is, and that can be used to look up the corresponding value during analysis.
    * These labels are fixed - they will always index the same value unless you explicitly break that link.
    * The list of labels that forms the index can either be declared upon series creation or, by default, it will range from 0 to len(data)-1.
        * If you're going to use Pandas to organize your data, specifying usable and informative labels is a good idea!
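
The points above can be sketched in a few lines. This is a minimal illustration rather than part of the tutorial's data set: the values and the labels `S0`..`S2` are made up for the example.

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0])

# no index given: labels default to 0 .. len(data)-1
default_s = pd.Series(values)

# explicit, informative labels (hypothetical subject IDs)
labeled_s = pd.Series(values, index=['S0', 'S1', 'S2'])

# look up a value by its label
s1_value = labeled_s['S1']

# labels stay attached to their values when the series is reordered
sorted_s = labeled_s.sort_values(ascending=False)

print(list(default_s.index))
print(s1_value)
print(list(sorted_s.index), sorted_s['S1'])
```

After sorting, `S1` still points at the same value even though its position in the series has changed.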
        
<div class="alert alert-warning">
Pandas will allow you to specify non-unique labels. This can be fine for operations that don't rely on indexing by label. However, operations that do rely on unique labels for indexing may throw an unexpected error, so it's good general practice to use unique labels!
</div>
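
As a rough sketch of how duplicate labels behave, the example below builds a small series with a repeated label (the labels and values are invented for illustration). Aggregations work as usual, a label lookup returns every match, and `reindex` is one operation that refuses to run on a non-unique index.

```python
import pandas as pd

# 'S0' appears twice on purpose
dup_s = pd.Series([1.0, 2.0, 3.0], index=['S0', 'S0', 'S1'])

# quick check you can run before relying on label-based indexing
has_unique_labels = dup_s.index.is_unique

# operations that ignore the labels are unaffected
total = dup_s.sum()

# a lookup by a duplicated label returns a Series, not a single value
matches = dup_s['S0']

# reindexing requires unique labels and raises a ValueError otherwise
try:
    dup_s.reindex(['S0', 'S1'])
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(has_unique_labels, total, len(matches), reindex_failed)
```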

## import libs

In [1]:
import numpy as np
import matplotlib.pyplot as plt

# import a generic pandas object and also a few specific functions
import pandas as pd 
from pandas import DataFrame, read_csv

# new - get current path for file i/o later on in tutorial
import os
cwd = os.getcwd()

# also define the default font we'll use for figures. 
fig_font = {'fontname':'Arial', 'size':'20'}

## Create a series from a NumPy ndarray

In [6]:
# use pd.Series
N = 16
data = np.random.rayleigh(scale=1, size=N)

# make a list of subject names for use as an index
var_name = 'S'
index = [var_name + str(n) for n in range(N)]

print(index)

# then make our series by passing in data and our index labels
np_to_s = pd.Series(data, index=index)
np_to_s

['S0', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10', 'S11', 'S12', 'S13', 'S14', 'S15']


S0     1.015489
S1     0.892702
S2     0.140444
S3     1.315811
S4     0.562052
S5     0.403826
S6     1.748016
S7     0.739514
S8     3.459767
S9     1.233524
S10    1.654765
S11    0.956057
S12    1.970684
S13    1.954968
S14    1.139489
S15    0.758600
dtype: float64

## DataFrames

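* A **DataFrame** is a 2D labeled data structure, like a table or spreadsheet, whose columns can each hold a different type of data.
* Each column is a Series, and all columns share the same row index.
* Rows are labeled by the index and columns by name, so values can be looked up by label along either axis.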
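
One way to see how a DataFrame builds on Series is to assemble one from two Series that share an index. This is only a sketch: the column names `age` and `score`, the labels, and the values are all hypothetical.

```python
import pandas as pd

labels = ['S0', 'S1', 'S2']

# two Series with the same row labels
age = pd.Series([21, 34, 28], index=labels)
score = pd.Series([0.9, 0.4, 0.7], index=labels)

# each dict key becomes a column name; rows are aligned on the shared index
demo_df = pd.DataFrame({'age': age, 'score': score})

# selecting a column gives back a Series
one_column = demo_df['score']

# selecting a row by label gives a Series indexed by the column names
one_row = demo_df.loc['S1']

print(demo_df)
print(one_row['age'])
```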


## Make a data set that we can play with; we will import some real data later on
* just make up some stuff here...let's say responses in 10 different neurons to two different stimuli

In [None]:
# seed NumPy's global random number generator so the random responses are reproducible
# (np.random.RandomState(0) on its own only creates a separate generator and discards it)
np.random.seed(0)

# labels for our 10 neurons
neurons = ['Nrn1','Nrn2','Nrn3','Nrn4','Nrn5','Nrn6','Nrn7','Nrn8','Nrn9','Nrn10']

# responses in Hz to two stimulus conditions
resp1_hz = [14, 27, 62, 88, 45, 56, 75, 63, 33, 46]

# set up our response to stimulus 2...use random.randint for fun
min_resp = 1  # inclusive
max_resp = 90 # exclusive
resp2_hz = np.random.randint(min_resp, max_resp, len(resp1_hz))
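
A small sketch of what seeding buys you, assuming NumPy's legacy global generator (`np.random.seed`): resetting the seed replays the same "random" draws, and the upper bound passed to `randint` is never returned. The variable names `first` and `second` are just for this illustration.

```python
import numpy as np

# draw 10 integers from [1, 90) after seeding
np.random.seed(0)
first = np.random.randint(1, 90, 10)

# re-seed with the same value and draw again
np.random.seed(0)
second = np.random.randint(1, 90, 10)

# identical seeds give identical sequences
print(np.array_equal(first, second))

# the lower bound is inclusive, the upper bound is exclusive
print(first.min() >= 1, first.max() < 90)
```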

## New - use the 'zip' function to wrap up the data from each list into one list
* does what it sounds like - takes any number of iterables (three here) and groups them into a single iterator of tuples: the 1st element of each input together, then the 2nd, etc.
* the length of the result is limited by the length of the shortest input

[reference](https://www.w3schools.com/python/ref_func_zip.asp)
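
To make the truncation behavior concrete, here is a toy example with one list deliberately shorter than the others (the short lists are invented for illustration and are not the tutorial's data).

```python
names = ['Nrn1', 'Nrn2', 'Nrn3']
rates_a = [14, 27, 62]
rates_b = [5, 9]  # deliberately one element short

# zip stops as soon as the shortest input runs out
triples = list(zip(names, rates_a, rates_b))
print(triples)  # [('Nrn1', 14, 5), ('Nrn2', 27, 9)]

# zip(*...) goes the other way and splits the tuples back into columns
unzipped_names, unzipped_a, unzipped_b = zip(*triples)
print(unzipped_names)  # ('Nrn1', 'Nrn2')
```

Note that `'Nrn3'` is silently dropped, so it is worth checking that all inputs have the same length before zipping.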

In [None]:
neuron_data = list(zip(neurons, resp1_hz, resp2_hz))
print(neuron_data)

## Make a dataframe object to hold the contents of the data set

In [None]:
df = pd.DataFrame(data = neuron_data, columns = ['neuron', 'resp1', 'resp2'])

# take a look at the nice output here...
df

## Saving data in a csv file

In [None]:
# let's save our header as well so that the first data row isn't mistaken for the header when we read the file back in
df.to_csv('spike_rates.csv',index=False,header=True)

In [None]:
# use our current working directory to build a path to the file
# (os.path.join picks the right path separator for the operating system)
print(cwd)
file_name = os.path.join(cwd, 'spike_rates.csv')
print(file_name)

df = pd.read_csv(file_name)
df
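
The effect of `index=False` can be checked without touching the disk. The sketch below writes a tiny made-up frame to an in-memory buffer (`io.StringIO`) twice, once without and once with the row index, and reads each version back.

```python
import io
import pandas as pd

small_df = pd.DataFrame({'neuron': ['Nrn1', 'Nrn2'], 'resp1': [14, 27]})

# index=False: only the named columns are written
buf = io.StringIO()
small_df.to_csv(buf, index=False)
buf.seek(0)
clean = pd.read_csv(buf)

# default (index=True): the row labels are written as an extra, unnamed first column
buf_idx = io.StringIO()
small_df.to_csv(buf_idx)
buf_idx.seek(0)
with_index = pd.read_csv(buf_idx)

print(list(clean.columns))
print(list(with_index.columns))
```

If a file was saved with its index, passing `index_col=0` to `pd.read_csv` is one way to turn that first column back into the row labels.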

## Get a high-level summary of the data

In [None]:
df.describe()

## Can also apply a set of more targeted analyses using the df object

* [Pandas doc for all functions](https://pandas.pydata.org/pandas-docs/stable/api.html#api-dataframe-stats)

In [None]:
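# numeric_only=True skips the 'neuron' column, which holds strings and cannot be averaged
df.mean(axis=0, numeric_only=True)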
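
A few other targeted analyses, sketched on a small made-up frame with the same column layout as above (the numbers are chosen so the results are easy to check by eye and are not real data).

```python
import pandas as pd

demo = pd.DataFrame({
    'neuron': ['Nrn1', 'Nrn2', 'Nrn3', 'Nrn4'],
    'resp1': [10, 20, 30, 40],
    'resp2': [20, 40, 60, 80],
})

# keep only the numeric response columns
rates = demo[['resp1', 'resp2']]

# axis=0 collapses the rows: one mean per stimulus condition
col_means = rates.mean(axis=0)

# axis=1 collapses the columns: one mean per neuron
row_means = rates.mean(axis=1)

# Pearson correlation between the two response columns
corr = rates['resp1'].corr(rates['resp2'])

# several statistics at once
summary = rates.agg(['mean', 'std', 'max'])

print(col_means)
print(row_means)
print(corr)
print(summary)
```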