## Lalama IV: It Gets Easier

Alright.

I showed you all of that because you will probably have to use it one day if you do any kind of coding.

But for most analyses we do, somebody already wrote code that does that analysis.

We can just use the great code that they wrote and that thousands of other people have already tested and improved.

Now we're standing on the shoulders of giants! Or something like that.

## numpy: like Matlab in Python

**numpy** is a library that provides matrix-like structures that many scientists use.

It extends **slice notation** so that we can refer to rows and columns of a matrix.

It also implements many functions that Matlab users will be familiar with.

```Python
import numpy as np
zero_arr = np.zeros((10,10)) # creates a 10 by 10 array / matrix of zeros
print(zero_arr[5,5]) # prints 0.0 (np.zeros creates floats by default)
```
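To make the extended slice notation concrete, here is a minimal sketch of pulling rows and columns out of an array. The array `arr` and the variable names are made up for illustration.

```python
import numpy as np

# a small 3 by 4 array so the rows and columns are easy to see
arr = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_row = arr[0, :]    # row 0, all columns
second_col = arr[:, 1]   # all rows, column 1
block = arr[0:2, 2:4]    # rows 0 and 1, columns 2 and 3

print(first_row)   # [0 1 2 3]
print(second_col)  # [1 5 9]
print(block)
```

The part before the comma picks rows and the part after it picks columns, and a bare `:` means "everything along this axis".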

We can rewrite our mean and standard deviation dictionary functions using numpy functions.

**These functions are much faster, especially when dealing with large datasets, because numpy is a Python wrapper around very efficient scientific computing libraries written in speedy languages like Fortran and C.**

```Python
import numpy as np

def compute_mouse_dict_mean_and_std(mouse_dict):
    """
    takes mouse_dict returned by make_mouse_dict
    and computes mean and standard deviation for each strain.
    returns mean_mouse_dict and stdev_mouse_dict.
    """
    mean_mouse_dict = {}
    stdev_mouse_dict = {}

    for key, val in mouse_dict.items():
        mean_mouse_dict[key] = np.mean(val)
        stdev_mouse_dict[key] = np.std(val)
    return mean_mouse_dict, stdev_mouse_dict
```
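Here is a small sketch of what `np.mean` and `np.std` return for a dictionary shaped like ours. The dictionary `example_dict` is made up for illustration and is not the real mouse data. One thing worth knowing: `np.std` computes the population standard deviation by default (`ddof=0`); pass `ddof=1` if you want the sample standard deviation.

```python
import numpy as np

# hypothetical data: strain name -> list of measurements (not the real dataset)
example_dict = {"strainA": [2.0, 4.0, 6.0], "strainB": [1.0, 1.0, 1.0]}

means = {key: np.mean(val) for key, val in example_dict.items()}
# default ddof=0 divides by N (population standard deviation)
pop_std = {key: np.std(val) for key, val in example_dict.items()}
# ddof=1 divides by N - 1 (sample standard deviation)
sample_std = {key: np.std(val, ddof=1) for key, val in example_dict.items()}

print(means["strainA"])       # 4.0
print(sample_std["strainA"])  # 2.0
```

If your hand-written function from earlier divided by N - 1, its results will only match numpy's when you pass `ddof=1`.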

Let's get our `sort_by_strain` function back in memory

## pandas: it gets even easier

Let's use the **pandas** library to do the heavy lifting of importing csv files.

**pandas** allows us to work with objects called **dataframes** that you may be familiar with if you have ever used the statistical programming language R.

```Python
import pandas as pd

filename = 'Willott1_table-1.csv'
df = pd.read_csv(filename,skiprows=6,header=0) # df is short for 'dataframe'
```
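If the `skiprows` and `header` arguments look mysterious, this sketch shows how they work together on a tiny stand-in file. The contents of `fake_csv` are invented for illustration; presumably the real file also starts with six lines of notes before its header row, since that is what `skiprows=6` skips.

```python
import io
import pandas as pd

# hypothetical file contents: 6 lines of notes, then a header row, then data
fake_csv = io.StringIO(
    "note 1\nnote 2\nnote 3\nnote 4\nnote 5\nnote 6\n"
    "strain,ASR_100\n"
    "A,1.5\n"
    "B,2.5\n"
)

# skiprows=6 drops the notes; header=0 uses the first remaining line as column names
example_df = pd.read_csv(fake_csv, skiprows=6, header=0)

print(list(example_df.columns))  # ['strain', 'ASR_100']
print(len(example_df))           # 2
```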

```Python
df.head() # head method shows the first few rows
```

```Python
%matplotlib inline
df.boxplot(column='ASR_100',by='strain',rot=45,figsize=(12,10))
```

```Python
# borrowed from http://deparkes.co.uk/2016/11/04/sort-pandas-boxplot/

# use dict comprehension to create new dataframe from the iterable groupby object
# each group name becomes a column in the new dataframe
df2 = pd.DataFrame({col:vals['ASR_100'] for col, vals in df.groupby('strain')})
# find and sort the median values in this new dataframe
meds = df2.median().sort_values()
# use the columns in the dataframe, ordered by median value
# return the axes so the plot can be modified afterwards
ax = df2[meds.index].boxplot(rot=45, return_type="axes")
```
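To see what the groupby trick is doing without the plot, here is the same idea on a toy dataframe. The values in `toy` are invented for illustration; only the column names match ours.

```python
import pandas as pd

# hypothetical measurements, not the real data
toy = pd.DataFrame({
    "strain": ["A", "A", "B", "B", "C", "C"],
    "ASR_100": [5.0, 7.0, 1.0, 3.0, 9.0, 11.0],
})

# one column per strain; rows keep their original index, so the gaps become NaN
wide = pd.DataFrame({name: group["ASR_100"] for name, group in toy.groupby("strain")})

# median() skips NaN by default; sort_values() orders strains from low to high median
meds = wide.median().sort_values()

print(list(meds.index))  # ['B', 'A', 'C']
```

Indexing the wide dataframe with `meds.index` then reorders its columns, which is what puts the boxes in order of increasing median.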