<html>
<table width="100%" cellspacing="2" cellpadding="2" border="1">
<tbody>
<tr>
<td valign="center" align="center" width="45%"><img src="../media/Univ-Utah.jpeg"><br>
</td>
    <td valign="center" align="center" width="75%">
<h1 align="center"><font size="+1">University of Utah<br>Population Health Sciences<br>Data Science Workshop</font></h1></td>
<td valign="center" align="center" width="45%"><img
src="../media/U_Health_stacked_png_red.png" alt="Utah Health
Logo" width="128" height="134"><br>
</td>
</tr>
</tbody>
</table>
<br>
</html>


# Python Packages
In the last notebook we saw how to write our own functions to manipulate data. That's helpful when we need to perform the same manipulation many times. But we don't want to write a function for every single data analysis task. Instead, we can leverage code written by other people in the Python community.

**Packages** (also called **libraries**) are collections of functions and other Python code, *packaged* together and distributed for the open-source community to use. In this notebook, we'll look at a few of the most important Python packages for data science:
- `numpy`
- `random`
- `scipy`
- `pandas`
- `matplotlib` and `seaborn`

## Importing packages
To use a package in Python, we first have to *import* it. The syntax for this is:

```python
import package_name
```
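As a small concrete instance of this syntax, the sketch below imports `math`, a module from Python's standard library. It is chosen here only for illustration and is not one of the packages covered in this notebook.

```python
# Import the standard-library math module (used here purely as an illustration)
import math

# Functions in a package are accessed as package_name.function_name
root = math.sqrt(16)
print(root)  # 4.0
```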
Let's start looking at some packages.

## `numpy`
`numpy` is arguably the foundational package for data science in Python. It implements arrays and matrices and lets us perform mathematical operations on them. It also provides functions for calculating summary statistics from our data.

First, we import the library. Then we can access functions in the library using `numpy.function_name`.

In [1]:
import numpy

`numpy` has a `mean` function, so we don't need to write our own function to calculate averages. Here's how to calculate the mean of the list from the previous notebook using `numpy`.

In [3]:
a = [4, 0, 2, 2, 0, 10, 7, 8, 5, 0]
numpy.mean(a)

3.8
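`numpy` provides other summary statistics in the same style. A minimal sketch, reusing the same list `a`; the variable names `median_a`, `min_a`, and `max_a` are just placeholders for this example:

```python
import numpy

a = [4, 0, 2, 2, 0, 10, 7, 8, 5, 0]

# Each function takes the list and returns a single summary value
median_a = numpy.median(a)  # middle value of the sorted data
min_a = numpy.min(a)        # smallest value
max_a = numpy.max(a)        # largest value

print(median_a, min_a, max_a)
```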

Sometimes we use a library so often that we don't want to type its whole name every time. So for some libraries (including `numpy`), it's standard practice to assign an alias when importing:

```python
import package_name as alias
```

Then we use `alias.function_name` in our code. 

For `numpy`, we typically refer to it as `np`:

In [5]:
import numpy as np
np.mean(a)

3.8

In [7]:
np.std(a)

3.4292856398964493

If you only need to use one particular function in a library, instead of importing the entire package you can just import the function you want (again with or without an alias):

```python
from package_name import function_name
from package_name import function_name as alias
```

#### TODO
Import the `std` function from numpy and assign it an alias of `sd`.

In [8]:
from numpy import std as sd

In [10]:
sd(a)

3.4292856398964493

In [11]:
sd is np.std

True
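The `True` above indicates that `sd` and `np.std` are two names for the same function object: importing with an alias does not create a copy. A short sketch that checks both the identity and the results (`same_object` and `same_result` are names made up for this example):

```python
import numpy as np
from numpy import std as sd

a = [4, 0, 2, 2, 0, 10, 7, 8, 5, 0]

# `is` checks whether two names point at the very same object
same_object = sd is np.std

# Because they are the same function, they return the same value
same_result = sd(a) == np.std(a)

print(same_object, same_result)
```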

### Arrays and matrices
One of the most important contributions of `numpy` is its implementation of `arrays`. Arrays are similar to lists, but with a lot of important extensions:

- They have mathematical operations defined on them that allow for easy and efficient data manipulation
- They can have multiple dimensions, allowing us to build matrices

In [36]:
a_arr = np.array(a)
a_arr

array([ 4,  0,  2,  2,  0, 10,  7,  8,  5,  0])

In [37]:
a_arr.sum()

38

In [15]:
sum(a)

38

In [27]:
a_arr.mean()

3.8

In [28]:
a_arr.std()

3.4292856398964493

In [29]:
a_arr ** 2

array([ 16,   0,   4,   4,   0, 100,  49,  64,  25,   0])

In [30]:
a_arr / 3

array([1.33333333, 0.        , 0.66666667, 0.66666667, 0.        ,
       3.33333333, 2.33333333, 2.66666667, 1.66666667, 0.        ])

In [46]:
a_arr.shape

(10,)
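To see why these operations matter, it can help to compare the same expression on a plain list and on an array. The sketch below assumes the same values as `a`; `doubled_list` and `doubled_arr` are names introduced only for this comparison.

```python
import numpy as np

a = [4, 0, 2, 2, 0, 10, 7, 8, 5, 0]
a_arr = np.array(a)

# Multiplying a list by an integer repeats the list
doubled_list = a * 2
print(len(doubled_list))  # 20

# Multiplying an array by an integer multiplies every element
doubled_arr = a_arr * 2
print(doubled_arr)
```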

### Multiple arrays
Performing operations on multiple arrays is easier with numpy than with lists. For example, we can add or multiply the elements together:

In [43]:
b_arr = np.array([4, 1, 4, 3, 9, 2, 0, 0, 5, 4])

In [47]:
a_arr + b_arr

array([ 8,  1,  6,  5,  9, 12,  7,  8, 10,  4])

In [48]:
a_arr * b_arr

array([16,  0,  8,  6,  0, 20,  0,  0, 25,  0])

In [49]:
a_arr.dot(b_arr)

75
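The `dot` method computes the dot product: it multiplies the arrays element by element and then adds up the results. A quick sketch that checks this against the element-wise product (`elementwise_total` is a name chosen for this example):

```python
import numpy as np

a_arr = np.array([4, 0, 2, 2, 0, 10, 7, 8, 5, 0])
b_arr = np.array([4, 1, 4, 3, 9, 2, 0, 0, 5, 4])

# Multiply element by element, then sum the products
elementwise_total = (a_arr * b_arr).sum()

# The dot product gives the same number in one step
dot_total = a_arr.dot(b_arr)

print(elementwise_total, dot_total)
```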

However, these element-wise operations raise an error if the shapes of the two arrays aren't compatible:

In [50]:
a_arr + b_arr[:-1]

ValueError: operands could not be broadcast together with shapes (10,) (9,) 
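One way to handle this is to inspect the shapes first, or to catch the error. The following is only a sketch of that idea; `shorter` and `total` are hypothetical names used for this example.

```python
import numpy as np

a_arr = np.array([4, 0, 2, 2, 0, 10, 7, 8, 5, 0])
b_arr = np.array([4, 1, 4, 3, 9, 2, 0, 0, 5, 4])
shorter = b_arr[:-1]  # drops the last element, leaving 9 values

# Comparing shapes shows the mismatch before we try to add
print(a_arr.shape, shorter.shape)  # (10,) (9,)

try:
    total = a_arr + shorter
except ValueError as err:
    # numpy refuses to add arrays with incompatible shapes
    total = None
    print("Could not add arrays:", err)
```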

### Multi-dimensional arrays

In [51]:
# Stack the two arrays as the rows of a 2 x 10 matrix
matrix = np.array([a_arr, b_arr])

In [52]:
matrix

array([[ 4,  0,  2,  2,  0, 10,  7,  8,  5,  0],
       [ 4,  1,  4,  3,  9,  2,  0,  0,  5,  4]])

In [53]:
matrix.transpose()

array([[ 4,  4],
       [ 0,  1],
       [ 2,  4],
       [ 2,  3],
       [ 0,  9],
       [10,  2],
       [ 7,  0],
       [ 8,  0],
       [ 5,  5],
       [ 0,  4]])
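With two dimensions, the `shape` attribute reports (rows, columns), and summary methods accept an `axis` argument to work along one dimension. A minimal sketch, rebuilding the same matrix so that it runs on its own (`row_means` and `col_sums` are illustrative names):

```python
import numpy as np

a_arr = np.array([4, 0, 2, 2, 0, 10, 7, 8, 5, 0])
b_arr = np.array([4, 1, 4, 3, 9, 2, 0, 0, 5, 4])
matrix = np.array([a_arr, b_arr])

# Transposing swaps the two dimensions
print(matrix.shape)              # (2, 10)
print(matrix.transpose().shape)  # (10, 2)

# axis=1 collapses the columns, giving one mean per row
row_means = matrix.mean(axis=1)

# axis=0 collapses the rows, giving one sum per column
col_sums = matrix.sum(axis=0)

print(row_means)
print(col_sums)
```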

In [56]:
# Get the first row
matrix[0,:]

array([ 4,  0,  2,  2,  0, 10,  7,  8,  5,  0])

In [61]:
# Get the second and third columns
matrix[:,1:3]

array([[0, 2],
       [1, 4]])

In [63]:
# Get the element in the second column of the first row
matrix[0, 1]

0

In [66]:
I = np.eye(3, 3)
I

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
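`np.eye` builds an identity matrix: ones on the diagonal and zeros elsewhere. Multiplying a matrix by the identity leaves it unchanged, which the sketch below checks using the `@` matrix-multiplication operator. The 3 x 3 matrix `m` is made-up data for illustration.

```python
import numpy as np

I = np.eye(3, 3)

# A hypothetical 3 x 3 matrix, used only for this example
m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Matrix multiplication by the identity returns the same values
product = I @ m
print(np.array_equal(product, m))  # True
```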

## `pandas`

In [4]:
import pandas as pd
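As a first taste of what `pandas` offers, the sketch below puts the two lists used earlier into a `DataFrame`, a table with named columns. The column names `"a"` and `"b"` are placeholders chosen for this example.

```python
import pandas as pd

a = [4, 0, 2, 2, 0, 10, 7, 8, 5, 0]
b = [4, 1, 4, 3, 9, 2, 0, 0, 5, 4]

# Each dictionary key becomes a column name; each list becomes a column
df = pd.DataFrame({"a": a, "b": b})

print(df.shape)   # (10, 2)
print(df.mean())  # mean of each column
```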

## Matplotlib and seaborn

In [7]:
import matplotlib.pyplot as plt

import seaborn as sns
sns.set()
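A minimal plotting sketch, assuming the same list `a` as before: it draws a bar chart using only `matplotlib`. The axis labels and title are placeholders, and the optional `seaborn` styling is left out to keep the example small.

```python
import matplotlib.pyplot as plt

a = [4, 0, 2, 2, 0, 10, 7, 8, 5, 0]

# Create a figure with a single set of axes
fig, ax = plt.subplots()

# One bar per value, positioned at its index in the list
ax.bar(range(len(a)), a)

# Placeholder labels for this sketch
ax.set_xlabel("Index")
ax.set_ylabel("Value")
ax.set_title("Values in a")
```

In a notebook the figure is typically displayed automatically at the end of the cell; in a plain script you would call `plt.show()`.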