[Pre-MAP Course Website](http://depts.washington.edu/premap/seminar/cohort-16-2020-seminar/) | [Pre-MAP GitHub](https://github.com/UWPreMAP/PreMAP2020)

# Python packages


We do specialized tasks in Python with _packages_. A package is a collection of Python functions that someone wrote and bundled together for you to use. Some of the Python packages that we'll learn to use include: 

| Package | Uses                     |
|---------|--------------------------|
| `numpy` | Math with arrays (more on this below) | 
| `scipy` | A math toolkit built for use by scientists | 
| `matplotlib` | Visualization (plotting!) | 
| `astropy` | Astronomy-specific functions of all kinds | 

## Numpy

Numpy is the most important package that we're going to teach you about, because it allows you to do calculations very quickly with Python. Below, we'll discover why it's useful.

*** 

Let's say you want to take the $\sin$ or $\cos$ of an angle. There are numpy functions that do this for you.

To gain access to numpy's functions, you first need to run this command:
```python
import numpy as np
```
Run the line above in the cell below: 

```python
import numpy as np
```

Now there's a _package_ stored in the variable called `np` that we can access anywhere in this notebook. There are _functions_ for $\sin$ and $\cos$ that live within numpy. To access a function within a package, write the package name (here, `np`), then a period, then the name of the function you want. So for $\sin$, you can do:
```python
np.sin(0)
```
The `np.` part says "give me this function from numpy". The `sin()` part says "the function that I want to use is $\sin$", and the `0` is the angle that we want to take the $\sin$ of, in units of radians. Run that line in the cell below, and experiment with different angles. Try `np.cos` too.

```python
np.sin(1)
np.cos(0)  # only the result of the last line in a cell is displayed
```
```
1.0
```
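Because `np.sin` and `np.cos` expect radians, an angle measured in degrees has to be converted first. Here is a short sketch using numpy's `np.deg2rad` conversion function (not otherwise covered in this lesson); multiplying by `np.pi / 180` by hand does the same thing.

```python
import numpy as np

# np.sin expects radians, so this is the sine of 90 radians, not 90 degrees
print(np.sin(90))

# Convert degrees to radians first: 90 degrees is pi/2 radians
right_angle = np.deg2rad(90)
print(np.sin(right_angle))        # 1.0

# The same conversion written out by hand
print(np.sin(90 * np.pi / 180))   # 1.0
```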

Numpy also has some built-in numbers that you might use. For example, $\pi$ is stored (to high precision!) in `np.pi`. Print out numpy's $\pi$ in the cell below:

```python
np.pi
```
```
3.141592653589793
```
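`np.pi` can be used anywhere you would write $\pi$ in a formula. As a small illustration (the radius is an arbitrary example value), here is the area of a circle, $A = \pi r^2$, along with numpy's built-in constant for Euler's number, `np.e`:

```python
import numpy as np

radius = 2.0                  # an arbitrary example radius
area = np.pi * radius**2      # A = pi * r^2
print(area)                   # about 12.566

# numpy also stores Euler's number e
print(np.e)                   # 2.718281828459045
```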

Now let's say you had a list of angles, like $[0, \pi/2, \pi, 3\pi/2, 2\pi]$, stored in a variable called `angles`. You could call `np.sin(angles[0])` to get the $\sin$ of the first angle, then `np.sin(angles[1])` on the second angle, and so on. But that would be a really slow way to do it!

### Arrays

The quick way is to create a _numpy array_. A numpy array is a vector or matrix of numbers, similar to the built-in Python lists we saw in the last lesson. `numpy` can act on arrays much more efficiently than Python can act on ordinary lists.

Let's make a numpy array filled with the angles above:
```python
# First, here's the list that we want to have an array of: 
angle_list = [0, 1/2 * np.pi, np.pi, 3/2 * np.pi, 2 * np.pi]

# Here's how we make a numpy array out of the list
angle_array = np.array(angle_list)
```
Write out those lines in the cell below. 

```python
angle_list = [0, 1/2 * np.pi, np.pi, 3/2 * np.pi, 2 * np.pi]
angle_array = np.array(angle_list)

np.sin(angle_array)
```
```
array([ 0.0000000e+00,  1.0000000e+00,  1.2246468e-16, -1.0000000e+00,
       -2.4492936e-16])
```

Let's break down the command `np.array(angle_list)`. The `np.` says we're going to use a function from numpy, the `array()` says we're going to make an array out of the thing in the parentheses, and the `angle_list` is the _input_ or the _argument_ of the function. 
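To see what the array saves you, here is the slow, one-angle-at-a-time approach written as a Python loop next to the one-line array version. Both give the same numbers; the array version is shorter and, for large arrays, much faster, because numpy does the looping internally in compiled code.

```python
import numpy as np

angle_list = [0, 1/2 * np.pi, np.pi, 3/2 * np.pi, 2 * np.pi]
angle_array = np.array(angle_list)

# Slow way: call np.sin on one angle at a time and collect the results
sines_loop = []
for angle in angle_list:
    sines_loop.append(np.sin(angle))

# Fast way: hand the whole array to np.sin at once
sines_array = np.sin(angle_array)

print(np.allclose(sines_loop, sines_array))   # True
```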

Now you can do things with the numpy array that you couldn't do with a Python list. Here are some of them, which you should experiment with in the cell below: 
```python
# Sum of all elements in the array:
angle_array.sum()

# Mean of all elements in the array:
angle_array.mean()

# Maximum of the elements in the array:
angle_array.max()

# Minimum of the elements in the array: 
angle_array.min()

# Standard deviation of the elements in the array: 
angle_array.std()
```
Try running each of the above commands one-by-one in the cell below to see what they output.

```python
print('Sum:', angle_array.sum())

print('Mean:', angle_array.mean())
```
```
Sum: 15.707963267948966
Mean: 3.141592653589793
```
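Each of these methods also exists as a function in the `np` package, so `angle_array.sum()` and `np.sum(angle_array)` do the same thing. A quick check, with the expected values worked out from the angles themselves: the sum is $5\pi$, the mean is $\pi$, the maximum is $2\pi$, and the minimum is $0$.

```python
import numpy as np

angle_array = np.array([0, 1/2 * np.pi, np.pi, 3/2 * np.pi, 2 * np.pi])

# Method form and function form give the same answers
print(angle_array.sum(), np.sum(angle_array))    # both 5*pi
print(angle_array.mean(), np.mean(angle_array))  # both pi
print(angle_array.max(), np.max(angle_array))    # both 2*pi
print(angle_array.min(), np.min(angle_array))    # both 0.0
print(angle_array.std(), np.std(angle_array))    # both about 2.22
```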


### Syntax Tips and Help ###
So what happens when you forget the name of a `numpy` function, or how to use a particular function? `Jupyter` has some cool built-in features that you should take advantage of!

For example, if you forgot the name of the `sum` function, you can type `np.` and then press `tab`, and the notebook will list the functions available in `numpy`. Spoiler alert: it's a loooong list, since `numpy` has many functionalities.

The point is that you can use the `tab` key (recall the tab-completion trick) to help you remember or recognize the function you're looking for.

Just as you can read the details of how to use a command in `bash` environments, you can do the same in `Python`. For example, if you forgot how to use the `np.sin` function, you can type `np.sin?` and press `Ctrl`+`Return`; the notebook will open a pane that tells you almost everything you need to know about the function.

Try playing around with `np.` + `tab` and `np.cos?`, or any function with a `?` at the end, below.

```python
np.sin?
```
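The `?` shortcut only works in Jupyter and IPython. In a plain Python script you can get the same documentation with Python's built-in `help` function, and the built-in `dir` function lists every name in a package, which makes a handy search tool, much like tab completion. A small sketch:

```python
import numpy as np

# Prints the documentation for np.sin
help(np.sin)

# List the names in numpy that start with "arc" (the inverse trig functions)
arc_functions = [name for name in dir(np) if name.startswith("arc")]
print(arc_functions)
```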

### Exercise 1: Calculations with arrays

What are the $\sin$ and $\cos$ of each angle? Use the numpy array `angle_array` as the argument to the `np.sin` and `np.cos` functions in the cell below:

```python
print(np.sin(angle_array))

print(np.cos(angle_array))
```
```
[ 0.0000000e+00  1.0000000e+00  1.2246468e-16 -1.0000000e+00
 -2.4492936e-16]
[ 1.0000000e+00  6.1232340e-17 -1.0000000e+00 -1.8369702e-16
  1.0000000e+00]
```


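Now you might be saying: wait a minute, $\sin(\pi) = 0$, not $\approx 1.2 \times 10^{-16}$ (`1.2246468e-16`), what's that about? The short answer is that computers store numbers with finite precision (about 16 significant digits for numpy's default 64-bit floats), so `np.pi` is extremely close to the true $\pi$ but not exactly equal to it, and $\sin$ of it comes out as a tiny number instead of exactly zero. You can shrink this rounding error by telling the computer to use more memory (more bits) for each number, but you can never remove it completely.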
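In practice this means you should avoid testing floating-point results with `==`. A minimal sketch of the usual workaround, numpy's `np.isclose`, which checks whether values agree to within a small tolerance (its default tolerances are fine for this example):

```python
import numpy as np

angle_array = np.array([0, 1/2 * np.pi, np.pi, 3/2 * np.pi, 2 * np.pi])
sines = np.sin(angle_array)

# np.pi is a slightly rounded version of pi, so sin(np.pi) is tiny but not zero
print(sines[2])            # about 1.2e-16

# Exact comparison with == fails because of the rounding error
print(sines[2] == 0)       # False

# np.isclose compares within a small tolerance instead
print(np.isclose(sines, [0, 1, 0, -1, 0]))   # all True

# Rounding is a simple way to tidy up printed results
print(np.round(sines, 10))
```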

***

## Array arithmetic

There are lots of situations where you'll want to create a certain kind of array, and numpy has functions to help. 

You can make an array of consecutive integers from zero to nine with the function `np.arange`:
```python
consecutive_integers = np.arange(10)
```
This function returns different things depending on the number of _arguments_ that you give it. If there's one number in the parentheses, it tells `np.arange` what number to stop at (exclusive), starting from zero. If there are two numbers, they signify `np.arange(start, stop)`. If there are three numbers, they signify `np.arange(start, stop, step)`. For example, `np.arange(1, 9, 2)` would start at 1, stop before 9, with a step size of 2, so it would return `[1, 3, 5, 7]`.
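Here are the one-, two-, and three-argument forms side by side, plus a negative step for counting down:

```python
import numpy as np

# One argument: stop (exclusive), starting from 0
print(np.arange(5))          # [0 1 2 3 4]

# Two arguments: start, stop
print(np.arange(2, 6))       # [2 3 4 5]

# Three arguments: start, stop, step
print(np.arange(1, 9, 2))    # [1 3 5 7]

# The step can be negative to count down
print(np.arange(10, 0, -3))  # [10  7  4  1]
```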

In the cell below, make an array with 10,000 sequential integers, starting with zero. Save the array into a variable called `consecutive_integers`, and print it out:

```python
# Experimenting with np.arange (only the last expression in a cell is displayed)
np.arange(0, 50, 2)
np.arange(1, 9, 2)

# Creating an array of integers from 0 to 9999 in steps of 1
consecutive_integers = np.arange(10000)
# The same array as above using all three arguments
consecutive_integers = np.arange(0, 10000, 1)

consecutive_integers
```
```
array([   0,    1,    2, ..., 9997, 9998, 9999])
```

You'll see that numpy is polite. It knows you probably don't want to see all ten thousand integers, so it prints just the beginning and end of the array.

### Exercise 2: Indexing/slicing numpy arrays
The same indexing and slicing rules that we learned for lists work on arrays. Keep in mind that arrays are indexed starting with 0, just like Python lists, and when you give a range of indices, the last one listed is not included. In the cell below, print the 42nd element of the `consecutive_integers` array, and print the 101st through 103rd elements (inclusive) of the array:

```python
print(consecutive_integers)
# The 42nd element has index 41, because indexing starts at 0
print(consecutive_integers[41])

# Indices 100, 101, and 102 are the 101st through 103rd elements
consecutive_integers[100:103]
```
```
[   0    1    2 ... 9997 9998 9999]
41
array([100, 101, 102])
```
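A few more of the list rules carry over directly. This sketch rebuilds `consecutive_integers` with 10,000 elements and shows negative indices, open-ended slices, and slices with a step:

```python
import numpy as np

consecutive_integers = np.arange(10000)

# Negative indices count from the end, just like lists
print(consecutive_integers[-1])         # 9999

# A slice with a step: start at 100, stop before 110, take every 3rd element
print(consecutive_integers[100:110:3])  # [100 103 106 109]

# Leaving out start or stop means "from the beginning" or "to the end"
print(consecutive_integers[:3])         # [0 1 2]
print(consecutive_integers[-3:])        # [9997 9998 9999]
```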

Unlike lists, numpy arrays support element-by-element arithmetic. For example, if you had the following list:
```python
heights = [162, 185, 174, 191]
```
and you tried to add one to the list, this is what you would get: 
```python
print(heights + 1)
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-ca23c21090bd> in <module>()
      1 heights = [162, 185, 174, 191]
----> 2 print(heights + 1)

TypeError: can only concatenate list (not "int") to list
```
However, this does work if `heights` is a numpy array. 

### Exercise 3: Array arithmetic

In the cell below: 
1. Create a variable called `heights_array`, which contains a numpy array of `heights` (using the `np.array` function we learned above)
2. Try adding to, multiplying, subtracting from, dividing, and exponentiating the array (for example, `heights_array + 1` or `heights_array ** 2`)

```python
heights_list  = [60, 63, 65, 70, 73, 71, 65]
heights_array = np.array(heights_list)

heights_array
```
```
array([60, 63, 65, 70, 73, 71, 65])
```
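For reference, here is what a few of these operations look like on the `heights` list from the text (read here as centimeters, which is an assumption). Each operation is applied to every element separately; the `growth` array is a made-up example:

```python
import numpy as np

heights = [162, 185, 174, 191]
heights_array = np.array(heights)

print(heights_array + 1)      # [163 186 175 192]
print(heights_array - 100)    # [62 85 74 91]
print(heights_array * 2)      # [324 370 348 382]
print(heights_array / 100)    # centimeters to meters: [1.62 1.85 1.74 1.91]
print(heights_array ** 2)     # each height squared

# Two arrays of the same length combine element by element
growth = np.array([1, 0, 2, 1])   # hypothetical extra centimeters
print(heights_array + growth)     # [163 185 176 192]
```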

### Exercise 4: Inequalities
You can also evaluate inequalities with whole arrays at once. Find which values of the array above are greater than 180: 

```python
heights_boolean_array = heights_array > 70

very_tall = heights_array[heights_boolean_array]
very_tall
```
```
array([73, 71])
```

Notice that numpy arrays don't have to contain numbers (floats and integers). They can hold _booleans_, and other things too!

A boolean is a special type with the value `True` or `False`.
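A boolean array behaves like any other numpy array. Because `True` counts as 1 and `False` as 0 in arithmetic, summing a boolean array counts how many elements passed the test. A sketch using the `heights` list from the text:

```python
import numpy as np

heights_array = np.array([162, 185, 174, 191])

is_tall = heights_array > 180
print(is_tall)          # [False  True False  True]
print(is_tall.dtype)    # bool

# True counts as 1 and False as 0, so the sum counts the tall heights
print(is_tall.sum())    # 2
```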

### Exercise 5: Fancy indexing 

We now want to print only the heights in the array that are greater than 180. Given what you know so far, print the heights greater than 180 by accessing them with their indices, for example with something like:

```python
print(heights_array[  ])
                    ^^
             put an index here
```

```python
print(heights_array)

heights_gt_63 = heights_array > 63
print(heights_gt_63)

print(heights_array[heights_gt_63])
```
```
[60 63 65 70 73 71 65]
[False False  True  True  True  True  True]
[65 70 73 71 65]
```


In Exercise 4, you found that you can figure out which numbers in the array were greater than 180 all at once. It turns out that if you save that array of booleans:

```python
heights_gt_180 = heights_array > 180
```

You can use `heights_gt_180` like a group of indices on `heights_array` to get just the heights where `heights_gt_180` is `True`. This is called _boolean indexing_ (or _masking_).

In the cell below, try:
```python
heights_gt_180 = heights_array > 180
print(heights_array[heights_gt_180])
```
Did it print out the right heights? What if you flip the greater than to a less than? What if you try `==` instead?

```python
heights_eq_63 = heights_array == 63
heights_eq_63

print(heights_array[heights_eq_63])
```
```
[63]
```
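Conditions can also be combined. For element-by-element "and" and "or" on boolean arrays, numpy uses `&` and `|` (not Python's `and`/`or` keywords), and each comparison needs its own parentheses. A sketch with the `heights` list from the text:

```python
import numpy as np

heights_array = np.array([162, 185, 174, 191])

# Heights between 170 and 190: both conditions must be True
in_range = (heights_array > 170) & (heights_array < 190)
print(heights_array[in_range])     # [185 174]

# Heights below 170 OR above 190: either condition can be True
extremes = (heights_array < 170) | (heights_array > 190)
print(heights_array[extremes])     # [162 191]

# ~ flips True and False
print(heights_array[~in_range])    # [162 191]
```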


## Putting it all together

Using all of these skills together, let's do something that we couldn't do easily with a scientific calculator. Let's find the sum of all of the positive, even numbers less than 10,000. (Starting the array at 0 is harmless, since adding 0 doesn't change the sum.)

We'll do that in a few steps below: 

```python
even_int = np.arange(0, 10000, 2)
print(even_int)
```
```
[   0    2    4 ... 9994 9996 9998]
```


```python
even_int.sum()
```
```
24995000
```

```python
total = np.sum(even_int)
print(total)
```
```
24995000
```


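```python
# Even integers from -10000 up to (but not including) 10000
integers = np.arange(-10000, 10000, 2)
integers

# Starting at index 1, take every third element
every_other = integers[1::3]
every_other
```
```
array([-9998, -9992, -9986, ...,  9982,  9988,  9994])
```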

```python
# An alternative way to find the even numbers
numbers = np.arange(10000)
print(numbers % 2)         # remainder after dividing by 2: 0 for even, 1 for odd
print(numbers % 2 == 0)    # True where the number is even
evens = numbers % 2 == 0
print(np.sum(numbers[evens]))
```
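Because there is a formula for this sum, it makes a good sanity check on the numpy answer. The positive even numbers below 10,000 are $2, 4, \ldots, 9998$, that is, $2k$ for $k = 1, \ldots, 4999$, so

$$\sum_{k=1}^{4999} 2k = 2 \cdot \frac{4999 \cdot 5000}{2} = 4999 \cdot 5000 = 24{,}995{,}000.$$

The sketch below compares that formula, Python's built-in `sum` over a `range`, and numpy:

```python
import numpy as np

n = 4999                            # number of positive even numbers below 10,000
formula = n * (n + 1)               # 2 + 4 + ... + 2n = n(n+1)
builtin = sum(range(0, 10000, 2))   # plain Python, no numpy
with_numpy = np.arange(0, 10000, 2).sum()

print(formula, builtin, with_numpy)   # 24995000 24995000 24995000
```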

***

### Exercise 6

Using the above steps as a template, figure out the sum of the **odd** numbers less than **100,000**:

```python
odd_int = np.arange(1, 100000, 2)
odd_int.sum()

np.sum(odd_int)
```
```
2500000000
```
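The odd numbers have a tidy formula too: the first $n$ odd numbers add up to $n^2$. There are $50{,}000$ odd numbers below $100{,}000$ ($1, 3, \ldots, 99999$), so the sum should be $50{,}000^2 = 2{,}500{,}000{,}000$, matching the result above. The sketch below checks this using the `%` trick from the alternative method. The `dtype=np.int64` argument asks numpy to add in 64-bit integers; on some setups (notably Windows with older numpy versions) the default integer is 32 bits, which cannot hold 2.5 billion.

```python
import numpy as np

numbers = np.arange(100000)
odds = numbers[numbers % 2 == 1]      # keep only the odd numbers

n = len(odds)                         # 50000
print(n**2)                           # 2500000000
print(np.sum(odds, dtype=np.int64))   # 2500000000
```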