In [1]:
%pip install numpy



## Grouping Together Your Data into a Collection
Python also has syntax for collecting related data together. Much of this course revolves around the pros and cons of different ways of collecting data, but for now, let's take a quick look at the main ones:

| "tuple" (fixed sequence) | "list" (changeable sequence) | "str" (sequence of text characters) | "set" (mathematical set) |
| :---------: | :---------: | :---------: | :---------: |
| (1, 2, 3) | [1, 2, 3] | "123" or '123' | {1, 2, 3} |
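A minimal sketch of how these collections behave differently in practice: lists can be changed after creation, tuples cannot, strings can be indexed like other sequences, and sets keep only unique values. The variable names are illustrative only.

```python
# Lists are changeable: items can be replaced or appended
numbers_list = [1, 2, 3]
numbers_list[0] = 10
numbers_list.append(4)
print(numbers_list)  # [10, 2, 3, 4]

# Tuples are fixed: trying to change an item raises a TypeError
numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 10
except TypeError as err:
    print("tuple is immutable:", err)

# Strings are sequences of characters, so they can be indexed too
text = "123"
print(text[0])  # 1

# Sets drop duplicate values (and have no guaranteed order)
numbers_set = {1, 2, 2, 3, 3, 3}
print(sorted(numbers_set))  # [1, 2, 3]
```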

#### Group Exercises: Making Collections

Example: Make a **list** of the first three positive even numbers

In [1]:
[2, 4, 6]

[2, 4, 6]

Make a **tuple** of the first three odd numbers.

In [1]:
(1, 3, 5)

(1, 3, 5)

Make a **list** containing the names of two countries in Europe.

In [2]:
["Germany", "Italy"]

['Germany', 'Italy']

Make a **set** of three animals.

In [4]:
{'Dog', 'Hamster', 'Cat'}

{'Cat', 'Dog', 'Hamster'}

Make a **tuple** of the two possible *bool* values.

In [5]:
(True, False)

(True, False)

## Statistics Functions from Numpy

**Numpy** is a Python package that, among other things, has many useful statistics **functions**. These functions accept any array-like object (for example a list, tuple, or array) as input, and they live in the `numpy` package, which is conventionally imported as **np**. Some functionality is available both as a Numpy function and as an array method, so you can choose whichever style you prefer.


```python
>>> import numpy as np
>>> np.mean([1, 2, 3, 4])
2.5

>>> np.ptp([1, 2, 3, 4])
3
```
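To illustrate the function-versus-method choice mentioned above, here is a small sketch computing the same mean both ways. The array `data` is made up for the example; `np.ptp` ("peak to peak") is the maximum minus the minimum.

```python
import numpy as np

data = np.array([1, 2, 3, 4])

# The same statistic, computed as a Numpy function and as an array method
print(np.mean(data))  # 2.5
print(data.mean())    # 2.5

# The function form also accepts a plain Python list directly
print(np.mean([1, 2, 3, 4]))  # 2.5

# Peak to peak: max minus min
print(np.ptp(data))  # 3
```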

Lists of Numpy's math and statistics functions can be found here:
  - Math:  https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html
  - Statistics: https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.statistics.html


Some useful function names: mean, min, sum, max, var, std, ptp, median, nanmedian, nanmax, nanmean, nanmin

We'll see more later.
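The `nan`-prefixed functions in the list above skip missing values (`np.nan`) instead of letting them spread into the result. A short sketch with made-up readings:

```python
import numpy as np

# Made-up measurements with one missing value
readings = np.array([4.0, np.nan, 6.0, 8.0])

print(np.mean(readings))       # nan, because a single NaN makes the mean NaN
print(np.nanmean(readings))    # 6.0, the mean of 4, 6 and 8
print(np.nanmax(readings))     # 8.0
print(np.nanmedian(readings))  # 6.0
```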

**Exercise**: Using only Numpy functions, calculate the statistics on the following numbers:

In [4]:
import numpy as np
data = [2, 8, 5, 9, 2, 4, 6]
data

[2, 8, 5, 9, 2, 4, 6]

**Example**: What is the maximum of the data?

In [5]:
np.max(data)

9

What is the mean of the data?

In [6]:
np.mean(data)

5.142857142857143

What is the sum of the data?

In [7]:
np.sum(data)

36

What is the minimum of the data?

In [8]:
np.min(data)

2

The variance?

In [9]:
np.var(data)

6.408163265306122

The standard deviation?

In [10]:
np.std(data)

2.531435020952764

The data's median?

In [11]:
np.median(data)

5.0

# Arrays in Numpy

**Numpy** has a very useful data collection: the **array**. Arrays are very similar to lists, with one key difference:
  - **all elements in the array must be of the same data type (e.g. int, float, bool)**
  
Despite that limitation, arrays are extremely useful for data analysis, and we'll be taking advantage of their many features throughout the course. So let's start by learning how to easily generate different patterns of data with arrays!
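What happens when you hand `np.array()` a list of mixed types? As a sketch of the "same data type" rule, Numpy converts everything to one common type rather than refusing. The exact integer type (e.g. `int64`) can vary by platform, so it is not printed as a fixed value here.

```python
import numpy as np

ints = np.array([1, 2, 3])
print(ints.dtype)  # an integer type, e.g. int64

# Mixing ints and floats: the ints are converted to floats
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Mixing numbers and text: everything is converted to text
as_text = np.array([1, "two"])
print(as_text.dtype.kind)  # U (Unicode string)
```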

### Building Arrays

Let's generate some arrays using Numpy functions! Some commonly used examples are **arange()**, **linspace()**, **zeros()**, and the random number generation functions in **random**.

| Function | Purpose | Example |
| :-----------: | :-------------: | :-------------: |
| **np.array()** | Turns a list into an array | np.array([2, 5, 3]) |
| **np.arange()** | Makes an array of the integers from a start value up to, but not including, a stop value | np.arange(2, 7) |
| **np.arange()** | Makes an array of values from a start value up to, but not including, a stop value, with a given step size | np.arange(2, 7, 0.3) |
| **np.linspace()** | Makes an array with a given number of evenly spaced values between two endpoints (both included) | np.linspace(2, 3, 10) |
| **np.zeros()** | Makes an array of all zeros | np.zeros(5) |
| **np.ones()** | Makes an array of all ones | np.ones(3) |
| **np.random.random()** | Makes an array of uniformly distributed random numbers in [0, 1) | np.random.random(100) |
| **np.random.randn()** | Makes an array of normally distributed random numbers | np.random.randn(100) |
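The most common surprise in this table is where the arrays stop. A quick sketch comparing the endpoint behavior of `np.arange` and `np.linspace`, plus the default type of `np.zeros`:

```python
import numpy as np

# arange stops *before* the stop value
stop_excluded = np.arange(2, 7)
print(stop_excluded)  # [2 3 4 5 6]

# linspace includes both endpoints and returns exactly the requested count
both_ends = np.linspace(2, 3, 5)
print(both_ends)  # [2.   2.25 2.5  2.75 3.  ]

# zeros (and ones) return floats by default
all_zeros = np.zeros(3)
print(all_zeros.dtype)  # float64
```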


#### Exercises

Import the numpy package as `np`:

In [2]:
import numpy as np

Turn this list into an array:

In [3]:
[4, 7, 6, 1]

[4, 7, 6, 1]

In [4]:
np.array([4, 7, 6, 1])

array([4, 7, 6, 1])

Make an array containing the integers from 1 to 15.

In [14]:
np.arange(1, 16)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

Make an array of the values from 2 to 6, spaced 0.5 from each other.

In [16]:
np.arange(2, 6.1, 0.5)

array([2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ])
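Because `np.arange` leaves out the stop value, the cell above uses 6.1 as the stop so that 6 is included. With a fractional step, floating-point rounding can make it hard to predict whether the last value sneaks in, so asking `np.linspace` for an explicit count (here 9 values) is often the safer way to get the same result:

```python
import numpy as np

by_step = np.arange(2, 6.1, 0.5)  # step-based: stop nudged past 6 so 6 is included
by_count = np.linspace(2, 6, 9)   # count-based: 9 values from 2 to 6, endpoints included

print(np.allclose(by_step, by_count))  # True
```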

Make an array of only 6 evenly spaced numbers from 1 to 10.

In [17]:
np.linspace(1, 10, 6)

array([ 1. ,  2.8,  4.6,  6.4,  8.2, 10. ])

How about an array of 10 evenly-spaced values between 100 and 1000?

In [25]:
np.linspace(100, 1000, 10)

array([ 100.,  200.,  300.,  400.,  500.,  600.,  700.,  800.,  900.,
       1000.])

Turn this list into an array:

In [21]:
a = [True, False, False, True]
a

[True, False, False, True]

In [20]:
np.array(a)

array([ True, False, False,  True])

Make an array containing 20 zeros.

In [22]:
np.zeros(20)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])

Make an array containing 20 ones!

In [24]:
np.ones(20)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1.])

Generate an array of 10 random numbers.

In [26]:
np.random.random(10)

array([0.00260908, 0.63026806, 0.21361039, 0.5251968 , 0.13778592,
       0.974034  , 0.44345238, 0.40914629, 0.38656548, 0.27392163])

### Combining array generation with statistics functions

These exercises all involve two steps:
  1. Make the data
  2. Calculate something on the data

For example:
```python
np.mean(np.arange(1, 10))  # the mean of the integers from 1 to 9
```

**Exercises**

*Example*: What is the standard deviation of the integers between 2 and 20?

In [27]:
np.std(np.arange(2, 21))

5.477225575051661

Is it the same as the standard deviation of the integers from 0 to 10?

In [28]:
np.std(np.arange(11))

3.1622776601683795
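The two answers differ because spread grows with the range of the data. For n consecutive integers, `np.std` (which divides by n by default) equals sqrt((n² − 1) / 12). A quick check of that formula against both cells above; `consecutive_std` is a hypothetical helper written for this example:

```python
import numpy as np

def consecutive_std(n):
    """Population standard deviation of n consecutive integers: sqrt((n**2 - 1) / 12)."""
    return np.sqrt((n**2 - 1) / 12)

# 2..20 is 19 integers; 0..10 is 11 integers
print(np.isclose(np.std(np.arange(2, 21)), consecutive_std(19)))  # True
print(np.isclose(np.std(np.arange(11)), consecutive_std(11)))     # True
```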

What is the standard deviation of 100 numbers generated by the `np.random.randn()` function?

In [29]:
np.std(np.random.randn(100))

1.0204318052571688
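`np.random.randn()` draws from the standard normal distribution (mean 0, standard deviation 1), so the answer should land near 1, and it tends to drift less from 1 as the sample grows. Your exact value will differ on every run. The sketch below swaps in NumPy's newer `np.random.default_rng` generator (not what the cell above uses), with an arbitrary seed so the results are repeatable:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for repeatable results

# Standard deviation of standard-normal samples of increasing size
for size in [10, 100, 10_000]:
    sample = rng.standard_normal(size)
    print(size, np.std(sample))
```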

What is the sum of an array of 100 ones?

In [32]:
np.sum(np.ones(100))

100.0

What is the sum of an array of 100 zeros?

In [34]:
np.sum(np.zeros(100))

0.0

### Array Broadcasting: Combining Arrays with Operators

Remember our math operators?  We can use them on arrays, too!

| Assign to a Variable | Add | Subtract | Multiply | Divide | Power | Integer Divide | Remainder after Division |
| :---------------: | :---: | :-----: | :------: | :----: | :---: | :------------: | :----------------------: |
| = | + | - | * | / | ** | // | % |

Numpy also has functions that can transform each value in an array using a math operation.  For example: `np.log()`, `np.abs()`, `np.sin()`, `np.cos()`, `np.tan()`, `np.sqrt()`
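A sketch of how these operators act on arrays: two arrays of the same shape are combined position by position, and a single number is combined with every element (the simplest case of broadcasting). Arrays whose shapes cannot be matched up raise a `ValueError`.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print(a + b)   # [11 22 33 44]   element by element
print(b // a)  # [10 10 10 10]
print(a ** 2)  # [ 1  4  9 16]   the scalar 2 is applied to every element
print(b % 3)   # [1 2 0 1]

# Shapes (4,) and (2,) cannot be combined element by element
short = np.array([1, 2])
try:
    a + short
except ValueError as err:
    print("shapes do not match:", err)
```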

**Exercises**

Example: Add 10 to all of the numbers below

In [8]:
x = np.array([1, 3, 6, 8, 10])
x + 10

array([11, 13, 16, 18, 20])

Multiply everything in the array below by 10

In [None]:
np.array([1, 3, 6, 8, 10])

In [36]:
x = np.array([1, 3, 6, 8, 10])
x * 10

array([ 10,  30,  60,  80, 100])
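Note that this only works because `x` is an array. With a plain Python list, `*` means "repeat the list" rather than "multiply each element", as this small sketch shows:

```python
import numpy as np

values = [1, 3, 6]

# With a plain list, * repeats the list
print(values * 2)  # [1, 3, 6, 1, 3, 6]

# With an array, * multiplies every element
print(np.array(values) * 2)  # [ 2  6 12]
```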

Multiply all the numbers from 1 to 100 by 1000

In [38]:
np.arange(1, 101) * 1000

array([  1000,   2000,   3000,   4000,   5000,   6000,   7000,   8000,
         9000,  10000,  11000,  12000,  13000,  14000,  15000,  16000,
        17000,  18000,  19000,  20000,  21000,  22000,  23000,  24000,
        25000,  26000,  27000,  28000,  29000,  30000,  31000,  32000,
        33000,  34000,  35000,  36000,  37000,  38000,  39000,  40000,
        41000,  42000,  43000,  44000,  45000,  46000,  47000,  48000,
        49000,  50000,  51000,  52000,  53000,  54000,  55000,  56000,
        57000,  58000,  59000,  60000,  61000,  62000,  63000,  64000,
        65000,  66000,  67000,  68000,  69000,  70000,  71000,  72000,
        73000,  74000,  75000,  76000,  77000,  78000,  79000,  80000,
        81000,  82000,  83000,  84000,  85000,  86000,  87000,  88000,
        89000,  90000,  91000,  92000,  93000,  94000,  95000,  96000,
        97000,  98000,  99000, 100000])

**Example**: Calculate the absolute value of the following data

In [40]:
x = np.array([-5, 7, -2, 4, -10])
np.abs(x)

array([ 5,  7,  2,  4, 10])

Calculate the cosine of all integers from 0 to 6

In [42]:
np.cos(np.arange(7))

array([ 1.        ,  0.54030231, -0.41614684, -0.9899925 , -0.65364362,
        0.28366219,  0.96017029])

Calculate the square root of 10 uniformly distributed random numbers between 1 and 5 (tip: `np.random.uniform`)

In [45]:
np.sqrt(np.random.uniform(1, 5, 10))

array([1.75104708, 1.92066842, 2.16665447, 2.15086798, 1.226737  ,
       2.12393023, 2.22621496, 1.17422232, 1.84649746, 2.10125579])
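Since the inputs lie between 1 and 5 and the square root only increases, every result should fall between sqrt(1) = 1 and sqrt(5) ≈ 2.236, which matches the output above. A sketch of that sanity check, using NumPy's `default_rng` generator with an arbitrary seed (an assumption for repeatability, not the function used in the cell above):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for repeatable output

samples = rng.uniform(1, 5, 10)  # 10 draws from [1, 5)
roots = np.sqrt(samples)

# Every root should fall in [1, sqrt(5))
print(np.all((roots >= 1) & (roots < np.sqrt(5))))  # True
```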