# Numpy part 1 - working with arrays

## Prerequisites
- Python basics part 1
- Python basics part 2

## Learning objectives 
- Import the numpy package
- Create data arrays from lists
- Generate data arrays using built-in numpy functions
- Generate arrays of random numbers
- Select and slice portions of data arrays

## References
- https://wesmckinney.com/book/numpy-basics (primary)
- https://numpy.org/doc/stable/user/absolute_beginners.html (secondary)

### Import the numpy package
- A Python package bundles "functions" that help you carry out more advanced computations, e.g., machine learning algorithms.
- We start from the most basic (yet most popular) package: `numpy` -- array and matrix operations (more on this later)
    - For those who know MATLAB (no worries at all if you don't), this package gives functionality comparable to MATLAB.
- How do we import the package (assuming it is already installed)?
    - Simple: `import numpy as np`. What does it mean?
    - It means: (i) I want to import `numpy`; (ii) I want to refer to the package as `np` -- so that every time I use this package, I only need to write `np.` (don't forget the dot after `np`).
    - Import numpy by executing the code block below

In [None]:
import numpy as np

#### Create an array using numpy
- The array is a powerful tool in numpy. It can be a vector (one-dimensional), a matrix (two-dimensional), or even a higher-dimensional object.
- Create an array: `np.array()`. What does the dot mean?
    - In my own words: "from the package numpy (np), we use a function called array".
- Let's create a one-dimensional array -- Try both on your own. 
    - We can do it by using a (numeric) list -- recall from our last Python note -- my preferred way.
    - `list_1 = [ 1, 20, 4.1, 5]`
    - `array_1 = np.array(list_1)`
    - Or we can create it directly.
    - `array_1 = np.array([1, 20, 4.1, 5])`

In [None]:
list_1 = [ 1, 20, 4.1, 5]
array_1 = np.array(list_1)
print(array_1)

#### Create a multi-dimensional array from lists
- Verify in the code block below that both of the following give you an array with two rows and four columns
- `list_1 = [ 1, 20, 4.1, 5]`
- `list_2 = [3, 4.1, 2, 8]`
- `array_2 = np.array([list_1, list_2])` or `array_2 = np.array([[ 1, 20, 4.1, 5], [3, 4.1, 2, 8]])`
  - Note 1: you must put '`[]`' around list_1, list_2.
  - Note 2: your lists must be the same length. Otherwise, recent versions of numpy raise an error.

In [None]:
list_1 = [1, 20, 4.1, 5]
list_2 = [3, 4.1, 2, 8]
array_2 = np.array([list_1, list_2]) 
print(array_2)
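
As a quick check of the claim above, the sketch below builds the array both ways and compares the results. The names `array_a` and `array_b` are only illustrative, and `.shape` (covered later in this note) is used to confirm the two rows and four columns.

```python
import numpy as np

list_1 = [1, 20, 4.1, 5]
list_2 = [3, 4.1, 2, 8]

# Build the same two-row array from named lists and from a nested literal
array_a = np.array([list_1, list_2])
array_b = np.array([[1, 20, 4.1, 5], [3, 4.1, 2, 8]])

print(array_a.shape)                     # (2, 4)
print(np.array_equal(array_a, array_b))  # True
```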

### Create arrays using built-in numpy functions 

#### Create arrays with zeros -- `np.zeros()` 
- What if you want to create an array like `[0,0,0,0,0,0,0,0,0,0]`?
- The function `np.zeros(10)` generates an array of 10 zeros -- note it creates `0.` (dot) indicating it is a float number.
- What's more: the function `np.zeros([10, 2])` generates an array with 10 rows and two columns, all zeros.

In [None]:
np.zeros(10)
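
A minimal sketch of the two calls described above, using illustrative variable names. It checks that the zeros are stored as floats and that the two-argument form gives 10 rows and 2 columns.

```python
import numpy as np

zeros_1d = np.zeros(10)       # ten zeros in a one-dimensional array
zeros_2d = np.zeros([10, 2])  # 10 rows, 2 columns, all zeros

print(zeros_1d.dtype)  # float64
print(zeros_2d.shape)  # (10, 2)
```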

#### Create arrays with ones -- `np.ones()`
- What if you want to create an array like `[1,1,1,1,1,1,1,1,1,1]`?
- The function `np.ones(10)` generates an array of 10 ones. -- try `np.ones([10, 2])` too.

In [None]:
np.ones(10)

#### Create arrays with twos -- np.twos() ???
- NO.
- A small twist: we can do it with `np.ones()`: try `np.ones(10)*2` -- same logic goes for other numbers. 

In [None]:
np.ones(10)*2
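
As an aside that is not part of the original note: numpy also provides `np.full()`, which fills an array with a value of your choice. The sketch below assumes you want floats, so it passes `2.0`, and compares the result with the `np.ones()` trick.

```python
import numpy as np

twos_scaled = np.ones(10) * 2  # the trick from this note
twos_full = np.full(10, 2.0)   # alternative: fill 10 slots with 2.0

print(np.array_equal(twos_scaled, twos_full))  # True
```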

#### Create (slightly more complicated) arrays using `np.arange()`
- What if you want to create an array like `[0,1,2,3,4,5,6,7,8,9]`?
- Numpy has a built-in function for you: `np.arange(0,10,1)`. What does it mean?
    - (i) it creates an array starting with _0_
    - (ii) it terminates __before__ (!) 10 
    - (iii) it increases by _1_.
- Try this below. 

In [None]:
np.arange(0,10,1)

- Try `np.arange(0,10,0.1)` -- increment is 0.1
- Try `np.arange(1,10,1)` -- starting from 1 with increment by 1

#### Some additional notes about `np.arange()`
- `np.arange(10)` is short for `np.arange(0,10,1)`
    - because 0 is the default setting for the first argument, and 1 is the default setting for the third argument.
- `np.arange(1, 10)` is short for `np.arange(1,10,1)`
    - again, 1 is the default setting for the third argument.
- It depends on your preferences and taste, but omitting arguments is not my preferred way of coding.
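
The short forms above can be checked against the fully written call. This is a small sketch with illustrative variable names:

```python
import numpy as np

full_call = np.arange(0, 10, 1)  # start, stop (excluded), step
stop_only = np.arange(10)        # start defaults to 0, step defaults to 1
start_stop = np.arange(1, 10)    # step defaults to 1

print(full_call)                             # [0 1 2 3 4 5 6 7 8 9]
print(np.array_equal(full_call, stop_only))  # True
print(start_stop)                            # [1 2 3 4 5 6 7 8 9]
```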

### Random numbers generators
- numpy includes a module "random" that we can call with `np.random`.
- The `np.random` module includes (not limited to) the following functions to generate random numbers:
    - `np.random.rand()` returns a random number drawn uniformly between 0 and 1.
    - `np.random.randint()` returns a random _integer_ uniformly drawn between two bounds (details below).
    - `np.random.random()` returns a (pseudo) random number drawn from a uniform distribution ranging between 0 and 1
    - `np.random.normal()` returns random numbers drawn from a normal distribution
    - `np.random.standard_normal()` returns random numbers drawn from a _standard_ normal distribution

#### Create random numbers and arrays of random numbers -- `np.random.randint()`
- The function `np.random.randint(10)` returns a random integer drawn uniformly from __0__ up to, but not including, 10.  -- __Note__: The maximum possible number returned is 9 (not 10)
- You can specify the lower bound (first argument/input) and upper bound (second argument)
    - the function `np.random.randint(1,high=7)` returns a random integer uniformly drawn between 1 and 7 -- The highest integer returned is 6 (not 7).
- You can generate an array instead of a single number:
    - the function `np.random.randint(1,high=7, size=(5,3))` returns an array with 5 rows and 3 columns, each element a random integer from 1 to 6.
    - experiment with creating arrays of random integers of varying sizes.  Confirm that the 'high' values are never drawn.
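
One way to run the experiment suggested above is sketched below. The variable names and the sample size of 10,000 are arbitrary choices; because the draws are random, the printed values change from run to run, but the bounds always hold.

```python
import numpy as np

single = np.random.randint(10)                    # one integer from 0 to 9
dice = np.random.randint(1, high=7, size=(5, 3))  # 5 rows, 3 columns, values 1 to 6
print(single)
print(dice)

# Draw many values and confirm the upper bound 7 never appears
rolls = np.random.randint(1, high=7, size=10000)
print(rolls.min(), rolls.max())  # never below 1, never above 6
```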

#### Create arrays of random numbers from uniform and normal distributions - `np.random.random`, `np.random.standard_normal`, `np.random.normal`
- Recall that the function `np.random.random()` returns a (pseudo)random number drawn from a uniform distribution ranging between 0 and 1
    - `np.random.random(size=(4, 3))` returns an array with 4 rows and 3 columns, each with a random number drawn from a uniform distribution ranging between 0 and 1
- `np.random.standard_normal(size=(2,2))` returns an array with 2 rows and 2 columns, each with a random number drawn from a standard normal distribution.
    - _Recall_: a standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
- `np.random.normal(loc=4, scale=5, size=(3, 5))` returns an array with 3 rows and 5 columns,
    - each with a random number drawn from a normal distribution with a mean of 4 and a standard deviation of 5.
- Experiment with creating arrays of different sizes with random numbers drawn from uniform and normal distributions.  If you want only one number, do not specify the size.
- Execute `np.random?` (question mark included) in the code block below to learn more about the different distributions you could choose.
    - Hint: the trailing question mark `?` is a feature of IPython and Jupyter notebooks (not of plain Python) that works like asking "What does xxx mean?" In this case xxx refers to `np.random`
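
The calls described above can be tried together, as in the sketch below. The variable names and the large sample used to check the mean and standard deviation are illustrative additions; the exact numbers differ on every run.

```python
import numpy as np

uniform = np.random.random(size=(4, 3))                 # uniform between 0 and 1
std_normal = np.random.standard_normal(size=(2, 2))     # mean 0, standard deviation 1
normal = np.random.normal(loc=4, scale=5, size=(3, 5))  # mean 4, standard deviation 5
one_number = np.random.normal(loc=4, scale=5)           # no size: a single number

print(uniform.shape, std_normal.shape, normal.shape)  # (4, 3) (2, 2) (3, 5)

# With many draws, the sample mean and standard deviation approach loc and scale
big_sample = np.random.normal(loc=4, scale=5, size=100000)
print(big_sample.mean(), big_sample.std())  # close to 4 and 5
```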

### Size and shape of numpy arrays

In [None]:
# run this code block to create the example array called 'sta'
# notice the array prints below the code block
sta = np.array([[1, 2, 3, 4], [7, 8, 9, 10], [4, 5, 6, 12]])
print(sta)

- The numpy array object `sta` carries a few _attributes_. The following code helps you find these attributes:
- `sta` has the attribute 'size':  `sta.size` returns the number of elements in the array.
    - This returns the number of elements without regard to the number of rows and columns
- `sta` also has an attribute of 'shape'. `sta.shape` returns a tuple with the number of rows and columns of the array sta   
    - Aside: the built-in function `len(sta)` returns the number of rows in the array, not the number of elements.
- In the code block below, experiment with .size, .shape, and len()
- create some new arrays with different names and check sizes and shapes 

In [None]:
sta.size

In [None]:
sta.shape

In [None]:
len(sta)

### Selecting, indexing, and slicing numpy arrays
#### Selecting ranges of arrays
- Use square brackets `[]` when selecting elements of an array
- Use a `:` to indicate a range of values.  A `:` on its own means take a whole row or column.
- The first row or column is indexed with a '0', the last row or column is indexed with '-1'
- `sta[0,0]` returns the element in the first row and first column of sta
- `sta[0,]` and `sta[0,:]` both return the first row of sta as a one-dimensional array; `sta[0:1]` returns the same values but keeps them as a two-dimensional array with one row.
- `sta[0:2]`, `sta[0:2,]`, `sta[0:2,:]` all return the first _two_ rows and all columns of sta.
- `sta[1:3, 2:4]` returns the 2nd and 3rd rows of sta and the 3rd and 4th columns.
- `sta[1:-1, -1]` returns the last column of sta for the 2nd row up to, but __not__ including, the last row (for sta, that is just the 2nd row). Use `sta[1:, -1]` to get the 2nd row and all rows after it (regardless of how many).
- Experiment with indexing in the code block below.  What would you enter to get the 1st and 2nd rows and the 2nd and 3rd columns?

In [None]:
sta[0:2,:]
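
The sketch below runs a few of these slices on `sta` and prints their shapes, which makes the difference between an integer index (which drops a dimension) and a range (which keeps it) easier to see. The variable names are illustrative.

```python
import numpy as np

sta = np.array([[1, 2, 3, 4], [7, 8, 9, 10], [4, 5, 6, 12]])

first_row_1d = sta[0, :]  # integer index: one-dimensional result
first_row_2d = sta[0:1]   # range index: still two-dimensional
print(first_row_1d.shape, first_row_2d.shape)  # (4,) (1, 4)

block = sta[1:3, 2:4]     # rows 2-3, columns 3-4
print(block)              # values 9, 10 in the first row and 6, 12 in the second

middle_last_col = sta[1:-1, -1]     # stops before the last row
from_second_last_col = sta[1:, -1]  # runs through the last row
print(middle_last_col)       # [10]
print(from_second_last_col)  # [10 12]
```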

#### Selecting specific rows and columns
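- `sta[[0,2],[0,3]]` pairs the two lists element by element: it returns the element in the 1st row and 1st column and the element in the 3rd row and 4th column (not the full block of both rows and both columns).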
- experiment with creating lists of rows and lists of columns below

In [None]:
sta[[0,2],[0,3]]
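
To see what the two index lists select, the sketch below compares the paired selection with a full row-by-column block. The block version uses `np.ix_()`, a numpy helper that this note does not otherwise cover, so treat it as an optional aside.

```python
import numpy as np

sta = np.array([[1, 2, 3, 4], [7, 8, 9, 10], [4, 5, 6, 12]])

# Paired indices: elements at (row 0, col 0) and (row 2, col 3)
paired = sta[[0, 2], [0, 3]]
print(paired)  # [ 1 12]

# Every combination of rows 0, 2 and columns 0, 3
block = sta[np.ix_([0, 2], [0, 3])]
print(block.shape)  # (2, 2)
print(block)        # values 1, 4 in the first row and 4, 12 in the second
```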

#### Selecting rows and columns conditional on values
- `sta >4` returns an array with the same size and shape as sta with "True" in the places where elements are greater than 4 and "False" where they are not.
- `sta[sta > 4]` returns a one-dimensional array with all the elements for sta that are greater than 4
- Try >=  for greater than or equal to, <= for less than or equal to, == for equal to, < for less than, > for greater than.
- Experiment below

In [None]:
sta>4
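
A short sketch of the two steps described above, first building the True/False array and then using it to select elements. The names `mask` and `selected` are illustrative.

```python
import numpy as np

sta = np.array([[1, 2, 3, 4], [7, 8, 9, 10], [4, 5, 6, 12]])

mask = sta > 4        # True/False array with the same shape as sta
print(mask.shape)     # (3, 4)

selected = sta[mask]  # one-dimensional array of the elements greater than 4
print(selected)       # [ 7  8  9 10  5  6 12]

print(sta[sta == 4])  # [4 4]
```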