# Introduction to NumPy

NumPy is a numerical computing library for Python, built around n-dimensional arrays and linear algebra routines.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [15, 5]

# 1. NumPy Arrays

`np.array()` creates an n-dimensional array from (nested) Python lists:

$\text{n-dimensions:}$ $\begin{cases} \text{1-dimension} & \text{list} \\ \text{2-dimension} & \text{list of lists} \\ \text{3-dimension} & \text{list of lists of lists} \end{cases}$

## 1-Dimensional Array, i.e. Vector

In [62]:
# Create a simple stock return list
stock_list = [3.5, 5, 2, 8, 4.2]

# Parse the list as a NumPy array

returns = np.array(stock_list)
returns

array([3.5, 5. , 2. , 8. , 4.2])

## 2-Dimensional Array, i.e. Matrix

A nested list in which each inner list represents one row:

In [4]:
A = np.array([[1,2],[3,4]])
A

array([[1, 2],
       [3, 4]])

## a. Changing Shape of Arrays

- Check the **shape** of the array using the `.shape` attribute of the NumPy array

- **Reshape** the array by specifying the new number of rows and columns; the total number of elements must stay the same

In [8]:
# Check the shape
A.shape

(2, 2)

In [10]:
# Reshape the array
A.reshape(1,4)

array([[1, 2, 3, 4]])

### Create Matrices with Reshape

1. Generate a range of numbers

2. Reshape them into matrix form

In [3]:
np.arange(0,9).reshape(3,3)

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
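
The two steps above can be sketched on a slightly larger range. This is a minimal illustration, not part of the original notebook; the names `values`, `grid` and `auto` are made up for the example. Passing `-1` for one dimension asks NumPy to infer it from the number of elements.

```python
import numpy as np

values = np.arange(12)         # 1. generate the numbers 0, 1, ..., 11
grid = values.reshape(3, 4)    # 2. reshape into 3 rows and 4 columns
auto = values.reshape(-1, 6)   # -1 lets NumPy infer the row count (2 here)

print(grid.shape)  # (3, 4)
print(auto.shape)  # (2, 6)
```

A reshape only succeeds when the requested shape holds exactly as many elements as the original array.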

## b. Indexing and Slicing Arrays

The same principles that apply to Python lists apply here.

### 1-Dimensional Array

In [15]:
# Print the first element, the last element, and a slice of the returns array
print("first element: {}, last element: {}, elements in between: {}".format(
    returns[0], returns[-1], returns[1:3]))

first element: 3.5, last element: 4.2, elements in between: [5. 2.]


### 2-Dimensional Array (Matrix)

1. **Slicing with row and columns:**

`A[row element i : row element j, col element n : col element m]`

In [26]:
print("{}, first column: {}, first row: {}".format(
            A[:,:], A[:,0], A[0,:]))

[[1 2]
 [3 4]], first column: [1 3], first row: [1 2]


2. **Indexing Matrices by Rows:**

Passing a single integer as the index returns the row at that position:

`A[row index]`

**Accessing elements** in the indexed row:

- either by adding the column index inside the same brackets
- or by chaining a second index onto the returned row

`A[row index, column index]`

`A[row index][column index]`

In [43]:
print("Entire row indexing: {}, \n access elements inside the index: {},\n access elements outside the index: {}"
      .format(A[1], A[1,1], A[1][1]))

Entire row indexing: [3 4], 
 access elements inside the index: 4,
 access elements outside the index: 4
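
As a further sketch of row and column slicing, the example below uses a 3x3 matrix so that the slices are easier to tell apart. The matrix `M` and the variable names are illustrative and do not appear in the original notebook.

```python
import numpy as np

M = np.arange(9).reshape(3, 3)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

top_right = M[0:2, 1:3]   # rows 0-1, columns 1-2
last_col = M[:, -1]       # every row, last column
middle = M[1, 1]          # single element: row 1, column 1

print(top_right)  # [[1 2]
                  #  [4 5]]
print(last_col)   # [2 5 8]
print(middle)     # 4
```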


## c. Broadcasting

Broadcasting is one feature that is not available on Python lists.

- *Broadcasting*: a way to assign or combine a scalar (or a smaller array) with a larger array, for example to replace a range of values in an array with one predefined value

### Setting a Value with Index Range



In [15]:
arr = np.arange(0, 11)
arr[0:5] = 100
arr

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])
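
Broadcasting is not limited to scalars. The sketch below, with made-up names `matrix`, `row` and `shifted`, shows a 1-dimensional array being stretched across every row of a matrix, and a scalar being assigned to a whole column.

```python
import numpy as np

matrix = np.zeros((3, 4))
row = np.array([1.0, 2.0, 3.0, 4.0])

shifted = matrix + row   # row is broadcast across all 3 rows
matrix[:, 0] = 7         # the scalar 7 is broadcast down the first column

print(shifted[2])    # [1. 2. 3. 4.]
print(matrix[:, 0])  # [7. 7. 7.]
```

The shapes must be compatible: here the trailing dimension of `matrix` (4) matches the length of `row`.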

### Broadcasting a slice of the array

A slice is a view of the original array, so assigning values to the slice also changes the original array.

In [4]:
arr = np.arange(0,11)
slice_arr = arr[0:6]
slice_arr[0:6] = 99
print("The original array is:{}, the sliced array is:{}".format(arr, slice_arr))

The original array is:[99 99 99 99 99 99  6  7  8  9 10], the sliced array is:[99 99 99 99 99 99]


### Copying an array

Data is not copied by slicing:

- A slice simply views the original array; use `.copy()` to create an independent copy whose changes do not affect the original

In [5]:
arr.copy()

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])
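
The difference between a view and a copy can be checked directly. This is a small sketch with illustrative names (`original`, `view`, `independent`); `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

original = np.arange(5)
view = original[0:3]                 # slice: a view on the same data
independent = original[0:3].copy()   # copy: separate data

view[:] = -1          # changes the original as well
independent[:] = 50   # leaves the original untouched

print(original)                                 # [-1 -1 -1  3  4]
print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```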

## d. Array Functions

Functions such as `np.log` are applied to an array element-wise (as scalar multiplication is), while reductions such as `np.mean` and `np.max` collapse the array to a single value.

In [44]:
print("log returns: {}, \n mean returns: {}, \n max return: {}".format(
            np.log(returns), np.mean(returns), np.max(returns)))

log returns: [1.25276297 1.60943791 0.69314718 2.07944154 1.43508453], 
 mean returns: 4.54, 
 max return: 8.0


**Scalars on arrays:**

In [63]:
returns*2 + 5

array([12. , 15. ,  9. , 21. , 13.4])

### Index of the Maximum Value - `.argmax()`

`.argmax()` returns the index of the maximum value.

`.argmin()` does the same for the minimum value.

In [65]:
returns[returns.argmax()]

8.0
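
For comparison, a short sketch that uses both methods on the same return values as above. The names `best` and `worst` are illustrative.

```python
import numpy as np

returns = np.array([3.5, 5, 2, 8, 4.2])

best = returns.argmax()    # position of the largest return
worst = returns.argmin()   # position of the smallest return

print(best, returns[best])    # 3 8.0
print(worst, returns[worst])  # 2 2.0
```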

### Applying a Function Across the Columns or Rows of a Matrix

- **Columns:** `matrix.sum(axis=0)` returns one value per column

- **Rows:** `matrix.sum(axis=1)` returns one value per row
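
A minimal sketch of both calls on a 2x3 matrix; the variable names are chosen for the example only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

col_sums = matrix.sum(axis=0)   # collapse the rows: one sum per column
row_sums = matrix.sum(axis=1)   # collapse the columns: one sum per row

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```

The `axis` argument names the dimension that is collapsed, which is why `axis=0` yields column totals.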

## e. NaN Values

NaN values appear when there is missing or non-existent data. `nan` values propagate through calculations, so it is important to know how to drop them.

- When a function is applied to a vector containing `nan` values, no error is raised; the result is simply `nan`

In [36]:
# Create a vector containing a nan value
vector = np.array([1,2,np.nan, 4, 5])

# try to apply a function to the vector
np.mean(vector)

nan

### Indexing Arrays by Boolean Values

#### Step 1: Create a boolean vector of the array as an index

The vector should hold `True` for the values which are **not nan**; `~` means "not".

`boolean_index = ~np.isnan(vector)`

#### Step 2: Index the array with the boolean vector of not nans

`vector[boolean_index]`


In [41]:
# Boolean index of the entries that are not nan
ix = ~np.isnan(vector)

# Mean of the non-nan entries
np.mean(vector[ix])

# np.nanmean gives the same result directly
np.nanmean(vector)

3.0
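
NumPy also offers other nan-aware reductions besides `np.nanmean`. The sketch below uses `np.nansum` and `np.nanmax` on the same vector and counts the missing entries; the variable names are illustrative.

```python
import numpy as np

vector = np.array([1, 2, np.nan, 4, 5])

missing = np.isnan(vector).sum()   # number of nan entries
total = np.nansum(vector)          # sum that ignores nan
largest = np.nanmax(vector)        # maximum that ignores nan
plain_max = np.max(vector)         # the ordinary maximum propagates nan

print(missing)    # 1
print(total)      # 12.0
print(largest)    # 5.0
print(plain_max)  # nan
```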

## f. Dot Product in NumPy

**For a 1-dimensional array:**

`np.dot(x, y)` computes the dot product of two vectors

**For a 2-dimensional array:** 

<p style="text-align:center;">$\text{(m x n) } \cdot \text{ (n x m)} = \text{(m x m)}$</p> 
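
Both cases can be sketched with `np.dot`. The arrays `x`, `y` and `M` are made up for the example.

```python
import numpy as np

# 1-dimensional: sum of element-wise products
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(np.dot(x, y))  # 32

# 2-dimensional: (2 x 3) . (3 x 2) = (2 x 2)
M = np.arange(6).reshape(2, 3)
P = np.dot(M, M.T)
print(P.shape)  # (2, 2)
print(P)        # [[ 5 14]
                #  [14 50]]
```

The inner dimensions must match (here both are 3); otherwise `np.dot` raises a `ValueError`.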

## g. Conditional Selection

Indexing an array by boolean values, i.e. True and False values

### Get Subset of Array with Values > x

- Applying a condition to the array gives an array of boolean values: `[False, False, False, True, True]`

- Indexing the array with that boolean array gives the subset of values at the positions equal to True: `[Value, Value]`

In [3]:
# generate the array
arr = np.arange(1,11)

# boolean array marking the values greater than 4
arr > 4

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

In [4]:
# Index the boolean condition to get the subset
arr[arr>4]

array([ 5,  6,  7,  8,  9, 10])
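
Conditions can also be combined, which the notebook does not cover; the following is a sketch under that extension. Element-wise `&` (and) and `|` (or) are used, and each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np

arr = np.arange(1, 11)

even_above_4 = arr[(arr > 4) & (arr % 2 == 0)]   # both conditions hold
tails = arr[(arr < 3) | (arr > 8)]               # either condition holds

print(even_above_4)  # [ 6  8 10]
print(tails)         # [ 1  2  9 10]
```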

# 2. NumPy Functions

### Zeros Matrix - `np.zeros()`

Creates a NumPy array of the given shape filled entirely with $0$. The values are floats by default.

**Vector:** Insert a single number

`np.zeros(number)`

<p style="text-align:center;">$\begin{bmatrix} 0 & 0 & 0 \end{bmatrix}$</p>

**Matrix:** Insert a tuple

`np.zeros((number rows, number columns))`

<p style="text-align:center;">$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$</p>

In [5]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

### Ones Matrix - `np.ones()`

In [6]:
np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

### Identity Matrix - `np.eye()`

In [9]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

### Sequencing in NumPy - `np.arange(start, stop, step)`

Creates a range from `start` up to, but not including, `stop` in increments of `step`, just like the `range` function in Python.

`np.arange(N)`: Creates a sequence of $N$ numbers from $0$ to $N-1$, similar to the `range(N)` function in Python
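
A few calls to illustrate the three arguments. The variable names are chosen for this example only.

```python
import numpy as np

first_five = np.arange(5)        # start defaults to 0, step to 1
evens = np.arange(0, 10, 2)      # stop (10) is excluded
countdown = np.arange(10, 0, -3) # a negative step counts down

print(first_five)  # [0 1 2 3 4]
print(evens)       # [0 2 4 6 8]
print(countdown)   # [10  7  4  1]
```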

### Array of Linearly Spaced Numbers - `np.linspace(start, stop, num)`

Returns evenly spaced numbers over a specified interval, given:

- the starting number
- the ending number (included by default)
- how many numbers you want in total, counting the start and end values

In [8]:
np.linspace(2, 3, num = 5)

array([2.  , 2.25, 2.5 , 2.75, 3.  ])
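
Because both endpoints are included by default, the spacing is `(stop - start) / (num - 1)`. The sketch below checks this and shows the `endpoint=False` option, which excludes the stop value; the variable names are illustrative.

```python
import numpy as np

closed = np.linspace(0, 1, num=5)                     # includes 1.0
half_open = np.linspace(0, 1, num=5, endpoint=False)  # excludes 1.0

print(closed)     # [0.   0.25 0.5  0.75 1.  ]
print(half_open)  # [0.  0.2 0.4 0.6 0.8]
```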

# 3. NumPy Random Functions

## Uniform Distribution - rand

`np.random.rand()` creates an array of the given shape and populates it with random samples from a uniform distribution over `[0,1)`, i.e. all the numbers have the same probability of being picked.

<p style="text-align:center;">$\sim U(0,1)$</p>

In [10]:
# 2x2 matrix over a uniform distribution
np.random.rand(2,2)

array([[0.77949609, 0.45646019],
       [0.45271163, 0.62223751]])

## Standard Normal Distribution - randn

`np.random.randn()` returns samples from a standard normal (Gaussian) distribution:

<p style="text-align:center;">$\sim N(0,1)$</p>

Values closer to $0$ are more likely to be drawn than values far from $0$.

In [13]:
np.random.randn(5)

array([ 1.20697392, -0.75359773, -0.35253517,  2.25899063,  0.15725628])

## Random Integer - `np.random.randint(low, high, size)`

Returns `size` random integers from `low` (inclusive) to `high` (exclusive).

In [55]:
np.random.randint(2,9, 9)

array([6, 8, 8, 3, 2, 2, 5, 6, 2])
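
Random output differs on every run unless the generator is seeded. The sketch below, which goes slightly beyond the original notebook, uses `np.random.seed` to make the draws reproducible and checks that `high` is never returned. The seed value `0` and the variable names are arbitrary choices.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(2, 9, 1000)

np.random.seed(0)   # resetting the seed replays the same sequence
second = np.random.randint(2, 9, 1000)

print(np.array_equal(first, second))       # True
print(first.min() >= 2, first.max() <= 8)  # True True
```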