# Basic `numpy`

In this notebook we will explore the Python package `numpy` in more depth:

- `numpy` arrays
- Linear algebra with `numpy`
- Pseudorandom numbers in `numpy`


## `numpy` array

`numpy` is a Python package that is a real workhorse of machine learning and data science.

If you are new to Python, this will be the first true package you will import. That being said, we should check that you have the package installed. Try running the following code chunk. (Note: if you installed the Anaconda platform, <a href="https://www.anaconda.com/">https://www.anaconda.com/</a>, `numpy` should already be installed.)

In [2]:
## it is standard to import numpy as np
import numpy as np

In [None]:
## let's check what version of numpy you have
## when I wrote this I had version 1.23.5
## yours may be different
print(np.__version__)

1.23.5


In the second Jupyter notebook, we saw that numpy ndarrays can have two dimensions. For example, a 2x2 numpy array is like a list of two lists, where each inner list contains two elements.


In [None]:
## this produces a 2-dimensional array
## it is a 2x2 array
array2 = np.array([[1,2], [3,4]])
print(array2)
print()

## we can check the array's dimensions with np.shape()
## np.shape() returns a tuple with the size of each dimension
## array2 should be a 2 by 2 array
print("array2 is a", np.shape(array2), "ndarray")

[[1 2]
 [3 4]]

array2 is a (2, 2) ndarray


In [None]:
array2.shape

(2, 2)

<i>Note: the dimensionality of an `ndarray` is very important. We will see that certain algorithms will not run if the `ndarray` is the wrong shape.</i>
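As a quick illustration of why shape matters, the sketch below (using two made-up arrays, `left` and `right`) tries to add a 2x3 array to a 3x2 array. Element-wise addition needs compatible shapes, so `numpy` raises a `ValueError` instead of producing a result.

```python
import numpy as np

left = np.ones((2, 3))   # hypothetical 2x3 array
right = np.ones((3, 2))  # hypothetical 3x2 array

print(left.shape, right.shape)

try:
    left + right  # shapes (2, 3) and (3, 2) cannot be combined element-wise
    shape_error = False
except ValueError as err:
    shape_error = True
    print("ValueError:", err)
```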

In [2]:
## Your code
## Try making a 2x2x2 array and print its shape (a list of two lists, each containing two lists of two elements)


In [9]:
## Your code
## Try making a 2x2x2x2 array

### Preset `numpy` Arrays

There are several standard arrays that you will use often and that `numpy` can generate quickly.

In [6]:
## np.ones(shape) makes an array of all ones of the desired shape
print(np.ones(1))

print()

print(np.ones((4,10)))

print()

[1.]

[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
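The `1.` in the output indicates that the entries are floating-point numbers: by default `np.ones` (and `np.zeros`) create `float64` arrays. A small sketch of passing the `dtype` argument to get integer entries instead:

```python
import numpy as np

float_ones = np.ones((2, 3))           # default dtype is float64
int_ones = np.ones((2, 3), dtype=int)  # request integer entries instead

print(float_ones.dtype)  # float64
print(int_ones)
```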



In [8]:
## Your code: Try making a 2x2x2 array of 1s and print it


In [None]:
## Your code: Try making a 2x3x4 array of 1s and print it


In [10]:
## Your code
## np.zeros(shape) is similar to np.ones, but instead of 1s
## it makes an array of 0s
## Try making a 2x2x2 array of 0s and print it


In [None]:
## Your code: Try making a 2x3x4 array of 0s and print it

An identity matrix is a square matrix with ones on the diagonal and zeros everywhere else. We can use `np.eye(n)` to create an $n \times n$ identity matrix:

In [11]:
##  2x2 identity matrix

np.eye(2)


array([[1., 0.],
       [0., 1.]])
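To connect `np.eye` with the definition above, the sketch below builds a 3x3 identity matrix by hand (start from zeros, then put ones on the diagonal with `np.fill_diagonal`) and checks that it matches `np.eye(3)`.

```python
import numpy as np

n = 3
manual_identity = np.zeros((n, n))    # start with an n x n matrix of zeros
np.fill_diagonal(manual_identity, 1)  # set every diagonal entry to 1 (modifies in place)

print(manual_identity)
print(np.array_equal(manual_identity, np.eye(n)))  # True
```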

In [None]:
## Your code: make a 9x9 identity matrix

## Matrix Algebra using `numpy`

Consider matrix $A$, which is an $m \times n$ matrix with $m$ rows and $n$ columns:

$$
A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
$$

The entry in the $i$th row and $j$th column is referred to as the __$(i,j)$-entry__ of matrix $A$. Entries with the same row and column index ($a_{i,i}$ for any $i$) are called __diagonal entries__ of $A$. The $i$th column of $A$, denoted as $a_i$, represents a vector in $\mathbb{R}^m$. We can represent matrix $A$ in a compact form by its columns:

$$
A= \begin{bmatrix} a_{1} &  a_{2} &\dots & a_{n}\\ \end{bmatrix},
$$

or by its entries:

$$A= (a_{i,j})$$

__Example:__  Let

$$
A = \begin{bmatrix} 0 & 2 & -1\\ 2 & 3 & 1 \end{bmatrix}.
$$

$A$ is a $2 \times 3$ matrix. We can use a `numpy` array to store this matrix in a variable called `A`:

In [12]:
A = np.array([[0,2,-1],[2,3,1]])
print(A)

[[ 0  2 -1]
 [ 2  3  1]]


We can also find the size of A as follows:

In [13]:
#Size of A

print('A has ', A.shape[0],  'rows, and', A.shape[1],  'columns')

A has  2 rows, and 3 columns
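Because `A.shape` is a tuple, we can also unpack it directly into the number of rows and columns. The names `m` and `n` below are chosen only to mirror the $m \times n$ notation above.

```python
import numpy as np

A = np.array([[0, 2, -1], [2, 3, 1]])

m, n = A.shape  # m rows, n columns
print(f"A is a {m} x {n} matrix")  # A is a 2 x 3 matrix
```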


We can also slice the matrix and print its rows and columns separately:

In [14]:
print('The first row of A is ', A[0,:])

print(30*'*')

print('\n The second row of A is ', A[1,:])

print(30*'*')

print('\n The first column of A is ', A[:,0])

print(30*'*')

print('\n The second column of A is ', A[:,1])

print(30*'*')

print('\n The third column of A is ', A[:,2])




The first row of A is  [ 0  2 -1]
******************************

 The second row of A is  [2 3 1]
******************************

 The first column of A is  [0 2]
******************************

 The second column of A is  [2 3]
******************************

 The third column of A is  [-1  1]


__Note:__ Slicing an array lets us access parts of its data, such as single entries, rows, or columns. Make sure you are comfortable with it.
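One detail to keep in mind: the math notation counts rows and columns from 1, while `numpy` counts from 0, so the $(i,j)$-entry $a_{ij}$ is `A[i-1, j-1]`. The sketch below also shows a negative index (which counts from the end) and a slice that keeps several columns at once.

```python
import numpy as np

A = np.array([[0, 2, -1], [2, 3, 1]])

print(A[0, 1])   # the (1,2)-entry a_12 -> 2
print(A[1, 2])   # the (2,3)-entry a_23 -> 1
print(A[-1, :])  # the last row -> [2 3 1]
print(A[:, 1:])  # columns 2 and 3 (indices 1 and 2) as a 2x2 submatrix
```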

In [15]:
## More Examples:
B = np.array([[1, 2, 3],[4, 5, 6]])
print('B =\n', B)


#Size of B

print('\n B has ', B.shape[0],  'rows, and', B.shape[1],  'columns')

# the second column of B

print('\n The second column of B is ', B[:,1])

print(30*'*')

C = np.array([[0, 2] ,[2, 3],[5, -2]])
print('\n C = \n', C)


#Size of C

print('\n C has ', C.shape[0],  'rows, and', C.shape[1],  'columns')


# print the columns of C

for i in range(C.shape[1]):
    print('\n Column', i+1, 'of C is', C[:,i])



B =
 [[1 2 3]
 [4 5 6]]

 B has  2 rows, and 3 columns

 The second column of B is  [2 5]
******************************

 C = 
 [[ 0  2]
 [ 2  3]
 [ 5 -2]]

 C has  3 rows, and 2 columns

 Column 1 of C is [0 2 5]

 Column 2 of C is [ 2  3 -2]


__In Class Example__

Consider the matrices $A$, $B$, and $C$ above. Then compute the following (or explain why the operation is not defined):

1. $3A-B$

2. $A+C$
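Matrix addition and scalar multiplication work entry by entry, and in `numpy` they are written with the ordinary `+`, `-`, and `*` operators. Here is a sketch with two made-up $2 \times 2$ matrices `D` and `E` (not the ones from the exercise); keep in mind that two matrices can only be added when they have the same shape.

```python
import numpy as np

D = np.array([[1, 0], [2, 1]])  # hypothetical 2x2 matrix
E = np.array([[3, 1], [0, 4]])  # hypothetical 2x2 matrix

print(2 * D)      # scalar multiplication: every entry is doubled
print(D + E)      # entry-wise sum
print(2 * D - E)  # the operations can be combined
```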

### Matrix Product

Let $A$ be an $m \times n$ matrix and $B$ be an $n \times p$ matrix with columns $b_1, b_2, \dots, b_p$. The product $AB$ is an $m\times p$ matrix whose columns are $Ab_1, Ab_2, \dots, Ab_p$:

$$
AB= \begin{bmatrix} Ab_{1} &  Ab_{2} &\dots & Ab_{p}\\ \end{bmatrix}.
$$

We can also describe the matrix product by its entries: the $(i,j)$-entry of $AB$ is the product of the $i$th row of $A$ and the $j$th column of $B$:

$$
(AB)_{ij} = \begin{bmatrix} a_{i1} & a_{i2} & \dots & a_{in} \end{bmatrix} \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix} = a_{i1}b_{1j} + a_{i2}b_{2j} + \dots + a_{in}b_{nj} = \sum_{k=1}^{n}a_{ik}b_{kj}
$$


The product $BA$ is not defined if $p\neq m$ (the _neighboring dimensions_
do not match).
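The entry formula can be turned directly into code. The function `matmul_by_entries` below is a hypothetical helper written only for illustration: it loops over $i$, $j$, and $k$ to compute $\sum_{k=1}^{n} a_{ik}b_{kj}$ (with 0-based indices in Python) and compares the result with `numpy`'s `@` operator. In practice you would just use `@`, which is much faster.

```python
import numpy as np

def matmul_by_entries(A, B):
    """Compute A @ B using the (i,j)-entry formula sum_k a_ik * b_kj."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("number of columns of A must equal number of rows of B")
    result = np.zeros((m, p))
    for i in range(m):          # row of A
        for j in range(p):      # column of B
            for k in range(n):  # index of the sum
                result[i, j] += A[i, k] * B[k, j]
    return result

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 2], [3, 4], [5, 6]])

print(matmul_by_entries(A, B))
print(np.allclose(matmul_by_entries(A, B), A @ B))  # True
```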

__Example__
Let

$$
A = \begin{bmatrix} 1 & 2 & 3 \\
                      4& 5 & 6
    \end{bmatrix}\quad \text{and}\quad B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}.
$$

Compute $AB$ and $BA$.


In [16]:
#A
A = np.array([[1,2,3],[4,5,6]])

#B
B = np.array([[1,2],[3,4],[5,6]])

#Compute the matrix product
AB = A @ B

print(AB)

[[22 28]
 [49 64]]
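The example also asks for $BA$. Since $B$ is $3 \times 2$ and $A$ is $2 \times 3$, the neighboring dimensions match and $BA$ is a $3 \times 3$ matrix, so $AB$ and $BA$ do not even have the same shape:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 2], [3, 4], [5, 6]])

BA = B @ A
print(BA)
print(BA.shape)  # (3, 3), while A @ B has shape (2, 2)
```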


__Example__

Let

$$
C = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 5 & 6 & 7 \end{bmatrix}.
$$

The matrix product $AC$ is not defined because the number of columns of $A$ (3) does not match the number of rows of $C$ (2). If we try to compute $AC$ anyway, we get an error:


In [17]:
C = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])

# Compute AC
AC = A @ C

print(AC)

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 3)
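To avoid this error, one option is to compare the neighboring dimensions before multiplying. The helper `can_multiply` below is a hypothetical name used only for this sketch, and it assumes both inputs are 2-dimensional arrays.

```python
import numpy as np

def can_multiply(X, Y):
    """Return True if X @ Y is defined for 2-D arrays (columns of X == rows of Y)."""
    return X.shape[1] == Y.shape[0]

A = np.array([[1, 2, 3], [4, 5, 6]])        # 2x3
B = np.array([[1, 2], [3, 4], [5, 6]])      # 3x2
C = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])  # 2x4

print(can_multiply(A, B))  # True: 3 columns vs 3 rows
print(can_multiply(A, C))  # False: 3 columns vs 2 rows
```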

## `numpy` for generating pseudorandom numbers

__Randomness vs. pseudorandomness__

Numbers generated by an algorithm or program are completely determined by that algorithm and its starting state, so in principle they can be predicted and are not truly random. For this reason we call them pseudorandom. Read more here:

https://numpy.org/doc/stable/reference/random/generated/numpy.random.random.html


### How to select $n$ random numbers uniformly from $[0, 1)$?

In [3]:
## random generators are stored in np.random
## np.random.random() gives a number selected uniformly
## at random from [0, 1)

np.random.random()



0.7416529373343141

In [6]:
## Generating two random numbers
np.random.random(2)

array([0.13479924, 0.35213417])

In [11]:
## Generating a 5x2 array of draws: 5 rows [a, b], where a and b are drawn uniformly from [0, 1)
## Note that the 5 and 2 are passed as a tuple

np.random.random((5,2))

array([[0.37265996, 0.27123018],
       [0.43960568, 0.84133817],
       [0.32941216, 0.01025254],
       [0.03136454, 0.55313339],
       [0.5873321 , 0.83735259]])
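`np.random.random` always draws from $[0, 1)$. To draw uniformly from another interval $[a, b)$, the `numpy` documentation suggests rescaling: multiply the draws by $b - a$ and add $a$. The endpoints `a = -5` and `b = 5` below are arbitrary choices for illustration.

```python
import numpy as np

a, b = -5, 5  # arbitrary interval endpoints for this example
draws = (b - a) * np.random.random(1000) + a  # 1000 draws, uniform on [a, b)

print(draws.min(), draws.max())  # both should lie inside [-5, 5)
```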

### Draw samples from normal distribution

Read more here:
https://numpy.org/doc/stable/reference/random/generated/numpy.random.randn.html

In [10]:
## a single draw
np.random.randn()

-0.4715921213462492

In [12]:
## Two draws
np.random.randn(2)

array([-0.69428399,  0.8953828 ])

In [13]:
## an array of draws
## note that we don't have to put
## the 10 and 2 in a tuple to get a 10 by 2 array

np.random.randn(10,2)

array([[-3.93593091,  0.16972481],
       [ 1.46053073,  0.28045233],
       [ 1.30115927, -0.88831037],
       [-1.16158642, -1.36105153],
       [-2.10677301,  1.03471304],
       [ 0.67558301,  1.61032721],
       [-1.12562245, -0.53625104],
       [ 2.75448109,  0.34583798],
       [-0.05224217,  0.40711562],
       [-0.83204865,  0.17747772]])
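`np.random.randn` draws from the standard normal distribution (mean 0, standard deviation 1). To get draws with mean $\mu$ and standard deviation $\sigma$, the `numpy` documentation suggests scaling and shifting: `sigma * np.random.randn(...) + mu`. The values `mu = 10` and `sigma = 2` are arbitrary choices for this sketch; with many draws, the sample mean and standard deviation should land close to them.

```python
import numpy as np

mu, sigma = 10, 2  # arbitrary mean and standard deviation
draws = sigma * np.random.randn(100_000) + mu  # 100,000 draws from N(mu, sigma^2)

print(draws.mean())  # close to 10
print(draws.std())   # close to 2
```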

### Draw samples from a binomial distribution

`np.random.binomial(n, p, size)`, where

- `n` is the number of trials,
- `p` is the probability of success on each trial, and
- `size` is the output shape.

Read more here:
https://numpy.org/doc/stable/reference/random/generated/numpy.random.binomial.html


In [21]:
## A third example
## np.random.binomial()
## an array of binomial(n,p) outcomes
## https://numpy.org/doc/stable/reference/random/generated/numpy.random.binomial.html
np.random.binomial(n=6, p=.6, size=(10,10))

array([[5, 3, 4, 6, 4, 4, 3, 2, 1, 3],
       [4, 2, 2, 3, 5, 5, 4, 3, 5, 4],
       [3, 3, 0, 4, 4, 0, 2, 4, 3, 4],
       [4, 2, 2, 4, 5, 6, 5, 3, 5, 4],
       [4, 4, 4, 5, 2, 3, 5, 2, 4, 4],
       [1, 1, 4, 1, 3, 3, 3, 5, 3, 5],
       [3, 3, 4, 4, 2, 4, 3, 4, 3, 4],
       [1, 3, 3, 2, 3, 3, 3, 3, 3, 4],
       [2, 3, 5, 5, 4, 6, 3, 4, 5, 3],
       [6, 3, 2, 3, 3, 5, 4, 6, 4, 4]])

In [23]:
# result of flipping a coin 10 times, repeated 100 times.

n, p = 10, .5  # number of trials, probability of success on each trial
s = np.random.binomial(n, p, 100)
s

array([4, 4, 4, 5, 5, 5, 6, 4, 6, 2, 5, 5, 5, 6, 5, 7, 5, 4, 4, 4, 6, 8,
       6, 3, 7, 3, 5, 5, 4, 5, 4, 6, 4, 5, 6, 5, 6, 4, 5, 5, 3, 8, 3, 6,
       6, 2, 3, 6, 3, 5, 7, 6, 7, 6, 2, 4, 5, 4, 5, 4, 7, 3, 6, 4, 5, 6,
       4, 7, 3, 6, 5, 7, 5, 6, 4, 5, 0, 7, 3, 6, 4, 4, 7, 4, 6, 7, 5, 5,
       6, 4, 4, 4, 5, 4, 5, 7, 5, 7, 5, 5])
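A binomial$(n, p)$ random variable has mean $np$, so for 10 fair coin flips we expect about 5 heads on average. A quick sketch comparing the sample mean of many simulated experiments with $np$ (the sample size of 100,000 is an arbitrary choice):

```python
import numpy as np

n, p = 10, .5
s = np.random.binomial(n, p, 100_000)  # 100,000 repetitions of 10 coin flips

print(s.mean())  # should be close to n * p
print(n * p)     # 5.0
```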

### Random Seeds

You may have noticed that your randomly generated numbers are different every time you run your code. This is expected because they are random numbers. It would be quite the coincidence if two different runs came up with the exact same random draw (for the random distributions we have looked at above).

To get the same random draws across runs, you first need to set a random seed. In `numpy` this is done with `numpy.random.seed()`, <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.seed.html">https://numpy.org/doc/stable/reference/random/generated/numpy.random.seed.html</a>.

In [24]:
## Run this code chunk as many times as you'd like
## it should always give the same number

## to set a random seed you call np.random.seed(integer >= 0)
## Note that your number can be any integer so long as it is non-negative
np.random.seed(440)

np.random.randn()

-0.3202545309014756

In [25]:
## Let's generate another number
np.random.randn()

1.0905067400712454

In [26]:
## to get the first number again, we reset the seed
np.random.seed(440)

np.random.randn()

-0.3202545309014756
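The same idea works for whole arrays: resetting the seed to the same value reproduces the entire sequence of draws. A short sketch (the seed 440 is reused from above; any non-negative integer would do):

```python
import numpy as np

np.random.seed(440)
first_run = np.random.randn(3)

np.random.seed(440)  # reset to the same seed
second_run = np.random.randn(3)

print(np.array_equal(first_run, second_run))  # True
```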

## Exercises




1. Create a 10 by 5 array of random normal draws and name it `X`.
   
   - a) Print the entry at the 3rd row and 4th column of `X`.
   - b) Write a loop to print each row of `X`.
   - c) Write a loop to print each column of `X`.
   - d) Calculate the mean of the entire matrix `X` (i.e., the mean of all the numbers in `X`).
   - e) Define a vector `r` that contains the mean of each row of `X` (so the first element of `r` is the mean of the first row of `X`).
   - f) Define a vector `c` that contains the mean of each column of `X` (so the first element of `c` is the mean of the first column of `X`).


2. Repeat parts d, e, and f, but this time replace the mean with the maximum value.


3. Define an augmented matrix $[X|r]$ that attaches vector $r$ to matrix $X$ as the last column.

4. Define an augmented matrix $Y$ that attaches vector $c$ to matrix $X$ as the first row. Check your solution.


5. Which pairs of the following matrices can be multiplied? Compute their matrix products.

$$
A = \begin{bmatrix}
1 & 3 \\
4 & -2 \\
3 & 2
\end{bmatrix}
$$

$$
B = \begin{bmatrix}
1 & 4 & 5\\
0 & 2  & 3
\end{bmatrix}
$$


$$
C = \begin{bmatrix}
1 & 3 & 0\\
2 & -1  & 3 \\
0 & 1 & 1
\end{bmatrix}
$$


6. Find two matrices $A$ and $B$ such that the products $AB$ and $BA$ are defined but $AB \neq BA$.


7.  Given $
A= \begin{bmatrix}
1 & 2 \\ 1& 2 \end{bmatrix}$, find a nonzero matrix  $C$ for which $AC= \begin{bmatrix}
0 & 0 \\ 0 & 0
\end{bmatrix}$


8. Start with a non-trivial 5x5 matrix. Print the following submatrices:

- The top-left 2x2 submatrix

- The top-left 3x4 submatrix

- The bottom-right 3x2 submatrix

- The 4x4 submatrix obtained by removing the first row and the first column






9. A company drills 9 wild-cat oil exploration wells, each with an estimated probability of success of 0.1. All nine wells fail. What is the probability of that happening? Do 20,000 trials of the model, and count the number that generate zero positive results.



10. `.reshape()` is a useful tool in numpy that allows you to change the shape of your array. If you are not familiar with it, visit [numpy documentation on `.reshape()`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.reshape.html) to read more. Then, solve the following problems:

   - a) Create a 1-dimensional array `A` with 20 random normal draws. Reshape `A` into a 4 by 5 matrix.

   - b) Starting with the 4 by 5 matrix `A` from part (a), use `.reshape()` to convert it into a 5 by 4 matrix.

   - c) Create a 1-dimensional array `B` with 24 random normal draws, then reshape it into a 2 by 3 by 4 array.

   - d) Starting with the 2 by 3 by 4 array `B` from part (c), reshape it into a 6 by 4 matrix. What do you notice about the relationship between the shapes before and after reshaping?

   - e) Explain why the product of the dimensions of the original array and the reshaped array must be equal. Provide an example using a 2 by 6 matrix.