
#NUMPY

Numpy is the core library for scientific computing in Python. Numpy provides you with an array data structure that has some benefits over Python lists:
  - more compact
  - faster access in reading and writing items

First, import the numpy package.

In [6]:
import numpy as np
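
To make the "more compact" claim above a little more concrete, here is a rough sketch that compares the memory used by a Python list and by a numpy array holding the same 1000 integers. The variable names (`py_list`, `np_array`) and the size of 1000 are arbitrary choices for illustration, and the list figure is only an approximation, since Python may share small integer objects.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values in one contiguous buffer.
array_bytes = np_array.nbytes

print("list  :", list_bytes, "bytes (approximate)")
print("array :", array_bytes, "bytes")
```

The array needs exactly 8 bytes per element here because every element has the same type (`int64`).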

#Arrays

Three commonly used arrays:

![alt text](https://drive.google.com/uc?id=1a4Q_PM5W85UTKitcb6VG2NEtnj3-_8Xm)

A numpy array is a grid of values. These values are of the same type. 

The number of dimensions is the rank of the array.

- A 1D array has rank 1
- A 2D array has rank 2 and so on

You may come across "axis 0", which refers to the rows of a 2D array, and "axis 1", which refers to the columns. If you have a 3D array, you also have "axis 2", which can be thought of as the "depth" of the array, as illustrated in the figure above.

We can also visualize a 1D NumPy array as a list of numbers, a 2D NumPy array as a matrix, a 3D NumPy array as a cube of numbers, and so on.
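
As a small sketch of how rank, shape and axes relate, the cell below builds a 3D array and sums along each axis in turn. The shape `(2, 3, 4)` and the name `t` are arbitrary choices for illustration. Which axis you picture as "depth" is a matter of convention, so the sketch only shows what happens to the shape.

```python
import numpy as np

# A small 3D array with 2 * 3 * 4 = 24 elements.
t = np.arange(24).reshape(2, 3, 4)

print(t.ndim)    # number of dimensions (rank): 3
print(t.shape)   # size along each axis: (2, 3, 4)

# Summing along an axis removes that axis from the shape.
print(t.sum(axis=0).shape)   # (3, 4)
print(t.sum(axis=1).shape)   # (2, 4)
print(t.sum(axis=2).shape)   # (2, 3)
```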

In [7]:
# Run this cell to see the result
# a is a 1D array which has 3 elements
# a.shape returns the shape of the array (its size along each dimension)
# a[0] is the first element of a ....

a = np.array([1, 2, 3])  # Create a rank 1 array
print(type(a), a.shape, a[0], a[1], a[2])

# You can change value of any element of the array by assigning it a new value
a[0] = 10                
print(a)                  

<class 'numpy.ndarray'> (3,) 1 2 3
[10  2  3]


Answers: <class 'numpy.ndarray'> (3,) 1 2 3

[10  2  3]

In [8]:
# Here we create a 2D array. It has 2 rows and 3 columns
# Elements of the array are arranged in this pattern
# b[0, 0] b[0, 1] b[0, 2] 
# b[1, 0] b[1, 1] b[1, 2]

b = np.array([[1, 2, 3],[4, 5, 6]])
print(b)

[[1 2 3]
 [4 5 6]]


Answers: 

[[1 2 3]

 [4 5 6]]

In [9]:
print(b.shape) # you will see (2, 3). 2 is the number of rows and 3 is the number of columns

print(b[0, 0], b[0, 1], b[1, 0])

(2, 3)
1 2 4


Answers:

(2, 3)

1 2 4


##This table summarizes some useful functions in numpy.
![alt text](https://drive.google.com/uc?id=1Bn7x0EJSWYMQuAxP64HO2oy_2KZSIUvI)

Examples of functions

In [10]:
# Create a matrix of all zeros
a = np.zeros((2,2))  
print(a)

[[0. 0.]
 [0. 0.]]


In [11]:
# Create an array of all ones
b = np.ones((1,3))   
print(b)

[[1. 1. 1.]]


In [12]:
# Create a 3x3 identity matrix
# [1 0 0
#  0 1 0
#  0 0 1]
d = np.eye(3)        
print(d)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


###This function **``np.linspace()``** is very useful.

In [13]:
#Create a 1D array with 11 equally spaced values from 0 to 2
x = np.linspace(0, 2, 11)
print(x)

[0.  0.2 0.4 0.6 0.8 1.  1.2 1.4 1.6 1.8 2. ]


###Another one is **``np.arange()``**, which creates an **arithmetic sequence** (ring any bell :) ).

Example: 2, 4, 6, 8 ... or 1, 5, 9, 13 ...

In [14]:
# Create a 1D array with values from 0 up to (but not including) 20, in steps of 1.5
y = np.arange(0, 20, 1.5)
print(y)

[ 0.   1.5  3.   4.5  6.   7.5  9.  10.5 12.  13.5 15.  16.5 18.  19.5]
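
The two functions differ in what you specify and in whether the end value is included. A minimal side-by-side sketch follows. The values are chosen only so that the arithmetic is exact.

```python
import numpy as np

# linspace: you choose the number of points; the stop value is included by default.
x = np.linspace(0, 2, 5)
print(x, len(x))        # [0.  0.5 1.  1.5 2. ] 5

# arange: you choose the step; the stop value is excluded.
y = np.arange(0, 2, 0.5)
print(y, len(y))        # [0.  0.5 1.  1.5] 4
```

When the step is not an integer, `np.linspace()` is usually the safer choice, because the number of elements returned by `np.arange()` can be affected by floating-point rounding.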


In [15]:
# Not in the above list:
# Create a constant array
c = np.full((3, 6), 7) 
print(c)

[[7 7 7 7 7 7]
 [7 7 7 7 7 7]
 [7 7 7 7 7 7]]


###To create random numbers, use **``np.random.random()``** (uniformly distributed in the half-open interval [0.0, 1.0))

In [16]:
e = np.random.random((2,2)) # Create an array filled with random values
print(e)

[[0.05245545 0.34437753]
 [0.34067785 0.0375749 ]]


###To create random samples from a normal (Gaussian) distribution, use **``np.random.normal(loc=0.0, scale=1.0, size=None)``**. loc = mean and scale = standard deviation

#Exercise#

In [17]:
mu, sigma = 0, 0.1 # mean = 0  and standard deviation = 0.1 
# generate 1000 samples
s = np.random.normal(mu, sigma, 1000)

# You can check the sample mean and sample standard deviation by
# np.mean(s) and np.std(s, ddof=1) (ddof=1 gives the sample standard deviation. If you want to know more then we can talk)

# Now, can you check if the sample mean and sample standard deviation are not much different from mu and sigma?
# First, find the difference from mu and the sample mean. You may want to use abs() to find the magnitude of the difference
# Then, compare it with a small number. For example, abs(.....) < .....
# You will see True or False, which are the Boolean values from your first tutorial
print(type(s))
print(s.shape)
print("Sample mean = ", np.mean(s))
print("Sample STD = ", np.std(s, ddof=1))
print(abs(mu - np.mean(s))) 
# The difference is not exactly zero because the sample mean is itself random: it is normally distributed around mu.

<class 'numpy.ndarray'>
(1000,)
Sample mean =  -0.0019248836258467877
Sample STD =  0.10537995852745459
0.0019248836258467877
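
One possible way to finish the comparison suggested in the comments is sketched below. It uses a seeded generator from `np.random.default_rng` (a different API from the `np.random.normal` call above) so that the run is repeatable. The tolerance of 0.05 is an arbitrary choice for illustration.

```python
import numpy as np

mu, sigma = 0, 0.1
rng = np.random.default_rng(0)       # seeded generator so the run is repeatable
s = rng.normal(mu, sigma, 1000)

# Compare the magnitude of each difference with a small tolerance.
tol = 0.05                            # arbitrary illustrative tolerance
mean_ok = abs(mu - np.mean(s)) < tol
std_ok = abs(sigma - np.std(s, ddof=1)) < tol

print(mean_ok, std_ok)                # each one is a Boolean
```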


#Indexing and Slicing
- Indexing: accessing the elements of an array
- Slicing: accessing rows and columns (or subarrays)

In [18]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])

# Access the top left entry in the array
print(a[0,0])

# Negative indices work for NumPy arrays too. Access the bottom right entry in the array:
print(a[-1,-1])

1
12


Slicing: You need to specify a slice for each dimension of the array.

In [19]:
# There are a few ways to access the third row of a
# a[2, :] (you can think of : as "all columns"), or
# a[2:3, :], or
# a[[2], :]
# What is the difference among the three ways?
# The first way results in a lower rank.
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
row_3_1 = a[2, :]     
row_3_2 = a[2:3, :]  
row_3_3 = a[[2], :]  
print(row_3_1, row_3_1.shape)
print()
print(row_3_2, row_3_2.shape)
print()
print(row_3_3, row_3_3.shape)

# Do you see that row_3_1 has rank 1?

[ 9 10 11 12] (4,)

[[ 9 10 11 12]] (1, 4)

[[ 9 10 11 12]] (1, 4)


In [20]:
# Same thing happens when accessing columns of an array.
# Let's access the second column of a
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

col_2_1 = a[:, 1]
col_2_2 = a[:, 1:2]
col_2_3 = a[:, [1]]
print(col_2_1, col_2_1.shape)
print()
print(col_2_2, col_2_2.shape)
print()
print(col_2_3, col_2_3.shape)

[ 2  6 10] (3,)

[[ 2]
 [ 6]
 [10]] (3, 1)

[[ 2]
 [ 6]
 [10]] (3, 1)


#Exercise: Access a subarray#

In [21]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])
print(a, a.shape)

# Use slicing to access the subarray consisting of the first 2 rows
# and columns 1 and 2. Assign it to b. So b is an array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print(b, b.shape)


# Try to extract the subarray consisting of the last two rows
# and the last two columns. Assign it to c. Display c
# [[7 8]
#  [11 12]]

c=a[-2:, -2:]
print(c, c.shape)




[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]] (3, 4)
[[2 3]
 [6 7]] (2, 2)
[[ 7  8]
 [11 12]] (2, 2)


NOTE: ``b`` is a **view**, not a **copy** of ``a``. For differences between views and copies, see another tutorial (if time permits)
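
A short sketch of what "view" means in practice: writing to a slice changes the original array, while an explicit `.copy()` does not. The value 100 is an arbitrary choice for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

b = a[:2, 1:3]           # basic slicing returns a view of a
b[0, 0] = 100            # writing through the view ...
print(a[0, 1])           # ... changes a as well -> 100

c = a[:2, 1:3].copy()    # an explicit copy owns its own data
c[0, 0] = -1
print(a[0, 1])           # a is unaffected -> still 100

# np.shares_memory reports whether two arrays share memory
print(np.shares_memory(a, b), np.shares_memory(a, c))   # True False
```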

Boolean array indexing: this type of indexing is frequently used to select the elements of an array that satisfy some condition. 

In [22]:
a = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])

print(a)
print()
# Let's find elements of a that are less than 5
bool_index = (a < 5)
# this returns a numpy array of Booleans whose shape is the same as a
# Each element of bool_index tells you whether the corresponding element of a is < 5 or not.

# Print bool_index
print(bool_index)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

[[ True  True  True  True]
 [False False False False]
 [False False False False]]


In [23]:
# We then can create a 1D array consisting of the elements of a that are < 5

print(a[bool_index])
print()

# We combine the two steps into a single line
print(a[a < 5])

[1 2 3 4]

[1 2 3 4]


# Exercise

In [24]:
# An exercise for you:
# Given a = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])
a = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])
# How can I extract the numbers in the second and third rows that are between 7 and 10 inclusive?
print(a, a.shape)
b = a[1:3]
print("Elements >=7 & <= 10: ", b[(b >=7) & (b <=10)])






[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]] (3, 4)
Elements >=7 & <= 10:  [ 7  8  9 10]


#Array Operations

Basic mathematical operators ($+$, $-$, $*$, $/$, $**$) operate *elementwise*.

$**$: exponentiation

In [25]:
v = np.array([2, 1, 4])
w = np.array([-1, 0, 1])

print(v + w)

[1 1 5]


In [26]:
print(v - w) 

[3 1 3]


In [27]:
print(v * w)

[-2  0  4]


In [28]:
print(w / v)

[-0.5   0.    0.25]


In [29]:
# The exponent operator ** also acts elementwise on the array
print("v = ", v)
print("v^3 =", v ** 3)

v =  [2 1 4]
v^3 = [ 8  1 64]


#Array Functions
Some useful functions

![alt text](https://drive.google.com/uc?id=1788oz7RdckkLcbwlXMmnFEs0HDDyUhuC)

Let's practice array operations and functions on 2D arrays (matrices).

In [30]:
x = np.array([[1, 2, 3],[4, 5, 6]])
y = np.array([[7, 8, 9],[10, 11, 12]])

# Elementwise sum
print(x + y)

# Another method (here we print only the shape of the result)
print(np.add(x, y).shape)

[[ 8 10 12]
 [14 16 18]]
(2, 3)


In [31]:
# Elementwise subtraction
print(x - y)

# Another method
print(np.subtract(x, y))

[[-6 -6 -6]
 [-6 -6 -6]]
[[-6 -6 -6]
 [-6 -6 -6]]


In [32]:
# Elementwise product
print(x * y)

# Another method
print(np.multiply(x, y))

[[ 7 16 27]
 [40 55 72]]
[[ 7 16 27]
 [40 55 72]]


In [33]:
# Elementwise division
print(x / y)

# Another method
print(np.divide(x, y))

[[0.14285714 0.25       0.33333333]
 [0.4        0.45454545 0.5       ]]
[[0.14285714 0.25       0.33333333]
 [0.4        0.45454545 0.5       ]]


In [34]:
# Let's generate random numbers
arr = np.random.rand(5,10)
print(arr)

print("mean = ", np.mean(arr))

[[0.58763931 0.11648903 0.67687416 0.69006148 0.16555603 0.48817681
  0.70725857 0.46213806 0.27752224 0.64456488]
 [0.20716233 0.90345737 0.39723793 0.07144165 0.92801278 0.85875567
  0.11727751 0.03777565 0.70051459 0.38716357]
 [0.99467216 0.76384389 0.62664823 0.3414399  0.82995625 0.7561013
  0.37946427 0.67694981 0.54700593 0.92128858]
 [0.13431238 0.79281245 0.75440205 0.04527978 0.74041395 0.32255795
  0.98180989 0.10740998 0.92605031 0.66332546]
 [0.94582508 0.51265732 0.09099875 0.474637   0.64202012 0.3296855
  0.64508071 0.49404141 0.51808095 0.18533613]]
mean =  0.531383741990599


Stop for a moment. We have a matrix of $5\times 10$. Why do we only get one value of mean? What does the function calculate?

Let's practice reading documentation to find answers to your questions.
See this link: https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html to understand the meaning of the above result.

Proceed to answer the following two questions.

In [35]:
# Given a = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])
a = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])
print(a)
# Calculate the mean
print(a.mean())


[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
6.5


# Exercise

In [36]:
# How to calculate mean values of rows?

print("mean by row = ", a.mean(axis=1))

mean by row =  [ 2.5  6.5 10.5]


In [37]:
# How to calculate mean values of columns?
print("mean by col = ", a.mean(axis=0))

mean by col =  [5. 6. 7. 8.]


# Exercise

To multiply matrices, we can use $@$. Note that $*$ is elementwise multiplication.

In the second example below (the first one multiplies two $2\times 2$ matrices):
- $x$ is a $2\times 3$ matrix 
- $y$ is a $3\times 4$ matrix
- The product $z=xy$ is a $2\times 4$ matrix

In [38]:
a= np.array([[1, 1],[2,2]])
b= np.array([[2, 2],[1,2]])

# How do we multiply two 2x2 matrices?

print(a @ b)


[[3 4]
 [6 8]]


In [39]:
x = np.array([[1, 2, 3], [4, 5, 6]]) #2x3
y = np.array([[7 , 8, 9, 10], [11, 12, 13, 14], [15, 16, 17, 18]]) #3x4

z = x @ y #2x4

print("z = ", z)

print(z.shape)

# z.ndim returns 2
# 2 means that z has rank 2
# rank 2 array in numpy is like a matrix
print(z.ndim)

z =  [[ 74  80  86  92]
 [173 188 203 218]]
(2, 4)
2
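
The shape rule behind `@` is that the inner dimensions must match. It can be checked with a small sketch. The arrays of ones are placeholders chosen only for their shapes.

```python
import numpy as np

x = np.ones((2, 3))
y = np.ones((3, 4))

# (2, 3) @ (3, 4): inner dimensions are both 3, so the result is (2, 4)
print((x @ y).shape)

# Reversing the order gives (3, 4) @ (2, 3): inner dimensions 4 and 2 do not match
try:
    y @ x
except ValueError as err:
    print("cannot multiply:", err)

# Elementwise * needs arrays of matching (or broadcastable) shapes instead
print((x * x).shape)
```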


# Exercise: Let's try function  `np.sum()`:

In [40]:
x = np.array([[1, 2, 3], [4, 5, 6]])
print(x)

# calculate sum of all elements of x
print(np.sum(x))  

# How about sum by columns?
print(np.sum(x, axis=0))  

# How about sum by rows?
print(np.sum(x, axis=1))  

[[1 2 3]
 [4 5 6]]
21
[5 7 9]
[ 6 15]


Besides applying mathematical functions to arrays, we also need to reshape or otherwise manipulate the data in them. Perhaps the simplest example of this type of operation is transposing a matrix. If a matrix of size $2\times 3$ is transposed, the result has size $3\times 2$.

In [41]:
x = np.array([[1, 2, 3], [4, 5, 6]])

print(x)

print("Transpose = \n", x.T)

[[1 2 3]
 [4 5 6]]
Transpose = 
 [[1 4]
 [2 5]
 [3 6]]


In [42]:
# Another example. If you have a row vector of 1x3
# After transposing, you have a column vector of 3x1
y = np.array([[1, 2, 3]])

print(y)

print("Transpose = \n", y.T)

[[1 2 3]]
Transpose = 
 [[1]
 [2]
 [3]]
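
Note that the row vector above is a 2D array with shape `(1, 3)`. A 1D array behaves differently, as the following sketch shows. The names `v`, `col` and `row` are arbitrary choices for illustration.

```python
import numpy as np

v = np.array([1, 2, 3])       # 1D array, shape (3,)
print(v.T.shape)              # transposing a 1D array leaves it unchanged: (3,)

col = v.reshape(-1, 1)        # reshape into a column vector, shape (3, 1)
print(col.shape)

row = v[np.newaxis, :]        # add a leading axis to get a row vector, shape (1, 3)
print(row.shape, row.T.shape) # (1, 3) (3, 1)
```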


The full list of mathematical functions provided by numpy is available at this [link](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

