## **[Week 3] NumPy & Plotting data**
Today, we will learn the basics of NumPy and how to plot data.

## **NumPy**

1. NumPy is the fundamental package for scientific computing in Python. It lets you perform computations on multi-dimensional data easily and efficiently.
2. NumPy provides an array data structure (```ndarray```) that supports efficient vector and matrix operations, as well as many linear algebra routines.
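To make "efficient vector operations" concrete, here is a small illustrative comparison using made-up lists `xs` and `ys`. It is a sketch, not a benchmark: `+` on Python lists concatenates them, whereas `+` on NumPy arrays adds them elementwise in one vectorized expression whose loop runs in compiled code.

```python
import numpy as np

# Made-up example data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 20.0, 30.0, 40.0]

# '+' on Python lists concatenates, it does NOT add elementwise
print(xs + ys)   # [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]

# Pure Python: the interpreter executes one addition per loop iteration
loop_sum = [x + y for x, y in zip(xs, ys)]

# NumPy: a single vectorized expression; the loop runs inside compiled code
vec_sum = np.array(xs) + np.array(ys)

print(loop_sum)           # [11.0, 22.0, 33.0, 44.0]
print(vec_sum.tolist())   # [11.0, 22.0, 33.0, 44.0]
```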

### Import NumPy package



In [None]:
# Import the NumPy library
import numpy as np

### Scalar, Vector, Matrix, Tensor
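```np.ndarray``` (usually created with ```np.array```) is the central data structure of the NumPy library. Scalars, vectors, matrices, and higher-order tensors are all arrays that differ only in their number of dimensions (```ndim```) and their ```shape```.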

In [None]:
# define a scalar
a = np.array(1)
print('a = ', a.__repr__())
print('dimension of a: ', a.ndim)
print('shape of a: ', a.shape)

# define a vector
b = np.array([1., 2., 3.])
print('\nb = ', b.__repr__())
print('dimension of b: ', b.ndim)
print('shape of b: ', b.shape)

# define a matrix
c = np.array([[1., 2., 3.], [4., 5., 6.]])
print('\nc = ',c.__repr__())
print('dimension of c: ', c.ndim)
print('shape of c: ', c.shape)

# define a tensor (N-dimensional array)
d = np.array([[[[1., 2., 3.], [1., 2., 3.], [1., 2., 3.]],
               [[4., 5., 6.], [4., 5., 6.], [4., 5., 6.]]],
              [[[7., 8., 9.], [7., 8., 9.], [7., 8., 9.]],
               [[10., 11., 12.], [10., 11., 12.], [10., 11., 12.]]]])
print('\nd = ',d.__repr__())
print('dimension of d: ', d.ndim)
print('shape of d: ', d.shape)

### NumPy array creation functions
NumPy provides many functions for creating arrays.

In [None]:
a = np.zeros(10)  # ten 0.0 values (float64 by default)
a, a.shape

In [None]:
a = np.ones(10)  # ten 1.0 values
a, a.shape
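Beyond ```np.zeros``` and ```np.ones```, a few other creation functions come up often. The sketch below uses only standard NumPy calls; the variable names are illustrative.

```python
import numpy as np

full = np.full((2, 3), 7.0)   # 2x3 array filled with 7.0
eye = np.eye(3)               # 3x3 identity matrix
like = np.zeros_like(full)    # zeros with the same shape and dtype as `full`
empty = np.empty(4)           # uninitialized memory: the values are arbitrary

print(full)
print(eye)
print(like.shape, like.dtype)  # (2, 3) float64
```

Prefer ```np.zeros``` over ```np.empty``` unless every element will be overwritten before it is read.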

In [None]:
a = np.arange(0, 10, 1)  # integers from 0 up to (but not including) 10, step 1
a, a.shape

In [None]:
a = np.linspace(0, 2, 9)  # 9 evenly spaced values from 0 to 2, both endpoints included
a, a.shape
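The two cells above differ in how they treat the end point: ```np.arange``` takes a step size and excludes ```stop```, while ```np.linspace``` takes a number of points and includes ```stop``` unless ```endpoint=False``` is passed. A small side-by-side sketch:

```python
import numpy as np

r = np.arange(0, 2, 0.25)                   # step 0.25, stop excluded: 8 values, 0.0 ... 1.75
l = np.linspace(0, 2, 9)                    # 9 points, stop included: 0.0 ... 2.0
l_open = np.linspace(0, 2, 8, endpoint=False)  # drop the end point to mimic arange

print(len(r), len(l))            # 8 9
print(np.allclose(r, l_open))    # True
```

With a non-integer step, floating-point rounding can make the length of ```np.arange``` surprising, which is why ```np.linspace``` is usually preferred when the number of points matters.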

In [None]:
a = np.random.random((2, 3, 4))  # uniform random values in [0, 1), shape (2, 3, 4)
a, a.shape

In [None]:
# Convert datatype of elements
a = np.arange(10)
print(a.dtype)   # original datatype

b = a.astype(np.float64) # convert to float
print(b.dtype)
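Converting in the other direction, from float to integer, is lossy: ```astype``` truncates toward zero rather than rounding. A short sketch with made-up values:

```python
import numpy as np

f = np.array([1.7, -1.7, 2.5])

i = f.astype(np.int64)             # drops the fractional part (truncation toward zero)
r = np.round(f).astype(np.int64)   # round first to get the nearest integer

print(i.tolist())  # [1, -1, 2]
print(r.tolist())  # [2, -2, 2]  (np.round sends 2.5 to the even neighbor, 2)
```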

### Indexing & Slicing

In [None]:
# numpy arrays can be indexed.
a = np.ones(5)
print(a)

a[0] = 6
a[4] = 2
print(a)

In [None]:
# index -1 refers to the last element
a[-1]

In [None]:
# slice from index 0 up to, but not including, index -1 (the last element)
a[0:-1]

In [None]:
# a 2-dimensional array
a = np.arange(1, 13).reshape(3, 4)
a

In [None]:
# indexing the first row
a[0]

In [None]:
# indexing second element of the first row
a[0, 1]

In [None]:
# slice both axes with start:stop:step: rows 0 and 2, columns 1 and 3
a[0:3:2, 1:4:2]
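A few more indexing patterns on the same 3x4 array: ```:``` keeps a whole axis (so ```a[:, 1]``` selects a column), negative indices work on every axis, and a negative step reverses an axis. The expected values below follow from ```a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]```.

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)   # same 3x4 array as above

print(a[0:3:2, 1:4:2].tolist())  # [[2, 4], [10, 12]]
print(a[:, 1].tolist())          # [2, 6, 10]  (second column)
print(a[-1].tolist())            # [9, 10, 11, 12]  (last row)
print(a[::-1][0].tolist())       # [9, 10, 11, 12]  (rows reversed)
```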

### Shape Manipulation

In [None]:
# reshape

a = np.arange(10)
print(a, a.shape)

b = a.reshape(2,5)
print(b, b.shape)

In [None]:
# add an additional dimension
print(a.shape)
a[None].shape  # add extra dimension in first dim

In [None]:
# add in second dim
a[:, None].shape

In [None]:
# add in last dim
a[..., None].shape

In [None]:
# add in second last dim
a[..., None, :].shape
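Indexing with ```None``` works because ```np.newaxis``` is simply an alias for ```None```. ```np.expand_dims``` does the same thing with an explicit ```axis``` argument, which some readers find easier to follow.

```python
import numpy as np

a = np.arange(10)

print(np.newaxis is None)                # True
print(a[np.newaxis, :].shape)            # (1, 10), same as a[None]
print(np.expand_dims(a, axis=1).shape)   # (10, 1), same as a[:, None]
```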

In [None]:
# stack
a = np.ones((3,2))
b = np.zeros((3,2))
print(np.vstack([a, b]))
print(np.hstack([a, b]))

In [None]:
# concatenate
print(np.concatenate([a, b], axis=0))
print(np.concatenate([a, b], axis=1))
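For 2-D inputs, ```np.vstack``` and ```np.hstack``` give the same results as ```np.concatenate``` along axis 0 and axis 1. ```np.stack``` is different: it joins the arrays along a new axis, so the result has one more dimension.

```python
import numpy as np

a = np.ones((3, 2))
b = np.zeros((3, 2))

print(np.array_equal(np.vstack([a, b]), np.concatenate([a, b], axis=0)))  # True
print(np.array_equal(np.hstack([a, b]), np.concatenate([a, b], axis=1)))  # True

s = np.stack([a, b])                     # new leading axis (axis=0 by default)
print(s.shape)                           # (2, 3, 2)
print(np.stack([a, b], axis=-1).shape)   # (3, 2, 2)
```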

In [None]:
# matrix transpose
a = np.arange(10).reshape(2,5)
print(a)
print(a.T)
print(a.transpose())

In [None]:
# tensor transpose
a = np.arange(24).reshape(2,3,4)
print(a, a.shape)

# swap axes 0 and 2
b = np.transpose(a, [2, 1, 0])
print(b, b.shape)
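Transposing with the axis order ```[2, 1, 0]``` means element ```b[i, j, k]``` equals ```a[k, j, i]```. For swapping exactly two axes, ```np.swapaxes``` is an equivalent shortcut.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
b = np.transpose(a, [2, 1, 0])

print(b.shape)                                  # (4, 3, 2)
print(b[3, 1, 0] == a[0, 1, 3])                 # True: b[i, j, k] == a[k, j, i]
print(np.array_equal(b, np.swapaxes(a, 0, 2)))  # True
```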

### NumPy Operations (Math)

In [None]:
# Basic mathematical functions in the numpy module operate elementwise on arrays.
# Arrays support all basic arithmetic operators, such as +, -, *, / and **.
a = np.arange(0, 3, 1)
b = np.arange(1, 4, 1)
print('a = ', a.__repr__())
print('b = ', b.__repr__(), end='\n')

print('a = ', a)
print('a + 5 = ', a + 5)
print('a^2 = ', a ** 2)
print('sin(a) = ', np.sin(a))
print('logical operation of a < 1: ', a < 1)

In [None]:
# operator * is not vector multiplication but elementwise multiplication.
print('array a = ', a)
print('array b = ', b)
a * b

In [None]:
# for the inner (dot) product of two vectors, use np.dot or the @ operator
# 0*1 + 1*2 + 2*3 = 8
np.dot(a, b), a@b
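The inner product is exactly the sum of the elementwise product, which ties the previous two cells together:

```python
import numpy as np

a = np.arange(0, 3, 1)   # [0, 1, 2]
b = np.arange(1, 4, 1)   # [1, 2, 3]

manual = (a * b).sum()        # 0*1 + 1*2 + 2*3
print(manual, np.dot(a, b))   # 8 8
```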

In [None]:
# Matrix multiplication

a = np.array([[1,2],[3,4]])
b = np.array([[4,3],[2,1]])

np.matmul(a,b), np.dot(a,b), a@b
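To see what ```np.matmul``` (or ```@```) computes, here is a textbook triple-loop version. ```matmul_loops``` is a hypothetical helper written only for illustration; it is far slower than ```@``` and should not be used in practice.

```python
import numpy as np

def matmul_loops(A, B):
    """Hypothetical helper: naive matrix product, for illustration only."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=np.result_type(A, B))
    for i in range(n):
        for j in range(m):
            # C[i, j] is the dot product of row i of A and column j of B
            C[i, j] = sum(A[i, p] * B[p, j] for p in range(k))
    return C

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])

print(matmul_loops(a, b).tolist())                 # [[8, 5], [20, 13]]
print(np.array_equal(matmul_loops(a, b), a @ b))   # True
```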

In [None]:
# reductions such as max, min and sum are also available as array methods, e.g. a.max()
a = np.arange(6).reshape(2,3)
print('array a = ', a)
print('max value of each column: ', a.max(axis = 0)) # max of each column
print('min value of each row:', a.min(axis = 1)) # min of each row
print('sums of all elements:', a.sum()) # sum of all elements
print('sums of each row:', a.sum(axis = 1)) # sum of each row
print('max value of array (matrix) a:', a.max()) # max of a

In [None]:
# reduce along an axis while keeping the reduced axis (with size 1)
np.sum(a, axis=0, keepdims=True)

In [None]:
np.sum(a, axis=1, keepdims=True)
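A common reason to use ```keepdims=True``` is to divide by a reduction afterwards. In the sketch below, the row sums keep shape ```(2, 1)```, so they broadcast across the columns; without ```keepdims``` the shape ```(2,)``` does not line up with the trailing axis of size 3 and NumPy raises a ```ValueError```.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)            # [[0, 1, 2], [3, 4, 5]]

row_sums = a.sum(axis=1, keepdims=True)   # shape (2, 1)
normalized = a / row_sums                 # (2, 3) / (2, 1): each row divided by its sum
print(row_sums.shape)                     # (2, 1)
print(normalized.sum(axis=1))             # each row now sums to 1 (up to rounding)

try:
    a / a.sum(axis=1)                     # (2, 3) / (2,): trailing sizes 3 and 2 differ
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)                   # True
```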

### Broadcasting
Broadcasting is a powerful mechanism that allows NumPy to apply elementwise operations to arrays of different shapes: the smaller array is virtually repeated along missing or size-1 dimensions, without actually copying its data.
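The rule behind the three cases below can be sketched in a few lines. ```broadcast_shape``` is a hypothetical helper written for illustration: it compares shapes from the trailing dimension, treats missing leading dimensions as 1, and accepts a pair of sizes only if they are equal or one of them is 1. ```np.broadcast_shapes``` (available in NumPy 1.20 and later) performs the real check.

```python
import numpy as np

def broadcast_shape(s1, s2):
    """Hypothetical helper sketching NumPy's broadcasting rule for two shapes."""
    result = []
    for i in range(1, max(len(s1), len(s2)) + 1):
        # Walk from the rightmost axis; a missing axis behaves like size 1
        d1 = s1[-i] if i <= len(s1) else 1
        d2 = s2[-i] if i <= len(s2) else 1
        if d1 != d2 and 1 not in (d1, d2):
            raise ValueError(f"incompatible sizes {d1} and {d2}")
        result.append(d2 if d1 == 1 else d1)
    return tuple(reversed(result))

print(broadcast_shape((3,), ()))              # (3,)      vector and scalar
print(broadcast_shape((2, 3), (3,)))          # (2, 3)    matrix and vector
print(broadcast_shape((2, 2, 3), (2, 3)))     # (2, 2, 3) tensor and matrix
print(np.broadcast_shapes((2, 2, 3), (2, 3))) # (2, 2, 3)
```

For example, ```broadcast_shape((2, 3), (2,))``` raises ```ValueError``` because the trailing sizes 3 and 2 differ and neither is 1.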

In [None]:
# Vector and scalar
a = np.arange(3)
b = 2.

print("a = ", a, "/ b = ", b)
print("a+b = ", a+b)
print("a-b = ", a-b)
print("a*b = ", a*b)
print("a/b = ", a/b)

In [None]:
# Matrix and vector
a = np.arange(1, 7).reshape(2, 3)
b = np.arange(1, 4)

print("a = ", a, "/ b = ", b)
print("a+b = ", a+b)
print("a-b = ", a-b)
print("a*b = ", a*b)
print("a/b = ", a/b)

In [None]:
# Tensor and matrix
a = np.arange(1, 13).reshape(2, 2, 3)
b = np.arange(1, 7).reshape(2, 3)

print("a = ", a, "/ b = ", b)
print("a+b = ", a+b)
print("a-b = ", a-b)
print("a*b = ", a*b)
print("a/b = ", a/b)

### Boolean Array Indexing (Masking)


In [None]:
a = np.arange(1, 10).reshape(3, 3)
print('a = ', a.__repr__())

In [None]:
# comparison operators work elementwise and return a boolean array (True or False per element)
even = a % 2 == 0
print(even.__repr__())

In [None]:
# select the elements where the mask is True; the result is a 1-D (rank 1) array
a[even]
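Masks can also be used on the left-hand side of an assignment, and ```np.where``` chooses between two values elementwise while keeping the original shape. When combining conditions, use ```&``` and ```|``` with each comparison in parentheses (Python's ```and```/```or``` do not work elementwise on arrays).

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

b = a.copy()
b[b % 2 == 0] = 0                  # zero out the even entries through the mask
print(b.tolist())                  # [[1, 0, 3], [0, 5, 0], [7, 0, 9]]

c = np.where(a % 2 == 0, a, -1)    # keep evens, replace odds with -1
print(c.tolist())                  # [[-1, 2, -1], [4, -1, 6], [-1, 8, -1]]

print(a[(a > 2) & (a < 6)].tolist())   # [3, 4, 5]
```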

### Copy in numpy

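There are three cases to consider when "copying" a NumPy array: plain assignment (no copy), a view (shallow copy), and a deep copy.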

In [None]:
# Case 1

a = np.zeros((2,2))
b = a  # No copy at all: b is the same object, sharing both the data and the properties (e.g., the shape of the array)
print('b: \n', b.__repr__())

b[1,1] = 1
print('b: \n', b.__repr__())
print('a: \n', a.__repr__()) # a is also changed


b.shape = (1,4)
print('shape of a: ', a.shape) #The shape of a is also changed

In [None]:
# Case 2 : Shallow copy

a = np.zeros((2,2))
b = a.view()                    # Shallow copy (view): shares the data but not the properties (e.g., the shape of the array)
print('b: \n', b.__repr__())

b[1,1] = 1
print('b: \n', b.__repr__())
print('a: \n', a.__repr__())    # a is also changed!!

b.shape = (1,4)
print('shape of a: ', a.shape)  #The shape of a is not changed

In [None]:
# Case 3 : Deep copy

a = np.zeros((2,2))
c = a.copy()                  # Deep copy: creates an independent array that shares neither the data nor the properties
print('c: \n', c.__repr__())

c[1,1] = 1
print('c: \n', c.__repr__())
print('a: \n', a.__repr__())   # a is not changed
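The three cases can also be told apart programmatically: ```is``` checks object identity, ```np.shares_memory``` checks whether two arrays use the same data buffer, and a view's ```base``` attribute points to the array that owns the data. Note that basic slices such as ```a[0]``` are views as well, so writing through them changes the original.

```python
import numpy as np

a = np.zeros((2, 2))
b = a            # same object
v = a.view()     # new array object, same data buffer
c = a.copy()     # new object, new buffer

print(b is a, v is a, c is a)                           # True False False
print(np.shares_memory(a, v), np.shares_memory(a, c))   # True False
print(v.base is a)                                      # True

s = a[0]         # a basic slice is a view
s[:] = 5
print(a.tolist())   # [[5.0, 5.0], [0.0, 0.0]]
```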

## Plotting data

Matplotlib is a popular library for creating static, animated, and interactive visualizations in Python.

### Import matplotlib

In [None]:
import matplotlib.pyplot as plt

### Generate data & Define a function

In [None]:
x_temp = np.arange(0, 10, 0.01)
x = np.linspace(0, 10, 1000)
print(x_temp.shape, x.shape)

y = 3*x**2 - 20*x + 25
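Because ```x``` is an array, the expression above evaluates the polynomial at all 1000 points at once. For polynomials specifically, ```np.polyval``` does the same from a list of coefficients given in decreasing powers; this is an alternative, not what the cell above uses.

```python
import numpy as np

x = np.linspace(0, 10, 1000)

coeffs = [3, -20, 25]              # 3x^2 - 20x + 25, highest power first
y_poly = np.polyval(coeffs, x)
y_expr = 3*x**2 - 20*x + 25

print(np.allclose(y_poly, y_expr))  # True
```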

### Plot the function

In [None]:
plt.plot(x, y, label='3x^2 - 20x + 25', color='red')
plt.grid()
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.legend()
plt.show()

In [None]:
plt.scatter(x, y, label='3x^2 - 20x + 25', color='green', s=1)
plt.grid()
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.legend()
plt.show()
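The cells above use the ```plt.*``` state-based interface. Matplotlib also offers an object-oriented interface through ```fig, ax = plt.subplots()```, which scales better once a figure has several panels. The sketch below redraws the same curve and, as an illustrative addition, marks its minimum at the vertex ```x = -b / (2a) = 20 / 6```; the figure size and marker styling are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
y = 3*x**2 - 20*x + 25

# Vertex of a*x^2 + b*x + c lies at x = -b / (2a)
x_min = 20 / 6
y_min = 3*x_min**2 - 20*x_min + 25     # equals -25/3

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, color='red', label=r'$3x^2 - 20x + 25$')
ax.scatter([x_min], [y_min], color='black', zorder=3, label='minimum')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.grid(True)
ax.legend()
plt.show()
```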

## Exercises

We are going to generate 1000 data points from each of 3 multivariate normal distributions:
\begin{align*}
&X_i\sim\mathcal{N}(\mu_i, \Sigma),\quad i = 1, 2, 3,\\
&\text{where}\quad \mu_1=\begin{bmatrix} 0 & 10 \end{bmatrix}, \quad \mu_2=\begin{bmatrix} 10 & 0 \end{bmatrix},\quad \mu_3=\begin{bmatrix} 20 & 20 \end{bmatrix}, \quad\Sigma=\begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}.
\end{align*}

Then plot the sampled data together with the centroid (sample mean) of each of the three groups.

### **Exercise 1**. Generate 1000 data points from each normal distribution (use ```np.random.normal```)

In [None]:
n = 1000
###########################

# Enter your code

###########################

x1.shape, x2.shape, x3.shape

### **Exercise 2**. Compute the mean and covariance of x1, x2, and x3

In [None]:
# Enter your code

### **Exercise 3**. Plot the sampled data and the centroids

In [None]:
###########################

# Enter your code

###########################

plt.grid()
plt.legend(loc='upper left')
plt.show()
