**Author:** Arun Prakash A <br>
**Affiliation:** BS, IIT Madras


# Why NumPy
* The fundamental mathematical objects that we deal with in Machine Learning/Deep Learning are

 * Scalars (constants like the learning rate $\eta$)
 * Vectors (a training sample of size $1 \times n$)
 * Matrices (parameters, a batch of samples, ..)

* For example,

  * $\mathbf{X} \in \mathbb{R}^{m \times n}$, a dataset with $m$ samples and $n$ features
  * $\mathbf{y}$, the corresponding labels
  * $\mathbf{\hat{y}=wx+b}$, a mathematical operation on them

* Typically, we intend to carry out some operations such as

   * Load/create (say, a data or parameter matrix)
   * Access elements (element could be a scalar, vector..)
   * Manipulate elements (apply functions, delete,..)
   * Mathematical operations (add, invert, transform, ..)

* **NumPy** does all of these (and more) efficiently! The name itself (Numerical Python) implies that it is a library dedicated to comprehensive numerical analysis using linear algebra, random processes, signal processing, ..

* Many domain-specific packages such as SciPy and scikit-learn are built on top of NumPy, and others such as the Python bindings of OpenCV use NumPy arrays as their data format.

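* NumPy is fast because its core array routines are implemented in compiled C code and operate on contiguous, typed blocks of memory, which avoids slow Python-level loops.

* It is also well documented.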

* Please solve the 100 exercises listed at: https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.md
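
To make the objects listed above concrete, here is a minimal sketch that builds a small dataset $\mathbf{X}$, a weight vector $\mathbf{w}$ and a bias $b$, and computes $\mathbf{\hat{y}}$ twice: once with explicit Python loops and once with a single vectorized NumPy expression. The sizes, the values and the names `y_hat_loop` and `y_hat_vec` are assumptions made only for illustration.

```python
import numpy as np

m, n = 4, 3                                       # assumed sizes: 4 samples, 3 features
X = np.arange(m * n, dtype=float).reshape(m, n)   # dataset, shape (m, n)
w = np.array([0.5, -1.0, 2.0])                    # parameter vector, shape (n,)
b = 0.1                                           # scalar bias

# Explicit Python loops: one multiply-add per element
y_hat_loop = np.empty(m)
for i in range(m):
    total = 0.0
    for j in range(n):
        total += X[i, j] * w[j]
    y_hat_loop[i] = total + b

# Vectorized: the same computation, carried out by compiled code
y_hat_vec = X @ w + b

print(y_hat_vec.shape)                      # (4,)
print(np.allclose(y_hat_loop, y_hat_vec))   # True
```

Both versions give the same numbers; the vectorized form is shorter and is usually the faster one on large arrays.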

In [None]:
import numpy as np

<h3> Creation of n-dim arrays </h3>

   * Let's create a simple array like $x=[1,2,3]$ and look at some of the attributes of the array

In [None]:
x = np.array(1)
print(x,'\t',x.ndim,'\t',x.size,'\t',x.shape)

1 	 0 	 1 	 ()


 * dimension (ndim): 0 means a scalar, 1 means a sequence, 2 means a matrix, more than 2 means a tensor (or n-dim array)

 * size : Number of elements in the array
 * shape: Number of elements in each dimension

In [None]:
# a sequence
x = np.array([1,2,3])
print(x,'\t',x.ndim,'\t',x.size,'\t',x.shape)

[1 2 3] 	 1 	 3 	 (3,)


* Now, let us add one more outer square bracket to the list (argument passed to the np.array)

In [None]:
# a (row) vector
x = np.array([[1,2,3]])
print(x,'\t',x.ndim,'\t',x.size,'\t',x.shape)

[[1 2 3]] 	 2 	 3 	 (1, 3)


 * It is a row vector. How do we **create** a column vector? 

In [None]:
x = np.array([[1],[2],[3]])
print(x,'\t',x.ndim,'\t',x.size,'\t',x.shape)

[[1]
 [2]
 [3]] 	 2 	 3 	 (3, 1)


* Now, we can simply extend this to a matrix of size $3 \times 3$.
* Observe that we already have 3 elements in **axis=0**. Those are [1],[2],[3]. 

* Each element is of size 1. So extending each of them by adding more elements creates a matrix.

In [None]:
x = np.array([[1,0,1],[2,0,2],[3,0,3]])
print(x,'\t',x.ndim,'\t',x.size,'\t',x.shape)

[[1 0 1]
 [2 0 2]
 [3 0 3]] 	 2 	 9 	 (3, 3)


 * Therefore, elements in **axis=0** can be thought of as rows and **axis=1** can be thought of as columns for convenience.

 * In the same way, we can create an $n$-dim array of shape $(m,n,k,o,..)$. I am skipping this and leaving it to you as an exercise.

 * **Important:** All the elements in an array should be of the same type.
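
As a small sketch of the last two points, the block below creates a 3-dim array and shows what happens to the element type when the input mixes integers and floats. The variable names `t`, `mixed` and `small` are illustrative only.

```python
import numpy as np

# A 3-dim array: 2 blocks, each with 3 rows and 4 columns
t = np.zeros((2, 3, 4))
print(t.ndim, t.size, t.shape)    # 3 24 (2, 3, 4)

# Mixing ints and a float: every element is stored as float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                # float64

# The element type can also be requested explicitly
small = np.array([1, 2, 3], dtype=np.float32)
print(small.dtype)                # float32
```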

<h3> Common arrays </h3>

  * Often we need to create arrays of all zeros, ones, random values and so on.
  * NumPy has functions to create such arrays. The most commonly used are
    
     * np.arange()
     * np.linspace()
     * np.meshgrid()
     * np.zeros()
     * np.ones()
     * np.random.randn()
     * np.empty()
     * np.full()
     * np.diag()

You can refer to the exhaustive list here: https://numpy.org/doc/stable/reference/routines.array-creation.html
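
A few of the routines listed above, sketched on tiny inputs. The argument values are arbitrary choices for illustration.

```python
import numpy as np

evens = np.arange(0, 10, 2)       # 0, 2, 4, 6, 8 (stop is excluded)
grid = np.linspace(0, 1, 5)       # 0, 0.25, 0.5, 0.75, 1 (stop is included)
d = np.diag([1, 2, 3])            # 3x3 matrix with 1, 2, 3 on the diagonal
print(evens, grid)
print(d)

# np.empty only allocates memory; its contents are arbitrary until assigned
e = np.empty((2, 2))
print(e.shape)                    # (2, 2)

# np.meshgrid builds coordinate matrices from two 1-d coordinate vectors
xs = np.array([0, 1, 2])
ys = np.array([10, 20])
XX, YY = np.meshgrid(xs, ys)
print(XX.shape, YY.shape)         # (2, 3) (2, 3)
```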

In [None]:
print(np.zeros((3,3)))

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [None]:
print(np.zeros_like(x))

[[0 0 0]
 [0 0 0]
 [0 0 0]]


In [None]:
np.full((3,3),fill_value=5)

array([[5, 5, 5],
       [5, 5, 5],
       [5, 5, 5]])

* Generating random arrays is common in ML, for example when initializing parameters.
* We want such experiments to be reproducible.
* There are two ways to fix the seed: `np.random.seed()` and `np.random.RandomState()`.

In [None]:
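np.random.seed(42)
print(np.random.randint(0,10,size=(3,3)))
# a second call continues the same random stream, so the values differ
print(np.random.randint(0,10,size=(3,3)))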

[[6 3 7]
 [4 6 9]
 [2 6 7]]
[[4 3 7]
 [7 2 5]
 [4 1 7]]


In [None]:
rand_state = 42
rand_gen = np.random.RandomState(rand_state)
print(rand_gen.randint(0,10,size=(3,3)))

[[6 3 7]
 [4 6 9]
 [2 6 7]]
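
Besides the two approaches shown above, NumPy 1.17 and later also offer the `np.random.default_rng()` generator interface. The sketch below only demonstrates that two generators created with the same seed produce the same array; the names `rng_a` and `rng_b` are illustrative, and the values drawn need not match those of `np.random.seed(42)`.

```python
import numpy as np

rng_a = np.random.default_rng(42)   # independent generator seeded with 42
rng_b = np.random.default_rng(42)   # second generator with the same seed

draw_a = rng_a.integers(0, 10, size=(3, 3))
draw_b = rng_b.integers(0, 10, size=(3, 3))

print(np.array_equal(draw_a, draw_b))   # True: same seed, same stream
```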


<h3> Accessing elements </h3>

  * Now, we look at various ways of accessing elements in an array.
  * We can get an individual element or a **slice**.
  * If we have an **n-dim** array, then we need **n indices** to get a single element.
   * For example, x[i,j] denotes the **ith** element in axis:0 and the **jth** element in axis:1
  * The index starts from zero.

  * **Important:** When a new array is returned, it is often a **view** of the original array. Some operations make a **copy** of the original array.
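
A minimal sketch of the view/copy distinction: basic slicing returns a view, while indexing with a list of indices returns a copy. The array `a` and the names `row_view` and `row_copy` are made up for this illustration.

```python
import numpy as np

a = np.array([[1, 0, 1], [2, 0, 2], [3, 0, 3]])

row_view = a[0, :]        # basic slicing returns a view
row_copy = a[[0], :]      # indexing with a list returns a copy

print(np.shares_memory(a, row_view))   # True
print(np.shares_memory(a, row_copy))   # False

row_view[0] = 100         # writes through to a
row_copy[0, 1] = -1       # does not touch a
print(a[0])               # first row is now 100, 0, 1
```

When in doubt, `np.shares_memory` tells you whether two arrays share data, and `.copy()` forces an independent array.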

In [None]:
# Access the first element of axis:0
print(x[0])
print(x[0,:])

[1 0 1]
[1 0 1]


* Each element in axis:0 is a row.
* However, as you can see, we lost one dimension. Of course, we can reshape it.
* However, is it possible to preserve it while accessing?

In [None]:
print(x[0].shape)

(3,)


In [None]:
print(x[[0]])
# or
print(np.take(x,indices=[0],axis=0))
# there are still more ways

[[1 0 1]]
[[1 0 1]]
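
Three of those other ways, sketched on a stand-in array `m` with the same values as $x$. The names `r1`, `r2` and `r3` are illustrative.

```python
import numpy as np

m = np.array([[1, 0, 1], [2, 0, 2], [3, 0, 3]])

r1 = m[0:1]                 # a slice of length 1 keeps axis 0
r2 = m[0][np.newaxis, :]    # add a new leading axis to the 1-d row
r3 = m[0].reshape(1, -1)    # reshape to 1 row, infer the number of columns

print(r1.shape, r2.shape, r3.shape)   # (1, 3) (1, 3) (1, 3)
```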


In [None]:
# Access the first element of both axis:0 and 1
print(x[0,0])

1


In [None]:
# other way of doing the same
axis_zero = x[0]
print(axis_zero[0])

1


In [None]:
# slicing
print(x[0:2,0:2])

[[1 0]
 [2 0]]


In [None]:
#boolean indexing
idx = x > 2
print(idx)

[[False False False]
 [False False False]
 [ True False  True]]


* Now pass this boolean array to $x$

In [None]:
print(x[idx])

[3 3]


In [None]:
y = np.random.randint(0,4,size=(10,1))
print(y)

[[3]
 [1]
 [1]
 [1]
 [3]
 [3]
 [0]
 [0]
 [3]
 [1]]


In [None]:
print(y[(y==0) | (y==1)])

[1 1 1 0 0 1]
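
Boolean masks can also be used to locate elements and to assign to them. The sketch below uses a small made-up 1-d array `labels`; `np.where` and `np.isin` are standard NumPy functions.

```python
import numpy as np

labels = np.array([3, 1, 1, 0, 2])

# each comparison needs its own parentheses when combined with | or &
mask = (labels == 0) | (labels == 1)
print(mask)                        # [False  True  True  True False]
print(np.where(mask)[0])           # positions of the selected elements: [1 2 3]

clipped = labels.copy()
clipped[clipped > 1] = 1           # assign through a boolean mask
print(clipped)                     # [1 1 1 0 1]

same_mask = np.isin(labels, [0, 1])   # the same mask, built with np.isin
print(np.array_equal(mask, same_mask))   # True
```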


<h3> Array Manipulation </h3>

  * Manipulating arrays means changing their shape, swapping the axes, concatenating, splitting, and many such operations
  * For comprehensive list: https://numpy.org/doc/stable/reference/routines.array-manipulation.html
  

In [None]:
x = np.array([[1,0,1],[2,0,2],[3,0,3]])
x_other = np.ones((3,3))
print(x)
print(x_other)


[[1 0 1]
 [2 0 2]
 [3 0 3]]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [None]:
print(x.reshape(1,9)) # row-major order
print(x.flatten()) # note that flatten returns a 1-d array (a sequence)
print(np.vstack((x,x_other))) # the int array is upcast to float

[[1 0 1 2 0 2 3 0 3]]
[1 0 1 2 0 2 3 0 3]
[[1. 0. 1.]
 [2. 0. 2.]
 [3. 0. 3.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [None]:
x_cat = np.concatenate((x,x_other),axis=0)
print(x_cat)

[[1. 0. 1.]
 [2. 0. 2.]
 [3. 0. 3.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [None]:
print(np.split(x_cat,2,axis=0))

[array([[1., 0., 1.],
       [2., 0., 2.],
       [3., 0., 3.]]), array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])]
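
A few more manipulation routines from the same family, sketched on a small made-up array `a`: swapping axes, horizontal stacking, and splitting into unequal parts.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)           # [[0 1 2], [3 4 5]]

swapped = np.swapaxes(a, 0, 1)           # same as a.T for a 2-d array
print(swapped.shape)                     # (3, 2)

wide = np.hstack((a, a))                 # stacked along axis 1
print(wide.shape)                        # (2, 6)

tall = a.reshape(3, -1)                  # -1 lets NumPy infer the other dimension
print(tall.shape)                        # (3, 2)

# np.split requires an equal division; np.array_split does not
parts = np.array_split(np.arange(5), 2)
print([p.tolist() for p in parts])       # [[0, 1, 2], [3, 4]]
```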


<h3> Operations on Array </h3>

  * All algebraic operations: add, subtract, ..
  * Boolean operations: AND,OR,XOR,..
  * Mathematical functions: sin,cos,exp,..
  * Reduction functions: max,min,mean

  * Once again, some of these operations can be carried along a specific axis if required 
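
One small example for each kind of operation listed above, using two made-up 2x2 arrays `p` and `q`. Note how the `axis` argument selects the dimension that is collapsed by a reduction.

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
q = np.array([[10, 20], [30, 40]])

added = p + q                              # element-wise addition
prod = p * q                               # element-wise product (not a matrix product)
both = np.logical_and(p > 1, q < 40)       # element-wise boolean AND
ones = np.exp(np.zeros(2))                 # exp(0) = 1 for every element
print(added, prod, both, ones)

print(p.sum())                             # 10: reduce over all elements
print(p.sum(axis=0))                       # [4 6]: collapse axis 0 (column sums)
print(p.mean(axis=1))                      # [1.5 3.5]: collapse axis 1 (row means)
```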
  

In [None]:
np.random.seed(42)
W = np.random.randint(0,3,size=(2,3))
x = np.ones((3,1))
print(W)

[[2 0 2]
 [2 0 0]]


* Let's try adding $W$ and $x$.
* $W$ has shape (2,3) and $x$ has shape (3,1), so we first transpose $W$. Observe that the shapes of $W^T$ (3,2) and $x$ (3,1) then differ only along one dimension (dim:1).
* NumPy does **broadcasting** (replicating elements along a singleton dimension) automatically.

In [None]:
print(W.T+x) # transposing W to match along dim:0 

[[3. 3.]
 [1. 1.]
 [3. 1.]]
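
What broadcasting does can be made explicit with `np.broadcast_to`. In this sketch `A` holds the values of the transposed $W$ and `c` plays the role of $x$; both names are illustrative. The last part shows that shapes which cannot be aligned raise an error, which is why the transpose was needed.

```python
import numpy as np

A = np.array([[2, 2], [0, 0], [2, 0]])   # shape (3, 2)
c = np.ones((3, 1))                      # shape (3, 1)

# Conceptually, c is repeated along its singleton axis to match A
c_expanded = np.broadcast_to(c, A.shape)
print(c_expanded.shape)                          # (3, 2)
print(np.array_equal(A + c, A + c_expanded))     # True

# Shapes (2, 3) and (3, 1) cannot be broadcast together
try:
    np.ones((2, 3)) + np.ones((3, 1))
    failed = False
except ValueError as err:
    failed = True
    print("broadcast failed:", err)
```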


* Matrix multiplication

In [None]:
y = np.matmul(W,x)
print(y)

[[4.]
 [2.]]


In [None]:
y = W@x 
print(y)

[[4.]
 [2.]]


In [None]:
# y = np.matmul(x,W) # throws an error: shapes (3,1) and (2,3) do not align

In [None]:
# matrix inverse: W is not square, so we use the pseudo-inverse
print(np.linalg.pinv(W))

[[ 4.35380958e-17  5.00000000e-01]
 [ 0.00000000e+00  0.00000000e+00]
 [ 5.00000000e-01 -5.00000000e-01]]
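
A quick way to sanity-check a pseudo-inverse is to test its defining properties numerically. The sketch below rebuilds the same values as $W$ under the illustrative name `M`, and also shows `np.linalg.inv` on a made-up square matrix `S`.

```python
import numpy as np

M = np.array([[2, 0, 2], [2, 0, 0]])     # same values as W, shape (2, 3)
M_pinv = np.linalg.pinv(M)               # shape (3, 2)

# M has full row rank, so M @ M_pinv is (numerically) the 2x2 identity
print(np.allclose(M @ M_pinv, np.eye(2)))        # True
# The property M @ M_pinv @ M == M holds for any matrix
print(np.allclose(M @ M_pinv @ M, M))            # True

# For a square, invertible matrix, np.linalg.inv gives the ordinary inverse
S = np.array([[2.0, 1.0], [1.0, 1.0]])
S_inv = np.linalg.inv(S)
print(np.allclose(S @ S_inv, np.eye(2)))         # True
```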


In [None]:
print(np.sin(x))

[[0.84147098]
 [0.84147098]
 [0.84147098]]


In [None]:
print(np.min(W,axis=0)) 

[2 0 0]


In [None]:
print(np.argmax(W,axis=1))

[0 0]


* This is a very minimal introduction to NumPy.
* NumPy also supports constants: `np.inf`,`np.nan`,`np.pi`,`np.e`,`np.newaxis`
* I highly encourage you to take a look at the documentation to get a comprehensive understanding about NumPy and its potential.
* **Suggested topics:** Loading and storing, masked arrays (you mask some of the elements so that they are excluded from any operation carried out on the array), searching, sorting
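
As a starting point for the suggested topics, here is a minimal sketch of each. An in-memory buffer (`io.BytesIO`) stands in for a file on disk, and the arrays are made up for the illustration.

```python
import io
import numpy as np

arr = np.array([3, 1, 2])

# Storing and loading: np.save / np.load accept file names or file-like objects
buf = io.BytesIO()                 # in-memory stand-in for a file on disk
np.save(buf, arr)
buf.seek(0)
restored = np.load(buf)
print(np.array_equal(arr, restored))        # True

# Sorting and searching
print(np.sort(arr))                         # [1 2 3]
print(np.argsort(arr))                      # [1 2 0]: indices that would sort arr
print(np.searchsorted(np.sort(arr), 2))     # 1: insertion position that keeps the order

# Masked array: the masked element is excluded from the reduction
masked = np.ma.masked_array([1.0, 2.0, 100.0], mask=[False, False, True])
print(masked.mean())                        # 1.5
```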