**Tools - NumPy**

*NumPy is the fundamental library for scientific computing with Python. NumPy is centered around a powerful N-dimensional array object, and it also contains useful linear algebra, Fourier transform, and random number functions.*

In [1]:
import numpy as np

# Arrays

## Creating Arrays

**`np.zeros`**: creates an array containing zeros  
`np.ones`: creates an array containing ones  
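`np.full`: creates an array filled with a given value  
`np.array`: creates an array from an existing sequence, such as a (nested) list  
**`np.arange`**: evenly spaced values within a range, given a step (best suited to integer steps)  
**`np.linspace`**: a given number of evenly spaced values between two endpoints (preferred for floats)  
**`np.random.rand`**: random values drawn from a uniform distribution over [0, 1)  
`np.random.randn`: random values drawn from a standard normal distribution  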

In [2]:
array_1d = np.zeros(3)
array_nd = np.zeros((2,3,4))
print("1D Array: \n", array_1d)
print("N-Dimensional Array \n", array_nd)

1D Array: 
 [0. 0. 0.]
N-Dimensional Array 
 [[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]]


In [3]:
print("Ones:\n", np.ones(3))
print("Full:\n", np.full((2, 3), np.pi))
print("Array:\n", np.array([[1,2,3],[1,2,3]]))  
print("ARange:\n", 
      np.arange(1, 5, 0.5))
print("Linspace:\n", 
     np.linspace(1, 5, 10))
print("Uniform Random Values:\n",
     np.random.rand(2,2))
print("Normally Dist Random Values:\n",
     np.random.randn(2,2))

Ones:
 [1. 1. 1.]
Full:
 [[3.14159265 3.14159265 3.14159265]
 [3.14159265 3.14159265 3.14159265]]
Array:
 [[1 2 3]
 [1 2 3]]
ARange:
 [1.  1.5 2.  2.5 3.  3.5 4.  4.5]
Linspace:
 [1.         1.44444444 1.88888889 2.33333333 2.77777778 3.22222222
 3.66666667 4.11111111 4.55555556 5.        ]
Uniform Random Values:
 [[0.35766008 0.57758419]
 [0.71746577 0.61351738]]
Normally Dist Random Values:
 [[ 0.24462382  0.74956829]
 [ 0.38604949 -2.25960428]]


## Some vocabulary
- In NumPy, each dimension is called an **axis**.
- The number of axes is called the **rank**.
- An array's list of axis lengths is called the **shape** of the array.
- The **size** of an array is the total number of elements, which is the product of all axis lengths (e.g. 3*4=12).
- NumPy arrays have type **ndarray**.

In [4]:
a = np.zeros((3, 4))
print("Shape:", a.shape)
print("Rank:", a.ndim) # or len(a.shape); note that len(a) would give the length of the first axis (3)
print("Size:", a.size)
print("Type:", type(a))

Shape: (3, 4)
Rank: 2
Size: 12
Type: <class 'numpy.ndarray'>


## Reshaping an Array

### **`reshape`**
The `reshape` function returns a new `ndarray` object that, whenever possible, points at the *same* data. In that case, modifying one array also modifies the other.

In [5]:
g = np.arange(24)
g2 = g.reshape(4,6)
print(g2)
print("Rank:", g2.ndim)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
Rank: 2
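
To make the shared-data point concrete, here is a minimal sketch. The names `base` and `view` are illustrative and not part of the notebook; `np.shares_memory` is used only to confirm that the two arrays overlap in memory.

```python
import numpy as np

base = np.arange(6)          # [0 1 2 3 4 5]
view = base.reshape(2, 3)    # same data, seen as 2 rows x 3 columns

view[0, 1] = 99              # write through the reshaped array
print(base)                  # base[1] has changed too
print(np.shares_memory(base, view))
```

If an independent array is needed, calling `.copy()` on the reshaped result breaks the link.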


### **`ravel`**
The `ravel` function returns a **one-dimensional** `ndarray` that also points to the same data whenever possible (otherwise it returns a copy):

In [6]:
g2.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])
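
Whether `ravel` can share data depends on the memory layout of its input. The sketch below, with illustrative names `grid`, `flat_view` and `flat_copy`, contrasts a contiguous array with its transpose, where the elements have to be reordered.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

flat_view = grid.ravel()      # contiguous input: no copy is needed
flat_copy = grid.T.ravel()    # transposed input: elements must be reordered, so NumPy copies

print(np.shares_memory(grid, flat_view))
print(np.shares_memory(grid, flat_copy))
print(flat_copy)              # elements read column by column from grid
```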

## Operators

### Arithmetic operations

All the usual arithmetic operators (`+`, `-`, `*`, `/`, `//`, `%`, `**`, etc.) can be used with `ndarray`s. They apply **elementwise**:

In [7]:
a = np.array([14, 23, 32, 41])
b = np.array([1,  2,  3,  4])
print("a + b  =", a + b)
print("a - b  =", a - b)
print("a * b  =", a * b)
print("a / b  =", a / b)
print("a // b  =", a // b)
print("a % b  =", a % b)
print("a ** b =", a ** b)

a + b  = [15 25 35 45]
a - b  = [13 21 29 37]
a * b  = [ 14  46  96 164]
a / b  = [14.         11.5        10.66666667 10.25      ]
a // b  = [14 11 10 10]
a % b  = [0 1 2 1]
a ** b = [     14     529   32768 2825761]


### Broadcasting
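In general, when NumPy expects arrays of the same shape but finds that this is not the case, it applies the so-called *broadcasting* rules: shapes are compared from the last axis backwards, a missing leading axis is treated as having length 1, and any axis of length 1 is stretched to match the other array. If two axis lengths differ and neither is 1, NumPy raises an error.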
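
A small sketch of broadcasting in action may help. The arrays `row`, `col` and `table` are made up for illustration; the comments describe how each shape is stretched.

```python
import numpy as np

row = np.arange(3)                  # shape (3,)
col = np.arange(2).reshape(2, 1)    # shape (2, 1)
table = np.ones((2, 3))             # shape (2, 3)

print(table + row)    # row is treated as shape (1, 3) and repeated along axis 0
print(table + col)    # col is repeated along axis 1
print(row + col)      # both operands are stretched to shape (2, 3)
print(table + 10)     # a scalar is applied to every element
```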

### Upcasting
When trying to combine arrays with different `dtype`, NumPy will upcast to a type capable of handling all possible values (regardless of what the actual values are).
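
As a rough illustration of upcasting, the sketch below combines arrays of different `dtype` and prints the resulting type. The array names are hypothetical.

```python
import numpy as np

small_ints = np.arange(0, 5, dtype=np.uint8)
print(small_ints.dtype)        # uint8

mixed = small_ints + np.array([1.5, 2.5, 3.5, 4.5, 5.5])
print(mixed.dtype)             # float64: fractional values cannot be stored in uint8

wider = small_ints + np.arange(5, dtype=np.int8)
print(wider.dtype)             # int16: wide enough for both the uint8 and int8 ranges
```

Note that the result type is chosen from the input types alone: `wider` becomes `int16` even though every value here would fit in `uint8`.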

### Conditional Operators
Conditional operators also apply elementwise:

In [8]:
m = np.array([20, -5, 30, 40])
m < [15, 16, 35, 36]

array([False,  True,  True, False])

In [9]:
m[m < 25] #boolean Indexing

array([20, -5])

# Mathematical and Statistical Functions

## `ndarray` methods
- `mean`: computes the mean of all elements, regardless of shape
- `sum`: computes the sum along the given axis or axes (or over all elements when no axis is given)

In [10]:
a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])
print("Mean:", a.mean())

Mean: 6.766666666666667


In [11]:
c=np.arange(24).reshape(2,3,4)
c.sum(axis=0)  # sum across matrices
c.sum(axis=1)  # sum across rows
c.sum(axis=(0,2))  # sum across matrices and columns

array([ 60,  92, 124])
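
Since the cell above only displays its last result, the following sketch prints the shape produced by each call, which shows which axes are collapsed. It rebuilds the same array `c` so that it runs on its own.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

print(c.sum(axis=0).shape)       # (3, 4): the axis of length 2 is collapsed
print(c.sum(axis=1).shape)       # (2, 4): the axis of length 3 is collapsed
print(c.sum(axis=(0, 2)).shape)  # (3,): only axis 1 survives
print(c.sum())                   # no axis given: sum of all 24 elements
```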

## Universal functions
NumPy also provides fast elementwise functions called universal functions, or **ufunc**. They are vectorized wrappers of simple functions. 
- `square` squares each element in the array
- `abs` absolute value
- `sqrt` square root
- `exp` exponential
- `log` natural logarithm (base *e*)
- `ceil` ceiling
- `isnan` checks if NaN, returns boolean

In [12]:
a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])
np.square(a)

array([[  6.25,   9.61,  49.  ],
       [100.  , 121.  , 144.  ]])

### Binary ufuncs

There are also many binary ufuncs that apply elementwise to two `ndarray`s.
- `add` adds the two arrays
- `greater` compares the two arrays and returns a boolean array
- `maximum` returns the larger of the two values at each position

In [13]:
a = np.array([1, -2, 3, 4])
b = np.array([2, 8, -1, 7])
np.add(a, b)  # equivalent to a + b
np.greater(a, b)  # equivalent to a > b
np.maximum(a, b)

array([2, 8, 3, 7])

# Array Indexing

## One-dimensional Arrays

In [14]:
a = np.array([1, 5, 3, 19, 13, 7, 3])
a[3]
a[2:5]
a[2:-1] # from index 2 up to, but not including, the last element
a[:2]
a[1::2] # start from index 1, with step size 2 (i.e. indices 1, 3, 5)
a[::-1] #reverse order of array

array([ 3,  7, 13, 19,  3,  5,  1])

**Note:** ndarray slices are actually views on the *same data buffer*. This means that if you create a slice and modify it, it will modify the original `ndarray` as well.

Use the `copy` method to get an independent copy of the data.

In [15]:
a_slice = a[2:6]
another_slice = a[2:6].copy()
a_slice[0] = 1000
another_slice[1] = 5000
print(a) #the original array is modified by a_slice

[   1    5 1000   19   13    7    3]


## Multi-dimensional Arrays
Multi-dimensional arrays can be accessed in a similar way by providing an index or slice for each axis, separated by commas.

In [16]:
b = np.arange(48).reshape(4, 12)
b[1, 2]  # row 1, col 2
b[1, :]  # row 1, all columns
b[:, 1]  # all rows, column 1

array([ 1, 13, 25, 37])

## Fancy indexing
You may also specify a list of indices that you are interested in. This is referred to as *fancy indexing*.

In [17]:
b[(0,2), 2:5]  # rows 0 and 2, columns 2 to 4

array([[ 2,  3,  4],
       [26, 27, 28]])

In [18]:
b[:, (-1, 2, -1)]  # all rows, columns -1 (last), 2 and -1 (again)

array([[11,  2, 11],
       [23, 14, 23],
       [35, 26, 35],
       [47, 38, 47]])

## Ellipsis(`...`)
You can also write an ellipsis (`...`) to ask that all non-specified axes be entirely included.

In [19]:
c = b.reshape(4,2,6)
c[2, ...] # Return matrix 2, all rows, all columns
c[2, ..., 3] # Return matrix 2, all rows, column 3

array([27, 33])

## Boolean Indexing
You can provide an `ndarray` of boolean values on one axis to specify the indices to be accessed.

In [20]:
b = np.arange(48).reshape(4, 12)
rows_on = np.array([True, False, True, False])
b[rows_on, :]  # Rows 0 and 2, all columns. Equivalent to b[(0, 2), :]

cols_on = np.array([False, True, False] * 4)
b[:, cols_on]  # All rows, columns 1, 4, 7 and 10

array([[ 1,  4,  7, 10],
       [13, 16, 19, 22],
       [25, 28, 31, 34],
       [37, 40, 43, 46]])

## **`np.ix_`**
To use boolean indexing on multiple axes, use the `ix_` function.

In [21]:
b[np.ix_(rows_on, cols_on)]

array([[ 1,  4,  7, 10],
       [25, 28, 31, 34]])

# Stacking Arrays
It is often useful to stack together different arrays. NumPy offers several functions to do just that.

In [22]:
q1 = np.full((2,4), 1.0)
q2 = np.full((3,4), 2.0)
q3 = np.full((2,4), 3.0)

## **`vstack`**
Stacks arrays **vertically**. The arrays must have the same number of columns.

In [23]:
q4 = np.vstack((q1, q2, q3))
q4

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [2., 2., 2., 2.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [3., 3., 3., 3.]])

In [24]:
q4.shape

(7, 4)

## **`hstack`**
Stacks arrays **horizontally**. The arrays must have the same number of rows.

In [25]:
q5 = np.hstack((q1, q3))
q5

array([[1., 1., 1., 1., 3., 3., 3., 3.],
       [1., 1., 1., 1., 3., 3., 3., 3.]])

In [26]:
q5.shape

(2, 8)

## **`concatenate`**
The `concatenate` function stacks arrays along any given existing axis.

In [27]:
q7 = np.concatenate((q1, q2, q3), axis = 0) # Equivalent to vstack
q7

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [2., 2., 2., 2.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [3., 3., 3., 3.]])

In [28]:
q8 = np.concatenate((q1, q3), axis = 1) # Equivalent to hstack
q8

array([[1., 1., 1., 1., 3., 3., 3., 3.],
       [1., 1., 1., 1., 3., 3., 3., 3.]])

## **`stack`**
The `stack` function stacks arrays along a new axis. All arrays must have the same shape.

In [29]:
q9 = np.stack((q1, q3))
q9

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[3., 3., 3., 3.],
        [3., 3., 3., 3.]]])

In [30]:
q9.shape

(2, 2, 4)

# Splitting Arrays
Splitting is the opposite of stacking.
- `vsplit`: to split a matrix vertically
- `hsplit`: to split a matrix horizontally
- `split`: to split a matrix along a given axis (0 for vertical, 1 for horizontal)

In [31]:
r = np.arange(24).reshape(6,4)
r

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [32]:
r1, r2, r3 = np.vsplit(r, 3) #r1, r2, r3 will all have the same shape
r1 

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [33]:
r4, r5 = np.hsplit(r, 2) #r4 and r5 will both have the same shape
r4

array([[ 0,  1],
       [ 4,  5],
       [ 8,  9],
       [12, 13],
       [16, 17],
       [20, 21]])

In [34]:
r6, r7, r8 = np.split(r, 3, axis=0) #equivalent to vsplit
r9, r10 = np.split(r, 2, axis=1) #equivalent to hsplit

# Linear Algebra
NumPy 2D arrays can be used to represent matrices efficiently in Python. This section goes through some of the main matrix operations available.

## Matrix Transpose
The `T` attribute is equivalent to calling `transpose()` when rank is $\geq$ 2

In [35]:
m1 = np.arange(10).reshape(2, 5)
m1

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [36]:
m1.T

array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])
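
The rank condition matters in practice: on a rank-1 array, `T` returns the array unchanged. A minimal sketch, using the illustrative names `v` and `row`:

```python
import numpy as np

v = np.arange(5)          # rank 1, shape (5,)
print(v.T.shape)          # (5,): transposing a rank-1 array changes nothing

row = v.reshape(1, 5)     # rank 2, a single row
print(row.T.shape)        # (5, 1): now the transpose is a column
```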

## Matrix Multiplication
Multiply two matrices using the `dot()` method.  
**Caution:** `n1 * n2` is *not* a matrix multiplication; it is an *elementwise product*.

In [37]:
n1 = np.arange(10).reshape(2, 5)
n2 = np.arange(10).reshape(5, 2)
n1.dot(n2)

array([[ 60,  70],
       [160, 195]])
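
To see the difference between the two products side by side, here is a small sketch on two square matrices (`p` and `q` are made-up examples). It also shows the `@` operator, which NumPy arrays support as another way to write the matrix product.

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
q = np.array([[0, 1], [1, 0]])

print(p * q)        # elementwise product
print(p.dot(q))     # matrix product
print(p @ q)        # same matrix product, written with the @ operator
```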

## Matrix Inverse
Many of the linear algebra functions are available in the `numpy.linalg` module.  
Use the `inv` function to compute a square matrix's inverse.

In [38]:
import numpy.linalg as linalg

In [39]:
m3 = np.array([[1,2,3],[5,7,11],[21,29,31]])

In [40]:
linalg.inv(m3)

array([[-2.31818182,  0.56818182,  0.02272727],
       [ 1.72727273, -0.72727273,  0.09090909],
       [-0.04545455,  0.29545455, -0.06818182]])
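
One way to sanity-check an inverse is to multiply it by the original matrix and compare the result with the identity. The sketch below does this with `np.allclose`, because floating-point rounding makes an exact `==` comparison unreliable.

```python
import numpy as np

m3 = np.array([[1, 2, 3], [5, 7, 11], [21, 29, 31]])
m3_inv = np.linalg.inv(m3)

product = m3.dot(m3_inv)
print(np.allclose(product, np.eye(3)))   # compare with a tolerance, not with ==
```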

## Identity Matrix
You can create an identity matrix of size NxN by calling `eye`.

In [41]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

## Determinant
The `det` function computes the matrix determinant

In [42]:
linalg.det(m3)

43.99999999999999

## Eigenvalues and Eigenvectors
The `eig` function computes the eigenvalues and eigenvectors of a square matrix

In [43]:
eigenvalues, eigenvectors = linalg.eig(m3)
eigenvalues #lambda

array([42.26600592, -0.35798416, -2.90802176])

In [44]:
eigenvectors

array([[-0.08381182, -0.76283526, -0.18913107],
       [-0.3075286 ,  0.64133975, -0.6853186 ],
       [-0.94784057, -0.08225377,  0.70325518]])
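
The eigenvectors are returned as the *columns* of the second array. A quick check, sketched below, is that multiplying the matrix by each column gives the same column scaled by the matching eigenvalue.

```python
import numpy as np

m3 = np.array([[1, 2, 3], [5, 7, 11], [21, 29, 31]])
eigenvalues, eigenvectors = np.linalg.eig(m3)

# Broadcasting scales column i of eigenvectors by eigenvalue i
print(np.allclose(m3.dot(eigenvectors), eigenvectors * eigenvalues))
```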

## Diagonal and Trace

In [45]:
np.diag(m3) # the values in the diagonal of m3 (top left to bottom right)

array([ 1,  7, 31])

In [46]:
np.trace(m3) # equivalent to np.diag(m3).sum()

39

## Singular Value Decomposition
The `svd` function takes a matrix and returns its singular value decomposition.

In [47]:
m4 = np.array([[1,0,0,0,2], [0,0,3,0,0], [0,0,0,0,0], [0,2,0,0,0]])

U, S_diag, V = linalg.svd(m4)  # note: the third value is already the transposed matrix (often written Vh)
U

array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0., -1.],
       [ 0.,  0.,  1.,  0.]])

In [48]:
S_diag

array([3.        , 2.23606798, 2.        , 0.        ])

The `svd` function just returns the values in the diagonal of Σ, but we want the full Σ matrix, so let's create it.

In [49]:
S = np.zeros((4, 5))
S[np.diag_indices(4)] = S_diag
S  # Σ

array([[3.        , 0.        , 0.        , 0.        , 0.        ],
       [0.        , 2.23606798, 0.        , 0.        , 0.        ],
       [0.        , 0.        , 2.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ]])

In [50]:
V

array([[-0.        ,  0.        ,  1.        ,  0.        ,  0.        ],
       [ 0.4472136 ,  0.        ,  0.        ,  0.        ,  0.89442719],
       [-0.        ,  1.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  1.        ,  0.        ],
       [-0.89442719,  0.        ,  0.        ,  0.        ,  0.4472136 ]])

In [51]:
U.dot(S).dot(V) # U.Σ.V == m4

array([[1., 0., 0., 0., 2.],
       [0., 0., 3., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 2., 0., 0., 0.]])

## Solving a System of Linear Scalar Equations
The `solve` function solves a system of linear scalar equations, such as:
- $2x+6y=6$
- $5x+3y=-9$

In [52]:
coeffs  = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])
solution = linalg.solve(coeffs, depvars)
solution

array([-3.,  2.])
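
The solution can be checked by substituting it back into the equations. A minimal sketch, reusing the same `coeffs` and `depvars`:

```python
import numpy as np

coeffs = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])
solution = np.linalg.solve(coeffs, depvars)

print(coeffs.dot(solution))                         # should reproduce depvars
print(np.allclose(coeffs.dot(solution), depvars))
```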

# Vectorization
Instead of executing operations on individual array items one at a time, your code is much more efficient if you stick to array operations. This is called *vectorization*. This way, you can benefit from NumPy's many optimizations.

Example: we want to generate a 768x1024 array based on the formula $\sin(xy/40.5)$.  
We will use NumPy's `meshgrid` function, which generates coordinate matrices from coordinate vectors.

In [53]:
x_coords = np.arange(0, 1024)  # [0, 1, 2, ..., 1023]
y_coords = np.arange(0, 768)   # [0, 1, 2, ..., 767]
X, Y = np.meshgrid(x_coords, y_coords)
X

array([[   0,    1,    2, ..., 1021, 1022, 1023],
       [   0,    1,    2, ..., 1021, 1022, 1023],
       [   0,    1,    2, ..., 1021, 1022, 1023],
       ...,
       [   0,    1,    2, ..., 1021, 1022, 1023],
       [   0,    1,    2, ..., 1021, 1022, 1023],
       [   0,    1,    2, ..., 1021, 1022, 1023]])

In [54]:
Y

array([[  0,   0,   0, ...,   0,   0,   0],
       [  1,   1,   1, ...,   1,   1,   1],
       [  2,   2,   2, ...,   2,   2,   2],
       ...,
       [765, 765, 765, ..., 765, 765, 765],
       [766, 766, 766, ..., 766, 766, 766],
       [767, 767, 767, ..., 767, 767, 767]])

In [55]:
data = np.sin(X*Y/40.5)
data

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.02468885, 0.04936265, ..., 0.07705885, 0.1016508 ,
        0.12618078],
       [0.        , 0.04936265, 0.09860494, ..., 0.15365943, 0.20224852,
        0.25034449],
       ...,
       [0.        , 0.03932283, 0.07858482, ..., 0.6301488 , 0.59912825,
        0.56718092],
       [0.        , 0.06398059, 0.12769901, ..., 0.56844086, 0.51463783,
        0.45872596],
       [0.        , 0.08859936, 0.17650185, ..., 0.50335246, 0.42481591,
        0.34293805]])
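
For comparison, the sketch below computes the same formula with an explicit Python loop and with array operations, then confirms both give the same values. It uses a deliberately small, hypothetical grid (`width` and `height` are not from the notebook) so the loop finishes instantly.

```python
import math
import numpy as np

width, height = 8, 6   # small illustrative sizes instead of 1024 x 768

# Loop version: one element at a time
looped = np.empty((height, width))
for y in range(height):
    for x in range(width):
        looped[y, x] = math.sin(x * y / 40.5)

# Vectorized version: the same formula applied to whole arrays
X, Y = np.meshgrid(np.arange(width), np.arange(height))
vectorized = np.sin(X * Y / 40.5)

print(np.allclose(looped, vectorized))
```

On the full 768x1024 grid the loop version performs one Python-level function call per element, which is the overhead vectorization avoids.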