### NumPy Array and Vectorization

As mentioned previously, NumPy is a library whose basic data structure, the `ndarray`, is used to handle arrays and matrices. A NumPy array is a grid of values that all share the same data type, most commonly integers or floats. These arrays can be created from Python lists, as the examples below show:

#### Import necessary packages and libraries 

In [1]:
import numpy as np  # Convention for importing numpy
import pandas as pd

In [2]:
arr = [6, 7, 8, 9]
print(type(arr))  # prints <class 'list'>

<class 'list'>


In [3]:
# convert the above list to a NumPy n-dimensional array (ndarray)
arr = np.array(arr)
print(type(arr))

<class 'numpy.ndarray'>


In [4]:
# print the shape of the array (its size along each dimension)
print(arr.shape)

(4,)


In [5]:
# prints the data type of the array
print(arr.dtype)

int32
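The `int32` shown above is not universal: the default integer type depends on the platform and the NumPy version, so the same cell may print `int64` elsewhere. The following is a minimal sketch of how a single data type is chosen for the whole array; the names `ints`, `mixed` and `explicit` are illustrative and not part of the original notebook.

```python
import numpy as np

ints = np.array([6, 7, 8, 9])                        # all integers
mixed = np.array([6, 7.5, 8, 9])                     # one float upcasts every element
explicit = np.array([6, 7, 8, 9], dtype=np.float32)  # dtype requested explicitly

print(ints.dtype)      # platform default integer, e.g. int32 or int64
print(mixed.dtype)     # float64
print(explicit.dtype)  # float32
```

Passing `dtype` explicitly is a reasonable choice when results need to be reproducible across machines.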


In [6]:
# get the number of dimensions of arr with ndim
print(arr.ndim)

1


In [7]:
# create a 2d array
b = np.array([[2, 4, 6, 8, 10], [1, 3, 5, 7, 9]])
print(b)

[[ 2  4  6  8 10]
 [ 1  3  5  7  9]]


In [9]:
# print the number of dimensions of the array
print(b.ndim)  # b is a 2D array

2


In [11]:
# print the shape of the array
print(b.shape)   # b has a (2, 5) shape: 2 rows and 5 columns

(2, 5)


There are also some built-in functions for initializing NumPy arrays, including `empty()`, `zeros()`, `ones()`, `full()` and `random.random()`.

In [12]:
# a 2x3 array with random values
a = np.random.random((2, 3))
print(a)

[[0.73478273 0.69997497 0.97870667]
 [0.09605622 0.77720442 0.28277449]]


In [13]:
# a 2x3 array of zeros

np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [14]:
# a 2x3 array of ones

np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [15]:
# a 3x3 identity matrix

idn_arr = np.identity(3)
print(idn_arr)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [16]:
print(type(idn_arr))

<class 'numpy.ndarray'>


In [17]:
print(idn_arr.dtype)

float64


### Interoperability of arrays and scalars

Vectorisation in NumPy arrays allows for faster processing by eliminating explicit `for` loops when dealing with arrays of equal shape: arithmetic operators are applied element-wise to whole arrays in a single batch operation. Similarly, a scalar is propagated element-wise across an array. Arrays with different shapes cannot be combined element-wise directly; instead, NumPy handles them by broadcasting, provided that, comparing the shapes from the last dimension backwards, each pair of dimensions is either equal or one of them is `1`.
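The broadcasting rule is easier to see on small shapes. The sketch below uses illustrative arrays (`a`, `row`, `col` are assumptions made for this example, not the arrays defined in the cells that follow) to show a scalar, a row and a column each being stretched across a `(3, 6)` array, and an incompatible shape being rejected.

```python
import numpy as np

a = np.arange(18).reshape(3, 6)             # shape (3, 6)
row = np.array([10, 20, 30, 40, 50, 60])    # shape (6,)
col = np.array([[100], [200], [300]])       # shape (3, 1)

scaled = a * 2        # the scalar is applied to every element
plus_row = a + row    # row is repeated across the 3 rows
plus_col = a + col    # col is repeated across the 6 columns

print(scaled.shape, plus_row.shape, plus_col.shape)  # (3, 6) (3, 6) (3, 6)

# Shapes (3, 6) and (3, 3) do not match in the last dimension and neither is 1,
# so NumPy refuses to broadcast them.
try:
    a + np.ones((3, 3))
except ValueError as err:
    print("cannot broadcast:", err)
```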

In [18]:
a = np.array([[2, 3, 5, 6, 7, 9], [1, 4, 5, 7, 10, 12], [23, 7.5, 9, 2.1, 6, 3]])
b = np.random.random((3, 3))
c = np.array([[34, 90, 8, 23, 12, 55], [45, 90, 67, 78, 77, 64], [21, 90, 32, 54, 87, 98]])
d = np.array([[120, 8, 34, 32, 23, 45], [65, 89, 90, 99, 76, 89], [23, 90, 85, 47, 89, 28]])

In [19]:
print(a)

[[ 2.   3.   5.   6.   7.   9. ]
 [ 1.   4.   5.   7.  10.  12. ]
 [23.   7.5  9.   2.1  6.   3. ]]


In [20]:
print(c)

[[34 90  8 23 12 55]
 [45 90 67 78 77 64]
 [21 90 32 54 87 98]]


In [21]:
print(d)

[[120   8  34  32  23  45]
 [ 65  89  90  99  76  89]
 [ 23  90  85  47  89  28]]


In [22]:
print(b)

[[0.0486693  0.87253714 0.46565415]
 [0.04170736 0.00966989 0.77399904]
 [0.0081405  0.99430778 0.16685757]]


In [23]:
# Elements of the example arrays above can be accessed by indexing, as with Python lists:
a[0] ,  a[2] , b[0, 0] , b[1, 2] , c[0, 1]  

(array([2., 3., 5., 6., 7., 9.]),
 array([23. ,  7.5,  9. ,  2.1,  6. ,  3. ]),
 0.04866929680004939,
 0.7739990399075707,
 90)

In [24]:
# Elements can also be retrieved by slicing rows and columns, or by combining indexing and slicing.

d[1,  0:2] 

array([65, 89])

In [25]:
e = np.array([[10, 11, 12],[13, 14, 15], 
              [16, 17, 18],[19, 20, 21]])
print(e)

[[10 11 12]
 [13 14 15]
 [16 17 18]
 [19 20 21]]


In [26]:
# slicing
e[:3, :2]

array([[10, 11],
       [13, 14],
       [16, 17]])

In [27]:
e[1:3, 2:]

array([[15],
       [18]])

In [28]:
e[2:4, 1:3]

array([[17, 18],
       [20, 21]])

In [29]:

# Other, more advanced indexing methods are shown below.
# integer indexing

e[[2, 0, 3, 1],[2, 1, 0, 2]]

array([18, 11, 19, 15])

In [30]:
e[[3, 2, 1, 0], [1, 0, 1, 2]]

array([20, 16, 14, 12])
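Integer indexing pairs the two index lists position by position: the first result comes from the first row index and the first column index, and so on. The sketch below reuses the array `e` from above and compares the result with an explicit pairing; `rows`, `cols`, `picked` and `paired` are illustrative names.

```python
import numpy as np

e = np.array([[10, 11, 12], [13, 14, 15],
              [16, 17, 18], [19, 20, 21]])

rows = [2, 0, 3, 1]
cols = [2, 1, 0, 2]

# Integer (fancy) indexing: selects e[2, 2], e[0, 1], e[3, 0], e[1, 2]
picked = e[rows, cols]

# The same selection written as an explicit pairing of indices
paired = np.array([e[r, c] for r, c in zip(rows, cols)])

print(picked)                          # [18 11 19 15]
print(np.array_equal(picked, paired))  # True
```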

In [31]:
# boolean indexing: select the elements that meet a specified condition

e[e>15] 

array([16, 17, 18, 19, 20, 21])

NumPy also has built-in mathematical functions such as `sum()`, `mean()`, `std()`, `corrcoef()` and `min()`, among others. Arrays can also be compared element-wise: `==` checks which corresponding elements of two arrays are equal, while `>` and `<` check whether elements of the first array are greater than or less than those of the second. Each comparison returns a boolean array.
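As a small illustration of these comparisons, the sketch below uses two made-up arrays, `x` and `y`. It also shows `np.array_equal()`, which is one way to collapse the element-wise result into a single `True` or `False` when the question is whether the arrays match as a whole.

```python
import numpy as np

x = np.array([1, 5, 3])
y = np.array([1, 2, 7])

equal_mask = x == y     # True, False, False
greater_mask = x > y    # False, True, False
less_mask = x < y       # False, False, True

print(equal_mask, greater_mask, less_mask)

# A single answer for the whole array
print(np.array_equal(x, y))  # False
print((x == x).all())        # True
```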

### File input and output with arrays

NumPy arrays can be saved to and loaded from binary files with the `.npy` extension using `save()` and `load()` respectively. The same can be done with text files using `savetxt()` and `loadtxt()`.
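A minimal round-trip sketch of both formats follows. The array `data`, the file names and the use of a temporary directory are assumptions made for this example so that it leaves no files behind.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.5, 2.0, 3.25], [4.0, 5.5, 6.0]])

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "data.npy")  # hypothetical file name
    txt_path = os.path.join(tmp, "data.txt")  # hypothetical file name

    # Binary format: preserves dtype and shape exactly
    np.save(npy_path, data)
    from_npy = np.load(npy_path)

    # Text format: human-readable, one row per line
    np.savetxt(txt_path, data, delimiter=",")
    from_txt = np.loadtxt(txt_path, delimiter=",")

print(np.array_equal(data, from_npy))  # True
print(np.allclose(data, from_txt))     # True
```

The binary format is usually the safer default for intermediate results, while the text format is convenient when the file must be read by other tools.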

In [32]:
c + d 

array([[154,  98,  42,  55,  35, 100],
       [110, 179, 157, 177, 153, 153],
       [ 44, 180, 117, 101, 176, 126]])
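The addition above is what vectorisation replaces. As a sketch, the same result can be produced with explicit nested loops over the arrays `c` and `d` defined earlier; `vectorized` and `looped` are illustrative names.

```python
import numpy as np

c = np.array([[34, 90, 8, 23, 12, 55], [45, 90, 67, 78, 77, 64], [21, 90, 32, 54, 87, 98]])
d = np.array([[120, 8, 34, 32, 23, 45], [65, 89, 90, 99, 76, 89], [23, 90, 85, 47, 89, 28]])

# Vectorised: one expression, the loop runs inside NumPy
vectorized = c + d

# Equivalent explicit loops, one element at a time
looped = np.empty_like(c)
for i in range(c.shape[0]):
    for j in range(c.shape[1]):
        looped[i, j] = c[i, j] + d[i, j]

print(np.array_equal(vectorized, looped))  # True
```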

In [33]:
5 / d

array([[0.04166667, 0.625     , 0.14705882, 0.15625   , 0.2173913 ,
        0.11111111],
       [0.07692308, 0.05617978, 0.05555556, 0.05050505, 0.06578947,
        0.05617978],
       [0.2173913 , 0.05555556, 0.05882353, 0.10638298, 0.05617978,
        0.17857143]])

In [34]:
c ** 2

array([[1156, 8100,   64,  529,  144, 3025],
       [2025, 8100, 4489, 6084, 5929, 4096],
       [ 441, 8100, 1024, 2916, 7569, 9604]], dtype=int32)

In [35]:
np.full((2,3), fill_value=23)

array([[23, 23, 23],
       [23, 23, 23]])

In [37]:
# check the sum of the array
c.sum()

1025

In [38]:
# Mean of the array
c.mean()

56.94444444444444

In [39]:
# standard deviation of the array
c.std()

28.8934291304023

In [41]:
# the correlation coefficients between the rows of the array
np.corrcoef(c)

array([[1.        , 0.3469303 , 0.51602686],
       [0.3469303 , 1.        , 0.62442061],
       [0.51602686, 0.62442061, 1.        ]])
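The result is a 3x3 matrix because `np.corrcoef()` by default treats each row as one variable and each column as an observation. The sketch below checks this reading by comparing one entry of the matrix with the correlation computed directly from two rows of `c`; `by_rows` and `pair` are illustrative names.

```python
import numpy as np

c = np.array([[34, 90, 8, 23, 12, 55], [45, 90, 67, 78, 77, 64], [21, 90, 32, 54, 87, 98]])

by_rows = np.corrcoef(c)              # one variable per row -> shape (3, 3)
pair = np.corrcoef(c[0], c[1])[0, 1]  # correlation of row 0 with row 1

print(by_rows.shape)                    # (3, 3)
print(np.isclose(by_rows[0, 1], pair))  # True
```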

In [42]:
# the minimum value of the array
c.min()

8