# NumPy Introduction

* Import the NumPy package under its conventional alias **_np_**
* **_Seed_**: fixes the pseudo-random number generator so the same random numbers are reproduced on every run

In [2]:
import numpy as np
np.random.seed(123)

## NumPy Arrays

* NumPy provides an N-dimensional array type (ndarray)
* An ndarray is an N-dimensional container of items of the same type
* ndarrays are homogeneous: every item takes up the same-sized block of memory
* How each item is interpreted is specified by a data-type object (int, float, etc.)
* An item extracted from an array by indexing is represented by an array-scalar Python object
* In NumPy, dimensions are known as axes; the number of dimensions/axes is the rank
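
The properties listed above can be inspected directly on an array. The following is a minimal sketch; the array contents and the variable names `a` and `item` are illustrative choices, not taken from the original notebook.

```python
import numpy as np

# A small homogeneous array with an explicitly chosen data type.
a = np.array([1, 2, 3], dtype=np.int32)

print(a.dtype)     # data-type object that describes every item
print(a.itemsize)  # number of bytes used by each item
print(a.ndim)      # number of axes (rank)

# Indexing a single item gives a NumPy array scalar, not a plain Python int.
item = a[0]
print(type(item))
print(isinstance(item, np.generic))
```

`np.generic` is the base class of NumPy's array-scalar types, which is why the last check is a convenient way to tell an array scalar from a built-in Python number.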

![image.png](attachment:image.png)

**_(image source: docs.scipy.org)_**

## Why NumPy?

Python lists already offer a lot of functionality, so why is NumPy required? <br><br>
NumPy gives a performance boost and is more memory efficient. It also provides arithmetic operations, indexing, slicing, iterating, shape manipulation, copies, stacking, broadcasting and more.

In [269]:
numpy_array_10pow6 = np.arange(10**6)
numpy_array_10pow7 = np.arange(10**7)
python_list_10pow6 = list(range(10**6))
python_list_10pow7 = list(range(10**7))

* ### Memory size

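    **_sys.getsizeof_**: returns the size of the object in bytes. For a list this counts only the list's own array of references, not the integer objects it points to, so the real footprint of the list is larger than the figure shown.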

In [270]:
import sys

print("Comparison array and list of 10 power 6 elements")
print("np array :",sys.getsizeof(numpy_array_10pow6),"bytes")
print("py list :",sys.getsizeof(python_list_10pow6),"bytes\n")

print("Comparison array and list of 10 power 7 elements")
print("np array :",sys.getsizeof(numpy_array_10pow7),"bytes")
print("py list :",sys.getsizeof(python_list_10pow7),"bytes")

Comparison array and list of 10 power 6 elements
np array : 8000096 bytes
py list : 9000112 bytes

Comparison array and list of 10 power 7 elements
np array : 80000096 bytes
py list : 90000112 bytes
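
As a rough cross-check of the memory figures, the sketch below compares the array's data buffer (`nbytes`) with an estimate for a list that also counts the integer objects it references. The names `arr`, `lst` and `list_total` are illustrative, and the list estimate is only approximate because Python may share small integer objects.

```python
import sys
import numpy as np

arr = np.arange(1000)
lst = list(range(1000))

# Data buffer of the array: number of items times bytes per item.
print(arr.nbytes, arr.size * arr.itemsize)

# Rough total for the list: the list object itself plus every int it references.
list_total = sys.getsizeof(lst) + sum(sys.getsizeof(v) for v in lst)
print(sys.getsizeof(lst), list_total)
```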


* ### Performance

   * **_timeit magic command_**: reports the time taken to execute the statement on that line
   * NumPy operations such as sums, dot products and matrix multiplication are vectorized, which makes them a lot quicker

In [271]:
print("Execution time sum of 10 power 6 elements \n")
print("np array:", end="\t")
%timeit numpy_array_10pow6.sum()
print("py list:", end="\t")
%timeit sum(python_list_10pow6)
print("\n")

print("Execution time sum of 10 power 7 elements \n")
print("np array:", end="\t")
%timeit numpy_array_10pow7.sum()
print("py list:", end="\t")
%timeit sum(python_list_10pow7)

Execution time sum of 10 power 6 elements 

np array:	587 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
py list:	5.7 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Execution time sum of 10 power 7 elements 

np array:	7.83 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
py list:	60.7 ms ± 1.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


# Different Types of Arrays

**Attributes of ndarray**
* **_ndarray.ndim_** : number of axes/dimensions (the rank)
* **_ndarray.shape_** : tuple giving the size along each axis/dimension
* **_ndarray.size_** : total number of elements

## Scalars / Rank 0 Array

* A scalar is a single value, such as one element of an array.
* It has no axes, so its shape is the empty tuple ()

In [124]:
x = np.array(3)
print ("x:\t\t\t", x)
print("dimensionality of x:\t", x.ndim)
print("shape of x:\t\t", x.shape)
print("size of x:\t\t", x.size)

x:			 3
dimensionality of x:	 0
shape of x:		 ()
size of x:		 1


## Vector / Rank 1 Array / 1-d Array

* A vector is a collection of scalars (each of ndim 0) along one axis
* Its shape is (n,)

In [141]:
x = np.array([1, 2, 3.1])
print ("x:\t\t\t", x)
print("first element(scalar):\t", x[0])
print("dimensionality of x:\t", x.ndim)
print("shape of x:\t\t", x.shape)
print("size of x:\t\t", x.size)

x:			 [1.  2.  3.1]
first element(scalar):	 1.0
dimensionality of x:	 1
shape of x:		 (3,)
size of x:		 3


## Matrix / Rank 2 Array / 2-d Array   (with only one row)

* This array is often mistaken for a 1-d array because it has only one row, but in NumPy it is a 2-d array with a single row, as shown below

In [139]:
x = np.array([[1, 2, 3.1]])
print("x:\t\t\t", x)
print("1st row 1st col(scalar):", x[0,0])
print("dimensionality of x:\t", x.ndim)
print("shape of x:\t\t", x.shape)
print("size of x:\t\t", x.size)

x:			 [[1.  2.  3.1]]
1st row 1st col(scalar): 1.0
dimensionality of x:	 2
shape of x:		 (1, 3)
size of x:		 3


## Matrix / Rank 2 Array / 2-d Array

* A regular matrix is a collection of vectors stacked along another axis, so it has 2 axes

In [137]:
x = np.array([[1, 2, 3.1],[1, 2, 4.1]])
print ("x:", x)
print("2nd row(vector):\t", x[1])
print("2nd row 2nd col(scalar):", x[1,1])
print("dimensionality of x:\t", x.ndim)
print("shape of x:\t\t", x.shape)
print("size of x:\t\t", x.size)

x: [[1.  2.  3.1]
 [1.  2.  4.1]]
2nd row(vector):	 [1.  2.  4.1]
2nd row 2nd col(scalar): 2.0
dimensionality of x:	 2
shape of x:		 (2, 3)
size of x:		 6


### And so on Rank N Arrays with N axes
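
To illustrate the pattern continuing, here is a small sketch of a rank 3 array. The values and the name `x` are illustrative; each extra index removes one axis, going from matrix to vector to scalar.

```python
import numpy as np

# Three 2x2 matrices stacked along a new leading axis: 3 axes in total.
x = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]],
              [[9, 10], [11, 12]]])

print("dimensionality of x:", x.ndim)
print("shape of x:", x.shape)
print("size of x:", x.size)
print("3rd matrix:", x[2])                         # rank 2
print("3rd matrix, 2nd row:", x[2, 1])             # rank 1
print("3rd matrix, 2nd row, 1st col:", x[2, 1, 0]) # rank 0
```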

# Array Creation

**np.array()**
* Use the array function to initialize a NumPy array
* You can specify the datatype of the elements with the dtype parameter (int, float, complex, or a NumPy datatype such as np.int16, np.int32, np.float16, ...); it overrides the datatype of the original scalars
* If no datatype is specified, NumPy picks the smallest datatype that can hold every element (for example, mixing ints and floats gives a float array)

In [154]:
a = np.array([[1,2],[3,4.1]])
b = np.array([1,2.1], dtype=int)
print(a,"\t", a.dtype)
print(b,"\t\t", b.dtype)

[[1.  2. ]
 [3.  4.1]] 	 float64
[1 2] 		 int64
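
The sketch below shows how the element type is chosen when none is given, and what happens to fractional parts when floats are converted to integers. The variable names are illustrative, and the exact integer width of `ints` depends on the platform.

```python
import numpy as np

# Without dtype, the type is promoted until it can hold every element.
ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5])
with_complex = np.array([1, 2.5, 3 + 0j])
print(ints.dtype, mixed.dtype, with_complex.dtype)

# Converting floats to int drops the fractional part (truncation toward zero).
truncated = np.array([1.9, -1.9]).astype(int)
print(truncated)
```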


**Range functions to create array**
* **_np.arange()_** : creates a vector running from the 1st argument up to, but excluding, the 2nd argument. The 3rd argument is the step size (the difference between two consecutive elements) and defaults to 1.
* **_np.linspace()_** : like arange, it creates a vector over a range, but the 3rd argument specifies how many elements the array has; they are evenly distributed over the range, and the end point is included by default.

In [162]:
a = np.arange(2, 10, 2)
b = np.linspace(0, 2*np.pi, 10) # can be used to plot a continuous wave function; the more points, the smoother the curve
print(a)
print(b)

[2 4 6 8]
[0.         0.6981317  1.3962634  2.0943951  2.7925268  3.4906585
 4.1887902  4.88692191 5.58505361 6.28318531]
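
A short sketch of how the two range functions relate: `arange` is driven by a step and excludes the stop value, while `linspace` is driven by a count and includes the end point unless `endpoint=False` is passed. The variable names are illustrative.

```python
import numpy as np

# Step-driven: stop value 1 is excluded.
a = np.arange(0, 1, 0.25)

# Count-driven: 5 points including both ends; retstep also returns the spacing.
b, step = np.linspace(0, 1, 5, retstep=True)

# Excluding the end point reproduces the arange result.
c = np.linspace(0, 1, 4, endpoint=False)

print(a)
print(b, step)
print(c)
```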


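**Random Initializations**
* **_np.random.rand()_** : fills an array of the given shape with samples drawn uniformly from [0, 1)
* **_np.random.randn()_** : fills an array of the given shape with samples from the standard normal distribution (mean 0, standard deviation 1); the values are not limited to the range -1 to 1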

In [167]:
a = np.random.rand(2,3)
b = np.random.randn(2,3)*2 # multiplying scales the spread (standard deviation 2 here)
print(a)
print(b)

[[0.95444886 0.49208351 0.70651353]
 [0.03651707 0.09268238 0.84055847]]
[[ 0.33259947  1.13511979  0.86989798]
 [ 1.09772632 -3.92057403 -1.50910288]]
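
To make the difference between the two generators concrete, the sketch below draws a large sample from each and looks at simple statistics. The seed, sample size and variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)

u = np.random.rand(100_000)    # uniform on [0, 1)
n = np.random.randn(100_000)   # standard normal: mean 0, standard deviation 1

print("uniform min/max:", u.min(), u.max())
print("normal mean/std:", n.mean(), n.std())
# A sizeable share of normal samples falls outside [-1, 1].
print("share with |value| > 1:", (np.abs(n) > 1).mean())

# Shifting and scaling gives a normal sample with mean 5 and standard deviation 2.
scaled = 5 + 2 * np.random.randn(100_000)
print("scaled mean/std:", scaled.mean(), scaled.std())
```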


**other functions**

* **_np.eye()_** : returns the identity matrix 
* **_np.zeros()_** : returns an array full of zeros; the shape is passed as a single tuple
* **_np.ones()_** : returns an array full of ones; the shape is passed as a single tuple

In [175]:
a = np.eye(3,3)
b = np.zeros((3,3))
c = np.ones((3,3))*5
print(a)
print(b)
print(c)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[5. 5. 5.]
 [5. 5. 5.]
 [5. 5. 5.]]


**saving and loading**

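* **_np.fromfunction()_** : builds an array by calling the given function on the index coordinates; it takes the function, the shape and the datatype of the coordinates
* **_np.save()_** : saves the array given as 2nd argument to the file named by the 1st argument, in .npy format (the extension is appended if missing)
* **_np.load()_** : loads an array stored in .npy format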

In [186]:
a = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
print(a)
np.save("array", a)
b = np.load("array.npy")
print(b)

[[0 1 2]
 [1 2 3]
 [2 3 4]]
[[0 1 2]
 [1 2 3]
 [2 3 4]]
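
The following sketch shows what `np.fromfunction` does under the hood by rebuilding the same array from `np.indices`, and then round-trips it through a file. The temporary directory and the names `path`, `same` and `b` are illustrative; a temporary location is used only so the example leaves no file behind.

```python
import os
import tempfile
import numpy as np

# The function receives whole coordinate arrays, one per axis.
a = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
i, j = np.indices((3, 3))
same = np.array_equal(a, i + j)
print(same)

# Save and load again; np.save appends ".npy" to the file name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "array")
    np.save(path, a)
    b = np.load(path + ".npy")

print(np.array_equal(a, b))
```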


# Array Indexing

**Accessing and slicing**

* **:** specifies a range along an axis; on its own it selects all the elements along that axis
* **...** (Ellipsis) selects all the elements along every axis that is not mentioned explicitly

In [212]:
a = np.array([[1, 2, 3],[4, 5, 6]])
print(a)
print("1st row 2nd column:\t", a[0,1])
print("sliced 2nd column:\t", a[:,1])
print("sliced matrix of all rows and 1,2 columns:", a[:,:2])
a = np.random.rand(2,2,2,2)
print("\n Rank 4 array \n",a)
print("\n Rank 2 slice \n", a[...,1,1])

[[1 2 3]
 [4 5 6]]
1st row 2nd column:	 2
sliced 2nd column:	 [2 5]
sliced matrix of all rows and 1,2 columns: [[1 2]
 [4 5]]

 Rank 4 array 
 [[[[0.74039869 0.03213296]
   [0.54849274 0.44230883]]

  [[0.60553825 0.51906164]
   [0.42737223 0.30629609]]]


 [[[0.9951699  0.91499586]
   [0.86243641 0.98017631]]

  [[0.90230312 0.72574973]
   [0.28827273 0.9451015 ]]]]

 Rank 2 slice 
 [[0.44230883 0.30629609]
 [0.98017631 0.9451015 ]]
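
The ellipsis is shorthand for as many full slices as needed. The sketch below uses a rank 4 array of consecutive integers (an illustrative choice, so the result is easy to check by hand) to show the equivalence.

```python
import numpy as np

a = np.arange(16).reshape(2, 2, 2, 2)

x = a[..., 1, 1]    # ellipsis stands for the first two axes
y = a[:, :, 1, 1]   # the same selection written out in full
z = a[0, ...]       # same as a[0, :, :, :]

print(x)
print(np.array_equal(x, y))
print(z.shape)
```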


**Slicing in NumPy and Python is different**

* A slice of a NumPy array is a view: changes made through the view are also reflected in the original array
* A slice of a Python list is a copy: the new variable holds independent data

In [203]:
a = np.array([1, 2, 3])
b = [1, 2, 3]

c = a[:2]
d = b[:2]

c[0] = d[0] = 0

print(a, c)
print(b, d)

[0 2 3] [0 2]
[1, 2, 3] [0, 2]
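
When an independent slice of a NumPy array is needed, an explicit copy can be taken. The sketch below contrasts a view with a copy and uses `np.shares_memory` to confirm which one is tied to the original; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

view = a[:2]            # shares data with a
copy = a[:2].copy()     # independent data

view[0] = 0             # changes a as well
copy[1] = 99            # leaves a untouched

print(a, view, copy)
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```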


**Integer array indexing**

In [222]:
a = np.random.rand(3, 3, 3)
print(a)
b = np.array([0, 1, 2])
c = np.array([2, 0, 1])
print("\n", a[c, b, c])

[[[0.18027451 0.28442072 0.96635281]
  [0.1206562  0.57207423 0.91136645]
  [0.60349547 0.21829406 0.72088607]]

 [[0.14659171 0.98779983 0.59963645]
  [0.63677712 0.26196782 0.09660217]
  [0.5268713  0.0662461  0.25746729]]

 [[0.45406827 0.75385073 0.87031914]
  [0.86152508 0.96987361 0.00764907]
  [0.34635318 0.29776545 0.81585002]]]

 [0.87031914 0.1206562  0.0662461 ]
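
With one index array per axis, element k of the result is picked at the position formed by the k-th entry of each index array. The sketch below spells this out with an explicit loop; the array of consecutive integers and the names `picked` and `manual` are illustrative.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)
b = np.array([0, 1, 2])
c = np.array([2, 0, 1])

# Picks a[2, 0, 2], a[0, 1, 0] and a[1, 2, 1].
picked = a[c, b, c]

# The same selection written as an explicit loop over positions.
manual = np.array([a[c[k], b[k], c[k]] for k in range(3)])

print(picked)
print(np.array_equal(picked, manual))
```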


**Boolean array indexing**

In [226]:
a = np.random.rand(3, 3)
print(a, "\n")
print(a>.5, "\n") # returns an array of bools, True or False in place of each element
print(a[a>.5]) # returns all the elements that satisfy the condition

[[0.473138   0.99371261 0.19330956]
 [0.76728982 0.84129166 0.57824388]
 [0.81208889 0.13357838 0.30892271]] 

[[False  True False]
 [ True  True  True]
 [ True False False]] 

[0.99371261 0.76728982 0.84129166 0.57824388 0.81208889]
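
A boolean mask can also be counted, combined with other conditions, and used on the left-hand side of an assignment. The following sketch uses a small fixed array so the results are easy to verify; the names `mask` and `between` are illustrative.

```python
import numpy as np

a = np.array([[0.2, 0.9], [0.7, 0.1]])

mask = a > .5
print(mask.sum())                    # number of elements passing the condition

# Conditions are combined element-wise with & (and) or | (or).
between = a[(a > .1) & (a < .8)]
print(between)

# Assigning through the mask changes only the selected elements.
a[mask] = 0
print(a)
```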


**Iterating through the first axis**

In [237]:
a = np.random.rand(3, 3, 3)

for row in a:
    print(row)

[[0.49181313 0.19781451 0.12585074]
 [0.27911155 0.97225961 0.84858664]
 [0.87524066 0.7998581  0.01968029]]
[[0.36603756 0.89621382 0.81323298]
 [0.58005876 0.94044935 0.5836186 ]
 [0.64224449 0.69378467 0.04219345]]
[[0.14411849 0.50324051 0.32215656]
 [0.82325582 0.46070257 0.19970968]
 [0.30359463 0.35687496 0.27319664]]


**Iterating through all the elements**

* **_ndarray.flat_** : an iterator over all the elements of the array
* **_ravel()_** : returns the flattened array

In [238]:
for i in a.flat:
    print(i)

0.4918131345963038
0.19781451476390344
0.12585074148158115
0.27911154606987987
0.9722596138223761
0.8485866381555572
0.8752406630877777
0.7998580993390798
0.01968029325099996
0.3660375629089514
0.8962138232702693
0.8132329770049697
0.5800587597729296
0.9404493507579278
0.5836186005066847
0.6422444886257379
0.6937846702626401
0.042193445184700695
0.1441184949416282
0.5032405071710186
0.32215656205265975
0.8232558158885318
0.4607025734004946
0.199709678360793
0.30359462768253875
0.3568749567485344
0.2731966391901639


In [239]:
a.ravel()

array([0.49181313, 0.19781451, 0.12585074, 0.27911155, 0.97225961,
       0.84858664, 0.87524066, 0.7998581 , 0.01968029, 0.36603756,
       0.89621382, 0.81323298, 0.58005876, 0.94044935, 0.5836186 ,
       0.64224449, 0.69378467, 0.04219345, 0.14411849, 0.50324051,
       0.32215656, 0.82325582, 0.46070257, 0.19970968, 0.30359463,
       0.35687496, 0.27319664])

# Array Shape Manipulation

* An array can be reshaped as long as the number of elements is the same in the old and new shapes.
* The original array is read along the last axis first, then the second-to-last axis, and so on; axis 0 (rows) varies slowest
* The reshaped array is filled in the same order: last axis first, axis 0 (rows) last

In [254]:
a = np.arange(100)
print(a.shape, "\t", a, "\n")

# Axis 1 (columns) is filled first with 25 elements, then we move one step along axis 0 (rows).
b = a.reshape(4,25)
print(b.shape, "\t", b, "\n")

c = b.reshape(10,10)
print(c.shape, "\t", c, "\n")

(100,) 	 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99] 

(4, 25) 	 [[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24]
 [25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
  49]
 [50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
  74]
 [75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98
  99]] 

(10, 10) 	 [[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]] 
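
The fill order used by reshape can be written as an index formula: the last axis moves fastest. The sketch below checks the formula for a rank 2 and a rank 3 reshape of the same 100-element vector; the chosen positions are arbitrary.

```python
import numpy as np

a = np.arange(100)

# Rank 2: element [i, j] of a (4, 25) array comes from flat position i*25 + j.
b = a.reshape(4, 25)
print(b[2, 7], a[2 * 25 + 7])

# Rank 3: element [i, j, k] of a (5, 2, 10) array comes from (i*2 + j)*10 + k.
e = a.reshape(5, 2, 10)
print(e[3, 1, 4], a[(3 * 2 + 1) * 10 + 4])
```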



In [258]:
# reshaped back to original vector
d = b.reshape(100,)
print(d.shape, "\t", d, "\n")

# axis 2 is filled first (10 elements), then axis 1 (2 elements), then axis 0 (5 elements)
e = c.reshape(5,2,10)
print(e.shape, "\t", e, "\n")

# The transpose of a vector is the same vector
f = a.T
print(f.shape, "\t", f, "\n")

# Transposing an array of rank 2 or higher reverses the order of its axes, so shape (5, 2, 10) becomes (10, 2, 5)
g = e.T
print(g.shape, "\t", g, "\n")

(100,) 	 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99] 

(5, 2, 10) 	 [[[ 0  1  2  3  4  5  6  7  8  9]
  [10 11 12 13 14 15 16 17 18 19]]

 [[20 21 22 23 24 25 26 27 28 29]
  [30 31 32 33 34 35 36 37 38 39]]

 [[40 41 42 43 44 45 46 47 48 49]
  [50 51 52 53 54 55 56 57 58 59]]

 [[60 61 62 63 64 65 66 67 68 69]
  [70 71 72 73 74 75 76 77 78 79]]

 [[80 81 82 83 84 85 86 87 88 89]
  [90 91 92 93 94 95 96 97 98 99]]] 

(100,) 	 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99] 


* **_ndarray.reshape()_** : returns a reshaped array and leaves the original unchanged
* **_ndarray.resize()_** : resizes the original array in place

In [259]:
a.resize(2,50)
print(a.shape, "\t", a)

(2, 50) 	 [[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
  48 49]
 [50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
  74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97
  98 99]]
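
The difference between the two methods shows up in what they return and in what happens to the original. This is a minimal sketch with small illustrative arrays; two separate arrays are used so the effects do not interfere.

```python
import numpy as np

# reshape returns a new array object; the original keeps its shape.
a = np.arange(6)
r = a.reshape(2, 3)
print(a.shape, r.shape)

# resize changes the array itself and returns None.
b = np.arange(6)
result = b.resize(2, 3)
print(result, b.shape)
```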


If a dimension is given as -1, the size of that axis is calculated automatically; only one axis can be -1

In [262]:
a.reshape(2,2,-1)

array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
         16, 17, 18, 19, 20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
         41, 42, 43, 44, 45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
         66, 67, 68, 69, 70, 71, 72, 73, 74],
        [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
         91, 92, 93, 94, 95, 96, 97, 98, 99]]])
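
The sketch below shows a few uses of -1 and the two ways it can fail: using it for more than one axis, or asking for a shape the element count cannot be divided into. The list `errors` is only there to collect the messages for illustration.

```python
import numpy as np

a = np.arange(100)

print(a.reshape(2, 2, -1).shape)   # last axis inferred as 25
print(a.reshape(-1, 10).shape)     # first axis inferred as 10
print(a.reshape(-1).shape)         # flattens to a vector

errors = []
for bad_shape in [(-1, -1), (3, -1)]:
    try:
        a.reshape(bad_shape)
    except ValueError as err:
        errors.append(str(err))
        print("ValueError:", err)
```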

# Array Math Operations

**Basic arithmetic**

In [263]:
x = np.array([[1,2], [3,4]])
y = np.array([[1,2], [3,4]])
print ("x + y:\n", x + y)# or np.add(x,y)
print ("x - y:\n", x - y)# or np.subtract(x,y)
print ("x * y:\n", x * y)# or np.multiply(x,y)
print ("x / y:\n", x / y)# or np.divide(x,y)

x + y:
 [[2 4]
 [6 8]]
x - y:
 [[0 0]
 [0 0]]
x * y:
 [[ 1  4]
 [ 9 16]]
x / y:
 [[1. 1.]
 [1. 1.]]
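
Each arithmetic operator has a function form that works element by element. The sketch below checks the four pairs against each other; the dictionary `checks` is an illustrative helper.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 2], [3, 4]])

# Operator form compared with the equivalent NumPy function.
checks = {
    "add": np.array_equal(x + y, np.add(x, y)),
    "subtract": np.array_equal(x - y, np.subtract(x, y)),
    "multiply": np.array_equal(x * y, np.multiply(x, y)),
    "divide": np.array_equal(x / y, np.divide(x, y)),
}
print(checks)

# Division of integer arrays produces a float array.
print((x / y).dtype)
```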


**Dot Product, Matrix Multiply**

**_numpy.dot()_**

* If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred

* If both a and b are 1-D arrays, it is inner product of vectors 

* If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred

* If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b

* If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b


In [11]:
a = np.array(6)
b = np.arange(6)
print("a is a scalar:", np.dot(a,b))

a = np.arange(6)
b = np.arange(6)
print("\nBoth are vectors:", np.dot(a,b))

a = np.random.rand(2,3)
b = np.random.rand(3,2)
print("\nBoth are matrices:", np.dot(a,b))

a = np.random.rand(2,3)
b = np.arange(3)
print("\na rank 2, b rank 1:", np.dot(a,b))

a = np.random.rand(2,3)
b = np.random.rand(2,3,3)
print("\na rank n, b rank m:", np.dot(a,b))

a is a scalar: [ 0  6 12 18 24 30]

Both are vectors: 55

Both are matrices: [[0.33555593 0.94312628]
 [0.35659898 0.86055118]]

a rank 2, b rank 1: [1.55646513 1.10113944]

a rank n, b rank m: [[[0.48609489 0.83852528 0.94792158]
  [1.44295022 1.04069418 1.00043468]]

 [[0.39222918 0.69907829 0.801805  ]
  [1.25636144 0.99211599 1.18329515]]]
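
The five cases of `np.dot` are easiest to tell apart by the shape of the result. The sketch below uses arrays of ones (an illustrative choice) and records the output shape for each case in the hypothetical dictionary `shapes`.

```python
import numpy as np

scalar = np.array(6)
vec6 = np.arange(6)
m23 = np.ones((2, 3))
m32 = np.ones((3, 2))
vec3 = np.arange(3)
t233 = np.ones((2, 3, 3))

shapes = {
    "scalar . vector": np.shape(np.dot(scalar, vec6)),   # element-wise scaling
    "vector . vector": np.shape(np.dot(vec6, vec6)),     # inner product, a single number
    "matrix . matrix": np.shape(np.dot(m23, m32)),       # matrix multiplication
    "matrix . vector": np.shape(np.dot(m23, vec3)),      # sum over the last axis of both
    "matrix . rank 3": np.shape(np.dot(m23, t233)),      # last axis of a, second-to-last of b
}
for name, shape in shapes.items():
    print(name, shape)
```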


**Why @ or matmul is preferred for matrix multiplication**

In [3]:
x = np.arange(10**6).reshape(1000,1000)
%timeit np.dot(x,x)
%timeit x.dot(x)       
%timeit x @ x          
%timeit np.matmul(x,x) 

1.95 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.92 s ± 40.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
632 ms ± 8.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
645 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
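
Timings like these depend on the data type and on how NumPy was built, so they may differ on another machine. Independently of speed, `np.dot` and `np.matmul` agree for 2-D arrays but behave differently for higher ranks, which the following sketch illustrates with arrays of ones (an illustrative choice).

```python
import numpy as np

# 2-D: both give the same matrix product.
m = np.arange(6).reshape(2, 3)
n = np.arange(6).reshape(3, 2)
print(np.array_equal(np.dot(m, n), m @ n))

# N-D: matmul treats the leading axis as a batch of matrices,
# dot combines every matrix of a with every matrix of b.
a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))
print(np.matmul(a, b).shape)
print(np.dot(a, b).shape)
```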


**other math methods**

In [21]:
# trigonometry
a = np.arange(5)
print("Trigonometry \n",np.sin(a))
print(np.cos(a))
print(np.tan(a))
print(np.degrees(a))

# approximation
a = np.random.rand(5)
print("\nApproximation \n",np.around(a, 3)) # 2nd argument specifies the number of decimals to round to

# floor, largest integer less than or equal to the scalar
a = np.random.rand(5)
print("\nFloor \n",np.floor(a))

# ceil, smallest integer greater than or equal to the scalar
a = np.random.rand(5)
print("\nCeil \n",np.ceil(a))

print("\nRemainder\n",np.mod(a,2))
print("\nSquare root\n",np.sqrt(a))
print("\nReciprocal\n",np.reciprocal(a))
print("\nExponent\n",np.power(a,3))


Trigonometry 
 [ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]
[ 1.          0.54030231 -0.41614684 -0.9899925  -0.65364362]
[ 0.          1.55740772 -2.18503986 -0.14254654  1.15782128]
[  0.          57.29577951 114.59155903 171.88733854 229.18311805]

Approximation 
 [0.029 0.636 0.032 0.745 0.473]

Floor 
 [0. 0. 0. 0. 0.]

Ceil 
 [1. 1. 1. 1. 1.]

Remainder
 [0.76939734 0.57377411 0.10263526 0.69983407 0.66116787]

Square root
 [0.87715297 0.75747879 0.32036738 0.83656086 0.8131223 ]

Reciprocal
 [1.29971856 1.74284614 9.74324036 1.42891013 1.5124752 ]

Exponent
 [0.45546188 0.18889604 0.00108116 0.34275615 0.28902487]
