<a href="https://colab.research.google.com/github/ashu5644/Machine-Learning-Tools/blob/main/Numpy_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np

**Python VS C Language**
1. Python language is human friendly and easier to code compared to C language.
2. Python is relatively slow compared to C language.
3. Python is application centric while C is OS centric i.e C is preferred in applications where we need to work close to OS and python is preffered in applications where we need to work close to software logic and don't care much on OS memory mangement etc.

Numpy is written in C with python wrapper, so it gives us comfort of python syntax with speed of C.
Complex mathematical operations can be computed very fast and efficently in Numpy, which are difficult to do with core python data structuers.


**Properties of Numpy**
 1. Numpy core object is N-Dimnesional array
 2. N-D array objects are homogeneous always, i.e single type of data can be stored in an object, which is similar to C language objects and essential for speed.
 3. N-D array objects are of fixed size, i.e object size is static and not dynamic i.e predefined size, which is similar to C language objects and essential for speed.



Standalone Functions (np.func)
  - They are designed to work on any array-like input (e.g., lists, tuples, numpy arrays) and often provide a more general or high-level operation.

Array Methods (np_array.func)
  - These methods are specific to numpy array instances and often provide operations or transformations that are tightly coupled with the array's internal structure.


**N-D Array Creation**

In [None]:
import numpy as np

In [None]:
# Method-1
arr_1d = np.array([[1,2,3,4]], dtype=np.int32)
arr_2d = np.array([[1,2,3,4],[5,6,7,8]], dtype=np.int32)
print(arr_1d)
print("*"*50)
print(arr_2d)
print("*"*50)
# Method-2
arr_zeros_nd = np.zeros((3,5))
arr_ones_nd = np.ones((3,5))
arr_identity_nd = np.identity(5) # only square arrays are allowed
print(arr_zeros_nd)
print("*"*50)
print(arr_ones_nd)
print("*"*50)
print(arr_identity_nd)
print("*"*50)
# Method-3
arr_2 = np.arange(start=0,stop=5,step=1)
print(arr_2)
print("*"*50)
# Method-4
arr_3 = np.linspace(start=0,stop=5,num=2)
print(arr_3)

# Difference between np.arange, np.linspace is following:
# in np.arange we specify step size of gap b/w consecutive data points, while in np.linspace we specify num_samples to generate as third parameter

[[1 2 3 4]]
**************************************************
[[1 2 3 4]
 [5 6 7 8]]
**************************************************
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
**************************************************
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
**************************************************
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
**************************************************
[0 1 2 3 4]
**************************************************
[0. 5.]


**Properties and attributes of N-D array**

In [None]:
arr_4 = np.array([[2,3,4],[5,6,7]])
print(f"shape: {arr_4.shape}")
print("*"*50)
print(f"size: {arr_4.size}")
print("*"*50)
print(f"ndim: {arr_4.ndim}")
print("*"*50)
print(f"dtype: {arr_4.dtype}")

# Differecne between shape and size is following:
# size represents total number of items, while shape represents number of items in each dimension.
print("*"*50)
print(arr_4.astype('float').dtype)

# astype function can be crucial in data cleaning steps to decrease memory footprint of data as pper requirements

shape: (2, 3)
**************************************************
size: 6
**************************************************
ndim: 2
**************************************************
dtype: int64
**************************************************
float64


**Python List VS Numpy N-D Arrays**
1. N-D arrays are ***faster***
  1. Due to broadcasting support, no loop requirement, multiple operations being done in parallel in numpy unlike sequential operations in python
2. N-D arrays are ***convinent***
  1. Syntax of n-d array operations is more human friendly and minimal compared to raw python
3. N-D arrays occupy ***less memory*** for same data
  1.  python array stores reference and values of all data-points, as stored location can be non-contguous due to non-predefined/dynamic/hetrogeneous data-type of python lists.
  2. numpy array stores data points in continuous memory block due to predefined and homogenous data-type, that's why it's more memory efficient

In [51]:
# Less Memory
import numpy as np
neles = 10**5
arr_numpy = np.arange(neles)
arr_python = list(range(neles))

import sys
python_list_size = sys.getsizeof(arr_python) + sum(sys.getsizeof(item) for item in arr_python)
numpy_list_size = arr_numpy.nbytes
print(f"python_list_size(Bytes): {python_list_size}")
print(f"numpy_list_size(Bytes): {numpy_list_size}")
print(f"Memory-Ratio: {numpy_list_size/python_list_size}") # 22% of python size only

python_list_size(Bytes): 3600052
numpy_list_size(Bytes): 800000
Memory-Ratio: 0.22221901239204322


In [None]:
# Less Time
import time
arr_numpy1 = np.arange(neles)
arr_numpy2 = np.arange(neles)
t1 = time.time()
arr_numpy3 = arr_numpy1+arr_numpy2
t_numpy = time.time()-t1
print(f"Time-Numpy: {time.time()-t1} seconds")
arr_py1 = range(neles)
arr_py2 = range(neles)
t1 = time.time()
arr_py3 = sum(arr_py1)+sum(arr_py2)
t_python = time.time()-t1
print(f"Ratio: {t_numpy/t_python}") # 5% of python time

Time-Numpy: 0.0005674362182617188 seconds
Ratio: 0.05699857712803058


**Reshaping Numpy Array**

1. Ravel Return only reference/view of the original array
2. Flatten Return copy of the original array

In [None]:
import numpy as np
arr = np.random.randint(0,20,(5,4))
print(arr)
# Reshape
print("*"*50)
print(arr.reshape((4,5)))
# Ravel (Make 1d array)
print("*"*50)
print(arr.ravel()) # Return only reference/view of the original array
# Flatten (Make 1d array)
print("*"*50)
print(arr.flatten()) # Return copy of the original array
# Transpose
print("*"*50)
print(np.transpose(arr, (1,0)))
# Stacking (vertical, horizontal)
arr1 = np.random.randint(0,10,(3,2))
arr2 = np.random.randint(0,10,(3,2))
print("*"*50)
print(np.vstack((arr1,arr2))) # top-down
print("*"*50)
print(np.hstack((arr1,arr2))) # left-right
# Breaking/Splitting (vertical, horizontal)
print("*"*50)
print(np.vsplit(np.vstack((arr1,arr2)),2)) # only equal shape divison allowed
print("*"*50)
print(np.hsplit(np.hstack((arr1,arr2)),2)) # only equal shape divison allowed

[[18  8  9  4]
 [10 15  3  1]
 [ 0 17  8 14]
 [18  5  0 11]
 [13 14 18  1]]
**************************************************
[[18  8  9  4 10]
 [15  3  1  0 17]
 [ 8 14 18  5  0]
 [11 13 14 18  1]]
**************************************************
[18  8  9  4 10 15  3  1  0 17  8 14 18  5  0 11 13 14 18  1]
**************************************************
[18  8  9  4 10 15  3  1  0 17  8 14 18  5  0 11 13 14 18  1]
**************************************************
[[18 10  0 18 13]
 [ 8 15 17  5 14]
 [ 9  3  8  0 18]
 [ 4  1 14 11  1]]
**************************************************
[[5 1]
 [9 5]
 [8 3]
 [0 7]
 [2 9]
 [3 6]]
**************************************************
[[5 1 0 7]
 [9 5 2 9]
 [8 3 3 6]]
**************************************************
[array([[5, 1],
       [9, 5],
       [8, 3]]), array([[0, 7],
       [2, 9],
       [3, 6]])]
**************************************************
[array([[5, 1],
       [9, 5],
       [8, 3]]), array([[0, 7],
       [2, 

**Indexing, Slicing, Iteration of N-D arrays**

In [None]:
# Indexing and slicing
arr6 = np.arange(24).reshape(4,6)
print(arr6)
print("*"*50)
print(arr6[2]) # extract 2nd row
print("*"*50)
print(arr6[:, 2]) # extract 2nd column
print("*"*50)
print(arr6[3, 2]) # extract 3row, 2nd column
print("*"*50)
print(arr6[0:3, 0:2]) # extract first 3 rows, first 2 columns
print("*"*50)
print(arr6[[0,2], [1,3]]) # extract individual elements by (x,y) cordinates specify as separate list
# Indexing with boolean-array
print("*"*50)
print(arr6[arr6>15]) # always gives 1-d array, as this opration works on elementwise data, and output can be of any irregaulr shape so final output comes as 1-d always.

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
**************************************************
[12 13 14 15 16 17]
**************************************************
[ 2  8 14 20]
**************************************************
20
**************************************************
[[ 0  1]
 [ 6  7]
 [12 13]]
**************************************************
[ 1 15]
**************************************************
[16 17 18 19 20 21 22 23]


In [None]:
# extract 1st3rd row, 0th 2nd column (it is bit tricky)
# can't specify both row and column selection in sngle operation, it will result into specific element selection as shown above
arr1 = arr6[:,[0,2]][[1,3],:] # 0th 2nd column followed by 1stand 3row
arr2 = arr6[[1,3],:][:,[0,2]] # 1stand 3row followed by 0th 2nd column
print("*"*50)
print(arr1)
print("*"*100)
print(arr2)

**************************************************
[[ 6  8]
 [18 20]]
****************************************************************************************************
[[ 6  8]
 [18 20]]


In [None]:
for i in np.nditer(arr6):
    print(i)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23


**Numpy Operations**

In [None]:
# Multiply by scalr
print(arr6)
arr7 = arr6*2
print("*"*50)
print(arr7)
# Boolean operation
print("*"*50)
print(arr7<10)
# Dot product
arr8 = np.random.randint(0,10,(3,4))
arr9 = np.random.randint(0,10,(4,5))
print("*"*50)
print(np.dot(arr8,arr9), np.dot(arr8,arr9).shape)
# Min, Max, median, mode, mean, std. etc. etc along different axes (aggregate oprations)
print("*"*50)
print(np.min(arr8), np.min(arr8,axis=0),np.min(arr8,axis=1))
# Pointwise operations
print("*"*50)
print(np.exp(arr8))
print("*"*50)
print(np.sin(arr8))

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
**************************************************
[[ 0  2  4  6  8 10]
 [12 14 16 18 20 22]
 [24 26 28 30 32 34]
 [36 38 40 42 44 46]]
**************************************************
[[ True  True  True  True  True False]
 [False False False False False False]
 [False False False False False False]
 [False False False False False False]]
**************************************************
[[ 33 108 108  68  96]
 [ 97  98  68 144 206]
 [ 40 125 104 101 134]] (3, 5)
**************************************************
1 [3 5 1 1] [1 1 3]
**************************************************
[[2.00855369e+01 1.48413159e+02 8.10308393e+03 2.71828183e+00]
 [8.10308393e+03 8.10308393e+03 2.71828183e+00 1.09663316e+03]
 [2.00855369e+01 1.48413159e+02 1.09663316e+03 2.98095799e+03]]
**************************************************
[[ 0.14112001 -0.95892427  0.41211849  0.84147098]
 [ 0.41211849  0.41211849  0.