<a href="https://colab.research.google.com/github/Nitesh-Nandan/Collab-Notebooks/blob/main/ml/courses/numpy_pandas_scikit_learn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview
This notebook is a hands-on exercise for the course: https://www.educative.io/courses/machine-learning-numpy-pandas-scikit-learn/overview

## Seven Steps of Machine Learning
- Data Collection
- Data Processing and Preparation
- Feature Engineering
- Model Selection
- Model Training and Data Pipeline
- Model Validation
- Model Persistence




# Data Manipulation with NumPy

*Converting raw data into a meaningful form is an essential skill to have.*

## NumPy Arrays

In [3]:
# NumPy arrays behave much like Python lists, with added features such as a single element type (dtype)

import numpy as np

arr = np.array([-1, 2, 3, 4, 5], dtype = np.float32)
print(repr(arr))

array([-1.,  2.,  3.,  4.,  5.], dtype=float32)
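One of the "added features" is element-wise arithmetic. The following is a minimal sketch contrasting a list with an array; the variable names `py_list` and `np_arr` are illustrative only.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array(py_list)

# Multiplying a list repeats it; multiplying an array scales each element.
repeated = py_list * 2
scaled = np_arr * 2

print(repeated)                   # [1, 2, 3, 1, 2, 3]
print(scaled)                     # [2 4 6]
print(np_arr.shape, np_arr.ndim)  # (3,) 1
```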


In [6]:
# When the elements of a NumPy array have mixed types, the array's type is upcast to the highest-level type present (here, int and float become float64)

arr = np.array([0, 0.1, 2])
print(arr.dtype)
print(repr(arr))

float64
array([0. , 0.1, 2. ])
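The upcasting rule can also be checked without building an array: `np.result_type` reports the dtype NumPy would choose for a combination of types. A small sketch, with illustrative variable names:

```python
import numpy as np

# Ask NumPy which dtype results from combining an integer and a float type.
promoted = np.result_type(np.int64, np.float64)
print(promoted)     # float64

# Mixing ints, floats, and a complex value upcasts everything to complex.
mixed = np.array([1, 2.5, 3 + 0j])
print(mixed.dtype)  # complex128
```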


In [5]:
# As with Python lists, assigning a NumPy array to another variable creates a reference, not a new array, so changing a value through the reference also changes the original. To get an independent array, use the array's copy method, which has no required arguments and returns the copied array.

a = np.array([0, 1])
b = np.array([9, 8])
c = a
print('Array a: {}'.format(repr(a)))
c[0] = 5
print('Array a: {}'.format(repr(a)))

d = b.copy()
d[0] = 6
print('Array b: {}'.format(repr(b)))

Array a: array([0, 1])
Array a: array([5, 1])
Array b: array([9, 8])
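Plain assignment is not the only way to end up sharing data: basic slicing also returns a view onto the same memory. The sketch below uses `np.shares_memory` to tell views from copies; the names `alias`, `view`, and `independent` are illustrative.

```python
import numpy as np

a = np.array([0, 1, 2, 3])
alias = a                # same object, no new array is created
view = a[1:3]            # basic slicing returns a view onto a's data
independent = a.copy()   # separate memory, taken before any modification

view[0] = 99             # writes through to a (and therefore alias)

print(a)                                  # [ 0 99  2  3]
print(alias is a)                         # True
print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False
print(independent)                        # [0 1 2 3]
```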


In [7]:
# We cast NumPy arrays with their astype method. Its required argument is the new type for the array, and it returns a copy of the array cast to that type.

arr = np.array([0, 1, 2])
print(arr.dtype)
print(repr(arr))

arr = arr.astype(np.float32) # copies the data and casts it to float32
print(arr.dtype)
print(repr(arr))

int64
array([0, 1, 2])
float32
array([0., 1., 2.], dtype=float32)
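Casting in the other direction, from float to integer, discards information. A short sketch of what `astype` does to fractional values (the sample numbers are arbitrary):

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.2])

# The fractional part is dropped (truncation toward zero), not rounded.
ints = floats.astype(np.int32)

print(repr(ints))    # array([ 1, -1,  2], dtype=int32)
print(floats.dtype)  # float64: the original array is left unchanged
```

If rounding is wanted instead, one option is to call `np.round` on the array before casting.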


In [9]:
# When we don't want a NumPy array to contain a value at a particular index, we can use np.nan to act as a placeholder. A common usage for np.nan is as a filler value for incomplete data.

arr = np.array([np.nan, 1, 2])
print(repr(arr))

arr = np.array([np.nan, 'abc'])
print(repr(arr))

# Uncommenting the next line raises a ValueError, because NaN cannot be converted to an integer.
# np.array([np.nan, 1, 2], dtype=np.int32)
np.array([np.nan, 1, 2], dtype=np.float32)

array([nan,  1.,  2.])
array(['nan', 'abc'], dtype='<U32')


array([nan,  1.,  2.], dtype=float32)
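Once `np.nan` is in an array, it has to be detected with `np.isnan` rather than `==`, and ordinary reductions propagate it. A minimal sketch with made-up data:

```python
import numpy as np

data = np.array([np.nan, 1.0, 2.0])

print(np.nan == np.nan)        # False: NaN never compares equal, even to itself

mask = np.isnan(data)          # element-wise NaN test
print(mask)                    # [ True False False]

plain_mean = np.mean(data)     # NaN propagates through ordinary reductions
safe_mean = np.nanmean(data)   # mean of the non-NaN values only
print(plain_mean)              # nan
print(safe_mean)               # 1.5
```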

In [10]:
# To represent infinity in NumPy, we use the np.inf special value. We can also represent negative infinity with -np.inf.

print(np.inf > 1000000)

arr = np.array([np.inf, 5])
print(repr(arr))

arr = np.array([-np.inf, 1])
print(repr(arr))

# Uncommenting the next line raises an OverflowError, because infinity cannot be converted to an integer.
#np.array([np.inf, 3], dtype=np.int32)
np.array([np.inf, 3], dtype=np.float32)

True
array([inf,  5.])
array([-inf,   1.])


array([inf,  3.], dtype=float32)
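Infinite values can be located with `np.isinf`, and `np.isfinite` excludes both infinities and NaN in one test. A small sketch with arbitrary sample values:

```python
import numpy as np

values = np.array([np.inf, -np.inf, np.nan, 5.0])

inf_mask = np.isinf(values)        # True for +inf and -inf only
finite_mask = np.isfinite(values)  # False for inf, -inf, and nan

print(inf_mask)             # [ True  True False False]
print(finite_mask)          # [False False False  True]
print(values[finite_mask])  # [5.]
```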

## NumPy Basics

In [17]:
# Ranged Data

arr = np.arange(5)
print(repr(arr))

arr = np.arange(0, 5.5, 0.5)
print(repr(arr))

arr = np.arange(-1, 4)
print(repr(arr))

arr = np.arange(-1.5, 4, 2)
print(repr(arr))

array([0, 1, 2, 3, 4])
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
array([-1,  0,  1,  2,  3])
array([-1.5,  0.5,  2.5])
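`np.arange` also accepts a negative step and an explicit `dtype`. The following sketch shows both; the end value is excluded in either direction.

```python
import numpy as np

# A negative step counts down; the stop value (0) is still excluded.
countdown = np.arange(5, 0, -1)
print(repr(countdown))  # array([5, 4, 3, 2, 1])

# dtype overrides the type inferred from the arguments.
as_float = np.arange(3, dtype=np.float32)
print(repr(as_float))   # array([0., 1., 2.], dtype=float32)
```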


In [22]:
# Linspace

'''
The default num is 50, and the default dtype is float.
'''

arr = np.linspace(5, 11, dtype=np.int32)
print(repr(arr.shape))


arr = np.linspace(5, 11, num=4, endpoint=False)
print(repr(arr))

arr = np.linspace(5, 11, num=10, dtype=np.int32)
print(repr(arr))

(50,)
array([5. , 6.5, 8. , 9.5])
array([ 5,  5,  6,  7,  7,  8,  9,  9, 10, 11], dtype=int32)
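To see the spacing `np.linspace` chose, pass `retstep=True`; the function then returns the array together with the step size. A minimal sketch:

```python
import numpy as np

# retstep=True returns a (samples, step) tuple instead of just the samples.
points, step = np.linspace(0, 1, num=5, retstep=True)

print(repr(points))  # array([0.  , 0.25, 0.5 , 0.75, 1.  ])
print(step)          # 0.25
```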


In [23]:
# Reshaping the Data

'''
The function we use to reshape data in NumPy is np.reshape. It takes in an array and a new shape as required arguments. The new shape must exactly contain all the elements from the input array. For example, we could reshape an array with 12 elements to (4, 3), but we can't reshape it to (4, 4).

We are allowed to use the special value of -1 in at most one dimension of the new shape. The dimension with -1 will take on the value necessary to allow the new shape to contain all the elements of the array.
'''

arr = np.arange(8)

reshaped_arr = np.reshape(arr, (2, 4))
print(repr(reshaped_arr))
print('New shape: {}'.format(reshaped_arr.shape))

reshaped_arr = np.reshape(arr, (-1, 2, 2))
print(repr(reshaped_arr))
print('New shape: {}'.format(reshaped_arr.shape))

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
New shape: (2, 4)
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
New shape: (2, 2, 2)
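The 12-element case described in the cell above can be sketched as follows: `-1` lets NumPy infer the missing dimension, while a shape that cannot hold exactly 12 elements raises a `ValueError`. The `raised` flag is only there to make the outcome explicit.

```python
import numpy as np

arr = np.arange(12)

# -1 is inferred as 12 / 4 = 3.
inferred = np.reshape(arr, (4, -1))
print(inferred.shape)  # (4, 3)

# (4, 4) needs 16 elements, so the reshape is rejected.
try:
    np.reshape(arr, (4, 4))
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```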


In [25]:
# Flatten

arr = np.arange(8)
arr = np.reshape(arr, (2, 4))
flattened = arr.flatten()
print(repr(arr))
print('arr shape: {}'.format(arr.shape))
print(repr(flattened))
print('flattened shape: {}'.format(flattened.shape))

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
arr shape: (2, 4)
array([0, 1, 2, 3, 4, 5, 6, 7])
flattened shape: (8,)
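`flatten` always returns a copy. NumPy also provides `ravel`, which returns a view when the memory layout allows it. The sketch below compares the two with `np.shares_memory`; the variable names are illustrative.

```python
import numpy as np

arr = np.reshape(np.arange(8), (2, 4))

flat_copy = arr.flatten()  # always a new, independent array
flat_view = arr.ravel()    # a view here, because arr is contiguous in memory

copy_shares = np.shares_memory(arr, flat_copy)
view_shares = np.shares_memory(arr, flat_view)

print(copy_shares)  # False
print(view_shares)  # True
```

Writing to `flat_view` would therefore change `arr` as well, whereas `flat_copy` can be modified freely.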


**Transpose**

The transpose of a matrix is obtained by flipping the matrix over its diagonal. In other words, the row and column indices of each element are swapped.

For a matrix $A$ with dimensions $m \times n$:
- The transpose, denoted $A^T$, has dimensions $n \times m$
- The element at position $(i, j)$ in $A$ moves to position $(j, i)$ in $A^T$

Example:

If $A = \begin{bmatrix}
1 & 2\\
3 & 4\\
5 & 6
\end{bmatrix}$

<br/>

then $A^T = \begin{bmatrix}
1 & 3 & 5\\
2 & 4 & 6
\end{bmatrix}$

In [26]:
# Transposing
'''
Transposition also applies to arrays with more than two dimensions; by default, np.transpose reverses the order of the axes.
'''

arr = np.arange(8)
arr = np.reshape(arr, (4, 2))
transposed = np.transpose(arr)
print(repr(arr))
print('arr shape: {}'.format(arr.shape))
print(repr(transposed))
print('transposed shape: {}'.format(transposed.shape))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])
arr shape: (4, 2)
array([[0, 2, 4, 6],
       [1, 3, 5, 7]])
transposed shape: (2, 4)
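For arrays with three or more dimensions, `np.transpose` reverses the axis order by default, and its `axes` argument selects a specific permutation. A short sketch on an arbitrary `(2, 3, 4)` array:

```python
import numpy as np

arr = np.reshape(np.arange(24), (2, 3, 4))

reversed_axes = np.transpose(arr)                  # default: axes reversed
swapped_first = np.transpose(arr, axes=(1, 0, 2))  # swap only the first two axes

print(reversed_axes.shape)  # (4, 3, 2)
print(swapped_first.shape)  # (3, 2, 4)
print(arr.T.shape)          # (4, 3, 2): .T is shorthand for the default
```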


## NumPy Maths