# Machine Learning
-> The ability of computers to learn something without being explicitly programmed -- Andrew Ng  
e.g. classifying an email as spam or not

## Why do we need ML?  

Some tasks can be solved with conventional software engineering, while others can't. For example, finding the shortest path can be done with classic data structures and algorithms, but what about telling a car image from a bike image? Or a pen from a pencil?  

Machine learning algorithms learn to differentiate much like humans do: by making mistakes and correcting them, or more precisely, by ```learning from past experience```


## Types of ML algorithms  

```Supervised``` - We have the labels while training the model.  
e.g. dog vs cat classifier, wine quality, fraud detection, etc.

```Unsupervised``` - We don't have labels; in fact, we don't even have a target defined.  
e.g. deciding how many T-shirt sizes (S, M, L) to make, or how many iPhone models to launch  

```Reinforcement``` - Algorithms that learn from rewards, e.g. balancing a stick, juggling a football, etc.

Supervised learning can be used to solve two types of problems: ```Classification``` and ```Regression```  

![class_reg_image](https://drive.google.com/uc?id=13jSXFri2f8hWZYbqGpV5lA6XDcqsvoEC)

## Mathematics behind ML

Let's try to understand what happens mathematically in ML.

Let's say you have the equation of a line, ```y = 2x + 3```.

Given the x points [1,2,3,4,5], you'll get **y as [5,7,9,11,13]**.

In ML the situation is usually reversed: we start with the data points --   
```[x,y] = [(1,5), (2,7), (3,9), (4,11), (5,13)]```

*NOTE: x could be a feature (column) and y a label*

and we have to recover the equation of the line (here, the slope 2 and the intercept 3) from them.
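As a minimal sketch of that idea, the snippet below recovers the slope and intercept from the five points above using a least-squares fit. `np.polyfit` is just one convenient way to do this; the variable names are chosen here for illustration.

```python
import numpy as np

# Data points generated from the line y = 2x + 3
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

# Fit a degree-1 polynomial (a straight line) by least squares
slope, intercept = np.polyfit(x, y, deg=1)

# The fitted values should be very close to 2 and 3
print(round(float(slope), 2), round(float(intercept), 2))
```

Real data is noisy, so the fitted line will usually only approximate the points rather than pass through all of them.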

# Model Training

In [None]:
import os
import pandas as pd 

# numerical calculation library
import numpy as np

# Viz libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning libraries
# sklearn remains one of the most important libraries for beginner machine learning
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression 
from sklearn.neighbors import KNeighborsClassifier

# calculating performance
from sklearn.metrics import accuracy_score, mean_squared_error

```scikit-learn``` remains one of the most important libraries for machine learning. You can run almost all of the traditional ML algorithms with it; all you need is some data.   
Please refer to the beginner-friendly blog on [scikit-learn](https://jakevdp.github.io/PythonDataScienceHandbook/05.02-introducing-scikit-learn.html)  

You could also refer to the official doc [here](https://scikit-learn.org/stable/)

## Data construction

In [None]:
path = "/content/drive/MyDrive/Colab_Notebooks/orness/clean_data/"

In [None]:
df = pd.read_csv(path+"winequality.csv")

In [None]:
df.head()

In [None]:
df.info()

Converting the problem into ```binary classification```: rows whose `TARGET` is above 5 become class 1, and all others become class 0.

In [None]:
Y = df['TARGET'].apply(lambda y: 1 if y > 5 else 0)

In [None]:
# Why apply ?
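One way to answer the question above: `apply` calls the lambda once for every value in the column. The sketch below uses a small made-up Series (standing in for `df['TARGET']`, which is an assumption here) and compares `apply` with an equivalent vectorized comparison.

```python
import pandas as pd

# Hypothetical quality scores standing in for df['TARGET']
target = pd.Series([3, 5, 6, 8, 5, 7])

# Element-wise: apply calls the lambda once per value
y_apply = target.apply(lambda y: 1 if y > 5 else 0)

# Vectorized: compare the whole column at once, then cast booleans to 0/1
y_vectorized = (target > 5).astype(int)

print(y_apply.tolist())
print(y_vectorized.tolist())
```

Both forms give the same labels; the vectorized one is usually faster on large columns, while `apply` is handy when the rule is too complex for a single expression.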

In [None]:
X = df.drop('TARGET', axis=1)

In [None]:
X.shape, Y.shape

In [None]:
X

## Data Split

We need to split our data into ```train set``` and ```test set```.  

So we build our ML model using the train data and then check its performance on the test data.

In [None]:
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=42)

In [None]:
x_train.shape, x_test.shape, y_train.shape, y_test.shape 

## Model Training 1. Logistic Regression

We'll use the logistic regression algorithm to model our data.  

![log_reg](https://drive.google.com/uc?id=1yqtkIOzHt5RYc-pgR4Q8MB60hBY86DPa)

In [None]:
LR = LogisticRegression()  # default solver is 'lbfgs'; increase max_iter if it warns about convergence
LR.fit(x_train, y_train)

If you want more info on logistic regression, please check out the [scikit-learn page](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).  
For the mathematical side along with code, please follow [this article](https://www.analyticsvidhya.com/blog/2021/07/an-introduction-to-logistic-regression/).
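To make the model a little less of a black box, here is a small sketch of what a fitted binary logistic regression computes: a linear score passed through the sigmoid function. It uses synthetic data from `make_classification` (an assumption for illustration, not the wine dataset), and the names `X_toy`, `y_toy` and `model` are made up here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (an assumption; not the wine dataset)
X_toy, y_toy = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression().fit(X_toy, y_toy)

# Linear score z = w . x + b for each row
z = X_toy @ model.coef_.ravel() + model.intercept_[0]

# The sigmoid squashes the score into a probability between 0 and 1
prob_manual = 1.0 / (1.0 + np.exp(-z))

# Compare with the class-1 probability reported by scikit-learn
prob_sklearn = model.predict_proba(X_toy)[:, 1]
print(np.allclose(prob_manual, prob_sklearn))
```

`predict` then simply returns class 1 whenever that probability is above 0.5.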

In [None]:
# Predict labels for x_test with our trained model
y_pred = LR.predict(x_test)

In [None]:
accuracy_score(y_test, y_pred)
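For reference, accuracy is just the fraction of predictions that match the true labels. The sketch below checks this on four made-up labels (`y_true` and `y_hat` are hypothetical values, not taken from the wine data).

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions
y_true = np.array([1, 0, 1, 1])
y_hat = np.array([1, 0, 0, 1])

# Fraction of positions where prediction and truth agree: 3 out of 4
manual = float(np.mean(y_true == y_hat))
library = float(accuracy_score(y_true, y_hat))

print(manual, library)
```

Keep in mind that accuracy can look good on imbalanced data even when the model ignores the rarer class, so it is worth checking the class balance of `Y` as well.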

## Model Training 2. K-Nearest Neighbor

Let's try running a K-NN algorithm and see how it performs.

In [None]:
KNN = KNeighborsClassifier()
KNN.fit(x_train, y_train)

Let's see what exactly KNN is trying to do here.  

![knn](https://drive.google.com/uc?id=1S5UvP-cGM1E1AlvjqPvJJn_48tf3hon6)
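The idea can also be sketched in a few lines of NumPy: measure the distance from a new point to every training point, pick the k closest, and take a majority vote. This is a simplified illustration, not scikit-learn's actual implementation; `knn_predict` and the toy points are hypothetical.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, query, k=5):
    """Predict the label of one query point by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Two well-separated toy clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3))  # near the first cluster
print(knn_predict(X_toy, y_toy, np.array([5.0, 5.1]), k=3))  # near the second cluster
```

`KNeighborsClassifier()` uses 5 neighbors by default; because the method relies on distances, features on very different scales can dominate the vote, so scaling the data is often worth trying.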

In [None]:
y_pred_knn = KNN.predict(x_test)

In [None]:
accuracy_score(y_test, y_pred_knn)

## Manual test data

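A scikit-learn model expects a 2D input of shape (number of samples, number of features), so a single row has to be reshaped before prediction. In the cell below, `reshape(1, -1)` asks for one row and passes -1 for the other dimension, which tells NumPy to infer whatever size is needed to hold all of the original data.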

In [None]:
# test on any single row of the dataset to sanity-check the prediction
sample = np.asarray(df.drop('TARGET', axis=1).iloc[2]).reshape(1, -1)  # renamed from `input`, which shadows a Python builtin
LR.predict(sample)
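A tiny sketch of the reshape on its own, using four made-up feature values (the numbers are hypothetical):

```python
import numpy as np

# Hypothetical feature values for one sample, shape (4,)
row = np.array([7.4, 0.7, 0.0, 1.9])

# One row, and let NumPy infer the number of columns
sample_2d = row.reshape(1, -1)

print(row.shape, sample_2d.shape)
```

The data itself is unchanged; only the shape goes from a flat vector to a 1-row matrix.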

Please refer to the [Data Sci book](https://jakevdp.github.io/PythonDataScienceHandbook/) to further hone your skills.  
Data Sci Github - https://github.com/khuyentran1401/Data-science  
Data Sci resources - https://github.com/tirthajyoti/Data-science-best-resources   
ML blog - https://machinelearningmastery.com/start-here/#getstarted  
Practise numpy and pandas [here](https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Pandas%20and%20Numpy/Pandas_Operations.ipynb)

## Model pickling (dumping)

Model pickling is an important step once your model is trained. It saves the fitted model (for logistic regression, essentially a set of learned weights) to disk so that it can be reused to make predictions without retraining. Later on, the saved model can be loaded by an application, for example behind a front-end UI, to serve predictions. 



In [None]:
# library to dump Python objects into a binary format
import pickle

In [None]:
with open('model_pkl', 'wb') as f:
    pickle.dump(LR, f)

In [None]:
ls

In [None]:
with open('model_pkl' , 'rb') as f:
    lr = pickle.load(f)
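As a quick check that a pickled model behaves like the original, the sketch below trains a small model on synthetic data (an assumption, not the wine dataset), saves it to a temporary file, loads it back, and compares predictions. The names `model`, `restored` and `pkl_path` are made up for this example.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data and a small model to round-trip
X_toy, y_toy = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X_toy, y_toy)

# Write the model to a temporary file, then read it back
pkl_path = os.path.join(tempfile.mkdtemp(), "model_pkl")
with open(pkl_path, "wb") as f:
    pickle.dump(model, f)
with open(pkl_path, "rb") as f:
    restored = pickle.load(f)

# The restored model should predict exactly like the original
same = np.array_equal(model.predict(X_toy), restored.predict(X_toy))
print(same)
```

Two practical cautions: only unpickle files you trust, since loading a pickle can execute arbitrary code, and load the model with the same scikit-learn version that saved it where possible.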