## Imports

In [None]:
import pandas as pd

## MNIST Data Collection

In [None]:
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/kaggle_mnist/train.csv
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/kaggle_mnist/test.csv

--2021-07-25 06:00:10--  https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/kaggle_mnist/train.csv
Resolving nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com (nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com)... 52.219.66.42
Connecting to nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com (nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com)|52.219.66.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 76775041 (73M) [text/csv]
Saving to: ‘train.csv’


2021-07-25 06:00:14 (18.8 MB/s) - ‘train.csv’ saved [76775041/76775041]

--2021-07-25 06:00:14--  https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/kaggle_mnist/test.csv
Resolving nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com (nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com)... 52.219.62.87

In [None]:
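train_df = pd.read_csv('train.csv')
# Features: every column except 'label'; target: the 'label' column
train_X_df = train_df.drop('label', axis=1)
train_Y_df = train_df['label']
# test.csv has no 'label' column, so it is used as-is for prediction
test_X_df = pd.read_csv('test.csv')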
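The split above relies on the Kaggle CSV layout: a `label` column followed by one column per pixel (784 in the real file, one per pixel of a 28×28 image). A minimal sketch of the same split on a tiny made-up frame; the `demo_*` names and the values are hypothetical stand-ins, not the real `train.csv`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: 6 rows, a 'label' column and only 4 pixel columns
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(6, 4))
demo_train_df = pd.DataFrame(pixels, columns=[f'pixel{i}' for i in range(4)])
demo_train_df.insert(0, 'label', [0, 1, 2, 3, 4, 5])

# Same split as above: features are every column except 'label', target is 'label'
demo_X = demo_train_df.drop('label', axis=1)
demo_Y = demo_train_df['label']

print(demo_X.shape, demo_Y.shape)  # (6, 4) (6,)
```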

## Applying the [KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) on the MNIST Dataset using [Pipelines](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html)

### Creating Pipeline

* Create a pipeline with a scaler followed by the [KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) as the estimator
  * Trying different scaling techniques: [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler), [MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler), [RobustScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler) and no scaler; the pipeline below uses MinMaxScaler
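To see how the three scalers differ, here is a toy single-feature example with an outlier (made-up values, not MNIST pixels). StandardScaler subtracts the mean and divides by the standard deviation, MinMaxScaler maps the observed minimum and maximum to 0 and 1, and RobustScaler subtracts the median and divides by the interquartile range, so the four inlying values keep a usable spread despite the outlier. "No scaler" simply means passing the raw values to the classifier.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# One feature column with an outlier (toy values, not MNIST pixels)
X = np.array([[0.0], [1.0], [2.0], [3.0], [100.0]])

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    # fit_transform learns the statistics (mean/std, min/max or median/IQR) and applies them
    X_scaled = scaler.fit_transform(X)
    print(type(scaler).__name__, np.round(X_scaled.ravel(), 3))
```

For k-NN the choice matters because neighbour distances are computed directly on the scaled features.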

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

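# Step 'scaler': rescales each pixel column to [0, 1] using the training min/max
# Step 'knn': the k-nearest-neighbours classifier that makes the predictions
pipe = Pipeline(steps=[('scaler', MinMaxScaler()),
                       ('knn', KNeighborsClassifier())])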
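A pipeline is equivalent to fitting the scaler on the training data, transforming both splits with it, and then fitting the classifier; the benefit is that inside `GridSearchCV` the scaler is refit on each training fold, so validation-fold statistics never leak into the scaling. A minimal check of that equivalence, using scikit-learn's bundled `load_digits` (8×8 digit images) as a stand-in for MNIST and a hypothetical train/validation split:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# load_digits stands in for MNIST so the sketch runs without downloads
X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

pipe = Pipeline(steps=[('scaler', MinMaxScaler()),
                       ('knn', KNeighborsClassifier())])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_val)

# The same two steps by hand: fit the scaler on the training split only,
# then reuse it (transform only) on the validation split
scaler = MinMaxScaler().fit(X_train)
knn = KNeighborsClassifier().fit(scaler.transform(X_train), y_train)
manual_pred = knn.predict(scaler.transform(X_val))

print((pipe_pred == manual_pred).all())  # True
```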

### Hyperparameter Tuning

* Passing the pipeline to [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) for hyperparameter tuning to find the best values of **k** (the number of nearest neighbours) and **p** (the power parameter of the Minkowski distance).
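For reference, the Minkowski distance between two feature vectors $\mathbf{x}$ and $\mathbf{y}$ with $n$ features is

$$d_p(\mathbf{x}, \mathbf{y}) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$$

so `p=1` gives the Manhattan distance and `p=2` (the `KNeighborsClassifier` default) gives the Euclidean distance. A small NumPy sketch of the formula; `minkowski_distance` is a hypothetical helper for illustration, not a scikit-learn function:

```python
import numpy as np

def minkowski_distance(x, y, p):
    """Minkowski distance between two 1-D vectors, for p >= 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sum of absolute differences raised to p, then the p-th root
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

a, b = [0, 0], [3, 4]
print(minkowski_distance(a, b, p=1))  # 7.0 (Manhattan)
print(minkowski_distance(a, b, p=2))  # 5.0 (Euclidean)
```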



In [None]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'knn__n_neighbors': [5, 10],
    'knn__p': [1, 2]
}

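# 2 values of k x 2 values of p = 4 candidates, each scored with 2-fold cross-validation;
# refit=True retrains the best candidate on the full training data
grid_search = GridSearchCV(pipe, param_grid=param_grid, scoring='accuracy', refit=True, cv=2)
grid_search.fit(train_X_df, train_Y_df)
print(grid_search.best_params_)
# Accuracy of the refit best model on the data it was trained on (a training score,
# not the cross-validated grid_search.best_score_)
print(grid_search.score(train_X_df, train_Y_df))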

{'knn__n_neighbors': 5, 'knn__p': 2}
0.9791428571428571
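The printed score is measured on the training data, so the mean cross-validated accuracy of the best candidate, `grid_search.best_score_`, is the better estimate of generalisation. The scaling techniques listed earlier, including no scaler, can also be searched in the same grid by making the `scaler` step itself a parameter, where `'passthrough'` skips the step. A sketch on `load_digits` as a stand-in for MNIST; on the real data this grid has 16 candidates instead of 4, so expect roughly four times as many fits:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# load_digits stands in for MNIST so the sketch runs in seconds
X, y = load_digits(return_X_y=True)

pipe = Pipeline(steps=[('scaler', MinMaxScaler()),
                       ('knn', KNeighborsClassifier())])

# The 'scaler' step is itself a grid parameter; 'passthrough' means no scaling
param_grid = {
    'scaler': [StandardScaler(), MinMaxScaler(), RobustScaler(), 'passthrough'],
    'knn__n_neighbors': [5, 10],
    'knn__p': [1, 2],
}

grid_search = GridSearchCV(pipe, param_grid=param_grid, scoring='accuracy', cv=2)
grid_search.fit(X, y)

print(grid_search.best_params_)
print('mean CV accuracy of best candidate:', grid_search.best_score_)
print('accuracy on the training data itself:', grid_search.score(X, y))
```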


### Predictions on the Test Data

* Predicting the target values for `test_X_df` using the best model (the model refit with the optimal hyperparameters on the entire training data).


In [None]:
best_model = grid_search.best_estimator_
predicted_test_Y = best_model.predict(test_X_df)
predicted_test_Y

array([2, 0, 9, ..., 3, 9, 2])
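Because `refit=True`, `GridSearchCV` exposes the refit model as `best_estimator_`, and `grid_search.predict` delegates to that same model, so either call can be used. The Kaggle `test.csv` has no labels, so accuracy cannot be checked on it; the sketch below uses `load_digits` with a hypothetical labelled held-out split in place of `test_X_df` to show both points:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)
# Hypothetical labelled held-out split playing the role of test_X_df
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipe = Pipeline(steps=[('scaler', MinMaxScaler()),
                       ('knn', KNeighborsClassifier())])
param_grid = {'knn__n_neighbors': [5, 10], 'knn__p': [1, 2]}
grid_search = GridSearchCV(pipe, param_grid=param_grid, scoring='accuracy', refit=True, cv=2)
grid_search.fit(X_train, y_train)

best_model = grid_search.best_estimator_
predicted = best_model.predict(X_test)

# grid_search.predict delegates to best_estimator_, so the two agree
print((predicted == grid_search.predict(X_test)).all())
# This split has labels, so held-out accuracy can be measured
print('held-out accuracy:', (predicted == y_test).mean())
```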