**Project Aim** <br>
This notebooks is to apply Support Vector Machines to a Classification problem

Develop a model using Support Vector Machine which should correctly classify the handwritten digits from 0-9 based on the pixel values given as features. Thus, this is a 10-class classification problem.

**Project Assumptions**

**Project Objective**
- Basic EDA of the MNIST Dataset
    - How should pixal data be analysed?
    - Use new plots from Seaborn
    - Implement groupby functions in pandas to analyse different data aggregations
    - What columns are identical or have very similar distributions?
    - Outlier analysis and associated impact
- Implement one solution to reduce dimensionality
- Understand the framework of support vector machines 
    - Build a model that overfits a small proporion of the dataset and how to reach this conclusion
    - Understand methods to select features for classification problems
    - Include bias/variance trade off analysis
    - liblinear and libvsm libraries 
    - Cross Validation Strategy (LOO_CV)
    - Sci-Kit Learns Transformation Pipeline (page 66 - Hands on ML with Sci-kit Learn sklearn.pipeline import Pipeline)
- Application of multiple Kernal Types and model performance analysis
- Understand and apply an appropriate loss function 
    - Hinge Loss
- Feature Selection/Scaling
    - Add randomised features
    - Implement Scaling/Normalisation using Sci-kit learn Pipeline
- Hyperparameter Tuning (Implement and understand grid search plus evaluate where the model over/underfits)
    - Grid Search for parameters (page 72 - Hands on ML): Determine which parameters need to be optimised 
- Understand how explainability may be applied to SVM Classification problems
- Investigate if SVM model can be plotted 
    - Error evaluation: Where does the system commonly make mistakes and how can this be improved?
- Dockerise the solution 
- Document brief conclusions 
    - Think where bias might be incorporated in the model and how this can be treated --- Poor quality measures?
    - What is the scope that the model can predict and where could it have issues generalising e.g. new categories?
- Document and understand 3 key learningd from other kaggle solutions

**Support Vector Machine Notes**
- Hands-On Machine Learning with Scikit-Learn & Tensorflow
- Mastering Predictive Analytics with R
- Andrew Ng SVM <br>
https://www.youtube.com/watch?v=XfyR_49hfi8&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=74

**Parameters**
- Choice of parameter C
- Choice of kernel (similarity function)
    - Linear SVM uses no Kernel: Standard linear classifier (use when large number of features but small dataset)
    - Gaussian: Will require selection of parameter sigma^2 (use when small number of features but large training set)
        - **Perform feature scaling before implementing a gaussian kernel**

**Algorithm Performance on MNIST Dataset**
- Benchmark Model:
- Optimal Performance (Personal): 
- Optimal Performance (Kaggle): Error rates as low as 0.23% (99.77 Accuracy)

**References** <br>
https://www.researchgate.net/publication/230800948_The_Secrets_to_Managing_Business_Analytics_Projects <br>
https://towardsdatascience.com/an-intro-to-kernels-9ff6c6a6a8dc <br>
https://towardsdatascience.com/visualizing-statistical-plots-with-seaborn-6b6e60ce5e71 <br>
https://medium.com/analytics-vidhya/5-lesser-known-seaborn-plots-most-people-dont-know-82e5a54baea8 <br>
https://towardsdatascience.com/11-examples-to-master-pandas-groupby-function-86e0de574f38 <br>
https://machinelearningmastery.com/feature-selection-subspace-ensemble-in-python/ <br>
https://towardsdatascience.com/one-potential-cause-of-overfitting-that-i-never-noticed-before-a57904c8c89d 

In [1]:
import os
import pandas as pd

In [2]:
os.getcwd()

'/Users/Rej1992/Documents/GitHub/SupportVectorMachines/notebooks'

In [4]:
train_data = pd.read_csv('/Users/Rej1992/Desktop/SVM_Data/mnist_train.csv')
test_data = pd.read_csv('/Users/Rej1992/Desktop/SVM_Data/mnist_test.csv')

In [26]:
def _remove_columns_unique_values(data):
    
    nunique = data.apply(pd.Series.nunique)
    cols_to_drop = nunique[nunique == 1].index
    
    return data.drop(cols_to_drop, axis=1)

test_data = _remove_columns_unique_values(test_data)

In [30]:
def _determine_col_intersection(data1, data2):
    
    if(len(data1.columns) > len(data2.columns)):
        larger_df = data1
        smaller_df = data2
    else:
        larger_df = data2
        smaller_df = data1
        
    
    return larger_df[~larger_df.isin(smaller_df)]

train_data = _determine_col_intersection(test_data,train_data)

In [31]:
train_data

Unnamed: 0,label,1x13,1x14,1x15,1x16,2x5,2x6,2x7,2x8,2x9,...,28x15,28x16,28x17,28x18,28x19,28x20,28x21,28x22,28x23,28x24
0,5.0,0,0,0,0,0,,,,,...,,,,,,,,,0,0
1,0.0,0,0,0,0,0,,,,,...,,,,,,,,,0,0
2,4.0,0,0,0,0,0,,,,,...,,,,,,,,,0,0
3,1.0,0,0,0,0,0,,,,,...,,,,,,,,,0,0
4,9.0,0,0,0,0,0,,,,,...,,,,,,,,,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59995,8.0,0,0,0,0,0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
59996,3.0,0,0,0,0,0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
59997,5.0,0,0,0,0,0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
59998,6.0,0,0,0,0,0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
