# Introduction
In this kernel I tune the SVM hyperparameters with Grid Search to get better accuracy on this problem. An SVM separates classes by constructing hyperplanes between them. Each hyperplane is placed so that the training points nearest to it in each class (the support vectors) are as far from it as possible, which improves the chance of classifying new data correctly. The distance between the boundary and these nearest points is called the margin. The SVM objective is to maximize this margin, and the hyperparameters control how strictly that is enforced.
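
To make the margin idea concrete, here is a minimal sketch in plain Python. The boundary `w`, `b` and the four toy points are made-up values for illustration, not anything fitted in this kernel. It computes each point's distance to the hyperplane and reports the closest points, which play the role of support vectors.

```python
import math

# Hypothetical 2-D linear boundary w.x + b = 0 (illustrative values only).
w = (1.0, 1.0)
b = -3.0

# Toy points with labels: +1 on one side of the boundary, -1 on the other.
points = [((3.0, 2.0), +1), ((4.0, 4.0), +1), ((1.0, 0.0), -1), ((0.0, 0.0), -1)]

def distance_to_boundary(x, w, b):
    """Unsigned geometric distance from point x to the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

distances = [distance_to_boundary(x, w, b) for x, _ in points]
margin = min(distances)  # distance from the boundary to its nearest point(s)
closest = [x for (x, _), d in zip(points, distances) if math.isclose(d, margin)]

print("distances:", [round(d, 3) for d in distances])
print("margin:", round(margin, 3), "closest points:", closest)
```

An SVM would search over `w` and `b` to make this smallest distance as large as possible; here the boundary is fixed and only measured.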

## Importing Libraries
Used scikit-learn's SVM for classification

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
from sklearn import svm
# GridSearchCV lives in sklearn.model_selection; the old sklearn.grid_search module has been removed
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

color = sns.color_palette()

### Loading Train and Test data

In [None]:
df_train = pd.read_csv('../input/train.csv',nrows=5000,skiprows = range(1,2000))
df_train.head()

In [None]:
df_test = pd.read_csv('../input/test.csv')
df_test.head()

In [None]:
df_train.shape

In [None]:
df_test.shape

### Distribution of classes in Training Data
As per the bar chart, the classes in the loaded training sample are roughly balanced, so each class is adequately represented.

In [None]:
arr = df_train.label.value_counts()
plt.bar(arr.index,arr.values)
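
The balance claim can also be checked numerically rather than by eye. The sketch below uses a small hypothetical label list (not the competition data) and computes per-class shares and the ratio of the largest to the smallest class. A ratio close to 1 means balanced classes.

```python
from collections import Counter

# Hypothetical labels for illustration; in this kernel they would come from df_train.label.
labels = [0, 1, 2, 1, 0, 2, 2, 1, 0, 1]

counts = Counter(labels)
shares = {label: count / len(labels) for label, count in sorted(counts.items())}
imbalance_ratio = max(counts.values()) / min(counts.values())

print("share per class:", shares)
print("largest / smallest class:", round(imbalance_ratio, 3))
```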

In [None]:
y = df_train['label']
df_train = df_train.drop('label',axis=1)

In [None]:
df_train.shape

In [None]:
y.shape

## Using Grid Search for parameter tuning
#### About Important Parameters 
1.  The most important parameter is the kernel function, which controls the type of boundary constructed to separate the classes. There are four built-in functions ( rbf, poly, linear, sigmoid ), of which the most widely used are - 
          "rbf" - radial basis function, which implicitly maps the data to a higher dimension where a hyperplane can divide the classes.
          "linear" - creates a linear boundary to separate the classes.
          
2.  C is the cost of misclassification. In other words, it sets the trade-off between classifying the training examples correctly and keeping the decision boundary smooth. C also affects how many training examples end up as support vectors. A larger C penalizes margin violations more heavily, so the boundary bends to fit the training points: higher variance and a risk of overfitting. A smaller C tolerates more violations, giving a wider margin and higher bias.

3.  "gamma" controls how far the influence of a single training example reaches. If gamma is large, only nearby points carry weight, so the boundary curves tightly around individual examples, leading to high variance (overfitting). If gamma is small, far-away points also carry weight, so the boundary becomes smoother, leading to high bias (underfitting).

Note: Values of gamma and C are highly dependent on dataset provided.
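
The effect of gamma and C can be seen with a few lines of plain Python. This is only a sketch: the points, `w`, `b` and `toy_data` are made-up values, and the functions are the textbook RBF kernel and the textbook soft-margin objective rather than scikit-learn internals.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF similarity exp(-gamma * ||x - z||^2) between two points."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

anchor = (0.0, 0.0)
near, far = (0.5, 0.0), (3.0, 0.0)

# Large gamma: the far point's similarity collapses towards 0 (short reach).
# Small gamma: near and far points both stay similar to the anchor (long reach).
for gamma in (0.01, 0.1, 1.0, 5.0):
    print("gamma", gamma,
          "near", round(rbf_kernel(anchor, near, gamma), 4),
          "far", round(rbf_kernel(anchor, far, gamma), 4))

def soft_margin_objective(w, b, data, C):
    """0.5 * ||w||^2 + C * (sum of hinge losses), the soft-margin SVM objective."""
    hinge = sum(
        max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, y in data
    )
    return 0.5 * sum(wi * wi for wi in w) + C * hinge

# Hypothetical data: the third point sits on the wrong side of the boundary x1 = 0.
toy_data = [((2.0, 0.0), +1), ((-2.0, 0.0), -1), ((0.2, 0.0), -1)]
for C in (1, 10):
    print("C", C, "objective", soft_margin_objective((1.0, 0.0), 0.0, toy_data, C))
```

With the same boundary, the single misclassified point costs ten times more at `C=10` than at `C=1`, which is why a large C pushes the optimizer to bend the boundary around such points.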

In [None]:
params = [
  {'C': [1, 5, 7, 10], 'random_state': [ 42, 179 ], 'kernel': ['linear']},
  {'C': [1, 5, 7, 10], 'random_state':[ 42, 179], 'gamma': [5.0, 2.0, 1.0, 0.1], 'kernel': ['rbf']}
]   
#  {'C': [0.1, 0.5, 1, 5, 7, 10,], 'random_state':[ 42, 179], 'gamma': [1.0, 0.1, 0.01, 0.025, 0.001], 'degree':[2,3,4], 'kernel': ['poly']}
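
It helps to know how much work this grid implies before launching the search. The sketch below expands the same two dictionaries with `itertools.product` (scikit-learn's `ParameterGrid` does the equivalent) and counts the candidates. The fold count `n_folds = 3` is an assumption for illustration, since the default used by `GridSearchCV` depends on the installed version.

```python
from itertools import product

params = [
    {'C': [1, 5, 7, 10], 'random_state': [42, 179], 'kernel': ['linear']},
    {'C': [1, 5, 7, 10], 'random_state': [42, 179],
     'gamma': [5.0, 2.0, 1.0, 0.1], 'kernel': ['rbf']},
]

def expand(grid):
    """Return every combination of values in one grid dictionary."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = [combo for grid in params for combo in expand(grid)]

n_folds = 3  # assumed number of cross-validation folds
print("candidate settings:", len(candidates))
print("model fits with", n_folds, "folds:", len(candidates) * n_folds)
```

Listing two `random_state` values doubles the number of candidates, so dropping that key would halve the search time.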

### Declaring Grid Search Cross Validation

In [None]:
gs_svm = GridSearchCV(estimator=svm.SVC(), param_grid=params, n_jobs=4)

### Training over un-preprocessed Data

In [None]:
# X_train, X_valid, y_train, y_valid = train_test_split(df_train, y, test_size = 0.3, random_state = 42)
# print(X_train.shape)
# print(X_valid.shape)
# print(y_train.shape)
# print(y_valid.shape)
# gs_svm.fit(X_train, y_train) 
# print(gs_svm.best_params_) 
# print(gs_svm.best_score_)

### Pre-processing Data
Pixels with a positive value can be assigned 1 (binarization). That step is commented out below, so the copies keep the raw pixel values.

In [None]:
# #  Scaling the pixel values as SVM is not scale invariant
# new_df_train = df_train.copy()
# new_df_train[new_df_train>0] = 1
# new_df_test = df_test.copy()
# new_df_test[new_df_test>0] = 1
new_df_train = df_train.copy()
new_df_test = df_test.copy()

In [None]:
#X_train, X_valid, y_train, y_valid = train_test_split(new_df_train, y, test_size = 0.3, random_state = 42)
# print(X_train.shape)
# print(X_valid.shape)
# print(y_train.shape)
# print(y_valid.shape)
# gs_svm.fit(X_train, y_train) 
# print(gs_svm.best_params_) 
# print(gs_svm.best_score_)

In [None]:
# kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
# for train_idx,valid_idx in kfold.split(new_df_train,y):
#     X_train = new_df_train.iloc[train_idx]
#     y_train = y.iloc[train_idx]
#     X_valid = new_df_train.iloc[valid_idx]
#     y_valid = y.iloc[valid_idx]
#     gs_svm.fit(X_train, y_train)
#     print(gs_svm.score(X_valid,y_valid))
#     print(gs_svm.best_params_)

In [None]:
# df_test[df_test>0] = 1
# y_predict = gs_svm.predict(df_test)
# gs_svm.fit(X_train, y_train)  
# print(gs_svm.best_score_)
# print(gs_svm.best_params_)

### Pre-processing using PCA
PCA ( Principal Component Analysis ) is a method of constructing new features (principal components) from which the old features can be approximately recovered. It gives a good approximation of higher-dimensional data in a lower dimension. By dimension I mean features. Correlated features/dimensions are replaced with linear combinations of these features. PCA looks for the combinations that show as much variation as possible.
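
As a small illustration of the idea, the sketch below runs PCA by hand on five made-up 2-D points whose two features are strongly correlated. It uses the closed-form eigen-decomposition of a 2x2 covariance matrix, which is only a teaching shortcut and not how scikit-learn's `PCA` is implemented.

```python
import math

# Hypothetical points lying close to the line y = x (two correlated features).
points = [(2.0, 1.9), (0.0, 0.1), (-1.0, -1.2), (1.0, 0.8), (-2.0, -1.6)]
n = len(points)

# Center the data.
mean_x = sum(p[0] for p in points) / n
mean_y = sum(p[1] for p in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 sample covariance matrix.
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
trace, det = sxx + syy, sxx * syy - sxy * sxy
root = math.sqrt(trace * trace / 4.0 - det)
lam1, lam2 = trace / 2.0 + root, trace / 2.0 - root

# Unit eigenvector for the largest eigenvalue (this formula needs sxy != 0).
vx, vy = sxy, lam1 - sxx
length = math.hypot(vx, vy)
vx, vy = vx / length, vy / length

# Keep one component: project, then map back to the original two features.
scores = [x * vx + y * vy for x, y in centered]
reconstructed = [(mean_x + s * vx, mean_y + s * vy) for s in scores]
explained_ratio = lam1 / (lam1 + lam2)

print("first component direction:", (round(vx, 3), round(vy, 3)))
print("share of variance kept:", round(explained_ratio, 4))
print("reconstructed points:", [(round(x, 2), round(y, 2)) for x, y in reconstructed])
```

One component keeps almost all of the variance here because the two features are nearly redundant. The same reasoning is what lets this kernel replace the pixel columns with 40 components.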

In [None]:
pca_model = PCA(n_components=40)
new_df_train = pd.DataFrame(pca_model.fit_transform(new_df_train))
new_df_test = pd.DataFrame(pca_model.transform(new_df_test))

In [None]:
#X_train, X_valid, y_train, y_valid = train_test_split(new_df_train, y, test_size = 0.3, random_state = 42)
# print(X_train.shape)
# print(X_valid.shape)
# print(y_train.shape)
# print(y_valid.shape)
# gs_svm.fit(X_train, y_train)  
# print(gs_svm.best_score_)
# print(gs_svm.best_params_)

In [None]:
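# shuffle=True is required when random_state is set in recent scikit-learn versions
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
for train_idx, valid_idx in kfold.split(new_df_train, y):
    X_train = new_df_train.iloc[train_idx]
    y_train = y.iloc[train_idx]
    X_valid = new_df_train.iloc[valid_idx]
    y_valid = y.iloc[valid_idx]
    # Re-run the grid search on this fold's training part, then score on the held-out part
    gs_svm.fit(X_train, y_train)
    print(gs_svm.score(X_valid, y_valid))
    print(gs_svm.best_params_)
# Predictions come from the grid search fitted on the last fold
y_predict = gs_svm.predict(new_df_test)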

In [None]:
y_predict.shape

In [None]:
sample_csv = pd.read_csv('../input/sample_submission.csv')
sample_csv['Label'] = y_predict
sample_csv.to_csv("output.csv", index=False, header=True)