# Parameter estimation using grid search with cross validation

Here we use the Digits dataset, which is available in the sklearn library, to illustrate parameter estimation using grid search with cross-validation.

## Importing the Libraries

In [1]:
# Built-in datasets that ship with sklearn
from sklearn import datasets
# Function for splitting the data into train and test sets
from sklearn.model_selection import train_test_split
# Grid search with cross-validation
from sklearn.model_selection import GridSearchCV
# Per-class precision, recall and F1 report
from sklearn.metrics import classification_report
# Support vector classifier
from sklearn.svm import SVC

## Printing the Module Docstring in IPython

In [2]:
print(__doc__)

Automatically created module for IPython interactive environment


## Load the dataset

In [3]:
# Loading the digits dataset that is available
digits = datasets.load_digits()

## Converting the data into a matrix

In [4]:
# To apply a classifier to this data, we need to flatten each image, turning the data into a (samples, features) matrix:

# Store the number of images in a variable named n_samples
n_samples = len(digits.images)
# Reshape the images to (n_samples, -1) and store the result in X; store the targets in y
X = digits.images.reshape((n_samples, -1))
y = digits.target
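
The effect of the reshape can be checked directly. The snippet below is a small sketch that reloads the dataset on its own and prints the shapes before and after flattening; it also compares the result with `digits.data`, the already-flattened copy that the dataset object provides.

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each image is an 8x8 grid of pixel intensities.
print(digits.images.shape)   # (1797, 8, 8)

# -1 lets NumPy infer the second dimension: 8 * 8 = 64 features per sample.
X = digits.images.reshape((len(digits.images), -1))
print(X.shape)               # (1797, 64)

# The flattened matrix matches the ready-made digits.data array.
print((X == digits.data).all())
```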

## Modelling

In [5]:
# Split the dataset into two halves (train and test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

''' 1. Set the parameters to tune by cross-validation in a variable named tuned_parameters
    2. Create a variable named scores which stores 'precision' and 'recall'
    3. Use a for loop over scores (for score in scores) to do the parameter search
         i)   Create a model named clf with GridSearchCV, passing SVC(), tuned_parameters, scoring='%s_macro' % score as parameters
         ii)  Fit the model with the training data
         iii) Print the best parameters
         iv)  Print the grid scores on the development set:
              get the mean and std from the model and store them in the means and stds variables,
              then, in a nested for loop over zip(means, stds, clf.cv_results_['params']), print mean, std * 2 and params to three decimal places
         v)   Get the predicted values for X_test and store them in y_pred, and store y_test in y_true
         vi)  Print the classification report with y_true and y_pred
'''
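
The 50/50 split at the top of this cell can be inspected in isolation. The sketch below repeats the same `train_test_split` call on a fresh copy of the data (loaded here with `return_X_y=True` purely for brevity) and prints the resulting shapes.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Already-flattened features and targets for the digits dataset.
X, y = datasets.load_digits(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# 1797 samples cannot be halved exactly, so the two parts differ by one sample.
print(X_train.shape, X_test.shape)
print(len(X_train) + len(X_test))   # 1797
```

Fixing `random_state` makes the split reproducible, so the grid scores printed later refer to the same development set on every run.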

### Grid Search

Steps to do:
    1. Create a variable named tuned_parameters to set the parameters to tune by cross-validation
    2. Create a variable named scores which stores 'precision' and 'recall'
    3. Use a for loop: for score in scores
          i) Create a model named clf for grid search with SVC(), tuned_parameters, scoring='%s_macro' % score as parameters
          ii) Fit the model to the training data
          iii) Print the best parameters
          iv) Store the mean and std of the model's test scores in the means and stds variables
          v) Using a nested for loop, print the mean, std*2 and params of the model
          vi) Store the predicted values in y_pred and the y_test values in y_true
          vii) Print the classification report
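
To make the `scoring='%s_macro' % score` step concrete, here is a minimal sketch of what the two scoring strings expand to and what a macro average computes. The labels `y_true` and `y_pred` are made-up toy values, not taken from the digits data; macro averaging computes the metric per class and then takes the unweighted mean.

```python
from sklearn.metrics import precision_score, recall_score

# Toy labels (hypothetical, for illustration only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# The string formatting yields the scorer names passed to GridSearchCV.
for score in ['precision', 'recall']:
    print('%s_macro' % score)   # precision_macro, then recall_macro

# Per-class precision is 1, 1, 2/3, so the macro average is 8/9.
macro_precision = precision_score(y_true, y_pred, average='macro')
# Per-class recall is 1, 1/2, 1, so the macro average is 5/6.
macro_recall = recall_score(y_true, y_pred, average='macro')

print(round(macro_precision, 3), round(macro_recall, 3))   # 0.889 0.833
```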

In [6]:
# Set the parameters by cross-validation
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    clf = GridSearchCV(
        SVC(), tuned_parameters, scoring='%s_macro' % score
    )
    clf.fit(X_train, y_train)

    print("Best parameters set found on development set:")
    print()
    print(clf.best_params_)
    print()
    print("Grid scores on development set:")
    print()
    means = clf.cv_results_['mean_test_score']
    stds = clf.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r"
              % (mean, std * 2, params))
    print()

    print("Detailed classification report:")
    print()
    print("The model is trained on the full development set.")
    print("The scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = y_test, clf.predict(X_test)
    print(classification_report(y_true, y_pred))
    print()

# Note the problem is too easy: the hyperparameter plateau is too flat and the
# output model is the same for precision and recall with ties in quality.

# Tuning hyper-parameters for precision

Best parameters set found on development set:

{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}

Grid scores on development set:

0.983 (+/-0.015) for {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}
0.956 (+/-0.027) for {'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
0.985 (+/-0.014) for {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
0.981 (+/-0.020) for {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}
0.985 (+/-0.014) for {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
0.981 (+/-0.019) for {'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'}
0.985 (+/-0.014) for {'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'}
0.981 (+/-0.019) for {'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}
0.976 (+/-0.002) for {'C': 1, 'kernel': 'linear'}
0.976 (+/-0.002) for {'C': 10, 'kernel': 'linear'}
0.976 (+/-0.002) for {'C': 100, 'kernel': 'linear'}
0.976 (+/-0.002) for {'C': 1000, 'kernel': 'linear'}

Detailed classification report:

The model is trained on the full development set.
The scores are computed on the full evaluation set.

# Tuning hyper-parameters for recall

Best parameters set found on development set:

{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}

Grid scores on development set:

0.982 (+/-0.016) for {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}
0.953 (+/-0.027) for {'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
0.984 (+/-0.016) for {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
0.981 (+/-0.020) for {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}
0.984 (+/-0.016) for {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
0.980 (+/-0.019) for {'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'}
0.984 (+/-0.016) for {'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'}
0.980 (+/-0.019) for {'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}
0.974 (+/-0.005) for {'C': 1, 'kernel': 'linear'}
0.974 (+/-0.005) for {'C': 10, 'kernel': 'linear'}
0.974 (+/-0.005) for {'C': 100, 'kernel': 'linear'}
0.974 (+/-0.005) for {'C': 1000, 'kernel': 'linear'}

Detailed classification report:

The model is trained on the full development set.
The scores are computed on the full evaluation set.
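
Each "Grid scores" listing above has twelve rows, one per candidate parameter combination. As a rough sketch of where that number comes from, scikit-learn's `ParameterGrid` helper can enumerate the same `tuned_parameters` specification without fitting any model:

```python
from sklearn.model_selection import ParameterGrid

tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

# Expand the two sub-grids into individual parameter dictionaries.
candidates = list(ParameterGrid(tuned_parameters))

# 2 gamma values * 4 C values for rbf, plus 4 C values for linear.
print(len(candidates))   # 12
for params in candidates:
    print(params)
```

Because the grid is a list of two dictionaries, `gamma` is only varied for the `rbf` kernel; the linear kernel does not use it, so no combinations are wasted on it.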

             

#### This is a basic example of parameter estimation using grid search with cross-validation.