## Finding the best parameters of a Support Vector Machine on the breast cancer dataset (sklearn.datasets)

### Loading the ‘load_breast_cancer’ dataset from sklearn.datasets and naming it ‘cancer_data’

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC,SVR,LinearSVC
from sklearn import metrics
from sklearn.metrics import classification_report

In [2]:
cancer_data=load_breast_cancer()

### Exploring the data by looking at the shape of ‘cancer_data’

In [3]:
cancer_data.data.shape

(569, 30)
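
Beyond the shape, the class balance and label names are worth a quick look, since they determine which class counts as "positive" in the metrics later on. The following is a minimal sketch that reloads the dataset on its own; the variable names `class_names` and `class_counts` are illustrative and not part of the original notebook.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

cancer_data = load_breast_cancer()

# Number of samples (rows) and features (columns)
n_samples, n_features = cancer_data.data.shape

# Label names in label order: index 0 is the name of class 0, index 1 of class 1
class_names = list(cancer_data.target_names)

# How many samples carry each label
class_counts = Counter(cancer_data.target.tolist())

print(n_samples, n_features)
print(class_names)
print(class_counts)
```

In this dataset label 0 is malignant and label 1 is benign, so scikit-learn's default positive class (label 1) is the benign class.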

In [4]:
X = cancer_data.data

Y = cancer_data.target

### Creating a test and train dataset. The test set represents 40% of the total dataset.

In [5]:
# Split the data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.40, random_state=42)
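
To see what a 40% test split means in row counts, the sketch below splits a stand-in list of 569 indices with the same `test_size` and `random_state` as above. The names `indices`, `train_idx`, and `test_idx` are illustrative; only the sizes and the reproducibility of the split are checked here.

```python
from sklearn.model_selection import train_test_split

# Stand-in for the 569 rows of the dataset
indices = list(range(569))

# Same split settings as in the notebook cell above
train_idx, test_idx = train_test_split(indices, test_size=0.40, random_state=42)

# Repeating the call with the same random_state reproduces the same split
train_again, test_again = train_test_split(indices, test_size=0.40, random_state=42)

print(len(train_idx), len(test_idx))
print(test_idx == test_again)
```

The test set size is rounded up, which is why the classification report further down shows a total support of 228 rows.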

### Creating a support vector classifier with SVC() from sklearn.svm, passing the linear kernel as the kernel argument

In [6]:
cls = SVC(kernel = "linear")

In [7]:
cls.fit(X_train,Y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='linear', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)

In [9]:
# Get predictions
y_pred = cls.predict(X_test)

### Evaluating the model: accuracy, precision, recall, and the classification report

In [10]:
print("Accuracy: {:.2f}".format(metrics.accuracy_score(Y_test, y_pred)))
print("Precision: {:.2f}".format(metrics.precision_score(Y_test, y_pred)))
print("Recall: {:.2f}".format(metrics.recall_score(Y_test, y_pred)))

Accuracy: 0.96
Precision: 0.97
Recall: 0.97


In [11]:
print("The classification report:")
print(classification_report(Y_test,y_pred))

The classification report:
              precision    recall  f1-score   support

           0       0.95      0.94      0.94        80
           1       0.97      0.97      0.97       148

    accuracy                           0.96       228
   macro avg       0.96      0.96      0.96       228
weighted avg       0.96      0.96      0.96       228



### Definitions

ACCURACY: The ratio of correctly predicted observations to the total number of observations. The model classifies 96% of the test samples correctly.<br>
PRECISION: The ratio of correctly predicted positive observations to all observations predicted as positive. Here the positive class is label 1 (benign), so 97% of the samples the model predicted as positive were actually positive.<br>
RECALL: The ratio of correctly predicted positive observations to all observations that are actually positive. The model finds 97% of the actual positives.
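
The three definitions can be worked through by hand from confusion-matrix counts. The sketch below uses hypothetical counts (`tp`, `fn`, `fp`, `tn`) chosen only to be consistent with the rounded figures in the report above; the notebook does not show the actual confusion matrix, so treat the counts as an assumption.

```python
# Hypothetical counts for the positive class (label 1); an assumption, not notebook output
tp = 144  # actual positives predicted as positive
fn = 4    # actual positives predicted as negative
fp = 5    # actual negatives predicted as positive
tn = 75   # actual negatives predicted as negative

total = tp + fn + fp + tn

accuracy = (tp + tn) / total   # correct predictions over all predictions
precision = tp / (tp + fp)     # correct positives over predicted positives
recall = tp / (tp + fn)        # correct positives over actual positives

print("Accuracy: {:.2f}".format(accuracy))
print("Precision: {:.2f}".format(precision))
print("Recall: {:.2f}".format(recall))
```

Precision and recall share a numerator and differ only in the denominator: precision divides by everything predicted positive, recall by everything that is actually positive.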

### Finding the best parameters with kernel=’rbf’, C=[0.1, 1, 100], epsilon=[0.1, 0.5, 1], and gamma=[1, 3, 5]

In [12]:
param_grid={'C':[0.1,1,100],'epsilon':[0.1,0.5,1],'gamma':[1,3,5]}

In [13]:
# SVR is a regressor, so with scoring=None each candidate is scored by R^2 rather than accuracy
grid=GridSearchCV(SVR(kernel='rbf'),param_grid,refit=True,verbose=3)

In [14]:
# Predictions from the linear SVC fitted earlier (cls), not from the grid search
predict = cls.predict(X_test)

In [15]:
grid.fit(X_train,Y_train)

Fitting 3 folds for each of 27 candidates, totalling 81 fits
[CV] C=0.1, epsilon=0.1, gamma=1 .....................................
[CV] ........ C=0.1, epsilon=0.1, gamma=1, score=-0.193, total=   0.0s
[CV] C=0.1, epsilon=0.1, gamma=1 .....................................
[CV] ........ C=0.1, epsilon=0.1, gamma=1, score=-0.193, total=   0.0s
[CV] C=0.1, epsilon=0.1, gamma=1 .....................................
[CV] ........ C=0.1, epsilon=0.1, gamma=1, score=-0.252, total=   0.0s
[CV] C=0.1, epsilon=0.1, gamma=3 .....................................
[CV] ........ C=0.1, epsilon=0.1, gamma=3, score=-0.193, total=   0.0s

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s



[CV] C=0.1, epsilon=0.1, gamma=3 .....................................
[CV] ........ C=0.1, epsilon=0.1, gamma=3, score=-0.193, total=   0.0s
[CV] C=0.1, epsilon=0.1, gamma=3 .....................................
[CV] ........ C=0.1, epsilon=0.1, gamma=3, score=-0.252, total=   0.0s
[CV] C=0.1, epsilon=0.1, gamma=5 .....................................
[CV] ........ C=0.1, epsilon=0.1, gamma=5, score=-0.193, total=   0.0s
[CV] C=0.1, epsilon=0.1, gamma=5 .....................................
[CV] ........ C=0.1, epsilon=0.1, gamma=5, score=-0.193, total=   0.0s
[CV] C=0.1, epsilon=0.1, gamma=5 .....................................
[CV] ........ C=0.1, epsilon=0.1, gamma=5, score=-0.252, total=   0.0s
[CV] C=0.1, epsilon=0.5, gamma=1 .....................................
[CV] ........ C=0.1, epsilon=0.5, gamma=1, score=-0.064, total=   0.0s
[CV] C=0.1, epsilon=0.5, gamma=1 .....................................
[CV] ........ C=0.1, epsilon=0.5, gamma=1, score=-0.064, total=   0.0s
[... output truncated ...]

[CV] ........ C=100, epsilon=0.5, gamma=3, score=-0.064, total=   0.0s
[CV] C=100, epsilon=0.5, gamma=3 .....................................
[CV] ........ C=100, epsilon=0.5, gamma=3, score=-0.064, total=   0.0s
[CV] C=100, epsilon=0.5, gamma=3 .....................................
[CV] ........ C=100, epsilon=0.5, gamma=3, score=-0.036, total=   0.0s
[CV] C=100, epsilon=0.5, gamma=5 .....................................
[CV] ........ C=100, epsilon=0.5, gamma=5, score=-0.064, total=   0.0s
[CV] C=100, epsilon=0.5, gamma=5 .....................................
[CV] ........ C=100, epsilon=0.5, gamma=5, score=-0.064, total=   0.0s
[CV] C=100, epsilon=0.5, gamma=5 .....................................
[CV] ........ C=100, epsilon=0.5, gamma=5, score=-0.036, total=   0.0s
[CV] C=100, epsilon=1, gamma=1 .......................................
[CV] .......... C=100, epsilon=1, gamma=1, score=-0.064, total=   0.0s
[CV] C=100, epsilon=1, gamma=1 .......................................
[... output truncated ...]

[Parallel(n_jobs=1)]: Done  81 out of  81 | elapsed:    0.6s finished


GridSearchCV(cv='warn', error_score='raise-deprecating',
             estimator=SVR(C=1.0, cache_size=200, coef0=0.0, degree=3,
                           epsilon=0.1, gamma='auto_deprecated', kernel='rbf',
                           max_iter=-1, shrinking=True, tol=0.001,
                           verbose=False),
             iid='warn', n_jobs=None,
             param_grid={'C': [0.1, 1, 100], 'epsilon': [0.1, 0.5, 1],
                         'gamma': [1, 3, 5]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=3)

In [16]:
#The best parameters
print("The best parameters:"+format(grid.best_params_))

The best parameters:{'C': 1, 'epsilon': 0.1, 'gamma': 1}
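
The search above uses `SVR`, whose default score is R^2, which is why the cross-validation scores are negative. Because the target here is a binary label, a classifier-based search may be a more natural fit. The following is a sketch of that alternative, not the notebook's method: it swaps in `SVC`, adds a `StandardScaler` step, scores by accuracy, drops `epsilon` (which `SVC` does not accept), and uses a smaller, assumed range of `gamma` values suited to scaled features.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, Y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.40, random_state=42
)

# Scaling is an added assumption: RBF kernels are sensitive to feature magnitudes
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Parameter names are prefixed with the pipeline step name ("svc")
clf_param_grid = {
    "svc__C": [0.1, 1, 100],
    "svc__gamma": [0.001, 0.01, 0.1],  # assumed values, not the notebook's grid
}

clf_grid = GridSearchCV(pipeline, clf_param_grid, cv=3, scoring="accuracy")
clf_grid.fit(X_train, Y_train)

test_accuracy = clf_grid.score(X_test, Y_test)
print("Best parameters:", clf_grid.best_params_)
print("Test accuracy: {:.2f}".format(test_accuracy))
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so the validation folds do not leak into the scaling statistics.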


### What do C, epsilon, and gamma represent?

1. 'C': The regularization parameter 'C' controls how heavily the SVM penalizes training errors. If 'C' is large, the SVM approaches a hard margin: it tolerates few misclassifications, so the margin can become fairly small. If 'C' is too large, the model risks overfitting because it relies too heavily on the training data, including the outliers. If 'C' is small, the SVM has a soft margin: some points may fall on the wrong side of the boundary, but the margin is large. This is more resistant to outliers, but if 'C' gets too small, the model risks underfitting.<br>
2. 'Gamma': Intuitively, the gamma parameter of the RBF kernel defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. Gamma can be seen as the inverse of the radius of influence of the samples selected by the model as support vectors.<br>
3. 'Epsilon': This parameter applies to support vector regression (SVR). The value of ϵ defines a margin of tolerance around the prediction within which errors receive no penalty. The larger ϵ is, the larger the errors the solution admits without penalty.
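
The effect of gamma and epsilon can be illustrated with the two formulas behind them: the RBF kernel value exp(-gamma * d^2) for two points at distance d, and the epsilon-insensitive loss max(0, |y - prediction| - epsilon). The helper names `rbf_kernel_value` and `epsilon_insensitive_loss` below are illustrative and are not scikit-learn functions.

```python
import math


def rbf_kernel_value(distance, gamma):
    """RBF kernel value for two points separated by `distance`."""
    return math.exp(-gamma * distance ** 2)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Higher gamma: the similarity between two points at distance 1 drops faster,
# so each training example influences a smaller neighborhood
for gamma in (1, 3, 5):
    print("gamma =", gamma, "kernel =", round(rbf_kernel_value(1.0, gamma), 4))

# Larger epsilon: the same prediction error is penalized less, or not at all
for epsilon in (0.1, 0.5, 1):
    print("epsilon =", epsilon, "loss =", round(epsilon_insensitive_loss(1.0, 0.4, epsilon), 2))
```

With a binary 0/1 target, an epsilon of 1 makes any prediction between 0 and 1 penalty-free for both classes, so large epsilon values carry little information in this setting.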