# Dobrenz - Logistic Regression, SVM and Kernel SVM Parameter Notebook

## Import Libraries and Data

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

In [5]:
cancer = load_breast_cancer()

In [6]:
cancer.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [7]:
df_feat = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])

In [8]:
df_feat.head()

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


## Standardize the Data

In [10]:
scaler = StandardScaler()

In [11]:
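# Note: the scaler is fit on the full dataset, before the train/test split below,
# so test-set statistics influence the scaling.
X_train_scaled = scaler.fit_transform(df_feat)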

In [12]:
X = X_train_scaled
y = cancer['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
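One way to keep the scaling statistics restricted to the training rows is to split first and let a `Pipeline` handle the scaler. The following is a minimal sketch under the same dataset and split settings; the names `pipe`, `X_tr`, `X_te`, and the choice of `max_iter=1000` are illustrative assumptions and are not part of the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_raw, y_all = load_breast_cancer(return_X_y=True)

# Split the raw features first so the test rows never influence the scaler.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_all, test_size=0.33, random_state=42
)

# fit() learns the mean and std from X_tr only; score() reuses them on X_te.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

test_accuracy = pipe.score(X_te, y_te)
print(f"test accuracy: {test_accuracy:.3f}")
```

Because the scaler lives inside the pipeline, the same object can also be passed to `cross_val_score` or `GridSearchCV`, which then refit the scaler on each training fold.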

## Logistic Regression

In [14]:
logmodel = LogisticRegression(
    penalty='l2',
    dual=False,
    tol=0.0001,
    C=1.0,
    fit_intercept=True,
    intercept_scaling=1,
    class_weight=None,
    random_state=None,
    solver='lbfgs',
    max_iter=100,
    multi_class='auto',
    verbose=0,
    warm_start=False,
    n_jobs=None,
    l1_ratio=None,
)

### Log Regression Parameters

* `penalty` selects the regularization term: L1, L2, or ElasticNet. The `lbfgs` solver used here supports only L2 or no penalty.
* `dual` chooses between the primal and the dual formulation. The primal formulation is the standard form of the problem, which optimizes the regularized logistic loss directly with respect to the model coefficients. The dual formulation works with the Lagrange multipliers of the problem rather than the weights, which can be faster when the number of features is greater than the number of samples. In scikit-learn the dual formulation is only implemented for the L2 penalty with the `liblinear` solver, so `dual=False` is the usual choice.
* `tol` is the tolerance for the stopping criterion.
* `C` is the inverse of the regularization strength and controls the bias-variance trade-off. A small C means strong regularization: the coefficients are shrunk toward zero, giving a simpler classifier that is potentially more biased but may have lower variance. A large C means weak regularization: the classifier fits the training data more closely, which may lower bias but raise variance.
* `fit_intercept` specifies whether an intercept (bias term) should be added to the decision function.
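The effect of `C` on regularization can be seen in the size of the fitted coefficients. This is a small sketch, assuming the breast cancer data standardized on all rows (no evaluation is done here, so no split is needed); the values of `C` and `max_iter=5000` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_raw, y_all = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X_raw)

# Fit the same model at three regularization strengths and record the
# Euclidean norm of the coefficient vector for each.
coef_norms = {}
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=5000).fit(X_std, y_all)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))
    print(f"C={C:<6} coefficient norm = {coef_norms[C]:.2f}")
```

The norm should grow with `C`, since a larger `C` penalizes large coefficients less.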

In [17]:
logmodel.fit(X_train, y_train);

In [18]:
predictions = logmodel.predict(X_test)

In [19]:
from sklearn.metrics import classification_report, confusion_matrix

In [20]:
print(confusion_matrix(y_test, predictions))

[[ 66   1]
 [  3 118]]


In [21]:
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.96      0.99      0.97        67
           1       0.99      0.98      0.98       121

    accuracy                           0.98       188
   macro avg       0.97      0.98      0.98       188
weighted avg       0.98      0.98      0.98       188



* Precision: This metric indicates the accuracy of positive predictions. It's defined as the ratio of true positives to the total number of instances predicted as positive (true positives plus false positives).
* Recall (or Sensitivity or True Positive Rate): This metric measures the ability of the model to find all true positives. It's defined as the ratio of true positives to the actual positives (true positives plus false negatives). Recall is critical when the cost of missing a true positive is high.
* F1-Score: This is the harmonic mean of precision and recall, combining both into a single measure. It is particularly useful when classes are imbalanced, where accuracy alone can be misleading.
* Support: This number represents the actual number of instances in each class in the dataset. It helps identify the distribution of classes within the dataset.
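The definitions above can be checked by hand against the logistic regression confusion matrix printed earlier. The helper `per_class_metrics` below is a hypothetical function written for this illustration; it treats class `k` as the positive class, which is how the report computes each row.

```python
# Confusion matrix from the logistic regression cell:
# rows are true classes, columns are predicted classes.
cm = [[66, 1],
      [3, 118]]

def per_class_metrics(cm, k):
    """Return (precision, recall, f1, support) with class k as positive."""
    tp = cm[k][k]
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually another class
    fn = sum(cm[k]) - tp                  # actually k, predicted another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    support = sum(cm[k])                  # number of true instances of class k
    return precision, recall, f1, support

metrics = {k: per_class_metrics(cm, k) for k in range(len(cm))}
for k, (p, r, f1, n) in metrics.items():
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={n}")
```

Rounded to two decimals, these values match the rows of the classification report above.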

## Support Vector Machines

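A linear SVM is used when the decision boundaries are linear. Note that `SVC()` defaults to `kernel='rbf'`, so the default model fitted below is already a kernel SVM; pass `kernel='linear'` for a linear boundary.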

In [24]:
from sklearn.svm import SVC

In [25]:
model = SVC()

In [26]:
model.fit(X_train, y_train);

In [27]:
predictions = model.predict(X_test)

## SVM Confusion Matrix with Default Params

In [29]:
print(confusion_matrix(y_test, predictions))

[[ 65   2]
 [  4 117]]


In [30]:
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.94      0.97      0.96        67
           1       0.98      0.97      0.97       121

    accuracy                           0.97       188
   macro avg       0.96      0.97      0.97       188
weighted avg       0.97      0.97      0.97       188



## Grid Search SVM

In [32]:
from sklearn.model_selection import GridSearchCV

In [33]:
param_grid = {'C':[0.1,1,10,100,1000], 'gamma':[1,0.1,0.01,0.001,0.0001]}

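Gamma is the free parameter of the radial basis function (RBF) kernel. A small gamma corresponds to a Gaussian with a large variance, so each training point influences a wide region and the decision boundary is smooth (higher bias, lower variance). A large gamma corresponds to a narrow Gaussian, so the model follows the training points closely (lower bias, higher variance).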
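The role of gamma is easiest to see in the kernel itself, K(a, b) = exp(-gamma * ||a - b||^2). The sketch below evaluates it for one pair of points at several gamma values and compares against `sklearn.metrics.pairwise.rbf_kernel`; the two points and the gamma values are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 1.0]])  # squared Euclidean distance from a is 2

similarities = {}
for gamma in (0.01, 0.1, 1.0, 10.0):
    manual = float(np.exp(-gamma * np.sum((a - b) ** 2)))
    library = float(rbf_kernel(a, b, gamma=gamma)[0, 0])
    similarities[gamma] = (manual, library)
    print(f"gamma={gamma:<5} manual={manual:.6f} sklearn={library:.6f}")
```

At a fixed distance the similarity falls as gamma grows, which is why a large gamma makes each training point's influence more local.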

In [35]:
grid = GridSearchCV(SVC(), param_grid, verbose=3)

The `verbose` parameter determines how much information the search prints while it is running; at `verbose=3` it reports the score and fit time of every fold. This can be useful for debugging or for following how the search is progressing.

In [37]:
grid.fit(X_train, y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END ....................C=0.1, gamma=1;, score=0.623 total time=   0.0s
[CV 2/5] END ....................C=0.1, gamma=1;, score=0.618 total time=   0.0s
[CV 3/5] END ....................C=0.1, gamma=1;, score=0.618 total time=   0.0s
[CV 4/5] END ....................C=0.1, gamma=1;, score=0.618 total time=   0.0s
[CV 5/5] END ....................C=0.1, gamma=1;, score=0.618 total time=   0.0s
[CV 1/5] END ..................C=0.1, gamma=0.1;, score=0.974 total time=   0.0s
[CV 2/5] END ..................C=0.1, gamma=0.1;, score=0.961 total time=   0.0s
[CV 3/5] END ..................C=0.1, gamma=0.1;, score=0.882 total time=   0.0s
[CV 4/5] END ..................C=0.1, gamma=0.1;, score=0.934 total time=   0.0s
[CV 5/5] END ..................C=0.1, gamma=0.1;, score=0.908 total time=   0.0s
[CV 1/5] END .................C=0.1, gamma=0.01;, score=0.935 total time=   0.0s
[CV 2/5] END .................C=0.1, gamma=0.01

In [38]:
grid.best_params_

{'C': 1000, 'gamma': 0.0001}

The best C value for this model is 1000 and the best gamma value is 0.0001.
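Beyond `best_params_`, the fitted search object keeps the cross-validation scores of every candidate in `cv_results_`, which helps show how close the runners-up were. The following is a sketch that assumes the same preprocessing order and grid as the notebook; the names `search`, `results`, and `top` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_raw, y_all = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X_raw)  # same order as the notebook
X_tr, X_te, y_tr, y_te = train_test_split(
    X_std, y_all, test_size=0.33, random_state=42
)

param_grid = {"C": [0.1, 1, 10, 100, 1000],
              "gamma": [1, 0.1, 0.01, 0.001, 0.0001]}
search = GridSearchCV(SVC(), param_grid)  # 5-fold CV by default
search.fit(X_tr, y_tr)

# One row per (C, gamma) candidate, sorted by mean cross-validated accuracy.
results = pd.DataFrame(search.cv_results_)
columns = ["param_C", "param_gamma", "mean_test_score", "std_test_score"]
top = results.sort_values("rank_test_score")[columns].head()
print(top)
print("best mean CV accuracy:", round(search.best_score_, 3))
```

When several candidates score within one standard deviation of each other, the exact winner can vary with the library version and the fold assignment.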

## SVM Confusion Matrix with Best Params

In [41]:
grid_predictions = grid.predict(X_test)

In [42]:
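# Note: this passes `predictions` (from the default SVC above), not `grid_predictions`,
# so the output repeats the default-parameter results rather than the tuned model's.
print(confusion_matrix(y_test, predictions))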

[[ 65   2]
 [  4 117]]


In [43]:
# Note: `predictions` is used again here; pass `grid_predictions` to report on the tuned model.
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.94      0.97      0.96        67
           1       0.98      0.97      0.97       121

    accuracy                           0.97       188
   macro avg       0.96      0.97      0.97       188
weighted avg       0.97      0.97      0.97       188
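To evaluate the tuned model itself, the predictions returned by the fitted search object are the ones to pass to the metric functions. The sketch below assumes the same data preparation and grid as the notebook and prints both confusion matrices side by side; `default_predictions` and `tuned_predictions` are illustrative names, and no particular output is claimed here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_raw, y_all = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X_raw)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_std, y_all, test_size=0.33, random_state=42
)

# Baseline: SVC with default parameters.
default_predictions = SVC().fit(X_tr, y_tr).predict(X_te)

# Tuned: with refit=True (the default), predict() uses best_estimator_,
# which has been refit on all of X_tr with the best parameters.
param_grid = {"C": [0.1, 1, 10, 100, 1000],
              "gamma": [1, 0.1, 0.01, 0.001, 0.0001]}
search = GridSearchCV(SVC(), param_grid).fit(X_tr, y_tr)
tuned_predictions = search.predict(X_te)

print("default SVC:\n", confusion_matrix(y_te, default_predictions))
print("tuned SVC:\n", confusion_matrix(y_te, tuned_predictions))
```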



## Kernel SVM

Used when decision boundaries are nonlinear.
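A synthetic dataset makes the difference between a linear and a kernel SVM visible. This sketch uses `make_circles`, which is not part of the notebook, to build two concentric rings that no straight line can separate; the sample size, noise level, and seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: the inner ring is one class, the outer ring the other.
X_c, y_c = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_c, y_c, test_size=0.3, random_state=0
)

scores = {}
for kernel in ("linear", "rbf"):
    scores[kernel] = SVC(kernel=kernel).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{kernel:>6} kernel test accuracy: {scores[kernel]:.2f}")
```

The RBF kernel should separate the rings almost perfectly, while the linear kernel stays near chance level.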

In [46]:
iris = datasets.load_iris()
X = iris.data
y = iris.target

In [47]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


In [48]:
# Note: X_scaled is not used below; the models are fit on the unscaled X_train.
X_scaled = scaler.fit_transform(X_train)

In [49]:
k_svm = SVC(kernel='rbf', C=1.0, gamma='scale')

In [50]:
k_svm.fit(X_train, y_train);

In [51]:
predictions = k_svm.predict(X_test)

In [52]:
print(confusion_matrix(y_test, predictions))

[[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]


In [53]:
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



This shows that the model classified all 45 test samples correctly.

Change the C value to 0.1 to show the change in model performance.

In [56]:
k_svm = SVC(kernel='rbf', C=0.1, gamma='scale')
k_svm.fit(X_train, y_train);
predictions = k_svm.predict(X_test)

In [57]:
print(confusion_matrix(y_test, predictions))

[[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]


In [58]:
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45



Change C to a much lower value (0.01) and see the effect it has on the model.

In [60]:
k_svm = SVC(kernel='rbf', C=0.01, gamma='scale')
k_svm.fit(X_train, y_train);
predictions = k_svm.predict(X_test)

In [61]:
print(confusion_matrix(y_test, predictions))

[[ 0 19  0]
 [ 0 12  1]
 [ 0  0 13]]


The model failed to identify the first class: all 19 test samples of class 0 were predicted as class 1, a complete misclassification of that class. Reducing C strengthens the regularization, which widens the margin and tolerates more training errors. At C=0.01 the model is too simple to separate the classes, which is underfitting.
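The trend across C values can be summarized in one loop. This is a sketch assuming the same iris split as above; the `summary` dictionary and the extra value C=100 are illustrative additions. The number of support vectors is included because a heavily regularized SVM tends to keep most training points as support vectors.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_i, y_i = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_i, y_i, test_size=0.3, random_state=42
)

summary = {}
for C in (0.01, 0.1, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C, gamma="scale").fit(X_tr, y_tr)
    summary[C] = {
        "n_support_vectors": int(clf.n_support_.sum()),
        "train_accuracy": clf.score(X_tr, y_tr),
        "test_accuracy": clf.score(X_te, y_te),
    }
    print(f"C={C:<6} {summary[C]}")
```

Low training accuracy together with low test accuracy at small C is the usual signature of underfitting.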

The gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. It can be seen as the inverse of the radius of influence of the samples selected by the model as support vectors.
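The cells above pass `gamma='scale'`, which scikit-learn documents as 1 / (n_features * X.var()), computed on the training array. The sketch below, assuming the same iris split, computes that value by hand and checks that an `SVC` given the number directly produces the same decision values; the names `gamma_scale`, `auto`, and `manual` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_i, y_i = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_i, y_i, test_size=0.3, random_state=42
)

# Documented definition of gamma='scale': variance taken over the whole
# training array, multiplied by the number of features, then inverted.
gamma_scale = 1.0 / (X_tr.shape[1] * X_tr.var())

auto = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
manual = SVC(kernel="rbf", C=1.0, gamma=gamma_scale).fit(X_tr, y_tr)

same = bool(np.allclose(auto.decision_function(X_te),
                        manual.decision_function(X_te)))
print(f"gamma='scale' resolves to {gamma_scale:.4f}; same decision values: {same}")
```

Because this value depends on the variance of the inputs, standardizing the features before fitting changes the gamma that `'scale'` resolves to.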