# Linear Models for Classification

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import mglearn
from sklearn.model_selection import train_test_split

We will consider two linear classifiers: logistic regression (`LogisticRegression`) and the linear support vector classifier (`LinearSVC`). Both apply L2 regularization by default, and the hyperparameter that controls the strength of regularization is C.
Low values of C mean stronger regularization: the coefficients are pushed toward zero, which reduces overfitting, although a C that is too low can underfit.
High values of C mean weaker regularization: the model tries to classify every training point correctly, including outliers and noisy points, which can lead to overfitting.
The default is C=1.
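
As a rough illustration of what "stronger regularization" means in practice, the sketch below fits logistic regression for three values of C and prints the L2 norm of the learned coefficient vector, which should shrink as C decreases. Standardizing the features with `StandardScaler` is an assumption made here only so the solver converges quickly; it is not part of the cells in this notebook, and the variable name `coef_norms` is just illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: features are standardized so the solver converges quickly.
# The rest of this notebook works on the raw, unscaled features.
X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

coef_norms = {}
for C in [0.01, 1, 100]:
    model = LogisticRegression(C=C, max_iter=10000).fit(X_scaled, y)
    # L2 norm of the weight vector; coef_ does not include the intercept
    coef_norms[C] = np.linalg.norm(model.coef_)
    print("C={:<6} coefficient norm = {:.3f}".format(C, coef_norms[C]))
```

A smaller norm at low C is the direct effect of the L2 penalty; whether that helps or hurts test accuracy depends on the dataset.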


In [2]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

## Logistic Regression


###### We will use the built-in breast cancer dataset. First, we try the default C = 1.

In [6]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state=42)
logreg = LogisticRegression(max_iter=100000).fit(X_train, y_train)

print("Training set score: {:.3f}".format(logreg.score(X_train, y_train)))
print("Test set score: {:.3f}".format(logreg.score(X_test, y_test)))

Training set score: 0.958
Test set score: 0.958


###### Now, C = 100. We expect weaker regularization due to the higher C, and thus a closer fit to the training data: better performance on the training set, with a higher risk of overfitting.

In [8]:
logreg100 = LogisticRegression(C=100, max_iter=100000).fit(X_train, y_train)

print("Training set score: {:.3f}".format(logreg100.score(X_train, y_train)))
print("Test set score: {:.3f}".format(logreg100.score(X_test, y_test)))

Training set score: 0.984
Test set score: 0.965


###### Now, C = 0.01. We expect stronger regularization due to the lower C, and thus less overfitting and slightly worse performance on the training set.

In [9]:
logreg001 = LogisticRegression(C=0.01, max_iter=100000).fit(X_train, y_train)

print("Training set score: {:.3f}".format(logreg001.score(X_train, y_train)))
print("Test set score: {:.3f}".format(logreg001.score(X_test, y_test)))

Training set score: 0.953
Test set score: 0.951
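
To compare the three settings side by side, the same experiment can be written as a single loop that collects the scores in a table. This is a minimal sketch that reuses the split and the `LogisticRegression` arguments from the cells above; the names `rows` and `results` are illustrative, and the exact scores may vary slightly across scikit-learn versions.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

rows = []
for C in [0.01, 1, 100]:
    # Same settings as the individual cells above, varying only C
    model = LogisticRegression(C=C, max_iter=100000).fit(X_train, y_train)
    rows.append({
        "C": C,
        "train_score": model.score(X_train, y_train),
        "test_score": model.score(X_test, y_test),
    })

# One row per value of C, with training and test accuracy
results = pd.DataFrame(rows)
print(results.round(3))
```

Reading the table row by row shows how training and test accuracy move together or apart as C changes, which is a quick check for over- or underfitting.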
