In [None]:
import sklearn.datasets
newsgroups = sklearn.datasets.fetch_20newsgroups_vectorized()
X, y = newsgroups.data, newsgroups.target
print(X.shape)
print(y.shape)

In [None]:
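from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
y_pred = knn.predict(X)
# Accuracy on the training data itself: with n_neighbors=1 each training point is
# its own nearest neighbour, so this score is (near) perfect and says little about new data
knn.score(X, y)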

In [None]:
from sklearn.model_selection import train_test_split
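# Hold out a test set (25% of the data by default) to measure accuracy on unseen examples
X_train, X_test, y_train, y_test = train_test_split(X, y)
knn.fit(X_train, y_train)
knn.score(X_test, y_test)  # test accuracy: a more honest estimate than training accuracy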


**APPLYING LOGISTIC REGRESSION AND SVM**

In [None]:
# APPLYING LOGISTIC REGRESSION

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)  # predicted labels for the test set
lr.score(X_test, y_test)

In [None]:
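# Example using the Wine dataset
import sklearn.datasets
from sklearn.linear_model import LogisticRegression
wine = sklearn.datasets.load_wine()
# max_iter raised from the default (100) to give the lbfgs solver room to converge on unscaled features
lr = LogisticRegression(max_iter=10000)
lr.fit(wine.data, wine.target)
print(lr.score(wine.data, wine.target))  # training accuracy

# Predicted class probabilities (confidence scores) for the first example
print(lr.predict_proba(wine.data[:1]))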

In [None]:
# BASIC SVM CLASSIFIER

import sklearn.datasets
from sklearn.svm import LinearSVC, SVC
wine = sklearn.datasets.load_wine()

# Linear classifier with default hyperparameters
svm = LinearSVC()
svm.fit(wine.data, wine.target)
print(svm.score(wine.data, wine.target))  # training accuracy

# Non-linear classifier (RBF kernel by default) with default hyperparameters
svm = SVC()
svm.fit(wine.data, wine.target)
print(svm.score(wine.data, wine.target))  # training accuracy

A hyperparameter is a choice about the model that the user makes before fitting it (for example C in LogisticRegression or n_neighbors in KNeighborsClassifier).
Many hyperparameters control the complexity of the model.

**UNDERFITTING:**
If the model is too simple, it may be unable to capture the patterns in the data, leading to low training accuracy.

**OVERFITTING:**
If the model is too complex, it may learn the peculiarities of the training set, leading to lower test accuracy.
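One hyperparameter can show both effects. The sketch below varies n_neighbors for KNeighborsClassifier and compares training and test accuracy; the Wine dataset, the fixed split (random_state=0) and the list of k values are illustrative assumptions, not part of the original notebook.

```python
import sklearn.datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Wine data and a fixed split (illustrative choices)
X, y = sklearn.datasets.load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small k gives a very flexible (complex) model; large k averages over
# many neighbours, giving a much simpler model
train_acc, test_acc = {}, {}
for k in [1, 5, 25, 75]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc[k] = knn.score(X_train, y_train)
    test_acc[k] = knn.score(X_test, y_test)
    print(f"k={k:2d}  train={train_acc[k]:.3f}  test={test_acc[k]:.3f}")
```

Typically k=1 scores (near) perfectly on the training set but noticeably lower on the test set (overfitting), while a very large k tends to lower both scores (underfitting).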


LINEAR DECISION BOUNDARY

- A decision boundary tells us what class our classifier will predict for any value of X
- The decision boundary is the dividing line (or surface) between the regions of feature space assigned to different classes
- Linear decision boundary: the boundary is a straight line (a hyperplane in higher dimensions)
- Non-linear decision boundary: the boundary is not a straight line, e.g. it is a curve
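For a two-feature linear classifier the boundary can be written down explicitly: it is the set of points where w1*x1 + w2*x2 + b = 0. This sketch fits LogisticRegression on synthetic make_blobs data (an assumption for illustration) and checks that the raw model output is zero along that line.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two well-separated 2-D clusters (synthetic data, chosen for illustration)
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]], random_state=0)
clf = LogisticRegression().fit(X, y)

# The boundary is where w1*x1 + w2*x2 + b = 0, i.e. x2 = -(w1*x1 + b) / w2
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]
x1 = np.linspace(-6, 6, 5)
x2 = -(w1 * x1 + b) / w2
boundary = np.column_stack([x1, x2])

# The raw model output is (numerically) zero everywhere on that straight line
print(clf.decision_function(boundary))
# Points on either side of the line get different classes
print(clf.predict([[-3, -3], [3, 3]]))
```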

Vocabulary:

- Classification: supervised learning when the y values (target values) are categories
- Regression: supervised learning when the y values (target values) are continuous values
- Linearly separable: a dataset that can be perfectly classified by a linear decision boundary
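A quick way to see linear separability is to compare training accuracy on two toy datasets: far-apart clusters (separable) and the XOR pattern (not separable). Both datasets are synthetic assumptions for illustration; for XOR, no straight line can classify more than 3 of the 4 points correctly.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Far-apart clusters: linearly separable (synthetic, illustrative)
X_sep, y_sep = make_blobs(n_samples=100, centers=[[-5, -5], [5, 5]], random_state=0)

# XOR pattern: points on the same diagonal share a class
X_xor = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y_xor = np.array([0, 0, 1, 1])

acc_sep = LogisticRegression().fit(X_sep, y_sep).score(X_sep, y_sep)
acc_xor = LogisticRegression().fit(X_xor, y_xor).score(X_xor, y_xor)
print("training accuracy, separable data:", acc_sep)
print("training accuracy, XOR data:", acc_xor)  # at most 0.75 for any linear boundary
```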

LINEAR CLASSIFIERS

The dot product of two vectors is the sum of their element-wise products. NumPy computes it directly with the @ operator (matrix multiplication).

In [None]:
import numpy as np
X = np.arange(3)
print(X)
y = np.arange(3, 6)
print(y)
print("The element-wise product of the two arrays is: ", X*y)
print("The dot product (sum of the element-wise products) is: ", np.sum(X*y))
print("The dot product using @: ", X@y)

LINEAR CLASSIFIER PREDICTION

raw model output = (coefficients · features) + intercept   [analogous to the equation of a line, y = mx + c]
- We then check whether the raw model output is positive or negative: if it is positive, predict one class; if it is negative, predict the other class
- This pattern is the same for Logistic Regression and Linear SVMs
- In scikit-learn terms, Logistic Regression and Linear SVMs have different fit methods but the same predict method
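The prediction rule above can be checked by hand: compute coefficients @ features + intercept for every example and pick a class from its sign. This sketch uses the breast cancer dataset; raising max_iter is an added assumption so the default lbfgs solver has room to converge on the unscaled features.

```python
import numpy as np
import sklearn.datasets
from sklearn.linear_model import LogisticRegression

breast_cancer = sklearn.datasets.load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target
lr = LogisticRegression(max_iter=10000).fit(X, y)

# Raw model output: coefficients @ features + intercept, one value per example
raw = X @ lr.coef_[0] + lr.intercept_[0]

# Positive raw output -> second class in lr.classes_, otherwise the first class
manual_pred = np.where(raw > 0, lr.classes_[1], lr.classes_[0])
print("matches lr.predict:", np.array_equal(manual_pred, lr.predict(X)))
```

The same sign rule applies to a fitted LinearSVC; only the way the coefficients are learned differs.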

In [None]:
# IMPLEMENTATION OF LINEAR CLASSIFIER PREDICTION ON THE BREAST CANCER DATASET

import sklearn.datasets
from sklearn.linear_model import LogisticRegression
breast_cancer = sklearn.datasets.load_breast_cancer()
lr = LogisticRegression()
lr.fit(breast_cancer.data, breast_cancer.target)
predictions = lr.predict(breast_cancer.data)  # predict once, then index
print(predictions[10])  # predicted class for the example at index 10
print(predictions[20])  # predicted class for the example at index 20

In [None]:
# The learned coefficients and intercept are stored in lr.coef_ and lr.intercept_

print("Raw Model Output for index 10: ", lr.coef_ @ breast_cancer.data[10] + lr.intercept_)  # raw model output for the example at index 10

print("Raw Model Output for index 20: ", lr.coef_ @ breast_cancer.data[20] + lr.intercept_)  # raw model output for the example at index 20

**LOSS FUNCTIONS**

- Many machine learning algorithms are fit by minimising a loss function
- Minimising the loss means changing the values of the coefficients until the loss function is as small as possible
- The loss function is a penalty score that assesses how poorly the model is doing on the training data
- The .score() method is not the loss function (for classifiers it reports accuracy)

i) Least Squares: Squared loss

- It minimises the sum of the squares of the errors on the training set
- Error = True Target Value - Predicted Target Value
- It is not appropriate for classification problems, because the y values are categories rather than numbers whose differences are meaningful
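As a worked example of the squared loss, with made-up targets and predictions (hypothetical numbers for illustration only):

```python
import numpy as np

# Hypothetical true targets and predictions, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_pred          # Error = true target value - predicted target value
squared_loss = np.sum(errors**2)  # sum of squared errors over the training set
print(squared_loss)               # 0.25 + 0.25 + 0 + 1 = 1.5
```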

**CLASSIFICATION ERRORS**

0-1 LOSS:
- A natural loss for classification is the number of errors the model makes. This is the 0-1 loss function
- Loss = 0 if the prediction is correct, and 1 if the prediction is incorrect
- By summing this function over all the training examples, we get the total number of errors made by the model
- This loss is very hard to minimise: small changes to the coefficients usually leave every prediction unchanged, so the loss is flat almost everywhere. That is why logistic regression minimises a smooth substitute (the logistic loss) instead
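The 0-1 loss is easy to compute even though it is hard to minimise. A sketch with hypothetical labels and predictions (illustrative only), which also shows that the total error count is just n times (1 - accuracy):

```python
import numpy as np

# Hypothetical class labels and predictions, for illustration only
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])

zero_one = (y_true != y_pred).astype(int)  # 0 if correct, 1 if incorrect
total_errors = zero_one.sum()              # total number of mistakes
print(zero_one, total_errors)              # [0 0 1 0 1] 2
print(1 - total_errors / y_true.size)      # accuracy = 1 - errors / n = 0.6
```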

In [None]:
# MINIMIZING LOSS FUNCTIONS USING SCIPY.OPTIMIZE.MINIMIZE

import numpy as np
from scipy.optimize import minimize

# The second argument is our initial guess;
# .x at the end grabs the input value that makes the function as small as possible
print(minimize(np.square, 0).x)
print(minimize(np.square, 2).x)

In [None]:
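# IMPLEMENTING LINEAR REGRESSION FROM SCRATCH
import numpy as np
import sklearn.datasets
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression

# A regression dataset is needed here (X and y from the earlier cell are 1-D toy arrays);
# the diabetes dataset is used as an example
X, y = sklearn.datasets.load_diabetes(return_X_y=True)

# The squared error, summed over training examples
def my_loss(w):
    s = 0
    for i in range(y.size):
        # Get the true and predicted target values for example 'i'
        y_i_true = y[i]
        y_i_pred = w@X[i]
        s = s + (y_i_true - y_i_pred)**2
    return s

# Returns the w that makes my_loss(w) smallest (X[0] is only the starting guess)
w_fit = minimize(my_loss, X[0]).x
print(w_fit)

# Compare with scikit-learn's LinearRegression coefficients
lr = LinearRegression(fit_intercept=False).fit(X, y)
print(lr.coef_)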

**LOSS FUNCTION DIAGRAMS**

1) 0-1 LOSS FUNCTION:

![Screenshot 2024-04-05 190737](Screenshot%202024-04-05%20190737.png)

2) LINEAR REGRESSION LOSS DIAGRAM:
- This is a quadratic curve as it takes the sum of squares of all errors
![Screenshot 2024-04-05 190905](Screenshot%202024-04-05%20190905.png)

3) LOGISTIC LOSS:
- Used in logistic regression
- Smooth version of the 0-1 loss
- As we move towards the left (the raw model output decreases), the loss increases
- As we move towards the right (the raw model output increases), the loss decreases
![Screenshot 2024-04-05 191214](Screenshot%202024-04-05%20191214.png)

4) HINGE LOSS:
- Used in SVMs
![Screenshot 2024-04-05 191435](Screenshot%202024-04-05%20191435.png)
- General shape is similar to Logistic Loss
![Screenshot 2024-04-05 191517](Screenshot%202024-04-05%20191517.png)
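The four curves can be reproduced numerically. This sketch writes each loss as a function of the margin m = (raw model output) × (true label), assuming labels encoded as -1/+1 (an assumption of the sketch; the diagrams correspond to a positive true label). The function names are hypothetical helpers, not library APIs.

```python
import numpy as np

# m = raw model output * true label (labels in {-1, +1}); m > 0 means a correct prediction
def zero_one_loss(m):
    return (m <= 0).astype(float)  # the boundary m = 0 is counted as an error here

def logistic_loss(m):
    return np.logaddexp(0, -m)  # log(1 + exp(-m)), computed stably

def hinge_loss(m):
    return np.maximum(0, 1 - m)

def squared_loss(m):
    return (1 - m) ** 2  # for labels in {-1, +1}, (y - raw)^2 == (1 - y*raw)^2

m = np.array([-2.0, 0.0, 1.0, 2.0])
for name, f in [("0-1", zero_one_loss), ("logistic", logistic_loss),
                ("hinge", hinge_loss), ("squared", squared_loss)]:
    print(f"{name:9s}", np.round(f(m), 3))
```

Note that the squared loss grows again at m = 2 even though that prediction is confidently correct, which is one reason it is a poor fit for classification.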


REGULARIZED LOGISTIC REGRESSION

- Regularization combats overfitting by making the model coefficients smaller
- hyperparameter C is the inverse of regularization strength (i.e. Larger C means less regularization and smaller C means more regularization)
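To see that regularization makes the coefficients smaller, this sketch fits LogisticRegression for several values of C and prints the size (L2 norm) of the coefficient vector. Standardising the features with StandardScaler and raising max_iter are added assumptions so the solver converges cleanly.

```python
import numpy as np
import sklearn.datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)

coef_norms = {}
for C in [0.01, 1, 100]:
    # Smaller C = stronger regularization
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=10000))
    model.fit(X, y)
    coef_norms[C] = np.linalg.norm(model[-1].coef_)  # last pipeline step is the classifier
    print(f"C={C:<6} ||coef|| = {coef_norms[C]:.3f}")
```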

In [None]:
# HOW DOES REGULARIZATION AFFECT TRAINING ACCURACY?
import sklearn.datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

breast_cancer = sklearn.datasets.load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 42)
lr_weak_reg = LogisticRegression(C=100)
lr_strong_reg = LogisticRegression(C=0.01)

lr_weak_reg.fit(X_train, y_train)
lr_strong_reg.fit(X_train, y_train)

print("Training accuracy of model with weak regularization (C=100): ", lr_weak_reg.score(X_train, y_train))
print("Training accuracy of model with strong regularization (C=0.01): ", lr_strong_reg.score(X_train, y_train))

print("Test accuracy of model with weak regularization (C=100): ", lr_weak_reg.score(X_test, y_test))
print("Test accuracy of model with strong regularization (C=0.01): ", lr_strong_reg.score(X_test, y_test))

**REGULARIZATION**

- Regularization is an extra term added to the loss function that penalizes large values of the coefficients
![Screenshot 2024-04-06 153204](Screenshot%202024-04-06%20153204.png)

- Without regularization, we only minimise the loss on the training data, which pushes the training accuracy as high as possible

- With regularization, large model coefficients are penalized, which distracts from the goal of high training accuracy. Thus the training accuracy usually decreases

- The larger the regularization penalty (i.e. the smaller the value of C), the more we deviate from the goal of maximising training accuracy.

- However, regularization often improves the test accuracy, because it reduces overfitting

- Ridge and Lasso are two different types of regularized linear regression

- Lasso = Linear Regression with L1 Regularization

- Ridge = Linear Regression with L2 Regularization
![Screenshot 2024-04-06 154055](Screenshot%202024-04-06%20154055.png)
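A practical difference between the two penalties is that L1 tends to set some coefficients exactly to zero (a form of feature selection), while L2 only shrinks them. This sketch counts the zero coefficients for each; the standardised breast cancer data and C=0.1 are illustrative assumptions, and solver="liblinear" is used because the default lbfgs solver does not support the L1 penalty.

```python
import numpy as np
import sklearn.datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # put all features on a common scale (added step)

# C=0.1: fairly strong regularization, chosen for illustration
lr_L1 = LogisticRegression(solver="liblinear", penalty="l1", C=0.1).fit(X, y)
lr_L2 = LogisticRegression(C=0.1).fit(X, y)  # default penalty is "l2"

n_zero_L1 = int(np.sum(lr_L1.coef_ == 0))
n_zero_L2 = int(np.sum(lr_L2.coef_ == 0))
print("coefficients exactly zero with L1:", n_zero_L1, "of", lr_L1.coef_.size)
print("coefficients exactly zero with L2:", n_zero_L2, "of", lr_L2.coef_.size)
```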

In [None]:
# Training two different Logistic Regression models with L1 and L2 Regularization

lr_L1 = LogisticRegression(solver='liblinear', penalty='l1')  # the default lbfgs solver does not support the L1 penalty
lr_L2 = LogisticRegression()  # the default penalty is 'l2'

lr_L1.fit(X_train, y_train)
lr_L2.fit(X_train, y_train)

LOGISTIC REGRESSION PROBABILITIES

- Instead of only predicting class 0 or class 1, logistic regression can also give the probability that an example belongs to each class (predict_proba in scikit-learn)
![Screenshot 2024-04-06 155847](Screenshot%202024-04-06%20155847.png)

- The probabilities are computed from the raw model output
- The raw model output is squashed to a value between 0 and 1 using the sigmoid function, sigmoid(z) = 1 / (1 + e^(-z))
![Screenshot 2024-04-06 160110](Screenshot%202024-04-06%20160110.png)
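For a binary problem this can be verified directly: applying the sigmoid to decision_function (the raw model output) reproduces the second column of predict_proba. The breast cancer data and the StandardScaler step are illustrative assumptions; sigmoid here is a small hypothetical helper, not a scikit-learn function.

```python
import numpy as np
import sklearn.datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

raw = model.decision_function(X[:5])      # raw model output
manual = sigmoid(raw)                     # squashed into (0, 1)
proba = model.predict_proba(X[:5])[:, 1]  # probability of model.classes_[1]
print(np.round(manual, 4))
print(np.round(proba, 4))
```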


MULTI CLASS LOGISTIC REGRESSION

One vs Rest Strategy:
- Combine multiple binary classifiers, one per class, each trained to separate its class from all the others. Here y==0 returns a boolean array the same size as y that is True where the class is 0 and False elsewhere, so we fit lr0.fit(X, y==0), lr1.fit(X, y==1), lr2.fit(X, y==2)
- To make a prediction, we take the class whose classifier gives the largest raw model output (decision_function(X) in scikit-learn)
- For example, if lr0 gives the largest output, we predict class 0: the model is more confident that the example is in class 0 than in any other class
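The one-vs-rest recipe above can be written out by hand and compared with scikit-learn's OneVsRestClassifier wrapper. The Wine dataset (three classes, 0 to 2) and the StandardScaler step are illustrative assumptions.

```python
import numpy as np
import sklearn.datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

X, y = sklearn.datasets.load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling added so lbfgs converges quickly

# One binary classifier per class: class k versus all the other classes
binary_models = [LogisticRegression().fit(X, y == k) for k in range(3)]

# Column k holds classifier k's raw model output for every example
raw_outputs = np.column_stack([m.decision_function(X) for m in binary_models])
manual_pred = np.argmax(raw_outputs, axis=1)  # class with the largest raw output

# scikit-learn's one-vs-rest wrapper, for comparison
ovr_pred = OneVsRestClassifier(LogisticRegression()).fit(X, y).predict(X)
print("training accuracy:", np.mean(manual_pred == y))
print("agrees with OneVsRestClassifier:", np.array_equal(manual_pred, ovr_pred))
```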

SUPPORT VECTORS

- Linear SVMs are also linear classifiers, but they use the hinge loss function and L2 regularization
- Support vectors are defined as training examples that influence the decision boundary.
- If a training example falls in the 0-loss region, it does not contribute to the fit.
- Support vectors are the training examples that are not in the flat part of the loss diagram (i.e. their loss is not 0)
- Support vectors include incorrectly classified examples and correctly classified examples close to the boundary. How close counts as "close" is controlled by the regularization strength
- If an example is not a support vector, removing it has no effect on the model
- For linearly separable datasets, the SVM maximises the margin
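The claim that non-support vectors can be removed without changing the model can be tested: fit a linear SVC, refit on its support vectors only, and compare. The synthetic make_blobs data and C=1000 (close to a hard margin) are illustrative assumptions; the two fits should agree up to the solver's numerical tolerance.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Linearly separable toy data (synthetic, illustrative)
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# A linear-kernel SVC with a large C behaves close to a hard-margin SVM
svm = SVC(kernel="linear", C=1000).fit(X, y)
sv = svm.support_  # indices of the support vectors
print("number of support vectors:", len(sv), "out of", len(X))

# Refit using only the support vectors
svm_sv = SVC(kernel="linear", C=1000).fit(X[sv], y[sv])
print("coef (all data):       ", svm.coef_)
print("coef (support vectors):", svm_sv.coef_)
print("same predictions:", np.array_equal(svm.predict(X), svm_sv.predict(X)))
```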

**KERNEL SVMs**

- Kernel SVMs classify datasets that are not linearly separable by using a linear classifier on transformed features
- Fitting a linear model in the transformed space corresponds to fitting a non-linear model in the original (untransformed) space
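A minimal illustration of the transformation idea, using an explicit feature map instead of a kernel: in 1-D, class 1 sits on both sides of class 0, so no threshold on x works, but after mapping x to (x, x²) a straight line separates the classes. The toy data and C=100 are assumptions for illustration; a kernel SVM such as SVC() with the RBF kernel performs a transformation like this implicitly rather than computing the new features.

```python
import numpy as np
from sklearn.svm import SVC

# 1-D toy data (illustrative): class 1 lies on both sides of class 0
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

X_original = x.reshape(-1, 1)
X_transformed = np.column_stack([x, x**2])  # explicit feature map x -> (x, x^2)

linear_original = SVC(kernel="linear", C=100).fit(X_original, y)
linear_transformed = SVC(kernel="linear", C=100).fit(X_transformed, y)

print("accuracy on x:        ", linear_original.score(X_original, y))
print("accuracy on (x, x^2): ", linear_transformed.score(X_transformed, y))
```

The boundary found in (x, x²) space is a straight line, but in the original x space it corresponds to two thresholds, i.e. a non-linear rule.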

In [None]:
from sklearn.svm import SVC

svm = SVC(gamma=1)  # the default kernel is 'rbf' (Radial Basis Function)
# The gamma hyperparameter controls the smoothness of the boundary:
# smaller gamma gives a smoother boundary, larger gamma a more complex one

**SGD Classifier**

- Stochastic Gradient Descent Classifier
- Scales well to large datasets
- Its regularization hyperparameter is called alpha instead of C, and alpha acts like 1/C: a larger alpha means stronger regularization