# Linear Models
This lab introduces linear models for classification.


In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, Perceptron, SGDClassifier
import warnings
from sklearn.exceptions import ConvergenceWarning

warnings.filterwarnings(action="ignore", category=ConvergenceWarning)

## Data Preparation
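Let's begin by preparing a dataset for this lab. We load the red wine quality data, shuffle it, and hold out 20% of the rows for testing. The last column is the label and all other columns are features.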

In [None]:
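# Load the semicolon-separated file, shuffle the rows, and reset the index.
df = (
    pd.read_csv("../datasets/winequality-red.csv", sep=";")
    .sample(frac=1)
    .reset_index(drop=True)
)

# Features: every column except the last. Labels: the last column.
X_train, X_test, y_train, y_test = train_test_split(
    df.iloc[:, :-1].values, df.iloc[:, -1], test_size=0.2
)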

## What is a linear model?

A linear classifier is a machine learning model that classifies samples using a linear combination of their features. 

Formally, for a sample with feature vector $\vec{x}$, the output of a linear model is a function $f$ applied to a linear combination of the features with corresponding weights $\vec{w}$:

$$
y = f(\vec{w}\cdot\vec{x}) = f \left( \sum_j w_j x_j \right)
$$

To train a linear model, we need to find the weights $\vec{w}$ that minimize classification error on the training data. That is what the `fit()` method does in scikit-learn.
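
As a minimal sketch of the formula above, the snippet below computes $f(\vec{w}\cdot\vec{x})$ for a single sample in plain Python. The names `identity` and `linear_model_output`, and the example weights and features, are illustrative assumptions and not part of scikit-learn.

```python
def identity(z):
    """Placeholder for f: returns the weighted sum unchanged."""
    return z


def linear_model_output(weights, features, f=identity):
    """Compute f(w . x) for one sample (hypothetical helper)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    # Linear combination: sum over j of w_j * x_j
    weighted_sum = sum(w_j * x_j for w_j, x_j in zip(weights, features))
    return f(weighted_sum)


# Made-up numbers, chosen only to illustrate the computation.
w = [0.5, -1.0, 2.0]
x = [2.0, 1.0, 0.5]
print(linear_model_output(w, x))
```

Different linear classifiers differ mainly in the choice of $f$ and in how the weights are learned.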

## Building a Train and Eval Function

Let's build a function that trains a model on the training split and returns its accuracy on the test split.

In [None]:
def train_and_test(model):
    """Fit `model` on the training split and return its accuracy on the test split."""
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

## Logistic Regression

The first linear model we'll use is logistic regression. Logistic regression is a linear model that uses the logistic (sigmoid) function as the function $f$ in the above equation:

$$
f(x) = \frac{1}{1+e^{-x}}
$$
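
To get a feel for the logistic function, here is a small standalone sketch that evaluates it at a few points. The function name `sigmoid` and the sample inputs are assumptions made for illustration, and the two-branch form is just one common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Large negative inputs map close to 0, large positive inputs close to 1,
# and an input of 0 maps to exactly 0.5.
for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3))
```

Because the output lies between 0 and 1, it can be read as the estimated probability of the positive class, with $\vec{w}\cdot\vec{x} = 0$ corresponding to a probability of 0.5.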

Let's use our `train_and_test` function to train a logistic regression classifier:

In [None]:
model = LogisticRegression(max_iter=500)
train_and_test(model)

## Perceptron
<img src="../images/perceptron.png" width=500 />

The perceptron is a linear model that utilizes a step function as its activation function $f$:

$$
f(x) = \begin{cases}
0 & x \leq 0 \\
1 & x > 0
\end{cases}
$$

### Perceptron Learning Algorithm
The perceptron learning algorithm works by minimizing a cost function: it tries to reduce the error (or cost) incurred when classifying the training dataset.

It does this by measuring the classifier's error on each sample (i.e., whether the sample was misclassified) and then updating its weights in proportion to that error. The key idea is that by iterating through the training data enough times, the perceptron learns to separate the classes present in the data, provided those classes are linearly separable.

The perceptron is also our first introduction to neural networks, a family of machine learning algorithms modeled after the neurons in the brain. The perceptron training algorithm, described below, is a foundational algorithm for understanding how neural networks learn.

The training algorithm for a perceptron is a loop. For each sample during training:
1. Aggregation - Take the dot product of the incoming features, $x$, and the weights, $w$, and add a bias, $b$: 
$$
v = w \cdot x + b= \sum_i w_ix_i + b
$$
2. Activation - Activation determines the output of the perceptron. Using the step function defined above, if the induced local field, $v$, is greater than 0, we assign a value of 1; otherwise we assign a 0.
$$
y = \begin{cases}
1 & v > 0 \\
0 & v \leq 0
\end{cases}
$$
3. Error calculation - Determine the classification error using a cost function, $C$, which takes in the output of the perceptron, $y$, and the desired label, $d$, and returns a numerical cost:
$$
e = C(y, d) = y - d
$$
4. Weight Update - Update the weights and bias using the error and a scaling factor known as the learning rate, $\alpha$. The learning rate controls how large an update is applied to the weights and bias:
$$
w_{new} = w_{old} - \alpha e x, \qquad b_{new} = b_{old} - \alpha e
$$
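
The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not scikit-learn's implementation: the names `step` and `train_perceptron`, the toy AND dataset, and the default learning rate and epoch count are all assumptions. The activation uses the step function from the start of this section (1 when $v > 0$, otherwise 0).

```python
def step(v):
    """Step activation: 1 if v > 0, otherwise 0."""
    return 1 if v > 0 else 0


def train_perceptron(samples, labels, alpha=0.1, epochs=100):
    """Train a perceptron with the four-step loop (hypothetical helper)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, d in zip(samples, labels):
            # 1. Aggregation: v = w . x + b
            v = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            # 2. Activation
            y = step(v)
            # 3. Error calculation: e = y - d
            e = y - d
            # 4. Weight update (a zero error leaves w and b unchanged)
            if e != 0:
                w = [w_i - alpha * e * x_i for w_i, x_i in zip(w, x)]
                b -= alpha * e
                mistakes += 1
        if mistakes == 0:
            # A full pass with no misclassifications: stop early.
            break
    return w, b


# Toy, linearly separable data: the logical AND of two binary inputs.
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1]]
d_toy = [0, 0, 0, 1]

w, b = train_perceptron(X_toy, d_toy)
predictions = [
    step(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b) for x in X_toy
]
print(predictions)  # compare against d_toy
```

Note that with $e = y - d$, subtracting $\alpha e x$ pushes the weights toward $x$ when the perceptron outputs 0 for a sample labeled 1, and away from $x$ in the opposite case.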

Let's now see the perceptron in action:

In [None]:
model = Perceptron(verbose=0, max_iter=5000)
train_and_test(model)

## Stochastic Gradient Descent (SGD) Classifier

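The perceptron learning algorithm described above is a special case of Stochastic Gradient Descent (SGD), an optimization method that updates the weights of a linear model one sample at a time, each time stepping in the direction that reduces a chosen loss function. In scikit-learn, `Perceptron()` is equivalent to `SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None)`.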

The SGD Classifier differs from the Perceptron in that we can change how the loss (Step 3) is calculated through its `loss` parameter, whereas the Perceptron always uses the `perceptron` loss function.
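
To illustrate how swapping the loss changes the update, here is a rough sketch of a single SGD step with an interchangeable loss gradient. It is not scikit-learn's code. It assumes labels encoded as $-1$ and $+1$ (unlike the 0/1 labels used in the perceptron steps above), and the names `perceptron_loss_grad`, `hinge_loss_grad`, and `sgd_step` are hypothetical.

```python
def perceptron_loss_grad(margin):
    """Derivative of the perceptron loss max(0, -margin) w.r.t. the margin."""
    return -1.0 if margin <= 0 else 0.0


def hinge_loss_grad(margin):
    """Derivative of the hinge loss max(0, 1 - margin) w.r.t. the margin."""
    return -1.0 if margin < 1 else 0.0


def sgd_step(w, b, x, y, loss_grad, alpha=0.1):
    """One SGD update on a single sample; y is assumed to be -1 or +1."""
    v = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    margin = y * v
    g = loss_grad(margin)
    # Chain rule: d(margin)/dw = y * x and d(margin)/db = y
    new_w = [w_i - alpha * g * y * x_i for w_i, x_i in zip(w, x)]
    new_b = b - alpha * g * y
    return new_w, new_b


# A sample that is classified correctly (margin 0.5) but lies inside
# the hinge margin of 1.
w, b = [0.5, 0.0], 0.0
x, y = [1.0, 2.0], 1

print(sgd_step(w, b, x, y, perceptron_loss_grad))  # weights are left unchanged
print(sgd_step(w, b, x, y, hinge_loss_grad))       # weights move toward x
```

The perceptron loss only reacts to misclassified samples, while the hinge loss also updates on correctly classified samples that sit too close to the decision boundary.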

Let's use the SGD Classifier on our dataset:

In [None]:
model = SGDClassifier(max_iter=5000, loss="squared_hinge")
train_and_test(model)

## <span style="background:yellow">Your Turn</span>

Train and test an SGD Classifier with Huber loss and a maximum of 2500 iterations:

In [None]:
# <-- Your Code Here -->