### Assignment 5

We have now covered the perceptron, Adaline, and logistic regression.

When using gradient descent, the only difference between these algorithms is how $\hat{y}$ is computed.
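To make that point concrete, here is a minimal sketch of the three ways $\hat{y}$ can be computed from the same net input $z = Xw$. The helper name `predict_raw` and the -1/+1 coding of the perceptron output are assumptions for illustration, not part of the assignment.

```python
import numpy as np

def predict_raw(X, w, algorithm="logreg"):
    """Return y-hat for one of the three linear models (illustrative helper)."""
    z = X @ w  # net input, shared by all three algorithms
    if algorithm == "perceptron":
        return np.where(z >= 0, 1, -1)   # threshold activation
    if algorithm == "adaline":
        return z                          # identity (linear) activation
    if algorithm == "logreg":
        return 1 / (1 + np.exp(-z))       # sigmoid activation
    raise ValueError(f"unknown algorithm: {algorithm!r}")

X = np.array([[1.0, 2.0], [1.0, -3.0]])
w = np.array([0.5, 0.5])
for name in ("perceptron", "adaline", "logreg"):
    print(name, predict_raw(X, w, name))
```

Everything else in the training loop (the error term, the update rule) can stay the same across the three algorithms.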

In this file, create one method called `fit_linear()` that takes a parameter called `algorithm`.

If `algorithm="perceptron"` then `fit_linear()` fits the data with a perceptron.

If `algorithm="adaline"` then `fit_linear()` fits the data with Adaline.

If `algorithm="logreg"` then `fit_linear()` fits the data with logistic regression.

The default value of `algorithm` is `"logreg"`.

`fit_linear()` can use batch, stochastic, or mini-batch gradient descent (your choice).

You do not have to implement all three.
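Batch, stochastic, and mini-batch gradient descent differ only in how many rows feed each weight update. One possible sketch is a single generator with a `batch_size` argument; the name `iterate_minibatches` and its parameters are hypothetical, not something the assignment requires.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once."""
    order = rng.permutation(X.shape[0])  # new random order on every call
    for start in range(0, X.shape[0], batch_size):
        rows = order[start:start + batch_size]
        yield X[rows], y[rows]

rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10)

# batch_size=len(X) gives batch GD, batch_size=1 gives stochastic GD,
# and anything in between gives mini-batch GD
sizes = [len(yb) for _, yb in iterate_minibatches(X, y, batch_size=4, rng=rng)]
print(sizes)  # [4, 4, 2]
```

Calling the generator once per epoch and applying one update per yielded batch would cover all three variants with the same training loop.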

### Task

Once you have the method written, load the [Raisin](https://archive.ics.uci.edu/ml/datasets/Raisin+Dataset) dataset provided in this folder.

Produce `X` and `y` as usual and use `sklearn` to do a train-test split and scale the data.

Then fit a linear model to the training set and check your accuracy on the test set.

Which of the three algorithms performs best?

### Extra credit

If you are feeling ambitious, you can add regularization to your model, and a hyperparameter to control it.

This is not as hard as it sounds.

A properly tuned model with regularization will probably beat all three of the unregularized linear models.
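As a rough sketch of the extra-credit idea, L2 regularization adds a term proportional to the weights to each gradient step. The function name `fit_logreg_l2`, the hyperparameter name `lam`, the 0/1 label coding, and the choice to leave the bias unpenalized are all assumptions here; the course may expect a different form.

```python
import numpy as np

def fit_logreg_l2(X, y01, lam=0.1, eta=0.1, max_epochs=500):
    """Batch gradient descent for logistic regression with an L2 penalty.

    Assumes column 0 of X is the bias column of ones and y01 holds 0/1 labels.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_epochs):
        yhat = 1 / (1 + np.exp(-(X @ w)))
        grad = X.T @ (yhat - y01) / n    # gradient of the mean log-loss
        penalty = lam * w                # gradient of (lam / 2) * ||w||^2
        penalty[0] = 0.0                 # do not shrink the bias term
        w -= eta * (grad + penalty)
    return w

# Synthetic data, used only to show the effect of lam
rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=(200, 2))]
y01 = (X[:, 1] + X[:, 2] > 0).astype(float)

w_plain = fit_logreg_l2(X, y01, lam=0.0)
w_reg = fit_logreg_l2(X, y01, lam=1.0)
print(np.linalg.norm(w_plain[1:]), np.linalg.norm(w_reg[1:]))
```

A larger `lam` shrinks the non-bias weights toward zero; trying a few values on a held-out split is one way to tune it.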

This extra credit is worth five points on the midterm.

In [27]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
minmax = MinMaxScaler()
from sklearn.metrics import accuracy_score

# ML functions

In [85]:
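def sigmoid(z):
    # Logistic function: maps any real z into (0, 1)
    return 1/(1+np.exp(-z))

def phi(z):
    # Threshold activation used by the perceptron: +1 where z >= 0, else -1
    return (z >= 0)*2 - 1

def Jgrad(X,y,w):
    # Adaline: descent direction (negative gradient) of J = sum((y - yhat)**2) / (2n)
    yhat = X@w
    difference = (y-yhat).reshape(X.shape[0],1)
    return np.sum(difference*X,axis=0)/X.shape[0]

def Jgrad2(X,y,w):
    # Logistic regression: sigmoid outputs lie in (0, 1), so map the -1/+1 labels to 0/1
    y01 = (y+1)/2
    yhat = sigmoid(X@w)
    difference = (y01-yhat).reshape(X.shape[0],1)
    return np.sum(difference*X,axis=0)/X.shape[0]


def fit_adaline_simple(X,y,max_epochs=100,eta=0.1):
    # Batch gradient descent starting from small random weights
    w = np.random.randn(X.shape[1])/10
    for epoch in range(max_epochs):
        w += eta*Jgrad(X,y,w)
    return w

def fit_logreg_simple(X,y,max_epochs=100,eta=0.1):
    # Same loop as Adaline; only the gradient (and so yhat) differs
    w = np.random.randn(X.shape[1])/10
    for epoch in range(max_epochs):
        w += eta*Jgrad2(X,y,w)
    return w

def fit_perceptron_simple(X,y,max_epochs=100,eta=0.1):
    # Stochastic updates: one weight update per training example
    w = np.random.randn(X.shape[1])/10
    for epoch in range(max_epochs):
        for x,y_i in zip(X,y):
            yhat = phi(w@x)
            w += eta*(y_i-yhat)*x
    # Return only after all epochs have run
    return w


def fit_linear(X,y,algorithm = 'logreg'):
    if algorithm == 'adaline':
        return fit_adaline_simple(X,y)
    elif algorithm == 'logreg':
        return fit_logreg_simple(X,y)
    elif algorithm == 'perceptron':
        return fit_perceptron_simple(X,y)
    raise ValueError(f"unknown algorithm: {algorithm!r}")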
        


In [86]:
sigmoid(5)

0.9933071490757153

In [87]:
df = pd.read_excel('Raisin_Dataset/Raisin_Dataset.xlsx')
df.head()

,Area,MajorAxisLength,MinorAxisLength,Eccentricity,ConvexArea,Extent,Perimeter,Class
0,87524,442.246011,253.291155,0.819738,90546,0.758651,1184.04,Kecimen
1,75166,406.690687,243.032436,0.801805,78789,0.68413,1121.786,Kecimen
2,90856,442.267048,266.328318,0.798354,93717,0.637613,1208.575,Kecimen
3,45928,286.540559,208.760042,0.684989,47336,0.699599,844.162,Kecimen
4,79408,352.19077,290.827533,0.564011,81463,0.792772,1073.251,Kecimen


In [88]:
df['Class'] = df.Class.apply(lambda x: 1 if x =='Kecimen' else -1)

In [89]:
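X = df.drop(columns = 'Class')
y = df['Class'].to_numpy()


In [90]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = .2)
# Fit the scaler on the training set only, then reuse it on the test set
X_train = minmax.fit_transform(X_train)
X_test = minmax.transform(X_test)
# Prepend a column of ones so that w[0] acts as the bias
X_train = np.c_[np.ones(X_train.shape[0]),X_train]
X_test = np.c_[np.ones(X_test.shape[0]),X_test]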

# Adaline

In [91]:
phi = lambda z: (z >= 0)*2 -1
w = fit_linear(X_train,y_train,'adaline')
w

array([ 0.4061814 , -0.47342129, -0.7400664 , -0.29348748, -0.0990526 ,
       -0.71685375,  0.53726238, -0.45238527])

In [92]:
y_pred_train = phi(X_train@w)
y_pred_test = phi(X_test@w)
accuracy_train = accuracy_score(y_train,y_pred_train)
accuracy_test = accuracy_score(y_test,y_pred_test)
accuracy_train,accuracy_test

(0.8513888888888889, 0.8611111111111112)

# Logistic regression

In [93]:

w = fit_linear(X_train,y_train)


In [94]:
def classify(arr):
    arr[arr <.5] = -1
    arr[arr>= .5] = 1


In [95]:
y_pred_train = sigmoid(X_train@w)
classify(y_pred_train)
y_pred_test = sigmoid(X_test@w)
classify(y_pred_test)
accuracy_train = accuracy_score(y_train,y_pred_train)
accuracy_test = accuracy_score(y_test,y_pred_test)
accuracy_train,accuracy_test

(0.49583333333333335, 0.5166666666666667)

# Perceptron

In [96]:
phi = lambda z: (z >= 0)*2 -1
w = fit_linear(X_train,y_train,'perceptron')
w

array([ 0.65678759, -0.45951874, -0.89760257, -0.11879451, -0.64286921,
       -0.45240104,  0.68416628, -0.71954525])

In [97]:
y_pred_train = phi(X_train@w)
y_pred_test = phi(X_test@w)
accuracy_train = accuracy_score(y_train,y_pred_train)
accuracy_test = accuracy_score(y_test,y_pred_test)
accuracy_train,accuracy_test

(0.8722222222222222, 0.8666666666666667)

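# Results
## In the run recorded above, the perceptron did a bit better than Adaline (test accuracy 0.867 vs. 0.861). Logistic regression did poorly (test accuracy 0.517, close to chance).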