## Logistic Regression on Diabetes Data

### Theory

#### True label approximation formula
$f(w,b) = wx + b$

$\hat y = h_\theta(x) = \frac{1}{1 + e^{-(wx + b)}}$
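
As a minimal sketch of the approximation formula, the snippet below evaluates the linear model and the sigmoid for a single sample. The values of `w`, `b`, and `x` are made up for illustration and are not taken from the diabetes data.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and one hypothetical sample with two features
w = np.array([0.5, -0.25])
b = 0.1
x = np.array([2.0, 4.0])

z = np.dot(w, x) + b   # linear part: f(w, b) = wx + b
y_hat = sigmoid(z)     # predicted probability of the positive class

print(z, y_hat)
```

Because the whole linear term $wx + b$ is negated in the exponent, a positive `z` gives a probability above 0.5 and a negative `z` gives one below 0.5.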

### Update Rules

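$w = w - \alpha \cdot dw$

$b = b - \alpha \cdot db$

Here $\alpha$ is the learning rate, and $dw$ and $db$ are the gradients of the loss with respect to $w$ and $b$.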
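
For concreteness, assume the loss is the mean binary cross-entropy over $N$ training samples (an assumption, since the loss is not named in this notebook):

$J(w,b) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat y_i + (1 - y_i) \log (1 - \hat y_i) \right]$

Under that assumption, the gradients $dw$ and $db$ work out to the following, which matches what the `fit` method in the code computes:

$dw = \frac{1}{N} \sum_{i=1}^{N} (\hat y_i - y_i) \, x_i$

$db = \frac{1}{N} \sum_{i=1}^{N} (\hat y_i - y_i)$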

### Algorithm

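1. Create a `LogReg` class.

1. Initialize $w$ and $b$ (the implementation below starts both at zero).

1. For each epoch, compute the predictions and gradients on the training set, then apply the update rule to both $w$ and $b$. After the final epoch, the model is trained.

1. For prediction, apply the approximation formula with the trained weights and bias, threshold the result at 0.5, and return the predicted labels as an array.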
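
The initialization and update steps can be traced by hand on a tiny example. The sketch below runs a single update on two made-up samples, using the same gradient expressions as the `fit` method below. The data, `alpha`, and the variable names are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two hypothetical samples with two features each, and their labels
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([0.0, 1.0])

# Initialization: zero weights and zero bias
w = np.zeros(2)
b = 0.0
alpha = 0.1  # illustrative learning rate

# One epoch: predict, compute gradients, apply the update rule
y_hat = sigmoid(X @ w + b)      # every prediction is 0.5 at the start
error = y_hat - y               # [0.5, -0.5]
dw = X.T @ error / len(y)       # [-0.5, -0.5]
db = error.mean()               # 0.0
w = w - alpha * dw
b = b - alpha * db

print(w, b)
```

In this example the first step moves both weights from 0 to 0.05 and leaves the bias at 0, because the two errors cancel in `db`. Training repeats this step once per epoch.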

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

class LogReg:

    def __init__(self, learning_rate = .01, epochs = 1000):
        self.lr = learning_rate
        self.epochs = epochs
        self.weights = None
        self.bias = None
    
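    def fit(self, X, y):
        samples, features = X.shape

        # Start from zero weights and zero bias
        self.weights = np.zeros(features)
        self.bias = 0

        # Batch gradient descent over the full training set
        for _ in range(self.epochs):
            # Linear model followed by the sigmoid gives predicted probabilities
            lm = np.dot(X, self.weights) + self.bias
            y_predicted = self._sigmoid(lm)

            # Gradients with respect to the weights and the bias
            dw = (1 / samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / samples) * np.sum(y_predicted - y)

            # Update rule
            self.weights -= self.lr * dw
            self.bias -= self.lr * db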
        
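    def predict(self, X):
        # Predicted probabilities from the trained weights and bias
        linear_model = np.dot(X, self.weights) + self.bias
        y_predicted = self._sigmoid(linear_model)
        # Threshold at 0.5 to obtain class labels (1 or 0)
        y_predicted_cls = [1 if i > 0.5 else 0 for i in y_predicted]
        return np.array(y_predicted_cls)

    def _sigmoid(self, x):
        # Logistic function: maps any real value into (0, 1)
        return 1 / (1 + np.exp(-x))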
    

def accuracy(y_true, y_pred):
    # Percentage of predictions that match the true labels
    fraction_correct = np.sum(y_true == y_pred) / len(y_true)
    return fraction_correct * 100
    

In [2]:

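# Load the dataset: the first 8 columns are used as features, Outcome is the label
data = pd.read_csv('data/diabetes.csv')
X = data.iloc[:, :8]
y = data.Outcome

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

# Train the classifier and predict labels for the test set
regressor = LogReg(learning_rate=0.0001, epochs=15000)
regressor.fit(X_train, y_train)
predictions = regressor.predict(X_test)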


print("Classification Accuracy:", accuracy(y_test, predictions), "%")

Classification Accuracy: 66.88311688311688 %


Citations:

*Python Engineer, MLfromscratch, (2020), GitHub Repository*