## Logistic Regression
Logistic Regression is a statistical method used for binary classification problems, where the goal is to predict the probability of an outcome that can take one of two possible values (yes/no or 0/1).
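Written out, the model passes a linear combination of the predictors through the sigmoid function. Inverting the sigmoid shows that the log odds are linear in the predictors, which is what assumption 3 below refers to. The notation ($a_0, \dots, a_n$ for the coefficients, $x_1, \dots, x_n$ for the predictors, $p$ for $P(Y=1 \mid x)$) follows the pseudo code later in this section.

$$
p = P(Y=1 \mid x) = \frac{1}{1 + e^{-z}}, \qquad z = a_0 + a_1 x_1 + \dots + a_n x_n
$$

$$
1 - p = \frac{e^{-z}}{1 + e^{-z}} \quad\Rightarrow\quad \frac{p}{1 - p} = e^{z} \quad\Rightarrow\quad \log \frac{p}{1 - p} = a_0 + a_1 x_1 + \dots + a_n x_n
$$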

### Assumptions
1. Binary outcome: the dependent variable takes exactly two values.
2. Independence of observations (no repeated measures or clustering).
3. Linearity between the independent variables and the log odds (logit). This can be checked with the Box-Tidwell test or by plotting the log odds against each predictor.
4. No multicollinearity (can be checked with the Variance Inflation Factor, VIF).
5. Large sample size.
6. No influential outliers.
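The VIF check for assumption 4 can be sketched with plain NumPy: each feature is regressed on all the others, and $\text{VIF}_j = 1 / (1 - R_j^2)$. The function name `variance_inflation_factors` is hypothetical, and the sketch assumes no column is constant. A commonly cited rule of thumb treats values above roughly 5 to 10 as a warning sign; `statsmodels.stats.outliers_influence.variance_inflation_factor` offers a library alternative.

```python
import numpy as np

def variance_inflation_factors(X):
    """Hypothetical helper: VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = np.empty(n_features)
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Include an intercept column so R^2 is the usual (centred) coefficient of determination
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)  # assumes column j is not constant
        r_squared = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)
    return vifs

# Synthetic example: c is almost a copy of a, while b is independent of both
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
X_demo = np.column_stack([a, b, c])
vifs = variance_inflation_factors(X_demo)
print(vifs.round(1))  # a and c get large VIFs, b stays close to 1
```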

### Limitations
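1. Linear decision boundary.
2. Sensitive to irrelevant features.
3. Prone to overfitting with too many predictors.
4. Relies on the assumption of independent observations.
5. Limited to binary outcomes in its standard form (multinomial and ordinal variants are needed for more than two categories).
6. Harder to interpret when interaction or polynomial terms are added.
7. Not suitable for non-linear relationships unless they are captured through engineered features such as interaction or polynomial terms.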
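The linear decision boundary can be illustrated on the XOR pattern, a minimal sketch using scikit-learn's `LogisticRegression` (the toy data is made up for illustration). On the raw features no straight line separates the classes, so at most 3 of the 4 points can be classified correctly; adding the interaction term $x_1 x_2$ as an extra feature makes the classes linearly separable in the expanded feature space.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR pattern with features coded as -1/+1: the label is 1 when the two signs differ
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Raw features only: the model can draw just a single straight line
linear_model = LogisticRegression().fit(X, y)
acc_linear = linear_model.score(X, y)

# Add the engineered interaction feature x1 * x2
X_inter = np.column_stack([X, X[:, 0] * X[:, 1]])
inter_model = LogisticRegression().fit(X_inter, y)
acc_inter = inter_model.score(X_inter, y)

print(f"raw features: {acc_linear:.2f}, with interaction term: {acc_inter:.2f}")
```

The fix comes from feature engineering rather than from the model itself, which is also why limitation 6 applies: the extra term must be interpreted alongside the original ones.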

### When to Use Logistic Regression

1. When the outcome is binary.
2. When interpretability of the model is important.
3. When the relationship between predictors and the log odds is approximately linear.
4. When the dataset is relatively small or simple, so a more complex model is not needed.
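Interpretability usually comes through odds ratios: exponentiating a coefficient gives the multiplicative change in the odds of $Y=1$ for a one-unit increase in that predictor. The sketch below assumes scikit-learn and standardizes the features first, so each odds ratio refers to a one-standard-deviation increase. Note that scikit-learn applies L2 regularization by default, which shrinks the coefficients, and that strongly correlated features (assumption 4) make individual coefficients less reliable to interpret.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data.data, data.target)

coefs = model.named_steps["logisticregression"].coef_[0]
odds_ratios = np.exp(coefs)  # multiplicative change in odds per 1-SD increase of each feature

# Show the five features with the largest effect on the odds, in either direction
order = np.argsort(-np.abs(coefs))[:5]
for i in order:
    print(f"{data.feature_names[i]:<25} coef={coefs[i]:+.2f}  odds ratio={odds_ratios[i]:.2f}")
```

An odds ratio above 1 raises the odds of the positive class; one below 1 lowers them.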


```python
# Pseudo code
# P(Y=1) = 1 / (1 + e^(-z)), which is the sigmoid function
#   where z = a0 + a1*x1 + ... + an*xn
# 1. initialize weights
# 2. define the sigmoid function
# 3. compute the cost function
# 4. gradient descent for parameter updates
# 5. training loop
# 6. prediction function
# 7. model evaluation
```
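Steps 3 and 4 of the pseudo code can be made concrete with the binary cross-entropy cost and its gradients. Here $m$ is the number of samples, $\mathbf{a} = (a_1, \dots, a_n)$ the weight vector, $a_0$ the intercept, and $\eta$ the learning rate. These correspond to `n_samples`, `coef_`, `intercept_`, and `learning_rate` in the class below. The key step is $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which cancels against the derivative of the logarithms and leaves the simple error term $p_i - y_i$.

$$
\begin{aligned}
p_i &= \sigma(z_i), \qquad z_i = a_0 + \mathbf{a}^\top \mathbf{x}_i \\
J(a_0, \mathbf{a}) &= -\frac{1}{m} \sum_{i=1}^{m} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big] \\
\frac{\partial J}{\partial \mathbf{a}} &= \frac{1}{m} X^\top (\mathbf{p} - \mathbf{y}), \qquad
\frac{\partial J}{\partial a_0} = \frac{1}{m} \sum_{i=1}^{m} (p_i - y_i) \\
\mathbf{a} &\leftarrow \mathbf{a} - \eta \, \frac{\partial J}{\partial \mathbf{a}}, \qquad
a_0 \leftarrow a_0 - \eta \, \frac{\partial J}{\partial a_0}
\end{aligned}
$$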


```python
import numpy as np
```

```python
def sigmoid(x):
    return 1/(1+np.exp(-x))

sigmoid(0)  # -> np.float64(0.5)
```
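With unscaled inputs such as the raw breast cancer features, `z` can be a large negative number; `np.exp(-z)` then overflows to `inf` and NumPy emits `RuntimeWarning: overflow encountered in exp`. The result `1/(1+inf)` is still `0.0`, so predictions are unaffected, but the warning is noisy. A minimal sketch of a numerically stable variant follows; the name `stable_sigmoid` is hypothetical. It only ever exponentiates $-|x|$, which lies in $(0, 1]$. SciPy's `scipy.special.expit` is an existing library alternative.

```python
import numpy as np

def stable_sigmoid(x):
    """Hypothetical helper: logistic sigmoid that never passes a large positive number to np.exp."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) lies in (0, 1], so it cannot overflow
    e = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + e^{-x});  x < 0: e^{x} / (1 + e^{x}); both are the same function
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(stable_sigmoid(0.0))
print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # 0.0, 0.5, 1.0 with no overflow warning
```

Because it matches `sigmoid` everywhere it does not overflow, it could be swapped in without changing the training logic.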

```python
class CustomLogisticRegression:
    """Binary logistic regression trained with batch gradient descent."""

    def __init__(self, learning_rate=0.01, epochs=1000):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.coef_ = None       # weight vector, one entry per feature
        self.intercept_ = None  # bias term

    def fit(self, X, y):
        n_samples, n_features = X.shape  # rows are samples, columns are features
        self.coef_ = np.zeros(n_features)
        self.intercept_ = 0.0

        for _ in range(self.epochs):
            linear_pred = np.dot(X, self.coef_) + self.intercept_  # z = Xw + b
            predictions = sigmoid(linear_pred)                      # P(y=1 | x)

            # Gradients of the mean binary cross-entropy cost
            dw = (1 / n_samples) * np.dot(X.T, (predictions - y))
            db = (1 / n_samples) * np.sum(predictions - y)

            # Gradient descent step
            self.coef_ = self.coef_ - self.learning_rate * dw
            self.intercept_ = self.intercept_ - self.learning_rate * db
        return self

    def predict(self, X):
        linear_pred = np.dot(X, self.coef_) + self.intercept_
        y_pred = sigmoid(linear_pred)
        # Threshold the probabilities at 0.5 and return a NumPy array of class labels
        return np.where(y_pred <= 0.5, 0, 1)
```
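The class follows the gradients of the cost function but never evaluates the cost itself (step 3 of the pseudo code). A hypothetical helper like the one below could be called every few epochs on `sigmoid(X @ coef_ + intercept_)` to check that training is actually reducing the loss. The name `binary_cross_entropy` and the clipping constant `eps` are assumptions; `sklearn.metrics.log_loss` computes the same quantity.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-15):
    """Hypothetical helper: mean binary cross-entropy (log loss)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from exactly 0 and 1 so np.log never returns -inf
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.6, 0.99])
print(binary_cross_entropy(y_true, y_prob))
```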



```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Wisconsin breast cancer dataset: 569 samples, 30 numeric features, binary target
bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

clf = CustomLogisticRegression()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

def accuracy(y_pred, y_test):
    # Fraction of test samples whose predicted label matches the true label
    return np.sum(y_pred == y_test) / len(y_test)

acc = accuracy(y_pred, y_test)
print(acc)  # 0.9210526315789473
```
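The from-scratch model is trained on unscaled features, and in this dataset some features take values in the hundreds or thousands while others are below 1, which slows gradient descent and triggers the overflow in `np.exp`. A scaled scikit-learn model on the same split gives a reference point. This is a sketch under stated assumptions: scikit-learn's `LogisticRegression` uses the L-BFGS solver with L2 regularization (`C=1.0`) by default, so it does not optimize exactly the same objective as the class above, and no particular accuracy is claimed here.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

bc = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    bc.data, bc.target, test_size=0.2, random_state=1234  # same split as the custom model
)

# The scaler is fitted on the training split only, then applied to both splits
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)  # mean accuracy on the test split
print(baseline_acc)
```

The same `StandardScaler` could also be applied before calling `CustomLogisticRegression.fit`, which typically lets gradient descent converge in far fewer epochs.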


