# Logistic Regression

## Equation: $\hat{y} = \frac{1}{1 + e^{-z}}$


### Where

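- $z = w \cdot X + b$ (computed as `X @ w + b` for all rows of $X$ at once)
- Ý (also written $\hat{y}$ or $h_\theta(X)$) = predicted probability that $y = 1$, i.e. $P(y = 1 \mid X)$
- $y$ = actual label (0 or 1)
- $X$ = input features
- $w$ = weights
- $b$ = bias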
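The sigmoid squashes any real number z into the open interval (0, 1), which is what allows Ý to be read as a probability. Below is a minimal NumPy sketch. The helper name `sigmoid` and the sample z values are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    """Map any real z to (0, 1): Ý = 1 / (1 + e^(-z))."""
    return 1 / (1 + np.exp(-z))

z = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(z)
print(probs)   # ≈ [0.018, 0.5, 0.982]

# Threshold at 0.5 with a strict ">" to turn probabilities into 0/1 labels
labels = np.where(probs > 0.5, 1, 0)
print(labels)  # [0 0 1]
```

Note that z = 0 gives exactly 0.5. A strict `> 0.5` threshold, as used in the notebook's `predict`, therefore maps that point to class 0.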


## Loss Function

### Binary Cross-Entropy (Log Loss)

$$L(y, \hat{y}) = -\left(y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})\right)$$

<ol>
    <li><strong>When y = 1:</strong> L(1, Ý) = -log(Ý)
    <p>The loss is small only when Ý is close to 1. As Ý → 1, -log(Ý) approaches 0; as Ý → 0, it grows without bound. In short: small loss = -log(big Ý).</p></li>
    <li><strong>When y = 0:</strong> L(0, Ý) = -log(1 - Ý)
    <p>The loss is small only when Ý is close to 0. As Ý → 0, -log(1 - Ý) approaches 0; as Ý → 1, it grows without bound. In short: small loss = -log(1 - small Ý).</p></li>
</ol>
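The two cases can be checked numerically. The sketch below implements the loss averaged over a batch. Two details are assumptions made here and are not part of the original notebook: the function name `binary_cross_entropy`, and the small `eps` clip that keeps `log(0)` out of the computation.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Average log loss -(y*log(Ý) + (1-y)*log(1-Ý)) over all examples."""
    y = np.asarray(y, dtype=float)
    # Clip predictions away from exactly 0 and 1 so np.log never receives 0
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))))

# y = 1: a confident correct prediction is cheap, a confident wrong one is expensive
print(binary_cross_entropy([1], [0.9]))  # ≈ 0.105
print(binary_cross_entropy([1], [0.1]))  # ≈ 2.303
# y = 0: the roles flip, so a small Ý gives the small loss
print(binary_cross_entropy([0], [0.1]))  # ≈ 0.105
```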


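## Gradient Descent

Training minimizes the average loss over the $m$ training examples, $J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(y^{(i)}, \hat{y}^{(i)})$. It does this by repeatedly applying the updates below. Here $\alpha$ is the learning rate (`learning_rate` in the code), and $dw$ and $db$ are the gradients of $J$ with respect to $w$ and $b$:

$$w = w - \alpha \, dw$$

$$b = b - \alpha \, db$$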
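The update rule moves each parameter a small step against its gradient. To see it in isolation, the sketch below applies the same rule to a one-parameter toy cost $J(w) = (w - 3)^2$, whose gradient is $2(w - 3)$. The toy cost, the helper name `gradient_descent_1d` and the value α = 0.1 are illustrative choices, not taken from the notebook.

```python
def gradient_descent_1d(grad, w0, alpha, num_iter):
    """Repeatedly apply w = w - alpha * dw, where dw = grad(w)."""
    w = w0
    for _ in range(num_iter):
        dw = grad(w)
        w = w - alpha * dw
    return w

# Toy cost J(w) = (w - 3)^2: minimum at w = 3, gradient 2 * (w - 3)
w_final = gradient_descent_1d(lambda w: 2 * (w - 3), w0=0.0, alpha=0.1, num_iter=100)
print(w_final)  # very close to 3
```

For this cost each step multiplies the distance to the minimum by (1 - 2α). With α = 0.1 that factor is 0.8, so w converges. With α above 1.0 the factor's magnitude exceeds 1, so every step overshoots further and w diverges. This is why the learning rate is kept small.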


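## Derivatives

These are the gradients of the average loss over the $m$ training examples. $X$ has shape $m \times n$, and $\hat{Y}$ and $Y$ are the $m \times 1$ columns of predictions and labels:

1. $dw = \frac{1}{m} X^{T}(\hat{Y} - Y)$

2. $db = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)$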
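The per-example derivatives simplify because the sigmoid's derivative, $\frac{d\hat{y}}{dz} = \hat{y}(1 - \hat{y})$, cancels the fractions that come from the log loss. Applying the chain rule to a single example with features $x_j$:

$$
\begin{aligned}
\frac{\partial L}{\partial \hat{y}} &= -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} \\
\frac{\partial L}{\partial z} &= \frac{\partial L}{\partial \hat{y}} \cdot \hat{y}(1 - \hat{y}) = -y(1 - \hat{y}) + (1 - y)\hat{y} = \hat{y} - y \\
\frac{\partial L}{\partial w_j} &= (\hat{y} - y)\, x_j, \qquad \frac{\partial L}{\partial b} = \hat{y} - y
\end{aligned}
$$

Averaging these over the $m$ training examples gives the $dw$ and $db$ formulas above.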

```python
import numpy as np
```

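```python
class Logistic_Regression:
    """Binary logistic regression trained with batch gradient descent."""

    def __init__(self, learning_rate=0.01, num_of_iter=1000):
        self.learning_rate = learning_rate  # step size (alpha) for gradient descent
        self.num_of_iter = num_of_iter      # number of gradient-descent updates
        self.weights = None                 # shape (n_features, 1), created in fit()
        self.bias = 0.0
        self.rows = None                    # m: number of training examples
        self.cols = None                    # n: number of features

    @staticmethod
    def _sigmoid(z):
        # Clip z so np.exp(-z) cannot overflow for large negative inputs
        z = np.clip(z, -500, 500)
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        self.rows, self.cols = X.shape
        # Start from zero weights and bias so repeated calls to fit() begin fresh
        self.weights = np.zeros((self.cols, 1))
        self.bias = 0.0
        y = y.reshape(-1, 1)  # column vector, matching the (m, 1) predictions
        for _ in range(self.num_of_iter):
            self.update_weights(X, y)

    def predict(self, X):
        # Ý = sigmoid(Xw + b) is the probability that y = 1 for each row
        y_prob = self._sigmoid(X @ self.weights + self.bias)
        # Threshold at 0.5 and flatten to a 1-D array of 0/1 labels
        return np.where(y_prob > 0.5, 1, 0).ravel()

    def update_weights(self, X, y):
        y_hat = self._sigmoid(X @ self.weights + self.bias)

        # Gradients of the average log loss
        dw = (1 / self.rows) * (X.T @ (y_hat - y))  # w.r.t. weights, shape (n, 1)
        db = (1 / self.rows) * np.sum(y_hat - y)     # w.r.t. bias (scalar)

        # Gradient-descent step
        self.weights -= self.learning_rate * dw
        self.bias -= self.learning_rate * db
```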
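A finite-difference check is a common way to confirm that the gradients used in `update_weights` match the formulas. The idea is to nudge each parameter by a small ε, measure the change in the average log loss, and compare it with the analytic `dw` and `db`. The sketch below runs this check on a small random dataset rather than on the class itself. The helper names (`average_log_loss`, `analytic_gradients`), the seed and ε = 1e-6 are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def average_log_loss(X, y, w, b):
    """Mean binary cross-entropy of the model sigmoid(Xw + b) on (X, y)."""
    y_hat = sigmoid(X @ w + b)
    return float(np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))))

def analytic_gradients(X, y, w, b):
    """dw = (1/m) X^T (Ý - y) and db = (1/m) sum(Ý - y), as in update_weights."""
    m = X.shape[0]
    y_hat = sigmoid(X @ w + b)
    return X.T @ (y_hat - y) / m, float(np.sum(y_hat - y)) / m

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 3))                          # 20 examples, 3 features
y_toy = rng.integers(0, 2, size=(20, 1)).astype(float)    # random 0/1 labels, shape (m, 1)
w_toy = rng.normal(size=(3, 1))
b_toy = 0.5
eps = 1e-6

dw, db = analytic_gradients(X_toy, y_toy, w_toy, b_toy)

# Central differences: (J(p + eps) - J(p - eps)) / (2 * eps) for each parameter p
num_dw = np.zeros_like(w_toy)
for j in range(w_toy.shape[0]):
    step = np.zeros_like(w_toy)
    step[j, 0] = eps
    num_dw[j, 0] = (average_log_loss(X_toy, y_toy, w_toy + step, b_toy)
                    - average_log_loss(X_toy, y_toy, w_toy - step, b_toy)) / (2 * eps)
num_db = (average_log_loss(X_toy, y_toy, w_toy, b_toy + eps)
          - average_log_loss(X_toy, y_toy, w_toy, b_toy - eps)) / (2 * eps)

print(np.max(np.abs(dw - num_dw)), abs(db - num_db))  # both should be far below 1e-6
```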
        
        

```python
def train_test_split(X, y, test_size=0.2, random_state=None):
    """Shuffle the rows, then split X and y into train and test sets."""
    # default_rng(None) uses fresh entropy; any integer seed (including 0) makes the split reproducible
    rng = np.random.default_rng(random_state)
    indices = rng.permutation(len(X))

    n_test = int(len(X) * test_size)
    test_indices = indices[:n_test]
    train_indices = indices[n_test:]

    X_train, X_test = X[train_indices], X[test_indices]
    y_train, y_test = y[train_indices], y[test_indices]

    return X_train, X_test, y_train, y_test
```

# Testing Module

```python
import pandas as pd
from sklearn.metrics import accuracy_score
```

```python
df = pd.read_csv("diabetes_prediction_dataset.csv")
```

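```python
df.head()
```

|   | gender | age | hypertension | heart_disease | smoking_history | bmi | HbA1c_level | blood_glucose_level | diabetes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | 80.0 | 0 | 1 | never | 25.19 | 6.6 | 140 | 0 |
| 1 | Female | 54.0 | 0 | 0 | No Info | 27.32 | 6.6 | 80 | 0 |
| 2 | Male | 28.0 | 0 | 0 | never | 27.32 | 5.7 | 158 | 0 |
| 3 | Female | 36.0 | 0 | 0 | current | 23.45 | 5.0 | 155 | 0 |
| 4 | Male | 76.0 | 1 | 1 | current | 20.14 | 4.8 | 155 | 0 |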


```python
df.isnull().sum()
```

```
gender                 0
age                    0
hypertension           0
heart_disease          0
smoking_history        0
bmi                    0
HbA1c_level            0
blood_glucose_level    0
diabetes               0
dtype: int64
```

No column contains missing values.

### Basic Preprocessing

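```python
# Map the 'Other' category to 'Female' so gender has only two values
df.loc[df['gender'] == 'Other', 'gender'] = 'Female'

# Encode gender as 1 = Male, 0 = Female
# (equivalent to pd.get_dummies(df['gender'], drop_first=True), which keeps only a 'Male' column)
df['gender'] = (df['gender'] == 'Male').astype(int)

# Drop smoking_history rather than encoding its categories
df.drop(columns='smoking_history', inplace=True)
```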
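For a column with exactly two categories, `pd.get_dummies(..., drop_first=True)` drops the alphabetically first category ('Female') and keeps a single 'Male' indicator. That indicator is the same as comparing the column with 'Male'. The quick check below uses a small hypothetical Series, assumed to hold only 'Female' and 'Male' as the gender column does after the 'Other' mapping:

```python
import pandas as pd

# Hypothetical toy column with the two remaining categories
gender = pd.Series(['Female', 'Male', 'Male', 'Female'])

one_hot = pd.get_dummies(gender, drop_first=True).astype(int)  # single column named 'Male'
direct = (gender == 'Male').astype(int)                        # 1 for Male, 0 for Female

print(list(one_hot.columns))                      # ['Male']
print(one_hot['Male'].tolist(), direct.tolist())  # [0, 1, 1, 0] [0, 1, 1, 0]
```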

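```python
# Features: every column except the last; target: the last column ('diabetes')
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, :-1].values, df.iloc[:, -1].values)
```

```python
model = Logistic_Regression()
model.fit(X_train, y_train)
```

```python
y_pred = model.predict(X_test)
```

```python
print("The Accuracy is " + str(accuracy_score(y_test, y_pred) * 100), "%")
```

```
The Accuracy is 91.39500000000001 %
```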
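On imbalanced data, accuracy can look high even when the model rarely predicts the positive class. Two simple checks help:

- Compare the accuracy with that of always predicting the majority class.
- Look at recall on the positive class (y = 1).

Below is a minimal NumPy sketch with a hypothetical function name, `majority_baseline_and_recall`, shown on toy arrays rather than the notebook's data:

```python
import numpy as np

def majority_baseline_and_recall(y_true, y_pred):
    """Return (accuracy, majority-class accuracy, recall for class 1)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    accuracy = np.mean(y_true == y_pred)
    # Always predicting the more common label is right this often
    baseline = max(np.mean(y_true == 1), np.mean(y_true == 0))
    true_pos = np.sum((y_true == 1) & (y_pred == 1))
    actual_pos = np.sum(y_true == 1)
    recall = true_pos / actual_pos if actual_pos > 0 else float('nan')
    return float(accuracy), float(baseline), float(recall)

# Toy labels: 8 negatives and 2 positives; the "model" predicts 0 everywhere
toy_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
toy_pred = np.zeros(10, dtype=int)
print(majority_baseline_and_recall(toy_true, toy_pred))  # (0.8, 0.8, 0.0)
```

Suppose the accuracy reported above turns out to be close to the majority-class baseline on the same test split. Then the model adds little beyond always predicting the common label. Standardizing the features before gradient descent is a common first thing to try.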
