# Logistic Regression Implementation


## 1. Information
Logistic regression is a machine learning algorithm for binary classification problems.

Logistic regression is similar to linear regression: we’re still dealing with a line equation for making predictions. The difference is that the results are passed through a sigmoid activation function, which converts real values to probabilities.

The probability is the model’s estimate of the chance that an instance belongs to the positive class. These probabilities are then turned into actual classes by comparing them to a threshold value (commonly 0.5).
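As a small illustration, the sketch below turns raw line-equation outputs into probabilities with the sigmoid and then into class labels. The helper name `predict_classes` and the default threshold of 0.5 are assumptions made for this example. They are not part of the original implementation.

```python
import numpy as np


def predict_classes(z, threshold=0.5):
    """Hypothetical helper: map real-valued scores to 0/1 class labels."""
    probabilities = 1 / (1 + np.exp(-z))  # sigmoid squashes scores into (0, 1)
    # An instance is assigned to the positive class when its probability reaches the threshold
    return (probabilities >= threshold).astype(int)


scores = np.array([-2.0, 0.0, 3.0])
print(predict_classes(scores))  # [0 1 1]
```

A score of exactly 0 gives a probability of exactly 0.5, so the tie-breaking rule (`>=` here) decides its class.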

#### 1.2 The Math
We’re still dealing with a line equation:
$$z = wx + b$$
The output of the line equation is passed through a sigmoid (logistic) function to produce the prediction $\hat{y} = S(z)$:
$$S(x)=\frac{1}{1+e^{-x}}$$
The purpose of the sigmoid function is to map any real value to a value between zero and one, which can be interpreted as a probability.
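To see this squashing behavior numerically, here is a minimal sketch that evaluates $S(x)$ at a few points. The standalone function name `sigmoid` is an assumption for illustration. It computes `np.exp` directly, so NumPy may emit an overflow warning for very large negative inputs.

```python
import numpy as np


def sigmoid(x):
    # S(x) = 1 / (1 + e^{-x}), applied elementwise
    return 1 / (1 + np.exp(-x))


xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(np.round(sigmoid(xs), 4))
# S(0) is exactly 0.5, and the curve is symmetric: S(-x) = 1 - S(x)
print(np.allclose(sigmoid(-xs), 1 - sigmoid(xs)))  # True
```

Inputs far from zero are pushed very close to 0 or 1, while inputs near zero stay close to 0.5.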

As the cost function, we’ll use binary cross-entropy (BCE), shown in the following formula:
$$BCE = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$$
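A worked example shows how this cost rewards confident correct predictions and penalizes confident wrong ones. The sketch below is a vectorized version of the formula. The helper name `binary_cross_entropy` is hypothetical. Clipping predictions to `[eps, 1 - eps]` with `eps = 1e-15` is an assumed but common safeguard so that $\log 0$ is never evaluated.

```python
import numpy as np


def binary_cross_entropy(y, y_hat, eps=1e-15):
    """Hypothetical helper; eps = 1e-15 is an assumed clipping value."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 to avoid log(0)
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1 - eps)
    # Mean of -[y log(y_hat) + (1 - y) log(1 - y_hat)] over all samples
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))


y_true = [1, 0, 1]
good = [0.9, 0.1, 0.8]  # mostly correct and fairly confident: small loss
bad = [0.1, 0.9, 0.2]   # confidently wrong: large loss
print(binary_cross_entropy(y_true, good))
print(binary_cross_entropy(y_true, bad))
```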

To update the weights and bias iteratively during optimization, we need the partial derivatives of this cost function with respect to $w$ and $b$:
$$\partial_w = \frac{1}{n}\sum_{i=1}^{n}x_i(\hat{y}_i-y_i)$$
$$\partial_b = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)$$
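The BCE gradients have a compact form: by the chain rule, $\frac{\partial BCE}{\partial \hat{y}_i}\cdot\frac{\partial \hat{y}_i}{\partial z_i}$ simplifies to $\hat{y}_i - y_i$ (up to the $\frac{1}{n}$ factor), with no factor of 2 as in the squared-error gradient. A numerical gradient check is an easy way to confirm this. The sketch below compares the analytic gradients against central finite differences on random data. All helper names and the step size `h` are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def bce_loss(w, b, X, y):
    # Mean binary cross-entropy of the model S(Xw + b)
    y_hat = sigmoid(X @ w + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))


def analytic_gradients(w, b, X, y):
    # dBCE/dw = (1/n) X^T (y_hat - y),  dBCE/db = (1/n) sum(y_hat - y)
    error = sigmoid(X @ w + b) - y
    n = X.shape[0]
    return X.T @ error / n, np.sum(error) / n


def numerical_gradients(w, b, X, y, h=1e-6):
    # Central finite differences, perturbing one parameter at a time
    grad_w = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        grad_w[j] = (bce_loss(w + step, b, X, y) - bce_loss(w - step, b, X, y)) / (2 * h)
    grad_b = (bce_loss(w, b + h, X, y) - bce_loss(w, b - h, X, y)) / (2 * h)
    return grad_w, grad_b


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w = rng.normal(size=3)
b = 0.1

gw, gb = analytic_gradients(w, b, X, y)
nw, nb = numerical_gradients(w, b, X, y)
print(np.allclose(gw, nw, atol=1e-6), np.isclose(gb, nb, atol=1e-6))  # True True
```

Doubling the analytic gradient (the MSE-style factor of 2) makes this check fail.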

The gradient descent update rules, with learning rate $\alpha$, are:
$$w=w-\alpha \partial_w$$
$$b=b-\alpha \partial_b$$
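The sketch below puts these update rules to work. It runs plain batch gradient descent on a tiny one-feature dataset that I made up for illustration, and records the BCE before each step. The data, the learning rate `alpha = 0.5`, and the 200 iterations are arbitrary choices.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Made-up one-feature data: small x tends to be class 0, large x class 1
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 1, 0, 1, 1], dtype=float)

w = np.zeros(X.shape[1])
b = 0.0
alpha = 0.5  # assumed learning rate
losses = []

for _ in range(200):
    y_hat = sigmoid(X @ w + b)
    losses.append(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))
    # Gradients of the BCE cost with respect to w and b
    d_w = X.T @ (y_hat - y) / len(y)
    d_b = np.mean(y_hat - y)
    # Update rules: w = w - alpha * d_w, b = b - alpha * d_b
    w -= alpha * d_w
    b -= alpha * d_b

print("initial loss:", losses[0], "final loss:", losses[-1])
print("learned w, b:", w, b)
```

With all-zero starting coefficients every prediction is 0.5, so the first recorded loss is $\log 2 \approx 0.693$. Because this data is not linearly separable, the weights stay finite while the loss decreases.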

#### 1.3 NumPy Implementation 

```python
import numpy as np
```

class LogisticRegression:

    def __init__(self, learning_rate=0.1, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights, self.bias = None, None


    @staticmethod
    def _sigmoid(x):
        # Map any real value to the (0, 1) interval
        return 1 / (1 + np.exp(-x))

    @staticmethod
    def bce(y, y_hat, eps=1e-15):
        # Binary cross-entropy; clip predictions to [eps, 1 - eps] so log(0) is never evaluated
        y = np.asarray(y, dtype=float)
        y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1 - eps)
        return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    
    def fit(self, X, y):
        # 1. Initializing coefficients
        self.weights = np.zeros(X.shape[1])
        self.bias = 0
        
        # 2. Perform gradient descent
        for i in range(self.n_iterations):
            linear_pred = np.dot(X, self.weights) + self.bias
            probability = self._sigmoid(linear_pred)
            
            # Calculate derivatives of the BCE cost (no factor of 2 here, unlike the MSE gradient)
            partial_w = (1 / X.shape[0]) * np.dot(X.T, (probability - y))
            partial_b = (1 / X.shape[0]) * np.sum(probability - y)

            # Update the coefficients
            self.weights -= self.learning_rate * partial_w
            self.bias -= self.learning_rate * partial_b