In this notebook my goal is to implement logistic regression from scratch.

## Logistic regression

### Core idea
Logistic regression is an algorithm for binary classification: it predicts which of two classes an input belongs to, for example whether a statement is true or false. Instead of predicting a continuous value, it predicts the probability that the input belongs to a given class.

### Mathematical view
We start by calculating the linear combination of the input features:
$$z = w^Tx + b$$
where $x$ is the feature vector, $w$ is the weight vector, and $b$ is the bias.

To turn $z$ into a probability between 0 and 1, we apply the sigmoid function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
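
To make these two steps concrete, here is a minimal sketch that computes $z$ and $\sigma(z)$ for a single example. The feature values, weights, and bias are made up for illustration; they were picked so that $z = 0$, the point where the sigmoid returns exactly 0.5.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Hypothetical values, chosen only for illustration
x = np.array([2.0, -1.0])   # feature vector
w = np.array([0.5, 1.0])    # weight vector
b = 0.0                     # bias

z = np.dot(w, x) + b        # linear combination w^T x + b
p = sigmoid(z)              # probability of the positive class

print(z)  # 0.0
print(p)  # 0.5
```

Large positive values of $z$ push the output toward 1 and large negative values push it toward 0.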

The parameters $w$ and $b$ are learned by minimizing the binary cross-entropy loss:

$$\text{Loss} = -\frac{1}{N} \sum_{i=1}^{N}\left[y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\right]$$

where $y_i$ is the true label and $\hat{y}_i = \sigma(z_i)$ is the predicted probability for the $i$-th of $N$ samples.
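
As a rough illustration of how this loss behaves, the sketch below evaluates it on a handful of made-up labels and predictions. The function name `binary_cross_entropy` and the `eps` clipping are assumptions added here for numerical safety; they are not part of the formula itself.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over N samples."""
    # Clipping is an addition to the formula: it keeps log() away from 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Made-up labels and predictions, for illustration only
y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.9, 0.1])
confident_wrong = np.array([0.1, 0.9, 0.1, 0.9])

loss_right = binary_cross_entropy(y_true, confident_right)
loss_wrong = binary_cross_entropy(y_true, confident_wrong)

print(loss_right)  # roughly 0.105, i.e. -log(0.9)
print(loss_wrong)  # roughly 2.303, i.e. -log(0.1)
```

Confident predictions that are correct give a loss close to 0, while confident predictions that are wrong are penalized heavily.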

### Implementation from scratch

In [None]:
import numpy as np

class LogisticRegression:
    def __init__(self):
        self.learning_rate = 0.01
        self.num_iterations = 1000
        self.weights = None
        self.bias = None
        
    def sigmoid(self, z):
        """ Sigmoid function that maps the linear output z to a probability between 0 and 1. """
        return 1 / (1 + np.exp(-z))
    
    def predict_probability(self, X):
        """ Returns the predicted probability of the positive class for each row of X. """
        linear_output = np.dot(X, self.weights) + self.bias
        return self.sigmoid(linear_output)
    
    def fit(self, X, y):
        """ Function that trains the model. Its goal is to find the best values
            for the weights and bias by minimizing the loss on the training data. """
        
        # Initializing the parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        
        # Training loop
        for _ in range(self.num_iterations):
            
            # Predicting probability for each data point
            y_pred = self.predict_probability(X)
            
            # Computing gradients of the loss with respect to the weights and bias
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
            db = (1 / n_samples) * np.sum(y_pred - y)
            
            # Updating parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
            
            
    def predict(self, X):
        """ Function that predicts the final class labels"""
        y_probs = self.predict_probability(X)
        
        # Formatting the output labels
        binary_predictions = y_probs >= 0.5
        return binary_predictions.astype(int)
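
The `dw` and `db` expressions used in `fit` are the gradients of the binary cross-entropy loss with respect to the weights and bias. One way to sanity-check them is to compare against numerical gradients from central finite differences. The sketch below does this on a small synthetic dataset; the helper `loss`, the step size `h`, and the names `dw_num` and `db_num` are hypothetical and exist only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, b, X, y):
    """Mean binary cross-entropy for parameters w and b."""
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Small synthetic dataset, made up for this check
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
w = rng.normal(size=3)
b = 0.1

# Analytic gradients, same expressions as in fit
n_samples = X.shape[0]
y_pred = sigmoid(X @ w + b)
dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
db = (1 / n_samples) * np.sum(y_pred - y)

# Numerical gradients via central finite differences
h = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = h
    dw_num[j] = (loss(w + step, b, X, y) - loss(w - step, b, X, y)) / (2 * h)
db_num = (loss(w, b + h, X, y) - loss(w, b - h, X, y)) / (2 * h)

print(np.allclose(dw, dw_num, atol=1e-6))
print(np.isclose(db, db_num, atol=1e-6))
```

If both comparisons hold, the update rule in the training loop is moving the parameters along the negative gradient of the loss defined earlier.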