# Logistic regression

Sigmoid function:
$$\frac{1}{1 + e^{-(mx + c)}}$$

Cross entropy:
$$-\frac{1}{N} \sum_{i=1}^{N}\left[y^{i}\log(h_{\theta}(x^{i})) + (1 - y^{i})\log(1 - h_{\theta}(x^{i}))\right]$$
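
As a rough illustration, the mean cross-entropy can be computed with NumPy as sketched below. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions added here to avoid `log(0)`; they are not part of the model defined later in this notebook.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so that log() stays finite (assumption).
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Confident, correct predictions give a small loss.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```

A lower value means the predicted probabilities agree more closely with the labels.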

Training:
* Initialize weights as zero
* Initialize bias as zero

Given a data point:
* Predict the result using $\frac{1}{1 + e^{-(mx + c)}}$
* Calculate the error between the prediction and the true label
* Use gradient descent to compute new weight and bias values
* Repeat for the chosen number of iterations
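
For reference, the gradient descent step can be written out as follows. This is a sketch under two assumptions: $J$ denotes the mean cross-entropy (the negative log-likelihood averaged over $N$ samples), and $\hat{y}^{i}$ is the sigmoid output for sample $i$. The symbol $\alpha$ for the learning rate is introduced here and is not used elsewhere in the notebook.

$$\frac{\partial J}{\partial m} = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}^{i} - y^{i})\,x^{i}, \qquad \frac{\partial J}{\partial c} = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}^{i} - y^{i})$$

$$m \leftarrow m - \alpha\,\frac{\partial J}{\partial m}, \qquad c \leftarrow c - \alpha\,\frac{\partial J}{\partial c}$$

These correspond to the `dw` and `db` terms computed in the `fit` method.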

Testing:
* Given a data point, substitute its values into $\frac{1}{1 + e^{-(mx + c)}}$
* Choose the label based on the resulting probability
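
The testing steps can be sketched as below. The weights, bias, and inputs are hypothetical values chosen only for illustration, and the 0.5 threshold matches the one used in `predict` later in the notebook.

```python
import numpy as np

# Hypothetical parameters and inputs, chosen only for illustration.
weights = np.array([1.0, -2.0])
bias = 0.5
X_new = np.array([[2.0, 0.0], [0.0, 2.0]])

# Probability of class 1 for each row.
probabilities = 1 / (1 + np.exp(-(X_new @ weights + bias)))
# Label 1 when the probability exceeds 0.5, otherwise 0.
labels = (probabilities > 0.5).astype(int)
print(labels)  # [1 0]
```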

In [1]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt

In [2]:
def sigmoid(x):
    """
    Compute the sigmoid function for a given input x.

    Parameters:
    - x: The input value.

    Returns:
    - result: The sigmoid value of the input.
    """
    return 1 / (1 + np.exp(-x))

class LogisticRegression:
    def __init__(self, learning_rate, number_of_iterations):
        """
        Initialize the logistic regression model with a learning rate and number of iterations.

        Parameters:
        - learning_rate: The learning rate for gradient descent.
        - number_of_iterations: The number of iterations for gradient descent.
        """
        self.learning_rate = learning_rate
        self.number_of_iterations = number_of_iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        """
        Fit the logistic regression model to the training data.

        Parameters:
        - X: The input features of the training data.
        - y: The target values of the training data.
        """
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        # Perform gradient descent for the specified number of iterations
        for _ in range(self.number_of_iterations):
            # Calculate the linear predictions based on the current model parameters
            linear_predictions = np.dot(X, self.weights) + self.bias
            # Apply the sigmoid function to obtain probabilities
            predictions = sigmoid(linear_predictions)
            # Calculate the derivative of the cost function with respect to the weights
            dw = (1/n_samples) * np.dot(X.T, (predictions-y))
            # Calculate the derivative of the cost function with respect to the bias
            db = (1/n_samples) * np.sum(predictions - y)
            # Update the weights using the gradient descent update rule
            self.weights = self.weights - self.learning_rate * dw
            # Update the bias using the gradient descent update rule
            self.bias = self.bias - self.learning_rate * db

    def predict(self, X):
        """
        Make predictions for the given input data.

        Parameters:
        - X: The input features of the data points to predict.

        Returns:
        - class_predictions: The predicted class labels.
        """
        # Calculate the linear predictions based on the learned model parameters
        linear_predictions = np.dot(X, self.weights) + self.bias
        # Apply the sigmoid function to obtain probabilities
        y_predictions = sigmoid(linear_predictions)
        # Convert probabilities to class predictions using a threshold of 0.5
        class_predictions = [0 if y <= 0.5 else 1 for y in y_predictions]
        return class_predictions

In [3]:
bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

In [4]:
logistic_regression = LogisticRegression(0.01, 1000)
logistic_regression.fit(X_train, y_train)
y_prediction = logistic_regression.predict(X_test)
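
The `sigmoid` helper above evaluates `np.exp(-x)` directly, which can overflow for large negative `x`. NumPy then emits a `RuntimeWarning`, although the result still rounds to the correct limit of 0. A minimal sketch of a numerically stable variant follows, assuming array input. The name `stable_sigmoid` is hypothetical and is not used by the model in this notebook.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never exponentiates a large positive number."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, rewrite as exp(x) / (1 + exp(x)); exp(x) is below 1 here.
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```

Both branches are algebraically the same function; the split only changes which exponential is evaluated.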

In [5]:
def accuracy(y_prediction, y_test):
    return np.sum(y_prediction == y_test)/len(y_test)

In [6]:
# Use a separate name so the accuracy() function is not overwritten
acc = accuracy(y_prediction, y_test)
print(acc)

0.9210526315789473
