#### About

Logistic Regression is a simple, widely used algorithm for binary classification, where the goal is to assign each instance to one of two classes, such as spam or not spam, or fraudulent or not fraudulent.

Logistic Regression is a supervised learning algorithm that uses the logistic function (also known as the sigmoid function) to model the probability that an instance belongs to a given class. It is a type of generalized linear model.

The logistic function is defined as

$$g(z) = \frac{1}{1 + e^{-z}}$$

where z is the linear combination of the features and their corresponding weights,

$$z = w_0 + w_1 x_1 + \dots + w_n x_n$$

Here $w_0, w_1, \dots, w_n$ are the weights of the model and $x_1, x_2, \dots, x_n$ are the input features of the dataset.
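As a small illustration of the two formulas above, the sketch below computes z and g(z) for a single instance with NumPy. The weights and feature values are made up for the example and do not come from any fitted model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and one instance with two features (illustrative values only)
w0 = -1.0                    # bias term w0
w = np.array([2.0, -0.5])    # w1, w2
x = np.array([1.5, 2.0])     # x1, x2

z = w0 + np.dot(w, x)        # linear combination: -1.0 + 3.0 - 1.0 = 1.0
p = sigmoid(z)               # estimated probability of the positive class

print("z =", z)
print("g(z) =", p)
```

The output of g(z) always lies strictly between 0 and 1, which is what allows it to be read as a probability; g(0) is exactly 0.5.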

> Mathematics

The goal of Logistic Regression is to estimate the values of the parameters (weights) $w_0, w_1, \dots, w_n$ that best fit the training data. This is typically done using maximum likelihood estimation (MLE) or other optimization techniques.

The likelihood function for Logistic Regression is defined as the product of the conditional probabilities of the target variable (binary class label) given the input features:

$$L(w) = \prod_{i=1}^{m} P(y_i \mid x_i, w)$$

where $y_i$ is the binary class label (0 or 1) of the i-th instance, $x_i$ is the input feature vector of the i-th instance, and $w$ is the parameter vector.
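Because each label is either 0 or 1, every factor in the product can be written in terms of the logistic function g defined earlier. The following is the standard Bernoulli form, written here with the second subscript on x denoting the feature index (a notation introduced only for this expansion):

```latex
P(y_i \mid x_i, w) = g(z_i)^{y_i}\,\bigl(1 - g(z_i)\bigr)^{1 - y_i},
\qquad z_i = w_0 + w_1 x_{i,1} + \dots + w_n x_{i,n}
```

Setting the label to 1 leaves g(z_i), and setting it to 0 leaves 1 - g(z_i), so the single expression covers both classes.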

Taking the negative logarithm of the likelihood function, we get the negative log-likelihood

$$J(w) = -\log L(w)$$

The goal of the optimization is to minimize this negative log-likelihood, which is equivalent to maximizing the likelihood, to obtain the best parameter values.
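To make J(w) concrete, here is a minimal NumPy sketch that evaluates it on a tiny made-up dataset. It assumes the feature matrix carries a leading column of ones so that the first weight plays the role of w0; the helper name `negative_log_likelihood` and the clipping constant are choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_log_likelihood(w, X, y):
    """J(w) = -sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ], with p_i = g(x_i . w).

    X is assumed to include a leading column of ones (bias column).
    """
    p = sigmoid(X @ w)
    eps = 1e-12                      # guard against log(0); an implementation detail
    p = np.clip(p, eps, 1 - eps)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up data: 4 instances, a bias column plus 1 feature
X = np.array([[1.0, -2.0],
              [1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
y = np.array([0, 0, 1, 1])

J_zero = negative_log_likelihood(np.zeros(2), X, y)           # all p_i = 0.5
J_good = negative_log_likelihood(np.array([0.0, 2.0]), X, y)  # weights that separate the classes

print("J(w = 0)      =", J_zero)
print("J(w = [0, 2]) =", J_good)
```

With all weights at zero every predicted probability is 0.5, so J equals 4 · ln 2. Weights that push the probabilities toward the correct labels give a smaller J, which is the quantity the optimizer tries to reduce.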

The parameters are updated with the gradient descent algorithm, which iteratively moves them toward the values that minimize the negative log-likelihood J(w).

For the jth weight,

$$w_j := w_j - \mathrm{lr} \cdot \frac{\partial J(w)}{\partial w_j}$$

where lr is the learning rate that controls the step size of the optimization process.
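The update rule can be sketched from scratch as follows. This is an illustrative batch gradient descent, not what scikit-learn runs internally. It assumes the gradient is averaged over the m training instances (which only rescales the learning rate), and the function name `fit_logistic_gd`, the default `lr=0.1`, and the synthetic data are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the averaged negative log-likelihood."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend ones so w[0] acts as w0
    w = np.zeros(n + 1)
    for _ in range(n_iters):
        p = sigmoid(Xb @ w)                # current predicted probabilities
        grad = Xb.T @ (p - y) / m          # dJ/dw_j for every j at once
        w -= lr * grad                     # w_j = w_j - lr * dJ/dw_j
    return w

# Synthetic, linearly separable data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = fit_logistic_gd(X, y)
probs = sigmoid(np.hstack([np.ones((200, 1)), X]) @ w)
pred = (probs >= 0.5).astype(float)

print("weights:", w)
print("training accuracy:", (pred == y).mean())
```

The gradient of J with respect to w has the compact form Xᵀ(p − y), which is why all weights can be updated in one vectorized step rather than looping over j.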

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [2]:
# Generate synthetic data
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)


In [4]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
# Initialize Logistic Regression model
lr = LogisticRegression()

In [6]:
# Train the model
lr.fit(X_train, y_train)

In [7]:
# Predict on test data
y_pred = lr.predict(X_test)


In [8]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)


In [9]:
# Print accuracy
print("Accuracy:", accuracy)

Accuracy: 0.95
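To connect the fitted model back to the formulas, the sketch below rebuilds the same data and model as the cells above and checks that applying the logistic function to z, computed from the fitted `intercept_` and `coef_`, reproduces what `predict_proba` returns. The variable name `model` is used here instead of `lr` only to avoid confusion with the learning rate. Note that `LogisticRegression()` applies L2 regularization by default, so the fitted weights minimize a penalized version of the negative log-likelihood.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same data and split as in the cells above
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression().fit(X_train, y_train)

# z = w0 + w1*x1 + w2*x2, using the fitted intercept and coefficients
z = model.intercept_[0] + X_test @ model.coef_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# predict_proba returns one column per class; column 1 is P(y = 1 | x)
p_sklearn = model.predict_proba(X_test)[:, 1]

print("probabilities match:", np.allclose(p_manual, p_sklearn))
```

The class labels returned by `predict` correspond to thresholding these probabilities at 0.5, or equivalently to the sign of z.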


> Use cases

Machine learning - Spam detection, Fraud detection, Customer churn prediction, Image classification