<center><h1 style="color:green">Logistic Regression</h1></center>

# 1. Introduction to Logistic Regression

Logistic Regression is a statistical model for binary classification problems, where the target variable has two possible outcomes (e.g., 0 or 1, win or loss, pass or fail).
It models the probability of the positive outcome as a function of one or more predictor variables. Despite its name, logistic regression is used as a classification algorithm: the predicted probability is compared against a threshold (commonly 0.5) to assign a class label.


# 2. Hypothesis Function in Logistic Regression

The hypothesis in logistic regression represents the probability that the target variable belongs to a particular class. It uses the sigmoid function to map the output to a probability value between 0 and 1.

The sigmoid function, $\sigma(z)$, is defined as:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

Where:
- $ z $ is a linear combination of the input features: $ z = w \cdot X + b $
- $ w $ is the weight vector
- $ X $ is the feature vector
- $ b $ is the bias term

Thus, the hypothesis function in logistic regression, where $ \theta $ denotes the parameters $ (w, b) $, is:

$$
h_{\theta}(X) = \sigma(w \cdot X + b) = \frac{1}{1 + e^{-(w \cdot X + b)}}
$$

This function outputs a probability between 0 and 1, representing the likelihood that the input $ X $ belongs to the positive class.
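
The hypothesis above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `predict_proba`, and the sample values of `X`, `w`, and `b`, are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Hypothesis h(X) = sigmoid(w . X + b), evaluated for each row of X."""
    return sigmoid(X @ w + b)

# Hypothetical inputs: two samples with two features each.
X = np.array([[1.0, 2.0], [-1.0, -2.0]])
w = np.array([1.0, 0.5])   # illustrative weights
b = 0.0                    # illustrative bias

probs = predict_proba(X, w, b)
print(probs)  # one probability per sample
```

The two samples are mirror images of each other, so their scores are $ z = 2 $ and $ z = -2 $, and the two probabilities sum to 1. For very large negative scores, `np.exp(-z)` can overflow and emit a warning; a numerically stable variant would be needed in production code.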


# 3. Cost Function

Logistic regression uses Log-Loss or Binary Cross-Entropy Loss as the cost function, which measures how well the model's predictions match the actual labels.

The cost function for logistic regression is:

$$
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log(h_{\theta}(X^{(i)})) + (1 - y^{(i)}) \log(1 - h_{\theta}(X^{(i)})) \right)
$$

Where:
- $ m $ is the number of training examples
- $ y^{(i)} $ is the actual label of the $ i $-th training example
- $ h_{\theta}(X^{(i)}) $ is the predicted probability for the $ i $-th example
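
As a rough sketch, the cost can be computed directly from the formula. The function name `log_loss`, the clipping constant `eps`, and the sample labels and probabilities are assumptions for illustration; clipping is added only to avoid evaluating $ \log(0) $.

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy averaged over m examples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from exactly 0 and 1 (eps is an assumed safeguard).
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return -np.mean(
        y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    )

# Hypothetical labels and two sets of predicted probabilities.
y_true = [1, 0, 1, 0]
confident = log_loss(y_true, [0.9, 0.1, 0.8, 0.2])
uncertain = log_loss(y_true, [0.5, 0.5, 0.5, 0.5])
print(confident, uncertain)
```

Predicting 0.5 for every example gives a cost of $ \log 2 $, while confident and correct predictions give a lower cost.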

The goal of logistic regression is to minimize this cost function using optimization algorithms such as Gradient Descent.
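
A minimal sketch of batch gradient descent for this cost, assuming the standard gradients $ \frac{\partial J}{\partial w} = \frac{1}{m} X^{T}(h - y) $ and $ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i}(h^{(i)} - y^{(i)}) $. The function name `fit_logistic`, the learning rate, the iteration count, and the toy data are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iters=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(n_iters):
        h = sigmoid(X @ w + b)        # predicted probabilities
        error = h - y                 # shape (m,)
        w -= lr * (X.T @ error) / m   # gradient step for the weights
        b -= lr * error.mean()        # gradient step for the bias
    return w, b

# Hypothetical one-feature data set: small values are class 0, large are class 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b = fit_logistic(X, y)
preds = (sigmoid(X @ w + b) >= 0.5).astype(int)
print(w, b, preds)
```

Because this toy data is linearly separable, the weights keep growing with more iterations; real data sets usually call for a stopping criterion or regularization, neither of which is shown here.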

# 4. Confusion Matrix

For a binary classification problem:

|                | **Predicted Positive** | **Predicted Negative** |
|----------------|-------------------------|-------------------------|
| **Actual Positive** | True Positive (TP)       | False Negative (FN)       |
| **Actual Negative** | False Positive (FP)      | True Negative (TN)        |

1. **True Positive (TP):** Correctly predicted positive instances.
2. **True Negative (TN):** Correctly predicted negative instances.
3. **False Positive (FP):** Instances where the model incorrectly predicted positive when it was actually negative (Type I error).
4. **False Negative (FN):** Instances where the model incorrectly predicted negative when it was actually positive (Type II error).
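
The four cells can be counted with a short loop. This sketch assumes labels are encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for labels encoded as 1 = positive, 0 = negative."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1   # Type I error
        else:
            fn += 1   # Type II error
    return tp, tn, fp, fn

# Hypothetical actual and predicted labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1
```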

### Evaluation Metrics:
Using the confusion matrix, the following metrics can be calculated:

1. **Accuracy:**  
   The proportion of correctly classified instances out of the total number of instances.  
   $$ 
   \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} 
   $$

2. **Precision:**  
   The proportion of true positive predictions out of all positive predictions.  
   $$ 
   \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} 
   $$

3. **Recall (Sensitivity or True Positive Rate):**  
   The proportion of actual positives correctly identified.  
   $$ 
   \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} 
   $$

4. **F1 Score:**  
   The harmonic mean of precision and recall.  
   $$ 
   \text{F1 Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} 
   $$

5. **False Positive Rate (FPR):**  
   The proportion of actual negatives incorrectly classified as positive.  
   $$ 
   \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}} 
   $$

6. **Specificity (True Negative Rate):**  
   The proportion of actual negatives correctly classified as negative.  
   $$ 
   \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} 
   $$
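
The metrics above follow directly from the four confusion-matrix counts. In this sketch, the function name `classification_metrics` and the sample counts are assumptions, and returning 0.0 when a denominator is zero is one possible convention rather than a universal rule.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the six metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
        "specificity": ratio(tn, tn + fp),
    }

# Hypothetical counts: 3 TP, 3 TN, 1 FP, 1 FN.
metrics = classification_metrics(tp=3, tn=3, fp=1, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that FPR and specificity always sum to 1, since both share the denominator $ \text{FP} + \text{TN} $.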

# 5. Types of Logistic Regression

1. Binary Logistic Regression: This is used for binary classification problems where the target variable has two classes (e.g., 0 or 1). The sigmoid function is used to model the probability of one class.

2. Multinomial Logistic Regression: This is used for classification problems where the target variable has more than two classes. The softmax function is used in place of the sigmoid function to model the probability distribution over all classes.

   The softmax function for K classes is:

$$
\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1 \text{ to } K
$$

Where $ z_i $ is the score for class $ i $ and $ K $ is the total number of classes.

The probability for class $ c $ is:

$$
P(Y = c \mid X) = \frac{e^{w_c \cdot X + b_c}}{\sum_{k=1}^{K} e^{w_k \cdot X + b_k}} \quad \text{for } c = 1 \text{ to } K
$$
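
A minimal NumPy sketch of the softmax probabilities. The function names, the weight matrix `W` (one row per class), and the sample input are assumptions for illustration. Subtracting the maximum score before exponentiating is a common stability trick that leaves the result unchanged, because the same factor cancels in the numerator and denominator.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted by the max score for numerical stability."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

def predict_proba_multiclass(X, W, b):
    """P(Y = c | X) for every class c; W has shape (K, n_features), b has shape (K,)."""
    return softmax(X @ W.T + b)

# Hypothetical model with K = 3 classes and 2 features.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.zeros(3)
X = np.array([[2.0, 0.5]])   # a single sample

probs = predict_proba_multiclass(X, W, b)
print(probs, probs.sum(axis=1))
```

Each row of `probs` is a probability distribution over the $ K $ classes, and the predicted class is the one with the largest probability.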


# 6. Proof of Sigmoid Function

The sigmoid function is defined as:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

This function maps any real-valued input to a value between 0 and 1. 

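Proof that the sigmoid function maps to (0, 1):

- For every real $ z $, $ e^{-z} > 0 $, so $ 1 + e^{-z} > 1 $ and therefore $ 0 < \sigma(z) < 1 $
- $ \sigma $ is strictly increasing, since $ \sigma'(z) = \sigma(z)(1 - \sigma(z)) > 0 $
- As $ z \to \infty $, $ e^{-z} \to 0 $, so $ \sigma(z) \to 1 $
- As $ z \to -\infty $, $ e^{-z} \to \infty $, so $ \sigma(z) \to 0 $

Thus, the sigmoid function always outputs values in the open interval (0, 1) and, being continuous, takes every value in that interval, making it suitable for modeling probabilities in logistic regression.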


<img src="1.png">