### ***Logistic Regression*** ###

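Logistic regression predicts the probability that an observation belongs to a class (e.g., pass/fail, spam/not spam, disease/no disease).

Instead of using a fitted line directly as the prediction, logistic regression passes the linear output through the sigmoid function, which squashes it into a value between 0 and 1 that can be read as a probability.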

``` math
p = 1 / (1 + e^{-z})
```
where z is the output of a linear model, z = mx + c (m is the slope and c is the intercept).
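
As a minimal sketch of the sigmoid step, the snippet below computes a probability from a single input `x`. The slope `m = 0.8` and intercept `c = -4.0` are hypothetical values chosen only for illustration; in practice they are learned from data.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical slope and intercept, not fitted to any data.
m, c = 0.8, -4.0


def predict_probability(x: float) -> float:
    """Compute z = m*x + c, then squash it with the sigmoid."""
    z = m * x + c
    return sigmoid(z)


for x in (0, 5, 10):
    print(x, round(predict_probability(x), 3))
# 0 0.018
# 5 0.5
# 10 0.982
```

When z = 0 the probability is exactly 0.5, so the line z = 0 acts as the decision boundary for a 0.5 threshold.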

***Accuracy***
It is calculated as the ratio of correctly predicted instances to the total instances.

***Confusion Matrix***
A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. It allows the visualization of the performance of an algorithm.
It is represented as a 2x2 matrix for binary classification problems, with the following structure:

![image.png](attachment:image.png)

***True Positive (TP)***
    Is the count of positive instances that were correctly classified as positive.

***True Negative (TN)***
    Is the count of negative instances that were correctly classified as negative.

***False Positive (FP)***
    Is the count of negative instances that were incorrectly classified as positive.

***False Negative (FN)***
    Is the count of positive instances that were incorrectly classified as negative.
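
To make the four counts concrete, here is a small sketch that tallies them from two label lists and then derives accuracy. The labels are made-up toy data, and `confusion_counts` is a hypothetical helper written for this example, not a library function.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Toy labels, invented for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tp, tn, fp, fn)  # 3 3 1 1
print(accuracy)        # 0.75
```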

***Precision***
    It is calculated as the ratio of true positive predictions to the total number of positive predictions made by the model. It is a measure of a classifier's accuracy in identifying positive instances.  

```math
Precision = True Positives / (True Positives + False Positives)
```

***Recall***
    Recall is the ratio of true positive predictions to the total number of actual positive instances in the dataset. It indicates how well the model captures all relevant positive instances.

```math
Recall = True Positives / (True Positives + False Negatives)
```

Precision and Recall are two important metrics used to evaluate the performance of classification models, especially in scenarios where the classes are imbalanced.

***F1 score***
The harmonic mean of precision and recall is called the F1 score. It is a single metric that combines both precision and recall, providing a balanced measure of a model's performance. The F1 score is particularly useful when you want to find an optimal balance between precision and recall.

```math
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
```
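
The three formulas above can be sketched directly from confusion-matrix counts. The counts below are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed for this example.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    # Assumed convention: an undefined ratio (zero denominator) becomes 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)

print(round(precision, 3))  # 0.8
print(round(recall, 3))     # 0.5
print(round(f1, 3))         # 0.615
```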
***ROC - AUC curve***
    Is a graphical representation of a classifier's performance across different threshold values. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. The Area Under the Curve (AUC) quantifies the overall ability of the model to discriminate between positive and negative classes. An AUC of 1 indicates perfect classification, while an AUC of 0.5 suggests no discriminative power, equivalent to random guessing.


![image.png](attachment:image.png)

The ROC curve is also used to choose the classification threshold: the probability cutoff above which a prediction is assigned to the positive class.
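
One common way to pick that threshold is Youden's J statistic, J = TPR - FPR, maximised over the candidate thresholds. The sketch below uses scikit-learn's `roc_curve` and `roc_auc_score`; the labels and scores are toy values invented for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy true labels and predicted probabilities (invented for illustration).
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.35, 0.45, 0.60, 0.65, 0.70, 0.85, 0.95])

# roc_curve returns one (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# Youden's J statistic: the threshold with the largest TPR - FPR gap.
j_scores = tpr - fpr
best_index = int(np.argmax(j_scores))
best_threshold = thresholds[best_index]

print(round(auc, 2))             # 0.92
print(round(best_threshold, 2))  # 0.45

# Assign classes using the selected threshold instead of the default 0.5.
y_pred = (y_score >= best_threshold).astype(int)
```

Youden's J weights false positives and false negatives equally; if one kind of error is more costly, a different criterion may suit the problem better.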

Implementation of Logistic Regression in Python

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
```

The full list of `LogisticRegression` parameters is in the scikit-learn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

***Order of implementation***

1. Import necessary libraries
2. Load the dataset
3. Exploratory Data Analysis (EDA)
4. Check for correlations (heatmap)
5. Detect multicollinearity (VIF)
6. Split data into train and test sets
7. Apply k-Fold Cross-Validation (e.g., K=5 or 10)
8. Fit Logistic Regression model on training data
9. Predict on test data
10. Confusion matrix creation
11. Select classification threshold - using ROC-AUC & Youden's statistic
12. Predict final classes
13. Evaluate Model performance using Accuracy, Precision, Recall
14. Evaluate F1 Score
15. Interpret coefficients
16. Apply Regularization (Ridge / Lasso / ElasticNet) — optional
17. Save or deploy the model — optional
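
The core of the steps above (6 to 15) can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (`make_classification` stands in for a real dataset), the EDA, correlation and VIF steps are skipped, and the threshold is chosen on the training data so that the test set stays untouched, which is one reasonable choice rather than the only one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_curve)
from sklearn.model_selection import cross_val_score, train_test_split

# Steps 1-2: synthetic stand-in for a real dataset (an assumption).
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=42)

# Step 6: hold out 20% of the rows for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Step 7: 5-fold cross-validation on the training set, scored by ROC AUC.
model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
print("CV ROC AUC:", cv_scores.mean())

# Step 8: fit on the full training set.
model.fit(X_train, y_train)

# Step 9: predicted probability of the positive class on the test set.
test_proba = model.predict_proba(X_test)[:, 1]

# Step 11: choose the threshold with Youden's J, using training data only.
train_proba = model.predict_proba(X_train)[:, 1]
fpr, tpr, thresholds = roc_curve(y_train, train_proba)
threshold = thresholds[np.argmax(tpr - fpr)]

# Step 12: final classes from the chosen threshold.
y_pred = (test_proba >= threshold).astype(int)

# Steps 10, 13, 14: confusion matrix and summary metrics.
print(confusion_matrix(y_test, y_pred))
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))

# Step 15: one coefficient per feature, on the log-odds scale.
print("Coefficients:", model.coef_[0])
print("Intercept   :", model.intercept_[0])
```

Each coefficient is the change in log-odds for a one-unit increase in that feature, so `np.exp(model.coef_[0])` gives the corresponding odds ratios.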