Logistic Regression is a **statistical model** used primarily for **binary classification tasks**. It predicts the probability that a given input belongs to a particular class, which makes it well suited to problems with a categorical outcome such as "yes or no," "success or failure," or "spam or not spam." Despite its name, logistic regression is used as a **classification algorithm**: it models the log-odds of the positive class as a linear function of the inputs, and the resulting probability is thresholded to produce a class label.

---

### **Key Concepts of Logistic Regression**

1. **Logistic Function (Sigmoid Function):**
   - Logistic regression passes the linear score \(z\), which can take any real value (from \(-\infty\) to \(+\infty\)), through the sigmoid function to obtain a probability strictly between 0 and 1.
   - The sigmoid function is defined as:
     \[
     \sigma(z) = \frac{1}{1 + e^{-z}}
     \]
     where \(z = w^T x + b\), \(w\) is the weights vector, \(x\) is the input feature vector, and \(b\) is the bias term.

2. **Output Interpretation:**
   - The output of the sigmoid function is a probability:
     \[
     P(y=1|x) = \sigma(z)
     \]
   - If the probability is greater than a threshold (commonly 0.5), the model predicts the class as \(y=1\); otherwise, \(y=0\).

3. **Cost Function:**
   - Logistic regression uses the **log-loss (binary cross-entropy)** as its cost function:
     \[
     J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]
     \]
     where:
     - \(m\): Number of training examples
     - \(y^{(i)}\): Actual label (0 or 1)
     - \(\hat{y}^{(i)}\): Predicted probability, \(\hat{y}^{(i)} = \sigma(w^T x^{(i)} + b)\)

4. **Training Process:**
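   - The model learns the parameters \(w\) and \(b\) by minimizing the cost function with an optimization method such as **Gradient Descent**. Because the log-loss is convex in \(w\) and \(b\), any local minimum is also a global minimum, so gradient descent cannot get stuck in a poor local optimum.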

5. **Multiclass Classification (Extension):**
   - For multiclass classification, logistic regression can be extended using:
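     - **One-vs-Rest (OvR):** Train one binary classifier per class (that class vs. all others) and predict the class whose classifier gives the highest score.
     - **Softmax Regression (multinomial logistic regression):** Replace the sigmoid with the softmax function, so a single model outputs a probability distribution over all classes.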
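
The pieces above (sigmoid, log-loss, gradient descent, thresholding) fit together in a few lines of NumPy. The sketch below is a minimal batch gradient-descent implementation, not a reference one: the function names, learning rate, iteration count, and toy data are illustrative assumptions. It uses the standard gradients of the log-loss, \(\partial J/\partial w = \frac{1}{m} X^T(\hat{y} - y)\) and \(\partial J/\partial b = \frac{1}{m}\sum_i (\hat{y}^{(i)} - y^{(i)})\). A small `softmax` helper is included because, with two classes and scores \((z, 0)\), softmax reduces exactly to the sigmoid.

```python
import numpy as np


def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy J(w, b), averaged over the m examples."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))


def fit(X, y, lr=0.1, n_iters=1000):
    """Learn w and b by batch gradient descent on the log-loss."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)   # P(y=1|x) for every example
        error = y_hat - y            # (y_hat - y) appears in both gradients
        w -= lr * (X.T @ error) / m  # dJ/dw = (1/m) X^T (y_hat - y)
        b -= lr * error.mean()       # dJ/db = (1/m) sum(y_hat - y)
    return w, b


def predict(X, w, b, threshold=0.5):
    """Predict class 1 when P(y=1|x) exceeds the threshold, else class 0."""
    return (sigmoid(X @ w + b) > threshold).astype(int)


def softmax(Z):
    """Softmax over the last axis: the multiclass generalization of the sigmoid."""
    Z = Z - Z.max(axis=-1, keepdims=True)  # shift for numerical stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=-1, keepdims=True)


# Toy data (illustrative): two Gaussian blobs labelled 0 and 1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),
               rng.normal(+1.0, 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = fit(X, y)
print("final log-loss:", log_loss(y, sigmoid(X @ w + b)))
print("training accuracy:", (predict(X, w, b) == y).mean())
```

Starting from \(w = 0, b = 0\), every prediction is 0.5 and the log-loss is \(\log 2 \approx 0.693\); each gradient step should push it lower.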

---

### **Advantages:**
- Simple and easy to implement.
- Computationally efficient.
- Provides probabilities, which help in understanding model confidence.
- Performs well when the log-odds of the target are approximately a linear function of the features.

### **Limitations:**
- Assumes a linear relationship between the independent variables and the log-odds.
- Cannot model nonlinear decision boundaries unless combined with feature engineering (e.g., polynomial or interaction terms).
- Sensitive to outliers and irrelevant features.
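
To illustrate the feature-engineering point, the sketch below uses scikit-learn (an assumption; the notebook does not import it) on `make_circles`, a synthetic dataset in which one class forms a ring around the other, so no straight line separates them. Logistic regression on the raw coordinates struggles, while adding degree-2 terms (\(x_1^2\), \(x_1 x_2\), \(x_2^2\)) lets a boundary that is still linear in the expanded features become a circle in the original space. The dataset parameters are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two concentric rings: the classes are not linearly separable in (x1, x2)
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)

# Baseline: a linear decision boundary in the raw features
linear_model = LogisticRegression().fit(X, y)
linear_acc = linear_model.score(X, y)  # training accuracy

# Feature engineering: expand to degree-2 polynomial terms, then fit
poly_model = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
poly_model.fit(X, y)
poly_acc = poly_model.score(X, y)  # training accuracy

print(f"linear features:   {linear_acc:.2f}")
print(f"degree-2 features: {poly_acc:.2f}")
```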

---

### **Applications:**
- Email spam detection.
- Medical diagnosis (e.g., disease presence).
- Customer churn prediction.
- Fraud detection in banking.

Logistic regression is widely used as a baseline model due to its simplicity and interpretability.
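
Much of that interpretability follows from the model's form: since \(\log\frac{P(y=1|x)}{1 - P(y=1|x)} = w^T x + b\), each weight \(w_j\) is the change in log-odds per unit increase in feature \(j\) with the other features held fixed, and \(e^{w_j}\) is the factor by which the odds are multiplied. A minimal sketch with scikit-learn's `LogisticRegression`, assuming synthetic data whose true coefficients are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical ground truth: log-odds = 1.5*x1 - 1.0*x2 + 0.0*x3
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 3))
true_w = np.array([1.5, -1.0, 0.0])
p = 1.0 / (1.0 + np.exp(-(X @ true_w)))  # sigmoid of the true log-odds
y = rng.binomial(1, p)                   # sample 0/1 labels

model = LogisticRegression().fit(X, y)

for name, coef in zip(["x1", "x2", "x3"], model.coef_[0]):
    # coef: change in log-odds per unit increase of the feature (others fixed)
    # exp(coef): factor by which the odds of y=1 are multiplied
    print(f"{name}: coef={coef:+.2f}  odds ratio={np.exp(coef):.2f}")
```

The fitted coefficients should land close to the invented values, with an odds ratio above 1 for `x1`, below 1 for `x2`, and near 1 for the irrelevant `x3`.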

In [3]:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('binary_classification_dataset.csv')
df.head()
```

|   | Feature_1 | Feature_2 | Feature_3 | Feature_4 | Feature_5 | Feature_6 | Feature_7 | Feature_8 | Feature_9 | Feature_10 | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.6251 | 1.678124 | 0.493516 | 1.29088 | -1.114278 | 1.84702 | 1.912294 | 1.357325 | 0.966041 | -3.006921 | 1 |
| 1 | -0.064641 | 4.138629 | -1.522415 | -2.041705 | 2.116697 | 5.28131 | 3.712587 | -0.890254 | 1.438826 | -3.623448 | 0 |
| 2 | 1.016313 | 2.665426 | -0.628486 | -0.886923 | 0.992518 | 1.942381 | 1.855199 | -1.958175 | -0.348803 | -1.598825 | 0 |
| 3 | 1.037282 | 1.466618 | -0.11542 | 1.170755 | -1.458516 | 1.37144 | 1.000965 | -1.034471 | -1.654176 | -2.936285 | 1 |
| 4 | 0.778385 | 1.565828 | -1.724917 | -2.735667 | 1.215107 | 1.231249 | -0.151824 | 0.59833 | -0.524283 | 1.252909 | 0 |


In [7]:
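```python
# Count how many rows belong to each class
class0 = (df['Target'] == 0).sum()
class1 = (df['Target'] == 1).sum()
print(f"Class 1: {class1}")
print(f"Class 0: {class0}")
```

```
Class 1: 503
Class 0: 497
```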


In [8]:
```python
df.isnull().sum()
```

```
Feature_1     0
Feature_2     0
Feature_3     0
Feature_4     0
Feature_5     0
Feature_6     0
Feature_7     0
Feature_8     0
Feature_9     0
Feature_10    0
Target        0
dtype: int64
```

In [9]:
```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Feature_1   1000 non-null   float64
 1   Feature_2   1000 non-null   float64
 2   Feature_3   1000 non-null   float64
 3   Feature_4   1000 non-null   float64
 4   Feature_5   1000 non-null   float64
 5   Feature_6   1000 non-null   float64
 6   Feature_7   1000 non-null   float64
 7   Feature_8   1000 non-null   float64
 8   Feature_9   1000 non-null   float64
 9   Feature_10  1000 non-null   float64
 10  Target      1000 non-null   int64
dtypes: float64(10), int64(1)
memory usage: 86.1 KB
```


In [11]:
```python
df.describe()
```

|   | Feature_1 | Feature_2 | Feature_3 | Feature_4 | Feature_5 | Feature_6 | Feature_7 | Feature_8 | Feature_9 | Feature_10 | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | 0.730472 | -0.011035 | -0.072376 | -0.742447 | 0.719059 | 0.749845 | 0.697937 | 0.017115 | -0.058077 | 0.956389 | 0.503 |
| std | 1.863961 | 1.965723 | 1.024196 | 1.817695 | 1.787194 | 1.894522 | 2.105735 | 1.029048 | 1.046402 | 3.510836 | 0.500241 |
| min | -5.161168 | -6.314203 | -3.031194 | -4.758034 | -3.461769 | -3.841477 | -7.16791 | -3.254479 | -3.582063 | -9.128397 | 0.0 |
| 25% | -0.482462 | -1.584057 | -0.779532 | -1.964883 | -0.789202 | -0.734119 | -0.584799 | -0.676648 | -0.744499 | -1.392688 | 0.0 |
| 50% | 1.033664 | 0.051065 | -0.041891 | -1.194817 | 1.025408 | 0.901614 | 0.852704 | 0.025772 | -0.081367 | 1.446436 | 1.0 |
| 75% | 2.009441 | 1.480276 | 0.644444 | -0.031715 | 1.947875 | 2.031734 | 2.050272 | 0.679153 | 0.659029 | 3.169381 | 1.0 |
| max | 5.566061 | 5.105669 | 3.276399 | 6.185693 | 6.536793 | 6.108412 | 7.979264 | 3.08989 | 2.986329 | 9.385687 | 1.0 |
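
A natural next step is to fit the model to this dataset. The `describe()` output shows feature standard deviations ranging from about 1.0 (Feature_3) to about 3.5 (Feature_10), so the sketch below standardizes the features before fitting; since the classes are nearly balanced (503 vs. 497), it uses a stratified split and compares accuracy against a majority-class baseline. It assumes scikit-learn is installed, and the helper name `fit_and_evaluate`, the 80/20 split, and the random seed are hypothetical choices for illustration.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_evaluate(df: pd.DataFrame, target: str = "Target", seed: int = 42):
    """Hypothetical helper: fit a scaled logistic regression and a majority-class baseline."""
    X = df.drop(columns=[target])
    y = df[target]

    # Stratify so both splits keep roughly the same class ratio
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Scaling lives inside the pipeline, so it is learned from the training split only
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)

    # Always predicts the most frequent training class
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

    return {
        "logreg_accuracy": accuracy_score(y_test, model.predict(X_test)),
        "baseline_accuracy": accuracy_score(y_test, baseline.predict(X_test)),
        "model": model,
    }
```

With the notebook's `df`, `fit_and_evaluate(df)["logreg_accuracy"]` gives the held-out accuracy. Because the classes are close to 50/50, the baseline should score near 0.5, so it is a quick check that the model has learned something beyond class frequencies.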
