# Logistic Regression
Logistic Regression is one of the most fundamental algorithms for binary classification in machine learning. Despite its name, it is used for classification rather than regression: it estimates the probability that an observation belongs to a particular class (e.g., success vs. failure, yes vs. no).

## 1. The Mathematics: Sigmoid Function
Logistic Regression uses the Sigmoid Function (or Logistic Function) to transform the result of a linear equation into a probability value between 0 and 1.

### 1.1. The Linear Component
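First, like Linear Regression, it computes a linear combination of the input features ($\mathbf{x}$) and their corresponding weights ($\boldsymbol{\beta}$):

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n = \boldsymbol{\beta}^\top \mathbf{x}$$

The vector form assumes $\mathbf{x}$ is augmented with a constant $x_0 = 1$, so that $\beta_0$ acts as the intercept.
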
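
As a minimal sketch of the linear component, the snippet below computes $z$ with NumPy. The weights and feature values are hypothetical numbers chosen only for illustration, and the leading 1 is the assumed constant $x_0$ that lets the intercept take part in the dot product.

```python
import numpy as np

# Hypothetical weights and features, chosen only for illustration.
beta = np.array([-1.0, 0.5, 2.0])   # beta_0 (intercept), beta_1, beta_2
features = np.array([3.0, 0.25])    # x_1, x_2

# Prepend x_0 = 1 so the intercept is absorbed into the dot product.
x = np.concatenate(([1.0], features))

# z = beta_0 * 1 + beta_1 * x_1 + beta_2 * x_2
z = beta @ x
print(z)
```
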
### 1.2. The Sigmoid Transformation
This linear output ($z$) is called the log-odds or logit. To convert $z$, which can range from $-\infty$ to $+\infty$, into a probability $P(y=1|\mathbf{x})$ (where $y=1$ is the positive class), it is passed through the Sigmoid function ($\sigma$):

$$\sigma(z) = P(y=1|\mathbf{x}) = \frac{1}{1 + e^{-z}}$$

- If $z$ is a large positive number, $P \approx 1$.
- If $z = 0$, $P = 0.5$.
- If $z$ is a large negative number, $P \approx 0$.
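
The three cases above can be checked numerically. The following is a small sketch of the Sigmoid function in NumPy; the helper name `sigmoid` and the sample values of $z$ are illustrative choices, not part of any library.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued logit z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative, zero, and large positive logits.
for z in (-10.0, 0.0, 10.0):
    print(f"z = {z:+.0f} -> P = {sigmoid(z):.5f}")
```

For very large negative inputs, `np.exp(-z)` can overflow and emit a warning; `scipy.special.expit` is a numerically stable alternative if SciPy is available.
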
### 1.3. Decision Boundary
The model classifies an observation based on a threshold, typically 0.5:
- If $P(y=1|\mathbf{x}) \geq 0.5$, the observation is classified as 1 (Positive Class).
- If $P(y=1|\mathbf{x}) < 0.5$, the observation is classified as 0 (Negative Class).
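
The thresholding rule can be sketched in a few lines. The function name `classify` and the probabilities below are hypothetical and exist only to show the rule, including how a stricter threshold changes the labels.

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    """Return 1 where P(y=1|x) >= threshold, else 0."""
    return (np.asarray(probabilities) >= threshold).astype(int)

# Hypothetical predicted probabilities for four observations.
p = np.array([0.10, 0.50, 0.49, 0.93])

labels = classify(p)                   # default threshold of 0.5
stricter = classify(p, threshold=0.9)  # only very confident positives
print(labels, stricter)
```

Because $\sigma(z) \geq 0.5$ exactly when $z \geq 0$, the 0.5 threshold corresponds to the linear boundary $\boldsymbol{\beta}^\top \mathbf{x} = 0$ in feature space.
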
## 2. Python Implementation with Scikit-learn
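Feature scaling is not strictly required for Logistic Regression, but it usually helps: gradient-based solvers such as `lbfgs`, `sag`, and `saga` tend to converge faster on standardized features, and the L2 penalty that Scikit-learn applies by default treats coefficients comparably only when the features share a similar scale. The `liblinear` solver used below often performs well on unscaled data.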
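
One way to add scaling, shown here as a sketch rather than as the setup used in the cells below, is to chain `StandardScaler` and `LogisticRegression` in a pipeline. The variable names and the choice of the `lbfgs` solver are assumptions made for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The scaler is fit on the training split only and then reused on the
# test split, which avoids leaking test statistics into training.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(solver="lbfgs"))
scaled_model.fit(X_train, y_train)

test_accuracy = scaled_model.score(X_test, y_test)
print(f"Test accuracy with scaling: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means the same transformation is applied automatically whenever `predict` or `score` is called.
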

```python
# CODE CELL 1: Setup and Model Training
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.datasets import load_breast_cancer # A binary classification dataset

# 1. Load Data
data = load_breast_cancer()
X, y = data.data, data.target

# 2. Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Initialize and Train Model
# solver='liblinear' is a good choice for smaller datasets
model = LogisticRegression(solver='liblinear', random_state=42)
model.fit(X_train, y_train)

# 4. Predict
y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)[:, 1] # Probability of the positive class (1)
```
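
To connect the fitted model back to the Sigmoid formula from Section 1, the sketch below refits the same model so that it runs on its own, then compares `predict_proba` with a manual Sigmoid applied to `decision_function`, which returns the logit $z$ for each row. The names `proba_manual`, `matches`, and `same_labels` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression(solver="liblinear", random_state=42)
model.fit(X_train, y_train)

# decision_function returns z = intercept + coef . x for each test row.
z = model.decision_function(X_test)
proba_manual = 1.0 / (1.0 + np.exp(-z))
proba_sklearn = model.predict_proba(X_test)[:, 1]

# The manual Sigmoid should reproduce the library's probabilities.
matches = np.allclose(proba_manual, proba_sklearn)

# Thresholding the probabilities at 0.5 should reproduce predict().
same_labels = np.array_equal(
    (proba_sklearn >= 0.5).astype(int), model.predict(X_test)
)
print(matches, same_labels)
```
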


```python
# CODE CELL 2: Evaluation
# Display the Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("--- Confusion Matrix ---")
print(cm)

# Display the full Classification Report (Accuracy, Precision, Recall, F1-Score)
print("\n--- Classification Report ---")
print(classification_report(y_test, y_pred, target_names=data.target_names))

# Print the model coefficients (for interpretability)
print("\nModel Coefficients (Weights):")
# The coefficients show the impact of each feature on the log-odds of the target event.
print(model.coef_[0])
```

```
--- Confusion Matrix ---
[[ 59   4]
 [  2 106]]

--- Classification Report ---
              precision    recall  f1-score   support

   malignant       0.97      0.94      0.95        63
      benign       0.96      0.98      0.97       108

    accuracy                           0.96       171
   macro avg       0.97      0.96      0.96       171
weighted avg       0.96      0.96      0.96       171
```
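
The precision, recall, and accuracy figures in the report follow directly from the confusion matrix. As a sketch, the snippet below copies the matrix from the output above and recomputes them, relying on Scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class; order: malignant, benign).
cm = np.array([[59, 4],
               [2, 106]])

correct = np.diag(cm)
precision = correct / cm.sum(axis=0)  # correct / all predicted as that class
recall = correct / cm.sum(axis=1)     # correct / all truly in that class
accuracy = correct.sum() / cm.sum()

for name, p, r in zip(["malignant", "benign"], precision, recall):
    print(f"{name:<10} precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```
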


```
Model Coefficients (Weights):
[ 2.17531613e+00  1.59671161e-01 -1.25366787e-01 -4.00239206e-03
 -1.30406138e-01 -4.11269744e-01 -6.55017223e-01 -3.50092192e-01
 -2.02213738e-01 -2.92902947e-02 -6.61118882e-02  1.40363036e+00
  1.17862799e-01 -1.09266066e-01 -1.46457567e-02 -2.48430128e-02
 -6.34943239e-02 -4.11473924e-02 -4.87792139e-02 -7.70259549e-04
  1.15521174e+00 -3.90337252e-01 -7.67977046e-02 -2.13240168e-02
 -2.42133804e-01 -1.13976190e+00 -1.57932792e+00 -6.17336578e-01
 -7.29100874e-01 -1.10784729e-01]
```
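
Since each coefficient acts on the log-odds, exponentiating it gives an odds ratio: the factor by which the odds of class 1 change when that feature increases by one unit and the others are held fixed. The sketch below refits the same model so that it runs on its own and lists the five coefficients with the largest magnitude; the names `odds_ratios` and `order` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)
model = LogisticRegression(solver="liblinear", random_state=42)
model.fit(X_train, y_train)

coefficients = model.coef_[0]
odds_ratios = np.exp(coefficients)

# Indices of the coefficients sorted by absolute size, largest first.
order = np.argsort(np.abs(coefficients))[::-1]
for i in order[:5]:
    print(f"{data.feature_names[i]:<25} coef={coefficients[i]:+.3f} "
          f"odds ratio={odds_ratios[i]:.3f}")
```

In this dataset class 1 is `benign`, so a positive coefficient raises the odds of a benign prediction. The features here are unscaled, so coefficient magnitudes reflect each feature's units and are not directly comparable with one another.
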
