# Logistic Regression

In this notebook, we will explore **Logistic Regression**, a classification algorithm used to predict discrete outcomes (e.g., Yes/No, Spam/Not Spam).

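Unlike Linear Regression, which predicts an unbounded continuous value, Logistic Regression passes a linear combination of the features through the **sigmoid function** to produce a probability between 0 and 1: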

\[ P(y=1|X) = \frac{1}{1 + e^{-(b_0 + b_1X)}} \]

- If probability ≥ 0.5 → predict class 1
- If probability < 0.5 → predict class 0
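The formula and threshold rule above can be sketched in a few lines of NumPy. The coefficients `b0` and `b1` and the input values `x` are hypothetical, chosen only to illustrate the computation; they are not fitted to any data.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and inputs, for illustration only
b0, b1 = -4.0, 2.0
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

prob = sigmoid(b0 + b1 * x)        # P(y=1 | x) for each input
pred = (prob >= 0.5).astype(int)   # apply the 0.5 decision threshold

print("Probabilities:", np.round(prob, 3))
print("Predicted classes:", pred)
```

The probability crosses 0.5 exactly where the linear score `b0 + b1 * x` crosses zero, which is why the decision boundary of Logistic Regression is linear in the features.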

## 1. Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## 2. Load Dataset
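We’ll use the **Breast Cancer dataset** bundled with scikit-learn, a binary classification dataset with 569 samples and 30 numeric features. The target is encoded as 0 = malignant and 1 = benign.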

In [None]:
data = load_breast_cancer()
X = data.data
y = data.target

print("Features:", data.feature_names[:5], "...")
print("Target names:", data.target_names)
print("Dataset shape:", X.shape)

## 3. Train-Test Split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("Training set:", X_train.shape)
print("Test set:", X_test.shape)
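Because the two classes are not equally frequent, it can be worth checking that the split preserves their proportions. The sketch below is one way to do that, assuming the optional `stratify=y` argument of `train_test_split`; the split above does not use it, and the variable names here are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# stratify=y is an assumption added for this sketch: it keeps the class
# ratio (approximately) the same in the training and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

train_frac = np.mean(y_tr == 1)  # share of class 1 in the training set
test_frac = np.mean(y_te == 1)   # share of class 1 in the test set

print("Class 1 share (train):", round(train_frac, 3))
print("Class 1 share (test): ", round(test_frac, 3))
```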

## 4. Train Logistic Regression Model

In [None]:
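# The features are unscaled, so the default lbfgs solver converges slowly;
# a generous max_iter helps avoid a ConvergenceWarning.
log_reg = LogisticRegression(max_iter=10000)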
log_reg.fit(X_train, y_train)

print("Coefficients shape:", log_reg.coef_.shape)
print("Intercept:", log_reg.intercept_)
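To connect the fitted `coef_` and `intercept_` back to the sigmoid formula, the probabilities can be recomputed by hand and compared with `predict_proba`. This is a minimal sketch that refits its own model so it runs on its own; `max_iter=10000` is an assumption made so the solver has room to converge on unscaled features.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=10000).fit(X_train, y_train)

# Linear score (log-odds) per test sample: z = b0 + b1*x1 + ... + b30*x30
z = X_test @ model.coef_.ravel() + model.intercept_[0]
manual_prob = 1.0 / (1.0 + np.exp(-z))   # sigmoid of the log-odds

sklearn_prob = model.predict_proba(X_test)[:, 1]  # P(class 1) from scikit-learn
print("Max abs difference:", np.abs(manual_prob - sklearn_prob).max())
```

The two results should agree up to floating-point error, which shows that the model is nothing more than a weighted sum of the features followed by a sigmoid.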

## 5. Predictions

In [None]:
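y_pred = log_reg.predict(X_test)
# Each row of predict_proba is [P(class 0), P(class 1)], here [P(malignant), P(benign)]
y_prob = log_reg.predict_proba(X_test)[:10]  # keep only the first 10 rows

print("Predictions:", y_pred[:10])
print("Probabilities (first 10 rows):\n", y_prob)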
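`predict` applies a 0.5 cut-off to the class 1 probability, but the probabilities make it possible to choose a different one. The sketch below compares the default with a stricter threshold of 0.8; that value is hypothetical and chosen only for illustration, and the model is refit here (with an assumed `max_iter=10000`) so the block is self-contained.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=10000).fit(X_train, y_train)

proba_class1 = model.predict_proba(X_test)[:, 1]   # column 1 = class 1 ("benign")

default_pred = (proba_class1 >= 0.5).astype(int)   # same rule predict() follows
strict_pred = (proba_class1 >= 0.8).astype(int)    # hypothetical stricter cut-off

print("Predicted class 1 at threshold 0.5:", default_pred.sum())
print("Predicted class 1 at threshold 0.8:", strict_pred.sum())
```

Raising the threshold can only reduce the number of samples assigned to class 1, trading recall on that class for precision.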

## 6. Model Evaluation

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=data.target_names))

cm = confusion_matrix(y_test, y_pred)
# fmt="d" prints the counts as integers rather than in scientific notation
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=data.target_names, yticklabels=data.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
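The numbers in the classification report follow directly from the four cells of the confusion matrix. The sketch below works this through on a small set of hypothetical labels (not the notebook's results) and checks the hand-computed values against scikit-learn's `precision_score` and `recall_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels, for illustration only
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_hat = np.array([0, 1, 0, 1, 1, 0, 1, 1])

# For binary labels, ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall)
print("sklearn precision/recall:",
      precision_score(y_true, y_hat), recall_score(y_true, y_hat))
```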

## 7. Key Notes
- Logistic Regression is used for **binary and multiclass classification**.
- Outputs probabilities via the **sigmoid (binary)** or **softmax (multiclass)** function.
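- Assumes a linear relationship between the input features and the log-odds of the outcome.
- Works well when the classes are approximately linearly separable. On perfectly separable data the unregularized coefficients grow without bound, which is one reason scikit-learn applies L2 regularization by default.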
- Evaluation metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC.
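ROC-AUC is listed among the metrics but is not computed in the evaluation step. A minimal sketch of how it could be computed with `roc_auc_score` follows; the block refits its own model so it runs independently, and `max_iter=10000` is an assumption made so the solver has room to converge.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=10000).fit(X_train, y_train)

# ROC-AUC is computed from scores or probabilities, not from hard 0/1 predictions
proba_class1 = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba_class1)

print("ROC-AUC:", round(auc, 4))
```

Unlike accuracy, ROC-AUC does not depend on a particular decision threshold: it measures how well the predicted probabilities rank class 1 samples above class 0 samples.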