# Log-Loss (Cross-Entropy) for Statistical Classification
### Use Case: Hours Studied vs. Pass/Fail Prediction
This notebook shows how to calculate Log-Loss (Cross-Entropy Loss) for a binary classification problem. It fits a logistic regression model to synthetic "hours studied" data and scores the model's predicted probabilities.

## 📘 What is Log-Loss (Cross-Entropy)?
Log-Loss measures the performance of a classification model whose output is a probability between 0 and 1. It is small when the model assigns high probability to the class that actually occurred, so a lower Log-Loss indicates a better-performing model.

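**Binary Cross-Entropy Formula (single observation):**
\[\text{LogLoss} = -[y \log(p) + (1 - y) \log(1 - p)]\]
- `y` is the actual class label (0 or 1)
- `p` is the predicted probability of class 1
- `log` is the natural logarithm

For a dataset of \(N\) observations, Log-Loss is the average of the per-observation values:
\[\text{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i) \log(1 - p_i)\right]\]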
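Plugging a few values into the single-observation formula makes the penalty structure concrete. This is a minimal sketch; `per_sample_log_loss` is a hypothetical helper name, and it uses the natural logarithm (`np.log`), matching the formula.

```python
import numpy as np

def per_sample_log_loss(y, p):
    """Binary cross-entropy for one label y (0 or 1) and predicted P(y = 1) = p."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# The true label is 1 (pass) in every case; only the predicted probability changes.
for p in [0.9, 0.6, 0.4, 0.1, 0.01]:
    print(f"y=1, p={p:<5} -> loss = {per_sample_log_loss(1, p):.4f}")
```

The loss rises from about 0.105 at p = 0.9 to about 4.605 at p = 0.01. Each further factor-of-ten drop in the probability of the true class adds roughly 2.3 (that is, ln 10) to the loss.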

## 🧪 Generate Example Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

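# Create synthetic data: 100 students, each studying between 0 and 10 hours
np.random.seed(42)
hours_studied = np.random.uniform(0, 10, 100)
# A student passes if study time plus Gaussian noise (sd = 2) exceeds 5 hours
actual_pass = (hours_studied + np.random.normal(0, 2, 100)) > 5
actual_pass = actual_pass.astype(int)  # True/False -> 1/0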

df = pd.DataFrame({'Hours_Studied': hours_studied, 'Pass': actual_pass})
df.head()
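Because the labels come from a known rule (pass if `hours + noise > 5`, with noise ~ N(0, 2²)), the true pass probability for a given number of hours can be written down exactly: P(pass | h) = P(noise > 5 − h) = Φ((h − 5) / 2), where Φ is the standard normal CDF. The sketch below computes it with `scipy.stats.norm.cdf`. `true_pass_probability` is a hypothetical helper, and the seed and call order are assumed to match the cell above. The Log-Loss of these "oracle" probabilities gives a rough reference point for the fitted model later on.

```python
import numpy as np
from scipy.stats import norm

def true_pass_probability(hours, threshold=5.0, noise_sd=2.0):
    # Assumes the notebook's rule: pass <=> hours + N(0, noise_sd^2) > threshold,
    # so P(pass | hours) = Phi((hours - threshold) / noise_sd).
    return norm.cdf((np.asarray(hours, dtype=float) - threshold) / noise_sd)

print(true_pass_probability([3, 5, 7]))  # roughly 0.159, 0.5, 0.841

# Regenerate the same synthetic data (same seed and call order as above).
np.random.seed(42)
hours_studied = np.random.uniform(0, 10, 100)
actual_pass = ((hours_studied + np.random.normal(0, 2, 100)) > 5).astype(int)

p_true = true_pass_probability(hours_studied)
oracle_loss = -np.mean(actual_pass * np.log(p_true) + (1 - actual_pass) * np.log(1 - p_true))
print(f"Log-Loss of the true probabilities: {oracle_loss:.4f}")
```

Note that the sigmoid used by logistic regression only approximates this normal-CDF curve. Even with unlimited data, the fitted model would not reproduce Φ exactly.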

In [None]:
plt.figure(figsize=(8, 5))
plt.scatter(df['Hours_Studied'], df['Pass'], alpha=0.7, c=df['Pass'], cmap='bwr')
plt.xlabel('Hours Studied')
plt.ylabel('Pass (1) / Fail (0)')
plt.title('Hours Studied vs. Pass/Fail')
plt.grid(True)
plt.show()

## 🔄 Sigmoid Function

In [None]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Example usage:
z = np.linspace(-10, 10, 100)
plt.plot(z, sigmoid(z))
plt.title('Sigmoid Function')
plt.xlabel('z')
plt.ylabel('sigmoid(z)')
plt.grid(True)
plt.show()
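The `sigmoid` above computes `np.exp(-z)`, which overflows for `z` below roughly −709. The result still rounds to 0, but NumPy emits a `RuntimeWarning`. The sketch below shows one common numerically stable rewrite. `stable_sigmoid` is a hypothetical name; SciPy ships an equivalent as `scipy.special.expit`.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so it cannot overflow
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z)). Same function, rearranged.
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # 0, 0.5 and 1, with no overflow warning
```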

## 🧠 Predicting Probabilities with Logistic Model

In [None]:
# Fit a simple logistic model using one feature (Hours Studied)
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
X = df[['Hours_Studied']]
y = df['Pass']
model.fit(X, y)

pred_probs = model.predict_proba(X)[:, 1]  # Probability of class 1 (pass)
df['Predicted_Prob'] = pred_probs
df.head()
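For a single feature, `predict_proba` is just the sigmoid applied to a linear score, `sigmoid(w * hours + b)`. The sketch below reads `w` and `b` from scikit-learn's `coef_` and `intercept_` attributes and checks that claim. It rebuilds the notebook's data (assuming the same seed and call order) and uses `LogisticRegression`'s defaults, which include L2 regularization with `C=1.0`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Rebuild the notebook's synthetic data and model.
np.random.seed(42)
hours = np.random.uniform(0, 10, 100)
passed = ((hours + np.random.normal(0, 2, 100)) > 5).astype(int)
X = pd.DataFrame({'Hours_Studied': hours})

model = LogisticRegression().fit(X, passed)
w = model.coef_[0, 0]    # learned weight on Hours_Studied
b = model.intercept_[0]  # learned intercept

# Probability of passing = sigmoid(w * hours + b)
manual = 1 / (1 + np.exp(-(w * hours + b)))
print(f"w = {w:.4f}, b = {b:.4f}")
print("matches predict_proba:", np.allclose(manual, model.predict_proba(X)[:, 1]))
# The predicted probability crosses 0.5 where w * hours + b = 0.
print(f"50% pass point: {-b / w:.2f} hours")
```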

## 📉 Implementing Log-Loss

In [None]:
def binary_cross_entropy(y_true, y_pred):
    epsilon = 1e-15  # Avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Calculate log-loss on our predictions
log_loss_value = binary_cross_entropy(df['Pass'], df['Predicted_Prob'])
print(f'Log-Loss: {log_loss_value:.4f}')
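A hand-written metric is easy to get subtly wrong, so it helps to check it against `sklearn.metrics.log_loss`. A single Log-Loss number also means little without a reference point. A natural baseline is a model that ignores hours entirely and always predicts the overall pass rate. The sketch below does both, rebuilding the notebook's data and model (same seed assumed). Like the value printed above, these are in-sample scores on the training data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def binary_cross_entropy(y_true, y_pred):
    epsilon = 1e-15  # Avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Rebuild the notebook's data and fitted probabilities.
np.random.seed(42)
hours = np.random.uniform(0, 10, 100)
passed = ((hours + np.random.normal(0, 2, 100)) > 5).astype(int)
X = pd.DataFrame({'Hours_Studied': hours})
probs = LogisticRegression().fit(X, passed).predict_proba(X)[:, 1]

ours = binary_cross_entropy(passed, probs)
sk = log_loss(passed, probs)  # 1-D input = probability of the positive class

# Baseline: always predict the observed pass rate, regardless of hours.
base_rate = passed.mean()
baseline = binary_cross_entropy(passed, np.full(len(passed), base_rate))

print(f"hand-written: {ours:.4f}  sklearn: {sk:.4f}  base-rate baseline: {baseline:.4f}")
```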

## 🎤 Key Talking Points
1. **Why Log-Loss Matters**: Log-Loss penalizes confident but wrong predictions far more heavily than mildly wrong ones. The loss grows without bound as the predicted probability of the true class approaches 0. This encourages models to produce well-calibrated probabilities.
2. **Hours Studied as a Feature**: Education is a relatable domain. Logistic regression maps hours studied to a probability of passing, mimicking a simple real-world decision model (the data here are synthetic).
3. **Sigmoid + Cross-Entropy**: Together these form the core of logistic regression for binary classification. The sigmoid maps a linear score to a probability, and cross-entropy measures the cost of that probability given the true label.
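One concrete reason the two are paired: with p = sigmoid(z), the derivative of the per-sample cross-entropy with respect to the score z simplifies to p − y. This follows because dL/dp = −y/p + (1 − y)/(1 − p) and dp/dz = p(1 − p). The sketch below checks this numerically with a central finite difference. `bce_from_logit` is a hypothetical helper, and the score −0.7 is an arbitrary test value.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce_from_logit(y, z):
    # Per-sample cross-entropy where the predicted probability is sigmoid(z)
    p = sigmoid(z)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

z, h = -0.7, 1e-6  # arbitrary score and finite-difference step
for y in (0.0, 1.0):
    numeric = (bce_from_logit(y, z + h) - bce_from_logit(y, z - h)) / (2 * h)
    analytic = sigmoid(z) - y
    print(f"y={y:.0f}: finite difference {numeric:.6f}, p - y {analytic:.6f}")
```

This simple gradient is what gradient-based fitting of logistic regression uses. Each example pushes the score in proportion to how far its predicted probability is from its label.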