# Baseline Models

Before building complex models, we must establish strong baselines
to understand whether machine learning adds real value.


## 2. Why Baselines Matter

Baselines help us:
- Detect data leakage
- Set realistic expectations
- Avoid over-engineering
- Justify model complexity


## 3. Common Baseline Strategies

### Classification
- Always predict the majority class
- Random guessing (rarely useful)
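
As a rough illustration of why random guessing is rarely the baseline to beat, the sketch below compares the two strategies on synthetic labels. The 70/30 class split, the names `X_demo` and `y_demo`, and the seed are assumptions made for this example. `DummyClassifier` ignores the features, so a placeholder feature matrix is enough.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Synthetic, illustrative labels: 70% class 1, 30% class 0 (assumed split)
y_demo = np.array([1] * 700 + [0] * 300)
X_demo = np.zeros((len(y_demo), 1))  # placeholder features; dummy models ignore them

# Strategy 1: always predict the most frequent class
majority = DummyClassifier(strategy="most_frequent").fit(X_demo, y_demo)
# Strategy 2: pick a class uniformly at random for every sample
uniform = DummyClassifier(strategy="uniform", random_state=0).fit(X_demo, y_demo)

majority_acc = majority.score(X_demo, y_demo)
uniform_acc = uniform.score(X_demo, y_demo)

print(f"majority class: {majority_acc:.3f}")
print(f"uniform random: {uniform_acc:.3f}")
```

On imbalanced labels the majority-class strategy scores the majority share, while uniform guessing hovers around one over the number of classes, so the majority-class score is usually the harder bar to clear.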

### Regression
- Predict the mean of the target
- Predict the median of the target
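
The choice between mean and median depends on the error metric: the mean is the constant that minimizes squared error, while the median minimizes absolute error. The following is a minimal sketch on made-up target values with one outlier; the array and the helper names `mse` and `mae` are illustrative assumptions.

```python
import numpy as np

# Illustrative target with one large outlier (made-up values)
y_demo = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

mean_pred = y_demo.mean()        # 22.0, pulled upward by the outlier
median_pred = np.median(y_demo)  # 3.0, unaffected by the outlier


def mse(y, constant):
    """Mean squared error of predicting `constant` for every sample."""
    return np.mean((y - constant) ** 2)


def mae(y, constant):
    """Mean absolute error of predicting `constant` for every sample."""
    return np.mean(np.abs(y - constant))


for name, pred in [("mean", mean_pred), ("median", median_pred)]:
    print(
        f"{name:>6}: prediction={pred:.1f}, "
        f"MSE={mse(y_demo, pred):.1f}, MAE={mae(y_demo, pred):.1f}"
    )
```

When the target is heavily skewed or the project reports absolute error, the median is often the fairer baseline; for squared-error metrics the mean is the natural choice.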


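```python
import pandas as pd  # backs the DataFrame/Series returned by as_frame=True
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load features as a DataFrame and the binary target as a Series
dataset = load_breast_cancer(as_frame=True)
X = dataset.data
y = dataset.target

# Hold out 20% for testing; stratify to keep class proportions equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y
)
```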


## 5. Majority Class Baseline (Classification)

This baseline always predicts the most frequent class
observed in the training data.


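```python
from sklearn.dummy import DummyClassifier

# Always predict the most frequent class seen in y_train; features are ignored
baseline_clf = DummyClassifier(strategy="most_frequent")
baseline_clf.fit(X_train, y_train)

# score() returns accuracy on the held-out test set
baseline_accuracy = baseline_clf.score(X_test, y_test)
baseline_accuracy
```

Output:

```
0.631578947368421
```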
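
A single accuracy number hides how the baseline behaves per class. The sketch below rebuilds the same split so that it runs on its own and then inspects the confusion matrix and per-class recall. The use of `confusion_matrix` and `recall_score` is an addition for illustration, not part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Recreate the split used above so this cell is self-contained
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

baseline_clf = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
y_pred = baseline_clf.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)

# Recall per class: share of each true class that the baseline labels correctly
recall_per_class = recall_score(y_test, y_pred, average=None, labels=[0, 1])
print(recall_per_class)
```

Every prediction falls in the majority-class column, so the minority class has zero recall even though overall accuracy looks moderate.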

## 6. Interpreting the Baseline

This accuracy does not mean the model is good.
The classifier ignores the features entirely, so the score simply equals
the majority class's share of the test set (about 63% here).

Any meaningful model must outperform this baseline
by a significant margin.
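
One way to check that margin is to score a simple candidate model on the same split. The sketch below is an assumption-laden example rather than the notebook's own method: the scaled `LogisticRegression` pipeline and the name `candidate` are chosen here purely for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same data and split as the baseline cell, repeated so this runs on its own
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Reference point: majority-class baseline
baseline_clf = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Hypothetical candidate model: standardized features + logistic regression
candidate = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
candidate.fit(X_train, y_train)

baseline_acc = baseline_clf.score(X_test, y_test)
candidate_acc = candidate.score(X_test, y_test)

print(f"baseline:  {baseline_acc:.3f}")
print(f"candidate: {candidate_acc:.3f}")
print(f"gain:      {candidate_acc - baseline_acc:+.3f}")
```

Reporting the gain alongside the raw score keeps the comparison honest; with a small test set, the margin should also be read with some caution.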


## 7. A Common Mistake

Achieving high accuracy without comparing to a baseline
can be misleading.

A model with 70% accuracy is impressive only if
the baseline is significantly lower.
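
The arithmetic behind this point can be sketched with two made-up label sets. The helper `majority_baseline_accuracy`, the label arrays, and the 70% model score are all hypothetical, and the baseline is computed on the same labels for simplicity.

```python
import numpy as np


def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most frequent label in y."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.sum()


model_accuracy = 0.70  # hypothetical score reported for some model

# Two illustrative label sets with different class balance
balanced = np.array([0] * 50 + [1] * 50)
skewed = np.array([0] * 90 + [1] * 10)

for name, labels in [("balanced", balanced), ("skewed", skewed)]:
    base = majority_baseline_accuracy(labels)
    print(f"{name}: baseline={base:.2f}, gain={model_accuracy - base:+.2f}")
```

On the balanced labels a 70% model beats the baseline by 20 points; on the skewed labels the same score is 20 points worse than always predicting the majority class.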


## 8. Regression Baseline (Conceptual)

For regression tasks, a common baseline is predicting
the mean value of the target variable.

This sets a minimum performance bar for regression models.
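
A minimal sketch of such a regression baseline uses scikit-learn's `DummyRegressor`. The diabetes dataset, the split parameters, and the choice of MAE as a metric are assumptions made for this example; the notebook itself only describes the idea conceptually.

```python
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Illustrative regression dataset bundled with scikit-learn
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Always predict the mean of the training target; features are ignored
baseline_reg = DummyRegressor(strategy="mean").fit(X_train, y_train)
y_pred = baseline_reg.predict(X_test)

baseline_r2 = baseline_reg.score(X_test, y_test)    # R^2 on the test set
baseline_mae = mean_absolute_error(y_test, y_pred)  # in target units

print(f"baseline R^2: {baseline_r2:.3f}")
print(f"baseline MAE: {baseline_mae:.1f}")
```

A constant prediction can never achieve a positive R² on the data it is scored on, so any regression model worth keeping should report a clearly positive R² and a lower error than this baseline.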


## 9. Summary

In this notebook, we:
- Defined what a baseline is
- Explained why baselines are critical
- Implemented a majority-class baseline for classification
- Learned how to interpret baseline performance

Any future model must outperform this baseline to be considered useful.
