Persistence Baseline & Model Comparison

Context

The project currently includes:
	•	a fully reproducible end-to-end pipeline
	•	structural and content validation tests (ISSUE #6 closed)
	•	separate notebooks for each model family

Before selecting a final production model (selected_model.py), it is necessary to compare all ML models against a trivial baseline to properly contextualize performance gains.

⸻

Objective

Establish a persistence baseline to answer the key question:

Does the ML model actually improve over something trivial?

Baseline definition:
ŷ(t) = y(t−1)

This baseline serves as a minimum scientific benchmark.
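
As a minimal sketch of the definition above, the persistence forecast is just the observed series shifted by one step. The function name `persistence_forecast` and the sample values are hypothetical, chosen only for illustration; in a pandas-based notebook the same result would typically come from `Series.shift(1)`.

```python
from typing import List, Optional, Sequence


def persistence_forecast(y: Sequence[float]) -> List[Optional[float]]:
    """Return y_hat where y_hat[t] = y[t-1].

    The first position has no previous observation, so it is None.
    """
    if len(y) == 0:
        return []
    # Shift the series forward by one step.
    return [None] + list(y[:-1])


# Illustrative values only, not project data.
observed = [10.0, 12.0, 11.0, 15.0]
predicted = persistence_forecast(observed)
```

The first element carries no prediction and would normally be dropped before computing any metric.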

Where the baseline lives
	•	Notebook: 06_model_comparison.ipynb
	•	❌ Not part of the automated pipeline
	•	❌ Not implemented in models/
	•	❌ Not covered by pytest tests

The baseline is analytical, not production code.

Key rules
	•	Use the same temporal split as ML models
	•	Evaluate on the same test set
	•	Use the same metrics as ML
	•	Do not tune or optimize the baseline
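
The rules above can be sketched as follows. The project's actual split is not shown here, so `chronological_split`, `persistence_on_test`, and the `test_size` parameter are hypothetical stand-ins; the sketch also assumes one-step-ahead evaluation, where the true value at t−1 is known when predicting t.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    y: Sequence[float], test_size: int
) -> Tuple[List[float], List[float]]:
    """Split without shuffling: the last `test_size` points form the test set."""
    if not 0 < test_size < len(y):
        raise ValueError("test_size must be between 1 and len(y) - 1")
    cut = len(y) - test_size
    return list(y[:cut]), list(y[cut:])


def persistence_on_test(train: Sequence[float], test: Sequence[float]) -> List[float]:
    """Predict every test point with the value observed one step earlier."""
    # The first test prediction uses the last training observation,
    # so no test point is lost and nothing from the future leaks in.
    return [train[-1]] + list(test[:-1])


# Illustrative values only, not project data.
series = [3.0, 4.0, 6.0, 5.0, 7.0, 8.0]
train, test = chronological_split(series, test_size=2)
y_pred = persistence_on_test(train, test)
```

Because the baseline has no parameters, there is nothing to fit on the training portion; it is used only to seed the first test prediction.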

⸻

Metrics to compare
	•	MAE
	•	RMSE
	•	R²

No additional metrics at this stage.
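
For reference, the three metrics can be written out directly from their standard definitions. This is a dependency-free sketch; the helper name `regression_metrics` is hypothetical, and the notebooks may well rely on a library such as scikit-learn (`mean_absolute_error`, `mean_squared_error`, `r2_score`) instead.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Compute MAE, RMSE and R2 for two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R2 is undefined when the target is constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Illustrative values only, not project data.
metrics = regression_metrics([2.0, 4.0, 6.0], [3.0, 4.0, 5.0])
```

Applying the same function to the baseline and to every ML model keeps the comparison like-for-like.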

⸻

Outputs to persist (minimum)

Saved under data/results/:
	•	metrics_baseline_vXXX.csv
		•	columns: model, MAE, RMSE, R2
		•	model = "persistence"

Optional:
	•	predictions_baseline_vXXX.csv
		•	columns: date, y_true, y_pred, error
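
One possible way to write the metrics file with the column layout listed above, using only the standard library. The function name `save_baseline_metrics` is hypothetical, and the output path is passed in by the caller because the version suffix (vXXX) is not specified here.

```python
import csv
from pathlib import Path
from typing import Dict

# Column order taken from the file description above.
METRIC_COLUMNS = ["model", "MAE", "RMSE", "R2"]


def save_baseline_metrics(metrics: Dict[str, float], path: Path) -> Path:
    """Write a one-row CSV with columns model, MAE, RMSE, R2."""
    # Create data/results/ (or whatever parent is given) if it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=METRIC_COLUMNS)
        writer.writeheader()
        # The model column is fixed to "persistence" for the baseline.
        writer.writerow({"model": "persistence", **metrics})
    return path
```

It would be called with a dictionary holding the MAE, RMSE, and R2 values and a path under data/results/. `csv.DictWriter` raises `ValueError` if the dictionary contains keys outside the four expected columns, which guards against silently adding extra metrics.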

⸻

Central comparison table (core artifact)
model            MAE    RMSE    R2
persistence      …      …       …
random_forest    …      …       …
xgboost          …      …       …
neural_net       …      …       …

Relative improvement vs baseline

Add a derived column:
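improvement_vs_baseline_% = 100 × (RMSE_baseline − RMSE_model) / RMSE_baseline
A positive value means the model has a lower RMSE than the baseline; a negative value means it is worse than persistence.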
Improvements are always computed against the baseline, never between ML models.
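
A short sketch of the derived column, multiplying the ratio by 100 so the value matches the `_%` suffix in its name. The function name and the RMSE figures below are made up for illustration and are not project results.

```python
from typing import Dict


def improvement_vs_baseline(
    rmse_by_model: Dict[str, float], baseline: str = "persistence"
) -> Dict[str, float]:
    """Percentage RMSE reduction of each model relative to the baseline."""
    rmse_baseline = rmse_by_model[baseline]
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    # Every model is compared with the baseline, never with another model.
    return {
        model: 100.0 * (rmse_baseline - rmse) / rmse_baseline
        for model, rmse in rmse_by_model.items()
    }


# Made-up RMSE values, for illustration only.
example = {"persistence": 10.0, "model_a": 8.0, "model_b": 12.5}
improvement = improvement_vs_baseline(example)
```

Here `model_a` would show a 20% improvement, while `model_b` would show −25%, meaning it performs worse than simple persistence.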

⸻

Suggested visualization (optional)
	•	Bar plot of RMSE by model
	•	Baseline clearly highlighted

⸻

Expected conclusion (Markdown)

The notebook should close with an explicit statement such as:

“Model X reduces RMSE by approximately Y% compared to the persistence baseline, indicating that it captures non-trivial temporal and meteorological patterns beyond simple persistence.”

This text will later be reused in:
	•	README
	•	model selection justification
	•	technical interviews / reviews

⸻

Criterion to move to selected_model.py

Proceed only when:
	•	all candidate models have been evaluated
	•	the comparison table is complete
	•	improvement over the baseline is clear and defensible

Only then:
	•	implement models/selected_model.py
	•	freeze the modeling decision

What NOT to do
	•	Do not integrate the baseline into the pipeline
	•	Do not force ML to “win”
	•	Do not add unnecessary complexity
	•	Do not select a model without this comparison

Status

📌 Bookmark saved: development deferred
To be revisited once all experimentation notebooks are finalized.