# Classical Machine Learning Baselines

This notebook establishes baseline performance using classical machine
learning models on the preprocessed NOAA weather dataset. These baselines
are used to contextualize the performance of the LSTM model.


In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv("../data/processed/kochi_weather_scaled.csv")

In [2]:
# Define inputs and target: features at time t predict temperature at time t+1
X = df[['temperature', 'humidity', 'pressure']].values[:-1]
y = df['temperature'].values[1:]

In [3]:
# Chronological 80/20 train/test split (no shuffling, so the test set follows the training set in time)
split = int(0.8 * len(X))

X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

In [4]:
# Linear regression
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X_train, y_train)

y_pred_lr = lr.predict(X_test)

# Random forest regressor
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=100,
    random_state=42
)
rf.fit(X_train, y_train)

y_pred_rf = rf.predict(X_test)

In [5]:
# Evaluate baseline performance on the held-out test set
from sklearn.metrics import mean_absolute_error, mean_squared_error

def evaluate(y_true, y_pred, name):
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print(f"{name} → MAE: {mae:.4f}, RMSE: {rmse:.4f}")

evaluate(y_test, y_pred_lr, "Linear Regression")
evaluate(y_test, y_pred_rf, "Random Forest")

Linear Regression → MAE: 0.0753, RMSE: 0.0961
Random Forest → MAE: 0.0798, RMSE: 0.1034


## Baseline vs LSTM Comparison

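The classical models treat weather forecasting as a static regression problem:
each prediction of the next temperature value uses only the current
temperature, humidity, and pressure, so dependencies that span more than one
time step are not captured. On the test split, Linear Regression
(MAE 0.0753, RMSE 0.0961) is slightly more accurate than Random Forest
(MAE 0.0798, RMSE 0.1034); both errors are in the units of the scaled data.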

The LSTM, by contrast, takes a window of past observations as input and
models the time-series dynamics explicitly, which is why it significantly
outperforms these baselines.
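
To make the idea of "historical windows" concrete, the sketch below shows one common way to turn a feature matrix into overlapping sequences for a recurrent model. It is only an illustration: the helper name `make_windows`, the window length of 4, and the toy data are assumptions, and the windowing used by the actual LSTM may differ.

```python
import numpy as np

def make_windows(features, target, window):
    """Hypothetical helper: build (samples, window, n_features) inputs
    and the matching next-step targets."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = len(features) - window
    if n_samples <= 0:
        raise ValueError("window must be shorter than the series")
    # Each sample holds `window` consecutive rows of features.
    X_seq = np.stack([features[i:i + window] for i in range(n_samples)])
    # The label is the target value immediately after each window.
    y_seq = target[window:]
    return X_seq, y_seq

# Toy data (assumed for illustration): 10 time steps, 3 features.
toy = np.arange(30, dtype=float).reshape(10, 3)
X_seq, y_seq = make_windows(toy, toy[:, 0], window=4)
print(X_seq.shape, y_seq.shape)  # (6, 4, 3) (6,)
```

Whereas the baselines above receive a single row per prediction, each sample here carries several consecutive time steps, which is the extra context a sequence model can exploit.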
