# IRI Random Forest – Cross-Validation Analysis (v1)

This notebook analyzes the Random Forest model trained to predict IRI using
Z-axis vibration features and speed.

The objective is to:
- Compare performance across cross-validation folds
- Identify underfitting / overfitting behavior
- Analyze per-video prediction quality
- Decide next steps for feature improvement


In [17]:
import os
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("seaborn-v0_8")

In [18]:
BASE_DIR = "results_rf_3"

rows = []

for fold_dir in sorted(glob.glob(os.path.join(BASE_DIR, "fold_*"))):
    fold_id = os.path.basename(fold_dir)

    train_metrics_path = os.path.join(fold_dir, "train_results", "train_metrics.csv")
    test_metrics_path = os.path.join(fold_dir, "test_results", "test_metrics.csv")

    if not (os.path.exists(train_metrics_path) and os.path.exists(test_metrics_path)):
        continue

    train_df = pd.read_csv(train_metrics_path)
    test_df = pd.read_csv(test_metrics_path)

    # Take the first row of each metrics file.
    rows.append({
        "fold": fold_id,
        "train_rmse": train_df["rmse"].iloc[0],
        "train_dummy_rmse": train_df["dummy_rmse"].iloc[0],
        "train_rrmse": train_df["rrmse"].iloc[0],
        "train_corr": train_df["correlation"].iloc[0],
        "test_rmse": test_df["rmse"].iloc[0],
        "test_dummy_rmse": test_df["dummy_rmse"].iloc[0],
        "test_rrmse": test_df["rrmse"].iloc[0],
        "test_corr": test_df["correlation"].iloc[0],
    })

metrics_df = pd.DataFrame(rows)
metrics_df


Unnamed: 0,fold,train_rmse,train_dummy_rmse,train_rrmse,train_corr,test_rmse,test_dummy_rmse,test_rrmse,test_corr
0,fold_1,0.788686,1.897783,0.415583,0.941189,0.911881,0.580692,1.570333,0.349665
1,fold_2,0.786944,1.811941,0.43431,0.930344,1.107875,1.175179,0.942729,0.494081
2,fold_3,0.747222,1.870381,0.399503,0.940915,1.232126,1.140521,1.080319,0.186856
3,fold_4,0.527697,1.03024,0.512208,0.899362,3.045953,2.38548,1.276872,0.099918
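
The `rrmse` columns in the table above appear to be the model RMSE divided by the dummy-baseline RMSE. The sketch below checks that assumption against the displayed values and flags which folds beat the baseline on the test set. The column names `rrmse_gap` and `beats_dummy` are hypothetical and introduced here only for illustration; the numbers are copied from the table rather than reloaded from disk.

```python
import numpy as np
import pandas as pd

# Per-fold values copied from the metrics table above.
metrics = pd.DataFrame({
    "fold": ["fold_1", "fold_2", "fold_3", "fold_4"],
    "train_rmse": [0.788686, 0.786944, 0.747222, 0.527697],
    "train_dummy_rmse": [1.897783, 1.811941, 1.870381, 1.030240],
    "train_rrmse": [0.415583, 0.434310, 0.399503, 0.512208],
    "test_rmse": [0.911881, 1.107875, 1.232126, 3.045953],
    "test_dummy_rmse": [0.580692, 1.175179, 1.140521, 2.385480],
    "test_rrmse": [1.570333, 0.942729, 1.080319, 1.276872],
})

# Assumption being checked: rrmse == rmse / dummy_rmse (up to rounding).
for split in ("train", "test"):
    ratio = metrics[f"{split}_rmse"] / metrics[f"{split}_dummy_rmse"]
    assert np.allclose(ratio, metrics[f"{split}_rrmse"], atol=1e-4)

# Hypothetical helper columns: train/test gap in relative RMSE, and whether
# the model does better than the dummy baseline on the test split.
metrics["rrmse_gap"] = metrics["test_rrmse"] - metrics["train_rrmse"]
metrics["beats_dummy"] = metrics["test_rrmse"] < 1.0

print(metrics[["fold", "train_rrmse", "test_rrmse", "rrmse_gap", "beats_dummy"]])
```

Comparing relative rather than absolute RMSE is a deliberate choice here: the test dummy RMSE differs a lot between folds, so absolute RMSE alone can make a fold look worse simply because its target values vary more.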


In [19]:
summary_df = metrics_df.drop(columns=["fold"]).agg(["mean", "std"]).T
summary_df


Unnamed: 0,mean,std
train_rmse,0.712637,0.124772
train_dummy_rmse,1.652586,0.416439
train_rrmse,0.440401,0.04994
train_corr,0.927952,0.019718
test_rmse,1.574459,0.989814
test_dummy_rmse,1.320468,0.760484
test_rrmse,1.217563,0.272234
test_corr,0.28263,0.174892


In [20]:
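# Per-frame test predictions for fold 4, the fold with the highest test RMSE.
test_out = pd.read_csv(
    os.path.join(BASE_DIR, "fold_4", "test_results", "perframe", "test_predictions.csv")
)
test_out.head()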

Unnamed: 0,sensor_video_id,db_video_id,mt,z_std,z_rms,z_peak_to_peak,speed,iri_est,iri_log,iri_pred
0,6966757704,330,1,0.195377,0.301713,0.81747,37.9425,1.944,1.079769,1.810544
1,6966757704,330,2,0.033027,0.044435,0.181436,37.9425,1.944,1.079769,1.843316
2,6966757704,330,3,0.065358,0.072086,0.379257,37.9425,1.944,1.079769,1.843177
3,6966757704,330,4,0.065358,0.072086,0.379257,37.901,8.0394,2.201593,1.84259
4,6966757704,330,5,0.106962,0.113433,0.530914,37.901,8.0394,2.201593,1.843265


In [21]:
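from scipy.stats import pearsonr

# Pearson correlation between iri_est and iri_pred, computed per video.
# Selecting the two columns before apply() keeps the grouping column out of
# the lambda's input, which avoids the pandas >= 2.2 deprecation warning.
video_corr = (
    test_out
    .groupby("sensor_video_id")[["iri_est", "iri_pred"]]
    .apply(lambda x: pearsonr(x["iri_est"], x["iri_pred"])[0])
    .reset_index(name="corr")
    .sort_values("corr")
)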

video_corr


Unnamed: 0,sensor_video_id,corr
1,6966757704,0.019114
2,7638016176,0.09282
0,5665902180,0.262735
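
Correlation alone does not show how far off the predictions are or whether they vary at all. As a complement, the sketch below computes per-video RMSE, mean bias, and the spread of `iri_est` versus `iri_pred`. The helper `per_video_diagnostics` is hypothetical, and the input is only the five rows of `test_out` displayed earlier (all from one video), so the result illustrates the computation rather than the full fold.

```python
import numpy as np
import pandas as pd

# First five rows of test_out as displayed above (a single video).
sample = pd.DataFrame({
    "sensor_video_id": [6966757704] * 5,
    "iri_est": [1.944, 1.944, 1.944, 8.0394, 8.0394],
    "iri_pred": [1.810544, 1.843316, 1.843177, 1.84259, 1.843265],
})


def per_video_diagnostics(df):
    """Hypothetical helper: per-video RMSE, mean bias, and value spread."""
    err = df["iri_pred"] - df["iri_est"]
    return (
        df.assign(err=err, sq_err=err ** 2)
        .groupby("sensor_video_id")
        .agg(
            n=("err", "size"),                              # frames per video
            rmse=("sq_err", lambda s: np.sqrt(s.mean())),   # root mean squared error
            bias=("err", "mean"),                           # negative = under-prediction
            est_std=("iri_est", "std"),                     # spread of the target
            pred_std=("iri_pred", "std"),                   # spread of the predictions
        )
        .reset_index()
    )


diag = per_video_diagnostics(sample)
print(diag)
```

On the full frame this would be called as `per_video_diagnostics(test_out)`. A `pred_std` far below `est_std` would suggest the model outputs nearly constant values for that video, which is one way a low correlation can arise.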


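Training performance is consistently strong across folds
(mean train correlation ≈ 0.93, std ≈ 0.02; mean train rRMSE ≈ 0.44).

Test performance is much weaker and highly variable
(mean test correlation ≈ 0.28, std ≈ 0.17). In three of the four folds the test RMSE is higher than the dummy-baseline RMSE (test rRMSE > 1; mean ≈ 1.22), and only fold 2 beats the baseline.

Fold 4 is a clear outlier in absolute terms, with:
- Very high test RMSE (≈ 3.05, versus 0.91 to 1.23 for the other folds)
- Very low test correlation (≈ 0.10)

Within fold 4, the per-video correlations are low for all three test videos (≈ 0.02 to 0.26).

This train/test gap indicates overfitting: the model fits the training data well but generalizes poorly to the test videos.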