Add CI coverage that detects obvious training behavior regressions before alpha.
Acceptance criteria:
- smoke checks cover at least one discrete and one continuous environment;
- compares against random baseline or fixed threshold;
- failure output points to the run and metric that regressed;
- runtime remains reasonable for PR CI.
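
One possible shape for such a check, as a minimal sketch rather than a settled design: a helper compares the trained agent's mean metric against either a random-baseline mean or a fixed threshold, and builds a failure message that names the run and the metric. All names here (`SmokeResult`, `smoke_check`, `run_id`, `margin`) are hypothetical, and the returns are assumed to be collected elsewhere by a short training job.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class SmokeResult:
    """Outcome of one smoke check (hypothetical structure)."""
    run_id: str       # identifier of the training run that was checked
    env_name: str     # environment the run was trained on
    metric: str       # name of the metric that was compared
    trained: float    # mean metric value from the trained agent
    reference: float  # baseline mean plus margin, or the fixed threshold
    passed: bool

    def message(self) -> str:
        # Names the run and the metric so a CI failure is easy to trace.
        status = "ok" if self.passed else "REGRESSION"
        return (
            f"[{status}] run={self.run_id} env={self.env_name} "
            f"metric={self.metric} trained={self.trained:.2f} "
            f"reference={self.reference:.2f}"
        )


def smoke_check(
    run_id: str,
    env_name: str,
    metric: str,
    trained_returns: Sequence[float],
    baseline_returns: Optional[Sequence[float]] = None,
    threshold: Optional[float] = None,
    margin: float = 0.0,
) -> SmokeResult:
    """Compare trained returns to a random baseline or a fixed threshold."""
    # Exactly one reference must be given: baseline samples or a threshold.
    if (baseline_returns is None) == (threshold is None):
        raise ValueError("provide exactly one of baseline_returns or threshold")
    if baseline_returns is not None:
        # Assumption: the agent must beat the random baseline by `margin`.
        reference = mean(baseline_returns) + margin
    else:
        reference = threshold
    trained = mean(trained_returns)
    return SmokeResult(
        run_id, env_name, metric, trained, reference, trained >= reference
    )
```

In a test, `assert result.passed, result.message()` would surface the run and metric in the CI log on failure. Keeping the comparison separate from the training loop means the pass/fail logic can be unit tested without running any environment.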