Select regressor beliefs per (event_start, regressor) in forecasting covariate assembly#2155
Select regressor beliefs per (event_start, regressor) in forecasting covariate assembly#2155Copilot wants to merge 6 commits into
(event_start, regressor) in forecasting covariate assembly#2155Conversation
Agent-Logs-Url: https://github.com/FlexMeasures/flexmeasures/sessions/c980e7ce-6145-48fd-a71b-cd845af19b4e Co-authored-by: BelhsanHmida <149331360+BelhsanHmida@users.noreply.github.com>
…ming Agent-Logs-Url: https://github.com/FlexMeasures/flexmeasures/sessions/c980e7ce-6145-48fd-a71b-cd845af19b4e Co-authored-by: BelhsanHmida <149331360+BelhsanHmida@users.noreply.github.com>
Agent-Logs-Url: https://github.com/FlexMeasures/flexmeasures/sessions/c980e7ce-6145-48fd-a71b-cd845af19b4e Co-authored-by: BelhsanHmida <149331360+BelhsanHmida@users.noreply.github.com>
Agent-Logs-Url: https://github.com/FlexMeasures/flexmeasures/sessions/c980e7ce-6145-48fd-a71b-cd845af19b4e Co-authored-by: BelhsanHmida <149331360+BelhsanHmida@users.noreply.github.com>
(event_start, regressor) in forecasting covariate assembly
Context: - PR #2155 needed to keep issue #2154's per-regressor covariate fix while preserving main's forecast belief-time filtering from PR #2134. Change: - Resolve main merge conflicts in forecasting covariate assembly. - Select latest known regressor values per event and regressor within each forecast belief-time step. - Add regression coverage for past/future mixed-belief regressor rows and realized-regressor leakage.
BelhsanHmida
left a comment
There was a problem hiding this comment.
Approved. This now addresses #2154 cleanly.
I re-reviewed the covariate selection logic after the follow-up changes. The implementation now selects the latest known non-null value per (event_start, regressor) while preserving the forecast belief_time admissibility rules, so it fixes the original row-level collapse without reintroducing future-belief leakage.
I also checked the previous concerns:
- realized regressor values are selected inside each forecast step, not precomputed across steps;
- selection uses latest known per regressor, not closest;
- existing belief-time boundary behavior from #2134 is preserved.
Flix6x
left a comment
There was a problem hiding this comment.
Nice! I don't have problems with the coding. Looks like it solves a real issue. I'd just like to understand the test cases better, and also have a better chance at understanding them years from now ("code is read much more often than it is written." someone wise once wrote).
| Select the latest non-null value per `(event_start, regressor)`. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| data : pd.DataFrame | ||
| Input frame with `event_start`, `belief_time`, and regressor columns. | ||
| regressor_columns : list[str] | ||
| Regressor columns to select values for independently. | ||
|
|
||
| Returns | ||
| ------- | ||
| pd.DataFrame | ||
| Wide frame with one row per event_start and one selected value | ||
| per regressor column. |
There was a problem hiding this comment.
Please move to rst-style docstrings. Maybe point your agent to https://github.com/FlexMeasures/flexmeasures/blob/main/.github/agents/documentation-developer-experience-specialist.md#python-docstring-conventions
| def capture_frame(self, df, sensors, sensor_names, start, end, **kwargs): | ||
| if sensor_names == self.future_regressors: | ||
| captured_future_frames.append(df.copy()) | ||
| return df | ||
|
|
||
| monkeypatch.setattr(BasePipeline, "detect_and_fill_missing_values", capture_frame) |
There was a problem hiding this comment.
Please annotate your tests. This part seems to be a clever trick to insert data into the pipeline without needing to involve the database.
| assert len(captured_future_frames) == 1 | ||
| selected = captured_future_frames[0].set_index("event_start") | ||
| assert selected.loc[pd.Timestamp("2025-01-08T09:00:00"), regressor_a] == 4.0 | ||
| assert selected.loc[pd.Timestamp("2025-01-08T10:00:00"), regressor_a] == 5.0 | ||
| assert selected.loc[pd.Timestamp("2025-01-08T10:00:00"), regressor_b] == 8.0 | ||
| assert 50.0 not in set(selected[regressor_a].dropna()) | ||
| assert 80.0 not in set(selected[regressor_b].dropna()) |
There was a problem hiding this comment.
Please frame your expectations, specifically, by appending something like , "expected this, because of that" to each assert. Without it, I have to build up my own expectations just from the name of the test function and studying the inserted data. Please offer extra guidance.
Forecast covariate preparation could drop valid regressor values when multiple regressors had beliefs for the same
event_startat differentbelief_times. Selection was done per joined row (event_start), not per regressor, causing false missing data downstream.Covariate selection logic
BasePipeline.split_data_all_beliefsvia a helper that picks one value per(event_start, regressor).latestbelief per event),closestrealized belief per event),latestadmissible forecast belief per event).event_start.Safety guard
latest/closest) to prevent silent misuse.Regression coverage