Site Model Selection Criteria #76

Closed
danrubado opened this Issue Feb 13, 2018 · 7 comments

danrubado commented Feb 13, 2018

To select the weather model to be used for a site, CalTrack specifies a two-step process. First, candidate HDD and CDD models are specified using the pre-determined ranges of balance points. In addition, models with HDD but no CDD variable and CDD but no HDD variable are created. Lastly, an "intercept-only" model with no HDD or CDD variable is created (the mean over the time series). The first step in CalTrack for selecting a weather model is to remove candidate models where the HDD or CDD coefficients are not significant (p-value > 0.10). After candidate models with non-significant coefficients have been eliminated, the remaining model with the highest R-squared value is selected and used to compute the weather-normalized annual consumption.
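A minimal sketch of the two-step selection described above, assuming each candidate has already been fit and reduced to its R-squared and coefficient p-values. The `Candidate` container, the model names, and the numbers are hypothetical illustrations, not part of any CalTRACK implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One fitted candidate model (hypothetical container, not a CalTRACK API)."""
    name: str
    r_squared: float
    # p-values of the HDD/CDD coefficients; empty for the intercept-only model
    p_values: dict = field(default_factory=dict)


def select_with_p_value_screen(candidates, alpha=0.10):
    """Step 1: drop candidates with any coefficient p-value above alpha.
    Step 2: return the surviving candidate with the highest R-squared."""
    qualified = [
        c for c in candidates
        if all(p <= alpha for p in c.p_values.values())
    ]
    # The intercept-only model has no weather coefficients, so it always
    # survives the screen and acts as the fallback. This assumes it is
    # always present in the candidate list.
    return max(qualified, key=lambda c: c.r_squared)


# Made-up numbers: both weather models fit reasonably well but miss p <= 0.10.
candidates = [
    Candidate("intercept_only", 0.0),
    Candidate("hdd_only", 0.58, {"hdd": 0.12}),
    Candidate("hdd_cdd", 0.61, {"hdd": 0.11, "cdd": 0.35}),
]
selected = select_with_p_value_screen(candidates)
print(selected.name)  # intercept_only
```

In this made-up case the screen discards both weather models even though their coefficients only narrowly miss the threshold, which is the behavior this issue is concerned about.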

In testing that we did with Open EE in 2017, we found that the practical consequence of the coefficient significance screen was that a relatively high proportion of "intercept-only" models were selected. We think this two-step process is too restrictive and removes many valid weather models where the individual coefficients don't quite achieve statistical significance. Also, there is an argument to be made that all of the variables that are theoretically indicated should be included in the regression model, regardless of whether they are statistically significant.

Energy Trust suggests removing the coefficient significance screen and selecting the candidate weather model with the highest R-squared from the full range of candidate models. There may also be an R-squared floor for candidate weather models, below which the "intercept-only" model is selected. We have used an R-squared of 0.5 as that floor in the past.
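The proposed alternative could be sketched as follows, assuming candidates are summarized as a mapping from model name to R-squared. The function name, the `"intercept_only"` key, and the example values are hypothetical; the floor is optional because the proposal only says there "may" be one.

```python
def select_by_r_squared(candidates, r_squared_floor=None):
    """Pick the candidate with the highest R-squared, with no p-value screen.

    candidates: dict mapping model name to R-squared; assumed to include
    an "intercept_only" entry.
    r_squared_floor: if given, fall back to the intercept-only model when
    the best candidate's R-squared is below this value.
    """
    best = max(candidates, key=candidates.get)
    if r_squared_floor is not None and candidates[best] < r_squared_floor:
        return "intercept_only"
    return best


# Made-up R-squared values for three candidate models.
candidates = {"intercept_only": 0.0, "hdd_only": 0.58, "hdd_cdd": 0.61}

print(select_by_r_squared(candidates))                       # hdd_cdd
print(select_by_r_squared(candidates, r_squared_floor=0.5))  # hdd_cdd
print(select_by_r_squared(candidates, r_squared_floor=0.7))  # intercept_only
```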

Collaborator

hshaban commented Feb 15, 2018

Proposed test methodology:

  • Caltrack monthly models will be fit to the baseline period usage data using the p-value screen.
  • The fitting process will be repeated without a p-value screen (only R-squared selection criterion).
  • Error metrics (CVRMSE and NMBE) will be calculated for each model using both approaches, using 12 months of reporting period data.
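The two error metrics named above can be computed as in the sketch below. It assumes the simple out-of-sample form with `n` in the denominators (no degrees-of-freedom adjustment) and defines the bias as actual minus predicted; both choices are assumptions, since the thread does not state which variant was used. The usage values are made up.

```python
import math


def cvrmse(actual, predicted):
    """Coefficient of variation of the RMSE: RMSE divided by mean actual usage."""
    n = len(actual)
    mean_actual = sum(actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mse) / mean_actual


def nmbe(actual, predicted):
    """Normalized mean bias error: mean (actual - predicted) over mean actual."""
    n = len(actual)
    mean_actual = sum(actual) / n
    return sum(a - p for a, p in zip(actual, predicted)) / (n * mean_actual)


# Made-up monthly usage for a short reporting period.
actual = [100.0, 120.0, 80.0]
predicted = [90.0, 130.0, 80.0]

print(round(cvrmse(actual, predicted), 4))  # 0.0816
print(nmbe(actual, predicted))              # 0.0
```

Here the over- and under-predictions cancel, so NMBE is zero while CVRMSE is not, which is why the two metrics are reported together.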

Acceptance criteria:
This update will be accepted into the Caltrack spec if removing the p-value screen does not cause average model performance to deteriorate.

Collaborator

hshaban commented Feb 15, 2018

@danrubado I'll add a note about the R-squared floor to issue #71, since it seems more relevant there

Collaborator

hshaban commented Mar 1, 2018

TEST RESULTS

Background
Caltrack includes a requirement that p<0.1 for the heating/cooling coefficients during the model qualification stage. The original concern was that spurious/low-significance coefficients could lead to poor out-of-sample prediction performance. However, eliminating weather-sensitive models in this manner may affect model interpretability and hasn’t proved necessary.

Dataset
Billing data from around 1000 residential buildings in Oregon.

Tested parameters
The out of sample prediction errors were calculated for models selected with and without the p-value criterion.

Results

A comparison was performed using 24-month electric traces, split into 12 months of training and 12 months of test data, and the mean absolute prediction error was used as the metric.

In the majority of cases (almost 90%), the fit did not change when the p-value criterion was removed. For the remainder, there was a change; in most cases, it was because the model changed from intercept-only to a weather-sensitive model (i.e., the heating or cooling terms were only marginally significant) - see Figure 1. The average Mean Absolute Error (MAE) was slightly lower when the p-value screen was removed (8.20 vs 8.34). More than twice as many models improved as degraded when the p-value cutoff was removed, and none of the degradations were catastrophic (Figure 2).

Therefore, it appears that the p-value requirement is superfluous at best, marginally counterproductive at worst.
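The comparison described in these results could be tallied along the following lines. This is only a sketch: the helper names, the tolerance for "unchanged", and the per-site MAE values are hypothetical and are not the test data or code used for the figures.

```python
def split_trace(trace, n_train=12):
    """Split a monthly trace into training months and test months."""
    return trace[:n_train], trace[n_train:]


def mean_absolute_error(actual, predicted):
    """Mean absolute out-of-sample prediction error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def compare_screens(mae_with_screen, mae_without_screen, tol=1e-9):
    """Count sites whose test MAE improved, degraded, or stayed the same
    when the p-value screen was removed."""
    counts = {"improved": 0, "degraded": 0, "unchanged": 0}
    for with_s, without_s in zip(mae_with_screen, mae_without_screen):
        diff = without_s - with_s
        if abs(diff) <= tol:
            counts["unchanged"] += 1
        elif diff < 0:
            counts["improved"] += 1
        else:
            counts["degraded"] += 1
    return counts


# A 24-month trace splits into 12 training and 12 test months.
train, test = split_trace(list(range(24)))
print(len(train), len(test))  # 12 12

# Made-up per-site test MAE with and without the screen.
mae_with = [8.0, 10.0, 6.5, 9.0]
mae_without = [8.0, 9.0, 6.5, 9.4]
print(compare_screens(mae_with, mae_without))
# {'improved': 1, 'degraded': 1, 'unchanged': 2}
```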

Figure 1. Effect of the p-value screen on best-fit model types

Figure 2. Effect of the p-value screen on out of sample mean absolute error.

Recommendation
Remove the p-value criterion for candidate models during model selection using Caltrack. Handling of poor weather model fits and intercept-only models will be explored further in Issue #71.

bkoran commented Mar 15, 2018

Not sure about this recommendation. There are situations where there needs to be a decision between 2 independent variables that may be covariant. One example I have mentioned: A monthly model has an intercept and a weather-related slope for heating. Occupancy rates (leased space) changed during the baseline. There was also a second weather-related slope because there was a heat pump and resistance heat. Due to the timing of the occupancy changes, the weather and the occupancy were covariant. The correct independent variable was the second weather variable; the occupancy was not significant once the second weather variable was included. In contrast, occupancy was significant in the reporting period.

Contributor

mcgeeyoung commented Mar 15, 2018

For now, CalTRACK does not include guidance for considering factors like occupancy as routine adjustments. I'd like to bring this back up during the discussion for hourly methods, if that's alright with you @bkoran

margaretsheridan commented Mar 16, 2018

Here are some histograms of adjusted R squared and CVRMSE from a SMUD run of the LBNL towt baseline algorithm for 80+% of our service territory (~520,000 hourly meters).
ResCommHistograms.docx

@hshaban hshaban moved this from In progress to Done in CalTRACK Mar 30, 2018

Collaborator

hshaban commented Jul 26, 2018

This update has been integrated in CalTRACK 2. Closing this issue

@hshaban hshaban closed this Jul 26, 2018
