To improve accuracy, shift from yearly to monthly regressions #103
Background: Many residential buildings have different HVAC settings throughout the year: thermostats are often set to "Heat only" during winter months (i.e. no A/C during a hot spell in March), "Cool only" during summer months, and set points are changed throughout the year (sometimes referred to as "thermostat wars"). Furthermore, non-HVAC energy use also changes throughout the year (e.g. many pool owners change their pool pump schedule between summer and winter months), and as behavioral and plug load energy use becomes more important, the accuracy of the "intercept" portion of a regression model becomes more critical. Behavioral variations throughout the year contribute to the inaccuracies of annual models, resulting in CalTRACK CVRMSE values in the range of 0.55 for homes, well above ASHRAE Guideline 14 requirements (CVRMSE < 0.25).
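For readers less familiar with the metric, here is a minimal sketch of how CVRMSE is commonly computed: the root-mean-square error of the model, divided by the mean of the observed usage. The function name `cvrmse` and the `n_params` degrees-of-freedom adjustment are assumptions for illustration; implementations differ on that adjustment, and this is not taken from the CalTRACK code.

```python
import math


def cvrmse(actual, predicted, n_params=1):
    """Coefficient of variation of the RMSE.

    CVRMSE = sqrt(SSE / (n - n_params)) / mean(actual)

    n_params is the number of fitted model parameters (assumed here;
    some implementations divide by n instead).
    """
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if n <= n_params:
        raise ValueError("need more observations than model parameters")
    # Sum of squared errors between metered and modeled usage.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mean_actual = sum(actual) / n
    return math.sqrt(sse / (n - n_params)) / mean_actual
```

A value of 0.55 therefore means the typical model error is about 55% of average usage, compared with the 0.25 threshold cited above.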
Traditional regression analysis methods such as PRISM have been applied to buildings using as many data points as were available: often just 12 or fewer data points per year (the "analysis period"). Using a shorter analysis period with a subset of those data points was precluded. These annual models result in a single intercept plus DD coefficients for the entire year: a very coarse model for any building with changing energy use patterns throughout the year.
But now with daily data we can deploy more accurate regression models for shorter analysis periods. HEA determined in 2011 that month-by-month analysis results are far more accurate than annual models for the majority of residential buildings. Below is a simple visualization of annual vs monthly regression results:
The constant green portion at the bottom of the left chart is not a good model for most homes.
Issue: Current CalTRACK methods use an annual regression model that assumes HVAC settings (e.g. balance point temperatures) and non-HVAC loads are consistent throughout the year. Since daily data exists, we can improve CalTRACK accuracy (i.e. reduce CVRMSE values) by creating monthly regressions which more accurately assess heating and cooling loads as conditions change throughout the year.
In 2014 Granderson et al. wrote: “The uncertainty in whole-building savings calculation for a given building is due to the robustness of the baseline model used to determine those savings, as well as the predictability of the building itself.” In HEA’s experience the predictability of residential buildings is quite poor: energy use is highly variable and unpredictable in many homes. Current CalTRACK CVRMSE values for homes are quite bad. We have an opportunity to improve the "robustness of the baseline model" (as measured by CVRMSE) by moving from an annual analysis to a monthly analysis.
Validation: HEA compared CalTRACK CVRMSE values for a cohort of about 80 homes to CVRMSE values for the same homes using our internal monthly analysis. Results are shown in the table below:
In the case of the electric models for this group of homes, the results changed from failing ASHRAE's Guideline 14 acceptance criteria to passing.
In another test, we used the baseline CalTRACK models to calculate monthly energy use for all months within the baseline period, and then compared these model results for each month to actual monthly energy use for each building in the cohort. We see very large errors: +/- 50%. Such large variations in model results for “in sample” data indicate a non-robust model, unpredictable energy use, or both.
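The in-sample check described above can be sketched roughly as follows: sum the daily model predictions and the daily metered usage by calendar month, then report the relative error for each month. The function name `monthly_percent_error` and its inputs are hypothetical; this is not the code HEA used.

```python
from collections import defaultdict


def monthly_percent_error(dates, actual, predicted):
    """Return {(year, month): (predicted_total - actual_total) / actual_total}.

    dates     : iterable of datetime.date, one per daily reading
    actual    : metered daily usage
    predicted : baseline-model daily usage for the same days
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # [actual_sum, predicted_sum]
    for d, a, p in zip(dates, actual, predicted):
        key = (d.year, d.month)
        totals[key][0] += a
        totals[key][1] += p
    # Months with zero metered usage are skipped to avoid dividing by zero.
    return {
        key: (pred_sum - act_sum) / act_sum
        for key, (act_sum, pred_sum) in sorted(totals.items())
        if act_sum != 0
    }
```

A result of `-0.5` for a month means the model predicted half of the metered usage for that month; `+0.5` means it predicted 50% more than was metered.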
Requested CalTRACK change: Update methods for a shorter analysis period, possibly monthly. The approach used by HEA has been described here but there are probably others. By analyzing at the monthly level, instead of deriving a single intercept plus DD coefficients for the entire year, we get 12 more accurate "models", and individual monthly usage totals (cooling energy, heating energy and base load) which are constructed to sum to the actual metered energy use of each month.
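To make the contrast concrete, here is a deliberately simplified sketch of "one model per year" versus "one model per month". It assumes a single heating-degree-day predictor and ordinary least squares; real CalTRACK models also include cooling degree days and a balance point search, and HEA's method is not reproduced here. All function names are hypothetical.

```python
from collections import defaultdict


def fit_ols(x, y):
    """Simple linear regression y ~ intercept + slope * x (closed form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # No variation in the predictor: fall back to an intercept-only model.
        return mean_y, 0.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_month(months, hdd, usage):
    """Fit a separate (intercept, slope) pair for each calendar month."""
    groups = defaultdict(lambda: ([], []))
    for m, h, u in zip(months, hdd, usage):
        groups[m][0].append(h)
        groups[m][1].append(u)
    return {m: fit_ols(xs, ys) for m, (xs, ys) in groups.items()}


def predict_by_month(models, months, hdd):
    """Predict daily usage using the model fitted for each day's month."""
    return [models[m][0] + models[m][1] * h for m, h in zip(months, hdd)]
```

The annual model is `fit_ols(hdd, usage)` over all days; the monthly variant is `fit_by_month(months, hdd, usage)`. One property worth noting: a least-squares fit that includes an intercept has residuals summing to zero over its fitting period, so each monthly fit reproduces that month's metered total, while an annual fit only reproduces the annual total. The trade-off is that each monthly model is fitted on roughly 30 points instead of 365.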
Note: Some have already criticized this proposal because "it is neither vetted nor well understood by practitioners" and "the existing CalTRACK method is an industry standard". HEA humbly suggests there is no standard yet for smart meter based residential energy analysis; existing CalTRACK results on residential buildings leave much room for improvement; and CalTRACK stakeholders should consider all options that might improve results.
@steevschmidt Thanks for sharing this. The evidence that you present is compelling and makes sense from a theoretical standpoint as well. Did your savings estimates end up varying by a lot at the end of the year between the two approaches? I'm guessing you could lower your fractional savings uncertainty, at least, which would allow higher confidence with smaller portfolios of residential projects. I would expect similar patterns to present themselves when looking at hourly savings - a topic that has come up in the past couple of weeks as we've been doing testing. Looking forward to continuing the conversation on Thursday this week on our CalTRACK call.
Once we implemented the monthly version and saw how much more accurate it was for specific homes I don't believe we ever went back to compare the savings calculations between the two methods. We have seen significant differences between our [monthly model] savings results and those from CalTRACK [using an annual model], but we don't know how much of that delta is due to this particular issue.
Note this is another issue intended for "Existing Daily Methods Improvements".