Specify maximum baseline and reporting period lengths #68

hshaban opened this Issue Feb 1, 2018 · 8 comments


5 participants

hshaban commented Feb 1, 2018

Minimum baseline and reporting period lengths are defined in CalTRACK’s data sufficiency requirements, but using long baseline/reporting periods results in significantly different model fits than using constrained periods (due to naturally occurring savings, non-routine events, etc.).

We propose setting a limit on the data included in the baseline and reporting periods: 12 months for daily data and 24 months for monthly data.

@hshaban hshaban added this to To Do in CalTRACK Feb 1, 2018



steevschmidt commented Feb 1, 2018

HEA has found that 18 months of daily data is optimal for heating and cooling regressions. This amount provides more accuracy than just 12 months, but does not overwhelm recent trends (a risk with longer periods).




mcgeeyoung commented Feb 2, 2018

Interesting. It's a little counterintuitive to select 18 months (given the likelihood of over-fitting to whichever season gets counted twice). Do you have test results that show why 18 months yields better results?



jskromer commented Feb 6, 2018

Is any thought given to known or predicted variations in operational characteristics? For example, if you know that the building operation/schedule was recently changed, you probably wouldn't want to include the data from before the change (you'd need to do a preemptive non-routine adjustment). One would prefer to have data on the operational modes that one expects to see in the reporting period.



danrubado commented Feb 13, 2018

Energy Trust's view is that data should be selected in 12-month increments so that a seasonal bias is not introduced when fitting the model, as pointed out by McGee. We also prefer limiting the baseline and reporting periods to the 12 months of data closest to the treatment period, to limit the impact of factors unrelated to the treatment, as noted by Hassan. You may get better fit statistics using 24 months of data, but it may not represent the pre-retrofit conditions as closely. A longer time series may contain a blend of current and past physical and operational conditions at the site.

@hshaban hshaban moved this from To Do to In progress in CalTRACK Feb 15, 2018




hshaban commented Feb 15, 2018

Proposed test methodology:

  • This test should be run separately for daily and monthly data.
  • CalTRACK models will be fit to the baseline periods of all test buildings for different lengths of the baseline period in 3-month increments (i.e. 12, 15, 18, 21 and 24 months).
  • Total energy usage predictions using the baseline models will be calculated for a 12-month reporting period following a fake project date.
  • Error metrics (CVRMSE and NMBE) will be calculated for these predictions as well.
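
For reference, the two error metrics named above could be computed roughly as follows. This is a minimal sketch, not the CalTRACK implementation: the function names are made up for illustration, the denominators use the plain number of observations n (some guidelines use n minus the number of model parameters instead), and the NMBE sign convention (actual minus predicted) is an assumption.

```python
import math


def cvrmse(actual, predicted):
    """Coefficient of variation of the RMSE: RMSE divided by mean actual usage.

    Assumption: the mean squared error is divided by n, not n - p.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mse) / mean_actual


def nmbe(actual, predicted):
    """Normalized mean bias error: summed residuals over (n * mean actual usage).

    Assumption: residual = actual - predicted, so a negative value means
    the model over-predicts on average.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    return sum(a - p for a, p in zip(actual, predicted)) / (n * mean_actual)


if __name__ == "__main__":
    # Made-up usage values for a three-period reporting window.
    usage = [10.0, 20.0, 30.0]
    model = [11.0, 22.0, 33.0]
    print(cvrmse(usage, model), nmbe(usage, model))
```

In the proposed test, both functions would be applied to the reporting-period predictions of each baseline length, so the out-of-sample errors can be compared side by side.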

Acceptance criteria:

  • The energy use trends and error metrics will be compared for different baseline period lengths. The minimum baseline period length that does not result in inflated out-of-sample errors will be recommended as the maximum baseline/reporting period length in the CalTRACK spec.
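
One possible way to turn the acceptance criterion above into a rule is sketched below. The criterion does not define "inflated", so the `tolerance` parameter and the function name are assumptions made purely for illustration: a length counts as acceptable here if its out-of-sample error is within `tolerance` of the best error observed across all tested lengths.

```python
def recommend_max_length(error_by_length, tolerance=0.02):
    """Return the shortest baseline length whose error is not 'inflated'.

    error_by_length maps baseline length in months to an out-of-sample
    error metric (for example CVRMSE). 'Inflated' is interpreted as
    exceeding the best observed error by more than `tolerance`; this
    threshold is hypothetical and not part of the proposal.
    """
    best_error = min(error_by_length.values())
    acceptable = [
        length
        for length, error in error_by_length.items()
        if error <= best_error + tolerance
    ]
    return min(acceptable)


if __name__ == "__main__":
    # Made-up error values, only to show the call.
    errors = {12: 0.50, 15: 0.52, 18: 0.49, 21: 0.55, 24: 0.60}
    print(recommend_max_length(errors))
```

A different definition of "inflated" (for example a relative rather than absolute threshold) would slot into the same structure.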



hshaban commented Mar 1, 2018


The length of the baseline and reporting periods that are included in the savings models may affect results in two ways:

  • Periods that are too short may not capture the full range of input conditions (e.g. weather, occupancy) that are typically experienced.
  • Periods that are too long increase the chances of unexpected changes in a building’s energy use (e.g. due to a change in occupancy or in the building’s equipment).

It is generally agreed that a minimum 12 months of data should be used in order to capture at least one annual cycle of energy use. However, there are no general guidelines about the maximum length of time to include in savings analysis.

The test used billing data from 1000 residential buildings in Oregon and daily data from 1000 residential buildings in California.

Tested parameters
The CalTRACK methods were applied to the full datasets five times, varying only the length of the baseline period used to fit the models: 12, 15, 18, 21, and 24 months.
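
As an illustration of the tested parameter, the five baseline windows can be derived from a project date as sketched below. The helper names are hypothetical, the window is assumed to end at the project date (exclusive), and days after the 28th are clamped to the 28th to keep the month arithmetic simple.

```python
from datetime import date


def months_before(d, months):
    """Return the date `months` calendar months before `d`.

    Simplification: the day of month is clamped to 28 so the result
    is always a valid date.
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(d.day, 28))


def baseline_windows(project_date, lengths=(12, 15, 18, 21, 24)):
    """Map each baseline length in months to its (start, end) dates.

    Assumption: every window ends at the project date, and only the
    start moves further back as the length grows.
    """
    return {n: (months_before(project_date, n), project_date) for n in lengths}


if __name__ == "__main__":
    # The project date is made up for the example.
    for length, (start, end) in baseline_windows(date(2018, 2, 1)).items():
        print(length, start, end)
```

Each window would then be used to fit one model per building, with everything else held fixed.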

Figure 1 shows that, for daily data, the estimated normalized consumption increases steadily with the baseline period length. Figure 2 shows that model fits, as measured by R-squared, tend to get worse as the period length increases. This is likely because of the second effect pointed out above (a greater likelihood of non-routine events that affect energy use). The monotonic increase in baseline energy use for program participants may drive an increase in estimated savings, which indicates that the rate of naturally occurring savings in this sample may affect results when longer time periods are used for analysis. Therefore, it appears preferable to use a maximum of 12 months of data for analysis, especially with the availability of daily data.

Figure 1. Effect of baseline period length on normalized annual consumption using daily data.

Figure 2. Effect of baseline period length on model R-squared distribution. Model fits get poorer with increasing baseline period length.

When the same test was applied using billing data, no monotonic trend was obvious in the normalized annual consumption. Instead, the normalized annual consumption showed cyclical trends, likely because the model is weighted towards the seasons with more data. The 24-month baseline model produced a normalized estimate that was slightly higher than the 12-month baseline model.

Figure 3. Effect of baseline period length on normalized annual consumption using billing data. Y axis (Baseline Normalized Annual Consumption) is in percent.
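
The seasonal weighting mentioned above can be made concrete by counting how often each calendar month appears in a baseline window. This is only a sketch with a hypothetical helper name, and it assumes the window consists of whole calendar months ending just before the project month.

```python
from collections import Counter


def month_counts(end_year, end_month, length_months):
    """Count occurrences of each calendar month (1-12) in a baseline window.

    The window covers `length_months` whole months ending just before
    (end_year, end_month).
    """
    end_index = end_year * 12 + (end_month - 1)
    counts = Counter()
    for index in range(end_index - length_months, end_index):
        counts[index % 12 + 1] += 1
    return counts


if __name__ == "__main__":
    # Made-up project month: February 2018, with a 15-month baseline.
    counts = month_counts(2018, 2, 15)
    doubled = sorted(m for m, c in counts.items() if c == 2)
    print(doubled)
```

With a 15-month window ending before February, November, December and January are each counted twice while the other months are counted once. Windows of 12 or 24 months weight every calendar month equally.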


We recommend limiting the maximum baseline period length to 12 months of consumption data for both billing and daily models.




hshaban commented Mar 1, 2018

Adding some more clarification to the choice of a 12 month baseline period based on stakeholder input:
The results above are dataset-specific and do not reflect generally expected trends in estimated baseline vs. baseline period length (e.g. estimated baseline energy use will not necessarily increase if a longer baseline is used). However, they do strongly indicate that the predicted baseline may be unstable with different baseline period lengths, which may, in turn, affect calculated savings. A choice must be made by the analyst as to how long this baseline period should be. We are recommending that this choice be limited by setting the maximum baseline period at 12 months, since the year leading to the energy efficiency intervention is the most indicative of near-term energy use, making it amenable to calculating savings in pay-for-performance scenarios.




hshaban commented Jul 26, 2018

This update has been integrated in CalTRACK 2. Closing this issue.

@hshaban hshaban closed this Jul 26, 2018
