[CLOSED] Include calendar effects (day-of-week, month-of-year, holidays) in daily model? #57

hshaban opened this Issue Jan 31, 2018 · 10 comments

@hshaban
Collaborator

hshaban commented Jan 31, 2018

Issue by tplagge
Tuesday Mar 14, 2017 at 22:00 GMT
Originally opened as impactlab/caltrack#56


It has been proposed that the Caltrack daily model process begin by considering HDD/CDD only as exogenous variables, much as with the monthly model. However, there are good a priori reasons to expect the daily data to exhibit calendar effects, i.e. for the usage to be dependent on things like the day of the week. If we see sufficient evidence for calendar effects, we may want to include categorical variables for the relevant effects in our model, since not doing so would result in non-stationary residuals.

The CEC has published reports suggesting the possibility of including dummy variables for factors like day of week and month of year (see Appendix A here). To see whether this might be worth considering for our residential usage data, I took the 100-home sample used in the monthly Caltrack beta test and did some basic aggregation.

I started by loading the temperature and electrical usage data for the 100 traces in the monthly Caltrack beta test. Note that while the model we were evaluating in the beta test was monthly, the data is actually daily AMI metered usage. Next, I normalized each trace, dividing all of the daily usage values by the trace mean. I then computed HDD and CDD using fixed balance points.
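The preprocessing above (normalization by trace mean, then degree days against a fixed balance point) can be sketched as follows. This is a minimal numpy sketch; the function names and the 65 °F balance point are illustrative assumptions, not the exact CalTRACK-specified values.

```python
import numpy as np

def degree_days(temps, balance=65.0):
    """Daily HDD and CDD from mean daily temperatures (deg F),
    using a single fixed balance point."""
    temps = np.asarray(temps, dtype=float)
    hdd = np.maximum(balance - temps, 0.0)
    cdd = np.maximum(temps - balance, 0.0)
    return hdd, cdd

def normalize_trace(usage):
    """Divide a daily usage trace by its mean so traces are comparable."""
    usage = np.asarray(usage, dtype=float)
    return usage / usage.mean()

# A cold day, a day at the balance point, and a hot day:
hdd, cdd = degree_days(np.array([30.0, 65.0, 80.0]))
norm = normalize_trace(np.array([2.0, 4.0, 6.0]))
```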

I then fit each trace using the Caltrack procedure, but without monthly averaging; i.e., I fit four models:
  • Usage = Intercept + e
  • Usage = Intercept + βc·CDD + e
  • Usage = Intercept + βh·HDD + e
  • Usage = Intercept + βc·CDD + βh·HDD + e

A model qualifies if all of its parameters are positive and significant (p < 0.1), and the qualifying model with the best R² is selected.
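The four-model selection procedure can be sketched like this. It is a simplified numpy sketch with hypothetical helper names; for brevity it enforces the positive-coefficient requirement but omits the p < 0.1 significance test from the spec.

```python
import numpy as np

def fit_ols(X, y):
    """OLS via least squares; returns coefficients and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return beta, 1.0 - ss_res / ss_tot

def select_model(usage, hdd, cdd):
    """Fit the four candidate models and return the name of the
    qualifying model (positive slopes) with the best R^2."""
    ones = np.ones(len(usage))
    candidates = {
        "intercept": np.column_stack([ones]),
        "cdd": np.column_stack([ones, cdd]),
        "hdd": np.column_stack([ones, hdd]),
        "cdd_hdd": np.column_stack([ones, cdd, hdd]),
    }
    best_name, best_r2 = "intercept", -np.inf
    for name, X in candidates.items():
        beta, r2 = fit_ols(X, usage)
        if np.all(beta[1:] > 0) and r2 > best_r2:
            best_name, best_r2 = name, r2
    return best_name

# Synthetic trace with both heating and cooling response:
hdd = np.array([10.0, 0.0, 0.0, 5.0])
cdd = np.array([0.0, 8.0, 3.0, 0.0])
usage = 1.0 + 0.2 * hdd + 0.3 * cdd
best = select_model(usage, hdd, cdd)  # both slopes recovered, so the full model qualifies
```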

So, I now have a time series of normalized residuals for each of the 100 traces. If our CDD/HDD model is sufficient, we expect the residuals to be stationary, and specifically for there to be no calendar effects like systematic overestimates or underestimates based on factors like temperature, month of year, or day of week.

Results

  • The residuals show little dependence on temperature, as expected.
  • The mean residuals and standard errors on the mean for each day-of-month (1-31) are consistent with zero.
  • There is clear evidence for month-of-year dependence. The residuals ran high during June, July, August, November, and December, and low or near zero for the remainder of the months. This may be second-order temperature dependence, or may be related to occupancy/vacation schedules/etc, or some combination thereof. The largest effects were approximately 5-6%.
  • Likewise, there is clear evidence for day-of-week dependence, with high residuals on weekend days and slightly low on weekdays. Sundays were the highest, again by about 6%.
  • Holidays also run 6% higher than non-holidays.

[Screenshots (Mar 14, 2017): mean normalized residuals by calendar factor]
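The aggregation behind these results, a mean residual and standard error of the mean per calendar category, can be sketched as follows (hypothetical helper, numpy only):

```python
import numpy as np

def calendar_effect(residuals, labels):
    """Mean residual and standard error of the mean for each calendar
    category (e.g. day-of-week codes 0-6). Returns {label: (mean, sem)}."""
    residuals = np.asarray(residuals, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for lab in np.unique(labels):
        grp = residuals[labels == lab]
        sem = grp.std(ddof=1) / np.sqrt(len(grp)) if len(grp) > 1 else np.nan
        out[lab] = (grp.mean(), sem)
    return out

# Toy example: two categories, equal means but different spread.
effects = calendar_effect([1.0, 3.0, 2.0, 2.0], [0, 0, 1, 1])
```

A mean more than a couple of standard errors from zero for some category is the kind of signal reported above for month-of-year, day-of-week, and holidays.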

Conclusions
We can make more accurate predictions and get less-correlated errors by including factors for day-of-week, month-of-year, and holidays in the daily model. The month-of-year, holiday, and day-of-week corrections all look to be similar in magnitude. The R² values will be higher, and the out-of-sample predictions should be better (provided we regularize to guard against overfitting where necessary).

However, the effects of including the categorical variables for day-of-week, month-of-year, etc. are not very dramatic when considering aggregate quantities computed over an entire year. The fact that we slightly underestimate weekend usage and overestimate weekday usage shouldn’t matter so long as we sum over a full year of weekdays and weekends. It’s only if we want to make predictions about Sunday versus Monday usage (which from previous discussions does not appear to be a use case we anticipate here) that these effects become important.

This should be intuitively obvious, but as a demonstration, I fit one year's worth of data from each of the 100 accounts using the model specified above; the model specified above plus day-of-week; and the model specified above plus month-of-year. When I predict the total usage out-of-sample for the subsequent nine days, the DOW-included model does slightly better: the difference between predicted and actual usage using the with-DOW model is slightly closer to zero. However, if I predict the total out-of-sample usage for the subsequent 365 days, the difference in prediction accuracy is consistent with zero. The same story holds for month-of-year: if I predict 180 days out, month-of-year adds predictive power for the aggregate sum, but if I predict 365 days out it does not.
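The horizon comparison amounts to summing predictions over the first N days and comparing to the actual total; a minimal sketch with a hypothetical helper name:

```python
import numpy as np

def aggregate_bias(actual, predicted, horizon):
    """Fractional error of the summed prediction over the first
    `horizon` days: (sum(pred) - sum(actual)) / sum(actual)."""
    a = np.asarray(actual[:horizon], dtype=float)
    p = np.asarray(predicted[:horizon], dtype=float)
    return (p.sum() - a.sum()) / a.sum()

# Toy example: per-day errors that cancel over the full period.
actual = [1.0, 1.0, 1.0, 1.0]
predicted = [2.0, 1.0, 1.0, 0.0]
bias_short = aggregate_bias(actual, predicted, horizon=2)
bias_full = aggregate_bias(actual, predicted, horizon=4)
```

The toy numbers illustrate the point in the text: a short-horizon prediction can be biased even when the full-period aggregate is unbiased.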

Given all this, I am inclined to agree with the proposition that calendar effects are not essential for the daily model, but I'd like to open it up for further discussion.

@hshaban hshaban added the question label Jan 31, 2018


hshaban commented Jan 31, 2018

Comment by jbackusm
Wednesday Mar 15, 2017 at 18:09 GMT


Thanks for this analysis @tplagge ! I mostly agree with your conclusions, but I wanted to mention that overfitting could be a larger problem than you might expect. To the extent that we are interested in savings estimates normalized to a typical year, we run into the problem that we don't know what a typical year looks like in terms of these additional fixed effects.

In fact, we've seen that seasonality in the residuals can vary substantially from one year to the next, based on extreme weather patterns, economic effects, etc. This could explain why you see no benefit to using the month-of-year model when you evaluate residuals in the following 365 days.

Also, this is one of the primary reasons we think it's important to use a comparison group to adjust gross savings for exogenous effects that vary between the pre- and post-treatment periods--we have seen that our adjusted residuals tend to be much more stationary than the unadjusted residuals. I understand that the comparison-group adjustment is out of scope for CalTRACK beta at the moment, but it seemed relevant in this context, so forgive me for taking another opportunity to explain why we think it's so important.

I also think your approach is interesting and useful: we should be thinking about the dimensions along which we expect the residuals to be stationary, given our immediate use cases. Maybe that's a good way to frame the question of model specification?


hshaban commented Jan 31, 2018

Comment by tplagge
Wednesday Mar 15, 2017 at 19:31 GMT


I agree; in addition to looking at effects like the ones I checked, it would be useful to look at residual autocorrelations and possibly long-memory processes. I know there's been plenty of work on using ARMA-style models for daily energy consumption; not that I'm necessarily suggesting we go down that road, but at least worth thinking about.


hshaban commented Jan 31, 2018

Comment by jfarland
Tuesday Apr 11, 2017 at 22:19 GMT


I tend to agree with both of you. The more granular the data, the more signals we're going to be able to pick up. My experience with load forecasting applications has convinced me that calendar effects (Day of Week, Month of Year) as well as lagged dependent variables are often the most powerful predictors of energy demand after atmospheric conditions, especially at the hourly and sub-hourly levels. It's not surprising to pick up these signals at the daily level as well.

Another parallel from DNV GL's load forecasting work that might be interesting here: if we really are limiting our ultimate concern to annual aggregations/predictions, and we want to "tighten" how our predictions account for calendar effects, we can simply estimate a separate model for each month of the year to capture that frequency of seasonality.
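The separate-models-per-month idea can be sketched as follows. This is an illustrative numpy sketch, not DNV GL's actual implementation:

```python
import numpy as np

def fit_monthly_models(month, X, y):
    """Fit one OLS model per calendar month (1-12); returns
    {month: coefficient vector}. X should include an intercept column."""
    models = {}
    for m in np.unique(month):
        mask = month == m
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        models[int(m)] = beta
    return models

# Toy data: two months with different intercepts and slopes.
month = np.array([1, 1, 2, 2])
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 2.0, 3.0])
models = fit_monthly_models(month, X, y)
```

The trade-off is twelvefold fewer observations per fit, so this only makes sense with plentiful daily (or finer) data.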

Just like @tplagge 's ARMA-style model suggestion, this is not something I am necessarily suggesting for the Caltrack Beta Test.


hshaban commented Jan 31, 2018

Comment by tplagge
Wednesday Apr 26, 2017 at 18:47 GMT


Calendar effects analysis (with bonus robust regression results)

The Caltrack daily model specification currently includes only heating and cooling degree days as independent variables. While it is anticipated that including calendar effects (day-of-week, month-of-year, etc) might improve the quality of the fits, previous explorations have suggested that the impact on aggregate quantities such as annualized savings is likely to be small. Since these are the only quantities of interest for the specific Caltrack use case, this post will focus on firming up that tentative conclusion.

We will work with the 1000-home electric data set, and will also focus specifically on day-of-week and month-of-year as potential additions to the model. Holidays and interaction terms are also plausible additions, but if the two most plausibly significant additions do not move the needle, then it seems unlikely that these terms would either. Our key metrics will both be related to the out-of-sample prediction, with the 2nd baseline year as the test period and the 1st baseline year as the training period. Our metrics are:

  • CV(RMSE), the normalized root mean squared error for the test period.
  • NMBE, the normalized mean bias error for the test period, equivalent to fractional savings in this case.
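Both metrics are straightforward to compute; a minimal sketch (the sign convention for NMBE, predicted minus actual, is an assumption here, since conventions differ):

```python
import numpy as np

def cv_rmse(actual, predicted):
    """CV(RMSE): root mean squared error normalized by the mean of actuals."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((a - p) ** 2)) / a.mean()

def nmbe(actual, predicted):
    """NMBE: mean bias (predicted minus actual) normalized by mean actual."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(p - a) / a.mean()

# Toy test period: one day over-predicted by 2 units.
cv = cv_rmse([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 3.0])
bias = nmbe([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 3.0])
```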

In order to inform our intuition, let’s take a quick look at the results from the model as currently specified. Here are the CV(RMSE) and NMBE histograms:

[Screenshot: CV(RMSE) histogram]

[Screenshot: NMBE histogram]

The CV(RMSE) distribution looks chi-square-ish, and the NMBE distribution looks Cauchy-ish, as you’d expect them to. In fact, here’s the Cauchy fit to the NMBE/fractional savings histogram:

[Screenshot: Cauchy fit to the NMBE/fractional savings histogram]

The Cauchy distribution here is peaked at 0.017, suggesting a small but detectable bias or population-level trend; the half width at half max is 0.095.
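Since a Cauchy distribution has no finite mean or variance, its location and scale are usually estimated from quantiles or by maximum likelihood. A quantile-based sketch (not necessarily the fitting method used above), using the fact that the quartiles sit at loc ± scale and that the half width at half max equals the scale:

```python
import numpy as np

def cauchy_loc_scale(samples):
    """Quantile-based Cauchy estimates: the median estimates the peak
    (location); half the interquartile range estimates the scale."""
    samples = np.asarray(samples, dtype=float)
    q25, q50, q75 = np.percentile(samples, [25, 50, 75])
    return q50, (q75 - q25) / 2.0

# Deterministic toy sample:
loc, scale = cauchy_loc_scale(np.array([0.0, 1.0, 2.0, 3.0, 4.0]))
```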

A model that makes better predictions than the one from the specs should have a lower CV(RMSE), a tighter distribution of NMBE/fractional savings, and -- modulo real population-level trends -- an NMBE/peak fractional savings closer to zero.

We will test the following models:

  • M0: The current spec model, as described above.
  • M0.1: M0 but with a wide range of possible CDD/HDD balance points.
  • M0.2: M0 but with robust regression using the Huber loss function.
  • M1: M0 plus a categorical day-of-week variable.
  • M2: M0 plus a categorical variable distinguishing weekdays versus weekend days only.
  • M3: M0 plus categorical day-of-week plus categorical month-of-year.
  • M4: M1 with elastic net regularization (L1 = 0.5, L2 = 0.5).
  • M5: M0.2 plus a categorical variable distinguishing weekdays versus weekend days only.
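The categorical terms in M1-M3 amount to appending one-hot dummy columns to the design matrix; a sketch with a hypothetical helper, dropping the first level to avoid collinearity with the intercept:

```python
import numpy as np

def add_dummies(X, labels, drop_first=True):
    """Append one-hot columns for a categorical factor (e.g. day-of-week)
    to design matrix X. The first level is dropped by default so the
    dummies are not collinear with an intercept column."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    if drop_first:
        levels = levels[1:]
    dummies = (labels[:, None] == levels[None, :]).astype(float)
    return np.column_stack([X, dummies])

# Intercept-only design plus a three-level factor:
X = np.ones((4, 1))
dow = np.array([0, 1, 1, 2])  # hypothetical day-of-week codes
X_dow = add_dummies(X, dow)
```

In statsmodels' formula interface the same thing is written as a `C(...)` term, e.g. `usage ~ CDD + HDD + C(dayofweek)`.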

The first thing to note is that for a typical home, these models produce very similar results:

[Screenshot: fitted models for a typical home]

The day-of-week + month-of-year model, M3, appears a bit different and, at least by eye, overfitted. The rest of the models do not look like they should produce very different results, and indeed they do not.

Results

Below is a plot of the median CV(RMSE) versus median NMBE for each of the models. The error bars represent the 25th and 75th percentile for each distribution. It can immediately be seen that including calendar effects does not dramatically improve the fit; in fact, in some cases it makes matters very slightly worse. Including month-of-year plus day-of-week, for example, increases the median CV(RMSE) and NMBE slightly, which is likely due to overfitting. (Note that I did not multiply either CV(RMSE) or NMBE by a factor of 100.)

[Screenshot: median CV(RMSE) vs. median NMBE for each model, with 25th/75th percentile error bars]

Zoomed in:

[Screenshot: the same plot, zoomed in]

In terms of CV(RMSE), the best-performing model is M1, an OLS regression including categorical variables for day-of-week. Simply distinguishing between weekdays and weekends gets you most of the way there. Increasing the balance point search range helps a tiny bit.

In terms of NMBE, however, the robust regressions are closer to zero than the OLS regressions, regardless of which calendar effects they include. Here’s what the actual distributions look like for the case of no calendar effects at all:

[Screenshot: NMBE distributions for OLS vs. robust regression, no calendar effects]

This is likely because usage outliers are more often unusually low than unusually high (vacations, for example). Therefore, a regression less sensitive to outliers tends to produce higher usages. This is a strong argument in favor of robust regression. Note that this bias will be less apparent if we compare modeled quantities from the testing and training periods--then both quantities would be biased, likely similarly. Here we are comparing the training period model to the actual testing period usages.
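Robust regression with the Huber loss is typically fit by iteratively reweighted least squares, which is roughly what statsmodels' RLM does internally. A self-contained numpy sketch of the idea (the tuning constant 1.345 is the conventional default for the Huber norm, and the MAD-based scale is one common choice, both assumptions here rather than the spec's):

```python
import numpy as np

def huber_irls(X, y, t=1.345, n_iter=50):
    """Huber-loss linear fit via iteratively reweighted least squares.
    Residuals within t*s keep weight 1; larger residuals are downweighted
    by t*s/|r|, with s a robust (MAD-based) scale estimate."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS starting point
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745 or 1.0  # MAD scale
        a = np.abs(r)
        w = np.where(a <= t * s, 1.0, t * s / np.maximum(a, 1e-12))
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Six observations on the line y = 2 + 3x, with one gross low-leverage outlier
# standing in for a vacation-style usage anomaly:
x = np.arange(6.0)
X = np.column_stack([np.ones(6), x])
y = 2.0 + 3.0 * x
y[5] = 100.0

beta_huber = huber_irls(X, y)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The robust slope stays near the true value while the OLS slope is dragged toward the outlier, which is the mechanism behind the NMBE difference described above.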

However, as before, it does not look as if there is a strong motivation to include calendar effects in our daily model for the purposes of calculating annualized values.

This analysis can be taken further. One might suspect that a robust regression using day-of-week would be a promising candidate, and one might also suspect that a regularized regression including day-of-week and month-of-year could let the power of the elastic net shine a bit brighter.

However, given this group's well-justified bias in favor of simplicity, my personal opinion is that robust regression with no calendar effects, or with a weekday/weekend term only, are quite attractive options.


hshaban commented Jan 31, 2018

Comment by tplagge
Thursday Apr 27, 2017 at 20:40 GMT


On the call, it was brought up that electricity usage = 0 very likely indicates some sort of problem. Fortunately, of the 1000 projects, just 23 had more than one day of zero usage. Here's the normalized distribution of fractional savings for those 23 projects versus the other 977. While they do indeed appear disproportionately in the tails (particularly the negative tail, as you'd expect), they're not driving the entire bias: the median NMBE for the 977 projects with at most one zero-usage day is 0.0251, versus 0.0246 for the whole sample.

[Screenshot: normalized fractional-savings distributions for the 23 zero-usage projects vs. the other 977]
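The zero-usage screen itself is a one-liner; a sketch with a hypothetical helper:

```python
import numpy as np

def flag_zero_usage(traces, max_zero_days=1):
    """Return indices of traces with more than `max_zero_days` days of
    exactly zero usage, which likely indicates a data problem."""
    return [i for i, t in enumerate(traces)
            if int(np.sum(np.asarray(t) == 0)) > max_zero_days]

# Toy traces: only the second one has more than one zero-usage day.
bad = flag_zero_usage([[1, 2, 3], [0, 0, 1], [0, 1, 2]])
```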


hshaban commented Jan 31, 2018

Comment by mcgeeyoung
Thursday May 11, 2017 at 04:28 GMT


It seems that we have reached consensus here. I'm going to let @matthewgee or @tplagge add this to the draft spec so I don't flub it. Issue can be closed once we agree on the language.


hshaban commented Jan 31, 2018

Comment by tplagge
Thursday May 11, 2017 at 21:43 GMT


Robust linear model benchmarking

I ran a subset of the electrical sample through my full fitting routine using both statsmodels rlm (robust linear model) and ols (ordinary least squares), and the robust model took a little under three times as long in both CPU and wall-clock time (~4 seconds per home for robust vs. ~1.5 seconds for OLS on my laptop). Note that the full fitting routine calls fit() many times as part of the CDD/HDD balance point grid search, and does a bunch of other stuff too.

I also ran several fits on a single baseline trace without any of the data prep routines--just calling time(smf.ols(data=data, formula=formula)) and time(smf.rlm(data=data, formula=formula))--and found that the difference was just over a factor of 3 (~23 ms for the robust fit vs. ~7 ms for the OLS fit on my laptop).
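A timing comparison along these lines can be reproduced with a small harness. In this sketch numpy's lstsq stands in for the smf.ols/smf.rlm calls timed above; absolute times are machine-dependent.

```python
import time
import numpy as np

def time_fit(fit_fn, X, y, repeats=20):
    """Median wall-clock seconds per call to a fitting routine."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fit_fn(X, y)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

# One synthetic year of daily data, two-column design matrix.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(365), rng.normal(size=365)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=365)

t_ols = time_fit(lambda A, b: np.linalg.lstsq(A, b, rcond=None), X, y)
```

Taking the median over repeats damps warm-up and scheduling noise, which matters when individual fits are only milliseconds long.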

Conclusion: the time spent in the robust/OLS fit itself almost, but not completely, dominates the total running time, and the computational cost is approximately 3x for robust versus OLS. I'd say this is fairly significant.

Given that robust least squares has a firm theoretical and small-but-non-negligible empirical justification, and that the increase in computational cost is significant but from a low base, I'm very slightly in favor of using robust least squares. It would take little to convince me otherwise.


hshaban commented Jan 31, 2018

Comment by houghb
Friday May 12, 2017 at 17:14 GMT


John and I talked more in depth about using the robust linear model this morning. For the reasons outlined below I think we've come around to the conclusion that we're in favor of sticking with OLS as the default approach, but suggest that robust regression should be seriously considered as a future improvement (we're in favor of putting language to this effect in the specs). Here is a summary of our thinking:

  • We agree with Tom's analysis and conclusion that there is some advantage to using robust regression
  • The increased computational cost is noteworthy but probably not problematic for those running this analysis
  • To justify the added complexity for future users of the CalTRACK specs (who may not be well versed in various regression types, or may not have convenient access to a robust regression tool), robust regression would have to be very clearly the superior method, and it's not completely clear that is the case
  • While it is a simple matter in python, using statsmodels, to switch from OLS to RLM (with the default arguments), there are a number of things "under the hood" in statsmodels that we would need to specify so that others could implement the regression in a similar way with other tools (the loss function and tuning constant, a scaling approach, etc). It's likely that if someone implements a robust regression with another tool (R, SPSS, Excel), its default behavior may differ from statsmodels', leading to replicability issues that aren't present with OLS.

hshaban commented Jan 31, 2018

Comment by tplagge
Friday May 12, 2017 at 18:50 GMT


You've nudged me over to your side of the fence. I agree with this conclusion: let's call robust regression out as a future improvement but stick with OLS for the specs.


hshaban commented Jan 31, 2018

Comment by houghb
Thursday May 18, 2017 at 15:55 GMT


I've added draft language to the analysis specs suggesting robust regression for a future improvement, so am closing this issue.

@hshaban hshaban closed this Jan 31, 2018
