Weather station selection criteria should include guidance for site as well as zip code centroid and a fallback option in case the first weather station fails data sufficiency requirements #65

Closed
mcgeeyoung opened this Issue Feb 1, 2018 · 22 comments

@mcgeeyoung
Contributor

mcgeeyoung commented Feb 1, 2018

Current CalTRACK methods underspecify weather station selection.
"Weather station mapping requires locating the station nearest to the project. Each project file should contain a zip code that allows matching weather stations to projects"

I propose amending the methods guidance as follows (a sketch of this selection logic appears after the list):

  1. Weather station mapping should first attempt to find the weather station nearest to the location of the site within the same climate zone.
  2. Weather station mapping should fall back to the next closest weather station nearest to the location of the site within the same climate zone.
  3. If the precise location of the site is unknown but the zip code is known, the weather station within the same climate zone closest to the centroid of the zip code should be used. As a fallback, the next closest weather station to the zip code centroid within the same climate zone should be used.
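A minimal sketch of this selection logic, assuming a hypothetical schema in which each site and station carries lat, lon, and a climate zone label (for zip-code-only sites, the ZCTA centroid would be passed as the site coordinates):

from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points, in kilometers.
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def rank_stations(site, stations):
    # Keep only stations in the site's climate zone, sorted nearest-first.
    # The first element is the primary match; later elements are the
    # fallbacks, in order, if earlier stations fail data sufficiency.
    in_zone = [s for s in stations if s["climate_zone"] == site["climate_zone"]]
    return sorted(
        in_zone,
        key=lambda s: haversine_km(site["lat"], site["lon"], s["lat"], s["lon"]),
    )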

@mcgeeyoung mcgeeyoung added this to In progress in CalTRACK Feb 1, 2018

@mcgeeyoung mcgeeyoung moved this from In progress to To Do in CalTRACK Feb 1, 2018

@roberthansen

roberthansen commented Feb 8, 2018

I'm curious what other weather station selection methods were considered. Given the limited number of weather stations listed in the specification, it may be best to use the nearest available station's data, but from what I've read, spatial interpolation by Kriging could also be appropriate. The interpolation methodology would have to be adapted to accommodate occasionally missing data.

@mcgeeyoung

Contributor

mcgeeyoung commented Feb 9, 2018

@roberthansen Do you have a link to the Kriging methodology? We hadn't considered it, but it may be particularly useful as we get into hourly savings methods and have a more substantial need for complete hourly data.

@roberthansen

roberthansen commented Feb 9, 2018

Here's a thread on the topic, including links to examples in a few languages (R and Fortran); the given answer is a generalized case, but hopefully useful: https://stats.stackexchange.com/questions/157543/how-does-kriging-interpolation-work

ArcGIS also has some help documentation for using their implementations: http://resources.arcgis.com/en/help/main/10.1/index.html#/What_are_the_different_kriging_models/00310000003q000000/

And here's a Python implementation: https://github.com/bsmurphy/PyKrige
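For reference, a minimal PyKrige sketch with made-up station coordinates and a single hour's temperature readings (the spherical variogram is just a placeholder choice):

import numpy as np
from pykrige.ok import OrdinaryKriging

# Hypothetical station longitudes, latitudes, and temperatures for one hour.
lons = np.array([-122.4, -121.9, -122.3, -121.3])
lats = np.array([37.8, 37.3, 38.6, 38.0])
temps = np.array([15.2, 17.1, 14.0, 16.4])

ok = OrdinaryKriging(lons, lats, temps, variogram_model="spherical")
# Interpolate at a project site; ss is the kriging variance, an estimate
# of the interpolation error at that point.
z, ss = ok.execute("points", np.array([-122.0]), np.array([37.9]))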

There are of course other interpolation methods that may be worth considering, but Kriging seemed to be the preferred option for temperature in my quick research.

@danrubado

danrubado commented Feb 13, 2018

I do not support spatial interpolation of weather data values for CalTRACK, for three reasons:

  1. This would complicate the process needed to obtain a weather value for a site and significantly increase the computing power requirements.
  2. There is a significant amount of error in spatially interpolated values, so I'm not convinced that the degree of accuracy would be improved over simply using the weather value from the nearest station. You may in fact be layering on an additional source of error.
  3. It is unclear how missing values at a particular station would be handled in the spatial interpolation scenario and it could compound the error introduced by missing values.

I support McGee's general approach to selecting a weather station for a site and using the data from that station. However, in the event of missing values in the time series, I would suggest the following (a sketch of these rules appears after the list):

  1. For up to two consecutive missing values, interpolate the missing value with the mean of the non-missing neighboring values. This will cover the vast majority of missing weather values.
  2. For more than two consecutive missing values, identify the next nearest station as a fallback, as suggested by McGee, and use the non-missing values from that station to fill in the time series.
  3. If the fallback station also contains missing values during the target time period, then identify the next closest station to fall back to.
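A minimal pandas sketch of these rules, assuming primary and fallback are temperature series sharing the same datetime index (fill_missing is a hypothetical helper, not part of the methods):

import pandas as pd

def fill_missing(primary: pd.Series, fallback: pd.Series) -> pd.Series:
    filled = primary.copy()
    isna = primary.isna()
    run_id = (isna != isna.shift()).cumsum()  # label runs of consecutive gaps
    for _, run in primary[isna].groupby(run_id[isna]):
        before = primary[: run.index[0]].dropna()  # values before the gap
        after = primary[run.index[-1] :].dropna()  # values after the gap
        if len(run) <= 2 and not before.empty and not after.empty:
            # Rule 1: short gap -> mean of the neighboring non-missing values.
            filled[run.index] = (before.iloc[-1] + after.iloc[0]) / 2.0
        else:
            # Rules 2-3: longer gap -> values from the next nearest station.
            filled[run.index] = fallback[run.index]
    return filled

Chaining to further stations under rule 3 would wrap this in a loop over the distance-ranked station list.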

@danrubado

danrubado commented Feb 14, 2018

Another weather station selection issue to consider: which subset of weather stations should be considered as sources of weather data? Not all weather stations are created equal; they have varying levels of data quality and reliability. In particular, the issues to consider are the frequency of readings, the percentage of missing values in the time series, the length of the time series, and the calibration/accuracy of the sensors. We should evaluate station reliability and develop (or adopt from other work) criteria to screen out stations deemed unreliable.
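As a strawman for such screening criteria, here is a sketch that checks record length and completeness of a station's hourly temperature series; the thresholds are placeholders, not proposals:

import pandas as pd

MAX_MISSING_FRAC = 0.10  # placeholder threshold
MIN_YEARS = 3            # placeholder threshold

def passes_screen(hourly_temps: pd.Series) -> bool:
    # Length of the record, in years.
    start, end = hourly_temps.index[0], hourly_temps.index[-1]
    span_years = (end - start).days / 365.25
    # Fraction of expected hourly readings that are missing.
    expected = pd.date_range(start, end, freq="h")
    missing_frac = 1.0 - hourly_temps.dropna().index.nunique() / len(expected)
    return span_years >= MIN_YEARS and missing_frac <= MAX_MISSING_FRAC

Reading frequency and sensor calibration would need additional metadata that this sketch does not cover.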

@hshaban hshaban moved this from To Do to In progress in CalTRACK Feb 15, 2018

@hshaban

Collaborator

hshaban commented Feb 15, 2018

Proposed plan:

  • Document methods for identifying primary and fallback weather stations using site address and ZCTA
  • Report on average temperature difference between a sample of primary and fallback weather stations
  • Develop criteria that define an acceptable source of weather data and share them with the working group (starting from @danrubado's suggestions above)
  • Establish guidelines for filling in missing values, if present

@efranconi

efranconi commented Feb 15, 2018

If the approach for selecting the weather station and retrieving/manipulating data requires testing, visualization, formatting, or adjusting, the freely available, open-source tool Elements may be of use. See https://bigladdersoftware.com/projects/elements/ for more information.

@eliotcrowe

eliotcrowe commented Feb 15, 2018

Suggest dealing with hourly data separately (i.e., in different sub-sections), as the considerations will be different (e.g., if talking about "two consecutive missing data points"). Based on (limited) recent experience, it can be harder to find good-quality, complete hourly weather data, so we need to think carefully about how we set the minimum data sufficiency requirement. Also, on the question of moving to the next-nearest weather station if you find bad data, CalTRACK should define a method for the case where you end up 50 miles away, on the other side of a mountain, or on the coastline, i.e., in a place where the weather may be different. It's hard to be prescriptive about this, but you could at least require that the energy modeler justify their choice when they choose a weather station outside a specified radius.

@mcgeeyoung

Contributor

mcgeeyoung commented Feb 15, 2018

@eliotcrowe Your suggestion makes sense to me, as we'll have a number of similar questions regarding data sufficiency once we get to hourly. For now, we should focus this question on monthly and daily weather sufficiency.

@roberthansen

roberthansen commented Feb 15, 2018

@danrubado, thanks for pushing back on the interpolation methods. I'd like to continue this discussion before moving forward with nearest-neighbor interpolation (or, for that matter, Kriging). I'll address your concerns below.

  1. This would complicate the process needed to obtain a weather value for a site and significantly increase the computing power requirements.

I disagree that complexity would necessarily increase significantly. With nearest-neighbor interpolation, each weather reading would, at worst, check each weather station's location and time stamps (88 stations if we constrain to the NOAA set, assuming sorted lists and no searching for the most recent reading) before selecting the nearest station and applying its nearest (temporally and spatially) available reading. An alternative to nearest-neighbor would be triangulation, which uses three readings rather than one and could provide substantially more realistic values where sensors are geographically sparse. Inverse distance weighting and polynomial interpolation could use a subset (up to the full set) of all sensors, again providing improved results at project sites. In each of these cases, the computational complexity is a function of both the fixed number of stations and the potentially growing number of sites. Since the number of stations is constant, and the number of sites will likely exceed the number of stations, the complexity is essentially O(n), where n is the number of sites. Interpolation methods involving regression (e.g., Kriging) may add complexity to each interpolation calculation as a function of the number of stations, but as this number is constant, the overall computational complexity remains O(n). In any case, any interpolation method is well suited to parallelization. No interpolation method we might consider is going to overwhelm the processing capability of a modern computer.

  2. There is a significant amount of error in spatially interpolated values, so I'm not convinced that the degree of accuracy would be improved over simply using the weather value from the nearest station. You may in fact be layering on an additional source of error.

This is a fair point; however, it applies equally, if not more so, to the proposed nearest-neighbor interpolation. I have been looking for studies that either apply or recommend this method for weather data, but I haven't found any such endorsement. Authors acknowledge that it is available and useful for ease of calculation, but that it only produces realistic values when weather stations are immediately adjacent to the location of interpolation. I believe applying this interpolation across the distances between the 88 NOAA stations will yield unsatisfactory approximations of the weather at project sites. If, however, we allow (or require) implementers to set up their own weather stations at each site, or to use nearby, reliable stations within, say, 1 to 10 miles, depending on the terrain and climate zone, this method would be justifiable.

One advantage Kriging holds over other interpolation methods is that it gives estimates of the error in the interpolated value. All interpolation methods necessarily layer additional error on top of the error intrinsic to the station data. With nearest-neighbor interpolation, it's clear that these errors are significant, but just how much so is unknown.

  3. It is unclear how missing values at a particular station would be handled in the spatial interpolation scenario and it could compound the error introduced by missing values.

Again, resorting to nearest-neighbor interpolation does not address this concern. Indeed, Kriging can be generalized to three dimensions (two spatial plus one temporal) or four (adding elevation/adiabatic lapse rate), and interpolation between readings in space and time becomes automatic: missing data points are no more of a problem than a weather station that never existed.

Finally, here's another resource on several common interpolation methods used for meteorological data, including nearest-neighbor and several flavors of Kriging:
Interpolation Methods for Climate Data

Since CalTRACK aims to be an open, repeatable methodology, I would propose considering the deterministic interpolation methods described in the link above, as anyone could apply the methodology to the same data and get identical results. Kriging, while potentially more accurate, is stochastic in nature and thus won't return the same results on repeated analysis. Thus, I propose either inverse distance weighting (easier to code) or linear regression (better results).
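To illustrate the determinism point, a minimal sketch of inverse distance weighting at a site; the same station readings and distances always produce the same estimate:

import numpy as np

def idw(temps, dists_km, power=2.0):
    # Inverse-distance-weighted temperature estimate at a site.
    temps = np.asarray(temps, dtype=float)
    dists_km = np.asarray(dists_km, dtype=float)
    if np.any(dists_km == 0):
        return float(temps[dists_km == 0][0])  # the site sits on a station
    w = 1.0 / dists_km ** power
    return float(np.sum(w * temps) / np.sum(w))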

@roberthansen

roberthansen commented Feb 15, 2018

Regarding missing data, I recommend reviewing the literature for best practices, as I've seen authors suggest different approaches depending on the sampling interval. I'm not sure how close to real time the calculations need to be, but if we're talking about intermittently missing historical data, it makes sense to interpolate both among nearby stations and between the closest station's readings before and after the outage. For real-time calculation, we wouldn't have the nearest reading after the outage, but we could still use the reading before it.
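For the historical case, a minimal pandas sketch of interpolating across an outage in time; blending in nearby-station readings, as suggested above, would refine this:

import pandas as pd

def interpolate_outage(hourly: pd.Series) -> pd.Series:
    # Linear interpolation in time between the readings before and
    # after each gap; assumes a DatetimeIndex with gaps as NaN.
    return hourly.interpolate(method="time")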

More generally, however, I suggest dropping the weather station paradigm and dealing with weather readings independently: for each site, at each time of interest, we would interpolate among a time- and space-windowed set of readings. For nearest-neighbor interpolation, this set would always be one reading (alternatively, interpolating in time between nearest-station readings is another possibility, in which case we would be looking at consecutive readings from one station). For more advanced interpolation methods, the window can be expanded arbitrarily or based on geographical features (e.g., mountain ranges). In this case, we wouldn't need to specify data sufficiency, but we would need to define minimum sensor quality, including proper setup. This would open up the possibility of incorporating temporary and mobile weather stations.

@mcgeeyoung

Contributor

mcgeeyoung commented Feb 16, 2018

It might be helpful to run an out-of-sample test using a sample of homes in which we select weather data from the primary weather station, and then, for the same sample of homes, select the fallback weather station's data. It may turn out that the difference is large enough to justify additional cleaning of weather data. It might also turn out not to matter very much.

@mcgeeyoung

Contributor

mcgeeyoung commented Feb 20, 2018

One of the first cuts we've taken at the weather data looks at how consistent the weather is across a state over the course of a year. Of course this will raise more questions than answers, but I thought others would find it interesting as we dig in.

These are plots of all the weather stations in several states (New York, Texas, Illinois, Colorado, and California). You can see that California actually has three weather patterns, but in the other states, the weather is remarkably consistent across the state. The next question will be, how much do the variations in the weather matter?

[Plots of weather station temperature patterns for New York, Texas, Illinois, Colorado, and California]

@philngo

Contributor

philngo commented Mar 5, 2018

I used the following methodology to test methods for matching a site to a weather station.

To isolate differences in weather from differences in model fits due to the peculiarities of any particular site, this methodology avoids using meter data and focuses entirely on temperature data. The temperature models described in the daily and billing methods assume that an accurate source of site temperature data is available. Weather station matching algorithms are developed to assist in finding an appropriate weather station from which to obtain this temperature data. But to evaluate the effectiveness of these weather mappings, we need a way to test them for accuracy.

What I'm proposing as a more general testing method for weather station matching algorithms is to bootstrap a scenario in which we have knowledge of the weather at a site, in this case by matching weather station sites to other weather stations. Because we have a clear picture of the weather at each weather station site, we can use it to match that site to other stations using metrics, such as weather-to-weather cosine distance or rmse, that we can't normally use without “ground truth” temperature data (because few buildings have a reliable source of outdoor temperature data on site). The assumption, of course, is that the most similar source of weather data will be the best one to use in modeling. (I see no reason why this should not be the case, but I could imagine someone coming up with a different similarity metric that more heavily weights the similarity of the shape or the mean. This method should work even in those scenarios.)

Weather similarity was determined in the following way. For all weather stations in California meeting the data quality pre-screen, we pulled hourly temperature data for Jan 2014 to Jan 2018. We combined three distance metrics (rmse, km, and cosine_dist) by adding their ranks together with equal weight (like scoring a racing team). Using that metric, 12 of 115 stations have “best” stations with scores of 10 or higher, meaning that for the remaining stations the worst rank we see in any category is a 6, showing some evidence of stability. Two stations have scores above 20; the max is 29. Ranks correspond to size on the map below. A larger score means that the “best” station was less convincingly dominant in the three ranked similarity scores. (A sketch of the metric and scoring computation follows the example below.)

Example:

Station 1: rmse rank 3, distance rank 4, cosine distance rank 5 -> score 12
Station 2: rmse rank 2, distance rank 6, cosine distance rank 3 -> score 11

Station 2 has a better ranked similarity than Station 1 (lower scores are better).
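A sketch of the metric and scoring computation, assuming a DataFrame with one row per candidate station and columns rmse, cos_dist, and km_dist computed against the target station's hourly series (these helper names are mine, not from the analysis code):

import numpy as np
import pandas as pd

def rmse_and_cosine(a: np.ndarray, b: np.ndarray):
    # RMSE and cosine distance between two aligned temperature series.
    rmse = float(np.sqrt(np.mean((a - b) ** 2)))
    cos_dist = 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return rmse, cos_dist

def combined_scores(candidates: pd.DataFrame) -> pd.Series:
    # Rank each metric (1 = most similar / closest), then sum the ranks
    # with equal weight, racing-team style; lower totals are better.
    return candidates[["rmse", "cos_dist", "km_dist"]].rank(method="min").sum(axis=1)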

Bootstrapped “ground truth” station mappings:

target_usaf_id similar_usaf_id combined_distance_rank rmse rmse_rank cos_dist cos_dist_rank km_dist km_dist_rank
690150 723815 5.0 1.760808890966525 1.0 0.0029029497753592093 1.0 83.682 3.0
720267 724837 5.0 2.794190064344386 1.0 0.01073248768748436 2.0 36.247 2.0
720406 724955 5.0 2.0904500132732515 2.0 0.008210845890239793 2.0 24.154 1.0
720576 724839 5.0 1.3623397509372746 2.0 0.0026446847267995732 1.0 24.697 2.0
720614 724837 4.0 1.7408885711723603 1.0 0.003954363607838629 1.0 25.861 2.0
720646 994016 7.0 2.0617691221160306 1.0 0.011255310491542936 2.0 32.785 4.0
722810 747185 3.0 1.0586403226385692 1.0 0.0008321533074979737 1.0 9.919 1.0
722860 722899 4.0 2.3280057727302763 1.0 0.005756721764839212 1.0 36.643 2.0
722868 747187 11.0 3.1354498183700685 5.0 0.006719090296968044 5.0 38.603 1.0
722869 747040 5.0 1.160539677438173 1.0 0.001530586319751559 1.0 18.822 3.0
722874 722956 3.0 1.355535966633939 1.0 0.0021360037650760555 1.0 11.886 1.0
722880 722886 3.0 1.504106777422174 1.0 0.0022835614351165434 1.0 12.114 1.0
722885 722950 4.0 0.9740131693129869 1.0 0.0013737849462196472 1.0 10.376 2.0
722886 722880 3.0 1.504106777422174 1.0 0.0022835614351165434 1.0 12.114 1.0
722897 723940 4.0 1.9882377990944902 1.0 0.006060839737376256 1.0 41.386 2.0
722899 722869 4.0 1.7284820096794442 1.0 0.0033481002432598217 1.0 18.385 2.0
722900 722906 3.0 1.4221952753151992 1.0 0.0022513261061404277 1.0 4.093 1.0
722903 722931 4.0 1.4009954336546515 1.0 0.002288895446073136 2.0 5.683 1.0
722904 722903 6.0 1.4719330736236913 1.0 0.002222757911406914 1.0 30.94 4.0
722906 722900 3.0 1.4221952753151992 1.0 0.0022513261061404277 1.0 4.093 1.0
722909 722900 6.0 2.021007940489647 2.0 0.004703526200377728 1.0 19.527 3.0
722910 723927 9.0 2.9380436281265507 1.0 0.01291509937507529 5.0 109.1 3.0
722920 722950 8.0 3.607615190733409 1.0 0.01897326651297604 3.0 59.171 4.0
722926 722934 3.0 1.8075481709095842 1.0 0.005045485894566437 1.0 8.984 1.0
722927 722900 10.0 1.7098775636771857 3.0 0.0024493634412842136 1.0 44.608 6.0
722931 722903 3.0 1.4009954336546515 1.0 0.002288895446073136 1.0 5.683 1.0
722934 722926 3.0 1.8075481709095842 1.0 0.005045485894566437 1.0 8.984 1.0
722950 722885 4.0 0.9740131693129869 1.0 0.0013737849462196472 1.0 10.376 2.0
722953 723820 7.0 1.9704791325821067 1.0 0.004509860558405743 1.0 48.964 5.0
722956 722950 3.0 1.2815614867959255 1.0 0.0016211982145214465 1.0 5.35 1.0
722970 722975 6.0 1.4401637629219335 3.0 0.0023332021137184578 2.0 9.039 1.0
722975 722970 4.0 1.4401637629219335 1.0 0.0023332021137184578 2.0 9.039 1.0
722976 722975 5.0 1.712917649314265 3.0 0.001298969510911574 1.0 11.33 1.0
722977 722970 5.0 1.5284067166071682 1.0 0.003051723870582279 1.0 29.79 3.0
723171 723810 3.0 1.9193043483798962 1.0 0.00421415990370666 1.0 9.764 1.0
723805 747188 3.0 2.0973982179590167 1.0 0.002566496159500775 1.0 127.754 1.0
723810 723816 6.0 1.739000354232614 1.0 0.003657823463418808 1.0 36.157 4.0
723815 690150 5.0 1.760808890966525 1.0 0.0029029497753592093 1.0 83.682 3.0
723816 723820 3.0 1.5122638734652933 1.0 0.0028071033275083312 1.0 17.086 1.0
723820 723816 3.0 1.5122638734652933 1.0 0.0028071033275083312 1.0 17.086 1.0
723825 723810 5.0 1.8546192910916661 2.0 0.004075789663344431 2.0 56.578 1.0
723830 722953 8.0 4.2900464052860405 2.0 0.015399997055774994 1.0 63.529 5.0
723840 723895 7.0 2.655344581521793 2.0 0.004848406298761487 3.0 66.022 2.0
723890 723896 6.0 1.823491730805995 1.0 0.002538492940278081 1.0 58.783 4.0
723894 725847 9.0 3.262142536373841 1.0 0.03094970389745022 1.0 172.5 7.0
723895 723896 3.0 1.3464441815158783 1.0 0.002101604279398428 1.0 44.05 1.0
723896 723898 3.0 1.328177873057274 1.0 0.0018201756951511383 1.0 20.565 1.0
723898 723896 3.0 1.328177873057274 1.0 0.0018201756951511383 1.0 20.565 1.0
723910 723926 3.0 1.555116589061642 1.0 0.003342112703195288 1.0 11.527 1.0
723925 723927 13.0 2.5787878886747104 1.0 0.010928760710007923 10.0 63.638 2.0
723926 723910 3.0 1.555116589061642 1.0 0.003342112703195288 1.0 11.527 1.0
723927 723910 4.0 1.6236212696277132 1.0 0.0038059827152203685 1.0 12.477 2.0
723930 723940 3.0 2.11485801888213 1.0 0.004867661536503642 1.0 22.896 1.0
723940 723930 4.0 2.11485801888213 2.0 0.004867661536503642 1.0 22.896 1.0
723965 745046 29.0 3.675713131477007 13.0 0.01386818565461334 3.0 153.441 13.0
724800 723810 22.0 3.967337485224125 2.0 0.017580431940864893 1.0 277.723 19.0
724815 745046 3.0 1.2620709321717551 1.0 0.0019448620636599578 1.0 48.602 1.0
724828 720576 5.0 1.7612014214459972 1.0 0.003199410219409282 1.0 23.008 3.0
724830 724839 3.0 1.1463452271051369 1.0 0.0019252104474736242 1.0 22.553 1.0
724837 724838 3.0 1.035566924379123 1.0 0.001457527270072534 1.0 12.171 1.0
724838 724837 3.0 1.035566924379123 1.0 0.001457527270072534 1.0 12.171 1.0
724839 724830 3.0 1.1463452271051369 1.0 0.0019252104474736242 1.0 22.553 1.0
724915 725930 3.0 1.83259174955142 1.0 0.006165746580467979 1.0 22.814 1.0
724920 724926 5.0 1.667907123369647 3.0 0.002512643770906653 1.0 38.11 1.0
724926 724920 4.0 1.667907123369647 2.0 0.002512643770906653 1.0 38.11 1.0
724927 724950 7.0 1.6102533693470389 1.0 0.0039645516024913174 1.0 39.387 5.0
724930 725850 4.0 1.262670937156364 1.0 0.002684861700151586 1.0 11.946 2.0
724938 994041 3.0 1.1649927390828216 1.0 0.0021813578613403273 1.0 4.557 1.0
724940 994041 7.0 1.2916586741906542 1.0 0.003351770789603914 1.0 19.23 5.0
724945 745090 3.0 1.1233769551767403 1.0 0.0019788737145625124 1.0 12.157 1.0
724950 998011 7.0 1.8669012886347405 3.0 0.003626304736866759 1.0 25.196 3.0
724955 998011 7.0 2.393993339185215 3.0 0.008128022962441195 2.0 23.483 2.0
724957 720406 3.0 1.933338916558483 1.0 0.00720761288741012 1.0 45.397 1.0
725845 720267 8.0 6.0503097659322425 5.0 0.03344247276056489 2.0 48.093 1.0
725846 725847 3.0 2.555160487266816 1.0 0.02241542592357526 1.0 48.476 1.0
725847 725846 3.0 2.555160487266816 1.0 0.02241542592357526 1.0 48.476 1.0
725850 724930 3.0 1.262670937156364 1.0 0.002684861700151586 1.0 11.946 1.0
725905 720576 14.0 2.694372584074183 2.0 0.010460459313312298 6.0 139.621 6.0
725910 725920 3.0 1.4404053024976313 1.0 0.0024308543298690033 1.0 40.82 1.0
725920 725910 3.0 1.4404053024976313 1.0 0.0024308543298690033 1.0 40.82 1.0
725930 745058 4.0 1.496962023081087 1.0 0.004680190422240882 1.0 34.235 2.0
725945 725946 4.0 1.628227099758966 1.0 0.008755591630530524 1.0 89.712 2.0
725946 725945 3.0 1.628227099758966 1.0 0.008755591630530524 1.0 89.712 1.0
725955 725957 3.0 2.803911934740981 1.0 0.015005596368414365 1.0 51.015 1.0
725957 725955 3.0 2.803911934740981 1.0 0.015005596368414365 1.0 51.015 1.0
725958 725957 4.0 3.25483885927589 1.0 0.02530005889993181 2.0 148.939 1.0
745046 724815 4.0 1.2620709321717551 1.0 0.0019448620636599578 1.0 48.602 2.0
745048 724837 3.0 1.6227576906102215 1.0 0.0025188114596172984 1.0 42.725 1.0
745056 747186 15.0 2.861314616343396 2.0 0.012133594346042331 3.0 62.025 10.0
745058 725930 3.0 1.496962023081087 1.0 0.004680190422240882 1.0 34.235 1.0
745090 724945 3.0 1.1233769551767403 1.0 0.0019788737145625124 1.0 12.157 1.0
745160 998011 3.0 1.5344289080665217 1.0 0.0032764997313357025 1.0 9.212 1.0
746110 723815 5.0 2.2922063949102345 2.0 0.004638251623902856 2.0 49.463 1.0
746120 723810 7.0 2.875698415123137 2.0 0.0060276862791801555 2.0 88.846 3.0
747020 723898 3.0 1.5193924996683617 1.0 0.0026699865592928473 1.0 28.864 1.0
747040 722869 4.0 1.160539677438173 1.0 0.001530586319751559 1.0 18.822 2.0
747185 722810 3.0 1.0586403226385692 1.0 0.0008321533074979737 1.0 9.919 1.0
747186 745056 5.0 2.861314616343396 1.0 0.012133594346042331 1.0 62.025 3.0
747187 722810 6.0 2.2543688424285824 1.0 0.0037971592056093018 1.0 100.196 4.0
747188 747185 4.0 2.0432521217036763 1.0 0.003022958357464911 2.0 118.662 1.0
749171 723830 18.0 4.378775095814953 1.0 0.026894924954008026 14.0 50.591 3.0
994016 994036 3.0 1.9120075661653697 1.0 0.004687749234145944 1.0 14.597 1.0
994017 725946 6.0 1.8942954233218885 2.0 0.010900272205825123 2.0 112.515 2.0
994023 998340 8.0 2.5174319909105254 1.0 0.010982755643330622 1.0 132.093 6.0
994028 722950 5.0 2.1342051520186907 2.0 0.005406187460311185 1.0 12.865 2.0
994033 994036 6.0 1.000336461069355 1.0 0.0021270349581855585 1.0 19.503 4.0
994034 998474 6.0 1.6961340490654202 2.0 0.004814980899285559 1.0 19.395 3.0
994036 994033 5.0 1.000336461069355 1.0 0.0021270349581855585 1.0 19.503 3.0
994041 724938 3.0 1.1649927390828216 1.0 0.0021813578613403273 1.0 4.557 1.0
994044 994016 13.0 2.1480990078054965 3.0 0.008201821450925295 2.0 164.05 8.0
998011 745160 3.0 1.5344289080665217 1.0 0.0032764997313357025 1.0 9.212 1.0
998340 723927 10.0 2.4091666265174463 2.0 0.008563513391249833 6.0 47.443 2.0
998474 994036 7.0 1.5463346318869926 3.0 0.0039232833476796625 2.0 18.932 2.0

[Map of stations sized by combined rmse/cos_dist/km_dist rank]

The two methods I tested are as follows:

Method A:

  1. Determine the candidate stations which will be considered (the 115 CalTRACK stations).
  2. Determine the climate zone standard(s) which will be considered. (Considering the following standards for US sites and stations: (a) IECC Climate Zone, (b) IECC Moisture Regime, (c) Building America Climate Zones, (d) CEC California Building Climate Zone Areas)
  3. Determine the climate zone inclusion of the site, i.e., for each considered climate zone standard, determine which climate zones contain the site.
  4. Determine the (next) closest candidate station.
  5. Determine the climate zone inclusion of the candidate station.
  6. Reject the candidate station if any of the following are true:
    a. The candidate station is farther than 150 km from the site.
    b. The candidate station climate zone inclusion does not match the site climate zone inclusion for all considered climate zone standards.
    c. The candidate station does not have sufficient data quality when matched with site meter data.
  7. If the candidate station is not rejected by the criteria in (6), use it. Otherwise, continue testing candidates by returning to step 4.
  8. If no stations meet the criteria above, use Method B.

Method B:

  1. Determine the candidate stations which will be considered. CalTRACK recommends considering all ISD stations; if Method A did not succeed, CalTRACK allows using weather data from stations outside the primary candidate set.
  2. Determine the (next) closest weather station.
  3. Reject the candidate station if it does not have sufficient data quality when matched with site meter data; in that case, return to step 2 and consider the next closest station.
  4. If no stations meet the criteria above, consider the site unmatched.

A combined sketch of Methods A and B follows.
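In this sketch, distance_km, zone_inclusion, and has_sufficient_data are hypothetical callables standing in for the distance, climate-zone, and data-quality checks described above:

def match_station(site, caltrack_stations, all_isd_stations,
                  distance_km, zone_inclusion, has_sufficient_data,
                  max_km=150.0):
    # Method A: CalTRACK candidate stations, nearest first.
    for station in sorted(caltrack_stations, key=lambda s: distance_km(site, s)):
        if distance_km(site, station) > max_km:
            break  # ordered by distance, so every later station fails 6a
        if zone_inclusion(station) != zone_inclusion(site):
            continue  # 6b: climate zone mismatch under some standard
        if not has_sufficient_data(station, site):
            continue  # 6c: insufficient data quality
        return station
    # Method B: fall back to the full ISD station set, nearest first.
    for station in sorted(all_isd_stations, key=lambda s: distance_km(site, s)):
        if has_sufficient_data(station, site):
            return station
    return None  # the site is unmatched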

For each method, we measured the percentage of station-sites for which the method found the “ground truth” station and used this percentage to “score” the method. These scores are reported below and were used to determine the preferred order of the methods.

Method A: 56%
Method B: 53%

Therefore, Method A is recommended over Method B. Method A scores 56% because 56% of the selections it made (restricted to the test station set) exactly matched the ground-truth ranked-distance selection in the table above. There is likely room for improvement here if other methods are proposed, for example one using elevation (although such a method would require elevation data at the sites in question, which may not always be practical). One could imagine using a scoring method similar to this one (i.e., using bootstrapped weather stations as sites) to score the accuracy of various kriging methods, although I did not do so.

@philngo

Contributor

philngo commented Mar 5, 2018

Another effort I made was to test the effect of using poorly matched weather data on the following two meters from the greater Philadelphia area (matched weather station USAF ID 725113, ~8 km from the site), using the CalTRACK 1.0 daily model (hdd range 55-65, increment 1; cdd range 65-75, increment 1). Basic fits on two years of data are shown below; fits on the first and second years alone showed little difference. (A sketch of the balance-point grid search follows the fit images.)

[Figures: gas model fit and electricity model fit]
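As a sketch of that balance-point grid search, using ordinary least squares over equal-length arrays of daily average temperature (deg F) and daily usage (the actual CalTRACK fitting and qualification rules have more detail than this):

import itertools
import numpy as np

def fit_daily_model(temps_f, usage):
    temps_f = np.asarray(temps_f, dtype=float)
    usage = np.asarray(usage, dtype=float)
    best = None
    # hdd balance points 55-65 and cdd balance points 65-75, increment 1.
    for bp_h, bp_c in itertools.product(range(55, 66), range(65, 76)):
        hdd = np.maximum(bp_h - temps_f, 0.0)
        cdd = np.maximum(temps_f - bp_c, 0.0)
        X = np.column_stack([np.ones_like(temps_f), hdd, cdd])
        coef, *_ = np.linalg.lstsq(X, usage, rcond=None)
        r2 = 1.0 - np.var(usage - X @ coef) / np.var(usage)
        if best is None or r2 > best[0]:
            best = (r2, bp_h, bp_c, coef)
    return best  # (r-squared, heating bp, cooling bp, [intercept, beta_h, beta_c])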

Then I picked two weather stations from each state, the idea being that the weather at these stations would diverge widely (but realistically) from the weather observed at the matched ISD station near the site.

# USAF IDs of the test stations: two weather stations picked from each state.
usaf_ids = [
    '720265', '720363', '702710', '998224', '720644', '722786', '723419', '723450', '722956', '722931',
    '724769', '720535', '725046', '725086', '997281', '997694', '722069', '722067', '722137', '720712',
    '911820', '911900', '720322', '725866', '744662', '722075', '724387', '720575', '725480', '725477',
    '724585', '724580', '724238', '720447', '720346', '722312', '997707', '726196', '722218', '997781',
    '725107', '744104', '725373', '725408', '726583', '726564', '723320', '998239', '724460', '724459',
    '727750', '727690', '725540', '725625', '725825', '724770', '726165', '742078', '997687', '724094',
    '722710', '723658', '744989', '744864', '723165', '723156', '727675', '720863', '724298', '725210',
    '723560', '722091', '726945', '726985', '724080', '725434', '722151', '725070', '723119', '723119',
    '726625', '726605', '723340', '721031', '720315', '722533', '725755', '725720', '720493', '726140',
    '724053', '724053', '727985', '722208', '724176', '724273', '720327', '726430', '720345', '725776'
]

Then I fit the models using the same CalTRACK 1.0 daily method with the same meter data but different weather data, once for each of these stations.

The results are charted below, with results for electricity on the left and gas on the right.

Commentary on each row of charts:

  1. No surprises here. Temperature differences (rmse) increase with distance. Much of the variation appears within the first 1000 km; the outliers are AK and HI. Temperature RMSE is calculated on hourly-frequency data.
  2. Model r-squared tends to decrease (but not always!) with temperature difference. For this meter data, r-squared appears fairly consistent out to roughly 5 degrees F of average temperature difference.
  3. Model in-sample prediction error increases mildly with larger temperature differences. The model was fit on the first year of data and predicted over that same year.
  4. Model out-of-sample prediction error increases mildly with larger temperature differences. The model was fit on the first year of data and predicted on the second year.
  5. In-sample predictions are very consistent, even with high temperature differences. (This is expected, given the model fitting process.)
  6. Out-of-sample predictions are surprisingly consistent, even with high temperature differences. Models were fit on the first year of data and predicted on the second year. The consistency of the annual totals is striking, and shows some resilience of the predicted totals to badly matched weather data (although consistency in disaggregated monthly, daily, or other breakdowns has yet to be measured).
  7. (and 8) Heating and cooling balance points (and presumably other model parameters too) appear to change dramatically with differences in weather data. This suggests that models are highly coupled to the weather station/source, and thus a model fit with data from one weather station should not be used to predict with data from another.

[Charts: temperature data error splits with balance points]

@steevschmidt

steevschmidt commented Mar 6, 2018

From the summary:

This indicates that the accuracy of weather data does not have a significant effect on annual energy savings predictions, even in extreme cases.

There are a number of possible explanations for this. I would suggest that a more accurate conclusion is:

This indicates that the accuracy of weather data does not have a significant effect on [CalTRACK's existing] annual energy savings predictions [on this building data set], even in extreme cases.

@philngo

Contributor

philngo commented Mar 6, 2018

That was the intended conclusion, thank you for the clarifying language.

@philngo

Contributor

philngo commented Mar 6, 2018

"Accuracy" is definitely the wrong language, as it necessitates having something to measure against - a ground truth. I did not measure or get access to that ground truth at this or most other sites, so the lack of a ground truth in this case makes my choice of the word "accuracy" totally meaningless. Oops!
I also didn't actually measure statistical significance, so I'm going to avoid using "significant" as well.
Let's go with this milder version:

This indicates that the choice of weather station did not have a visible effect on CalTRACK's existing annual energy savings predictions in this building data set, even in extreme cases.

@steevschmidt

steevschmidt commented Mar 6, 2018

I raised the issue of accuracy in issue #73. As described there, I believe you could use a best-in-class building energy simulation tool (like CBECC-Res) to run various tests with different building types and weather data to test the CalTRACK methods. These modeling tools create highly accurate, disaggregated energy data for virtual buildings, with the benefit of having no real occupants to mess up the measurements. This may be the closest we can get to "ground truth" data.

@philngo

Contributor

philngo commented Mar 6, 2018

Right! It's good to remember that there are many different ways to test these methods, and anything that gives us more (or less!) confidence in them is helpful. There is likely something to gain from using simulations as a proxy for ground truth. At the end of the day, what matters is the extent to which CalTRACK methods are reliable under a variety of circumstances; the better we understand these methods and their characteristics, the better equipped we are to make those methodology decisions.

@hshaban hshaban moved this from In progress to Done in CalTRACK Mar 30, 2018

@hshaban

Collaborator

hshaban commented Jul 26, 2018

This update has been integrated into CalTRACK 2. Closing this issue. Added an issue for weather data averaging in Sandbox (#91).

@hshaban hshaban closed this Jul 26, 2018
