-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New sample calculation with proof #986
Conversation
@@ -219,8 +219,10 @@ def _sample_probability_distribution(prob_dist, num_samples) | |||
end | |||
|
|||
return_samples = {} | |||
return_samples_v2 = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New version.
Once the new version is accepted to be okay, we can remove the old version. Keeping both for now to make it possible to compare.
resources/run_sampling.rb
Outdated
@@ -268,14 +281,33 @@ def _sample_probability_distribution(prob_dist, num_samples) | |||
end | |||
end | |||
|
|||
if not (return_samples == return_samples_v2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For most cases, the new version is equal to old version; When they are not, making sure that the difference can be justified.
I sampled project_national w/ 350k datapoints on Windows Subsystem for Linux.
All three produced CSV files of roughly the same size, so that's a good sign. Logs: |
@shorowit Thanks! |
Here is what I got on Mac and on Eagle. Eagle Login Node: run_sampling.py.m1mac.txt WSL seems to be doing something really bad for python! |
Here are my results on Windows.
Very impressive! Definitely helps to make use of multiple cpu cores (my machine has 16). Not sure what's going on w/ my WSL results above. |
There is a certain limitation of current sampling approach. Consider this simplified case of just two TSVs.
Ceiling Fan.TSV
For n=10, one would expect two samples each for bedrooms = 1 through 5. However, run_sampling.rb on these TSVs would generate these samples on buildstock.csv:
There are 5 samples each for None and Standard Efficiency and no samples for Premium Efficiency. This limitation is easily explained by the way current sampling works. It first samples Bedrooms that has no dependency, so it assigns two samples for each number of bedrooms. Then for each of these two samples, it tries to distribute the Fan Efficiency. Since there are 3 fan efficiencies, but only two samples, it picks the two option with the highest probability. While this ensures that the distribution follows the TSV specification as close as possible for this particular bedroom, on doing so for all bedrooms, we miss out on the fact that the probability for Option=Premium, small as it may be, should add up and we should eventually get some samples for it. Although this issue is demonstrated for a case where the number of samples is very small, this phenomena can occur even for the case of very large n when sampling within a narrow slice created by a large number of dependencies and their options. I am thinking of a slight modification to the sampling approach where we accumulate the probability of discarded options, and use that to ensure that low probability options also get some samples. |
In my opinion, this is a feature, not a limitation/bug. Indeed the sampling algorithm was intentionally developed this way. In your example, if you did end up end up with 2 Premium Efficiency ceiling fans, they would be randomly assigned to buildings. If you were to then look at ceiling fan energy use as a function of number of bedrooms, the trend would have random spikes based on this random assignment and a user would likely be confused. Or, a user might grab the two 2-bedroom building models and use them for an analysis assuming that these models reflect the highest probability building characteristics, but they would not. The bottom line is that there is no good sampling result when you have too few samples for a given distribution. And it's important that ResStock maintain enough samples given its distributions. In the absence of having enough samples, your proposed approach may be what some users want and the current approach may be what other users want. The current algorithm reflects our consensus from many years ago as to which use cases are more important. @afontani may have thoughts here, I think that he and I had some discussions about all of this many years ago. |
@shorowit I agree it's trade-off between two things - making global distribution more accurate vs making local distribution more accurate. My feelings was that we have more confidence on global distribution (what % of homes nationally have premium fan) than on local distribution (of homes that have 2 bedrooms, what % have premium fan), so it might be more suitable to prioritize that. But, yeah, my presumption maybe wrong. Also, I might be raising sort of a false flag. I did notice some parameters had slightly (very tiny in fact; couple of samples in a 100K total samples), different global distribution than what one would expect based on sampling_probability calculation. I was investigating what could have caused that. I am working on a test to validate the generated buildstock.csv against the TSVs and was investigating this as a part of that. I will see if we even have any substantial discrepancy for any parameters; maybe we don't. Anyway, this was an interesting exercise on understanding the behavior of sampling. |
@rajeee I completely agree with your assessment. And to be clear, I'm not arguing that the current behavior is correct just because it's always been this way. I look forward to the results of your investigations and any improvements you can make! Developing a test around all of this is a definite good first step. |
@rajeee: Thanks for thinking through this. I also look forward to any results or comparisons at larger sample sizes to see if there are any issues. I agree with @shorowit about this not actually being a bug but a feature. I am not positive, but I think this behavior helps the algorithm converge to the distributions at I think we will have issues with small sample sizes regardless of the algorithm. In this case, I would prefer the algorithm as it is currently written. But for larger sample sizes, like for a national run, 500k samples, I am open to modifying if there are use cases that properly motivate the change. I do worry about how people slice our data from a large run and if the user is getting a more representative set of samples or a set of edge cases. |
@shorowit @afontani Thanks for the comments. As per your feedback, I decided to keep the current approach since it has better accuracy in regions where we have low samples. I continued the development of the python sampler here: NREL/buildstockbatch#308 I verified the samples generated by all three versions (ruby_v1, ruby_v2 and python) to be correct. I did 5 sampling runs at n=350K between ruby_v1, ruby_v2 and python sampler using different random seed each time and compared the sampling_error (errors for both dependency groups, and errors for options). There is no qualitative difference between them. This is for 350K samples. So, we have up-to 10% option distribution error and 18% group distribution error for some of the large TSVs. |
Moved to NREL/buildstockbatch#308 |
…a8d95 fda505a8d95 Update measure argument description. 5ea2342916f Update changelog and docs. a63142a44f7 Add reporting test for advanced output variables. f574c52e842 Add reporting test and update template osw. 18193edc885 Add new optional argument for output variables. 602378b40ca Merge pull request #1001 from NREL/eri_reporting ce9a1a6b90e Latest results. 1a9b90987d5 Increase number of output decimal places and a little code cleanup. c67d0f7a780 Update xml 59bab55d0bf EC bugfix. 66d21ad412d Refactor/simplify ReportSimulationOutput for ERI purposes. No longer creates a separate *_ERI.csv. 0457f4db9f9 Revert "Add HPXML has_fuel helper method." 365c8e2fcdd Add HPXML has_fuel helper method. c8b56ceb83a Fix case. 67e93b0c152 Clarify documentation for interior surfaces (or flat roofs) regarding Azimuth/Orientation defaulting. a7f6341d493 Merge pull request #999 from NREL/shared_recirc_temperature d5fc0daed83 Drop ABCDE 4bb47a2d00a Updated xml 5c3c071f772 Adds support for shared hot water recirculation systems controlled by temperature, per ANSI/RESNET/ICC 301-2019 Addendum C. 916014c91c8 Merge pull request #996 from NREL/battery-docs 48503939ee9 Merge pull request #995 from NREL/fossil_fuel_emissions_factors d0a9528e078 Minor tweak. 9c4727d50e7 Adds some explanation for how batteries are controlled and other misc cleanup. 6d9edb4f618 Latest results. e0369eb925d Rename "CO2" to "CO2e". Updated cambium data to be CO2e and combined (combustion + pre-combustion). 6c9cd855816 Updated default fossil fuel emissions factors to include pre-combustion (e.g., methane leakage for natural gas). 58bccc6420c Merge pull request #992 from NREL/minitest_helper2 9b7ce3f3c02 Merge branch 'master' of https://github.com/NREL/OpenStudio-HPXML into minitest_helper2 6a451ecdb45 Revert to original config 713fb844c91 Merge pull request #993 from NREL/goodbye_beta bfa2365a843 Remove the "Beta" label from the BuildResidentialHPXML measure. [ci skip] f5729c759b8 Aha, this test file was not loading minitest_helper.rb. 77ca82d4152 Try excluding build_residential_hpxml_test.rb 854d4ccb090 First, reproduce issue on CI. d174ace1df1 Merge pull request #987 from NREL/hvac_iq_limits 9d40f9e25e6 Update Changelog.md [ci skip] 7a1644d70e9 Merge branch 'master' of https://github.com/NREL/OpenStudio-HPXML into hvac_iq_limits 06bd81964cc Merge pull request #986 from NREL/docs_on_ci afe5d64f823 Merge branch 'master' of https://github.com/NREL/OpenStudio-HPXML into docs_on_ci e5de16e128d Merge pull request #988 from NREL/minitest_helper 66bf8248d02 Possible fix to obtain code coverage on CI? e6ee2daa0cb Try reverting minitest_helper.rb change. 33b4d180cf9 Adds more stringent limits for `AirflowDefectRatio` and `ChargeDefectRatio` (-0.9 to 9, which now allows values from 1/10th to 10x the design value). f1c13fb6871 Re-enable CI steps 913f947f19a Fix error? 3c21bf713a8 Install pip 3afc9dcf5c0 Install sphinx f8e4ca4a4c8 Merge branch 'master' of https://github.com/NREL/OpenStudio-HPXML into docs_on_ci f0844a93edf Merge pull request #978 from NREL/leap-year-handling 040604b1106 Add newline 44224c6a06d Build docs on CI, store as artifacts. 852a36d084f Update workflow test file. bc8ccc2868a Update error test. 219cbdfee88 Update error message. 69319016c04 Update changelog and docs. f782deade4f Merge branch 'master' into leap-year-handling 39686d9de69 Merge pull request #980 from NREL/report-sim-output-setup-fix a80a1f25183 Return if model not retrieved. 6dd4463a51d Latest results. 335370e3c0b Update defaults header test. bbd8c3bcc1a Add test for throwing error. 24936969f9e Throw error if leap year with TMY. 14519b064f5 Clean up translator measure. c7a8ad6a2ef Change to non leap year in test xml file. 6a1ea0ad3ba Update README.md git-subtree-dir: resources/hpxml-measures git-subtree-split: fda505a8d95267aaee4f8ae832f338f321981d00
…f619e 75de13f619e Simpler fix to negative peak/design values. 51dbd82f910 Merge pull request #1007 from NREL/negative_peak_flow_rate f5ee3eee6a6 Prevent possibility of trying to set, e.g., negative peak flow rate. 0a561c1369e Merge pull request #1006 from NREL/door_bugfix b5440086603 Bugfix for cases where door R-value w/o air film goes negative. (Can occur for, e.g., CZ 1 where IECC door U-factor is 1.2.) 602378b40ca Merge pull request #1001 from NREL/eri_reporting ce9a1a6b90e Latest results. 1a9b90987d5 Increase number of output decimal places and a little code cleanup. c67d0f7a780 Update xml 59bab55d0bf EC bugfix. 66d21ad412d Refactor/simplify ReportSimulationOutput for ERI purposes. No longer creates a separate *_ERI.csv. 0457f4db9f9 Revert "Add HPXML has_fuel helper method." 365c8e2fcdd Add HPXML has_fuel helper method. c8b56ceb83a Fix case. 67e93b0c152 Clarify documentation for interior surfaces (or flat roofs) regarding Azimuth/Orientation defaulting. a7f6341d493 Merge pull request #999 from NREL/shared_recirc_temperature d5fc0daed83 Drop ABCDE 4bb47a2d00a Updated xml 5c3c071f772 Adds support for shared hot water recirculation systems controlled by temperature, per ANSI/RESNET/ICC 301-2019 Addendum C. 916014c91c8 Merge pull request #996 from NREL/battery-docs 48503939ee9 Merge pull request #995 from NREL/fossil_fuel_emissions_factors d0a9528e078 Minor tweak. 9c4727d50e7 Adds some explanation for how batteries are controlled and other misc cleanup. 6d9edb4f618 Latest results. e0369eb925d Rename "CO2" to "CO2e". Updated cambium data to be CO2e and combined (combustion + pre-combustion). 6c9cd855816 Updated default fossil fuel emissions factors to include pre-combustion (e.g., methane leakage for natural gas). 58bccc6420c Merge pull request #992 from NREL/minitest_helper2 9b7ce3f3c02 Merge branch 'master' of https://github.com/NREL/OpenStudio-HPXML into minitest_helper2 6a451ecdb45 Revert to original config 713fb844c91 Merge pull request #993 from NREL/goodbye_beta bfa2365a843 Remove the "Beta" label from the BuildResidentialHPXML measure. [ci skip] f5729c759b8 Aha, this test file was not loading minitest_helper.rb. 77ca82d4152 Try excluding build_residential_hpxml_test.rb 854d4ccb090 First, reproduce issue on CI. d174ace1df1 Merge pull request #987 from NREL/hvac_iq_limits 9d40f9e25e6 Update Changelog.md [ci skip] 7a1644d70e9 Merge branch 'master' of https://github.com/NREL/OpenStudio-HPXML into hvac_iq_limits 06bd81964cc Merge pull request #986 from NREL/docs_on_ci afe5d64f823 Merge branch 'master' of https://github.com/NREL/OpenStudio-HPXML into docs_on_ci e5de16e128d Merge pull request #988 from NREL/minitest_helper 66bf8248d02 Possible fix to obtain code coverage on CI? e6ee2daa0cb Try reverting minitest_helper.rb change. 33b4d180cf9 Adds more stringent limits for `AirflowDefectRatio` and `ChargeDefectRatio` (-0.9 to 9, which now allows values from 1/10th to 10x the design value). f1c13fb6871 Re-enable CI steps 913f947f19a Fix error? 3c21bf713a8 Install pip 3afc9dcf5c0 Install sphinx f8e4ca4a4c8 Merge branch 'master' of https://github.com/NREL/OpenStudio-HPXML into docs_on_ci f0844a93edf Merge pull request #978 from NREL/leap-year-handling 040604b1106 Add newline 44224c6a06d Build docs on CI, store as artifacts. 852a36d084f Update workflow test file. bc8ccc2868a Update error test. 219cbdfee88 Update error message. 69319016c04 Update changelog and docs. f782deade4f Merge branch 'master' into leap-year-handling 39686d9de69 Merge pull request #980 from NREL/report-sim-output-setup-fix a80a1f25183 Return if model not retrieved. 6dd4463a51d Latest results. 335370e3c0b Update defaults header test. bbd8c3bcc1a Add test for throwing error. 24936969f9e Throw error if leap year with TMY. 14519b064f5 Clean up translator measure. c7a8ad6a2ef Change to non leap year in test xml file. 6a1ea0ad3ba Update README.md git-subtree-dir: resources/hpxml-measures git-subtree-split: 75de13f619e9b4b98f7f33889caf8de406f6ea07
Pull Request Description
(Proof refers to the assertions in the code that compares new samples to old samples)
Adds python version of run_sampling.rb
Simplifies sample calculation (for modest speed boost) in run_sampling.rb.
Checklist
Not all may apply:
openstudio tasks.rb update_measures
has been run