Reproducibility fix by exporting all LCT for a given polygon #5
Merged
yosukefk merged 3 commits into NCAR:reproducibility from yosukefk:export_all_lct on Sep 30, 2019
The FINN preprocessor was not able to reproduce its results exactly from one run to the next.
Issue
At global scale, emissions would be off by ~0.001%. About 0.3% of individual fires may have a discrepancy in emission estimate of >1% across runs with identical input and code, and ~0.2% of individual fires may have a >50% discrepancy. The table below shows the count of polygons exceeding each discrepancy threshold (discrepancy is defined as the ratio of the greater estimate to the smaller estimate from two runs).
Table 1. Count of polygons with large discrepancy between two identical annual, continental-scale runs.
The cause of this discrepancy was found to be mostly, if not completely, the use of the majority LCT to represent a polygon, ignoring the contribution of other LCTs in the polygon. This is not only inaccurate but also causes an occasional reproducibility problem when two or more LCTs have an identical number of pixels contributing to a given polygon. In such a case, the order in which PostgreSQL's sort returns the tied LCTs is undefined, so the LCT chosen as the majority may differ from one run to another.
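The tie problem can be illustrated with a small Python sketch. This is not the preprocessor's PostgreSQL query; it only mimics the idea that, when two LCTs have the same pixel count, the "majority" depends on the order in which rows happen to arrive. The function names and the LCT codes used here are hypothetical.

```python
from collections import Counter


def majority_lct(pixel_lcts):
    """Pick the LCT with the most pixels.

    On a tie, the winner is whichever tied LCT was encountered first,
    so the result depends on row order (analogous to an unordered sort).
    """
    counts = Counter(pixel_lcts)
    return max(counts, key=counts.get)


def tied_lcts(pixel_lcts):
    """Return every LCT sharing the top pixel count, in sorted order."""
    counts = Counter(pixel_lcts)
    top = max(counts.values())
    return sorted(lct for lct, n in counts.items() if n == top)


# The same five pixels of one polygon, delivered in two different orders.
run_a = [10, 10, 4, 4, 9]
run_b = [4, 10, 9, 4, 10]

print(majority_lct(run_a), majority_lct(run_b))  # the two runs disagree
print(tied_lcts(run_a), tied_lcts(run_b))        # the set of tied LCTs is stable
```

The majority pick changes with row order, while the full set of LCTs and their counts does not, which is why exporting all LCTs removes the ambiguity.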
Fix
To resolve this issue, all the LCTs from a given polygon are exported, not only the majority LCT. In other words, multiple records are generated for a single subdivided fire polygon if multiple LCTs are found in the polygon. When emissions are estimated, the estimates from different LCTs for the same polygon are averaged, weighted by the fractional land cover of each LCT.
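A minimal sketch of the weighted average described above, assuming each exported record carries a pixel count and an emission estimate computed as if the whole polygon were that LCT. The record layout and the function name `polygon_emission` are assumptions for illustration, not the emission model's actual interface.

```python
def polygon_emission(records):
    """Combine per-LCT emission estimates for one polygon.

    records: list of (pixel_count, emission_estimate) tuples, one per LCT.
    Each estimate is weighted by that LCT's fraction of the polygon's pixels.
    """
    total_pixels = sum(n for n, _ in records)
    if total_pixels == 0:
        raise ValueError("polygon has no land cover pixels")
    return sum(n / total_pixels * emission for n, emission in records)


# Hypothetical polygon: 3 pixels of one LCT, 1 pixel of another.
records = [(3, 100.0), (1, 20.0)]
print(polygon_emission(records))        # 0.75 * 100 + 0.25 * 20
print(polygon_emission(records[::-1]))  # same records in reverse order
```

Because every LCT contributes in proportion to its area, tied pixel counts no longer force an arbitrary choice between LCTs.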
With this fix, the global emission estimates matched to 7 digits (0.000001% error), and only 8 out of 800,000 polygons showed any difference in estimate (none had >1% discrepancy).
The number of records to process increased by 56% for this example case (annual, continuous application). The code does not perform redundant calculations for the additional records; it simply exports more records. The overhead therefore comes from increased I/O and is expected to be minor. The subsequent emission model needs to process the increased number of records, and its cost is expected to grow in proportion to the increase in records.
Summary
With this fix: