Reproducibility fix by exporting all LCT for a given polygon #5

Merged
yosukefk merged 3 commits into NCAR:reproducibility from yosukefk:export_all_lct
Sep 30, 2019

@yosukefk
Collaborator

The FINN preprocessor was not able to reproduce its results completely.

Issue

At global scale, emissions would be off by ~0.001%. About 0.3% of individual fires may have a discrepancy of >1% in their emission estimates across runs with identical input and code, and about 0.2% of individual fires may have a >50% discrepancy. The table below shows the number of polygons exceeding each discrepancy threshold, where discrepancy is defined as the ratio of the greater estimate to the smaller estimate from two runs.

Table 1. Count of polygons with large discrepancy between two identical annual, continental-scale runs.

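| Threshold ratio (greater/smaller) | >= 1 | > 1.00001 | > 1.01 | > 1.1 | > 1.2 | > 1.5 | > 2 | > 5 |
|---|---|---|---|---|---|---|---|---|
| Count | 801,984 | 2,775 | 2,702 | 1,899 | 1,888 | 1,385 | 794 | 482 |
| Frequency | 100% | 0.35% | 0.34% | 0.24% | 0.24% | 0.17% | 0.10% | 0.06% |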
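
The discrepancy metric used for Table 1 can be sketched as follows. This is only an illustration of the definition above (greater estimate divided by smaller estimate), not the script used to produce the table; the function names, the handling of zero estimates, and the sample values are all assumptions.

```python
def discrepancy_ratio(a, b):
    """Ratio of the greater estimate to the smaller one (always >= 1)."""
    hi, lo = max(a, b), min(a, b)
    if lo == 0:
        # Assumption: two zero estimates agree; zero versus nonzero is unbounded.
        return 1.0 if hi == 0 else float("inf")
    return hi / lo


def count_exceeding(run1, run2, thresholds):
    """Count polygons whose discrepancy ratio exceeds each threshold."""
    ratios = [discrepancy_ratio(a, b) for a, b in zip(run1, run2)]
    return {t: sum(r > t for r in ratios) for t in thresholds}


# Hypothetical per-polygon emission estimates from two runs.
run1 = [10.0, 5.0, 2.0, 8.0]
run2 = [10.0, 6.0, 4.5, 8.0]
counts = count_exceeding(run1, run2, [1.00001, 1.01, 1.5, 2])
print(counts)
```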

The cause of this discrepancy was found to be mostly, if not completely, the use of the majority LCT to represent a polygon, ignoring the contribution of other LCTs in the polygon. This is not only inaccurate but also causes an occasional reproducibility problem when two or more LCTs contribute an identical number of pixels to a given polygon. In such a case, the order in which PostgreSQL's sort returns the tied LCTs is not defined, so the LCT chosen as the majority may differ from one run to another.
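
The tie problem can be illustrated with a small sketch. This is not the preprocessor's SQL; it only mimics, in Python, what happens when a "pick the top row" step receives tied rows in a different order. The LCT codes, pixel counts, and the `majority_lct` helper are hypothetical.

```python
def majority_lct(rows):
    """Return the LCT with the largest pixel count.

    rows is a list of (lct, pixel_count). Python's max() keeps the first
    maximal row it sees, so with tied counts the answer depends on row order.
    """
    return max(rows, key=lambda row: row[1])[0]


# Same polygon, same pixel counts, but the tied rows arrive in a different order.
rows_run1 = [(10, 4), (12, 4), (7, 1)]
rows_run2 = [(12, 4), (10, 4), (7, 1)]

print(majority_lct(rows_run1), majority_lct(rows_run2))
```

Both inputs describe the same polygon, yet the selected LCT differs, which is the kind of run-to-run difference described above.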

Fix

To resolve this issue, all LCTs in a given polygon are exported, not only the majority LCT. In other words, multiple records are generated for a single subdivided fire polygon if multiple LCTs are found in the polygon. When emissions are estimated, the emission estimates from the different LCTs for the same polygon are averaged, weighted by the fractional land cover of each LCT.
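
A minimal sketch of the weighted averaging, assuming each exported record carries the pixel count of its LCT and an emission estimate computed as if the whole polygon had that LCT. The record layout, the `weighted_emission` name, and the numbers are assumptions for illustration, not the emission model's actual code.

```python
def weighted_emission(records):
    """Average per-LCT emission estimates, weighted by fractional land cover.

    records is a list of (pixel_count, emission_estimate) for one polygon.
    The fraction for each LCT is pixel_count / total pixel count.
    """
    total = sum(n for n, _ in records)
    if total == 0:
        raise ValueError("polygon has no land cover pixels")
    return sum(n * e for n, e in records) / total


# Hypothetical polygon with three LCTs covering 40%, 40%, and 20% of its pixels.
records = [(4, 100.0), (4, 200.0), (2, 50.0)]
print(weighted_emission(records))  # 130.0

# The two tied LCTs can arrive in either order without changing the result.
print(weighted_emission([(4, 200.0), (4, 100.0), (2, 50.0)]))  # 130.0
```

Because every LCT contributes in proportion to its share of pixels, no single "winner" has to be chosen, so a tie between LCTs no longer affects the estimate.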

With this fix, the global emission estimates matched to 7 digits (0.000001% error), and only 8 out of 800,000 polygons had any difference in their estimates (none had a >1% discrepancy).

The number of records to process increased by 56% for this example case (annual, continuous application). The code does not perform redundant calculations for the additional records; it simply exports more of them. The overhead therefore comes from increased I/O and is expected to be minor. The subsequent emission model has to process the larger number of records, and its processing time is expected to grow in proportion to the increase in records.

Summary

With this fix:

  • Results are reproducible
  • Results are more accurate
  • Processing time has a slight overhead from increased I/O

@yosukefk yosukefk merged commit ad9706d into NCAR:reproducibility Sep 30, 2019