    'county_ansi',
    'end_code',
    'group_desc',
]
Are we no longer dropping these columns, or are they handled differently now?
@bl-young do we know why there are very minor differences in USDA_CoA_Cropland? And should we be concerned about it?
df['DataCollection'] = 2

# Keep only necessary columns
df = df[flow_by_activity_fields]
@MoLi7, here at the end we filter the whole df down to just the necessary fields instead of removing columns by name throughout.
I am attaching the diff file df_m_usda_coa.csv. Note the full dataset has ~198,000 data points. I'm not overly concerned about it. I don't know if it will have any impact on the GHG FBS. @catherinebirney is a bit more familiar with the nuances of the dataset and might have some thoughts.
Ah, yes, it does lead to a very small adjustment in the allocation of N2O in the cropland activity set (from Table 5-18).
Thanks, Ben, for sharing the diff. I trust your judgement, but for my own understanding, @catherinebirney, I'm curious to hear your thoughts.
These code changes would not impact the results of the FBA generation, so the numbers were likely updated in NASS Quickstats. Unfortunately, USDA does not publish release dates or announce when they make changes to the available data. Since the changes in results are only for woody crops and sugarcane, it seems likely that the data was updated. The new data for corn and small grains are at the national level and replace 0 values, so those aren't updated or recalculated numbers. It looks like they added missing aggregated numbers. So I agree with @bl-young: nothing to be concerned about.
I noticed that in some of the differences the new data has more distinct records but the same sum total, e.g., 6 new sugarcane records that add up to the original value. Any idea what could be going on there, @catherinebirney?
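One way to check for this pattern (more records, same total) is to group both versions by their identifying fields and compare record counts and sums. The sketch below is only illustrative: the field names (Crop, Location, FlowAmount) and the sample values are assumptions made up for the example, not taken from the attached diff or from the repo.

```python
import math
from collections import defaultdict


def summarize(records, key_fields, value_field):
    """Return {key: (record_count, total)} for each combination of key_fields."""
    summary = defaultdict(lambda: [0, 0.0])
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        summary[key][0] += 1
        summary[key][1] += rec[value_field]
    return {k: (n, total) for k, (n, total) in summary.items()}


def same_total_more_records(old, new, key_fields, value_field="FlowAmount"):
    """Keys where the new data has more records than the old but an equal sum."""
    old_s = summarize(old, key_fields, value_field)
    new_s = summarize(new, key_fields, value_field)
    keys = []
    for key, (new_n, new_total) in new_s.items():
        if key in old_s:
            old_n, old_total = old_s[key]
            if new_n > old_n and math.isclose(old_total, new_total):
                keys.append(key)
    return sorted(keys)


# Made-up sample: one old record replaced by six records with the same total.
old_records = [{"Crop": "SUGARCANE", "Location": "00000", "FlowAmount": 600.0}]
new_records = [
    {"Crop": "SUGARCANE", "Location": "00000", "FlowAmount": 100.0}
    for _ in range(6)
]

print(same_total_more_records(old_records, new_records, ["Crop", "Location"]))
```

Keys flagged this way would point to records that were split or duplicated rather than revised.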
Great - that fixed the sugarcane issue. The latest dataset also seems to have data at the national level (only) for the entries referenced at bedrock/bedrock/extract/usda/USDA_CoA.py, lines 260 to 264 in 0a58648.
ooo and I found this: https://www.nass.usda.gov/Corrections/?log=QS |
catherinebirney
left a comment
Approving this PR as is - will address the changes to the USDA COA Cropland FBA separately, as those data differences are due to changes within the published source data, rather than caused by any code within this repo.
Documenting the issue in #136


cc: @catherinebirney
Closes: #80
What changed? Why?
USDA_CoA_Cropland, USDA_CoA_Cropland_NAICS, and USDA_CoA_Livestock load CSV files from GCS during FBA generation. To generate raw CSV files during the API call instead, set load_from_gcs to False.
CSV files for USDA_CoA_Cropland and USDA_CoA_Cropland_NAICS were added to GCS for 2022: gs://cornerstone-default/ceda-usa/input/USDA_2022/*
USDA_CoA_Livestock is not used in the GHG method, so it has not yet been evaluated.
Some consolidation of the parsing functions was performed. USDA_CoA.py was removed from the exceptions for linting.
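For readers unfamiliar with this kind of toggle, here is a minimal sketch of a loader that switches between stored CSV data and a live API call. It is not the code in USDA_CoA.py: the function load_raw_rows, both callables, and the sample column values are hypothetical stand-ins, and only the flag name load_from_gcs comes from this PR.

```python
import csv
import io


def load_raw_rows(load_from_gcs, read_stored_csv, call_api):
    """Return raw rows as a list of dicts.

    load_from_gcs   -- True: parse previously stored CSV text; False: call the API.
    read_stored_csv -- hypothetical callable returning CSV text (e.g. from a bucket).
    call_api        -- hypothetical callable returning a list of dicts from the API.
    """
    if load_from_gcs:
        # Note: csv.DictReader yields every value as a string.
        return list(csv.DictReader(io.StringIO(read_stored_csv())))
    return call_api()


# Stand-in callables so the sketch runs without network or bucket access.
def fake_stored_csv():
    return "group_desc,Value\nFIELD CROPS,100\n"


def fake_api():
    return [{"group_desc": "FIELD CROPS", "Value": "100"}]


rows_from_store = load_raw_rows(True, fake_stored_csv, fake_api)
rows_from_api = load_raw_rows(False, fake_stored_csv, fake_api)
```

Passing the two data sources in as callables keeps the switch easy to exercise in tests without touching GCS or the API.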
Testing
Adds new test test_generate_fba_compare_to_remote() to facilitate review of newly generated FBA files.
USDA_CoA_Cropland_NAICS is identical, but there are very minor differences in USDA_CoA_Cropland.
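As a rough illustration of the kind of comparison such a review relies on, the sketch below diffs a locally generated table against a remote copy. It is not the implementation of test_generate_fba_compare_to_remote(): the function diff_rows, the field names, and the sample rows are all assumptions made up for the example, and it assumes the key fields uniquely identify a row.

```python
import math


def diff_rows(local, remote, key_fields, value_field="FlowAmount", rel_tol=1e-9):
    """Compare two lists of row dicts; assumes key_fields uniquely identify a row."""
    def index(rows):
        return {tuple(r[f] for f in key_fields): r[value_field] for r in rows}

    local_idx, remote_idx = index(local), index(remote)
    shared = local_idx.keys() & remote_idx.keys()
    return {
        "only_local": sorted(local_idx.keys() - remote_idx.keys()),
        "only_remote": sorted(remote_idx.keys() - local_idx.keys()),
        "changed": sorted(
            k for k in shared
            if not math.isclose(local_idx[k], remote_idx[k], rel_tol=rel_tol)
        ),
    }


# Made-up sample rows: one value changes from 0, one row is new, one is identical.
remote_rows = [
    {"ActivityProducedBy": "CORN", "Location": "00000", "FlowAmount": 0.0},
    {"ActivityProducedBy": "SUGARCANE", "Location": "12000", "FlowAmount": 10.0},
]
local_rows = [
    {"ActivityProducedBy": "CORN", "Location": "00000", "FlowAmount": 5.0},
    {"ActivityProducedBy": "SUGARCANE", "Location": "12000", "FlowAmount": 10.0},
    {"ActivityProducedBy": "SUGARCANE", "Location": "22000", "FlowAmount": 3.0},
]

report = diff_rows(local_rows, remote_rows, ["ActivityProducedBy", "Location"])
```

Splitting the report into added, removed, and changed keys makes it easier to tell upstream data revisions apart from regressions in the parsing code.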