Update GCS pull for USDA #132

Merged
bl-young merged 10 commits into main from by_usda
Jan 30, 2026

Conversation

@bl-young
Contributor

cc: @catherinebirney
Closes: #80

What changed? Why?

USDA_CoA_Cropland, USDA_CoA_Cropland_NAICS, and USDA_CoA_Livestock now load CSV files from GCS during FBA generation. To generate the raw CSV files during the API call instead, set load_from_gcs to False.

CSV files for USDA_CoA_Cropland and USDA_CoA_Cropland_NAICS were added to GCS for 2022: gs://cornerstone-default/ceda-usa/input/USDA_2022/*. USDA_CoA_Livestock is not used in the GHG method, so it has not yet been evaluated.

Some parsing functions were consolidated. USDA_CoA.py was removed from the linting exceptions.
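
For readers unfamiliar with the flag, the load_from_gcs switch can be pictured as a simple branch between a cached CSV and a fresh API pull. This is only a minimal sketch: `load_raw_rows`, `read_cached_csv`, and `call_api` are hypothetical names for illustration and are not taken from this repo, whose loader may be structured differently.

```python
import csv
import io
from typing import Callable, Dict, List


def load_raw_rows(
    load_from_gcs: bool,
    read_cached_csv: Callable[[], str],
    call_api: Callable[[], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Return raw rows from a cached CSV (e.g. stored in GCS) or from the API.

    Both callables are hypothetical stand-ins used for illustration only.
    """
    if load_from_gcs:
        # Parse the cached CSV text into one dict per row.
        return list(csv.DictReader(io.StringIO(read_cached_csv())))
    # Otherwise regenerate the raw data by querying the API.
    return call_api()
```

Passing the two sources in as callables keeps the sketch free of any real GCS or API client, so the branch itself is easy to test.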

Testing

Adds a new test, test_generate_fba_compare_to_remote(), to facilitate review of newly generated FBA files.
USDA_CoA_Cropland_NAICS is identical to the remote version, but USDA_CoA_Cropland shows very minor differences.
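
The kind of comparison such a test enables can be sketched as a keyed diff between the remote and the newly generated records. This is an illustrative sketch only, not the implementation of test_generate_fba_compare_to_remote(): the function name `diff_records` and the `value` column are assumptions, and plain Python dicts are used instead of pandas to keep the example dependency-free.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def diff_records(
    old: Iterable[Record],
    new: Iterable[Record],
    key_cols: List[str],
    value_col: str,
    tol: float = 0.0,
) -> Dict[str, list]:
    """Compare two tables keyed on key_cols and report the rows that differ.

    Assumes each key occurs at most once per table; a repeated key would
    silently keep only its last value.
    """
    old_by_key = {tuple(r[c] for c in key_cols): float(r[value_col]) for r in old}
    new_by_key = {tuple(r[c] for c in key_cols): float(r[value_col]) for r in new}

    # Keys present in both tables whose values differ by more than tol.
    changed = [
        (k, old_by_key[k], new_by_key[k])
        for k in sorted(old_by_key.keys() & new_by_key.keys())
        if abs(old_by_key[k] - new_by_key[k]) > tol
    ]
    return {
        "changed": changed,
        "only_in_old": sorted(old_by_key.keys() - new_by_key.keys()),
        "only_in_new": sorted(new_by_key.keys() - old_by_key.keys()),
    }
```

Separating changed values from rows that exist on only one side makes it easier to tell a revised number from an added or renamed record.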

@bl-young bl-young linked an issue Jan 28, 2026 that may be closed by this pull request
@bl-young bl-young requested a review from MoLi7 January 28, 2026 02:48
'county_ansi',
'end_code',
'group_desc',
]
Member

are we not dropping these columns anymore? or they're handled differently now?

Contributor Author

see here

@MoLi7
Member

MoLi7 commented Jan 28, 2026

@bl-young do we know why there are very minor differences in USDA_CoA_Cropland? and should we be concerned about it?

Comment thread bedrock/extract/usda/USDA_CoA.py Outdated
df['DataCollection'] = 2

# Keep only necessary columns
df = df[flow_by_activity_fields]
Contributor Author

@MoLi7 here at the end we filter the whole df by just the necessary fields instead of removing columns by name throughout
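
The difference between the two approaches can be shown with a toy example. This is a sketch under assumptions: `keep_fields` is a hypothetical helper, the rows are plain dicts rather than a pandas DataFrame, and the field names in the usage note are made up apart from the three columns quoted in the review thread.

```python
from typing import Dict, List


def keep_fields(
    rows: List[Dict[str, object]], fields: List[str]
) -> List[Dict[str, object]]:
    """Keep only the listed fields, in the listed order.

    Any column not in `fields` is discarded, so an unexpected new column
    from the source never needs to be dropped by name. A missing field
    raises KeyError, which surfaces schema problems early.
    """
    return [{f: row[f] for f in fields} for row in rows]
```

For example, `keep_fields([{"Year": 2022, "county_ansi": "001", "end_code": "00", "group_desc": "HORTICULTURE"}], ["Year"])` returns `[{"Year": 2022}]` without naming any of the discarded columns.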

@bl-young
Contributor Author

bl-young commented Jan 28, 2026

> @bl-young do we know why there are very minor differences in USDA_CoA_Cropland? and should we be concerned about it?

I am attaching the diff file df_m_usda_coa.csv

Note the full dataset has ~198,000 data points. I'm not overly concerned about it. I don't know if it will have any impact on the GHG FBS. @catherinebirney is a bit more familiar with the nuances of the dataset and might have some thoughts.

@bl-young
Contributor Author

> Note the full dataset has ~198,000 data points. I'm not overly concerned about it. I don't know if it will have any impact on the GHG FBS. @catherinebirney is a bit more familiar with the nuances of the dataset and might have some thoughts.

Ah, yes it does lead to a very small adjustment in the allocation of N2O in the cropland activity set (from Table 5-18)

@MoLi7
Member

MoLi7 commented Jan 28, 2026

> > @bl-young do we know why there are very minor differences in USDA_CoA_Cropland? and should we be concerned about it?
>
> I am attaching the diff file df_m_usda_coa.csv
>
> Note the full dataset has ~198,000 data points. I'm not overly concerned about it. I don't know if it will have any impact on the GHG FBS. @catherinebirney is a bit more familiar with the nuances of the dataset and might have some thoughts.

Thanks Ben, for sharing the diff. I trust your judgement, but for my own understanding, @catherinebirney I'm curious for your thoughts.

@catherinebirney
Contributor

These code changes would not impact the results of the FBA generation, so the numbers were likely updated in NASS Quickstats. Unfortunately, USDA does not publish release dates or announce when they make changes to the available data. Since the changes in results are only for woody crops and sugarcane, it seems likely that the data was updated.

The new data for corn and small grains are at the national level and replace 0 values, so those aren't updated/recalculated numbers. It looks like they added missing aggregated numbers.

So, agree with @bl-young - nothing to be concerned about

@bl-young
Contributor Author

I noticed that in some cases the new data has more distinct records but the same sum total, e.g.:
image

6 new sugarcane records that add up to the original value. Any idea what could be going on there, @catherinebirney?
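
A pattern like this (more records, same total) can be detected mechanically. The sketch below is illustrative only: `summarize` and `more_records_same_total` are hypothetical helpers, the `value` column name is an assumption, and plain dicts stand in for the actual FBA DataFrame.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def summarize(
    rows: Iterable[Row], key_col: str, value_col: str
) -> Dict[object, Tuple[int, float]]:
    """Return {key: (record_count, total_value)} for each distinct key."""
    counts: Dict[object, int] = defaultdict(int)
    totals: Dict[object, float] = defaultdict(float)
    for row in rows:
        counts[row[key_col]] += 1
        totals[row[key_col]] += float(row[value_col])
    return {k: (counts[k], totals[k]) for k in counts}


def more_records_same_total(
    old: Iterable[Row],
    new: Iterable[Row],
    key_col: str,
    value_col: str,
    tol: float = 1e-9,
) -> List[object]:
    """List keys whose record count grew while the summed value stayed equal."""
    old_s = summarize(old, key_col, value_col)
    new_s = summarize(new, key_col, value_col)
    return sorted(
        k
        for k in old_s.keys() & new_s.keys()
        if new_s[k][0] > old_s[k][0] and abs(new_s[k][1] - old_s[k][1]) <= tol
    )
```

Keys flagged this way are candidates for an extra breakdown dimension being pulled in alongside the intended records, rather than for a revised value.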

@catherinebirney
Contributor

There is an error in our code pulling data from QuickStats. Looks like we are pulling in typology data in addition to the acreage data we want.

I'll modify our FBA code.

image

@bl-young
Contributor Author

Great - that fixed the sugarcane issue. The latest dataset also seems to have data at the national level (only) for CORN, which is the sum of the existing CORN, GRAIN and CORN, SILAGE. Is that worth dropping? Similarly, a new SMALL GRAINS, WHEAT & BARLEY & OATS & RYE is added. And they seem to have changed SHORT TERM WOODY CROPS to SHORT TERM WOODY TREES. The former shows up in the old dataset, but the new name does not show up in the new dataset.

# horticulture: only want a few commodities
df_h = df[df['group_desc'] == 'HORTICULTURE']
df_h = df_h[
    df_h['commodity_desc'].isin(['CUT CHRISTMAS TREES', 'SHORT TERM WOODY CROPS'])
]
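
Because an isin() filter like the one above silently returns nothing for a label the source has renamed, one way to surface such changes is to check that every expected label still occurs in the raw data. This is a sketch, not code from this repo: `missing_commodities` is a hypothetical helper and the rows are plain dicts rather than a DataFrame.

```python
from typing import Dict, Iterable, List, Set


def missing_commodities(
    rows: Iterable[Dict[str, str]], expected: Set[str]
) -> List[str]:
    """Return the expected commodity_desc labels that never occur in rows."""
    seen = {row["commodity_desc"] for row in rows}
    return sorted(expected - seen)
```

Run against the raw pull before filtering, a non-empty result would flag that a commodity the filter relies on has disappeared or been relabeled upstream.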

@bl-young
Contributor Author

ooo and I found this: https://www.nass.usda.gov/Corrections/?log=QS

Contributor

@catherinebirney catherinebirney left a comment

Approving this PR as is - will address the changes to the USDA COA Cropland FBA separately, as those data differences are due to changes within the published source data, rather than caused by any code within this repo.

Documenting the issue in #136

@bl-young bl-young merged commit 53408a8 into main Jan 30, 2026
4 checks passed
@bl-young bl-young deleted the by_usda branch January 30, 2026 22:31

Development

Successfully merging this pull request may close these issues.

evaluate FBA generation for USDA_CoA data

3 participants