Added seed database ingestion helper #2010
Code Review
This pull request introduces a new seed_database action to the ingestion helper, which populates the Spanner database with essential base nodes for the Data Commons schema. The implementation includes updates to the README, a new handler in main.py, and the core logic in spanner_client.py, supported by new unit tests. A review comment suggests refactoring the seed_database method to improve maintainability by deriving the subject list from the keys of the candidates dictionary, thereby avoiding redundancy.
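A rough sketch of that refactor, for illustration only. The dictionary contents, the field names, and the helper nodes_to_seed are assumptions and do not reflect the actual code in spanner_client.py; the point is only that the subject list comes from the dictionary keys instead of a second hand-maintained list.

```python
# Hypothetical base-node definitions keyed by subject id.
# The real attributes live in spanner_client.py and may differ.
CANDIDATES = {
    "StatisticalVariable": {"name": "StatisticalVariable", "types": ["Class"]},
    "StatVarGroup": {"name": "StatVarGroup", "types": ["Class"]},
    "Topic": {"name": "Topic", "types": ["Class"]},
}


def nodes_to_seed(candidates, existing_ids):
    """Return rows for candidates that are not already in the database.

    The subject list is derived from the dictionary keys, so adding a new
    base node only requires one new entry in `candidates`.
    """
    subject_ids = list(candidates)  # single source of truth
    return [
        {"subject_id": sid, **candidates[sid]}
        for sid in subject_ids
        if sid not in existing_ids
    ]


if __name__ == "__main__":
    # Pretend "Topic" is already present; only the other two are returned.
    for row in nodes_to_seed(CANDIDATES, {"Topic"}):
        print(row["subject_id"])
```

Because dictionaries preserve insertion order in Python 3.7+, the derived list keeps the order in which the candidates are declared.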
gmechali
left a comment
thanks Dan, LGTM!
TBH I'm not sure if we're missing some of the required nodes. I have a doubt about whether we need dc/g/Root - very likely, but not sure. We can add it later if confirmed.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Thanks Gabe! And that sounds good. We can modify this list as we go.
This pull request introduces a new seed_database action to the ingestion helper service, which seeds the Spanner database with essential base nodes required by the Data Commons schema. The implementation includes the new action handler, the underlying logic in the Spanner client, and tests covering both.

New Feature: Database Seeding
Added the seed_database action to the ingestion helper API, which seeds the Spanner database with base nodes such as StatisticalVariable, StatVarGroup, StatVarObservation, Topic, and dc/g/Root. This action ensures that the database contains the minimum required schema nodes for Data Commons operations. [1] [2] [3]

Testing
Added tests in main_test.py to verify that the seed_database action is handled correctly and that the Spanner client's seed_database method is called as expected.
Added tests in spanner_client_test.py to ensure that the seed_database method inserts base nodes when missing and does not insert duplicates if the nodes already exist.

Documentation
Updated README.md to document the new seed_database action, including its purpose and usage.
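The behavior the tests describe (insert when missing, no duplicates on a second run) can be sketched with an in-memory stand-in. This is not the Spanner client from this PR: InMemoryNodeStore, its methods, and this seed_database function are hypothetical names used only to illustrate the idempotent check-then-insert pattern, and the id list is a subset of the nodes named above.

```python
# Subset of the base node ids named in the description, for illustration.
BASE_NODE_IDS = [
    "StatisticalVariable",
    "StatVarGroup",
    "StatVarObservation",
    "Topic",
]


class InMemoryNodeStore:
    """Hypothetical stand-in for the Spanner client (illustration only)."""

    def __init__(self, initial_ids=()):
        self.ids = set(initial_ids)
        self.insert_calls = []  # records each batch that was inserted

    def existing(self, ids):
        """Return the subset of `ids` already present in the store."""
        return self.ids.intersection(ids)

    def insert(self, ids):
        """Insert a batch of node ids."""
        self.insert_calls.append(list(ids))
        self.ids.update(ids)


def seed_database(store, base_ids=BASE_NODE_IDS):
    """Insert only the base nodes that are missing; return what was inserted."""
    present = store.existing(base_ids)
    missing = [node_id for node_id in base_ids if node_id not in present]
    if missing:  # skip the write entirely when nothing is missing
        store.insert(missing)
    return missing


if __name__ == "__main__":
    store = InMemoryNodeStore(["Topic"])
    print(seed_database(store))  # first run inserts the three missing nodes
    print(seed_database(store))  # second run finds nothing to insert
```

Running the seeding step twice against the same store leaves it unchanged after the first call, which is the property the duplicate-insertion test checks.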