[v0/v1 migration] /bulk/info/variable-group #6199

Merged
nick-nlb merged 17 commits into datacommonsorg:master from nick-nlb:v2_migration_bulk_info_variable_group_nl
Apr 14, 2026

@nick-nlb
Contributor

Issue

[b/491885197](https://b.corp.google.com/issues/491885197)

Description

This PR implements the migration of v1/bulk/info/variable-group to v2.

The change is gated behind the use_v2_api flag.

Notes

The core of the migration is very simple: a flag-mediated gate that determines which endpoint is called.
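
As a rough illustration of such a gate, the sketch below picks between two fetchers based on a boolean flag. Only the `use_v2_api` flag name comes from this description; `fetch_variable_group_info`, `Fetcher`, and the stub fetchers are hypothetical names used for illustration and are not taken from the diff.

```python
from typing import Callable, Dict, List

# A fetcher takes a list of DCIDs and returns a response dict (assumed shape).
Fetcher = Callable[[List[str]], Dict]


def fetch_variable_group_info(dcids: List[str], use_v2_api: bool,
                              fetch_v1: Fetcher, fetch_v2: Fetcher) -> Dict:
  """Routes the request to the v1 or v2 fetcher depending on the flag."""
  fetch = fetch_v2 if use_v2_api else fetch_v1
  return fetch(dcids)


# Stub fetchers standing in for the real endpoint calls (assumptions).
def stub_v1(dcids: List[str]) -> Dict:
  return {'source': 'v1', 'nodes': dcids}


def stub_v2(dcids: List[str]) -> Dict:
  return {'source': 'v2', 'nodes': dcids}
```

Passing the fetchers in as arguments keeps the gate itself trivial to unit test, since neither endpoint needs to be reachable.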

The complication, which accounts for most of the diff, is that the v2 endpoint can no longer provide a definition along with each stat var. This definition is used (relatively rarely) in the natural language search to find sibling stat vars, which provide further exploration topics for the user.

Because this is no longer available in the v2 endpoint, the functionality had to be reconstructed via direct v2 calls.

A discussion of the methodology, testing, fidelity, and latency considerations can be found at this link (message me if access is required).

This document describes how the definition functionality is used and analyses the latency and fidelity implications of moving that functionality to Flask and v2.

Testing

There are two aspects of the NL search that are affected by the "definitions". These are described in the document as Flow 1 and Flow 2.

Flow 1 is rarely invoked, but can be seen in the following query (which should produce the same results for "Related" charts populated at the bottom of the results section).

Flow 2 is much more common, and is the primary driver of latency discrepancies between v1 and v2.

This flow is invoked on a standard query such as:

Query

Goldens

This PR also includes explicit directives telling the integration tests to use v1. The goldens will need to be regenerated for v2 at some point before the feature flag is dropped.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request migrates the /bulk/info/variable-group endpoint to V2, ensuring compatibility with the existing natural language search infrastructure. Since the V2 endpoint does not provide variable definitions by default, the change introduces a robust mechanism to fetch these definitions independently. This ensures that critical features like sibling variable discovery remain functional during the transition.

Highlights

  • V2 API Migration: Implemented migration of /bulk/info/variable-group to V2, gated behind the use_v2_api feature flag.
  • Definition Reconstruction: Reconstructed missing variable definition functionality in V2 by adding a new helper to fetch definitions directly via V2 calls.
  • Indirect Sibling Expansion: Enhanced variable extension logic to support indirect sibling discovery when direct definitions are unavailable in V2.
  • Testing: Added comprehensive unit tests for variable extension logic and updated integration tests to explicitly use V1 for existing goldens.
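
A minimal sketch of what collecting sibling stat vars from a variable-group response could look like is given below. The response shape (`data` → `info` → `childStatVars` → `id`) is inferred from the snippet quoted later in the review; the function name `collect_sibling_stat_vars`, the `seed_sv` parameter, and the example payload are hypothetical and not taken from the PR.

```python
from typing import Dict, Set


def collect_sibling_stat_vars(svg_siblings_info: Dict, seed_sv: str) -> Set[str]:
  """Gathers child stat var IDs from sibling groups, excluding the seed."""
  siblings: Set[str] = set()
  for item in svg_siblings_info.get('data', []):
    for child in item.get('info', {}).get('childStatVars', []):
      sv_id = child.get('id')
      # Skip entries without an ID and the stat var we started from.
      if sv_id and sv_id != seed_sv:
        siblings.add(sv_id)
  return siblings


# Hypothetical payload using the assumed response shape.
example_info = {
    'data': [
        {'info': {'childStatVars': [{'id': 'Count_Person_Female'},
                                    {'id': 'Count_Person_Male'}]}},
        {'info': {'childStatVars': [{'id': 'Count_Person_Male'},
                                    {'displayName': 'missing id'}]}},
        {'info': {}},
    ]
}
```

Using a set here means a stat var that appears under several sibling groups is only recorded once.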

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces V2 API support for variable extension and StatVarGroup expansion, featuring a new batched fetching mechanism and a get_variable_definitions service function. Feedback focuses on performance optimizations, such as using sets to avoid redundant API calls and defining static mappings as module-level constants. Additionally, the reviewer pointed out unreachable code in the legacy path and several indentation inconsistencies that should be corrected for better code quality.
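
To illustrate the two performance points raised here (deduplicating with a set and fetching in batches), a small sketch follows. It is not the PR's implementation: `fetch_in_batches`, `fetch_batch`, and the default `batch_size` are hypothetical, and the fetcher is assumed to return a dict keyed by DCID.

```python
from typing import Callable, Dict, Iterable, List


def fetch_in_batches(dcids: Iterable[str],
                     fetch_batch: Callable[[List[str]], Dict[str, str]],
                     batch_size: int = 2) -> Dict[str, str]:
  """Deduplicates DCIDs, then fetches them in fixed-size batches."""
  # A set removes repeats so each DCID is requested at most once;
  # sorting keeps the batch contents deterministic.
  unique = sorted(set(dcids))
  results: Dict[str, str] = {}
  for start in range(0, len(unique), batch_size):
    results.update(fetch_batch(unique[start:start + batch_size]))
  return results
```

With five input IDs of which three are distinct and a batch size of two, the fetcher is called twice rather than five times.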

Comment thread server/lib/nl/common/variable.py Outdated
Comment thread server/lib/nl/common/variable.py Outdated
Comment thread server/lib/nl/common/variable_group.py Outdated
Comment thread server/lib/nl/common/variable_group.py Outdated
Comment thread server/lib/nl/common/variable_group.py
Comment thread server/services/datacommons.py Outdated
@nick-nlb nick-nlb marked this pull request as ready for review April 13, 2026 00:35
@nick-nlb nick-nlb requested a review from n-h-diaz April 13, 2026 00:36
Contributor

@n-h-diaz n-h-diaz left a comment

Thanks!

I had a few comments on some alternative API fetches to consider, but they're not blocking for this PR.

all_sibling_child_svs = set()
for item in svg_siblings_info.get('data', []):
  for c in item.get('info', {}).get('childStatVars', []):
    if 'id' in c:
Contributor

If we're only using child stat vars from the variable group info response in the new path, have you tried using node with "memberOf", for example https://api.datacommons.org/v2/node?key=&nodes=dc/g/Demographics&property=%3C-memberOf
(not sure if this would be any faster now, but could help when moving to spanner)
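
For reference, the request URL suggested above can be assembled programmatically. The sketch below only builds the URL and makes no network call; the helper name `build_member_of_url` is hypothetical, and the parameter names are taken directly from the example URL rather than from any client library.

```python
from urllib.parse import urlencode

V2_NODE_URL = 'https://api.datacommons.org/v2/node'


def build_member_of_url(group_dcid: str, api_key: str = '') -> str:
  """Builds a v2 node URL asking for nodes that are memberOf the group."""
  params = {
      'key': api_key,
      'nodes': group_dcid,
      # '<-' requests the incoming direction of the memberOf property.
      'property': '<-memberOf',
  }
  # safe='/' leaves the slashes in the group DCID unescaped.
  return f"{V2_NODE_URL}?{urlencode(params, safe='/')}"
```

Calling `build_member_of_url('dc/g/Demographics')` reproduces the URL quoted in the comment, with `<` percent-encoded as `%3C`.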

Contributor Author

I had a go at this but had trouble reproducing the same results (not a question of latency). I'll go with the current version for now but revisit after!

Comment thread server/services/datacommons.py Outdated
@nick-nlb nick-nlb merged commit d0657d2 into datacommonsorg:master Apr 14, 2026
13 checks passed