[v0/v1 migration] /bulk/info/variable-group#6199
[v0/v1 migration] /bulk/info/variable-group#6199nick-nlb merged 17 commits intodatacommonsorg:masterfrom
Conversation
…s) for the NL paths that use v1/bulk/info/variable-group
Introduces a dedicated execution flow for the V2 API. To mitigate potential latency, this new process is designed with an initial, capped tree traversal (up to MAX_V2_SVS) that collects all descendant variables first. This allows the definitions to be fetched in a single batch API call rather than the level-by-level network calls used in V1.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request migrates the /bulk/info/variable-group endpoint to V2, ensuring compatibility with the existing natural language search infrastructure. Since the V2 endpoint does not provide variable definitions by default, the change introduces a robust mechanism to fetch these definitions independently. This ensures that critical features like sibling variable discovery remain functional during the transition. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces V2 API support for variable extension and StatVarGroup expansion, featuring a new batched fetching mechanism and a get_variable_definitions service function. Feedback focuses on performance optimizations, such as using sets to avoid redundant API calls and defining static mappings as module-level constants. Additionally, the reviewer pointed out unreachable code in the legacy path and several indentation inconsistencies that should be corrected for better code quality.
n-h-diaz
left a comment
There was a problem hiding this comment.
Thanks!
I had a few comments on some alternative api fetches to consider, but not blocking for this PR
| all_sibling_child_svs = set() | ||
| for item in svg_siblings_info.get('data', []): | ||
| for c in item.get('info', {}).get('childStatVars', []): | ||
| if 'id' in c: |
There was a problem hiding this comment.
If we're only using child stat vars from the variable group info response in the new path, have you tried using node with "memberOf", for example https://api.datacommons.org/v2/node?key=&nodes=dc/g/Demographics&property=%3C-memberOf
(not sure if this would be any faster now, but could help when moving to spanner)
There was a problem hiding this comment.
I had a go at this but had trouble reproducing the same results (not a question of latency). I'll go with the current version for now but revisit after!
Issue
[b/491885197] (https://b.corp.google.com/issues/491885197)
Description
This PR implements the migration of
v1/bulk/info/variable-grouptov2.The change is gated behind the
use_v2_apiflag.Notes
The core of the migration is very simple: a flag-mediated gate that determines which endpoint is called.
The complication comes (that makes up most of the diff) comes from the fact that the
v2endpoint is no longer able to provide adefinitionalong with each stat var. This definition is used (relatively rarely) in the natural language search, in order to find sibling stat vars to provide further exploration topics for the user.Because this is no longer available in the
v2endpoint, the functionality had to be reconstructed via direct v2 calls.A discussion of the methodology, testing, fidelity and latency considerations of this can be found at this link. (Message if access is required).
This document describes how the
definitionfunctionality is used and analyses the latency and fidelity implications of moving that functionality to Flask andv2.Testing
There are two aspects of the NL search that are affected by the "definitions". These are described in the document as Flow 1 and Flow2.
Flow 1 is rarely invoked, but can be seen in the following query (which should produce the same results for "Related" charts populated at the bottom of the results section).
Flow 2 is much more common, and is the primary driver of latency discrepancies between
v1andv2.This flow is invoked on a standard query such as:
Query
Goldens
This PR also includes explicit directives to the integration tests to use v1. The goldens would have to be regenerated for v2 at some point before the feature flag is dropped.