Remove v2 post bulk/info/variable-group endpoint definition construction#6258
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request streamlines the variable group information retrieval process by removing the client-side construction of variable definitions for the v2/bulk/info/variable-group endpoint. With the backend now providing these definitions directly, the frontend logic has been simplified to use a unified path for both v1 and v2, reducing complexity and removing redundant API calls. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request removes V2-specific variable extension logic and associated feature flags, simplifying the extend_svs and open_svgs functions. The review feedback identifies a critical performance regression where batched network calls for indirect siblings were replaced by sequential queries inside a loop, creating an N+1 query problem. Furthermore, several instances of unsafe dictionary access were found that could lead to KeyError exceptions, and the reviewer recommends restoring and updating the deleted test suite to ensure the refactored logic remains correct.
I am having trouble creating individual review comments. Click here to see my feedback.
server/lib/nl/common/variable.py (180-188)
This refactoring introduces a significant performance regression by removing the batching logic for fetching indirect siblings. The previous implementation used a multi-pass approach with _fetch_indirect_siblings to perform network calls in batches for all candidate SVs. The current implementation performs three separate network calls (property_values twice and get_variable_group_info once) sequentially inside a loop for each SV requiring indirect expansion. This will lead to an N+1 query problem and significantly increase latency when multiple SVs are processed. Please consider restoring the batched fetching logic.
server/tests/lib/nl/variable_test.py (1-184)
Removing this test file is not recommended. Although the PR removes the frontend definition calculation logic, the core functionality of extend_svs (finding direct and indirect siblings) is still present and has been heavily refactored. These tests provide critical verification for the variable extension logic and should be updated to match the new implementation rather than being deleted.
server/lib/nl/common/variable.py (170)
Using direct dictionary access svg2childsvs[svg] may raise a KeyError if the get_variable_group_info call (on line 154) did not return data for that specific SVG node. It is safer to use .get(svg, []), which was the pattern used in the previous implementation.
for child_sv in svg2childsvs.get(svg, []):
server/lib/nl/common/variable.py (180)
Accessing the result of property_values directly with [svg] may raise a KeyError if the node is not found in the response. It's safer to use .get(svg, []).
svg_parents = fetch.property_values([svg], "specializationOf", True).get(svg, [])
server/lib/nl/common/variable.py (184-185)
Similar to line 180, accessing the result of property_values directly with [svg_parent] may raise a KeyError.
svg_siblings = fetch.property_values([svg_parent], "specializationOf",
False).get(svg_parent, [])
server/lib/nl/common/variable.py (189)
It is safer to check if data exists in svg_siblings_info before iterating, similar to the check on line 155.
server/lib/nl/common/variable.py (196)
Similar to line 170, using svg2childsvs.get(svg, []) is safer here to avoid a potential KeyError.
for new_sv_info in svg2childsvs.get(svg, []):
…lls that require a definitions and those that don't.
Issue
b/507068100
Related PRs
Mixer: 1877
If the above PR is not yet merged into master, you should have the above branch active in your local mixer.
Description
With Spanner soon to be supporting definitions directly from the
v2/bulk/info/variable-groupendpoint, the costly frontend definitions calculations are now removed. Bothv1andv2operate on the same path, under the assumptions that the definitions will be provided.This PR involves a significant reversion of Flask-side definition construction. However, major improvements introduced during the initial work on the
v2definitions have been kept, most notably query batching to remove an N+1 that had always existed in the originalv1path.Notes
With the addition of the
include_definitionsparameter to the Spanner call (see 1877 linked above), we can now selectively determine which calls to thev2/bulk/info/variable-groupendpoint calls return definitions. This is done for the sake of latency: calls that require definitions take a non-trivial amount of additional time and compute to resolve, particularly when the call also requests additional filtering.We take the following approach:
variable-groupendpoint (used only by the hierarchy) never need definitions, and so always exclude them.Testing
Explore searches should provide largely similar results through either Spanner or BT backed mixers .
Results may not be exactly the same. For example, in the following search, the first graph provides "Language Spoken at Home" as an explore more option from BT, but does not from Spanner (you can see this below using production for the BT version, but the same result will happen locally).
This is due to the backfilled Spanner definitions being different in some cases (i.e., in the case below, the base and comparison SVs differ by more than one constraint in the Spanner definitions). The root of this discrepancy is unknown, but it is not due to the functionality of this PR.
Local search: Population of the United States
Production search: Population of the United States
Merging Notes
🚨 This PR should not be merged in until after the above linked 1877 PR in Mixer