Reintroduce type renaming with more resilient handling for unexpected protobuf names #3770

alecgrieser · 2025-11-21T11:54:21Z

This reintroduces protobuf type and field renaming. The first version was added in #3696 and #3706 and then taken out by #3726. A second version was introduced with #3736 and then removed by #3767. This adds back a variation based on the data from #3736 but updated to be more resilient to unexpected names in existing meta-data objects.

The issue with #3736 was that if the meta-data contained a type that was not correctly escaped (e.g., Type__Blah instead of Type__0Blah), then that type would fail to match during querying. This was a problem even if the type wasn't actually involved in the query because of how matching worked on the FullUnorderedScanExpression, meaning that every query would fail to plan if any type in the meta-data had such a name.
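
To make the failure mode concrete, here is a minimal sketch. It assumes a simplified escaping scheme in which only a double underscore is escaped (as `__0`), which is what the `Type__Blah` versus `Type__0Blah` example implies. The class and method names are hypothetical and are not the Record Layer API.

```java
/** Hypothetical sketch; these helpers are not the Record Layer API. */
final class RoundTripSketch {
    private RoundTripSketch() {
    }

    /** Assumed escaping rule: a literal "__" in a user-visible name is stored as "__0". */
    static String toProtoName(String userName) {
        return userName.replace("__", "__0");
    }

    /** Assumed de-escaping rule: the inverse of the rule above for well-formed names. */
    static String toUserName(String storageName) {
        return storageName.replace("__0", "__");
    }

    /** True if regenerating the storage name from the user-visible name yields the stored name. */
    static boolean roundTrips(String storageName) {
        return toProtoName(toUserName(storageName)).equals(storageName);
    }
}
```

Under these assumptions, `roundTrips("Type__0Blah")` holds, but `roundTrips("Type__Blah")` does not: the regenerated name is `Type__0Blah`, which no longer equals the name stored in the meta-data. Any lookup that relies on regenerating the storage name would therefore miss such a type.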

This makes things more resilient. We now do a bit more work to associate a type with its original name from the protobuf file if one is provided, recording both the user-visible name and the original storage name. The only place where we now generate new protobuf-compliant names is when we construct a Type object. In all other cases, we only go from the storage name to the user-visible name.
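
The idea of recording both names can be sketched as follows. This is an illustration only: the registry class, its methods, and the simplified `__0` escaping rule are assumptions made for the example, whereas the actual change keeps the storage name in the `Type` information.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of keeping the storage name alongside the user-visible name. */
final class TypeNameRegistrySketch {
    // Keyed by user-visible name; the value is the name exactly as found in the meta-data.
    private final Map<String, String> storageNameByUserName = new LinkedHashMap<>();

    /** Assumed de-escaping rule: "__0" in a storage name stands for a literal "__". */
    static String toUserName(String storageName) {
        return storageName.replace("__0", "__");
    }

    /** Assumed escaping rule, used only when a brand new name has to be generated. */
    static String toProtoName(String userName) {
        return userName.replace("__", "__0");
    }

    /** Builds the mapping by going from storage names to user-visible names, never the reverse. */
    static TypeNameRegistrySketch fromMetaData(List<String> storageNames) {
        TypeNameRegistrySketch registry = new TypeNameRegistrySketch();
        for (String storageName : storageNames) {
            // Collisions between user-visible names are ignored in this sketch (last one wins).
            registry.storageNameByUserName.put(toUserName(storageName), storageName);
        }
        return registry;
    }

    /** Returns the recorded storage name, generating one only for a type not seen before. */
    String storageNameFor(String userName) {
        String recorded = storageNameByUserName.get(userName);
        return recorded != null ? recorded : toProtoName(userName);
    }
}
```

With a registry built from the storage names `Type__Blah` and `Other__0Name`, looking up the user-visible name `Type__Blah` returns the unexpected storage name `Type__Blah` unchanged, while `Other__Name` maps back to `Other__0Name`. Only a name with no recorded counterpart goes through the escaping step.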

We still rely on being able to correctly predict the expected user-visible name by running the de-escaping logic. At some point, we may need a more complicated mapping, especially if we want to support more arbitrary names. That is left as future work. I could also see us wanting to do a bit more refactoring to better encapsulate this transformation.

The new test modifications made to valid-identifiers.yamsql cover those cases by adding new types with names that would not have been generated by any DDL statement, and then validating that (1) those do not disrupt correctly constructed queries and (2) that the problematic types can themselves be queried.

In addition, this addresses some shortcomings in how match candidates were constructed: FieldKeyExpressions (which use the internal names) would sometimes produce match candidates that referenced the internal name directly. Those gaps are now plugged. There are additional queries in valid-identifiers.yamsql that are designed to cover those matches.
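
A rough sketch of that conversion is shown here. The field path representation, the helper names, and the simplified `__0` de-escaping rule are all assumptions for illustration; the real match candidate construction is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of exposing user-facing names when building a match candidate. */
final class MatchCandidateNamesSketch {
    private MatchCandidateNamesSketch() {
    }

    /** Assumed de-escaping rule: "__0" in an internal name stands for a literal "__". */
    static String toUserName(String internalName) {
        return internalName.replace("__0", "__");
    }

    /** Converts each internal field name on an index's field path to its user-facing name. */
    static List<String> userFacingPath(List<String> internalFieldPath) {
        List<String> result = new ArrayList<>(internalFieldPath.size());
        for (String internalName : internalFieldPath) {
            result.add(toUserName(internalName));
        }
        return result;
    }

    /** A query path, written with user-facing names, matches only after the conversion. */
    static boolean candidateCovers(List<String> internalFieldPath, List<String> queriedPath) {
        return userFacingPath(internalFieldPath).equals(queriedPath);
    }
}
```

For an index on the internal path `a__0b`, `c`, a query over the user-facing path `a__b`, `c` is covered, whereas comparing the query against the internal names directly would not match.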

This adds support for retaining the protobuf names more directly for types and fields. This matters when the user has created a meta-data proto with a naming strategy that differs from the one that would have been generated by our own DML. The basic strategy is to:

1. Continue to always apply the `toProtoUtils` method to produce plausible user-generated names, but
2. Retain the original protobuf name in the `Type` information and then use that to get the name used to access data in the field.

hatyo

Nice work, LGTM, but I would leave the final judgement to @normen662.

github-actions · 2025-11-27T13:10:29Z

📊 Metrics Diff Analysis Report

Summary

New queries: 76
Dropped queries: 0
Plan changed + metrics changed: 0
Plan unchanged + metrics changed: 0

ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

- New queries: Queries added in this PR
- Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
- Plan changed + metrics changed: The query plan has changed along with planner metrics.
- Metrics only changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

yaml-tests/src/test/resources/in-predicate.metrics.yaml: 4
yaml-tests/src/test/resources/valid-identifiers.metrics.yaml: 72

alecgrieser and others added 6 commits November 20, 2025 16:11

Reapply "Reintroduce translation of identifiers to Protobuf compliant names (FoundationDB#3736)" (FoundationDB#3767)

3f583ee

This reverts commit 7f1f4c9.

Add additional tests of the field name preservation logic and replace that code in the RecordMetadataDeserializer with logic in the Type system

0e76f6a

Address normen662 comments.

23da0ae

- make sure to convert FieldKeyExpression#fieldName (which is internal) to user-facing name when constructing match candidates.
- also, add tests for deeply nested (and repeated) structures with non-pb-compliant field names, and an index.

Update valid identifier tests of query on exploded fields so that it matches an index

535ad2f

Add tests for scalar repeateds with weird identifiers and validate it can match the indexes in the same circumstances as cases with non-escaped identifiers

5cb9176

alecgrieser added the "bug fix" label (Change that fixes a bug) Nov 24, 2025

address spotbugs failure; add documentation

c680900

alecgrieser marked this pull request as ready for review November 24, 2025 15:09

adjust teamscale findings

168fc7d

hatyo mentioned this pull request Nov 25, 2025

WIP Reintroduce Protobuf renaming, and a fix for deprecated types #3768

Closed

be more explicit about the storage name in RecordMetadataDeserializer

c2ed143

alecgrieser requested a review from g31pranjal November 25, 2025 11:49

alecgrieser and others added 2 commits November 25, 2025 11:58

tag yamsql files with unexpected plans with relevant issue

c172556

add test that verifies converting pb-names to user-defined when deserializing RecordLayerSchemaTemplate. (#3)

4077de6

hatyo reviewed Nov 27, 2025

Add test for malformed escape sequences in RecordLayerSchemaTemplateSerDeTests

5d212bc

alecgrieser force-pushed the rereintroduce-type-renaming branch from ad4d344 to 5d212bc November 27, 2025 13:08
