Skip to content

Validate uniqueness of column names in fieldConfigList during table config validation#18211

Open
anshul98ks123 wants to merge 1 commit intoapache:masterfrom
anshul98ks123:fix/validate-duplicate-fieldconfig-names
Open

Validate uniqueness of column names in fieldConfigList during table config validation#18211
anshul98ks123 wants to merge 1 commit intoapache:masterfrom
anshul98ks123:fix/validate-duplicate-fieldconfig-names

Conversation

@anshul98ks123
Copy link
Copy Markdown
Contributor

Issue(s)

TableConfigUtils.validateIndexingConfigAndFieldConfigList does not check for duplicate column names in fieldConfigList. A malformed config with two FieldConfig entries for the same column passes validation successfully but crashes downstream in AbstractIndexType.convertToNewFormat, where Collectors.toMap throws IllegalStateException: Duplicate key.

This adds a duplicate column name check during validation so that invalid configs are rejected early with a clear error message.

Root Cause

In validateIndexingConfigAndFieldConfigList, the existing loop iterates over fieldConfigs to verify schema presence and compression codec compatibility, but never checks whether two entries share the same column name:

for (FieldConfig fieldConfig : fieldConfigs) {
    String column = fieldConfig.getName();
    Preconditions.checkState(schema.hasColumn(column), ...);
    // No duplicate name check
}

Downstream consumers like AbstractIndexType.convertToNewFormat use Collectors.toMap(FieldConfig::getName, ...) which has zero tolerance for duplicate keys and crashes with an opaque IllegalStateException.

Fix

Add a Set<String> to track seen column names and reject duplicates:

Set<String> seenColumns = new HashSet<>();
for (FieldConfig fieldConfig : fieldConfigs) {
    String column = fieldConfig.getName();
    Preconditions.checkState(seenColumns.add(column),
        "Duplicate FieldConfig for column: %s", column);
    Preconditions.checkState(schema.hasColumn(column), ...);
}

Test Plan

Added 5 new tests in TableConfigUtilsTest covering: duplicate same encoding, duplicate different encoding, duplicate with indexes, and regression tests for distinct columns and single entry. All existing tests continue to pass.


TableConfigUtils.validateIndexingConfigAndFieldConfigList did not check for
duplicate column names in fieldConfigList. A config with two FieldConfig
entries for the same column would pass validation but crash downstream in
AbstractIndexType.convertToNewFormat (Collectors.toMap throws
IllegalStateException: Duplicate key).

Fix: add a duplicate column name check in the validation loop.
Made-with: Cursor
@anshul98ks123 anshul98ks123 changed the title [pinot-segment-local] Validate uniqueness of column names in fieldConfigList during table config validation Validate uniqueness of column names in fieldConfigList during table config validation Apr 15, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.38%. Comparing base (9a580dd) to head (777d1ab).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18211      +/-   ##
============================================
+ Coverage     63.35%   63.38%   +0.02%     
  Complexity     1627     1627              
============================================
  Files          3238     3238              
  Lines        197000   197007       +7     
  Branches      30464    30464              
============================================
+ Hits         124807   124870      +63     
+ Misses        62196    62131      -65     
- Partials       9997    10006       +9     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.33% <100.00%> (+0.02%) ⬆️
java-21 63.34% <100.00%> (+0.02%) ⬆️
temurin 63.38% <100.00%> (+0.02%) ⬆️
unittests 63.38% <100.00%> (+0.02%) ⬆️
unittests1 55.31% <100.00%> (+<0.01%) ⬆️
unittests2 35.00% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Collaborator

@shauryachats shauryachats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gortiz
Copy link
Copy Markdown
Contributor

gortiz commented Apr 16, 2026

I'm not sure whether this is safe to merge. My main concern is the effect this may have on previously existing tables. I analyzed the code, and it seems this is only called when using the REST methods that update or validate tables and not when tables are refreshed or committed, so I guess it is fine. But do we really need this? Couldn't we make convertToNewFormat more resilient instead?

@anshul98ks123
Copy link
Copy Markdown
Contributor Author

I'm not sure whether this is safe to merge. My main concern is the effect this may have on previously existing tables. I analyzed the code, and it seems this is only called when using the REST methods that update or validate tables and not when tables are refreshed or committed, so I guess it is fine. But do we really need this? Couldn't we make convertToNewFormat more resilient instead?

convertToNewFormat already throws error when duplicate entries exist in fieldConfigList. We are still validating such configs are true, which seems troublesome since it'll create unnecessary problems. Any config with multiple entries for same column in fieldConfigList should be rejected imho

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants