Validate uniqueness of column names in fieldConfigList during table config validation #18211
TableConfigUtils.validateIndexingConfigAndFieldConfigList did not check for duplicate column names in fieldConfigList. A config with two FieldConfig entries for the same column would pass validation but crash downstream in AbstractIndexType.convertToNewFormat (Collectors.toMap throws IllegalStateException: Duplicate key). Fix: add a duplicate column name check in the validation loop. Made-with: Cursor
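The downstream failure described above can be reproduced in isolation. The following is a minimal sketch, not Pinot code: ColumnEntry is a hypothetical stand-in for FieldConfig, and indexByName is a hypothetical helper that only mirrors the shape of a two-argument Collectors.toMap call keyed on the column name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {
  // Hypothetical stand-in for FieldConfig; only the column name matters here.
  static final class ColumnEntry {
    final String name;
    final String encoding;

    ColumnEntry(String name, String encoding) {
      this.name = name;
      this.encoding = encoding;
    }

    String getName() {
      return name;
    }
  }

  // Two-argument toMap has no merge function, so a repeated key
  // makes the collector throw IllegalStateException.
  static Map<String, ColumnEntry> indexByName(List<ColumnEntry> entries) {
    return entries.stream()
        .collect(Collectors.toMap(ColumnEntry::getName, Function.identity()));
  }

  public static void main(String[] args) {
    List<ColumnEntry> entries = Arrays.asList(
        new ColumnEntry("myCol", "RAW"),
        new ColumnEntry("myCol", "DICTIONARY"));
    try {
      indexByName(entries);
      System.out.println("no exception");
    } catch (IllegalStateException e) {
      // The exact message text depends on the JDK version.
      System.out.println("toMap rejected the list: " + e.getMessage());
    }
  }
}
```

The exception carries no hint that the table config is at fault, which is why rejecting the config during validation gives a clearer error.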
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #18211 +/- ##
============================================
+ Coverage 63.35% 63.38% +0.02%
Complexity 1627 1627
============================================
Files 3238 3238
Lines 197000 197007 +7
Branches 30464 30464
============================================
+ Hits 124807 124870 +63
+ Misses 62196 62131 -65
- Partials 9997 10006 +9
I'm not sure whether this is safe to merge. My main concern is the effect this may have on previously existing tables. I analyzed the code, and it seems this is only called when using the REST methods that update or validate tables and not when tables are refreshed or committed, so I guess it is fine. But do we really need this? Couldn't we make convertToNewFormat more resilient instead?
convertToNewFormat already throws an error when duplicate entries exist in fieldConfigList. Validation still accepts such configs as valid, which is troublesome because it creates unnecessary problems later. Any config with multiple entries for the same column in fieldConfigList should be rejected, imho.
Issue(s)
TableConfigUtils.validateIndexingConfigAndFieldConfigList does not check for duplicate column names in fieldConfigList. A malformed config with two FieldConfig entries for the same column passes validation successfully but crashes downstream in AbstractIndexType.convertToNewFormat, where Collectors.toMap throws IllegalStateException: Duplicate key.
This adds a duplicate column name check during validation so that invalid configs are rejected early with a clear error message.
Root Cause
In validateIndexingConfigAndFieldConfigList, the existing loop iterates over fieldConfigs to verify schema presence and compression codec compatibility, but never checks whether two entries share the same column name.
Downstream consumers like AbstractIndexType.convertToNewFormat use Collectors.toMap(FieldConfig::getName, ...), which has zero tolerance for duplicate keys and crashes with an opaque IllegalStateException.
Fix
Add a Set<String> to track seen column names and reject duplicates.
Test Plan
Added 5 new tests in TableConfigUtilsTest covering: duplicate same encoding, duplicate different encoding, duplicate with indexes, and regression tests for distinct columns and single entry. All existing tests continue to pass.
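The Set-based check described under Fix, together with the duplicate and regression cases from the test plan, can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: validateUniqueColumnNames is a hypothetical standalone helper (the PR adds the check inside the existing validation loop), it takes plain column names rather than FieldConfig objects, and the exception type and message wording are placeholders.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueColumnCheck {
  // Hypothetical helper: rejects a list in which any column name appears twice.
  static void validateUniqueColumnNames(List<String> columnNames) {
    Set<String> seen = new HashSet<>();
    for (String name : columnNames) {
      // Set.add returns false when the element is already present.
      if (!seen.add(name)) {
        // Placeholder exception type and message, chosen for this sketch only.
        throw new IllegalStateException(
            "Duplicate column name in fieldConfigList: " + name);
      }
    }
  }

  public static void main(String[] args) {
    // Regression cases: distinct columns and a single entry must pass.
    validateUniqueColumnNames(Arrays.asList("colA", "colB"));
    validateUniqueColumnNames(Collections.singletonList("colA"));

    // Duplicate case: the second "colA" must be rejected.
    try {
      validateUniqueColumnNames(Arrays.asList("colA", "colB", "colA"));
      System.out.println("duplicate was not detected");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because the check keys only on the column name, two entries for the same column are rejected whether or not their encodings or indexes differ, which matches the three duplicate scenarios listed in the test plan.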