[deprecation] Block field-level forward encoding in table configs#18668
[deprecation] Block field-level forward encoding in table configs#18668xiangfu0 wants to merge 1 commit into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18668 +/- ##
============================================
+ Coverage 56.92% 64.49% +7.56%
- Complexity 7 1291 +1284
============================================
Files 2587 3372 +785
Lines 150165 208641 +58476
Branches 24274 32586 +8312
============================================
+ Hits 85483 134554 +49071
- Misses 57463 63287 +5824
- Partials 7219 10800 +3581
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
3b9a7b8 to
69c7124
Compare
xiangfu0
left a comment
There was a problem hiding this comment.
Review — reviewed at current HEAD 69c7124
Thanks — the demotion of field-level encodingType in favor of indexes.forward.encodingType is a clean direction and the serialization handling is solid. One backward-compatibility concern stands out that I think needs a decision before merge; the rest are minor.
🔴 MAJOR — validate() now hard-rejects field-level encodingType; this contradicts the PR description and has no migration path
TableConfigUtils.validateIndexingConfigAndFieldConfigList now does:
Preconditions.checkState(!fieldConfig.hasFieldLevelEncodingType(),
"FieldConfig.encodingType is deprecated for column: %s. Use fieldConfigList[].indexes.forward.encodingType instead.", column);hasFieldLevelEncodingType() returns _encodingType != null, i.e. it fires whenever a config explicitly sets the long-standing, documented field-level form:
{ "name": "message", "encodingType": "RAW" }I checked the controller write path: TableConfigUtils.validate(...) runs on the as-submitted config at table create/update (PinotTableRestletResource#868, TableConfigsRestletResource#594/604), and there is no auto-migration before it — createTableConfigFromOldFormat / convertFromLegacyTableConfig have no controller callers (test-only). So:
- Creating a table with field-level
encodingTypeis now rejected. - Editing an existing table whose stored config has field-level
encodingType(re-submitting the full config) is rejected until the user hand-migrates every such column intoindexes.forward.
This directly contradicts the PR description, which says "Existing configs using top-level encodingType continue to work, but that field is now deprecated" and shows {"name":"message","encodingType":"RAW"} as a still-valid example. As written, the field isn't deprecated — it's rejected. (Running clusters keep working since stored configs aren't re-validated on read; the break is on the create/update/validation path.)
Recommendation — pick one and make the PR consistent with it:
- Backward-compatible (preferred): auto-migrate field-level
encodingType→indexes.forward.encodingTypeon the config-write path (reuse thewithForwardEncodinglogic), then validate. Existing tables keep working transparently and the legacy field is genuinely "deprecated". - Intentional hard break: then add the
backward-incompat/release-noteslabel + rolling-upgrade notes, correct the PR description (drop "continues to work"), and reconcile the now-mostly-dead legacy fallback inForwardIndexType.createDeserializer(fieldConfig.getEncodingType()can no longer return an explicit value for a validated config).
✅ Resolved / looks good
noDictionaryColumns+forward=DICTIONARYis now an explicit, tested validation error (Dictionary must be enabled for dictionary-encoded forward index column). The forward and dictionary resolvers disagree on precedence for that contradictory combo, but validation now catches it deterministically — good.- Serialization is safe across versions: omitting
encodingTypeno longer emits a spurious"encodingType":"DICTIONARY"; an old node defaults missing→DICTIONARY identically.FieldConfigTestcovers both directions includinghasFieldLevelEncodingType(). - The RAW direction is fully coherent — both
ForwardIndexTypeandDictionaryIndexTypehonorforward=RAW, so a RAW forward index correctly disables the dictionary. ArrayList→LinkedHashSetforconfigsToUpdateis a real robustness fix;withForwardEncoding/getForwardEncodingTypehandleNullNode/null correctly.
🟡 Minor
- Duplication of the forward-block parse/write.
DictionaryIndexType.getForwardEncodingTypere-implements theindexes.forward.encodingTypelookup thatForwardIndexType.createDeserializeralready does, plus there are now multiplewithForwardEncodinghelpers (main + test) and hardcoded"encodingType"/"forward"literals. A single shared accessor + constant would keep the precedence rule single-sourced. (Inline.) equals()/hashCode()semantics (BaseJsonConfigistoJsonNode-based): aFieldConfigwith absentencodingTypeis no longer equal to one with explicitencodingType:DICTIONARY(previously both normalized to DICTIONARY). Low risk since explicit values round-trip unchanged, but worth confirming no controller "config changed?" diff relies on the old normalization.OpenStructIndexConfig.shouldUseDictionaryForKeystill reads the deprecated field-levelgetEncodingType(); with field-level encoding now forbidden by validation, it can only ever observe the DICTIONARY default and won't see a key column configured RAW via the forward block. Pre-existing and not touched here, but the gap widens under this PR — worth a follow-up.
(Note: I reviewed the previous HEAD earlier; the branch advanced to 69c7124 mid-review, which already addressed the earlier dictionary/forward precedence concern. The above reflects the current diff.)
8eb6cd7 to
3e3615f
Compare
6509d07 to
e982888
Compare
e982888 to
1f8adbc
Compare
Summary
Deprecates field-level
FieldConfig.encodingTypeand makesfieldConfig.indexes.forward.encodingTypethe canonical forward-index encoding source.This change now blocks table creation/update configs that still use
fieldConfigList[].encodingType. Validation fails with a targeted message directing users tofieldConfigList[].indexes.forward.encodingType.Compatibility retained internally:
fieldConfig.encodingTypestill deserializes for backward-compatibleFieldConfigreads.FieldConfig#getEncodingType()still returnsDICTIONARYwhen the field is absent for existing callers.noDictionaryColumns,noDictionaryConfig, and field-levelencodingTyperemain fallback signals in index-conversion paths when noindexes.forward.encodingTypeis present.indexes.forward.encodingTypeis present, it is authoritative over legacy no-dictionary and field-level encoding signals.User Manual
For new or updated table configs, configure forward-index encoding under the internal forward-index block:
{ "fieldConfigList": [ { "name": "message", "indexes": { "forward": { "encodingType": "RAW", "compressionCodec": "ZSTANDARD" } } } ] }Do not use the deprecated field-level encoding form:
{ "fieldConfigList": [ { "name": "message", "encodingType": "RAW" } ] }That form now fails table validation with:
Sample Table Config
{ "tableName": "logs_OFFLINE", "tableType": "OFFLINE", "segmentsConfig": { "schemaName": "logs" }, "tableIndexConfig": {}, "fieldConfigList": [ { "name": "logLine", "indexes": { "forward": { "encodingType": "RAW", "compressionCodec": "ZSTANDARD" }, "text": {} } }, { "name": "statusCode", "indexes": { "forward": { "encodingType": "DICTIONARY" }, "inverted": {} } } ] }Validation
./mvnw -pl pinot-segment-local -am -Dtest=TableConfigUtilsTest,TableConfigConsumingSegmentTierOverrideTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-segment-local -am -Dtest=TableConfigUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-segment-local -am -Dtest=ForwardIndexTypeTest,DictionaryIndexTypeTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-spi -Dtest=FieldConfigTest test./mvnw -pl pinot-segment-spi -Dtest=ForwardIndexConfigTest test./mvnw -pl pinot-spi -Dtest=FieldConfigTest,OpenStructIndexConfigTest test./mvnw spotless:apply -pl pinot-spi,pinot-segment-local./mvnw license:format -pl pinot-spi,pinot-segment-local./mvnw checkstyle:check -pl pinot-spi,pinot-segment-local./mvnw license:check -pl pinot-spi,pinot-segment-local./mvnw spotless:apply -pl pinot-spi,pinot-segment-spi,pinot-segment-local./mvnw checkstyle:check -pl pinot-spi,pinot-segment-spi,pinot-segment-local./mvnw license:format -pl pinot-spi,pinot-segment-spi,pinot-segment-local./mvnw license:check -pl pinot-spi,pinot-segment-spi,pinot-segment-locallicense:checkpasses with pre-existing unknown-extension warnings for Lucene binary test resources underpinot-segment-local/src/test/resources/data/lucene_80_index.