Skip to content

[deprecation] Block field-level forward encoding in table configs#18668

Open
xiangfu0 wants to merge 1 commit into
apache:masterfrom
xiangfu0:codex/deprecate-fieldconfig-encoding-type
Open

[deprecation] Block field-level forward encoding in table configs#18668
xiangfu0 wants to merge 1 commit into
apache:masterfrom
xiangfu0:codex/deprecate-fieldconfig-encoding-type

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Jun 3, 2026

Summary

Deprecates field-level FieldConfig.encodingType and makes fieldConfig.indexes.forward.encodingType the canonical forward-index encoding source.

This change now blocks table creation/update configs that still use fieldConfigList[].encodingType. Validation fails with a targeted message directing users to fieldConfigList[].indexes.forward.encodingType.

Compatibility retained internally:

  • fieldConfig.encodingType still deserializes for backward-compatible FieldConfig reads.
  • FieldConfig#getEncodingType() still returns DICTIONARY when the field is absent for existing callers.
  • Legacy inputs such as noDictionaryColumns, noDictionaryConfig, and field-level encodingType remain fallback signals in index-conversion paths when no indexes.forward.encodingType is present.
  • When indexes.forward.encodingType is present, it is authoritative over legacy no-dictionary and field-level encoding signals.

User Manual

For new or updated table configs, configure forward-index encoding under the internal forward-index block:

{
  "fieldConfigList": [
    {
      "name": "message",
      "indexes": {
        "forward": {
          "encodingType": "RAW",
          "compressionCodec": "ZSTANDARD"
        }
      }
    }
  ]
}

Do not use the deprecated field-level encoding form:

{
  "fieldConfigList": [
    {
      "name": "message",
      "encodingType": "RAW"
    }
  ]
}

That form now fails table validation with:

FieldConfig.encodingType is deprecated for column: message. Use fieldConfigList[].indexes.forward.encodingType instead.

Sample Table Config

{
  "tableName": "logs_OFFLINE",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "logs"
  },
  "tableIndexConfig": {},
  "fieldConfigList": [
    {
      "name": "logLine",
      "indexes": {
        "forward": {
          "encodingType": "RAW",
          "compressionCodec": "ZSTANDARD"
        },
        "text": {}
      }
    },
    {
      "name": "statusCode",
      "indexes": {
        "forward": {
          "encodingType": "DICTIONARY"
        },
        "inverted": {}
      }
    }
  ]
}

Validation

  • ./mvnw -pl pinot-segment-local -am -Dtest=TableConfigUtilsTest,TableConfigConsumingSegmentTierOverrideTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-segment-local -am -Dtest=TableConfigUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-segment-local -am -Dtest=ForwardIndexTypeTest,DictionaryIndexTypeTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-spi -Dtest=FieldConfigTest test
  • ./mvnw -pl pinot-segment-spi -Dtest=ForwardIndexConfigTest test
  • ./mvnw -pl pinot-spi -Dtest=FieldConfigTest,OpenStructIndexConfigTest test
  • ./mvnw spotless:apply -pl pinot-spi,pinot-segment-local
  • ./mvnw license:format -pl pinot-spi,pinot-segment-local
  • ./mvnw checkstyle:check -pl pinot-spi,pinot-segment-local
  • ./mvnw license:check -pl pinot-spi,pinot-segment-local
  • ./mvnw spotless:apply -pl pinot-spi,pinot-segment-spi,pinot-segment-local
  • ./mvnw checkstyle:check -pl pinot-spi,pinot-segment-spi,pinot-segment-local
  • ./mvnw license:format -pl pinot-spi,pinot-segment-spi,pinot-segment-local
  • ./mvnw license:check -pl pinot-spi,pinot-segment-spi,pinot-segment-local

license:check passes with pre-existing unknown-extension warnings for Lucene binary test resources under pinot-segment-local/src/test/resources/data/lucene_80_index.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 76.66667% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.49%. Comparing base (e60a8b2) to head (1f8adbc).

Files with missing lines Patch % Lines
.../local/segment/index/forward/ForwardIndexType.java 66.66% 2 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18668      +/-   ##
============================================
+ Coverage     56.92%   64.49%   +7.56%     
- Complexity        7     1291    +1284     
============================================
  Files          2587     3372     +785     
  Lines        150165   208641   +58476     
  Branches      24274    32586    +8312     
============================================
+ Hits          85483   134554   +49071     
- Misses        57463    63287    +5824     
- Partials       7219    10800    +3581     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.49% <76.66%> (+7.56%) ⬆️
temurin 64.49% <76.66%> (+7.56%) ⬆️
unittests 64.48% <76.66%> (+7.56%) ⬆️
unittests1 56.91% <36.66%> (-0.02%) ⬇️
unittests2 37.12% <76.66%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 added configuration Config changes (addition/deletion/change in behavior) deprecation Marks deprecated APIs, configs, or features labels Jun 3, 2026
@xiangfu0 xiangfu0 changed the title [codex] Deprecate field-level forward encoding [deprecation] Deprecate field-level forward encoding Jun 3, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@xiangfu0 xiangfu0 force-pushed the codex/deprecate-fieldconfig-encoding-type branch from 3b9a7b8 to 69c7124 Compare June 3, 2026 21:39
@xiangfu0 xiangfu0 changed the title [deprecation] Deprecate field-level forward encoding [deprecation] Block field-level forward encoding in table configs Jun 3, 2026
@xiangfu0 xiangfu0 marked this pull request as ready for review June 5, 2026 22:34
Copy link
Copy Markdown
Contributor Author

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — reviewed at current HEAD 69c7124

Thanks — the demotion of field-level encodingType in favor of indexes.forward.encodingType is a clean direction and the serialization handling is solid. One backward-compatibility concern stands out that I think needs a decision before merge; the rest are minor.

🔴 MAJOR — validate() now hard-rejects field-level encodingType; this contradicts the PR description and has no migration path

TableConfigUtils.validateIndexingConfigAndFieldConfigList now does:

Preconditions.checkState(!fieldConfig.hasFieldLevelEncodingType(),
    "FieldConfig.encodingType is deprecated for column: %s. Use fieldConfigList[].indexes.forward.encodingType instead.", column);

hasFieldLevelEncodingType() returns _encodingType != null, i.e. it fires whenever a config explicitly sets the long-standing, documented field-level form:

{ "name": "message", "encodingType": "RAW" }

I checked the controller write path: TableConfigUtils.validate(...) runs on the as-submitted config at table create/update (PinotTableRestletResource#868, TableConfigsRestletResource#594/604), and there is no auto-migration before it — createTableConfigFromOldFormat / convertFromLegacyTableConfig have no controller callers (test-only). So:

  • Creating a table with field-level encodingType is now rejected.
  • Editing an existing table whose stored config has field-level encodingType (re-submitting the full config) is rejected until the user hand-migrates every such column into indexes.forward.

This directly contradicts the PR description, which says "Existing configs using top-level encodingType continue to work, but that field is now deprecated" and shows {"name":"message","encodingType":"RAW"} as a still-valid example. As written, the field isn't deprecated — it's rejected. (Running clusters keep working since stored configs aren't re-validated on read; the break is on the create/update/validation path.)

Recommendation — pick one and make the PR consistent with it:

  • Backward-compatible (preferred): auto-migrate field-level encodingTypeindexes.forward.encodingType on the config-write path (reuse the withForwardEncoding logic), then validate. Existing tables keep working transparently and the legacy field is genuinely "deprecated".
  • Intentional hard break: then add the backward-incompat/release-notes label + rolling-upgrade notes, correct the PR description (drop "continues to work"), and reconcile the now-mostly-dead legacy fallback in ForwardIndexType.createDeserializer (fieldConfig.getEncodingType() can no longer return an explicit value for a validated config).

✅ Resolved / looks good

  • noDictionaryColumns + forward=DICTIONARY is now an explicit, tested validation error (Dictionary must be enabled for dictionary-encoded forward index column). The forward and dictionary resolvers disagree on precedence for that contradictory combo, but validation now catches it deterministically — good.
  • Serialization is safe across versions: omitting encodingType no longer emits a spurious "encodingType":"DICTIONARY"; an old node defaults missing→DICTIONARY identically. FieldConfigTest covers both directions including hasFieldLevelEncodingType().
  • The RAW direction is fully coherent — both ForwardIndexType and DictionaryIndexType honor forward=RAW, so a RAW forward index correctly disables the dictionary.
  • ArrayListLinkedHashSet for configsToUpdate is a real robustness fix; withForwardEncoding/getForwardEncodingType handle NullNode/null correctly.

🟡 Minor

  1. Duplication of the forward-block parse/write. DictionaryIndexType.getForwardEncodingType re-implements the indexes.forward.encodingType lookup that ForwardIndexType.createDeserializer already does, plus there are now multiple withForwardEncoding helpers (main + test) and hardcoded "encodingType"/"forward" literals. A single shared accessor + constant would keep the precedence rule single-sourced. (Inline.)
  2. equals()/hashCode() semantics (BaseJsonConfig is toJsonNode-based): a FieldConfig with absent encodingType is no longer equal to one with explicit encodingType:DICTIONARY (previously both normalized to DICTIONARY). Low risk since explicit values round-trip unchanged, but worth confirming no controller "config changed?" diff relies on the old normalization.
  3. OpenStructIndexConfig.shouldUseDictionaryForKey still reads the deprecated field-level getEncodingType(); with field-level encoding now forbidden by validation, it can only ever observe the DICTIONARY default and won't see a key column configured RAW via the forward block. Pre-existing and not touched here, but the gap widens under this PR — worth a follow-up.

(Note: I reviewed the previous HEAD earlier; the branch advanced to 69c7124 mid-review, which already addressed the earlier dictionary/forward precedence concern. The above reflects the current diff.)

@xiangfu0 xiangfu0 force-pushed the codex/deprecate-fieldconfig-encoding-type branch 2 times, most recently from 8eb6cd7 to 3e3615f Compare June 6, 2026 02:29
@xiangfu0 xiangfu0 added backward-incompat Introduces a backward-incompatible API or behavior change release-notes Referenced by PRs that need attention when compiling the next release notes labels Jun 6, 2026
@xiangfu0 xiangfu0 force-pushed the codex/deprecate-fieldconfig-encoding-type branch 2 times, most recently from 6509d07 to e982888 Compare June 6, 2026 10:11
@xiangfu0 xiangfu0 force-pushed the codex/deprecate-fieldconfig-encoding-type branch from e982888 to 1f8adbc Compare June 6, 2026 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backward-incompat Introduces a backward-incompatible API or behavior change configuration Config changes (addition/deletion/change in behavior) deprecation Marks deprecated APIs, configs, or features release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants