Refactor PinotDataType / FunctionUtils dispatch to value+instanceof #18428

Merged
Jackie-Jiang merged 1 commit into apache:master from Jackie-Jiang:pinot_data_type_dispatch
May 6, 2026
Conversation

Jackie-Jiang commented May 5, 2026

Summary

Replaces the Class-based map / chain dispatch in PinotDataType.getSingleValueType / getMultiValueType and FunctionUtils.getArgumentType with an instanceof chain that takes the value directly. Fixes a long-standing exact-class match bug where vendor JDBC Timestamp subclasses (e.g. BigQuery Simba's TimestampTz) fell through to OBJECT and broke downstream conversion.
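The difference between the two dispatch styles can be sketched in a few lines. This is a minimal illustration, not the PR's code: `Kind`, `byClass`, `byValue`, and `VendorTimestamp` are hypothetical names, with `VendorTimestamp` standing in for a vendor JDBC subclass of `java.sql.Timestamp`.

```java
import java.sql.Timestamp;
import java.util.Map;

final class DispatchSketch {
  // Hypothetical, reduced stand-in for the real type enum.
  enum Kind { INTEGER, TIMESTAMP, OBJECT }

  // Hypothetical vendor subclass of java.sql.Timestamp.
  static final class VendorTimestamp extends Timestamp {
    VendorTimestamp(long millis) {
      super(millis);
    }
  }

  private static final Map<Class<?>, Kind> BY_CLASS =
      Map.of(Integer.class, Kind.INTEGER, Timestamp.class, Kind.TIMESTAMP);

  // Exact-class lookup: a subclass is not a key in the map, so it falls through to OBJECT.
  static Kind byClass(Object value) {
    return BY_CLASS.getOrDefault(value.getClass(), Kind.OBJECT);
  }

  // instanceof chain: a subclass matches its parent type.
  static Kind byValue(Object value) {
    if (value instanceof Integer) {
      return Kind.INTEGER;
    }
    if (value instanceof Timestamp) {
      return Kind.TIMESTAMP;
    }
    return Kind.OBJECT;
  }
}
```

With a `VendorTimestamp` instance, `byClass` yields `OBJECT` while `byValue` yields `TIMESTAMP`, which is the behavior change the summary describes.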

PinotDataType

  • getSingleValueType(Class<?>) → getSingleValueType(Object) and getMultiValueType(Class<?>) → getMultiValueType(Object): instanceof dispatch in canonical Pinot type order. Always non-null (OBJECT / OBJECT_ARRAY for unrecognized types). Subclasses of non-final types (Timestamp, Map, etc.) match their parent type naturally.
  • Split BOOLEAN_ARRAY into PRIMITIVE_BOOLEAN_ARRAY (boolean[]) and BOOLEAN_ARRAY (Boolean[]) — parallel to PRIMITIVE_INT_ARRAY / INTEGER_ARRAY. Fixes the silent asymmetry where BOOLEAN_ARRAY stored as primitive while every other *_ARRAY was boxed.
  • Rename toBooleanArray (returns boolean[]) to toPrimitiveBooleanArray; new toBooleanArray returns Boolean[]. Matches int / long / float / double naming.
  • toObjectArray now handles boolean[] alongside int[] / long[] / float[] / double[].
  • Reorder default to*Array methods to canonical Pinot type order (INT → LONG → FLOAT → DOUBLE → BIG_DECIMAL → BOOLEAN → TIMESTAMP → STRING → BYTES → DATE → TIME → UUID).
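To make the primitive vs boxed split concrete, here is a rough sketch of the two array forms and the conversions between them. The method names mirror those in the list above, but the bodies are assumptions for illustration and are not taken from PinotDataType.

```java
final class BooleanArraySketch {
  // boolean[] -> Boolean[] (the boxed form that BOOLEAN_ARRAY now carries).
  static Boolean[] toBooleanArray(boolean[] values) {
    Boolean[] boxed = new Boolean[values.length];
    for (int i = 0; i < values.length; i++) {
      boxed[i] = values[i];
    }
    return boxed;
  }

  // Boolean[] -> boolean[] (the form that PRIMITIVE_BOOLEAN_ARRAY carries).
  // Assumption: elements are non-null; a null element would throw NullPointerException on unboxing.
  static boolean[] toPrimitiveBooleanArray(Boolean[] values) {
    boolean[] primitive = new boolean[values.length];
    for (int i = 0; i < values.length; i++) {
      primitive[i] = values[i];
    }
    return primitive;
  }

  // Object[] view of an array value: reference arrays pass through, boolean[] and int[] are boxed.
  static Object[] toObjectArray(Object value) {
    if (value instanceof Object[]) {
      return (Object[]) value;
    }
    if (value instanceof boolean[]) {
      return toBooleanArray((boolean[]) value);
    }
    if (value instanceof int[]) {
      int[] ints = (int[]) value;
      Object[] boxed = new Object[ints.length];
      for (int i = 0; i < ints.length; i++) {
        boxed[i] = ints[i];
      }
      return boxed;
    }
    throw new IllegalArgumentException("Unsupported array type: " + value.getClass());
  }
}
```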

FunctionUtils

  • getArgumentType(Class<?>) → getArgumentType(Object), always non-null. Delegates SV dispatch to PinotDataType.getSingleValueType and MV reference-array dispatch to PinotDataType.getMultiValueType via element sampling; primitive arrays handled locally (since they can't be element-sampled into a boxed type).
  • Add boolean[] / Timestamp[] entries to PARAMETER_TYPE_MAP and COLUMN_DATA_TYPE_MAP so scalar functions can declare these as parameter / return types.
  • Remove unused DATA_TYPE_MAP and getDataType — zero callers; the map mapped Java array classes to the element-type DataType which lost the SV/MV distinction.
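The delegation described for getArgumentType might look roughly like the sketch below. It assumes that "element sampling" means taking the first non-null element, which is consistent with the empty/all-null fallback to OBJECT_ARRAY discussed later in the thread but is not confirmed by the PR text. `Kind` and all three methods are hypothetical names.

```java
import java.sql.Timestamp;

final class ArgumentTypeSketch {
  // Hypothetical, reduced stand-in for the real type enum.
  enum Kind {
    INTEGER, STRING, TIMESTAMP, OBJECT,
    PRIMITIVE_INT_ARRAY, PRIMITIVE_BOOLEAN_ARRAY,
    INTEGER_ARRAY, STRING_ARRAY, TIMESTAMP_ARRAY, OBJECT_ARRAY
  }

  static Kind singleValueKind(Object value) {
    if (value instanceof Integer) return Kind.INTEGER;
    if (value instanceof Timestamp) return Kind.TIMESTAMP;
    if (value instanceof String) return Kind.STRING;
    return Kind.OBJECT;
  }

  // Maps a sampled element to the matching boxed-array kind.
  static Kind multiValueKind(Object element) {
    if (element instanceof Integer) return Kind.INTEGER_ARRAY;
    if (element instanceof Timestamp) return Kind.TIMESTAMP_ARRAY;
    if (element instanceof String) return Kind.STRING_ARRAY;
    return Kind.OBJECT_ARRAY;
  }

  static Kind argumentKind(Object value) {
    // Primitive arrays are handled locally: they have no boxed element to sample.
    if (value instanceof int[]) return Kind.PRIMITIVE_INT_ARRAY;
    if (value instanceof boolean[]) return Kind.PRIMITIVE_BOOLEAN_ARRAY;
    if (value instanceof Object[]) {
      // Assumed sampling rule: the first non-null element decides the array kind.
      for (Object element : (Object[]) value) {
        if (element != null) {
          return multiValueKind(element);
        }
      }
      return Kind.OBJECT_ARRAY; // empty or all-null array
    }
    return singleValueKind(value);
  }
}
```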

ScalarTransformFunctionWrapper

  • Add missing PRIMITIVE_BOOLEAN_ARRAY and TIMESTAMP_ARRAY cases to getNonLiteralValues. Both were reachable via the new boolean[] / Timestamp[] entries in PARAMETER_TYPE_MAP but previously hit the default branch and threw IllegalStateException("Unsupported parameter type: ..."). Per-row conversion mirrors the SV BOOLEAN / TIMESTAMP cases (int[][] → boolean[][] and long[][] → Timestamp[][]).
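The per-row conversions named above could be sketched as follows. The class and method names are hypothetical, and treating any non-zero int as true is an assumption about the stored boolean encoding rather than something the PR text states.

```java
import java.sql.Timestamp;

final class RowConversionSketch {
  // int[][] -> boolean[][], one inner array per row.
  // Assumption: booleans are stored as ints, with non-zero meaning true.
  static boolean[][] toBooleanRows(int[][] rows) {
    boolean[][] result = new boolean[rows.length][];
    for (int i = 0; i < rows.length; i++) {
      int[] row = rows[i];
      boolean[] converted = new boolean[row.length];
      for (int j = 0; j < row.length; j++) {
        converted[j] = row[j] != 0;
      }
      result[i] = converted;
    }
    return result;
  }

  // long[][] -> Timestamp[][], interpreting each long as epoch milliseconds.
  static Timestamp[][] toTimestampRows(long[][] rows) {
    Timestamp[][] result = new Timestamp[rows.length][];
    for (int i = 0; i < rows.length; i++) {
      long[] row = rows[i];
      Timestamp[] converted = new Timestamp[row.length];
      for (int j = 0; j < row.length; j++) {
        converted[j] = new Timestamp(row[j]);
      }
      result[i] = converted;
    }
    return result;
  }
}
```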

Caller updates

Drops .getClass() at every call site:

  • FunctionInvoker.convertTypes
  • BaseDefaultColumnHandler.createDerivedColumnV1Indices
  • DataTypeConversionFunctions.cast
  • MapColumnPreIndexStatsCollector.createKeyStatsCollector
  • DataTypeTransformerUtils.transformValue

Tests

  • New FunctionUtilsTest covering getArgumentType / getParameterType / getColumnDataType, including the vendor Timestamp subclass case and the new boolean[] / Timestamp[] map entries.
  • PinotDataTypeTest converted to value-based assertions in canonical order, added PRIMITIVE_BOOLEAN_ARRAY ↔ BOOLEAN_ARRAY cross-form conversions and Timestamp subclass cases for both getSingleValueType and getMultiValueType.
  • New ScalarTransformFunctionWrapperTest cases — testCountTrueBooleansTransformFunction and testSumTimestampMillisTransformFunction — exercise the new PRIMITIVE_BOOLEAN_ARRAY / TIMESTAMP_ARRAY dispatch end-to-end via test-scope @ScalarFunction helpers (countTrueBooleans(boolean[]), sumTimestampMillis(Timestamp[])) registered through FunctionRegistry's classpath scan.
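For orientation, the two test helpers named above plausibly have the shape below. This is a guess at their bodies and return types, which the PR text does not give. The @ScalarFunction annotation is omitted so the sketch compiles without Pinot on the classpath.

```java
import java.sql.Timestamp;

final class TestScalarFunctionsSketch {
  // Counts the true entries of a boolean[] argument (return type assumed).
  static int countTrueBooleans(boolean[] values) {
    int count = 0;
    for (boolean value : values) {
      if (value) {
        count++;
      }
    }
    return count;
  }

  // Sums the epoch-millisecond values of a Timestamp[] argument (return type assumed).
  static long sumTimestampMillis(Timestamp[] values) {
    long sum = 0;
    for (Timestamp value : values) {
      sum += value.getTime();
    }
    return sum;
  }
}
```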

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors Pinot type inference from exact Class<?> matching to value-based instanceof dispatch so runtime subclasses, especially JDBC Timestamp subclasses, resolve to the expected Pinot types. It also separates primitive vs boxed boolean-array handling and updates affected call sites/tests.

Changes:

  • Replaced Class<?>-based dispatch in PinotDataType and FunctionUtils with value-based inference and updated callers to pass values instead of .getClass().
  • Split boolean-array handling into PRIMITIVE_BOOLEAN_ARRAY (boolean[]) and BOOLEAN_ARRAY (Boolean[]), and added related conversion/map updates.
  • Added and updated tests for new dispatch behavior, including vendor Timestamp subclasses and new array mappings.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/DataTypeTransformerUtils.java | Switches transform-time source type inference to value-based dispatch. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/defaultcolumn/BaseDefaultColumnHandler.java | Infers derived-column output type from runtime values instead of classes. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/MapColumnPreIndexStatsCollector.java | Uses runtime values for map-entry stats type detection. |
| pinot-common/src/test/java/org/apache/pinot/common/utils/PinotDataTypeTest.java | Reworks PinotDataType tests for value-based dispatch and boolean-array split. |
| pinot-common/src/test/java/org/apache/pinot/common/function/FunctionUtilsTest.java | Adds coverage for getArgumentType, parameter types, and column data types. |
| pinot-common/src/test/java/org/apache/pinot/common/evaluator/InbuiltFunctionEvaluatorTest.java | Updates evaluator tests to use value-based argument type inference. |
| pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java | Implements runtime-value dispatch and new boxed/primitive boolean-array behavior. |
| pinot-common/src/main/java/org/apache/pinot/common/function/scalar/DataTypeConversionFunctions.java | Updates cast source-type resolution and fixes the array guard condition. |
| pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java | Replaces argument type lookup logic and extends type maps for new array forms. |
| pinot-common/src/main/java/org/apache/pinot/common/function/FunctionInvoker.java | Converts function arguments using value-derived Pinot types. |

codecov-commenter commented May 5, 2026

Codecov Report

❌ Patch coverage is 89.75904% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.59%. Comparing base (cedb6c6) to head (e55d282).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...a/org/apache/pinot/common/utils/PinotDataType.java | 85.98% | 8 Missing and 7 partials ⚠️ |
| .../apache/pinot/common/function/FunctionInvoker.java | 50.00% | 0 Missing and 1 partial ⚠️ |
| ...n/function/scalar/DataTypeConversionFunctions.java | 50.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18428      +/-   ##
============================================
- Coverage     63.59%   63.59%   -0.01%     
  Complexity     1717     1717              
============================================
  Files          3252     3252              
  Lines        199119   199132      +13     
  Branches      30857    30875      +18     
============================================
+ Hits         126627   126631       +4     
- Misses        62425    62427       +2     
- Partials      10067    10074       +7     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 63.59% <89.75%> (-0.01%) ⬇️ |
| temurin | 63.59% <89.75%> (-0.01%) ⬇️ |
| unittests | 63.58% <89.75%> (-0.01%) ⬇️ |
| unittests1 | 55.68% <87.95%> (+<0.01%) ⬆️ |
| unittests2 | 34.90% <31.32%> (-0.03%) ⬇️ |

Jackie-Jiang added a commit to Jackie-Jiang/pinot that referenced this pull request May 5, 2026
…L_ARRAY, guards

Restructures `JsonExtractScalarTransformFunction` for correctness, consistency, and
performance. Stacked on the PinotDataType / FunctionUtils refactor in the parent
commit; review only this commit on top of apache#18428.

Per-element coercion (the bug):
- The MV transform methods declared their result list as `List<Integer>` /
  `List<Long>` / etc. and cast `result.get(j)` directly. When the JsonPath resolved
  to elements of a different runtime type (e.g. `STRING_ARRAY` over a JSON array of
  numbers), the cast threw `ClassCastException`. Switched each MV result list to
  `List<Object>` and route per-element conversion through type-specific helpers.

Type-specific coercion helpers (shared by SV and MV):
- `toInt(value, isBoolean)` — `Number` → `intValue()`; for BOOLEAN result, follows
  Pinot's numeric convention (any non-zero `Number` → 1), `Boolean` → 1/0, and
  String forms via `BooleanUtils.toInt(String)`.
- `toLong(value, isTimestamp)` — `Number` → `longValue()`; for TIMESTAMP result,
  String forms parsed via `TimestampUtils.toMillisSinceEpoch` (ISO-8601 + numeric);
  otherwise `NumberUtils.parseJsonLong`.
- `toFloat`, `toDouble`, `toBigDecimal`, `toString` — straight `Number` /
  type-specific cast with `parse*(toString())` / `JsonUtils.objectToString`
  fallback.
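
The per-element coercion described above can be illustrated with a simplified sketch. `toInt` follows the rules listed in this message, but the String-to-boolean branch uses `Boolean.parseBoolean` as a stand-in for Pinot's `BooleanUtils.toInt(String)`, whose exact accepted forms are not shown here. `JsonCoercionSketch` and `toInts` are hypothetical names.

```java
import java.util.List;

final class JsonCoercionSketch {
  // Coerces one extracted JSON element to int.
  static int toInt(Object value, boolean isBoolean) {
    if (value instanceof Number) {
      Number number = (Number) value;
      if (isBoolean) {
        // Numeric convention from the message above: any non-zero Number maps to 1.
        return number.doubleValue() != 0 ? 1 : 0;
      }
      return number.intValue();
    }
    if (value instanceof Boolean) {
      return ((Boolean) value) ? 1 : 0;
    }
    String text = value.toString();
    if (isBoolean) {
      // Stand-in (assumption) for BooleanUtils.toInt(String).
      return Boolean.parseBoolean(text) ? 1 : 0;
    }
    return Integer.parseInt(text);
  }

  // Typing the list as List<Object> avoids the direct (Integer) cast that threw ClassCastException.
  static int[] toInts(List<Object> elements) {
    int[] result = new int[elements.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = toInt(elements.get(i), false);
    }
    return result;
  }
}
```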

New transform method:
- Added `transformToBigDecimalValuesMV`. `BIG_DECIMAL_ARRAY` previously fell through
  to the base class which can't extract from JSON.

Parser-context selection:
- `BIG_DECIMAL` and `STRING` SV/MV use `JSON_PARSER_CONTEXT_WITH_BIG_DECIMAL` to
  preserve full numeric precision and produce canonical-form string serialization.
  Numeric SV/MV stay on the default parser since narrowing to int / long / float /
  double yields equivalent results within double precision.
- New helper `getResultExtractorWithBigDecimal(valueBlock)` for the
  BigDecimal-parser path, mirroring the default `getResultExtractor`.

Stored-type guards on every transform method:
- All 12 SV/MV transform methods now guard with
  `_storedType != DataType.<X> ? super.transformTo*Values*V() : ...`. Closes the
  cross-type correctness hole where a caller asks for an int from a STRING-typed
  function — the base class now handles the conversion.

Default-value handling:
- `_defaultValue` is pre-converted to the canonical stored-type form once in
  `init()` (most types via the literal accessors; `BOOLEAN` literal stored as
  `Integer` 0 / 1 to match the `INT` storedType). Per-row default extraction is now
  a single direct cast at the top of each transform method, eliminating the
  `instanceof Number` / `parse*(toString())` chain that was repeated in every
  method.

Cached members:
- `_dataType` and `_storedType` cached as fields in `init()` so the transform
  methods avoid repeated `getDataType()` / `getStoredType()` invocations.

Tests:
- Added comprehensive coverage for the new behavior using `FluentQueryTest` with
  synthetic JSON: BOOLEAN coercion (Number / Boolean / String forms), TIMESTAMP
  coercion (numeric millis, ISO-8601, JDBC-format strings), STRING serialization
  for non-String JSON values, INT_ARRAY / STRING_ARRAY heterogeneous-element
  coercion, BIG_DECIMAL_ARRAY precision preservation, and the cross-type guard via
  the base class.
@Jackie-Jiang
Contributor Author

Summarizing replies to Copilot's review:

Re: Class<?> API removal (3 comments on getSingleValueType / getMultiValueType / getArgumentType)

Deliberate. The old Class<?>-based API was the source of the bug this PR fixes: exact-class matching against Timestamp.class silently miscategorized vendor JDBC subclasses (e.g. BigQuery Simba's TimestampTz) as OBJECT, which broke downstream conversion. A Class<?> shim that delegates to the new value-based form would either need to materialize a sentinel instance per Class<?> (impossible without per-class registration) or perpetuate the original bug. Passing the value directly at the existing call sites is the cleaner contract: every internal caller already had a value in hand and was paying .getClass() boilerplate to feed the Class-based API. The five call sites in OSS (FunctionInvoker, BaseDefaultColumnHandler, DataTypeConversionFunctions, MapColumnPreIndexStatsCollector, DataTypeTransformerUtils) all pass values now, and none lost functionality. The same reasoning applies to removing getDataType: it was unused and conceptually broken (it mapped Java array classes to the element-type DataType, losing the SV/MV distinction that FieldSpec tracks separately).

Re: BOOLEAN_ARRAY repurposing

Deliberate. Pre-PR, BOOLEAN_ARRAY used boolean[] storage while every other *_ARRAY type used the boxed form (INTEGER_ARRAY/LONG_ARRAY/etc.); the parallel PRIMITIVE_INT_ARRAY/PRIMITIVE_LONG_ARRAY already existed. This PR splits the boolean form to match: PRIMITIVE_BOOLEAN_ARRAY carries boolean[] (parallel to PRIMITIVE_INT_ARRAY), BOOLEAN_ARRAY carries Boolean[] (parallel to INTEGER_ARRAY). The naming is now uniform across all numeric types. Pinot's actual ingestion path goes through DataTypeTransformer.transform → BOOLEAN_ARRAY.convert(value, sourceType) → BOOLEAN_ARRAY.toInternal(...), and the new convert produces Boolean[], matching the new toInternal cast, so the chain stays internally consistent. Direct external use of PinotDataType.BOOLEAN_ARRAY.toInternal(boolean[]) is rare to nonexistent in the codebase.

Re: empty/all-null typed reference arrays in getArgumentType

The collapse to OBJECT_ARRAY for empty/all-null arrays is harmless. The cited concern about BaseDefaultColumnHandler.createDerivedColumnV1Indices doesn't materialize: outputValueType is only used to pick which to*Array method to call, and every to*Array default impl starts with an instanceof short-circuit on the destination boxed-array type:

public Integer[] toIntegerArray(Object value) {
  if (value instanceof Integer[]) return (Integer[]) value;
  ...
}

So if a later row produces a typed Integer[], OBJECT_ARRAY.toIntegerArray(Integer[]) (the inherited default) returns it as-is regardless of the cached outputValueType. Mixed-type arrays are handled by the per-element catch CCE → anyToInt(element) fallback inside the loop. The dispatch label and the conversion result are decoupled here — the label only steers method selection.
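
A fuller version of that pattern, with the elided body filled in as an assumption, might look like this. `anyToInt` is named in the comment above, but its body here is invented for illustration, as is the class name.

```java
final class ArrayShortCircuitSketch {
  static Integer[] toIntegerArray(Object value) {
    // Short-circuit: an already-typed Integer[] is returned as-is.
    if (value instanceof Integer[]) {
      return (Integer[]) value;
    }
    Object[] elements = (Object[]) value;
    Integer[] result = new Integer[elements.length];
    for (int i = 0; i < elements.length; i++) {
      try {
        result[i] = (Integer) elements[i];
      } catch (ClassCastException e) {
        // Mixed-type arrays: fall back to per-element conversion.
        result[i] = anyToInt(elements[i]);
      }
    }
    return result;
  }

  // Hypothetical body: Numbers are narrowed, anything else is parsed from its string form.
  static Integer anyToInt(Object element) {
    if (element instanceof Number) {
      return ((Number) element).intValue();
    }
    return Integer.parseInt(element.toString());
  }
}
```

The first branch is why the cached dispatch label does not affect the result for an already-typed array: the same reference comes back unchanged.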

Re: /// Javadoc on toInternal

/// is JEP 467 Markdown Javadoc — proper Javadoc, not a line comment, since Java 23. Pinot already uses this convention widely. Tooling (javadoc, IDEs, IntelliJ) renders it.
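
As a small illustration of the syntax (the class and method are made up for this example), a `///` documentation comment looks like this. Compilers older than Java 23 still accept it, but treat it as an ordinary line comment.

```java
final class MarkdownJavadocSketch {
  /// Returns the input unchanged.
  ///
  /// Documentation written with `///` is parsed as Markdown, so inline code
  /// such as `value` uses backticks rather than `{@code ...}`.
  ///
  /// @param value the value to return
  /// @return the same `value` reference
  static Object identity(Object value) {
    return value;
  }
}
```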

Re: null fallback test coverage (3 comments)

The API contract is non-null after this PR (note: @Nullable was dropped on the value parameter). The fact that null instanceof T evaluates to false for every type T, so the chain falls through to OBJECT/OBJECT_ARRAY, is an incidental implementation detail, not a documented contract. Adding getSingleValueType(null) / getMultiValueType(null) / getArgumentType(null) assertions would lock down behavior we don't intend to guarantee; callers with a null value should pre-check, not rely on OBJECT fallthrough.

xiangfu0 commented May 5, 2026

Do we need to handle PRIMITIVE_BOOLEAN_ARRAY or TIMESTAMP_ARRAY in ScalarTransformFunctionWrapper.getNonLiteralValues?

Replaces the Class-based map / chain dispatch in PinotDataType.getSingleValueType /
getMultiValueType and FunctionUtils.getArgumentType with an instanceof chain that
takes the value directly. Fixes the long-standing exact-class match bug where vendor
JDBC Timestamp subclasses (e.g. BigQuery Simba's TimestampTz) fell through to OBJECT
and broke downstream conversion.

Highlights:
- PinotDataType.getSingleValueType / getMultiValueType: take Object instead of Class,
  dispatch via instanceof chain in canonical Pinot type order. Always non-null
  (OBJECT / OBJECT_ARRAY for unrecognized).
- Split BOOLEAN_ARRAY into PRIMITIVE_BOOLEAN_ARRAY (boolean[]) and BOOLEAN_ARRAY
  (Boolean[]), parallel to PRIMITIVE_INT_ARRAY / INTEGER_ARRAY etc. Fixes the silent
  asymmetry where BOOLEAN_ARRAY stored as primitive while every other *_ARRAY was
  boxed.
- Rename toBooleanArray (returns boolean[]) to toPrimitiveBooleanArray; new
  toBooleanArray returns Boolean[]. Matches int / long / float / double naming.
- toObjectArray now handles boolean[] alongside int[] / long[] / float[] / double[].
- Reorder default to*Array methods to canonical Pinot type order.

FunctionUtils:
- getArgumentType: take Object, always non-null. Delegates SV to
  PinotDataType.getSingleValueType, MV reference arrays to
  PinotDataType.getMultiValueType (via element sampling), primitive arrays handled
  locally.
- Add boolean[] / Timestamp[] entries to PARAMETER_TYPE_MAP and COLUMN_DATA_TYPE_MAP.
- Remove unused DATA_TYPE_MAP and getDataType (zero callers; the map mapped Java
  array classes to the element-type DataType which lost the SV/MV distinction).

Callers updated to pass values instead of classes (drops .getClass() at every call
site): FunctionInvoker, BaseDefaultColumnHandler, DataTypeConversionFunctions,
MapColumnPreIndexStatsCollector, DataTypeTransformerUtils.

Tests:
- New FunctionUtilsTest covering getArgumentType / getParameterType /
  getColumnDataType, including vendor Timestamp subclass case and boolean[] /
  Timestamp[] additions.
- PinotDataTypeTest converted to value-based assertions in canonical order, added
  PRIMITIVE_BOOLEAN_ARRAY ↔ BOOLEAN_ARRAY cross-form conversions and Timestamp
  subclass cases for both getSingleValueType and getMultiValueType.
@Jackie-Jiang
Contributor Author

Do we need to handle PRIMITIVE_BOOLEAN_ARRAY or TIMESTAMP_ARRAY in ScalarTransformFunctionWrapper.getNonLiteralValues?

Good point. Added.

Jackie-Jiang force-pushed the pinot_data_type_dispatch branch from d5ab8da to e55d282 May 6, 2026 02:34
Jackie-Jiang added the ingestion and query labels May 6, 2026
Jackie-Jiang merged commit 4e40672 into apache:master May 6, 2026
11 checks passed
Jackie-Jiang deleted the pinot_data_type_dispatch branch May 6, 2026 07:41
Labels: bug, ingestion, query
