[Arrow] Extract ArrowToPinotTypeConverter shared utility#18632
[Arrow] Extract ArrowToPinotTypeConverter shared utility#18632real-mj-song wants to merge 1 commit into
Conversation
…TypeConverter Pure refactor — no behavior change. Move the schema-driven Arrow → Pinot value conversion (originally introduced in apache#18434) from ArrowRecordExtractor into a new public static utility ArrowToPinotTypeConverter so it can be reused by column-major ColumnReader implementations that wrap Arrow vectors. ArrowToPinotTypeConverter exposes a single entry point: public static Object toPinotValue(Field field, Object value, boolean extractRawTimeValues) with all per-type helpers (convertTimestamp / Date / Time / List / Map / Struct / byRuntimeType) reduced to private static methods. The extractRawTimeValues flag — previously an ArrowRecordExtractor instance field — is threaded through as a method parameter so the converter remains stateless. ArrowRecordExtractor.convert(Field, Object) is removed; the per-row dispatch in extract() now calls ArrowToPinotTypeConverter.toPinotValue() directly. All existing tests in pinot-arrow continue to pass unchanged: - ArrowRecordReaderTest (2 / 2) - ArrowRecordExtractorTest (54 / 54) - ArrowMessageDecoderTest (7 / 7) This precursor lets a follow-up ColumnReader implementation reuse the conversion without duplicating ~200 lines of schema-driven dispatch.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18632 +/- ##
=========================================
Coverage 64.40% 64.40%
+ Complexity 1137 1131 -6
=========================================
Files 3337 3337
Lines 206069 206069
Branches 32128 32128
=========================================
+ Hits 132710 132716 +6
+ Misses 62726 62723 -3
+ Partials 10633 10630 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
The single CI failure (Pinot Integration Test Set 1) is unrelated to this PR's diff. Failing case: Assertion diff:
The Recent |
TL;DR: Precursor refactor extracting Arrow → Pinot type-conversion logic from
ArrowRecordExtractorinto a new public utilityArrowToPinotTypeConverter. Behavior-preserving on the row-major path; enables a column-majorColumnReaderconsumer to reuse the same conversion in a follow-on PR.Tracks #18629. This is the precursor PR; a follow-on PR will add
ArrowColumnReaderFactory/ArrowColumnReaderon top of this branch and consume the extracted utility.What changed
ArrowToPinotTypeConverter— public static utility holding the schema-driven dispatch onArrowType.ArrowTypeIDpreviously inlined inArrowRecordExtractor. Returns canonical JDK types (StringfromUtf8/LargeUtf8unwrapped from Arrow'sText,Object[]forListvariants,LocalDate/LocalTime/Timestampfor temporal types,BigDecimalforDecimal, etc.).ArrowRecordExtractor.extractnow delegates toArrowToPinotTypeConverter.toPinotValue(field, value, extractRawTimeValues). The inlineconvert()and its helpers (convertTimestamp,convertDate,convertTime,convertList,convertMap,convertStruct,convertByRuntimeType,toEpochInUnit) are removed from the extractor.Testing
All 63 existing row-major tests pass unchanged:
ArrowRecordReaderTest(2 / 2)ArrowRecordExtractorTest(54 / 54)ArrowMessageDecoderTest(7 / 7)ArrowRecordExtractorTestexhaustively exercises every converter branch (every Arrow logical type, all temporal units, theextractRawTimeValuesbypass, list / struct / map recursion, dictionary encoding, nulls). The refactor green-bar on those tests proves behavior preservation.Compatibility
Purely additive on the SPI side. Public API surface change:
ArrowToPinotTypeConverteris a new public class; nothing existing is renamed or removed.References