[VL] useful Velox PRs not merged into upstream #11585

@FelixYBW

This issue tracks useful Velox PRs, mostly submitted by the Gluten community, that have not been merged upstream. An entry is removed automatically once its PR is merged. Comment your PR number below if you want it tracked here. We have not picked these PRs into gluten/velox because of the rebase effort; you may pick them yourself as needed.

Tracked PRs (2026-03-11)

Ansi

  • 16307: ✅ [PICKABLE] feat: Support decimal type for Spark checked_multiply function

Bug Fixes

  • 15751: 📕 [CLOSED] feat: Flush row group by buffered bytes in parquet writer
  • 13734: 📕 [CLOSED] fix: Support constant value for lead/lag function
  • 12684: 📕 [CLOSED] fix: Make type check reflect the corresponding logical type
  • 12630: 📕 [CLOSED] fix(sparksql): Fix result mismatch cases in casting varchar to timestamp
  • 12563: 📕 [CLOSED] fix: HashProbe load LazyVector before wrapping
  • 10925: 📕 [CLOSED] Fix invalid 'rawResultNulls_' in SelectiveDecimalColumnReader
  • 8825 : 📕 [CLOSED] Fix Spark split function
  • 8888 : 📕 [CLOSED] Fix NaN in Spark array_intersect and array_except functions
  • 15173: ✅ [PICKABLE] fix(parquet): Fix reading array of row
  • 8014 : 📕 [CLOSED] Fix adapters issue caused by null config in FileHandleGenerator
  • 14504: 📕 [CLOSED] fix: Fix incorrect null results when unrelated child fields are missing from the requested type in readers
  • 15534: ⚠️ [CONFLICT] fix: Trim numeric suffix when casting string to real/double
  • 15313: 📕 [CLOSED] fix(s3): Make the MinioServer instance only be initialized once
  • 14277: 📕 [CLOSED] fix: An unloaded lazy vector cannot be wrapped by two different top level vectors
  • 13138: 📕 [CLOSED] fix: Fix the full outer join result mismatch issue with multi duplicated match
  • 11772: 📕 [CLOSED] fix: Fix the MergeSource data lost issue
  • 11068: 📕 [CLOSED] Fix full outer result mismatch issue when output contains multiple matching rows
  • 10402: 📕 [CLOSED] Fix the "An unsupported nested encoding was found." exception in parquet writer
  • 13907: ✅ [PICKABLE] feat: Fix the full outer join result mismatch issue
  • 11771: ⚠️ [CONFLICT] fix: Fix smj result mismatch issue in semi, anti and full outer join
  • 15711: ✅ [PICKABLE] fix: Reduce memory spike in aggregate window functions
  • 15343: ✅ [PICKABLE] feat(parquet): Allow reading a wider integer as a narrower one
  • 16164: ✅ [PICKABLE] fix: Def/rep level calculation for legacy Parquet lists

Bug Fixes

  • 16511: ✅ [PICKABLE] fix: Check for corrupted repeat/define lengths in Parquet headers
  • 15953: ✅ [PICKABLE] fix: Use EvictingCacheMap for compiled regular expressions

Enhancement

  • 14472: ⚠️ [CONFLICT] feat: Support multi-threaded asynchronous data upload to object storage
  • 14214: 📕 [CLOSED] feat(parquet): Support page‑level pruning
  • 11285: 📕 [CLOSED] misc: Optimize the computation of sliding window kRange frame bound
  • 10638: 📕 [CLOSED] Distinguish null constant and non-null constant in simple function's initialize method
  • 9591 : 📕 [CLOSED] Support offset-based timezone
  • 8769 : 📕 [CLOSED] Enable UNKNOWN type in type dispatch
  • 15707: 📕 [CLOSED] feat: Enable the hash join to accept a pre-built hash table for joining
  • 13762: ✅ [PICKABLE] feat: Optimize nested loop other join types with small build side
  • 11808: ✅ [PICKABLE] feat: Add negated hugeint filters
  • 11740: ✅ [PICKABLE] feat: Support decimal schema evolution in Parquet scan
  • 11646: ⚠️ [CONFLICT] feat: Support row group skip for Parquet decimal
  • 5962 : ✅ [PICKABLE] feat: Support struct schema evolution matching by name
  • 5464 : 📕 [CLOSED] Add decimal type support for Spark first/last aggregate functions
  • 11836: 📕 [CLOSED] feat: Optimize serializer decompress buffer for BufferInputStream
  • 11824: 📕 [CLOSED] feat: Support prefix comparator in spill merge
  • 11703: 📕 [CLOSED] feat(prefix sort): Eliminate null byte from prefix encoder when single sort key
  • 11685: 📕 [CLOSED] feat: Support read file stream without buffer
  • 11954: 📕 [CLOSED] feat: Support Spark explode outer
  • 13862: 📕 [CLOSED] feat: Support spill write batch size limit
  • 10456: 📕 [CLOSED] Support semi projection join type in smj
  • 13041: ⚠️ [CONFLICT] feat: Enable the hash join to accept a pre-built hash table for joining
  • 11272: 📕 [CLOSED] Support string type for PrefixSort
  • 13817: 📕 [CLOSED] feat: Add zstd compression for unified compression API
  • 11206: 📕 [CLOSED] Supports serializing a range of rows for UnsafeRowFast
  • 7734 : 📕 [CLOSED] Fix Parquet writer to produce evenly-sized row groups
  • 15116: 📕 [CLOSED] feat: In the str_to_map Spark function, entryDelimiter and keyValueDelimiter are supported for more characters
  • 15751: 📕 [CLOSED] feat: Flush row group by buffered bytes in parquet writer
  • 15409: ✅ [PICKABLE] feat(spilling): Fallback to timsort when allocation of prefix sort buffer memory fails during spilling
  • 15290: ⚠️ [CONFLICT] perf(spilling): Support serializing rows to avoid extracting it as vector
  • 15848: ✅ [PICKABLE] feat: Allow subfield rename and deletion for ORC format
  • 15458: ⚠️ [CONFLICT] perf: Optimize basic numeric upcast
  • 15300: 📕 [CLOSED] feat: Add support for ORC writer
  • 16514: ✅ [PICKABLE] feat(sparksql): Support multi-character delimiters in str_to_map
  • 16547: ✅ [PICKABLE] perf(exec): Tiled column-major extraction for RowContainer
  • 16546: ✅ [PICKABLE] perf(exec): Combine low-selectivity filter results in HashProbe
  • 16545: ✅ [PICKABLE] perf(exec): Add AMAC prefetch optimization for listJoinResults

Iceberg

  • 14276: 📕 [CLOSED] feat(iceberg): Add Iceberg all functions

Json

  • 11433: 📕 [CLOSED] Fix JSON parser to allow control characters in JSON string input
  • 12892: 📕 [CLOSED] feat(sparksql): Support wildcard in json path for get_json_object function
  • 5179 : 📕 [CLOSED] Optimize get_json_object Spark function using simdjson
  • 6016 : 📕 [CLOSED] Reject duplicated keys in abstract join node
  • 14801: 📕 [CLOSED] fix: Minify JSON objects/arrays in Spark get_json_object

Regexp

  • 10279: 📕 [CLOSED] Introduce Hyperscan lib to implement regexp functions
  • 8387 : 📕 [CLOSED] Fix signature of regexp_replace Spark function and register it in Spark function registry

Spark Functions

  • 7555 : 📕 [CLOSED] Add date_format Spark function
  • 6296 : 📕 [CLOSED] Add SparkSQL url_decode function
  • 9719 : 📕 [CLOSED] Support allowPrecisionLoss in Spark decimal ops
  • 11304: 📕 [CLOSED] Register re-usable Presto date_trunc functions for Spark
  • 9714 : 📕 [CLOSED] Add session timezone getter
  • 7086 : 📕 [CLOSED] Support Spark array_union function
  • 7083 : 📕 [CLOSED] Add Spark quarter function
  • 12780: 📕 [CLOSED] feat: Add Spark to_pretty_string function
  • 12763: 📕 [CLOSED] feat: Add Spark make_dt_interval function
  • 12762: 📕 [CLOSED] feat: Add CAST(interval year month as integer)
  • 12521: 📕 [CLOSED] [Velox] Add Support for Day Time Interval Type
  • 12369: 📕 [CLOSED] feat: Add support for Timestamp to Integral for Spark
  • 12230: 📕 [CLOSED] feat: Add support for double to timestamp cast for Spark
  • 12229: 📕 [CLOSED] feat: Add Spark support to cast double to timestamp
  • 10788: 📕 [CLOSED] Add Spark split_part function
  • 8692 : 📕 [CLOSED] Add map_from_entries Spark function
  • 10359: 📕 [CLOSED] Add SparkSql function to_pretty_string
  • 12749: 📕 [CLOSED] feat: Register spark map_from_entries function
  • 11033: 📕 [CLOSED] Support sparksql approx_percentile
  • 10280: 📕 [CLOSED] feat: Register function for map_from_arrays for SparkSQL
  • 11114: 📕 [CLOSED] Support all patterns for Spark CAST(varchar as timestamp)
  • 12512: 📕 [CLOSED] fix(expr): Align cast from decimal to float/double with Spark and Presto
  • 4859 : 📕 [CLOSED] Add months_between Spark function
  • 4830 : 📕 [CLOSED] Add next_day Spark function
  • 10641: 📕 [CLOSED] Support overflow in Timestamp::toTimeZone method
  • 5419 : 📕 [CLOSED] Add substring_index spark function
  • 11126: 📕 [CLOSED] Skip overflow check for decimal add in agg function
  • 9272 : 📕 [CLOSED] Add normalize_nan Spark function
  • 8356 : 📕 [CLOSED] Add nanvl Spark function
  • 7204 : 📕 [CLOSED] Add corr Spark function

Labels: enhancement (New feature or request), tracker (Tracker of issues in the same category)
