
feat: add vector distance and array math functions #21371

Open
crm26 wants to merge 3 commits into apache:main from crm26:feat/vector-distance-functions


crm26 commented Apr 4, 2026

Summary

Adds vector distance and array math functions to datafusion-functions-nested, enabling vector search and array algebra in standard SQL.

-- Vector search: find nearest neighbors by cosine distance
SELECT id, cosine_distance(embedding, ARRAY[0.1, 0.2, ...]) as dist
FROM documents ORDER BY dist LIMIT 10

-- Array math
SELECT array_normalize(embedding) FROM documents
SELECT array_add(vec_a, vec_b) FROM t
SELECT array_scale(embedding, 2.0) FROM documents
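Conceptually, the nearest-neighbor query above scores every row against the query vector and keeps the k smallest distances. The Rust sketch below illustrates that on plain slices. `cosine_distance` and `nearest_neighbors` here are hypothetical helpers, not the PR's Arrow-based kernels, and returning `None` for mismatched lengths or a zero-magnitude vector is an assumption.

```rust
/// Cosine distance: 1 - (a . b) / (|a| * |b|).
/// Assumption: None for mismatched lengths or a zero-magnitude vector.
fn cosine_distance(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}

/// Rough analogue of `ORDER BY dist LIMIT k`: score each row against the
/// query, sort ascending by distance, and keep the first k (id, distance) pairs.
/// Rows whose distance is undefined are dropped, a simplification of how SQL
/// would order NULLs.
fn nearest_neighbors(rows: &[(u32, Vec<f64>)], query: &[f64], k: usize) -> Vec<(u32, f64)> {
    let mut scored: Vec<(u32, f64)> = rows
        .iter()
        .filter_map(|(id, emb)| cosine_distance(emb, query).map(|d| (*id, d)))
        .collect();
    scored.sort_by(|x, y| x.1.total_cmp(&y.1));
    scored.truncate(k);
    scored
}
```

With the query vector [1.0, 0.1], the rows [1.0, 0.0] and [1.0, 1.0] rank ahead of [0.0, 1.0], so a limit of 2 returns the first two.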

Functions

| Function | Returns | Description |
|---|---|---|
| cosine_distance(a, b) | float64 | 1 - cosine similarity |
| inner_product(a, b) | float64 | Dot product |
| array_normalize(a) | list(float64) | Unit vector |
| array_add(a, b) | list(float64) | Element-wise addition |
| array_subtract(a, b) | list(float64) | Element-wise subtraction |
| array_scale(a, f) | list(float64) | Scalar multiplication |
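The three element-wise list functions in the table reduce to simple per-element loops. This is a hedged, slice-level sketch: the bodies and the error on mismatched lengths are assumptions (the PR's tests cover mismatched lengths, but the exact behavior is not described here), and NULL handling is omitted.

```rust
/// Hypothetical length check shared by the two-argument functions.
fn check_same_len(name: &str, a: &[f64], b: &[f64]) -> Result<(), String> {
    if a.len() != b.len() {
        return Err(format!(
            "{}: arrays have different lengths ({} vs {})",
            name,
            a.len(),
            b.len()
        ));
    }
    Ok(())
}

/// Element-wise addition: out[i] = a[i] + b[i].
fn array_add(a: &[f64], b: &[f64]) -> Result<Vec<f64>, String> {
    check_same_len("array_add", a, b)?;
    Ok(a.iter().zip(b).map(|(x, y)| x + y).collect())
}

/// Element-wise subtraction: out[i] = a[i] - b[i].
fn array_subtract(a: &[f64], b: &[f64]) -> Result<Vec<f64>, String> {
    check_same_len("array_subtract", a, b)?;
    Ok(a.iter().zip(b).map(|(x, y)| x - y).collect())
}

/// Scalar multiplication: out[i] = a[i] * factor.
fn array_scale(a: &[f64], factor: f64) -> Vec<f64> {
    a.iter().map(|x| x * factor).collect()
}
```

For example, array_add([1.0, 2.0], [3.0, 4.0]) yields [4.0, 6.0], and array_scale([1.0, -2.0], 2.0) yields [2.0, -4.0].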

All functions have list_* aliases; inner_product is also aliased as dot_product.

Design

Shared primitives in vector_math.rs:

  • dot_product_f64(a, b) — used by inner_product and cosine_distance
  • magnitude_f64(a) — used by cosine_distance and array_normalize
  • sum_of_squares_f64(a) — used by magnitude_f64
  • convert_to_f64_array(a) — shared with existing array_distance
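A minimal sketch of how the shared primitives could compose into the higher-level functions, assuming plain `&[f64]` slices rather than the Arrow arrays the real code works on. Only the primitive names come from the PR; the signatures, the caller-side length check, and returning `None` for a zero-magnitude input are assumptions.

```rust
/// Sum of element-wise products. Callers are assumed to have checked that
/// the lengths match; `zip` would otherwise silently truncate.
fn dot_product_f64(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Sum of squared elements.
fn sum_of_squares_f64(a: &[f64]) -> f64 {
    a.iter().map(|x| x * x).sum()
}

/// Euclidean (L2) norm, built on sum_of_squares_f64.
fn magnitude_f64(a: &[f64]) -> f64 {
    sum_of_squares_f64(a).sqrt()
}

/// inner_product is the dot product itself.
fn inner_product(a: &[f64], b: &[f64]) -> f64 {
    dot_product_f64(a, b)
}

/// cosine_distance reuses both dot_product_f64 and magnitude_f64.
/// Assumption: None when either magnitude is zero (the quotient is undefined).
fn cosine_distance(a: &[f64], b: &[f64]) -> Option<f64> {
    let denom = magnitude_f64(a) * magnitude_f64(b);
    if denom == 0.0 {
        return None;
    }
    Some(1.0 - dot_product_f64(a, b) / denom)
}

/// array_normalize divides every element by the magnitude.
/// Assumption: None for the zero vector, which has no unit direction.
fn array_normalize(a: &[f64]) -> Option<Vec<f64>> {
    let m = magnitude_f64(a);
    if m == 0.0 {
        return None;
    }
    Some(a.iter().map(|x| x / m).collect())
}
```

Factoring the code this way means magnitude_f64 is written once and shared by cosine_distance and array_normalize, which is the reuse the list above describes.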

The duplicate convert_to_f64_array in the existing distance.rs is consolidated into the shared module.

The new functions follow the same pattern as the existing array_distance function: the same signature style, coerce_types, null handling, and type support (Float32, Float64, Int32, Int64, FixedSizeList, LargeList, List).
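To illustrate the coercion and null-handling pattern, here is a hypothetical sketch. `NumericList` stands in for the supported element types, `convert_to_f64` plays the role described for `convert_to_f64_array` (widening everything to Float64), and `inner_product_nullable` assumes NULL propagation (a NULL argument yields a NULL result) plus an error on mismatched lengths. The list container types (FixedSizeList, LargeList, List) are not modeled.

```rust
/// Hypothetical stand-in for the numeric element types listed above.
#[derive(Debug, Clone)]
enum NumericList {
    Float32(Vec<f32>),
    Float64(Vec<f64>),
    Int32(Vec<i32>),
    Int64(Vec<i64>),
}

/// Widens every element to f64, the common type the computation runs in.
/// Note: i64 values beyond 2^53 in magnitude lose precision in the cast.
fn convert_to_f64(list: &NumericList) -> Vec<f64> {
    match list {
        NumericList::Float32(v) => v.iter().map(|&x| f64::from(x)).collect(),
        NumericList::Float64(v) => v.clone(),
        NumericList::Int32(v) => v.iter().map(|&x| f64::from(x)).collect(),
        NumericList::Int64(v) => v.iter().map(|&x| x as f64).collect(),
    }
}

/// NULL in, NULL out: Ok(None) if either argument is NULL (None).
/// Assumption: mismatched lengths are an error rather than NULL.
fn inner_product_nullable(
    a: Option<&NumericList>,
    b: Option<&NumericList>,
) -> Result<Option<f64>, String> {
    let (a, b) = match (a, b) {
        (Some(a), Some(b)) => (convert_to_f64(a), convert_to_f64(b)),
        _ => return Ok(None),
    };
    if a.len() != b.len() {
        return Err(format!("different lengths: {} vs {}", a.len(), b.len()));
    }
    Ok(Some(a.iter().zip(&b).map(|(x, y)| x * y).sum()))
}
```

Coercing both sides to f64 up front lets a single kernel serve every input type, at the cost of the precision caveat for very large Int64 values.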

Tests

79 tests covering normal inputs, null handling, zero vectors, orthogonal vectors, empty arrays, Float32/Float64 inputs, mismatched lengths, and the vector search ranking pattern. Sqllogictest coverage is in vector_functions.slt. Clippy clean.
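The expected values in the orthogonal-vector tests follow directly from the definition of cosine distance. For nonzero a and b, the Cauchy-Schwarz inequality bounds the cosine similarity to [-1, 1], so the distance lies in [0, 2]:

```latex
d_{\cos}(a, b) = 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert},
\qquad \lVert a \rVert = \sqrt{\sum_i a_i^2}

d_{\cos}(a, b) =
\begin{cases}
0 & \text{if } b = c\,a \text{ for some } c > 0 \\
1 & \text{if } a \cdot b = 0 \text{ (orthogonal)} \\
2 & \text{if } b = c\,a \text{ for some } c < 0
\end{cases}
\qquad (a, b \neq 0)
```

For a zero or empty vector, the sum is empty or all zeros, so the norm is 0; the cosine quotient becomes 0/0 and a / ||a|| (array_normalize) is likewise undefined. The description does not say which value the functions return in that case, which is why those inputs get dedicated tests.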

crm26 and others added 2 commits April 4, 2026 16:24
Add 6 new scalar functions to datafusion-functions-nested:
- cosine_distance(array, array) — cosine distance (1 - cosine similarity)
- inner_product(array, array) — dot product
- array_normalize(array) — L2 unit normalization
- array_add(array, array) — element-wise addition
- array_subtract(array, array) — element-wise subtraction
- array_scale(array, float) — scalar multiplication

Shared math primitives (dot_product, magnitude, sum_of_squares) extracted
into vector_math.rs to avoid duplication across functions.

Includes aliases (list_*, dot_product), 29 unit tests, and a sqllogictest
file with vector search pattern coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds cosine_distance, inner_product, array_normalize, array_add,
array_subtract, and array_scale to datafusion-functions-nested.

Shared primitives in vector_math.rs (dot_product_f64, magnitude_f64,
sum_of_squares_f64, convert_to_f64_array) are reused across all
functions and the existing array_distance. Consolidates the duplicate
convert_to_f64_array from distance.rs into the shared module.

Functions:
  cosine_distance(a, b) → float64    (aliases: list_cosine_distance)
  inner_product(a, b) → float64      (aliases: list_inner_product, dot_product)
  array_normalize(a) → list(float64) (aliases: list_normalize)
  array_add(a, b) → list(float64)    (aliases: list_add)
  array_subtract(a, b) → list(float64) (aliases: list_subtract)
  array_scale(a, f) → list(float64)  (aliases: list_scale)

Enables vector search in standard SQL:
  SELECT id, cosine_distance(embedding, ARRAY[0.1, 0.2, ...]) as dist
  FROM documents ORDER BY dist LIMIT 10

79 tests, sqllogictest coverage, clippy clean.
github-actions bot added the labels sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) on Apr 4, 2026