Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pointwise operations on arrays #49939

Closed
gingerwizard opened this issue May 17, 2023 · 4 comments · Fixed by #52625
Closed

Support pointwise operations on arrays #49939

gingerwizard opened this issue May 17, 2023 · 4 comments · Fixed by #52625
Labels
feature warmup task The task for new ClickHouse team members. Low risk, moderate complexity, no urgency.

Comments

@gingerwizard
Copy link
Contributor

Currently to do pointwise operations on arrays we have to use ArrayMap. This is unnecessarily verbose e.g.

SELECT url, caption, 
L2Distance(array_column, 
    arrayMap((x,y) -> x+y,
        arrayMap((x,y) -> x-y, [1,2,3,4], [1,2,3,4]), 
        arrayMap((x,y) -> x+y, [5,6,7,4], [1,2,3,4])
    )
) AS score FROM laion_10m ORDER BY score ASC LIMIT 10

Propose that + and - on arrays perform pointwise operations. Currently not supported.

clickhouse-cloud :) select [1,2]+[3,4]

SELECT [1, 2] + [3, 4]

Query id: dbf44020-0866-4a29-ad58-7a8b8221fa2a

0 rows in set. Elapsed: 0.220 sec.

Received exception from server (version 23.3.1):
Code: 43. DB::Exception: Received from cbupclfpbv.us-east-2.aws.clickhouse-staging.com:9440. DB::Exception: Illegal types Array(UInt8) and Array(UInt8) of arguments of function plus: While processing [1, 2] + [3, 4]. (ILLEGAL_TYPE_OF_ARGUMENT)

Usecase is ClickHouse for vector operations.

@canhld94
Copy link
Contributor

I think '+' and '-' on array is a little ambiguous. Better to have explicits function like arrayAdd(numeric_arr, arg) and arraySubtract(numeric_arr, arg) where arg can be a scalar or an array of numeric type with same length.

@alexey-milovidov alexey-milovidov added the warmup task The task for new ClickHouse team members. Low risk, moderate complexity, no urgency. label Jun 27, 2023
@alexey-milovidov
Copy link
Member

There are the following caveats:

  1. + on arrays can be confused with concatenation.

But we have a dedicated operator for concatenation: || that applies to strings and arrays. It is ANSI SQL standard.
Although MS SQL Server uses + for concatenation.

Many scripting languages use + for array concatenation. Update: I tried to check this statement, and failed multiple times:

$ node -e 'console.log([1, 2, 3] + [4, 5])'
1,2,34,5

WAT?

$ perl -e 'print [1, 2, 3] + [4, 5]'
187896031848960

WTF??

$ php -r 'print_r([1, 2, 3] + [4, 5]);'
Array
(
    [0] => 1
    [1] => 2
    [2] => 3
)

Shit.

$ python -c 'print([1, 2, 3] + [4, 5])'
[1, 2, 3, 4, 5]

👍

@alexey-milovidov
Copy link
Member

  1. If you overload + and - for arrays, the users might expect that the other arithmetic operations, like *, / also have meaning for arrays. But for multiplication, it's unclear - should it be pointwise multiplication (that does not have obvious geometric sense), or scalar product (that differs in the return type from + and -), or only apply to vector and scalar (at least it makes sense).

@alexey-milovidov
Copy link
Member

Let's overload + and - for arrays. It is easy to implement, improves usability, and does not introduce much confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature warmup task The task for new ClickHouse team members. Low risk, moderate complexity, no urgency.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants