Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrayFold function #34155

Closed
alexey-milovidov opened this issue Jan 30, 2022 · 6 comments · Fixed by #49794
Closed

arrayFold function #34155

alexey-milovidov opened this issue Jan 30, 2022 · 6 comments · Fixed by #49794
Labels

Comments

@alexey-milovidov
Copy link
Member

See

#21589
#23248
#27270

@zhiqiang-hhhh
Copy link

Hello,i want to pick this up

@alexey-milovidov
Copy link
Member Author

alexey-milovidov commented Feb 2, 2022

Ok, it's yours.

Just to let you know - it's not an easy task.
We have to implement "vertical" application of the folding function to the column with arrays.

  1. Filter arrays that contain more than zero elements and pick first elements from every array.
  2. Apply a function to the first elements of every arrays.
  3. Repeat (1) and (2) with the second elements and so forth.
  4. Now we have columns with the results for 0-element arrays, 1-element arrays, 2-element arrays and so forth. Gather the results from these columns into the final resulting column.

@amosbird Maybe you can help @roanhe-ts with this task?

@alexey-milovidov alexey-milovidov added the st-community-taken External developer is working on that label Feb 2, 2022
@zhiqiang-hhhh
Copy link

zhiqiang-hhhh commented Feb 3, 2022

All right ...., I thought it was an easy task ...
So, is there any time limitation? I will consider the complexity with my capability in case of overdue too much.

@alexey-milovidov

@alexey-milovidov
Copy link
Member Author

No time limitation. You can spend several months if you want 👍
It should be doable maybe in a week if someone will help.

@zhiqiang-hhhh
Copy link

It should be doable maybe in a week if someone will help.

OK, I will try my best.

@zhiqiang-hhhh
Copy link

zhiqiang-hhhh commented Feb 14, 2022

@alexey-milovidov hello,I met some problems during this task, could you please help me?

First, according to definition of arrayFold, we can have pseudocode like bellow:

let acc = init
for (array in array_vector)
{
    acc = lambda(array, acc)
}
return acc;
  • acc need to be updated every time we call lambda expression

Every time, we got acc of N-element arrays, we need to eliminate some of them from acc column, and construct a new acc column. Because the eliminated acc just got its final result, no more N+1-element arrays can be calculated with them.

May be a specific example could help discuss. Say, we have a table foldDemo like this:

array1(Array) array2(Array)
[1,2,3,4] [10,20,30,40]
[4] [40]
[21,2] [10,20]
[11,20,30] [11,2,3]

the query is like:

select arrayFold((x, y, acc) -> (x + acc.1, y + acc.2), array1, array2, (0,0)) from foldDemo
  1. Filter all arrays which has at least 1 elements & calculate accumulator of 1-element array, we got
accumulator
1,10
4,40
21,10
11,11
  1. We need to resize column accumulator because row NO.2 does not have 2-element array. So the input arguments of lambda expression in next loop is
array1 array2 accumulator
2 20 (1,10)
2 20 (21,10)
20 2 (11,11)

No need to continue the processing.
In step2 on above, the resize of column accumulator is expensive (tuple copy), and we need an ordered map to store the eliminated acc tuple (4,40), or we have to insert it to the row NO.2 of result column, which is also expensive.

The problem is:

  1. whether the above steps are right? Are there any better way to Gather the results from these columns into the final resulting column.
  2. Are there any possibility to avoid resize input of lambda expression? For example we use an NULL-like array to fill the position so than lambda(null-like-array, acc) still equals to acc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants