Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: implement JSON{,B}_{OBJECT,}_AGG #19978

Closed
justinj opened this issue Nov 10, 2017 · 12 comments · Fixed by #48306
Closed

sql: implement JSON{,B}_{OBJECT,}_AGG #19978

justinj opened this issue Nov 10, 2017 · 12 comments · Fixed by #48306
Labels
A-sql-builtins SQL built-in functions and semantics thereof. A-sql-json JSON handling in SQL. A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-starter Might be suitable for a starter project for new employees or team members. good first issue

Comments

@justinj
Copy link
Contributor

justinj commented Nov 10, 2017

I think this would be an ambitious, but doable first issue.

There's a handful of things that need to happen for this:

  • We need a public interface from pkg/util/json to incrementally construct a
    JSON object/array. In the case of objects, since their keys need to be
    sorted, we would want to do something like push the new keys into a priority
    queue (the Go stdlib has a heap implementation). This should be tested
    independently from SQL-land. I think this will probably be useful outside of
    these aggregates so it should be a somewhat general interface.
  • Add the aggregates in aggregate_builtins.go that use these new constructors.
    These will need be careful to monitor their memory usage, see the
    implementation of ARRAY_AGG for reference.
  • Test the aggregates in SQL-land. Care should be taken to compare the
    semantics of what is implemented with what Postgres does (handling of NULL,
    duplicate keys, etc.).

This needs to happen for the next major release so I'll get to it if nobody is
interested in picking it up.

@justinj justinj added this to the 1.2 milestone Nov 10, 2017
@justinj justinj changed the title sql: implement JSON{,B}_{OBJECT,ARRAY}_AGG sql: implement JSON{,B}_{OBJECT,}_AGG Dec 5, 2017
@heyihong
Copy link
Contributor

heyihong commented Dec 6, 2017

Hi @justinj, is the issue being worked on? I would like to pick it up if possible.

@justinj
Copy link
Contributor Author

justinj commented Dec 6, 2017

@heyihong it's not being worked on, you're free to pick it up! You're probably a good person to work on it anyway since it requires your work on #20234 I believe :)

@heyihong
Copy link
Contributor

JSON_OBJECT_AGG requires support of multiple arguments in aggregation function. Related to #10495

@justinj
Copy link
Contributor Author

justinj commented Dec 15, 2017

I think that issue only relates to aggregates which have an argument which is not aggregated (say, for STRING_AGG), but I think since JSON_OBJECT_AGG aggregates over both inputs we should be able to support that type signature (though I haven't exhaustively checked). For example FINAL_VARIANCE (which is used internally) aggregates over three inputs.

@heyihong
Copy link
Contributor

heyihong commented Jan 3, 2018

I felt that aggregating keys and values separately and combining them finally to build json object seems to be a short-term solution, compared with supporting aggregate functions with secondary arguments. Is it ok to add the secondary arguments support instead?

@justinj
Copy link
Contributor Author

justinj commented Jan 8, 2018

@heyihong Sorry if I was unclear - we support all the features required to implement this. What we don't support is having another argument which is not aggregated over, like in STRING_AGG. I believe this feature should be implementable without modifying the aggregate function system.

@heyihong
Copy link
Contributor

heyihong commented Jan 31, 2018

@justinj According to the hint you gave, the way I could find is to aggregate keys and values separately and then combine them by FinalRendering. If that's the way we should implement, I still doubt that we may need to change the aggregate function system a little bit. The reason is that un like FINAL_VARIANCE , json_object_agg is order sensitive(i.e. we should avoid local stage aggregation) if rows are sorted. The current system seems to not support skipping local aggregate stage while keeping the final aggregate stage and final rendering in dist aggregation specification.

@justinj
Copy link
Contributor Author

justinj commented Feb 8, 2018

Hey @heyihong, I talked to a coworker who knows a bit more about distributed execution - he says there should be a way to do what is needed, but you might need to, worst-case, add a "passthrough" aggregator that does nothing and configure that as the local aggregator step. Does that make any sense?

@bdarnell bdarnell modified the milestones: 2.0, 2.1 Feb 8, 2018
@heyihong
Copy link
Contributor

Yes, it makes sense. Thanks for the explanation.

@knz knz added the A-sql-builtins SQL built-in functions and semantics thereof. label Apr 28, 2018
@knz knz added this to Triage in (DEPRECATED) SQL Front-end, Lang & Semantics via automation Apr 28, 2018
@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-json JSON handling in SQL. A-sql-pgcompat Semantic compatibility with PostgreSQL labels Apr 28, 2018
@knz knz moved this from Triage to Backlog in (DEPRECATED) SQL Front-end, Lang & Semantics May 3, 2018
@knz knz modified the milestones: 2.1, 2.2 Aug 30, 2018
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
@jordanlewis jordanlewis added the E-starter Might be suitable for a starter project for new employees or team members. label Apr 24, 2019
@C0rWin
Copy link
Contributor

C0rWin commented Apr 22, 2020

it seems to be stalled for a while, I would like to give it a try if no one minds :)

@jordanlewis
Copy link
Member

Far from it, @C0rWin! We'd love to see this get done. Please follow up with questions if you have them.

@C0rWin
Copy link
Contributor

C0rWin commented May 1, 2020

@jordanlewis I've created draft PR #48306 still missing unit test and most likely missed something else, hence would appreciate any suggestions or comments for improvement. More specifically would be glad to get some guidelines regarding unit testing.

@craig craig bot closed this as completed in 96feb13 Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-builtins SQL built-in functions and semantics thereof. A-sql-json JSON handling in SQL. A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-starter Might be suitable for a starter project for new employees or team members. good first issue
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

7 participants