Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GROUP BY implementation #3494

Merged
merged 4 commits into from
Dec 29, 2015
Merged

GROUP BY implementation #3494

merged 4 commits into from
Dec 29, 2015

Conversation

dt
Copy link
Member

@dt dt commented Dec 18, 2015

Review on Reviewable

@CLAassistant
Copy link

CLA assistant check
All committers have accepted the CLA.

@petermattis
Copy link
Collaborator

Nice.


Review status: 0 of 3 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


sql/group.go, line 38 [r1] (raw file):
s/addRenderer/addRender/g


sql/group.go, line 57 [r1] (raw file):
I think it would be safer to check for err != nil first. Otherwise, you might return an error but set aggregation to false and thus accidentally ignore the error.


sql/group.go, line 70 [r1] (raw file):
s/just return/returns/g


sql/group.go, line 84 [r1] (raw file):
I thought the aggregation functions correctly handled no rows being fed to them. Maybe this works differently with grouping.


sql/group.go, line 127 [r1] (raw file):
s/currnetBucket/currentBucket/g


sql/group.go, line 177 [r1] (raw file):
Add a TODO (or file an issue) that we can avoid bucketing if groupedValues are in the correct order. This will require changing desiredAggregateOrdering similar to how the desired ordering is computed for ORDER BY.


sql/group.go, line 178 [r1] (raw file):
s/values/values./g


sql/testdata/aggregate, line 301 [r1] (raw file):
Does FUNC(DISTINCT ...) interact with GROUP BY?


Comments from the review on Reviewable.io

@dt
Copy link
Member Author

dt commented Dec 18, 2015

Review status: 0 of 3 files reviewed at latest revision, 4 unresolved discussions.


sql/group.go, line 57 [r1] (raw file):
So right now, err should be ignored if aggregation is false: it is non-nil if any qvalues aren't aggregated, but that is only a real error condition if aggregation is in use. It might make sense to filter it inside checkAggregateExprs ?


sql/group.go, line 84 [r1] (raw file):
it isn't the functions themselves but rather the rendering of rows that needs this:
when not grouping, postgres returns a single row of nulls when nothing is aggregated -- when grouping, it returns 0 rows. This flag is to force the single empty row to be inserted in that first case.


sql/group.go, line 177 [r1] (raw file):
Yeah, ORDER BY doesn't work at all right now with group -- I was going to look at that next. I'll file an issue if there isn't one.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

Review status: 0 of 3 files reviewed at latest revision, 4 unresolved discussions.


sql/group.go, line 177 [r1] (raw file):
I'm referring to something else here: we can avoid using the bucket map if we are able to scan the rows and have them ordered by the group by key. For example, if you are grouping on the primary key and are scanning the primary index the rows will already be grouped together.


Comments from the review on Reviewable.io

@dt
Copy link
Member Author

dt commented Dec 18, 2015

Review status: 0 of 3 files reviewed at latest revision, 4 unresolved discussions.


sql/group.go, line 177 [r1] (raw file):
Ah, I'll add a TODO in computeAggregates mentioning that.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions.


sql/group.go, line 57 [r1] (raw file):
Yeah. As it is written, this looks error prone (even if it isn't).


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Dec 18, 2015

Reviewed 2 of 3 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 15 unresolved discussions.


sql/group.go, line 38 [r2] (raw file):
s/ them/ to them/


sql/group.go, line 55 [r2] (raw file):
I'd put this description (explaining how this works) in the method comment on checkAggregateExprs instead of here


sql/group.go, line 67 [r2] (raw file):
put this inline above?


sql/group.go, line 73 [r2] (raw file):
s/function/functions/


sql/group.go, line 96 [r2] (raw file):
Nice.


sql/group.go, line 118 [r2] (raw file):
delegate this to values?


sql/group.go, line 127 [r2] (raw file):
is this a pointer so that you can tell the difference between initialized and uninitialized? add a comment.


sql/group.go, line 164 [r2] (raw file):
instead of this (allocation), use var scratch [64]byte, then pass scratch[:0] below.


sql/group.go, line 278 [r2] (raw file):
nit: this doesn't mutate the receiver - call it something like new() rather than init


sql/group.go, line 327 [r2] (raw file):
Nice.


sql/group.go, line 399 [r2] (raw file):
Might be worth a TODO here, but this comment is great.


sql/group.go, line 471 [r2] (raw file):
can this ever happen? when?


sql/group.go, line 528 [r2] (raw file):
why is an uninitialized identAggregate equivalent to one initialized with NULL? add a comment.


sql/testdata/aggregate, line 301 [r1] (raw file):
Also, add some more complex error cases.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 15 unresolved discussions.


sql/group.go, line 164 [r2] (raw file):
Doing so will be equivalent as the array will be heap allocated. One advantage to the current approach is that we could do: scratch = encoded[0:0] so that we take advantage of any growth in the slice on subsequent iterations. Doing that, we could actually initialize scratch as var scratch []byte.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Dec 18, 2015

Review status: all files reviewed at latest revision, 15 unresolved discussions.


sql/group.go, line 164 [r2] (raw file):
Sounds good; let's do that then (var scratch []byte). For reference, from @petermattis via slack:

The only time an array stays on the stack is if it is used solely within a function or within functions that have been inlined.


Comments from the review on Reviewable.io

@dt dt force-pushed the dt.groupby branch 4 times, most recently from d820e82 to 0975a50 Compare December 18, 2015 23:44
@dt
Copy link
Member Author

dt commented Dec 18, 2015

Review status: 2 of 3 files reviewed at latest revision, 10 unresolved discussions.


sql/group.go, line 57 [r1] (raw file):
I renamed the bound err to invalidAgg to make it clearer what it is and why it isn't checked first. Also changed body of checkAggregateExprs to only return a non-nil error when aggregated is also true.


sql/group.go, line 55 [r2] (raw file):
There's a longer-form comment on it already, so I just shorted this one to reduce duplication.


sql/group.go, line 118 [r2] (raw file):
I was doing that before but undid it when I started deferring initialization of values until Next calls computeAggregates, which made this NPE since Columns is called earlier.


sql/group.go, line 278 [r2] (raw file):
renamed it create (new is a reserved word, right?)


sql/group.go, line 399 [r2] (raw file):
put one below: // TODO(davidt): consider other ways of comparing expression trees.


sql/group.go, line 471 [r2] (raw file):
It happens in the case where a bucket is rendered without ever having a row added, eg SELECT MAX(a) FROM t WHERE false; is supposed to render 1 row (of NULL).


sql/group.go, line 528 [r2] (raw file):
ah, i don't think it ever is uninitialized actually once I got everything else working correctly -- i'll take this out for now.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Dec 19, 2015

Reviewed 1 of 1 files at r3.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


sql/group.go, line 57 [r1] (raw file):
Can you name it invalidAggErr?


sql/group.go, line 118 [r2] (raw file):
What if you also called computeAggregates on a call to Columns?


sql/group.go, line 278 [r2] (raw file):
not sure, but create is fine.


Comments from the review on Reviewable.io

@tbg
Copy link
Member

tbg commented Dec 19, 2015

Review status: all files reviewed at latest revision, 12 unresolved discussions.


sql/group.go, line 141 [r3] (raw file):
doesn't really matter, but you have used TODO(dt) elsewhere in this PR.


sql/group.go, line 325 [r3] (raw file):
why can't this happen?


sql/group.go, line 368 [r3] (raw file):
what speaks against naming this err? I expect a bool with the current name.


sql/group.go, line 403 [r3] (raw file):
Can you add to the comment a simple example of something that would be valid and something that wouldn't?


sql/group.go, line 518 [r3] (raw file):
is a line var _ aggregateImpl = &identAggregate missing? Also, I'm not sure what the "ident aggregate" is.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Dec 19, 2015

Review status: all files reviewed at latest revision, 12 unresolved discussions.


sql/group.go, line 325 [r3] (raw file):
No aggregate functions take more than one argument. Type checking has already run on these expressions, which would have returned type errors for aggregate function calls with incorrect arities; if an aggregate function of the wrong arity gets here, something has gone really wrong.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 11 unresolved discussions.


sql/group.go, line 57 [r1] (raw file):
Ditto.


Comments from the review on Reviewable.io

@dt
Copy link
Member Author

dt commented Dec 21, 2015

Review status: 2 of 3 files reviewed at latest revision, 10 unresolved discussions.


sql/group.go, line 118 [r2] (raw file):
hmm, might callers be surprised if Columns() started being expensive / having side-effects? I could also just go back to initializing values earlier (so this could delegate to it) and bring back the explicit bool indicating whether the rows had been initialized.


sql/group.go, line 141 [r3] (raw file):
whoops, old habits. fixed.


sql/group.go, line 325 [r3] (raw file):
Just copy-pasted that into a comment to the same effect.


sql/group.go, line 368 [r3] (raw file):
Since it is only meaningful, and thus only checked, when aggregation is being used, I wanted to avoid a name that suggested it could be used to return arbitrary errors, since otherwise those might mistakenly be ignored when aggregation is not in use.


sql/group.go, line 518 [r3] (raw file):
Added the interface check and a comment explaining what it is and why we need it.


Comments from the review on Reviewable.io

@tbg
Copy link
Member

tbg commented Dec 21, 2015

Review status: 2 of 3 files reviewed at latest revision, 10 unresolved discussions.


sql/group.go, line 368 [r3] (raw file):
that makes sense. aggrErr?


Comments from the review on Reviewable.io

@dt dt force-pushed the dt.groupby branch 2 times, most recently from 4f8e134 to 11c2509 Compare December 21, 2015 13:47
@dt
Copy link
Member Author

dt commented Dec 21, 2015

Review status: 1 of 3 files reviewed at latest revision, 9 unresolved discussions.


sql/group.go, line 368 [r3] (raw file):
done.


sql/testdata/aggregate, line 301 [r1] (raw file):
I believe it does. DISTINCT is per-bucket, as far as I can tell.
Prefixed the seen keys with the bucket key to implement this, and added a test case that ensures same output as postgres.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

LGTM


Review status: 1 of 3 files reviewed at latest revision, 9 unresolved discussions.


sql/testdata/aggregate, line 10 [r5] (raw file):
I believe you can break this into multiple lines. The logic test parses the statement as everything up to the next blank line. That's why multi-line CREATE TABLE works.


sql/testdata/aggregate, line 44 [r5] (raw file):
Do you have a test which uses a table qualified name in one part of the query and an unqualified name in another? Something like:

SELECT UPPER(kv.s) FROM kv GROUP BY s

Comments from the review on Reviewable.io

@dt
Copy link
Member Author

dt commented Dec 21, 2015

Review status: 1 of 3 files reviewed at latest revision, 8 unresolved discussions.


sql/testdata/aggregate, line 44 [r5] (raw file):
added (with one for each variant of one side qualified, other not).


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Dec 28, 2015

Reviewed 2 of 2 files at r8.
Review status: all files reviewed at latest revision, 4 unresolved discussions.


Comments from the review on Reviewable.io

return false
aggregatedValues, groupedValues := values[:len(n.funcs)], values[len(n.funcs):]

//TODO(dt): optimization: skip buckets when underlying plan is ordered by grouped values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space after // (here and elsewhere)

@dt
Copy link
Member Author

dt commented Dec 28, 2015

@tamird
Copy link
Contributor

tamird commented Dec 28, 2015

Reviewed 3 of 3 files at r10.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


sql/group.go, line 54 [r8] (raw file):
"either aggregates qvalues or ..." is this actually true? it should be possible to aggregate arbitrary non-aggregate expressions


Comments from the review on Reviewable.io

@dt
Copy link
Member Author

dt commented Dec 29, 2015

Review status: all files reviewed at latest revision, 7 unresolved discussions.


sql/group.go, line 54 [r8] (raw file):
yeah, if there are no qvalues, it is vacuously true that it aggregates "all qvalues".


sql/group.go, line 445 [r8] (raw file):
I'm not sure that i've seen an approach I like better than the current -- supposedly calls to string(myByteSlice) are optimized by the compiler if they are only temporary (eg used only locally as map keys) but relying on that feels icky.


Comments from the review on Reviewable.io

Add buckets of values to aggregateFuncs, and capture grouped qvalues.
Render buckets into rows of a valuesNode on first call to Next.

Note: ORDER BY (with GROUP BY) still unsupported (as is HAVING).
@tamird
Copy link
Contributor

tamird commented Dec 29, 2015

Reviewed 1 of 1 files at r11.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


Comments from the review on Reviewable.io

dt added a commit that referenced this pull request Dec 29, 2015
@dt dt merged commit c4f20e5 into cockroachdb:master Dec 29, 2015
@dt dt deleted the dt.groupby branch December 29, 2015 19:08
@benesch benesch added the first-pr Use to mark the first PR sent by a contributor / team member. Reviewers should be mindful of this. label May 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
first-pr Use to mark the first PR sent by a contributor / team member. Reviewers should be mindful of this.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants