Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upexec: initial commit of execgen tool #31554
Conversation
solongordon
requested a review
from
jordanlewis
Oct 17, 2018
solongordon
requested review from
cockroachdb/build-prs
as
code owners
Oct 17, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jordanlewis
requested changes
Oct 17, 2018
Great!
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/sql/exec/.gitignore, line 1 at r1 (raw file):
rowstovec.og.go
I'd just make this *.og.go so we don't have to keep editing it
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 34 at r1 (raw file):
// EncDatumRowsToColVec converts one column from EncDatumRows to a column // vector.
Can you add a comment explaining the columnIdx parameter? Seems like something that could end up being confusing for newcomers one day (all those index-to-index maps, for example).
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 44 at r1 (raw file):
nRows := uint16(len(rows)) {{range .}} if columnType.SemanticType == sqlbase.{{.SemanticType}} && columnType.Width == {{.Width}} {
I'm a little suspicious about this, because I don't think Width has semantic meaning for most types - just INT, DECIMAL, CHAR, and BINARY according to the comment in structured.proto.
I suppose if they're just left zeroed on both sides (in the columnConversion as well as the ColumnType) this will just be a no-op... but since we're in template land, I feel like we might as well just remove the width equality methods since we do have the chance to do so.
Finally, I think the code will be more efficient with a switch on SemanticType the way it's currently implemented in columnarizer.go. That's because switch does a binary search to find the correct case statement. We could even use a lookup table directly, since ColumnType_SemanticType (though an int32 for some reason) doesn't have many values. Then again, this is just once per batch, so maybe who cares... I'd definitely be motivated to fix this myself, but I guess we've got more to do for a prototype than to shave every last batch-level microsecond.
tl;dr this whole comment is optional for now but would definitely yield less work for the CPU to do
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 56 at r1 (raw file):
vec.SetNull(i) } else { {{if .HasSetMethod}}
Nice - this is a lot less gross than what I was fearing, now that I see it written down. I suppose a similar approach could probably go well for the infix operators too - just a simple if.
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 88 at r1 (raw file):
} var columnConversions = []columnConversion{
I would consider making this slice generated from a map in an init function.
The key of the map can be the actual semantic type value - e.g. ColumnType_BOOL. Protobuf provides an autogenerated map of value to name (ColumnType_SemanticType_name) that gives you back "BOOL" for ColumnType_Bool. For the ExecType, I would consider using FromColumnType on the key of the map to get the actual value of the execType, and then adding a new map to types.go similar to ColumnType_SemanticType_name that gives you back the actual variable name.
For HasSetMethod, you can use the reflect package. Do newMemColumn on the type, then reflect.TypeOf the inner column of the type, then ask it if it reflect.Implements a marker interface that you can give to all of those vector types (type HasSet interface { HasSetMarker() }).
For DatumToPhysicalFn I'm sure there's something you can do too.
Perhaps the last two are too high effort and tricky to get right, but I don't think there's a strong reason not to do it for the SemanticType and ExecType names, and I think we'll likely reuse that work in other templates.
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jordanlewis
Oct 18, 2018
Member
I have a few thoughts that I'm jotting down here, don't take these comments as work to be done on the current PR though.
Instead of hardcoding the files that will be generated in main.go, like optgen does, we probably want to have that be autogenerated as well, since we'll have a long list of generated code.
I'm envisioning a registerGenerator function that each operator will call in its init function that adds a generator to a list of things that execgen can generate. That will minimize the number of things that has to change each time a new operator template is added - and I think there will be lots of these templates over time.
Also, there are some common mappings that we'll need over and over again in templates, like the name of the vector accessor for each type (currently always exactly equal to the type enum name, I guess), the semantic type, and the Go type associated with the exec type (e.g. int32 for exec.Int32). I'm hoping the columnConversions slice or something like it can grow over time into a generic all-types slice that every operator can iterate over in the same way to generate all of its variants. Not sure if it'll work out in practice, though.
|
I have a few thoughts that I'm jotting down here, don't take these comments as work to be done on the current PR though. Instead of hardcoding the files that will be generated in main.go, like optgen does, we probably want to have that be autogenerated as well, since we'll have a long list of generated code. I'm envisioning a Also, there are some common mappings that we'll need over and over again in templates, like the name of the vector accessor for each type (currently always exactly equal to the type enum name, I guess), the semantic type, and the Go type associated with the exec type (e.g. |
solongordon commentedOct 17, 2018
Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.
Release note: None