New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: initial commit of execgen tool #31554

Open
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
3 participants
@solongordon
Contributor

solongordon commented Oct 17, 2018

Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.

Release note: None

exec: initial commit of execgen tool
Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.

Release note: None

@solongordon solongordon requested a review from jordanlewis Oct 17, 2018

@solongordon solongordon requested review from cockroachdb/build-prs as code owners Oct 17, 2018

@cockroach-teamcity

This comment has been minimized.

Show comment
Hide comment
@cockroach-teamcity

cockroach-teamcity Oct 17, 2018

Member

This change is Reviewable

Member

cockroach-teamcity commented Oct 17, 2018

This change is Reviewable

@jordanlewis

Great!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/exec/.gitignore, line 1 at r1 (raw file):

rowstovec.og.go

I'd just make this *.og.go so we don't have to keep editing it


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 34 at r1 (raw file):

// EncDatumRowsToColVec converts one column from EncDatumRows to a column
// vector.

Can you add a comment explaining the columnIdx parameter? Seems like something that could end up being confusing for newcomers one day (all those index-to-index maps, for example).


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 44 at r1 (raw file):

	nRows := uint16(len(rows))
	{{range .}}
	if columnType.SemanticType == sqlbase.{{.SemanticType}} && columnType.Width == {{.Width}} {

I'm a little suspicious about this, because I don't think Width has semantic meaning for most types - just INT, DECIMAL, CHAR, and BINARY according to the comment in structured.proto.

I suppose if they're just left zeroed on both sides (in the columnConversion as well as the ColumnType) this will just be a no-op... but since we're in template land, I feel like we might as well just remove the width equality methods since we do have the chance to do so.

Finally, I think the code will be more efficient with a switch on SemanticType the way it's currently implemented in columnarizer.go. That's because switch does a binary search to find the correct case statement. We could even use a lookup table directly, since ColumnType_SemanticType (though an int32 for some reason) doesn't have many values. Then again, this is just once per batch, so maybe who cares... I'd definitely be motivated to fix this myself, but I guess we've got more to do for a prototype than to shave every last batch-level microsecond.

tl;dr this whole comment is optional for now but would definitely yield less work for the CPU to do


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 56 at r1 (raw file):

				vec.SetNull(i)
			} else {
				{{if .HasSetMethod}}

Nice - this is a lot less gross than what I was fearing, now that I see it written down. I suppose a similar approach could probably go well for the infix operators too - just a simple if.


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 88 at r1 (raw file):

}

var columnConversions = []columnConversion{

I would consider making this slice generated from a map in an init function.

The key of the map can be the actual semantic type value - e.g. ColumnType_BOOL. Protobuf provides an autogenerated map of value to name (ColumnType_SemanticType_name) that gives you back "BOOL" for ColumnType_Bool. For the ExecType, I would consider using FromColumnType on the key of the map to get the actual value of the execType, and then adding a new map to types.go similar to ColumnType_SemanticType_name that gives you back the actual variable name.

For HasSetMethod, you can use the reflect package. Do newMemColumn on the type, then reflect.TypeOf the inner column of the type, then ask it if it reflect.Implements a marker interface that you can give to all of those vector types (type HasSet interface { HasSetMarker() }).

For DatumToPhysicalFn I'm sure there's something you can do too.

Perhaps the last two are too high effort and tricky to get right, but I don't think there's a strong reason not to do it for the SemanticType and ExecType names, and I think we'll likely reuse that work in other templates.

@jordanlewis

This comment has been minimized.

Show comment
Hide comment
@jordanlewis

jordanlewis Oct 18, 2018

Member

I have a few thoughts that I'm jotting down here, don't take these comments as work to be done on the current PR though.

Instead of hardcoding the files that will be generated in main.go, like optgen does, we probably want to have that be autogenerated as well, since we'll have a long list of generated code.

I'm envisioning a registerGenerator function that each operator will call in its init function that adds a generator to a list of things that execgen can generate. That will minimize the number of things that has to change each time a new operator template is added - and I think there will be lots of these templates over time.

Also, there are some common mappings that we'll need over and over again in templates, like the name of the vector accessor for each type (currently always exactly equal to the type enum name, I guess), the semantic type, and the Go type associated with the exec type (e.g. int32 for exec.Int32). I'm hoping the columnConversions slice or something like it can grow over time into a generic all-types slice that every operator can iterate over in the same way to generate all of its variants. Not sure if it'll work out in practice, though.

Member

jordanlewis commented Oct 18, 2018

I have a few thoughts that I'm jotting down here, don't take these comments as work to be done on the current PR though.

Instead of hardcoding the files that will be generated in main.go, like optgen does, we probably want to have that be autogenerated as well, since we'll have a long list of generated code.

I'm envisioning a registerGenerator function that each operator will call in its init function that adds a generator to a list of things that execgen can generate. That will minimize the number of things that has to change each time a new operator template is added - and I think there will be lots of these templates over time.

Also, there are some common mappings that we'll need over and over again in templates, like the name of the vector accessor for each type (currently always exactly equal to the type enum name, I guess), the semantic type, and the Go type associated with the exec type (e.g. int32 for exec.Int32). I'm hoping the columnConversions slice or something like it can grow over time into a generic all-types slice that every operator can iterate over in the same way to generate all of its variants. Not sure if it'll work out in practice, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment