-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-9616] Add RegisterDoFn #12903
[BEAM-9616] Add RegisterDoFn #12903
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I have a few small comments and questions, but nothing big enough that I feel is worth blocking on.
As a caveat, I was able to generally follow the starcgenx code, but I'm not very familiar with ast so I'm liable to miss small details if there are issues there.
} | ||
|
||
var retFunc interface{} | ||
rt := reflect.TypeOf(dofn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just use the rt
from near the top of the function? It's a bit confusing right now with it named the same as a different scoped variable in the same function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that rt is out of scope and it's only present if the user passed in a reflect.Type instance. It's scoped to the if statement. It's also nearly 30 lines up, which hampers readability a bit by having an extended distance between declaration and use. This is adjacent to the uses, and that if block at the start ensures that there's only one way to interpret the rt variable here. It's the type of the dofn.
Semantically, this is identical to calling all error values err. I'm always calling arbitrary reflect.Type instances rt.
if rt.Kind() == reflect.Ptr { | ||
rt = rt.Elem() | ||
} | ||
dofn = reflect.New(rt).Interface() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for creating a new interface from a reflect type for dofn here? Is it a way to reset the dofn to it's default values by creating a new instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The graph.NewFn validator requires an actual instance of the DoFn, since graph.NewFn was designed for pipeline construction. This line creates such an instance, if what is passed in is reflect.Type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, duh. I had a brain-fart that this statement only happens if the dofn is a reflect.Type. I was imagining turning an actual DoFn instance into a reflect.Type and back. Makes complete sense now.
} | ||
sdf := (*graph.SplittableDoFn)(fn) | ||
c.pullMethod(sdf.CreateInitialRestrictionFn()) | ||
c.pullMethod(sdf.CreateTrackerFn()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that while restrictions need to be serializable, restriction trackers don't (and probably can't). Registering RTrackers is probably not a problem as long as nothing actually tries to use the registered type, but otherwise you can probably avoid registering RTrackers by removing this line pulling CreateTrackerFn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RTs would be registered anyway since they show up in the ProcessElement signature. It's OK for extra types to be registered, the framework just won't automatically synthesize them like it needs to for other elements.
@@ -72,7 +72,7 @@ type SymbolResolver interface { | |||
// RegisterFunction allows function registration. It is beneficial for performance | |||
// and is needed for functions -- such as custom coders -- serialized during unit | |||
// tests, where the underlying symbol table is not available. It should be called | |||
// in init() only. Returns the external key for the function. | |||
// in `init()` only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Is removing the last sentence intentional? If so, should it also be removed in forward.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! It should be removed from forward.go! I removed it since this function doesn't return anything ever, so it was incorrect documentation.
Thanks for the review! I'm also not super adept with ast, so we're in the same boat. The short version is that it's breaking down all the go syntax things for us. I did cover a good number of reasonable cases that do work with RegisterDoFn, and we avoid the rest by making sure the documentation (the final documentation) and the examples are complete, and consistent, which will avoid bad cases that can't be handled. |
I ended up adding the usage documentation, may as well start off on the right foot with the recommended boilerplate. |
Heh no worries.
…On Fri, Sep 25, 2020, 8:18 PM Daniel Oliveira ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sdks/go/pkg/beam/core/runtime/genx/genx.go
<#12903 (comment)>:
> + for _, t := range ts {
+ runtime.RegisterType(t)
+ }
+}
+
+// registerDoFn returns all types associated with the provided DoFn.
+// If passed a functional DoFn, the first return is a Function to
+// register with runtime.RegisterFunction.
+// The second return is all types to register with runtime.RegisterType.
+// Returns an error if the passed in values are not DoFns.
+func registerDoFn(dofn interface{}) (interface{}, []reflect.Type, error) {
+ if rt, ok := dofn.(reflect.Type); ok {
+ if rt.Kind() == reflect.Ptr {
+ rt = rt.Elem()
+ }
+ dofn = reflect.New(rt).Interface()
Oh, duh. I had a brain-fart that this statement only happens if the dofn
is a reflect.Type. I was imagining turning an actual DoFn instance into a
reflect.Type and back. Makes complete sense now.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#12903 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADKDOFOLZ77ACW4VD36USQTSHVMPHANCNFSM4RV745WQ>
.
|
Adds a beam.RegisterDoFn call for users to call on their DoFns instead of individually calling RegisterFunction and RegisterType for all parameter types, using reflection.
These registrations are important to ensure workers can process and lookup types and functions correctly.
This PR also lets starcgen key off of RegisterDoFn in addition to the identifiers list.
Only transforms/top has been adjusted to demonstrate, and a full go generate ./... pass run on the SDK to demonstrate that the code generator works in both modes simultaneously. In time, we'll remove the identifiers approach and require RegisterDoFn as the only way to hint to the code generator.
The extra changes are go fmt changes missed from other PRs and are benign.
TODO in later PRs:
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.