Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-5354] Add side input nodes to their consumer's parent-scope. #8713

Merged
merged 1 commit into from
May 29, 2019

Conversation

lostluck
Copy link
Contributor

Some transforms require additional nodes in the Portable Pipeline Proto representation, and in particular, side inputs. The node representing the source of the Side input was previously being added as a root transform, which is OK as far as the model is concerned, but for ordering dependent runners (like Dataflow is currently), this can cause issues, since it requires nodes be specified before they're depended on (topological ordering).

This change ensures that new side input nodes are added as siblings to their consumer's scope, rather than as unique root transforms, which fits things into the pipeline hypergraph naturally.

Added additional tests to the translation from the Go SDK's internal representation, to the PPP, and modified one of the integration tests to trigger this behavior.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@lostluck
Copy link
Contributor Author

Run Go Postcommit

@lostluck
Copy link
Contributor Author

R: @youngoli @aaltay

@@ -128,7 +128,7 @@ func (m *marshaller) addScopeTree(s *ScopeTree) string {

var subtransforms []string
for _, edge := range s.Edges {
subtransforms = append(subtransforms, m.addMultiEdge(edge))
subtransforms = append(subtransforms, m.addMultiEdge(edge)...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Learning question, does the addition of ... here prevents creation of a new slice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... is the operator that says "unroll this slice into the variadic parameter"

The built-in function append([]T, T...) is special in that it's got additional compiler support for optimizing the behavior in cases like this. In all cases if "subtransforms" has extra capacity, no new backing array for the slice will be allocated.

Other than ensuring we're appending strings, rather than []string to a []string, so the types work out, IIRC a single append call will only lead to at most a single allocation of a new slice.
eg. The runtime will check the length of the parameter being ...'d, and only increase it by at most once.
Since append is special, I wouldn't expect the returned slice from addMultiEdge to be copied.

For other uses, in user defined functions, functionally variadics (...string) operate identically to having a slice ([]string) on the receiver side, but the caller has the option to pass in disparate parameters as well. The main restriction is variadics must be the final parameter.
I don't know if the slice is copied for those, but I do believe it woul be the slice header (the pointer to the backing array, length, and capacity), and not the whole backing array that would be copied.

There's likely nuance going on here that I'm missing, but the above should be the right shape of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation!

@aaltay aaltay merged commit 9647808 into apache:master May 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants