New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-11574] Enable cross-language integration tests on Dataflow #14397
Conversation
Run Go PostCommit |
Contains a variety of small changes needed to enable cross-language tests to work on Dataflow. Here's a list of changes: 1. Allow running pipelines with portable job submission (required for x-lang transforms). 2. Allow submitting Dataflow jobs with multiple environments. 3. Add SdkHarnessContainerImageOverrides flag to allow specifying overrides for multiple environments. This implementation works by calling the flag multiple times, once for each override. 4. Update the ValidatesRunner script to use the above functionality (portable submission and overrides) as well as uploading a Java SDK container for cross-language in addition to a Go SDK container. 5. Don't namespace environments in external transform expansion. This avoids bugs that occurred because an environment with an empty value was present in the final pipeline. 6. Skip fake impulse generation in external transform expansion. Something must have changed on the expansion service end, because this is no longer necessary, and was causing problems because the fake impulse wasn't getting properly removed.
Run Go PostCommit |
Run GoPortable PreCommit |
R: @lostluck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after the changes. Mostly typos and smoke checks.
} | ||
oldImg := payload.GetContainerImage() | ||
for pattern, replacement := range patterns { | ||
re, err := regexp.Compile(pattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No action required, just a call out to be wary of nested for loops and repeated work.
I think my main concern here is that we're compiling the patterns multiple times (each once per environment at most), which is probably not going to be terrible since this is at construction time, and both patterns and environments are going to be limited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I went ahead and fixed it anyway, and yeah I'll keep that in mind.
// ApplySdkImageOverrides takes a pipeline and a map of patterns to overrides, | ||
// and proceeds to replace matching ContainerImages in any Environments | ||
// present in the pipeline. | ||
func ApplySdkImageOverrides(p *pipepb.Pipeline, patterns map[string]string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider documenting the expectation that any given environment is expected to match only a single pattern for replacement, and that if multiple patterns would match, it's arbitrary which will be applied (due to map iteration ordering being random.)
There's no good way to handle such conflict cases for multple matches. I suspect the most we can do is say it's undefined in the flag itself, as well as the commentary change in the previous paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also adding this to the flag description.
if t.EnvironmentId != "" { | ||
t.EnvironmentId = addEnvironmentID(c, idMap, t.EnvironmentId, newID) | ||
} | ||
// TODO: Currently environments are not namespaced. This works under the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider including the JIRA tag for history context, or instead of TODO (which implies work that should likely be done) use Note: or avoid a prefix entirely, since all comments are notes to our future selves.
As for the content, it seems probable that we'd make sure the Go ExpansionService that handles these calls would do the namespacing on the response itself. That way it can build the pipeline graph normally, and we simply run the replacement there, where we are already certain that it's a foreign component, rather than the primary pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to a "note". That makes more sense than making a JIRA for the vague possibility of multiple Go environments.
"sdk_harness_container_image_override", | ||
"Overrides for SDK harness container images. Could be for the "+ | ||
"local SDK or for a remote SDK that pipeline has to support due "+ | ||
"to a cross-language transform. Each entry consist of two values "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to a cross-language transform. Each entry consist of two values "+ | |
"to a cross-language transform. Each entry consists of two values "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
"separated by a comma where first value gives a regex to "+ | ||
"identify the container image to override and the second value "+ | ||
"gives the replacement container image. Multiple entries can be "+ | ||
"specified by using this flag multiple times.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming, this part matches the semantics of the equivalent flag in the other SDKs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been able to find the documentation for the Java version of this flag (despite being relatively sure it exists, based on this line), but the Python version works identically, it requires you to use the flag once per entry.
if err != nil { | ||
return nil, errors.WithContext(err, "generating model pipeline") | ||
} | ||
err = pipelinex.ApplySdkImageOverrides(model, jobopts.GetSdkImageOverrides()) | ||
if err != nil { | ||
return nil, errors.WithContext(err, "generating model pipeline") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, errors.WithContext(err, "generating model pipeline") | |
return nil, errors.WithContext(err, "applying container image overrides") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, also updated the one a few lines above.
Run Go Flink ValidatesRunner |
There's light discussion around the format of the override flag happening, but that's not a reason to block this change from getting in. |
Contains a variety of small changes needed to enable cross-language tests to work on Dataflow. Here's a list of changes:
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.