Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-11574] Enable cross-language integration tests on Dataflow #14397

Merged
merged 2 commits into from Apr 6, 2021

Conversation

youngoli
Copy link
Contributor

Contains a variety of small changes needed to enable cross-language tests to work on Dataflow. Here's a list of changes:

  1. Allow running pipelines with portable job submission (required for x-lang transforms).
  2. Allow submitting Dataflow jobs with multiple environments.
  3. Add SdkHarnessContainerImageOverrides flag to allow specifying overrides for multiple environments. This implementation works by calling the flag multiple times, once for each override.
  4. Update the ValidatesRunner script to use the above functionality (portable submission and overrides) as well as uploading a Java SDK container for cross-language in addition to a Go SDK container.
  5. Don't namespace environments in external transform expansion. This avoids bugs that occurred because an environment with an empty value was present in the final pipeline.
  6. Skip fake impulse generation in external transform expansion. Something must have changed on the expansion service end, because this is no longer necessary, and was causing problems because the fake impulse wasn't getting properly removed.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK ULR Dataflow Flink Samza Spark Twister2
Go --- --- Build Status --- Build Status ---
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@youngoli
Copy link
Contributor Author

Run Go PostCommit

Contains a variety of small changes needed to enable cross-language tests to work on Dataflow. Here's a list of changes:

1. Allow running pipelines with portable job submission (required for x-lang transforms).
2. Allow submitting Dataflow jobs with multiple environments.
3. Add SdkHarnessContainerImageOverrides flag to allow specifying overrides for multiple environments. This implementation works by calling the flag multiple times, once for each override.
4. Update the ValidatesRunner script to use the above functionality (portable submission and overrides) as well as uploading a Java SDK container for cross-language in addition to a Go SDK container.
5. Don't namespace environments in external transform expansion. This avoids bugs that occurred because an environment with an empty value was present in the final pipeline.
6. Skip fake impulse generation in external transform expansion. Something must have changed on the expansion service end, because this is no longer necessary, and was causing problems because the fake impulse wasn't getting properly removed.
@youngoli
Copy link
Contributor Author

Run Go PostCommit

@youngoli
Copy link
Contributor Author

Run GoPortable PreCommit

@youngoli
Copy link
Contributor Author

youngoli commented Apr 1, 2021

R: @lostluck

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after the changes. Mostly typos and smoke checks.

}
oldImg := payload.GetContainerImage()
for pattern, replacement := range patterns {
re, err := regexp.Compile(pattern)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No action required, just a call out to be wary of nested for loops and repeated work.

I think my main concern here is that we're compiling the patterns multiple times (each once per environment at most), which is probably not going to be terrible since this is at construction time, and both patterns and environments are going to be limited.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I went ahead and fixed it anyway, and yeah I'll keep that in mind.

// ApplySdkImageOverrides takes a pipeline and a map of patterns to overrides,
// and proceeds to replace matching ContainerImages in any Environments
// present in the pipeline.
func ApplySdkImageOverrides(p *pipepb.Pipeline, patterns map[string]string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider documenting the expectation that any given environment is expected to match only a single pattern for replacement, and that if multiple patterns would match, it's arbitrary which will be applied (due to map iteration ordering being random.)

There's no good way to handle such conflict cases for multple matches. I suspect the most we can do is say it's undefined in the flag itself, as well as the commentary change in the previous paragraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also adding this to the flag description.

if t.EnvironmentId != "" {
t.EnvironmentId = addEnvironmentID(c, idMap, t.EnvironmentId, newID)
}
// TODO: Currently environments are not namespaced. This works under the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider including the JIRA tag for history context, or instead of TODO (which implies work that should likely be done) use Note: or avoid a prefix entirely, since all comments are notes to our future selves.

As for the content, it seems probable that we'd make sure the Go ExpansionService that handles these calls would do the namespacing on the response itself. That way it can build the pipeline graph normally, and we simply run the replacement there, where we are already certain that it's a foreign component, rather than the primary pipeline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to a "note". That makes more sense than making a JIRA for the vague possibility of multiple Go environments.

"sdk_harness_container_image_override",
"Overrides for SDK harness container images. Could be for the "+
"local SDK or for a remote SDK that pipeline has to support due "+
"to a cross-language transform. Each entry consist of two values "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"to a cross-language transform. Each entry consist of two values "+
"to a cross-language transform. Each entry consists of two values "+

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"separated by a comma where first value gives a regex to "+
"identify the container image to override and the second value "+
"gives the replacement container image. Multiple entries can be "+
"specified by using this flag multiple times.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming, this part matches the semantics of the equivalent flag in the other SDKs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been able to find the documentation for the Java version of this flag (despite being relatively sure it exists, based on this line), but the Python version works identically, it requires you to use the flag once per entry.

if err != nil {
return nil, errors.WithContext(err, "generating model pipeline")
}
err = pipelinex.ApplySdkImageOverrides(model, jobopts.GetSdkImageOverrides())
if err != nil {
return nil, errors.WithContext(err, "generating model pipeline")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, errors.WithContext(err, "generating model pipeline")
return nil, errors.WithContext(err, "applying container image overrides")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, also updated the one a few lines above.

@youngoli
Copy link
Contributor Author

youngoli commented Apr 5, 2021

Run Go Flink ValidatesRunner

@lostluck
Copy link
Contributor

lostluck commented Apr 6, 2021

There's light discussion around the format of the override flag happening, but that's not a reason to block this change from getting in.

@youngoli youngoli merged commit 9ec4582 into apache:master Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants