Skip to content

Conversation

@youngoli
Copy link
Contributor

@youngoli youngoli commented Nov 4, 2020

Adds wrappers around each test xlang transform used in the examples, which show how xlang transforms are best implemented. Also adjusts the xlang fronte-end, specifically removing the misleadingly named Sink and Source versions of the methods, and adjusting the Input/Output tag naming to match and adding some helpers.

Note: Currently the CoGroupByKey, Partition, and MultiInputOutput examples are failing when run manually, however that seems unrelated to this PR. They fail identically if I run them in a clean repo. Still debugging that issue.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

Adds wrappers around each test xlang transform used in the examples, which show how xlang transforms are best implemented. Also adjusts the xlang fronte-end, specifically removing the misleadingly named Sink and Source versions of the methods, and adjusting the Input/Output tag naming to match and adding some helpers.
@youngoli
Copy link
Contributor Author

youngoli commented Nov 4, 2020

R: @lostluck
CC: @pskevin

@youngoli
Copy link
Contributor Author

youngoli commented Nov 5, 2020

Actually don't merge this yet. I ran into an issue where adding a scope (i.e. putting the external transform in a composite) causes all the examples to fail. It seems to be because of a built-in assumption in the cross-language code that the root transform of the protobuf external transform is the external transform itself, but this is false when it's in a composite, because the root transform is the composite instead.

I'm still trying to fix that, because this is an issue that would give users trouble immediately and needs to be fixed ASAP. That said, if the rest of the PR looks good I might just comment out the Scope calls with a TODO and a Jira, since that will work.

This avoids some bugs that were occurring because the existing implementation didn't account for composite transforms.
@youngoli
Copy link
Contributor Author

youngoli commented Nov 7, 2020

I worked around the scope problem by essentially removing any composite transforms from the proto representation of the external transform, before continuing on with the existing logic. I spent a while trying to more properly fix it so it would work with the composite transform in there, but it seems like there are many places in the existing logic where composites weren't accounted for, and at the end of the day I don't think there's anything gained at the moment by changing the logic to work with the composite in there.

@lostluck let me know what you think. Any issues we might hit with this workaround in the future?

@lostluck
Copy link
Contributor

lostluck commented Nov 7, 2020

We certainly should fix the problem in the future as the composite/hypergraph structure of beam pipelines is very useful for runners like Dataflow, WRT visualizations. Totally reasonable to put a TODO and punt for now. I'd need to take a look myself to have more specific advice.

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It's likely we might want to lightly figure out how to "re-nest" the returned expansion within whatever scope was provided by the user. When you get to running things on Dataflow, you'll see what I mean WRT the visualizations around scoping and composites, though you're probably right that we'll need to strip that before sending things to the expansion services.


"github.com/apache/beam/sdks/go/pkg/beam/internal/errors"
jobpb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1"
"github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please abbreviate the pipeline_v1 package protos as pipepb.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"google.golang.org/grpc"
)

// Expand queries the expansion service to resolve the ExpansionRequest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to expand the description here, since the expansion request is being created "in house".

// Expand submits the pipeline components and root transform to be expanded by the given expansion service.
//
// Most should call beam.CrossLanguage to access foreign transforms rather than calling this function directly.

The latter segment is probably unnecessary since it won't add things to the pipeline properly.. but we don't currently have too much documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@youngoli
Copy link
Contributor Author

youngoli commented Nov 7, 2020

I just checked the pipeline graph that gets sent to the runner, after external transforms are re-expanded (at universal.go:100), and it seems to still have the composite transform like we'd like.

I strip out the composites only from the expansion request right now. When it external transforms get replaced by the expanded versions later it seems to all happen inside the composite transform and that seems to be preserved. Although I'd like to check it in the actual Dataflow visualization at some point just to make sure there aren't any subtle issues.

@lostluck
Copy link
Contributor

lostluck commented Nov 9, 2020

Ah that makes sense! LGTM for now then, and we can re-evaluate when we see how runner visualizations treat it.

@youngoli youngoli merged commit 45ce942 into apache:master Nov 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants