
[BEAM-4297] Streaming executable stage translation and operator for portable Flink runner. #5407

Merged
tweise merged 2 commits into apache:master from tweise:BEAM-4297.flinkStreamingExecutableStage on May 29, 2018

Conversation

@tweise
Contributor

@tweise tweise commented May 18, 2018

Executable stage translation for streaming mode based on the generic Flink streaming operator. Stage execution and tests are adapted from the batch translation.



@tweise tweise requested review from aljoscha and lukecwik May 18, 2018 02:37
@tweise tweise force-pushed the BEAM-4297.flinkStreamingExecutableStage branch 4 times, most recently from f1127a0 to 6c8c1c7 Compare May 18, 2018 05:36
@tweise
Contributor Author

tweise commented May 18, 2018

R: @lukecwik @bsidhom


  /** Creates a mapping from PCollection id to output tag integer. */
- private static BiMap<String, Integer> createOutputMap(Iterable<String> localOutputs) {
+ static BiMap<String, Integer> createOutputMap(Iterable<String> localOutputs) {
Contributor

Please move this into a shared utility class.

@tweise
Contributor Author

tweise commented May 23, 2018

@bsidhom @lukecwik PTAL

BiMap<String, Integer> outputMap =
FlinkPipelineTranslatorUtils.createOutputMap(outputs.keySet());
Map<String, Coder<WindowedValue<?>>> outputCoders = Maps.newHashMap();
for (String localOutputName : new TreeMap<>(outputMap.inverse()).values()) {
Member

consider using TreeSet

Contributor

A tree set of what? The intent here is to order the coders (the values in the map) by their tags (the keys).

Member

I see, that makes sense.

Contributor Author

I have cleaned up this portion of the code.
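The ordering idiom discussed in this thread can be sketched with plain java.util collections. This is an illustration of the idea, not the runner's actual code: Guava's BiMap.inverse() is replaced by a manual inversion, and wrapping the tag-to-name inverse in a TreeMap yields the output names sorted by their integer tags.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: order output names by their integer tag by inverting
// the name->tag map and iterating a TreeMap keyed on the tag.
class TagOrdering {
  static List<String> namesInTagOrder(Map<String, Integer> outputMap) {
    Map<Integer, String> inverse = new HashMap<>();
    for (Map.Entry<String, Integer> e : outputMap.entrySet()) {
      inverse.put(e.getValue(), e.getKey());
    }
    // TreeMap sorts by key (the tag), so values() come out in tag order.
    return new ArrayList<>(new TreeMap<>(inverse).values());
  }
}
```

This matches the intent stated above: the coders (map values) are visited in the order of their tags (map keys), not in hash order.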

}

String inputPCollectionId =
Iterables.getOnlyElement(transform.getInputsMap().values());
Member

Use ExecutableStage and rely on its getInputPCollection().getId() method.

Ditto for updating this in the batch pipeline translator as well.

Contributor

As above, we need to pass a serializable representation to operators. We could create an ExecutableStage here and then reconstruct it on the runner. In that case, we need to ensure that two ExecutableStages constructed from the same ExecutableStagePayload are always equivalent (including any synthetic ids that may be generated). That should be the case as of now, but we need to be careful.

Member
@lukecwik lukecwik May 23, 2018

Unfortunately, the way we construct coders and other properties of the ExecutableProcessBundleDescriptor is done during execution, and it would be best if we could somehow make all these details stable during pipeline translation so they don't change during execution. Having Flink rely on calling WireCoders.instantiateRunnerWireCoder(...) is an anti-pattern for encapsulation.

So if we need to construct the executable stage payload twice, we could make the contract guarantee that it is stable regardless of the number of times it is constructed. I just want to push more of the input/output/coder/state/side input information up to translation time instead of having it deep within execution. In my opinion, only service (ApiServiceDescriptor) binding should happen there.

Contributor

Unfortunately, Flink needs to have an associated serializer (TypeInformation, aka Coder) with each distributed collection. This TypeInformation needs to be known at pipeline construction time. It need not match the exact coder being used to materialize elements over gRPC, but it does need to match the in-memory element type.

We could get around this partially by representing everything as bytes. The downside is that each runner-native operation that requires structure (e.g., GBK) will require an additional operation to break elements into their constituent parts. This step itself also requires knowledge of the coded type, so we ultimately run into the same issue.

Member
@lukecwik lukecwik May 23, 2018

I'm just arguing for pushing most of the manipulation done within ExecutableProcessBundleDescriptor into the ExecutableStage payload (minus the ApiServiceDescriptor binding) so it doesn't need modification. This would allow the ExecutableStage to concretely answer what are the input coders, output coders, side input coders, state coders, ... in addition to any other information.

Longer term, it seems it would make sense to have a way for the runner to say whether it needs a keyed input context or a grouped keyed output context. These are the cases I know of:

  • KV<Key, Value> for SplittableDoFn input, StatefulDoFn input, GBK input, Multimap side input materialization input, window mapping input
  • KV<Key, Iterable> for GBK output

Do you know of any others?

Contributor Author

I added a test for serialization. If we agree on the repeated instantiation of ExecutableStage, then I can take this up in a separate PR (for both batch and streaming translation). I would do that once we have test and end-to-end coverage; right now the translators are still not wired.

Member

SGTM

throw new RuntimeException("executable stage translation not implemented");
// TODO: is this still relevant?
// we assume that the transformation does not change the windowing strategy.
RunnerApi.WindowingStrategy windowingStrategyProto =
Member

ExecutableStages can change the windowing strategy, as they may execute assign-windows within them. So a previous executable stage may have changed the windowing strategy.

Contributor Author

Removed; it was a remnant of translation code from the old runner (only used in the case of stateful ParDo). Since key handling and output encoding will happen in the SDK harness, we probably won't need to know the windowing strategy in the operator.

// we assume that the transformation does not change the windowing strategy.
RunnerApi.WindowingStrategy windowingStrategyProto =
pipeline.getComponents().getWindowingStrategiesOrThrow(
pipeline.getComponents().getPcollectionsOrThrow(
Member

You should rely on getInputPCollection().pcollection() from the ExecutableStage that you could construct above from the payload.

Map<String, Coder<WindowedValue<?>>> outputCoders = Maps.newHashMap();
for (String localOutputName : new TreeMap<>(outputMap.inverse()).values()) {
String collectionId = outputs.get(localOutputName);
Coder<WindowedValue<?>> windowCoder = (Coder) instantiateCoder(collectionId, components);
Member

Rely on creating an ExecutableStage here and its getOutputPCollections method. You can pass forward this ExecutableStage to the ExecutableStageDoFnOperator. Anywhere below where you need to get the input pcollection (and in the future state/timer or side input information), you can get it from this object.

I can see how having access to the coders would be important here and that the ExecutableStage should contain this information without needing to bind the data/state service that the ExecutableProcessBundleDescriptor does (currently that ExecutableProcessBundleDescriptor is munging the payload and making small albeit important modifications).

Contributor

Unfortunately, we cannot simply pass the ExecutableStage onto operators here. Flink requires that operators be serializable. Operator constructors run on the client JVM, but operators are initialized via lifecycle methods on TaskManagers. For this reason, we use the executable stage payload in the batch translator. The same applies here.

We could decide to use a different serialized representation here for operator tasks, but it seems convenient here to reuse what we already have.
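The constraint described here — operator constructors run on the client JVM, while heavyweight objects must be rebuilt on the TaskManager — can be sketched as follows. The class and method names are illustrative, not the runner's actual API; the point is that only a serializable byte[] payload is shipped, and the non-serializable stage object is marked transient and reconstructed in a lifecycle method analogous to Flink's open().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: ship only the serialized stage payload; rebuild the
// stage object on the worker after deserialization.
class PayloadOperator implements Serializable {
  private final byte[] stagePayloadBytes;      // serializable representation, shipped to workers
  private transient Object executableStage;    // rebuilt on the TaskManager, never serialized

  PayloadOperator(byte[] stagePayloadBytes) {
    this.stagePayloadBytes = stagePayloadBytes;
  }

  // Called by the runtime after deserialization (analogous to Flink's open()).
  void open() {
    this.executableStage = rebuildStage(stagePayloadBytes);
  }

  private static Object rebuildStage(byte[] bytes) {
    // In the real runner this would reconstruct an ExecutableStage from its
    // payload; here we just decode the bytes to keep the sketch self-contained.
    return new String(bytes, StandardCharsets.UTF_8);
  }

  Object getStage() {
    return executableStage;
  }

  // Java-serialization round trip, standing in for client -> TaskManager shipping.
  static PayloadOperator roundTrip(PayloadOperator op) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(op);
    }
    try (ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
      return (PayloadOperator) ois.readObject();
    }
  }
}
```

After the round trip the transient field is null, which is exactly why the reconstruction (and, per the discussion above, its determinism) matters.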

Member

As commented above, we should be able to recreate the ExecutableStage multiple times; only binding the service (ApiServiceDescriptor) information should happen upstream.

public static BiMap<String, Integer> createOutputMap(Iterable<String> localOutputs) {
ImmutableBiMap.Builder<String, Integer> builder = ImmutableBiMap.builder();
int outputIndex = 0;
for (String tag : localOutputs) {
Member

Sort the localOutputs to get a stable indexing otherwise multiple calls to createOutputMap won't be stable.

Contributor Author

done
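The stability fix agreed on above can be sketched like this. Plain java.util maps stand in for Guava's ImmutableBiMap, so this is an illustration of the idea rather than the actual FlinkPipelineTranslatorUtils code: sorting the local output names before assigning indices makes the name-to-tag mapping deterministic across calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative sketch: assign output tag integers in sorted-name order so
// repeated calls over the same names produce the same numbering.
class OutputMaps {
  static Map<String, Integer> createOutputMap(Iterable<String> localOutputs) {
    SortedSet<String> sorted = new TreeSet<>();
    for (String tag : localOutputs) {
      sorted.add(tag);
    }
    Map<String, Integer> outputMap = new LinkedHashMap<>();
    int outputIndex = 0;
    for (String tag : sorted) {
      outputMap.put(tag, outputIndex++);
    }
    return outputMap;
  }
}
```

With this change, the iteration order of the input no longer matters: any permutation of the same output names yields the same map.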

@Mock private RuntimeContext runtimeContext;
@Mock private DistributedCache distributedCache;
@Mock private FlinkExecutableStageContext stageContext;
@Mock private StageBundleFactory stageBundleFactory;
Member

It might be easier to follow the testing strategy employed here and use the InProcessSdkHarness TestRule to set up the tests instead of mocks.

If not, I can review the tests as is.

Contributor Author

I think the mock based test is good for covering just the operator class, without other dependencies. InProcessServerFactory might be a good way to write an integration test that also covers the translator, outside of the validate runner suite. I can probably do that as follow-up.

@tweise tweise force-pushed the BEAM-4297.flinkStreamingExecutableStage branch 3 times, most recently from e44f783 to 424ed38 Compare May 29, 2018 16:23
@tweise tweise force-pushed the BEAM-4297.flinkStreamingExecutableStage branch from 424ed38 to 540d36e Compare May 29, 2018 17:37
@tweise tweise force-pushed the BEAM-4297.flinkStreamingExecutableStage branch from 540d36e to c6a0bf3 Compare May 29, 2018 18:13
@tweise tweise merged commit c63de03 into apache:master May 29, 2018