Skip to content

[BEAM-4285] Extend side input handlers to handle multiple access patterns.#5814

Merged
jkff merged 2 commits intoapache:masterfrom
robertwb:non-keyed-side-inputs-master
Jun 28, 2018
Merged

[BEAM-4285] Extend side input handlers to handle multiple access patterns.#5814
jkff merged 2 commits intoapache:masterfrom
robertwb:non-keyed-side-inputs-master

Conversation

@robertwb
Copy link
Contributor

Python side inputs now work.


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Spark
Go Build Status --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
--- --- ---

@robertwb robertwb requested a review from jkff June 28, 2018 17:19
Collections.emptyList());
}

private static <T> byte[] encode(T value, Coder<T> coder) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaced by CoderUtils.encodeToByteArray

* be used to encode/decode their respective values.
*/
<K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
<T, V, W extends BoundedWindow> SideInputHandler<V, W> forSideInput(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of trying to force two different access patterns into a single interface that sort of works for both by special casing the key, wouldn't it make more sense to have a separate handler interface for iterable side inputs?

The idea is that for side inputs, you can create something which delegates based upon the access pattern so you effectively have three StateRequestHandlers, one for iterable, one for multimap, and one that delegates based upon access pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this but I suggest we do it in an immediately-following-up cleanup PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually started down this road, but found myself duplicating a huge amount of code due to the lack of a common interface (short of passing the full StateRequest which seemed to break the abstraction). That's not to say it wouldn't be worth exploring more, but it seems every access pattern (geo-spacial, range-based, ...) falls naturally into the pattern of having a key that is interpreted/ignored according to the access pattern's semantics. The other alternative is to let the StateHandlerFactory interface have a handle-returning-method per-known access pattern rather than pass this down to the runners themselves, which also felt odd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If runners want to write one object that supports both types, runners should be able to have a single class which has shared state and exposes a trivial wrapper object that implements the desired interface that converts calls to the shared object.

K key;
try {
// TODO: We could skip decoding and just compare encoded values for deterministic keyCoders.
key = keyCoder.decode(new ByteArrayInputStream(keyBytes));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, could use CoderUtils.

* be used to encode/decode their respective values.
*/
<K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
<T, V, W extends BoundedWindow> SideInputHandler<V, W> forSideInput(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this but I suggest we do it in an immediately-following-up cleanup PR.

@jkff jkff merged commit 0688774 into apache:master Jun 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants