Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-5184] Multimap side inputs with duplicate keys and values are being lost #6257

Merged

Conversation

VaclavPlajt
Copy link
Contributor

@VaclavPlajt VaclavPlajt commented Aug 21, 2018

BEAM-5184 Side inputs aggregating multimap changed from HashMultimap to ArrayListMultimap, which allows for duplicated key-value items. Tests in ViewTest modified to exercise duplicate key-value items scenario.


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
--- --- --- ---

…ap` to `ArrayListMultimap`, which allows for duplicated key-value items.

Tests in `ViewTest` modified to exercise duplicate key-value items scenario.
@timrobertson100
Copy link
Contributor

Retest this please

1 similar comment
@VaclavPlajt
Copy link
Contributor Author

Retest this please

@lukecwik
Copy link
Member

Run Java PreCommit

@lukecwik
Copy link
Member

Run Dataflow ValidatesRunner

@lukecwik
Copy link
Member

Run Java PreCommit

1 similar comment
@lukecwik
Copy link
Member

Run Java PreCommit

@VaclavPlajt
Copy link
Contributor Author

It seems that Google Cloud Dataflow Runner did not pass one of the updated tests. I tried with direct runner and it pass. Can somebody check the test?

@lukecwik
Copy link
Member

lukecwik commented Aug 22, 2018

I'll try to look at the failure in Dataflow, it's side input handling is different.

The Jenkins logs for org.apache.beam.sdk.transforms.ViewTest.testMultimapSideInputWithNonDeterministicKeyCoder fail with

Expected: iterable over [<KV{apple, 1}>, <KV{apple, 1}>, <KV{apple, 2}>, <KV{banana, 3}>, <KV{blackberry, 3}>] in any order
     but: No item matches: <KV{apple, 1}> in [<KV{apple, 2}>, <KV{apple, 1}>, <KV{blackberry, 3}>, <KV{banana, 3}>]

The error message is implying a comparison issue since all the values do exist in the actual output

@VaclavPlajt
Copy link
Contributor Author

I would say that the error message is a bit misleading. It should say that there is no second KV{apple, 1} in presented collection. The message is hard to understand.

@lukecwik
Copy link
Member

Yes your right. What I find odd is that only the one failed.

@lukecwik
Copy link
Member

I figured out that the issue is that BatchViewOverrides.java is also using a HashMultimap instead of a list based multimap like ArrayListMultimap. I'll send you a patch to add to your PR.

@lukecwik
Copy link
Member

Opened up seznam#25 against your PR branch.

Fix-up BatchViewOverrides.java to use list based multimap.
@VaclavPlajt
Copy link
Contributor Author

Run Dataflow ValidatesRunner

@VaclavPlajt
Copy link
Contributor Author

Run Java PreCommit

1 similar comment
@VaclavPlajt
Copy link
Contributor Author

Run Java PreCommit

@lukecwik lukecwik merged commit 2b777a1 into apache:master Aug 24, 2018
@lukecwik
Copy link
Member

Thanks for your contribution. This should make it out into the 2.7.0 release which is starting soon.

@VaclavPlajt
Copy link
Contributor Author

Great. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants