[BEAM-6350] Reuse PCollectionView when created in translators #7399
dmvk merged 3 commits into apache:master
5326dfd to b33dac2
157ef28 to 6db25b4
Run JavaPortabilityApi PreCommit
 * @return the current (already existing or computed) value associated with the specified key
 */
@SuppressWarnings("unchecked")
public <K, V> PCollectionView<Map<K, Iterable<V>>> computeViewAsMultimapIfAbsent(
Can we have a more readable name, e.g. asMultimap?
PViewsStore getPCollectionViewsStore();

@Internal
void setPCollectionViewsStore(PViewsStore pCollectionViewsStore);
I reckon we don't need a separate field in the pipeline options. It should be enough to keep it as an instance property in the BHJ translator.
Yes, I agree. But Gradle complains during the build that a setter is missing. Is there a way of circumventing it?
Following our discussion, the PViewsStore will be dropped completely.
final PCollectionView<Map<KeyT, Iterable<RightT>>> broadcastRight =
    right.apply(View.asMultimap());
return left.apply(
pViews.computeViewAsMultimapIfAbsent(right, rightKeyed);
I don't think comparing by PCollection alone is enough, as the same PCollection can be used with a different key extractor.
Agreed. But that brings back the problem of comparing lambdas for equality, for which we do not have a good enough solution. So I will use == (reference) equality and try to highlight it in the docs.
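For context, a minimal sketch of why reference equality is the fallback here. Java lambdas do not override equals, so two separately written extractors with identical logic do not compare equal; only the very same instance is guaranteed to match. The class and method names below are hypothetical and not part of Beam or this PR.

```java
import java.util.function.Function;

// Hypothetical demo class, for illustration only.
class ExtractorEquality {

  // Reference comparison: true only when both arguments are the same object.
  static boolean sameExtractor(Function<?, ?> a, Function<?, ?> b) {
    return a == b;
  }

  public static void main(String[] args) {
    Function<String, Integer> byLength = s -> s.length();
    // Same logic, but a separate lambda expression in the source.
    Function<String, Integer> byLengthAgain = s -> s.length();
    // A second reference to the first extractor.
    Function<String, Integer> alias = byLength;

    // Same instance, so this prints true.
    System.out.println(sameExtractor(byLength, alias));
    // Separate lambda expressions are usually distinct objects,
    // so this usually prints false even though the logic is identical.
    System.out.println(sameExtractor(byLength, byLengthAgain));
  }
}
```

The practical consequence is that a view can be reused only when the caller passes the same extractor instance to each join, which is worth stating in the docs.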
6db25b4 to ee805b4
…multiplication. PCollectionViews are now stored in BroadcastHashJoinTranslator. The key extractor is taken into consideration when looking for the same views.
ee805b4 to 2b90576
 * Used to prevent multiple views to the same input PCollection. And therefore multiple broadcasts
 * of the same data.
 */
private Table<PCollection<?>, UnaryFunction<?, KeyT>, PCollectionView<?>> pViews =
Better to make this field final.
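A rough sketch of the lookup such a table enables, assuming views are keyed by both the input and the key extractor, and that both keys are compared by reference. It uses nested JDK IdentityHashMaps rather than a Guava Table so that it stays self-contained. ViewCache and computeIfAbsent are hypothetical names, not the PR's actual code.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical two-key cache: (input, key extractor) -> view.
class ViewCache<V> {

  // Final field: the map itself is mutated, the reference never changes.
  private final Map<Object, Map<Object, V>> views = new IdentityHashMap<>();

  // Returns the cached view for this (input, extractor) pair,
  // calling the factory only on the first request for that pair.
  V computeIfAbsent(Object input, Object keyExtractor, Supplier<V> factory) {
    return views
        .computeIfAbsent(input, k -> new IdentityHashMap<>())
        .computeIfAbsent(keyExtractor, k -> factory.get());
  }

  public static void main(String[] args) {
    ViewCache<String> cache = new ViewCache<>();
    Object rightSide = new Object(); // stands in for the right-side input
    Object extractor = new Object(); // stands in for the key extractor

    String first = cache.computeIfAbsent(rightSide, extractor, () -> "view-1");
    // Same input and same extractor: the factory is not called again.
    String second = cache.computeIfAbsent(rightSide, extractor, () -> "view-2");
    System.out.println(first == second);
  }
}
```

Keying on the extractor as well as the input addresses the earlier review point: the same input joined with a different extractor gets its own view.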
694d6db to 1ba3efb
Run JavaPortabilityApi PreCommit
When BroadcastHashJoinTranslator is used for a LeftJoin, a PCollectionView is created from the right side (as a side input).
If the same right side is used in multiple joins, the PCollectionView is created multiple times, which is suboptimal because the same data is broadcast repeatedly.