New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-5867] Operator translation based on operator name #6846
[BEAM-5867] Operator translation based on operator name #6846
Conversation
@@ -34,9 +34,10 @@ dependencies { | |||
testCompile library.java.slf4j_api | |||
testCompile library.java.hamcrest_core | |||
testCompile library.java.hamcrest_library | |||
testCompile library.java.mockito_core | |||
testCompile library.java.powermock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we stick with mockito?
02b6b8d
to
5cc2cb6
Compare
5cc2cb6
to
dcd8c12
Compare
15507d0
to
cfedd93
Compare
I think this heads the right direction. We need a convenient and flexible way for user to choose which translator should be used for an operator. I'm thinking about having some kind of @mareksimunek @VaclavPlajt any thoughts about this? |
@dmvk I think it should be solved in another ticket. You could also pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please look at comments and update documentation.
* @param shortName on which should operator name start | ||
* @return builder | ||
*/ | ||
public NameBasedTranslatorProvider.Builder addShortNameTranslation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that names of parameters are mismatched. Class<? extends Operator> operator
should be operatorClass
and could be just OperatorTranslator<?, ?, ?> translator
since it is an instance.
I would also thing more about the name of the method itself. Something like addNameBasedTranslation
may better describe what the mehod does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
...rc/main/java/org/apache/beam/sdk/extensions/euphoria/core/client/operator/base/Operator.java
Show resolved
Hide resolved
* {@link Builder#addShortNameTranslation(Class, OperatorTranslator, String)} translation). If no | ||
* match, then its used {@link Builder#setDefaultTranslationProvider(TranslatorProvider)}. | ||
*/ | ||
public class NameBasedTranslatorProvider implements TranslatorProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rephrase?:
`An implementation of {@link TranslatorProvider} which allows for custom translations of hand picked operators based on operator names.
It either selects specific translation or defaults to wrapped {@link TranslatorProvider}. Selection of specific translation is based on operator's name and {@link Class}. Specific translation could be added through {@link Builder#addShortNameTranslation(Class, OperatorTranslator, String)} method during build step. Names are considered match when an operator's name starts with given prefix.`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rephrased and class link little edited
@@ -343,7 +343,7 @@ Represents left join of two (left and right) datasets on given key producing sin | |||
// KV(3, "3+rat"), KV(0, "0+null"), KV(4, "4+duck"), KV(3, "3+cat"), | |||
// KV(3, "3+rat"), KV(1, "1+X")] | |||
``` | |||
Euphoria support performance optimization called 'BroadcastHashJoin' for the `LeftJoin`. User can indicate through previous operator's output hint `.output(SizeHint.FITS_IN_MEMORY)` that output `Dataset` of that operator fits in executors memory. And when the `Dataset` is used as right input, Euphoria will automatically translated `LeftJoin` as 'BroadcastHashJoin'. Broadcast join can be very efficient when joining between skewed datasets. | |||
Euphoria support performance optimization called 'BroadcastHashJoin' for the `LeftJoin`. User can indicate by operators name that right side of left join fits in executors memory (more [NameBasedTranslation](#namebasedtranslatorprovider)). Broadcast join can be very efficient when joining between skewed datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that we should not write that this type of optimization is based on operators name. It is based on option to translate it differently. Some other implementation of TranslatorProvider
may do that choice based on different information than operators name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would link TranslationProviders
section here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edited to more generic
@@ -362,7 +362,7 @@ Dataset<KV<Integer, String>> joined = | |||
// KV(4, "4+duck"), KV(3, "3+cat"), KV(3, "3+rat"), KV(1, "1+X"), | |||
// KV(8, "null+elephant"), KV(5, "null+mouse")] | |||
``` | |||
Euphoria support performance optimization called 'Broadcast Hash Join' for the `RightJoin`. User can indicate through previous operator's output hint `.output(SizeHint.FITS_IN_MEMORY)` that output `Dataset` of that operator fits in executors memory. And when the `Dataset` is used as left input, Euphoria will automatically translated `RightJoin` as 'Broadcast Hash Join'. Broadcast join can be very efficient when joining between skewed datasets. | |||
Euphoria support performance optimization called 'Broadcast Hash Join' for the `RightJoin`. User can indicate by operators name that left side of right join fits in executors memory (more [NameBasedTranslation](#namebasedtranslatorprovider)). Broadcast join can be very efficient when joining between skewed datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as in case of LeftJoin
.
@@ -567,6 +567,19 @@ Dataset<SomeEventObject> timeStampedEvents = | |||
//Euphoria will now know event time for each event | |||
``` | |||
|
|||
## TranslationProviders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## TranslationProviders | |
## TranslationProviders | |
The fact that Euphoria API is translated to Beam Java SDK give us option to fine tune the translation itself. Euphoria uses `TranslationProvider` to decide which `OperatorTranslator` should be used. User of Euphoria API can supply it ts own `TranslationProvider` by extending `EuphoriaOptions`. Euphoria already contains some implementations. | |
#SimpleTranslatorProvider | |
Default implementation of `TranslationProvider`. It is able to give default translators for all Euphoria operators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
included
@dmvk I like the idea of chaining (anyhow) But it would require some more attention than originally anticipated in @mareksimunek scope of current work. So would also prefer to do it as different issue. |
ee94c44
to
4dedbe0
Compare
4dedbe0
to
f046acb
Compare
Looks like this needs a rebase. |
This is now obsolete by already merged refactoring. @mareksimunek Can we close it? |
Smarter why to choose operator translation based on operators name by introducing NameBaseTranslationProviderTest. Useful e.g. when we want Join operator which fits in memory translate with
BroadcastHashJoinTranslator
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.It will help us expedite review of your Pull Request if you tag someone (e.g.
@username
) to look at it.Post-Commit Tests Status (on master branch)