translateHashJoin should allow Joiner subclasses #46
Comments
Hi @themodernlife, thanks for reporting this issue. I read your blog post and I'm happy that you are trying to run Scalding on Flink. This is very exciting and a good validation for Cascading-Flink. Cascading on MR or Tez uses its own implementation for joins. This implementation is generic and capable of handling all supported join types (inner, left outer, right outer, full outer, and custom joins) by handing the result of a full outer join to the Joiner class. For example, an InnerJoin discards all results with an empty left or right side. Cascading-Flink leverages Flink's internal join implementations, which are only available for inner and left joins. Therefore, Cascading-Flink needs to check the Joiner type to make sure that the Joiner is called with all tuple pairs that will be in the join result. Hence the restriction to InnerJoin and LeftJoin. I think this issue can be solved by adding a Scalding dependency to Cascading-Flink and checking the type of the wrapped joiner if a WrappedJoiner is found. Please report any other problem you find when using Cascading-Flink, with Scalding or without.
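To illustrate that idea with a simplified, hypothetical model (these are stand-in types for illustration only, not the actual Cascading `Joiner` API): an inner join can be obtained from a full outer join result by discarding every pair that has an empty side.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in model of "hand the full outer join result to the Joiner" as described
// above. These types are illustrative only and do not match the real
// cascading.pipe.joiner.Joiner interface.
class FullOuterRow {
    final Object left;   // null if the left side had no match
    final Object right;  // null if the right side had no match
    FullOuterRow(Object left, Object right) { this.left = left; this.right = right; }
}

interface SimpleJoiner {
    // Decide which rows of the full outer join survive in the final result.
    List<FullOuterRow> join(List<FullOuterRow> fullOuter);
}

// An "inner join" in this model simply drops rows with an empty left or right side.
class SimpleInnerJoin implements SimpleJoiner {
    public List<FullOuterRow> join(List<FullOuterRow> fullOuter) {
        List<FullOuterRow> result = new ArrayList<>();
        for (FullOuterRow row : fullOuter) {
            if (row.left != null && row.right != null) {
                result.add(row);
            }
        }
        return result;
    }
}
```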
@fhueske can I suggest a different approach? Depending on Scalding (and its associated dependencies) might be a no-go for Cascading Java users. What if instead you just added a configuration parameter? I think this way Scalding users could say "trust me, I know what I'm doing".
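As a sketch of that suggestion (the property key below is purely hypothetical and is not an existing Cascading-Flink option), the escape hatch could be a flag passed in through the usual Cascading properties:

```java
import java.util.Properties;

public class SubmitWithCustomJoiners {
    public static void main(String[] args) {
        Properties props = new Properties();

        // Hypothetical flag: accept any Joiner subclass on a HashJoin and trust
        // the user that it behaves like an inner join.
        props.setProperty("flink.hashjoin.allow-custom-joiners", "true");

        // The properties map would then be handed to the flow connector when the
        // Cascading/Scalding flow is submitted, like any other Cascading property.
    }
}
```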
Hi @themodernlife, sorry for the late reply. I just pushed a fix that extracts the wrapped joiner via Java reflection without adding a Scalding dependency. I'm not experienced with Scalding, but I tested it with a Java mock-up. Please let me know if that solves your problem. Also, let me know if you face any other issue. Thanks, Fabian
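A reflection-based unwrapping along those lines might look roughly like the sketch below; the lookup by class name and by field type are assumptions for illustration and may not match the actual commit.

```java
import cascading.pipe.joiner.Joiner;
import java.lang.reflect.Field;

public class JoinerUtil {

    // Best-effort unwrapping of a Scalding WrappedJoiner via reflection, so the
    // planner can see the Joiner that is actually wrapped inside it. The class
    // name check and field lookup here are assumptions for illustration only.
    static Joiner unwrap(Joiner joiner) {
        if (!joiner.getClass().getName().endsWith("WrappedJoiner")) {
            return joiner;
        }
        for (Field field : joiner.getClass().getDeclaredFields()) {
            if (Joiner.class.isAssignableFrom(field.getType())) {
                try {
                    field.setAccessible(true);
                    Object wrapped = field.get(joiner);
                    if (wrapped != null) {
                        return (Joiner) wrapped;
                    }
                } catch (IllegalAccessException e) {
                    // Fall through and return the original joiner.
                }
            }
        }
        return joiner;
    }
}
```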
I ran a Scalding job with a
When running Scalding jobs on Flink (more info here: http://themodernlife.github.io/scala/hadoop/hdfs/sclading/flink/streaming/realtime/2015/12/20/running-scalding-jobs-on-apache-flink/) I noticed that only one type of join worked. Unfortunately, a lot of jobs are going to use something like `joinWithTiny`, and currently this fails on Flink.
The reason is that in the Scalding code for `joinWithTiny`, the `Joiner` has been wrapped with a `WrappedJoiner`. In Cascading-Flink, I thought it would be as simple as replacing `joiner.getClass().equals(InnerJoin.class)` with `joiner.getClass().isAssignableFrom(InnerJoin.class)`, but that's not the case, since `WrappedJoiner` extends `Joiner` and not `InnerJoin` (or `LeftJoin`). Not sure if there is a simple solution, but wanted to file the bug in case anyone has any ideas!
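A small stand-alone Java mock-up with stand-in classes (mirroring the hierarchy described above, not the real Cascading or Scalding classes) shows why neither check matches the wrapped joiner:

```java
// Stand-in classes that mirror the hierarchy described in this issue:
// WrappedJoiner extends Joiner directly, not InnerJoin or LeftJoin.
class Joiner {}
class InnerJoin extends Joiner {}
class WrappedJoiner extends Joiner {
    final Joiner wrapped;
    WrappedJoiner(Joiner wrapped) { this.wrapped = wrapped; }
}

public class JoinerCheckDemo {
    public static void main(String[] args) {
        Joiner joiner = new WrappedJoiner(new InnerJoin());

        // The current check: only matches a plain, unwrapped InnerJoin.
        System.out.println(joiner.getClass().equals(InnerJoin.class));            // false

        // The proposed check: still false, because InnerJoin is not a subtype
        // of WrappedJoiner (nor the other way around).
        System.out.println(joiner.getClass().isAssignableFrom(InnerJoin.class));  // false

        // Only by unwrapping can the planner see the InnerJoin inside.
        Joiner unwrapped = ((WrappedJoiner) joiner).wrapped;
        System.out.println(unwrapped.getClass().equals(InnerJoin.class));         // true
    }
}
```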