Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-2106] [runtime] add Left-, Right- and Full Outer Join drivers #1052

Closed
wants to merge 3 commits into from

Conversation

r-pogalz
Copy link

This PR includes the implementation of 3 drivers for the specific outer joins and adapts the DriverStrategy for the merge based outer joins.

To make the outer join operators implemented in ticket FLINK-2105 visible to the Java and Scala API, we have started with the integration of the merge based outer join operators bottom up. To reduce duplicated code we implemented an AbstractOuterJoinDriver which is responsible for most parts of the drivers setup and running the operators. The specific outer join driver implementations (LeftOuterJoinDriver, RightOuterJoinDriver, FullOuterJoinDriver) set the corresponding Outer Join Iterators based on the ExecutionConfig. Therefore, we had to adapt the DriverStrategy adding three additional enum fields (LEFT_OUTER_MERGE, RIGHT_OUTER_MERGE, FULL_OUTER_MERGE).
Currently, we only support the outer join on a merged based strategy, but hash based will follow (FLINK-2107).

Furthermore, DriverStrategy.MERGE has been renamed to DriverStrategy.INNER_MERGE as requested by @fhueske.

Next steps of this issue will be the optimizer and Java/Scala API parts.

@r-pogalz r-pogalz changed the title [FLINK-2106] Runtime part [FLINK-2106] [runtime] add Left-, Right- and Full Outer Join drivers Aug 24, 2015
@chiwanpark
Copy link
Member

Hi, Thanks for your contribution. I'll review this and leave comments.

@tillrohrmann
Copy link
Contributor

Thanks for your contribution @r-pogalz. Could you please add some more details about the implementation to the PR description.

final TypeComparator<IT2> comparator2 = this.taskContext.getDriverComparator(1);

final TypePairComparatorFactory<IT1, IT2> pairComparatorFactory = config.getPairComparatorFactory(
this.taskContext.getUserCodeClassLoader());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line break

@tillrohrmann
Copy link
Contributor

Really good work @r-pogalz. I had only some minor comments concerning style and test cases.

I like your approach to split the implementation of FLINK-687 into multiple parts. This makes it far easier to review. Concerning the description of FLINK-2106, you haven't integrated the outer sort merge join into the optimizer and the API, yet. I guess this will happen as a next step. Maybe you can update the description of FLINK-2106 accordingly.

Other than that, the PR looks good to me :-)

@r-pogalz
Copy link
Author

Hi @tillrohrmann, thanks for your review and comments. I will work through it soon.

Regarding the ticket FLINK-2106, @fhueske suggested us to create multiple PRs for runtime, optimizer and API integration. I guess it is an unusual way, but is it possible to have more than one PR for an issue? If not, I will change the description of FLINK-2106 as you said and create two more tasks for the optimizer and API part.

@tillrohrmann
Copy link
Contributor

Hi @r-pogalz, usually you only have one PR for a ticket. Thus, it would be good to create two new sub-tasks for the API and the optimizer integration.

@fhueske
Copy link
Contributor

fhueske commented Aug 26, 2015

@r-pogalz I would implement API and optimizer together in one issue. IMO it is easier to start from the API and then work your way through API to optimizer and finally use your runtime code. However, you cannot add the API without adding the optimizer part. So, I would do both in one issue.

final double fractionAvailableMemory = config.getRelativeMemoryDriver();
final int numPages = memoryManager.computeNumberOfPages(fractionAvailableMemory);

// test minimum memory requirements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated comment

@r-pogalz r-pogalz force-pushed the feature/FLINK-2106-Runtime branch 2 times, most recently from a1331e3 to 46f825e Compare August 29, 2015 21:39
@r-pogalz
Copy link
Author

Hi @tillrohrmann, @fhueske and @chiwanpark,
thanks for your reviews and comments. I went through the classes and changed the formatting. Moreover, I refactored the tests so that exceptions are not hidden somewhere on the console anymore and changed the testCancel*() methods such that the inputs are infinite now to prevent the driver finishing before cancel() is called.

addInput(new UniformIntTupleGenerator(keyCnt2, valCnt2, true), this.serializer);

testDriver(testTask, MockFailingJoinStub.class);
Assert.fail("Driver did not forward Exception.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this assertion.
The test will fail if the test method does not throw an ExpectedTestException.

@fhueske
Copy link
Contributor

fhueske commented Sep 3, 2015

Hi @r-pogalz, @jkovacs,

thanks for the update. It's good to merge, IMO.
Either you resolve my minor comments or I can do it before merging.

Anybody else whats to have a look at this PR?

@chiwanpark
Copy link
Member

Looks good to merge. 👍

@StephanEwen
Copy link
Contributor

I just saw that the OuterJoin runtime classes/tests cause a lot of warnings when compiling. Manyly raw types, unchecked casts.

Would be good to fix those. In general, we strive for as little warnings as possible. Generic warnings should be fixed or suppressed (if impossible to fix). Otherwise they crowd the warnings output so that the actual critical warnings get lost.

@r-pogalz
Copy link
Author

r-pogalz commented Sep 4, 2015

Thanks for you comment @StephanEwen, good to know that. I tried to minimize the occurrences of warnings where possible by fixing or suppressing it for this issue. Should I also fix it for the previous issue (runtime OuterJoin operators and tests) and commit it together with this one?

@StephanEwen
Copy link
Contributor

Yeah, adding warnings fix to this PR is good.

@r-pogalz
Copy link
Author

r-pogalz commented Sep 7, 2015

The warnings fix commit is ready for review.

@fhueske
Copy link
Contributor

fhueske commented Sep 7, 2015

Looks good to me. Will merge this tomorrow, if nobody objects.
Thanks @r-pogalz and @jkovacs!

@StephanEwen
Copy link
Contributor

I did not go through all details, but from what I read it was good. Go ahead from my side...

@fhueske
Copy link
Contributor

fhueske commented Sep 8, 2015

Will merge this now

@asfgit asfgit closed this in 2da82f9 Sep 8, 2015
@fhueske
Copy link
Contributor

fhueske commented Sep 8, 2015

Thanks @r-pogalz and @jkovacs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants