Feature/faster join #148
Feature/faster join #148
Conversation
I can't see how this is faster than the previous solution? To me it looks like a refactor of it, so it must be same speed. I mean, the refactor may have some merit, but just trying to ensure if I get it right. Is there something about this change that would make it faster? |
Actually this should be slower than the previous solution as i removed the join ordering and changed the way the filter/ join condition is evaluated. Both made some assumptions i was not sure if they are right, so i try to go a safer route instead. On the other hand, this implementation plays nice with the existing testcases. The existing testcases check for string equivalence of DataSet.toString(). |
So we should close this PR then, if it's going to be slower? |
I do not think the changes i proposed about a year ago were merged, as they broke some tests. Or did i miss sth here? The proposed change is still ~10 times faster, depending on the dataset size. |
Sorry, I can't remember your previous change :-) What I am asking for is if you can point out why this is faster. I'm maybe just not seeing it. |
The previous method materializes the cartesian product and then filters it. |
Ah yes you're right. That should give a good boost in performance in deed! Functionally the code looks good to me. I am not 100% sure about what I think about the JoinHelper class. It's only made up of static methods which means that all it's methods could just as well be directly part of MetaModelHelper. And MetaModelHelper is IMO already a kind of a junkdrawer of static methods, so I don't think I want another :-) On the other hand, something that would maybe be nice is if you made the JoinHelper class non-static so that you would instantiate it like this:
(or something quite similar to this) I think then we would have a class that's quite nicely built and scoped such that it cannot be used out of context (statically). So that would be my suggestion for a more pretty design :-) |
My 2 cents regarding helpers (without having looked at the actual code): While any kind of utility or helper class is an anti pattern, if the JoinHelper centers around the same subject and MetaModelHelper does not, it's much preferable to keep them separate. IMHO, we'd want to remove methods from MetaModelHelper (to the point it ceases to be at some point), not add to it. |
Imho, would not refer to utility classs in general as an anti-pattern, you really have to look at it use-case dependent. In this case however, i agree that Metamodelhelper should be replaced. MetamodelHelper appears to do two things: I suggest for clearing things up, i move (a) first in a separate static class. From there on more profound ways for executing those operations can be defined. For dealing with (b): Some of them are not used in MM anyway, so they might be as well deleted in MM 6.x.x, may deprecate them in MM 5.x.x? |
Ok, this is literally meddling with the core of Metamodel. I'll refactor this pull request, such that is as minimally invasive as possible: The API doc also guarantees MetaModelHelper to be stable. For more experimental changes, i'll create an other pull request. |
removed guava
The point about using collections instead of arrays is a good old wishlist item :-) If you're up for it, I am 100% behind it. But it's a LOT of work I think. |
assertEquals(9, results.size()); | ||
assertTrue(results.contains("Row[values=[f, b]]")); | ||
assertTrue(results.contains("Row[values=[f, a]]")); | ||
assertTrue(results.contains("Row[values=[o, r]]")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're not asserting on order anymore, we should probably assert on the existance of all 9 records. Otherwise we've somehow easened the assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true.
There are a lot of Arrays.toString() comparisons in other test, which should not be there.
This PR looks very good to me. I'm gonna go ahead and format the files slightly and then merge it. |
A simple nested loop join implementation to deal with METAMODEL-1144.
This is an okay-ish solution:
it maintains the results of the joins, but wont scale too well
no join ordering or similar