Remove sentinels as they race with multi threading #217
} | ||
|
||
while (a(i, jc1) == b(j, jc2)) { | ||
// In case of match, create cross-product |
If I am not mistaken, the inner while loop that handles the cross-product is always the same
for all join-related functions. With inlining etc., factoring it out should not cost performance, and if in doubt we can measure.
The only problem with this is that it manipulates i and j, and thus we would either have to pass them as size_t* or have it return e.g. a std::pair<size_t, size_t> and kind of awkwardly update i and j with that. Sadly one can't assign in a structured binding.
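One way to sidestep the structured-binding limitation is `std::tie`, which assigns a returned pair to existing variables. The sketch below is only an illustration under simplifying assumptions: `Column`, `Matches`, `emitCrossProduct` and `mergeJoin` are hypothetical names, and the inputs are single sorted columns rather than an IdTable.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

using Column = std::vector<int>;  // assumption: one sorted join column
using Matches = std::vector<std::pair<std::size_t, std::size_t>>;

// Hypothetical shared helper. Expects a[i] == b[j] on entry, emits the
// cross product of the two runs of equal keys as (row in a, row in b)
// and returns the indices just past both runs.
std::pair<std::size_t, std::size_t> emitCrossProduct(
    const Column& a, const Column& b, std::size_t i, std::size_t j,
    Matches* out) {
  const int key = a[i];
  std::size_t endA = i;
  while (endA < a.size() && a[endA] == key) ++endA;
  std::size_t endB = j;
  while (endB < b.size() && b[endB] == key) ++endB;
  for (std::size_t x = i; x < endA; ++x) {
    for (std::size_t y = j; y < endB; ++y) {
      out->emplace_back(x, y);
    }
  }
  return {endA, endB};
}

// Sentinel-free merge join: every access is guarded by a bounds check.
Matches mergeJoin(const Column& a, const Column& b) {
  Matches out;
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < a.size() && j < b.size()) {
    if (a[i] < b[j]) {
      ++i;
    } else if (b[j] < a[i]) {
      ++j;
    } else {
      // std::tie assigns to the existing i and j, which a structured
      // binding cannot do.
      std::tie(i, j) = emitCrossProduct(a, b, i, j, &out);
    }
  }
  return out;
}
```

Whether this costs anything compared to the hand-inlined loops would still have to be measured on the real join code.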
Speaking of killing code duplication, I think we should try to merge the two galloping cases. I'll give it a try
Idea: We could handle the sentinels inside the IdTable by using a finalizeSentinels class that writes maximum values beyond the bounds. Afterwards we have the sentinels in the off-by-one area and can use them with the old code.
src/engine/Engine.h
Outdated
if (l2(j, jc2) > l1(i, jc1)) { | ||
Id* val = new Id[l1.cols()]; | ||
val[jc1] = l2(j, jc2); | ||
i = std::lower_bound(l1.begin() + i, l1.end(), val, | ||
[jc1](const auto& l, const auto& r) -> bool { | ||
return l[jc1] < r[jc1]; | ||
}) - |
I have an even better idea:
Id val = l2(j, jc2);
i = std::lower_bound(l1.begin() + i, l1.end(), val,
[jc1](const auto& l, const auto& r) -> bool {
return l[jc1] < r;
}) -
- No new
- no constexpr
- always efficient and sufficient.
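The suggestion relies on `std::lower_bound` accepting a comparator whose two arguments have different types: it is always called as `comp(element, value)`, so the searched value can be a plain Id instead of a dummy row. A minimal self-contained sketch, where `Id`, `Row` and `gallopTo` are stand-ins assumed for illustration and not the project's actual types:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

using Id = unsigned long long;  // assumption: stand-in for the real Id type
using Row = std::array<Id, 2>;  // assumption: stand-in for an IdTable row

// Returns the index of the first row at or after `from` whose join column
// `jc` is not less than `target`, or rows.size() if there is none.
std::size_t gallopTo(const std::vector<Row>& rows, std::size_t from,
                     std::size_t jc, Id target) {
  auto first = rows.begin() + static_cast<std::ptrdiff_t>(from);
  // The comparator compares a row with a bare Id, so no temporary row
  // has to be allocated with new.
  auto it = std::lower_bound(
      first, rows.end(), target,
      [jc](const Row& row, Id value) { return row[jc] < value; });
  return static_cast<std::size_t>(it - rows.begin());
}
```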
Good catch. I've included this and also verified (see other comment) that this isn't what made the non-sentinel version faster.
Let's first see how much performance the removal of sentinels costs us. I think while the code looks cleaner with them, they are also harder to reason about, so they really need to pay for themselves with performance.
@joka921 ok this is interesting. Turns out, this branch is actually 22% faster than current master and still 18% faster than the master branch after applying your change to the use of
With sentinels and new/delete:
With sentinels but no new/delete:
Without sentinels:
So it looks like sentinels are actually bad for performance. I guess without them the compiler just knows more, and the branch predictor has no problems guessing the branches that one saves anyway.
216 fewer lines of code to break with all the same features but ~22% faster. Inflationary…
I've found another broken use of sentinels in the Text stuff. Will have to fix that tomorrow as the straightforward version doesn't seem to work yet.
src/index/Index.cpp
Outdated
@@ -1312,7 +1312,6 @@ void Index::scanOSP(const string& object, IdTable* result) const { | |||
void Index::scanPSO(Id predicate, IdTable* result) const { |
Nice finding, but you probably will run into trouble after my reworking of the index class.
Maybe cancel this change, then rebase, then add it again:)
Also add some TODOs for ideas how to get joins faster
Also without the delete we can just return
I decided to test this with valgrind and found an error in join() but also some in Simpl8bCode, Filter and the Server.
LGTM,
I have one suggestion and 2-3 questions for clarification.
@@ -472,28 +459,20 @@ void OptionalJoin::optionalJoin(const IdTable& dynA, const IdTable& dynB, | |||
} | |||
} | |||
} | |||
|
In the code above you could also replace ==x.size() by >=x.size() to be consistent.
_serverSocket.close(); | ||
} | ||
} | ||
|
Probably never a problem if the Server is the only thing running in a process, but nevertheless always important to clean up open network stuff.
++nofCodeWordsDone; | ||
if (nofElementsDone < nofElements) { | ||
selector = encoded[nofCodeWordsDone] & SIMPLE8B_SELECTOR_MASK; | ||
} |
Nice refactoring!
Just to my understanding:
- The loops are almost equivalent, yours is just better readable.
- The original code had an off-by-one error (
selector = encoded[nofCodeWordsDone] & SIMPLE8B_SELECTOR_MASK;
was also executed one last time but never used.
Yes, that's how I understand it as well
return ::bind(_fd, res->ai_addr, res->ai_addrlen) != -1; | ||
bool success = ::bind(_fd, res->ai_addr, res->ai_addrlen) != -1; | ||
freeaddrinfo(res); | ||
return success; |
- I don't understand enough of manual socket programming to properly review this line.
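For context on that line: `getaddrinfo` allocates the result list, and the caller has to release it with `freeaddrinfo`, so returning the result of `::bind` directly leaks the list. The diff fixes this by storing the result, freeing, then returning. A hedged alternative sketch of the same cleanup via RAII follows, assuming a POSIX system; `AddrInfoDeleter`, `AddrInfoPtr`, `resolvePassive` and `bindSocket` are hypothetical names, not the project's code.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <memory>
#include <string>

// Frees a getaddrinfo() result list when the owning pointer goes out of scope.
struct AddrInfoDeleter {
  void operator()(addrinfo* p) const { freeaddrinfo(p); }
};
using AddrInfoPtr = std::unique_ptr<addrinfo, AddrInfoDeleter>;

// Resolves the wildcard address for a listening TCP socket on `port`.
// Returns an empty pointer if getaddrinfo() fails.
AddrInfoPtr resolvePassive(int port) {
  addrinfo hints{};
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_PASSIVE | AI_NUMERICSERV;
  const std::string service = std::to_string(port);
  addrinfo* res = nullptr;
  if (getaddrinfo(nullptr, service.c_str(), &hints, &res) != 0) {
    res = nullptr;
  }
  return AddrInfoPtr(res);
}

// Binds `fd` to `port`. The address list is released on every return path.
bool bindSocket(int fd, int port) {
  AddrInfoPtr res = resolvePassive(port);
  if (!res) {
    return false;
  }
  return ::bind(fd, res->ai_addr, res->ai_addrlen) != -1;
}
```

The explicit store-free-return version in the diff is equivalent for this single return path; the RAII form mainly helps if more early returns are added later.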
Also add some TODOs for ideas how to get joins faster and we may still remove the gotos