Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove sentinels as they race with multi threading #217

Merged
merged 12 commits into from Mar 29, 2019
Merged

Remove sentinels as they race with multi threading #217

merged 12 commits into from Mar 29, 2019

Conversation

niklas88
Copy link
Member

Also add some TODOs for ideas how to get joins faster and we may still remove the gotos

}

while (a(i, jc1) == b(j, jc2)) {
// In case of match, create cross-product
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken, the inner while loop that handles the cross-product is always the same
for all join-related functions. With inlining etc. I should not cost performance, and if we can measure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only problem with this is that it manipulates i and j and thus we would either have to pass them as size_t* or have it return e.g. a std::pair<size_t, size_t> and kind of awkwardly update i and j with that. Sadly one can't assign in a structured binding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of killing code duplication, I think we should try to merge the two galloping cases. I'll give it a try

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: We could handle the sentinels inside the IdTable
by using a finalizeSentinels class, that writes a maximums beyond bounds. Afterwards we have the sentinel in the off-by-one area and can use them with the old code.

if (l2(j, jc2) > l1(i, jc1)) {
Id* val = new Id[l1.cols()];
val[jc1] = l2(j, jc2);
i = std::lower_bound(l1.begin() + i, l1.end(), val,
[jc1](const auto& l, const auto& r) -> bool {
return l[jc1] < r[jc1];
}) -
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an even better idea:

Id val = l2(j, jc2);
 i = std::lower_bound(l1.begin() + i, l1.end(), val,
                             [jc1](const auto& l, const auto& r) -> bool {
                               return l[jc1] < r;
                             }) -
  • No new
  • no constexpr
  • always efficient and sufficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I've included this and also tried (see other comment) that this isn't what made the non-sentinel version faster.

@niklas88
Copy link
Member Author

Idea: We could handle the sentinels inside the IdTable
by using a finalizeSentinels class, that writes a maximums beyond bounds. Afterwards we have the sentinel in the off-by-one area and can use them with the old code.

Let's first see how much performance the removal of sentinels costs us. I think while the code looks cleaner they are also harder to reason about so they really need to pay for themselves with performance

@niklas88
Copy link
Member Author

niklas88 commented Mar 28, 2019

@joka921 ok this is interesting. Turns out, this branch is actually 22% faster than current master and still 18% faster than the master branch after applying your change to the use of std::lower_bound

With sentinels and new/delete:

Average completion moves: 4.52 (vs 8.51 manually)
Executed 272 queries in 15032.706 seconds
with the following timings

Average: 55.27 ms/query
Max: 837.45 ms/query
90th percentile: 106.33 ms/query
70th percentile: 71.68 ms/query
50th percentile: 8.52 ms/query

With sentinels but no new/delete:

Average completion moves: 4.52 (vs 8.51 manually)
Executed 272 queries in 14579.671 seconds
with the following timings

Average: 53.60 ms/query
Max: 833.57 ms/query
90th percentile: 109.39 ms/query
70th percentile: 70.67 ms/query
50th percentile: 7.94 ms/query

Without sentinels:

Average completion moves: 4.52 (vs 8.51 manually)
Executed 287 queries in 13004.729 seconds
with the following timings

Average: 45.31 ms/query
Max: 710.91 ms/query
90th percentile: 93.13 ms/query
70th percentile: 27.60 ms/query
50th percentile: 5.30 ms/query

So it looks like sentinels are actually bad for performance. I guess without them the compiler just knows more and the branch predictor has no problems guessing the branches that one safes anyway.

@niklas88
Copy link
Member Author

216 fewer lines of code to break with all the same features but ~22% faster. Inflationary…

@niklas88
Copy link
Member Author

I've found another broken use of sentinels in the Text stuff. Will have to fix that tomorrow as the straightforward versions doesn't seem to work yet

@@ -1312,7 +1312,6 @@ void Index::scanOSP(const string& object, IdTable* result) const {
void Index::scanPSO(Id predicate, IdTable* result) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finding, but you probably will run into trouble after my reworking of the index class.
Maybe cancel this change, then rebase, then add it again:)

@niklas88 niklas88 requested a review from joka921 March 29, 2019 16:12
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,
I have one suggestion and 2-3 questions for clarification.

@@ -472,28 +459,20 @@ void OptionalJoin::optionalJoin(const IdTable& dynA, const IdTable& dynB,
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code above you could also replace ==x.size() by >=x.size to be consistent.

_serverSocket.close();
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably never a problem if the Server is the only thing running in a problem but nevertheless always important to clean up open network stuff.

++nofCodeWordsDone;
if (nofElementsDone < nofElements) {
selector = encoded[nofCodeWordsDone] & SIMPLE8B_SELECTOR_MASK;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring!
Just to my understanding:

  • The loops are almost equivalent, yours is just better readable.
  • The original code had an off-by-one error (selector = encoded[nofCodeWordsDone] & SIMPLE8B_SELECTOR_MASK; was also executed one last time but never used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's how I understand it as well

return ::bind(_fd, res->ai_addr, res->ai_addrlen) != -1;
bool success = ::bind(_fd, res->ai_addr, res->ai_addrlen) != -1;
freeaddrinfo(res);
return success;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I don't understand enough of manual socket programming to properly review this line.

@niklas88 niklas88 merged commit 08047fa into ad-freiburg:master Mar 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants