Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed data race in failure state initialization #15

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

genbattle
Copy link

A data race was created when multiple threads entered parse_text at the
same time and failure state construction was triggered. The first thread
would enter the failure state construction function and begin
construction, setting a completion flag before initialization was
actually complete. Other threads would see the flag in its set state and
skip over the lock, proceeding to use the trie while the failure states
were still being modified by the first thread.

In addition to this race condition, there was also some possible
undefined behaviour introduced by the double-checked locking
implementation. This has been cleaned up a little with C++11 atomics to
make the implementation more portable, if a little slower. See
https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/
for more information.

@dkurzaj
Copy link

dkurzaj commented Nov 6, 2019

Hello,
I can not compile that PR.

I get the following errors:

aho_corasick.hpp:557:39: error: unknown type name 'd_mutex'; did you mean '__gnu_cxx::__mutex'?
aho_corasick.hpp:560:17: error: unknown type name 'my_type'
aho_corasick.hpp:560:27: error: 'this' argument to member function 'construct_failure_states' has type 'const aho_corasick::basic_trie<char>', but function is not marked const

@genbattle
Copy link
Author

Ah sorry, I just pished the patch I created, but there's more changes in our local copy that haven't been upstreamed, I'll add them to this PR.

@genbattle
Copy link
Author

Ok so the further changes make basic_tree::parse_text a cont method, and implement locking such that all const methods are thread safe, which is where this bug originally comes from. So this bug is specific to our fork of this library, and doesn't apply to the upstream version because there are no such guarantees around mutability or thread-safety.

I will add these changes into the PR, but it is up to you whether you would like to accept them or keep the behaviour as it currently is. I apologise for the mess, I'm picking up maintenance of this code after someone else has left so I'm still trying to get everything in order.

Nick Sarten added 3 commits November 7, 2019 08:56
These changes make parse_text a const method which should be safe to
use from mutliple threads at once. In reality there's a bug in this
implementation, so another patch will be applied on top of it to fix
this bug.
A data race was create when multiple threads entered `parse_text` at the
same time and failure state construction was triggered. The first thread
would enter the failure state construction function and begin
construction, setting a completion flag before initialization was
actually complete. Other threads would see the flag in its set state and
skip over the lock, proceeding to use the trie while the failure state
were still being modified by the first thread.

In addition to this race condition, there was also some possible
undefined behavior introduced by the double-checked locking
implementation. This has been cleaned up a little with C++11 atomics to
make the implementation more portable, if a little slower. See
https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/
for more information.
@genbattle
Copy link
Author

I have pushed all of the changes. As I have already said, you're free to choose not to take them if you don't think this functionality is valuable. It certainly seems to impact the benchmark performance of the library, which makes sense since there's some additional locking overhead.

master:

*** Aho-Corasick Benchmark ***
Generating input text ... done
Generating search patterns ... done
Generating trie ... done
Running .......... done
Results:
  loop #1, naive: 394ms, ac: 3ms
  loop #2, naive: 381ms, ac: 3ms
  loop #3, naive: 394ms, ac: 3ms
  loop #4, naive: 377ms, ac: 3ms
  loop #5, naive: 387ms, ac: 3ms
  loop #6, naive: 390ms, ac: 3ms
  loop #7, naive: 376ms, ac: 3ms
  loop #8, naive: 381ms, ac: 3ms
  loop #9, naive: 430ms, ac: 3ms
  loop #10, naive: 415ms, ac: 11536ms

this branch:

*** Aho-Corasick Benchmark ***
Generating input text ... done
Generating search patterns ... done
Generating trie ... done
Running .......... done
Results:
  loop #1, naive: 463ms, ac: 3ms
  loop #2, naive: 638ms, ac: 3ms
  loop #3, naive: 748ms, ac: 4ms
  loop #4, naive: 686ms, ac: 4ms
  loop #5, naive: 641ms, ac: 4ms
  loop #6, naive: 713ms, ac: 3ms
  loop #7, naive: 799ms, ac: 4ms
  loop #8, naive: 813ms, ac: 4ms
  loop #9, naive: 872ms, ac: 4ms
  loop #10, naive: 919ms, ac: 16803ms

@dkurzaj
Copy link

dkurzaj commented Nov 7, 2019

Thank you! It seems to work very well for my use case too.

And it also probably fixes: #13. 😃

I'm not owner nor reviewer, but I would say it's a good improvement. If my feedback can help merge this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants