Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporary thread safety fix for ICUTokenizer/Locking patches for AttributeSource #328

Merged
merged 9 commits into from Aug 24, 2020

Conversation

NightOwl888
Copy link
Contributor

This patch fixes thread safety with ICUTokenizer and makes it pass the TestRandomStrings and TestRandomHugeStrings of both TestICUTokenizer and TestICUTokenizerCJK test classes (see #269). This serves as proof that thread safety is broken in ICU4N's BreakIterator and the patch can be reverted once the thread safety issues have been addressed in ICU4N.

The ThaiTokenizer is also patched slightly better here, but it still occasionally fails the TestRandomStrings and TestRandomHugeStrings tests (see #269).

This PR also contains some performance improvements for AttributeSource that make the tests less likely to run slow due to threading contention issues or unnecessary dictionary lookups.

…try catch and replaced ContainsKey with TryGetValue
…make the get/update process of creating an attribute WeakReference atomic
… Use external lock for better performance and removed redundant GetOrAdd() call
… thread to manipulate the BreakIterator at a time. This helps, but is only a partial fix.
…w a single thread to manipulate the BreakIterator at a time. This can be reverted when the BreakIterator issue is fixed.
@NightOwl888 NightOwl888 merged commit 6eba1e0 into apache:master Aug 24, 2020
@NightOwl888 NightOwl888 added this to the 4.8.0-beta00012 milestone Sep 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant