Improve loading speed (of regex?) - cli usecase #56
Comments
That is quite subjective :) But I agree that loading times should be improved / tracked more closely.
This would be really nice. I believe …
Regular expressions are already compiled lazily, and regex execution happens partially in parallel, so I don't think there's much potential for improvement there: nlprule/nlprule/src/utils/regex.rs, lines 70 to 80 at afc2d3e.
Are you sure that regex compilation is actually the bottleneck? I think the deserialization of the tagger from an FST could be the issue. It happens here: nlprule/nlprule/src/tokenizer/tag.rs, line 102 at afc2d3e.
So the first step here would be to add a criterion benchmark for loading the tokenizer and rules, and then check where the speed issues are. I want to focus on modularization for now, but I'm open to contributions, and if there's some identifiable bottleneck which isn't too hard to fix we can do that now (for example, the …)
Note the constraint: this is a CLI use case, and there the load time is very significant for the end user.
I've had a quick peek into the source of regex, and that task is actually non-trivial unfortunately.
Yeah, I'll add some criterion benches eventually next week.
Just making sure I am not investing time + effort for the 🗑️ 😉
Right, for CLI that is definitely true.
That would be great! Some very simple benchmark + …
I took the lazy path, and below is the current flamegraph of … Not quite sure how to approach this; I'd like to just do all the hard work in … So my question here is: how much can one expose for deserialization purposes, to make that easier to swap out as needed?
Thanks for the graph! It seems almost all of the work is indeed done in the Tagger deserialization. In general, deserializing from an FST is necessary since otherwise the size of the binary blows up, but it makes sense that it's slow. Here's the problematic code: nlprule/nlprule/src/tokenizer/tag.rs, lines 102 to 153 at 12bd52f.
I can give parallelizing this to some degree a go; that's not trivial here, but it should help. (Also, I just noticed the …)
tl;dr: need a good test case to improve this for real. My measurements really were only a few runs of cargo-spellcheck on the same file with different impls, comparing the total runtime percentage of the …

The above code snippet seems slow due to the additional allocations; in reality, the change is insignificant, and it is slightly faster as is (probably due to some cache locality) compared to:

```rust
let word_store_fst = Map::new(data.word_store_fst).unwrap();
let mut word_store = FastBiMap::<String, WordIdInt>::with_capacity_and_hashers(
    word_store_fst.len(),
    Default::default(),
    Default::default(),
);
let mut stream = word_store_fst.into_stream();
while let Some((key, value)) = stream.next() {
    if let Ok(key) = std::str::from_utf8(key) {
        word_store.insert(key.to_owned(), WordIdInt(value as u32));
    }
}
```
Some more digging revealed that the data really is very sparse. We do two allocations, for the structure and for the intermediate combinator key, where everything holds only one element in 99% of all cases. So at this point, I am tempted to say that using … which is also barely used, but in …
Thanks for looking into this!
Sorry, what's the difference between these?
IIRC the reason for … Also, what do you think about e.g. https://docs.rs/flurry/0.3.1/flurry/ for parallelization?
If the above is not fast enough, I think the best approach in this direction would be splitting the tag store into multiple FSTs + deserializing in parallel. That should speed things up roughly by a factor of the number …
One retains the current two-layer inner nesting structure; the other flattens the …
That could work, but the most significant part is not the insertion, but just calling …
I think fixing the topology of the …
Please rebase this onto … The …
That'd be much appreciated. I'll have some time tomorrow night to look into this further.
Alright, there are better doc comments for the tagger on …
This is somewhat addressed by #66 and #70. I'll keep this open because there are currently (at least) two more things worth investigating in this area: …
The biggest issue using this library currently is that on each startup a lot of regular expressions are compiled. If `regex` (or whatever crate is being used) implemented serialization of the compiled regex, this could be entirely avoided and shifted to build time / done once. The blocker here is that the `regex` crate itself does not implement `serde` serialization. In the meantime, parallel compilation of the parsed regexes would probably speed up the initial loading by a factor of $cores.