You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It would be nice if this crate offered some way to build up more complex key values before adding them to the FST. My specific use case is mapping byte sequences with embedded utf8 codepoints and byte sequences with an escaped ascii representation of those same codepoints to the same value. This is perfectly possible with the current API.
use fst::Map;let both_reps_to_same = Map::from_iter(vec![("Δ", 1), (r"\u0394", 1)]);
The problem is that the size of the set of keys you have to add will grow pretty fast. Handling 2 Deltas would be:
It seems like, even though the final fst would be compact, the work to build it up would be O(2^n). It would be nice to have some API for composing keys in an fst that could avoid this. I'm thinking something like:
let m = MapBuilder::new(...)?;let either_or = Key::alt(vec![b"Δ", b"\u0394"]);let two_deltas = either_or.concat(either_or.clone());
m.insert_complex_key(two_deltas,1);
I looked at the code a bit to try to figure out if there is some internal API that just isn't exposed, but I didn't see an obvious place to hook into. Is this something that would be reasonable for the fst crate to provide? It seems like a RegexSet would probably also work for my use case, but it would be nice to guarantee that the automata is compiled ahead of time and deterministic.
I apologize if I missed a part of the exiting API which already allows this.
What would the fst builder do that's smarter than what you can do with the public API? At the end of the day, it still has to write the keys in lexicographic order, so maybe I'm missing something, but I'm not seeing how you get out of 2^n if you want to consider all pairs of keys in a given set.
It would be nice if this crate offered some way to build up more complex key values before adding them to the FST. My specific use case is mapping byte sequences with embedded utf8 codepoints and byte sequences with an escaped ascii representation of those same codepoints to the same value. This is perfectly possible with the current API.
The problem is that the size of the set of keys you have to add will grow pretty fast. Handling 2 Deltas would be:
It seems like, even though the final fst would be compact, the work to build it up would be O(2^n). It would be nice to have some API for composing keys in an fst that could avoid this. I'm thinking something like:
I looked at the code a bit to try to figure out if there is some internal API that just isn't exposed, but I didn't see an obvious place to hook into. Is this something that would be reasonable for the fst crate to provide? It seems like a RegexSet would probably also work for my use case, but it would be nice to guarantee that the automata is compiled ahead of time and deterministic.
I apologize if I missed a part of the exiting API which already allows this.
This might be related to #58.
The text was updated successfully, but these errors were encountered: