Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: add union and concatination for keys #74

Closed
ethanpailes opened this issue Nov 4, 2018 · 2 comments
Closed

Feature Request: add union and concatination for keys #74

ethanpailes opened this issue Nov 4, 2018 · 2 comments

Comments

@ethanpailes
Copy link

It would be nice if this crate offered some way to build up more complex key values before adding them to the FST. My specific use case is mapping byte sequences with embedded utf8 codepoints and byte sequences with an escaped ascii representation of those same codepoints to the same value. This is perfectly possible with the current API.

use fst::Map;
let both_reps_to_same = Map::from_iter(vec![("Δ", 1), (r"\u0394", 1)]);

The problem is that the size of the set of keys you have to add will grow pretty fast. Handling 2 Deltas would be:

use fst::Map;
let both_reps_to_same = Map::from_iter(vec![(r"ΔΔ", 1), (r"Δ\u0394", 1), (r"\u0394Δ", 1), (r"\u0394\u0394", 1)]);

It seems like, even though the final fst would be compact, the work to build it up would be O(2^n). It would be nice to have some API for composing keys in an fst that could avoid this. I'm thinking something like:

let m = MapBuilder::new(...)?;
let either_or = Key::alt(vec![b"Δ", b"\u0394"]);
let two_deltas = either_or.concat(either_or.clone());
m.insert_complex_key(two_deltas, 1);

I looked at the code a bit to try to figure out if there is some internal API that just isn't exposed, but I didn't see an obvious place to hook into. Is this something that would be reasonable for the fst crate to provide? It seems like a RegexSet would probably also work for my use case, but it would be nice to guarantee that the automata is compiled ahead of time and deterministic.

I apologize if I missed a part of the exiting API which already allows this.

This might be related to #58.

@BurntSushi
Copy link
Owner

What would the fst builder do that's smarter than what you can do with the public API? At the end of the day, it still has to write the keys in lexicographic order, so maybe I'm missing something, but I'm not seeing how you get out of 2^n if you want to consider all pairs of keys in a given set.

@ethanpailes
Copy link
Author

Probably nothing. It sounds like I didn't understand the internal data structures of this crate well enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants