Previously we had a way to build a tokenizer by abusing the construction of a large regexp, but due to Nix 1.12 changes we are limited in the number of states created, and thus to quadratic algorithms, because of the full-match nature of `std::regex_match`.
Tokenizers work by recognizing a finite vocabulary, and thus need only a limited set of states. We could add a `tokenize` function which, given a regex, iterates over contiguous token matches and either returns the list of matches or provides a way to fold over these matches as they appear.
Doing so would help solve the problem seen at mozilla/nixpkgs-mozilla#40, which currently has no good solution.
One way to implement it would be to use http://en.cppreference.com/w/cpp/regex/regex_iterator/regex_iterator , possibly with the `std::regex_constants::match_continuous` flag.