Add a tokenize function. #1479

Closed
nbp opened this issue Jul 22, 2017 · 1 comment · Fixed by #1516

Comments

@nbp
Member

nbp commented Jul 22, 2017

Previously, we could build a tokenizer by abusing the construction of a large regexp. Since the Nix 1.12 changes, however, we are limited in the number of states that can be created, and therefore to quadratic algorithms, because std::regex_match only accepts full matches.

Tokenizers look for a finite vocabulary, and thus need only a limited set of states. We could add a tokenize function which, given a regex, iterates over continuous matches of tokens and returns a list of matches, or offers a way to fold over these matches as they appear.

Doing so would help solve the problem seen in mozilla/nixpkgs-mozilla#40, which currently has no good solution.

One way to implement it would be to use std::regex_iterator ( http://en.cppreference.com/w/cpp/regex/regex_iterator/regex_iterator ), possibly passing the std::regex_constants::match_continuous flag.
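
A minimal sketch of that idea, to make the proposal concrete. The `tokenize` name and signature here are hypothetical and are not an existing Nix function; the sketch also assumes the token regex never matches the empty string. It only relies on `std::sregex_iterator` and the `match_continuous` flag from the standard library:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Nix): collects the consecutive tokens
// matched by `tokenRe`, starting at the beginning of `input`.
// Assumption: `tokenRe` cannot match the empty string.
std::vector<std::string> tokenize(const std::regex & tokenRe, const std::string & input)
{
    std::vector<std::string> tokens;
    // match_continuous requires every match to begin exactly where the
    // search starts, i.e. at the end of the previous token. Iteration
    // therefore stops at the first position where no token matches.
    std::sregex_iterator it(input.begin(), input.end(), tokenRe,
                            std::regex_constants::match_continuous);
    const std::sregex_iterator end;
    for (; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}
```

For example, `tokenize(std::regex("[a-z]+|[0-9]+|\\s+"), "abc 123 def")` would yield the five tokens `abc`, a space, `123`, a space, and `def`. A caller could detect untokenizable input by comparing the summed token lengths with the input length. A fold-style variant would take a callback instead of pushing into a vector.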

@taktoa
Member

taktoa commented Aug 2, 2017

Relevant: #1491
