Cache embedding regexes #730

Merged
merged 6 commits into from May 9, 2021

alexmaco
Contributor

@alexmaco alexmaco commented Feb 26, 2021

Fixes #616

This runs the embedding regexes once per line, and caches their results. The state machine then only tries to look up results by the index.
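For readers unfamiliar with the approach, here is a minimal sketch of the idea, not the code in this PR. It assumes a hypothetical `LineCache` type and uses plain substring search (`str::match_indices`) as a stand-in for the real embedding regexes, so that it depends only on the standard library. The names `Embedding`, `LineCache`, `scan`, `at`, and `count_openers` are all illustrative assumptions.

```rust
use std::collections::BTreeMap;

/// Hypothetical kinds of embedded-language openers (not tokei's actual types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Embedding {
    Script,
    Style,
}

/// Per-line cache: byte offset where an opener starts -> (kind, end offset).
struct LineCache {
    starts: BTreeMap<usize, (Embedding, usize)>,
}

impl LineCache {
    /// Run every pattern over the line exactly once and record all matches.
    /// Substring search stands in for the regexes here (an assumption).
    fn scan(line: &str) -> Self {
        let patterns = [("<script", Embedding::Script), ("<style", Embedding::Style)];
        let mut starts = BTreeMap::new();
        for &(pat, kind) in patterns.iter() {
            for (idx, m) in line.match_indices(pat) {
                starts.insert(idx, (kind, idx + m.len()));
            }
        }
        LineCache { starts }
    }

    /// Cheap lookup used by the state machine: is there an opener at `index`?
    fn at(&self, index: usize) -> Option<(Embedding, usize)> {
        self.starts.get(&index).copied()
    }
}

/// Toy "state machine" that walks the line and consults the cache by index
/// instead of re-running the patterns at every position.
fn count_openers(line: &str) -> usize {
    let cache = LineCache::scan(line);
    let mut i = 0;
    let mut found = 0;
    while i < line.len() {
        match cache.at(i) {
            Some((_, end)) => {
                found += 1;
                i = end; // skip past the matched opener
            }
            None => i += 1,
        }
    }
    found
}
```

The point of the design is that the patterns run once per line, while the per-position check becomes a map lookup.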

This also adds a test suite dedicated to embedding. It contains a single test file, lifted from the Firefox 85 codebase, because adjusting the existing HTML test to reproduce some of the problems seemed more complicated.

As a side effect of these changes, counting has become more accurate around embedding (e.g. tokei previously did not detect JavaScript at all in the included embedding test), so the precise behavior is expected to differ from the current master.

There are a few more miscounting issues lurking in there, but they seem to be preexisting. Eventually the last commit that simplifies the embedding test should be reverted.

@XAMPPRocky
Owner

Thank you for your PR! I will try to review this sometime soon.

@XAMPPRocky
Owner

Thank you for your PR! I think there's a small amount of cleanup that might be needed in the new module, but I'm still going to merge this, and we can iterate further with future PRs.

@XAMPPRocky XAMPPRocky merged commit cd7bd6d into XAMPPRocky:master May 9, 2021
kornysietsma pushed a commit to kornysietsma/tokei that referenced this pull request Nov 16, 2022
* Run cargo fmt and apply clippy lints from rust 1.51.0

* Rename can_perform_single_line_analisys to try_perform_single_line_analisys and streamline body

* Remove ops::Add impl for CodeStats; only AddAssign is meaningful and used

* Add embedding module to cache heuristic regexes used to detect embedded languages

* Add separate embedding tests; Not all cases can be exercised with a single test

* Simplify html embedding test file_triggeringprincipal_frame_1.html until quoting issues are resolved

Successfully merging this pull request may close these issues.

Infinite loop on some encodings
2 participants