Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid allocating in lex_decimal #7252

Merged
merged 2 commits into from
Sep 11, 2023
Merged

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Sep 9, 2023

Summary

lex_decimal_number shows up as one of the most expensive operations during lexing. This is partially due to it allocating a string before calling into f64::parse or BigInt::parse.

This PR avoids allocating a string for numbers that don't use _ or E.

Test Plan

cargo test

This improves performance for files with many numbers (dataset.py +3%)

@MichaReiser
Copy link
Member Author

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@MichaReiser MichaReiser added the parser Related to the parser label Sep 9, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Sep 9, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I like the LexedText abstraction. I think it could even be used in f-string normalization to normalize {{/}} to {/} using LexedText::skip_char.

crates/ruff_python_parser/src/lexer.rs Outdated Show resolved Hide resolved
crates/ruff_python_parser/src/lexer.rs Outdated Show resolved Hide resolved
@MichaReiser MichaReiser enabled auto-merge (squash) September 11, 2023 06:36
@MichaReiser MichaReiser merged commit 7440e54 into main Sep 11, 2023
16 checks passed
@MichaReiser MichaReiser deleted the lex-decimal-avoid-allocating branch September 11, 2023 06:37
@@ -1236,6 +1220,49 @@ const fn is_python_whitespace(c: char) -> bool {
)
}

enum LexedText<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that the same abstraction we use for normalizing strings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lexer doesn't normalize Strings. I haven't looked into what the whole string.rs is doing (and how much of it could be moved into the lexer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i meant in the formatter, this was just a random thought because the logic looked familiar from the normalize functions of strings, ints and floats

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's similar but we don't use an abstraction in the formatter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to the parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants