Perf: Skip string normalization when possible #10116

MichaReiser · 2024-02-25T10:08:36Z

Summary

Most strings don't contain any quotes or other characters that need to be normalized. This PR takes advantage of that and short-circuits `choose_quotes` and `normalize` for strings that contain no such characters. It does so with one simple loop that searches for the first occurrence of any character that needs normalization, which is much faster than the more complicated normalization loop.

This removes the calls to `StringNormalizer` almost entirely from the flamegraphs (except for users with \r or \r\n newlines, which always need normalization ;( )
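
The fast path described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: the byte set is the one shown in the diff quoted in the review thread, while `first_normalization_candidate`, `normalize_slow`, and `normalize` are hypothetical names, and the slow path is a stand-in that only rewrites newlines.

```rust
use std::borrow::Cow;

/// Returns the byte offset of the first character that *might* need
/// normalization. The byte set mirrors the diff quoted in the review thread.
fn first_normalization_candidate(raw_content: &str) -> Option<usize> {
    raw_content
        .bytes()
        .position(|b| matches!(b, b'\\' | b'"' | b'\'' | b'\r' | b'{'))
}

/// Hypothetical stand-in for the expensive normalization pass.
/// Here it only rewrites `\r\n` and `\r` to `\n`.
fn normalize_slow(raw_content: &str) -> String {
    raw_content.replace("\r\n", "\n").replace('\r', "\n")
}

/// Borrows the input unchanged when the cheap scan finds nothing,
/// and only falls back to the slow path otherwise.
fn normalize(raw_content: &str) -> Cow<'_, str> {
    match first_normalization_candidate(raw_content) {
        None => Cow::Borrowed(raw_content),
        Some(_) => Cow::Owned(normalize_slow(raw_content)),
    }
}
```

Returning a borrowed `Cow` in the common case means no allocation and no second pass over the string; the scan itself is a single linear walk over the bytes.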

Test Plan

cargo test

codspeed-hq · 2024-02-25T10:16:00Z

CodSpeed Performance Report

Merging #10116 will improve performances by 18.79%

Comparing `perf-string-normalization` (f7a3f92) with `main` (15b87ea)

Summary

⚡ 2 improvements
✅ 28 untouched benchmarks

Benchmarks breakdown

| | Benchmark | `main` | `perf-string-normalization` | Change |
|---|---|---|---|---|
| ⚡ | `formatter[numpy/ctypeslib.py]` | 10.2 ms | 9.7 ms | +4.62% |
| ⚡ | `formatter[numpy/globals.py]` | 1,176.1 µs | 990.1 µs | +18.79% |

github-actions · 2024-02-25T10:37:02Z

`ruff-ecosystem` results

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to read examples/How_to_handle_rate_limits.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: trailing comma at line 47 column 4

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to read examples/How_to_handle_rate_limits.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: trailing comma at line 47 column 4

charliermarsh · 2024-02-26T15:52:37Z

crates/ruff_python_formatter/src/string/normalize.rs

+        let raw_content = locator.slice(string.content_range());
+        let first_quote_or_normalized_char_offset = raw_content
+            .bytes()
+            .position(|b| matches!(b, b'\\' | b'"' | b'\'' | b'\r' | b'{'));


If it's not an f-string, you can omit {, I think? Similarly, if it's a raw string, you can omit \\... (In that case, you could actually use memchr3, but perhaps not worth it.)

MichaReiser · Feb 26, 2024

I considered special-casing but decided against it because I want to avoid the extra complexity, and it doesn't have to be perfect as long as it avoids the "expensive" normalization for most strings.

Using memchr would be nice... but normalize is already almost gone from the benchmark profiles. That's why I consider this "good enough".
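
For reference, the special-casing suggested in the review (and deliberately not adopted here) might look something like the sketch below. `StringFlags` and `first_candidate_offset` are hypothetical and are not ruff's actual types. The assumption that backslashes can be skipped for raw strings and `{` for non-f-strings comes from the reviewer's comment, which was itself tentative.

```rust
/// Hypothetical flags describing a string literal; not ruff's actual types.
#[derive(Clone, Copy)]
struct StringFlags {
    is_raw: bool,
    is_f_string: bool,
}

/// Like the scan in the diff, but narrows the byte set per string kind:
/// backslashes are only considered for non-raw strings, and `{` only for
/// f-strings. Quotes and `\r` are always considered.
fn first_candidate_offset(raw_content: &str, flags: StringFlags) -> Option<usize> {
    raw_content.bytes().position(|b| match b {
        b'"' | b'\'' | b'\r' => true,
        b'\\' => !flags.is_raw,
        b'{' => flags.is_f_string,
        _ => false,
    })
}
```

For a raw, non-f-string this leaves exactly three bytes to search for, which is the case where `memchr3` from the `memchr` crate could replace the loop. The cost is an extra branch per string kind, which is the complexity the author chose to avoid.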

MichaReiser added the performance (Potential performance improvement) and formatter (Related to the formatter) labels Feb 25, 2024

MichaReiser force-pushed the perf-string-normalization branch from 1c7e244 to b87fbfb Compare February 25, 2024 10:26

MichaReiser marked this pull request as ready for review February 25, 2024 10:42

MichaReiser requested a review from dhruvmanila February 26, 2024 15:37

charliermarsh reviewed Feb 26, 2024

charliermarsh approved these changes Feb 26, 2024

Perf: Skip string normalization when possible

f7a3f92

MichaReiser force-pushed the perf-string-normalization branch from b87fbfb to f7a3f92 Compare February 26, 2024 17:28

MichaReiser enabled auto-merge (squash) February 26, 2024 17:28

MichaReiser merged commit 8dc22d5 into main Feb 26, 2024
17 checks passed

MichaReiser deleted the perf-string-normalization branch February 26, 2024 17:35

MichaReiser mentioned this pull request Feb 29, 2024

Fix ecosystem check for indico #10164

Merged

BrewTestBot mentioned this pull request Feb 29, 2024

ruff 0.3.0 Homebrew/homebrew-core#164613

Merged

nkxxll pushed a commit to nkxxll/ruff that referenced this pull request Mar 10, 2024

Perf: Skip string normalization when possible (astral-sh#10116)

b9d8b2d

bswck mentioned this pull request Mar 15, 2024

Bump ruff-pre-commit to v0.3.2 python-poetry/cleo#412

Merged
