Use `memchr` to speedup newline search on x86 #3985
Merged
MichaReiser merged 3 commits into main from Use_memchr_to_speedup_newline_search_on_x86 on Apr 26, 2023
PR Check Results
Ecosystem: ✅ ecosystem check detected no changes.
Benchmark: Linux, Windows
MichaReiser force-pushed the byte-offset-parser branch from 0e7a8fa to c477216 (April 17, 2023 06:07)
MichaReiser force-pushed the Use_memchr_to_speedup_newline_search_on_x86 branch from 9791361 to ebfeb11 (April 17, 2023 06:07)
MichaReiser force-pushed the byte-offset-parser branch 2 times, most recently from dc30757 to 815f484 (April 17, 2023 14:53)
MichaReiser force-pushed the Use_memchr_to_speedup_newline_search_on_x86 branch 2 times, most recently from 63fed61 to 322b800 (April 17, 2023 15:22)
MichaReiser force-pushed the byte-offset-parser branch 3 times, most recently from 8eca22d to c29914a (April 17, 2023 16:08)
MichaReiser force-pushed the Use_memchr_to_speedup_newline_search_on_x86 branch from 322b800 to 6d1311c (April 17, 2023 16:08)
MichaReiser force-pushed the byte-offset-parser branch 2 times, most recently from cfe0fe1 to 78b9a89 (April 18, 2023 08:30)
MichaReiser force-pushed the Use_memchr_to_speedup_newline_search_on_x86 branch from 6d1311c to 4b3c569 (April 18, 2023 08:30)
MichaReiser force-pushed the byte-offset-parser branch from 7f7036b to c1b739d (April 19, 2023 06:56)
charliermarsh approved these changes on Apr 20, 2023
(Needs to be rebased, but I reviewed the new commits individually.)
MichaReiser force-pushed the Use_memchr_to_speedup_newline_search_on_x86 branch from 4b3c569 to c1b40a3 (April 20, 2023 05:56)
MichaReiser force-pushed the byte-offset-parser branch 4 times, most recently from 90fc963 to 5995306 (April 26, 2023 17:22)
MichaReiser force-pushed the Use_memchr_to_speedup_newline_search_on_x86 branch from c1b40a3 to d22ae89 (April 26, 2023 17:23)
MichaReiser force-pushed the byte-offset-parser branch from 5995306 to 7893968 (April 26, 2023 17:24)
MichaReiser force-pushed the Use_memchr_to_speedup_newline_search_on_x86 branch from d22ae89 to 1df35db (April 26, 2023 18:17)
renovate bot added a commit to ixm-one/pytest-cmake-presets that referenced this pull request on May 2, 2023

[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [ruff](https://togithub.com/charliermarsh/ruff) | `^0.0.263` -> `^0.0.264` | [![age](https://badges.renovateapi.com/packages/pypi/ruff/0.0.264/age-slim)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://badges.renovateapi.com/packages/pypi/ruff/0.0.264/adoption-slim)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://badges.renovateapi.com/packages/pypi/ruff/0.0.264/compatibility-slim/0.0.263)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://badges.renovateapi.com/packages/pypi/ruff/0.0.264/confidence-slim/0.0.263)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>charliermarsh/ruff</summary>

### [`v0.0.264`](https://togithub.com/charliermarsh/ruff/releases/tag/v0.0.264)

[Compare Source](https://togithub.com/charliermarsh/ruff/compare/v0.0.263...v0.0.264)

#### What's Changed

##### Rules

- Autofix `EM101`, `EM102`, `EM103` if possible by [@dhruvmanila](https://togithub.com/dhruvmanila) in astral-sh/ruff#4123
- Add bugbear immutable functions as allowed in dataclasses by [@mosauter](https://togithub.com/mosauter) in astral-sh/ruff#4122

##### Settings

- Add support for providing command-line arguments via `argfile` by [@charliermarsh](https://togithub.com/charliermarsh) in astral-sh/ruff#4087

##### Bug Fixes

- Make D410/D411 autofixes mutually exclusive by [@evanrittenhouse](https://togithub.com/evanrittenhouse) in astral-sh/ruff#4110
- Remove `pyright` comment prefix from PYI033 checks by [@evanrittenhouse](https://togithub.com/evanrittenhouse) in astral-sh/ruff#4152
- Fix F811 false positive with match by [@JonathanPlasse](https://togithub.com/JonathanPlasse) in astral-sh/ruff#4161
- Fix `E713` and `E714` false positives for multiple comparisons by [@JonathanPlasse](https://togithub.com/JonathanPlasse) in astral-sh/ruff#4083
- Fix B023 shadowed variables in nested functions by [@MichaReiser](https://togithub.com/MichaReiser) in astral-sh/ruff#4111
- Preserve star-handling special-casing for force-single-line by [@charliermarsh](https://togithub.com/charliermarsh) in astral-sh/ruff#4129
- Respect parent-scoping rules for `NamedExpr` assignments by [@charliermarsh](https://togithub.com/charliermarsh) in astral-sh/ruff#4145
- Fix UP032 auto-fix by [@JonathanPlasse](https://togithub.com/JonathanPlasse) in astral-sh/ruff#4165
- Allow boolean parameters for `pytest.param` by [@charliermarsh](https://togithub.com/charliermarsh) in astral-sh/ruff#4176

##### Internal

- Replace row/column based `Location` with byte-offsets. by [@MichaReiser](https://togithub.com/MichaReiser) in astral-sh/ruff#3931
- perf(logical-lines): Various small perf improvements by [@MichaReiser](https://togithub.com/MichaReiser) in astral-sh/ruff#4022
- Use `memchr` to speedup newline search on x86 by [@MichaReiser](https://togithub.com/MichaReiser) in astral-sh/ruff#3985
- Remove `ScopeStack` in favor of child-parent `ScopeId` pointers by [@charliermarsh](https://togithub.com/charliermarsh) in astral-sh/ruff#4138

**Full Changelog**: astral-sh/ruff@v0.0.263...v0.0.264

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://app.renovatebot.com/dashboard#github/ixm-one/pytest-cmake-presets).

Signed-off-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
renovate bot added a commit to allenporter/flux-local that referenced this pull request on May 3, 2023
renovate bot added a commit to allenporter/pyrainbird that referenced this pull request on May 3, 2023
BurntSushi added a commit that referenced this pull request on Nov 10, 2023

Eliding bounds checks very rarely results in a meaningful performance improvement. This is usually because branch predictors are very good, and in cases like this, the branch predictor likely predicts perfectly given that the code is correct.

It looks like this use of unsafe was added in #3985 as part of an optimization to use `memchr`. But `memchr` is where the real win is.

Benchmarks for before and after:

    $ critcmp main test
    group                                      main                                     test
    -----                                      ----                                     ----
    linter/all-rules/large/dataset.py          1.01      5.2±0.03ms     7.8 MB/sec      1.00      5.2±0.01ms     7.9 MB/sec
    linter/all-rules/numpy/ctypeslib.py        1.00   1375.3±6.38µs    12.1 MB/sec      1.00  1379.3±10.77µs    12.1 MB/sec
    linter/all-rules/numpy/globals.py          1.01    157.1±0.63µs    18.8 MB/sec      1.00    155.5±0.73µs    19.0 MB/sec
    linter/all-rules/pydantic/types.py         1.00      2.6±0.00ms     9.9 MB/sec      1.00      2.6±0.00ms     9.9 MB/sec
    linter/all-rules/unicode/pypinyin.py       1.00    647.7±1.27µs     6.5 MB/sec      1.00    647.5±3.46µs     6.5 MB/sec
    linter/default-rules/large/dataset.py      1.00      2.3±0.00ms    17.5 MB/sec      1.00      2.3±0.00ms    17.6 MB/sec
    linter/default-rules/numpy/ctypeslib.py    1.00    456.1±1.13µs    36.5 MB/sec      1.00    458.3±1.53µs    36.3 MB/sec
    linter/default-rules/numpy/globals.py      1.00     39.7±0.21µs    74.2 MB/sec      1.00     39.7±0.23µs    74.3 MB/sec
    linter/default-rules/pydantic/types.py     1.01   1005.8±5.49µs    25.4 MB/sec      1.00    995.5±2.99µs    25.6 MB/sec
    linter/default-rules/unicode/pypinyin.py   1.01    139.9±0.91µs    30.0 MB/sec      1.00    138.0±0.22µs    30.5 MB/sec

Typically, eliding bounds checks makes the most sense when it unlocks some other kind of optimization (e.g., autovectorization).
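To make the trade-off in the commit message above concrete, here is a minimal sketch contrasting a bounds-checked lookup with an unchecked one. The helper names `is_newline_checked` and `is_newline_unchecked` are hypothetical and are not taken from the ruff code base; the sketch only illustrates what "eliding bounds checks" means in Rust, not the code that the commit actually changed.

```rust
/// Bounds-checked lookup: panics if `index` is out of range.
/// The check compiles to a branch that is almost always predicted correctly.
fn is_newline_checked(bytes: &[u8], index: usize) -> bool {
    matches!(bytes[index], b'\n' | b'\r')
}

/// Unchecked lookup: skips the bounds check entirely.
///
/// # Safety
/// The caller must guarantee `index < bytes.len()`.
unsafe fn is_newline_unchecked(bytes: &[u8], index: usize) -> bool {
    // SAFETY: upheld by the caller (see the safety contract above).
    matches!(unsafe { *bytes.get_unchecked(index) }, b'\n' | b'\r')
}

fn main() {
    let bytes = b"x = 1\ny = 2";

    let checked: Vec<usize> = (0..bytes.len())
        .filter(|&i| is_newline_checked(bytes, i))
        .collect();

    let unchecked: Vec<usize> = (0..bytes.len())
        // SAFETY: every `i` comes from `0..bytes.len()`.
        .filter(|&i| unsafe { is_newline_unchecked(bytes, i) })
        .collect();

    // Both variants find the same newline positions.
    assert_eq!(checked, unchecked);
    println!("{checked:?}");
}
```

Both functions return the same answers; the unchecked one only removes a branch, at the cost of an `unsafe` contract that every caller has to uphold.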
BurntSushi added a commit that referenced this pull request on Nov 27, 2023
BurntSushi added a commit that referenced this pull request on Nov 28, 2023
Use `memchr` to find the newline characters in strings. I expect this to improve performance on x86 processors because `memchr` uses SIMD internally.
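As a rough illustration of the approach described above, the following sketch collects line start offsets by searching the raw bytes for `\n` or `\r`. To stay free of dependencies it uses a plain iterator search as a stand-in; the assumption is that in real code `find_newline` would delegate to `memchr::memchr2(b'\n', b'\r', haystack)` from the `memchr` crate, which has the same shape. The names `find_newline` and `line_starts` are hypothetical and not taken from the PR.

```rust
/// Returns the byte offset of the first `\n` or `\r` in `haystack`.
/// Stand-in with the same signature shape as
/// `memchr::memchr2(b'\n', b'\r', haystack)` (an assumption, not the PR's code).
fn find_newline(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'\n' || b == b'\r')
}

/// Collects the byte offset at which each line of `text` starts.
/// Searching bytes is safe for UTF-8 because `\n` and `\r` are ASCII
/// and never occur inside a multi-byte sequence.
fn line_starts(text: &str) -> Vec<usize> {
    let bytes = text.as_bytes();
    let mut starts = vec![0];
    let mut offset = 0;

    while let Some(pos) = find_newline(&bytes[offset..]) {
        let newline = offset + pos;
        // Treat "\r\n" as a single line ending.
        let len = if bytes[newline] == b'\r' && bytes.get(newline + 1) == Some(&b'\n') {
            2
        } else {
            1
        };
        offset = newline + len;
        starts.push(offset);
    }

    starts
}

fn main() {
    let starts = line_starts("a = 1\r\nb = 2\nc");
    println!("{starts:?}");
}
```

The search jumps from newline to newline instead of inspecting every character, which is the part a SIMD-backed `memchr` can accelerate.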