[pull] master from rust-lang:master#2169
Merged
pull[bot] merged 21 commits intoMu-L:masterfrom Nov 17, 2022
Merged
Conversation
Co-authored-by: joboet <jonas.boettiger@icloud.com>
Based on Wojciech Muła's "SIMD-friendly algorithms for substring searching"[0] The two-way algorithm is Big-O efficient but it needs to preprocess the needle to find a "criticla factorization" of it. This additional work is significant for short needles. Additionally it mostly advances needle.len() bytes at a time. The SIMD-based approach used here on the other hand can advance based on its vector width, which can exceed the needle length. Except for pathological cases, but due to being limited to small needles the worst case blowup is also small. benchmarks taken on a Zen2: ``` 16CGU, OLD: test str::bench_contains_short_short ... bench: 27 ns/iter (+/- 1) test str::bench_contains_short_long ... bench: 667 ns/iter (+/- 29) test str::bench_contains_bad_naive ... bench: 131 ns/iter (+/- 2) test str::bench_contains_bad_simd ... bench: 130 ns/iter (+/- 2) test str::bench_contains_equal ... bench: 148 ns/iter (+/- 4) 16CGU, NEW: test str::bench_contains_short_short ... bench: 8 ns/iter (+/- 0) test str::bench_contains_short_long ... bench: 135 ns/iter (+/- 4) test str::bench_contains_bad_naive ... bench: 130 ns/iter (+/- 2) test str::bench_contains_bad_simd ... bench: 292 ns/iter (+/- 1) test str::bench_contains_equal ... bench: 3 ns/iter (+/- 0) 1CGU, OLD: test str::bench_contains_short_short ... bench: 30 ns/iter (+/- 0) test str::bench_contains_short_long ... bench: 713 ns/iter (+/- 17) test str::bench_contains_bad_naive ... bench: 131 ns/iter (+/- 3) test str::bench_contains_bad_simd ... bench: 130 ns/iter (+/- 3) test str::bench_contains_equal ... bench: 148 ns/iter (+/- 6) 1CGU, NEW: test str::bench_contains_short_short ... bench: 10 ns/iter (+/- 0) test str::bench_contains_short_long ... bench: 111 ns/iter (+/- 0) test str::bench_contains_bad_naive ... bench: 135 ns/iter (+/- 3) test str::bench_contains_bad_simd ... bench: 274 ns/iter (+/- 2) test str::bench_contains_equal ... bench: 4 ns/iter (+/- 0) ``` [0] http://0x80.pl/articles/simd-strfind.html#sse-avx2
The Big-O is cubic, but this is only called with ~70 chars so it's still fast enough
- bump simd compare to 32bytes - import small slice compare code from memmem crate - try a few different probe bytes to avoid degenerate cases - but special-case 2-byte needles
For the next commit, `FunctionCx::codegen_*_terminator` need to take a `&mut Bx` instead of consuming a `Bx`. This triggers a cascade of similar changes across multiple functions. The resulting code is more concise and replaces many `&mut bx` expressions with `bx`.
In `codegen_assert_terminator` we decide if a BB's successor is a candidate for merging, which requires that it be the only successor, and that it only have one predecessor. That result then gets passed down, and if it reaches `funclet_br` with the appropriate BB characteristics, then no `br` instruction is issued, a `MergingSucc::True` result is passed back, and the merging proceeds in `codegen_block`. The commit also adds `CachedLlbb`, a new type to help keep track of each BB that has been merged into its predecessor.
Merge basic blocks where possible when generating LLVM IR. r? `@ghost`
x86_64 SSE2 fast-path for str.contains(&str) and short needles Based on Wojciech Muła's [SIMD-friendly algorithms for substring searching](http://0x80.pl/articles/simd-strfind.html#sse-avx2) The two-way algorithm is Big-O efficient but it needs to preprocess the needle to find a "critical factorization" of it. This additional work is significant for short needles. Additionally it mostly advances needle.len() bytes at a time. The SIMD-based approach used here on the other hand can advance based on its vector width, which can exceed the needle length. Except for pathological cases, but due to being limited to small needles the worst case blowup is also small. benchmarks taken on a Zen2, compiled with `-Ccodegen-units=1`: ``` OLD: test str::bench_contains_16b_in_long ... bench: 504 ns/iter (+/- 14) = 5061 MB/s test str::bench_contains_2b_repeated_long ... bench: 948 ns/iter (+/- 175) = 2690 MB/s test str::bench_contains_32b_in_long ... bench: 445 ns/iter (+/- 6) = 5732 MB/s test str::bench_contains_bad_naive ... bench: 130 ns/iter (+/- 1) = 569 MB/s test str::bench_contains_bad_simd ... bench: 84 ns/iter (+/- 8) = 880 MB/s test str::bench_contains_equal ... bench: 142 ns/iter (+/- 7) = 394 MB/s test str::bench_contains_short_long ... bench: 677 ns/iter (+/- 25) = 3768 MB/s test str::bench_contains_short_short ... bench: 27 ns/iter (+/- 2) = 2074 MB/s NEW: test str::bench_contains_16b_in_long ... bench: 82 ns/iter (+/- 0) = 31109 MB/s test str::bench_contains_2b_repeated_long ... bench: 73 ns/iter (+/- 0) = 34945 MB/s test str::bench_contains_32b_in_long ... bench: 71 ns/iter (+/- 1) = 35929 MB/s test str::bench_contains_bad_naive ... bench: 7 ns/iter (+/- 0) = 10571 MB/s test str::bench_contains_bad_simd ... bench: 97 ns/iter (+/- 41) = 762 MB/s test str::bench_contains_equal ... bench: 4 ns/iter (+/- 0) = 14000 MB/s test str::bench_contains_short_long ... bench: 73 ns/iter (+/- 0) = 34945 MB/s test str::bench_contains_short_short ... bench: 12 ns/iter (+/- 0) = 4666 MB/s ```
Record `LocalDefId` in HIR nodes instead of a side table This is part of an attempt to remove the `HirId -> LocalDefId` table from HIR. This attempt is a prerequisite to creation of `LocalDefId` after HIR lowering (#96840), by controlling how `def_id` information is accessed. This first part adds the information to HIR nodes themselves instead of a table. The second part is #103902 The third part will be to make `hir::Visitor::visit_fn` take a `LocalDefId` as last parameter. The fourth part will be to completely remove the side table.
Attempt to reuse `Vec<T>` backing storage for `Rc/Arc<[T]>` If a `Vec<T>` has sufficient capacity to store the inner `RcBox<[T]>`, we can just reuse the existing allocation and shift the elements up, instead of making a new allocation.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )