[pull] master from rust-lang:master by pull[bot] · Pull Request #2169 · Mu-L/rust

pull · 2022-11-17T12:38:44Z

See Commits and Changes for more details.

Co-authored-by: joboet <jonas.boettiger@icloud.com>

Based on Wojciech Muła's "SIMD-friendly algorithms for substring searching"[0].

The two-way algorithm is Big-O efficient, but it needs to preprocess the needle to find a "critical factorization" of it. This additional work is significant for short needles. Additionally, it mostly advances needle.len() bytes at a time.

The SIMD-based approach used here, on the other hand, can advance based on its vector width, which can exceed the needle length. Pathological cases are the exception, but because the approach is limited to small needles, the worst-case blowup is also small.

Benchmarks taken on a Zen2:

```
16CGU, OLD:
test str::bench_contains_short_short ... bench: 27 ns/iter (+/- 1)
test str::bench_contains_short_long ... bench: 667 ns/iter (+/- 29)
test str::bench_contains_bad_naive ... bench: 131 ns/iter (+/- 2)
test str::bench_contains_bad_simd ... bench: 130 ns/iter (+/- 2)
test str::bench_contains_equal ... bench: 148 ns/iter (+/- 4)

16CGU, NEW:
test str::bench_contains_short_short ... bench: 8 ns/iter (+/- 0)
test str::bench_contains_short_long ... bench: 135 ns/iter (+/- 4)
test str::bench_contains_bad_naive ... bench: 130 ns/iter (+/- 2)
test str::bench_contains_bad_simd ... bench: 292 ns/iter (+/- 1)
test str::bench_contains_equal ... bench: 3 ns/iter (+/- 0)

1CGU, OLD:
test str::bench_contains_short_short ... bench: 30 ns/iter (+/- 0)
test str::bench_contains_short_long ... bench: 713 ns/iter (+/- 17)
test str::bench_contains_bad_naive ... bench: 131 ns/iter (+/- 3)
test str::bench_contains_bad_simd ... bench: 130 ns/iter (+/- 3)
test str::bench_contains_equal ... bench: 148 ns/iter (+/- 6)

1CGU, NEW:
test str::bench_contains_short_short ... bench: 10 ns/iter (+/- 0)
test str::bench_contains_short_long ... bench: 111 ns/iter (+/- 0)
test str::bench_contains_bad_naive ... bench: 135 ns/iter (+/- 3)
test str::bench_contains_bad_simd ... bench: 274 ns/iter (+/- 2)
test str::bench_contains_equal ... bench: 4 ns/iter (+/- 0)
```

[0] http://0x80.pl/articles/simd-strfind.html#sse-avx2

The Big-O is cubic, but this is only called with ~70 chars so it's still fast enough

- bump simd compare to 32 bytes
- import small slice compare code from memmem crate
- try a few different probe bytes to avoid degenerate cases
- but special-case 2-byte needles

For the next commit, `FunctionCx::codegen_*_terminator` need to take a `&mut Bx` instead of consuming a `Bx`. This triggers a cascade of similar changes across multiple functions. The resulting code is more concise and replaces many `&mut bx` expressions with `bx`.

In `codegen_assert_terminator` we decide if a BB's successor is a candidate for merging, which requires that it be the only successor, and that it only have one predecessor. That result then gets passed down, and if it reaches `funclet_br` with the appropriate BB characteristics, then no `br` instruction is issued, a `MergingSucc::True` result is passed back, and the merging proceeds in `codegen_block`. The commit also adds `CachedLlbb`, a new type to help keep track of each BB that has been merged into its predecessor.
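The merge condition described above (the successor is the only successor, and it has exactly one predecessor) can be illustrated on a toy control-flow graph. This is only a sketch: `Cfg`, `predecessor_counts`, and `mergeable_successor` are hypothetical names invented for illustration and are not rustc's codegen types; the self-loop exclusion is an added assumption.

```rust
/// Toy control-flow graph: `succs[b]` lists the successors of block `b`.
/// Hypothetical type for illustration; not rustc's MIR or codegen structures.
struct Cfg {
    succs: Vec<Vec<usize>>,
}

impl Cfg {
    /// Count the incoming edges of every block.
    fn predecessor_counts(&self) -> Vec<usize> {
        let mut counts = vec![0; self.succs.len()];
        for targets in &self.succs {
            for &target in targets {
                counts[target] += 1;
            }
        }
        counts
    }

    /// Return the successor of `block` if it is a merge candidate:
    /// `block` has exactly one successor, and that successor has exactly
    /// one predecessor. A block that only jumps to itself is excluded
    /// (an assumption made for this sketch).
    fn mergeable_successor(&self, block: usize) -> Option<usize> {
        let preds = self.predecessor_counts();
        match self.succs[block].as_slice() {
            [only] if preds[*only] == 1 && *only != block => Some(*only),
            _ => None,
        }
    }
}
```

For a graph with edges 0→1, 1→2, 1→3, 2→3, only block 0 has a mergeable successor: block 1 branches two ways, and block 3 has two predecessors.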

Merge basic blocks where possible when generating LLVM IR. r? `@ghost`

x86_64 SSE2 fast-path for str.contains(&str) and short needles

Based on Wojciech Muła's [SIMD-friendly algorithms for substring searching](http://0x80.pl/articles/simd-strfind.html#sse-avx2).

The two-way algorithm is Big-O efficient, but it needs to preprocess the needle to find a "critical factorization" of it. This additional work is significant for short needles. Additionally, it mostly advances needle.len() bytes at a time.

The SIMD-based approach used here, on the other hand, can advance based on its vector width, which can exceed the needle length. Pathological cases are the exception, but because the approach is limited to small needles, the worst-case blowup is also small.

Benchmarks taken on a Zen2, compiled with `-Ccodegen-units=1`:

```
OLD:
test str::bench_contains_16b_in_long ... bench: 504 ns/iter (+/- 14) = 5061 MB/s
test str::bench_contains_2b_repeated_long ... bench: 948 ns/iter (+/- 175) = 2690 MB/s
test str::bench_contains_32b_in_long ... bench: 445 ns/iter (+/- 6) = 5732 MB/s
test str::bench_contains_bad_naive ... bench: 130 ns/iter (+/- 1) = 569 MB/s
test str::bench_contains_bad_simd ... bench: 84 ns/iter (+/- 8) = 880 MB/s
test str::bench_contains_equal ... bench: 142 ns/iter (+/- 7) = 394 MB/s
test str::bench_contains_short_long ... bench: 677 ns/iter (+/- 25) = 3768 MB/s
test str::bench_contains_short_short ... bench: 27 ns/iter (+/- 2) = 2074 MB/s

NEW:
test str::bench_contains_16b_in_long ... bench: 82 ns/iter (+/- 0) = 31109 MB/s
test str::bench_contains_2b_repeated_long ... bench: 73 ns/iter (+/- 0) = 34945 MB/s
test str::bench_contains_32b_in_long ... bench: 71 ns/iter (+/- 1) = 35929 MB/s
test str::bench_contains_bad_naive ... bench: 7 ns/iter (+/- 0) = 10571 MB/s
test str::bench_contains_bad_simd ... bench: 97 ns/iter (+/- 41) = 762 MB/s
test str::bench_contains_equal ... bench: 4 ns/iter (+/- 0) = 14000 MB/s
test str::bench_contains_short_long ... bench: 73 ns/iter (+/- 0) = 34945 MB/s
test str::bench_contains_short_short ... bench: 12 ns/iter (+/- 0) = 4666 MB/s
```
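To make the idea concrete, here is a minimal scalar model of the first-byte/last-byte probe from Muła's article. It is not the code added to the standard library: it uses plain loops in place of SSE2 compares, always probes the first and last needle bytes (the commits above mention trying other probe bytes), and `LANES`, `eq_mask`, and `contains_short_needle` are names invented for this sketch.

```rust
/// Width of the simulated vector register in bytes (an SSE2 register holds 16).
const LANES: usize = 16;

/// Compare every byte of `chunk` (at most 16 bytes) with `byte`.
/// Bit `i` of the result is set when lane `i` matches. This stands in for a
/// SIMD byte-equality compare followed by a movemask.
fn eq_mask(chunk: &[u8], byte: u8) -> u16 {
    let mut mask = 0u16;
    for (i, &b) in chunk.iter().enumerate() {
        if b == byte {
            mask |= 1 << i;
        }
    }
    mask
}

/// Substring search that filters candidate positions by checking the first
/// and last needle byte for up to `LANES` positions at once, then verifies
/// each surviving candidate with a full comparison.
fn contains_short_needle(haystack: &[u8], needle: &[u8]) -> bool {
    let n = needle.len();
    if n == 0 {
        return true;
    }
    if haystack.len() < n {
        return false;
    }
    let first = needle[0];
    let last = needle[n - 1];
    // Number of positions at which a match could start.
    let positions = haystack.len() - n + 1;
    let mut start = 0;
    while start < positions {
        let width = LANES.min(positions - start);
        // Lane `i` describes a candidate match starting at `start + i`.
        let first_hits = eq_mask(&haystack[start..start + width], first);
        let last_hits = eq_mask(&haystack[start + n - 1..start + n - 1 + width], last);
        let mut candidates = first_hits & last_hits;
        while candidates != 0 {
            let pos = start + candidates.trailing_zeros() as usize;
            if &haystack[pos..pos + n] == needle {
                return true;
            }
            candidates &= candidates - 1; // clear the lowest set bit
        }
        // Advance by the vector width, independent of the needle length.
        start += width;
    }
    false
}
```

The loop advances by the vector width rather than by `needle.len()`, which is the property the description above contrasts with the two-way algorithm. A haystack in which the first and last probe bytes match at many positions forces a full comparison per candidate, which is the pathological case the `bad_simd` benchmark name suggests.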

Record `LocalDefId` in HIR nodes instead of a side table

This is part of an attempt to remove the `HirId -> LocalDefId` table from HIR. This attempt is a prerequisite to creation of `LocalDefId` after HIR lowering (#96840), by controlling how `def_id` information is accessed.

This first part adds the information to HIR nodes themselves instead of a table. The second part is #103902. The third part will be to make `hir::Visitor::visit_fn` take a `LocalDefId` as last parameter. The fourth part will be to completely remove the side table.
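The difference between the two layouts can be sketched with toy types. Everything below is hypothetical and heavily simplified: the real `HirId`, `LocalDefId`, and `hir::Closure` in rustc look different, and `SideTable`, `ClosureBefore`, and `ClosureAfter` exist only for this illustration.

```rust
use std::collections::HashMap;

// Toy stand-ins for the compiler's identifier types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct HirId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct LocalDefId(u32);

/// Before: the node only knows its `HirId`.
struct ClosureBefore {
    hir_id: HirId,
}

/// Before: the definition id lives in a separate map keyed by `HirId`.
struct SideTable {
    hir_to_def: HashMap<HirId, LocalDefId>,
}

impl SideTable {
    /// The lookup can fail if the node has no entry in the table.
    fn def_id_of(&self, closure: &ClosureBefore) -> Option<LocalDefId> {
        self.hir_to_def.get(&closure.hir_id).copied()
    }
}

/// After: the node carries its definition id directly.
#[allow(dead_code)]
struct ClosureAfter {
    hir_id: HirId,
    def_id: LocalDefId,
}

impl ClosureAfter {
    /// No table and no fallible lookup are needed.
    fn def_id(&self) -> LocalDefId {
        self.def_id
    }
}
```

With the id stored on the node, callers read a field instead of consulting a map, which is what lets the later parts of the series drop the table.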

Attempt to reuse `Vec<T>` backing storage for `Rc/Arc<[T]>`

If a `Vec<T>` has sufficient capacity to store the inner `RcBox<[T]>`, we can just reuse the existing allocation and shift the elements up, instead of making a new allocation.
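From the caller's side, the conversion this change targets is the existing `From<Vec<T>>` implementation for `Rc<[T]>` and `Arc<[T]>`. The sketch below shows only that public conversion; `vec_into_rc` and `vec_into_arc` are helper names made up for the example. Whether the allocation is actually reused is an internal detail that cannot be observed through this API and may differ between standard library versions.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Convert a vector into a reference-counted slice.
/// A vector with spare capacity is the case in which the reference counts
/// could fit in front of the elements in the existing allocation.
fn vec_into_rc(values: Vec<u32>) -> Rc<[u32]> {
    Rc::from(values)
}

/// The same conversion for the thread-safe pointer.
fn vec_into_arc(values: Vec<u32>) -> Arc<[u32]> {
    Arc::from(values)
}
```

Either way the observable result is the same: the elements keep their order and values, and the new pointer starts with a strong count of one.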

cjgillot and others added 21 commits November 13, 2022 14:02

Ensure codegen_fn_attrs during collection.

dba1503

Store LocalDefId in hir::Closure.

290f078

Make user_provided_sigs a LocalDefIdMap.

2c4b0b2

Refactor rustc_hir_typeck::closure.

e82c08f

Store a LocalDefId in hir::GenericParam.

18482f7

Store a LocalDefId in hir::AnonConst.

607d0c2

Store a LocalDefId in hir::Variant & hir::Field.

9d20aca

Do not use local_def_id in node_to_string.

df5c11a

Reuse Vec<T> backing storage for Rc<[T]>

1c813c4

Add Vec storage optimization to Arc and add tests

8424c24

black_box test strings in str.contains(str) benchmarks

4844e51

update str.contains benchmarks

467b299

generalize str.contains() tests to a range of haystack sizes

c37e8fa

- convert from core::arch to core::simd

a2b2010

Use &mut Bx more.

68194aa

Auto merge of #103138 - nnethercote:merge-BBs, r=bjorn3

251831e

pull bot added the ⤵️ pull label Nov 17, 2022

pull bot merged commit 36db030 into Mu-L:master Nov 17, 2022

pull[bot] merged 21 commits into Mu-L:master from rust-lang:master

5 participants
