
PoC: Make the codelets operate entirely in registers #113

Merged
Shnatsel merged 19 commits into main from in-registers-codelet on Apr 18, 2026

Conversation

@Shnatsel
Collaborator

No description provided.

@codecov-commenter

codecov-commenter commented Apr 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.18%. Comparing base (4d3d75d) to head (d70d7eb).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #113      +/-   ##
==========================================
- Coverage   98.80%   98.18%   -0.62%     
==========================================
  Files           9        9              
  Lines        2086     1871     -215     
==========================================
- Hits         2061     1837     -224     
- Misses         25       34       +9     
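As a quick cross-check of the Codecov numbers, each coverage percentage is simply hits divided by coverable lines, and every coverable line is either a hit or a miss. A minimal Rust sketch using only the figures from the diff (the helper name `coverage_pct` is illustrative):

```rust
/// Line coverage as a percentage: covered (hit) lines over coverable lines.
fn coverage_pct(hits: u32, lines: u32) -> f64 {
    100.0 * hits as f64 / lines as f64
}

fn main() {
    // Figures copied from the Codecov coverage diff.
    let (base_hits, base_lines) = (2061, 2086);
    let (head_hits, head_lines) = (1837, 1871);

    // Every coverable line is either a hit or a miss.
    assert_eq!(base_lines - base_hits, 25);
    assert_eq!(head_lines - head_hits, 34);

    let base = coverage_pct(base_hits, base_lines);
    let head = coverage_pct(head_hits, head_lines);
    println!("base {:.2}%  head {:.2}%  delta {:+.2}%", base, head, head - base);
}
```

The small drop in the percentage comes from two effects at once: 224 covered lines were removed while the number of missed lines grew by 9.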

@Shnatsel changed the title from "PoC: Make the f64 codelet operate entirely in registers" to "PoC: Make the codelets operate entirely in registers" on Apr 11, 2026
@Shnatsel
Collaborator Author

Shnatsel commented Apr 11, 2026

Benchmarks are now completely green on both Zen 4 and M4.

The f64 version still has lots of register spills, but it already gives a huge performance boost on x86 at some sizes without regressing anything. Apple M4 is also all green, though the gains there are less dramatic.
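One plausible reason the f64 version spills more than f32 is plain register arithmetic: at the same codelet size, f64 data occupies twice as many vector registers. The sketch below is illustrative only. The codelet sizes are hypothetical, the register files assumed are AVX2 (16 × 256-bit), AVX-512 (32 × 512-bit) and AArch64 NEON (32 × 128-bit), and it counts data registers only; twiddles and temporaries need more on top of that.

```rust
/// Vector registers needed just to hold `n` complex values stored as
/// split real/imaginary arrays of `elem_bytes`-sized floats.
fn regs_needed(n: usize, elem_bytes: usize, vector_bytes: usize) -> usize {
    let total_bytes = 2 * n * elem_bytes; // real part + imaginary part
    total_bytes.div_ceil(vector_bytes)
}

fn main() {
    // (name, bytes per vector register, architectural register count)
    let isas = [("AVX2", 32, 16), ("AVX-512", 64, 32), ("NEON", 16, 32)];
    // Hypothetical codelet sizes, not taken from the PR.
    for n in [16, 32, 64] {
        for (name, vec_bytes, available) in isas {
            let f32_regs = regs_needed(n, 4, vec_bytes);
            let f64_regs = regs_needed(n, 8, vec_bytes);
            println!(
                "n={n:3} {name:8} f32: {f32_regs:3} regs, f64: {f64_regs:3} regs (of {available})"
            );
        }
    }
}
```

For example, a 32-point codelet fits in 8 AVX2 registers as f32 but needs 16 as f64, which is the entire AVX2 register file before any twiddle factors are loaded.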

This still needs cleanup, and the codelets need a better way to be enabled unconditionally (that is what the failing tests are about), but the core is already in place and universally beneficial.
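To illustrate what "operating entirely in registers" means, here is a minimal sketch of a hypothetical size-4 forward DIT codelet. The name `fft4_codelet` and the split real/imaginary layout are assumptions for illustration, not the PR's code. All inputs are loaded into locals once, both butterfly stages work on those locals, and the outputs are written back once, so an optimizing compiler can keep the intermediates in registers instead of storing and reloading them between stages. A naive DFT is included to check the result.

```rust
use std::f64::consts::PI;

/// Hypothetical size-4 forward DIT codelet on split real/imag arrays.
fn fft4_codelet(re: &mut [f64; 4], im: &mut [f64; 4]) {
    // Load everything up front; from here on we only touch locals.
    let (x0r, x1r, x2r, x3r) = (re[0], re[1], re[2], re[3]);
    let (x0i, x1i, x2i, x3i) = (im[0], im[1], im[2], im[3]);

    // Stage 1: radix-2 butterflies on the even (x0, x2) and odd (x1, x3) pairs.
    let (ar, ai) = (x0r + x2r, x0i + x2i);
    let (br, bi) = (x0r - x2r, x0i - x2i);
    let (cr, ci) = (x1r + x3r, x1i + x3i);
    let (dr, di) = (x1r - x3r, x1i - x3i);

    // Stage 2: the twiddle for k = 1 is -i, so multiplying by it is a
    // swap plus a sign flip rather than a full complex multiply.
    re[0] = ar + cr; // X0 = a + c
    im[0] = ai + ci;
    re[2] = ar - cr; // X2 = a - c
    im[2] = ai - ci;
    re[1] = br + di; // X1 = b - i*d
    im[1] = bi - dr;
    re[3] = br - di; // X3 = b + i*d
    im[3] = bi + dr;
}

/// O(n^2) reference DFT with the forward sign convention e^{-2*pi*i*jk/n}.
fn naive_dft(re: &[f64], im: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let n = re.len();
    let mut out_re = vec![0.0; n];
    let mut out_im = vec![0.0; n];
    for k in 0..n {
        for j in 0..n {
            let angle = -2.0 * PI * (j * k) as f64 / n as f64;
            let (s, c) = angle.sin_cos();
            // (re + i*im) * (c + i*s)
            out_re[k] += re[j] * c - im[j] * s;
            out_im[k] += re[j] * s + im[j] * c;
        }
    }
    (out_re, out_im)
}

fn main() {
    let mut re = [1.0, 2.0, 3.0, 4.0];
    let mut im = [0.5, -1.0, 0.0, 2.0];
    let (want_re, want_im) = naive_dft(&re, &im);
    fft4_codelet(&mut re, &mut im);
    for k in 0..4 {
        assert!((re[k] - want_re[k]).abs() < 1e-12);
        assert!((im[k] - want_im[k]).abs() < 1e-12);
    }
    println!("re = {:?}, im = {:?}", re, im);
}
```

Real codelets are larger and work on SIMD vectors rather than scalars, which is exactly where register pressure (and the spills mentioned above) becomes the limiting factor.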

@smu160
Member

smu160 commented Apr 14, 2026

Results for the latest version on the M2 chip:

warning: `phastft` (lib) generated 1 warning (run `cargo fix --lib -p phastft` to apply 1 suggestion)
    Finished `bench` profile [optimized] target(s) in 10.84s
     Running benches/bench.rs (target/release/deps/bench-bbf0fdc7d8455996)
Forward f32/PhastFT DIT/64
                        time:   [91.050 ns 91.323 ns 91.575 ns]
                        thrpt:  [698.88 Melem/s 700.81 Melem/s 702.91 Melem/s]
                        thrpt:  [5.2070 GiB/s 5.2215 GiB/s 5.2371 GiB/s]
                 change:
                        time:   [−8.6968% −8.2361% −7.7695%] (p = 0.00 < 0.05)
                        thrpt:  [+8.4240% +8.9753% +9.5251%]
                        Performance has improved.
Forward f32/RustFFT/64  time:   [83.868 ns 84.205 ns 84.480 ns]
                        thrpt:  [757.57 Melem/s 760.05 Melem/s 763.11 Melem/s]
                        thrpt:  [5.6444 GiB/s 5.6628 GiB/s 5.6856 GiB/s]
                 change:
                        time:   [+0.5176% +1.0094% +1.4906%] (p = 0.00 < 0.05)
                        thrpt:  [−1.4687% −0.9993% −0.5149%]
                        Change within noise threshold.
Forward f32/PhastFT DIT/128
                        time:   [164.59 ns 165.21 ns 165.70 ns]
                        thrpt:  [772.47 Melem/s 774.77 Melem/s 777.71 Melem/s]
                        thrpt:  [5.7553 GiB/s 5.7725 GiB/s 5.7944 GiB/s]
                 change:
                        time:   [−6.6059% −6.1471% −5.6822%] (p = 0.00 < 0.05)
                        thrpt:  [+6.0245% +6.5497% +7.0732%]
                        Performance has improved.
Forward f32/RustFFT/128 time:   [162.97 ns 163.69 ns 164.37 ns]
                        thrpt:  [778.73 Melem/s 781.96 Melem/s 785.43 Melem/s]
                        thrpt:  [5.8020 GiB/s 5.8260 GiB/s 5.8519 GiB/s]
                 change:
                        time:   [+0.3098% +0.9429% +1.6193%] (p = 0.01 < 0.05)
                        thrpt:  [−1.5935% −0.9341% −0.3089%]
                        Change within noise threshold.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Forward f32/PhastFT DIT/256
                        time:   [323.88 ns 325.16 ns 326.16 ns]
                        thrpt:  [784.88 Melem/s 787.31 Melem/s 790.43 Melem/s]
                        thrpt:  [5.8478 GiB/s 5.8659 GiB/s 5.8891 GiB/s]
                 change:
                        time:   [−5.9137% −5.3926% −4.8340%] (p = 0.00 < 0.05)
                        thrpt:  [+5.0795% +5.7000% +6.2854%]
                        Performance has improved.
Forward f32/RustFFT/256 time:   [359.21 ns 360.88 ns 362.12 ns]
                        thrpt:  [706.95 Melem/s 709.37 Melem/s 712.67 Melem/s]
                        thrpt:  [5.2672 GiB/s 5.2852 GiB/s 5.3098 GiB/s]
                 change:
                        time:   [−0.0215% +0.6066% +1.3015%] (p = 0.08 > 0.05)
                        thrpt:  [−1.2848% −0.6030% +0.0215%]
                        No change in performance detected.
Forward f32/PhastFT DIT/512
                        time:   [657.35 ns 659.94 ns 662.16 ns]
                        thrpt:  [773.23 Melem/s 775.83 Melem/s 778.88 Melem/s]
                        thrpt:  [5.7610 GiB/s 5.7804 GiB/s 5.8031 GiB/s]
                 change:
                        time:   [−5.5666% −4.9681% −4.3706%] (p = 0.00 < 0.05)
                        thrpt:  [+4.5703% +5.2278% +5.8948%]
                        Performance has improved.
Forward f32/RustFFT/512 time:   [747.47 ns 752.29 ns 755.66 ns]
                        thrpt:  [677.56 Melem/s 680.59 Melem/s 684.98 Melem/s]
                        thrpt:  [5.0482 GiB/s 5.0708 GiB/s 5.1035 GiB/s]
                 change:
                        time:   [−0.8200% +0.4070% +1.4960%] (p = 0.51 > 0.05)
                        thrpt:  [−1.4740% −0.4054% +0.8267%]
                        No change in performance detected.
Forward f32/PhastFT DIT/1024
                        time:   [1.4201 µs 1.4260 µs 1.4304 µs]
                        thrpt:  [715.91 Melem/s 718.11 Melem/s 721.05 Melem/s]
                        thrpt:  [5.3339 GiB/s 5.3504 GiB/s 5.3723 GiB/s]
                 change:
                        time:   [−6.2878% −5.3466% −4.3965%] (p = 0.00 < 0.05)
                        thrpt:  [+4.5987% +5.6486% +6.7097%]
                        Performance has improved.
Forward f32/RustFFT/1024
                        time:   [2.0500 µs 2.0776 µs 2.0934 µs]
                        thrpt:  [489.16 Melem/s 492.88 Melem/s 499.51 Melem/s]
                        thrpt:  [3.6445 GiB/s 3.6722 GiB/s 3.7217 GiB/s]
                 change:
                        time:   [−3.7472% +0.9153% +5.4606%] (p = 0.71 > 0.05)
                        thrpt:  [−5.1778% −0.9070% +3.8931%]
                        No change in performance detected.
Forward f32/PhastFT DIT/2048
                        time:   [3.1743 µs 3.1940 µs 3.2067 µs]
                        thrpt:  [638.66 Melem/s 641.19 Melem/s 645.19 Melem/s]
                        thrpt:  [4.7584 GiB/s 4.7773 GiB/s 4.8070 GiB/s]
                 change:
                        time:   [−4.5715% −3.3447% −2.2244%] (p = 0.00 < 0.05)
                        thrpt:  [+2.2750% +3.4604% +4.7906%]
                        Performance has improved.
Forward f32/RustFFT/2048
                        time:   [4.4099 µs 4.4879 µs 4.5309 µs]
                        thrpt:  [452.01 Melem/s 456.34 Melem/s 464.41 Melem/s]
                        thrpt:  [3.3677 GiB/s 3.4000 GiB/s 3.4601 GiB/s]
                 change:
                        time:   [−6.3203% −0.1133% +6.1477%] (p = 0.97 > 0.05)
                        thrpt:  [−5.7916% +0.1134% +6.7467%]
                        No change in performance detected.
Forward f32/PhastFT DIT/4096
                        time:   [7.2429 µs 7.3049 µs 7.3449 µs]
                        thrpt:  [557.67 Melem/s 560.72 Melem/s 565.52 Melem/s]
                        thrpt:  [4.1549 GiB/s 4.1777 GiB/s 4.2134 GiB/s]
                 change:
                        time:   [−6.8106% −4.5034% −2.0949%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1397% +4.7157% +7.3083%]
                        Performance has improved.
Forward f32/RustFFT/4096
                        time:   [8.6850 µs 8.7429 µs 8.7777 µs]
                        thrpt:  [466.64 Melem/s 468.49 Melem/s 471.62 Melem/s]
                        thrpt:  [3.4767 GiB/s 3.4905 GiB/s 3.5138 GiB/s]
                 change:
                        time:   [−0.6723% +1.2668% +3.2548%] (p = 0.22 > 0.05)
                        thrpt:  [−3.1522% −1.2510% +0.6768%]
                        No change in performance detected.
Forward f32/PhastFT DIT/8192
                        time:   [15.883 µs 16.006 µs 16.079 µs]
                        thrpt:  [509.47 Melem/s 511.81 Melem/s 515.77 Melem/s]
                        thrpt:  [3.7958 GiB/s 3.8133 GiB/s 3.8428 GiB/s]
                 change:
                        time:   [−6.6110% −4.2487% −1.9858%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0260% +4.4372% +7.0790%]
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Forward f32/RustFFT/8192
                        time:   [17.198 µs 17.285 µs 17.347 µs]
                        thrpt:  [472.24 Melem/s 473.92 Melem/s 476.34 Melem/s]
                        thrpt:  [3.5184 GiB/s 3.5310 GiB/s 3.5490 GiB/s]
                 change:
                        time:   [−1.8916% −0.4763% +0.8845%] (p = 0.52 > 0.05)
                        thrpt:  [−0.8767% +0.4785% +1.9281%]
                        No change in performance detected.
Forward f32/PhastFT DIT/16384
                        time:   [35.419 µs 35.680 µs 35.833 µs]
                        thrpt:  [457.23 Melem/s 459.20 Melem/s 462.58 Melem/s]
                        thrpt:  [3.4067 GiB/s 3.4213 GiB/s 3.4465 GiB/s]
                 change:
                        time:   [−12.953% −10.737% −8.4280%] (p = 0.00 < 0.05)
                        thrpt:  [+9.2037% +12.028% +14.880%]
                        Performance has improved.
Forward f32/RustFFT/16384
                        time:   [39.310 µs 39.441 µs 39.541 µs]
                        thrpt:  [414.36 Melem/s 415.41 Melem/s 416.78 Melem/s]
                        thrpt:  [3.0872 GiB/s 3.0950 GiB/s 3.1053 GiB/s]
                 change:
                        time:   [−0.2159% +0.5740% +1.3748%] (p = 0.18 > 0.05)
                        thrpt:  [−1.3561% −0.5707% +0.2164%]
                        No change in performance detected.
Forward f32/PhastFT DIT/32768
                        time:   [78.011 µs 78.598 µs 79.013 µs]
                        thrpt:  [414.72 Melem/s 416.91 Melem/s 420.04 Melem/s]
                        thrpt:  [3.0899 GiB/s 3.1062 GiB/s 3.1296 GiB/s]
                 change:
                        time:   [−11.890% −9.8731% −8.0148%] (p = 0.00 < 0.05)
                        thrpt:  [+8.7131% +10.955% +13.495%]
                        Performance has improved.
Forward f32/RustFFT/32768
                        time:   [84.408 µs 85.015 µs 85.388 µs]
                        thrpt:  [383.75 Melem/s 385.44 Melem/s 388.21 Melem/s]
                        thrpt:  [2.8592 GiB/s 2.8717 GiB/s 2.8924 GiB/s]
                 change:
                        time:   [−2.2396% −0.0569% +2.0939%] (p = 0.96 > 0.05)
                        thrpt:  [−2.0509% +0.0569% +2.2909%]
                        No change in performance detected.
Forward f32/PhastFT DIT/65536
                        time:   [189.79 µs 190.53 µs 191.23 µs]
                        thrpt:  [342.72 Melem/s 343.97 Melem/s 345.30 Melem/s]
                        thrpt:  [2.5534 GiB/s 2.5628 GiB/s 2.5727 GiB/s]
                 change:
                        time:   [−7.6258% −7.0444% −6.4252%] (p = 0.00 < 0.05)
                        thrpt:  [+6.8664% +7.5782% +8.2554%]
                        Performance has improved.
Forward f32/RustFFT/65536
                        time:   [193.12 µs 195.09 µs 196.21 µs]
                        thrpt:  [334.01 Melem/s 335.92 Melem/s 339.35 Melem/s]
                        thrpt:  [2.4885 GiB/s 2.5028 GiB/s 2.5284 GiB/s]
                 change:
                        time:   [−3.7442% −0.8750% +2.2704%] (p = 0.58 > 0.05)
                        thrpt:  [−2.2200% +0.8827% +3.8898%]
                        No change in performance detected.
Forward f32/PhastFT DIT/131072
                        time:   [387.28 µs 390.08 µs 391.98 µs]
                        thrpt:  [334.39 Melem/s 336.01 Melem/s 338.44 Melem/s]
                        thrpt:  [2.4914 GiB/s 2.5035 GiB/s 2.5216 GiB/s]
                 change:
                        time:   [−9.2302% −7.7305% −6.2920%] (p = 0.00 < 0.05)
                        thrpt:  [+6.7145% +8.3781% +10.169%]
                        Performance has improved.
Forward f32/RustFFT/131072
                        time:   [388.72 µs 392.31 µs 394.38 µs]
                        thrpt:  [332.35 Melem/s 334.10 Melem/s 337.19 Melem/s]
                        thrpt:  [2.4762 GiB/s 2.4892 GiB/s 2.5123 GiB/s]
                 change:
                        time:   [−4.1073% −1.1367% +1.8357%] (p = 0.47 > 0.05)
                        thrpt:  [−1.8026% +1.1497% +4.2833%]
                        No change in performance detected.
Forward f32/PhastFT DIT/262144
                        time:   [918.81 µs 929.83 µs 936.26 µs]
                        thrpt:  [279.99 Melem/s 281.93 Melem/s 285.31 Melem/s]
                        thrpt:  [2.0861 GiB/s 2.1005 GiB/s 2.1257 GiB/s]
                 change:
                        time:   [−11.215% −7.0050% −2.8853%] (p = 0.00 < 0.05)
                        thrpt:  [+2.9710% +7.5327% +12.631%]
                        Performance has improved.
Forward f32/RustFFT/262144
                        time:   [853.37 µs 858.15 µs 861.27 µs]
                        thrpt:  [304.37 Melem/s 305.48 Melem/s 307.19 Melem/s]
                        thrpt:  [2.2677 GiB/s 2.2760 GiB/s 2.2887 GiB/s]
                 change:
                        time:   [−3.0826% −0.8676% +1.3614%] (p = 0.47 > 0.05)
                        thrpt:  [−1.3431% +0.8751% +3.1806%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Forward f32/PhastFT DIT/524288
                        time:   [1.9948 ms 2.0141 ms 2.0256 ms]
                        thrpt:  [258.83 Melem/s 260.31 Melem/s 262.83 Melem/s]
                        thrpt:  [1.9284 GiB/s 1.9394 GiB/s 1.9582 GiB/s]
                 change:
                        time:   [−10.405% −6.5958% −2.8877%] (p = 0.00 < 0.05)
                        thrpt:  [+2.9736% +7.0616% +11.613%]
                        Performance has improved.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Forward f32/RustFFT/524288
                        time:   [1.7744 ms 1.7817 ms 1.7876 ms]
                        thrpt:  [293.29 Melem/s 294.27 Melem/s 295.48 Melem/s]
                        thrpt:  [2.1851 GiB/s 2.1925 GiB/s 2.2015 GiB/s]
                 change:
                        time:   [−2.3768% −0.2954% +1.7835%] (p = 0.79 > 0.05)
                        thrpt:  [−1.7523% +0.2963% +2.4346%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low severe
Forward f32/PhastFT DIT/1048576
                        time:   [4.3687 ms 4.4129 ms 4.4442 ms]
                        thrpt:  [235.94 Melem/s 237.61 Melem/s 240.02 Melem/s]
                        thrpt:  [1.7579 GiB/s 1.7704 GiB/s 1.7883 GiB/s]
                 change:
                        time:   [−9.0663% −5.6258% −2.4870%] (p = 0.00 < 0.05)
                        thrpt:  [+2.5504% +5.9612% +9.9703%]
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Forward f32/RustFFT/1048576
                        time:   [3.8208 ms 3.8433 ms 3.8703 ms]
                        thrpt:  [270.93 Melem/s 272.83 Melem/s 274.44 Melem/s]
                        thrpt:  [2.0186 GiB/s 2.0328 GiB/s 2.0447 GiB/s]
                 change:
                        time:   [−1.0516% +0.1574% +1.4257%] (p = 0.82 > 0.05)
                        thrpt:  [−1.4057% −0.1572% +1.0628%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  1 (5.00%) high mild
  2 (10.00%) high severe
Forward f32/PhastFT DIT/2097152
                        time:   [9.5787 ms 9.6537 ms 9.6995 ms]
                        thrpt:  [216.21 Melem/s 217.24 Melem/s 218.94 Melem/s]
                        thrpt:  [1.6109 GiB/s 1.6185 GiB/s 1.6312 GiB/s]
                 change:
                        time:   [−7.7811% −4.6802% −1.6270%] (p = 0.01 < 0.05)
                        thrpt:  [+1.6539% +4.9100% +8.4376%]
                        Performance has improved.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Forward f32/RustFFT/2097152
                        time:   [8.8762 ms 8.9170 ms 8.9683 ms]
                        thrpt:  [233.84 Melem/s 235.19 Melem/s 236.27 Melem/s]
                        thrpt:  [1.7422 GiB/s 1.7523 GiB/s 1.7603 GiB/s]
                 change:
                        time:   [+1.5186% +2.6683% +3.8519%] (p = 0.00 < 0.05)
                        thrpt:  [−3.7090% −2.5990% −1.4959%]
                        Performance has regressed.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) high mild
  1 (5.00%) high severe
Benchmarking Forward f32/PhastFT DIT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 10.
Forward f32/PhastFT DIT/4194304
                        time:   [22.474 ms 22.589 ms 22.673 ms]
                        thrpt:  [184.99 Melem/s 185.68 Melem/s 186.63 Melem/s]
                        thrpt:  [1.3783 GiB/s 1.3834 GiB/s 1.3905 GiB/s]
                 change:
                        time:   [−5.0575% −4.0768% −3.1417%] (p = 0.00 < 0.05)
                        thrpt:  [+3.2436% +4.2501% +5.3269%]
                        Performance has improved.
Benchmarking Forward f32/RustFFT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 10.
Forward f32/RustFFT/4194304
                        time:   [20.739 ms 20.810 ms 20.880 ms]
                        thrpt:  [200.88 Melem/s 201.55 Melem/s 202.24 Melem/s]
                        thrpt:  [1.4967 GiB/s 1.5017 GiB/s 1.5068 GiB/s]
                 change:
                        time:   [−0.4961% +0.1474% +0.8152%] (p = 0.67 > 0.05)
                        thrpt:  [−0.8086% −0.1472% +0.4986%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Forward f32/PhastFT DIT/8388608
                        time:   [48.642 ms 48.852 ms 49.083 ms]
                        thrpt:  [170.90 Melem/s 171.72 Melem/s 172.46 Melem/s]
                        thrpt:  [1.2733 GiB/s 1.2794 GiB/s 1.2849 GiB/s]
                 change:
                        time:   [−2.7215% −2.1823% −1.5724%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5975% +2.2310% +2.7977%]
                        Performance has improved.
Forward f32/RustFFT/8388608
                        time:   [46.682 ms 47.492 ms 48.259 ms]
                        thrpt:  [173.83 Melem/s 176.63 Melem/s 179.70 Melem/s]
                        thrpt:  [1.2951 GiB/s 1.3160 GiB/s 1.3389 GiB/s]
                 change:
                        time:   [+6.8449% +8.7729% +10.696%] (p = 0.00 < 0.05)
                        thrpt:  [−9.6628% −8.0653% −6.4064%]
                        Performance has regressed.
Forward f32/PhastFT DIT/16777216
                        time:   [106.82 ms 108.19 ms 109.42 ms]
                        thrpt:  [153.33 Melem/s 155.07 Melem/s 157.06 Melem/s]
                        thrpt:  [1.1424 GiB/s 1.1554 GiB/s 1.1702 GiB/s]
                 change:
                        time:   [+1.4580% +2.7536% +4.0889%] (p = 0.00 < 0.05)
                        thrpt:  [−3.9283% −2.6798% −1.4370%]
                        Performance has regressed.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Forward f32/RustFFT/16777216
                        time:   [96.198 ms 98.463 ms 100.87 ms]
                        thrpt:  [166.32 Melem/s 170.39 Melem/s 174.40 Melem/s]
                        thrpt:  [1.2392 GiB/s 1.2695 GiB/s 1.2994 GiB/s]
                 change:
                        time:   [+5.7998% +8.3730% +11.058%] (p = 0.00 < 0.05)
                        thrpt:  [−9.9570% −7.7261% −5.4819%]
                        Performance has regressed.
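The throughput lines in these Criterion reports follow from the time estimates: elements per second is the transform size divided by the time, and the GiB/s figures work out to 8 bytes per element, which matches one complex f32 (two 4-byte floats). Because throughput is inversely proportional to time, a relative time change t corresponds to a throughput change of 1/(1 + t) - 1. A small Rust sketch reproducing the "Forward f32/PhastFT DIT/64" row (helper names are illustrative, and the 8-byte element size is inferred from the ratio of the two throughput lines):

```rust
/// Throughput in millions of elements per second for an `n`-point transform.
fn melem_per_s(n: u64, secs: f64) -> f64 {
    n as f64 / secs / 1e6
}

/// Throughput in GiB/s (1 GiB = 2^30 bytes).
fn gib_per_s(n: u64, bytes_per_elem: u64, secs: f64) -> f64 {
    (n * bytes_per_elem) as f64 / secs / (1u64 << 30) as f64
}

/// Relative throughput change implied by a relative time change
/// (both as fractions, e.g. -0.08 for -8%).
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // "Forward f32/PhastFT DIT/64": middle time estimate 91.323 ns.
    let (n, secs) = (64, 91.323e-9);
    println!("{:.2} Melem/s", melem_per_s(n, secs));
    // Assumes 8 bytes per element, i.e. one complex f32.
    println!("{:.4} GiB/s", gib_per_s(n, 8, secs));
    // A reported time change of -8.2361% maps to roughly +8.9753% throughput.
    println!("{:+.4}%", 100.0 * thrpt_change(-0.082361));
}
```

Small differences in the last printed digit versus the report are expected, since the displayed time is itself a rounded estimate.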

Inverse f32/PhastFT DIT/64
                        time:   [103.48 ns 104.45 ns 105.78 ns]
                        thrpt:  [605.04 Melem/s 612.74 Melem/s 618.45 Melem/s]
                        thrpt:  [4.5079 GiB/s 4.5653 GiB/s 4.6078 GiB/s]
                 change:
                        time:   [−6.2924% −5.4662% −4.6262%] (p = 0.00 < 0.05)
                        thrpt:  [+4.8506% +5.7822% +6.7149%]
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) high mild
  1 (5.00%) high severe
Inverse f32/RustFFT/64  time:   [85.052 ns 85.379 ns 85.603 ns]
                        thrpt:  [747.64 Melem/s 749.60 Melem/s 752.48 Melem/s]
                        thrpt:  [5.5703 GiB/s 5.5849 GiB/s 5.6064 GiB/s]
                 change:
                        time:   [+1.0195% +1.8736% +2.6064%] (p = 0.00 < 0.05)
                        thrpt:  [−2.5402% −1.8392% −1.0092%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/128
                        time:   [185.87 ns 188.63 ns 191.86 ns]
                        thrpt:  [667.15 Melem/s 678.58 Melem/s 688.67 Melem/s]
                        thrpt:  [4.9707 GiB/s 5.0558 GiB/s 5.1310 GiB/s]
                 change:
                        time:   [−5.0349% −3.9120% −2.6558%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7282% +4.0713% +5.3018%]
                        Performance has improved.
Found 4 outliers among 20 measurements (20.00%)
  4 (20.00%) high mild
Inverse f32/RustFFT/128 time:   [169.39 ns 172.65 ns 175.69 ns]
                        thrpt:  [728.54 Melem/s 741.37 Melem/s 755.64 Melem/s]
                        thrpt:  [5.4280 GiB/s 5.5236 GiB/s 5.6299 GiB/s]
                 change:
                        time:   [+2.4517% +4.0105% +5.4378%] (p = 0.00 < 0.05)
                        thrpt:  [−5.1574% −3.8559% −2.3930%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/256
                        time:   [366.93 ns 367.87 ns 368.74 ns]
                        thrpt:  [694.27 Melem/s 695.90 Melem/s 697.69 Melem/s]
                        thrpt:  [5.1727 GiB/s 5.1849 GiB/s 5.1982 GiB/s]
                 change:
                        time:   [−4.3804% −3.5397% −2.7464%] (p = 0.00 < 0.05)
                        thrpt:  [+2.8240% +3.6695% +4.5810%]
                        Performance has improved.
Found 4 outliers among 20 measurements (20.00%)
  1 (5.00%) low severe
  2 (10.00%) low mild
  1 (5.00%) high mild
Inverse f32/RustFFT/256 time:   [366.12 ns 369.73 ns 374.68 ns]
                        thrpt:  [683.25 Melem/s 692.40 Melem/s 699.22 Melem/s]
                        thrpt:  [5.0906 GiB/s 5.1588 GiB/s 5.2096 GiB/s]
                 change:
                        time:   [+1.2325% +2.4598% +3.7883%] (p = 0.00 < 0.05)
                        thrpt:  [−3.6500% −2.4007% −1.2175%]
                        Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) high mild
  1 (5.00%) high severe
Inverse f32/PhastFT DIT/512
                        time:   [751.31 ns 761.62 ns 774.48 ns]
                        thrpt:  [661.09 Melem/s 672.25 Melem/s 681.48 Melem/s]
                        thrpt:  [4.9255 GiB/s 5.0087 GiB/s 5.0774 GiB/s]
                 change:
                        time:   [−1.0185% +0.2106% +1.5609%] (p = 0.75 > 0.05)
                        thrpt:  [−1.5369% −0.2102% +1.0290%]
                        No change in performance detected.
Inverse f32/RustFFT/512 time:   [805.67 ns 825.65 ns 845.04 ns]
                        thrpt:  [605.89 Melem/s 620.12 Melem/s 635.50 Melem/s]
                        thrpt:  [4.5142 GiB/s 4.6202 GiB/s 4.7348 GiB/s]
                 change:
                        time:   [+3.4073% +5.9208% +8.8025%] (p = 0.00 < 0.05)
                        thrpt:  [−8.0903% −5.5898% −3.2951%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/1024
                        time:   [1.6560 µs 1.6865 µs 1.7207 µs]
                        thrpt:  [595.10 Melem/s 607.18 Melem/s 618.34 Melem/s]
                        thrpt:  [4.4338 GiB/s 4.5238 GiB/s 4.6070 GiB/s]
                 change:
                        time:   [−1.4203% +0.1615% +1.8870%] (p = 0.86 > 0.05)
                        thrpt:  [−1.8520% −0.1613% +1.4408%]
                        No change in performance detected.
Inverse f32/RustFFT/1024
                        time:   [2.1807 µs 2.2297 µs 2.2840 µs]
                        thrpt:  [448.34 Melem/s 459.25 Melem/s 469.57 Melem/s]
                        thrpt:  [3.3404 GiB/s 3.4217 GiB/s 3.4985 GiB/s]
                 change:
                        time:   [+4.6920% +10.038% +15.720%] (p = 0.00 < 0.05)
                        thrpt:  [−13.584% −9.1223% −4.4817%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/2048
                        time:   [3.4527 µs 3.5376 µs 3.6066 µs]
                        thrpt:  [567.85 Melem/s 578.92 Melem/s 593.15 Melem/s]
                        thrpt:  [4.2308 GiB/s 4.3133 GiB/s 4.4193 GiB/s]
                 change:
                        time:   [−4.1995% −2.5955% −0.9218%] (p = 0.00 < 0.05)
                        thrpt:  [+0.9304% +2.6647% +4.3836%]
                        Change within noise threshold.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) high mild
Inverse f32/RustFFT/2048
                        time:   [4.7604 µs 4.9062 µs 4.9952 µs]
                        thrpt:  [409.99 Melem/s 417.43 Melem/s 430.21 Melem/s]
                        thrpt:  [3.0547 GiB/s 3.1101 GiB/s 3.2053 GiB/s]
                 change:
                        time:   [+2.3796% +9.1090% +16.252%] (p = 0.01 < 0.05)
                        thrpt:  [−13.980% −8.3485% −2.3243%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/4096
                        time:   [8.0084 µs 8.2414 µs 8.4077 µs]
                        thrpt:  [487.17 Melem/s 497.00 Melem/s 511.46 Melem/s]
                        thrpt:  [3.6297 GiB/s 3.7030 GiB/s 3.8107 GiB/s]
                 change:
                        time:   [−2.3348% −0.2018% +2.4531%] (p = 0.87 > 0.05)
                        thrpt:  [−2.3944% +0.2023% +2.3906%]
                        No change in performance detected.
Inverse f32/RustFFT/4096
                        time:   [8.9048 µs 9.1087 µs 9.2509 µs]
                        thrpt:  [442.77 Melem/s 449.68 Melem/s 459.97 Melem/s]
                        thrpt:  [3.2989 GiB/s 3.3504 GiB/s 3.4271 GiB/s]
                 change:
                        time:   [+2.7725% +5.2698% +8.0027%] (p = 0.00 < 0.05)
                        thrpt:  [−7.4098% −5.0060% −2.6977%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/8192
                        time:   [17.695 µs 18.051 µs 18.313 µs]
                        thrpt:  [447.34 Melem/s 453.82 Melem/s 462.95 Melem/s]
                        thrpt:  [3.3330 GiB/s 3.3812 GiB/s 3.4492 GiB/s]
                 change:
                        time:   [−1.9084% +0.3492% +2.7449%] (p = 0.77 > 0.05)
                        thrpt:  [−2.6716% −0.3479% +1.9455%]
                        No change in performance detected.
Inverse f32/RustFFT/8192
                        time:   [17.487 µs 17.783 µs 18.108 µs]
                        thrpt:  [452.39 Melem/s 460.67 Melem/s 468.46 Melem/s]
                        thrpt:  [3.3706 GiB/s 3.4322 GiB/s 3.4903 GiB/s]
                 change:
                        time:   [+2.9648% +4.6122% +6.4244%] (p = 0.00 < 0.05)
                        thrpt:  [−6.0366% −4.4089% −2.8794%]
                        Performance has regressed.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Inverse f32/PhastFT DIT/16384
                        time:   [39.147 µs 40.272 µs 41.043 µs]
                        thrpt:  [399.19 Melem/s 406.84 Melem/s 418.52 Melem/s]
                        thrpt:  [2.9742 GiB/s 3.0312 GiB/s 3.1182 GiB/s]
                 change:
                        time:   [−7.8827% −5.4437% −3.0458%] (p = 0.00 < 0.05)
                        thrpt:  [+3.1415% +5.7571% +8.5572%]
                        Performance has improved.
Inverse f32/RustFFT/16384
                        time:   [39.158 µs 39.529 µs 39.846 µs]
                        thrpt:  [411.18 Melem/s 414.48 Melem/s 418.41 Melem/s]
                        thrpt:  [3.0635 GiB/s 3.0882 GiB/s 3.1174 GiB/s]
                 change:
                        time:   [−0.7800% +0.0577% +0.9813%] (p = 0.90 > 0.05)
                        thrpt:  [−0.9717% −0.0576% +0.7861%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/32768
                        time:   [86.558 µs 88.186 µs 89.601 µs]
                        thrpt:  [365.71 Melem/s 371.58 Melem/s 378.57 Melem/s]
                        thrpt:  [2.7247 GiB/s 2.7685 GiB/s 2.8205 GiB/s]
                 change:
                        time:   [−6.9139% −4.9228% −2.7363%] (p = 0.00 < 0.05)
                        thrpt:  [+2.8132% +5.1777% +7.4274%]
                        Performance has improved.
Inverse f32/RustFFT/32768
                        time:   [90.261 µs 91.598 µs 92.426 µs]
                        thrpt:  [354.53 Melem/s 357.74 Melem/s 363.03 Melem/s]
                        thrpt:  [2.6415 GiB/s 2.6653 GiB/s 2.7048 GiB/s]
                 change:
                        time:   [+2.6304% +5.8169% +9.0448%] (p = 0.00 < 0.05)
                        thrpt:  [−8.2946% −5.4972% −2.5630%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/65536
                        time:   [210.88 µs 214.16 µs 216.59 µs]
                        thrpt:  [302.58 Melem/s 306.02 Melem/s 310.78 Melem/s]
                        thrpt:  [2.2544 GiB/s 2.2800 GiB/s 2.3155 GiB/s]
                 change:
                        time:   [−3.0059% −2.0764% −1.1717%] (p = 0.00 < 0.05)
                        thrpt:  [+1.1856% +2.1204% +3.0991%]
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Inverse f32/RustFFT/65536
                        time:   [192.56 µs 196.95 µs 200.92 µs]
                        thrpt:  [326.17 Melem/s 332.75 Melem/s 340.34 Melem/s]
                        thrpt:  [2.4302 GiB/s 2.4792 GiB/s 2.5357 GiB/s]
                 change:
                        time:   [−2.7860% +0.3435% +3.6593%] (p = 0.84 > 0.05)
                        thrpt:  [−3.5302% −0.3423% +2.8659%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/131072
                        time:   [424.21 µs 432.03 µs 437.88 µs]
                        thrpt:  [299.33 Melem/s 303.39 Melem/s 308.98 Melem/s]
                        thrpt:  [2.2302 GiB/s 2.2604 GiB/s 2.3021 GiB/s]
                 change:
                        time:   [−5.3724% −3.8253% −2.1281%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1744% +3.9775% +5.6775%]
                        Performance has improved.
Inverse f32/RustFFT/131072
                        time:   [393.50 µs 397.19 µs 401.74 µs]
                        thrpt:  [326.26 Melem/s 329.99 Melem/s 333.09 Melem/s]
                        thrpt:  [2.4308 GiB/s 2.4587 GiB/s 2.4817 GiB/s]
                 change:
                        time:   [+0.1668% +3.0659% +6.2684%] (p = 0.05 > 0.05)
                        thrpt:  [−5.8987% −2.9747% −0.1665%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/262144
                        time:   [1.0211 ms 1.0378 ms 1.0503 ms]
                        thrpt:  [249.59 Melem/s 252.60 Melem/s 256.72 Melem/s]
                        thrpt:  [1.8596 GiB/s 1.8820 GiB/s 1.9127 GiB/s]
                 change:
                        time:   [−4.8748% −0.8551% +3.5842%] (p = 0.70 > 0.05)
                        thrpt:  [−3.4602% +0.8625% +5.1246%]
                        No change in performance detected.
Inverse f32/RustFFT/262144
                        time:   [915.44 µs 923.40 µs 929.82 µs]
                        thrpt:  [281.93 Melem/s 283.89 Melem/s 286.36 Melem/s]
                        thrpt:  [2.1005 GiB/s 2.1152 GiB/s 2.1335 GiB/s]
                 change:
                        time:   [+3.7741% +6.7739% +9.8356%] (p = 0.00 < 0.05)
                        thrpt:  [−8.9548% −6.3441% −3.6369%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/524288
                        time:   [2.2009 ms 2.2440 ms 2.2731 ms]
                        thrpt:  [230.65 Melem/s 233.64 Melem/s 238.22 Melem/s]
                        thrpt:  [1.7185 GiB/s 1.7408 GiB/s 1.7748 GiB/s]
                 change:
                        time:   [−4.7488% −0.5239% +3.8805%] (p = 0.82 > 0.05)
                        thrpt:  [−3.7356% +0.5266% +4.9856%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f32/RustFFT/524288
                        time:   [1.8437 ms 1.8742 ms 1.9037 ms]
                        thrpt:  [275.40 Melem/s 279.75 Melem/s 284.37 Melem/s]
                        thrpt:  [2.0519 GiB/s 2.0843 GiB/s 2.1187 GiB/s]
                 change:
                        time:   [+2.5561% +5.2851% +8.0954%] (p = 0.00 < 0.05)
                        thrpt:  [−7.4891% −5.0198% −2.4924%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/1048576
                        time:   [4.8560 ms 4.9200 ms 4.9592 ms]
                        thrpt:  [211.44 Melem/s 213.13 Melem/s 215.93 Melem/s]
                        thrpt:  [1.5754 GiB/s 1.5879 GiB/s 1.6088 GiB/s]
                 change:
                        time:   [−3.9566% +0.4035% +5.0316%] (p = 0.86 > 0.05)
                        thrpt:  [−4.7906% −0.4019% +4.1196%]
                        No change in performance detected.
Inverse f32/RustFFT/1048576
                        time:   [3.8936 ms 3.9640 ms 4.0408 ms]
                        thrpt:  [259.50 Melem/s 264.53 Melem/s 269.31 Melem/s]
                        thrpt:  [1.9334 GiB/s 1.9709 GiB/s 2.0065 GiB/s]
                 change:
                        time:   [+0.5493% +2.1103% +3.7349%] (p = 0.02 < 0.05)
                        thrpt:  [−3.6004% −2.0667% −0.5463%]
                        Change within noise threshold.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Inverse f32/PhastFT DIT/2097152
                        time:   [11.002 ms 11.110 ms 11.181 ms]
                        thrpt:  [187.56 Melem/s 188.77 Melem/s 190.62 Melem/s]
                        thrpt:  [1.3974 GiB/s 1.4064 GiB/s 1.4202 GiB/s]
                 change:
                        time:   [+0.9040% +4.2870% +7.8921%] (p = 0.02 < 0.05)
                        thrpt:  [−7.3148% −4.1108% −0.8959%]
                        Change within noise threshold.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f32/RustFFT/2097152
                        time:   [9.5183 ms 9.6469 ms 9.7705 ms]
                        thrpt:  [214.64 Melem/s 217.39 Melem/s 220.33 Melem/s]
                        thrpt:  [1.5992 GiB/s 1.6197 GiB/s 1.6416 GiB/s]
                 change:
                        time:   [+9.0421% +11.014% +13.420%] (p = 0.00 < 0.05)
                        thrpt:  [−11.832% −9.9216% −8.2923%]
                        Performance has regressed.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low mild
  1 (5.00%) high severe
Benchmarking Inverse f32/PhastFT DIT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 10.
Inverse f32/PhastFT DIT/4194304
                        time:   [25.601 ms 25.834 ms 26.073 ms]
                        thrpt:  [160.87 Melem/s 162.36 Melem/s 163.83 Melem/s]
                        thrpt:  [1.1986 GiB/s 1.2097 GiB/s 1.2206 GiB/s]
                 change:
                        time:   [+2.2899% +3.8298% +5.2743%] (p = 0.00 < 0.05)
                        thrpt:  [−5.0101% −3.6886% −2.2387%]
                        Performance has regressed.
Benchmarking Inverse f32/RustFFT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 10.
Inverse f32/RustFFT/4194304
                        time:   [22.603 ms 23.114 ms 23.440 ms]
                        thrpt:  [178.94 Melem/s 181.46 Melem/s 185.57 Melem/s]
                        thrpt:  [1.3332 GiB/s 1.3520 GiB/s 1.3826 GiB/s]
                 change:
                        time:   [+8.2703% +10.358% +12.274%] (p = 0.00 < 0.05)
                        thrpt:  [−10.933% −9.3860% −7.6386%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/8388608
                        time:   [55.204 ms 55.957 ms 56.638 ms]
                        thrpt:  [148.11 Melem/s 149.91 Melem/s 151.96 Melem/s]
                        thrpt:  [1.1035 GiB/s 1.1169 GiB/s 1.1322 GiB/s]
                 change:
                        time:   [+4.3184% +5.7724% +7.1675%] (p = 0.00 < 0.05)
                        thrpt:  [−6.6881% −5.4573% −4.1396%]
                        Performance has regressed.
Inverse f32/RustFFT/8388608
                        time:   [47.381 ms 48.139 ms 48.857 ms]
                        thrpt:  [171.70 Melem/s 174.26 Melem/s 177.05 Melem/s]
                        thrpt:  [1.2792 GiB/s 1.2983 GiB/s 1.3191 GiB/s]
                 change:
                        time:   [+7.4650% +9.0968% +10.762%] (p = 0.00 < 0.05)
                        thrpt:  [−9.7163% −8.3383% −6.9464%]
                        Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f32/PhastFT DIT/16777216
                        time:   [112.20 ms 113.87 ms 115.46 ms]
                        thrpt:  [145.30 Melem/s 147.33 Melem/s 149.53 Melem/s]
                        thrpt:  [1.0826 GiB/s 1.0977 GiB/s 1.1141 GiB/s]
                 change:
                        time:   [+1.8256% +3.3539% +4.9261%] (p = 0.00 < 0.05)
                        thrpt:  [−4.6948% −3.2450% −1.7929%]
                        Performance has regressed.
Inverse f32/RustFFT/16777216
                        time:   [98.928 ms 101.25 ms 103.48 ms]
                        thrpt:  [162.13 Melem/s 165.70 Melem/s 169.59 Melem/s]
                        thrpt:  [1.2080 GiB/s 1.2346 GiB/s 1.2635 GiB/s]
                 change:
                        time:   [+8.5317% +10.771% +13.552%] (p = 0.00 < 0.05)
                        thrpt:  [−11.935% −9.7236% −7.8610%]
                        Performance has regressed.

Forward f64/PhastFT DIT/64
                        time:   [153.31 ns 155.86 ns 157.90 ns]
                        thrpt:  [405.33 Melem/s 410.62 Melem/s 417.47 Melem/s]
                        thrpt:  [6.0399 GiB/s 6.1188 GiB/s 6.2207 GiB/s]
                 change:
                        time:   [−8.5380% −7.3290% −6.2468%] (p = 0.00 < 0.05)
                        thrpt:  [+6.6631% +7.9086% +9.3350%]
                        Performance has improved.
Forward f64/RustFFT/64  time:   [142.01 ns 144.07 ns 146.10 ns]
                        thrpt:  [438.06 Melem/s 444.23 Melem/s 450.68 Melem/s]
                        thrpt:  [6.5276 GiB/s 6.6196 GiB/s 6.7157 GiB/s]
                 change:
                        time:   [+8.1673% +9.4842% +10.880%] (p = 0.00 < 0.05)
                        thrpt:  [−9.8128% −8.6626% −7.5506%]
                        Performance has regressed.
Forward f64/PhastFT DIT/128
                        time:   [313.72 ns 318.30 ns 323.34 ns]
                        thrpt:  [395.87 Melem/s 402.13 Melem/s 408.01 Melem/s]
                        thrpt:  [5.8989 GiB/s 5.9922 GiB/s 6.0798 GiB/s]
                 change:
                        time:   [−1.3959% +0.2177% +1.8058%] (p = 0.80 > 0.05)
                        thrpt:  [−1.7737% −0.2172% +1.4157%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Forward f64/RustFFT/128 time:   [278.65 ns 283.42 ns 290.17 ns]
                        thrpt:  [441.12 Melem/s 451.63 Melem/s 459.35 Melem/s]
                        thrpt:  [6.5733 GiB/s 6.7298 GiB/s 6.8449 GiB/s]
                 change:
                        time:   [+6.2141% +8.0051% +9.5899%] (p = 0.00 < 0.05)
                        thrpt:  [−8.7507% −7.4118% −5.8505%]
                        Performance has regressed.
Forward f64/PhastFT DIT/256
                        time:   [601.88 ns 609.21 ns 617.84 ns]
                        thrpt:  [414.35 Melem/s 420.22 Melem/s 425.33 Melem/s]
                        thrpt:  [6.1743 GiB/s 6.2617 GiB/s 6.3379 GiB/s]
                 change:
                        time:   [−3.0284% −1.6330% −0.0087%] (p = 0.05 < 0.05)
                        thrpt:  [+0.0087% +1.6601% +3.1230%]
                        Change within noise threshold.
Forward f64/RustFFT/256 time:   [633.51 ns 642.15 ns 652.45 ns]
                        thrpt:  [392.37 Melem/s 398.66 Melem/s 404.10 Melem/s]
                        thrpt:  [5.8468 GiB/s 5.9405 GiB/s 6.0215 GiB/s]
                 change:
                        time:   [+6.5558% +8.1053% +9.8162%] (p = 0.00 < 0.05)
                        thrpt:  [−8.9387% −7.4976% −6.1525%]
                        Performance has regressed.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Forward f64/PhastFT DIT/512
                        time:   [1.2767 µs 1.2834 µs 1.2925 µs]
                        thrpt:  [396.13 Melem/s 398.94 Melem/s 401.03 Melem/s]
                        thrpt:  [5.9028 GiB/s 5.9447 GiB/s 5.9758 GiB/s]
                 change:
                        time:   [−8.5232% −6.7967% −4.9305%] (p = 0.00 < 0.05)
                        thrpt:  [+5.1862% +7.2923% +9.3174%]
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Forward f64/RustFFT/512 time:   [1.7240 µs 1.7721 µs 1.8159 µs]
                        thrpt:  [281.96 Melem/s 288.92 Melem/s 296.99 Melem/s]
                        thrpt:  [4.2015 GiB/s 4.3052 GiB/s 4.4255 GiB/s]
                 change:
                        time:   [−1.6088% +4.6100% +10.656%] (p = 0.15 > 0.05)
                        thrpt:  [−9.6300% −4.4068% +1.6351%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Forward f64/PhastFT DIT/1024
                        time:   [2.7061 µs 2.7451 µs 2.7811 µs]
                        thrpt:  [368.20 Melem/s 373.03 Melem/s 378.41 Melem/s]
                        thrpt:  [5.4866 GiB/s 5.5587 GiB/s 5.6388 GiB/s]
                 change:
                        time:   [−6.1888% −4.8529% −3.5144%] (p = 0.00 < 0.05)
                        thrpt:  [+3.6424% +5.1004% +6.5971%]
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Forward f64/RustFFT/1024
                        time:   [3.2271 µs 3.2972 µs 3.3412 µs]
                        thrpt:  [306.48 Melem/s 310.56 Melem/s 317.32 Melem/s]
                        thrpt:  [4.5669 GiB/s 4.6278 GiB/s 4.7284 GiB/s]
                 change:
                        time:   [+0.2315% +3.8067% +7.6575%] (p = 0.06 > 0.05)
                        thrpt:  [−7.1128% −3.6671% −0.2310%]
                        No change in performance detected.
Forward f64/PhastFT DIT/2048
                        time:   [6.7377 µs 6.8570 µs 6.9424 µs]
                        thrpt:  [295.00 Melem/s 298.67 Melem/s 303.96 Melem/s]
                        thrpt:  [4.3958 GiB/s 4.4505 GiB/s 4.5293 GiB/s]
                 change:
                        time:   [−4.5121% −1.4916% +1.5007%] (p = 0.36 > 0.05)
                        thrpt:  [−1.4786% +1.5142% +4.7253%]
                        No change in performance detected.
Forward f64/RustFFT/2048
                        time:   [7.3860 µs 7.5072 µs 7.5951 µs]
                        thrpt:  [269.65 Melem/s 272.80 Melem/s 277.28 Melem/s]
                        thrpt:  [4.0181 GiB/s 4.0651 GiB/s 4.1318 GiB/s]
                 change:
                        time:   [−0.2551% +2.8862% +6.0234%] (p = 0.08 > 0.05)
                        thrpt:  [−5.6812% −2.8053% +0.2558%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Forward f64/PhastFT DIT/4096
                        time:   [14.901 µs 15.181 µs 15.413 µs]
                        thrpt:  [265.75 Melem/s 269.81 Melem/s 274.87 Melem/s]
                        thrpt:  [3.9600 GiB/s 4.0204 GiB/s 4.0959 GiB/s]
                 change:
                        time:   [−4.3928% −1.5071% +1.3217%] (p = 0.32 > 0.05)
                        thrpt:  [−1.3045% +1.5302% +4.5946%]
                        No change in performance detected.
Forward f64/RustFFT/4096
                        time:   [15.119 µs 15.461 µs 15.739 µs]
                        thrpt:  [260.25 Melem/s 264.92 Melem/s 270.91 Melem/s]
                        thrpt:  [3.8780 GiB/s 3.9476 GiB/s 4.0369 GiB/s]
                 change:
                        time:   [+1.2841% +3.3818% +5.6896%] (p = 0.00 < 0.05)
                        thrpt:  [−5.3834% −3.2711% −1.2678%]
                        Performance has regressed.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Forward f64/PhastFT DIT/8192
                        time:   [33.576 µs 34.198 µs 34.586 µs]
                        thrpt:  [236.86 Melem/s 239.55 Melem/s 243.99 Melem/s]
                        thrpt:  [3.5295 GiB/s 3.5696 GiB/s 3.6357 GiB/s]
                 change:
                        time:   [−4.1308% −1.0267% +2.4239%] (p = 0.55 > 0.05)
                        thrpt:  [−2.3665% +1.0373% +4.3088%]
                        No change in performance detected.
Forward f64/RustFFT/8192
                        time:   [33.623 µs 34.089 µs 34.555 µs]
                        thrpt:  [237.07 Melem/s 240.31 Melem/s 243.64 Melem/s]
                        thrpt:  [3.5327 GiB/s 3.5809 GiB/s 3.6306 GiB/s]
                 change:
                        time:   [−1.0349% +0.9807% +3.1938%] (p = 0.38 > 0.05)
                        thrpt:  [−3.0949% −0.9712% +1.0457%]
                        No change in performance detected.
Forward f64/PhastFT DIT/16384
                        time:   [72.080 µs 72.768 µs 73.823 µs]
                        thrpt:  [221.94 Melem/s 225.15 Melem/s 227.30 Melem/s]
                        thrpt:  [3.3071 GiB/s 3.3550 GiB/s 3.3871 GiB/s]
                 change:
                        time:   [−10.056% −7.9348% −5.8519%] (p = 0.00 < 0.05)
                        thrpt:  [+6.2156% +8.6187% +11.180%]
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Forward f64/RustFFT/16384
                        time:   [83.430 µs 84.928 µs 86.886 µs]
                        thrpt:  [188.57 Melem/s 192.92 Melem/s 196.38 Melem/s]
                        thrpt:  [2.8099 GiB/s 2.8747 GiB/s 2.9263 GiB/s]
                 change:
                        time:   [+2.1279% +4.7235% +7.4871%] (p = 0.00 < 0.05)
                        thrpt:  [−6.9656% −4.5104% −2.0835%]
                        Performance has regressed.
Forward f64/PhastFT DIT/32768
                        time:   [172.68 µs 175.10 µs 176.70 µs]
                        thrpt:  [185.45 Melem/s 187.14 Melem/s 189.76 Melem/s]
                        thrpt:  [2.7634 GiB/s 2.7887 GiB/s 2.8276 GiB/s]
                 change:
                        time:   [−8.1704% −6.2338% −4.2782%] (p = 0.00 < 0.05)
                        thrpt:  [+4.4695% +6.6483% +8.8973%]
                        Performance has improved.
Forward f64/RustFFT/32768
                        time:   [176.92 µs 179.45 µs 182.35 µs]
                        thrpt:  [179.69 Melem/s 182.60 Melem/s 185.21 Melem/s]
                        thrpt:  [2.6777 GiB/s 2.7210 GiB/s 2.7599 GiB/s]
                 change:
                        time:   [−1.5195% +2.2432% +6.2924%] (p = 0.28 > 0.05)
                        thrpt:  [−5.9199% −2.1940% +1.5429%]
                        No change in performance detected.
Forward f64/PhastFT DIT/65536
                        time:   [391.21 µs 396.58 µs 402.84 µs]
                        thrpt:  [162.68 Melem/s 165.25 Melem/s 167.52 Melem/s]
                        thrpt:  [2.4242 GiB/s 2.4624 GiB/s 2.4962 GiB/s]
                 change:
                        time:   [−8.3239% −5.8179% −3.0085%] (p = 0.00 < 0.05)
                        thrpt:  [+3.1018% +6.1773% +9.0797%]
                        Performance has improved.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low mild
  1 (5.00%) high mild
Forward f64/RustFFT/65536
                        time:   [375.94 µs 385.09 µs 391.14 µs]
                        thrpt:  [167.55 Melem/s 170.18 Melem/s 174.33 Melem/s]
                        thrpt:  [2.4967 GiB/s 2.5359 GiB/s 2.5976 GiB/s]
                 change:
                        time:   [−0.6883% +3.0385% +6.6673%] (p = 0.11 > 0.05)
                        thrpt:  [−6.2505% −2.9489% +0.6931%]
                        No change in performance detected.
Forward f64/PhastFT DIT/131072
                        time:   [861.03 µs 875.64 µs 889.19 µs]
                        thrpt:  [147.41 Melem/s 149.69 Melem/s 152.23 Melem/s]
                        thrpt:  [2.1965 GiB/s 2.2305 GiB/s 2.2684 GiB/s]
                 change:
                        time:   [−7.2995% −4.1128% −0.8208%] (p = 0.02 < 0.05)
                        thrpt:  [+0.8275% +4.2892% +7.8743%]
                        Change within noise threshold.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Forward f64/RustFFT/131072
                        time:   [802.41 µs 820.61 µs 832.98 µs]
                        thrpt:  [157.35 Melem/s 159.73 Melem/s 163.35 Melem/s]
                        thrpt:  [2.3448 GiB/s 2.3801 GiB/s 2.4341 GiB/s]
                 change:
                        time:   [+1.4724% +4.4020% +7.5999%] (p = 0.01 < 0.05)
                        thrpt:  [−7.0631% −4.2164% −1.4510%]
                        Performance has regressed.
Forward f64/PhastFT DIT/262144
                        time:   [1.8945 ms 1.9230 ms 1.9465 ms]
                        thrpt:  [134.68 Melem/s 136.32 Melem/s 138.37 Melem/s]
                        thrpt:  [2.0068 GiB/s 2.0313 GiB/s 2.0618 GiB/s]
                 change:
                        time:   [−5.6428% −2.3217% +1.1211%] (p = 0.20 > 0.05)
                        thrpt:  [−1.1086% +2.3769% +5.9802%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Forward f64/RustFFT/262144
                        time:   [1.6902 ms 1.6981 ms 1.7040 ms]
                        thrpt:  [153.84 Melem/s 154.38 Melem/s 155.10 Melem/s]
                        thrpt:  [2.2924 GiB/s 2.3004 GiB/s 2.3111 GiB/s]
                 change:
                        time:   [−2.2061% −0.1783% +1.9122%] (p = 0.88 > 0.05)
                        thrpt:  [−1.8763% +0.1786% +2.2558%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  1 (5.00%) low severe
  1 (5.00%) low mild
  1 (5.00%) high mild
Forward f64/PhastFT DIT/524288
                        time:   [4.0824 ms 4.1223 ms 4.1603 ms]
                        thrpt:  [126.02 Melem/s 127.18 Melem/s 128.43 Melem/s]
                        thrpt:  [1.8778 GiB/s 1.8952 GiB/s 1.9137 GiB/s]
                 change:
                        time:   [−10.203% −6.7706% −3.3310%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4457% +7.2624% +11.362%]
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Forward f64/RustFFT/524288
                        time:   [3.5269 ms 3.5637 ms 3.6201 ms]
                        thrpt:  [144.83 Melem/s 147.12 Melem/s 148.66 Melem/s]
                        thrpt:  [2.1581 GiB/s 2.1923 GiB/s 2.2151 GiB/s]
                 change:
                        time:   [−2.1927% −0.5778% +1.2117%] (p = 0.53 > 0.05)
                        thrpt:  [−1.1972% +0.5812% +2.2419%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Forward f64/PhastFT DIT/1048576
                        time:   [9.2007 ms 9.2713 ms 9.3642 ms]
                        thrpt:  [111.98 Melem/s 113.10 Melem/s 113.97 Melem/s]
                        thrpt:  [1.6686 GiB/s 1.6853 GiB/s 1.6982 GiB/s]
                 change:
                        time:   [−5.0787% −2.1957% +0.7951%] (p = 0.17 > 0.05)
                        thrpt:  [−0.7888% +2.2450% +5.3505%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Forward f64/RustFFT/1048576
                        time:   [8.7052 ms 8.7494 ms 8.7897 ms]
                        thrpt:  [119.30 Melem/s 119.85 Melem/s 120.45 Melem/s]
                        thrpt:  [1.7777 GiB/s 1.7858 GiB/s 1.7949 GiB/s]
                 change:
                        time:   [+6.7135% +7.6981% +8.8398%] (p = 0.00 < 0.05)
                        thrpt:  [−8.1218% −7.1478% −6.2911%]
                        Performance has regressed.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Benchmarking Forward f64/PhastFT DIT/2097152: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 10.
Forward f64/PhastFT DIT/2097152
                        time:   [22.075 ms 22.469 ms 22.916 ms]
                        thrpt:  [91.516 Melem/s 93.337 Melem/s 95.001 Melem/s]
                        thrpt:  [1.3637 GiB/s 1.3908 GiB/s 1.4156 GiB/s]
                 change:
                        time:   [+1.0951% +2.4084% +3.7911%] (p = 0.00 < 0.05)
                        thrpt:  [−3.6526% −2.3517% −1.0833%]
                        Performance has regressed.
Benchmarking Forward f64/RustFFT/2097152: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 10.
Forward f64/RustFFT/2097152
                        time:   [18.730 ms 19.190 ms 19.660 ms]
                        thrpt:  [106.67 Melem/s 109.29 Melem/s 111.97 Melem/s]
                        thrpt:  [1.5895 GiB/s 1.6285 GiB/s 1.6684 GiB/s]
                 change:
                        time:   [+7.4248% +9.3675% +11.193%] (p = 0.00 < 0.05)
                        thrpt:  [−10.066% −8.5652% −6.9116%]
                        Performance has regressed.
Found 5 outliers among 20 measurements (25.00%)
  2 (10.00%) low severe
  1 (5.00%) low mild
  2 (10.00%) high mild
Forward f64/PhastFT DIT/4194304
                        time:   [46.963 ms 47.630 ms 48.397 ms]
                        thrpt:  [86.665 Melem/s 88.060 Melem/s 89.310 Melem/s]
                        thrpt:  [1.2914 GiB/s 1.3122 GiB/s 1.3308 GiB/s]
                 change:
                        time:   [−3.5156% −1.9922% −0.3602%] (p = 0.02 < 0.05)
                        thrpt:  [+0.3615% +2.0327% +3.6437%]
                        Change within noise threshold.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) high mild
Forward f64/RustFFT/4194304
                        time:   [40.932 ms 41.904 ms 42.963 ms]
                        thrpt:  [97.626 Melem/s 100.09 Melem/s 102.47 Melem/s]
                        thrpt:  [1.4547 GiB/s 1.4915 GiB/s 1.5269 GiB/s]
                 change:
                        time:   [+2.9725% +5.4379% +8.0483%] (p = 0.00 < 0.05)
                        thrpt:  [−7.4488% −5.1574% −2.8867%]
                        Performance has regressed.
Forward f64/PhastFT DIT/8388608
                        time:   [97.480 ms 98.983 ms 100.69 ms]
                        thrpt:  [83.314 Melem/s 84.748 Melem/s 86.054 Melem/s]
                        thrpt:  [1.2415 GiB/s 1.2628 GiB/s 1.2823 GiB/s]
                 change:
                        time:   [−5.3141% −3.7029% −1.9416%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9800% +3.8453% +5.6124%]
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Forward f64/RustFFT/8388608
                        time:   [89.536 ms 92.418 ms 95.570 ms]
                        thrpt:  [87.774 Melem/s 90.768 Melem/s 93.690 Melem/s]
                        thrpt:  [1.3079 GiB/s 1.3526 GiB/s 1.3961 GiB/s]
                 change:
                        time:   [+11.095% +14.734% +18.078%] (p = 0.00 < 0.05)
                        thrpt:  [−15.310% −12.842% −9.9866%]
                        Performance has regressed.
Benchmarking Forward f64/PhastFT DIT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 5.3s, or reduce sample count to 10.
Forward f64/PhastFT DIT/16777216
                        time:   [215.64 ms 219.27 ms 223.61 ms]
                        thrpt:  [75.028 Melem/s 76.513 Melem/s 77.803 Melem/s]
                        thrpt:  [1.1180 GiB/s 1.1401 GiB/s 1.1594 GiB/s]
                 change:
                        time:   [+2.2707% +4.0998% +5.8869%] (p = 0.00 < 0.05)
                        thrpt:  [−5.5596% −3.9384% −2.2203%]
                        Performance has regressed.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) high severe
Benchmarking Forward f64/RustFFT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 5.1s, or reduce sample count to 10.
Forward f64/RustFFT/16777216
                        time:   [199.38 ms 208.02 ms 216.72 ms]
                        thrpt:  [77.414 Melem/s 80.654 Melem/s 84.147 Melem/s]
                        thrpt:  [1.1536 GiB/s 1.2018 GiB/s 1.2539 GiB/s]
                 change:
                        time:   [+16.025% +21.211% +26.044%] (p = 0.00 < 0.05)
                        thrpt:  [−20.663% −17.499% −13.812%]
                        Performance has regressed.

Inverse f64/PhastFT DIT/64
                        time:   [170.43 ns 171.04 ns 171.77 ns]
                        thrpt:  [372.60 Melem/s 374.19 Melem/s 375.51 Melem/s]
                        thrpt:  [5.5521 GiB/s 5.5758 GiB/s 5.5956 GiB/s]
                 change:
                        time:   [−8.8447% −7.9766% −7.0562%] (p = 0.00 < 0.05)
                        thrpt:  [+7.5918% +8.6680% +9.7029%]
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/RustFFT/64  time:   [139.94 ns 141.59 ns 143.08 ns]
                        thrpt:  [447.30 Melem/s 452.00 Melem/s 457.35 Melem/s]
                        thrpt:  [6.6653 GiB/s 6.7353 GiB/s 6.8151 GiB/s]
                 change:
                        time:   [+0.7120% +2.5064% +4.1579%] (p = 0.01 < 0.05)
                        thrpt:  [−3.9919% −2.4451% −0.7070%]
                        Change within noise threshold.
Inverse f64/PhastFT DIT/128
                        time:   [340.79 ns 342.09 ns 343.63 ns]
                        thrpt:  [372.49 Melem/s 374.17 Melem/s 375.59 Melem/s]
                        thrpt:  [5.5506 GiB/s 5.5756 GiB/s 5.5968 GiB/s]
                 change:
                        time:   [−5.4079% −4.2517% −3.0641%] (p = 0.00 < 0.05)
                        thrpt:  [+3.1609% +4.4405% +5.7171%]
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/RustFFT/128 time:   [280.73 ns 282.16 ns 284.05 ns]
                        thrpt:  [450.63 Melem/s 453.64 Melem/s 455.95 Melem/s]
                        thrpt:  [6.7149 GiB/s 6.7598 GiB/s 6.7942 GiB/s]
                 change:
                        time:   [+0.8839% +2.3036% +3.7197%] (p = 0.00 < 0.05)
                        thrpt:  [−3.5863% −2.2517% −0.8761%]
                        Change within noise threshold.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/PhastFT DIT/256
                        time:   [669.28 ns 690.44 ns 729.72 ns]
                        thrpt:  [350.82 Melem/s 370.78 Melem/s 382.50 Melem/s]
                        thrpt:  [5.2276 GiB/s 5.5250 GiB/s 5.6997 GiB/s]
                 change:
                        time:   [−5.1350% −2.8288% −0.0122%] (p = 0.05 > 0.05)
                        thrpt:  [+0.0122% +2.9112% +5.4130%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low mild
  1 (5.00%) high severe
Inverse f64/RustFFT/256 time:   [649.50 ns 653.92 ns 660.11 ns]
                        thrpt:  [387.82 Melem/s 391.49 Melem/s 394.15 Melem/s]
                        thrpt:  [5.7789 GiB/s 5.8336 GiB/s 5.8733 GiB/s]
                 change:
                        time:   [+3.2820% +5.2184% +7.1781%] (p = 0.00 < 0.05)
                        thrpt:  [−6.6974% −4.9596% −3.1777%]
                        Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) low severe
  1 (5.00%) low mild
Inverse f64/PhastFT DIT/512
                        time:   [1.4809 µs 1.5102 µs 1.5607 µs]
                        thrpt:  [328.07 Melem/s 339.02 Melem/s 345.74 Melem/s]
                        thrpt:  [4.8886 GiB/s 5.0518 GiB/s 5.1520 GiB/s]
                 change:
                        time:   [−6.0324% −3.3342% −0.3869%] (p = 0.02 < 0.05)
                        thrpt:  [+0.3884% +3.4492% +6.4197%]
                        Change within noise threshold.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low mild
  1 (5.00%) high severe
Inverse f64/RustFFT/512 time:   [1.7252 µs 1.7450 µs 1.7562 µs]
                        thrpt:  [291.54 Melem/s 293.42 Melem/s 296.78 Melem/s]
                        thrpt:  [4.3443 GiB/s 4.3722 GiB/s 4.4223 GiB/s]
                 change:
                        time:   [−1.1637% +3.8986% +9.1478%] (p = 0.15 > 0.05)
                        thrpt:  [−8.3811% −3.7523% +1.1774%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/PhastFT DIT/1024
                        time:   [2.9341 µs 2.9794 µs 3.0476 µs]
                        thrpt:  [336.00 Melem/s 343.69 Melem/s 349.00 Melem/s]
                        thrpt:  [5.0068 GiB/s 5.1214 GiB/s 5.2004 GiB/s]
                 change:
                        time:   [−8.3998% −7.0140% −5.3945%] (p = 0.00 < 0.05)
                        thrpt:  [+5.7021% +7.5431% +9.1700%]
                        Performance has improved.
Found 4 outliers among 20 measurements (20.00%)
  3 (15.00%) low mild
  1 (5.00%) high severe
Inverse f64/RustFFT/1024
                        time:   [3.1763 µs 3.1916 µs 3.2012 µs]
                        thrpt:  [319.88 Melem/s 320.84 Melem/s 322.39 Melem/s]
                        thrpt:  [4.7665 GiB/s 4.7809 GiB/s 4.8040 GiB/s]
                 change:
                        time:   [−0.0598% +2.6398% +5.5739%] (p = 0.08 > 0.05)
                        thrpt:  [−5.2796% −2.5719% +0.0599%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/PhastFT DIT/2048
                        time:   [7.1433 µs 7.1792 µs 7.2024 µs]
                        thrpt:  [284.35 Melem/s 285.27 Melem/s 286.70 Melem/s]
                        thrpt:  [4.2371 GiB/s 4.2508 GiB/s 4.2722 GiB/s]
                 change:
                        time:   [−6.9670% −4.5458% −1.9826%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0227% +4.7623% +7.4887%]
                        Performance has improved.
Found 4 outliers among 20 measurements (20.00%)
  2 (10.00%) low severe
  2 (10.00%) low mild
Inverse f64/RustFFT/2048
                        time:   [7.3784 µs 7.4139 µs 7.4375 µs]
                        thrpt:  [275.36 Melem/s 276.24 Melem/s 277.57 Melem/s]
                        thrpt:  [4.1032 GiB/s 4.1163 GiB/s 4.1361 GiB/s]
                 change:
                        time:   [−0.3178% +2.1271% +4.5954%] (p = 0.11 > 0.05)
                        thrpt:  [−4.3935% −2.0828% +0.3189%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/PhastFT DIT/4096
                        time:   [15.764 µs 15.832 µs 15.883 µs]
                        thrpt:  [257.88 Melem/s 258.72 Melem/s 259.83 Melem/s]
                        thrpt:  [3.8428 GiB/s 3.8553 GiB/s 3.8718 GiB/s]
                 change:
                        time:   [−7.3417% −4.8361% −2.2897%] (p = 0.00 < 0.05)
                        thrpt:  [+2.3434% +5.0819% +7.9234%]
                        Performance has improved.
Found 4 outliers among 20 measurements (20.00%)
  2 (10.00%) low severe
  2 (10.00%) low mild
Inverse f64/RustFFT/4096
                        time:   [15.012 µs 15.091 µs 15.154 µs]
                        thrpt:  [270.28 Melem/s 271.42 Melem/s 272.86 Melem/s]
                        thrpt:  [4.0276 GiB/s 4.0445 GiB/s 4.0659 GiB/s]
                 change:
                        time:   [−0.9813% +1.1446% +3.2417%] (p = 0.31 > 0.05)
                        thrpt:  [−3.1399% −1.1316% +0.9910%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/PhastFT DIT/8192
                        time:   [34.959 µs 35.135 µs 35.248 µs]
                        thrpt:  [232.41 Melem/s 233.16 Melem/s 234.33 Melem/s]
                        thrpt:  [3.4632 GiB/s 3.4743 GiB/s 3.4918 GiB/s]
                 change:
                        time:   [−6.0615% −3.9872% −1.6850%] (p = 0.00 < 0.05)
                        thrpt:  [+1.7139% +4.1527% +6.4526%]
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/RustFFT/8192
                        time:   [33.488 µs 33.652 µs 33.824 µs]
                        thrpt:  [242.20 Melem/s 243.43 Melem/s 244.62 Melem/s]
                        thrpt:  [3.6090 GiB/s 3.6274 GiB/s 3.6452 GiB/s]
                 change:
                        time:   [−1.4566% +0.4807% +2.3236%] (p = 0.62 > 0.05)
                        thrpt:  [−2.2708% −0.4784% +1.4781%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Inverse f64/PhastFT DIT/16384
                        time:   [77.518 µs 77.974 µs 78.363 µs]
                        thrpt:  [209.08 Melem/s 210.12 Melem/s 211.36 Melem/s]
                        thrpt:  [3.1155 GiB/s 3.1311 GiB/s 3.1495 GiB/s]
                 change:
                        time:   [−11.729% −9.8938% −8.0985%] (p = 0.00 < 0.05)
                        thrpt:  [+8.8122% +10.980% +13.287%]
                        Performance has improved.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/RustFFT/16384
                        time:   [81.131 µs 81.574 µs 81.917 µs]
                        thrpt:  [200.01 Melem/s 200.85 Melem/s 201.95 Melem/s]
                        thrpt:  [2.9803 GiB/s 2.9929 GiB/s 3.0092 GiB/s]
                 change:
                        time:   [−2.4316% −0.1996% +2.2222%] (p = 0.87 > 0.05)
                        thrpt:  [−2.1739% +0.2000% +2.4922%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/32768
                        time:   [178.99 µs 179.83 µs 180.43 µs]
                        thrpt:  [181.61 Melem/s 182.21 Melem/s 183.08 Melem/s]
                        thrpt:  [2.7062 GiB/s 2.7152 GiB/s 2.7281 GiB/s]
                 change:
                        time:   [−9.9793% −8.7826% −7.6349%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2660% +9.6282% +11.086%]
                        Performance has improved.
Inverse f64/RustFFT/32768
                        time:   [174.98 µs 176.67 µs 177.73 µs]
                        thrpt:  [184.37 Melem/s 185.48 Melem/s 187.26 Melem/s]
                        thrpt:  [2.7473 GiB/s 2.7638 GiB/s 2.7905 GiB/s]
                 change:
                        time:   [−3.7278% −0.6451% +2.6253%] (p = 0.71 > 0.05)
                        thrpt:  [−2.5582% +0.6493% +3.8722%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/65536
                        time:   [416.50 µs 418.97 µs 420.98 µs]
                        thrpt:  [155.68 Melem/s 156.42 Melem/s 157.35 Melem/s]
                        thrpt:  [2.3197 GiB/s 2.3309 GiB/s 2.3447 GiB/s]
                 change:
                        time:   [−10.509% −8.0228% −5.4266%] (p = 0.00 < 0.05)
                        thrpt:  [+5.7379% +8.7226% +11.743%]
                        Performance has improved.
Found 4 outliers among 20 measurements (20.00%)
  3 (15.00%) low severe
  1 (5.00%) low mild
Inverse f64/RustFFT/65536
                        time:   [365.13 µs 367.37 µs 368.90 µs]
                        thrpt:  [177.65 Melem/s 178.39 Melem/s 179.48 Melem/s]
                        thrpt:  [2.6472 GiB/s 2.6583 GiB/s 2.6745 GiB/s]
                 change:
                        time:   [−1.9923% +0.5212% +3.0366%] (p = 0.69 > 0.05)
                        thrpt:  [−2.9471% −0.5185% +2.0328%]
                        No change in performance detected.
Found 4 outliers among 20 measurements (20.00%)
  1 (5.00%) low severe
  3 (15.00%) low mild
Inverse f64/PhastFT DIT/131072
                        time:   [898.09 µs 905.64 µs 910.58 µs]
                        thrpt:  [143.94 Melem/s 144.73 Melem/s 145.95 Melem/s]
                        thrpt:  [2.1449 GiB/s 2.1566 GiB/s 2.1748 GiB/s]
                 change:
                        time:   [−10.768% −7.9603% −4.9555%] (p = 0.00 < 0.05)
                        thrpt:  [+5.2139% +8.6488% +12.067%]
                        Performance has improved.
Inverse f64/RustFFT/131072
                        time:   [779.79 µs 784.12 µs 787.29 µs]
                        thrpt:  [166.49 Melem/s 167.16 Melem/s 168.09 Melem/s]
                        thrpt:  [2.4808 GiB/s 2.4908 GiB/s 2.5047 GiB/s]
                 change:
                        time:   [−2.3067% −0.0817% +2.2161%] (p = 0.95 > 0.05)
                        thrpt:  [−2.1680% +0.0817% +2.3611%]
                        No change in performance detected.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/PhastFT DIT/262144
                        time:   [1.9440 ms 1.9579 ms 1.9680 ms]
                        thrpt:  [133.20 Melem/s 133.89 Melem/s 134.85 Melem/s]
                        thrpt:  [1.9849 GiB/s 1.9951 GiB/s 2.0094 GiB/s]
                 change:
                        time:   [−10.265% −7.5447% −4.4676%] (p = 0.00 < 0.05)
                        thrpt:  [+4.6765% +8.1604% +11.439%]
                        Performance has improved.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low severe
  1 (5.00%) low mild
Inverse f64/RustFFT/262144
                        time:   [1.7155 ms 1.7229 ms 1.7293 ms]
                        thrpt:  [151.59 Melem/s 152.15 Melem/s 152.81 Melem/s]
                        thrpt:  [2.2588 GiB/s 2.2672 GiB/s 2.2771 GiB/s]
                 change:
                        time:   [−2.6663% −0.4634% +1.8335%] (p = 0.69 > 0.05)
                        thrpt:  [−1.8005% +0.4656% +2.7393%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low severe
  1 (5.00%) low mild
Inverse f64/PhastFT DIT/524288
                        time:   [4.3716 ms 4.4048 ms 4.4270 ms]
                        thrpt:  [118.43 Melem/s 119.03 Melem/s 119.93 Melem/s]
                        thrpt:  [1.7647 GiB/s 1.7736 GiB/s 1.7871 GiB/s]
                 change:
                        time:   [−7.9806% −4.9515% −1.7869%] (p = 0.01 < 0.05)
                        thrpt:  [+1.8194% +5.2094% +8.6728%]
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/RustFFT/524288
                        time:   [3.7139 ms 3.7249 ms 3.7398 ms]
                        thrpt:  [140.19 Melem/s 140.75 Melem/s 141.17 Melem/s]
                        thrpt:  [2.0890 GiB/s 2.0974 GiB/s 2.1036 GiB/s]
                 change:
                        time:   [−0.2433% +0.4846% +1.3432%] (p = 0.25 > 0.05)
                        thrpt:  [−1.3254% −0.4822% +0.2439%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  1 (5.00%) high mild
  2 (10.00%) high severe
Inverse f64/PhastFT DIT/1048576
                        time:   [9.6474 ms 9.7702 ms 9.8541 ms]
                        thrpt:  [106.41 Melem/s 107.32 Melem/s 108.69 Melem/s]
                        thrpt:  [1.5856 GiB/s 1.5993 GiB/s 1.6196 GiB/s]
                 change:
                        time:   [−9.2726% −6.6235% −3.8813%] (p = 0.00 < 0.05)
                        thrpt:  [+4.0380% +7.0933% +10.220%]
                        Performance has improved.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/RustFFT/1048576
                        time:   [8.3414 ms 8.3970 ms 8.4531 ms]
                        thrpt:  [124.05 Melem/s 124.88 Melem/s 125.71 Melem/s]
                        thrpt:  [1.8484 GiB/s 1.8608 GiB/s 1.8732 GiB/s]
                 change:
                        time:   [−1.6407% −0.8285% −0.0222%] (p = 0.06 > 0.05)
                        thrpt:  [+0.0222% +0.8354% +1.6680%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Benchmarking Inverse f64/PhastFT DIT/2097152: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 10.
Inverse f64/PhastFT DIT/2097152
                        time:   [22.867 ms 23.042 ms 23.161 ms]
                        thrpt:  [90.548 Melem/s 91.015 Melem/s 91.709 Melem/s]
                        thrpt:  [1.3493 GiB/s 1.3562 GiB/s 1.3666 GiB/s]
                 change:
                        time:   [−5.8159% −4.8214% −3.8283%] (p = 0.00 < 0.05)
                        thrpt:  [+3.9807% +5.0656% +6.1750%]
                        Performance has improved.
Benchmarking Inverse f64/RustFFT/2097152: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 10.
Inverse f64/RustFFT/2097152
                        time:   [18.838 ms 18.930 ms 19.012 ms]
                        thrpt:  [110.31 Melem/s 110.79 Melem/s 111.33 Melem/s]
                        thrpt:  [1.6437 GiB/s 1.6508 GiB/s 1.6589 GiB/s]
                 change:
                        time:   [−1.0726% +0.1481% +1.8872%] (p = 0.85 > 0.05)
                        thrpt:  [−1.8523% −0.1479% +1.0842%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high severe
Inverse f64/PhastFT DIT/4194304
                        time:   [49.850 ms 50.182 ms 50.519 ms]
                        thrpt:  [83.024 Melem/s 83.581 Melem/s 84.139 Melem/s]
                        thrpt:  [1.2372 GiB/s 1.2455 GiB/s 1.2538 GiB/s]
                 change:
                        time:   [−8.0122% −7.2947% −6.5696%] (p = 0.00 < 0.05)
                        thrpt:  [+7.0316% +7.8687% +8.7101%]
                        Performance has improved.
Inverse f64/RustFFT/4194304
                        time:   [42.064 ms 42.322 ms 42.581 ms]
                        thrpt:  [98.501 Melem/s 99.104 Melem/s 99.713 Melem/s]
                        thrpt:  [1.4678 GiB/s 1.4768 GiB/s 1.4858 GiB/s]
                 change:
                        time:   [−0.4965% +0.4985% +1.5247%] (p = 0.36 > 0.05)
                        thrpt:  [−1.5018% −0.4960% +0.4989%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Inverse f64/PhastFT DIT/8388608
                        time:   [109.40 ms 110.18 ms 110.99 ms]
                        thrpt:  [75.578 Melem/s 76.134 Melem/s 76.675 Melem/s]
                        thrpt:  [1.1262 GiB/s 1.1345 GiB/s 1.1425 GiB/s]
                 change:
                        time:   [−4.0761% −3.2892% −2.4356%] (p = 0.00 < 0.05)
                        thrpt:  [+2.4964% +3.4011% +4.2493%]
                        Performance has improved.
Inverse f64/RustFFT/8388608
                        time:   [84.913 ms 85.476 ms 86.000 ms]
                        thrpt:  [97.542 Melem/s 98.140 Melem/s 98.791 Melem/s]
                        thrpt:  [1.4535 GiB/s 1.4624 GiB/s 1.4721 GiB/s]
                 change:
                        time:   [−2.0096% −1.1812% −0.4254%] (p = 0.00 < 0.05)
                        thrpt:  [+0.4272% +1.1953% +2.0508%]
                        Change within noise threshold.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Benchmarking Inverse f64/PhastFT DIT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 5.5s, or reduce sample count to 10.
Inverse f64/PhastFT DIT/16777216
                        time:   [224.69 ms 225.83 ms 226.93 ms]
                        thrpt:  [73.932 Melem/s 74.293 Melem/s 74.668 Melem/s]
                        thrpt:  [1.1017 GiB/s 1.1071 GiB/s 1.1126 GiB/s]
                 change:
                        time:   [−5.1612% −4.4985% −3.8749%] (p = 0.00 < 0.05)
                        thrpt:  [+4.0311% +4.7104% +5.4421%]
                        Performance has improved.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low mild
  1 (5.00%) high mild
Inverse f64/RustFFT/16777216
                        time:   [183.47 ms 184.64 ms 185.83 ms]
                        thrpt:  [90.283 Melem/s 90.865 Melem/s 91.446 Melem/s]
                        thrpt:  [1.3453 GiB/s 1.3540 GiB/s 1.3627 GiB/s]
                 change:
                        time:   [−2.0136% −1.3284% −0.6266%] (p = 0.00 < 0.05)
                        thrpt:  [+0.6306% +1.3463% +2.0549%]
                        Change within noise threshold.

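The throughput columns in the Criterion output above are derived from the measured times, so they can be cross-checked by hand. The sketch below assumes 16 bytes per element (one complex `f64` value, i.e. two 8-byte floats). That size is inferred from the Melem/s and GiB/s columns agreeing with each other; the source does not state it. The helper names are hypothetical. It also shows why a time change of −8.7826% is reported as a throughput change of +9.6282% rather than +8.7826%: throughput scales as 1/time, so the two percentages are not symmetric.

```rust
/// Assumed bytes per element: one complex f64 (2 x 8 bytes).
/// Inferred from the benchmark numbers, not stated in the source.
const BYTES_PER_ELEM: f64 = 16.0;

/// Hypothetical helper: millions of elements per second for an
/// `n`-point transform that takes `secs` seconds.
fn melem_per_sec(n: u64, secs: f64) -> f64 {
    n as f64 / secs / 1e6
}

/// Hypothetical helper: GiB/s (2^30 bytes) under the 16-byte assumption.
fn gib_per_sec(n: u64, secs: f64) -> f64 {
    n as f64 * BYTES_PER_ELEM / secs / (1u64 << 30) as f64
}

/// Throughput change implied by a relative time change `dt`
/// (e.g. -0.087826 for -8.7826%), since throughput is proportional to 1/time.
fn thrpt_change(dt: f64) -> f64 {
    1.0 / (1.0 + dt) - 1.0
}

fn main() {
    // Point estimate for "Inverse f64/RustFFT/256": 653.92 ns.
    let t = 653.92e-9;
    println!("{:.2} Melem/s", melem_per_sec(256, t));
    println!("{:.4} GiB/s", gib_per_sec(256, t));

    // Point estimate of the time change for "Inverse f64/PhastFT DIT/32768".
    println!("{:+.4}%", thrpt_change(-0.087826) * 100.0);
}
```

Running this reproduces the 391.49 Melem/s and 5.8336 GiB/s reported for RustFFT/256, and the +9.6282% throughput change reported for PhastFT DIT/32768.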
@Shnatsel marked this pull request as ready for review April 18, 2026 09:35
@Shnatsel merged commit 7fd74f1 into main Apr 18, 2026
10 checks passed
@Shnatsel deleted the in-registers-codelet branch April 18, 2026 11:23
@Shnatsel
Collaborator Author

This could really use linebender/fearless_simd#206 but I've polyfilled it for now.
