opt: use faster NEON bitmask and fix skip number bug #63

Merged
merged 6 commits into from
Feb 18, 2024

liuq19
Collaborator

@liuq19 liuq19 commented Feb 18, 2024

What type of PR is this?

Optimize and refactor

Check the PR title.

  • This PR title matches the format: <type>(optional scope): <description>
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

(Optional) More detailed description for this PR(en: English/zh: Chinese).

  1. Implement a faster bitmask for NEON vectors (see reference).
  2. Refactor the portable SIMD code to support the faster bitmask above.
  3. Handle the different endianness cases when computing a bitmask from a vector.
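
For readers unfamiliar with the idea, here is a minimal sketch of one well-known way to build a fast NEON bitmask: the "shift right and narrow" trick. It turns a 16-byte comparison result into a 64-bit mask with one nibble per input byte. This is an illustration only and is not taken from this PR's diff. The function names (`nibble_mask_scalar`, `nibble_mask_neon`, `first_match`, `match_positions`) are hypothetical. The NEON variant is restricted to little-endian aarch64, because the lane-to-bit mapping assumed here has only been reasoned through for that layout.

```rust
/// Scalar model of the NEON "shift right and narrow" bitmask.
/// Nibble `k` of the result is 0xF when `chunk[k] == needle`, else 0x0.
pub fn nibble_mask_scalar(chunk: &[u8; 16], needle: u8) -> u64 {
    let mut narrowed = [0u8; 8];
    for i in 0..8 {
        // Model the byte-wise compare: 0xFF on match, 0x00 otherwise.
        let lo: u16 = if chunk[2 * i] == needle { 0xFF } else { 0 };
        let hi: u16 = if chunk[2 * i + 1] == needle { 0xFF } else { 0 };
        // View two compare bytes as one little-endian u16 lane,
        // shift right by 4 and keep the low 8 bits.
        let lane = (hi << 8) | lo;
        narrowed[i] = (lane >> 4) as u8;
    }
    // from_le_bytes fixes the bit order independently of the host endianness.
    u64::from_le_bytes(narrowed)
}

/// Same mask computed with NEON intrinsics (little-endian aarch64 only).
#[cfg(all(target_arch = "aarch64", target_endian = "little"))]
pub fn nibble_mask_neon(chunk: &[u8; 16], needle: u8) -> u64 {
    use core::arch::aarch64::*;
    // SAFETY: `chunk` is exactly 16 readable bytes, and NEON is part of the
    // baseline feature set of Rust's standard aarch64 targets.
    unsafe {
        let v = vld1q_u8(chunk.as_ptr());
        let eq = vceqq_u8(v, vdupq_n_u8(needle));
        let narrowed = vshrn_n_u16::<4>(vreinterpretq_u16_u8(eq));
        vget_lane_u64::<0>(vreinterpret_u64_u8(narrowed))
    }
}

/// Index of the first matching byte, if any (4 mask bits per input byte).
pub fn first_match(mask: u64) -> Option<usize> {
    if mask == 0 {
        None
    } else {
        Some((mask.trailing_zeros() / 4) as usize)
    }
}

/// All matching byte indices, in ascending order.
pub fn match_positions(mask: u64) -> Vec<usize> {
    // Keep one bit per nibble so `bits &= bits - 1` clears one match at a time.
    let mut bits = mask & 0x8888_8888_8888_8888;
    let mut out = Vec::new();
    while bits != 0 {
        out.push((bits.trailing_zeros() / 4) as usize);
        bits &= bits - 1;
    }
    out
}
```

For example, searching the 16-byte chunk `{"key":"value"}!` for `"` with the scalar model gives matches at byte indices 1, 5, 7 and 13. The mask is four times wider than an x86 `movemask` result, so callers divide bit positions by 4 rather than using them directly.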

benchmark data:
deserialize_struct

twitter/sonic_rs::from_slice_unchecked
                        time:   [416.42 µs 417.61 µs 418.95 µs]
                        change: [-8.1303% -7.4773% -6.8652%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  9 (9.00%) high mild
  10 (10.00%) high severe
twitter/sonic_rs::from_slice
                        time:   [437.19 µs 438.17 µs 439.25 µs]
                        change: [-10.175% -9.5169% -8.8809%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

citm_catalog/sonic_rs::from_slice_unchecked
                        time:   [829.53 µs 830.77 µs 832.16 µs]
                        change: [-5.8317% -5.5663% -5.3073%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) high mild
  15 (15.00%) high severe
citm_catalog/sonic_rs::from_slice
                        time:   [850.73 µs 852.09 µs 853.61 µs]
                        change: [-5.6535% -5.3738% -5.1004%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) high mild
  13 (13.00%) high severe

canada/sonic_rs::from_slice_unchecked
                        time:   [3.1962 ms 3.2209 ms 3.2451 ms]
                        change: [-1.7136% -0.7248% +0.3155%] (p = 0.16 > 0.05)
                        No change in performance detected.
canada/sonic_rs::from_slice
                        time:   [3.2662 ms 3.2874 ms 3.3082 ms]
                        change: [-0.4885% +0.3860% +1.3043%] (p = 0.40 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low mild

serialize_value

twitter/sonic_rs::to_string
                        time:   [176.28 µs 177.38 µs 178.71 µs]
                        change: [-3.8439% -2.9233% -1.9506%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

citm_catalog/sonic_rs::to_string
                        time:   [345.75 µs 346.43 µs 347.25 µs]
                        change: [-1.9978% -1.5666% -1.1737%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  11 (11.00%) high severe

canada/sonic_rs::to_string
                        time:   [2.9357 ms 2.9412 ms 2.9478 ms]
                        change: [-0.8109% -0.1844% +0.3632%] (p = 0.55 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

(Optional) Which issue(s) this PR fixes:

(optional) The PR that updates user documentation:

@liuq19 liuq19 changed the title opt: use faster neon mask opt: use faster NEON bitmask Feb 18, 2024
@liuq19 liuq19 changed the title opt: use faster NEON bitmask opt: use faster NEON bitmask and fix skip number bug when using sse2/avx2 simd Feb 18, 2024
@liuq19 liuq19 changed the title opt: use faster NEON bitmask and fix skip number bug when using sse2/avx2 simd opt: use faster NEON bitmask and fix skip number bug Feb 18, 2024
@liuq19 liuq19 marked this pull request as ready for review February 18, 2024 12:56
@liuq19 liuq19 merged commit b7e394c into main Feb 18, 2024
15 of 16 checks passed
@liuq19 liuq19 deleted the opt/aarch64 branch February 18, 2024 13:15