Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use w_naf to reduce time consume of point_mul #30

Merged
merged 18 commits into from Jan 13, 2022
Merged

use w_naf to reduce time consume of point_mul #30

merged 18 commits into from Jan 13, 2022

Conversation

FullyRobert
Copy link
Contributor

w-naf can effectively reduce the time consumption of point multiplication. In sm2, we set the width of the window to 5, and the except time consumption is 50A+257D where the origin time consumption is 128A+256D.

FullyRobert and others added 7 commits August 18, 2021 09:19
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
@FullyRobert
Copy link
Contributor Author

In #27 you have mentioned the other repo efficient-sm2. In efficient-sm2, point multiplication use a fixed window of width 4, maybe we can modify this method to improve the performance of efficient-sm2.

If you have interest on performance maybe you would like to look efficient-sm2 which we replace libsm

The structure of the efficient-sm2 is rather complicated, so I hope you can briefly introduce the project structure in the readme.md

@Pencil-Yao
Copy link
Member

naf is wonderful idea, I need some time to understand

@FullyRobert
Copy link
Contributor Author

FullyRobert commented Sep 14, 2021

thanks😊

@Pencil-Yao
Copy link
Member

It's my turn to thank you. The performance of crypto lib alway determin the best tps of a blockchain system. NAF may be also a good point for efficient-sm2. But I just not have enough time to do it. My turn to say thank you.

@FullyRobert
Copy link
Contributor Author

ok,but I also not have enough time to do it recently😂. Maybe you can check this pr fisrtly and I will modify efficient-sm2 when I have time

@Pencil-Yao
Copy link
Member

Nothing serious. It's my duty to improve efficient-sm2

@FullyRobert
Copy link
Contributor Author

Is my pr having some problems?😂It has been on hold for over a month

@FullyRobert
Copy link
Contributor Author

😂

@Pencil-Yao
Copy link
Member

The result of performance test: No significant improvement.

cargo +nightly bench --workspace --features internal_benches

Origin: 2times
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,715 ns/iter (+/- 179)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,375 ns/iter (+/- 42)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,219 ns/iter (+/- 54)
test sm2::signature::signature_benches::sign_bench ... bench: 122,010 ns/iter (+/- 2,640)
test sm2::signature::signature_benches::verify_bench ... bench: 639,270 ns/iter (+/- 54,168)

test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,631 ns/iter (+/- 99)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,400 ns/iter (+/- 75)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,210 ns/iter (+/- 36)
test sm2::signature::signature_benches::sign_bench ... bench: 119,702 ns/iter (+/- 1,460)
test sm2::signature::signature_benches::verify_bench ... bench: 621,598 ns/iter (+/- 145,741)

Yours: 2times
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,681 ns/iter (+/- 635)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,395 ns/iter (+/- 49)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,215 ns/iter (+/- 44)
test sm2::signature::signature_benches::sign_bench ... bench: 121,787 ns/iter (+/- 1,436)
test sm2::signature::signature_benches::verify_bench ... bench: 649,102 ns/iter (+/- 12,225)

test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,870 ns/iter (+/- 118)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,400 ns/iter (+/- 44)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,208 ns/iter (+/- 47)
test sm2::signature::signature_benches::sign_bench ... bench: 121,145 ns/iter (+/- 2,176)
test sm2::signature::signature_benches::verify_bench ... bench: 664,214 ns/iter (+/- 17,916)

ps. bench environment:

  • cpu: amd r7 4800-h
  • memory: 32g
  • os: ubuntu 20.04

@Pencil-Yao
Copy link
Member

w-naf can effectively reduce the time consumption of point multiplication. In sm2, we set the width of the window to 5, and the except time consumption is 50A+257D where the origin time consumption is 128A+256D.

I just wonder what mean about A & D and how to count it?

@FullyRobert
Copy link
Contributor Author

A means ADD operation, D means double operation. In original, a point_mul operation need 128 add and 256 double operations.

@FullyRobert
Copy link
Contributor Author

and in my test, I can see an explicit improvement in verifyfunction😂

origin:
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 8,615 ns/iter (+/- 260)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,308 ns/iter (+/- 62)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,177 ns/iter (+/- 48)
test sm2::signature::signature_benches::sign_bench ... bench: 108,438 ns/iter (+/- 1,772)
test sm2::signature::signature_benches::verify_bench ... bench: 630,250 ns/iter (+/- 5,821)

mypr:
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 8,584 ns/iter (+/- 544)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,273 ns/iter (+/- 79)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,129 ns/iter (+/- 62)
test sm2::signature::signature_benches::sign_bench ... bench: 109,543 ns/iter (+/- 5,492)
test sm2::signature::signature_benches::verify_bench ... bench: 490,355 ns/iter (+/- 17,563)

@Pencil-Yao
Copy link
Member

have mistake, yours result:
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,869 ns/iter (+/- 50)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,373 ns/iter (+/- 13)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,185 ns/iter (+/- 21)
test sm2::signature::signature_benches::sign_bench ... bench: 120,412 ns/iter (+/- 959)
test sm2::signature::signature_benches::verify_bench ... bench: 517,179 ns/iter (+/- 13,110)

src/sm2/ecc.rs Outdated Show resolved Hide resolved
src/sm2/ecc.rs Outdated Show resolved Hide resolved
src/sm2/ecc.rs Outdated Show resolved Hide resolved
src/sm2/ecc.rs Outdated Show resolved Hide resolved
src/sm2/ecc.rs Outdated Show resolved Hide resolved
src/sm2/ecc.rs Outdated Show resolved Hide resolved
src/sm2/ecc.rs Outdated Show resolved Hide resolved
src/sm2/ecc.rs Outdated Show resolved Hide resolved
src/sm2/ecc.rs Outdated Show resolved Hide resolved
src/sm2/ecc.rs Outdated Show resolved Hide resolved
@Pencil-Yao
Copy link
Member

I thought the improve is from precompute table. I push new branch: https://github.com/citahub/libsm/tree/naf_bench
result:

test sm2::ecc::internal_benches::bench_mul_raw          ... bench:      75,152 ns/iter (+/- 1,488)
test sm2::ecc::internal_benches::bench_mul_raw_naf      ... bench:      73,674 ns/iter (+/- 7,059)
test sm2::ecc::internal_benches::sm2_inv_bench          ... bench:       9,788 ns/iter (+/- 200)
test sm2::ecc::internal_benches::sm2_point_add_bench    ... bench:       1,377 ns/iter (+/- 191)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench:       1,445 ns/iter (+/- 125)
test sm2::signature::signature_benches::sign_bench      ... bench:     129,971 ns/iter (+/- 5,475)
test sm2::signature::signature_benches::verify_bench    ... bench:     592,063 ns/iter (+/- 66,204)

bench_mul_raw & bench_mul_raw_naf I can't say naf improve the performance.

@FullyRobert
Copy link
Contributor Author

FullyRobert commented Jan 6, 2022

I think that is because the number of m is too small and if you get m from the function curve.random_uint, you can see an explicit improvement in waf, like this :
let m = curve.random_uint() % curve.get_n();
The point-mul during the actual verification will multiple a large number(256bits length), which facilitates the w-naf method to be effective.
That is my test result:

test sm2::ecc::internal_benches::bench_mul_raw ... bench: 463,839 ns/iter (+/- 11,877)
test sm2::ecc::internal_benches::bench_mul_raw_naf ... bench: 395,910 ns/iter (+/- 23,896)
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 8,475 ns/iter (+/- 724)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,345 ns/iter (+/- 139)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,289 ns/iter (+/- 45)
test sm2::signature::signature_benches::sign_bench ... bench: 110,690 ns/iter (+/- 2,277)
test sm2::signature::signature_benches::verify_bench ... bench: 502,630 ns/iter (+/- 10,705)

@Pencil-Yao
Copy link
Member

I think that is because the number of m is too small and if you get m from the function curve.random_uint, you can see an explicit improvement in waf, like this : let m = curve.random_uint() % curve.get_n(); The point-mul during the actual verification will multiple a large number(256bits length), which facilitates the w-naf method to be effective. That is my test result:

test sm2::ecc::internal_benches::bench_mul_raw ... bench: 463,839 ns/iter (+/- 11,877)
test sm2::ecc::internal_benches::bench_mul_raw_naf ... bench: 395,910 ns/iter (+/- 23,896)
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 8,475 ns/iter (+/- 724)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,345 ns/iter (+/- 139)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,289 ns/iter (+/- 45)
test sm2::signature::signature_benches::sign_bench ... bench: 110,690 ns/iter (+/- 2,277)
test sm2::signature::signature_benches::verify_bench ... bench: 502,630 ns/iter (+/- 10,705)

I got it. Last you need fix typo.

FullyRobert and others added 4 commits January 7, 2022 18:31
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
FullyRobert and others added 6 commits January 7, 2022 18:32
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
Co-authored-by: Yieazy <yuitta@163.com>
@FullyRobert
Copy link
Contributor Author

ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants