use w_naf to reduce time consume of point_mul #30

FullyRobert · 2021-08-27T01:54:00Z

w-naf can effectively reduce the time consumption of point multiplication. In sm2, we set the width of the window to 5, and the except time consumption is 50A+257D where the origin time consumption is 128A+256D.

Co-authored-by: Yieazy <yuitta@163.com>

FullyRobert · 2021-08-27T01:55:53Z

In #27 you have mentioned the other repo efficient-sm2. In efficient-sm2, point multiplication use a fixed window of width 4, maybe we can modify this method to improve the performance of efficient-sm2.

If you have interest on performance maybe you would like to look efficient-sm2 which we replace libsm

The structure of the efficient-sm2 is rather complicated, so I hope you can briefly introduce the project structure in the readme.md

Pencil-Yao · 2021-09-04T08:32:06Z

naf is wonderful idea, I need some time to understand

FullyRobert · 2021-09-14T02:13:28Z

thanks😊

Pencil-Yao · 2021-09-14T12:20:09Z

It's my turn to thank you. The performance of crypto lib alway determin the best tps of a blockchain system. NAF may be also a good point for efficient-sm2. But I just not have enough time to do it. My turn to say thank you.

FullyRobert · 2021-10-06T07:54:33Z

ok，but I also not have enough time to do it recently😂. Maybe you can check this pr fisrtly and I will modify efficient-sm2 when I have time

Pencil-Yao · 2021-10-08T11:13:33Z

Nothing serious. It's my duty to improve efficient-sm2

FullyRobert · 2021-11-17T05:30:45Z

Is my pr having some problems？😂It has been on hold for over a month

FullyRobert · 2022-01-05T10:50:53Z

😂

Pencil-Yao · 2022-01-05T11:22:48Z

The result of performance test: No significant improvement.

cargo +nightly bench --workspace --features internal_benches

Origin: 2times
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,715 ns/iter (+/- 179)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,375 ns/iter (+/- 42)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,219 ns/iter (+/- 54)
test sm2::signature::signature_benches::sign_bench ... bench: 122,010 ns/iter (+/- 2,640)
test sm2::signature::signature_benches::verify_bench ... bench: 639,270 ns/iter (+/- 54,168)

test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,631 ns/iter (+/- 99)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,400 ns/iter (+/- 75)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,210 ns/iter (+/- 36)
test sm2::signature::signature_benches::sign_bench ... bench: 119,702 ns/iter (+/- 1,460)
test sm2::signature::signature_benches::verify_bench ... bench: 621,598 ns/iter (+/- 145,741)

Yours: 2times
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,681 ns/iter (+/- 635)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,395 ns/iter (+/- 49)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,215 ns/iter (+/- 44)
test sm2::signature::signature_benches::sign_bench ... bench: 121,787 ns/iter (+/- 1,436)
test sm2::signature::signature_benches::verify_bench ... bench: 649,102 ns/iter (+/- 12,225)

test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,870 ns/iter (+/- 118)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,400 ns/iter (+/- 44)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,208 ns/iter (+/- 47)
test sm2::signature::signature_benches::sign_bench ... bench: 121,145 ns/iter (+/- 2,176)
test sm2::signature::signature_benches::verify_bench ... bench: 664,214 ns/iter (+/- 17,916)

ps. bench environment:

cpu: amd r7 4800-h
memory: 32g
os: ubuntu 20.04

Pencil-Yao · 2022-01-05T11:26:06Z

w-naf can effectively reduce the time consumption of point multiplication. In sm2, we set the width of the window to 5, and the except time consumption is 50A+257D where the origin time consumption is 128A+256D.

I just wonder what mean about A & D and how to count it?

FullyRobert · 2022-01-05T13:03:35Z

A means ADD operation， D means double operation. In original, a point_mul operation need 128 add and 256 double operations.

FullyRobert · 2022-01-05T13:16:18Z

and in my test, I can see an explicit improvement in verifyfunction😂

origin:
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 8,615 ns/iter (+/- 260)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,308 ns/iter (+/- 62)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,177 ns/iter (+/- 48)
test sm2::signature::signature_benches::sign_bench ... bench: 108,438 ns/iter (+/- 1,772)
test sm2::signature::signature_benches::verify_bench ... bench: 630,250 ns/iter (+/- 5,821)

mypr:
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 8,584 ns/iter (+/- 544)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,273 ns/iter (+/- 79)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,129 ns/iter (+/- 62)
test sm2::signature::signature_benches::sign_bench ... bench: 109,543 ns/iter (+/- 5,492)
test sm2::signature::signature_benches::verify_bench ... bench: 490,355 ns/iter (+/- 17,563)

Pencil-Yao · 2022-01-06T02:33:28Z

have mistake, yours result:
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 9,869 ns/iter (+/- 50)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,373 ns/iter (+/- 13)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,185 ns/iter (+/- 21)
test sm2::signature::signature_benches::sign_bench ... bench: 120,412 ns/iter (+/- 959)
test sm2::signature::signature_benches::verify_bench ... bench: 517,179 ns/iter (+/- 13,110)

src/sm2/ecc.rs

Pencil-Yao · 2022-01-06T03:08:21Z

I thought the improve is from precompute table. I push new branch: https://github.com/citahub/libsm/tree/naf_bench
result:

test sm2::ecc::internal_benches::bench_mul_raw          ... bench:      75,152 ns/iter (+/- 1,488)
test sm2::ecc::internal_benches::bench_mul_raw_naf      ... bench:      73,674 ns/iter (+/- 7,059)
test sm2::ecc::internal_benches::sm2_inv_bench          ... bench:       9,788 ns/iter (+/- 200)
test sm2::ecc::internal_benches::sm2_point_add_bench    ... bench:       1,377 ns/iter (+/- 191)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench:       1,445 ns/iter (+/- 125)
test sm2::signature::signature_benches::sign_bench      ... bench:     129,971 ns/iter (+/- 5,475)
test sm2::signature::signature_benches::verify_bench    ... bench:     592,063 ns/iter (+/- 66,204)

bench_mul_raw & bench_mul_raw_naf I can't say naf improve the performance.

FullyRobert · 2022-01-06T06:26:39Z

I think that is because the number of m is too small and if you get m from the function curve.random_uint, you can see an explicit improvement in waf, like this :
let m = curve.random_uint() % curve.get_n();
The point-mul during the actual verification will multiple a large number（256bits length), which facilitates the w-naf method to be effective.
That is my test result:

test sm2::ecc::internal_benches::bench_mul_raw ... bench: 463,839 ns/iter (+/- 11,877)
test sm2::ecc::internal_benches::bench_mul_raw_naf ... bench: 395,910 ns/iter (+/- 23,896)
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 8,475 ns/iter (+/- 724)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,345 ns/iter (+/- 139)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,289 ns/iter (+/- 45)
test sm2::signature::signature_benches::sign_bench ... bench: 110,690 ns/iter (+/- 2,277)
test sm2::signature::signature_benches::verify_bench ... bench: 502,630 ns/iter (+/- 10,705)

Pencil-Yao · 2022-01-06T07:40:06Z

I think that is because the number of m is too small and if you get m from the function curve.random_uint, you can see an explicit improvement in waf, like this : let m = curve.random_uint() % curve.get_n(); The point-mul during the actual verification will multiple a large number（256bits length), which facilitates the w-naf method to be effective. That is my test result:

test sm2::ecc::internal_benches::bench_mul_raw ... bench: 463,839 ns/iter (+/- 11,877)
test sm2::ecc::internal_benches::bench_mul_raw_naf ... bench: 395,910 ns/iter (+/- 23,896)
test sm2::ecc::internal_benches::sm2_inv_bench ... bench: 8,475 ns/iter (+/- 724)
test sm2::ecc::internal_benches::sm2_point_add_bench ... bench: 1,345 ns/iter (+/- 139)
test sm2::ecc::internal_benches::sm2_point_double_bench ... bench: 1,289 ns/iter (+/- 45)
test sm2::signature::signature_benches::sign_bench ... bench: 110,690 ns/iter (+/- 2,277)
test sm2::signature::signature_benches::verify_bench ... bench: 502,630 ns/iter (+/- 10,705)

I got it. Last you need fix typo.

Co-authored-by: Yieazy <yuitta@163.com>

FullyRobert · 2022-01-07T10:33:38Z

ok

FullyRobert and others added 7 commits August 18, 2021 09:19

use shr to replace div 2 for inv_n

ac4afd1

Update src/sm2/ecc.rs

33726fa

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

87784b6

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

fe2e486

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

ca82249

Co-authored-by: Yieazy <yuitta@163.com>

Merge branch 'citahub:master' into master

ea13af6

use w_naf to reduce time consume of point_mul

6794b24

use clippy

a1714e4

Pencil-Yao reviewed Jan 6, 2022

View reviewed changes

src/sm2/ecc.rs Outdated Show resolved Hide resolved

FullyRobert and others added 4 commits January 7, 2022 18:31

Update src/sm2/ecc.rs

4156154

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

92ffa5c

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

50d90a8

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

91bc2b7

Co-authored-by: Yieazy <yuitta@163.com>

FullyRobert and others added 6 commits January 7, 2022 18:32

Update src/sm2/ecc.rs

65064b7

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

e2db30c

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

ed9dc98

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

bf14717

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

7d96827

Co-authored-by: Yieazy <yuitta@163.com>

Update src/sm2/ecc.rs

3573a6e

Co-authored-by: Yieazy <yuitta@163.com>

Pencil-Yao approved these changes Jan 13, 2022

View reviewed changes

Pencil-Yao merged commit e129d25 into citahub:master Jan 13, 2022

FullyRobert mentioned this pull request May 13, 2022

remove outdated method & modify bench function error #43

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use w_naf to reduce time consume of point_mul #30

use w_naf to reduce time consume of point_mul #30

FullyRobert commented Aug 27, 2021

FullyRobert commented Aug 27, 2021

Pencil-Yao commented Sep 4, 2021

FullyRobert commented Sep 14, 2021 •

edited

Pencil-Yao commented Sep 14, 2021

FullyRobert commented Oct 6, 2021

Pencil-Yao commented Oct 8, 2021

FullyRobert commented Nov 17, 2021

FullyRobert commented Jan 5, 2022

Pencil-Yao commented Jan 5, 2022

Pencil-Yao commented Jan 5, 2022

FullyRobert commented Jan 5, 2022

FullyRobert commented Jan 5, 2022

Pencil-Yao commented Jan 6, 2022

Pencil-Yao commented Jan 6, 2022

FullyRobert commented Jan 6, 2022 •

edited

Pencil-Yao commented Jan 6, 2022

FullyRobert commented Jan 7, 2022

use w_naf to reduce time consume of point_mul #30

use w_naf to reduce time consume of point_mul #30

Conversation

FullyRobert commented Aug 27, 2021

FullyRobert commented Aug 27, 2021

Pencil-Yao commented Sep 4, 2021

FullyRobert commented Sep 14, 2021 • edited

Pencil-Yao commented Sep 14, 2021

FullyRobert commented Oct 6, 2021

Pencil-Yao commented Oct 8, 2021

FullyRobert commented Nov 17, 2021

FullyRobert commented Jan 5, 2022

Pencil-Yao commented Jan 5, 2022

Pencil-Yao commented Jan 5, 2022

FullyRobert commented Jan 5, 2022

FullyRobert commented Jan 5, 2022

Pencil-Yao commented Jan 6, 2022

Pencil-Yao commented Jan 6, 2022

FullyRobert commented Jan 6, 2022 • edited

Pencil-Yao commented Jan 6, 2022

FullyRobert commented Jan 7, 2022

FullyRobert commented Sep 14, 2021 •

edited

FullyRobert commented Jan 6, 2022 •

edited