New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use chunks of triples to compute Pedersen hash #1420

Merged
merged 31 commits into from Jan 18, 2019

Conversation

Projects
None yet
5 participants
@psteckler
Copy link
Contributor

psteckler commented Jan 10, 2019

Besides generating an ML file with Pedersen hash parameters, we also generate a table of curve values for chunks of triples. The size of the chunk is given by Chunked_triples.Chunk.size.

With a chunk size of 3, hashing takes about half the time. Generating the table takes a small number of seconds; increasing the size increases the generation time. The serialized version of the table is large, about 70 Mb, perhaps a consideration.

The code was tested by building and running all unit tests, and for every hash computed, it was computed using the existing method and the chunked method, and comparing the results.

There are conversions between triple chunks and integers, there's a unit test that exhaustively checks the range of possible integers.

@psteckler psteckler requested a review from imeckler Jan 10, 2019

Paul Steckler
@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 10, 2019

Build times on my laptop:

chunk size of 3

real	2m3.224s
user	8m34.369s
sys	1m34.976s

chunk size of 4

real	5m35.915s
user	11m52.500s
sys	1m45.818s

psteckler and others added some commits Jan 10, 2019

Paul Steckler
@bkase
Copy link
Contributor

bkase left a comment

I'll let @imeckler review, but can you (1) include microbenchmarks (do some let%benches cc @cmr ) (2) how much faster is this compared to not memoizing (can you include a table in the description of the PR maybe?)

2min impact on build times seems acceptable

@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 11, 2019

2min impact on build times seems acceptable

@bkase You're suggesting that the chunk size of 4 is OK for the dev config?

We can add a configuration item to bump up the chunk size to maybe 5 for release.

@bkase

This comment has been minimized.

Copy link
Contributor

bkase commented Jan 11, 2019

We should measure on a MacBook pro too, your machine may be better

@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 11, 2019

With a chunk size of 3, for a list of 5000 triples:

┌────────────────────────────────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
│ Name                               │ Time/Run │  mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├────────────────────────────────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ [app/bench/hashes.ml] using params │   5.99ms │ 842.75kw │  45.10kw │  45.10kw │    100.00% │
│ [app/bench/hashes.ml] using chunks │   2.26ms │ 349.92kw │  22.79kw │  22.79kw │     37.67% │
└────────────────────────────────────┴──────────┴──────────┴──────────┴──────────┴────────────┘

@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 11, 2019

With a chunk size of 4:

┌────────────────────────────────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
│ Name                               │ Time/Run │  mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├────────────────────────────────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ [app/bench/hashes.ml] using params │   6.25ms │ 986.01kw │  45.11kw │  45.11kw │    100.00% │
│ [app/bench/hashes.ml] using chunks │   1.77ms │ 270.43kw │  21.95kw │  21.95kw │     28.28% │
└────────────────────────────────────┴──────────┴──────────┴──────────┴──────────┴────────────┘
@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 11, 2019

Size of serialized chunk table:

For chunk size of 3: 70 Mb
For chunk size of 4: 425 Mb

I'd venture that the slightly better performance with 4 is not worth the large memory cost.

@bkase

This comment has been minimized.

Copy link
Contributor

bkase commented Jan 11, 2019

Yeah I agree, we should go with chunk size of 3 first

@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 11, 2019

The actual table sizes should be much smaller, less than 10 Mb for a chunk size of 4 by my quick calculation.

I will add code to allow the serialized table to be GCed, once it's deserialized.

That means we can think about larger chunk sizes with better performance, where there's a memory cost on startup, but not thereafter.

@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 12, 2019

I think that it might be better to not compute the chunk values at compile-time, and instead, allocate the table at run-time, and memoize the chunks as needed. That would save the extra compile time and large memory requirements, and allow the use of bigger chunks, yielding better performance.

Paul Steckler
@imeckler
Copy link
Contributor

imeckler left a comment

hi Paul -- looks great, just left a few comments, let me know what you think.

Show resolved Hide resolved src/lib/chunked_triples/chunk.ml Outdated
Show resolved Hide resolved src/lib/chunked_triples/chunk.ml
Show resolved Hide resolved src/lib/chunked_triples/chunk.ml Outdated
Show resolved Hide resolved src/lib/crypto_params/gen/gen.ml
Show resolved Hide resolved src/lib/crypto_params/gen/gen.ml Outdated
Show resolved Hide resolved src/lib/crypto_params/gen/gen.ml Outdated
Show resolved Hide resolved src/lib/snark_params/pedersen.ml Outdated

Paul Steckler and others added some commits Jan 15, 2019

Paul Steckler
WIP
@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 15, 2019

I've committed WIP for the chunking within the fold, CI won't pass yet.

psteckler added some commits Jan 16, 2019

@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 16, 2019

All comments addressed. Chunking happens within the fold, rather than as a separate preprocessing step, which should be faster.

psteckler and others added some commits Jan 17, 2019

@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 17, 2019

I've thought of another small optimization, will add that.

psteckler and others added some commits Jan 17, 2019

Paul Steckler
@imeckler

This comment has been minimized.

Copy link
Contributor

imeckler commented Jan 17, 2019

@psteckler looks great -- one last thing I forget to mention is I think would be good to have a unit test confirming the chunked version is doing the same thing as the unchunked version

Paul Steckler and others added some commits Jan 17, 2019

Paul Steckler
@psteckler

This comment has been minimized.

Copy link
Contributor Author

psteckler commented Jan 18, 2019

added tests to compare unchunked, chunked algorithm results

psteckler added some commits Jan 18, 2019

@psteckler psteckler merged commit af8f447 into master Jan 18, 2019

9 checks passed

ci/circleci: build Your tests passed on CircleCI!
Details
ci/circleci: build-macos Your tests passed on CircleCI!
Details
ci/circleci: build_public Your tests passed on CircleCI!
Details
ci/circleci: build_withsnark Your tests passed on CircleCI!
Details
ci/circleci: test-all_sig_integration_tests Your tests passed on CircleCI!
Details
ci/circleci: test-all_stake_integration_tests Your tests passed on CircleCI!
Details
ci/circleci: test-unit-test Your tests passed on CircleCI!
Details
ci/circleci: test-withsnark Your tests passed on CircleCI!
Details
ci/circleci: tracetool Your tests passed on CircleCI!
Details

@psteckler psteckler deleted the feature/memoized-pedersen-hash branch Jan 18, 2019

@nicola

This comment has been minimized.

Copy link

nicola commented Feb 7, 2019

Is chunk size == window size in Zcash’ jargon?

@cmr

This comment has been minimized.

Copy link
Contributor

cmr commented Feb 12, 2019

I don't think they're related. Not sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment