Skip to content
This repository has been archived by the owner on Jun 11, 2024. It is now read-only.

Introduce parallel processing #39

Closed
Tracked by #26
matjazv opened this issue Aug 26, 2022 · 2 comments · Fixed by #69
Closed
Tracked by #26

Introduce parallel processing #39

matjazv opened this issue Aug 26, 2022 · 2 comments · Fixed by #69
Assignees
Milestone

Comments

@matjazv
Copy link
Contributor

matjazv commented Aug 26, 2022

Description

Introduce parallel processing on places in the code where it is applicable and make sense from a code execution performance perspective. An average CPU nowadays has a multiple cores so it makes sense to transform CPU intensive calculations into parallel processing to gain code execution speed

  • a crate called Rayon might be used for parallel processing
  • among other features, Rayon has parallel iterators which may be used to transform code from sequential processing into parallel processing
  • example code how it's possible to make a code to run in parallel using Rayon parallel iterators:
// instead of writing code like this
let mut sum = 0;
for element in vec_array {
  sum += element*element
}

// it would be better to translate above code into this
vec_array.iter().map(|&el| el * el).sum()

// because the above code can be trivially converted to be executed in parallel with Rayon
vec_array.par_iter().map(|&el| el * el).sum()

When using parallelism in a code, shared resources between different threads need to be protected by not having simultaneous operations on them. If two different threads want to change a shared resource it might occur that this resource will be in invalid state if it won't be locked before any write occurs.

  • a crate called parking_lot might be used for locking a shared resources
  • where applicable RwLock may be used because multiple threads can read a shared resource at the same time and only one thread can lock a shared resource for writing purpose

Acceptance Criteria

  • where applicable code is running in a parallel
  • some tests before and after introducing parallelism are taken and measurements are documented

Additional Information

  • some functions in smt.rs file are CPU intensive with a lot of calculations
  • concentrate on those functions
@matjazv
Copy link
Contributor Author

matjazv commented Oct 10, 2022

Optimization description Performance result Included in code
Use par_iter() from Rayon crate to parallel iterations over vector elements Decreased No
Use parking_lot instead of std There is no big difference between these two libs, so sdt could be a better option for us No
Convert loop to use iter every where it is possible Decreased No
Refactor SparseMerkleTree to use thread With current implementation and some restrictions there is no option to implement multithreading for SparseMerkleTree No
Refactor Subtree to use thread Decreased No
Introduce Hasher as a new struct instead of function type Small improvement Yes
Do not calculate empty hash but use constant instead Quite big improvement Yes
Remove .clone() function calls and use references instead Improved Yes
Remove copying objects when passing them to a functions and use references instead Improved Yes
Use 4 as a subtree height in SparseMerkleTree module Big improvement Yes
Initialize vector with a fixed size if it's final size is known in advance Small improvement Yes
Refactor bytes_to_bools function to use function from bitvec crate instead of our own implementation Improved Yes

@shuse2 shuse2 modified the milestones: Sprint 9, Sprint 10 Oct 10, 2022
@matjazv
Copy link
Contributor Author

matjazv commented Oct 13, 2022

Below table shows benchmarking results from before and after code optimizations.
Commit used for before optimizations: 1e5cb1e
Commit used for after optimizations: 7ac861a
Script used for benchmarking: state_long.js
System used: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz, 8 GB RAM, 50 GB SSD

Original data Addition Time to set (before optimization) Time to update (before optimization) Time to set (after optimization) Time to update (after optimization)
0.5M 1k 89 ms 648 ms 70 ms 206 ms
1M 1k 71 ms 771 ms 81 ms 247 ms
0.5M 5k 433 ms 1354 ms 539 ms 789 ms
1M 5k 510 ms 1922 ms 314 ms 925 ms
0.5M 10k 683 ms 2145 ms 681 ms 1295 ms
1M 10k 794 ms 3250 ms 721 ms 1616 ms

shuse2 added a commit that referenced this issue Oct 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants