rust: opt 3 #63

chenyan-dfinity · 2023-07-20T03:08:23Z

No description provided.

github-actions · 2023-07-20T03:49:06Z

Note
Diffing the performance result against the published result from main branch.
Unchanged benchmarks are omitted.
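As a rough illustration of how the annotated cells below can be read, here is a minimal sketch of the percentage diff, assuming the change is computed relative to the published baseline and that decreases are shown in green and increases in red. The function names and the sample numbers are hypothetical; this is not the CI script itself.

```rust
/// Relative change of `new` against `base`, in percent (assumes `base > 0`).
fn pct_change(base: u64, new: u64) -> f64 {
    (new as f64 - base as f64) / base as f64 * 100.0
}

/// Group digits with underscores, e.g. 1234567 -> "1_234_567".
fn group_digits(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, c) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push('_');
        }
        out.push(c);
    }
    out
}

/// Render one table cell: unchanged values are printed bare, changed
/// values carry a colored percentage (green = decrease, red = increase).
fn render_cell(base: u64, new: u64) -> String {
    if base == new {
        return group_digits(new);
    }
    let pct = pct_change(base, new);
    let color = if pct < 0.0 { "green" } else { "red" };
    format!(
        "{} ($\\textcolor{{{}}}{{{:.2}\\%}}$)",
        group_digits(new),
        color,
        pct
    )
}

fn main() {
    // Hypothetical numbers, not taken from the report.
    println!("{}", render_cell(200_000, 150_000));
    println!("{}", render_cell(1_000, 1_050));
    println!("{}", render_cell(42, 42));
}
```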

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	1_195_632_150	9_102_052	545_645	365_569_669	520_876
triemap	156_424	1_338_995_779	9_715_900	459_710	1_193_026	686_569
rbtree	153_258	1_115_533_975	8_902_160	354_721	964_237	495_133
splay	152_693	1_323_550_652	8_702_096	719_103	1_214_198	717_146
btree	180_227	1_222_588_229	7_556_172	502_876	1_090_262	540_393
zhenya_hashmap	148_470	989_558_312	9_301_800	334_927	818_203	335_264
btreemap_rs	463_869 ($\textcolor{green}{-6.17\%}$)	111_411_885 ($\textcolor{green}{-1.18\%}$)	1_638_400	57_790 ($\textcolor{green}{-3.03\%}$)	131_160 ($\textcolor{green}{-2.30\%}$)	60_886 ($\textcolor{red}{0.38\%}$)
hashmap_rs	455_846 ($\textcolor{green}{-6.06\%}$)	47_917_908 ($\textcolor{green}{-3.40\%}$)	1_835_008	17_679 ($\textcolor{green}{-10.26\%}$)	55_195 ($\textcolor{green}{-7.41\%}$)	18_200 ($\textcolor{green}{-13.06\%}$)

Priority queue

	binary_size	heapify 50k	mem	pop_min 50	put 50
heap	139_951	369_466_193	1_400_024	334_365	397_474
heap_rs	432_470 ($\textcolor{green}{-5.73\%}$)	5_222_958 ($\textcolor{red}{4.97\%}$)	819_200	45_955 ($\textcolor{green}{-5.86\%}$)	18_614 ($\textcolor{green}{-10.03\%}$)

MoVM

	binary_size	generate 10k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	238_966_334	1_820_844	543_937	73_525_914	518_626
hashmap_rs	455_846 ($\textcolor{green}{-6.06\%}$)	9_883_914 ($\textcolor{green}{-3.32\%}$)	950_272	17_010 ($\textcolor{green}{-10.62\%}$)	54_512 ($\textcolor{green}{-7.51\%}$)	17_117 ($\textcolor{green}{-13.88\%}$)
imrc_hashmap_rs	463_265 ($\textcolor{green}{-5.37\%}$)	25_635_285 ($\textcolor{green}{-1.96\%}$)	1_572_864	28_503 ($\textcolor{green}{-4.54\%}$)	149_652 ($\textcolor{green}{-2.79\%}$)	36_357 ($\textcolor{green}{-2.04\%}$)
movm_rs	1_790_469 ($\textcolor{green}{-2.96\%}$)	1_095_033_203 ($\textcolor{green}{-5.34\%}$)	2_654_208	2_514_537 ($\textcolor{green}{-6.17\%}$)	7_008_738 ($\textcolor{green}{-4.83\%}$)	5_528_729 ($\textcolor{green}{-5.19\%}$)
movm_dynamic_rs	1_925_342 ($\textcolor{green}{-2.30\%}$)	514_686_440 ($\textcolor{green}{-5.72\%}$)	2_129_920	2_063_142 ($\textcolor{green}{-4.82\%}$)	2_779_674 ($\textcolor{green}{-5.63\%}$)	2_061_925 ($\textcolor{green}{-3.92\%}$)

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	225_805	37_469 ($\textcolor{green}{-0.13\%}$)	16_274 ($\textcolor{red}{0.33\%}$)	12_700 ($\textcolor{red}{0.02\%}$)	14_155 ($\textcolor{green}{-0.15\%}$)
Rust	759_455 ($\textcolor{green}{-2.51\%}$)	471_410 ($\textcolor{green}{-5.41\%}$)	86_518 ($\textcolor{green}{-7.18\%}$)	104_067 ($\textcolor{green}{-8.94\%}$)	115_767 ($\textcolor{green}{-7.15\%}$)

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	183_882	12_181	22_319	4_710
Rust	832_835 ($\textcolor{green}{-2.68\%}$)	124_794 ($\textcolor{green}{-6.95\%}$)	323_617 ($\textcolor{green}{-6.21\%}$)	77_284 ($\textcolor{green}{-8.53\%}$)

Heartbeat

	binary_size	heartbeat
Motoko	118_909	7_392
Rust	26_624 ($\textcolor{green}{-10.97\%}$)	797 ($\textcolor{red}{45.70\%}$)

Timer

	binary_size	setTimer	cancelTimer
Motoko	125_168	15_208	1_679
Rust	462_023 ($\textcolor{green}{-7.26\%}$)	43_482 ($\textcolor{green}{-14.70\%}$)	7_663 ($\textcolor{green}{-21.53\%}$)

Publisher & Subscriber

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	139_886	126_827	14_632	8_451	10_530	3_662
Rust	510_465 ($\textcolor{green}{-4.53\%}$)	559_874 ($\textcolor{green}{-5.10\%}$)	52_068 ($\textcolor{green}{-10.47\%}$)	34_586 ($\textcolor{green}{-10.13\%}$)	74_166 ($\textcolor{green}{-7.79\%}$)	41_498 ($\textcolor{green}{-8.92\%}$)

github-actions · 2023-07-20T03:49:08Z

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain
the same elements and that the queries are exactly the same. Below we explain what each column in the table measures:

generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger it.
max mem. For Motoko, reports rts_max_live_size after the generate call; for Rust, reports the number of Wasm memory pages * 32Kb.
batch_get 50. Find 50 elements from the collection.
batch_put 50. Insert 50 elements into the collection.
batch_remove 50. Remove 50 elements from the collection.
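The workload described above can be sketched as follows. This is only an illustration under stated assumptions: the xorshift generator, the seed value, the `Collection` trait, and the choice of the first 50 keys as queries are all hypothetical stand-ins, and the sketch counts results rather than instructions.

```rust
use std::collections::{BTreeMap, HashMap};

/// Tiny xorshift32 generator (hypothetical stand-in for the benchmark's RNG).
struct XorShift32(u32);

impl XorShift32 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// Fixed seed (hypothetical value) so every collection sees the same keys.
const SEED: u32 = 42;

/// Returns (keys for `generate`, keys for `batch_put` / `batch_remove`).
fn workload(n: usize, batch: usize) -> (Vec<u32>, Vec<u32>) {
    let mut rng = XorShift32(SEED);
    let initial = (0..n).map(|_| rng.next_u32()).collect();
    let extra = (0..batch).map(|_| rng.next_u32()).collect();
    (initial, extra)
}

/// Minimal interface shared by the collections under test (hypothetical).
trait Collection: Default {
    fn put(&mut self, k: u32);
    fn has(&self, k: u32) -> bool;
    fn del(&mut self, k: u32) -> bool;
}

impl Collection for BTreeMap<u32, u32> {
    fn put(&mut self, k: u32) { self.insert(k, k); }
    fn has(&self, k: u32) -> bool { self.contains_key(&k) }
    fn del(&mut self, k: u32) -> bool { self.remove(&k).is_some() }
}

impl Collection for HashMap<u32, u32> {
    fn put(&mut self, k: u32) { self.insert(k, k); }
    fn has(&self, k: u32) -> bool { self.contains_key(&k) }
    fn del(&mut self, k: u32) -> bool { self.remove(&k).is_some() }
}

/// Runs generate, batch_get, batch_put and batch_remove in order and
/// returns (number of keys found, number of keys removed).
fn run<C: Collection>(initial: &[u32], extra: &[u32]) -> (usize, usize) {
    let mut c = C::default();
    for &k in initial { c.put(k); }                                        // generate
    let found = initial.iter().take(extra.len()).filter(|k| c.has(**k)).count(); // batch_get
    for &k in extra { c.put(k); }                                          // batch_put
    let removed = extra.iter().filter(|k| c.del(**k)).count();             // batch_remove
    (found, removed)
}

fn main() {
    let (initial, extra) = workload(50_000, 50);
    let b = run::<BTreeMap<u32, u32>>(&initial, &extra);
    let h = run::<HashMap<u32, u32>>(&initial, &extra);
    // Same seed, same keys, same queries: both collections agree.
    assert_eq!(b, h);
    println!("btree: {:?}, hashmap: {:?}", b, h);
}
```

Because the generator is seeded with a constant, rerunning the program reproduces exactly the same keys, which is what makes the columns comparable across collections.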

💎 Takeaways

The platform only charges for instruction count. Data structures that exploit caching and locality therefore gain no cost advantage.
We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much. We care more about the performance up to a fixed N. In extreme cases, you may see an O(10000 nlogn) algorithm hitting the limit, while an O(n^2) algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
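To make the second takeaway concrete, here is a small worked comparison of the two cost shapes it mentions. The cost functions and the `LIMIT` value are hypothetical abstract units chosen for illustration; they are not the platform's actual cycle limit.

```rust
/// Cost model with a large constant factor: 10000 * n * log2(n).
fn cost_nlogn(n: u64) -> f64 {
    let n = n as f64;
    10_000.0 * n * n.log2()
}

/// Cost model with no constant factor: n^2.
fn cost_quadratic(n: u64) -> f64 {
    let n = n as f64;
    n * n
}

/// Hypothetical per-round budget in the same abstract cost units.
const LIMIT: f64 = 1e9;

fn main() {
    for n in [1_000u64, 10_000, 100_000, 1_000_000] {
        let a = cost_nlogn(n);
        let b = cost_quadratic(n);
        println!(
            "n = {:>9}: 10000 n log n = {:.3e} (fits: {}), n^2 = {:.3e} (fits: {})",
            n,
            a,
            a <= LIMIT,
            b,
            b <= LIMIT
        );
    }
}
```

Under these assumed numbers, at n = 10_000 the asymptotically better algorithm already exceeds the budget while the quadratic one fits; the quadratic one only becomes the more expensive of the two at much larger n, where neither fits.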

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance with a larger memory footprint.

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for other data structures.
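The spike can be illustrated with a minimal growable array that reports how many elements each insertion copies. This is a sketch, not the Motoko hashmap: the `GrowArray` type is hypothetical, and the doubling growth factor is an assumption (the note above only says the whole array is copied).

```rust
/// Fixed-capacity buffer that reallocates and copies when full.
struct GrowArray {
    buf: Box<[u32]>,
    len: usize,
}

impl GrowArray {
    fn with_capacity(cap: usize) -> Self {
        GrowArray { buf: vec![0; cap].into_boxed_slice(), len: 0 }
    }

    /// Appends `x` and returns the number of element copies caused by resizing.
    fn push(&mut self, x: u32) -> usize {
        let mut copied = 0;
        if self.len == self.buf.len() {
            // Capacity reached: allocate a bigger buffer (doubling is an
            // assumption) and copy every existing element into it.
            let new_cap = (self.buf.len() * 2).max(1);
            let mut bigger = vec![0; new_cap].into_boxed_slice();
            bigger[..self.len].copy_from_slice(&self.buf[..self.len]);
            copied = self.len;
            self.buf = bigger;
        }
        self.buf[self.len] = x;
        self.len += 1;
        copied
    }
}

fn main() {
    let mut a = GrowArray::with_capacity(4);
    let costs: Vec<usize> = (0..6).map(|i| a.push(i)).collect();
    // Only the push that hits the capacity pays for the copy.
    println!("{:?}", costs); // prints [0, 0, 0, 0, 4, 0]
}
```

Averaged over many insertions the copy cost is small, but a single batch that happens to cross the capacity boundary pays for the whole copy at once, which matches the shape of the batch_put 50 number for hashmap.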

hashmap_rs uses the fxhash crate, whose map type is std::collections::HashMap with a deterministic hasher. This ensures reproducible results.
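The idea of swapping in a deterministic hasher can be sketched with the standard library alone. Note that this deliberately does not use fxhash: it substitutes `BuildHasherDefault<DefaultHasher>` as a std-only stand-in, and the `DetBuild`, `DetHashMap`, and `hash_of` names are hypothetical. The exact hash values may differ between Rust versions; the point is only that they do not vary from run to run the way the default `RandomState` does.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Builds hashers with fixed keys instead of per-process random keys.
type DetBuild = BuildHasherDefault<DefaultHasher>;

/// A std HashMap whose only difference is the deterministic hasher.
type DetHashMap<K, V> = HashMap<K, V, DetBuild>;

/// Hashes `value` with a fresh hasher from `build`.
fn hash_of<T: Hash>(build: &DetBuild, value: &T) -> u64 {
    let mut h = build.build_hasher();
    value.hash(&mut h);
    h.finish()
}

fn main() {
    // Two independently constructed builders agree on every hash.
    let a = DetBuild::default();
    let b = DetBuild::default();
    assert_eq!(hash_of(&a, &12_345u32), hash_of(&b, &12_345u32));

    // The map API is unchanged; only the hasher type parameter differs.
    let mut m: DetHashMap<u32, &str> = DetHashMap::default();
    m.insert(1, "one");
    m.insert(2, "two");
    assert_eq!(m.get(&1), Some(&"one"));
    println!("entries: {}", m.len());
}
```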

btree comes from Byron Becker's stable BTreeMap library.

zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.

The MoVM table measures the performance of an experimental implementation of a Motoko interpreter. External developers can ignore this table for now.

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	1_195_632_150	9_102_052	545_645	365_569_669	520_876
triemap	156_424	1_338_995_779	9_715_900	459_710	1_193_026	686_569
rbtree	153_258	1_115_533_975	8_902_160	354_721	964_237	495_133
splay	152_693	1_323_550_652	8_702_096	719_103	1_214_198	717_146
btree	180_227	1_222_588_229	7_556_172	502_876	1_090_262	540_393
zhenya_hashmap	148_470	989_558_312	9_301_800	334_927	818_203	335_264
btreemap_rs	463_869	111_411_885	1_638_400	57_790	131_160	60_886
hashmap_rs	455_846	47_917_908	1_835_008	17_679	55_195	18_200

Priority queue

	binary_size	heapify 50k	mem	pop_min 50	put 50
heap	139_951	369_466_193	1_400_024	334_365	397_474
heap_rs	432_470	5_222_958	819_200	45_955	18_614

MoVM

	binary_size	generate 10k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	238_966_334	1_820_844	543_937	73_525_914	518_626
hashmap_rs	455_846	9_883_914	950_272	17_010	54_512	17_117
imrc_hashmap_rs	463_265	25_635_285	1_572_864	28_503	149_652	36_357
movm_rs	1_790_469	1_095_033_203	2_654_208	2_514_537	7_008_738	5_528_729
movm_dynamic_rs	1_925_342	514_686_440	2_129_920	2_063_142	2_779_674	2_061_925

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	225_805	37_469	16_274	12_700	14_155
Rust	759_455	471_410	86_518	104_067	115_767

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	183_882	12_181	22_319	4_710
Rust	832_835	124_794	323_617	77_284

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

setTimer measures both the setTimer(0) method and the execution of the empty job.
It is not easy to reliably capture both events in one flamegraph, as the implementation details of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	118_909	7_392
Rust	26_624	797

Timer

	binary_size	setTimer	cancelTimer
Motoko	125_168	15_208	1_679
Rust	462_023	43_482	7_663

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	139_886	126_827	14_632	8_451	10_530	3_662
Rust	510_465	559_874	52_068	34_586	74_166	41_498

ggreif · 2023-07-20T11:16:55Z

Cargo.toml

@@ -18,7 +18,7 @@ members = [
 debug = false
 panic = "abort"
 lto = true
-opt-level = 2
+opt-level = 3


ggreif · Jul 20, 2023

These numbers are very promising, maybe we should switch to using -O3 for the RTS too. @luc-blaeser did you ever try this?

luc-blaeser · Jul 20, 2023

Not yet. I can try this and measure with the GC benchmark...

chenyan-dfinity · Jul 20, 2023

Surprisingly, -O2 is even better than -O3, see #62

chenyan-dfinity · Jul 20, 2023

Confirmed from #66, the default release profile is already using -O3, the gain here purely comes from LTO.

chenyan-dfinity added 3 commits July 19, 2023 20:05

rust: opt z

b7eec48

rust: opt 2

d7d2474

rust: opt 3

cdf7050

chenyan-dfinity added the build_base label ("Build base instead of fetching from gh-pages. Note that the build tool runs in the same version") Jul 20, 2023

ggreif reviewed Jul 20, 2023

chenyan-dfinity mentioned this pull request Jul 20, 2023

Rust canister release profile tuning #68

Open

chenyan-dfinity closed this Jul 20, 2023

chenyan-dfinity deleted the opt-3 branch July 20, 2023 21:54
