rust: opt 3 thin #65

chenyan-dfinity · 2023-07-20T17:22:01Z

No description provided.

github-actions · 2023-07-20T18:03:58Z

Note
Diffing the performance results against the published results from the main branch.
Unchanged benchmarks are omitted.
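The percentages in parentheses in the tables below are relative changes against the published baseline. As a minimal sketch of that arithmetic (the function name and the sample numbers are hypothetical, not taken from the CI script or from this report):

```rust
/// Relative change of `new` against `base`, in percent.
/// A positive value means the metric grew, which is a regression
/// for binary size and instruction counts.
fn percent_change(base: u64, new: u64) -> f64 {
    (new as f64 - base as f64) / base as f64 * 100.0
}

fn main() {
    // Hypothetical baseline and new measurement, for illustration only.
    let (base, new) = (1_000_u64, 1_092_u64);
    println!("{:+.2}%", percent_change(base, new));
}
```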

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	1_195_632_150	9_102_052	545_645	365_569_669	520_876
triemap	156_424	1_338_995_779	9_715_900	459_710	1_193_026	686_569
rbtree	153_258	1_115_533_975	8_902_160	354_721	964_237	495_133
splay	152_693	1_323_550_652	8_702_096	719_103	1_214_198	717_146
btree	180_227	1_222_588_229	7_556_172	502_876	1_090_262	540_393
zhenya_hashmap	148_470	989_558_312	9_301_800	334_927	818_203	335_264
btreemap_rs	506_477 (+9.19%)	111_191_646 (-0.20%)	1_638_400	58_859 (+1.85%)	131_753 (+0.45%)	61_543 (+1.08%)
hashmap_rs	497_853 (+9.22%)	47_904_477 (-0.03%)	1_835_008	18_936 (+7.11%)	57_243 (+3.71%)	20_169 (+10.82%)

Priority queue

	binary_size	heapify 50k	mem	pop_min 50	put 50
heap	139_951	369_466_193	1_400_024	334_365	397_474
heap_rs	467_854 (+8.18%)	4_974_444 (-4.76%)	819_200	48_067 (+4.60%)	19_935 (+7.10%)

MoVM

	binary_size	generate 10k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	238_966_334	1_820_844	543_937	73_525_914	518_626
hashmap_rs	497_853 (+9.22%)	9_882_637 (-0.01%)	950_272	18_267 (+7.39%)	56_571 (+3.78%)	19_111 (+11.65%)
imrc_hashmap_rs	501_857 (+8.33%)	25_818_806 (+0.72%)	1_572_864	29_109 (+2.13%)	151_364 (+1.14%)	36_339 (-0.05%)
movm_rs	1_983_837 (+10.80%)	1_143_760_754 (+4.45%)	2_654_208	2_657_622 (+5.69%)	7_287_409 (+3.98%)	5_767_927 (+4.33%)
movm_dynamic_rs	2_112_854 (+9.74%)	542_423_870 (+5.39%)	2_129_920	2_152_396 (+4.33%)	2_926_471 (+5.28%)	2_134_362 (+3.51%)

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	225_805	37_493	16_270 (+0.28%)	12_656	14_105 (-0.16%)
Rust	823_213 (+8.40%)	483_359 (+2.53%)	89_457 (+3.40%)	109_251 (+4.98%)	119_102 (+2.88%)

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	183_882	12_181	22_319	4_710
Rust	904_185 (+8.57%)	128_807 (+3.22%)	332_869 (+2.86%)	81_309 (+5.21%)

Heartbeat

	binary_size	heartbeat
Motoko	118_909	3_751
Rust	29_169 (+9.56%)	456 (-5.00%)

Timer

	binary_size	setTimer	cancelTimer
Motoko	125_168	15_208	1_679
Rust	502_862 (+8.84%)	48_112 (+10.65%)	8_373 (+9.27%)

Publisher & Subscriber

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	139_886	126_827	14_632	8_451	10_530	3_662
Rust	557_232 (+9.16%)	606_370 (+8.30%)	54_985 (+5.60%)	36_694 (+6.09%)	76_911 (+3.70%)	43_400 (+4.58%)

github-actions · 2023-07-20T18:04:00Z

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain
the same elements and that the queries are exactly the same. Below we explain what each column in the table measures:

generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
max mem. For Motoko, it reports rts_max_live_size after the generate call; for Rust, it reports the Wasm's memory page * 32Kb.
batch_get 50. Find 50 elements in the collection.
batch_put 50. Insert 50 elements into the collection.
batch_remove 50. Remove 50 elements from the collection.
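The fixed-seed setup described above can be sketched as follows. This is only an illustration under stated assumptions: the xorshift32 generator, the `BTreeMap` target, and the function names `generate` and `batch_get` are stand-ins, not the benchmark's actual code.

```rust
use std::collections::BTreeMap;

/// Tiny xorshift32 generator; a stand-in for whatever RNG the benchmark uses.
/// The seed must be non-zero.
struct XorShift32(u32);

impl XorShift32 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// Insert `n` pseudo-random keys, mirroring the "generate" column.
fn generate(seed: u32, n: usize) -> BTreeMap<u32, u32> {
    let mut rng = XorShift32(seed);
    let mut map = BTreeMap::new();
    for _ in 0..n {
        let k = rng.next_u32();
        map.insert(k, k);
    }
    map
}

/// Look up `n` keys drawn from the same generator, mirroring "batch_get".
/// Returns how many of them were found.
fn batch_get(map: &BTreeMap<u32, u32>, seed: u32, n: usize) -> usize {
    let mut rng = XorShift32(seed);
    (0..n).filter(|_| map.contains_key(&rng.next_u32())).count()
}

fn main() {
    let map = generate(42, 50_000);
    println!("size = {}, hits = {}", map.len(), batch_get(&map, 42, 50));
}
```

Because the seed is fixed, two runs build identical collections and issue identical queries, so differences in the measured counts come from the data structure rather than from the input.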

💎 Takeaways

The platform only charges for instruction count. Data structures that exploit caching and locality gain no cost advantage from doing so.
We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much. We care more about the performance up to a fixed N. In the extreme cases, you may see an O(10000 nlogn) algorithm hitting the limit, while an O(n^2) algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
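The point about constant factors under a fixed per-round limit can be checked with a little arithmetic. The sketch below compares a 10000·n·log2(n) cost with an n^2 cost; the `budget` value is a hypothetical limit chosen for illustration, not the platform's real one.

```rust
/// Instruction estimate for an O(n log n) algorithm with a constant factor of 10000.
fn cost_nlogn(n: u64) -> f64 {
    10_000.0 * n as f64 * (n as f64).log2()
}

/// Instruction estimate for a quadratic algorithm with a constant factor of 1.
fn cost_quadratic(n: u64) -> f64 {
    (n as f64) * (n as f64)
}

fn main() {
    // Hypothetical per-round budget, chosen only for illustration.
    let budget = 1.2e10;
    for n in [10_000_u64, 100_000, 1_000_000] {
        let (a, b) = (cost_nlogn(n), cost_quadratic(n));
        println!(
            "n={n}: nlogn={a:.3e} (fits: {}), n^2={b:.3e} (fits: {})",
            a <= budget,
            b <= budget
        );
    }
}
```

With these assumed numbers, at n = 100_000 the quadratic algorithm stays under the budget while the n log n one exceeds it; the asymptotically better algorithm only becomes cheaper at a larger n, where both are already over this budget.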

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance with a larger memory footprint.

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
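To see why a single batch can absorb the whole resizing cost, here is a rough model that counts element copies when appending to a growable array. The doubling policy and the function name are assumptions made for illustration; the report does not state the Motoko hashmap's actual growth policy.

```rust
/// Count element copies caused by growth when appending `extra` items to an
/// array that already holds `len` items and has room for `capacity` (> 0).
fn copies_on_append(mut len: usize, mut capacity: usize, extra: usize) -> usize {
    let mut copies = 0;
    for _ in 0..extra {
        if len == capacity {
            copies += len; // every existing element is moved to the new array
            capacity *= 2; // assumed doubling policy
        }
        len += 1;
    }
    copies
}

fn main() {
    // A full array pays for copying everything on the very next insert.
    println!("full:  {}", copies_on_append(50_000, 50_000, 50));
    // An array with spare room pays nothing for the same 50 inserts.
    println!("spare: {}", copies_on_append(50_000, 100_000, 50));
}
```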

hashmap_rs uses the fxhash crate, which is std::collections::HashMap with a deterministic hasher in place of the default one. This ensures reproducible results.
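The idea of pairing the standard `HashMap` with a deterministic hasher can be sketched with the standard library alone. The example below swaps in a hand-written FNV-1a hasher purely as a stand-in; it is not the Fx algorithm and not the benchmark's code, and the names `Fnv1a`, `DetMap`, and `hash_key` are hypothetical.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// FNV-1a, used here only as a simple deterministic stand-in for Fx.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // 64-bit FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for b in bytes {
            self.0 ^= u64::from(*b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // 64-bit FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A std HashMap whose hasher has no per-process random state.
type DetMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Hash a key with a freshly built hasher, as a new map instance would.
fn hash_key<K: Hash>(key: &K) -> u64 {
    let mut h = BuildHasherDefault::<Fnv1a>::default().build_hasher();
    key.hash(&mut h);
    h.finish()
}

fn main() {
    let mut map: DetMap<u32, &str> = DetMap::default();
    map.insert(7, "seven");
    println!("{:?} {:#x}", map.get(&7), hash_key(&7u32));
}
```

Unlike the default `RandomState`, this hasher carries no random seed, so the same keys hash to the same values in every run, which is what makes instruction counts repeatable.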

btree comes from Byron Becker's stable BTreeMap library.

zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.

The MoVM table measures the performance of an experimental implementation of a Motoko interpreter. External developers can ignore this table for now.

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	1_195_632_150	9_102_052	545_645	365_569_669	520_876
triemap	156_424	1_338_995_779	9_715_900	459_710	1_193_026	686_569
rbtree	153_258	1_115_533_975	8_902_160	354_721	964_237	495_133
splay	152_693	1_323_550_652	8_702_096	719_103	1_214_198	717_146
btree	180_227	1_222_588_229	7_556_172	502_876	1_090_262	540_393
zhenya_hashmap	148_470	989_558_312	9_301_800	334_927	818_203	335_264
btreemap_rs	506_477	111_191_646	1_638_400	58_859	131_753	61_543
hashmap_rs	497_853	47_904_477	1_835_008	18_936	57_243	20_169

Priority queue

	binary_size	heapify 50k	mem	pop_min 50	put 50
heap	139_951	369_466_193	1_400_024	334_365	397_474
heap_rs	467_854	4_974_444	819_200	48_067	19_935

MoVM

	binary_size	generate 10k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	238_966_334	1_820_844	543_937	73_525_914	518_626
hashmap_rs	497_853	9_882_637	950_272	18_267	56_571	19_111
imrc_hashmap_rs	501_857	25_818_806	1_572_864	29_109	151_364	36_339
movm_rs	1_983_837	1_143_760_754	2_654_208	2_657_622	7_287_409	5_767_927
movm_dynamic_rs	2_112_854	542_423_870	2_129_920	2_152_396	2_926_471	2_134_362

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators, but it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	225_805	37_493	16_270	12_656	14_105
Rust	823_213	483_359	89_457	109_251	119_102

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	183_882	12_181	22_319	4_710
Rust	904_185	128_807	332_869	81_309

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

setTimer measures both the setTimer(0) method and the execution of the empty job.
It is not easy to reliably capture both events in one flamegraph, as the implementation details
of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and the canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	118_909	3_751
Rust	29_169	456

Timer

	binary_size	setTimer	cancelTimer
Motoko	125_168	15_208	1_679
Rust	502_862	48_112	8_373

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	139_886	126_827	14_632	8_451	10_530	3_662
Rust	557_232	606_370	54_985	36_694	76_911	43_400

rust: opt 3 thin

d69915c

chenyan-dfinity added the build_base label ("Build base instead of fetching from gh-pages. Note that the build tool runs in the same version") Jul 20, 2023

trigger CI

6ebd3ff

chenyan-dfinity mentioned this pull request Jul 20, 2023

Rust canister release profile tuning #68

Open

chenyan-dfinity closed this Jul 20, 2023

chenyan-dfinity deleted the opt-3-thin branch July 20, 2023 21:54
