feat(rust): add configurable size guardrails #3579
Hey @chaokunyang

@ayush00git Could you run `benchmarks/rust` and compare the results with the main branch?

Some areas, like MediaContentList serialization/deserialization, Sample serialization and StructList deserialization, show regressions averaging around 20%. I'll investigate these; they are most likely caused by the field type validations.

StructList and MediaContentList serialize calls still show a regression of around 10%.

feat/rust-sizeguards:

## Benchmark Results
### Timing Results (nanoseconds)
| Datatype | Operation | fory (ns) | protobuf (ns) | Fastest |
| ---------------- | ----------- | --------- | ------------- | ------- |
| Struct | Serialize | 68.2 | 122.5 | fory |
| Struct | Deserialize | 37.9 | 64.8 | fory |
| Sample | Serialize | 102.9 | 566.3 | fory |
| Sample | Deserialize | 162.6 | 868.7 | fory |
| MediaContent | Serialize | 219.4 | 332.2 | fory |
| MediaContent | Deserialize | 280.4 | 599.8 | fory |
| StructList | Serialize | 192.0 | 606.2 | fory |
| StructList | Deserialize | 143.2 | 444.3 | fory |
| SampleList | Serialize | 391.3 | 4002.1 | fory |
| SampleList | Deserialize | 1279.0 | 4939.9 | fory |
| MediaContentList | Serialize | 856.0 | 2501.9 | fory |
| MediaContentList | Deserialize | 1676.1 | 3206.9 | fory |
### Throughput Results (ops/sec)
| Datatype | Operation | fory TPS | protobuf TPS | Fastest |
| ---------------- | ----------- | ---------- | ------------ | ------- |
| Struct | Serialize | 14,665,552 | 8,161,267 | fory |
| Struct | Deserialize | 26,369,222 | 15,434,242 | fory |
| Sample | Serialize | 9,721,007 | 1,765,880 | fory |
| Sample | Deserialize | 6,151,575 | 1,151,198 | fory |
| MediaContent | Serialize | 4,558,716 | 3,010,144 | fory |
| MediaContent | Deserialize | 3,565,952 | 1,667,167 | fory |
| StructList | Serialize | 5,208,605 | 1,649,702 | fory |
| StructList | Deserialize | 6,985,191 | 2,250,883 | fory |
| SampleList | Serialize | 2,555,323 | 249,869 | fory |
| SampleList | Deserialize | 781,861 | 202,433 | fory |
| MediaContentList | Serialize | 1,168,170 | 399,696 | fory |
| MediaContentList | Deserialize | 596,623 | 311,828 | fory |

main:

## Benchmark Results
### Timing Results (nanoseconds)
| Datatype | Operation | fory (ns) | protobuf (ns) | Fastest |
| ---------------- | ----------- | --------- | ------------- | ------- |
| Struct | Serialize | 67.5 | 123.3 | fory |
| Struct | Deserialize | 38.3 | 63.4 | fory |
| Sample | Serialize | 101.4 | 561.7 | fory |
| Sample | Deserialize | 165.6 | 919.2 | fory |
| MediaContent | Serialize | 213.0 | 332.2 | fory |
| MediaContent | Deserialize | 281.9 | 568.0 | fory |
| StructList | Serialize | 175.2 | 678.8 | fory |
| StructList | Deserialize | 141.8 | 453.0 | fory |
| SampleList | Serialize | 448.6 | 3831.5 | fory |
| SampleList | Deserialize | 1347.9 | 4977.6 | fory |
| MediaContentList | Serialize | 759.1 | 2429.7 | fory |
| MediaContentList | Deserialize | 1665.3 | 3674.4 | fory |
### Throughput Results (ops/sec)
| Datatype | Operation | fory TPS | protobuf TPS | Fastest |
| ---------------- | ----------- | ---------- | ------------ | ------- |
| Struct | Serialize | 14,815,693 | 8,109,642 | fory |
| Struct | Deserialize | 26,132,177 | 15,766,902 | fory |
| Sample | Serialize | 9,864,852 | 1,780,215 | fory |
| Sample | Deserialize | 6,040,471 | 1,087,903 | fory |
| MediaContent | Serialize | 4,695,056 | 3,009,782 | fory |
| MediaContent | Deserialize | 3,547,861 | 1,760,563 | fory |
| StructList | Serialize | 5,707,437 | 1,473,231 | fory |
| StructList | Deserialize | 7,052,684 | 2,207,652 | fory |
| SampleList | Serialize | 2,229,008 | 260,994 | fory |
| SampleList | Deserialize | 741,895 | 200,900 | fory |
| MediaContentList | Serialize | 1,317,402 | 411,573 | fory |
| MediaContentList | Deserialize | 600,492 | 272,153 | fory |

@chaokunyang Updated feat/rust-sizeguards bench: Timing Results (nanoseconds), Throughput Results (ops/sec).
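As a quick check on the "around 10%" figure, the relative change in fory's mean time is (branch - main) / main. A minimal sketch using fory timings copied from the first feat/rust-sizeguards and main tables above (`pct_change` is only an illustrative helper); a positive result means the branch is slower:

```rust
/// Relative change of `branch_ns` versus `baseline_ns`, in percent.
/// Positive = the branch takes longer (a regression for timing numbers).
fn pct_change(branch_ns: f64, baseline_ns: f64) -> f64 {
    (branch_ns - baseline_ns) / baseline_ns * 100.0
}

fn main() {
    // (label, feat/rust-sizeguards ns, main ns) from the fory (ns) columns above.
    let rows = [
        ("StructList Serialize", 192.0, 175.2),
        ("MediaContentList Serialize", 856.0, 759.1),
        ("SampleList Serialize", 391.3, 448.6),
    ];
    for (label, branch, main_ns) in rows {
        println!("{label}: {:+.1}%", pct_change(branch, main_ns));
    }
}
```

With these numbers StructList serialize comes out at roughly +9.6% and MediaContentList serialize at roughly +12.8%, while SampleList serialize moves by about the same amount in the other direction (-12.8%).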
@chaokunyang

This PR still introduces some performance regressions, so we can't merge it.

Only if stream mode does not introduce any performance regression will we support it in Fory Rust.

@chaokunyang Could you please share the terminal output, or examples of the operations that show regressions? I have spent a lot of time debugging: the regressions appear in the write path for f32/f64 and occasionally in some other operations, and I suspect some of them are just noise. The size guardrails would be expected to cause regressions in the read path, but they didn't; the most likely cause of the write-path regression is the buffer pre-reservation I added in this commit. I'm investigating further. If you find a likely cause while running the benchmarks on your machine, please tell me which regressions are real and which are just noise.
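For context on the pre-reservation hypothesis, here is a minimal sketch assuming a plain `Vec<u8>` buffer; `Buf` and its methods are hypothetical and are not Fory's actual writer. `extend_from_slice` already performs its own capacity check, so calling `reserve` before every fixed-size primitive adds a second check on the hot path, whereas a single reservation for a whole batch amortizes it. Whether this accounts for the f32/f64 numbers is a hypothesis, not a measured result.

```rust
/// Hypothetical write buffer, for illustration only.
struct Buf {
    bytes: Vec<u8>,
}

impl Buf {
    fn new() -> Self {
        Buf { bytes: Vec::new() }
    }

    /// Plain write: `extend_from_slice` grows the Vec itself when needed.
    fn write_f64(&mut self, v: f64) {
        self.bytes.extend_from_slice(&v.to_le_bytes());
    }

    /// Per-call pre-reservation: an extra capacity check on every write,
    /// on top of the one `extend_from_slice` already performs.
    fn write_f64_reserved(&mut self, v: f64) {
        self.bytes.reserve(8);
        self.bytes.extend_from_slice(&v.to_le_bytes());
    }

    /// Batch pre-reservation: one reservation for the whole slice, then plain writes.
    fn write_f64_slice(&mut self, vs: &[f64]) {
        self.bytes.reserve(vs.len() * 8);
        for &v in vs {
            self.write_f64(v);
        }
    }
}

fn main() {
    let vals = [1.5f64, -2.0, 3.25];
    let mut per_call = Buf::new();
    for &v in &vals {
        per_call.write_f64_reserved(v);
    }
    let mut batch = Buf::new();
    batch.write_f64_slice(&vals);
    // Both strategies produce identical bytes; only the hot-path work differs.
    println!("{}", per_call.bytes == batch.bytes);
}
```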
```diff
-Ok(Box::new(WriteContext::new(type_resolver.clone(), config)))
+Ok(Box::new(WriteContext::new(
+    type_resolver.clone(),
+    self.config.clone(),
```
Why do you clone `config`?

It was cloned before as well; I just moved the clone inside the creation closure, which avoids accessing it on every serialize call and uses the cached context instead.
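A simplified sketch of the pattern described in the reply, assuming a single-threaded serializer that caches one write context: the config is cloned inside the creation closure, which only runs when no context exists yet, so later `serialize` calls reuse the cached context without touching `config`. `Serializer`, `WriteContext`, `Config` and `contexts_created` are hypothetical names, not Fory's real types.

```rust
/// Hypothetical stand-ins; Fory's real `Config` and `WriteContext` differ.
#[derive(Clone, Debug, PartialEq)]
struct Config {
    compatible: bool,
}

struct WriteContext {
    config: Config,
    buf: Vec<u8>,
}

struct Serializer {
    config: Config,
    cached: Option<WriteContext>,
    contexts_created: usize,
}

impl Serializer {
    fn new(config: Config) -> Self {
        Serializer { config, cached: None, contexts_created: 0 }
    }

    fn serialize(&mut self, v: u32) -> Vec<u8> {
        // Disjoint field borrows: read config, count creations, fill the cache.
        let config = &self.config;
        let created = &mut self.contexts_created;
        // The closure (and its config clone) runs only when nothing is cached yet.
        let ctx = self.cached.get_or_insert_with(|| {
            *created += 1;
            WriteContext { config: config.clone(), buf: Vec::new() }
        });
        ctx.buf.clear();
        ctx.buf.extend_from_slice(&v.to_le_bytes());
        ctx.buf.clone()
    }
}

fn main() {
    let mut s = Serializer::new(Config { compatible: true });
    for v in 0..3 {
        s.serialize(v);
    }
    println!("contexts created: {}", s.contexts_created);
}
```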
@chaokunyang Have a look at the terminal output now. I set the CPU to a fixed performance mode, and it no longer shows any regression:

```
fory/benchmarks/rust on main [$✘?] is v0.17.0-alpha.0 via v3.14.3 via v1.95.0 on (us-east-1) took 18s
❯ cargo bench --bench buffer_write_bench -- --save-baseline main
Finished `bench` profile [optimized] target(s) in 0.06s
Running benches/buffer_write_bench.rs (target/release/deps/buffer_write_bench-011c6e0ee322c976)
Gnuplot not found, using plotters backend
write_u8/current time: [6.9713 µs 6.9928 µs 7.0194 µs]
thrpt: [142.46 Melem/s 143.00 Melem/s 143.44 Melem/s]
Found 15 outliers among 100 measurements (15.00%)
8 (8.00%) high mild
7 (7.00%) high severe
write_i32/current time: [741.79 ns 743.48 ns 745.28 ns]
thrpt: [1.3418 Gelem/s 1.3450 Gelem/s 1.3481 Gelem/s]
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
1 (1.00%) low mild
7 (7.00%) high mild
1 (1.00%) high severe
write_i64/current time: [749.40 ns 751.35 ns 753.40 ns]
thrpt: [1.3273 Gelem/s 1.3309 Gelem/s 1.3344 Gelem/s]
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
write_f32/current time: [827.00 ns 834.11 ns 841.56 ns]
thrpt: [1.1883 Gelem/s 1.1989 Gelem/s 1.2092 Gelem/s]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
write_f64/current time: [860.26 ns 866.46 ns 872.89 ns]
thrpt: [1.1456 Gelem/s 1.1541 Gelem/s 1.1624 Gelem/s]
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
write_varint32_small/current
time: [894.10 ns 898.41 ns 902.69 ns]
thrpt: [1.1078 Gelem/s 1.1131 Gelem/s 1.1184 Gelem/s]
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
1 (1.00%) low mild
6 (6.00%) high mild
3 (3.00%) high severe
write_varint32_medium/current
time: [1.4174 µs 1.4189 µs 1.4207 µs]
thrpt: [703.90 Melem/s 704.77 Melem/s 705.54 Melem/s]
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
7 (7.00%) high mild
3 (3.00%) high severe
write_varint32_large/current
time: [1.6701 µs 1.6717 µs 1.6734 µs]
thrpt: [597.59 Melem/s 598.20 Melem/s 598.77 Melem/s]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
write_varint64_small/current
time: [1.1521 µs 1.1594 µs 1.1669 µs]
thrpt: [856.95 Melem/s 862.49 Melem/s 868.01 Melem/s]
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low severe
3 (3.00%) low mild
10 (10.00%) high mild
write_varint64_medium/current
time: [1.7693 µs 1.7748 µs 1.7802 µs]
thrpt: [561.73 Melem/s 563.45 Melem/s 565.18 Melem/s]
write_varint64_large/current
time: [2.8024 µs 2.8496 µs 2.9050 µs]
thrpt: [344.24 Melem/s 350.93 Melem/s 356.84 Melem/s]
Found 38 outliers among 100 measurements (38.00%)
16 (16.00%) low severe
6 (6.00%) low mild
1 (1.00%) high mild
15 (15.00%) high severe
```
```
fory/benchmarks/rust on feat/rust-sizeguards [$✘?] is v0.17.0-alpha.0 via v3.14.3 via v1.95.0 on (us-east-1) took 5s
❯ cargo bench --bench buffer_write_bench -- --baseline main
Finished `bench` profile [optimized] target(s) in 0.06s
Running benches/buffer_write_bench.rs (target/release/deps/buffer_write_bench-011c6e0ee322c976)
Gnuplot not found, using plotters backend
write_u8/current time: [6.8933 µs 6.9147 µs 6.9399 µs]
thrpt: [144.09 Melem/s 144.62 Melem/s 145.07 Melem/s]
change:
time: [-2.0992% -1.4519% -0.9010%] (p = 0.00 < 0.05)
thrpt: [+0.9091% +1.4733% +2.1442%]
Change within noise threshold.
Found 20 outliers among 100 measurements (20.00%)
6 (6.00%) high mild
14 (14.00%) high severe
write_i32/current time: [729.41 ns 732.87 ns 736.71 ns]
thrpt: [1.3574 Gelem/s 1.3645 Gelem/s 1.3710 Gelem/s]
change:
time: [-2.2545% -1.4650% -0.3802%] (p = 0.00 < 0.05)
thrpt: [+0.3817% +1.4868% +2.3065%]
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
write_i64/current time: [741.69 ns 743.49 ns 745.33 ns]
thrpt: [1.3417 Gelem/s 1.3450 Gelem/s 1.3483 Gelem/s]
change:
time: [-1.2781% -0.9131% -0.5686%] (p = 0.00 < 0.05)
thrpt: [+0.5718% +0.9215% +1.2947%]
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
write_f32/current time: [814.51 ns 818.90 ns 823.47 ns]
thrpt: [1.2144 Gelem/s 1.2211 Gelem/s 1.2277 Gelem/s]
change:
time: [-2.0663% -1.1408% -0.1963%] (p = 0.02 < 0.05)
thrpt: [+0.1966% +1.1540% +2.1099%]
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
write_f64/current time: [823.12 ns 827.97 ns 833.88 ns]
thrpt: [1.1992 Gelem/s 1.2078 Gelem/s 1.2149 Gelem/s]
change:
time: [-6.2183% -4.5369% -2.9123%] (p = 0.00 < 0.05)
thrpt: [+2.9997% +4.7525% +6.6306%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
write_varint32_small/current
time: [899.07 ns 905.35 ns 914.34 ns]
thrpt: [1.0937 Gelem/s 1.1045 Gelem/s 1.1123 Gelem/s]
change:
time: [-0.7350% +0.7029% +2.6802%] (p = 0.46 > 0.05)
thrpt: [-2.6103% -0.6980% +0.7405%]
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
write_varint32_medium/current
time: [1.4128 µs 1.4142 µs 1.4159 µs]
thrpt: [706.29 Melem/s 707.09 Melem/s 707.83 Melem/s]
change:
time: [-0.6195% -0.3812% -0.1441%] (p = 0.00 < 0.05)
thrpt: [+0.1443% +0.3827% +0.6233%]
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
1 (1.00%) low mild
6 (6.00%) high mild
3 (3.00%) high severe
write_varint32_large/current
time: [1.6586 µs 1.6692 µs 1.6780 µs]
thrpt: [595.95 Melem/s 599.07 Melem/s 602.92 Melem/s]
change:
time: [-0.5057% -0.1241% +0.1752%] (p = 0.49 > 0.05)
thrpt: [-0.1749% +0.1243% +0.5083%]
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
write_varint64_small/current
time: [1.1234 µs 1.1247 µs 1.1260 µs]
thrpt: [888.12 Melem/s 889.16 Melem/s 890.17 Melem/s]
change:
time: [-2.1981% -1.4802% -0.7546%] (p = 0.00 < 0.05)
thrpt: [+0.7604% +1.5024% +2.2475%]
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low severe
3 (3.00%) high mild
2 (2.00%) high severe
write_varint64_medium/current
time: [1.7863 µs 1.7907 µs 1.7945 µs]
thrpt: [557.25 Melem/s 558.43 Melem/s 559.83 Melem/s]
change:
time: [-0.1626% +0.2047% +0.5850%] (p = 0.27 > 0.05)
thrpt: [-0.5816% -0.2042% +0.1629%]
No change in performance detected.
write_varint64_large/current
time: [2.7535 µs 2.7624 µs 2.7721 µs]
thrpt: [360.74 Melem/s 362.00 Melem/s 363.17 Melem/s]
change:
time: [-3.0492% -1.9645% -1.0085%] (p = 0.00 < 0.05)
thrpt: [+1.0188% +2.0038% +3.1451%]
Performance has improved.
```
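Criterion's verdicts above combine two tests: the p-value decides whether a change is statistically significant at all, and the confidence interval of the change is then compared with a noise threshold. The sketch below re-implements that rule as I understand it, assuming Criterion.rs defaults (significance level 0.05, noise threshold 1%); it is not Criterion's source, and `Verdict`/`classify` are illustrative names. It shows why `write_u8` reports "Change within noise threshold" despite p = 0.00: its interval [-2.10%, -0.90%] does not lie entirely below -1%, whereas the `write_f64` interval [-6.22%, -2.91%] does.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// `lb`/`ub` bound the change interval as fractions (-0.01 means -1%).
fn classify(p: f64, lb: f64, ub: f64, significance: f64, noise: f64) -> Verdict {
    if p >= significance {
        return Verdict::NoChange;
    }
    if lb < -noise && ub < -noise {
        Verdict::Improved
    } else if lb > noise && ub > noise {
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}

fn main() {
    // (name, p, lower bound, upper bound) copied from the --baseline main run above.
    let cases = [
        ("write_u8", 0.00, -0.020992, -0.009010),
        ("write_f64", 0.00, -0.062183, -0.029123),
        ("write_varint32_small", 0.46, -0.007350, 0.026802),
        ("write_varint64_large", 0.00, -0.030492, -0.010085),
    ];
    for (name, p, lb, ub) in cases {
        println!("{name}: {:?}", classify(p, lb, ub, 0.05, 0.01));
    }
}
```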


## Why?
To prevent excessive allocation from malicious untrusted payloads in the Rust runtime.
## What does this PR do?
This brings the Rust implementation into parity with the C++ runtime by introducing configurable guardrails for binary sizes and collection counts.
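A minimal sketch of the guardrail idea, under stated assumptions: a declared length or element count read from untrusted input is compared with a configured limit before anything is allocated, so a forged header cannot trigger a huge `Vec::with_capacity`. `SizeLimits`, `GuardError`, the helper functions and the 4-byte little-endian length prefix are hypothetical; they only echo the PR's `max_binary_size()`, `max_collection_size()` and `Error::SizeLimitExceeded` and are not Fory's wire format or API.

```rust
/// Hypothetical limits; not Fory's actual config type.
#[derive(Clone, Copy, Debug)]
struct SizeLimits {
    max_binary_size: usize,
    max_collection_size: usize,
}

/// Illustrative error type; Fory's real `Error::SizeLimitExceeded` may differ.
#[derive(Debug, PartialEq)]
enum GuardError {
    SizeLimitExceeded { len: usize, limit: usize },
    UnexpectedEof,
}

/// Decodes a 4-byte little-endian u32 (an assumed encoding, for brevity).
fn read_u32_le(input: &[u8]) -> Result<(u32, &[u8]), GuardError> {
    if input.len() < 4 {
        return Err(GuardError::UnexpectedEof);
    }
    let v = u32::from_le_bytes([input[0], input[1], input[2], input[3]]);
    Ok((v, &input[4..]))
}

fn check(len: usize, limit: usize) -> Result<(), GuardError> {
    if len > limit {
        Err(GuardError::SizeLimitExceeded { len, limit })
    } else {
        Ok(())
    }
}

/// Binary payload: reject an oversized declared length before copying anything.
fn read_binary(input: &[u8], limits: &SizeLimits) -> Result<Vec<u8>, GuardError> {
    let (len, rest) = read_u32_le(input)?;
    let len = len as usize;
    check(len, limits.max_binary_size)?;
    let body = rest.get(..len).ok_or(GuardError::UnexpectedEof)?;
    Ok(body.to_vec())
}

/// Collection of u32: reject an oversized count before `Vec::with_capacity`.
fn read_u32_list(input: &[u8], limits: &SizeLimits) -> Result<Vec<u32>, GuardError> {
    let (count, mut rest) = read_u32_le(input)?;
    let count = count as usize;
    check(count, limits.max_collection_size)?;
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        let (v, tail) = read_u32_le(rest)?;
        out.push(v);
        rest = tail;
    }
    Ok(out)
}

fn main() {
    let limits = SizeLimits { max_binary_size: 16, max_collection_size: 4 };
    // Declared length 3 with the body present: accepted.
    println!("{:?}", read_binary(&[3, 0, 0, 0, 1, 2, 3], &limits));
    // Declared length 1,000,000 from a hostile payload: rejected before allocating.
    println!("{:?}", read_binary(&1_000_000u32.to_le_bytes(), &limits));
}
```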
## Related issues
#3409
## AI Contribution Checklist

- yes/no: yes, I included a completed AI Contribution Checklist in this PR description and the required AI Usage Disclosure.
- yes/no: yes, I included the standardized AI Usage Disclosure block below.
- yes, I can explain and defend all important changes without AI help.
- yes, I reviewed AI-assisted code changes line by line before submission.
- yes, I completed line-by-line self-review first and fixed issues before requesting AI review.
- yes, I ran two fresh AI review agents on the current PR diff or current HEAD after the latest code changes: one using `.claude/skills/fory-code-review/SKILL.md` and one without that skill.
- yes, I addressed all AI review comments and repeated the review loop until both AI reviewers reported no further actionable comments.
- yes, I attached screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes in this PR body.
- yes, I ran adequate human verification and recorded evidence (checks run locally or in CI, pass/fail summary, and confirmation I reviewed results).
- yes, I added/updated tests and specs where required.
- yes, I validated protocol/performance impacts with evidence when applicable.
- yes, I verified licensing and provenance compliance.

AI Usage Disclosure (only when substantial AI assistance = yes):

- yes, my PR description includes the required `ai_review` summary and screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes.

## Does this PR introduce any user-facing change?

`max_binary_size()` and `max_collection_size()`, as well as `Error::SizeLimitExceeded`))

## Benchmark