
feat(rust): add configurable size guardrails #3579

Merged: chaokunyang merged 4 commits into apache:main from ayush00git:feat/rust-sizeguards on Apr 19, 2026


@ayush00git (Contributor) commented Apr 16, 2026

Why?

To prevent excessive allocation from malicious untrusted payloads in the Rust runtime.

What does this PR do?

This brings the Rust implementation into parity with the C++ runtime by introducing configurable guardrails for binary sizes and collection counts.

Related issues

#3409

AI Contribution Checklist

  • Substantial AI assistance was used in this PR: yes / no
  • If yes, I included a completed AI Contribution Checklist in this PR description and the required AI Usage Disclosure.
  • If yes, I included the standardized AI Usage Disclosure block below.
  • If yes, I can explain and defend all important changes without AI help.
  • If yes, I reviewed AI-assisted code changes line by line before submission.
  • If yes, I completed line-by-line self-review first and fixed issues before requesting AI review.
  • If yes, I ran two fresh AI review agents on the current PR diff or current HEAD after the latest code changes: one using .claude/skills/fory-code-review/SKILL.md and one without that skill.
  • If yes, I addressed all AI review comments and repeated the review loop until both AI reviewers reported no further actionable comments.
  • If yes, I attached screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes in this PR body.
  • If yes, I ran adequate human verification and recorded evidence (checks run locally or in CI, pass/fail summary, and confirmation I reviewed results).
  • If yes, I added/updated tests and specs where required.
  • If yes, I validated protocol/performance impacts with evidence when applicable.
  • If yes, I verified licensing and provenance compliance.

AI Usage Disclosure (only when substantial AI assistance = yes):

  • If yes, my PR description includes the required ai_review summary and screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes.
The implementation looks solid, matches the C++ reference implementation accurately, and is ready for the main branch. I've reviewed your code changes across config.rs, context.rs, and the serializer module, verifying edge cases and consistency.

Here are the key aspects that verify the correctness of the Rust implementation:

Guardrail Logic Parity with C++:

max_collection_size correctly intercepts untrusted payload lengths before large iterations or memory allocations occur in Vec, HashMap, BTreeMap, and other collection variants. This mirrors the C++ read_collection_data_slow pattern.
max_binary_size accurately targets raw-byte slice operations across primitive_list.rs without inappropriately restricting types like String, mirroring how C++ omits this check inside string_serializer.h.
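As a rough illustration of the two guardrails described above, here is a minimal, self-contained sketch. The names Config, max_binary_size, max_collection_size, SizeLimitExceeded and size_limit_exceeded echo the PR description, but the types, signatures, default values and the 4-byte little-endian length prefix are assumptions made for this example; it is not Fory's actual API or wire format. It shows a raw byte payload being rejected against max_binary_size before any allocation, while strings skip that check and are bounded only by the remaining buffer.

```rust
// Hypothetical sketch only: names echo the PR description, but the types,
// signatures and wire format are assumptions, not Fory's real API.

#[derive(Debug, PartialEq)]
enum Error {
    SizeLimitExceeded { size: usize, limit: usize },
    BufferOutOfBound,
    InvalidUtf8,
}

impl Error {
    fn size_limit_exceeded(size: usize, limit: usize) -> Self {
        Error::SizeLimitExceeded { size, limit }
    }
}

#[derive(Clone, Debug)]
struct Config {
    max_binary_size: usize,
    max_collection_size: usize,
}

impl Default for Config {
    // Assumed default for the sketch: no effective limit unless set.
    fn default() -> Self {
        Config { max_binary_size: usize::MAX, max_collection_size: usize::MAX }
    }
}

impl Config {
    // Builder-style setters (assumed shape).
    fn max_binary_size(mut self, limit: usize) -> Self {
        self.max_binary_size = limit;
        self
    }
    fn max_collection_size(mut self, limit: usize) -> Self {
        self.max_collection_size = limit;
        self
    }
}

// Splits an assumed 4-byte little-endian length prefix off the input.
fn read_len(input: &[u8]) -> Result<(usize, &[u8]), Error> {
    if input.len() < 4 {
        return Err(Error::BufferOutOfBound);
    }
    let (head, rest) = input.split_at(4);
    let len = u32::from_le_bytes([head[0], head[1], head[2], head[3]]) as usize;
    Ok((len, rest))
}

// Raw byte payloads (the primitive-list case) are checked against
// max_binary_size before anything is allocated.
fn read_binary(cfg: &Config, input: &[u8]) -> Result<Vec<u8>, Error> {
    let (len, rest) = read_len(input)?;
    if len > cfg.max_binary_size {
        return Err(Error::size_limit_exceeded(len, cfg.max_binary_size));
    }
    if len > rest.len() {
        return Err(Error::BufferOutOfBound);
    }
    Ok(rest[..len].to_vec())
}

// Strings deliberately skip max_binary_size, mirroring the C++ behavior
// described above; only the buffer bound applies.
fn read_string(input: &[u8]) -> Result<String, Error> {
    let (len, rest) = read_len(input)?;
    if len > rest.len() {
        return Err(Error::BufferOutOfBound);
    }
    String::from_utf8(rest[..len].to_vec()).map_err(|_| Error::InvalidUtf8)
}
```

With Config::default().max_binary_size(16), a payload that merely declares 1,000,000 bytes fails with SizeLimitExceeded before any buffer is allocated, while a 20-byte string body is still accepted by read_string.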
Accurate Error Propagation:

The addition of Error::SizeLimitExceeded fits well with the new Fory configuration model. You're properly halting deserialization and passing clear payload dimensions to Error::size_limit_exceeded, ensuring graceful failure without panics or memory exhaustion.
Validation and Pre-Allocation Consistency:

Checking max_collection_size just before enforcing buffer limits inside check_collection_len and check_map_len is the right order. Checking the runtime limit first avoids OOMs or deep iterations if Fory receives heavily corrupted payload headers.
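The check ordering described above can be sketched as follows. check_collection_len is named after the function mentioned in the review, but its signature, the min_elem_size parameter (the smallest wire size of one element) and the error variants are hypothetical. The configured limit is checked first, then the declared count is bounded by the bytes actually remaining, and only then does Vec::with_capacity run.

```rust
// Hypothetical sketch of the ordering; signature and errors are assumptions.

#[derive(Debug, PartialEq)]
enum Error {
    SizeLimitExceeded { size: usize, limit: usize },
    BufferOutOfBound { needed: usize, remaining: usize },
}

/// Validates a declared element count before anything is allocated.
fn check_collection_len(
    len: usize,
    max_collection_size: usize,
    remaining: usize,
    min_elem_size: usize,
) -> Result<(), Error> {
    // 1. Cheap runtime limit first: rejects hostile counts immediately.
    if len > max_collection_size {
        return Err(Error::SizeLimitExceeded { size: len, limit: max_collection_size });
    }
    // 2. Then the buffer bound: the payload must be able to hold `len`
    //    elements. saturating_mul avoids overflow if the limit is usize::MAX.
    let needed = len.saturating_mul(min_elem_size);
    if needed > remaining {
        return Err(Error::BufferOutOfBound { needed, remaining });
    }
    Ok(())
}

/// Reads `len` little-endian u32 values only after both checks pass, so
/// `with_capacity` never sees a size larger than the configured limit or
/// larger than what the buffer can actually back.
fn read_u32_vec(len: usize, max_collection_size: usize, buf: &[u8]) -> Result<Vec<u32>, Error> {
    check_collection_len(len, max_collection_size, buf.len(), 4)?;
    let mut out = Vec::with_capacity(len);
    for chunk in buf.chunks_exact(4).take(len) {
        out.push(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
    }
    Ok(out)
}
```

A map variant (check_map_len) would follow the same pattern, with min_elem_size covering one key plus one value.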

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change? Yes: it adds the Fory configuration methods max_binary_size() and max_collection_size(), as well as Error::SizeLimitExceeded.
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

@ayush00git (Contributor, Author)

Hey @chaokunyang,
Have a look at the implementation and let me know what needs to change.
I don't write Rust, but these issues (including streaming deserialization support) had been pending for a long time, so I used AI to write the code, which I have reviewed.
Duplicate PR: #3421

@chaokunyang (Collaborator)

@ayush00git Could you run benchmarks/rust and compare with the main branch?

@ayush00git (Contributor, Author)

main branch:
[benchmark results screenshot]

feat/rust-sizeguards branch:
[benchmark results screenshot]

@ayush00git (Contributor, Author)

Some areas, such as MediaContentList serialization/deserialization, Sample serialization, and StructList deserialization, show regressions averaging around 20%. I'll investigate these; they are most probably due to field type validations.

@ayush00git (Contributor, Author)

StructList and MediaContentList serialize calls still show a regression of around 10%.

feat/rust-sizeguards

## Benchmark Results

### Timing Results (nanoseconds)

| Datatype         | Operation   | fory (ns) | protobuf (ns) | Fastest |
| ---------------- | ----------- | --------- | ------------- | ------- |
| Struct           | Serialize   | 68.2      | 122.5         | fory    |
| Struct           | Deserialize | 37.9      | 64.8          | fory    |
| Sample           | Serialize   | 102.9     | 566.3         | fory    |
| Sample           | Deserialize | 162.6     | 868.7         | fory    |
| MediaContent     | Serialize   | 219.4     | 332.2         | fory    |
| MediaContent     | Deserialize | 280.4     | 599.8         | fory    |
| StructList       | Serialize   | 192.0     | 606.2         | fory    |
| StructList       | Deserialize | 143.2     | 444.3         | fory    |
| SampleList       | Serialize   | 391.3     | 4002.1        | fory    |
| SampleList       | Deserialize | 1279.0    | 4939.9        | fory    |
| MediaContentList | Serialize   | 856.0     | 2501.9        | fory    |
| MediaContentList | Deserialize | 1676.1    | 3206.9        | fory    |

### Throughput Results (ops/sec)

| Datatype         | Operation   | fory TPS   | protobuf TPS | Fastest |
| ---------------- | ----------- | ---------- | ------------ | ------- |
| Struct           | Serialize   | 14,665,552 | 8,161,267    | fory    |
| Struct           | Deserialize | 26,369,222 | 15,434,242   | fory    |
| Sample           | Serialize   | 9,721,007  | 1,765,880    | fory    |
| Sample           | Deserialize | 6,151,575  | 1,151,198    | fory    |
| MediaContent     | Serialize   | 4,558,716  | 3,010,144    | fory    |
| MediaContent     | Deserialize | 3,565,952  | 1,667,167    | fory    |
| StructList       | Serialize   | 5,208,605  | 1,649,702    | fory    |
| StructList       | Deserialize | 6,985,191  | 2,250,883    | fory    |
| SampleList       | Serialize   | 2,555,323  | 249,869      | fory    |
| SampleList       | Deserialize | 781,861    | 202,433      | fory    |
| MediaContentList | Serialize   | 1,168,170  | 399,696      | fory    |
| MediaContentList | Deserialize | 596,623    | 311,828      | fory    |

main

## Benchmark Results

### Timing Results (nanoseconds)

| Datatype         | Operation   | fory (ns) | protobuf (ns) | Fastest |
| ---------------- | ----------- | --------- | ------------- | ------- |
| Struct           | Serialize   | 67.5      | 123.3         | fory    |
| Struct           | Deserialize | 38.3      | 63.4          | fory    |
| Sample           | Serialize   | 101.4     | 561.7         | fory    |
| Sample           | Deserialize | 165.6     | 919.2         | fory    |
| MediaContent     | Serialize   | 213.0     | 332.2         | fory    |
| MediaContent     | Deserialize | 281.9     | 568.0         | fory    |
| StructList       | Serialize   | 175.2     | 678.8         | fory    |
| StructList       | Deserialize | 141.8     | 453.0         | fory    |
| SampleList       | Serialize   | 448.6     | 3831.5        | fory    |
| SampleList       | Deserialize | 1347.9    | 4977.6        | fory    |
| MediaContentList | Serialize   | 759.1     | 2429.7        | fory    |
| MediaContentList | Deserialize | 1665.3    | 3674.4        | fory    |

### Throughput Results (ops/sec)

| Datatype         | Operation   | fory TPS   | protobuf TPS | Fastest |
| ---------------- | ----------- | ---------- | ------------ | ------- |
| Struct           | Serialize   | 14,815,693 | 8,109,642    | fory    |
| Struct           | Deserialize | 26,132,177 | 15,766,902   | fory    |
| Sample           | Serialize   | 9,864,852  | 1,780,215    | fory    |
| Sample           | Deserialize | 6,040,471  | 1,087,903    | fory    |
| MediaContent     | Serialize   | 4,695,056  | 3,009,782    | fory    |
| MediaContent     | Deserialize | 3,547,861  | 1,760,563    | fory    |
| StructList       | Serialize   | 5,707,437  | 1,473,231    | fory    |
| StructList       | Deserialize | 7,052,684  | 2,207,652    | fory    |
| SampleList       | Serialize   | 2,229,008  | 260,994      | fory    |
| SampleList       | Deserialize | 741,895    | 200,900      | fory    |
| MediaContentList | Serialize   | 1,317,402  | 411,573      | fory    |
| MediaContentList | Deserialize | 600,492    | 272,153      | fory    |

@ayush00git (Contributor, Author)

@chaokunyang
The benchmarks look fine now: only MediaContentList serialization still shows a regression, of around 6%; every other measurement is either within noise or improved.

Updated feat/rust-sizeguards benchmark:

### Timing Results (nanoseconds)

| Datatype         | Operation   | fory (ns) | protobuf (ns) | Fastest |
| ---------------- | ----------- | --------- | ------------- | ------- |
| Struct           | Serialize   | 74.2      | 138.3         | fory    |
| Struct           | Deserialize | 38.4      | 69.8          | fory    |
| Sample           | Serialize   | 102.5     | 578.6         | fory    |
| Sample           | Deserialize | 164.8     | 881.1         | fory    |
| MediaContent     | Serialize   | 229.7     | 330.2         | fory    |
| MediaContent     | Deserialize | 287.9     | 523.5         | fory    |
| StructList       | Serialize   | 172.4     | 652.9         | fory    |
| StructList       | Deserialize | 144.6     | 413.2         | fory    |
| SampleList       | Serialize   | 399.5     | 3725.5        | fory    |
| SampleList       | Deserialize | 1269.1    | 5017.5        | fory    |
| MediaContentList | Serialize   | 808.5     | 2440.5        | fory    |
| MediaContentList | Deserialize | 1646.0    | 3546.2        | fory    |

### Throughput Results (ops/sec)

| Datatype         | Operation   | fory TPS   | protobuf TPS | Fastest |
| ---------------- | ----------- | ---------- | ------------ | ------- |
| Struct           | Serialize   | 13,468,195 | 7,231,181    | fory    |
| Struct           | Deserialize | 26,035,565 | 14,319,262   | fory    |
| Sample           | Serialize   | 9,754,194  | 1,728,250    | fory    |
| Sample           | Deserialize | 6,069,066  | 1,134,996    | fory    |
| MediaContent     | Serialize   | 4,352,936  | 3,028,376    | fory    |
| MediaContent     | Deserialize | 3,473,790  | 1,910,220    | fory    |
| StructList       | Serialize   | 5,801,474  | 1,531,581    | fory    |
| StructList       | Deserialize | 6,917,064  | 2,419,960    | fory    |
| SampleList       | Serialize   | 2,503,317  | 268,420      | fory    |
| SampleList       | Deserialize | 787,960    | 199,302      | fory    |
| MediaContentList | Serialize   | 1,236,889  | 409,752      | fory    |
| MediaContentList | Deserialize | 607,533    | 281,992      | fory    |
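The regression percentages quoted in this thread can be reproduced from the fory (ns) columns with the relative-change formula (candidate - baseline) / baseline * 100. The helper below applies it to the two serialize rows that were flagged, comparing the first and the updated feat/rust-sizeguards runs against main; the numbers are copied from the tables in this thread, and the function names are only illustrative.

```rust
/// Percentage change in mean time relative to a baseline; positive means the
/// candidate is slower (a regression).
fn relative_change_pct(baseline_ns: f64, candidate_ns: f64) -> f64 {
    (candidate_ns - baseline_ns) / baseline_ns * 100.0
}

/// Formats the rows flagged in this thread. Values are the fory (ns) column:
/// (benchmark, main, first feat/rust-sizeguards run, updated run).
fn regression_report() -> Vec<String> {
    let rows = [
        ("StructList Serialize", 175.2, 192.0, 172.4),
        ("MediaContentList Serialize", 759.1, 856.0, 808.5),
    ];
    rows.iter()
        .map(|&(name, main_ns, first_ns, updated_ns)| {
            format!(
                "{name}: first run {:+.1}%, updated run {:+.1}%",
                relative_change_pct(main_ns, first_ns),
                relative_change_pct(main_ns, updated_ns)
            )
        })
        .collect()
}
```

This gives +9.6% and +12.8% for the first run (the "around 10%" above) and -1.6% and +6.5% for the updated run (the "around 6%" for MediaContentList).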

@ayush00git (Contributor, Author)

@chaokunyang
Do we plan to roll out this PR and the streaming deserialization support (which is still pending) in the v0.17.0 release?

@chaokunyang (Collaborator)

This PR still introduces some performance regression, so we can't merge it.

@chaokunyang (Collaborator)

Only if stream mode does not introduce any performance regression will we support it in Fory Rust.

@ayush00git (Contributor, Author)

> Only stream mode do not introduce any performance regression, then we will support it in fory rust.

@chaokunyang, could you please share the terminal output or examples of the operations that show regressions? I've spent a lot of time debugging: the regressions appear in the write path for f32/f64, and sometimes in other operations as well, and I think some of these are just noise. The size guardrails would be expected to introduce regressions in the read path, but they didn't; the most likely cause of the write-path regression is the buffer pre-reservation I made in this commit.

I'm investigating further. If you find a likely cause while running the benchmarks on your machine, please let me know which regressions are real and which are just noise.

Ok(Box::new(WriteContext::new(type_resolver.clone(), config)))
Ok(Box::new(WriteContext::new(
    type_resolver.clone(),
    self.config.clone(),
@chaokunyang (Collaborator)

Why do you clone config?

@ayush00git (Contributor, Author)

It was cloned previously as well; I just moved the clone inside the creation closure, which avoids accessing the config on every serialize call and uses the cached context instead.
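One plausible reading of this design, sketched under assumptions: the context factory is a boxed Fn closure that owns its own copy of the config. Because an Fn closure may run more than once, it cannot move the captured Config out, so it clones it each time it builds a context; with a cached context, that happens once rather than on every serialize call. Runtime, WriteContext, Config and contexts_built are hypothetical stand-ins, not Fory's types.

```rust
use std::cell::{Cell, RefCell};

// Hypothetical stand-ins for illustration; not Fory's real types.
#[derive(Clone, Debug)]
struct Config {
    max_binary_size: usize,
}

struct WriteContext {
    config: Config,
    buf: Vec<u8>,
}

struct Runtime {
    // Factory that owns its own Config; it clones it only when it runs.
    make_ctx: Box<dyn Fn() -> WriteContext>,
    // Lazily built context, reused across serialize calls.
    cached: RefCell<Option<WriteContext>>,
    // Counts how many contexts (and therefore config clones) were created.
    contexts_built: Cell<usize>,
}

impl Runtime {
    fn new(config: Config) -> Self {
        Runtime {
            // `config` is moved into the closure once; the clone happens
            // inside the closure, i.e. only when a new context is built.
            make_ctx: Box::new(move || WriteContext {
                config: config.clone(),
                buf: Vec::new(),
            }),
            cached: RefCell::new(None),
            contexts_built: Cell::new(0),
        }
    }

    // Each call reuses the cached context instead of touching the config.
    fn serialize(&self, payload: &[u8]) -> Vec<u8> {
        let mut slot = self.cached.borrow_mut();
        let ctx = slot.get_or_insert_with(|| {
            self.contexts_built.set(self.contexts_built.get() + 1);
            (self.make_ctx)()
        });
        ctx.buf.clear();
        ctx.buf.extend_from_slice(payload);
        ctx.buf.clone()
    }
}
```

After any number of serialize calls on one Runtime, contexts_built stays at 1, so the config is cloned once per context rather than once per call.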

@ayush00git (Contributor, Author) commented Apr 19, 2026

@chaokunyang, have a look at the terminal output now. I tried fixing the CPU performance setting, and now it doesn't show any regression.

fory/benchmarks/rust on main [$✘?] is v0.17.0-alpha.0 via v3.14.3 via v1.95.0 on (us-east-1) took 18s
❯ cargo bench --bench buffer_write_bench -- --save-baseline main
    Finished `bench` profile [optimized] target(s) in 0.06s
     Running benches/buffer_write_bench.rs (target/release/deps/buffer_write_bench-011c6e0ee322c976)
Gnuplot not found, using plotters backend
write_u8/current        time:   [6.9713 µs 6.9928 µs 7.0194 µs]
                        thrpt:  [142.46 Melem/s 143.00 Melem/s 143.44 Melem/s]
Found 15 outliers among 100 measurements (15.00%)
  8 (8.00%) high mild
  7 (7.00%) high severe

write_i32/current       time:   [741.79 ns 743.48 ns 745.28 ns]
                        thrpt:  [1.3418 Gelem/s 1.3450 Gelem/s 1.3481 Gelem/s]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  7 (7.00%) high mild
  1 (1.00%) high severe

write_i64/current       time:   [749.40 ns 751.35 ns 753.40 ns]
                        thrpt:  [1.3273 Gelem/s 1.3309 Gelem/s 1.3344 Gelem/s]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

write_f32/current       time:   [827.00 ns 834.11 ns 841.56 ns]
                        thrpt:  [1.1883 Gelem/s 1.1989 Gelem/s 1.2092 Gelem/s]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

write_f64/current       time:   [860.26 ns 866.46 ns 872.89 ns]
                        thrpt:  [1.1456 Gelem/s 1.1541 Gelem/s 1.1624 Gelem/s]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

write_varint32_small/current
                        time:   [894.10 ns 898.41 ns 902.69 ns]
                        thrpt:  [1.1078 Gelem/s 1.1131 Gelem/s 1.1184 Gelem/s]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

write_varint32_medium/current
                        time:   [1.4174 µs 1.4189 µs 1.4207 µs]
                        thrpt:  [703.90 Melem/s 704.77 Melem/s 705.54 Melem/s]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe

write_varint32_large/current
                        time:   [1.6701 µs 1.6717 µs 1.6734 µs]
                        thrpt:  [597.59 Melem/s 598.20 Melem/s 598.77 Melem/s]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

write_varint64_small/current
                        time:   [1.1521 µs 1.1594 µs 1.1669 µs]
                        thrpt:  [856.95 Melem/s 862.49 Melem/s 868.01 Melem/s]
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  10 (10.00%) high mild

write_varint64_medium/current
                        time:   [1.7693 µs 1.7748 µs 1.7802 µs]
                        thrpt:  [561.73 Melem/s 563.45 Melem/s 565.18 Melem/s]

write_varint64_large/current
                        time:   [2.8024 µs 2.8496 µs 2.9050 µs]
                        thrpt:  [344.24 Melem/s 350.93 Melem/s 356.84 Melem/s]
Found 38 outliers among 100 measurements (38.00%)
  16 (16.00%) low severe
  6 (6.00%) low mild
  1 (1.00%) high mild
  15 (15.00%) high severe
  
fory/benchmarks/rust on feat/rust-sizeguards [$✘?] is v0.17.0-alpha.0 via v3.14.3 via v1.95.0 on (us-east-1) took 5s
❯ cargo bench --bench buffer_write_bench -- --baseline main
    Finished `bench` profile [optimized] target(s) in 0.06s
     Running benches/buffer_write_bench.rs (target/release/deps/buffer_write_bench-011c6e0ee322c976)
Gnuplot not found, using plotters backend
write_u8/current        time:   [6.8933 µs 6.9147 µs 6.9399 µs]
                        thrpt:  [144.09 Melem/s 144.62 Melem/s 145.07 Melem/s]
                 change:
                        time:   [-2.0992% -1.4519% -0.9010%] (p = 0.00 < 0.05)
                        thrpt:  [+0.9091% +1.4733% +2.1442%]
                        Change within noise threshold.
Found 20 outliers among 100 measurements (20.00%)
  6 (6.00%) high mild
  14 (14.00%) high severe

write_i32/current       time:   [729.41 ns 732.87 ns 736.71 ns]
                        thrpt:  [1.3574 Gelem/s 1.3645 Gelem/s 1.3710 Gelem/s]
                 change:
                        time:   [-2.2545% -1.4650% -0.3802%] (p = 0.00 < 0.05)
                        thrpt:  [+0.3817% +1.4868% +2.3065%]
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

write_i64/current       time:   [741.69 ns 743.49 ns 745.33 ns]
                        thrpt:  [1.3417 Gelem/s 1.3450 Gelem/s 1.3483 Gelem/s]
                 change:
                        time:   [-1.2781% -0.9131% -0.5686%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5718% +0.9215% +1.2947%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

write_f32/current       time:   [814.51 ns 818.90 ns 823.47 ns]
                        thrpt:  [1.2144 Gelem/s 1.2211 Gelem/s 1.2277 Gelem/s]
                 change:
                        time:   [-2.0663% -1.1408% -0.1963%] (p = 0.02 < 0.05)
                        thrpt:  [+0.1966% +1.1540% +2.1099%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

write_f64/current       time:   [823.12 ns 827.97 ns 833.88 ns]
                        thrpt:  [1.1992 Gelem/s 1.2078 Gelem/s 1.2149 Gelem/s]
                 change:
                        time:   [-6.2183% -4.5369% -2.9123%] (p = 0.00 < 0.05)
                        thrpt:  [+2.9997% +4.7525% +6.6306%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

write_varint32_small/current
                        time:   [899.07 ns 905.35 ns 914.34 ns]
                        thrpt:  [1.0937 Gelem/s 1.1045 Gelem/s 1.1123 Gelem/s]
                 change:
                        time:   [-0.7350% +0.7029% +2.6802%] (p = 0.46 > 0.05)
                        thrpt:  [-2.6103% -0.6980% +0.7405%]
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

write_varint32_medium/current
                        time:   [1.4128 µs 1.4142 µs 1.4159 µs]
                        thrpt:  [706.29 Melem/s 707.09 Melem/s 707.83 Melem/s]
                 change:
                        time:   [-0.6195% -0.3812% -0.1441%] (p = 0.00 < 0.05)
                        thrpt:  [+0.1443% +0.3827% +0.6233%]
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

write_varint32_large/current
                        time:   [1.6586 µs 1.6692 µs 1.6780 µs]
                        thrpt:  [595.95 Melem/s 599.07 Melem/s 602.92 Melem/s]
                 change:
                        time:   [-0.5057% -0.1241% +0.1752%] (p = 0.49 > 0.05)
                        thrpt:  [-0.1749% +0.1243% +0.5083%]
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

write_varint64_small/current
                        time:   [1.1234 µs 1.1247 µs 1.1260 µs]
                        thrpt:  [888.12 Melem/s 889.16 Melem/s 890.17 Melem/s]
                 change:
                        time:   [-2.1981% -1.4802% -0.7546%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7604% +1.5024% +2.2475%]
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
  2 (2.00%) high severe

write_varint64_medium/current
                        time:   [1.7863 µs 1.7907 µs 1.7945 µs]
                        thrpt:  [557.25 Melem/s 558.43 Melem/s 559.83 Melem/s]
                 change:
                        time:   [-0.1626% +0.2047% +0.5850%] (p = 0.27 > 0.05)
                        thrpt:  [-0.5816% -0.2042% +0.1629%]
                        No change in performance detected.

write_varint64_large/current
                        time:   [2.7535 µs 2.7624 µs 2.7721 µs]
                        thrpt:  [360.74 Melem/s 362.00 Melem/s 363.17 Melem/s]
                 change:
                        time:   [-3.0492% -1.9645% -1.0085%] (p = 0.00 < 0.05)
                        thrpt:  [+1.0188% +2.0038% +3.1451%]
                        Performance has improved.
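Criterion's verdicts in the output above follow a simple rule, which this sketch reproduces under the assumption that the benchmarks run with Criterion's defaults (significance level 0.05 and noise threshold 1%, adjustable via Criterion::significance_level and Criterion::noise_threshold). A change is reported as an improvement or regression only when p is below 0.05 and the whole confidence interval of the time change lies beyond ±1%; otherwise it is "within noise" or "no change". That is why write_varint64_large (upper bound -1.0085%) counts as improved while write_u8 (upper bound -0.9010%) does not. The Verdict enum and classify function are illustrative names, not part of Criterion's API.

```rust
/// Verdict Criterion.rs prints for a baseline comparison.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// Sketch of the decision rule, assuming Criterion's defaults.
/// `ci_pct` is the (lower, upper) bound of the time change in percent.
fn classify(p_value: f64, ci_pct: (f64, f64)) -> Verdict {
    const SIGNIFICANCE: f64 = 0.05;
    const NOISE_PCT: f64 = 1.0;
    let (lo, hi) = ci_pct;
    if p_value >= SIGNIFICANCE {
        // Not statistically significant: the noise check is never reached.
        Verdict::NoChange
    } else if hi < -NOISE_PCT {
        // Entire interval is faster than the noise band.
        Verdict::Improved
    } else if lo > NOISE_PCT {
        // Entire interval is slower than the noise band.
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}
```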

@chaokunyang (Collaborator) left a comment:

LGTM

@chaokunyang merged commit 3e94a45 into apache:main on Apr 19, 2026. 62 checks passed.

Labels: None yet
Projects: None yet
2 participants