bulk_walk causes a LOT of allocations? #2

Open · cchance27 opened this issue May 6, 2023 · 7 comments

@cchance27

Started testing with csnmp since there aren't many SNMP packages for Rust, and noticed that building the BTreeMap seems to result in a lot of allocations: I bulk-walked a table and saw ~52k allocations for the 400 returned OIDs. Is this expected behaviour? I was also wondering why you went with BTreeMap instead of HashMap, as I'd imagine that would have a lighter memory footprint.

@RavuAlHemio (Owner) commented May 6, 2023

To be honest, I never really instrumented the memory behavior. Which tool would you recommend?

I chose BTreeMap over HashMap to ensure the OIDs remain in order; perhaps I should rearchitect the API to allow a choice of either.
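For context, a minimal sketch (not csnmp's actual types) of the property BTreeMap buys here: iterating the walk results always yields the OIDs in ascending key order, which a HashMap does not guarantee.

```rust
use std::collections::BTreeMap;

fn main() {
    // Keyed by the OID's sub-identifiers; values are just labels here.
    let mut results: BTreeMap<Vec<u32>, &str> = BTreeMap::new();
    results.insert(vec![1, 3, 6, 1, 2, 1, 1, 5, 0], "sysName.0");
    results.insert(vec![1, 3, 6, 1, 2, 1, 1, 1, 0], "sysDescr.0");

    // Iteration is always in ascending key order, so sysDescr.0 (...1.1.0)
    // comes out before sysName.0 (...1.5.0) regardless of insertion order.
    for (oid, label) in &results {
        println!("{:?} => {}", oid, label);
    }
}
```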

@cchance27 (Author)

I used dhat to try to diagnose it. I'm porting a larger internal tool from C# that does SNMP and some other stuff, and I'm watching memory usage and allocations to keep performance high as I go, so I was a bit shocked at the number of allocations.
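For reference, wiring the dhat crate into a binary looks roughly like this; the walk_under_test call is a hypothetical placeholder for whatever code is being measured.

```rust
// Cargo.toml: dhat = "0.3" (version assumed)
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Profiling starts here; a dhat-heap.json file is written when the
    // profiler is dropped at the end of main, and can be opened in dhat's viewer.
    let _profiler = dhat::Profiler::new_heap();

    walk_under_test(); // hypothetical stand-in for the SNMP bulk walk
}

fn walk_under_test() {
    // placeholder work so the example compiles and allocates something
    let _buf: Vec<u8> = vec![0u8; 4096];
}
```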

I haven't taken a deeper look at your parsing yet, but in case you weren't aware, there's now a nom-based parsing crate that could offload the ASN.1 parsing side of SNMP. I'm not sure how it benchmarks against what you're doing internally, but it might make your internals easier to refactor.
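For illustration, a rough sketch of calling der-parser's top-level entry point; the exact function name and return shape are my assumption from the crate's documented API, so double-check against the docs.

```rust
use der_parser::parse_der;

// Parse one DER TLV from a byte buffer and print it; parse_der returns the
// unconsumed remainder of the input plus the decoded object tree.
fn dump_first_tlv(bytes: &[u8]) {
    match parse_der(bytes) {
        Ok((_rest, obj)) => println!("parsed: {:?}", obj),
        Err(e) => eprintln!("DER parse error: {:?}", e),
    }
}

fn main() {
    // 0x02 0x01 0x05 is the DER encoding of INTEGER 5
    dump_first_tlv(&[0x02, 0x01, 0x05]);
}
```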

@RavuAlHemio (Owner)

Switching from BTreeMap to HashMap didn't do much; it appears most of the allocations are related to BigInt/BigUint, since ASN.1 allows arbitrary-length integers. I'll sample the other available ASN.1 libraries; maybe one of them provides a more efficient representation.
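One possible direction, sketched here purely as an assumption rather than anything csnmp actually does: keep ASN.1 INTEGERs that fit in an i64 on the stack and only fall back to a heap-allocated BigInt for genuinely large values.

```rust
use num_bigint::BigInt;

// Hypothetical representation: small integers avoid the allocator entirely.
enum Asn1Int {
    Small(i64),
    Big(BigInt),
}

impl Asn1Int {
    // `bytes` is the big-endian two's-complement content of the INTEGER.
    fn from_content_bytes(bytes: &[u8]) -> Asn1Int {
        if bytes.len() <= 8 {
            // sign-extend into a fixed 8-byte buffer, then read as i64
            let fill = if bytes.first().map_or(false, |b| b & 0x80 != 0) { 0xFF } else { 0x00 };
            let mut buf = [fill; 8];
            buf[8 - bytes.len()..].copy_from_slice(bytes);
            Asn1Int::Small(i64::from_be_bytes(buf))
        } else {
            Asn1Int::Big(BigInt::from_signed_bytes_be(bytes))
        }
    }
}
```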

@RavuAlHemio (Owner)

As you might have seen, I started an experimental derparser branch to switch from simple_asn1 to the nom-based der-parser.

Comparing the two branches when bulk-walking 1.3.6.1.2.1 on a pretty bog-standard Cisco switch:

| parser | bytes | blocks | time |
| --- | --- | --- | --- |
| simple_asn1 | 358,385,203 | 3,692,966 | 51.74 s |
| der-parser | 217,130,033 | 96,592 | 5.66 s |

Honestly, just time-wise, that seems like quite the improvement, although I'm not sure that it solves your allocation issue -- it allocates fewer blocks but still a very similar number of bytes.

@cchance27 (Author) commented May 9, 2023

I mean, you're allocating a similar number of bytes, but much fewer blocks plus a 10-fold performance increase is pretty darn impressive.

Here's a run of my own code, swapping between the two branches:

| parser | bytes | blocks | time |
| --- | --- | --- | --- |
| simple_asn1 | 34,130,594 | 54,246 | 21.84 s |
| der-parser | 32,264,796 | 7,238 | 3.55 s |

However, with dhat disabled, the actual execution time is about the same (~600 ms); the time improvement we're seeing comes from dhat having to record far more memory traffic on the old branch. Still, saving ~47,000 allocations is an improvement regardless. It's odd that the byte count is so high for relatively little data, which sort of points toward the memory usage coming from something outside the decoding...

I wonder whether the allocated bytes are coming from the ObjectValue side or the ObjectIdentifier side.

@RavuAlHemio (Owner) commented May 10, 2023

I tried rewriting ObjectIdentifier to use Vec<u32> instead of a fixed-size array; the dhat results are as follows:

| ObjectIdentifier | bytes | blocks |
| --- | --- | --- |
| array | 313,295,165 | 262,723 |
| vector | 96,831,229 | 488,935 |

So kinda as expected: less memory but more blocks. Also, ObjectIdentifier::new can no longer be const.
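The trade-off in the table can be sketched like this (hypothetical types, a capacity of 128 assumed for the sketch, not csnmp's actual definitions):

```rust
/// Fixed-capacity form: no heap allocation, Clone/Copy is a plain memcpy and
/// the constructor can be const, but every OID carries the full 128 slots.
#[derive(Clone, Copy)]
struct ArrayOid {
    len: usize,
    arcs: [u32; 128],
}

/// Vec-backed form: only as big as the OID itself, but construction and every
/// Clone go through the allocator, and `new` can no longer be const.
#[derive(Clone)]
struct VecOid {
    arcs: Vec<u32>,
}
```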

@cchance27 (Author) commented May 10, 2023

The drastic reduction in bytes makes sense, but it's odd that the number of allocations went up roughly 2x. On the bright side, it's not just total bytes allocated that improved: t-gmax also went down, from about 2 MB to 170 kB, a major saving.

In the dhat view, it seems almost all of the allocations popping up now come from clones of the ObjectIdentifier Vec.

15k of the 24k allocations in my run were from the clones.

Total: 951,036 bytes (31.91%, 93,911.16/s) in 15,851 blocks (64.34%, 1,565.23/s), avg size 60 bytes, avg lifetime 397,278.04 µs (3.92% of program duration)
Max: 57,600 bytes in 960 blocks, avg size 60 bytes
At t-gmax: 57,600 bytes (33.73%) in 960 blocks (68.23%), avg size 60 bytes
At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
Allocated at {
^1: 0x7ff61d217ded: alloc::alloc::impl$1::allocate (alloc\src\alloc.rs:237:0)
^2: 0x7ff61d2152cc: alloc::raw_vec::RawVec<u32,alloc::alloc::Global>::allocate_in<u32,alloc::alloc::Global> (alloc\src\raw_vec.rs:185:0)
^3: 0x7ff61d236884: alloc::slice::hack::impl$1::to_vec<u32,alloc::alloc::Global> (alloc\src\slice.rs:162:0)
^4: 0x7ff61d2107ba: alloc::slice::hack::to_vec (alloc\src\slice.rs:111:0)
^5: 0x7ff61d2107ba: alloc::slice::impl$0::to_vec_in (alloc\src\slice.rs:441:0)
^6: 0x7ff61d2107ba: alloc::vec::impl$11::clone<u32,alloc::alloc::Global> (src\vec\mod.rs:2655:0)
^7: 0x7ff61d17515d: csnmp::oid::impl$18::clone (csnmp\src\oid.rs:75:0)
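One way to make those clones cheap again, sketched here as a suggestion rather than anything implemented in csnmp, would be to share the sub-identifier buffer behind an Arc so that Clone only bumps a reference count:

```rust
use std::sync::Arc;

// Hypothetical type, not csnmp's: the sub-identifier slice lives behind an
// Arc, so constructing an OID allocates once but cloning it is just a
// reference-count bump instead of a fresh Vec<u32> allocation and copy.
#[derive(Clone)]
struct SharedOid {
    arcs: Arc<[u32]>,
}

impl SharedOid {
    fn new(arcs: &[u32]) -> Self {
        SharedOid { arcs: Arc::from(arcs) }
    }
}

fn main() {
    let oid = SharedOid::new(&[1, 3, 6, 1, 2, 1, 1, 1, 0]);
    // 1000 clones, but still only one heap buffer holding the arcs
    let copies: Vec<SharedOid> = (0..1000).map(|_| oid.clone()).collect();
    assert_eq!(copies.len(), 1000);
}
```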
