Remove libipld dependency #80
Conversation
The `libipld` crate is deprecated. Usually the transition from `libipld` is to `ipld-core` and `serde_ipld_dagcbor`. However, this crate is so low-level that it should use `cbor4ii` directly; `cbor4ii` is the CBOR library that `serde_ipld_dagcbor` uses.

The tests pass locally, but I haven't done any benchmarking yet, so this should be seen as a starting point. I'm happy to get it over the finish line if there's interest.

Copying the `SliceReader` from `cbor4ii` isn't ideal; maybe we get an upstream fix. I've opened quininer/cbor4ii#50.
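For context, here is a minimal sketch of the "usual" migration target mentioned above: round-tripping an `Ipld` value through DAG-CBOR with `ipld-core` plus `serde_ipld_dagcbor`. This is illustrative only and not this PR's code (the PR drops down to `cbor4ii` directly); it assumes both crates are in `Cargo.toml` with the `serde` feature of `ipld-core` enabled.

```rust
// Illustrative sketch of the "usual" libipld migration target, not this PR's code.
use ipld_core::ipld::Ipld;

fn main() {
    let value = Ipld::List(vec![Ipld::Integer(1), Ipld::String("hello".into())]);

    // Encode to DAG-CBOR bytes, then decode them back and compare.
    let bytes = serde_ipld_dagcbor::to_vec(&value).expect("encoding failed");
    let decoded: Ipld = serde_ipld_dagcbor::from_slice(&bytes).expect("decoding failed");
    assert_eq!(decoded, value);
}
```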
CodSpeed Performance Report: merging #80 will improve performance by ×2.5.
This is awesome! I considered migrating to something else after the deprecation of `libipld`. I love how compatible it is with the current test suite. Moreover, we get some incredible performance boosts! However, we also got a performance regression, which is worth checking. The interesting note (possible hint) is that this is
| "RecursionError: maximum recursion depth exceeded in DAG-CBOR decoding", | ||
| ).restore(py); | ||
| ) | ||
| .restore(py); |
What code formatter do you use? I'm going to make it a step in the CI pipeline.
Just a plain `cargo fmt`.
@MarshalX I've pushed a new commit which should fix the performance regression. Can you please re-trigger a benchmark run?
@vmx done! Incredible results! Decoding gained ×2.5 at peak and there are no problems with decoding anymore! What's left is only 3 benchmarks around encoding, with canada.json still at -9%. Btw, do you know why CID decoding sped up so much? Maybe you are familiar with the differences.
The maximum recursion limit is tracked within `decode_dag_cbor_to_pyobject()`, hence it doesn't need to be part of the `SliceReader`.
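As a hedged toy sketch of that pattern (a made-up bracket format, not the crate's DAG-CBOR decoder): the depth limit travels as a plain parameter of the recursive decode function, so the reader type does not need to carry any state for it. `Value` and `MAX_DEPTH` are illustrative names.

```rust
// Toy sketch: recursion depth as a parameter of the recursive decoder.
const MAX_DEPTH: usize = 1024;

#[derive(Debug, PartialEq)]
enum Value {
    Leaf,
    List(Vec<Value>),
}

fn decode_value(
    input: &mut std::iter::Peekable<std::slice::Iter<'_, u8>>,
    depth: usize,
) -> Result<Value, String> {
    if depth > MAX_DEPTH {
        // Same condition the real decoder reports as a Python RecursionError.
        return Err("maximum recursion depth exceeded in DAG-CBOR decoding".into());
    }
    match input.next() {
        Some(b'1') => Ok(Value::Leaf),
        Some(b'[') => {
            let mut items = Vec::new();
            while input.peek() != Some(&&b']') {
                // Each nested container costs one more level of depth.
                items.push(decode_value(input, depth + 1)?);
            }
            input.next(); // consume the closing b']'
            Ok(Value::List(items))
        }
        other => Err(format!("unexpected input: {:?}", other)),
    }
}

fn main() {
    let data = b"[1[1]]";
    let value = decode_value(&mut data.iter().peekable(), 0).unwrap();
    assert_eq!(
        value,
        Value::List(vec![Value::Leaf, Value::List(vec![Value::Leaf])])
    );
}
```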
The latest version contains performance improvements.
A comment from you said that it's allocating; the new version shouldn't allocate. Maybe that makes the difference? For the encoding I did some improvements upstream for numbers, which should at least get the canada.json test performance up. Let's see what happens with the others; I have a hard time figuring out what makes them slower. Please re-trigger the benchmark once again.
Hero! Canada encoding is 5% faster than one commit before 🔥 There is still a small degradation of 4%, but that is acceptable. It is hard to believe, but somehow github.json encoding is now slower by 9%. I am looking only at the real-data benchmarks, since the rest are so micro that they could be pure noise. Upd.: the GitHub benchmark is measured in µs, maybe tiny enough to pick up some noise... but historically it has always shown the correct number to rely on.
I've tried things locally and had large variations between runs. I then added some timing (wall clock) within the Rust code: a large overhead (30x) and large variations came from the Python initialization part, while the actual parsing for the github.json one is really small. Is there a way to re-run the current main branch benchmark again, just to see how big the variation is between runs?
Here is the result of the main benchmark run from today vs. 26 days ago: https://codspeed.io/MarshalX/python-libipld/runs/compare/691bc274750130912a26cc99..68f8f140424026582c5e7fc4
My takeaway from those two runs is:
I'd still like to know why github.json is slower, but I'm not sure if spending much more time on it is really worth it. As it's mostly all one single function, it's kind of hard to profile and investigate what's really going on (and I also haven't done that much with Rust projects yet).
I do agree with you, and I am ready to move forward. Moreover, encoding is not yet critical for the atproto community, so it will not affect the major user base of the library at all. Let's wait until the next upstream lib release with your perf boost and public API, then merge it. Thank you for your hard work!
Do the same as the original version and rely on Python for the string UTF-8 validation.
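For illustration, a minimal sketch of that idea, assuming a recent pyo3 (0.21+ `Bound` API); the helper name is made up and this is not the crate's actual code. The raw bytes go straight to CPython's `PyUnicode_DecodeUTF8`, which performs the UTF-8 validation while building the `str`, so no separate `std::str::from_utf8` pass is needed on the Rust side.

```rust
// Sketch only: let CPython validate the UTF-8 instead of validating in Rust first.
use pyo3::ffi;
use pyo3::prelude::*;

fn utf8_bytes_to_py_str<'py>(py: Python<'py>, raw: &[u8]) -> PyResult<Bound<'py, PyAny>> {
    unsafe {
        // PyUnicode_DecodeUTF8 validates the bytes and sets a UnicodeDecodeError
        // on invalid input, returning NULL in that case.
        let ptr = ffi::PyUnicode_DecodeUTF8(
            raw.as_ptr().cast(),
            raw.len() as ffi::Py_ssize_t,
            std::ptr::null(),
        );
        // Converts a NULL result into the pending Python exception.
        Bound::from_owned_ptr_or_err(py, ptr)
    }
}
```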
I do recall some perf problems around this. Upd.: not sure how pyo3 changed since last year; they did a great job with the new Bound API to eliminate overheads. Upd. 2: I misread it with
In local testing it didn't make things slower, hence I've used it. Do I read your updates correctly that it's all good? I run the benchmarks via e.g.
The latest regression shows that tests vary a lot between runs. The encoding code path did not change. |
Yes, all good! It is the correct way; at least this is exactly how it runs inside the pipelines. I'm starting to hate the results on CodSpeed. Maybe PGO adds this randomness... but the input data is static... The CI pipeline looks awkward to me: the PGO gathering stage runs benchmarks properly, and they do use
Yep, looks like CodSpeed was completely off and is not compatible with how
Welp, I spent a few hours playing around and here are my notes:

I do think that we must use local bench comparison only. The greatest thing that I did was grouping the benchmarks in a useful way.

Here is how to start comparing locally:

Remote comparison using the new workflow:

Verdict: encoding is still 2-11% slower, which is so strange because, without digging too deep, I cannot answer why. This correlates with the CodSpeed simulation.
I also have no clue why encoding would be slower. I'll re-run the tests as you mentioned above (it would be good to have that in the README). In the past, local re-runs still had a pretty big variation, and I'm also not sure why that is. I also tried to run them directly from Rust through a binary, but even there the variations are large.
I just couldn't give up. I think I've found the main issue. Please try again.
Rust's BufWriter is highly optimized. Use it instead of a custom one. Wrap it in a newtype so that we can implement `cbor4ii`'s `enc::Write`.
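A hedged sketch of that pattern, not the PR's exact code: the newtype is needed because the orphan rule forbids implementing the foreign `enc::Write` trait directly on `std::io::BufWriter`. It assumes `cbor4ii`'s `enc::Write` is the single-method `push(&[u8])` trait; check the version you depend on, and the type names here are illustrative.

```rust
// Sketch: newtype over std's BufWriter so cbor4ii's enc::Write can be implemented for it.
use std::io::{self, BufWriter, Write as _};

use cbor4ii::core::enc;

struct CborWriter<W: io::Write>(BufWriter<W>);

impl<W: io::Write> enc::Write for CborWriter<W> {
    type Error = io::Error;

    fn push(&mut self, input: &[u8]) -> Result<(), Self::Error> {
        // Small pushes are coalesced in BufWriter's internal buffer, which is
        // where the speedup over a hand-rolled buffer comes from.
        self.0.write_all(input)
    }
}

impl<W: io::Write> CborWriter<W> {
    /// Flush the buffered bytes and hand back the inner writer.
    fn into_inner(self) -> io::Result<W> {
        self.0.into_inner().map_err(|e| e.into_error())
    }
}

fn main() -> io::Result<()> {
    let mut w = CborWriter(BufWriter::new(Vec::<u8>::new()));
    // A hand-written DAG-CBOR array [1, 2, 3], just to exercise the writer.
    enc::Write::push(&mut w, &[0x83, 0x01, 0x02, 0x03])?;
    let bytes = w.into_inner()?;
    assert_eq!(bytes, [0x83, 0x01, 0x02, 0x03]);
    Ok(())
}
```

Note that `BufWriter` only writes its buffer out when it is flushed (or dropped), which is why the explicit flush/`into_inner` step matters; see the comment below about the missed flush.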
Looks like this is it! You did it @vmx! I would say that the perf is now the same, and some ±2% is randomness. I really like seeing how the max values are much lower with cbor4ii; I feel some potential here. We need to see the gains with PGO :)
Please re-run again locally. I had missed the flushing in the last version. Now the code is even closer to the original one, if you look at the full diff. There are no changes on the main encoding entry.




