
Conversation

vmx commented Nov 12, 2025

The libipld crate is deprecated. The usual transition path away from libipld is to ipld-core and serde_ipld_dagcbor. This crate, however, is so low-level that it should use cbor4ii directly.

cbor4ii is the CBOR library that serde_ipld_dagcbor is using.

The tests pass locally, but I haven't done any benchmarking yet. So this should be seen as a starting point; I'm happy to get it over the finish line if there's interest.

Copying the SliceReader from cbor4ii isn't ideal; maybe we can get an upstream fix. I've opened quininer/cbor4ii#50.
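
For readers new to the reader side of cbor4ii: the vendored SliceReader is essentially a byte slice plus a cursor that the decoder pulls data through. Below is a minimal sketch of that shape with simplified, assumed method names; the real vendored type implements cbor4ii's `dec::Read` trait, whose exact signature isn't reproduced here:

```rust
// Sketch only: a slice-backed reader with borrow-without-copy semantics.
// The vendored version implements cbor4ii's `dec::Read` instead of these inherent methods.
struct SliceReader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> SliceReader<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Self { buf, pos: 0 }
    }

    /// Borrow up to `want` bytes from the input without copying them.
    fn fill(&self, want: usize) -> &'a [u8] {
        let end = (self.pos + want).min(self.buf.len());
        &self.buf[self.pos..end]
    }

    /// Mark `n` bytes as consumed.
    fn advance(&mut self, n: usize) {
        self.pos = (self.pos + n).min(self.buf.len());
    }
}
```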

codspeed-hq bot commented Nov 12, 2025

CodSpeed Performance Report

Merging #80 will improve performance by ×2.5

Comparing vmx:remove-libipld (b64c304) with main (7a1eabd)

Summary

⚡ 161 improvements
✅ 31 untouched

Benchmarks breakdown

| Mode | Benchmark | BASE | HEAD | Change |
|---|---|---|---|---|
| Simulation | test_dag_cbor_decode[roundtrip01.json] | 19.2 µs | 16.6 µs | +15.55% |
| Simulation | test_dag_cbor_decode[roundtrip02.json] | 19.2 µs | 16.6 µs | +15.53% |
| Simulation | test_dag_cbor_decode[roundtrip03.json] | 19.2 µs | 16.6 µs | +15.15% |
| Simulation | test_dag_cbor_decode[roundtrip04.json] | 19.1 µs | 16.7 µs | +14.78% |
| Simulation | test_dag_cbor_decode[roundtrip05.json] | 21.2 µs | 18 µs | +17.72% |
| Simulation | test_dag_cbor_decode[roundtrip06.json] | 17.7 µs | 14.8 µs | +19.56% |
| Simulation | test_dag_cbor_decode[roundtrip07.json] | 17.6 µs | 14.8 µs | +18.71% |
| Simulation | test_dag_cbor_decode[roundtrip08.json] | 19.1 µs | 17.1 µs | +11.46% |
| Simulation | test_dag_cbor_decode[roundtrip09.json] | 21.7 µs | 17.6 µs | +23.68% |
| Simulation | test_dag_cbor_decode[roundtrip10.json] | 22.6 µs | 17.9 µs | +26.03% |
| Simulation | test_dag_cbor_decode[roundtrip11.json] | 19.8 µs | 17.5 µs | +13.07% |
| Simulation | test_dag_cbor_decode[roundtrip12.json] | 20.2 µs | 17.8 µs | +13.19% |
| Simulation | test_dag_cbor_decode[roundtrip13.json] | 20.2 µs | 17.9 µs | +13.3% |
| Simulation | test_dag_cbor_decode[roundtrip14.json] | 20.2 µs | 17.8 µs | +13.49% |
| Simulation | test_dag_cbor_decode[roundtrip15.json] | 19 µs | 16.7 µs | +14.05% |
| Simulation | test_dag_cbor_decode[roundtrip16.json] | 19.7 µs | 17.5 µs | +12.6% |
| Simulation | test_dag_cbor_decode[roundtrip17.json] | 19.7 µs | 17.4 µs | +13.04% |
| Simulation | test_dag_cbor_decode[roundtrip18.json] | 19.8 µs | 17.3 µs | +14.47% |
| Simulation | test_dag_cbor_decode[roundtrip19.json] | 19.7 µs | 17.3 µs | +13.77% |
| Simulation | test_dag_cbor_decode[roundtrip20.json] | 19.3 µs | 17 µs | +14.08% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

MarshalX (Owner) commented Nov 13, 2025

This is awesome!

I considered migrating to something else after the deprecation of libipld... During my investigation, I remember creating issues in one of your repos related to the recursion limit. I see that this is solved by the impl of dec::Read.

I love how compatible it is with the current test suite. Moreover, we do have some incredible performance boosts! For example, test_decode_car got +24% (858ms -> 693.6ms) and test_dag_cbor_decode_torture_cids an unbelievable +92% (93ms -> 48.4ms)!!!

However, we also got some performance regressions, which are worth checking:

  • test_dag_cbor_decode_real_data[canada.json] -26%
  • test_dag_cbor_encode_real_data[canada.json] -9%
  • test_dag_cbor_encode_real_data[citm_catalog.json] -3%

An interesting note (and possible hint) is that this is canada.json, which is mostly lists of floats. That's where to dig in.

"RecursionError: maximum recursion depth exceeded in DAG-CBOR decoding",
).restore(py);
)
.restore(py);
MarshalX (Owner) commented Nov 13, 2025

What code formatter do you use? Gonna make it a step in the CI pipeline

vmx (Author) replied:

Just a plain cargo fmt.

vmx (Author) commented Nov 14, 2025

@MarshalX I've pushed a new commit which should fix the performance regression. Can you please re-trigger a benchmark run?

MarshalX (Owner) commented Nov 14, 2025

@vmx done! Incredible results! Decoding gained x2.5 at peak and there are no problems with decoding anymore!!! What's left is only 3 benchmarks around encoding. canada.json is still at -9%.

btw, do you know why CID decoding sped up so much? maybe you are familiar with the differences

vmx added 2 commits November 17, 2025 15:22
The maximum recursion limit is tracked within `decode_dag_cbor_to_pyobject()`,
hence it doesn't need to be part of the SliceReader.
The latest version contains performance improvements.
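
Below is a sketch of the depth-tracking pattern described in the first commit message above: the decode function carries the remaining recursion budget itself, so the reader can stay a plain cursor. The names, the limit value, and the toy "CBOR" parsing are illustrative assumptions, not the crate's actual code:

```rust
// Illustrative only: the recursion budget lives in the decode function, not in the reader.
const MAX_DEPTH: usize = 1024;

#[derive(Debug)]
enum Decoded {
    Scalar,
    List(Vec<Decoded>),
}

fn decode_dag_cbor(mut input: &[u8]) -> Result<Decoded, String> {
    decode_item(&mut input, MAX_DEPTH)
}

fn decode_item(input: &mut &[u8], remaining_depth: usize) -> Result<Decoded, String> {
    if remaining_depth == 0 {
        return Err("maximum recursion depth exceeded in DAG-CBOR decoding".to_string());
    }
    match input.split_first() {
        // Pretend 0x80..=0x97 is a short array header (real CBOR parsing elided).
        Some((&byte, rest)) if (0x80..=0x97).contains(&byte) => {
            *input = rest;
            let len = (byte - 0x80) as usize;
            let mut items = Vec::with_capacity(len);
            for _ in 0..len {
                // Nested items get one less unit of the depth budget.
                items.push(decode_item(input, remaining_depth - 1)?);
            }
            Ok(Decoded::List(items))
        }
        Some((_, rest)) => {
            *input = rest;
            Ok(Decoded::Scalar)
        }
        None => Err("unexpected end of input".to_string()),
    }
}
```
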
vmx (Author) commented Nov 17, 2025

btw, do you know why CID decoding sped up so much?

A comment from you said that it's allocating. The new version shouldn't allocate. Maybe that makes the difference?

For the encoding I did some improvements upstream for numbers. That should at least get the canada.json test performance up. Let's see what happens with the others; I've had a hard time trying to figure out what makes them slower.

Then please re-trigger the benchmark once again.

MarshalX (Owner) commented Nov 17, 2025

I did some improvements upstream for numbers

Hero!

Canada encoding is 5% faster than one commit before 🔥 There's still a little degradation of 4%, but this is acceptable.

It's hard to believe, but somehow github.json encoding is now slower by 9%.

I am looking only at the real-data benchmarks, since the rest are so micro that they could be noise as hell.

Upd. The github.json benchmark is measured in µs. Maybe it's tiny enough to produce some noise... but historically it has always shown the correct number to rely on.

vmx (Author) commented Nov 17, 2025

I've tried things locally and saw large variations between runs. I then added some wall-clock timing within the Rust code. A large overhead (30x) and large variations came from the Python initialization part; the actual parsing time for the github.json one is really small.
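
For reference, the kind of ad-hoc wall-clock timing meant here is just std::time::Instant wrapped around the suspected hot spot; the helper below is purely illustrative and not part of the PR:

```rust
use std::time::Instant;

// Purely illustrative: time a closure and print the elapsed wall-clock time to stderr,
// so it shows up even when the code is driven from pytest.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    eprintln!("{label}: {:?}", start.elapsed());
    result
}
```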

Is there a way to re-run the current main branch benchmark again, just to see how big the variation is between runs?

MarshalX (Owner) commented:

Here is the result of the main benchmark run from today VS 26 days ago: https://codspeed.io/MarshalX/python-libipld/runs/compare/691bc274750130912a26cc99..68f8f140424026582c5e7fc4

vmx (Author) commented Nov 18, 2025

Here is the result of the main benchmark run from today VS 26 days ago: https://codspeed.io/MarshalX/python-libipld/runs/compare/691bc274750130912a26cc99..68f8f140424026582c5e7fc4

My takeaway from those two runs is:

  • the github.json one seems pretty stable
  • random other tests (twitter.json) can deviate by up to 9%, and that one is even in the ms range.

I'd still like to know why github.json is slower, but I'm not sure spending much more time on it is really worth it. As it's all mostly a single function, it's kind of hard to profile and investigate what's really going on (and I also haven't done that much with Rust projects yet).

MarshalX (Owner) commented:

I do agree with you. And I am ready to move forward.

Moreover, encoding is not yet critical for the atproto community, so it will not affect the major user base of the library at all.

Let's wait for the next upstream lib release with your perf boost and public API, and then merge it.

Thank you for your hard work!

Do the same as the original version and rely on Python for the
string UTF-8 validation.
MarshalX (Owner) commented Nov 20, 2025

I do recall some perf problems around PyString::new_bound; that's why I picked (#41) an unsafe approach to make direct CPython FFI calls instead of using the pyo3 wrapper. I do see calls to PyObject_GetMethod in the regression, which is possibly the pyo3 overhead.

Upd. Not sure how pyo3 has changed since last year. They did a great job with the new Bound API to eliminate overheads.

Upd2. I misread it; with from_bytes it looks like exactly the same FFI call as before: https://github.com/PyO3/pyo3/blob/d8e9a3860b5a08b8020364841808b2d3cb2f4f68/src/types/string.rs#L175-L183
Upd3. Yeah, they just added from_bytes in September this year; that's why the unsafe code was in place before.
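
To make the comparison concrete, here is a rough sketch of the two paths being discussed: the safe pyo3 constructor needs a `&str`, so UTF-8 validation happens on the Rust side, while a direct CPython FFI call leaves the validation to Python. pyo3 API details differ between versions, and converting the raw pointer back into a pyo3 object is deliberately left out:

```rust
use pyo3::prelude::*;
use pyo3::types::PyString;

// Safe pyo3 path: UTF-8 validation happens in Rust before Python ever sees the data.
// (`PyString::new` in recent pyo3 versions; older ones spell it `new_bound`.)
fn str_via_pyo3<'py>(py: Python<'py>, bytes: &[u8]) -> PyResult<Bound<'py, PyString>> {
    let s = std::str::from_utf8(bytes)
        .map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string()))?;
    Ok(PyString::new(py, s))
}

// Direct FFI path: hand the raw bytes to CPython and let it do the UTF-8 validation.
// Error handling and wrapping of the returned pointer are omitted here.
unsafe fn str_via_ffi(bytes: &[u8]) -> *mut pyo3::ffi::PyObject {
    unsafe {
        pyo3::ffi::PyUnicode_DecodeUTF8(
            bytes.as_ptr().cast(),
            bytes.len() as pyo3::ffi::Py_ssize_t,
            std::ptr::null(),
        )
    }
}
```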

vmx (Author) commented Nov 20, 2025

In local testing it didn't make things slower, hence I've used it. Do I read your updates correctly that it's all good?

I run benchmarks via e.g. `uv run pytest -k 'test_dag_cbor_decode_real_data[github.json]' --benchmark-enable`. Is that the correct way?

Before the most recent change:
------------------------------------------- benchmark: 1 tests -------------------------------------------
Name (time in us)                                    Min      Mean    StdDev  Outliers  Rounds  Iterations
----------------------------------------------------------------------------------------------------------
test_dag_cbor_decode_real_data[github.json]     375.4650  479.6358  114.6506     487;0    1779           1
----------------------------------------------------------------------------------------------------------

After the change:
------------------------------------------- benchmark: 1 tests ------------------------------------------
Name (time in us)                                    Min      Mean   StdDev  Outliers  Rounds  Iterations
---------------------------------------------------------------------------------------------------------
test_dag_cbor_decode_real_data[github.json]     284.8470  325.7824  73.1020   268;358    2016           1
---------------------------------------------------------------------------------------------------------

vmx (Author) commented Nov 20, 2025

The latest regression shows that tests vary a lot between runs. The encoding code path did not change.

MarshalX (Owner) commented Nov 20, 2025

Yes, all good!

It is the correct way. At least, this is exactly how it runs inside the pipelines.

I'm starting to hate the results on CodSpeed. Maybe PGO adds this randomness... but the input data is static... The CI pipeline looks awkward to me. The PGO-gathering stage runs the benchmarks properly (benchmark: 192 tests, with the same table in the output as your local runs). But the CodSpeed benchmark run looks like it uses only tests (0 benchmarked)? Without doing proper rounds and iterations? That's really hard to tell, because they inject their own benchmark runner as far as I know...

CodSpeed had to disable the following plugins: pytest-benchmark

and they do use pytest-codspeed

MarshalX (Owner) commented:

Yeap, looks like CodSpeed was completely off and is not compatible with how pytest-benchmark defines benchmarks in the code... Let me dig into it and push fixes in a separate PR.

MarshalX (Owner) commented Nov 20, 2025

Welp, I spent a few hours playing around and here are my notes:

  • I do not think that we should rely on CodSpeed; today I discovered for myself how it actually works: https://codspeed.io/docs/instruments/cpu/overview. The most important thing is "A benchmark will be run only once and the CPU behavior will be simulated". So there is never more than one real run.
  • I do not think that we should rely on any CI/CD benchmarks, because this repo uses GitHub-hosted runners. As far as I learned today, the results vary by ±10-20% XD

I do think that we must use local bench comparisons only. The best thing I did was grouping the useful benchmarks. Here is how to start comparing locally:

# checkout main
uv pip install -v -e .  
uv run pytest . -m benchmark_main --benchmark-enable --benchmark-save=main
# checkout your branch
uv pip install -v -e .  
uv run pytest . -m benchmark_main --benchmark-enable --benchmark-save=cbor4ii

uv run pytest-benchmark compare --group-by="name" 

My local comparison:
[screenshot: local benchmark comparison]

Remote comparison using the new workflow:
[screenshot: remote benchmark comparison via the new workflow]
src: https://github.com/MarshalX/python-libipld/actions/runs/19549156715/attempts/2#summary-55975972813

Verdict: encoding is still 2-11% slower, which is strange because, without digging too deep, I cannot tell why. This correlates with the CodSpeed simulation.

vmx (Author) commented Nov 20, 2025

Verdict: encoding is still 2-11% slower, which is strange because, without digging too deep, I cannot tell why. This correlates with the CodSpeed simulation.

I also have no clue why encoding would be slower. I'll rerun the tests as you mentioned above (it would be good to have that in the README). In the past, local re-runs still had a pretty big variation, and I'm not sure why that is either. I also tried to run them directly from Rust through a binary, but even there the variations are large.

vmx (Author) commented Nov 20, 2025

I just couldn't give up. I think I've found the main issue. Please try again.

MarshalX (Owner) commented:

Decoding fails, but here are my local encoding tests:

main - the main branch
0002_cbor4ii - old writer
0003_cbor4ii - new writer

[screenshot: local encoding benchmark comparison]

Rust's BufWriter is highly optimized. Use it instead of a custom one.
Wrap it in a newtype so that we can implement `cbor4ii`'s `enc::Write` for it.
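
The shape of that change is roughly the newtype pattern below. Since both `BufWriter` and cbor4ii's `enc::Write` are foreign to this crate, the orphan rules require a local wrapper type; the trait shown here is a locally defined stand-in with an assumed signature, not cbor4ii's real definition:

```rust
use std::io::{self, BufWriter, Write as _};

// Stand-in for cbor4ii's `enc::Write`; the real trait's signature may differ.
trait EncWrite {
    type Error;
    fn push(&mut self, input: &[u8]) -> Result<(), Self::Error>;
}

// Newtype around std's BufWriter. In the actual code this is what allows
// implementing the foreign cbor4ii trait for a foreign std type.
struct Writer<W: io::Write>(BufWriter<W>);

impl<W: io::Write> EncWrite for Writer<W> {
    type Error = io::Error;

    fn push(&mut self, input: &[u8]) -> Result<(), Self::Error> {
        // Delegate to the buffered writer; flushing is handled separately by the caller.
        self.0.write_all(input)
    }
}
```
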
MarshalX (Owner) commented Nov 20, 2025

Looks like this is it! You did it @vmx! I would say that the perf is now the same, and the remaining ±2% is randomness.

I really like seeing how the max values are much lower with cbor4ii. I feel there's some potential here. We need to see the gains with PGO :)

vmx (Author) commented Nov 20, 2025

Please re-run again locally. I missed the flushing in the last version. Now the code is even closer to the original one, if you look at the full diff. There are no changes to the main encoding entry point.
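
On the flushing point: a buffered writer only hands its bytes to the underlying sink when the buffer fills up, is flushed, or is dropped, so a missing flush can silently truncate the encoded output. A minimal illustration with std's BufWriter (not the PR's actual code):

```rust
use std::io::{BufWriter, Write};

fn encode_to_vec() -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    {
        let mut writer = BufWriter::new(&mut out);
        writer.write_all(&[0xa0])?; // e.g. the header byte of an empty CBOR map
        // An explicit flush pushes the buffered bytes into `out` and surfaces any
        // I/O error; relying on Drop alone would swallow such errors silently.
        writer.flush()?;
    }
    Ok(out)
}
```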

MarshalX (Owner) commented:

Not sure what happened with twitter.json encoding this time, but the rest is OK.

0003_cbor4ii - your latest commit

[screenshot: local encoding benchmark comparison, latest commit]

Btw codspeed results are here!
[screenshot: CodSpeed results]
