@scoder
Contributor

@scoder scoder commented Oct 24, 2025

For exported C names, it's wasteful to have each name and signature created as separate Python strings in the module state when it's unlikely to be used there. Storing joined, compacted string constants instead allows using the available byte string compression and reusing signature C strings for identical PyCapsule signatures.

Closes #7107

For exported C names, it's wasteful to have the name created as a Python string in the module state when it's unlikely to be used there. Storing plain C string constants allows the C compiler to store them away in the constant data segment instead.

See cython#7107
@scoder scoder changed the title Use plain C strings for capsule names. Use compressed strings for capsule names. Oct 25, 2025
@scoder
Contributor Author

scoder commented Oct 25, 2025

I think I've squeezed every possible byte out of the storage. It now uses a loop over a function pointer array and a single compressed byte string for the names and the signatures, and compacts the signatures by removing duplicates (which also nicely uses the same signature C string for PyCapsules with the same signature).
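
The layout described above can be sketched in Python. This is only an illustration under assumptions, not the code generated by Cython: the function names `pack_exports` and `read_cstring`, the NUL separators and the example exports are all made up. It shows the two ideas from the comment: deduplicated signatures stored first, names stored behind them, all in one byte string.

```python
def pack_exports(exports):
    """Pack (name, signature) pairs into one NUL-separated byte string.

    Hypothetical layout, for illustration only: all distinct signatures come
    first, followed by all names. Returns the packed bytes and, per export,
    the (signature_offset, name_offset) into that buffer.
    """
    sig_offsets = {}  # signature -> byte offset of its single stored copy
    chunks = []
    pos = 0

    # Signatures first: identical ones are stored only once.
    for _, sig in exports:
        if sig not in sig_offsets:
            data = sig.encode("ascii") + b"\0"
            sig_offsets[sig] = pos
            chunks.append(data)
            pos += len(data)

    # Names afterwards: they are unique, so there is nothing to deduplicate.
    index = []
    for name, sig in exports:
        data = name.encode("ascii") + b"\0"
        index.append((sig_offsets[sig], pos))
        chunks.append(data)
        pos += len(data)

    return b"".join(chunks), index


def read_cstring(buf, offset):
    """Read the NUL-terminated string that starts at `offset`."""
    end = buf.index(b"\0", offset)
    return buf[offset:end].decode("ascii")


# Made-up exports; two of them share a signature.
exports = [
    ("add_one", "int (int)"),
    ("add_two", "int (int)"),
    ("scale", "double (double, double)"),
]
packed, index = pack_exports(exports)
```

Both `add_one` and `add_two` end up with the same signature offset, which mirrors the point that capsules with the same signature can share a single signature C string.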

@da-woods
Contributor

To me this makes more sense for the signatures than the names.

The signatures are (probably) duplicated while the names aren't.

For the names, it saves a little bit of memory making the string-tab array shorter (one pointer per name), but increases the runtime memory since you now have the big unified string plus a separate Python string for each name in the dictionary. For the signatures they can just be a pointer into the big unified string so that's fine.

My (possibly wrong) guess would be that this doesn't make much difference to the compression.


Only slightly related to this PR but I do think it would be nice to cut down the constant tables to exclude things that are only used when initializing the module - i.e. have a separate shorter-lived table outside the module state for constants that we don't need to keep around forever.

@scoder
Contributor Author

scoder commented Oct 25, 2025

> To me this makes more sense for the signatures than the names.
> The signatures are (probably) duplicated while the names aren't.

Yes, that's why it's done that way. I also moved the signatures first so that that part of the aligned byte string can stay in memory or caches, while the names in the back part can reside elsewhere, in unused memory places.

> For the names, it saves a little bit of memory making the string-tab array shorter

And also the module state. It's now a single string entry for all signatures (which need to stay alive) and all names. I doubt that the names as part of the byte string hurt that much at runtime.

> My (possibly wrong) guess would be that this doesn't make much difference to the compression.

I tried it on lxml.etree, which exports 51 functions, and it saves 4 KB in the stripped binary module. Not much, but also not nothing. Can't say much about the runtime impact, but even just the module state reduction (816 bytes) and the signature deduplication (17 signatures less, probably another ~0.5 KB) should be relevant enough to make sure we don't waste more RAM than before.

> cut down the constant tables to exclude things that are only used when initializing the module - i.e. have a separate shorter-lived table outside the module state for constants that we don't need to keep around forever.

I think that's a good idea. We could also clear out init-time-only constants at the end of PyInit as a first step, but a separate table could really discard them completely.

…st of tuples for the export instead of three separate lists.
@scoder
Contributor Author

scoder commented Oct 25, 2025

I think we can probably do the same for the import code. I'll take a look as well.

@da-woods
Contributor

> > My (possibly wrong) guess would be that this doesn't make much difference to the compression.
>
> I tried it on lxml.etree, which exports 51 functions, and it saves 4 KB in the stripped binary module. Not much, but also not nothing. Can't say much about the runtime impact, but even just the module state reduction (816 bytes) and the signature deduplication (17 signatures less, probably another ~0.5 KB) should be relevant enough to make sure we don't waste more RAM than before.

My original comment was a little ambiguous. I completely believe in the compression+deduplication of the signatures. I really meant the "compared to a version where the names stay in the module state as they are now".

FWIW I had a quick go at that in https://github.com/da-woods/cython/tree/export-without-names. Not really sure how I'd actually compare runtime memory usage - I tried tracemalloc with Cython but don't think it told me too much.

I added a switch to use either PyBytes or a C string for sig+names and with PyBytes, modules that import the shared module become visibly smaller. So I'll leave the switch at "True".
…nd of ModuleNode.py to keep the ModuleNode class near the top.
@scoder
Contributor Author

scoder commented Oct 25, 2025

I also integrated the import code. Looking at the shared module test, the modules that import the shared module become visibly smaller:
Current master:

 61840 TEST_TMP/memoryview/memoryview_shared_utility/pkg1/pkg11/add_one.cpython-312-x86_64-linux-gnu.so*
 66064 TEST_TMP/memoryview/memoryview_shared_utility/pkg1/pkg11/cast.cpython-312-x86_64-linux-gnu.so*
104048 TEST_TMP/memoryview/memoryview_shared_utility/pkg1/pkg11/dependency.cpython-312-x86_64-linux-gnu.so*
178600 TEST_TMP/memoryview/memoryview_shared_utility/pkg2/CythonShared.cpython-312-x86_64-linux-gnu.so*

Before integrating the import code:

 61840 TEST_TMP/memoryview/memoryview_shared_utility/pkg1/pkg11/add_one.cpython-312-x86_64-linux-gnu.so*
 66064 TEST_TMP/memoryview/memoryview_shared_utility/pkg1/pkg11/cast.cpython-312-x86_64-linux-gnu.so*
104048 TEST_TMP/memoryview/memoryview_shared_utility/pkg1/pkg11/dependency.cpython-312-x86_64-linux-gnu.so*
178920 TEST_TMP/memoryview/memoryview_shared_utility/pkg2/CythonShared.cpython-312-x86_64-linux-gnu.so*

After:

 57744 TEST_TMP/memoryview/memoryview_shared_utility/pkg1/pkg11/add_one.cpython-312-x86_64-linux-gnu.so*
 66064 TEST_TMP/memoryview/memoryview_shared_utility/pkg1/pkg11/cast.cpython-312-x86_64-linux-gnu.so*
 99952 TEST_TMP/memoryview/memoryview_shared_utility/pkg1/pkg11/dependency.cpython-312-x86_64-linux-gnu.so*
178920 TEST_TMP/memoryview/memoryview_shared_utility/pkg2/CythonShared.cpython-312-x86_64-linux-gnu.so*

It's interesting that the shared Cython module gets a little larger with this change. Not sure what triggers that. I guess it fails to benefit from the signature deduplication. I also tried using a plain C string for sig+name instead of compressed Python bytes and that makes it much larger (183016 bytes), so that's not a good idea.
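
Why a single joined string can compress better than many separate ones can be illustrated with a small experiment. This is a sketch under assumptions: `zlib` stands in for whatever byte string compression the build actually uses, and the signatures are invented, so the numbers say nothing about real modules.

```python
import zlib

# Made-up export signatures; real ones would come from the compiled module.
signatures = [
    f"PyObject *(struct obj_{i} *, int, double)".encode("ascii")
    for i in range(50)
]

# Option A: compress every string on its own.
separate = sum(len(zlib.compress(s)) for s in signatures)

# Option B: join with NUL separators and compress once, so the compressor
# can exploit the repetition across strings.
joined = b"\0".join(signatures)
together = len(zlib.compress(joined))

print(f"raw bytes:           {len(joined)}")
print(f"compressed one each: {separate}")
print(f"compressed joined:   {together}")
```

Short strings gain nothing from being compressed individually because each result carries its own header and checksum, while the joined buffer shares that overhead and benefits from the repeated substrings.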

@scoder
Contributor Author

scoder commented Oct 25, 2025

It's also interesting that the gain is always exactly 4096 bytes. Might be a segmentation issue. That could also explain why the shared module grows in the test: it might simply move data into a different binary segment that then grows to the next unit boundary. Something like that, maybe?

EDIT: This SO question suggests that segments are padded to the page size of the CPU architecture:
https://stackoverflow.com/questions/67288459/elf-executable-file-many-zero-bytes
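
The padding hypothesis can be made concrete with a toy model. This is not an ELF parser: the segment sizes are invented and the 4096-byte page size is an assumption (typical for x86-64 Linux). Only the two observed differences are taken from the listings in this thread.

```python
PAGE_SIZE = 4096  # assumed page size


def padded(size, page=PAGE_SIZE):
    """Round `size` up to the next multiple of `page`."""
    return -(-size // page) * page


def file_size(segments, page=PAGE_SIZE):
    """Toy model: every segment occupies whole pages in the file."""
    return sum(padded(s, page) for s in segments)


# Hypothetical segment sizes in bytes. The second segment shrinks by only
# 300 bytes, but that is enough to drop below a page boundary.
before = [20000, 8292, 3000]
after = [20000, 7992, 3000]
saved = file_size(before) - file_size(after)

# Differences observed in the listings above (add_one and dependency).
observed = [61840 - 57744, 104048 - 99952]

print(saved, observed)
```

In this model a small change in content either leaves the file size untouched or moves it by a whole page, which would match savings that are always exactly 4096 bytes.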

@scoder
Contributor Author

scoder commented Oct 26, 2025

> the names stay in the module state as they are now

I considered that along the way but I doubt that the names would also be used inside of the module, so that they'd benefit from being Python strings. I'd expect them to be used exactly once in the module, when exporting the capsules.

OTOH, we also intern the Python identifiers, and other modules that import the capsules need to provide the same strings, so (globally) interning them on export (and import) could reduce the total number of distinct string objects in the Python runtime. That's assuming that the majority of exported names are also used somewhere, which isn't always true for cross-package cimports from larger packages (say, SciPy-sized). Here, I'd expect the gain from shared interned strings to be fairly small in comparison to the larger module states. Avoiding interned Python names on both sides (export and import) might actually be a better choice.
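
The sharing effect of interning mentioned above can be shown at the Python level with `sys.intern`. This is a minimal sketch on CPython; the name `my_exported_func` is made up, and the C-level export and import code is not modelled here.

```python
import sys

# Build two equal strings at run time so that they start out as separate
# objects, much like the same exported name created by two different modules.
exporter_name = "".join(["my_", "exported_", "func"])
importer_name = "".join(["my_", "exported_", "func"])

# After interning, both sides refer to one shared string object.
exporter_name = sys.intern(exporter_name)
importer_name = sys.intern(importer_name)
shared_after_interning = exporter_name is importer_name
```

The saving only materialises when a name really is used on both sides, which is the caveat raised above for cimports from large packages.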

That written, we're talking about a few kilobytes back and forth here. Whichever we choose probably won't turn a big wheel either way in a real system. If we eventually manage to discard import-time data after use, this will be easy to split up again.

With the import integration, I consider the PR in its current state fine for 3.2rc. What do you think?

@scoder scoder modified the milestones: 3.3, 3.2 Oct 26, 2025
@scoder
Contributor Author

scoder commented Oct 26, 2025

> We could also clear out init-time-only constants at the end of PyInit as a first step

I implemented this for the simple case here, but then noticed that this isn't safe because we deduplicate constants (and strings specifically) in the string array. Thus, if a user happens to have a bytes string around that looks like the names or signatures string, then clearing the array index in the module state will kill a user visible string. However unlikely that is, it's easy to get there for user code. That speaks for keeping the strings entirely separate. Definitely not in 3.2 any more.
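
The hazard can be reproduced with a toy constant table. The class `ConstantTable` and the byte strings are hypothetical and only mimic the described behaviour, namely that equal constants share one slot; this is not Cython's actual string table.

```python
class ConstantTable:
    """Toy constant table that stores each distinct value in a single slot."""

    def __init__(self):
        self.slots = []
        self._index = {}

    def add(self, value):
        """Return the slot for `value`, reusing an existing slot if equal."""
        if value not in self._index:
            self._index[value] = len(self.slots)
            self.slots.append(value)
        return self._index[value]

    def clear(self, slot):
        """Drop the value in `slot`, as an init-time cleanup might."""
        self.slots[slot] = None


table = ConstantTable()
names_slot = table.add(b"func_a\0func_b\0")  # init-time-only data
user_slot = table.add(b"func_a\0func_b\0")   # identical literal in user code

table.clear(names_slot)                      # "free" the init-time data
user_value = table.slots[user_slot]          # the user's constant is gone too
```

Because both constants map to the same slot, clearing the init-time entry also removes the user-visible value, which is the argument for keeping init-time strings in an entirely separate table.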

@da-woods
Contributor

> > We could also clear out init-time-only constants at the end of PyInit as a first step
>
> I implemented this for the simple case here, but then noticed that this isn't safe because we deduplicate constants (and strings specifically) in the string array. Thus, if a user happens to have a bytes string around that looks like the names or signatures string, then clearing the array index in the module state will kill a user visible string. However unlikely that is, it's easy to get there for user code. That speaks for keeping the strings entirely separate. Definitely not in 3.2 any more.

Yes - for me I think it'd make most sense to do this as a general mechanism rather than adding specific special-cases. (I also don't know exactly what you did here, but the signature strings do need to live as long as the module for the export code so it'd have to be names-only).

Will have a thorough look at it later today.

Contributor

@da-woods da-woods left a comment

I can't see any issues.

This feels like the sort of thing that's sufficiently repetitive that it probably doesn't have hidden corner cases so is probably OK for 3.2. But it's always a bit hard to judge.

@scoder scoder merged commit 2a00d4c into cython:master Oct 27, 2025
92 checks passed
@scoder scoder deleted the exported_names branch October 27, 2025 07:37
@scoder
Contributor Author

scoder commented Oct 27, 2025

> cut down the constant tables to exclude things that are only used when initializing the module

This is now #7266

Development

Successfully merging this pull request may close these issues.

[ENH] Compress string constants