Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only used PyUnicode_Concat on unicode objects #3433

Merged
merged 9 commits into from
Mar 21, 2020
Merged

Conversation

da-woods
Copy link
Contributor

@da-woods da-woods commented Mar 15, 2020

Closes #3426


(I'm getting a segmentation fault on the fstring tests with the current master, which is making it difficult to run this through a thorough set of tests. It happens with and without this PR. I'm not sure where this is due to something being messed up on my side or a real bug, but I'll look into and report it separately. Just noting it here because it means I couldn't test this PR with with "runtest.py string") Caching issues to do with Cython.inline - I'd clearly ended up with a broken version cached - ignore

Cython/Compiler/ExprNodes.py Outdated Show resolved Hide resolved
tests/run/test_unicode_string_tests.pxi Show resolved Hide resolved
da-woods and others added 3 commits March 16, 2020 15:52
Co-Authored-By: Stefan Behnel <stefan_ml@behnel.de>
Optimized inplace operations for bytes and unicode so that they're
genuinely done in place if no-one else needs the object. This
is what CPython tries to do (and was a string concatenation was
a point where it significantly beat Cython at times)

This only works if the types are known at compile time, so with
unknown types CPython will still be faster in some cases
@da-woods
Copy link
Contributor Author

da-woods commented Mar 16, 2020

I've updated this to also optimize str_type where its clear that it's unicode (i.e. when language_level = 3).

I've also added in some optimization of inplace concatenation. This does give a significant speed-up in some (probably mostly artificial) cases. However, it also uses some slightly awkward macros to make it fit. If you'd rather drop it then I can just remove 7d2608e reverted because it breaks stuff. I'll investigate some more and see if it is usable, but it doesn't need to be in this PR

Better if unicode concat is done with more settings of str_type
@scoder
Copy link
Contributor

scoder commented Mar 21, 2020

CPython has a specific optimization to check the reference count of the RHS and append rather than concat if possible

Yes, that seems worth doing on our side as well. Especially since 2-value f-strings are already replaced by a concatenation rather than joining because it's much faster. Making this even faster with this "hack" would be nice.

@da-woods
Copy link
Contributor Author

da-woods commented Mar 21, 2020

CPython has a specific optimization to check the reference count of the RHS and append rather than concat if possible

Yes, that seems worth doing on our side as well. Especially since 2-value f-strings are already replaced by a concatenation rather than joining because it's much faster. Making this even faster with this "hack" would be nice.

I've pushed my patch for this. I had managed to get it working but was a bit undecided about if it should go in this PR. Notes from the relevant commit below

Will be submitted separately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TypeError on list += str
2 participants