Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build Parsing.py in the limited api #5845

Merged
merged 18 commits into from Dec 4, 2023
Merged

Conversation

da-woods
Copy link
Contributor

It isn't actually usable because it cimports other modules but it does build without warning. Changes are mainly to string handling (because that's the nature of parsing).

Depends on #5841 and #5798 to actually build Parsing.py, but the PR itself can be applied independently.

It isn't actually usable because it cimports other modules
but it does build without warning. Changes are mainly to string
handling (because that's the nature of parsing).

Depends on cython#5841 and cython#5798 to actually build, but the PR itself
doesn't depend on those
da-woods added a commit to da-woods/cython that referenced this pull request Nov 19, 2023
This PR (on top of cython#5845 and its dependencies) is sufficient to
compile all the built modules in the Plex folder of Cython.
It's currently unknown if they actually work (since it really needs
all of Cython to be built to test).
@@ -14193,6 +14194,9 @@ def generate_result_code(self, code):
checks = ["(%s != Py_None)" % self.arg.py_result()] if self.arg.may_be_none() else []
checks.append("(%s(%s) != 0)" % (test_func, self.arg.py_result()))
code.putln("%s = %s;" % (self.result(), '&&'.join(checks)))
code.putln("#if !CYTHON_ASSUME_SAFE_MACROS")
code.putln(code.error_goto_if_neg(self.result(), self.pos))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate, because it may lead to an unused label. In rare cases, admittedly…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rewrite it as

if ((!CYTHON_ASSUME_SAFE_MACROS) && %result% < 0) goto error

so the label is never unused. That might be slightly better (provided we manage to avoid "condition always true" warnings)

Cython/Utility/Builtins.c Outdated Show resolved Hide resolved
Cython/Utility/Builtins.c Outdated Show resolved Hide resolved
Cython/Utility/Builtins.c Outdated Show resolved Hide resolved
Cython/Utility/ObjectHandling.c Outdated Show resolved Hide resolved
Comment on lines 90 to 100
#if CYTHON_ASSUME_SAFE_MACROS
#define __Pyx_unpack_tuple2(tuple, value1, value2, is_tuple, has_known_size, decref_tuple) \
(likely(is_tuple || PyTuple_Check(tuple)) ? \
(likely(has_known_size || PyTuple_GET_SIZE(tuple) == 2) ? \
__Pyx_unpack_tuple2_exact(tuple, value1, value2, decref_tuple) : \
(__Pyx_UnpackTupleError(tuple, 2), -1)) : \
__Pyx_unpack_tuple2_generic(tuple, value1, value2, has_known_size, decref_tuple))
#else
static CYTHON_INLINE int __Pyx_unpack_tuple2(
PyObject* tuple, PyObject** value1, PyObject** value2, int is_tuple, int has_known_size, int decref_tuple);
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems worth generally unfolding this lengthy macro into the inline function you wrote, and just special casing the size check.

#if CYTHON_COMPILING_IN_LIMITED_API
// Note that from Python 3.7, the indices of FindChar account for wraparound so no
// need to check the length
Py_ssize_t idx = PyUnicode_FindChar(unicode, character, 0, -1, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also pass PY_SSIZE_T_MAX as end to avoid the risk of an off-by-one error for -1 (because, is that inclusive or exclusive?).

Given that there's a C-API function for this, though, I wonder if we're really still faster than that. The implementation seems quite efficient. From my expecience, PyUnicode_READ() isn't the fastest thing to use, compared to a straight memory loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken this to mean "just use PyUnicode_FindChar in all cases". Happy to revert if this wasn't what you mean though!

Cython/Utility/StringTools.c Outdated Show resolved Hide resolved
Cython/Utility/TypeConversion.c Outdated Show resolved Hide resolved
Cython/Utility/Builtins.c Outdated Show resolved Hide resolved
Cython/Utility/Builtins.c Outdated Show resolved Hide resolved
Cython/Utility/Builtins.c Outdated Show resolved Hide resolved
static CYTHON_INLINE PyObject* __Pyx_PyUnicode_Substring(
PyObject* text, Py_ssize_t start, Py_ssize_t stop);
#else
// In the limited API since 3.7
#define __Pyx_PyUnicode_Substring(text, start, stop) PyUnicode_Substring(text, start, stop)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right - the __Pyx_PyUnicode_Substring function handles negative indices and PyUnicode_Substring doesn't. Will fix

scoder pushed a commit that referenced this pull request Dec 2, 2023
This PR (on top of #5845 and its dependencies)
is sufficient to compile all the built modules in the Plex folder of Cython.
PyTuple_SET_ITEM(tuple, 0, key);
PyTuple_SET_ITEM(tuple, 1, value);
#else
if (unlikely(PyTuple_SetItem(tuple, 0, key) < 0)) {
Py_DECREF(value); // we haven't set this yet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't actually set key in this case either. We need to decref both (and value in the failure case below).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyTuple_SetItem decrefs key on failure (https://github.com/python/cpython/blob/939fc6d6eab9b7ea8c244d513610dbdd556503a7/Objects/tupleobject.c#L122C10-L122C10) so that it effectively steals a reference whether or not it succeeds.

I'll add a comment because it isn't obvious though

Cython/Utility/StringTools.c Outdated Show resolved Hide resolved
Cython/Utility/StringTools.c Outdated Show resolved Hide resolved
Cython/Utility/TypeConversion.c Outdated Show resolved Hide resolved
Cython/Utility/ObjectHandling.c Outdated Show resolved Hide resolved
da-woods and others added 2 commits December 2, 2023 12:50
@scoder
Copy link
Contributor

scoder commented Dec 2, 2023

I've cleaned up the PyUnicode_GET_LENGTH() usages in #5890

Cython/Utility/StringTools.c Outdated Show resolved Hide resolved
Cython/Utility/Builtins.c Outdated Show resolved Hide resolved
@scoder scoder merged commit 02ddde3 into cython:master Dec 4, 2023
63 checks passed
@scoder
Copy link
Contributor

scoder commented Dec 4, 2023

Nice!

@da-woods da-woods deleted the limited-api-parsing branch December 4, 2023 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants