Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support rust extensions for PyPy via cpyext #393

Merged
merged 158 commits into from
Apr 23, 2019

Conversation

omerbenamram
Copy link
Contributor

@omerbenamram omerbenamram commented Mar 9, 2019

Hi,
Following the discussion on #370, I've taken some time to dust off my work on support for PyPy via cpyext.

It runs nicely on my machine (osx) for the several test cases I've been using (raising errors, moving objects back and forth and so on).

I've managed to run the word_count example successfully, altough for this usecase pypy beats rust in terms of performance (overhead is greater than benefit).

Unfortunely, it seems the rustapi_module example is still crashing, because of 2 outstanding issues:
EDIT: Both solved!

1. The datetime API wrapper.
2. The way initialize_type behaves (causes SIGSEGV on attempt to register a new class)
My guess is that there is still some outstanding difference in the implementation of the PyTypeObject struct that causes it to crash

I'd like to outline some changes I've done, and why I think there is definitely some more work needed on this before it's ready for prime time.

  1. cpyext does not offer complete support for the entirety of the cpython API. So I've had (mostly by dumping symbols out of the pypy interpreter) manually add the parts of it that it implements with annotations like this:
    #[cfg_attr(PyPy, link_name = "PyPyErr_SetInterrupt")]

I couldn't find any automatic way to set this up, since it's not for all the symbols, so it's quite verbose (I've done this with a regexes), but it makes it very easy to spot changes when the cpyext api layer changes and add them.

  1. I've never succeeded in getting this to work with python2, and since I've read in the PR that it's likely to be discontinued, I didn't invest more time with it.

  2. I've had to manually shim some functions that pypy expects to find when compiling against their version of Python.h, like PyUnicodeDecodeError_Create, these are mostly utility functions which they never bothered implementing, so it looks something like this:
    image

  3. I've modified the build.rs script originally in my implementation, since I needed it to detect and correctly emit configuration variables that are suitable for PyPy, it's a slight drift from the current model (single build.rs file), into a more structured internal crate that I've used. I didn't want to spend time porting all the changes back into the big build.rs file without hearing your thoughts on it first.

  4. As the title suggests, this only supports python -> rust, and not embedding pypy (due to pypy not behaving nicely with embedding), though I don't think embedding pypy is a significant use case, as it's mostly used to squeeze performance out of existing python codebases.

EDIT: Functionallity that is still missing from cpyext that is used by PyO3 internally:
These functions will need to be implemented in rust.

  • complex numbers _Py_c_* functions
  • PyDict_MergeFromSeq2
  • _PyLong_FromByteArray, _PyLong_AsByteArray
  • PySequence_Count
  • Py_CompileStringExFlags (EDIT: PyPy exports Py_CompileStringFlags, so this is actually OK)

I'm still certain there are some semantics that I've missed, but i'd love to get another pair of eyes looking over some of this stuff.

Would love to hear your thoughts :) Cheers 🐍

P.S I'm away for next week, but I will be able to address suggestions the week after.

@omerbenamram
Copy link
Contributor Author

fixed the issues with tox.
Everything is now checked against the pypy version as expected.

OK with me to merge :)

@omerbenamram
Copy link
Contributor Author

@konstin anything else needed?

@kngwyu
Copy link
Member

kngwyu commented Apr 17, 2019

I'll merge this PR after waiting @konstin's reaction 1 day.

@pganssle
Copy link
Member

@kngwyu Please don't forget to squash merge!

src/ffi/datetime.rs Outdated Show resolved Hide resolved
@konstin
Copy link
Member

konstin commented Apr 17, 2019

I think the unification of the ffi API is more likely to cause problems than not, I'm just going to try to fix this on the PyPy side, so this can go ahead.

Big 👍 for fixing this in pypy

@konstin anything else needed?

It's ready to merge from my point of view.

I went through the unresolved conversations and found two minor things:

@omerbenamram
Copy link
Contributor Author

@konstin do you think it may be better to move https://github.com/PyO3/pyo3/pull/393/files#r272995675 to the module level documentation? I could also beef it up a little bit and let @pganssle go over it.

@konstin
Copy link
Member

konstin commented Apr 17, 2019

Yes, that sounds good

@pganssle
Copy link
Member

@pganssle Is https://github.com/PyO3/pyo3/pull/393/files#r268843941 resolved?

It's not resolved per se, but I don't see it as a blocking issue here. The skipping is sufficiently well-scoped that I think it doesn't undermine the effectiveness of the test, and we can just open a new issue about why that's happening (it may not have anything to do with PyPy, just happens to manifest in PyPy).

@konstin do you think it may be better to move https://github.com/PyO3/pyo3/pull/393/files#r272995675 to the module level documentation? I could also beef it up a little bit and let @pganssle go over it.

Happy to look it over, but hopefully it will be fairly ephemeral, as I've already fixed this issue on PyPy for 2.7 and 3.6. Depending on how far back you want to support versions of PyPy, you can drop at least this part of the difference in the API as soon as the next PyPy release.

@omerbenamram
Copy link
Contributor Author

@pganssle https://bitbucket.org/pypy/pypy/commits/d5b44bb7b84d799cb7bc2f7dd530502c40619229?at=py3.6 looks awesome!
I just pushed some notes to the module level.

Feel free to revise them as you see fit, since you clearly have a broader outlook on the subject, and when 3.6 support will be added, we could simply use this version.

Co-Authored-By: omerbenamram <omerbenamram@gmail.com>
@kngwyu
Copy link
Member

kngwyu commented Apr 17, 2019

Happy to look it over, but hopefully it will be fairly ephemeral, as I've already fixed this issue on PyPy for 2.7 and 3.6.

Wow it's great

src/ffi/datetime.rs Outdated Show resolved Hide resolved
Co-Authored-By: omerbenamram <omerbenamram@gmail.com>
//!
//! (PyPy 7.0.0) - A note regarding PyPy (cpyext) support:
//!
//! PyPy's internal representation of dates differs from that of cpython.
Copy link
Member

@pganssle pganssle Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this section or the action that a user is intended to take from it. Using the C API struct is supported for PyPy, right? It's just that not all of the functions are available, and you've added one additional variable in its place.

I think the stuff about the import logic and the stuff about the macros in the C API documentation are not relevant here, just keep it simple and say that there is limited function for some of the PyDatetimeCAPI functions in earlier versions of PyPy, and specifically DateTime_FromTimestamp and Date_FromTimestamp are not currently supported (we can specify the version that they are supported for later, when it comes out).

I'm torn about the PyDate_FromTimestamp and PyDateTime_FromTimestamp methods. I think don't mention them in this note. Ideally they would go away entirely or be made private (if either of those things is possible while still allowing access to them for the safe Rust api).

Copy link
Contributor Author

@omerbenamram omerbenamram Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point regarding what users would potentially care about.
The original comment was more of a note for/code comment to help reason about the rationale of this behavior.

I've removed most of it and mentioned only DateTime_FromTimestamp and Date_FromTimestamp.

ptr::null_mut(),
);
#[cfg(PyPy)]
let ptr = PyDateTime_FromTimestamp(args.as_ptr());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that this may lead to a panic or segfault.

I believe the current PyPy implementation of PyDateTime_FromTimestamp requires that you call PyDatetime_Import() before calling this function. All the other functions don't have this requirement because it's done once at global scope in the implementation of PyDatetimeAPI, so dereferencing that object at all guarantees that Import() has been called.

I think maybe the easiest short-term solution for this is to create a wrapper function around PyDateTime_FromTimestamp in the ffi::datetime module that just calls import (or dereferences PyDateTimeAPI) and then calls the function from PyPy.

Obviously this is under-tested as well, but I think because it involves global state it might be a bigger undertaking to add proper testing to all these functions.

I'm having trouble building the project again (cannot find -lpypy3-c), so I'm having trouble testing whether or not this causes problems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble building the project again (cannot find -lpypy3-c), so I'm having trouble testing whether or not this causes problems.

Have you tried building in a fresh pypy venv? It sometimes seems that a venv works better than PYTHON_SYS_EXECUTABLE.

Copy link
Member

@pganssle pganssle Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@konstin I think it was an issue with conflicts in my pyenv setup. I do a lot of library development so my pyenv global has a bunch of different things in it. When I set pyenv shell to just the pypy version I want it will build, then I just have to adjust LD_LIBRARY_PATH to get it to run.

In any case, here's an MWE to reproduce the issue, I can confirm it produces a segfault on pypy3.5:

extern crate pyo3;

use pyo3::prelude::*;
use pyo3::types::PyDateTime;

fn main() {
    let gil = Python::acquire_gil();
    let py = gil.python();
    let x = PyDateTime::from_timestamp(py, 123456.0, None);

}

It does not segfault on Python 3.7.2.

It will only segfault if from_timestamp is the first datetime-related thing you call, hence the problem.

Edit: Oops, this was segfaulting because I forgot that you can't call PyPy from within Rust. Working on a new MWE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my mistake, I can't reproduce the segfault using extension modules, I guess these particular symbols don't rely on the _PyDateTime_Import being called.

Copy link
Contributor Author

@omerbenamram omerbenamram Apr 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems PyPy will import datetime internally when this is called.

The relevant code being:

@cpython_api([PyObject], PyObject)
def PyDateTime_FromTimestamp(space, w_args):
    """Create and return a new datetime.datetime object given an argument tuple
    suitable for passing to datetime.datetime.fromtimestamp().
    """
    w_datetime = PyImport_Import(space, space.newtext("datetime"))
    w_type = space.getattr(w_datetime, space.newtext("datetime"))
    w_method = space.getattr(w_type, space.newtext("fromtimestamp"))
    return space.call(w_method, w_args)

Where with the _* symbols it will not, for example:

@cpython_api([rffi.INT_real, rffi.INT_real, rffi.INT_real,
              rffi.INT_real, rffi.INT_real, rffi.INT_real, rffi.INT_real,
              PyObject, PyTypeObjectPtr], PyObject)
def _PyDateTime_FromDateAndTime(space, year, month, day,
                                hour, minute, second, usecond,
                                w_tzinfo, w_type):
    """Return a datetime.datetime object with the specified year, month, day, hour,
    minute, second and microsecond.
    """
    year = rffi.cast(lltype.Signed, year)
    month = rffi.cast(lltype.Signed, month)
    day = rffi.cast(lltype.Signed, day)
    hour = rffi.cast(lltype.Signed, hour)
    minute = rffi.cast(lltype.Signed, minute)
    second = rffi.cast(lltype.Signed, second)
    usecond = rffi.cast(lltype.Signed, usecond)
    return space.call_function(
        w_type,
        space.newint(year), space.newint(month), space.newint(day),
        space.newint(hour), space.newint(minute), space.newint(second),
        space.newint(usecond), w_tzinfo)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think PyDatetime_Import is needed to initialize the CAPI object, but not when you use the macro directly (but only for PyPy), because of the way it's implemented.

I remember I had some segfaults when I didn't do the import when I was implementing the feature for pypy, which is why I was worried, but I realize now that was because I was dereferencing PyDatetimeCAPI, not because I was calling the fromtimestamp function.

Copy link
Member

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Looks good to me!

@konstin
Copy link
Member

konstin commented Apr 23, 2019

Big thanks to everyone involved!

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants