Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update pyo3 to 0.22.0 (without "py-clone") #435

Merged
merged 31 commits into from
Oct 6, 2024

Conversation

JRRudy1
Copy link
Contributor

@JRRudy1 JRRudy1 commented Jul 14, 2024

This PR builds on #431 to migrate to PyO3 0.22.0 without needing to activate the problematic py-clone feature.

Background

PyO3 recently released version 0.22.0 which includes many new features, but also includes the next step in deprecating the gil-refs API by putting it behind the opt-in gil-refs feature flag. @bschoenmaeckers's PR went long way towards updating numpy for the migration by mirroring the gil-refs deprecation and feature-gating the methods that depend on it, but @juntyr raised valid concerns about its unconditional activation of PyO3's py-clone feature flag.

The problem with py-clone

The PyO3 0.22.0 release includes PyO3#4095's removal of the ability to clone Py<T> while the GIL is not held, which was implemented by removing its Clone impl and requiring users to either convert to Bound<T> before cloning or call the clone_ref method instead. The opt-in py-clone feature flag was added that restores the Clone implementation, but a panic is triggered if a clone is attempted while the GIL is not held. Enabling this feature may be convenient and fairly harmless in some use cases but can be hugely problematic in others, and we should not unconditionally impose the risk of panicking clones on all numpy users.

Why not just add our own py-clone feature?

This was suggested in #431, but has a major problem related to the Element impl for PyObject (aka Py<PyAny>). The problem is that Clone is a supertrait of Element, so PyObject losing its Clone impl when the py-clone feature is disabled means its Element impl must be disabled as well. So if we made the feature opt-in as was done by PyO3, then by default we would lose all support for object arrays which would be a huge regression. Making the feature opt-out instead is also a bad option, so we need another way.

Could we implement Element for Bound<PyAny> instead?

No. Unfortunately Bound<T> does not (and can not) implement Send, which is another requirement for implementing Element. So retaining the Element impl for PyObject is our only option for supporting object arrays without major reworks.

Eliminating our dependence on py-clone

The sad thing about this issue is that it really shouldn't be a problem in the context of numpy -- all of our API's require the GIL anyways, so it should always be safe to clone the PyObjects in our arrays. But that that doesn't help the fact that we still need a generic way to clone elements (and to_owned slices and array views), and we were relying on the Element trait's Clone requirement to do so. This PR solves the dilemma by replacing the Clone supertrait with a new trait PyClone, which can be safely implemented for PyObject and all existing Element impls.

The new PyClone trait

The new PyClone trait is a weaker form of the standard Clone trait that represents types which can be cloned while the GIL is held. Philosophically (though not in practice), this includes all Clone types that don't care about the GIL, as well as types like Py<T> that can only be cloned when it is held.

The trait has the following (simplified) definition:

pub trait PyClone: Sized {
    /// Create a clone of the value while the GIL is guaranteed to be held.
    fn py_clone(&self, py: Python<'_>) -> Self;
}

Any type that implements Clone can trivially implement PyClone by ignoring the py argument and simply delegating to its clone method. Meanwhile, PyObject can implement this trait by using the provided Python token to call Py::clone_ref. The actual trait also defines a couple of methods (not shown above) for cloning collections, which have default implementations that types can override if there is a more efficient way to do so than simply mapping the py_clone method to each item.

Adding a trait like this to PyO3 was discussed in PyO3#4133, but it is not clear if/when this would be implemented. If such a trait ends up getting added to PyO3 we can later transition to using it, but I don't think we should hold up numpy's migration to 0.22.0 waiting for that to happen.

Drawbacks

Ideally we would provide a blanket impl of PyClone for all T: Clone and then a specific impl for PyObject, which would let us avoid the breaking change for anyone with custom Element impls that will now need to add a PyClone impl as well. Unfortunately, this is not allowed due to possible impl overlaps since the compiler asserts that PyO3 restoring the Clone impl for Py<T> should not be a breaking change. If PyClone trait was instead defined in the pyo3 crate then they could provide these impls, but it is unlikely that they would (as discussed in PyO3#4133) since it would prevent them from defining necessary generic impls for types like Option<T: PyClone>.

So, this will inevitably need to be a breaking change for downstream crates defining custom Element impls for their types. Luckily it extremely rare for crates to do this, and adding the PyClone impl is absolutely trivial in comparison to all the other changes needed to migrate away from the gil-refs API in the PyO3 0.21/0.22 updates anyways.

Patching numpy internals

As mentioned before, all of the numpy API's already require access to a Python token to prove that the GIL is held, so most of the necessary updates simply require replacing item.clone() calls with item.py_clone(py). The only exceptions are the calls to <[T] as ToOwned>::to_owned and ArrayView::to_owned (both of which require T: Clone), which now delegate to PyClone's provided methods discussed above. Delegating to these collection-specific methods is used instead of simply mapping the py_clone method to avoid losing out on the optimized to_owned implementations for Clone types.

Tasks

  • Deprecate the gil-refs API
    • Add gil-refs feature guarding all usages of PyO3's gil-refs API (completed by @bschoenmaeckers)
    • Update doctests and examples to use the bound API instead of gil-refs
  • Eliminate py-clone feature dependency
    • Add PyClone trait to replace Clone as a supertrait of Element
    • Add PyClone impls for all internal Element types
    • Update numpy internals to use PyClone instead of Clone
  • Documentation
    • Document the PyClone trait
    • Update the Element docs to reference the newly required supertrait
    • Create changelog and/or migration guide
  • Add tests -- the existing tests cover the changes pretty well, but there may be PyClone-specific tests worth adding.

@EricLBuehler
Copy link

@JRRudy1 is there still active work on this PR?

@JRRudy1
Copy link
Contributor Author

JRRudy1 commented Jul 20, 2024

@EricLBuehler all it really needs is a review and some minor updates to the docs, but I was hoping for some feedback from the devs before I bothered putting on the finishing touches in case they disagree with the approach and reject it outright. I'm not an official maintainer for the project so I can't really do much to move towards a release, so hopefully someone will get to it before too long...

@EricLBuehler
Copy link

Sounds good @JRRudy1! We're looking to use 0.22 in HF Tokenizers, so this would be super helpful.

@davidhewitt would you possibly be able to review this?

@davidhewitt
Copy link
Member

I am not the usual rust-numpy maintainer at the moment; @adamreichold has been leading that, however I believe they have been busy recently.

So have I (new baby), so there's a few things blocked on me which I need to unstack first, however if we don't hear from @adamreichold within maybe a week then I will seek to get to reviewing this. 👍

@juntyr
Copy link

juntyr commented Jul 20, 2024

So have I (new baby), so there's a few things blocked on me

Congratulations on the kid! Please take your time off, having a newborn is such an intense time where you should not add work stress on top. Take care of yourself and your family <3

@EricLBuehler
Copy link

So have I (new baby), so there's a few things blocked on me which I need to unstack first, however if we don't hear from @adamreichold within maybe a week then I will seek to get to reviewing this. 👍

Congratulations!

@adamreichold
Copy link
Member

Sorry for being off the radar for quite some time. I am trying to get back into the rotation now starting with rust-numpy and more specifically starting with the work to support NumPy 2 and then looking into the the two PR to bump pyo3 itself.

(If I understand things correctly, this one would supersede #431 and contains its changes? If so, could #431 be closed by its author @bschoenmaeckers if they agree?)

@bschoenmaeckers
Copy link
Contributor

Thanks for continuering on my work! You did a very nice job of removing the clone requirement, it is probably way better then I could do. I will close my MR and let’s continue on this MR.

If you may, please add Co-authored-by: Bas Schoenmaeckers <bas.schoenmaeckers@gmail.com> to the final merge commit. Thanks!

@JRRudy1
Copy link
Contributor Author

JRRudy1 commented Aug 1, 2024

Congratulations David! And no worries Adam, I haven't had any time for open-source work lately either. But I absolutely agree that @bschoenmaeckers deserves co-authorship on this :)

I just fixed the pyo3 version in the example crates that was causing the checks to fail, so they should hopefully pass now when @adamreichold gets a chance to approve another CI run.

@JRRudy1
Copy link
Contributor Author

JRRudy1 commented Aug 1, 2024

I fixed the first check, but I don't quite understand the second... does anyone have a clue why cargo would be failing to find a compatible syn version for pyo3-macros-backend?

@davidhewitt
Copy link
Member

PyO3 updated MSRV to 1.63, so try updating CI to match here.

@JRRudy1
Copy link
Contributor Author

JRRudy1 commented Aug 2, 2024

PyO3 updated MSRV to 1.63, so try updating CI to match here.

Thanks for the tip! I bumped the MSRV in Cargo.toml and don't see it referenced anywhere else (including the CI configs), but let me know if you are aware of anything else.

Edit: I finally found a rust version spec in ci.yml and updated it to the MSRV, hopefully it should work now...

@JRRudy1
Copy link
Contributor Author

JRRudy1 commented Aug 2, 2024

Also @bschoenmaeckers I gave you write access to my fork in case you want to contribute any more changes (but no pressure to do so)

@juntyr
Copy link

juntyr commented Aug 7, 2024

ndarray just released v0.16 - would it be easy to bump it in this PR to get the changes into v22 as well, or would it need a separate PR (if greater code changes are required)?

@djc
Copy link

djc commented Aug 20, 2024

Would be nice to move this forward (ideally including the ndarray bump).

@sheldonrong
Copy link

Hi @adamreichold @davidhewitt anything left to push this forward? thanks!

@davidhewitt
Copy link
Member

I think both myself and @adamreichold have been struggling for availability recently, sorry to all who have been waiting. I have some conflicting priority work to achieve in PyO3 (patch fix 0.22.3, upcoming Python 3.13 freethreaded support).

I am acutely aware that both rust-numpy and pyo3-asyncio need 0.22 releases, which is going to be important to unblock the ecosystem.

Here it looks like the strategy from @adamreichold seems to be to merge #429 first, and then circle back to this PR. It seems that the contributor to #429 hasn't replied to the there. So if someone else is willing to help update #429, by opening a PR which supersedes with an updated form as per @adamreichold's review comments from July 21st, I think that will be the next best step to move this forward.

I can potentially do this myself, although it may take me up to a few weeks depending on how much time I can find.

@maffoo
Copy link
Contributor

maffoo commented Aug 31, 2024

I think both myself and @adamreichold have been struggling for availability recently, sorry to all who have been waiting. I have some conflicting priority work to achieve in PyO3 (patch fix 0.22.3, upcoming Python 3.13 freethreaded support).

I am acutely aware that both rust-numpy and pyo3-asyncio need 0.22 releases, which is going to be important to unblock the ecosystem.

Thanks for the update. I'm sure I speak for others when I say we really appreciate your work on pyo3 and related libraries!

Here it looks like the strategy from @adamreichold seems to be to merge #429 first, and then circle back to this PR. It seems that the contributor to #429 hasn't replied to the there. So if someone else is willing to help update #429, by opening a PR which supersedes with an updated form as per @adamreichold's review comments from July 21st, I think that will be the next best step to move this forward.

I'll take a stab at this.

@piotryordanov
Copy link

Hey guys, thx for the hard work. Any update on this?

@maffoo
Copy link
Contributor

maffoo commented Sep 30, 2024

Hey guys, thx for the hard work. Any update on this?

We're trying to get support for numpy2 merged first (#442), then we'll come back to this. We're having an issue with numpy2 support for 32-bit python builds on windows, once that is resolved it should be ready to go.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With #442 merged, we are now ready to come back here:

  • let's rebase this,
  • add a CHANGELOG entry,
  • and I also have added a comment below to suggest inlining the proposed PyClone trait directly into Element (as it seems to me that it's really the only intended use with those specialized methods anyway)

src/dtype.rs Outdated Show resolved Hide resolved
@JRRudy1
Copy link
Contributor Author

JRRudy1 commented Oct 4, 2024

Hey @davidhewitt, thanks for taking a look! I can take care of this tonight/tomorrow depending on how smoothly the rebase goes.

One remaining question I have involves bumping the ndarray dependency to v0.16 as requested by @juntyr, which will likely require minimal, if any, code changes. Do you think that should happen in a separate PR, or could it be included in this one? I'd be inclined to include it and avoid the hoops of another PR, but maybe you would prefer to keep things more atomic.

JRRudy1 and others added 18 commits October 5, 2024 13:34
This is the motivating impl for the changes, since `PyObject` can only be safely cloned while the GIL is held.
This is necessary in order to continue supporting object arrays after the update to PyO3 0.22.0 since `Py<T>` no longer implements `Clone` unless the 'py-clone' feature is enabled (which would be problematic for `numpy` to enable).

This is a breaking change for 2 main reasons:
1. Custom `Element` impls will now need to provide a `PyClone` impl as well, since providing a blanket impl for all `T: Clone` is not possible.
2. A type parameter `T: Element` will no longer have an implicit `Clone` bound, so code relying on it will need to update the bounds to `T: Element + Clone`.
… token and call the `py_clone` method instead of `Clone`.

Updated usages accordingly, which is trivial since the GIL is already held everywhere it is called.
…ad of `Clone` methods.

This change is trivial since the GIL is already held everywhere that the clones occur, with the minor technicality that the GIL is not guaranteed to be held in the definition of the `PyArrayMethods` trait, but only in the actual impl block for `Bound<PyArray`. As such, several method implementations had to be moved to the impl block instead of being default implementations in the trait definition, but this is not a breaking change since the trait is `Sealed` and we control the only impl.
…ttribute.

This is necessary since the `pyo3::pyobject_native_type_base` macro only implements `Debug` with the "gil-refs" feature since the 0.22.0 update. The `Debug` impls are no longer relevant anyway because `&PyArray`, `&PyArrayDescr`, etc. can only be obtained through the gil-refs API's.
Note that the `numpy` version will still need to be bumped in the README (along with Cargo.toml) before released.
…mplementations/usages.

The updates including renaming the `py_clone` method to `clone_ref` to match that of `Py::clone_ref`, and replacing the `impl_py_clone!` macro with `clone_methods_impl!`.
@JRRudy1
Copy link
Contributor Author

JRRudy1 commented Oct 5, 2024

I rebased the commits and made the requested changes. It all checks out locally but I'm sure CI will catch something; can you please approve the workflow when you get a chance?

Awesome, thanks! We also just got the ndarray support done in #439 so a rebase hopefully solves that too!

Oh perfect, I didn't see that PR!

@davidhewitt
Copy link
Member

Looks great, thanks! I also pushed a quick commit to revert some removals of deprecation messages (they will help nudge users off the GIL Refs). I thought that was more efficient than requesting the change and I had just a drop of time.

Let's see how CI goes now...

@juntyr
Copy link

juntyr commented Oct 6, 2024

Thank you all for your work on this PR <3

@davidhewitt
Copy link
Member

CI all green! Coverage could be better but that's also taking a hit because of all the GIL Refs removal duplication. We can worry about that on the way to 0.23.

I'll merge now and try to find time to prep and release in the next couple days.

Thanks everyone who has helped here and with all the other upgrades for this 0.22 release!

@davidhewitt davidhewitt merged commit 505a79c into PyO3:main Oct 6, 2024
46 of 48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.