Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update py03 from 0.20 to 0.21 #5566

Merged
merged 10 commits into from
Apr 5, 2024
Merged

Conversation

Jefffrey
Copy link
Contributor

@Jefffrey Jefffrey commented Mar 29, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Per guidance from pyo3 migration guide from 0.20 to 0.21:

https://pyo3.rs/v0.21.0/migration.html

Change usages of &PyAny to Bound<PyAny> and adjust the public trait FromPyArrow accordingly with a new function that accepts a bound.

Are there any user-facing changes?

Yes

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 29, 2024
let pyarrow = PyModule::import(value.py(), "pyarrow")?;
let class = pyarrow.getattr(expected)?;
let pyarrow = PyModule::import_bound(value.py(), "pyarrow")?;
let class = pyarrow.getattr(expected)?.into_gil_ref(); // TODO
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with pyo3, so I was following the migration guide: https://pyo3.rs/v0.21.0/migration.html

Of particular note is:

PyO3 0.21 introduces a new Bound<'py, T> smart pointer which replaces the existing "GIL Refs" API to interact with Python objects. For example, in PyO3 0.20 the reference &'py PyAny would be used to interact with Python objects. In PyO3 0.21 the updated type is Bound<'py, PyAny>.

So it seems they are moving away from &PyAny is my understanding. This raises a question of if the PyArrow API's should follow suit?

e.g. here, introduce a new function that uses Bound?

fn from_pyarrow(value: &PyAny) -> PyResult<Self> {

For now I've used into_gil_ref() to get around this deprecation process, but I don't think it's meant to be a permanent solution.

I'll do some more reading, maybe it's been discussed in pyo3 before somewhere 🤔

@Jefffrey Jefffrey marked this pull request as ready for review March 29, 2024 03:27
@tustvold
Copy link
Contributor

Perhaps @wjones127 or @kylebarron might be able to advise on this

@kylebarron
Copy link
Contributor

I haven't tried to upgrade my projects to 0.21 yet, but yes

So it seems they are moving away from &PyAny is my understanding.

that's my understanding as well. I'd say that the PyArrow API should probably follow suit. Maybe while upgrading to 0.21 is the best opportunity?

@Jefffrey Jefffrey marked this pull request as draft March 31, 2024 06:06
@abhiaagarwal
Copy link
Contributor

Thanks for this PR :)

Fyi, I pulled in this branch to update delta-rs (see delta-io/delta-rs#2363). When gil-refs is enabled it compiles successfully, but when gil-refs is disabled I get a traits bounds error.

error[E0277]: the trait bound `&str: PyClass` is not satisfied
    --> /Users/abhiagarwal/.cargo/git/checkouts/arrow-rs-d4122f2d122601f4/b1f5c49/arrow/src/pyarrow.rs:114:47
     |
114  |         let expected_module = expected_module.extract::<&str>()?;
     |                                               ^^^^^^^ the trait `PyClass` is not implemented for `&str`, which is required by `&str: FromPyObjectBound<'_, '_>`
     |
     = help: the following other types implement trait `FromPyObjectBound<'a, 'py>`:
               Cow<'a, [u8]>
               Cow<'a, str>
               &'a [u8]
     = note: required for `&str` to implement `pyo3::FromPyObject<'_>`
     = note: required for `&str` to implement `FromPyObjectBound<'_, '_>`
note: required by a bound in `pyo3::prelude::PyAnyMethods::extract`
    --> /Users/abhiagarwal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.0/src/types/any.rs:1649:12
     |
1647 |     fn extract<'a, T>(&'a self) -> PyResult<T>
     |        ------- required by a bound in this associated function
1648 |     where
1649 |         T: FromPyObjectBound<'a, 'py>;
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `PyAnyMethods::extract`

error[E0277]: the trait bound `&str: PyClass` is not satisfied
    --> /Users/abhiagarwal/.cargo/git/checkouts/arrow-rs-d4122f2d122601f4/b1f5c49/arrow/src/pyarrow.rs:116:43
     |
116  |         let expected_name = expected_name.extract::<&str>()?;
     |                                           ^^^^^^^ the trait `PyClass` is not implemented for `&str`, which is required by `&str: FromPyObjectBound<'_, '_>`
     |
     = help: the following other types implement trait `FromPyObjectBound<'a, 'py>`:
               Cow<'a, [u8]>
               Cow<'a, str>
               &'a [u8]
     = note: required for `&str` to implement `pyo3::FromPyObject<'_>`
     = note: required for `&str` to implement `FromPyObjectBound<'_, '_>`
note: required by a bound in `pyo3::prelude::PyAnyMethods::extract`
    --> /Users/abhiagarwal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.0/src/types/any.rs:1649:12
     |
1647 |     fn extract<'a, T>(&'a self) -> PyResult<T>
     |        ------- required by a bound in this associated function
1648 |     where
1649 |         T: FromPyObjectBound<'a, 'py>;
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `PyAnyMethods::extract`

error[E0277]: the trait bound `&str: PyClass` is not satisfied
    --> /Users/abhiagarwal/.cargo/git/checkouts/arrow-rs-d4122f2d122601f4/b1f5c49/arrow/src/pyarrow.rs:119:41
     |
119  |         let found_module = found_module.extract::<&str>()?;
     |                                         ^^^^^^^ the trait `PyClass` is not implemented for `&str`, which is required by `&str: FromPyObjectBound<'_, '_>`
     |
     = help: the following other types implement trait `FromPyObjectBound<'a, 'py>`:
               Cow<'a, [u8]>
               Cow<'a, str>
               &'a [u8]
     = note: required for `&str` to implement `pyo3::FromPyObject<'_>`
     = note: required for `&str` to implement `FromPyObjectBound<'_, '_>`
note: required by a bound in `pyo3::prelude::PyAnyMethods::extract`
    --> /Users/abhiagarwal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.0/src/types/any.rs:1649:12
     |
1647 |     fn extract<'a, T>(&'a self) -> PyResult<T>
     |        ------- required by a bound in this associated function
1648 |     where
1649 |         T: FromPyObjectBound<'a, 'py>;
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `PyAnyMethods::extract`

error[E0277]: the trait bound `&str: PyClass` is not satisfied
    --> /Users/abhiagarwal/.cargo/git/checkouts/arrow-rs-d4122f2d122601f4/b1f5c49/arrow/src/pyarrow.rs:121:37
     |
121  |         let found_name = found_name.extract::<&str>()?;
     |                                     ^^^^^^^ the trait `PyClass` is not implemented for `&str`, which is required by `&str: FromPyObjectBound<'_, '_>`
     |
     = help: the following other types implement trait `FromPyObjectBound<'a, 'py>`:
               Cow<'a, [u8]>
               Cow<'a, str>
               &'a [u8]
     = note: required for `&str` to implement `pyo3::FromPyObject<'_>`
     = note: required for `&str` to implement `FromPyObjectBound<'_, '_>`
note: required by a bound in `pyo3::prelude::PyAnyMethods::extract`
    --> /Users/abhiagarwal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.0/src/types/any.rs:1649:12
     |
1647 |     fn extract<'a, T>(&'a self) -> PyResult<T>
     |        ------- required by a bound in this associated function
1648 |     where
1649 |         T: FromPyObjectBound<'a, 'py>;
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `PyAnyMethods::extract`

The FromPyObject trait gets disabled without gil-refs, leading to the error. (source) It should be able to be fixed by changing all the extract to extract_bound per the migration guide.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @Jefffrey -- this code looks good to me, though it might be nice to get someone else more versed with python to give it another check (maybe @westonpace or @wjones127 ?)

As for the report from @abhiaagarwal (thanks for that ❤️ )

The FromPyObject trait gets disabled without gil-refs, leading to the error. (source) It should be able to be fixed by changing all the extract to extract_bound per the migration guide.

I think we could potentially fix it in a follow on PR as well (or we could fix it here too)

Ideally, we should also add a CI check/test for running without gil-refs as well

@abhiaagarwal
Copy link
Contributor

This PR doesn't enable gil-refs, so I'm not sure why the compile error wasn't caught in CI. Maybe there isn't any coverage for the pyarrow integration in the main arrow-rs package?

@Jefffrey
Copy link
Contributor Author

Jefffrey commented Apr 1, 2024

Thanks for this PR :)

Fyi, I pulled in this branch to update delta-rs (see delta-io/delta-rs#2363). When gil-refs is enabled it compiles successfully, but when gil-refs is disabled I get a traits bounds error.

error[E0277]: the trait bound `&str: PyClass` is not satisfied
    --> /Users/abhiagarwal/.cargo/git/checkouts/arrow-rs-d4122f2d122601f4/b1f5c49/arrow/src/pyarrow.rs:114:47
     |
114  |         let expected_module = expected_module.extract::<&str>()?;
     |                                               ^^^^^^^ the trait `PyClass` is not implemented for `&str`, which is required by `&str: FromPyObjectBound<'_, '_>`
     |
     = help: the following other types implement trait `FromPyObjectBound<'a, 'py>`:
               Cow<'a, [u8]>
               Cow<'a, str>
               &'a [u8]
     = note: required for `&str` to implement `pyo3::FromPyObject<'_>`
     = note: required for `&str` to implement `FromPyObjectBound<'_, '_>`
note: required by a bound in `pyo3::prelude::PyAnyMethods::extract`
    --> /Users/abhiagarwal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.0/src/types/any.rs:1649:12
     |
1647 |     fn extract<'a, T>(&'a self) -> PyResult<T>
     |        ------- required by a bound in this associated function
1648 |     where
1649 |         T: FromPyObjectBound<'a, 'py>;
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `PyAnyMethods::extract`

error[E0277]: the trait bound `&str: PyClass` is not satisfied
    --> /Users/abhiagarwal/.cargo/git/checkouts/arrow-rs-d4122f2d122601f4/b1f5c49/arrow/src/pyarrow.rs:116:43
     |
116  |         let expected_name = expected_name.extract::<&str>()?;
     |                                           ^^^^^^^ the trait `PyClass` is not implemented for `&str`, which is required by `&str: FromPyObjectBound<'_, '_>`
     |
     = help: the following other types implement trait `FromPyObjectBound<'a, 'py>`:
               Cow<'a, [u8]>
               Cow<'a, str>
               &'a [u8]
     = note: required for `&str` to implement `pyo3::FromPyObject<'_>`
     = note: required for `&str` to implement `FromPyObjectBound<'_, '_>`
note: required by a bound in `pyo3::prelude::PyAnyMethods::extract`
    --> /Users/abhiagarwal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.0/src/types/any.rs:1649:12
     |
1647 |     fn extract<'a, T>(&'a self) -> PyResult<T>
     |        ------- required by a bound in this associated function
1648 |     where
1649 |         T: FromPyObjectBound<'a, 'py>;
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `PyAnyMethods::extract`

error[E0277]: the trait bound `&str: PyClass` is not satisfied
    --> /Users/abhiagarwal/.cargo/git/checkouts/arrow-rs-d4122f2d122601f4/b1f5c49/arrow/src/pyarrow.rs:119:41
     |
119  |         let found_module = found_module.extract::<&str>()?;
     |                                         ^^^^^^^ the trait `PyClass` is not implemented for `&str`, which is required by `&str: FromPyObjectBound<'_, '_>`
     |
     = help: the following other types implement trait `FromPyObjectBound<'a, 'py>`:
               Cow<'a, [u8]>
               Cow<'a, str>
               &'a [u8]
     = note: required for `&str` to implement `pyo3::FromPyObject<'_>`
     = note: required for `&str` to implement `FromPyObjectBound<'_, '_>`
note: required by a bound in `pyo3::prelude::PyAnyMethods::extract`
    --> /Users/abhiagarwal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.0/src/types/any.rs:1649:12
     |
1647 |     fn extract<'a, T>(&'a self) -> PyResult<T>
     |        ------- required by a bound in this associated function
1648 |     where
1649 |         T: FromPyObjectBound<'a, 'py>;
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `PyAnyMethods::extract`

error[E0277]: the trait bound `&str: PyClass` is not satisfied
    --> /Users/abhiagarwal/.cargo/git/checkouts/arrow-rs-d4122f2d122601f4/b1f5c49/arrow/src/pyarrow.rs:121:37
     |
121  |         let found_name = found_name.extract::<&str>()?;
     |                                     ^^^^^^^ the trait `PyClass` is not implemented for `&str`, which is required by `&str: FromPyObjectBound<'_, '_>`
     |
     = help: the following other types implement trait `FromPyObjectBound<'a, 'py>`:
               Cow<'a, [u8]>
               Cow<'a, str>
               &'a [u8]
     = note: required for `&str` to implement `pyo3::FromPyObject<'_>`
     = note: required for `&str` to implement `FromPyObjectBound<'_, '_>`
note: required by a bound in `pyo3::prelude::PyAnyMethods::extract`
    --> /Users/abhiagarwal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.21.0/src/types/any.rs:1649:12
     |
1647 |     fn extract<'a, T>(&'a self) -> PyResult<T>
     |        ------- required by a bound in this associated function
1648 |     where
1649 |         T: FromPyObjectBound<'a, 'py>;
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `PyAnyMethods::extract`

The FromPyObject trait gets disabled without gil-refs, leading to the error. (source) It should be able to be fixed by changing all the extract to extract_bound per the migration guide.

Hmm, I'm a bit confused by this, as we don't have the gil-refs feature enabled and I don't think it's a default feature enabled by pyo3 (even if it is, we disable the default-features). And the affected code seems to fall in line with what the migration guide suggests: https://pyo3.rs/v0.21.0/migration.html#deactivating-the-gil-refs-feature

Am I missing something?

Comment on lines 112 to 121
if !value.is_instance(&class)? {
let expected_module = class.getattr("__module__")?;
let expected_module = expected_module.extract::<&str>()?;
let expected_name = class.getattr("__name__")?;
let expected_name = expected_name.extract::<&str>()?;
let found_class = value.get_type();
let found_module = found_class.getattr("__module__")?.extract::<&str>()?;
let found_name = found_class.getattr("__name__")?.extract::<&str>()?;
let found_module = found_class.getattr("__module__")?;
let found_module = found_module.extract::<&str>()?;
let found_name = found_class.getattr("__name__")?;
let found_name = found_name.extract::<&str>()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jefffrey these extracts should be changed to extract_bound. extract only exists with gil-refs enabled. I have no clue why the extract version is passing CI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, a bit of digging, and a public extract actually does still exist, but it's been redirected to extract_bound. https://github.com/PyO3/pyo3/blob/c1f11fb4bddc836f820e0db3db514b4742b8a3e7/src/types/any.rs#L805

I have no clue what's causing my build to freak out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also breaks for me when gil-refs is not enabled. From this issue it appears that &str lost FromPyObject:

Finally, without the gil-refs feature we remove the existing implementation of FromPyObject for &str and instead implement this new trait. This works except for abi3 on old Python versions, where we have to settle for PyFunctionArgument only and allow users of .extract() to break.

This does implement some very targeted breakage for specific considerations, but I wonder if this is necessary as the correct way to users off the GIL Ref API in a case where otherwise it's near impossible for them to realise this uses it.

The reason it is not caught by CI is because CI does not have any abi... feature enabled.

@Jefffrey
Copy link
Contributor Author

Jefffrey commented Apr 1, 2024

I think maybe wait for @abhiaagarwal to resolve their issue first to prove downstream consumers shouldn't be broken by this change? Just in case something was done wrong with this migration 🤔

@abhiaagarwal
Copy link
Contributor

worst case, delta-rs can keep gil-refs. if it ends up being an issue with arrow itself, I can try and fix the issue in a separate PR after merge. the feature will exist for awhile!

@westonpace
Copy link
Member

I'll try and update pylance to pyo3 0.21 using this branch tomorrow.

@kylebarron kylebarron mentioned this pull request Apr 2, 2024
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can wait until a future PR but I think this will need to get fixed upstream at some point. I'm not sure there is any workaround we do other than use gil-refs. Changing the pyo3 dependency (in the integration test) like this:

pyo3 = { version = "0.21", features = ["extension-module", "abi3-py38"] }

will allow the issue to be reproduced.

Comment on lines 112 to 121
if !value.is_instance(&class)? {
let expected_module = class.getattr("__module__")?;
let expected_module = expected_module.extract::<&str>()?;
let expected_name = class.getattr("__name__")?;
let expected_name = expected_name.extract::<&str>()?;
let found_class = value.get_type();
let found_module = found_class.getattr("__module__")?.extract::<&str>()?;
let found_name = found_class.getattr("__name__")?.extract::<&str>()?;
let found_module = found_class.getattr("__module__")?;
let found_module = found_module.extract::<&str>()?;
let found_name = found_class.getattr("__name__")?;
let found_name = found_name.extract::<&str>()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also breaks for me when gil-refs is not enabled. From this issue it appears that &str lost FromPyObject:

Finally, without the gil-refs feature we remove the existing implementation of FromPyObject for &str and instead implement this new trait. This works except for abi3 on old Python versions, where we have to settle for PyFunctionArgument only and allow users of .extract() to break.

This does implement some very targeted breakage for specific considerations, but I wonder if this is necessary as the correct way to users off the GIL Ref API in a case where otherwise it's near impossible for them to realise this uses it.

The reason it is not caught by CI is because CI does not have any abi... feature enabled.

@Jefffrey
Copy link
Contributor Author

Jefffrey commented Apr 2, 2024

We can wait until a future PR but I think this will need to get fixed upstream at some point. I'm not sure there is any workaround we do other than use gil-refs. Changing the pyo3 dependency (in the integration test) like this:

pyo3 = { version = "0.21", features = ["extension-module", "abi3-py38"] }

will allow the issue to be reproduced.

Thanks for checking this 👍

I'll add the gil-refs feature as part of this PR for now to mitigate breaking downstream consumers

Edit: actually, I changed it to extract a PyBackedStr instead of just &str (which was also suggested in the migration guide: https://pyo3.rs/main/migration#deactivating-the-gil-refs-feature)

This seems to work the same, and I test with enabling abi3-py38 in arrow-pyarrow-integration-testing and looks like things work fine now, without needing gil-refs enabled.

(Need 0.21.1 as std::fmt::Display was only added for PyBackedStr in that version, 0.21.0 won't work)

@abhiaagarwal
Copy link
Contributor

Thanks @Jefffrey! I'll try a build by the end of the day with the new commit here and see if we can build.

@Jefffrey Jefffrey merged commit 1b0ef02 into apache:master Apr 5, 2024
23 checks passed
@Jefffrey Jefffrey deleted the update-py03-0.21 branch April 5, 2024 09:13
@Jefffrey
Copy link
Contributor Author

Jefffrey commented Apr 5, 2024

Feel free to raise a new issue if you still encounter problems with this @abhiaagarwal @westonpace 👍

@abhiaagarwal
Copy link
Contributor

will do, apologies, I didn't get a chance to run a new build :) I'll hopefully get to it soon. Though I'll wait on DataFusion to update its dependency as well.

@Jefffrey
Copy link
Contributor Author

Jefffrey commented Apr 6, 2024

will do, apologies, I didn't get a chance to run a new build :) I'll hopefully get to it soon. Though I'll wait on DataFusion to update its dependency as well.

Just a note is you might be waiting a while as DataFusion will need to wait for arrow-rs to release a new version before it can bump it's own, I think?

Feel free to chip in on discussion of release cadence for arrow-rs here #5368

@tustvold
Copy link
Contributor

Can someone confirm that this isn't a breaking API change to the arrow APIs?

@Jefffrey
Copy link
Contributor Author

Can someone confirm that this isn't a breaking API change to the arrow APIs?

I think it may be classified as a breaking change in that it'll require consumers to upgrade their pyo3 version, but in terms of API itself we preserve the old API (and just deprecate it)

Michael-J-Ward added a commit to Michael-J-Ward/dora that referenced this pull request Apr 16, 2024
Arrow needs to be upgraded alongside pyo3 because they both link to python.

Arrow's upgrade to pyo3 is in master but not yet released as of v51.
apache/arrow-rs#5566

NOTE: Arrow marked `ffi::from_ffi` unsafe.
apache/arrow-rs#5080
Michael-J-Ward added a commit to Michael-J-Ward/dora that referenced this pull request Apr 18, 2024
Arrow needs to be upgraded alongside pyo3 because they both link to python.

Arrow's upgrade to pyo3 is in master but not yet released as of v51.
apache/arrow-rs#5566

NOTE: Arrow marked `ffi::from_ffi` unsafe.
apache/arrow-rs#5080
@tustvold tustvold added the api-change Changes to the arrow API label Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants