Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement __array__ for sequence types #615

Merged
merged 15 commits into from Aug 9, 2022

Conversation

IvanIsCoding
Copy link
Collaborator

@IvanIsCoding IvanIsCoding commented May 22, 2022

Fixes #614

Implements an explicit conversion from our custom types to Numpy arrays

This a quick fix to support converting using np.array and np.asarray using the API specified by NumPy. The solution is short and not too hard to maintain, the only burden will be remembering to add py_convert_to_py_array_obj_impl! for each new sequence return type we create.

Tasks:

  • Return 1d array of objects intead of raising not implemented error for some types
  • Handle dtype argument
  • Test things more thoroughly

@coveralls
Copy link

coveralls commented May 22, 2022

Pull Request Test Coverage Report for Build 2521772788

  • 30 of 31 (96.77%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 97.171%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/iterators.rs 30 31 96.77%
Totals Coverage Status
Change from base Build 2494694599: 0.02%
Covered Lines: 12674
Relevant Lines: 13043

💛 - Coveralls

@IvanIsCoding IvanIsCoding changed the title [DRAFT] Implement __array__ for sequence types Implement __array__ for sequence types May 22, 2022
Copy link
Collaborator

@georgios-ts georgios-ts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather see this fixed in pyo3 since numpy can automatically handle nested sequence objects but in general the approach here looks good.

src/iterators.rs Outdated
($($t:ty)*) => ($(
impl PyConvertToPyArray for Vec<$t> {
fn convert_to_pyarray(&self, py: Python) -> PyResult<PyObject> {
Ok(self.clone().to_pyarray(py).into())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any downside of using IntoPyArray which doesn't allocate more memory? https://docs.rs/numpy/latest/numpy/array/struct.PyArray.html#memory-location

Suggested change
Ok(self.clone().to_pyarray(py).into())
Ok(self.clone().into_pyarray(py).into())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it depends on what we want to do here, if we want to return a copy of the inner data structure I would say lets drop the clone() and just call to_pyarray(). That will copy the data into a python/numpy allocated array so numpy will have full read/write on it. If we want the fastest return but at the cost of having some function restricted on the array we should drop the clone() and just do into_pyarray(py). I'm not sure what is more expected from numpy's __array__ I'm thinking probably the to_pyarray() path so it's an array copy of the data. But either way I don't think we need a clone() here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, talking this over with @jakelishman I think we should just do self.into_pyarray(py).into() here and return a numpy view directly into the buffer without copying. If the numpy function requires writing inplace or something they can use the copy() method explicitly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can not drop clone() if we use IntoPyArray since it needs to take ownership of the vector but __array__ receives a ref &self (and pyo3 doesn't allow to receive self and for a good reason since you can not move data from Python).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the C API level of Numpy, you can have an array wrap a raw pointer, and tell Numpy that some other Python-managed object owns the lifetime of that array (so it won't attempt to free the memory). You can also set flags in Numpy to make an array non-writeable. I guess it likely would need unsafe Rust to achieve that from an &self borrow because you can't tell the Rust compiler about Python/Numpy's guarantees, but that behaviour (with appropriate flags set) is sort-of compatible with an &self borrow, if you use the base attribute to tie the lifetime of the array to the lifetime of the underlying Rust object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the underlying trait method into_pyarray (which is part of the IntoPyArray @georgios-ts linked to) is doing exactly that for us. It uses an unsafe block to get the pointer: https://github.com/PyO3/rust-numpy/blob/19bfc9d2d0e72bb3ed30c08244ee81abd02d7386/src/convert.rs#L67

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's missing the lifetime/read-only ties, though, since it takes self not &mut self/&self.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakelishman Yeah, this is indeed possible and rust-numpy provides the unsafe borrow_from_array (which results in a writeable array but you can't resize it) so we can avoid cloning the data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to using into_pyarray

@mtreinish mtreinish added this to the 0.12.0 milestone Aug 9, 2022
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, I agree with @georgios-ts having a fix in PyO3 would be the better path forward here. But having this in place now will avoid us from blocking the 0.12 release on this issue. We can revisit if we want this after the PYO3 issue is fixed and included in a release (I meant to try and tackle it but haven't had the time lately). The only thing I was on the fence about is having a release note to document we natively implement __array__ now on the custom return types. But I think I'm ok without it as I don't think most users will notice vs older releases where having the full sequence protocol implemented implicitly exposed an array form.

@mtreinish mtreinish added the automerge Queue a approved PR for merging label Aug 9, 2022
@coveralls
Copy link

Pull Request Test Coverage Report for Build 2826944969

  • 30 of 31 (96.77%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 97.102%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/iterators.rs 30 31 96.77%
Totals Coverage Status
Change from base Build 2806523038: 0.02%
Covered Lines: 12498
Relevant Lines: 12871

💛 - Coveralls

@mergify mergify bot merged commit c6067ba into Qiskit:main Aug 9, 2022
@IvanIsCoding IvanIsCoding deleted the numpy-array-fix branch August 9, 2022 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Queue a approved PR for merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to create numpy int array from an EdgeList
5 participants