Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grids 4 #1027

Merged
merged 19 commits into from
Feb 27, 2021
Merged

Grids 4 #1027

merged 19 commits into from
Feb 27, 2021

Conversation

pheuer
Copy link
Member

@pheuer pheuer commented Feb 21, 2021

A place for more minor changes to plasmapy.plasma.grids.

Changelog:

  • grid.__getitem__(key) now returns a u.Quantity array instead of an xarray.DataArray object. The units are more apparent this way (not buried in the attrs).

  • Added grid.require_quantities method to make sure that a list of required quantities are present on a grid. @StanczakDominik suggested this on Proton Radiography Module #895 to replace some functionality in the proton radiography __init__ function. This (very minor) refactor is also included in this PR.

  • Fixed bugs in grids.py for non-uniform grids that arose when xarray upgraded to v0.17.0.

The goal of the grids module is for users to not have to be familiar with xarray to use it. Returning a u.Quantity is more useful.
@pheuer pheuer changed the title grid.__getitem__(key) now returns u.Quantity instead of xarray.Dataset Grids 4 Feb 21, 2021
@codecov
Copy link

codecov bot commented Feb 21, 2021

Codecov Report

Merging #1027 (fc82ef4) into master (d4371b5) will increase coverage by 0.04%.
The diff coverage is 99.34%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1027      +/-   ##
==========================================
+ Coverage   96.82%   96.86%   +0.04%     
==========================================
  Files          69       70       +1     
  Lines        6737     6864     +127     
==========================================
+ Hits         6523     6649     +126     
- Misses        214      215       +1     
Impacted Files Coverage Δ
plasmapy/formulary/quantum.py 67.05% <ø> (ø)
plasmapy/particles/nuclear.py 100.00% <ø> (ø)
plasmapy/particles/parsing.py 100.00% <ø> (ø)
plasmapy/particles/particle_collections.py 99.02% <99.02%> (ø)
plasmapy/diagnostics/proton_radiography.py 99.65% <100.00%> (-0.01%) ⬇️
plasmapy/particles/__init__.py 100.00% <100.00%> (ø)
plasmapy/particles/particle_class.py 98.74% <100.00%> (+0.03%) ⬆️
plasmapy/plasma/grids.py 98.37% <100.00%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4371b5...fc82ef4. Read the comment docs.

@pheuer pheuer added this to In progress in HEDP Diagnostics Feb 22, 2021
@StanczakDominik
Copy link
Member

Would you mind if we commandeered this PR to also fix the issue of the new xarray 0.17 breaking our stuff, as found out by #1045?

@pheuer
Copy link
Member Author

pheuer commented Feb 26, 2021

@StanczakDominik Not at all - go for it!

@StanczakDominik
Copy link
Member

StanczakDominik commented Feb 26, 2021

... actually sorry, I should have squash-merged #1045 to master first, then just merged master here T_T my bad! Let's ignore the added changes that show up here for now.

@pheuer
Copy link
Member Author

pheuer commented Feb 26, 2021

I've noticed an unintended consequence of changing grid.__getitem__ from

return self.ds[key]

to

return self.ds[key].data * self.ds[key].attrs["unit"]

Which is that the following code no longer changes the underlying array:

grid['B_x'][0,0,0]  = 1 * u.T

because grid['B_x'] no longer returns a reference to the underlying dataset. Overall this isn't a use case I foresee really wanting to encourage, and you can still accomplish the same goal by just overwriting the quantity with a new array eg.

Bx = grid['B_x']
Bx[0,0,0] = 1 * u.T
grid.add_quantities(B_x=Bx)

But does anyone have thoughts on this? I can try to explain it better in the documentation and/or notebook. Ultimately I think it's still worth it to have grid['B_x'] return a u.Quantity instead of an xarray.DataArray with the units buried.

Is there a way to use __setitem__ that takes both a key AND a slice, eg. for grid['B_x'][0,0,0] passes in both 'B_x' AND 'slice(0,0,0)`?

@pheuer
Copy link
Member Author

pheuer commented Feb 26, 2021

I think I've narrowed (at least one of) the xarray -> 0.17.0 problems down: for some reason adding a quantity to a non-uniform array now seems to delete a number of coordinates axes? This must have to do with some change they've made involving MultiIndex.

print(grid.ds)
grid.add_quantities(x=grid.grids[0])
print(grid.ds)

Results in the following printouts:

<xarray.Dataset>
Dimensions:  (ax: 1000)
Coordinates:
  * ax       (ax) MultiIndex
  - ax0      (ax) float64 -0.799 -0.799 -0.799 -0.799 ... 0.851 0.851 0.851
  - ax1      (ax) float64 -0.7455 -0.7455 -0.7455 -0.7455 ... 0.936 0.936 0.936
  - ax2      (ax) float64 -0.4083 -0.3509 -0.1682 ... 0.4617 0.6889 0.7463
Data variables:
    *empty*
Attributes:
    axis_units:  [Unit("cm"), Unit("cm"), Unit("cm")]



<xarray.Dataset>
Dimensions:  (ax: 1000)
Coordinates:
  * ax       (ax) object (-0.7989620018508881, -0.745516486055434, -0.4083180...
Data variables:
    x        (ax) float64 -0.799 -0.799 -0.799 -0.799 ... 0.851 0.851 0.851
Attributes:
    axis_units:  [Unit("cm"), Unit("cm"), Unit("cm")]

Notice ax0, ax1, and ax2 have disappeared and ax is no longer a MultiIndex.

EDIT

In case anyone is interested, the solution was that I needed to manually add the coordinate array to the new DataArray before putting it into the Dataset:

dims = ["ax"]
coords = {"ax": self.ds.coords["ax"]}

data = xr.DataArray(
                quantity, dims=dims, coords=coords, attrs={"unit": quantity.unit}
            )
self.ds[key] = data

Previously I was just specifying dims=['ax'] and thinking it got the coordinates from the dataset, but I guess what was happening is that it thought the DataArray had NO coordinates and then removed them from the DataSet when the DataArray was added.

@pheuer
Copy link
Member Author

pheuer commented Feb 26, 2021

@StanczakDominik I believe I've resolved the xarray v0.17.0 issue: let me know what you think.

I do want to resolve the __getitem__ issue (described a few comments up) before we consider merging this PR.

@StanczakDominik
Copy link
Member

grid['B_x'] no longer returns a reference to the underlying dataset. Overall this isn't a use case I foresee really wanting to encourage, and you can still accomplish the same goal by just overwriting the quantity with a new array eg.

Bx = grid['B_x']
Bx[0,0,0] = 1 * u.T
grid.add_quantities(B_x=Bx)

But does anyone have thoughts on this? I can try to explain it better in the documentation and/or notebook. Ultimately I think it's still worth it to have grid['B_x'] return a u.Quantity instead of an xarray.DataArray with the units buried.

Is there a way to use __setitem__ that takes both a key AND a slice, eg. for grid['B_x'][0,0,0] passes in both 'B_x' AND 'slice(0,0,0)`?

Oh bloody hell... This is exactly what got me one of the previous times when I tried integrating xarray with units for our use cases. No, I don't think there's any way of dodging that just yet. There may be some improvements coming in the future;

But no, for now AFAIK there's no way to do that cleanly UNLESSSSSSSSSSSSSSSSSSSSSSS I just had an idea that might be a hack but just might work. We currently do this:

> return self.ds[key].data * self.ds[key].attrs["unit"]

But u.Quantity has a copy argument:

copy : bool, optional
    If `True` (default), then the value is copied.  Otherwise, a copy will
    only be made if ``__array__`` returns a copy, if value is a nested
    sequence, or if a copy is needed to satisfy an explicitly given
    ``dtype``.  (The `False` option is intended mostly for internal use,
    to speed up initialization where a copy is known to have been made.
    Use with care.)

so, suppose we did some variation of this:

> return u.Quantity(self.ds[key].data, self.ds[key].attrs["unit"], copy = False)

... the more I think about it the more I seem to remember experimenting with that and failing.

We definitely need a @pytest.mark.xfail test with this functionality, at the very least, and we need to document the heck out of it.


As for the MultiIndex, I don't see anything all that relevant in their changelog. We might want to report back to them with an MVE... If you could share a script that reproduces this bug, I'd step through with pudb and try to make one.

@pheuer
Copy link
Member Author

pheuer commented Feb 27, 2021

Fingers crossed, but I think your solution works!

def __getitem__(self, key):
        """
        Given a key, return the corresponding array as an `astropy.Quantity`

        Returning with copy=False means that the array returned is a direct
        reference to the underlying DataArray, so changes made will be reflected
        in the underlying DataArray.
        """
        return  u.Quantity(self.ds[key].data, self.ds[key].attrs["unit"], copy=False)

Then the following test now passes!

grid['B_x'][0,0,0] = 21 * u.T
assert grid['B_x'][0,0,0] == 21 * u.T

I loved the Hamilton reference, btw ;)

As for MultiIndex, I will create a minimal example. However, I'm not sure it's really a bug (they may have always intended this to be required and the fact that it worked previously was the bug...)

@pheuer
Copy link
Member Author

pheuer commented Feb 27, 2021

Here is the minimal example: I've verified that the two examples run differently on xarray==0.17.0 but the same on xarray==0.16.2.

2021_2_27 xarray dataset MultiIndex bug.txt

@pheuer pheuer marked this pull request as ready for review February 27, 2021 16:08
Copy link
Member

@StanczakDominik StanczakDominik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Awesome that you got the quantity no copy thing working, I have a couple of places i know I want to use it already :)

@StanczakDominik StanczakDominik merged commit af7b9d3 into PlasmaPy:master Feb 27, 2021
@pheuer pheuer moved this from In progress to Done in HEDP Diagnostics Mar 1, 2021
@StanczakDominik StanczakDominik added this to the 0.6.0 milestone Mar 12, 2021
rocco8773 added a commit to rocco8773/PlasmaPy that referenced this pull request Mar 13, 2021
@namurphy namurphy added the plasmapy.plasma Related to the plasmapy.plasma subpackage label May 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plasmapy.plasma Related to the plasmapy.plasma subpackage
Projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants