-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenPMD HDF5Reader Plasma subclass #500
Conversation
Hello @ritiek! Thanks for updating your pull request. Congratulations! There are no PEP8 issues in this pull request. 😸 Comment last updated on June 28, 2018 at 07:32 Hours UTC |
And happy #500! |
I am going to compare h5py and openPMD-api soon, so we can choose whichever one seems better for our purpose. |
OpenPMD API looks something like this: >>> import openPMD
>>> series = openPMD.Series('plasmapy/classes/tests/data/data00000255.h5',
openPMD.Access_Type.read_only)
>>> items = list(series.iterations.items())
>>> items
[(255, <openPMD.Iteration of at t = '0.000000 s'>)]
>>> iteration, data = items[0]
>>> meshes = data.meshes.items()
>>> particles = data.particles.items()
>>> list(meshes)
[('E', <openPMD.Mesh>), ('rho', <openPMD.Mesh>)]
>>> list(particles)
[('Hydrogen1+', <openPMD.ParticleSpecies>),
('electrons', <openPMD.ParticleSpecies>)] I personally think we should just stick with I found we could also do this with h5py to navigate through multiple levels easily: >>> h5 = h5py.File('plasmapy/classes/tests/data/data00000255.h5')
>>> h5.get('data/255/fields/E/x')
<HDF5 dataset "x": shape (51, 201), type "<f8"> What do you guys think? |
Well, I guess we could start out with h5py for now. As long as we're using just hdf5 files, we don't really get advantages from the API right now besides potentially being nice to the guys at OpenPMD and testing their API out for them. But that's obviously a lower priority than getting this working! I'm fine with skipping that for now. |
By the way, I'm absolutely not sure we should default to OpenPMD on a "hdf5" keyword argument. I think there are two solutions:
|
Ok, let's put our stakes on h5py then.
Maybe we could have both of them. I checked it out and OpenPMD does seem to set standards to check if a dataset is based on OpenPMD format User could force HDF5 to read as OpenPMD by passing |
Sounds reasonable 👍 |
924c965
to
2c61119
Compare
# convinced otherwise if another way makes more sense | ||
|
||
def test_x(openPMD2DPlasma): | ||
assert openPMD2DPlasma.x.shape == (51, 201) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is whose position coordinates we are talking about? If this is for particles like electrons, etc., it would make more sense to have something like openPMD2DPlasma.electrons.x
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely would! I thought I mentioned it, but guess not. OpenPMD lets you attach discrete particle data to your mesh data, but I think it would be fine to skip the particle
s for now and just handle the mesh data well. Adding particles would mean we'd have to rethink our species
class... which we could, of course, do.
Also, I now have doubts about the fact that OpenPMD lets you define separate coordinates for each field (e.g. you can have electric and magnetic fields defined at a spatial offset to each other, like for Yee grids )... we'll have to think more about how to handle this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be fine to skip the particles for now
yes, fully optional. mesh-only works as well.
[o]penPMD lets you define separate coordinates for each field
Just in case this is relevant for you. Otherwise just assign them to zero for now if it's irrelevant for your use case (e.g. node-centered).
|
||
def test_has_charge_density_with_units(openPMD2DPlasma): | ||
assert openPMD2DPlasma.charge_density.to(u.C/u.m**3) # unless it's some | ||
# 2D charge density |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to check units this way:
>>> import astropy.units as u
>>> import h5py
>>> h5 = h5py.File('data00000255.h5')
>>> path = 'data/255/fields/rho'
>>> grid_size = len(h5[path].attrs['axisLabels'])
>>> units = u.C / u.m**grid_size
Would this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there's a way that synergizes with what OpenPMD provides for us way more nicely:
In [4]: dict(h5[path].attrs)['unitDimension']
Out[4]: array([-3., 0., 1., 1., 0., 0., 0.])
And then we can interpret that based on documentation for the standard as
powers of the 7 base measures characterizing the record's unit in SI (length L, mass M, time T, electric current I, thermodynamic temperature theta, amount of substance N, luminous intensity J)
so we'll have length^-3 (1/volume, reasonable), no mass, time ^1 and current ^1 gets you electric charge. We can then multiply each of these by its appropriate astropy.unit
and we'll get the complete Quantity for that field! I really liked that system while reading the docs. I guess I was too hyped up to note that down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is really neat!
def test_correct_shape_electric_field(openPMD2DPlasma): | ||
assert openPMD2DPlasma.electric_field.shape == (3, 51, 201) | ||
# IIRC this is how we defined it in the old Plasma class but I can be | ||
# convinced otherwise if another way makes more sense |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about something like this?
assert openPMD2DPlasma.electric_field.x.shape == (51, 201)
assert openPMD2DPlasma.electric_field.y.shape == (51, 201)
assert openPMD2DPlasma.electric_field.z.shape == (51, 201)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with either of them though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might make sense, but it also might make using the electric field harder in simulations if you can't just access it as one big 3D array... but there may be ways around that. I'm not sure what the optimal choice is here. I think this is a good point to talk about at tomorrow's telecon, if we're all up for that.
For reading a
For writing I would just auto-annotate all |
94a8a2d
to
dee8828
Compare
Ok, as of now we can handle both 2D and 3D HDF5 datasets for mesh (electric field and electric density) data but not particle data. What do we want to do with particle data? Should we just implement them the way we did with mesh data? |
Also, should we ship the example OpenPMD HDF5 datasets that we are using for our tests with our main codebase? Maybe we could have them downloaded the moment a user runs those tests but I don't know how hard that would be practically. |
Oh yeah, downloading the test data is a sick idea that I completely forgot about! Maybe we could use that --online flag to setup.py test and emulate Astro/sunpy functionality. By the way, we'll have to keep in mind to rebase and remove the files from these commits before we merge this PR into master. I can do that without a problem. |
Good idea. We need to upload
Yep, once we are ready to merge this, we're going to have to take care of this. |
We need to replace the URLs in |
There is one problem at the moment that test data would need be to downloaded every single time the tests are run. We could workaround this by creating a module similar to https://github.com/sunpy/sunpy/blob/master/sunpy/data/sample.py and then data would need downloading only once? For example, SunPy downloads all the data at once and stores it somewhere for future use when a code involving external datasets is first run. >>> import sunpy.map
>>> import sunpy.data.sample
>>> mymap = sunpy.map.Map(sunpy.data.sample.AIA_171_IMAGE) |
Having thought about this for a bit, I think we're fine putting this sample data directly in the repository. These datasets are not really large, and SunPy appears to include a bit of its sample data right in the repo. I thus think we're fine not having to set up AWS or other storage for now! We definitely need to reference the source for it (which I probably forgot to do...), and I think it may be good to store it in |
Just a note: I am not 100% sure we already compressed the checked-in files since we use them under git lfs. Please try to use |
f0c2750
to
1361cec
Compare
Codecov Report
@@ Coverage Diff @@
## master #500 +/- ##
==========================================
- Coverage 97.71% 97.58% -0.14%
==========================================
Files 41 44 +3
Lines 3590 3679 +89
==========================================
+ Hits 3508 3590 +82
- Misses 82 89 +7
Continue to review full report at Codecov.
|
@ax3l Looks like they were indeed not compressed. Total size for (those two) HDF5 files reduced from about 9 MB to 6 MB. Thanks! @StanczakDominik @ax3l We already mention from where we downloaded the datasets (here), is there anything else we need to do? |
aa8393f
to
a404ddb
Compare
That looks perfect, you also mention the license CC-0 so I think that's it :-) |
Sorry for the hijack of the thread: I would like to mention PlasmaPy with the PlasmaPy logo in upcoming conference talks/posters as part of the "openPMD community". Is that fine with you? I won't go into details in those but I think it's cool to mention that you are in the process of adding openPMD support. We also should not forget to add you here so people find you: https://github.com/openPMD/openPMD-projects |
a404ddb
to
50cb17c
Compare
@ritiek awesome, looking into the changes and I hope I can merge this in today! @ax3l by all means, go right ahead. Pick a version you like from the logo source repo! I'd love to see the poster once you're done there. Also, if you want to not hijack issues in the future, check out our Matrix chatroom or our mailing list! :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, this looks nice! I think there's a few issues that could be addressed with a follow up PR, but I'm happy to merge it as a prototype. :)
Parameters | ||
---------- | ||
hdf5 : `str` | ||
Path to HDF5 file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this could use a References
link to the site: http://openpmd.org/
electric_field : `astropy.units.Quantity` | ||
An (x, y, z) array containing electric field data. | ||
charge_density : `astropy.units.Quantity` | ||
An array containing charge density data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... my bad here, but I guess I kind of forgot that the 2D and 3D examples were showing off electrostatic + particle simulations. If you look at the thetaMode example, you will find two more mesh quantities:
(base) 20:04:58 dominik@dell: ~/Code/openPMD/openPMD-example-datasets/example-thetaMode/hdf5 $ h5ls -r data00000100.h5
/ Group
/data Group
/data/100 Group
/data/100/fields Group
/data/100/fields/B Group
/data/100/fields/B/r Dataset {3, 51, 201}
/data/100/fields/B/t Dataset {3, 51, 201}
/data/100/fields/B/z Dataset {3, 51, 201}
/data/100/fields/E Group
/data/100/fields/E/r Dataset {3, 51, 201}
/data/100/fields/E/t Dataset {3, 51, 201}
/data/100/fields/E/z Dataset {3, 51, 201}
/data/100/fields/J Group
/data/100/fields/J/r Dataset {3, 51, 201}
/data/100/fields/J/t Dataset {3, 51, 201}
/data/100/fields/J/z Dataset {3, 51, 201}
/data/100/fields/rho Dataset {3, 51, 201}
B being the magnetic field and J the electric current, which can be neglected in electrostatic simulations. I think we should probably handle those as well - electrostatics are a fraction of all possible simulations. Do you think it would be much of a hassle to add them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few tests for any of those datasets would also be very nice, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I'll work on them today.
units = _fetch_units(self.h5[path].attrs["unitDimension"]) | ||
return np.array((self.h5[path]['x'], | ||
self.h5[path]['y'], | ||
self.h5[path]['z'])) * units |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll definitely have to look at how np.array
here is going to work... but it's nice enough for the prototype!
assert self.h5.electric_field.to(u.V / u.m) | ||
|
||
def test_correct_shape_electric_field(self): | ||
assert self.h5.electric_field.shape == (3, 26, 26, 201) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might mess around with this one so that the interface handles better in a soon-to-be-made PR.
plasmapy/utils/exceptions.py
Outdated
@@ -78,6 +78,11 @@ class InvalidParticleError(AtomicError): | |||
pass | |||
|
|||
|
|||
class OpenPMDError(PlasmaPyError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe DataStandardError
? OpenPMDError
feels quite specific.
cf52393
to
6dd3ce4
Compare
6dd3ce4
to
e3d0664
Compare
I've pushed the suggested changes. For the failing coverage, its because we don't test for exceptions. We would have to include more data which in my opinion isn't worth it. Otherwise, this should be ready to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Gonna merge it now. I'd say the same thing about testing for exceptions. Thanks, @ritiek!
Also see #499. This PR adds the ability to read HDF5 files based on OpenPMD standard when passing a keyword argument
hdf5
.The initial idea is to set appropriate properties for dealing with 2D datasets on the subclass. Once it works out nicely, we can go ahead having a similar implementation for working with 3D datasets.
EDIT: We ended up adding support for all types available in example datasets. 2D, 3D and even thetaMode. :D