OpenPMD HDF5Reader Plasma subclass #500

Merged: 14 commits merged into PlasmaPy:master on Jun 28, 2018

Conversation

5 participants
@ritiek
Contributor

ritiek commented Jun 17, 2018

Also see #499. This PR adds the ability to read HDF5 files based on the openPMD standard when passing the keyword argument hdf5.

>>> plasmapy.classes.Plasma(hdf5='plasmapy/classes/tests/data/data00000255.h5')

The initial idea is to set appropriate properties for dealing with 2D datasets on the subclass. Once that works out nicely, we can go ahead with a similar implementation for 3D datasets.

EDIT: We ended up adding support for all types available in example datasets. 2D, 3D and even thetaMode. :D

StanczakDominik and others added some commits Jun 16, 2018

Merge pull request #6 from PlasmaPy/openpmd-subclass
Initial tests for 2D OpenPMD sample
@pep8speaks


pep8speaks commented Jun 17, 2018

Hello @ritiek! Thanks for updating your pull request.

Congratulations! There are no PEP8 issues in this pull request. 😸

Comment last updated on June 28, 2018 at 07:32 UTC
@ritiek


Contributor

ritiek commented Jun 17, 2018

And happy #500!

@ritiek


Contributor

ritiek commented Jun 17, 2018

I am going to compare h5py and openPMD-api soon, so we can choose whichever one seems better for our purpose.

@ritiek


Contributor

ritiek commented Jun 17, 2018

OpenPMD API looks something like this:

>>> import openPMD
>>> series = openPMD.Series('plasmapy/classes/tests/data/data00000255.h5',
                            openPMD.Access_Type.read_only)

>>> items = list(series.iterations.items())
>>> items
[(255, <openPMD.Iteration of at t = '0.000000 s'>)]

>>> iteration, data = items[0]
>>> meshes = data.meshes.items()
>>> particles = data.particles.items()

>>> list(meshes)
[('E', <openPMD.Mesh>), ('rho', <openPMD.Mesh>)]
>>> list(particles)
[('Hydrogen1+', <openPMD.ParticleSpecies>),
 ('electrons', <openPMD.ParticleSpecies>)]

I personally think we should just stick with h5py. It is packaged on PyPI, which makes it easy to distribute, unlike openPMD-api, which is available only through conda (and Spack). Also, working with iterator objects in the openPMD API seems a bit awkward to me.

I found we could also do this with h5py to navigate through multiple levels easily:

>>> h5 = h5py.File('plasmapy/classes/tests/data/data00000255.h5')
>>> h5.get('data/255/fields/E/x')
<HDF5 dataset "x": shape (51, 201), type "<f8">

What do you guys think?

@StanczakDominik


Member

StanczakDominik commented Jun 17, 2018

Well, I guess we could start out with h5py for now. As long as we're using just hdf5 files, we don't really get advantages from the API right now besides potentially being nice to the guys at OpenPMD and testing their API out for them. But that's obviously a lower priority than getting this working! I'm fine with skipping that for now.

@StanczakDominik


Member

StanczakDominik commented Jun 17, 2018

By the way, I'm absolutely not sure we should default to OpenPMD on a "hdf5" keyword argument. I think there are two solutions:

  • openPMD = True forces initialization via OpenPMD
  • I think there's an attribute marking the file as OpenPMD in h5.attrs (after h5 = h5py.File('plasmapy/classes/tests/data/data00000255.h5')). We could read that: if it's found, awesome, we know how to handle the file. If it isn't, raise a ValueError or something.
@ritiek


Contributor

ritiek commented Jun 17, 2018

OK, let's place our bets on h5py then.

I'm absolutely not sure we should default to OpenPMD on a "hdf5" keyword argument. I think there are two solutions

Maybe we could have both of them. I checked it out, and openPMD does seem to set a standard for checking whether a dataset follows the openPMD format: h5.attrs['openPMDextension'].

The user could force the HDF5 file to be read as OpenPMD by passing openPMD=True. If this keyword argument is not supplied, we could decide automatically by checking the openPMDextension attribute.

@StanczakDominik


Member

StanczakDominik commented Jun 17, 2018

Sounds reasonable 👍

@ritiek ritiek changed the title from OpenPMD HDF5Reader Plasma subclass to [WIP] OpenPMD HDF5Reader Plasma subclass Jun 17, 2018

@ritiek ritiek force-pushed the ritiek:openpmd-hdf5reader branch from 924c965 to 2c61119 Jun 17, 2018

# convinced otherwise if another way makes more sense
def test_x(openPMD2DPlasma):
    assert openPMD2DPlasma.x.shape == (51, 201)


@ritiek

ritiek Jun 18, 2018

Contributor

Whose position coordinates are we talking about here? If this is for particles such as electrons, it would make more sense to have something like openPMD2DPlasma.electrons.x.


@StanczakDominik

StanczakDominik Jun 18, 2018

Member

It definitely would! I thought I mentioned it, but guess not. OpenPMD lets you attach discrete particle data to your mesh data, but I think it would be fine to skip the particles for now and just handle the mesh data well. Adding particles would mean we'd have to rethink our species class... which we could, of course, do.

Also, I now have doubts about the fact that OpenPMD lets you define separate coordinates for each field (e.g. you can have electric and magnetic fields defined at a spatial offset to each other, like for Yee grids)... we'll have to think more about how to handle this...


@ax3l

ax3l Jun 20, 2018

I think it would be fine to skip the particles for now

yes, fully optional. mesh-only works as well.

[o]penPMD lets you define separate coordinates for each field

Just in case this is relevant for you; otherwise, just assign them to zero for now if it's irrelevant for your use case (e.g. node-centered).

def test_has_charge_density_with_units(openPMD2DPlasma):
    assert openPMD2DPlasma.charge_density.to(u.C/u.m**3)  # unless it's some
    # 2D charge density


@ritiek

ritiek Jun 18, 2018

Contributor

We might be able to check units this way:

>>> import astropy.units as u
>>> import h5py
>>> h5 = h5py.File('data00000255.h5')
>>> path = 'data/255/fields/rho'
>>> grid_size = len(h5[path].attrs['axisLabels'])
>>> units = u.C / u.m**grid_size

Would this work?


@StanczakDominik

StanczakDominik Jun 18, 2018

Member

Actually, there's a way that synergizes with what OpenPMD provides for us way more nicely:

In [4]: dict(h5[path].attrs)['unitDimension']
Out[4]: array([-3.,  0.,  1.,  1.,  0.,  0.,  0.])

And then we can interpret that based on documentation for the standard as

powers of the 7 base measures characterizing the record's unit in SI (length L, mass M, time T, electric current I, thermodynamic temperature theta, amount of substance N, luminous intensity J)

so we'll have length^-3 (1/volume, reasonable), no mass, and time^1 and current^1 get you electric charge. We can then multiply each of these by its appropriate astropy unit and we'll get the complete Quantity for that field! I really liked that system while reading the docs; I guess I was too hyped up to note it down.
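That unitDimension mapping can be sketched in a few lines. The helper name and the string rendering below are illustrative only; in PlasmaPy each nonzero power would instead multiply the matching astropy base unit (u.m, u.kg, ...) to build the actual Quantity:

```python
# The 7 SI base dimensions, in the order the openPMD standard lists them:
# length, mass, time, current, temperature, amount of substance, luminous intensity
SI_BASES = ["m", "kg", "s", "A", "K", "mol", "cd"]

def unit_dimension_to_si(unit_dimension):
    """Render an openPMD unitDimension array as a readable SI unit string.

    In PlasmaPy, each nonzero power would instead exponentiate the matching
    astropy base unit and the results would be multiplied together.
    """
    parts = []
    for base, power in zip(SI_BASES, unit_dimension):
        if power == 1:
            parts.append(base)
        elif power != 0:
            parts.append(f"{base}**{power:g}")
    return " * ".join(parts) or "dimensionless"

# rho's unitDimension from the example file: length^-3 * time * current = C / m^3
print(unit_dimension_to_si([-3., 0., 1., 1., 0., 0., 0.]))  # m**-3 * s * A
```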


@ritiek

ritiek Jun 18, 2018

Contributor

Wow, this is really neat!

def test_correct_shape_electric_field(openPMD2DPlasma):
    assert openPMD2DPlasma.electric_field.shape == (3, 51, 201)
    # IIRC this is how we defined it in the old Plasma class but I can be
    # convinced otherwise if another way makes more sense


@ritiek

ritiek Jun 18, 2018

Contributor

How about something like this?

assert openPMD2DPlasma.electric_field.x.shape == (51, 201)
assert openPMD2DPlasma.electric_field.y.shape == (51, 201)
assert openPMD2DPlasma.electric_field.z.shape == (51, 201)


@ritiek

ritiek Jun 18, 2018

Contributor

I'm fine with either of them though.


@StanczakDominik

StanczakDominik Jun 18, 2018

Member

That might make sense, but it also might make using the electric field harder in simulations if you can't just access it as one big 3D array... but there may be ways around that. I'm not sure what the optimal choice is here. I think this is a good point to talk about at tomorrow's telecon, if we're all up for that.

ritiek added some commits Jun 19, 2018

@ax3l


ax3l commented Jun 20, 2018

User could force HDF5 to read as [o]penPMD by passing openPMD=True. If this keyword argument is not supplied, we could automatically decide checking openPMDextension attribute.

For reading a .h5 file I would do the following detection (as we implemented in yt):

  • is it an HDF5 file?
    • [yes] does it have the attribute openPMD in /?
      • [yes] does the attribute openPMD contain a supported version (e.g. 1.1.0 <= v < 2.0.0)?
        • [yes] use the openPMD reader
        • [no] warn the user about the unsupported openPMD version; if the version in the file is too old, offer to update it
      • [no] test for a different HDF5 reader
    • [no] test for a different file format
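A minimal sketch of the version gate in this flow, assuming the supported range from the bullet above. The function name is hypothetical; the h5py side of the check would read the string from the root group's attributes (f.attrs.get('openPMD')):

```python
def openpmd_version_supported(version, low=(1, 1, 0), high=(2, 0, 0)):
    """Check 1.1.0 <= v < 2.0.0, per the detection flow above.

    `version` is the string stored in the root-level `openPMD` attribute,
    e.g. "1.1.0"; tuple comparison handles the range check. The function
    name and the default bounds are illustrative, not an existing API.
    """
    parts = tuple(int(p) for p in version.split("."))
    return low <= parts < high

print(openpmd_version_supported("1.1.0"))   # True: supported
print(openpmd_version_supported("1.0.1"))   # False: too old, offer an update
print(openpmd_version_supported("2.0.0"))   # False: too new, warn the user
```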

openPMDextension is not the main identifier, but this attribute will also be present. It allows adding further, domain-specific meaning on top of the base standard's data description, e.g. for electro-magnetic PIC codes (1.0+), physical particle species (2.0+), or particle accelerator codes (2.0+). Maybe you even want to propose your own later on?

For writing, I would just auto-annotate all .h5 output with openPMD markup. That might be a biased opinion, but it does no harm to add those attributes, and it just makes the data more portable by default, self-documented, readable by more tools, ... ;-)

@StanczakDominik


Member

StanczakDominik commented on plasmapy/classes/sources/openpmd_hdf5.py in d3110fc Jun 21, 2018

For the record, I think this may end up being something to change later on. Basically, when you take a hdf5 object and use it to create a numpy array, it downloads all the data into RAM and creates the numpy array in memory. That's completely fine for small datasets.

Suppose, however, we have large, high-resolution supercomputer simulation output? Loading every single array into memory right at Plasma instantiation will kill your script instantly with a MemoryError. It may make sense to keep these as h5py objects, pointers to arrays on disk.

Then, whatever analysis or evaluation you wanna do, you can just do lazily, accessing data as you need them.

Still, this approach is fine for a prototype :)
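The lazy approach sketched above could look roughly like this in pure Python; LazyPlasma and _read_electric_field are made-up names standing in for the real reader and an h5py dataset read (e.g. self.h5[path][:]):

```python
from functools import cached_property

import numpy as np

class LazyPlasma:
    """Sketch of the lazy-loading idea.

    The (hypothetical) _read_electric_field stands in for an h5py dataset
    read; with @cached_property it runs only on first access, so nothing
    is loaded into RAM at instantiation.
    """

    def __init__(self):
        self.reads = 0  # counts how many times we hit "disk"

    def _read_electric_field(self):
        self.reads += 1  # real code would pull the dataset from disk here
        return np.zeros((3, 51, 201))

    @cached_property
    def electric_field(self):
        return self._read_electric_field()

p = LazyPlasma()
print(p.reads)        # 0: nothing loaded at instantiation
_ = p.electric_field  # first access triggers the read
_ = p.electric_field  # second access is served from the cache
print(p.reads)        # 1
```

With real h5py datasets, slicing (`dataset[:]`) is what actually pulls data from disk, so analysis code can stay lazy until it needs concrete values.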


Contributor

ritiek replied Jun 21, 2018

Basically, when you take a hdf5 object and use it to create a numpy array, it downloads all the data into RAM and creates the numpy array in memory. That's completely fine for small datasets.

Oh, yeah. I didn't think about that.

It may make sense to keep these as h5py objects, pointers to arrays on disk.

I think that would be a good idea. We would return a tuple (x, y, z) where x, y, z are HDF5 dataset objects. End users could play with them however they want?


Contributor

ritiek replied Jun 21, 2018

Though, this might mean we wouldn't be able to multiply the values with suitable astropy.units.


Contributor

ritiek replied Jun 21, 2018

Or would we be able to? We might need to test this with some large dataset.


Member

StanczakDominik replied Jun 22, 2018

Perhaps if we wrote this as a @property... I dunno. I think this could be tested just as well on small datasets.

@ritiek ritiek force-pushed the ritiek:openpmd-hdf5reader branch from 94a8a2d to dee8828 Jun 21, 2018

@ritiek


Contributor

ritiek commented Jun 21, 2018

OK, as of now we can handle both 2D and 3D HDF5 datasets for mesh data (electric field and charge density) but not particle data. What do we want to do with particle data? Should we just implement it the way we did mesh data?

@ritiek


Contributor

ritiek commented Jun 21, 2018

Also, should we ship the example openPMD HDF5 datasets that we are using for our tests with our main codebase? Maybe we could have them downloaded the moment a user runs those tests, but I don't know how hard that would be in practice.

@StanczakDominik


Member

StanczakDominik commented Jun 22, 2018

Oh yeah, downloading the test data is a sick idea that I completely forgot about! Maybe we could use that --online flag to setup.py test and emulate Astropy/SunPy functionality.

By the way, we'll have to keep in mind to rebase and remove the files from these commits before we merge this PR into master. I can do that without a problem.

@ritiek


Contributor

ritiek commented Jun 23, 2018

Good idea. --remote-data flag seems good to me.

We need to upload data00000100.h5 and data00000255.h5 somewhere. Do you have any preference? I'd put them on AWS but they ask for scary details during registration. :/

By the way, we'll have to keep in mind to rebase and remove the files from these commits before we merge this PR into master. I can do that without a problem.

Yep, once we are ready to merge this, we're going to have to take care of this.

@ritiek


Contributor

ritiek commented Jun 23, 2018

We need to replace the URLs in test_openpmd_hdf5.py once we have those datasets uploaded somewhere and then we should be ready to merge this as a prototype. :D

@ritiek


Contributor

ritiek commented Jun 23, 2018

There is one problem at the moment: the test data would need to be downloaded every single time the tests are run. We could work around this by creating a module similar to https://github.com/sunpy/sunpy/blob/master/sunpy/data/sample.py, and then the data would need downloading only once?

For example, SunPy downloads all the data at once and stores it somewhere for future use when a code involving external datasets is first run.

>>> import sunpy.map
>>> import sunpy.data.sample  
>>> mymap = sunpy.map.Map(sunpy.data.sample.AIA_171_IMAGE) 
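A download-once cache in that spirit could be as small as the following sketch; get_sample_file and the cache location are made up for illustration, not an existing PlasmaPy or SunPy API:

```python
import urllib.request
from pathlib import Path

# Hypothetical cache location; SunPy uses a similar per-user data directory.
DEFAULT_CACHE = Path.home() / ".plasmapy" / "sample_data"

def get_sample_file(url, cache_dir=DEFAULT_CACHE):
    """Download `url` into `cache_dir` the first time; reuse the copy after."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / Path(url).name
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target
```

Test modules would then call get_sample_file('https://.../data00000255.h5') and always receive a local path, hitting the network only on the first run.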
@StanczakDominik


Member

StanczakDominik commented Jun 26, 2018

Having thought about this for a bit, I think we're fine putting this sample data directly in the repository. These datasets are not really large, and SunPy appears to include a bit of its sample data right in the repo. I thus think we're fine not having to set up AWS or other storage for now!

We definitely need to reference the source for it (which I probably forgot to do...), and I think it may be good to store it in plasmapy/data/ - this would be a better place for Langmuir data as well, most likely.

@ax3l


ax3l commented Jun 26, 2018

Just a note: I am not 100% sure we already compressed the checked-in files, since we keep them under git lfs. Please try running h5repack, e.g. with gzip, on them and see whether their size changes before checking them in, to save space. Here is an example ("h5compress"): https://github.com/ax3l/cluster-scripts

@ritiek ritiek force-pushed the ritiek:openpmd-hdf5reader branch from f0c2750 to 1361cec Jun 27, 2018

@codecov


codecov bot commented Jun 27, 2018

Codecov Report

Merging #500 into master will decrease coverage by 0.13%.
The diff coverage is 92.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #500      +/-   ##
==========================================
- Coverage   97.71%   97.58%   -0.14%     
==========================================
  Files          41       44       +3     
  Lines        3590     3679      +89     
==========================================
+ Hits         3508     3590      +82     
- Misses         82       89       +7
Impacted Files Coverage Δ
plasmapy/utils/__init__.py 100% <ø> (ø) ⬆️
plasmapy/classes/__init__.py 100% <100%> (ø) ⬆️
plasmapy/utils/exceptions.py 100% <100%> (ø) ⬆️
plasmapy/data/test/__init__.py 100% <100%> (ø)
plasmapy/classes/sources/__init__.py 100% <100%> (ø) ⬆️
plasmapy/data/setup_package.py 50% <50%> (ø)
plasmapy/classes/sources/openpmd_hdf5.py 92.4% <92.4%> (ø)

Last update f7310aa...e3d0664.

@ritiek


Contributor

ritiek commented Jun 27, 2018

@ax3l Looks like they were indeed not compressed. Total size for (those two) HDF5 files reduced from about 9 MB to 6 MB. Thanks!

@StanczakDominik @ax3l We already mention from where we downloaded the datasets (here), is there anything else we need to do?

@ritiek ritiek force-pushed the ritiek:openpmd-hdf5reader branch from aa8393f to a404ddb Jun 27, 2018

@ax3l


ax3l commented Jun 27, 2018

That looks perfect; you also mention the CC0 license, so I think that's it :-)

@ax3l


ax3l commented Jun 27, 2018

Sorry for hijacking the thread: I would like to mention PlasmaPy, with the PlasmaPy logo, in upcoming conference talks/posters as part of the "openPMD community". Is that fine with you? I won't go into detail there, but I think it's cool to mention that you are in the process of adding openPMD support.

We also should not forget to add you here so people find you: https://github.com/openPMD/openPMD-projects

@ritiek ritiek force-pushed the ritiek:openpmd-hdf5reader branch from a404ddb to 50cb17c Jun 27, 2018

@StanczakDominik


Member

StanczakDominik commented Jun 27, 2018

@ritiek awesome, looking into the changes and I hope I can merge this in today!

@ax3l by all means, go right ahead. Pick a version you like from the logo source repo! I'd love to see the poster once you're done there.

Also, if you want to not hijack issues in the future, check out our Matrix chatroom or our mailing list! :D

@StanczakDominik

All right, this looks nice! I think there's a few issues that could be addressed with a follow up PR, but I'm happy to merge it as a prototype. :)

Parameters
----------
hdf5 : `str`
    Path to HDF5 file.


@StanczakDominik

StanczakDominik Jun 27, 2018

Member

Looks like this could use a References link to the site: http://openpmd.org/

electric_field : `astropy.units.Quantity`
    An (x, y, z) array containing electric field data.
charge_density : `astropy.units.Quantity`
    An array containing charge density data.


@StanczakDominik

StanczakDominik Jun 27, 2018

Member

So... my bad here, but I guess I kind of forgot that the 2D and 3D examples were showing off electrostatic + particle simulations. If you look at the thetaMode example, you will find two more mesh quantities:

(base) 20:04:58 dominik@dell: ~/Code/openPMD/openPMD-example-datasets/example-thetaMode/hdf5 $ h5ls -r data00000100.h5 
/                        Group
/data                    Group
/data/100                Group
/data/100/fields         Group
/data/100/fields/B       Group
/data/100/fields/B/r     Dataset {3, 51, 201}
/data/100/fields/B/t     Dataset {3, 51, 201}
/data/100/fields/B/z     Dataset {3, 51, 201}
/data/100/fields/E       Group
/data/100/fields/E/r     Dataset {3, 51, 201}
/data/100/fields/E/t     Dataset {3, 51, 201}
/data/100/fields/E/z     Dataset {3, 51, 201}
/data/100/fields/J       Group
/data/100/fields/J/r     Dataset {3, 51, 201}
/data/100/fields/J/t     Dataset {3, 51, 201}
/data/100/fields/J/z     Dataset {3, 51, 201}
/data/100/fields/rho     Dataset {3, 51, 201}

B being the magnetic field and J the current density, which can be neglected in electrostatic simulations. I think we should probably handle those as well - electrostatics are a fraction of all possible simulations. Do you think it would be much of a hassle to add them?


@StanczakDominik

StanczakDominik Jun 27, 2018

Member

A few tests for any of those datasets would also be very nice, of course.


@ritiek

ritiek Jun 28, 2018

Contributor

Sure! I'll work on them today.

units = _fetch_units(self.h5[path].attrs["unitDimension"])
return np.array((self.h5[path]['x'],
                 self.h5[path]['y'],
                 self.h5[path]['z'])) * units


@StanczakDominik

StanczakDominik Jun 27, 2018

Member

I'll definitely have to look at how np.array here is going to work... but it's nice enough for the prototype!

    assert self.h5.electric_field.to(u.V / u.m)

def test_correct_shape_electric_field(self):
    assert self.h5.electric_field.shape == (3, 26, 26, 201)


@StanczakDominik

StanczakDominik Jun 27, 2018

Member

I might mess around with this one so that the interface handles better in a soon-to-be-made PR.

@@ -78,6 +78,11 @@ class InvalidParticleError(AtomicError):
    pass


class OpenPMDError(PlasmaPyError):


@StanczakDominik

StanczakDominik Jun 27, 2018

Member

Maybe DataStandardError? OpenPMDError feels quite specific.

@ritiek ritiek force-pushed the ritiek:openpmd-hdf5reader branch from cf52393 to 6dd3ce4 Jun 28, 2018

@ritiek ritiek force-pushed the ritiek:openpmd-hdf5reader branch from 6dd3ce4 to e3d0664 Jun 28, 2018

@ritiek


Contributor

ritiek commented Jun 28, 2018

I've pushed the suggested changes. The failing coverage is because we don't test for exceptions; we would have to include more data, which in my opinion isn't worth it. Otherwise, this should be ready to merge.

@StanczakDominik

Awesome! Gonna merge it now. I'd say the same thing about testing for exceptions. Thanks, @ritiek!

@StanczakDominik StanczakDominik merged commit 334dc37 into PlasmaPy:master Jun 28, 2018

3 of 5 checks passed

  • codecov/patch: 92.22% of diff hit (target 97.71%)
  • codecov/project: 97.58% (-0.14%) compared to f7310aa
  • ci/circleci: test-html: Your tests passed on CircleCI!
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed

@ritiek ritiek deleted the ritiek:openpmd-hdf5reader branch Jul 5, 2018

@namurphy namurphy changed the title from [WIP] OpenPMD HDF5Reader Plasma subclass to OpenPMD HDF5Reader Plasma subclass Jul 23, 2018
