Pyclaw hdf5 parallel read/write #528

aymkhalil · 2015-11-01T12:34:30Z

This PR:

Includes and replaces PyClaw/PetClaw hdf5 parallel write with h5py #526 by @weslowrie
Adds the ability to read HDF5 files in parallel
Adds tests for parallel HDF5 I/O
Please note that the tests could fail if we don't have parallel HDF5 on Travis.

…ith the 4D dimensions (for 3D datasets), where the global dimensions are num_eqn, nx, ny, nz. This makes the file compatible with the serial HDF5 output and is also more user-friendly for visualization. The processor ranges for each dimension are retrieved with patch._da.getRanges().
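This is a minimal sketch of why this layout matches the serial output. It assumes NumPy and a 2D problem, and it uses no MPI or HDF5. The per-process index ranges are hand-picked stand-ins for what patch._da.getRanges() would return, which is one (start, end) pair per spatial dimension. Each simulated process writes its local block into a global array of shape (num_eqn, nx, ny), and the assembled result equals the serial array.

```python
import numpy as np

def assemble_global(local_blocks, num_eqn, nx, ny):
    """Write per-process 2D blocks into one global (num_eqn, nx, ny) array.

    local_blocks is a list of (r, q_local) pairs, where r = ((x0, x1), (y0, y1))
    is a hypothetical stand-in for a process's index ranges.
    """
    global_q = np.empty((num_eqn, nx, ny))
    for r, q_local in local_blocks:
        # Every equation, then this process's slab in x and y.
        global_q[:, r[0][0]:r[0][1], r[1][0]:r[1][1]] = q_local
    return global_q

# Serial reference solution: 2 equations on a 4 x 6 grid.
num_eqn, nx, ny = 2, 4, 6
q_serial = np.arange(num_eqn * nx * ny, dtype=float).reshape(num_eqn, nx, ny)

# A 2 x 2 decomposition over 4 simulated processes.
decomposition = [((0, 2), (0, 3)), ((2, 4), (0, 3)),
                 ((0, 2), (3, 6)), ((2, 4), (3, 6))]
blocks = [(r, q_serial[:, r[0][0]:r[0][1], r[1][0]:r[1][1]]) for r in decomposition]

q_parallel = assemble_global(blocks, num_eqn, nx, ny)
print(np.array_equal(q_parallel, q_serial))  # True
```

Because the equation index always comes first and the spatial indices follow in global order, a reader does not need to know how the domain was decomposed when the file was written.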

aymkhalil · 2015-11-03T18:06:55Z

Parallel HDF5 is available on Travis (out of the box):
https://travis-ci.org/clawpack/pyclaw/jobs/88962967#L2358
All I/O tests now run from pyclaw/tests/test_io.py, with variations for serial and parallel runs. To make them run, the --exclude=io option has been removed from .travis.yml.

mandli · 2015-11-03T20:41:16Z

And so are the Python bindings, nice!

If no one else objects to including the I/O tests, I am fine with merging this in.

weslowrie · 2015-11-03T21:14:42Z

@aymkhalil thanks for following up on this. Looks great!

One comment: before this development, I noticed that if one uses PetClaw with the serial HDF5 output (output_format=hdf5), it just writes a file containing only the portion of the domain/solution that processor 0 owns. Do you think we need to flag this as an error/warning? Or maybe now force the parallel read/write when using PetClaw? That would also eliminate the need for the extra hdf5p output format type.

ketch · 2015-11-04T07:20:05Z

src/pyclaw/util.py

@@ -284,8 +317,41 @@ def check_diff(expected, test, **kwargs):
    else:
        raise Exception('Incorrect use of check_diff verifier, specify tol!')

-
-
+def check_solutions_are_same(sol_a,sol_b):


Good idea; moving this to util.py makes sense to me.

ketch · 2015-11-04T07:23:27Z

@weslowrie is referring to issue #490. I agree that if one is using petclaw and requests HDF output, it should default to the parallel writer. This means that in solution.write(), we need to check whether the solution object is a pyclaw Solution or a petclaw Solution, and import the appropriate write function. @aymkhalil, can you add that logic?

Meanwhile, in the serial HDF5 writer, we should check if it is a parallel run and if so throw a warning or exception.
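This is a minimal sketch of the two checks suggested here. The classes and writer names are hypothetical stand-ins, not PyClaw's actual API. In real code the process count would come from the MPI communicator. Here it is just a parameter.

```python
import warnings

class Solution:
    """Hypothetical stand-in for a serial (pyclaw-style) solution."""

class ParallelSolution(Solution):
    """Hypothetical stand-in for a distributed (petclaw-style) solution."""

def write_hdf5_serial(solution, num_processes=1):
    # Guard: in a parallel run this writer would only see the local patch.
    if num_processes > 1:
        warnings.warn("serial HDF5 writer used in a parallel run; "
                      "only the local part of the solution would be written")
    return "serial"

def write_hdf5_parallel(solution, num_processes=1):
    # A real implementation would open the file collectively on all ranks.
    return "parallel"

def write_hdf5(solution, num_processes=1):
    # Test the subclass: a ParallelSolution is also a Solution.
    if isinstance(solution, ParallelSolution):
        writer = write_hdf5_parallel
    else:
        writer = write_hdf5_serial
    return writer(solution, num_processes)

print(write_hdf5(Solution()), write_hdf5(ParallelSolution(), num_processes=4))
# serial parallel
```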

aymkhalil · 2015-11-04T08:55:31Z

@weslowrie & @ketch, I see what you are saying and it makes sense. But what about the test cases? I followed the current testing theme: all tests are run from pyclaw (in serial), and utility functions then generate variations that run the parallel tests based on function arguments (e.g. use_petsc; in my case, file_format). If we dropped the dedicated parallel hdf5p file_format and relied solely on the imported solution object, there are two options. We could write specific parallel tests under the pyclaw directory, importing the proper solution object. Or we could move the parallel tests under the petclaw directory and update .travis.yml to run them directly from there for the petclaw package test, which is inconsistent with what we already have. What are your thoughts?
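This is a sketch of the argument-driven style described above. It assumes nose-style generator tests, where a test function yields a callable plus its arguments and nose runs each yielded tuple as a separate test. The names check_io_roundtrip and test_io_variants, and the file_format/use_petsc values, are hypothetical placeholders. This is not the signature of PyClaw's gen_variants().

```python
import itertools

def check_io_roundtrip(file_format, use_petsc):
    """Hypothetical check: write a solution, read it back, compare."""
    # A real test would run a solver and call the I/O functions here.
    return (file_format, use_petsc)

def test_io_variants():
    # One variant per (file format, serial/parallel) combination.
    formats = ["ascii", "hdf5"]
    for file_format, use_petsc in itertools.product(formats, [False, True]):
        yield check_io_roundtrip, file_format, use_petsc
```

Here the serial/parallel switch is just another argument. Without a dedicated parallel file format, the parallel case has to come from somewhere else, such as the imported solution class.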

On a side note, are you fine with overriding the read() function in the petclaw solution and having it import the proper parallel read/write functions? The alternative is to have pyclaw make the import decision for both serial and parallel I/O.

aymkhalil · 2015-11-05T14:24:52Z

Now there is no hdf5p file format. Using petclaw will force parallel HDF5 I/O.
Serial tests are run from pyclaw/tests and parallel tests are run from petclaw/tests for the sake of clarity.
petclaw.geometry now defines a Dimension class subclassed from the one in pyclaw.geometry. This way, petclaw.io.hdf5 doesn't import pyclaw at all.
There is a special I/O test in pyclaw/examples/acoustics_2d_variable/test_acoustics_2d_variable_io.py to which I had to pass disable_petsc=True. Otherwise the test fails: with the new changes, a parallel reader is forced when use_petsc=True, and the binary output format requested by the test is not supported in parallel.

aymkhalil · 2015-11-05T14:28:14Z

On a side note, I noticed that sometimes, even though some tests fail, Travis reports a successful build (when TEST_PACKAGE="petclaw"):
https://travis-ci.org/clawpack/pyclaw/jobs/89407080#L2758
Is this behavior expected?

mandli · 2015-11-05T20:07:23Z

Does not seem like it. Are those somehow marked as skipped (I do not see that they are)?

aymkhalil · 2015-11-08T10:36:24Z

Nothing is skipped. They are just regular tests. However, I noticed that in .travis.yml, if I use:

- if [[ "${TEST_PACKAGE}" == "petclaw" ]]; then
     mpirun -n 4 nosetests -v --first-pkg-wins --exclude=limiters --exclude=sharpclaw --exclude=io;
  fi
- if [[ "${TEST_PACKAGE}" == "petclaw" ]]; then
       cd ../petclaw/tests;
       mpirun -n 4 nosetests -v --first-pkg-wins; # if this fails, travis will report failure
  fi
- if [[ "${TEST_PACKAGE}" == "petclaw" ]]; then
     cd ../../pyclaw;
  fi

instead of:

- if [[ "${TEST_PACKAGE}" == "petclaw" ]]; then
     mpirun -n 4 nosetests -v --first-pkg-wins --exclude=limiters --exclude=sharpclaw --exclude=io;
     cd ../petclaw/tests;
     mpirun -n 4 nosetests -v --first-pkg-wins; # if this fails, travis will NOT report failure
     cd ../../pyclaw;
  fi

and some of the tests fail, Travis will correctly report failure.

I included this modification, but I don't have a good explanation for it.

ketch · 2015-11-08T10:40:57Z

I'm guessing that it only checks the exit code of the last command. That seems strange, but it would explain what you see.
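This guess is consistent with how POSIX shells work. A list of commands separated by ; or newlines exits with the status of the last command executed, and an if ... fi block takes the status of the last command run in its body. This sketch assumes that Travis judges each script entry by its exit status. Under that assumption, a failing mpirun ... nosetests followed by a successful cd ../../pyclaw inside one entry reports success. The quick check below runs from Python and assumes a POSIX sh is on the PATH, with false standing in for the failing test run:

```python
import subprocess

def exit_status(script):
    """Run a snippet with `sh -c` and return its exit status."""
    return subprocess.run(["sh", "-c", script]).returncode

# One compound command: the failing `false` is masked by the successful `cd`.
masked = exit_status("false; cd /")
# Chaining with && stops at the first failure and reports it.
chained = exit_status("false && cd /")
# Separate invocations keep each status visible.
separate = [exit_status("false"), exit_status("cd /")]
print(masked, chained, separate)  # 0 1 [1, 0]
```

Two things keep the failure visible: splitting the commands into separate entries, as in the working version above, or joining them with &&.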

ketch · 2015-11-08T10:51:30Z

src/petclaw/io/hdf5.py

+                    elif len(patch.name) == 3:
+                        dset[:,r[0][0]:r[0][1],r[1][0]:r[1][1],r[2][0]:r[2][1]] = state.aux
+
+    elif use_PyTables:


Let's just remove all the references to PyTables from this file.

Done. Check bafefc7.

ketch · 2015-11-08T11:01:42Z

src/petclaw/io/hdf5.py

+                    globalSize.append(state.num_aux)
+                    globalSize.extend(patch.num_cells_global)
+                    dset = subgroup.create_dataset('aux',globalSize,dtype='float',**options)
+                    if len(patch.name) == 1:


I would use patch.dimensions here. Actually, I don't have any idea why patch.name gives a list of the dimensions! @mandli ?

Done. For example, check https://github.com/aymkhalil/pyclaw/blob/pyclaw_hdf5_parallel_write/src/petclaw/io/hdf5.py#L200.
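The snippets above have one dset[...] = state.aux line per value of len(patch.name). This sketch shows how those per-dimension branches could collapse into a single code path. It assumes ranges holds one (start, end) pair per spatial dimension and num_cells_global lists the global cell counts. dataset_layout is a hypothetical helper, not part of PetClaw.

```python
def dataset_layout(num_vars, num_cells_global, ranges):
    """Return (global_shape, local_index) for a (num_vars, nx[, ny[, nz]]) dataset.

    The same code handles 1D, 2D and 3D, so no per-dimension branching is needed.
    """
    if len(ranges) != len(num_cells_global):
        raise ValueError("need one (start, end) range per spatial dimension")
    global_shape = (num_vars,) + tuple(num_cells_global)
    # All variables, then this process's slab along each spatial dimension.
    local_index = (slice(None),) + tuple(slice(start, end) for start, end in ranges)
    return global_shape, local_index

shape, index = dataset_layout(3, [8, 8, 4], [(0, 4), (4, 8), (0, 4)])
print(shape)  # (3, 8, 8, 4)
```

With h5py, the result could feed the same calls the snippets above already use. You would call subgroup.create_dataset('aux', shape, dtype='float') followed by dset[index] = state.aux, and this works regardless of the number of dimensions.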

ketch · 2015-11-08T11:39:16Z

Besides my relatively minor comments above, I think this PR looks good -- many thanks to @weslowrie and @aymkhalil .

One last thing to consider is that this introduces a different mechanism for running the same tests in PyClaw and PetClaw without duplicating code. Previously we did this through the gen_variants() approach; here it is done with object inheritance. This approach seems simple and clean, so I'm inclined to go with it. @mandli do you have any concerns or comments?
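This is a minimal sketch of the inheritance pattern using unittest. The class names and the io_impl attribute are hypothetical. The base class holds the test logic, and the parallel subclass only swaps the implementation it exercises.

```python
import unittest

class SerialIO:
    """Hypothetical stand-in for the serial I/O implementation."""
    name = "serial"

    @staticmethod
    def roundtrip(values):
        # A real implementation would write to a file and read it back.
        return list(values)

class ParallelIO(SerialIO):
    """Hypothetical stand-in for the parallel I/O implementation."""
    name = "parallel"

class IOTests(unittest.TestCase):
    io_impl = SerialIO  # implementation exercised by every test below

    def test_roundtrip_preserves_values(self):
        values = [1.0, 2.0, 3.0]
        self.assertEqual(self.io_impl.roundtrip(values), values)

class ParallelIOTests(IOTests):
    # Inherits every test method; only the implementation changes.
    io_impl = ParallelIO
```

A test runner collects both classes, so the same test body runs once per implementation. One caveat: if the base class is imported by name into another test module, the runner collects it there too, and the serial tests run a second time.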

mandli · 2015-11-08T23:23:09Z

@ketch Nope, I would love to remove gen_variants. FWIW, the patch.name behavior comes from the shortcut that accesses attributes of objects (in this case a dimension) contained within the class. This was originally done to make it easy to access things like dimensions at a high level, rather than going through the entire hierarchy of container classes.
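This is a minimal sketch of that kind of shortcut. It assumes the shortcut is built on __getattr__, and the classes are simplified stand-ins, not PyClaw's geometry module. When a lookup fails on the patch, it is forwarded to each contained dimension, and the results are collected into a list. That is why patch.name yields one name per dimension.

```python
class Dimension:
    def __init__(self, name, num_cells):
        self.name = name
        self.num_cells = num_cells

class Patch:
    def __init__(self, dimensions):
        self.dimensions = list(dimensions)

    def __getattr__(self, key):
        # Only called when normal lookup fails, e.g. for patch.name.
        # Guard against recursion if 'dimensions' itself is not set yet.
        if key == "dimensions":
            raise AttributeError(key)
        return [getattr(dim, key) for dim in self.dimensions]

patch = Patch([Dimension("x", 100), Dimension("y", 50)])
print(patch.name)       # ['x', 'y']
print(patch.num_cells)  # [100, 50]
```

This also explains why len(patch.name) worked as a dimension count in the hdf5.py snippet. len(patch.dimensions) states that intent more directly.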

ketch · 2015-11-09T05:58:49Z

the patch.name behavior comes from the shortcut that accesses attributes of objects (in this case a dimension) contained within the class. This was originally done to make it easy to access things like dimensions at a high level, rather than going through the entire hierarchy of container classes.

That makes sense from some perspective, but I feel like it's more confusing than helpful. Having forgotten the original design, when I see patch.name now, I expect it to be a name that identifies the patch. I would vote for removing the patch.name attribute, but that's unrelated to this PR.

Merging.

weslowrie and others added 5 commits October 14, 2015 08:00

Merge pull request clawpack#1 from clawpack/master

3eaed0e

updating fork

Added error message and raises exception when trying to use compressi…

45a3e58

…on filters with parallel h5py.

Add a test for hdf5 parallel write

30991f4

Add parallel read function to petclaw.io.hdf5

5be8c43

aymkhalil force-pushed the pyclaw_hdf5_parallel_write branch 2 times, most recently from 942bb4b to f1be4d8 on November 3, 2015 17:08

aymkhalil added 2 commits November 3, 2015 20:51

Refactor pyclaw/tests/test_io.py as to generate variants for serial a…

c1f760c

…nd parallel hdf5 I/O tests

Remove --exclude=io from nosetests commands (fixes clawpack#530)

2fc6ca9

aymkhalil force-pushed the pyclaw_hdf5_parallel_write branch from f1be4d8 to 2fc6ca9 on November 3, 2015 17:52

ketch mentioned this pull request Nov 4, 2015

PyClaw/PetClaw hdf5 parallel write with h5py #526

Closed

ketch reviewed Nov 4, 2015
Remove hdf5p file format and force parallel I/O when using petclaw

04ca0a9

aymkhalil force-pushed the pyclaw_hdf5_parallel_write branch 4 times, most recently from 7df5f7d to 0604218 on November 5, 2015 14:12

aymkhalil force-pushed the pyclaw_hdf5_parallel_write branch from 0604218 to 44c57e0 on November 8, 2015 10:29

ketch reviewed Nov 8, 2015
aymkhalil added 2 commits November 8, 2015 18:08

Refactor I/O tests as to run parallel ones directly from petclaw

d435a6d

Remove all references to PyTables

bafefc7

aymkhalil force-pushed the pyclaw_hdf5_parallel_write branch from 44c57e0 to bafefc7 on November 8, 2015 15:10

ketch added a commit that referenced this pull request Nov 9, 2015

Merge pull request #528 from aymkhalil/pyclaw_hdf5_parallel_write

ed6d9c0

Pyclaw hdf5 parallel read/write

ketch merged commit ed6d9c0 into clawpack:master Nov 9, 2015
