Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pyclaw hdf5 parallel read/write #528

Merged
merged 10 commits into from
Nov 9, 2015

Conversation

aymkhalil
Copy link
Contributor

This PR:

  1. Includes and replaces PyClaw/PetClaw hdf5 parallel write with h5py #526 by @weslowrie
  2. Adds the ability to read HDF5 files in parallel
  3. Adds tests for parallel HDF5 I/O
    Please note that the tests could fail if we don't have parallel HDF5 on Travis.

weslowrie and others added 5 commits October 14, 2015 08:00
…ith the 4D dimensions (for 3D datasets), where the global dimensions are num_eqn, nx, ny, nz. This makes the file compatable with the serial hdf5 output, and is also more user friendly for visualization. The processor ranges are for each dimension are retrieved with patch._da.getRanges()
@aymkhalil aymkhalil force-pushed the pyclaw_hdf5_parallel_write branch 2 times, most recently from 942bb4b to f1be4d8 Compare November 3, 2015 17:08
@aymkhalil
Copy link
Contributor Author

  1. Parallel HDF5 is available on Travis (out of the box):
    https://travis-ci.org/clawpack/pyclaw/jobs/88962967#L2358
  2. All I/O tests now run from pycalw/tests/test_io.py with variations for serial & parallel runs. In order for them to run, --exclude=io option has been removed from .travis.yml

@mandli
Copy link
Member

mandli commented Nov 3, 2015

And so is the python bindings, nice!

If no one else objects to the new inclusion of the io tests I am fine merging this in.

@weslowrie
Copy link
Contributor

@aymkhalil thanks for following up on this. Looks great!

One comment: I noticed previously before this development that if one uses Petclaw, and has the serial hdf5 output_format=hdf5 it just writes a file with the portion of the domain/solution that processor 0 owns. Do you think we need to flag this as an error/warning? Or maybe now force the parallel read/write when using petclaw? That would also eliminate the need to have the extra hdf5p output format type.

@@ -284,8 +317,41 @@ def check_diff(expected, test, **kwargs):
else:
raise Exception('Incorrect use of check_diff verifier, specify tol!')



def check_solutions_are_same(sol_a,sol_b):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea; moving this to util.py makes sense to me.

@ketch
Copy link
Member

ketch commented Nov 4, 2015

@weslowrie is referring to issue #490. I agree that if one is using petclaw and requests HDF output, it should default to the parallel writer. This means that in solution.write(), we need to check whether the solution object is a pyclaw Solution or a petclaw Solution, and import the appropriate write function. @aymkhalil, can you add that logic?

Meanwhile, in the serial HDF5 writer, we should check if it is a parallel run and if so throw a warning or exception.

@aymkhalil
Copy link
Contributor Author

@weslowrie & @ketch, I see what you are saying and it makes sense. But what about the test cases? I followed the current testing theme: all tests are run from pyclaw (in serial) and then some utility functions generate variations to run the parallel tests based on function arguments (like use_petsc, in my case, it is file_format). If we were to omit the dedicated parallel hdf5p file_format and solely depend on the imported solution object, then we either need to write specific parallel tests under pyclaw directory (with the proper solution object imported), or maybe just move the parallel tests under petclaw directory and update .travis.yaml to run tests directly from there for the petclaw package test (which is inconsistent with what we already have), so what are your thoughts here?

On a side note, are you fine with overriding the read() function in petclaw solution and have it import the proper parallel read/write functions instead of having pyclaw make the import decision for serial as well as parallel I/O?

@aymkhalil aymkhalil force-pushed the pyclaw_hdf5_parallel_write branch 4 times, most recently from 7df5f7d to 0604218 Compare November 5, 2015 14:12
@aymkhalil
Copy link
Contributor Author

  1. Now there is no hdf5p file format. Using petclaw will force parallel HDF5 I/O.
  2. Serial tests are run from pyclaw/tests and parallel tests are run from petclaw/tests for the sake of clarity.
  3. petclaw.geometry now defines a Dimension class subclassed from pycalw.geometry. This way, petclaw.io.hdf5 doesn’t import pyclaw whatsoever.
  4. There is a special IO test in pyclaw/examples/acoustics_2d_variable/test_acoustics_2d_variable_io.py to which I have to pass disable_petsc=True, otherwise, the test will fail because based on the new changes, a parallel reader will be forced when use_petsc=True and the binary output format requested by the test is not supported.

@aymkhalil
Copy link
Contributor Author

On a side note, I noticed that sometimes, even though some tests fail, Travis reports a successful build (when TEST_PACKAGE="petclaw"):
https://travis-ci.org/clawpack/pyclaw/jobs/89407080#L2758
Is this behavior expected?

@mandli
Copy link
Member

mandli commented Nov 5, 2015

Does not seem like it. Are those somehow marked as skipped (I do not see that they are)?

@aymkhalil
Copy link
Contributor Author

Nothing is skipped. They are just regular tests. However, I noticed that in .travis.yml, if I use:

- if [[ "${TEST_PACKAGE}" == "petclaw" ]]; then
     mpirun -n 4 nosetests -v --first-pkg-wins --exclude=limiters --exclude=sharpclaw --exclude=io;
  fi
- if [[ "${TEST_PACKAGE}" == "petclaw" ]]; then
       cd ../petclaw/tests;
       mpirun -n 4 nosetests -v --first-pkg-wins; # if this fails, travis will report failure
  fi
- if [[ "${TEST_PACKAGE}" == "petclaw" ]]; then
     cd ../../pyclaw;
  fi

instead of:

- if [[ "${TEST_PACKAGE}" == "petclaw" ]]; then
     mpirun -n 4 nosetests -v --first-pkg-wins --exclude=limiters --exclude=sharpclaw --exclude=io;
     cd ../petclaw/tests;
     mpirun -n 4 nosetests -v --first-pkg-wins; # if this fails, travis will NOT report failure
     cd ../../pyclaw;
  fi

and some of the tests fail, Travis will correctly report failure.

I included this modification, but I don't have a good explanation though.

@ketch
Copy link
Member

ketch commented Nov 8, 2015

I'm guessing that it only checks the exit code of the last command. Which seems strange, but would explain what you see.

elif len(patch.name) == 3:
dset[:,r[0][0]:r[0][1],r[1][0]:r[1][1],r[2][0]:r[2][1]] = state.aux

elif use_PyTables:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just remove all the references to PyTables from this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Check bafefc7.

globalSize.append(state.num_aux)
globalSize.extend(patch.num_cells_global)
dset = subgroup.create_dataset('aux',globalSize,dtype='float',**options)
if len(patch.name) == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use patch.dimensions here. Actually, I don't have any idea why patch.name gives a list of the dimensions! @mandli ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ketch
Copy link
Member

ketch commented Nov 8, 2015

Besides my relatively minor comments above, I think this PR looks good -- many thanks to @weslowrie and @aymkhalil .

One last thing to consider is that this introduces a different mechanism for running the same tests in PyClaw and PetClaw without duplicating code. Previously we have done that through the gen_variants() approach; here it is done with object inheritance. This approach seems simple and clean, so I'm inclined to go with it. @mandli do you have any concerns or comments?

@mandli
Copy link
Member

mandli commented Nov 8, 2015

@ketch Nope, I would love to remove gen_variants. FWIW the patch.name behavior comes from the shortcut that will access attributes of objects (in this case a dimension) that are contained with the class. This was originally done in for ease of accessing things like dimensions at a high level rather than going through the entire heiarchy of container classes.

@ketch
Copy link
Member

ketch commented Nov 9, 2015

the patch.name behavior comes from the shortcut that will access attributes of objects (in this case a dimension) that are contained with the class. This was originally done in for ease of accessing things like dimensions at a high level rather than going through the entire heiarchy of container classes.

That makes sense from some perspective, but I feel like it's more confusing than helpful. Having forgotten the original design, when I see patch.name now, I expect it to be a name that identifies the patch. I would vote for removing the patch.name attribute, but that's unrelated to this PR.

Merging.

ketch added a commit that referenced this pull request Nov 9, 2015
@ketch ketch merged commit ed6d9c0 into clawpack:master Nov 9, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants