We've been getting a number of questions, comments, and pull requests from folks interested in accelerating VIC with OpenMP and manycore architectures - specifically targeting the highly parallel Intel Xeon Phi co-processor, CUDA, and NVCC. It sounds like this is all coming from an accelerated computing course at Portland State University. This issue is meant to consolidate the conversations these individuals are trying to have.
My group and I will be working to accelerate the UW-Hydro VIC program. This will be our term project for an accelerated computing course at Portland State University. Our hope is to identify the computationally intensive parts of the model and accelerate them using the highly parallel Intel Xeon Phi co-processor. The idea is that if we can decrease the run time of your model, you will be able to run larger samples and achieve more accurate results.
Are you aware of any particular section of the code that would benefit from this type of modification?
Do you have any thoughts or advice for us before we dig in?
Thank you for your time,
Nick McHale
Portland State University
My reply:
Hi Nick -
Thanks for getting in touch. I'm also cc'ing Bart Nijssen, my PhD advisor and other maintainer of the VIC model.
I'd be happy to give you some pointers on where to focus your efforts - provided, of course, that you share your improvements with us at the end. As you may know, we've just completed a major refactor of the VIC model (VIC 5.0 was released this past summer). The new version includes a configuration that we've named the "image driver" (docs, github). This driver is targeted at HPC use and includes parallelization via MPI.
Are you aware of any particular section of the code that would benefit from this type of modification?
We have done some initial profiling (github) and parallel scaling analysis (blog post). At the driver level, there are probably two main bottlenecks in the code where optimization could occur and where we'd see some appreciable speedup in the model:
I/O - Nearly all of the I/O in the VIC image driver is done using the netCDF library. Currently, VIC reads/writes in fairly small chunks (all variables and grid cells for a single timestep). We suspect the frequency of disk I/O calls is a limiting factor.
Parallelization strategy - VIC is not using any explicit threading at the moment. I played around (github) with adding OpenMP to our main parallelization loop a while back but didn't have time to finish that off. Since memory requirements in VIC are relatively small, a many-processor hybrid MPI/OpenMP adaptation of the parallelization in VIC may be really interesting (github).
Do you have any thoughts or advice for us before we dig in?
Yes.
The main thing is you should start by doing some profiling of the parallelization in VIC. This would be useful to you for your project and for us in understanding how everything is working.
You should work off the develop branch since there are some I/O speedups that have just been implemented (github).
You probably also want to work off a fairly large dataset. We have a test dataset we can share with you for this purpose.
Pending an initial look at profiling, I would offer three possible development projects of increasing complexity:
Implement hybrid MPI/OpenMP parallelization
Implement netCDF's parallel I/O (github) and/or develop a mechanism to read/write larger chunks of data in individual write calls.
Experiment with VIC's Python driver. Right now there is just a stub for this driver but, in theory, all the I/O and parallelization could be easily handled in Python. If any of you are Pythonistas, this could be a fun, albeit open-ended, project.
Happy to answer other questions you may have.
Cheers,
Joe Hamman
From Nick:
Joe,
Of course we would share any optimizations we make!
We would like to do some initial profiling on a large dataset. Can you please connect us to the dataset you mentioned?
I'm going to close this as we never really got much of a response from Nick and co. If anyone is interested in taking this further, feel free to reopen.
References:#521, #522
Closed PRs: #689, #691