We've been getting a number of questions, comments, and pull requests from folks interested in accelerating VIC with OpenMP and manycore architectures - specifically targeting the highly parallel Intel Xeon Phi co-processor, CUDA, and NVCC. It sounds like this is all coming from an accelerated computing course at Portland State University. This issue is meant to consolidate the conversations these individuals are trying to have.
My group and I will be working to accelerate the UW-Hydro VIC program. This will be our term project for an accelerated computing course at Portland State University. Our hope is to identify the computationally intensive parts of the model and accelerate them using the highly parallel Intel Xeon Phi co-processor. The idea is that if we can decrease the run time of your model, you will be able to run larger samples and achieve more accurate results.
Are you aware of any particular section of the code that would benefit from this type of modification?
Do you have any thoughts or advice for us before we dig in?
Thank you for your time,
Nick McHale
Portland State University
My reply:
Hi Nick -
Thanks for getting in touch. I'm also cc'ing Bart Nijssen, my PhD advisor and other maintainer of the VIC model.
I'd be happy to give you some pointers on where to focus your efforts - provided, of course, that you share your improvements with us at the end. As you may know, we've just completed a major refactor of the VIC model (VIC 5.0 was released this past summer). The new version includes a configuration that we've named the "image driver" (docs, github). This driver is targeted at HPC use and includes parallelization via MPI.
Are you aware of any particular section of the code that would benefit from this type of modification?
We have done some initial profiling (github) and parallel scaling analysis (blog post). At the driver level, there are probably two main bottlenecks in the code where optimization could occur and where we'd see some appreciable speedup in the model:
I/O - Nearly all of the I/O in the VIC image driver is done using the netCDF library. Currently, VIC reads/writes in fairly small chunks (all variables and grid cells for a single timestep). We suspect the frequency of disk I/O calls is a limiting factor.
Parallelization strategy - VIC is not using any explicit threading at the moment. I played around (github) with adding OpenMP to our main parallelization loop a while back but didn't have time to finish that off. Since memory requirements in VIC are relatively small, a many-processor hybrid MPI/OpenMP adaptation of the parallelization in VIC may be really interesting (github).
Do you have any thoughts or advice for us before we dig in?
Yes.
The main thing is you should start by doing some profiling of the parallelization in VIC. This would be useful to you for your project and for us in understanding how everything is working.
You should work off the develop branch since there are some I/O speedups that have just been implemented (github).
You probably also want to work off a fairly large dataset. We have a test dataset we can share with you for this purpose.
Pending an initial look at profiling, I would offer three possible development projects of increasing complexity:
Implement hybrid MPI/OpenMP parallelization
Implement netCDF's parallel I/O (github) and/or develop a mechanism to read/write larger chunks of data in individual write calls.
Experiment with VIC's Python driver. Right now there is just a stub for this driver but, in theory, all the I/O and parallelization could be easily handled in Python. If any of you are Pythonistas, this could be a fun, albeit open-ended, project.
Happy to answer other questions you may have.
Cheers,
Joe Hamman
From Nick:
Joe,
Of course we would share any optimizations we make!
We would like to do some initial profiling on a large dataset. Can you please connect us to the dataset you mentioned?
I'm going to close this as we never really got much of a response from Nick and co. If anyone is interested in taking this further, feel free to reopen.
References:#521, #522
Closed PRs: #689, #691