Streamlined LW calculations, OpenMP GPU support, small efficiency changes, updates to continuous integration #110

Merged
merged 112 commits into main from develop
Apr 15, 2021


RobertPincus
Member

Accumulated changes from branch develop

RobertPincus and others added 30 commits May 12, 2019 14:23
* Missed one file in last commit

* tweaks for CPU compilation

* In OpenACC: allocating types as well as data components in ACC copyins, deletes

* Ignoring things for Luis...

* Missing statement

* Aligning array sizes in kernels with arguments

* Refined argument intents for some kernels

* Workaround for PGI compiler problem with logicals

* Input sanitizing gets its own module

* Yeah, we'll need the sanitizing module too.

* OpenACC-compatible checking for max and min values

* OpenACC value checking working; OLCF makefile doesn't use managed memory

* Logical kind chosen with preprocessor flags
-- Several small bug fixes (argument intent, maximum interpolation indices, thanks to Sebastian Rast)
-- Parameterized checking for out-of-range values (works also on GPU)
-- Continuous integration with Travis (thanks to Valentin Clement)
-- Logical type defaults to Fortran; can be set to use c_bool with -DUSE_CBOOL
-- Internal build system can use environment variables instead of specified files (Makefile.conf etc.) to define compilers and flags and to choose the kernel directory
-- Python scripts to automate running and testing of RFMIP examples
-- Update RFMIP examples to use version 1.2 of atmospheres file
-- End-to-end RFMIP examples on GPU are broken; fixes pending
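The compile-time choice of logical kind listed above can be sketched roughly as follows. This is a minimal illustration, not the library's source: the module name `demo_logical_kind`, the kind parameter `wl`, and the variable `top_at_1` are assumptions made for the example; only the `-DUSE_CBOOL` flag and `c_bool` come from the notes above. The file needs to be run through the preprocessor (for example by giving it a `.F90` suffix).

```fortran
! Build with preprocessing enabled; add -DUSE_CBOOL to select the
! C-interoperable logical kind instead of the Fortran default.
module demo_logical_kind            ! hypothetical module name
  use, intrinsic :: iso_c_binding, only: c_bool
  implicit none
#ifdef USE_CBOOL
  integer, parameter :: wl = c_bool          ! C-interoperable logical
#else
  integer, parameter :: wl = kind(.true.)    ! default Fortran logical
#endif
end module demo_logical_kind

program logical_kind_demo
  use demo_logical_kind, only: wl
  implicit none
  logical(wl) :: top_at_1                    ! hypothetical flag

  top_at_1 = .true.
  print *, 'logical kind in use:', kind(top_at_1), ' value:', top_at_1
end program logical_kind_demo
```

Keeping the kind in one named parameter means every declaration written as `logical(wl)` follows the flag without further edits.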
Remove nullify statements on declaration of pointers in subroutines to ensure
thread safety for mo_gas_optics_rrtmgp. When pointers get assigned in
declarations, they implicitly get a save attribute and are assumed static. This
is a problem when this occurs in a threaded region, so this code was NOT
thread-safe before. Removing the `=> NULL()` does not change the behavior of the
code for non-threaded applications, but does ensure thread-safety.
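The implicit-save behavior described above can be reproduced in a few lines. This is a stand-alone sketch with hypothetical names (`unsafe_entry_state`, `safe_entry_state`, `work`), not code from mo_gas_optics_rrtmgp; the explicit `nullify` in the second function is added only so the demo can query the pointer safely.

```fortran
program pointer_save_demo
  implicit none
  logical :: first_call, second_call

  first_call  = unsafe_entry_state()
  second_call = unsafe_entry_state()
  print *, 'initialised in declaration:  ', first_call, second_call

  first_call  = safe_entry_state()
  second_call = safe_entry_state()
  print *, 'nullified in executable code:', first_call, second_call

contains

  ! "=> null()" in the declaration gives work the SAVE attribute, so the
  ! association made in one call is still there on the next call.
  function unsafe_entry_state() result(was_associated)
    logical :: was_associated
    real, pointer :: work(:) => null()
    was_associated = associated(work)
    if (.not. was_associated) allocate(work(4))
  end function unsafe_entry_state

  ! Without the initialiser the pointer is an ordinary local variable; it
  ! is nullified explicitly before use and released before returning.
  function safe_entry_state() result(was_associated)
    logical :: was_associated
    real, pointer :: work(:)
    nullify(work)
    was_associated = associated(work)
    allocate(work(4))
    deallocate(work)
  end function safe_entry_state

end program pointer_save_demo
```

The first pair of calls reports false then true, because the saved pointer keeps its association between calls; the second pair reports false both times. In a threaded region the saved pointer would be a single object shared by all threads, which is the race the change above removes.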
Open coefficient files for read-only rather than read-write access,
because we do NOT want to accidentally write to these files, nor do we
want to require users to have write permissions to load these files.

Closes #31, #32.
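As a rough illustration of the read-only idea, the sketch below uses plain Fortran I/O as a stand-in; it is not the library's netCDF code. With the netCDF Fortran interface the analogous choice is passing `NF90_NOWRITE` rather than `NF90_WRITE` to `nf90_open`. The file name `demo_coefficients.txt` and the value written to it are invented for the example.

```fortran
program read_only_open_demo
  implicit none
  integer :: u, ios
  real    :: coeff

  ! Create a small stand-in "coefficients" file (hypothetical name/content).
  open(newunit=u, file='demo_coefficients.txt', status='replace', action='write')
  write(u, *) 1.5
  close(u)

  ! Reopen it for reading only: nothing in this block can modify the file,
  ! and the open succeeds even where the user has no write permission.
  open(newunit=u, file='demo_coefficients.txt', status='old', &
       action='read', iostat=ios)
  if (ios /= 0) error stop 'could not open coefficients file for reading'
  read(u, *) coeff
  close(u)

  print *, 'coefficient read back:', coeff
end program read_only_open_demo
```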
* Shortwave RFMIP running end-to-end on GPU. Boundary conditions still on CPU.

* Upper boundary condition lives on GPU in LW no-scattering calculation.

* Moved optical props validation in rte_lw(); simplified data movement in gas optics. Source function still sloshing back and forth between host and device.

* Surface emissivity computed on GPU in LW RFMIP example

* Moved transposition of surface Planck source onto GPU, clumsily; LW RFMIP cases now running end-to-end on device.

* RFMIP boundary conditions on GPU; removing async (may add back later)

* Reorder kernels use a single source

* Moving array-zeroing routines into mo_util_array

* Single-source for array utilities

* rte_sw uses array utilities to check validity of boundary conditions

* Adding 1D array-zeroing routine

* Some SW RFMIP boundary conditions on GPU.

* Single-source for fluxes_broadband_kernels

* Removing an unneeded OpenACC data transfer

* Array value checking uses functions in mo_rte_lw; syntactic cleanup

* Correcting malformed Makefile

* Refined copying of one array in SW examples.

* Ben Hillman spots a GPU array being initialized on the CPU. Fixed that.
Contributions from Dmitry Alexeev from Nvidia. Dramatically improves performance of clear-sky RFMIP standalone cases on GPUs. Fixes compiler bug in PGI related to private local variables. 

* tiling reorder, could still be improved by about 40%

* using kernels in the mo_util_array, faster and more compact

* constant shouldn't be explicitly allocated

* tile combine_and_reorder, almost full bandwidth now

* process several elements per thread in expensive kernels

* fix interpolation, use gang vector

* add another kernel for optical_depths_minor, only working in a special case (hopefully typical case). about 3x faster.

* avoid atomics in sum_broadband, 2.5x faster

* explicitly make private arrays stay in global memory, fixed possible errors and 8x faster due to collapsed loop

* fuse lw_source_noscat into lw_solver_noscat, faster by about 1ms on Piz Daint

* fixed subscript typo in denom

* fix out-of-bounds issue

* added comments about how the reorder kernels work

* introduce point-wise lw_source_noscat_stencil to express concepts separation while keeping fused loops

* added comments explaining bug workaround for adding routine

* fixed unrolled loop bounds and added comments

* removed unused variable

* added missing data deletion

* remove the default kernel in gas_optical_depths_minor, the optimized kernel can deal with variable number of g-points per band now
…PUs during calculations (#53)

Should be more tightly integrated with host models already running on the GPU.
…options and doesn't download reference files automatically.
…es are larger than 1.e-5 (user-configurable)
RobertPincus and others added 29 commits September 5, 2020 17:16
Github actions is revised - the file is renamed and reorganized, and Nvidia (formerly PGI) Fortran compilers are now tested
Github actions reorganized, uses caching to speed up library and compiler installs. Intel Fortran temporarily removed because C compiler isn't working.
Self-hosted CI with Azure now restricted to CCE compiler, and PGI 20 running on GPUs.
Intel Fortran is working again on Github actions.
This PR adds three new functions to the ty_optical_props class.
In mo_optical_props.F90, finalize_1scl, finalize_2str, and finalize_nstr were added to the types ty_optical_props_1scl, ty_optical_props_2str, and ty_optical_props_nstr, respectively.
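A finalize procedure of this kind might look roughly like the sketch below. It assumes a user-called, type-bound cleanup routine that releases allocatable components; the type `ty_demo_1scl`, its `tau` component, and the `alloc` binding are hypothetical stand-ins, and only the name `finalize_1scl` is taken from the description above.

```fortran
module demo_optical_props               ! hypothetical module
  implicit none

  type :: ty_demo_1scl                  ! stand-in for a 1scl-style type
    real, allocatable :: tau(:,:,:)     ! optical depth (ncol, nlay, ngpt)
  contains
    procedure :: alloc    => alloc_1scl
    procedure :: finalize => finalize_1scl
  end type ty_demo_1scl

contains

  subroutine alloc_1scl(this, ncol, nlay, ngpt)
    class(ty_demo_1scl), intent(inout) :: this
    integer,             intent(in)    :: ncol, nlay, ngpt
    call this%finalize()                ! release any earlier allocation
    allocate(this%tau(ncol, nlay, ngpt))
  end subroutine alloc_1scl

  subroutine finalize_1scl(this)
    class(ty_demo_1scl), intent(inout) :: this
    if (allocated(this%tau)) deallocate(this%tau)
  end subroutine finalize_1scl

end module demo_optical_props

program finalize_demo
  use demo_optical_props, only: ty_demo_1scl
  implicit none
  type(ty_demo_1scl) :: props

  call props%alloc(2, 3, 4)
  print *, 'allocated after alloc:   ', allocated(props%tau)
  call props%finalize()
  print *, 'allocated after finalize:', allocated(props%tau)
end program finalize_demo
```

Guarding the deallocation with `allocated` makes the routine safe to call more than once.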

There are some additional modifications of the continuous integration testing from @RobertPincus
Updating package names for Nvidia and Intel compilers for continuous integration, thanks to @rscohn2.
Co-authored-by: Robert Pincus <Robert.Pincus@colorado.edu>
Introduce support for GPUs via OpenMP, thanks to Nichols Romero at Argonne National Laboratory. Requires Cray compilers CCE > 10.0.3. Continuous integration pending: we are testing GPU implementations on hosts that have outdated versions of the compilers.

Co-authored-by: Nichols A. Romero <naromero77@users.noreply.github.com>
Changes to OpenMP directives to enable regression tests to run correctly with Cray 11.x compiler with the NVidia V100 backend. Untested with P100. Thanks to @naromero77
Longwave code is streamlined 1) to compute spectrally-integrated Jacobians with respect to surface temperature only when this output is requested, and 2) to combine the no-scattering and rescaled-for-scattering approaches as much as possible. Argument lists to some kernels have changed but arguments to the front-end have not. nvfortran is failing and needs debugging. OpenACC implementation may not be optimal.
Use explicit kind with PaTohPa floating point literal to be sure of precision.
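The effect of the kind suffix can be seen in a short stand-alone program. This is a sketch under assumptions: the working-precision parameter `wp` is taken to be a 64-bit real, the constant names are invented, and 0.01 is simply the Pa-to-hPa conversion factor. It also assumes the compiler's default real is single precision, which build flags can change.

```fortran
program kind_literal_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: wp = real64     ! assumed working precision

  ! Literal without a suffix: evaluated as a default real, then converted.
  real(wp), parameter :: pa_to_hpa_default  = 0.01
  ! Literal with an explicit kind: evaluated directly at working precision.
  real(wp), parameter :: pa_to_hpa_explicit = 0.01_wp

  print *, 'without kind suffix:', pa_to_hpa_default
  print *, 'with kind suffix:   ', pa_to_hpa_explicit
  print *, 'difference:         ', abs(pa_to_hpa_default - pa_to_hpa_explicit)
end program kind_literal_demo
```

Where the default real is single precision, the unsuffixed constant carries only about seven significant digits, so the two values differ by a small nonzero amount.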
…ave moved to an organizational repo. gfortran-8 is no longer supported by Github actions and is removed. Switched back to LLNL ESGF node for validation plots but need a more robust solution to flaky ESGF nodes.
Makefiles are added and amended to build libraries, run tests, and verify results using "make libs", "make tests", and "make check" respectively in the top-level directory. Environment variables set the Fortran compiler and flags and the locations of the netCDF libraries and include files.

The CSCS project used for CI is also updated.
@RobertPincus RobertPincus merged commit aca56b7 into main Apr 15, 2021