Streamlined LW calculations, OpenMP GPU support, small efficiency changes, updates to continuous integration #110

RobertPincus · 2021-04-15T03:04:56Z

Accumulated changes from branch develop

* Missed one file in last commit
* tweaks for CPU compilation
* In OpenACC: allocating types as well as data components in ACC copyins, deletes
* Ignoring things for Luis...
* Missing statement
* Aligning array sizes in kernels with arguments
* Refined argument intents for some kernels
* Workaround for PGI compiler problem with logicals
* Input sanitizing gets its own module
* Yeah, we'll need the sanitizing module too.
* OpenACC-compatible checking for max and min values
* OpenACC value checking working; OLCF makefile doesn't use managed memory
* Logical kind chosen with preprocessor flags

-- Several small bug fixes (argument intent, maximum interpolation indices, thanks to Sebastian Rast)
-- Parameterized checking for out-of-range values (works also on GPU)
-- Continuous integration with Travis (thanks to Valentin Clement)
-- Logical type defaults to Fortran; can be set to use c_bool with -DUSE_CBOOL
-- Internal build system can use environmental variables instead of specified files (Makefile.conf etc.) to define compilers, flags, choose kernel directory
-- Python scripts to automate running and testing of RFMIP examples
-- Update RFMIP examples to use version 1.2 of atmospheres file
-- End-to-end RFMIP examples on GPU are broken; fixes pending

Remove nullify statements on declaration of pointers in subroutines to ensure thread safety for mo_gas_optics_rrtmgp. When pointers are assigned in their declarations, they implicitly get the save attribute and are treated as static. This is a problem when it occurs in a threaded region, so this code was NOT thread-safe before. Removing the `=> NULL()` does not change the behavior of the code for non-threaded applications, but does ensure thread safety.
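As a rough analogy only (Python has no pointer declarations and no implicit save attribute), a default argument value is created once, when the function is defined, and is then shared by every call. That loosely resembles a Fortran local that is initialized in its declaration. The function names below are hypothetical and are not part of mo_gas_optics_rrtmgp.

```python
def accumulate_shared(value, scratch=[]):
    # The default list is created once, at definition time, and is shared by
    # every call: loosely like a Fortran local initialized in its declaration.
    scratch.append(value)
    return list(scratch)


def accumulate_local(value):
    # Fresh state on every call: loosely like declaring the pointer without
    # "=> NULL()", so that it is an ordinary (non-static) local.
    scratch = []
    scratch.append(value)
    return list(scratch)


first_shared = accumulate_shared(1)
second_shared = accumulate_shared(2)  # still sees the value from the first call
first_local = accumulate_local(1)
second_local = accumulate_local(2)    # independent of the first call
```

State that silently persists between calls is also state that concurrent callers would share, which is the thread-safety concern described above.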

Open coefficients files for read-only, rather than read-write access because we do NOT want to accidentally write to these files, nor do we want to require users to have write-permissions to load these files. Closes #31, #32.

…ript to comparison

* Shortwave RFMIP running end-to-end on GPU. Boundary conditions still on CPU.
* Upper boundary condition lives on GPU in LW no-scattering calculation.
* Moved optical props validation in rte_lw(); simplified data movement in gas optics. Source function still sloshing back and forth between host and device.
* Surface emissivity computed on GPU in LW RFMIP example
* Moved transposition of surface Planck source onto GPU, clumsily; LW RFMIP cases now running end-to-end on device.
* RFMIP boundary conditions on GPU; removing async (may add back later)
* Reorder kernels use a single source
* Moving array-zeroing routines into mo_util_array
* Single-source for array utilities
* rte_sw uses array utilities to check validity of boundary conditions
* Adding 1D array-zeroing routine
* Some SW RFMIP boundary conditions on GPU.
* Single-source for fluxes_broadband_kernels
* Removing an unneeded OpenACC data transfer
* Array value checking uses functions in mo_rte_lw; syntactic cleanup
* Correcting mal-formed Makefile
* Refined copying of one array in SW examples.
* Ben Hillman spots a GPU array being initialized on the CPU. Fixed that.

… Marti. Closes #36.

Contributions from Dmitry Alexeev from Nvidia. Dramatically improves performance of clear-sky RFMIP standalone cases on GPUs. Fixes compiler bug in PGI related to private local variables.

* tiling reorder, could still be improved by about 40%
* using kernels in the mo_util_array, faster and more compact
* constant shouldn't be explicitly allocated
* tile combine_and_reorder, almost full bandwidth now
* process several elements per thread in expensive kernels
* fix interpolation, use gang vector
* add another kernel for optical_depths_minor, only working in a special case (hopefully typical case). about 3x faster.
* avoid atomics in sum_broadband, 2.5x faster
* explicitly make private arrays stay in global memory, fixed possible errors and 8x faster due to collapsed loop
* fuse lw_source_noscat into lw_solver_noscat, faster by about 1ms on Piz Daint
* fixed subscript typo in denom
* fix out-of-bounds issue
* added comments about how the reorder kernels work
* introduce point-wise lw_source_noscat_stencil to express concepts separation while keeping fused loops
* added comments explaining bug workaround for adding routine
* fixed unrolled loop bounds and added comments
* removed unused variable
* added missing data deletion
* remove the default kernel in gas_optical_depths_minor, the optimized kernel can deal with variable number of g-points per band now

…gwave reference.

… otherwise no changes.

…PUs during calculations (#53) Should be more tightly integrated with host models already running on the GPU.

…options and doesn't download reference files automatically.

…es are larger than 1.e-5 (user-configurable)
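The behavior described for the comparison scripts (a non-zero exit when flux differences exceed a user-configurable tolerance of 1.e-5) might look roughly like the sketch below. This is not the repository's script: the function names and the flux values are made up for illustration, and plain lists stand in for whatever arrays the real scripts read.

```python
def max_abs_difference(test_values, reference_values):
    """Largest absolute difference between two equally long sequences."""
    if len(test_values) != len(reference_values):
        raise ValueError("test and reference must have the same length")
    return max(
        (abs(t - r) for t, r in zip(test_values, reference_values)),
        default=0.0,
    )


def exit_code(test_values, reference_values, tolerance=1.0e-5):
    """Return 1 (failure) if any difference exceeds the tolerance, else 0."""
    return 1 if max_abs_difference(test_values, reference_values) > tolerance else 0


# Made-up fluxes purely for illustration; not results from RFMIP.
reference = [341.2, 102.7, 0.0]
test = [341.2, 102.70002, 0.0]

strict_status = exit_code(test, reference)                   # default tolerance
loose_status = exit_code(test, reference, tolerance=1.0e-4)  # loosened tolerance
```

A script built this way would typically pass the status to `sys.exit` so that continuous integration treats an out-of-tolerance comparison as a failure.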

…ection was a squash-merge.

…ty more realistic benchmark.

…els.

Github actions is revised - the file is renamed and reorganized, and Nvidia (formerly PGI) Fortran compilers are now tested

…mber of loops 32 ->4

…ctions

Github actions reorganized, uses caching to speed up library and compiler installs. Intel Fortran temporarily removed because C compiler isn't working. Self-hosted CI with Azure now restricted to CCE compiler, and PGI 20 running on GPUs.

Intel Fortran is working again on Github actions.

@RobertPincus

This PR adds three new functions to the ty_optical_props class. In mo_optical_props.F90, finalize_1scl, finalize_2str, and finalize_nstr were added to the types ty_optical_props_1scl, ty_optical_props_2str, and ty_optical_props_nstr, respectively. There are also some modifications to the continuous integration testing from @RobertPincus.

@rscohn2

Updating package names for Nvidia and Intel compilers for continuous integrations, thanks to @rscohn2. Co-authored-by: Robert Pincus <Robert.Pincus@colorado.edu>

Introduce support for GPUs via OpenMP, thanks to Nichols Romero at Argonne National Labs. Requires Cray compilers CCE > 10.0.3. Continuous integration pending - we are testing GPU implementations on hosts that have outdated versions of the compilers. Co-authored-by: Nichols A. Romero <naromero77@users.noreply.github.com>

@naromero77

Changes to OpenMP directives to enable regression tests to run correctly with Cray 11.x compiler with the NVidia V100 backend. Untested with P100. Thanks to @naromero77

Longwave code is streamlined 1) to compute spectrally-integrated Jacobians with respect to surface temperature only when this output is requested, and 2) to combine the no-scattering and rescaled-for-scattering approaches as much as possible. Argument lists to some kernels have changed but arguments to the front-end have not. nvfortran is failing and needs debugging. OpenACC implementation may not be optimal.

Use explicit kind with PaTohPa floating point literal to be sure of precision.
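A minimal sketch of why the kind matters, written in Python because the PR text itself contains no code. It assumes that the conversion factor is 0.01 (Pa to hPa) and that a Fortran literal without a kind suffix is stored in default, typically single, precision; `struct` is used here to mimic that rounding. The helper name is hypothetical.

```python
import struct


def round_to_single(x):
    """Round a Python float (double precision) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


pa_to_hpa_double = 0.01                   # what an explicit double kind would hold
pa_to_hpa_single = round_to_single(0.01)  # what a single-precision literal holds

# Relative error introduced by passing through single precision.
relative_error = abs(pa_to_hpa_single - pa_to_hpa_double) / pa_to_hpa_double

print(f"single-precision value: {pa_to_hpa_single!r}")
print(f"relative error:         {relative_error:.2e}")
```

The single-precision value differs from the double-precision one at about the eighth significant digit, which is the kind of discrepancy an explicit kind on the literal avoids.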

…n branch

… to come

…ave moved to an organizational repo. gfortran-8 is no longer supported by Github actions and is removed. Switched back to LLNL ESGF node for validation plots but need a more robust solution to flakey ESGF nodes.

Makefiles are added and amended to build libraries, run tests, and verify results using "make libs", "make tests", "make check" respectively in the top-level directory. Environmental variables set the Fortran compiler and flags and the locations of the netCDF libraries and include files. The CSCS project used for CI is also updated.
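The three targets could be driven from a small Python wrapper, for instance in a CI job. The sketch below is only an illustration: the function names are hypothetical, and the PR text does not name the environment variables the Makefiles read, so any variable passed in `extra_env` (such as `FC` in the usage comment) is an assumption.

```python
import os
import subprocess

# Targets named in the PR text, in the order they are described.
MAKE_TARGETS = ("libs", "tests", "check")


def make_commands(top_dir="."):
    """Build one 'make -C <top_dir> <target>' command per target."""
    return [["make", "-C", top_dir, target] for target in MAKE_TARGETS]


def run_make_targets(top_dir=".", extra_env=None, runner=subprocess.run):
    """Run each target in turn, stopping at the first failure."""
    env = dict(os.environ)
    env.update(extra_env or {})
    for command in make_commands(top_dir):
        # check=True makes subprocess.run raise if make exits non-zero.
        runner(command, env=env, check=True)


# Example usage (not executed here); "FC" is an assumed variable name:
# run_make_targets(".", extra_env={"FC": "gfortran"})
```

Passing the runner as a parameter keeps the sketch testable without invoking `make`.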

RobertPincus and others added 30 commits May 12, 2019 14:23

Update README.md

2fdc779

Travis CI integration (#24)

5a5fb86

Merge branch 'master' into develop

23ef3c9

Removing unneeded USE statement (thanks to Chiel van Heerwaarden).

37d1748

Array size bug fix in compute_bc()

cf06dcf

Merge branch 'master' into develop

f279b8b

Open coefficients files for read-only access (#32)

5f3aeb4

Open coefficients files for read-only, rather than read-write access because we do NOT want to accidentally write to these files, nor do we want to require users to have write-permissions to load these files. Closes #31, #32.

Moved downloading of reference results for RFMIP from file staging sc…

6e5a78a

…ript to comparison

Updating README with DOI for overview paper.

de55554

Updating CSCS compiler and module information as suggested by Philippe…

08c6dee

… Marti. Closes #36.

Further updates to Daint modules, library paths from Philippe Marti.

be29508

Using OpenACC kernels to simplify array utilities.

22cb493

Fixed error reading RFMIP surface temperatures; updated upwelling lon…

44d9335

…gwave reference.

Merge branch 'master' into develop. Reverting to loops in zero_array,…

9f56730

… otherwise no changes.

Revised cloud optics, all-sky example, more data and computation on G…

de24173

…PUs during calculations (#53) Should be more tightly integrated with host models already running on the GPU.

Further documentation of examples, RFMIP comparison script gets some …

ce1bca3

…options and doesn't download reference files automatically.

Comparison scripts for examples have non-zero exit if flux differenc…

975bbf0

…es are larger than 1.e-5 (user-configurable)

Fixed misleading comment. Closes #52.

51fb89d

Use local directories for example run scripts

3326566

Loosening failure tolerance for RFMIP clear-sky examples to 1.e-4 W/m2.

8b82489

Missed a variable default in comparison script.

8a38ab8

Right... teach me not to try commits at home.

fcddc0e

Merge branch 'master' into develop, since last merge in the other dir…

584f40b

…ection was a squash-merge.

Remove clouds from every third column in all-sky example, as a slighl…

8737feb

…ty more realistic benchmark.

Giving all _util modules local names to avoid conflicts with host mod…

006543b

…els.

RobertPincus and others added 29 commits September 5, 2020 17:16

Enhanced CI (#88)

1a67de4

Github actions is revised - the file is renamed and reorganized, and Nvidia (formerly PGI) Fortran compilers are now tested

Peter Ukkonen suggests an efficiency change

5e2f8b8

Working CI into develop from feature-enchanced-ci

fef974d

Set env var GPTL_DIR to enable timing with GPTL in RFMIP examples; nu…

7ad6af7

…mber of loops 32 ->4

Removing ifort from CI - packages have changed and aren't working?

201e5f4

Update Python module for Azure CI

f5f4d7b

Removing CPU builds from Azure CI; these are now tested with Github A…

d2f6a46

…ctions

CI updates (#90)

5f7740d

Github actions reorganized, uses caching to speed up library and compiler installs. Intel Fortran temporarily removed because C compiler isn't working. Self-hosted CI with Azure now restricted to CCE compiler, and PGI 20 running on GPUs.

Update CI (#91)

5cde584

Intel Fortran is working again on Github actions.

Continuous integration: package name changes (#98)

d4bc936

Updating package names for Nvidia and Intel compilers for continuous integrations, thanks to @rscohn2. Co-authored-by: Robert Pincus <Robert.Pincus@colorado.edu>

CI: update netcdf-c and netcdf-fortran; add gfortran-10 in tests

0d01651

Updating validation plotting script to use DKRZ ESGF node

0366e6e

OpenMP GPU fixes for the Cray 11.x compiler (#102)

8a4598d

Changes to OpenMP directives to enable regression tests to run correctly with Cray 11.x compiler with the NVidia V100 backend. Untested with P100. Thanks to @naromero77

Fix a floating point literal (#104)

2923ad4

Use explicit kind with PaTohPa floating point literal to be sure of precision.

Switch to CCE 11.0.0 for continuous integration

a0489dc

Improved error handling for optical properties

fb84634

Fortran error stop in all examples run during continuous integration

40a1a59

Hotfix: missing values for Jacobians in CPU kernels

5c2d274

Omitting ifort from CI since the compiler repo is failing

ae84999

Trying ifort as a container

9f5930d

Nvidia containers?

edb3450

No sudo for you

0316e7a

Reverting to full CI - container testing was supposed to be on its ow…

769da43

…n branch

Continuous integration in Docker containers: ifort for now, nvfortran…

7bc5fa0

… to come

RobertPincus merged commit aca56b7 into main Apr 15, 2021
