Moist Held-Suarez is qualitatively different between CPU and GPU #2876

Open
LenkaNovak opened this issue Apr 5, 2024 · 3 comments

Comments

@LenkaNovak

LenkaNovak commented Apr 5, 2024

See the stand-alone moist Held-Suarez atmos runs here:
https://buildkite.com/clima/climacoupler-longruns/builds/514

These runs use this script, which mimics the atmos driver.

Some plots (100 days, i.e. ten 10-day averages):

Zonal mean wind
[image: Screen Shot 2024-04-04 at 7 07 43 PM]

Zonal mean temperature
[image: Screen Shot 2024-04-04 at 7 08 11 PM]

1st level temperature
[image: Screen Shot 2024-04-04 at 7 09 00 PM]

Instantaneous 100d rhoe_tot (first level)
[image: Screen Shot 2024-04-04 at 7 16 50 PM]

@charleskawczynski

I think the first step in narrowing down this issue is to make sure that we get the same result, to within machine precision, on CPU and GPU. Differences could arise wherever the order of operations differs between the two implementations, in particular in reductions and DSS.
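To illustrate why the order of operations matters, here is a minimal sketch in plain Julia (no GPU needed). The helper names `sequential_sum` and `tree_sum` are made up for this example and are not ClimaCore functions. Floating-point addition is not associative, so a left-to-right loop and a tree-shaped reduction over the same data can round differently.

```julia
# Sequential left-to-right sum, as a plain serial loop would compute it.
sequential_sum(x) = foldl(+, x)

# Pairwise (tree) sum, a grouping that parallel reductions commonly use.
# Hypothetical helper for illustration only.
function tree_sum(x)
    n = length(x)
    n == 1 && return x[1]
    mid = n ÷ 2
    return tree_sum(x[1:mid]) + tree_sum(x[(mid + 1):n])
end

# Regrouping three terms already changes the last bit.
a, b, c = 0.1, 0.2, 0.3
println((a + b) + c == a + (b + c))   # false

# A deliberately ill-conditioned input whose exact sum is 2.
x = [1e16, 1.0, -1e16, 1.0]
println(sequential_sum(x))   # 1.0
println(tree_sum(x))         # 0.0
```

The input is extreme on purpose. For well-conditioned fields the two orderings usually agree to within a few units in the last place, but not bitwise.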

@LenkaNovak

Thanks, @charleskawczynski, this sounds like a good way forward.

@juliasloan25 has already run some machine-precision tests of atmos states, and she has seen departures after the first timestep (CliMA/ClimaCoupler.jl#614). I don't think we've looked any deeper yet, though. Do we already have any CPU-GPU consistency tests in ClimaCore? I thought someone mentioned them recently, but I'm not sure where to look.
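For reference, a consistency check of this kind could look roughly like the sketch below. It is not an existing ClimaCore or ClimaCoupler test. `compare_states` is a hypothetical helper, and it assumes both states have already been extracted as plain arrays of the same shape (the GPU one can be copied to the host with `Array(...)`).

```julia
# Hypothetical helper: compare two state arrays (e.g. one from a CPU run and
# one copied back from a GPU run) and report how far apart they are.
function compare_states(cpu::AbstractArray, gpu::AbstractArray)
    a, b = Array(cpu), Array(gpu)   # Array(...) copies device data to the host
    size(a) == size(b) || error("state arrays have different sizes")
    FT = eltype(a)
    bitwise = isequal(a, b)         # true only if every element matches
    maxabs = maximum(abs.(a .- b))
    # Guard the denominator so entries that are zero in both arrays are safe.
    maxrel = maximum(abs.(a .- b) ./ max.(abs.(a), abs.(b), floatmin(FT)))
    return (; bitwise, maxabs, maxrel)
end

# Usage with stand-in data: perturb one entry by a single unit in the last place.
cpu_state = [1.0, 2.0, 3.0]
gpu_state = [1.0, nextfloat(2.0), 3.0]
result = compare_states(cpu_state, gpu_state)
println(result.bitwise)                  # false
println(result.maxabs == eps(2.0))       # true
println(result.maxrel <= 8 * eps(Float64))
```

Reporting `maxrel` in multiples of `eps` after one timestep makes it easier to tell rounding-level differences apart from a real bug.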

@charleskawczynski

charleskawczynski commented Apr 5, 2024

Here is one place in ClimaCore where I'd be surprised to see bitwise equality:

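function reduce_cuda_blocks_kernel!(
    reduce_cuda::AbstractArray{T, 2},
    op,
    ::Val{shmemsize},
) where {T, shmemsize}
    blksize = blockDim().x
    fidx = blockIdx().x
    tidx = threadIdx().x
    nitems = size(reduce_cuda, 1)
    # Number of extra strided loads per thread beyond its first element
    nloads = cld(nitems, blksize) - 1
    # Per-block scratch space in shared memory
    reduction = CUDA.CuStaticSharedArray(T, shmemsize)

    # Each thread starts from its own element of column fidx
    reduction[tidx] = reduce_cuda[tidx, fidx]

    # Each thread folds in elements that are blksize apart, so the grouping
    # of operations differs from a sequential loop over the column
    for i in 1:nloads
        idx = tidx + blksize * i
        if idx ≤ nitems
            reduction[tidx] = op(reduction[tidx], reduce_cuda[idx, fidx])
        end
    end

    # Synchronize only when the block spans more than one warp (32 threads)
    blksize > 32 && sync_threads()
    _cuda_intrablock_reduce!(op, reduction, tidx, blksize)

    # Thread 1 writes the block's result back to the first row
    tidx == 1 && (reduce_cuda[1, fidx] = reduction[1])
    return nothing
end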

That said, I'm not even sure if/where this comes into a simulation.
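To make the ordering concrete, here is a rough CPU emulation of the strided accumulation in the kernel above. `emulate_block_reduce` is a hypothetical helper written for this comment. The body of `_cuda_intrablock_reduce!` is not shown here, so the halving tree used for the final combine is an assumption, as is the requirement that `blksize` is a power of two no larger than the column length.

```julia
# Hypothetical CPU emulation of one block of the kernel above.
# Assumes the intrablock step is a halving tree and blksize is a power of two.
function emulate_block_reduce(op, column::AbstractVector, blksize::Int)
    nitems = length(column)
    (ispow2(blksize) && blksize <= nitems) || error("unsupported blksize")
    partial = column[1:blksize]            # each "thread" starts from its own element
    nloads = cld(nitems, blksize) - 1
    for tidx in 1:blksize, i in 1:nloads
        idx = tidx + blksize * i           # same strided indexing as the kernel
        if idx <= nitems
            partial[tidx] = op(partial[tidx], column[idx])
        end
    end
    stride = blksize ÷ 2
    while stride >= 1                      # assumed tree combine of the partials
        for tidx in 1:stride
            partial[tidx] = op(partial[tidx], partial[tidx + stride])
        end
        stride ÷= 2
    end
    return partial[1]
end

column = [1e16, 1.0, -1e16, 1.0]                 # exact sum is 2
println(foldl(+, column))                        # 1.0 (sequential order)
println(emulate_block_reduce(+, column, 2))      # 2.0 (strided order)
```

With two emulated threads, thread 1 adds `1e16` and `-1e16` while thread 2 adds the two ones, so the strided grouping happens to recover the exact answer where the sequential loop loses a unit. Either way, the two orderings do not agree bitwise.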

@szy21 szy21 added this to the Maintenance and Improvements milestone Apr 23, 2024