Failed to compile PTX code ... uses too much parameter space #2700

Closed
jagoosw opened this issue Aug 17, 2022 · 11 comments
Labels
GPU 👾 Where Oceananigans gets its powers from

Comments

@jagoosw
Collaborator

jagoosw commented Aug 17, 2022

Hi all,

I'm currently building a model with lots of forced tracers with @johnryantaylor and @syou83syou83 and have come across an issue that I'm struggling to find a way around related to GPU compilation.

I've spent a while looking through your previous similar issues (e.g. PR #746) and going through the error message (below), and I'm fairly sure that it's coming from the size of the parameters passed to the boundary tendency functions being too large.

I have looked through the source and while I think it would be possible to reduce the parameter size for the velocity tendencies by only passing them the T and S tracers as required (see the first, nonworking, step towards this here) I don't see how this problem can be solved in the tracer_tendency function since tracer forcing/boundaries may depend on any number of fields. I also realize that the choice to pass all the tracers may be required for something else I've missed.

I can have a go at overhauling the tracer_tendency function etc. based on this suggestion and this, but am not sure if it's necessarily the best/nicest solution?

Edit: Possibly could just remove a lot of the information from the fields using the adapt_structure method from PR #1057?

Thanks!

LoadError: Failed to compile PTX code (ptxas exited with code 255)
ptxas /tmp/jl_4JwMaF.ptx, line 4214; error   : Entry function '_Z29julia_gpu_calculate_Gu__1315516CompilerMetadataI10StaticSizeI10_3__3__33_E12DynamicCheckvv7NDRangeILi3ES0_I10_1__1__33_ES0_I11_16__16__1_EvvEE11OffsetArrayI7Float64Li3E13CuDeviceArrayIS4_Li3ELi1EEE15RectilinearGridIS4_8PeriodicS7_7BoundedS4_S4_S3_IS4_Li1ES5_IS4_Li1ELi1EEES3_IS4_Li1E12StepRangeLenIS4_14TwicePrecisionIS4_ES10_IS4_E5Int64EES3_IS4_Li1ES9_IS4_S10_IS4_ES10_IS4_ES11_EES3_IS4_Li1ES5_IS4_Li1ELi1EEEvE22UpwindBiasedFifthOrder6FPlaneIS4_Ev17ScalarDiffusivityI26ExplicitTimeDiscretization27ThreeDimensionalFormulation3___10NamedTupleI57__b___NO____NH____P___Z___D___DD___DOM___DIC___ALK___OXY_5TupleIS17_S17_S17_S17_S17_S17_S17_S17_S17_S17_S17_EEE17BoundaryConditionI4FluxvE8BuoyancyI14BuoyancyTracer10ZDirectionES18_I23__velocities___tracers_S19_IS18_I12__u___v___w_S19_I9ZeroFieldIS11_Li3EES25_IS11_Li3EES25_IS11_Li3EEEES18_I57__b___NO____NH____P___Z___D___DD___DOM___DIC___ALK___OXY_S19_IS25_IS11_Li3EES25_IS11_Li3EES25_IS11_Li3EES25_IS11_Li3EES25_IS11_Li3EES25_IS11_Li3EES25_IS11_Li3EES25_IS11_Li3EES25_IS11_Li3EES25_IS11_Li3EES25_IS11_Li3EEEEEES18_I12__u___v___w_S19_IS3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEEEES18_I57__b___NO____NH____P___Z___D___DD___DOM___DIC___ALK___OXY_S19_IS3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEES3_IS4_Li3ES5_IS4_Li3ELi1EEEEEvS18_I69__u___v___w___b___NO____NH____P___Z___D___DD___DOM___DIC___ALK___OXY_S19_I12_zeroforcingS26_S26_S26_17ContinuousForcingI6CenterS28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl
_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE12_NO__forcingvS19_IS11_S11_S11_S11_S11_S11_S11_ES19_I10_identity410_identity510_identity110_identity210_identity3S30_S31_EES27_IS28_S28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE12_NH__forcingvS19_IS11_S11_S11_S11_S11_S11_S11_ES19_IS32_S33_S34_S30_S31_S32_S33_EES27_IS28_S28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE10_P_forcingvS19_IS11_S11_S11_S11_S11_S11_S11_ES19_IS34_S30_S31_S32_S33_S34_S30_EES27_IS28_S28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE10_Z_forcingvS19_IS11_S11_S11_S11_S11_S11_S11_ES19_IS31
_S32_S33_S34_S30_S31_S32_EE16MultipleForcingsILi2ES19_IS27_IS28_S28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE10_D_forcingvS19_IS11_S11_S11_S11_S11_S11_S11_ES19_IS33_S34_S30_S31_S32_S33_S34_EE16AdvectiveForcingIS18_I12__u___v___w_S19_IS25_IS11_Li3EES25_IS11_Li3EES3_IS4_Li3ES5_IS4_Li3ELi1EEEEES12_7_div_UcS3_IS4_Li3ES5_IS4_Li3ELi1EEEEEES38_ILi2ES19_IS27_IS28_S28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE11_DD_forcingvS19_IS11_S11_S11_S11_S11_S11_S11_ES19_IS30_S31_S32_S33_S34_S30_S31_EES40_IS18_I12__u___v___w_S19_IS25_IS11_Li3EES25_IS11_Li3EES3_IS4_Li3ES5_IS4_Li3ELi1EEEEES12_S41_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEEES27_IS28_S28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE12_DOM_forcingvS19_IS11_S11_S11_S11_S11_S11_S11_ES19_IS32_S
33_S34_S30_S31_S32_S33_EES27_IS28_S28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE12_DIC_forcingvS19_IS11_S11_S11_S11_S11_S11_S11_S11_S11_ES19_IS34_S30_S31_S32_S33_S34_S30_S31_S32_EES27_IS28_S28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE12_ALK_forcingvS19_IS11_S11_S11_S11_S11_S11_S11_S11_S11_ES19_IS33_S34_S30_S31_S32_S33_S34_S30_S31_EES27_IS28_S28_S28_S18_I291__p____g_z___K_z___k_r0___k_b0_____rp_____bp___e_r___e_b___r_pig___K_par_______K_no____K_nh____v_dd_min___v_dd_max___V_d___V_dd_________p___a_z___m_z_____z___m_p_____d_____dd_________n_____p_____z_____d_____dd___Rd_phy___Rd_dom___Rd_chl_____caco3___Rd_oxy___Rd_nit___f_z___f_d_____dom___PAR_S19_IS4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S4_S11_S11_S4_S4_S4_S4_S4_S4_S4_S4_S4_S3_IS4_Li3ES5_IS4_Li3ELi1EEEEE12_OXY_forcingvS19_IS11_S11_S11_S11_S11_S11_S11_S11_ES19_IS32_S33_S34_S30_S31_S32_S33_S34_EEEES3_IS4_Li3ES5_IS4_Li3ELi1EEES18_I27__time___iteration___stage_S19_IS4_S11_S11_EE' uses too much parameter space (0x19a8 bytes, 0x1100 max).
ptxas fatal   : Ptx assembly aborted due to errors
If you think this is a bug, please file an issue and attach /tmp/jl_4JwMaF.ptx
in expression starting at /nfs/st01/hpc-atmos-jrt51/js2430/OceanBioME.jl/examples/subpolar.jl:223
     (stacktrace)
      (user)
 >     Base
   +    error ./error.jl:33
       CUDA
   +    cufunction_compile ~/.julia/packages/CUDA/DfvRa/src/c
   +   [inlined]
       GPUCompiler
   +    JuliaContext ~/.julia/packages/GPUCompiler/N98un/src/
v      CUDA

(Apologies, this error message is a bit mangled because I use InteractiveErrors.)
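
For reference, the two hexadecimal sizes at the end of the ptxas message can be converted to decimal with a few lines of Julia. This is only arithmetic on the numbers printed in the error; the variable names are illustrative.

```julia
# Sizes reported by ptxas in the error above, written as hexadecimal byte counts.
used_bytes  = Int(0x19a8)   # parameter space requested by the kernel
limit_bytes = Int(0x1100)   # maximum reported by ptxas

# How far over the limit the kernel arguments are.
excess_bytes = used_bytes - limit_bytes

println("used:   ", used_bytes, " bytes")
println("limit:  ", limit_bytes, " bytes")
println("excess: ", excess_bytes, " bytes")
```

That is 6568 bytes of kernel arguments against a 4352-byte limit, so the arguments are roughly 1.5 times the allowed size.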

@navidcy
Collaborator

navidcy commented Aug 17, 2022

Can you please report which version of Oceananigans you are using?

@jagoosw
Collaborator Author

jagoosw commented Aug 18, 2022

Apologies, currently using v0.76.1. I can try to update if you think that could solve it?

@navidcy
Collaborator

navidcy commented Aug 19, 2022

Try v0.76.8 please.

@glwagner
Member

I can have a go at overhauling the tracer_tendency function etc. based on JuliaGPU/CUDA.jl#267 (comment) suggestion and #722, but am not sure if it's necessarily the best/nicest solution?

Why are we overhauling the tracer tendency function? The problem appears to be in calculate_Gu, right? Eg

ptxas /tmp/jl_4JwMaF.ptx, line 4214; error   : Entry function '_Z29julia_gpu_calculate_Gu__ ...

I have looked through the source and while I think it would be possible to reduce the parameter size for the velocity tendencies by only passing them the T and S tracers as required (see the first, nonworking, step towards this here) I don't see how this problem can be solved in the tracer_tendency function since tracer forcing/boundaries may depend on any number of fields. I also realize that the choice to pass all the tracers may be required for something else I've missed.

It's just that we support user specification of velocity forcing functions that depend on any of the tracer fields. We could discontinue support for this though.

As for passing T, S --- just to clarify, you mean to pass any tracers that are involved in the calculation of the buoyancy perturbation? (This could be buoyancy b, only T, only S, or no tracer fields at all.)

@jagoosw
Collaborator Author

jagoosw commented Aug 23, 2022

Try v0.76.8 please.

Okay, will try this

@jagoosw
Collaborator Author

jagoosw commented Aug 23, 2022

I can have a go at overhauling the tracer_tendency function etc. based on JuliaGPU/CUDA.jl#267 (comment) suggestion and #722, but am not sure if it's necessarily the best/nicest solution?

Why are we overhauling the tracer tendency function? The problem appears to be in calculate_Gu, right? Eg


ptxas /tmp/jl_4JwMaF.ptx, line 4214; error   : Entry function '_Z29julia_gpu_calculate_Gu__ ...

I have looked through the source and while I think it would be possible to reduce the parameter size for the velocity tendencies by only passing them the T and S tracers as required (see the first, nonworking, step towards this here) I don't see how this problem can be solved in the tracer_tendency function since tracer forcing/boundaries may depend on any number of fields. I also realize that the choice to pass all the tracers may be required for something else I've missed.

It's just that we support user specification of velocity forcing functions that depend on any of the tracer fields. We could discontinue support for this though.

As for passing T, S --- just to clarify, you mean to pass any tracers that are involved in the calculation of the buoyancy perturbation? (This could be buoyancy b, only T, only S, or no tracer fields at all.)

Sorry, I'm away from the computer with a GPU at the moment so can't check. I think the reason I thought it would require changes to the tracer_tendency function is that fixing the issue for the velocity tendency functions seemed relatively straightforward (I hadn't realised you could have tracer-dependent velocity forcing too, so that may make it less straightforward). Once I'd changed that a bit, I realised it would be much harder to reduce the parameter size of the tracer tendency function if the tracers depend on lots of other tracers.

And yes, sorry for the lack of clarity: I meant passing only the tracers required for buoyancy (b, T, and/or S).
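
A minimal sketch of what "pass only the tracers buoyancy needs" could look like, using dispatch on the buoyancy type. Everything here (`ToyBuoyancyTracer`, `ToySeawaterBuoyancy`, `buoyancy_tracer_names`, `select_tracers`) is a hypothetical stand-in written for illustration, not the Oceananigans implementation.

```julia
# Hypothetical stand-ins for buoyancy formulations (not Oceananigans types).
struct ToyBuoyancyTracer end      # buoyancy carried directly as tracer b
struct ToySeawaterBuoyancy end    # buoyancy computed from T and S

# Names of the tracers each formulation reads.
buoyancy_tracer_names(::ToyBuoyancyTracer)   = (:b,)
buoyancy_tracer_names(::ToySeawaterBuoyancy) = (:T, :S)

# Keep only the tracers that the buoyancy formulation needs.
select_tracers(buoyancy, tracers::NamedTuple) =
    NamedTuple{buoyancy_tracer_names(buoyancy)}(tracers)

# No buoyancy at all: nothing needs to be passed.
select_tracers(::Nothing, tracers::NamedTuple) = NamedTuple()

# Numbers stand in for tracer fields to keep the example small.
tracers = (T = 10.0, S = 35.0, NO3 = 2.0, P = 0.5)
needed  = select_tracers(ToySeawaterBuoyancy(), tracers)

println(keys(needed))
```

This only narrows what the velocity tendency kernels receive for buoyancy; it would not help if a user-supplied velocity forcing also reads other tracers.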

@glwagner
Member

Edit: Possibly could just remove a lot of the information from the fields using the adapt_structure method from PR #1057?

We already do "unwrap" data from fields on the GPU:

Adapt.adapt_structure(to, f::Field) = Adapt.adapt(to, f.data)
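
As a rough illustration of why this unwrapping matters for kernel argument size, the sketch below uses Base Julia only. `ToyDeviceArray`, `ToyGrid`, `ToyField` and `unwrap` are hypothetical stand-ins, not Oceananigans or CUDA.jl types, and the byte counts apply to these toy structs only.

```julia
# Toy stand-in for a device array descriptor: a pointer plus its dimensions.
struct ToyDeviceArray
    ptr::UInt64
    dims::NTuple{3, Int64}
end

# Toy stand-in for grid metadata carried alongside the data.
struct ToyGrid
    Nx::Int64
    Ny::Int64
    Nz::Int64
    Lx::Float64
    Ly::Float64
    Lz::Float64
end

# Toy field: data plus metadata that a kernel may not need.
struct ToyField{D}
    data::D
    grid::ToyGrid
end

# Analogue of the adapt_structure rule above: pass only the data to the kernel.
unwrap(f::ToyField) = f.data

field = ToyField(ToyDeviceArray(UInt64(0), (16, 16, 16)),
                 ToyGrid(16, 16, 16, 1.0, 1.0, 1.0))

println("wrapped:   ", sizeof(field), " bytes")
println("unwrapped: ", sizeof(unwrap(field)), " bytes")
```

In this toy setup the unwrapped argument is less than half the size of the wrapped one, but each field still contributes a fixed number of bytes, so the total keeps growing with the number of fields passed.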

I realised it would be much harder to reduce the parameter size of the tracer tendency function if the tracers depend on lots of other tracers.

Okay, this makes sense. Solving the problem for calculate_Gu just bumps the error down to calculate_tracer_tendency, and the issue isn't tractable in that case, since it's clearly important to support tracer forcing functions that depend on other tracers (e.g. for biogeochemical models).

Can you provide a minimal script that reproduces the error? I do think many tracers is an important use case, so solving this could warrant reducing the kinds of forcing functions we support, if that's necessary. I think there may also be solutions that don't change what we support while still solving this problem; e.g. we could provide some features that extend model capabilities specifically for the case of large numbers of tracers.

For example, we could avoid passing the tracers explicitly into the kernels. Instead, we can attach references only to the relevant / used fields directly in Forcing (we'd have to change the user API to DiscreteForcing to support this, but the changes need not be major). This way one might be able to support systems of reacting tracers, as well as many "additional" passive tracers that are not involved in a forcing function.
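
A hedged sketch of that idea in Base Julia: a forcing object that holds references only to the fields it reads, compared with passing every tracer. `ToyDeviceArray`, `ToyForcing` and `p_forcing_func` are hypothetical, the tracer names are ASCII approximations of those in the error message, and which tracers the P forcing reads is an assumption made for illustration.

```julia
# Toy stand-in for a device array descriptor (not CUDA.jl's CuDeviceArray).
struct ToyDeviceArray
    ptr::UInt64
    dims::NTuple{3, Int64}
end

# Hypothetical forcing that stores only the fields its function reads.
struct ToyForcing{F, D}
    func::F
    dependencies::D
end

toy_array() = ToyDeviceArray(UInt64(0), (16, 16, 16))

# All eleven tracers, as they would be passed to every kernel today.
tracer_names = (:b, :NO3, :NH4, :P, :Z, :D, :DD, :DOM, :DIC, :ALK, :OXY)
all_tracers  = NamedTuple{tracer_names}(ntuple(_ -> toy_array(), length(tracer_names)))

# Placeholder forcing body; a real one would index into deps.NO3, deps.NH4, deps.P.
p_forcing_func(i, j, k, deps) = 0.0

# Assumed dependencies of this forcing, chosen for illustration only.
used_names = (:NO3, :NH4, :P)
p_forcing  = ToyForcing(p_forcing_func, NamedTuple{used_names}(all_tracers))

println("all tracers:       ", sizeof(all_tracers), " bytes")
println("used by P forcing: ", sizeof(p_forcing.dependencies), " bytes")
```

With this layout the argument size of a kernel depends on how many fields its forcing reads rather than on the total number of tracers in the model, which is what would let passive tracers be added without growing every kernel's parameter list.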

Or, we can explore JuliaGPU/CUDA.jl#267.

@glwagner
Member

Note that #1886 is related.

@jagoosw
Collaborator Author

jagoosw commented Sep 4, 2022

Sorry for the slow replies, I am currently away. I will put together a MWE when I am back.

More out of interest, is there a benefit to passing variables explicitly?

@glwagner
Member

glwagner commented Sep 6, 2022

More out of interest, is there a benefit to passing variables explicitly?

Is this question about CUDA.jl behavior? I'm afraid I don't know, but this comment might help:

#746 (comment)

It could be worth asking on #gpu slack, or on JuliaGPU/CUDA.jl#267.

@glwagner
Member

Duplicate of #1886, so let's discuss further there if need be.
