CUDA support broken since Enzyme v0.8 and GPUCompiler v13 #230

Closed
luciano-drozda opened this issue Feb 12, 2022 · 18 comments

@luciano-drozda

luciano-drozda commented Feb 12, 2022

MWE

using Enzyme
using CUDA
if has_cuda()

  @info "CUDA is on"
  CUDA.allowscalar(false)
  
end

function f!(s, a, b)

  i = threadIdx().x
  s[i] = a[i] + b[i]
  return nothing

end # f!

function df!(ds, da, db)

  Enzyme.autodiff_deferred(f!, ds, da, db)
  return nothing

end # df!

function f()

  a  = cu(rand(3))
  b  = cu(rand(3))
  s  = zero(a)
  nt = length(s)

  ds = Duplicated(s, cu(rand(nt)))
  da = Duplicated(a, zero(a))
  db = Duplicated(b, zero(b))
  @cuda threads=nt df!(ds, da, db)

end # f

f()

Returns

[ Info: CUDA is on
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:414: LLVMOpaqueValue* EnzymeCreatePrimalAndGradient(EnzymeLogicRef, LLVMValueRef, CDIFFE_TYPE, CDIFFE_TYPE*, size_t, EnzymeTypeAnalysisRef, uint8_t, uint8_t, CDerivativeMode, unsigned int, LLVMTypeRef, CFnTypeInfo, uint8_t*, size_t, EnzymeAugmentedReturnPtr, uint8_t): Assertion `argnum < uncacheable_args_size' failed.

signal (6): Aborted
in expression starting at /Users/drozda/cukernel.jl:80
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__assert_fail_base at /lib64/libc.so.6 (unknown line)
__assert_fail at /lib64/libc.so.6 (unknown line)
EnzymeCreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:414
EnzymeCreatePrimalAndGradient at /Users/drozda/.julia/packages/Enzyme/i3uGf/src/api.jl:108
enzyme! at /Users/drozda/.julia/packages/Enzyme/i3uGf/src/compiler.jl:1740
unknown function (ip: 0x2b0ad98db872)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#codegen#38 at /Users/drozda/.julia/packages/Enzyme/i3uGf/src/compiler.jl:2282
codegen##kw at /Users/drozda/.julia/packages/Enzyme/i3uGf/src/compiler.jl:2126 [inlined]
#114 at /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/driver.jl:227
get! at ./dict.jl:464
unknown function (ip: 0x2b0ad98b8eef)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
macro expansion at /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/driver.jl:226 [inlined]
#emit_llvm#110 at /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/utils.jl:64
unknown function (ip: 0x2b0ad9875e55)
emit_llvm at /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/utils.jl:62 [inlined]
cufunction_compile at /Users/drozda/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:325
cached_compilation at /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/cache.jl:90
#cufunction#243 at /Users/drozda/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:297
cufunction at /Users/drozda/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:291
unknown function (ip: 0x2b0acd44e5cd)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
macro expansion at /Users/drozda/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:102 [inlined]
f at /Users/drozda/cukernel.jl:76
unknown function (ip: 0x2b0acd449abf)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:166 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:587
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:731
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:885
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:944
eval at ./boot.jl:373 [inlined]
include_string at ./loading.jl:1196
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
_include at ./loading.jl:1253
include at ./client.jl:451
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:166 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:587
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:731
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:885
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:944
eval at ./boot.jl:373 [inlined]
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:244
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:229
#run_repl#47 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:362
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:349
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#930 at ./client.jl:394
jfptr_YY.930_32578.clone_1 at /Users/drozda/julia-1.7.0/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:757
#invokelatest#2 at ./essentials.jl:716 [inlined]
invokelatest at ./essentials.jl:714 [inlined]
run_main_repl at ./client.jl:379
exec_options at ./client.jl:309
_start at ./client.jl:495
jfptr__start_43127.clone_1 at /Users/drozda/julia-1.7.0/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x400808)
Allocations: 67188360 (Pool: 67161364; Big: 26996); GC: 59
Aborted
@vchuravy
Member

Can you tell me what ]status -m shows? It looks to me like there is an ABI mismatch between Enzyme_jll and Enzyme.jl, and I need to know the version numbers.
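
For readers who prefer to pull out just the relevant versions rather than read a full manifest listing, here is a minimal sketch using Pkg.dependencies(). The helper name installed_versions is hypothetical and only for illustration; the package names are the ones discussed in this thread.

```julia
using Pkg

# Hypothetical helper: map each requested package name to the version
# resolved in the active environment's manifest.
function installed_versions(names)
    found = Dict{String,Any}()
    for (_, info) in Pkg.dependencies()
        if info.name in names
            # `version` may be `nothing` for some standard libraries
            found[info.name] = info.version
        end
    end
    return found
end

# Packages relevant to this issue
versions = installed_versions(["Enzyme", "Enzyme_jll", "GPUCompiler", "CUDA"])
for (name, version) in versions
    println(name, " => ", version)
end
```

Packages that are not in the active environment are simply absent from the result.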

@luciano-drozda
Author

@vchuravy

(@v1.7) pkg> status -m
      Status `/Users/drozda/.julia/environments/v1.7/Manifest.toml`
  [621f4979] AbstractFFTs v1.1.0
  [1520ce14] AbstractTrees v0.3.4
  [79e6a3ab] Adapt v3.3.3
  [4fba245c] ArrayInterface v4.0.3
  [ab4f0b2a] BFloat16s v0.2.0
  [fbb218c0] BSON v0.3.4
  [6e4b80f9] BenchmarkTools v1.3.1
  [fa961155] CEnum v0.4.1
  [336ed68f] CSV v0.10.2
  [052768ef] CUDA v3.8.0
  [082447d4] ChainRules v1.26.1
  [d360d2e6] ChainRulesCore v1.12.0
  [9e997f8a] ChangesOfVariables v0.1.2
  [da1fd8a2] CodeTracking v1.0.6
  [523fee87] CodecBzip2 v0.7.2
  [944b1d66] CodecZlib v0.7.0
  [35d6a980] ColorSchemes v3.17.1
  [3da002f7] ColorTypes v0.11.0
  [5ae59095] Colors v0.12.8
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v3.41.0
  [d38c429a] Contour v0.5.7
  [9a962f9c] DataAPI v1.9.0
  [864edb3b] DataStructures v0.18.11
  [e2d170a0] DataValueInterfaces v1.0.0
  [163ba53b] DiffResults v1.0.3
  [b552c78f] DiffRules v1.9.1
  [ffbed154] DocStringExtensions v0.8.6
  [7da242da] Enzyme v0.8.4
  [e2ba6199] ExprTools v0.1.8
  [c87230d0] FFMPEG v0.4.1
  [7a1cc6ca] FFTW v1.4.5
  [48062228] FilePathsBase v0.9.17
  [1a297f60] FillArrays v0.12.8
  [53c48c17] FixedPointNumbers v0.8.4
  [587475ba] Flux v0.12.9
  [59287772] Formatting v0.4.2
  [f6369f11] ForwardDiff v0.10.25
  [d9f16b24] Functors v0.2.8
  [0c68f7d7] GPUArrays v8.2.1
  [61eb1bfa] GPUCompiler v0.13.11
  [28b8d3ca] GR v0.63.1
  [5c1252a2] GeometryBasics v0.4.1
  [42e2da0e] Grisu v1.0.2
  [f67ccb44] HDF5 v0.16.2
  [cd3eb016] HTTP v0.9.17
  [7869d1d1] IRTools v0.4.5
  [615f187c] IfElse v0.1.1
  [83e8ac13] IniFile v0.5.0
  [842dd82b] InlineStrings v1.1.2
  [d8418881] Intervals v1.5.0
  [3587e190] InverseFunctions v0.1.2
  [92d709cd] IrrationalConstants v0.1.1
  [c8e1da08] IterTools v1.4.0
  [82899510] IteratorInterfaceExtensions v1.0.0
  [692b3bcd] JLLWrappers v1.4.1
  [682c06a0] JSON v0.21.3
  [aa1ae85d] JuliaInterpreter v0.9.3
  [e5e0dc1b] Juno v0.8.4
  [929cbde3] LLVM v4.7.1
  [b964fa9f] LaTeXStrings v1.3.0
  [23fbe1c1] Latexify v0.15.9
  [9c8b4983] LightXML v0.9.0
  [2ab3a3ac] LogExpFunctions v0.3.6
  [6f1432cf] LoweredCodeUtils v2.2.1
  [1914dd2f] MacroTools v0.5.9
  [b8f27783] MathOptInterface v0.10.8
  [fdba3010] MathProgBase v0.7.8
  [739be429] MbedTLS v1.0.3
  [442fdcdd] Measures v0.3.1
  [e89f7d12] Media v0.5.0
  [e1d29d7a] Missings v1.0.2
  [78c3b35d] Mocking v0.7.3
  [d8a4904e] MutableArithmetics v0.3.3
  [76087f3c] NLopt v0.6.4
  [872c559c] NNlib v0.8.1
  [a00861dc] NNlibCUDA v0.2.1
  [77ba4419] NaNMath v0.3.7
  [d8793406] ObjectFile v0.3.7
  [bac558e1] OrderedCollections v1.4.1
  [69de0a69] Parsers v2.2.2
  [ccf2f8ad] PlotThemes v2.0.1
  [995b91a9] PlotUtils v1.1.3
  [91a5bcdd] Plots v1.25.9
  [f27b6e38] Polynomials v2.0.24
  [2dfb63ee] PooledArrays v1.4.0
  [21216c6a] Preferences v1.2.3
  [74087812] Random123 v1.4.2
  [e6cf234a] RandomNumbers v1.5.3
  [c1ae055f] RealDot v0.1.0
  [3cdcf5f2] RecipesBase v1.2.1
  [01d81517] RecipesPipeline v0.5.0
  [189a3867] Reexport v1.2.2
  [05181044] RelocatableFolders v0.1.3
  [ae029012] Requires v1.3.0
  [295af30f] Revise v3.3.1
  [6c6a2e73] Scratch v1.1.0
  [91c51154] SentinelArrays v1.3.12
  [992d4aef] Showoff v1.0.3
  [a2af1166] SortingAlgorithms v1.0.1
  [276daf66] SpecialFunctions v2.1.2
  [aedffcd0] Static v0.5.5
  [90137ffa] StaticArrays v1.3.4
  [82ae8749] StatsAPI v1.2.0
  [2913bbd2] StatsBase v0.33.15
  [69024149] StringEncodings v0.3.5
  [09ab397b] StructArrays v0.6.4
  [53d494c1] StructIO v0.3.0
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.6.1
  [f269a46b] TimeZones v1.7.1
  [a759f4b9] TimerOutputs v0.5.15
  [3bb67fe8] TranscodingStreams v0.9.6
  [5c2747f8] URIs v1.3.0
  [1cfade01] UnicodeFun v0.4.1
  [41fe7b60] Unzip v0.1.2
  [ea10d353] WeakRefStrings v1.4.1
  [64499a7a] WriteVTK v1.14.0
  [ddb6d928] YAML v0.4.7
  [a5390f91] ZipFile v0.9.4
  [e88e6eb3] Zygote v0.6.34
  [700de1a5] ZygoteRules v0.2.2
  [6e34b625] Bzip2_jll v1.0.8+0
  [83423d85] Cairo_jll v1.16.1+1
  [5ae413db] EarCut_jll v2.2.3+0
  [7cc45869] Enzyme_jll v0.0.27+0
  [2e619515] Expat_jll v2.4.4+0
  [b22a6f82] FFMPEG_jll v4.4.0+0
  [f5851436] FFTW_jll v3.3.10+0
  [a3f928ae] Fontconfig_jll v2.13.93+0
  [d7e528f0] FreeType2_jll v2.10.4+0
  [559328eb] FriBidi_jll v1.0.10+0
  [0656b61e] GLFW_jll v3.3.6+0
  [d2c73de3] GR_jll v0.63.1+0
  [78b55507] Gettext_jll v0.21.0+0
  [7746bdde] Glib_jll v2.68.3+2
  [3b182d85] Graphite2_jll v1.3.14+0
  [0234f1f7] HDF5_jll v1.12.1+0
  [2e76f6c2] HarfBuzz_jll v2.8.1+1
  [1d5cc7b8] IntelOpenMP_jll v2018.0.3+2
  [aacddb02] JpegTurbo_jll v2.1.2+0
  [c1c5ebd0] LAME_jll v3.100.1+0
  [dad2f222] LLVMExtra_jll v0.0.13+1
  [dd4b983a] LZO_jll v2.10.1+0
  [e9f186c6] Libffi_jll v3.2.2+1
  [d4300ac3] Libgcrypt_jll v1.8.7+0
  [7e76a0d4] Libglvnd_jll v1.3.0+3
  [7add5ba3] Libgpg_error_jll v1.42.0+0
  [94ce4f54] Libiconv_jll v1.16.1+1
  [4b2f31a3] Libmount_jll v2.35.0+0
  [89763e89] Libtiff_jll v4.3.0+0
  [38a345b3] Libuuid_jll v2.36.0+0
  [856f044c] MKL_jll v2021.1.1+2
  [079eb43e] NLopt_jll v2.7.1+0
  [e7412a2a] Ogg_jll v1.3.5+1
  [458c3c95] OpenSSL_jll v1.1.13+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [91d4177d] Opus_jll v1.3.2+0
  [2f80f16e] PCRE_jll v8.44.0+0
  [30392449] Pixman_jll v0.40.1+0
  [ea2cea3b] Qt5Base_jll v5.15.3+0
  [a2964d1f] Wayland_jll v1.19.0+0
  [2381bf8a] Wayland_protocols_jll v1.23.0+0
  [02c8fc9c] XML2_jll v2.9.12+0
  [aed1982a] XSLT_jll v1.1.34+0
  [4f6342f7] Xorg_libX11_jll v1.6.9+4
  [0c0b7dd1] Xorg_libXau_jll v1.0.9+4
  [935fb764] Xorg_libXcursor_jll v1.2.0+4
  [a3789734] Xorg_libXdmcp_jll v1.1.3+4
  [1082639a] Xorg_libXext_jll v1.3.4+4
  [d091e8ba] Xorg_libXfixes_jll v5.0.3+4
  [a51aa0fd] Xorg_libXi_jll v1.7.10+4
  [d1454406] Xorg_libXinerama_jll v1.1.4+4
  [ec84b674] Xorg_libXrandr_jll v1.5.2+4
  [ea2f1a96] Xorg_libXrender_jll v0.9.10+4
  [14d82f49] Xorg_libpthread_stubs_jll v0.1.0+3
  [c7cfdc94] Xorg_libxcb_jll v1.13.0+3
  [cc61e674] Xorg_libxkbfile_jll v1.1.0+4
  [12413925] Xorg_xcb_util_image_jll v0.4.0+1
  [2def613f] Xorg_xcb_util_jll v0.4.0+1
  [975044d2] Xorg_xcb_util_keysyms_jll v0.4.0+1
  [0d47668e] Xorg_xcb_util_renderutil_jll v0.3.9+1
  [c22f9ab0] Xorg_xcb_util_wm_jll v0.4.1+1
  [35661453] Xorg_xkbcomp_jll v1.4.2+4
  [33bec58e] Xorg_xkeyboard_config_jll v2.27.0+4
  [c5fb5394] Xorg_xtrans_jll v1.4.0+3
  [3161d3a3] Zstd_jll v1.5.2+0
  [0ac62f75] libass_jll v0.15.1+0
  [f638f0a6] libfdk_aac_jll v2.0.2+0
  [b53b4c65] libpng_jll v1.6.38+0
  [f27f6e37] libvorbis_jll v1.3.7+1
  [1270edf5] x264_jll v2021.5.5+0
  [dfaa095f] x265_jll v3.5.0+0
  [d8fb68d0] xkbcommon_jll v0.9.1+5
  [0dad84c5] ArgTools
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8bb1440f] DelimitedFiles
  [8ba89e20] Distributed
  [f43a241f] Downloads
  [7b1f6079] FileWatching
  [9fa8497b] Future
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions
  [44cfe95a] Pkg
  [de0858da] Printf
  [9abbd945] Profile
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [fa267f1f] TOML
  [a4e569a6] Tar
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll
  [deac9b47] LibCURL_jll
  [29816b5a] LibSSH2_jll
  [c8ffd9c3] MbedTLS_jll
  [14a3606d] MozillaCACerts_jll
  [4536629a] OpenBLAS_jll
  [05823500] OpenLibm_jll
  [83775a58] Zlib_jll
  [8e850b90] libblastrampoline_jll
  [8e850ede] nghttp2_jll
  [3f19e933] p7zip_jll

@vchuravy
Member

So you got [7cc45869] Enzyme_jll v0.0.27+0 and [7da242da] Enzyme v0.8.4

How the heck did that happen?

Enzyme_jll = "~0.0.25"
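
To make the surprise concrete: per the Pkg compatibility documentation, a tilde specifier on a 0.0.x version such as ~0.0.25 admits only the half-open range [0.0.25, 0.0.26). The sketch below checks the resolved Enzyme_jll version against that range with plain VersionNumber comparisons; the helper name in_range is hypothetical.

```julia
# Hypothetical helper: is version `v` inside the half-open range [lo, hi)?
in_range(v::VersionNumber, lo::VersionNumber, hi::VersionNumber) = lo <= v < hi

# Bounds implied by the compat entry `Enzyme_jll = "~0.0.25"`
lo, hi = v"0.0.25", v"0.0.26"

println(in_range(v"0.0.25", lo, hi))  # the version the compat entry allows
println(in_range(v"0.0.27", lo, hi))  # the version that was actually resolved
```

The second check fails, which is why the Enzyme v0.8.4 plus Enzyme_jll v0.0.27+0 combination should not have been resolvable from that compat entry.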

@luciano-drozda
Author

@vchuravy I don't know. I just ran ] up today and this came out.

@vchuravy
Member

Releasing a fixed 0.8.5 and will then adjust the registry post-hoc.

@luciano-drozda
Author

@vchuravy Alright, thanks, I'll rerun it once it's released.

@vchuravy
Member

@luciano-drozda
Author

@vchuravy Didn't solve the problem.

ERROR: AssertionError: llvmtype(state_arg) == T_state
Stacktrace:
  [1] lower_kernel_state!(fun::LLVM.Function)
    @ GPUCompiler /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/irgen.jl:694
  [2] function_pass_callback(ptr::Ptr{Nothing}, data::Ptr{Nothing})
    @ LLVM /Users/drozda/.julia/packages/LLVM/vQ98J/src/pass.jl:49
  [3] LLVMRunPassManager
    @ /Users/drozda/.julia/packages/LLVM/vQ98J/lib/12/libLLVM_h.jl:4741 [inlined]
  [4] run!
    @ /Users/drozda/.julia/packages/LLVM/vQ98J/src/passmanager.jl:39 [inlined]
  [5] (::GPUCompiler.var"#77#80"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(cukernel.df!), Tuple{Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}}}}, LLVM.Module, LLVM.TargetMachine, String})(pm::LLVM.ModulePassManager)
    @ GPUCompiler /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/optim.jl:205
  [6] LLVM.ModulePassManager(::GPUCompiler.var"#77#80"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(cukernel.df!), Tuple{Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}}}}, LLVM.Module, LLVM.TargetMachine, String}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ LLVM /Users/drozda/.julia/packages/LLVM/vQ98J/src/passmanager.jl:33
  [7] LLVM.ModulePassManager(::Function)
    @ LLVM /Users/drozda/.julia/packages/LLVM/vQ98J/src/passmanager.jl:31
  [8] optimize!(job::GPUCompiler.CompilerJob, mod::LLVM.Module)
    @ GPUCompiler /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/optim.jl:175
  [9] macro expansion
    @ /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/driver.jl:259 [inlined]
 [10] macro expansion
    @ /Users/drozda/.julia/packages/TimerOutputs/5tW2E/src/TimerOutput.jl:252 [inlined]
 [11] macro expansion
    @ /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/driver.jl:258 [inlined]
 [12] macro expansion
    @ /Users/drozda/.julia/packages/TimerOutputs/5tW2E/src/TimerOutput.jl:252 [inlined]
 [13] macro expansion
    @ /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/driver.jl:256 [inlined]
 [14] emit_llvm(job::GPUCompiler.CompilerJob, method_instance::Any; libraries::Bool, deferred_codegen::Bool, optimize::Bool, only_entry::Bool)
    @ GPUCompiler /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/utils.jl:64
 [15] emit_llvm
    @ /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/utils.jl:62 [inlined]
 [16] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA /Users/drozda/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:325
 [17] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler /Users/drozda/.julia/packages/GPUCompiler/1Ajz2/src/cache.jl:90
 [18] cufunction(f::typeof(cukernel.df!), tt::Type{Tuple{Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA /Users/drozda/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:297
 [19] cufunction(f::typeof(cukernel.df!), tt::Type{Tuple{Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}}})
    @ CUDA /Users/drozda/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:291
 [20] macro expansion
    @ /Users/drozda/.julia/packages/CUDA/bki2w/src/compiler/execution.jl:102 [inlined]
 [21] f()
    @ cukernel /Users/drozda/cukernel.jl:37
 [22] top-level scope
    @ REPL[2]:1
 [23] top-level scope
    @ /Users/drozda/.julia/packages/CUDA/bki2w/src/initialization.jl:52

@vchuravy
Member

Note that that is a vastly different error. (Also fixed the general issue in JuliaRegistries/General#54523)

@vchuravy
Member

Can you try GPUCompiler@0.12?

@luciano-drozda
Author

luciano-drozda commented Feb 12, 2022

@vchuravy Sure, sorry, I referred to the differentiation itself.

The original error reappears, I guess because Enzyme_jll went back to v0.0.27+0:

(@v1.7) pkg> add GPUCompiler@0.12
    Updating registry at `/Users/drozda/.julia/registries/General.toml`
   Resolving package versions...
   Installed BFloat16s ──────── v0.1.0
   Installed SpecialFunctions ─ v1.8.1
   Installed Enzyme ─────────── v0.7.0
   Installed GPUCompiler ────── v0.12.9
   Installed CUDA ───────────── v3.4.2
    Updating `/Users/drozda/.julia/environments/v1.7/Project.toml`
  [052768ef] ↓ CUDA v3.8.0 ⇒ v3.4.2
  [7da242da] ↓ Enzyme v0.8.5 ⇒ v0.7.0
  [61eb1bfa] + GPUCompiler v0.12.9
  [276daf66] ↓ SpecialFunctions v2.1.2 ⇒ v1.8.1
    Updating `/Users/drozda/.julia/environments/v1.7/Manifest.toml`
  [ab4f0b2a] ↓ BFloat16s v0.2.0 ⇒ v0.1.0
  [052768ef] ↓ CUDA v3.8.0 ⇒ v3.4.2
  [7da242da] ↓ Enzyme v0.8.5 ⇒ v0.7.0
  [61eb1bfa] ↓ GPUCompiler v0.13.11 ⇒ v0.12.9
  [276daf66] ↓ SpecialFunctions v2.1.2 ⇒ v1.8.1
  [7cc45869] ↑ Enzyme_jll v0.0.25+0 ⇒ v0.0.27+0

The error message:

julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:414: LLVMOpaqueValue* EnzymeCreatePrimalAndGradient(EnzymeLogicRef, LLVMValueRef, CDIFFE_TYPE, CDIFFE_TYPE*, size_t, EnzymeTypeAnalysisRef, uint8_t, uint8_t, CDerivativeMode, unsigned int, LLVMTypeRef, CFnTypeInfo, uint8_t*, size_t, EnzymeAugmentedReturnPtr, uint8_t): Assertion `argnum < uncacheable_args_size' failed.

signal (6): Aborted
in expression starting at REPL[3]:1
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__assert_fail_base at /lib64/libc.so.6 (unknown line)
__assert_fail at /lib64/libc.so.6 (unknown line)
EnzymeCreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:414
EnzymeCreatePrimalAndGradient at /Users/drozda/.julia/packages/Enzyme/afnXq/src/api.jl:94
enzyme! at /Users/drozda/.julia/packages/Enzyme/afnXq/src/compiler.jl:1160
unknown function (ip: 0x2ad2fd49b2e1)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#codegen#34 at /Users/drozda/.julia/packages/Enzyme/afnXq/src/compiler.jl:1423
codegen##kw at /Users/drozda/.julia/packages/Enzyme/afnXq/src/compiler.jl:1291 [inlined]
#94 at /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/driver.jl:265
get! at ./dict.jl:464
unknown function (ip: 0x2ad2fd48518f)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
macro expansion at /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/driver.jl:264 [inlined]
#emit_llvm#87 at /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/utils.jl:62
unknown function (ip: 0x2ad2da1e8585)
emit_llvm at /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/utils.jl:60 [inlined]
cufunction_compile at /Users/drozda/.julia/packages/CUDA/9T5Sq/src/compiler/execution.jl:316
cached_compilation at /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/cache.jl:89
#cufunction#206 at /Users/drozda/.julia/packages/CUDA/9T5Sq/src/compiler/execution.jl:288
cufunction at /Users/drozda/.julia/packages/CUDA/9T5Sq/src/compiler/execution.jl:282
unknown function (ip: 0x2ad2fd464bca)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
macro expansion at /Users/drozda/.julia/packages/CUDA/9T5Sq/src/compiler/execution.jl:102 [inlined]
f at /Users/drozda/cukernel.jl:37
unknown function (ip: 0x2ad2da175dbf)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:166 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:587
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:731
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:885
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:944
eval at ./boot.jl:373 [inlined]
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:244
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:229
#run_repl#47 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:362
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:349
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#930 at ./client.jl:394
jfptr_YY.930_32578.clone_1 at /Users/drozda/julia-1.7.0/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:757
#invokelatest#2 at ./essentials.jl:716 [inlined]
invokelatest at ./essentials.jl:714 [inlined]
run_main_repl at ./client.jl:379
exec_options at ./client.jl:309
_start at ./client.jl:495
jfptr__start_43127.clone_1 at /Users/drozda/julia-1.7.0/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x400808)
Allocations: 63674768 (Pool: 63651679; Big: 23089); GC: 57
Aborted

@vchuravy
Member

Ah okay, that's why I opened JuliaRegistries/General#54523, to prevent that from happening.

Sorry for the mess. I am setting up GPU CI right now to stop this from regressing again. If I had to guess, the versions you need are something like GPUCompiler@0.12 and Enzyme_jll@0.0.24.
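
A minimal sketch of how that suggestion could be tried without touching the default environment, using the Pkg API. The version numbers are the guess from the comment above, not a verified working combination, and the function name install_pinned is hypothetical. Nothing is installed until the function is called.

```julia
using Pkg

# Versions taken from the guess in this thread; treat them as an assumption.
specs = [
    Pkg.PackageSpec(name="GPUCompiler", version="0.12"),
    Pkg.PackageSpec(name="Enzyme_jll", version="0.0.24"),
]

# Hypothetical helper: install the requested versions into a throwaway
# environment and pin them so a later `] up` does not move them.
function install_pinned(specs)
    Pkg.activate(temp=true)   # temporary environment, discarded on exit
    Pkg.add(specs)            # may fail if the resolver finds no solution
    Pkg.pin([Pkg.PackageSpec(name=s.name) for s in specs])
    Pkg.status()
    return nothing
end

# install_pinned(specs)  # uncomment to run; requires network access
```

Pinning matters here because the original failure appeared after an ordinary update.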

@luciano-drozda
Author

@vchuravy I'm really keen on using Enzyme on parts of my differentiable solver, so thanks for the help here.

Unfortunately, this time the following error was raised:

ERROR: InvalidIRError: compiling kernel df!(Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to guess_activity(T) in Enzyme at /Users/drozda/.julia/packages/Enzyme/afnXq/src/Enzyme.jl:73)
Stacktrace:
 [1] autodiff_deferred
   @ /Users/drozda/.julia/packages/Enzyme/afnXq/src/Enzyme.jl:228
 [2] df!
   @ /Users/drozda/cukernel.jl:19
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(cukernel.df!), Tuple{Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}}}}, args::LLVM.Module)
    @ GPUCompiler /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/validation.jl:111
  [2] macro expansion
    @ /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/driver.jl:319 [inlined]
  [3] macro expansion
    @ /Users/drozda/.julia/packages/TimerOutputs/5tW2E/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/driver.jl:317 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/utils.jl:62
  [6] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA /Users/drozda/.julia/packages/CUDA/9T5Sq/src/compiler/execution.jl:317
  [7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler /Users/drozda/.julia/packages/GPUCompiler/fG3xK/src/cache.jl:89
  [8] cufunction(f::typeof(cukernel.df!), tt::Type{Tuple{Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA /Users/drozda/.julia/packages/CUDA/9T5Sq/src/compiler/execution.jl:288
  [9] cufunction(f::typeof(cukernel.df!), tt::Type{Tuple{Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}, Enzyme.Duplicated{CUDA.CuDeviceVector{Float32, 1}}}})
    @ CUDA /Users/drozda/.julia/packages/CUDA/9T5Sq/src/compiler/execution.jl:282
 [10] macro expansion
    @ /Users/drozda/.julia/packages/CUDA/9T5Sq/src/compiler/execution.jl:102 [inlined]
 [11] f()
    @ cukernel /Users/drozda/cukernel.jl:37
 [12] top-level scope
    @ REPL[2]:1

@vchuravy vchuravy reopened this Feb 12, 2022
@luciano-drozda
Author

It turns out that Enzyme couldn't guess the activity of the return value in this case.

After I passed the return activity explicitly in the autodiff_deferred call, it returned gradients, which I checked against a CPU counterpart.

For anyone facing this issue: I'm replacing

Enzyme.autodiff_deferred(f!, ds, da, db)

by

Enzyme.autodiff_deferred(f!, Const, ds, da, db)

I am using GPUCompiler@0.12.

Here's an implementation (as a module):

module cukernel

using Test
using Enzyme
using CUDA
if has_cuda()
  @info "CUDA is on"
  CUDA.allowscalar(false)
end

export add

## CPU kernel summing two vectors `a` and `b`
## and storing results in vector `s`
function f_cpu!(s, a, b)
  
  s .= a .+ b
  return nothing

end # f_cpu!

## Wrapper for Enzyme call
## to differentiate `f_cpu!` on the CPU
function df_cpu!(ds, da, db)

  Enzyme.autodiff(f_cpu!, Const, ds, da, db)
  return nothing

end # df_cpu!

## CUDA kernel summing two vectors `a` and `b`
## and storing results in vector `s`
function f!(s, a, b)

  i    = threadIdx().x
  s[i] = a[i] + b[i]
  return nothing

end # f!

## Wrapper for Enzyme call
## to differentiate `f!` on the GPU
function df!(ds, da, db)

  Enzyme.autodiff_deferred(f!, Const, ds, da, db)
  return nothing

end # df!

## Perform sum of two vectors
## and compute gradients of the operation
## on the GPU
function add()

  ## Instantiate vectors `a` and `b` to be summed
  ## and vector `s` where result is stored
  nthreads = 4
  a_cpu = rand(nthreads)
  b_cpu = rand(nthreads)
  s_cpu = zero(a_cpu)
  a     = cu(a_cpu)
  b     = cu(b_cpu)
  s     = cu(s_cpu)
  
  ## Call CUDA kernel `f!`
  @cuda threads=nthreads f!(s, a, b)
  @info "vector `a`"
  a  |> display
  @info "vector `b`"
  b  |> display
  "" |> println
  @info "vector `s := a + b`"
  s  |> display

  ## Call `df!` to compute gradients 
  ## on the GPU via Enzyme
  dz_ds_cpu = rand(nthreads) # Some gradient passed to us from other functions
  dz_da_cpu = zero(a_cpu)
  dz_db_cpu = zero(b_cpu)
  dz_ds = cu(dz_ds_cpu)
  dz_da = cu(dz_da_cpu)
  dz_db = cu(dz_db_cpu)
  ds    = Duplicated(s, dz_ds)
  da    = Duplicated(a, dz_da)
  db    = Duplicated(b, dz_db)
  @cuda threads=nthreads df!(ds, da, db)

  ## Check results against CPU
  ds_cpu = Duplicated(s_cpu, dz_ds_cpu)
  da_cpu = Duplicated(a_cpu, dz_da_cpu)
  db_cpu = Duplicated(b_cpu, dz_db_cpu)
  df_cpu!(ds_cpu, da_cpu, db_cpu)

  @test dz_da ≈ cu(dz_da_cpu)
  @test dz_db ≈ cu(dz_db_cpu)

end # add

end # module
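
As a sanity check on what the comparison above should yield, here is a minimal CPU-only sketch of the reverse pass for `s[i] = a[i] + b[i]`, written by hand without Enzyme or CUDA. The helper name `add_pullback!` and the seed values are hypothetical, chosen for illustration only. It assumes the shadows of `a` and `b` start at zero, as they do in `add()`.

```julia
# Hand-written reverse pass for s[i] = a[i] + b[i] (CPU only, no Enzyme needed).
# `add_pullback!` is a hypothetical helper, not part of Enzyme or CUDA.jl.
function add_pullback!(dz_ds, dz_da, dz_db)
    for i in eachindex(dz_ds)
        dz_da[i] += dz_ds[i]  # derivative of s[i] with respect to a[i] is 1
        dz_db[i] += dz_ds[i]  # derivative of s[i] with respect to b[i] is 1
    end
    return nothing
end

seed  = [0.1, 0.2, 0.3, 0.4]  # stands in for the incoming gradient dz/ds
dz_da = zeros(4)              # shadow of `a`, starts at zero
dz_db = zeros(4)              # shadow of `b`, starts at zero
add_pullback!(seed, dz_da, dz_db)

# Both gradients should equal the incoming seed.
@assert dz_da == seed
@assert dz_db == seed
```

Since both partial derivatives are 1, the gradients with respect to `a` and `b` should simply reproduce the seed passed in for `s`, which gives a quick reference value to compare the GPU and CPU results against.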

I'm closing this issue. Thanks @vchuravy !

@vchuravy vchuravy reopened this Feb 13, 2022
@vchuravy vchuravy changed the title "Fail on simple CUDA kernel" to "CUDA support broken since Enzyme v0.8" Feb 13, 2022
@vchuravy vchuravy changed the title "CUDA support broken since Enzyme v0.8" to "CUDA support broken since Enzyme v0.8 and GPUCompiler v13" Feb 13, 2022
@wsmoses
Member

wsmoses commented Jun 4, 2022

@vchuravy I presume at this point we can close this?

@vchuravy vchuravy closed this as completed Jun 4, 2022
@luciano-drozda
Author

Hi, thanks again for the work on CUDA support. The above tests ran fine!

There's just a small issue when using Val. Here's an MWE:

using CUDA
using Enzyme
if has_cuda()
  @info "CUDA is on"
  CUDA.allowscalar(false)
end

function kernel!(u, ::Val{n}) where {n}
  
  return nothing

end # kernel!

function dkernel!(du, ::Val{n}) where {n}

  Enzyme.autodiff_deferred(kernel!, Const, du, Val(n))
  return nothing
  
end # dkernel!

function call_dkernel()

  n    = 10
  u    = rand(n) |> cu
  dzdu = rand(n) |> cu
  du   = Duplicated(u, dzdu)
  @cuda threads=4 dkernel!(du, Val(n))
  
end # call_dkernel

call_dkernel()

The output:

[ Info: CUDA is on
ERROR: LoadError: InvalidIRError: compiling kernel #dkernel!(Duplicated{CuDeviceVector{Float32, 1}}, Val{10}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to jl_f_getfield)
Stacktrace:
 [1] getindex
   @ ./tuple.jl:29
 [2] iterate
   @ ./tuple.jl:69
 [3] same_or_one
   @ /scratch/drozda/.julia/packages/Enzyme/7MHm8/src/Enzyme.jl:203
 [4] autodiff_deferred
   @ /scratch/drozda/.julia/packages/Enzyme/7MHm8/src/Enzyme.jl:429
 [5] dkernel!
   @ /scratch/drozda/test.jl:16
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(dkernel!), Tuple{Duplicated{CuDeviceVector{Float32, 1}}, Val{10}}}}, args::LLVM.Module)
    @ GPUCompiler /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/validation.jl:139
  [2] macro expansion
    @ /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/driver.jl:409 [inlined]
  [3] macro expansion
    @ /scratch/drozda/.julia/packages/TimerOutputs/LDL7n/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/driver.jl:407 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:354
  [7] #224
    @ /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(dkernel!), Tuple{Duplicated{CuDeviceVector{Float32, 1}}, Val{10}}}}})
    @ GPUCompiler /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler /scratch/drozda/.julia/packages/GPUCompiler/XyxTy/src/cache.jl:90
 [11] cufunction(f::typeof(dkernel!), tt::Type{Tuple{Duplicated{CuDeviceVector{Float32, 1}}, Val{10}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:299
 [12] cufunction(f::typeof(dkernel!), tt::Type{Tuple{Duplicated{CuDeviceVector{Float32, 1}}, Val{10}}})
    @ CUDA /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:293
 [13] macro expansion
    @ /scratch/drozda/.julia/packages/CUDA/GGwVa/src/compiler/execution.jl:102 [inlined]
 [14] call_dkernel()
    @ Main /scratch/drozda/test.jl:27
 [15] top-level scope
    @ /scratch/drozda/test.jl:31
 [16] include(fname::String)
    @ Base.MainInclude ./client.jl:451
 [17] top-level scope
    @ REPL[2]:1
 [18] top-level scope
    @ /scratch/drozda/.julia/packages/CUDA/GGwVa/src/initialization.jl:52
in expression starting at /scratch/drozda/test.jl:31

@vchuravy
Member

vchuravy commented Jun 5, 2022

Can you open a new issue?

@luciano-drozda
Author

Done here, thanks !
