Kernel cache broken: functions are compiled every time #336

Closed
anj00 opened this Issue Jan 31, 2019 · 2 comments

anj00 commented Jan 31, 2019

Updating from CUDAnative 0.9.1 to 1.0.1, I see a dramatic increase in compilation time: 2-50 times longer depending on the kernel.

This is a bit of a contrived example, just to show the problem: compiling a function generated via eval takes much longer now.

using CUDAnative

function say(num)
    x = num
    return
end

ex = :(() -> begin
    @cuda threads=4 say(42)
end)

f = eval(ex)
println("no   eval")
for i = 1:5
    @time(Base.invokelatest(f))
end
println("with eval")
for i = 1:5
    f = eval(ex)
    @time(Base.invokelatest(f))
end

The above code produces the following output on Julia 1.0.2 (with CUDAnative 0.9.1):

Julia Version 1.0.2 (2018-11-08), official https://julialang.org/ release
no     eval
0.020439 seconds (33.20 k allocations: 1.565 MiB)
0.000024 seconds (7 allocations: 240 bytes)
0.000012 seconds (7 allocations: 240 bytes)
0.000009 seconds (7 allocations: 240 bytes)
0.000009 seconds (7 allocations: 240 bytes)
with eval
0.015161 seconds (29.06 k allocations: 1.324 MiB)
0.019288 seconds (29.05 k allocations: 1.324 MiB, 20.81% gc time)
0.015181 seconds (29.05 k allocations: 1.324 MiB)
0.015019 seconds (29.06 k allocations: 1.324 MiB)
0.014749 seconds (29.05 k allocations: 1.324 MiB)

yet in Julia 1.1.0 with CUDAnative 1.0.1:

Julia Version 1.1.0 (2019-01-21), official https://julialang.org/ release
no     eval
0.041532 seconds (77.66 k allocations: 5.017 MiB)
0.000027 seconds (6 allocations: 208 bytes)
0.000025 seconds (6 allocations: 208 bytes)
0.000014 seconds (6 allocations: 208 bytes)
0.000009 seconds (6 allocations: 208 bytes)
with eval
0.036061 seconds (37.08 k allocations: 3.222 MiB)
0.035825 seconds (37.07 k allocations: 3.221 MiB)
0.043515 seconds (37.07 k allocations: 3.221 MiB, 11.37% gc time)
0.035669 seconds (37.08 k allocations: 3.222 MiB)
0.036885 seconds (37.07 k allocations: 3.221 MiB)

This was a very simple kernel. With more advanced ones I see up to a 50x difference (using a custom macro to measure time, compilation jumps from 7 ms to 400 ms).
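(For illustration only, a minimal sketch of such a timing macro; it is not the actual macro used above. It simply reports how long a single launch takes on the host, in milliseconds; on the first launch of a freshly eval'd wrapper that time is dominated by compilation, since @cuda launches asynchronously.)

using CUDAnative

macro launch_ms(ex)
    quote
        t0 = time_ns()          # wall-clock time before the launch
        $(esc(ex))              # run the (possibly compiling) launch
        (time_ns() - t0) / 1e6  # elapsed host time in milliseconds
    end
end

# e.g. println(@launch_ms @cuda threads=4 say(42))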

For reference, here is the list of installed software.
Julia 1.1.0:

  [c52e3926] Atom v0.7.14
  [6e4b80f9] BenchmarkTools v0.4.2
  [c5f51814] CUDAdrv v1.0.1
  [be33ccc6] CUDAnative v1.0.1
  [3a865a2d] CuArrays v0.9.0
  [5789e2e9] FileIO v1.0.5
  [033835bb] JLD2 v0.1.2
  [e5e0dc1b] Juno v0.5.4

and Julia 1.0.2:

  [c52e3926] Atom v0.7.10
  [6e4b80f9] BenchmarkTools v0.4.1
  [336ed68f] CSV v0.4.2
  [3895d2a7] CUDAapi v0.5.2
  [c5f51814] CUDAdrv v0.8.6
  [be33ccc6] CUDAnative v0.9.1
  [3a865a2d] CuArrays v0.8.1
  [a93c6f00] DataFrames v0.14.1
  [5789e2e9] FileIO v1.0.2
  [59287772] Formatting v0.3.4
  [cd3eb016] HTTP v0.7.1
  [033835bb] JLD2 v0.1.2
  [e5e0dc1b] Juno v0.5.3
  [1914dd2f] MacroTools v0.4.4
  [47be7bcc] ORCA v0.2.0
  [f0f68f2c] PlotlyJS v0.12.0
  [91a5bcdd] Plots v0.21.0
  [276daf66] SpecialFunctions v0.7.2
  [2913bbd2] StatsBase v0.25.0

maleadt commented Jan 31, 2019

This is a pretty bad evaluation, since @cuda is not supposed to compile the kernel on every launch; compilation happens once and the result is put in a cache. However, it does show an issue with the current CUDAnative, where that cache seems to be broken, so the kernel is recompiled each and every time. You can verify this by setting the JULIA_DEBUG=CUDAnative environment variable.
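For example, something along these lines should show whether the kernel is recompiled on every launch (a sketch, assuming the debug logging mentioned above; setting the variable before starting Julia works as well):

ENV["JULIA_DEBUG"] = "CUDAnative"   # enable CUDAnative's @debug output

using CUDAnative

function say(num)
    x = num
    return
end

@cuda threads=4 say(42)   # first launch: a compilation debug message is expected
@cuda threads=4 say(42)   # with a working cache, no further compilation messages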

maleadt added the bug label Jan 31, 2019

maleadt self-assigned this Jan 31, 2019

maleadt changed the title from "very slow compilation with CUDAnative v1.0.1" to "Kernel cache broken: functions are compiled every time" Jan 31, 2019


maleadt commented Jan 31, 2019

Fixed in 95fbf93.

I don't think this actually got triggered by much user code, since it relied on the world age increasing between kernel launches, which only happened here because of your use of eval and invokelatest.
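A minimal sketch of that world-age effect, independent of CUDAnative (the world_counter helper is introduced here purely for illustration): every eval of a new anonymous function defines a method and bumps the global world counter, so each invokelatest call above ran in a newer world than the previous one.

world_counter() = ccall(:jl_get_world_counter, UInt, ())  # current global world age

ex = :(() -> nothing)

w0 = world_counter()
f = eval(ex)                  # defining a new method bumps the world counter
w1 = world_counter()
Base.invokelatest(f)          # invokelatest runs f in that newer world
@show w0 w1                   # expect w1 > w0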

maleadt closed this Jan 31, 2019
