
Add graph capture #895

Merged
pxl-th merged 4 commits into master from pxl-th/graphs
Mar 23, 2026

pxl-th (Member) commented Mar 21, 2026

  • Add graph capture.
  • Update profiling docs to rocprofv3.
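
```julia
using AMDGPU
using GPUArrays

# Workload under test: allocates two temporaries and updates `o` in place.
function f(o)
    x = AMDGPU.rand(Float32, size(o))
    y = AMDGPU.rand(Float32, size(o))
    # `*` on two matrices is a matrix product (not elementwise `.*`).
    o .+= sin.(x) * cos.(y) .+ 1f0
    return
end

function main()
    # Allocation cache, so repeated calls reuse the same GPU buffers.
    cache = GPUArrays.AllocCache()
    z = AMDGPU.zeros(Float32, 256, 256)
    N = 1000

    # Warm-up call: compiles the kernels and populates the cache.
    GPUArrays.@cached cache f(z)

    # Regular launch.
    # t = AMDGPU.@elapsed for i in 1:N
    #     GPUArrays.@cached cache f(z)
    # end

    # Graph launch: capture `f(z)` once, then replay the graph N times.
    g = GPUArrays.@cached cache AMDGPU.@captured f(z)
    t = AMDGPU.@elapsed for i in 1:N
        AMDGPU.launch(g)
    end

    AMDGPU.synchronize()
    @show t
    return
end
main()
```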

| Regular | Captured |
| --- | --- |
| 6k kernel launches | 1k graph launches |
| ~135 ms | ~64 ms |
| [Screenshot from 2026-03-21 22-53-13] | [Screenshot from 2026-03-21 22-53-33] |
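
For a rough sense of scale, the reported timings can be turned into per-launch averages. This is only back-of-the-envelope arithmetic on the approximate figures in the table. It assumes each graph launch replays the same 6 kernels as one regular iteration (6k launches over 1k iterations), and the variable names are illustrative. The averages include kernel execution time, not just launch overhead.

```julia
# Approximate figures taken from the table above.
n_iters          = 1000     # loop iterations in both variants
kernels_per_iter = 6        # assumption: 6k kernel launches / 1k iterations
t_regular_ms     = 135.0    # regular launches, total
t_captured_ms    = 64.0     # graph launches, total

# Average wall time in microseconds.
us_per_kernel_regular  = t_regular_ms * 1e3 / (n_iters * kernels_per_iter)
us_per_iter_captured   = t_captured_ms * 1e3 / n_iters
us_per_kernel_captured = us_per_iter_captured / kernels_per_iter

# Overall ratio between the two variants.
speedup = t_regular_ms / t_captured_ms

@show us_per_kernel_regular us_per_iter_captured us_per_kernel_captured speedup
```

With these inputs, the regular path averages about 22.5 µs per kernel, the captured path about 64 µs per graph launch (roughly 10.7 µs per kernel), for an overall speedup of about 2.1x on this workload.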

pxl-th marked this pull request as ready for review March 21, 2026 22:36
luraess (Member) commented Mar 21, 2026

Maybe bits of #862 could be relevant or integrated here, given there has not been much recent activity on that PR?

pxl-th (Member, Author) commented Mar 23, 2026

> Maybe bits of #862 could be relevant or integrated here, given there has not been much recent activity on that PR?

Not sure we should mention older versions of rocprof; rocprofv3 seems to work fine.
As for roctx, we can do it properly in a separate PR.

pxl-th merged commit cee5fed into master Mar 23, 2026
1 check passed
pxl-th deleted the pxl-th/graphs branch March 23, 2026 21:17