Adding GPU-capable Vulkan Dose Actor #1015

BishopWolf · 2026-05-06T13:52:09Z

Adding a new VulkanDoseEngine inside OpenGate makes more sense now than ever, and we can add GPU actors progressively.
The main idea is to support Vulkan as a device-agnostic, open-source standard. It already offers performance similar to industry standards like CUDA or ROCm. We must avoid complexity, though, so Vulkan is the perfect choice, as it is already mature enough.

The proposal starts with just the dose actor, something like this in GLSL:

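#version 450
// Float atomicAdd on buffers is not core GLSL; it needs this extension
// (and VK_EXT_shader_atomic_float with shaderBufferFloat32AtomicAdd on the device).
#extension GL_EXT_shader_atomic_float : require
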
layout(local_size_x = 256) in;

layout(std430, binding = 0) readonly buffer Input {
    vec4 steps[]; // xyz + edep
};

layout(std430, binding = 1) buffer Dose {
    float voxels[];
};

layout(push_constant) uniform Params {
    vec3 origin;
    vec3 voxelSize;
    ivec3 dims;
    uint nSteps;
} params;

uint flatten(ivec3 v, ivec3 d) {
    return uint(v.x + v.y * d.x + v.z * d.x * d.y);
}

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= params.nSteps) return;

    vec3 pos = steps[i].xyz;
    float edep = steps[i].w;

    ivec3 v = ivec3(floor((pos - params.origin) / params.voxelSize));

    // Bounds check
    if (v.x < 0 || v.y < 0 || v.z < 0 ||
        v.x >= params.dims.x ||
        v.y >= params.dims.y ||
        v.z >= params.dims.z) return;

    uint idx = flatten(v, params.dims);

    // Accumulate deposited energy; several invocations may hit the same voxel, hence the atomic
    atomicAdd(voxels[idx], edep);
}
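
To sanity-check the shader, the same binning can be written as a serial CPU reference and compared voxel by voxel against the grid downloaded from the GPU. The following is a minimal sketch under assumptions: the `Step` and `Grid` types and the `voxelIndex` / `scoreOnCpu` functions are hypothetical names for illustration, not part of OpenGate.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side mirror of one shader input element (xyz + edep).
struct Step { float x, y, z, edep; };

// Hypothetical grid description, mirroring the shader's push constants.
struct Grid {
    float origin[3];
    float voxelSize[3];
    int   dims[3];
};

// Returns the flat voxel index for a position, or -1 if it lies outside the grid.
// Same arithmetic as the shader: floor((pos - origin) / voxelSize), x varies fastest.
long voxelIndex(const Grid& g, float x, float y, float z) {
    const float p[3] = {x, y, z};
    int v[3];
    for (int a = 0; a < 3; ++a) {
        v[a] = static_cast<int>(std::floor((p[a] - g.origin[a]) / g.voxelSize[a]));
        if (v[a] < 0 || v[a] >= g.dims[a]) return -1;
    }
    return v[0] + static_cast<long>(v[1]) * g.dims[0]
                + static_cast<long>(v[2]) * g.dims[0] * g.dims[1];
}

// Serial accumulation of deposited energy; no atomics needed on a single thread.
std::vector<float> scoreOnCpu(const Grid& g, const std::vector<Step>& steps) {
    std::vector<float> voxels(
        static_cast<std::size_t>(g.dims[0]) * g.dims[1] * g.dims[2], 0.0f);
    for (const Step& s : steps) {
        const long idx = voxelIndex(g, s.x, s.y, s.z);
        if (idx >= 0) voxels[static_cast<std::size_t>(idx)] += s.edep;
    }
    return voxels;
}
```

Float addition is not associative and the order of GPU atomics is not fixed, so a comparison against this reference would need a tolerance rather than exact equality.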

Then we can create the VulkanDoseEngine and expose it to Python, where we need to modify the GateDoseActor to score on the GPU if the user flags it. In this case we must turn off the multithreading, because we will score on the GPU instead.

We could gain up to a 10x speedup depending on the card. This is especially useful for voxelized geometries, where the gain is even greater.

BishopWolf (Author) · 2026-05-06T13:53:41Z

A starting point for VulkanDoseEngine.h could look like this:

#pragma once
#include <vulkan/vulkan.h>
#include <cstddef>   // size_t
#include <cstdint>
#include <vector>

struct StepData {
    float x, y, z, edep;
};

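// Mirrors the shader's push-constant block. std430 aligns vec3/ivec3 to
// 16 bytes, so padding keeps voxelSize at offset 16 and dims at offset 32.
struct DoseParams {
    float origin[3];
    float pad0;          // unused
    float voxelSize[3];
    float pad1;          // unused
    int   dims[3];
    uint32_t nSteps;     // offset 44
};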

class VulkanDoseEngine {
public:
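    VulkanDoseEngine();
    ~VulkanDoseEngine();

    // Owns raw Vulkan handles, so copying must be disabled.
    VulkanDoseEngine(const VulkanDoseEngine&) = delete;
    VulkanDoseEngine& operator=(const VulkanDoseEngine&) = delete;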

    bool initialize(uint32_t maxSteps,
                    int dimX, int dimY, int dimZ);

    void uploadSteps(const std::vector<StepData>& steps);
    void dispatch(uint32_t nSteps);

    // Copy device dose grid to host vector (size = dimX*dimY*dimZ)
    void downloadDose(std::vector<float>& out);

    void cleanup();

private:
    // Vulkan core
    VkInstance instance = VK_NULL_HANDLE;
    VkPhysicalDevice phys = VK_NULL_HANDLE;
    VkDevice device = VK_NULL_HANDLE;
    VkQueue queue = VK_NULL_HANDLE;
    uint32_t queueFamily = 0;

    VkCommandPool cmdPool = VK_NULL_HANDLE;
    VkCommandBuffer cmd = VK_NULL_HANDLE;
    VkFence fence = VK_NULL_HANDLE;

    // Pipeline
    VkDescriptorSetLayout dsetLayout = VK_NULL_HANDLE;
    VkPipelineLayout pipelineLayout = VK_NULL_HANDLE;
    VkPipeline pipeline = VK_NULL_HANDLE;
    VkDescriptorPool dpool = VK_NULL_HANDLE;
    VkDescriptorSet dset = VK_NULL_HANDLE;

    // Buffers
    VkBuffer stepsBuf = VK_NULL_HANDLE;
    VkDeviceMemory stepsMem = VK_NULL_HANDLE;

    VkBuffer doseBuf = VK_NULL_HANDLE;
    VkDeviceMemory doseMem = VK_NULL_HANDLE;

    // Host pointers (host-visible for MVP)
    void* stepsMapped = nullptr;
    void* doseMapped  = nullptr;

    // Sizes
    uint32_t maxSteps_ = 0;
    int dimX_=0, dimY_=0, dimZ_=0;
    size_t stepsBytes_=0, doseBytes_=0;

    // Helpers
    bool createInstance();
    bool pickDevice();
    bool createDevice();
    bool createCommandObjects();
    bool createBuffers();
    bool createPipeline();

    uint32_t findMemoryType(uint32_t typeBits, VkMemoryPropertyFlags props);
    bool createBuffer(VkDeviceSize size, VkBufferUsageFlags usage,
                      VkMemoryPropertyFlags props,
                      VkBuffer& buf, VkDeviceMemory& mem, void** mapped);

    bool loadShaderModule(const char* path, VkShaderModule& out);

    void recordAndSubmit(uint32_t nSteps);
};
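
Two details of the host side have to line up with the shader: the byte layout of the push constants and the number of workgroups passed to `vkCmdDispatch`. The push-constant block follows std430 rules, where `vec3` and `ivec3` members are 16-byte aligned, so the members sit at offsets 0, 16, 32 and 44. Below is a sketch of how both could be pinned down at compile time. `DoseParamsStd430`, `kLocalSizeX` and `groupCountX` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host mirror of the shader's push-constant block.
// Explicit padding reproduces the 16-byte alignment of vec3/ivec3 in std430.
struct DoseParamsStd430 {
    float         origin[3];    // offset 0
    float         pad0;         // offset 12, unused
    float         voxelSize[3]; // offset 16
    float         pad1;         // offset 28, unused
    std::int32_t  dims[3];      // offset 32
    std::uint32_t nSteps;       // offset 44, packs right after the ivec3
};

static_assert(offsetof(DoseParamsStd430, voxelSize) == 16, "voxelSize must sit at offset 16");
static_assert(offsetof(DoseParamsStd430, dims) == 32, "dims must sit at offset 32");
static_assert(offsetof(DoseParamsStd430, nSteps) == 44, "nSteps must sit at offset 44");
static_assert(sizeof(DoseParamsStd430) == 48, "push-constant range is 48 bytes");

// Must match layout(local_size_x = 256) in the shader.
constexpr std::uint32_t kLocalSizeX = 256;

// Workgroup count for the x dimension of vkCmdDispatch: ceil(nSteps / 256),
// written without the overflow that (nSteps + 255) / 256 could hit.
constexpr std::uint32_t groupCountX(std::uint32_t nSteps) {
    return nSteps / kLocalSizeX + (nSteps % kLocalSizeX != 0 ? 1u : 0u);
}
```

The last workgroup is usually only partly filled, which is why the shader's early return on `i >= params.nSteps` is needed. At 48 bytes the block also stays within the 128 bytes of push-constant space that Vulkan guarantees as a minimum.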

nkrah (Collaborator) · 2026-05-20T16:21:57Z

Interesting. I am not sure I understand how this could be used within the tracking/stepping logic of Geant4. I see that uploadSteps() requires a vector of steps, but in SteppingAction (or actually ProcessHits) we only have a single step. We would have to cache steps somehow and then upload them to the GPU. I am not sure, though, whether the current step instance survives the step iteration.
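
One possible shape for such a cache, sketched under assumptions: the position and energy deposit are copied by value out of each step, so nothing depends on the step object outliving the callback, and full batches are handed to a sink that would wrap uploadSteps() and dispatch(). `StepBatcher` is a hypothetical helper, not an existing OpenGate or Geant4 class, and `StepData` is redeclared here only so the snippet compiles on its own.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Plain copy of the values the shader needs; no pointer to the Geant4 step is kept.
struct StepData { float x, y, z, edep; };

// Hypothetical helper: collects steps one at a time and hands full batches to a sink.
class StepBatcher {
public:
    using Sink = std::function<void(const std::vector<StepData>&)>;

    StepBatcher(std::size_t capacity, Sink sink)
        : capacity_(capacity), sink_(std::move(sink)) {
        buffer_.reserve(capacity_);
    }

    // Would be called once per step, with values read from the step at that moment.
    void add(float x, float y, float z, float edep) {
        buffer_.push_back(StepData{x, y, z, edep});
        if (buffer_.size() >= capacity_) flush();
    }

    // Sends whatever is buffered; also needed at end of run for the partial batch.
    void flush() {
        if (buffer_.empty()) return;
        sink_(buffer_);
        buffer_.clear();
    }

private:
    std::size_t capacity_;
    Sink sink_;
    std::vector<StepData> buffer_;
};
```

In a multithreaded run each worker thread would presumably need its own batcher, or the sink would have to be guarded by a lock. That part is left open in this sketch.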

Another point: this would only offload the scoring to the GPU, not the actual tracking by Geant4, so I am not sure the speed-up would really be immense. Maybe I am missing something.
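
This concern can be made concrete with Amdahl's law. Let p be the fraction of run time spent in scoring and s the factor by which the GPU speeds up that part. The overall speed-up S is then bounded by the part that stays on the CPU. The numbers below are purely illustrative: p = 0.2 is an assumption, not a measurement from OpenGate.

```latex
\[
S = \frac{1}{(1 - p) + p/s}, \qquad \lim_{s \to \infty} S = \frac{1}{1 - p}
\]
\[
p = 0.2,\ s = 10 \;\Rightarrow\; S = \frac{1}{0.8 + 0.02} \approx 1.22,
\qquad S_{\max} = \frac{1}{0.8} = 1.25
\]
```

Profiling how much time the current dose actor actually takes per run would give the real p and settle whether the gain is worth the added machinery.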
