Auto-Vk is a low-level convenience and productivity layer for the best graphics API there is, namely
(...psst: and also the graphics API that most desperately needs such a layer.)Auto-Vk is written in modern C++ (using C++17 and C++20 features) and is built atop the Khronos Group's very own Vulkan-Hpp. It aims to add more clarity, efficiency, and expressiveness to writing Vulkan code, while not abstracting-away any (or let's say: most) of its important concepts.
I.e., the big, important concepts, which make Vulkan as performant as it can be---such as fine-grained synchronization, parallel command recording on the host-side, usage of multiple queues, etc., and as described in our Vulkan Lecture Series---are all still there, but Auto-Vk can help to not spend days implementing them, and to not lose track of the big picture of your source code. Just a few parts are abstracted a bit more heavily by Auto-Vk, in particular: descriptor set allocation and handling, memory allocation, and the handling of certain configuration parameters and flags. Should any of these parts fail suit your needs, you're still free to use raw Vulkan, or even more better: We're always happy to receive pull requests which extend the functionality and versatility of Auto-Vk.
Auto-Vk requires
- A Vulkan 1.3 SDK
- Vulkan-Hpp
- A C++20 compiler
Auto-Vk consists of multiple C++ include files, one mandatory C++ source file, and one optional C++ source file:
- Add
include/
to the include paths so that your compiler can find include files under pathsavk/*
- Add
src/avk.cpp
as a compiled C++ source code file
- On
clang
(at least version <= 12) Auto-Vk does not compile when usinglibstdc++
version 11 or higher, becauseclang
doesn't yet support "Down withtypename
!" (P0634R3), which is used in thelibstdc++
ranges
-header.
This section shows some selected source code snippets which shall showcase the look and feel of Auto-Vk. Some real example applications can be found in the Auto-Vk-Toolkit repository.
The task of creating a graphics pipeline in Vulkan might require several hundreds of lines of code. Here is what it can look like with Auto-Vk:
auto graphicsPipeline = myRoot.create_graphics_pipeline_for(
// Specify which shaders the pipeline consists of:
avk::vertex_shader("shaders/vertex_shader.vert"),
avk::fragment_shader("shaders/fragment_shader.frag"),
// The next 3 lines define the format and location of the vertex shader inputs:
avk::from_buffer_binding(0) -> stream_per_vertex<glm::vec3>() -> to_location(0),
avk::from_buffer_binding(1) -> stream_per_vertex<glm::vec2>() -> to_location(1),
avk::from_buffer_binding(2) -> stream_per_instance<glm::vec3>() -> to_location(5),
// Some further settings:
avk::cfg::front_face::define_front_faces_to_be_counter_clockwise(),
avk::cfg::viewport_depth_scissors_config::from_extent(1920u, 1080u),
// Declare the renderpass' attachments and the actions to take:
avk::attachment::declare(vk::Format::eR8G8B8A8Unorm, avk::on_load::clear, avk::usage::unused >> avk::usage::color(0) >> avk::usage::input(0), avk::on_store::dont_care),
avk::attachment::declare(vk::Format::eD24UnormS8Uint, avk::on_load::clear, avk::usage::depth_stencil >> avk::usage::preserve >> avk::usage::input(1), avk::on_store::dont_care),
avk::attachment::declare(vk::Format::eR8G8B8A8Unorm, avk::on_load::clear, avk::usage::unused >> avk::usage::unused >> avk::usage::color(0), avk::on_store.in_layout(avk::layout::color_attachment_optimal)),
// The following define resources that will be bound to the pipeline:
avk::push_constant_binding_data { avk::shader_type::vertex, 0, sizeof(std::array<float, 16>) },
avk::descriptor_binding(0, 0, myBuffer->as_uniform_buffer()),
avk::descriptor_binding(0, 1, myImageView->as_sampled_image(avk::layout::shader_read_only_optimal))
);
This creates a fully functioning graphics pipeline on the device consisting of two shaders (a vertex shader and a fragment shader), with three vertex/instance input bindings, stating some additional pipeline settings (face orientation, and viewport config), defining the renderpass' attachments, and further resource bindings to the pipeline: push constants of the size 64 byte (16 floats, which is enough space for a 4x4 matrix), and two descriptors, namely a buffer bound as uniform buffer (corresponding to descriptor type vk::DescriptorType::eUniformBuffer
) and an image view as sampled image (corresponding to descriptor type vk::DescriptorType::eSampledImage
) in a given layout.
Vertex input bindings are specified in an expressive way: For example,
avk::from_buffer_binding(2) -> stream_per_instance<glm::vec3>() -> to_location(5)
means that from the buffer that is bound at index 2
when invoking vk::CommandBuffer::bindVertexBuffers
, data is being streamed to vertex shader's layout (location=5)
and the data pointer is advanced per instance.
In a similarly expressive manner, the subpass usages of the renderpass' attachments are specified. In the example above, there are three subpasses. The first color attachment is not used in the first subpass, it is used as color attachment in subpass one (bound to output location 0
), and as input attachment in subpass two (bound to input attachment index 0
).
The depth/stencil attachment is used as depth/stencil attachment in the first subpass, is not used in the second subpass but its contents are preserved, and in the third subpass, is is used as an input attachment (bound to input attachment index 1
).
The second color attachment is only used in subpass three (bound to output location 0
), and its contents are stored in an image layout optimal for color attachments, while the contents of the other two attachments are no longer needed.
Recording batches of work and submitting that to a queue can require a lot of code---especially if additional synchronization to other work shall be established. With Auto-Vk, it can look like follows:
avk::queue& graphicsQueue = myRoot.create_queue(vk::QueueFlagBits::eGraphics, avk::queue_selection_preference::versatile_queue);
...
avk::command_pool& commandPool = myRoot.get_command_pool_for_single_use_command_buffers(graphicsQueue);
avk::command_buffer cmdBfr = commandPool->alloc_command_buffer(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
avk::graphics_pipeline graphicsPipeline = myRoot.create_graphics_pipeline_for(...);
avk::framebuffer framebuffer = myRoot.create_framebuffer(...);
avk::semaphore imageAvailableSemaphore = ...;
mRoot.record({
avk::command::render_pass(graphicsPipeline->renderpass_reference().value(), framebuffer.as_reference(), {
avk::command::bind_pipeline(graphicsPipeline.as_reference()),
avk::command::draw(3u, 1u, 0u, 0u)
})
})
.into_command_buffer(cmdBfr)
.then_submit_to(graphicsQueue)
.waiting_for(imageAvailableSemaphore >> avk::stage::color_attachment_output)
.submit();
Getting handles to a queue (avk::queue&
) and to a command pool (avk::command_pool&
), and creating a command buffer (avk::command_buffer
) from a command pool are one-liners. The creation of a graphics queue has been described above under Creating a graphics pipeline. Framebuffers (avk::framebuffer
) and semaphores (avk::semaphore
) can be efficiently created too.
In the example, let us assume that we have received the avk::semaphore imageAvailableSemaphore
from our swap chain, indicatig when the image is available and can be rendered into. We can express such a dependency for a new batch of work (as recorded via root::record
) simply by appending .waiting_for(imageAvailableSemaphore >> avk::stage::color_attachment_output)
. It says that the batch's color attachment output stage must wait until the given semaphore has been signaled.
The batch that is recorded into avk::command_buffer cmdBfr
(through .into_command_buffer(cmdBfr)
), describes the rendering of a renderpass, and within the renderpass, the graphicsPipeline
is bound, and one draw call (rendering 3u
vertices and 1u
instance) is recorded.
The batch is submitted to a specific queue, configured through .then_submit_to(graphicsQueue)
, and executed at .submit()
-time. If .submit()
is not called explicitly, the returned avk::submission_data
's destructor will ensure to submit it to the queue (yeah, in its destructor ^^).
Also "simple" tasks like recording barriers can take so much code in Vulkan. See how a global memory barrier can be recorded in Auto-Vk:
avk::sync::global_memory_barrier(
avk::stage::transfer >> avk::stage::acceleration_structure_build,
avk::access::transfer_write >> avk::access::shader_read
)
It expresses that transfer stages and transfer writes must have completed and been made available, before an acceleration build structure may proceed, reading the data that has been written by the preceding transfer operations. Such a barrier can be recorded into a batch (e.g.: root::record
). There's an alternative (maybe even more convenient) syntax for recording such a barrier, but it requires using namespace avk;
s.t. the compiler can find the necessary operator+
:
using namespace avk;
sync::global_memory_barrier(
stage::transfer + access::transfer_write >> stage::acceleration_structure_build + access::shader_read
)
Buffer memory barriers and image memory barriers additionally offer the option to conveniently and expressively add queue family ownership transfers; image memory barriers also support the specification of image layout transitions as shown in the following example:
using namespace avk;
queue& transferQueue = myRoot.create_queue(vk::QueueFlagBits::eTransfer, queue_selection_preference::specialized_queue);
queue& graphicsQueue = myRoot.create_queue(vk::QueueFlagBits::eGraphics, queue_selection_preference::specialized_queue);
...
image myImage;
...
avk::sync::image_memory_barrier(myImage.as_reference(), stage::copy + access::transfer_write >> stage::fragment_shader + access::shader_read)
.with_layout_transition(layout::transfer_dst >> layout::shader_read_only_optimal)
.with_queue_family_ownership_transfer(transferQueue.family_index(), graphicsQueue.family_index())
Finally, as a special convenience feature, Auto-Vk is able to automatically determine the stages and access types of barriers for many cases:
using namespace avk;
std::vector<glm::vec3> vertices;
buffer vertexBuffer = myRoot.create_buffer(memory_usage::device, {}, vertex_buffer_meta::create_from_data(vertices));
bottom_level_acceleration_structure blas = ...;
mRoot.record({
myBuffer->fill(myDataPointer, 0),
sync::buffer_memory_barrier(vertexBuffer.as_reference(), stage::auto_stage + access::auto_access >> stage::auto_stage + access::auto_access),
blas->build(...)
})
...
In such a case, a barrier corresponding to stage::copy + access::transfer_write >> stage::acceleration_structure_build + access::shader_read
should be established.
First of all, include all of Auto-Vk:
#include <avk/avk.hpp>
The next essential step is to derive from avk::root
and implement three virtual methods:
virtual vk::PhysicalDevice& physical_device() = 0;
virtual vk::Device& device() = 0;
virtual vk::DispatchLoaderDynamic& dynamic_dispatch() = 0;
The following code snippet shows a minimal example implementation of the avk::root
class.
A fully-fledged and more mature implementation of it with additional functionality can be found in our Auto-Vk-Toolkit framework.
class my_root : avk::root
{
public:
my_root() {
mInstance = vk::createInstance(vk::InstanceCreateInfo{});
mPhysicalDevice = mInstance.enumeratePhysicalDevices().front();
// Prepare two queues
mQueues[0] = avk::queue::prepare(mPhysicalDevice, 0u, 0u, 0.5f);
mQueues[1] = avk::queue::prepare(mPhysicalDevice, 0u, 1u, 0.75f);
auto queueCreateInfos = avk::queue::get_queue_config_for_DeviceCreateInfo(std::begin(mQueues), std::end(mQueues));
// Iterate over all vk::DeviceQueueCreateInfo entries and set the queue priorities pointers properly (just to be safe!)
for (auto i = 0; i < std::get<0>(queueCreateInfos).size(); ++i) {
std::get<0>(queueCreateInfos)[i].setPQueuePriorities(std::get<1>(queueCreateInfos)[i].data());
}
auto deviceCreateInfo = vk::DeviceCreateInfo{}
.setQueueCreateInfoCount(static_cast<uint32_t>(std::get<0>(queueCreateInfos).size()))
.setPQueueCreateInfos(std::get<0>(queueCreateInfos).data());
// Set further configuration parameters for the device
mDevice = mPhysicalDevice.createDevice(deviceCreateInfo);
context().mDynamicDispatch = vk::DispatchLoaderDynamic{ mInstance, vkGetInstanceProcAddr, mDevice };
// Assign handles to the queues to finish their configuration
for (auto& q : mQueues) {
q.assign_handle(mDevice);
}
}
vk::PhysicalDevice& physical_device() override { return mPhysicalDevice; }
vk::Device& device() override { return mDevice; }
vk::DispatchLoaderDynamic& dynamic_dispatch() override { return mDispatchLoaderDynamic; }
private:
vk::Instance mInstance;
vk::PhysicalDevice mPhysicalDevice;
std::array<avk::queue, 2> mQueues;
vk::Device mDevice;
vk::DispatchLoaderDynamic mDispatchLoaderDynamic;
}
auto myRoot = my_root{};
Also refer to include/avk/root_example_implementation.hpp
.
Queues require some special handling because they must be declared prior to creating the logical device. Use avk::queue::prepare
to prepare a queue and use the convenience method avk::queue::get_queue_config_for_DeviceCreateInfo
to generate the required entries for vk::DeviceCreateInfo
.
From this point onwards, the root class (my_root
in the example) serves as the origin for creating all kinds of things.
To create an image with an image view, for example, you could invoke:
auto myImageView = myRoot.create_image_view(
myRoot.create_image(1920, 1080, vk::Format::eR8G8B8A8Unorm, 1, avk::memory_usage::device, avk::image_usage::general_storage_image)
);
The first parameters state the resolution, the format, and the number of layers of the image. The memory_usage
parameter states that this image shall live device memory, and the image_usage
flags state that this image shall be usable as a "general image" and as a "storage image". This will create the the image with usage flags vk::ImageUsageFlagBits::eTransferSrc | vk::ImageUsageFlagBits::eTransferDst | vk::ImageUsageFlagBits::eSampled | vk::ImageUsageFlagBits::eStorage
and the tiling flag vk::ImageTiling::eOptimal
.
To create a buffer that is usable as both, uniform buffer, and vertex buffer, you could invoke:
std::vector<std::array<float, 3>> vertices;
auto myBuffer = myRoot.create_buffer(
avk::memory_usage::host_coherent, {},
avk::vertex_buffer_meta::create_from_data(vertices),
avk::uniform_buffer_meta::create_from_data(vertices)
);
The first parameter states that the buffer shall live in host-coherent memory. The second parameter states no additional usage flags. Parameters three and four state two different usage types of the same buffer, namely as vertex buffer and as uniform buffer which means that the buffer will be created with both, vk::BufferUsageFlagBits::eVertexBuffer
and vk::BufferUsageFlagBits::eUniformBuffer
, flags set. The create_from_data
convenience methods automatically infer size of one element and number of elements from the given std::vector
.
To fill a buffer (and this generally applies to operations after creation), no reference to myRoot
is needed anymore. Some further operations can be invoked on the created instance like follows:
auto actionTypeCommand = myBuffer->fill(vertices, 0);
The first parameter are the data, the second refers to the meta data index to use (in this case to the vertex_buffer_meta
, but actually the data should be the same for all of the meta data entries anyways). This operation returns an instance of type avk::command::action_type_command
. This indicates that there are (or might be) commands which still need to be submitted to a queue and executed there in order to complete the operation.
Auto-Vk offers some preprocessor settings which influence its internal behavior. In order to change these settings, define the respective setting before including avk.hpp
!
AVK_STAGING_BUFFER_MEMORY_USAGE
: If defined before includingavk.hpp
, it can be used to change the memory type that staging buffers are created in.
Expected value: A value of theavk::memory_usage
enum.
Example:#define AVK_STAGING_BUFFER_MEMORY_USAGE avk::memory_usage::host_coherent
(this is also the default value).AVK_STAGING_BUFFER_READBACK_MEMORY_USAGE
: If defined before includingavk.hpp
, it can be used to change the memory type that readback buffers are created in.
Expected value: A value of theavk::memory_usage
enum.
Example:#define AVK_STAGING_BUFFER_READBACK_MEMORY_USAGE avk::memory_usage::host_visible
(default value).AVK_USE_CORE_INSTEAD_OF_SYNCHRONIZATION2
: If defined before includingavk.hpp
, it changes Auto-Vk internally s.t. it uses the core2
-type Vulkan API functions instead of the2KHR
-type functions from theVK_KHR_synchronization2
extension. There's a corresponding function for each one. The Synchronization2 functions have been promoted to core in Vulkan 1.3.
Expected value: None, just defineAVK_USE_CORE_INSTEAD_OF_SYNCHRONIZATION2
or don't.
Example:#define AVK_USE_CORE_INSTEAD_OF_SYNCHRONIZATION2
(Not defined by default. I.e., by default the Synchronization 2 API functions are used.)
When to use: If you're targeting Vulkan 1.3 and above only, define it! If you're targeting also Vulkan API versions smaller than 1.3, then do not define it, but make sure to enableVK_KHR_synchronization2
!DISPATCH_LOADER_CORE_TYPE
: Can be used to define a custom dispatch loader type for the core functions (those that do not require extensions + some of the swapchain functions) passed on to Vulkan-Hpp types and calls.
Expected value: A type compatible with theDispatch
template parameters used in Vulkan-Hpp.
Example:#define DISPATCH_LOADER_CORE_TYPE vk::DispatchLoaderStatic
(default value)DISPATCH_LOADER_EXT_TYPE
: Can be used to define a custom dispatch loader type for the extension functions (all those which are not core or swapchain) passed on to Vulkan-Hpp types and calls.
Expected value: A type compatible with theDispatch
template parameters used in Vulkan-Hpp.
Example:#define DISPATCH_LOADER_EXT_TYPE vk::DispatchLoaderDynamic
(default value)AVK_USE_VMA
: If defined before includingavk.hpp
, it changes the memory allocator used by Auto-Vk internally to the Vulkan Memory Allocator (VMA).
Expected value: None, just defineAVK_USE_VMA
or don't.
Example:#define AVK_USE_VMA
(Not defined by default. I.e., by default VMA is not used internally.)
Further information: See Memory Allocation section below.
Resource management in Auto-Vk is managed in a way which strongly relies on move-only types. That means: Whenever a resource's destructor is being invoked, the resource is destroyed and its memory freed. Furthermore, where resources "live" can be totally controlled by the programmer: be it on the stack or on the heap, enabling both, efficient and versatile usage patterns. These policies require explicit and precise handling of how resources are stored and passed around, especially when passing them as arguments to functions/methods.
Whenever you see resources as parameters passed, that can come in one of three different flavours:
avk::owning_resource<T>
indicates that ownership of the resource needs to be taken. (Example:avk::image
, which is defined as follows:using image = owning_resource<image_t>;
)const T&
indicates that a referenc to the resource shall be passed. (Example:const avk::image_t&
, where anavk::image
can be turned into anconst avk::image&
through its.as_reference()
member (or more precisely avk::owning_resource::as_reference)).avk::resource_argument<T>
indicates that either ownership or a reference can be passed. This means that the called functionality offers to take care of the resource's ownership if the argument is passed asavk::owning_resource<T>
---and if it is passed asconst T&
, ownership is not handled. (Example:avk::resource_argument<avk::image_t>
)
The resource type T
always refers to the "inner resource type" which, for example, means avk::image_t
, when there's also a avk::image
. I.e. don't let yourself be confused by the _t
-version and the non-_t
-version, they actually mean the same thing, only that the non-_t
-versions represent the _t
-versions with an additional lifetime-convenience-wrapper around them which adds some features w.r.t. lifetime-handling of the "inner resource types" (i.e. the _t
-versions). Whenever you call one of the root::create_*
methods, you'll get back a resource with that convenience-wrapper. The wrapper types are defined as avk::owning_resource<T>
, where T
refers to one of the "inner resource types".
The following is a list of Auto-Vk resource types that are implemented and handled as described above:
- Buffer type
avk::buffer_t
and its wrapper typeavk::buffer
- Buffer-View type
avk::buffer_view_t
and its wrapper typeavk::buffer_view
- Image type
avk::image_t
and its wrapper typeavk::image
- Image-View type
avk::image_view_t
and its wrapper typeavk::image_view
- Sampler type
avk::sampler_t
and its wrapper typeavk::sampler
- Image Sampler type
avk::image_sampler_t
and its wrapper typeavk::image_sampler
- Framebuffer type
avk::framebuffer_t
and its wrapper typeavk::framebuffer
- Renderpass type
avk::renderpass_t
and its wrapper typeavk::renderpass
- Graphics Pipeline type
avk::graphics_pipeline_t
and its wrapper typeavk::graphics_pipeline
- Compute Pipeline type
avk::compute_pipeline_t
and its wrapper typeavk::compute_pipeline
- Ray-Tracing Pipeline type
avk::ray_tracing_pipeline_t
and its wrapper typeavk::ray_tracing_pipeline
- Ray-Tracing Bottom Level Acceleration Structure type
avk::bottom_level_acceleration_structure_t
and its wrapper typeavk::bottom_level_acceleration_structure
- Ray-Tracing Top Level Acceleration Structure type
avk::top_level_acceleration_structure_t
and its wrapper typeavk::top_level_acceleration_structure_t
- Command Pool type
avk::command_pool_t
and its wrapper typeavk::command_pool
- Command Buffer type
avk::command_buffer_t
and its wrapper typeavk::command_buffer
- Fence type
avk::fence_t
and its wrapper typeavk::fence
- Semaphore type
avk::semaphore_t
and its wrapper typeavk::semaphore
- Query Pool type
avk::query_pool_t
and its wrapper typeavk::query_pool
By default, avk::owning_resource<T>
are created with so called "shared ownership" enabled, which means that the resources are internally stored in shared_ptr
s and living on the heap. Should you, for efficiency reasons, prefer to not have ths behavior, please consider the second parameter of avk::owning_resource<T>::owning_resource(T&& aResource, bool aCreateWithSharedOwnershipEnabled = true)
(and please note that it lacks proper support by Auto-Vk currently).
By default Auto-Vk uses a very straight-forward, but for most cases probably also suboptimal, way of handling memory allocations: One allocation per resource. This is especially suboptimal if many small resources are used. Implementation-wise, avk::mem_handle
is used in this case.
Auto-Vk, however, allows to easily swap this straight-froward way of memory handling with using the well-established Vulkan Memory Allocator(VMA) library. Only a small config-change is necessary to switch from avk::mem_handle
to avk::vma_handle
, which uses VMA to alloc memory for all resources.
To enable VMA, define AVK_USE_VMA
before including <avk/avk.hpp>
:
#define AVK_USE_VMA
#include <avk/avk.hpp>
If you're using CMake
you can do this by setting the avk_UseVMA
option to ON
.
Note: Your project setup must enable the compiler to include VMA from one of the following include paths: <vma/vk_mem_alloc.h>
or <vk_mem_alloc.h>
. This can be established through:
- Installing VMA through the Vulkan installer or its maintenance tool (e.g.,
maintenancetool.exe
on Windows) by selecting theVulkan Memory Allocator header
. option.
This is all that is required to set-up VMA to handle all internal memory allocations.
Advanced usage:
By defining AVK_USE_VMA
, internally the following definitions are set:
#define AVK_MEM_ALLOCATOR_TYPE VmaAllocator
#define AVK_MEM_IMAGE_HANDLE avk::vma_handle<vk::Image>
#define AVK_MEM_BUFFER_HANDLE avk::vma_handle<vk::Buffer>
By defining them by yourself before including <avk/avk.hpp>
, you can plug in custom memory allocation behavior into Auto-Vk.