Release notes

Release notes have moved to https://github.com/GPUOpen-Drivers/AMDVLK/releases as of 2018-11-8

2018-10-26 update

Fix an LLPC issue where loop-invariant code motion does not work in certain cases when there are multiple push constants; this gives a ~13% performance gain in Serious Sam Fusion (4K, Low setting)
Fix llvm.amdgcn.fmed3.f16 not working on gfx8; use min/max to emulate it
Move lowering optimizations to after patch phase
Enable building amdllpc with clang, including on macOS
Support swapchain composite alpha
Refine Wayland window system support in PAL
[GpuProfiler] Add support for injecting pipeline and shader hashes into ThreadTraceViewer thread traces
Remove support for view 3d as 2d array
Fix clears of 32-32-32 format images
Fix crash when running Killer Instinct with Steam Proton

2018-10-17 update

Update Vulkan headers to 1.1.86
Enable VK_KHR_shader_atomic_int64 extension
Enable VK_GOOGLE_decorate_string and VK_GOOGLE_hlsl_functionality1 extension
Remove VK_KHX_device_group
Refine Wayland window system support in PAL
Add Indirect Function Support to PAL
Improve CPU performance hotspots where a large amount of time was spent in a few CmdUtil functions that read-modify-write (RMW) command memory
Improve handling of scratch memory in PAL
Add code object API Loader Events chunk to RGP traces, Part 2 of 2
Add support for rotated copy in graphics scaled copy path
Enable gamma conversion in graphics scaled copy path
Improve GpaSession handling of per-draw granularity performance counters
[dxvk/wine] Fix Final Fantasy XII missing text
Fix peephole crash in The Witcher 3

2018-9-29 update

Enable VK_KHR_driver_properties extension
Implement direct display for console mode
Implement VK_KHR_shader_atomic_int64 extension
Replace buffer.load intrinsic with raw.buffer.load intrinsic in GS off-chip path
Add peephole optimizations for PHIs and vector operations, up to 6% performance improvement
Investigate DATA_AND_OFFSET mode for LOAD_*_REG_INDEX, Part #1
Add code object Database chunk to RGP traces. Part 1 of 2
Add new query to obtain the PCI bus id for a physical device
Add dest color key and src alpha blend support in graphics scaled copy path and enable the path
Support YCbCr plane of UYVY and YUY2 color filling
Add MGPU support to DynamicDescriptorData
Fix several crash/hang/corruption issues for running games on dxvk
Update llvm to 340950

2018-9-12 update

Pipeline/code object metadata refactoring using MsgPack
Implement direct display for console mode
Implement a graphics path for scaled copy in PAL. There are some CTS failures, so the compute path is forced for scaled copy for now
Move GpuProfiler granularity from GpuProfilerPerfCounterConfig to GpuProfilerConfig as it applies to traces as well
Fix some CTS failures for Vega12
Remove support for graphics-only command buffers from PA

2018-8-30 update

Update Vulkan headers to version 1.1.82
Enable VK_EXT_conservative_rasterization extension, only "Primitive overestimation" feature is supported
VK_EXT_descriptor_indexing: support non-uniform flag in image and atomic operations
Fix uniform_buffer_dynamic_array_non_uniform_access_* test failures on GFX6

2018-8-24 update

Add Vega12 support
Educate LLPC on choosing better loop trip counts for loop unrolling (Note: The change causes some performance drop in Dawn of War 3, will be fixed in following drop)
PAL fence refactoring (phase 1)
Add PAL support for VK_EXT_conservative_rasterization extension
Fix crash issue when running Doom with wine/dxvk
Fix build failure after make clean

2018-8-17 update

Support non-uniform descriptor index for store operations
Enable non-uniform indexing support for VK_EXT_descriptor_indexing
Add Wave32 support in LLPC subgroup arithmetic code path
Use python to generate subgroup arithmetic Op wrapper code in LLPC
Support re-parsing command options in Llpc::Compiler. This only works when all compiler instances are destroyed
Add dump compiler option in LLPC
Add debug support with pipeline binary replacement
Remove format A8B8G8R8_SRGB_PACK32 from support list in LLPC; Disable A2B10G10R10 patch on gfx9
Remove imported Jemalloc and disable jemalloc from CMake
[Raven] Support UMC block perfcounters
Disable multisampled and depth/stencil PRT features
Fix a potential build dependency issue of strings.cpp
Fix overhead of RGP capture in some applications
Fix VK_AMD_shader_info being broken on top of the PAL NULL backend
Fix device groups enumeration for mGPU support
Fix a Witcher 3 crash issue
Add some shift op tests to shaderdb

2018-8-15 update

Add some workarounds in llpc
Add option to prefix cache and debug file paths
Fix some issues in the Util::IsKeyPressed implementation. GPU profiling can now be triggered by pressing Shift-F11
Updates ISettingsLoader interface to remove Device dependency and to use IndirectAllocator
Clean up Linux VA partition initialization
Upgrade gpuopen
Fix performance drop in Serious Sam
Fix some non-uniform descriptor waterfall problems in LLVM
Fix a regression in spvgen where the shader log was removed by mistake

2018-8-3 update

Enable VK_KHR_8bit_storage extension
Begin to add dpp (data parallel primitive) support for gfxip8/9
Fix failure to parse some bad SPIR-V
Fix bugs in testShaders.py: compile_name is not set in the async process; crash when the number of shaders is less than 8
Add option to prefix the ICD's multiple debugging paths
Fix an MGPU issue where vkGetPhysicalDeviceXlibPresentationSupportKHR returns false when presentation is supported
Refine WaitForCompletion for Wayland support; let wsa wait for the event to fulfill the doWait requirement
Update implementation for VK_EXT_acquire_xlib_display extension
- Don’t build lease-related functions if the xcb-randr dev package used for driver build doesn’t support lease
- Dri3WindowSystem::AcquireScreenAccess returns error if lease is not supported at runtime
Implement Util::IsKeyPressed
Settings Refactor - convert the legacy settings config files to the new JSON format that is used by the DevDriver settings service
Fast clear eliminate performance optimization

2018-7-26 update

Implement non-uniform descriptor index support
Add an option to do dynamic loop unroll
Fix dxvk: F.E.A.R. 3 black screen in menu
Fix dEQP-VK.spirv_assembly.type.scalar.u16.switch_* failure
Disable memory clause formation if forcing si-scheduler
[PAL]Update timingReport.py script
[LLPC]Move some functions in llpcInternal.h/.cpp to other proper files
[LLPC]Refine register settings
[XGL]Change the type of ConnectorId from int32 to uint32
[XGL]vkUpdateDescriptorSetWithTemplate(): move the loop over the GPUs into the UpdateEntryXXX() functions

2018-7-20 update

Enable general variable pointer support
Enable VK_KHR_create_renderpass2 extension
Enable VK_KHR_get_display_properties2 extension
Refine code and fix issue for PRT support, add a runtime setting (OptEnablePrt) to enable PRT feature
Add support for non-MSAA programmable sample locations
Add support for non-0 defaultIndex for memory object allocated in device group
Use thin tiles for non-standard 3D PRT 64-bit format images
Don't use LayoutShaderFmaskBasedRead for depth/stencil images
Add option to control zero-initialization of IL registers
Dota2: Enable ReZ for several G-Buffer shaders on Ellesmere
Report VK_ERROR_OUT_OF_DEVICE_MEMORY once heap size is exceeded. Currently the feature is hidden behind the MemoryEnableExternalLocalTracking panel setting
Fix build and link error with clang
Fix build error when comparing int32 with uint32
Fix device groups enumeration: use GetMultiGpuCompatibility() to check for mGPU support
Fix memory error handling
Fix soft hang in dEQP-VK.wsi.wayland.swapchain.render.basic
Fix GL_AMD_gpu_shader_int16 + GL_AMD_shader_ballot interaction issue: various GLSL functions return invalid output
Fix PushConstants issue in LLPC
[LLPC]Enable new dimension aware image intrinsic by default
[LLPC]Add i32 format integer gather patch to dimension aware intrinsic path
[LLPC]Refine ELF dump related code
[PAL]Correct logic for CreatePlatformKey() argument validation
[PAL]Fix initialization of indirect user data table pointer
[PAL]Fix MGPU support issue
[PAL]Adjust code to fit new Wayland window system interface definition
[PAL]Disable the workaround for delayed reserve of PRT VA Range starting from version 2.27 of amdgpu kernel module or version 4.18 of Linux kernel
[PAL]Change FCE optimization to remove CPU perf hotspot due to memset
[PAL]Remove PAL setting forcedUserDataSpillThreshold
Update LLVM to trunk @336387
[LLVM]Waterfall intrinsics for non-uniform descriptor support
[LLVM]New tbuffer intrinsics with combined format

2018-7-12 update

Enable VK_EXT_direct_mode_display extension
Implement VK_EXT_acquire_xlib_display extension
Enable variable pointer of storage buffer
Update Vulkan headers to 1.1.77.0
Fix DXVK flickering garbage issue
Fix some 64-bit case failures in dEQP-VK.spirv_assembly.type.*
Fix clang compile error in image code
Fix some Renderpass attachment_write_mask test failures
Expose the feature of gathering different global perf counter per block instance through gpu profiler
Fix the command buffer IDs in the Gpu Profiler and Cmd Buffer Logger layers.
Fix issue that clSVMAlloc is failing to create allocations greater than 2GB on Vega10
[LLPC] Add integer gather patch for i32 resource format
[LLPC] Extract 6 low bits from 1D offset since we translate 1D texture to 2D on gfx9
[LLPC] Sync LLPC translator source code with upstream
[PAL] Add image usage flag to make 3d arrays work when accessed as 2d
[PAL] Add a counting suffix to the end of every layer's per-platform logging directory to prevent collisions when PAL is recreated in less than a second.
[PAL] Move WAIT_CE_COUNTER To Immediately Before Draw
[PAL] Disable Write Confirm for CPDMA Shader Prefetch
[LLVM] f16 interpolation with rtz mode
[LLVM] Fix assertion error in register allocation

2018-7-2 update

Separate LLPC to https://github.com/GPUOpen-Drivers/llpc
Enable EXT_vertex_attribute_divisor extension
Enable EXT_descriptor_indexing extension (limited to dynamic indexing)
Enable VK_KHR_draw_indirect_count extension
Update Vulkan headers to v1.1.76
Clean up PRT CTS test failures
Release shaderdb tests
Release spvgen

[XGL]

Increase reported mip tail size to match Addrlib alignment requirements (3D PRT)
Zero-initialize data in Semaphore
Refactor the device group resource binding logic
Add device group for semaphore
Idle time in between submits during RGP capture
Remove the copy in GetSparseImageFormatProperties2
Change barrier policy to handle 3 aspects - in case of YUV images
Add FS2 support and some colorspace and transfer function tweaks
Barrier optimization: add per-queue family policies to limit the scope of buffer and image memory barriers to those applicable to the specified queue family.
Remove DescriptorSet::m_pPool since it is only used in Destroy() and Destroy() is never called
Fix an issue that vkCmdPipelineBarrier calls which only define execution dependencies are ignored
Make sure DescriptorSet::dynamicDescriptorData is 64 bit aligned.
Skip the subpass self-dependencies
Pass fmaskBasedMsaaReadEnabled and robustBufferAccess as template parameters to avoid using space in each descriptor set.
Ordered Approach to App Detection

[PAL]

Fix pEngineInfo->sizeAlignInDwords evaluation
Disable degamma for sRGB source images when executing vkCmdColorSpaceBlitImageAMD
Add Util::Event::Wait and remove the unused Util::WaitForEvents
vk_Interop support: waiting on a master fence whose OPAQUE_FD payload has been reset by waiting on a "user" fence succeeds as if it were still signaled
Use virtual page size instead of buffer size to map/unmap External Physical buffer. For External Physical memory, free marker before freeing its surface
Add a script timingReport.py which could be used to analyze GPU profiling result to identify top pipelines
FastClearEliminate optimization for performance
Fix F1 2017 Corruption observed while running benchmark
Add SyncobjFence created with FENCE_CREATE_SIGNALED_BIT
Add a workaround for the issue that amdgpu doesn’t synchronize PTE updates

[LLPC]

Use option -enable-dim-aware-image-intrinsic to control whether or not to use new dimension aware image intrinsic path
Support dumping full data in SpecializationInfo when the data size isn't dword-aligned
Fix the following issues:
- SV_PrimitiveID gets requested even though never used (due to specialization constant)
- Texture SRD is loaded with two flat_load_dwordx4 which slows down the GPU performance
- Some sampling testing failures
- GCNShader cubeFaceIndexAMD_const testing failure
- ShaderImageLoadStoreLod imageStoreLodAMD_Cube testing failure
Sync LLPC translate component with upstream SPIRV-LLVM

2018-6-8 update

Add Barrier optimization to avoid unnecessary cache flushes/invalidations in case of ownership transfer barriers
[LLPC]Default enable new LLVM dimension aware image intrinsics
[LLPC] Add an option to set loop unroll count
Add MGPU support for VkDeviceGroupBindSparseInfo. Sparse binding with resourceDeviceIndex != memoryDeviceIndex still doesn't work correctly. The root cause has not been found yet
Add back-end support for sparse texture
Fix an issue for sparse texture support that gather component is not correctly passed to dmask
Fix dEQP-VK.pipeline.push_constant.graphics_pipeline.overlap_4_shaders_vert_tess_frag failure on Vega10
Fix an issue that clSVMAlloc is failing to create allocations greater than 2GB
Update pm4 packet headers
Fix a typo that was causing an explosion of stack space.
Force fMask swizzle mode to be 4kB on GFX9 platforms in order to take advantage of optimized copy path.
Fix Gfx6 failure on VK_KHR_maintenance1_copy_image_2D_array_to_3D_transfer_R32G32B32A32_*
Remove the usage of AMDGPU_CS_MAX_IBS_PER_SUBMIT because it's deprecated in libdrm after version 2.4.92
Remove explicit fence reset check in Queue::SubmitInternal
[PAL]Add IndirectAllocator utility class
Print ClientMem pointer in the leaked list, which helps debugging memory leak

2018-6-1 update

[LLPC] Add image operation lz optimization
[LLPC] Add an option to dump llvm module's CFG
Add timestamp hash to VkPipelineCache ID to make it more unique
Add more implementation for sparse texture support
Support new dimension aware image intrinsic: sample group and gather group
Add recommended heap in Pal::DeviceProperties to client for each engine for best performance
Add Util::ArrayLen, a constexpr function to get the length of an array at compile time. This is meant to be used in place of "sizeof(foo) / sizeof(foo[0])"
Remove asserts that fire on images that aren't render targets
WriteEventCmd should translate HwPipePostIndexFetch to WRITE_DATA on ME engine
Add implementations for ComputeResults for StreamoutStats queries
Update LLVM to trunk @332170
Fix a shader ballot issue in llvm backend which causes Wolfenstein 2 hang on Wine
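
The Util::ArrayLen entry in this update describes a constexpr replacement for the "sizeof(foo) / sizeof(foo[0])" idiom. The following is a minimal sketch of that technique, not PAL's actual source; the namespace sketch and the array kTable are hypothetical names used only for illustration.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical namespace, not PAL's Util

// Deduces N from an array reference, so the length is a compile-time constant.
// Passing a pointer fails to compile, whereas the sizeof idiom would silently
// yield a wrong value.
template <typename T, std::size_t N>
constexpr std::size_t ArrayLen(const T (&)[N]) noexcept
{
    return N;
}

}  // namespace sketch

// Hypothetical data used only to exercise the helper.
constexpr int kTable[] = {1, 2, 3, 4};

static_assert(sketch::ArrayLen(kTable) == 4,
              "length is known at compile time");
static_assert(sketch::ArrayLen(kTable) == sizeof(kTable) / sizeof(kTable[0]),
              "matches the sizeof idiom for true arrays");
```

Because the result is a constant expression, it can size other arrays or feed static_assert checks without any runtime cost.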

2018-5-25 update

Add fp16 interpolation intrinsics and register settings for AMD_gpu_shader_half_float
Add extension VK_AMD_gpu_shader_half_float_fetch (not enabled)
[LLPC]Support dual source blend
[LLPC]Enable on-chip GS by default for GFX6-8
Check VkPhysicalDeviceFeatures2 on device create
Barrier optimization: move decision about whether to apply layout transitions for this barrier in case of ownership transfers to the ImageBarrierPolicy class
[LLPC] spir-v reader: fix clang compile error in image code
Remove the support for PRT depth/stencil formats. Single-aspect depth and stencil are still supported
Report per-aspect sparse image format properties for depth/stencil
[LLPC] Fix an issue of MRT color out
Simplify sparse texture bind virtual offset calculation
Remove the implicit null sparse bind on queue 0
Disable loop unroll for game TombRaider to work-around an issue that lighting is incorrect on main menu and in benchmark
[LLPC]Support new dimension aware image intrinsics
- Support general Fmask loading
- Fix SubpassDataArray dimension in GL_EXT_multiview
Fix assert caused by missing image layout in renderpass logger
Add IHashProvider and IHashContext to PAL Util namespace
Add a new flag sampleLocsAlwaysKnown to enable the deferred MSAA depth expand optimization for GFX6~9
Null initialize the fmask srd if in CreateFmaskViewSrdsInternal() there is no fmask for the image
Fix the issue that VK_KHR_maintenance1 + sDMA queue: 2D Array image -> 3d image copy ops (and vice versa) does not work
Fix copies of BCn mip-levels where the HW determines the incorrect size of the mip level

2018-5-18 update

Revert api_version in Json file to 1.1.70
Enable extension VK_KHR_display
[LLPC]Add missing int64 function
[LLPC]Support new dimension aware image intrinsics
- Add runtime option to support switching between dimension aware image intrinsics and old image intrinsic
- Add dimension aware version of fetch fmaskvalue
- Fix fmask loading failure in VulkanCTS dEQP-VK.amd.shader_fragment_mask group
Expose the subgroup arithmetic capabilities
Pipeline stats crash the GPU profiler
Only disable DE workload IB when PAL MCBP is off
Buffer->image copy op truncates output written to the image if a 2D R32G32B32 linear 48x240 image is used
Add new interface CmdDrawOpaque() in PAL
Update LLVM to trunk @329887
Fix shader_ballot writelane incorrect issue

2018-5-9 update

Update Vulkan headers to 1.1.73
Implement VK_EXT_descriptor_indexing (not enabled)
[LLPC]Begin to add support (ImageRead and ImageFetch) for dimension aware image intrinsics, which are newly added in the LLVM backend and will replace the old hardware oriented image intrinsics
[LLPC]Use wqm intrinsic for ds_swizzle derivatives
[LLPC]Update SPIR-V header
Fix bugs in fetch RGB10A2
Barrier optimization: move the responsibility of handling image layouts to the barrier policy classes
Set PARTIAL_VS_WAVE_ON to 1 for off-chip GS to work-around an issue of system hang
Remove support for image atomics from formats that should not support it
Remove the Per-Device ring buffers for CE RAM dumps
Make internal CE RAM dumps cacheline-aligned
Fix GPU scratch memory allocation bug

2018-4-28 update

Enable AMD_shader_ballot and AMD_gpu_shader_half_float extension
Expose the subgroup shuffle capabilities, implement arithmetic 16bit and 64bit operation
Enable app_shader_optimizer in LLPC path
Barrier optimization
Work around the TombRaider third benchmark hang issue
Fix allocation granularity issue
Add max mask enum for ImageLayoutUsageFlags and CacheCoherencyUsageFlags
Fix issues in FragColorExport::ComputeExportFormat()
Remove 32-bit CTS workaround
Set unboundDescriptorDebugSrdCount PAL setting to 0 to avoid CTS issues with using multiple devices through testing
Fix the issue that the driver reports a currentExtent of (N, 0) on a surface with zero-sized width/height. According to Vulkan spec 1.1.70.1, the currentExtent of a valid window surface (Win32/Xlib/Xcb) must have both width and height greater than 0, or both of them 0
Fix LLPC assert on image type
Use runtime cache mode to get contextCache and reduce the time of running CTS tests
Command buffer dumping fixes, provide the correct engine ID for SDMA command buffers
Set DropIfSameContext for the CE preamble stream.
Fix assert and build error for PAL null device
Fix app crash when reading amdPalSettings.cfg
Fix source image descriptors for graphics depth/stencil copies.
Fix dEQP-VK.api.external.semaphore.opaque_fd.import_twice_temporary CTS test hang on Vega
Partially revert earlier change for clean-up of user data table management code. Most of the original change is not reverted, just the portion which moves some common structures from each HWL to the independent layer for universal command buffers. The compute command buffer changes were left as-is.
Moves the DescribeDraw calls after validateDraw in all CmdDraw calls
Implement PAL support needed for KHR_Display extension
Update LLVM to trunk @329887, fixing the MadMax corruption issue introduced in last update
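
The currentExtent fix in this update rests on the rule quoted from Vulkan spec 1.1.70.1: width and height are either both greater than 0 or both 0. The sketch below shows one way a reported extent might be normalized to satisfy that rule. Extent2D, IsValidCurrentExtent and NormalizeCurrentExtent are hypothetical names, and the driver's actual fix may be implemented differently.

```cpp
#include <cstdint>

// Hypothetical stand-in for VkExtent2D so the sketch has no Vulkan dependency.
struct Extent2D
{
    std::uint32_t width;
    std::uint32_t height;
};

// True when the extent satisfies the rule: both dimensions > 0, or both 0.
constexpr bool IsValidCurrentExtent(Extent2D e)
{
    return (e.width > 0) == (e.height > 0);
}

// Assumed normalization: if either dimension is zero, report (0, 0)
// instead of (N, 0) or (0, N); otherwise pass the extent through unchanged.
constexpr Extent2D NormalizeCurrentExtent(Extent2D e)
{
    return ((e.width == 0) || (e.height == 0)) ? Extent2D{0, 0} : e;
}

static_assert(!IsValidCurrentExtent(Extent2D{640, 0}),
              "(N, 0) violates the rule");
static_assert(IsValidCurrentExtent(NormalizeCurrentExtent(Extent2D{640, 0})),
              "normalized extent satisfies the rule");
```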

2018-4-20 update

Enable extension VK_AMD_shader_image_load_store_lod
Enable extension VK_AMD_gcn_shader
Implement subgroup arithmetic operations
[LLPC] Add missing pipeline member in pipeline dump
Optimize subgroup function name generating process, generate functions based on the subgroup arithmetic group op
Enable SyncobjFence and choose which fence type to use during runtime
Implement SYNC_FD handle type for External Fence and Semaphore
Remove releasing stack allocator in CmdBuffer::End()
Set UseRingBufferForCeRamDumps default back to true
No need to allocate memory for Sampler descriptors for all Gpus in the device group.
Fix verification error using R32ui image format
Fix pipeline compilation failure when running ManiaPlanet on Wine
Fix and optimize the use of some of the barrier flags which were noted to be handled incorrectly or inconsistently
Expand reporting of CmdBindTargets in the logger
Enable support for IL_OP_LOAD_DWORD_AT_ADDR in ILP
Add logic to memtracker to detect when someone corrupts the allocation list by scribbling into the heap
Remove CE/DE counter syncs from the postamble command streams on gfxip8+
Fix the issue of vkAcquireNextImageKHR returning VK_TIMEOUT without waiting for the timeout duration
Update LLVM to be based on trunk 328191. The new LLVM code introduced a rendering corruption issue with game MadMax, will be fixed in next update

2018-4-16 update

Reduce unnecessary malloc/free calls
[LLPC] Change the undef value to 0 or 0.0 for those unsupported functions. This is because undef value will block constant folding in LLVM and the nested constant expression after lower will be time-consuming when backend does analysis
[LLPC] Support int64 atomic operations
Tweak the way load op clears are handled in renderpasses to fix too many barriers in render pass clears
Add error handling where AddMemReference() is used; Add vk::Memory::CreateGpuMemory() and vk::Memory::CreateGpuPinnedMemory()
Fix assertion when running DOOM 2016 in Wine
Set "vm" flag for all fragment outputs
Add FMASK shadow table support to the Vulkan driver, which changes how descriptors are stored in memory. This allows writing the FMASK descriptors in the same corresponding upper 32 bits of the STA descriptors VA address
Fix missing cmd scratch memory heap in gpasession. Prevents a divide by zero exception when initializing driver for RGP traces
Explicitly acquire and release ownership of the queue context in PAL's preamble and postamble command streams
PAL no longer tries to chain from the last command buffer to the postamble command streams.
Fix interfaceLogger access violation. DataAllocNames array does not match CmdAllocType enum.
VK_AMD_gpu_shader_int16 + VK_AMD_shader_trinary_minmax + GFX9: Graphics pipeline fails to create if functionality dependent on the two exts is used
Rewrite VamMgrSingleton to avoid static members
Clean-Up of User Data Table Management Code

2018-4-9 update

Add int16 support to AMD_shader_ballot and AMD_trinary_minmax extension
[LLPC] Enable RetBlock in GS to make sure only one return is used in GS
Refine Pipeline dump
- Simplify pipeline panel options
- Update variable name in llpcAbiMetadata.h to match palPipelineAbi original name
- Remove metadata name in RegNameMap, instead, Util::Abi::PipelineMetadataNameStrings is used
- Fix a bug in PipelineCompiler::ApplyBilConvertOptions, the return value of GetRuntimeSettings must be a reference
AMD_shader_ballot:
- Rename glslSpecialOpEmuF16 to glslSpecialOpEmuD16
- Add stubs of subgroup arithmetic operations for i64 and f16
- use tbuffer_load_d16 to do vertex fetching
Implement a consistent dispatch table mechanism across the driver
- Now we have separate global, per-instance, and per-device dispatch tables
- We can override individual entry points in each dispatch table to enable optimizations based on app profile or any other criteria
- Entry points now can have complex requirement criteria and we now clearly distinguish between instance and device level functions
- SQTT layer handling is still a bit clumsy because it operates more like a device-only layer, but at least its injection code is less intrusive now
- Also fixed a bunch of unrelated bugs and missing implementation on the way, as the new code revealed those
Update Pipeline Dump service to inherit from IService instead of URIService (which is deprecated and being removed).
Changes ValidateDraw to reserve its own space rather than including it with the rest of the draw related packets. This avoids running out of reserved space in TimeSpy
Move MetroHash and jemalloc to pal/src/util/imported from pal/src/core/imported

2018-4-3 update

Enable the following extensions:
- AMD_shader_explicit_vertex_parameter
- AMD_shader_trinary_minmax
- AMD_mixed_attachment_samples
- AMD_shader_fragment_mask
- EXT_queue_family_foreign
Enable AMD_gpu_shader_int16 for gfx9
Enable shaderInt64
Disable extension AMD_gpu_shader_half_float since the interpolation in FS is not implemented.
Add arithmetic operations of AMD_shader_ballot
Implement subgroup arithmetic reduce int ops
Remove KHR suffixes for promoted extensions: replace some of the KHXs with KHRs, the rest should go away whenever device group KHXs are removed
Remove Vulkan 1.0 headers because 1.1's are backward compatible, 1.0 driver functionality can still be built with USE_NEXT_SDK=0
Fix an issue where an incorrect buffer causes a compute shader to loop infinitely
Disable FmaskBasedMsaaRead for Dota2, which can bring ~1% performance gain for Dota2 4K + best-looking on Fiji
Add FMASK shadow table support to LLVM/LLPC
Fix the issue that Wolfenstein 2 fails to compile compute shader
Fix dEQP-VK.api.image_clearing.core.clear_color_image.3d.* CTS tests failure
Clarifies an existing 3D color target interface requirement and fixes a bug which can cause DCC corruption.
Fix an issue related to fast clear eliminate
Do late expand for HTILE if it uses fixed-function resolve
Fix a LLVM issue (zext of f16 to i32) for Dawn of War III corruption on Radeon™ RX Vega
Eliminate stalls between command buffers, Phase #1

2018-3-29 update

Enable Wayland extension
Enable AMD_texture_gather_bias_lod extension
Implement the following AMD extensions:
- AMD_shader_fragment_mask
- AMD_gcn_shader
- AMD_shader_trinary_minmax
- AMD_shader_explicit_vertex_parameter
- AMD_shader_ballot
Enable subgroupQuadSwapHorizontal, subgroupQuadSwapVertical, subgroupQuadSwapDiagonal, subgroupQuadBroadcast(uint/int, uint)
Fix issues when grouping all identical devices into single device group and enable support to group the devices if they have matching Pal::DeviceProperties::deviceIds, pass CTS device group testing
Hide VK_AMD_negative_viewport_height in Vulkan 1.1: using the extension is no longer legal, because 1.1 core includes VK_KHR_maintenance1
[LLPC] make spir-v bool-in-mem i8 rather than i1
Enable shader prefetcher for Serious Sam Fusion and Dota2, about 2.5% performance gain
Remove redundant divide in BindVertexBuffers() (PAL does the same divide). Remove extra bookkeeping needed for the redundant divide
Fix some issues in the RGP command buffer tag based capture code
Move Pipeline & User-Data Binding to Draw-Time, observed some nice gains in several applications, and other apps were neutral in terms of performance loss/gain
Fix an order of initialization issue related to public settings
VK_KHR_image_format_list for swapchains: add the necessary PAL support for deciding image compression policy for presentable images based on a list of possible view formats
Report to clients that GFX OFF may reset the GFX timestamp to 0 after an idle period
Fix some issues in command buffer dumping
Implement COND_EXEC style predication for CP DMA path in CmdCopyMemory on compute command buffers
Change CreateTypedBufferViewSrds() and CreateUntypedBufferViewSrds() to remove the requirement that the range is a multiple of the stride
Make Pal Linux VA manager support multi-device cases
Fix dEQP-VK.api.object_management.multithreaded_per_thread_resources.instance random crash
Fix validation bug with computing PBB bin sizes
Don't allow LayoutCopySrc on images of a format that doesn't support buffers
Add call to DevDriver ShowOverlay() function to determine if the developer driver overlay should be displayed.
Handle unaligned memory to image and image to memory copies on the DMA Queue
Convert some PAL inline utility functions into constexpr functions and fix some const-correctness issues.
Resolve potential HW bug with SDMA copy overlap syncs on GFX9
Temporarily disable the SDMA copy overlap sync feature on GFX9 for a suspected HW ucode bug with SDMA's ability to detect certain hazards which results in race conditions in SDMA stress tests.
Fix bug in PA_SC_MODE_CNTL_1 validation
Improve hotspots related to Color-Target & Depth/Stencil views, some improvements in CPU performance when creating color-target and depth-stencil view objects in PAL
Make GFX9's BuildSetSeqContextRegs() and BuildSetSeqConfigRegs() avoid reading from the command buffer similar to what is done for GFX6. Cleans up big spikes if Vulkan uses write combined command buffers (they were small bumps when using cacheable command buffers)

2018-3-16 update

Add Instance- and Device-specific dispatch tables. Comply with spec requirements
Handle unaligned memory to image and image to memory copies on the DMA Queue
Use included headers to determine apiVersion instead of manual bumps
Complete VK_EXT_sampler_filter_minmax extension, allows more formats and is completely driven by the formats spreadsheet
Enable VK_EXT_shader_subgroup_vote and VK_EXT_shader_subgroup_ballot support
VK_KHR_subgroup support:
- Add missing subgroup builtins in compute shader
- Move the implementation of gl_SubGroupSize from patch phase to .ll library
- Support for the shufflexor, shuffleup, shuffledown function
VK_KHR_multiview support:
- LoadOp Clears implementation
- Rewrite the function ConfigBuilder::BuildUserDataConfig to support merged shader.
- Adjust the position of SGPR to emulate ViewIndex.
- Set the user data configuration of ViewId even if the stage is not the last vertex processing stage.
Implement interaction between VK_KHR_multiview and VK_KHR_device_group by adding support for VK_PIPELINE_CREATE_VIEW_INDEX_FROM_DEVICE_INDEX_BIT.
Change implementation of KHR_descriptor_update_template to move work from vkUpdateDescriptorSetWithTemplateKHR to vkCreateDescriptorUpdateTemplateKHR
Batch large numbers of copy/clear/etc. image regions to avoid OOM errors
Rearranged the loop in DescriptorSet::InitImmutableDescriptors() to avoid looking up the descriptor sizes in the device unless necessary. Cuts time in DescriptorSet::Reassign() in half.
Remove DescriptorSetHeap::m_pHandles. We can compute the handle with a little arithmetic instead of a memory lookup. Cuts the time in AllocDescriptorSets() in half.
[LLPC]Implement sparse texture residency
[LLPC]Fix Crash when parsing Hull Shader
[LLPC]Fix problems with address space mapping
[LLPC]Restored correct addr space for gs-vs ring buffer descriptor load
Fix an assert when running DOOM in Wine
Don't treat MSAA image as pure shader resolve/read src if CB fixed function resolve method is preferred
Implement the changes needed to change the fast clear code from the 3 special values ((0,0,0,1), (1,1,1,1) and (1,1,1,0)) to ClearColorReg when we mix signed and unsigned formats views for a resource
Don't write IA_MULTI_VGT_PARAM and VGT_LS_HS_CONFIG in ValidateDrawTimeHwState
Remove unnecessary calls to SetContextRollDetected() during GFX9 command buffer generation
Remove the software-based dynamic primgroup optimization on GFX9
Fix GpuProfiler ThreadTrace shader hashes. 64-bit to 128-bit
Optimize path with depth clamp disabled. Set DISABLE_VIEWPORT_CLAMP only if depth clamp is disabled in pipeline and depth is exported in fragment shader
Trace SQTT Causes Driver AV if sqtt.gpuMemoryLimit is Too Small
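
The DescriptorSetHeap::m_pHandles entry in this update replaces a per-set handle table with arithmetic. The sketch below illustrates that general idea under an assumed layout in which set objects are stored back to back with a fixed stride. SetHeapLayout, HandleFromIndex and IndexFromHandle are hypothetical names, and the real DescriptorSetHeap layout is not described in these notes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: descriptor set objects stored contiguously in one allocation.
struct SetHeapLayout
{
    std::uintptr_t base;    // address of the first set object (assumed)
    std::size_t    stride;  // bytes per set object (assumed fixed)
};

// Computes the handle for set `index` directly, with no lookup table.
constexpr std::uintptr_t HandleFromIndex(SetHeapLayout heap, std::size_t index)
{
    return heap.base + index * heap.stride;
}

// Inverse mapping, e.g. for returning a set to a free list by index.
constexpr std::size_t IndexFromHandle(SetHeapLayout heap, std::uintptr_t handle)
{
    return (handle - heap.base) / heap.stride;
}

static_assert(HandleFromIndex(SetHeapLayout{0x1000, 64}, 3) == 0x10C0,
              "base + 3 * 64");
static_assert(IndexFromHandle(SetHeapLayout{0x1000, 64}, 0x10C0) == 3,
              "round-trips back to the index");
```

The trade-off in such a scheme is a multiply and an add in place of a dependent memory load, which avoids a potential cache miss on the allocation path.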

2018-3-7 update

Enable Vulkan 1.1 support
Enable VK_AMD_shader_core_properties extension
Enable VmAlwaysValid feature for kernel 4.16 and above
Force per-sample shading if the shader is using per-sample features
[LLPC] Add address space translation pass
Handle OOM errors during command buffer recording
Fix the problem that driver unbinds vertex buffers when binding a new pipeline
Fix gpuProfiler crash when starting capture from first frame
[gfx6] Update DB with correct address for PERFCOUNTERx_SELECT1 register, fixing GPU hang on issuing spm traces with more than 2 events for DB
Fix a CmdClearDepthStencil bug and add validation to avoid 3D depth/stencil images
Expose perSampleShading PS parameter in PipelineInfo

2018-2-27 update

Complete Geometry shader and tessellation support for gfx9
Clear v1.0 CTS failures for Radeon™ RX Vega Series
Generate extension-related source files at driver build time
Enable VK_EXT_depth_range_unrestricted extension
Fix vrcompositor startup crash issue
Fix random failure in AMD_buffer_marker tests
Reduce time to clear AllGpuRenderState structure by removing Pal::DynamicGraphicsShaderInfos graphicsShaderInfo and Pal::DynamicComputeShaderInfo computeShaderInfo and making them local variables
[LLPC] Use PassManagerBuilder instead of a forked and modified copy of opt
Vulkan queue marker to trigger RGP capture (Frame terminator)
Re-order the PreciseAnisoMode enum for clarity; Change the PreciseAnisoMode value based on the public Radeon Settings Texture filter quality (TFQ) setting
Fix Vulkan CTS failures of dEQP-VK.api.external.memory.opaque_fd.dedicated with VM-always-valid enabled
Fix a multi-thread segfault issue
Fix some Coverity warnings
Improve CPU performance by removing read modify writes in CreateUntypedBufferViewSrds

2018-2-14 update

Enhance GFX9 support
Texture filtering quality changes
Sample mask input to fs shouldn't force per-sample execution
Fix LLVM error when using both OpImageSampleDref* and OpImageSample* on the same image
CPU optimization for Dota2: reduces the time spent in CmdBuffer::RPSyncPoint() and its callees from 3.1% to 0.4%.
[LLPC] Enable fastMathMode for floating point
[LLPC] Enable NoSignedZero for FP math to activate omod modifiers
Program CHKSUM register with the value obtained from the pipeline binary for SPP.
Fix implicit prim shader controls.
Fix "all" null device creation to skip undefined devices
Add "virtual" to some destructors in PAL
Add new field in struct DynamicComputeShaderInfo to support LDS size update during binding compute pipeline.

2018-2-9 update

Implement VK_EXT_external_memory_host extension, enable the extension by default
Enhance on-chip GS support
Avoid redundant lookups of Pal::ICmdBuffer* in GraphicsPipeline::BindToCmdBuffer()
Save PAL pipeline hash when pipeline is created
Device group: trim the number of structs returned from EnumeratePhysicalDeviceGroups
Clean up and remove redundant code for ImageToBuffer and ImageToImage copies, as the two-step copies are detected and handled in PAL for SDMA queues
Remove deprecated metadata PsRunsAtSampleRate
Fix the crash caused by a mismatch between the OpName on the entry point and the OpEntryPoint name
LLPC: Remove print module in each pass, move verify module to debug build
LLPC: Stop AShr and LShr from having exact flag set in SPIRV translation
Add streaming Perf counter support in PAL for gfxip-9
Implement optimal sharing
Expose texture 3D PRTs for queryable tile shapes for GFX 7/8
Update GetMaxGpuMemoryAlignment to account for metadata alignment
Reduce the number of small surfaces that need CMASK or DCC
Improve efficiency of MsaaState::SetCentroidPriorities()
Unbind vertex buffers when binding a new pipeline
Fix issue when running with mode setting driver
Fix issue when running on XWayland
Upgrade LLVM to new code base

2018-1-22 update

Implement VK_AMD_buffer_marker extension
Implement VK_EXT_debug_report extension
Pass layout to InitImmutableDescriptors(), removes 80% of the time in DescriptorSet::Reassign()
Calculate location of bindings for descriptor set layout to avoid a memory lookup
Disable depth clamping when enableDepthClamp is set to false
Fix CTS dEQP-VK.tessellation.shader_input_output.barrier failure, simplify the TessFactorToBuffer offset calculation
Fix CTS dEQP-VK.glsl.440.linkage.varying.component group testing failure

2018-1-16 update

Fix dEQP-VK.api.external.semaphore.opaque_fd.signal_wait_import_permanent
Fix dEQP-VK.spirv_assembly.instruction.compute.image_sampler.imagefetch.* hang issue.
Pass image format list to PAL to allow enabling DCC for certain cases.
llpc: Use llvm build's llvm-as and llvm-link to save driver build time
llpc: update merged shader implementation
Get rid of unnecessary synchronization for present jobs.
Fix a couple of asserts when loading llvm-generated ELF
[PAL Util] Enhancements for containers.
Fix PalAlert when creating a null PS

2018-1-8 update

Enable VK_EXT_global_priority extension when libdrm version >= 3.22
Enable VK_KHR_maintenance2 extension
Mad Max performance tuning, ~3% gain
LLPC refine
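
The VK_EXT_global_priority item above gates an extension on a minimum version of 3.22. A minimal sketch of such a gate, assuming the reported version is available as separate major and minor numbers; `Version`, `IsAtLeast` and `ShouldEnableGlobalPriority` are hypothetical names, and how the driver actually queries the version is not shown here:

```cpp
#include <cstdint>

// Hypothetical version pair; field names avoid the `major`/`minor` macros
// that some system headers define.
struct Version {
    std::uint32_t majorVer;
    std::uint32_t minorVer;
};

// Compares component by component. Treating "3.22" as a decimal number would
// wrongly rank 3.9 above 3.22.
constexpr bool IsAtLeast(Version have, Version need) {
    return (have.majorVer > need.majorVer) ||
           (have.majorVer == need.majorVer && have.minorVer >= need.minorVer);
}

// Minimum version taken from the release note (>= 3.22).
constexpr Version kMinForGlobalPriority{3, 22};

// Returns true when the extension should be advertised for `reported`.
constexpr bool ShouldEnableGlobalPriority(Version reported) {
    return IsAtLeast(reported, kMinForGlobalPriority);
}
```

Keeping the comparison on integer components rather than on a parsed float or string makes the gate behave correctly across two-digit minor versions.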

2017-12-28 update

Fix a bug when multiple devices access the same shader cache disk file by creating an internal shader cache instance per gfxip
Fix compile error with GCC 7.2
Fix Dynamic WaveLimits for Graphics Pipelines
Cache robustBufferAccess in the descriptor set
Hook up present timing events in RGP queue traces
Add pass llpcSpirvLowerZero to optimize float zero operations
Add implementation of AMD_gpu_shader_half_float and AMD_gpu_shader_int16 (not completed)
Add implementation of VK_EXT_global_priority (not completed)
