CUDA Denoiser For CUDA Path Tracer

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4

Xuntong Liang
- LinkedIn, GitHub, Twitter.
Tested on: Windows 10, i7-10750H @ 2.60GHz 16GB, RTX 2070 Super with Max-Q 8192MB

This project is an extension of my CUDA Path Tracer.

Features

Overall

- Implemented the A-trous wavelet filter.
- Implemented the edge avoiding A-trous wavelet filter.
- Implemented temporal sampling.
- Implemented shared memory version.

A-Trous Wavelet Filter

The A-trous wavelet filter approximates a Gaussian filter. It produces a filtered image by repeatedly convolving the image with the same small generating kernel, using a different stride (spacing between kernel taps) in each pass. This process only considers the final color, so it can remove many high-frequency features of the image.
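The repeated convolution can be sketched on the CPU as follows. This is a minimal C++ reference sketch, not the project's CUDA kernel: the function names, the single-channel image, the clamp-to-border handling, the stride doubling (1, 2, 4, ...), and the 5x5 B3-spline kernel (1/16, 1/4, 3/8, 1/4, 1/16) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One A-trous pass over a single-channel, row-major image.
// `stride` is the spacing between kernel taps; the kernel always has 5x5 taps.
std::vector<float> atrousPass(const std::vector<float>& src,
                              int width, int height, int stride) {
    assert(width > 0 && height > 0 && stride > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    // Assumed generating kernel: the 2D weight is the product of two 1D taps.
    const float k[5] = {1.f / 16, 1.f / 4, 3.f / 8, 1.f / 4, 1.f / 16};
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.f;
            for (int j = -2; j <= 2; ++j) {
                for (int i = -2; i <= 2; ++i) {
                    // Taps that fall outside the image are clamped to the border.
                    int sx = std::min(std::max(x + i * stride, 0), width - 1);
                    int sy = std::min(std::max(y + j * stride, 0), height - 1);
                    sum += k[i + 2] * k[j + 2] * src[sy * width + sx];
                }
            }
            dst[y * width + x] = sum;
        }
    }
    return dst;
}

// Run `passes` passes, doubling the stride each time (assumed schedule).
std::vector<float> atrousFilter(std::vector<float> img,
                                int width, int height, int passes) {
    for (int p = 0, stride = 1; p < passes; ++p, stride *= 2) {
        img = atrousPass(img, width, height, stride);
    }
    return img;
}
```

Because the kernel weights sum to 1, a constant image is left unchanged, while isolated bright pixels are spread over a footprint that grows with each pass. That spreading is the loss of high-frequency detail mentioned above.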

Edge Avoiding A-Trous Wavelet Filter

The edge-avoiding A-trous wavelet filter borrows the idea of the bilateral Gaussian filter. The kernel weights are computed with several edge-stopping functions, which take into account more features, such as the surface normal and the position, rather than the final color alone. This requires generating a G-buffer for each frame.

Here are some results from two scenes, "Ceiling Light" and "Micro Facet".
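One common form of the edge-stopping weight is a product of Gaussian-style terms, one per feature. The sketch below assumes that form; the `GBufferPixel` layout, the `EdgeParams` struct, and the sigma parameters are hypothetical names introduced for illustration and may not match the project's actual G-buffer or tuning parameters.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Squared Euclidean distance between two vectors.
inline float dist2(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Hypothetical per-pixel G-buffer entry; the real layout is not shown here.
struct GBufferPixel {
    Vec3 normal;
    Vec3 position;
};

// Hypothetical tuning parameters: one sigma per feature.
struct EdgeParams {
    float sigmaColor, sigmaNormal, sigmaPosition;
};

// Gaussian-style edge-stopping term: 1 when the features match,
// close to 0 when they differ strongly (i.e. across an edge).
inline float edgeStop(float squaredDiff, float sigma) {
    assert(sigma > 0.f);
    return std::exp(-squaredDiff / (sigma * sigma));
}

// Combined weight that scales the kernel weight of a neighboring tap q
// when filtering the center pixel p.
float edgeWeight(const Vec3& colorP, const Vec3& colorQ,
                 const GBufferPixel& p, const GBufferPixel& q,
                 const EdgeParams& params) {
    float wColor = edgeStop(dist2(colorP, colorQ), params.sigmaColor);
    float wNormal = edgeStop(dist2(p.normal, q.normal), params.sigmaNormal);
    float wPosition = edgeStop(dist2(p.position, q.position), params.sigmaPosition);
    return wColor * wNormal * wPosition;
}
```

In a filter pass, each tap's kernel weight would be multiplied by `edgeWeight`, and the accumulated color divided by the sum of the resulting weights so that the output stays normalized.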

Ceiling Light 20 Iterations	Ceiling Light 20 Iterations with Denoising	Ceiling Light 200 Iterations

Micro Facet 50 Iterations	Micro Facet 50 Iterations with Denoising	Micro Facet 1500 Iterations

These images show the effect of the filter. With the filter, we get a less noisy image with clear edges after only a few iterations, which is much faster than running many more iterations without denoising to reach a similar noise level.

Temporal Filter

If the camera, the lights, or the objects move, we can also take advantage of the spatial and temporal continuity of the image sequence: historical data can be reused for denoising (temporal accumulation) as long as we can find, in the previous frame, the filtered pixel that corresponds to the current noisy pixel (reprojection). The temporal filter therefore has two parts: reprojection and temporal accumulation.
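The two parts can be sketched as a history check followed by a blend. This is a minimal C++ sketch under stated assumptions: the reprojected pixel coordinates are taken as already computed, history is rejected with a simple world-position distance test, and the blend is an exponential moving average with a hypothetical factor `alpha`. None of these names or choices are taken from the project's source.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Result of looking up the current pixel in the previous frame.
struct Reprojected {
    int prevX, prevY;  // pixel coordinates in the history buffer
    bool valid;        // false if off-screen or if the surface does not match
};

// History is usable only if the reprojected pixel is on screen and the world
// position stored there is close to the current one (assumed rejection test).
Reprojected checkHistory(int prevX, int prevY, int width, int height,
                         const Vec3& currentPos,
                         const std::vector<Vec3>& prevPositions,
                         float maxDistance) {
    Reprojected r{prevX, prevY, false};
    if (prevX < 0 || prevX >= width || prevY < 0 || prevY >= height) return r;
    const Vec3& h = prevPositions[static_cast<std::size_t>(prevY) * width + prevX];
    float dx = h.x - currentPos.x, dy = h.y - currentPos.y, dz = h.z - currentPos.z;
    r.valid = std::sqrt(dx * dx + dy * dy + dz * dz) <= maxDistance;
    return r;
}

// Temporal accumulation: blend the new sample with history, or fall back to
// the new sample alone when no valid history exists.
Vec3 accumulate(const Vec3& current, const Vec3& history,
                bool historyValid, float alpha) {
    assert(alpha > 0.f && alpha <= 1.f);
    if (!historyValid) return current;
    return {alpha * current.x + (1.f - alpha) * history.x,
            alpha * current.y + (1.f - alpha) * history.y,
            alpha * current.z + (1.f - alpha) * history.z};
}
```

A smaller `alpha` keeps more history and gives a smoother result, but it also makes stale values persist longer when the history no longer matches what is visible.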

Ceiling Light without Temporal	Ceiling Light with Temporal

Micro Facet without Temporal	Micro Facet with Temporal

I have not implemented the complete SVGF and do not separate direct and indirect illumination, so the result cannot match what SVGF achieves. But the figures above show that, with the temporal filter, we can keep as much of the filtered data as possible while the camera moves.

They also show one disadvantage of the temporal filter: reflected pixels lag behind. In the scene "Micro Facet", even though the image changes more smoothly with the temporal filter, the reflected pixels appear to lag.

Shared Memory Optimization

For each pixel, the filter reads several neighboring pixels to compute the final value, so many pixels are read several times within each block. This process is likely to benefit from shared memory.
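A rough count shows how much redundancy a shared-memory tile can remove. The helpers below are hypothetical and assume a square block, a dense tile that covers the block plus a halo on every side, and a kernel with a fixed number of taps per side; they count pixel reads only and ignore caching.

```cpp
#include <cassert>

// Distinct pixels one block stages in shared memory: the block itself
// plus `halo` extra pixels on every side.
int tilePixels(int blockDim, int halo) {
    assert(blockDim > 0 && halo >= 0);
    int side = blockDim + 2 * halo;
    return side * side;
}

// Global-memory reads issued by one block when every thread fetches
// all of its own taps directly.
int naiveReads(int blockDim, int tapsPerSide) {
    assert(blockDim > 0 && tapsPerSide > 0);
    return blockDim * blockDim * tapsPerSide * tapsPerSide;
}
```

For an 8x8 block and a 5x5 kernel at stride 1 (halo of 2), `naiveReads(8, 5)` is 1600 while `tilePixels(8, 2)` is 144. If the halo grows with the stride, as it would for a 5x5 kernel at stride 4 (halo of 8), the tile grows to `tilePixels(8, 8)`, which is 576, so the saving shrinks for later passes.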

Performance Analysis

Filter Size and Resolution

I ran the performance analysis on the two scenes with different filter sizes.

Note that the actual filter size = 2 * (the filter size parameter) + 1.

As the filter size increases, each frame takes longer, because a larger filter size requires more wavelet filter passes. Larger resolutions also take more time, as expected.

So how does the filter size influence the image? In a diffuse scene, increasing the filter size may have little effect ("Ceiling Light"), while in a scene with more specular objects, or with smaller lights, it still has a large effect ("Micro Facet"). Here are some results with filter sizes greater than or equal to 4.

Ceiling Light Filter Size 4	Ceiling Light Filter Size 8	Ceiling Light Filter Size 16

Micro Facet Filter Size 4	Micro Facet Filter Size 8	Micro Facet Filter Size 16

We can see that with a larger filter size, the light becomes blurrier in the scene "Micro Facet".

Material Type

We can also infer that the filter makes glossy materials look more diffuse, or overblurred, as is also shown in the overall image. I also believe the denoising works better on diffuse objects, so the filter could additionally take material properties such as roughness and specularity into account.

Shared Memory

I find that my shared memory optimization is only faster for the first two sweeps of the wavelet filter and becomes slower for the third sweep. This disadvantage may come from the large size of each G-buffer pixel. It also prevents me from making the block size or the filter size larger; otherwise, launching the denoising kernels fails with invalid-argument errors. According to my tests, the maximum filter size is 4 with the default 8x8 block size. Since this block size is already the minimum, the filter size cannot be increased unless I optimize the G-buffer.
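The size limit can be illustrated with a small budget calculation. This is a sketch under loudly stated assumptions: the tile is the block plus a halo equal to the filter size parameter, the per-block limit is CUDA's default 48 KB, and the 160-byte pixel is a hypothetical figure picked purely for illustration, not measured from this project.

```cpp
#include <cassert>
#include <cstddef>

// Shared memory needed to stage one tile (block plus halo on every side).
std::size_t tileBytes(int blockDim, int halo, std::size_t bytesPerPixel) {
    assert(blockDim > 0 && halo >= 0);
    std::size_t side = static_cast<std::size_t>(blockDim + 2 * halo);
    return side * side * bytesPerPixel;
}

// True if the tile fits within the per-block shared memory limit.
bool fitsInSharedMemory(int blockDim, int halo,
                        std::size_t bytesPerPixel, std::size_t limitBytes) {
    return tileBytes(blockDim, halo, bytesPerPixel) <= limitBytes;
}
```

With these hypothetical numbers, an 8x8 block with a halo of 4 needs 16 * 16 * 160 = 40960 bytes and fits in 48 KB, while a halo of 5 needs 18 * 18 * 160 = 51840 bytes and does not. A smaller G-buffer pixel would raise the largest halo that fits, which is consistent with the suggestion above to optimize the G-buffer.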

Temporal Filter

The aim of the temporal filter is to keep as much historical data for denoising as possible in interactive rendering applications, which means that we should get an acceptable render result in real time. If the filter costs too much, it is still a problem for real-time applications.

In the previous section, we saw that the rendering time is still acceptable when the camera moves around. Next, I compare the temporal filter with no denoising and with the spatial filter only.

The figure shows that the temporal filter is much faster than the spatial filter, so it adds little to the total denoising time.

Changes

CMakeLists Changes

I added a new project, log_profile, for logging profiles (as in CUDA Path Tracer).
