IwakuraRein/Nagi

Nagi

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3

Nagi is a simple path tracer built in CUDA. As shown in the picture, it's capable of rendering photorealistic images of diffuse, mirror, metal, and glass materials with their own textures.

Features

nagi2.mp4

Finished

  • Mesh Loading
  • Texture Mapping
  • Denoiser
  • Bounding Volume Hierarchy
  • Refraction
  • Skybox
  • Preview Window

Working On

  • Direct Light Sampling
  • Importance Sampling On Skybox
  • Multiple Importance Sampling
  • Multithreaded GUI

Usage

Nagi.exe <scene file> [<output path>]

For example:

Nagi.exe ./res/cornell_box/cornell_box.json ./results

The scene is defined by a JSON file, which is largely self-explanatory and also contains the rendering configuration. It currently supports 5 material types: Lambert, Specular, Glass, Microfacet, and Light Source. Depth of field can be added by setting the camera's f-number and focus distance (or a look-at point).
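As a rough illustration of how an f-number and a focus distance can drive depth of field, here is a minimal thin-lens sketch. It is not taken from Nagi's source: the names `apertureRadius`, `focusDistanceFromLookAt`, and `thinLensRay` are hypothetical, the camera is assumed to sit at the origin looking along +z, and the focal length is assumed to be given in scene units.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin, dir; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }
inline Vec3 normalize(Vec3 v) { float l = length(v); return {v.x / l, v.y / l, v.z / l}; }

// Aperture diameter = focal length / f-number, so the radius is half of that.
inline float apertureRadius(float focalLength, float fNumber) {
    assert(fNumber > 0.0f);
    return 0.5f * focalLength / fNumber;
}

// Assumption: a look-at point implies a focus distance equal to its distance
// from the camera position.
inline float focusDistanceFromLookAt(Vec3 cameraPos, Vec3 lookAt) {
    return length(sub(lookAt, cameraPos));
}

// Camera space, camera at the origin looking along +z (hypothetical convention).
// pinholeDir is the direction a pinhole camera would use for this pixel;
// (lensU, lensV) is a sample inside the unit disk.
inline Ray thinLensRay(Vec3 pinholeDir, float focusDistance, float radius,
                       float lensU, float lensV) {
    assert(pinholeDir.z > 0.0f);
    // Point where the pinhole ray crosses the plane of focus z = focusDistance.
    float t = focusDistance / pinholeDir.z;
    Vec3 focus = {pinholeDir.x * t, pinholeDir.y * t, focusDistance};
    // Move the ray origin onto the lens and aim it at the same focus point.
    Vec3 origin = {lensU * radius, lensV * radius, 0.0f};
    return {origin, normalize(sub(focus, origin))};
}
```

Every lens sample produces a ray through the same point on the plane of focus, so geometry on that plane stays sharp while everything else blurs. A larger f-number shrinks the lens radius and with it the blur.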

Gallery

Artist: NewSee2l035

Artists: James Ray Cock, Greg Zaal, Dimitrios Savva

Artist: JpArtSky

Acceleration Structure

To accelerate the intersection test, I first divided each model's triangles and stored them in an octree. All objects were then passed to the path tracer in an array sorted by volume, and the path tracer performed a depth-first search during the intersection test.

I then realized that a bottom-up bounding volume hierarchy is a better fit, and that the stack needed for the DFS consumes too much memory. I therefore adopted the BVH described in PBRT together with a stack-free traversal.
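One common way to traverse a BVH without a stack is to store the nodes in depth-first order and give each node a "skip" index that points past its whole subtree. The sketch below illustrates that idea only. The `FlatNode` layout and the `traverse` function are hypothetical, and Nagi's actual node format and traversal may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 o, d; };
struct AABB { Vec3 lo, hi; };

// Nodes are stored in depth-first order (hypothetical layout).
struct FlatNode {
    AABB box;
    int skip;  // index of the first node after this node's subtree
    int prim;  // primitive index for a leaf, -1 for an interior node
};

// Clip the interval [tmin, tmax] against one axis-aligned slab.
// Relies on IEEE infinities when a direction component is zero.
inline bool slab(float lo, float hi, float o, float d, float& tmin, float& tmax) {
    float inv = 1.0f / d;
    float t0 = (lo - o) * inv, t1 = (hi - o) * inv;
    if (t0 > t1) std::swap(t0, t1);
    tmin = std::max(tmin, t0);
    tmax = std::min(tmax, t1);
    return tmin <= tmax;
}

inline bool hitsBox(const AABB& b, const Ray& r) {
    float tmin = 0.0f, tmax = std::numeric_limits<float>::infinity();
    return slab(b.lo.x, b.hi.x, r.o.x, r.d.x, tmin, tmax) &&
           slab(b.lo.y, b.hi.y, r.o.y, r.d.y, tmin, tmax) &&
           slab(b.lo.z, b.hi.z, r.o.z, r.d.z, tmin, tmax);
}

// Returns the primitives whose leaf boxes the ray enters; no stack is used.
std::vector<int> traverse(const std::vector<FlatNode>& nodes, const Ray& ray,
                          int* boxTests = nullptr) {
    std::vector<int> candidates;
    const int n = static_cast<int>(nodes.size());
    int i = 0;
    while (i < n) {
        const FlatNode& node = nodes[i];
        assert(node.skip > i);  // guarantees forward progress
        if (boxTests) ++*boxTests;
        if (!hitsBox(node.box, ray)) { i = node.skip; continue; }  // prune subtree
        if (node.prim >= 0) candidates.push_back(node.prim);
        ++i;  // first child of an interior node, or the node after a leaf
    }
    return candidates;
}
```

Because the only per-ray state is the current node index, each GPU thread needs no traversal stack at all, which is the memory saving described above.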

The closest triangle intersection found so far is recorded. If the distance to an object is larger than that recorded distance, all of the object's triangles are skipped.
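The pruning rule can be sketched as follows, assuming each object carries an axis-aligned bounding box and the "distance to an object" is the ray's entry distance into that box. The `Object` struct and the `closestHit` function are illustrative names, not Nagi's, and the triangle test is a standard Möller-Trumbore routine swapped in for whatever the project uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Ray { Vec3 o, d; };
struct Triangle { Vec3 a, b, c; };
struct Object { Vec3 lo, hi; std::vector<Triangle> tris; };  // hypothetical layout

const float kInf = std::numeric_limits<float>::infinity();

// Ray parameter at which the ray enters the object's box, or kInf on a miss.
float boxEntry(const Object& obj, const Ray& r) {
    const float lo[3] = {obj.lo.x, obj.lo.y, obj.lo.z};
    const float hi[3] = {obj.hi.x, obj.hi.y, obj.hi.z};
    const float o[3] = {r.o.x, r.o.y, r.o.z};
    const float d[3] = {r.d.x, r.d.y, r.d.z};
    float tmin = 0.0f, tmax = kInf;
    for (int k = 0; k < 3; ++k) {
        float inv = 1.0f / d[k];
        float t0 = (lo[k] - o[k]) * inv, t1 = (hi[k] - o[k]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmin > tmax) return kInf;
    }
    return tmin;
}

// Möller-Trumbore ray/triangle test; returns the ray parameter or kInf.
float hitTriangle(const Triangle& t, const Ray& r) {
    Vec3 e1 = sub(t.b, t.a), e2 = sub(t.c, t.a);
    Vec3 p = cross(r.d, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < 1e-8f) return kInf;  // ray parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = sub(r.o, t.a);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return kInf;
    Vec3 q = cross(s, e1);
    float v = dot(r.d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return kInf;
    float dist = dot(e2, q) * inv;
    return dist > 1e-6f ? dist : kInf;
}

float closestHit(const std::vector<Object>& objs, const Ray& r, int* triTests) {
    assert(triTests != nullptr);
    float closest = kInf;
    for (const Object& obj : objs) {
        // Skip the whole object if its box is missed or lies beyond the best hit.
        if (boxEntry(obj, r) >= closest) continue;
        for (const Triangle& tri : obj.tris) {
            ++*triTests;
            closest = std::min(closest, hitTriangle(tri, r));
        }
    }
    return closest;
}
```

The order in which objects are visited matters: if a near object is tested first, farther objects are rejected by a single box test instead of a loop over their triangles.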

Performance Analysis

After dividing the scene into an octree, a large speedup can be seen:

Interestingly, sorting the rays by material slightly lowers performance. The overhead of sorting may outweigh the improvement brought by memory coalescing.
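For reference, sorting by material amounts to reordering the active paths so that paths hitting the same material sit next to each other. The host-side sketch below shows the reordering with a hypothetical `PathSegment` struct. On the GPU this is typically done with a key-value sort such as Thrust's `sort_by_key`; whether Nagi uses that call is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical per-path record; the real struct would carry far more state.
struct PathSegment {
    int pixel;
    int materialId;
};

// Returns path indices ordered so that equal material IDs are contiguous.
// A stable sort keeps the original relative order within each material.
std::vector<int> orderByMaterial(const std::vector<PathSegment>& paths) {
    std::vector<int> order(paths.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&paths](int a, int b) {
        return paths[a].materialId < paths[b].materialId;
    });
    return order;
}
```

The sort itself costs roughly O(n log n) work per bounce (or a multi-pass radix sort on the GPU), which is the overhead that has to be recovered by the more coherent shading pass.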

However, when a mesh is rectangular and contains large triangles, like this mesh from the Modern Hall scene, the octree fails to divide it effectively.

The time cost for the Staircase scene increases to 8 seconds per spp. I therefore replaced the octree with a BVH and adopted the stack-free traversal, which improved performance significantly: the time cost for the Modern Hall scene dropped to 10 ms!

Original / Real-time denoiser / Open Image Denoise

For the real-time preview, I implemented SVGF as the denoiser. For the final result, I used Intel's Open Image Denoise.
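One building block of SVGF-style denoisers is temporal accumulation: each pixel's new sample is blended into a running history, and the history is reset when reprojection fails. The sketch below shows only that step, for a single scalar channel. The `History` struct, the `accumulate` function, and the blend floor `minAlpha = 0.2f` are assumptions for illustration and are not taken from Nagi's source. The variance estimation and edge-aware spatial filtering that SVGF also performs are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-pixel history buffers (one scalar channel for brevity).
struct History {
    std::vector<float> color;   // accumulated value
    std::vector<int> length;    // number of frames accumulated so far
};

// valid[i] != 0 means the pixel's history was successfully reprojected.
void accumulate(History& h, const std::vector<float>& current,
                const std::vector<char>& valid, float minAlpha = 0.2f) {
    assert(h.color.size() == current.size());
    assert(h.length.size() == current.size());
    assert(valid.size() == current.size());
    for (std::size_t i = 0; i < current.size(); ++i) {
        if (!valid[i] || h.length[i] == 0) {
            // No usable history: start over from the current sample.
            h.color[i] = current[i];
            h.length[i] = 1;
            continue;
        }
        h.length[i] += 1;
        // Plain average while the history is short, then an exponential
        // moving average once 1 / length drops below minAlpha.
        float alpha = std::max(minAlpha, 1.0f / static_cast<float>(h.length[i]));
        h.color[i] += alpha * (current[i] - h.color[i]);
    }
}
```

A smaller `minAlpha` gives a smoother but laggier preview, while a larger one reacts faster to camera motion at the cost of more visible noise.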