The appetite for higher and higher 3D graphics quality continues to drive GPU computing requirements. To satisfy these demands, GPU vendors are moving towards new architectures, such as MCM-GPU and multi-GPUs, that connect multiple chip modules or GPUs with high-speed links (e.g., NVLink and XGMI) to provide higher computing capability.
Unfortunately, it is not clear how to adequately parallelize the rendering pipeline to take advantage of these resources while maintaining low rendering latencies. Current implementations of Split Frame Rendering (SFR) are bottlenecked by redundant computations and sequential inter-GPU synchronization, and fail to scale as the GPU count increases.
In this paper, we propose CHOPIN, a novel SFR scheme for multi-GPU systems that exploits the parallelism available in image composition to eliminate the bottlenecks inherent to existing solutions. CHOPIN composes opaque sub-images out-of-order, and leverages the associativity of image composition to compose adjacent sub-images of transparent objects asynchronously. To mitigate load imbalance across GPUs and avoid inter-GPU network congestion, CHOPIN includes two new scheduling mechanisms: a draw-command scheduler and an image composition scheduler. Detailed cycle-level simulations on eight real-world game traces show that, in an 8-GPU system, CHOPIN offers speedups of up to 1.56× (1.25× gmean) compared to the best prior SFR implementation.
CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition, in HPCA-2021 (to appear).
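The two composition properties the abstract relies on can be illustrated with a minimal per-pixel sketch. It is not taken from the CHOPIN or ATTILA sources: the function names `compose_opaque` and `over`, the `(depth, color)` pixel layout, and the premultiplied-alpha RGBA representation are all assumptions made for illustration. Opaque sub-images are merged with a depth test, so the result does not depend on arrival order (ignoring depth ties); transparent sub-images are merged with the standard "over" operator, which is associative but not commutative, so adjacent sub-images can be grouped freely as long as their front-to-back order is preserved.

```python
from functools import reduce
from itertools import permutations
import math

def compose_opaque(p, q):
    """Depth-test composition of two opaque pixels given as (depth, color).

    Keeps the fragment closer to the camera (smaller depth). Assumed layout,
    for illustration only.
    """
    return p if p[0] <= q[0] else q

def over(front, back):
    """'Over' operator on premultiplied-alpha RGBA tuples (assumed layout)."""
    k = 1.0 - front[3]
    return tuple(f + k * b for f, b in zip(front, back))

# One pixel as produced by four hypothetical GPUs, with distinct depths.
opaque_pixels = [(0.7, "red"), (0.2, "green"), (0.9, "blue"), (0.4, "white")]

# Every arrival order yields the same winner, so opaque sub-images
# can be composed out of order.
opaque_results = {
    reduce(compose_opaque, order) for order in permutations(opaque_pixels)
}

# Three transparent layers, listed front to back.
a = (0.5, 0.0, 0.0, 0.5)
b = (0.0, 0.25, 0.0, 0.25)
c = (0.0, 0.0, 0.5, 0.5)

left_grouping = over(over(a, b), c)   # compose (a, b) first
right_grouping = over(a, over(b, c))  # compose (b, c) first

same_result = all(
    math.isclose(x, y) for x, y in zip(left_grouping, right_grouping)
)
```

Here `opaque_results` contains a single pixel and `same_result` is true, while swapping the arguments of `over` changes the outcome, which is why only adjacent transparent sub-images may be merged asynchronously.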
For multi-GPU graphics rendering, our baseline implementation duplicates all primitives on every GPU (i.e., similar to NVIDIA's SLI and AMD's CrossFire). We also implemented both GPUpd and CHOPIN.
Before running ParKD, first run ATTILA in a single-GPU configuration with profileForSortLast enabled. The generated profiling results (inputVertexes.obj, blendInfo.obj, zTestInfo.obj, and zFightingInfo.obj) are the inputs to ParKD. The binary output file produced by ParKD is then used as an input to the CHOPIN simulation in ATTILA.
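As a convenience, the presence of the four profiling files can be checked before ParKD is launched. The helper below is a hypothetical sketch and not part of ATTILA or ParKD: the function name `missing_profiling_outputs` and the idea that all four files sit in a single run directory are assumptions. Only the file names come from the description above.

```python
from pathlib import Path

# File names as listed in this README; their location is an assumption.
PROFILING_OUTPUTS = (
    "inputVertexes.obj",
    "blendInfo.obj",
    "zTestInfo.obj",
    "zFightingInfo.obj",
)

def missing_profiling_outputs(run_dir):
    """Return the profiling files that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [
        name for name in PROFILING_OUTPUTS
        if not (run_dir / name).is_file()
    ]
```

An empty list suggests the single-GPU profiling run completed and ParKD can be started; any names returned point to a profiling run that did not finish or wrote its output elsewhere.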