The appetite for higher and higher 3D graphics quality continues to drive GPU computing requirements. To satisfy these demands, GPU vendors are moving towards new architectures, such as MCM-GPU and multi-GPUs, that connect multiple chip modules or GPUs with high-speed links (e.g., NVLink and XGMI) to provide higher computing capability.

Unfortunately, it is not clear how to adequately parallelize the rendering pipeline to take advantage of these resources while maintaining low rendering latencies. Current implementations of Split Frame Rendering (SFR) are bottlenecked by redundant computations and sequential inter-GPU synchronization, and fail to scale as the GPU count increases.

In this paper, we propose CHOPIN, a novel SFR scheme for multi-GPU systems that exploits the parallelism available in image composition to eliminate the bottlenecks inherent to existing solutions. CHOPIN composes opaque sub-images out-of-order, and leverages the associativity of image composition to compose adjacent sub-images of transparent objects asynchronously. To mitigate load imbalance across GPUs and avoid inter-GPU network congestion, CHOPIN includes two new scheduling mechanisms: a draw-command scheduler and an image composition scheduler. Detailed cycle-level simulations on eight real-world game traces show that, in an 8-GPU system, CHOPIN offers speedups of up to 1.56× (1.25× gmean) compared to the best prior SFR implementation.
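The two composition properties mentioned above can be illustrated with a small sketch. It is not CHOPIN's implementation: it assumes premultiplied-alpha RGBA pixels and the standard Porter-Duff "over" operator for transparent sub-images, and a plain depth test for opaque ones. The names `over` and `compose_opaque` are hypothetical.

```python
from functools import reduce
from itertools import permutations


def over(front, back):
    """Porter-Duff 'over' for premultiplied RGBA tuples (front drawn on top of back)."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    k = 1.0 - fa
    return (fr + k * br, fg + k * bg, fb + k * bb, fa + k * ba)


def compose_opaque(a, b):
    """Depth-test merge of opaque pixels given as (depth, color); the nearer one wins."""
    return a if a[0] <= b[0] else b


def close(p, q, tol=1e-9):
    """Component-wise comparison of two pixels within a small tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(p, q))


# Four transparent layers in front-to-back order (premultiplied alpha).
layers = [
    (0.5, 0.0, 0.0, 0.5),
    (0.0, 0.25, 0.0, 0.25),
    (0.0, 0.0, 0.75, 0.75),
    (0.125, 0.125, 0.125, 0.5),
]

# Strictly sequential composition: ((l0 over l1) over l2) over l3.
sequential = reduce(over, layers)

# Adjacent pairs composed independently, then merged: (l0 over l1) over (l2 over l3).
pairwise = over(over(layers[0], layers[1]), over(layers[2], layers[3]))

# Opaque pixels with distinct depths: every arrival order yields the same result.
opaque = [(0.7, "red"), (0.2, "green"), (0.9, "blue")]
opaque_results = {reduce(compose_opaque, order) for order in permutations(opaque)}

print(close(sequential, pairwise), opaque_results)
```

Because "over" is associative but not commutative, only adjacent transparent sub-images can be merged early, whereas the depth test lets opaque sub-images be merged in any order.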

Paper Link

CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition, in HPCA-2021 (to appear).


This work is implemented on top of ATTILA. The original README of the ATTILA version that we used can be found here.

For multi-GPU graphics rendering, our baseline implementation duplicates all primitives in every GPU (similar to NVIDIA's SLI and AMD's CrossFire). We also implemented both GPUpd and CHOPIN.
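To make the cost of primitive duplication concrete, here is a minimal sketch that counts per-GPU work. It is not taken from the simulator: it assumes the frame is split into equal vertical strips, one per GPU, and reduces each primitive to a horizontal screen-space extent. `strip_bounds` and `duplicated_sfr_work` are hypothetical names.

```python
def strip_bounds(gpu, num_gpus, width):
    """Horizontal pixel range [x0, x1) of the vertical strip owned by `gpu`."""
    return gpu * width // num_gpus, (gpu + 1) * width // num_gpus


def duplicated_sfr_work(primitives, num_gpus, width):
    """Per-GPU (geometry, raster) counts when every GPU receives every primitive.

    Each primitive is an (x_min, x_max) screen-space extent in pixels.
    """
    work = []
    for gpu in range(num_gpus):
        x0, x1 = strip_bounds(gpu, num_gpus, width)
        # Every GPU transforms all primitives, including ones it never draws.
        geometry = len(primitives)
        # Only primitives overlapping this GPU's strip reach rasterization.
        raster = sum(1 for lo, hi in primitives if lo < x1 and hi > x0)
        work.append((geometry, raster))
    return work


prims = [(0, 100), (90, 300), (500, 640), (200, 210)]
work = duplicated_sfr_work(prims, num_gpus=4, width=640)
print(work)  # [(4, 2), (4, 2), (4, 0), (4, 1)]
```

In this toy frame, the four GPUs together process 16 primitives in the geometry stage for only 4 unique primitives, and one GPU rasterizes nothing at all.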

A template config for multi-GPU simulation can be found here. All benchmarks that we used can be downloaded from here.


In CHOPIN, we use ParKD to divide draw calls into multiple groups and to partition transparent draw calls among GPUs. The original README of the ParKD version that we used can be found here.

Before running ParKD, first run ATTILA with a single-GPU configuration and profileForSortLast enabled. The generated profiling results (inputVertexes.obj, blendInfo.obj, zTestInfo.obj, and zFightingInfo.obj) are the inputs to ParKD. The binary output file of ParKD is then used as an input to the CHOPIN simulation in ATTILA.
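A quick pre-flight check before starting ParKD can catch an incomplete profiling run. The sketch below is a hypothetical helper, not part of ATTILA or ParKD: it assumes the four profiling files are written to a single directory that you pass in, and it treats an empty file as missing.

```python
from pathlib import Path

# File names as listed above; the directory they land in is an assumption.
PROFILE_OUTPUTS = (
    "inputVertexes.obj",
    "blendInfo.obj",
    "zTestInfo.obj",
    "zFightingInfo.obj",
)


def missing_profile_outputs(run_dir):
    """Return the expected profiling files that are absent or empty in run_dir."""
    run_dir = Path(run_dir)
    missing = []
    for name in PROFILE_OUTPUTS:
        path = run_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Example: check the current working directory.
missing = missing_profile_outputs(".")
print("missing:", ", ".join(missing) if missing else "none")
```

If the list is non-empty, rerun the single-GPU profiling pass before moving on to ParKD.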

