Skip to content

AsteriskChan/Project1-CUDA-Flocking

 
 

Repository files navigation

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

  • Szeyu Chan
  • Tested on: Windows 10, i7-10510U @ 1.80GHz 16GB, MX250 2048MB (Personal Laptop)

Results

Analysis

Number of Boids


It is apparent that the number boids will affect performance. For the naïve method, we need to iterate all the boids, whose complexity is O(n^2), so the framerate decreases significantly with the increase of the number of boids. For the scattered grid method and the coherent grid method, because of the O(1) neighbor search, the performance is affected slightly.

Block Size


When increasing the block size from 8 to 64, the performance is improved, while from 64 to 256, the framerate is stable. One possible reason is that when the size is less than 32, there are some idle threads in a warp, so the processors are not fully utilized.

Scattered Grid VS Coherent Grid

As shown in the above two figures, the coherent grid method has a better performance than the scattered grid method. The overhead of reshuffling data is far more less than the overhead of random memory access. So it is important to keep the locality.

Cell Width

  • Number of Boids = 20000, Block Size = 128
  • 8 neighbors: Cell Width = 2 * Max Distance, FPS = 182
  • 27 neighbors: Cell Width = 0.5 * Max Distance, FPS = 176

It seems the performance is not affected. One possible reason is that although the number of neighbor cells increases, the number of boids in each cell decreases, so the total number of boids that will be checked is similar.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Cuda 40.8%
  • C++ 39.4%
  • CMake 17.8%
  • Other 2.0%