Skip to content

LichengCAO/Project1-CUDA-Flocking

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

  • Licheng CAO
  • Tested on: Windows 10, i7-10870H @ 2.20GHz 32GB, GTX 3060 6009MB

Result (65536 Boids)

boid

Analysis

  • the number of boids (with/without visualization)

    • Figures 1 and 2 depict a clear trend: as the number of boids increases, the frames per second (FPS) decreases. This decline is primarily attributed to the growing number of boids that each thread must process. Among the various step methods, the naive method is the most affected by the number of boids, as it necessitates considering all boids within each thread. On the other hand, the scattered method significantly enhances performance by limiting the scope to boids in nearby grids, thereby reducing the number of boids each thread must handle. Additionally, in this method, I've implemented a specific order for searching the nearby grids to ensure contiguous grid access.
    • The coherent method provides a slight performance boost over the scattered method. It accomplishes this by rearranging the position and velocity arrays of boids in such a way that information for boids within the same grid is stored contiguously in memory. This optimization allows for faster retrieval of information when iterating over all boids within a single grid.
    • Figure1 average FPS/number of boids with visualization
    • avgFPS_numboidsV
    • Figure2 average FPS/number of boids without visualization
    • avgFPS_numboids
    • We observe that the average FPS is higher when visualization is disabled. As the number of boids increases, the disparity between FPS with and without visualization diminishes. This phenomenon occurs because the primary bottleneck affecting FPS is the GPU calculations when dealing with a large number of boids, whereas it becomes the drawing speed when dealing with a smaller number of boids.
    • Figure3 average FPS with/without visualization
    • FPS_V_NV
  • the blocksize and cell width

    • Table 1 and Figure 4 provide insights into the behavior of the average FPS. In the case of the naive method, we observe that the average FPS remains relatively consistent. However, for both the scattered and coherent methods, altering the block size from 32 to 64 leads to an increase in the average FPS, while changing it from 224 to 512 results in a decrease.
    • The increase in average FPS with the larger block size can be attributed to the block having more warps, allowing for smoother switching to hide the delay associated with accessing global data. Conversely, the decrease in average FPS with the smaller block size may stem from each Streaming Multiprocessor (SM) lacking sufficient Streaming Processors (SPs) to efficiently process threads in a single loop.
    • Modifying the cell width and increasing the number of blocks to check from 8 to 27 can have a notable impact on the performance of my implementation. In my approach, I introduce a jitter vector (vec3(-0.5)) to ensure that each boid within a grid only needs to search within nearby 8 blocks to locate all neighboring boids efficiently. Consequently, examining 27 blocks would introduce unnecessary computational overhead.
    • However, when we alter the cell width and block size, the outcomes may differ. A smaller cell width could result in a reduced bounding box for each boid, potentially decreasing the number of boids that each thread needs to inspect. This reduction in workload could, in turn, lead to improved performance.
    • Table1 average FPS/blocksize without visualization (number of boids: 524,288)
    • blocksize 32 64 96 128 160 192 224 512
      naiveFPS 0.323 0.454 0.452 0.458 0.449 0.455 0.457 0.1326
      ScatteredFPS 129.62 174.728 181.97 181.566 179.985 181.666 177.534 166.941
      CoherentFPS 231.937 270.333 273.499 270.334 269.264 269.608 267.44 251.679
    • Figure4 average FPS/blocksize
    • avgFPS_blocksize

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Cuda 43.7%
  • C++ 37.5%
  • CMake 16.9%
  • Other 1.9%