# CUDA_brute_triangle

Brute force that examines each of the n choose 3 possible triangles exactly once.

UPDATE: Optimized both implementations, so new times are posted. Built using the --use_fast_math flag; requires a GPU with compute capability >= 3.5. Best performance was with the maximum register count (max_register) set to 32.

This code goes through every possible 3-point combination of a set of points, forms the triangle defined by those three points, then evaluates how many of the other points lie within that triangle.

In this simple example case, the objective is to find which triangle of the possible set contains within its borders the greatest number of other points (exclusive of the triangle's own vertices). The CPU and GPU functions each return that maximum point count, and the INDEXES of the three points which formed that optimal triangle.

NOTE: Since the GPU version does not proceed in a serial fashion, if more than one combination achieves the optimal value it will return a valid combination, BUT not necessarily the same (first-seen) combination the CPU version returned. The overall answer will still be correct, but a heuristic would need to be added for the CUDA version to return a specific arrangement when more than one does indeed exist.

While many CUDA GPU implementations of algorithms may only be 10-100 times faster than a single-core CPU implementation, this problem shows a much greater difference in performance.

The larger the data set of points, the greater the outperformance of the CUDA GPU implementation. For data sets of >= 400 points, the CUDA GPU implementation was at least 400x faster than a 3.9 GHz CPU implementation (including all host-device, device-device and device-host memory copies).

No overclocking of the GPU; it runs at the stock 706 MHz of the older Kepler Tesla K20c.

Optimal Triangle Running Time comparison:

| Number of points | Intel I-3770K 3.9 GHz CPU time | Tesla K20c GPU time | CUDA speedup |
| --- | --- | --- | --- |
| 300 | 13,198 ms | 32 ms | 412.438x |
| 400 | 42,710 ms | 99 ms | 431.32x |
| 500 | 103,731 ms | 240 ms | 432.2x |
| 700 | 411,401 ms | 912 ms | 451.01x |
| 1300 | 5,075,524 ms | 10,788 ms | 470.47x |

New times for the GTX 980, GTX 780ti, and GTX Titan X:

| Number of points | Intel I-3770K 3.9 GHz CPU time | GTX 980 time | GTX 780ti time | GTX Titan X time |
| --- | --- | --- | --- | --- |
| 700 | 411,401 ms | 716 ms | 655 ms | 573 ms |
| 1300 | 5,075,524 ms | 8,422 ms | 7,720 ms | 6,735 ms |

The running time is approximately (N choose 3) * N, where N is the number of 2-D points in the array.

What makes this even more impressive is that the GPU actually has to do over 10x more work to reach the same answer (this is apparent in the code; compare the CPU version to the GPU version).

AMD users, post your times and code; I would like to compare results. Same goes for the multi-core CPU crowd who think a Xeon can beat a GPU at this type of problem. Prove it.
