Use TeamVectorRange for filling tiles in BruteForce implementation #616

Merged (1 commit) on Jan 21, 2022

masterleinad
Collaborator

This fixes the current problems with the Kokkos SYCL backend (see #614). Apart from that, the restriction of doing everything on the first team member has always felt odd to me.
Alternatively, we could of course special-case SYCL.
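
To make the difference concrete, here is a minimal sketch in plain C++ (not Kokkos code, and not the actual ArborX implementation) that models the two ways of filling a tile. The names `FillResult`, `fill_first_member_only`, and `fill_team_wide` are hypothetical, and the strided index assignment is only an assumption for illustration; how `TeamVectorRange` maps indices to threads and vector lanes is up to the Kokkos backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a team's scratch tile plus a per-member work counter.
struct FillResult {
  std::vector<int> tile;         // tile[i] is a copy of source[offset + i]
  std::vector<std::size_t> work; // work[r] = number of entries written by member r
};

// Modeled "first team member only" pattern: member 0 copies the whole tile,
// every other member of the team idles.
FillResult fill_first_member_only(const std::vector<int>& source,
                                  std::size_t offset, std::size_t tile_size,
                                  std::size_t team_size) {
  assert(team_size > 0 && offset + tile_size <= source.size());
  FillResult r{std::vector<int>(tile_size),
               std::vector<std::size_t>(team_size, 0)};
  for (std::size_t rank = 0; rank < team_size; ++rank) {
    if (rank != 0) continue; // only the first member does any work
    for (std::size_t i = 0; i < tile_size; ++i) {
      r.tile[i] = source[offset + i];
      ++r.work[rank];
    }
  }
  return r;
}

// Modeled "team-wide range" pattern: the tile entries are split across all
// members. The stride-by-team_size assignment is an assumption of this sketch.
FillResult fill_team_wide(const std::vector<int>& source, std::size_t offset,
                          std::size_t tile_size, std::size_t team_size) {
  assert(team_size > 0 && offset + tile_size <= source.size());
  FillResult r{std::vector<int>(tile_size),
               std::vector<std::size_t>(team_size, 0)};
  for (std::size_t rank = 0; rank < team_size; ++rank) {
    for (std::size_t i = rank; i < tile_size; i += team_size) {
      r.tile[i] = source[offset + i];
      ++r.work[rank];
    }
  }
  return r;
}
```

Both functions produce the same tile; only the distribution of work differs. In this model, with a tile of 6 entries and a team of 4, the first variant has member 0 write all 6 entries, while the second has the members write 2, 2, 1, and 1 entries.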

@aprokop
Contributor

aprokop commented Jan 12, 2022

I'd like to see some performance data for this patch.

@aprokop
Contributor

aprokop commented Jan 21, 2022

Some data from my workstation (GTX1070):

$ for i in master_6794807d branch_45b6801c; do ./ArborX_BruteForce_$i --predicates 100 --primitives 100| grep 'Time BF'; done
Time BF: 0.000546
Time BF: 0.000478
$ for i in master_6794807d branch_45b6801c; do ./ArborX_BruteForce_$i --predicates 500 --primitives 500| grep 'Time BF'; done
Time BF: 0.001253
Time BF: 0.000986
$ for i in master_6794807d branch_45b6801c; do ./ArborX_BruteForce_$i --predicates 10000 --primitives 10000| grep 'Time BF'; done
Time BF: 0.030061
Time BF: 0.024904
$ for i in master_6794807d branch_45b6801c; do ./ArborX_BruteForce_$i --predicates 50000 --primitives 50000| grep 'Time BF'; done
Time BF: 0.598086
Time BF: 0.514194
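
For reading these pairs, a small helper can turn two timings into a relative improvement. This is only a sketch: `relative_improvement` is a hypothetical name, and it assumes the first line of each pair is master and the second is the branch, as the loop order suggests.

```cpp
#include <cassert>

// Fraction of the master time saved by the branch:
//   (master - branch) / master
// A positive value means the branch is faster; 0.10 means 10% less time.
double relative_improvement(double master_seconds, double branch_seconds) {
  assert(master_seconds > 0.0);
  return (master_seconds - branch_seconds) / master_seconds;
}
```

Under that assumption, the four pairs above work out to improvements of roughly 12%, 21%, 17%, and 14%. For instance, `relative_improvement(0.000546, 0.000478)` is about 0.125.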

@aprokop added the performance (Something is slower than it should be) label on Jan 21, 2022
@masterleinad
Collaborator Author

I similarly see

$ for i in master branch; do jsrun -n 1 -a 1 -c 42 -g 1 -r 1 -l CPU-CPU -d packed -b packed:42 ./ArborX_BruteForce_$i.exe --predicates 100 --primitives 100| grep 'Time BF'; done
Time BF: 0.000863
Time BF: 0.000815
$ for i in master branch; do jsrun -n 1 -a 1 -c 42 -g 1 -r 1 -l CPU-CPU -d packed -b packed:42 ./ArborX_BruteForce_$i.exe --predicates 500 --primitives 500| grep 'Time BF'; done
Time BF: 0.002101
Time BF: 0.001862
$ for i in master branch; do jsrun -n 1 -a 1 -c 42 -g 1 -r 1 -l CPU-CPU -d packed -b packed:42 ./ArborX_BruteForce_$i.exe --predicates 10000 --primitives 10000| grep 'Time BF'; done
Time BF: 0.013687
Time BF: 0.013295
$ for i in master branch; do jsrun -n 1 -a 1 -c 42 -g 1 -r 1 -l CPU-CPU -d packed -b packed:42 ./ArborX_BruteForce_$i.exe --predicates 50000 --primitives 50000| grep 'Time BF'; done
Time BF: 0.335548
Time BF: 0.319999

on Ascent.

@dalg24
Contributor

dalg24 commented Jan 21, 2022

Are these GPU-only results?

@masterleinad
Collaborator Author

Yes, it's the default execution space, so CUDA.

@aprokop
Contributor

aprokop commented Jan 21, 2022

> Are these GPU-only results?

My workstation (Intel E5-2620):

$ for k in 100 500 1000 5000 10000; do \
     for i in master_host_6794807d branch_host_45b6801c; do \
        ./ArborX_BruteForce_$i --predicates $k --primitives $k | grep 'Time BF'; \
     done; \
  done
Time BF: 0.000262
Time BF: 0.000251
Time BF: 0.005522
Time BF: 0.005494
Time BF: 0.022902
Time BF: 0.021482
Time BF: 0.603481
Time BF: 0.571992
Time BF: 2.438251
Time BF: 2.291969

So it is at least as fast on the host.

Contributor

@dalg24 dalg24 left a comment

Considering the reports that it is not slower on either the CPU or the GPU.

@aprokop aprokop merged commit 6aca3b8 into arborx:master Jan 21, 2022
@aprokop aprokop deleted the use_team_vectorrange_bruteforce branch January 21, 2022 19:20