Improve benchmark for distributed tree #78

Merged

@masterleinad (Collaborator) commented Jun 11, 2019

Changes:

  • Properly align the output.
  • Only print information on MPI rank 0.
  • Print relevant run parameters.
  • Replace overlap with shift so that values other than zero and one have a more intuitive meaning.

CMakeLists.txt Outdated
@@ -13,7 +13,7 @@ if(ArborX_ENABLE_MPI)
find_package(MPI REQUIRED)
target_link_libraries(ArborX INTERFACE MPI::MPI_CXX)
endif()
#target_compile_features(ArborX INTERFACE cxx_std_14)
Collaborator Author

What do we want here?

Collaborator Author

I see, we are running into our old friend

CMake Error in test/CMakeLists.txt:
  No known features for CXX compiler

  "GNU"

  version 7.4.0.

@aprokop (Contributor) commented Jun 11, 2019

@masterleinad Can you please post the output before and after?

@masterleinad (Collaborator Author) commented:

before:

$ mpiexec -np 2 ./ArborX_DistributedTree.exe
ArborX version: 0.9 (dev)
ArborX hash   : 0b73981
ArborX version: 0.9 (dev)
ArborX hash   : 0b73981
contruction done
knn done
radius done
========================================

TimeMonitor results over 2 processors
Timer Name	MinOverProcs	MeanOverProcs	MaxOverProcs
----------------------------------------
construction	0.0406743	0.0406757	0.0406771
knn	1.03524	1.03534	1.03543
radius	0.57317	0.573342	0.573514
========================================

after:

$ mpiexec -np 2 ./ArborX_DistributedTree.exe
ArborX version: 0.9 (dev)
ArborX hash   : 0b73981

Running with arguments:
perform knn search      : true
perform radius search   : true
#points/MPI process     : 50000
#queries/MPI process    : 20000
size of shift           : 1
dimension               : 3

contruction done
knn done
radius done
==========================================================

TimeMonitor results over 2 processors
Timer Name   | MinOverProcs | MeanOverProcs | MaxOverProcs
----------------------------------------------------------
construction | 4.230867e-02 |  4.230957e-02 | 4.231046e-02
knn          | 1.042842e+00 |  1.043093e+00 | 1.043345e+00
radius       | 5.693861e-01 |  5.695536e-01 | 5.697212e-01
==========================================================

@dalg24 (Contributor) left a comment

It is fine to use C++14. You just cannot use target_compile_features() for now because of nvcc_wrapper. The build arguments passed to the Kokkos generate_makefile.bash script will need to be adjusted in the .jenkins file.

// Initialize with length of "Timer Name"
std::size_t max_section_length = 10;

for (unsigned int i = 0; i < n_timers; ++i)
Contributor

Warning: comparison between signed and unsigned integers.

{
std::rotate(kokkos_argv, help_it + 1, kokkos_argv + kokkos_argc);
--kokkos_argc;
}
Contributor

Why did you decide to rotate rather than swapping with the last argument?

Collaborator Author

Swapping now.

Contributor

I just wasted a bunch of time checking that swap(x, x) was safe (which will happen if the help flag is the only argument passed) :/

Anyway that's fine
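
For readers following the rotate-versus-swap exchange, here is a minimal sketch of the swap approach. The helper name `remove_flag` and its signature are hypothetical and are not taken from the ArborX sources; the sketch only assumes that the order of the remaining arguments does not matter, since swapping moves the former last argument into the flag's slot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <utility>

// Hypothetical helper (not ArborX code): drop the first occurrence of `flag`
// from argv[1..argc) by swapping it with the last argument and shrinking argc.
// Returns true if the flag was found.
bool remove_flag(int &argc, char *argv[], char const *flag)
{
  assert(argc >= 1); // argv[0] is the program name
  char **const end = argv + argc;
  char **const it =
      std::find_if(argv + 1, end, [flag](char const *arg) {
        return std::strcmp(arg, flag) == 0;
      });
  if (it == end)
    return false;
  // When the flag is already the last argument this is a self-swap of two
  // pointers, which std::swap handles safely.
  std::swap(*it, *(end - 1));
  --argc;
  return true;
}
```

Calling `remove_flag(argc, argv, "--help")` on `{"prog", "--help"}` exercises the self-swap case raised above and leaves `argc == 1`.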

@@ -110,10 +122,11 @@ class TimeMonitor
n_timers, MPI_DOUBLE, comm);
if (comm_rank == 0)
{
- os << "========================================\n\n";
+ os << std::string(max_section_length + 46, '=') << "\n\n";
Contributor

Where does the 46 come from?

if (comm_size == 1)
{
- os << "========================================\n\n";
+ os << std::string(max_section_length + 15, '=') << "\n\n";
Contributor

Where does the 15 come from?
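
One reading that is consistent with the "after" output posted earlier in this thread: 46 is the combined length of the three column headers, each including its " | " separator, and 15 is the separator plus a 12-character number in scientific format at the default precision of 6. The sketch below derives the first figure from the labels instead of hard-coding it. The helper `columns_width` and the variable `multi_rank_headers` are hypothetical names, not ArborX code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not ArborX code): total width of everything printed to
// the right of the timer-name column.
std::size_t columns_width(std::vector<std::string> const &column_headers)
{
  assert(!column_headers.empty());
  std::size_t width = 0;
  for (auto const &header : column_headers)
    width += header.size();
  return width;
}

// Labels copied from the sample output, each with its " | " separator.
std::vector<std::string> const multi_rank_headers = {
    " | MinOverProcs", " | MeanOverProcs", " | MaxOverProcs"};
```

With these labels, `columns_width(multi_rank_headers)` evaluates to 15 + 16 + 15 = 46, which matches the 58-character rule in the output when the longest timer name ("construction") has 12 characters.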

@dalg24 (Contributor) commented Jun 11, 2019

Also the name of the PR is misleading. Please, at the very least, edit the description and make a list or an enumeration to emphasize the changes for controlling the overlap between regions.

@dalg24 (Contributor) commented Jun 11, 2019

BTW #78 (comment) demonstrates something interesting I had already run into. The hash returned by the runtime is garbage if you do not reconfigure.

@masterleinad masterleinad changed the title Beautify ArborX_DistributedTree output Improve ArborX_DistributedTree Jun 11, 2019

// Initialize with length of "Timer Name"
std::size_t const max_section_length = std::accumulate(
_data.begin(), _data.end(), std::size_t(10),
Collaborator

What's 10?

Collaborator Author

The length of "Timer name".

Contributor

Update the comment to avoid the question. Say "The initial value of the sum (3rd argument) corresponds to the length of ..."

Or define a variable that holds the label and get its size.

std::string const header_without_timer_name = " | GlobalTime";
int const header_width =
max_section_length + std::max<int>(header_without_timer_name.size(),
std::cout.precision() + 9);
Collaborator

What's 9?

Collaborator Author

The width we need to print a floating-point number in scientific format, apart from the precision, assuming the exponent needs two characters.

Contributor

Add comment or update code to make it obvious
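
A possible way to make the 9 self-explanatory is to build it from named parts. The breakdown below is an interpretation that matches the sample output (3 characters for the " | " separator, 2 for the leading digit and decimal point, 4 for 'e', the exponent sign, and two exponent digits). All names are hypothetical, and the sketch assumes non-negative values, a precision of at least 1, and two-digit exponents.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical named constants (not ArborX code) replacing the literal 9.
int const separator_width = 3; // " | "
int const mantissa_extra = 2;  // leading digit and '.'
int const exponent_width = 4;  // 'e', exponent sign, two exponent digits

// Width of one column holding a non-negative value in scientific format.
int column_width(int precision)
{
  assert(precision >= 1); // with precision 0 no decimal point is printed
  return separator_width + mantissa_extra + precision + exponent_width;
}

// Formats one cell the way the sample output does.
std::string format_cell(double value, int precision)
{
  std::ostringstream os;
  os << " | " << std::scientific << std::setprecision(precision) << value;
  return os.str();
}
```

For the default precision of 6, `column_width(6)` is 15, the same number that appears in the single-process branch. A negative value or a three-digit exponent would each need one more character.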

@dalg24 dalg24 changed the title Improve ArborX_DistributedTree Improve benchmark for distributed tree Jun 12, 2019
@@ -30,7 +31,8 @@ struct HelpPrinted
{
};

- // Poor man's replacement for Teuchos::TimeMonitor
+ // The TimeMonitor class can be used to measure for a series of events, i.e. it
+ // represents a set of timers of type Timer.
Contributor

FWIW I think the original comment was better because it gives context on the design of TimeMonitor


@masterleinad masterleinad force-pushed the beautify_arborx_distributedtree branch from c083ae2 to f478e4b Compare June 12, 2019 18:42

// Initialize with length of "Timer Name"
std::size_t const max_section_length = std::accumulate(
_data.begin(), _data.end(), std::string("Timer Name").size(),
Contributor

(Minor comment) I would have defined a variable for the label and reuse below.
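
The suggestion could look roughly like the sketch below. It assumes the timers are stored in a `std::map<std::string, double>`, which the thread does not confirm, and the names `timer_name_label` and `max_section_length` (as a free function) are chosen for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <utility>

// Hypothetical label variable, reusable when printing the header.
std::string const timer_name_label = "Timer Name";

// Length of the longest timer name, but never shorter than the header label.
// The container type is an assumption; the real _data member may differ.
std::size_t max_section_length(std::map<std::string, double> const &timers)
{
  std::size_t const result = std::accumulate(
      timers.begin(), timers.end(), timer_name_label.size(),
      [](std::size_t current,
         std::pair<std::string const, double> const &entry) {
        return std::max(current, entry.first.size());
      });
  assert(result >= timer_name_label.size());
  return result;
}
```

Because the label is a named variable, the same object can be streamed when the header row is printed, so the initial value and the printed text cannot drift apart.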

@dalg24 (Contributor) commented Jun 18, 2019

Merge when you are ready. You may choose to ignore my last comment. I would be fine with you adding the Arguments class you introduced in #83 before we merge.

@masterleinad (Collaborator Author) commented:

Since you are requesting changes to the Arguments struct already, let us deal with it in the other pull request.

@masterleinad masterleinad merged commit 51384df into arborx:master Jun 19, 2019
@masterleinad masterleinad deleted the beautify_arborx_distributedtree branch June 19, 2019 13:05