Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor hardware detection to sup #654

Merged
merged 9 commits into from
Dec 5, 2018
Merged

Conversation

bcumming
Copy link
Member

Refactoring that moves logic for determing available concurrency and finding GPUs from the core Arbor library to the sup library.

Before this patch, Arbor's would attempt to use the first available GPU by default.
This caused problems on systems with multiple GPUs and multiple ranks on the same node, where by default all ranks would attempt to use the same GPU.

Now thread pools and execution contexts are always default initialized to use one thread and no GPU, making no effort to determine if more resources are available.

The sup library now provides an interface for

  • querying a best guess at the number of available threads.
  • finding a GPU, with and without MPI. Currently implements the old behavior of taking the first available GPU. The next PR will add logic for assigning GPUs in MPI jobs.

Fixes #647

@bcumming bcumming changed the title Mpi gpus refactor hardware detection to sup Nov 28, 2018
@bcumming bcumming changed the title refactor hardware detection to sup Refactor hardware detection to sup Nov 28, 2018
Copy link
Contributor

@halfflat halfflat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issues are:

  1. The naming/interface of find_gpu(...).
  2. The conflation of the status of the Arbor feature test macros with the configuration-time state of the sup library; these things are the same only contingently, and will make for an unnecessarily indirect configuration dependency chain when we split sup up into an installable library component.

else {
resources.num_threads = sup::thread_concurrency();
}

#ifdef ARB_MPI_ENABLED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thread count code is repeated in each example, and will likely appear in pretty much every C++ code that uses the library. It's boilerplate code that could be wrapped up with e.g. sup::get_default_threads() or similar; people can still use the more fundamental functions if they wish.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is repeated in each example, but I don't have a problem with this because:

  • I think this makes it much clearer what is going on: test the environment variable, then attempt to inspect the CPU set if the environment variable isn't set.
  • the examples are meant to be stand alone, so a bit of repeated code isn't a crime.

I really feel that combining this logic into a single function obscures what is going on too much.
This level of granularity feels "about right" for what the sup library is trying to achieve: small well-defined tools/types/functions that are useful for all examples, tests, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think though that almost every such example will have similar code, and a bunch of user code will too. By providing an easy-to-use default function, we make it easy for library users also to do the reasonable thing by default.

sup contains mainly things that we want to use in common across the examples and tests, which should not form part of the core interface. Wrapping boilerplate code is part of its very purpose — I really don't see this as a problem at all.

This thread counting code and the GPU negotiation code will be in whatever we will call our installed-with-Arbor helper library; any obscurity in its functionality can be well addressed simply with documentation, IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I am a bit uneasy suggesting that ARB_NUM_THREADS is a reasonable environment variable for users ;)
More seriously, user code might be similar, but would in all likelihood use different environment variables.
The examples might share code via sup, but at the same time they should be as simple to understand stand-alone. Because nobody reads the docs (they just create a ticket).

This feels a bit like 'half of one, six of the other'. I can change it, but I think that both of our positions are fairly reasonable.

{
int rank = 0;
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a minor nit, but if MPI_Comm_rank were to fail, it wouldn't write to rank; having a default initialized value seems like good practice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a minor nit at all: I should be testing the return value from MPI_Comm_rank... but wait a moment, I think I see a better solution.

Copy link
Member Author

@bcumming bcumming Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with

root = arb::rank(context) == 0;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
is_root = rank==0;
}
#else
auto context = arb::make_context();
resources.gpu_id = sup::find_gpu();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think find_gpu() is not a good name; more on this below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I agree with the proposed changes below.

@@ -25,7 +30,7 @@ namespace threading {
//
// Throws std::runtime_error:
// ARB_NUM_THREADS or OMP_NUM_THREADS is set with invalid value.
util::optional<size_t> get_env_num_threads() {
arb::util::optional<size_t> get_env_num_threads() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're here, this would be a good opportunity to reduce the use of arb::util::optional in public interfaces: this is going to be of a concern when we're in a mixed C++17 environment and we move to std::optional. (We're currently only using optional in one other public method, in arb::mc_segment).

Copy link
Member Author

@bcumming bcumming Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change it to return 0 or -1 on failure to read an environment variable. Is that what you would suggest?

I have no problem with making the fix, but I don't see what the problem is with keeping it.
When we move to C++17, wouldn't we just change it to std::optional?
If a user is currently using C++17, then how does our current set up cause a problem (besides potential confusion about the difference between std::optional and util::optional)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it possible to lift a name into a new namespace? Have an arb::future, where eventuall arb::future::optional = std::optional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chief problem is one of interface: it's quite possible that we build Arbor with a C++17-capable compiler, but that users building against an installed version of it might want to use a different compiler with less robust C++ support (assuming all the ABIs are the same of course).

We should definitely move to std::optional when we can rely upon a C++17 build environment, but we shouldn't force a C++17 dependency through the interface, at least until compiler support is near universal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is probably worth removing the optional from this interface, because it wouldn't be a massive loss (the error state is easy enough to encode in the return value with 0 or -1).
We might have a look at removing the optional on mc_segment so that this becomes a discussion we don't have to have.

If we were to keep optional in our interface (and similar types like any), I think Sam's idea of just using our util::optional until we move everything to C++17 is the way forward.
I want to avoid having multiple implementations of things in the library for different C++ standards: I would prefer to just implement everything in the minimum standard that we have selected for the library (currently C++14).

// If still zero, use one thread.
n = std::max(n, 1u);

return n;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the comment, but using max here just adds code to no effect: there's nothing wrong with n? n: 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer using std::max for this sort of logic. It is arguable whether a ternary or max is clearer... probably a matter of what you are used to.
The suggested ternary n? n: 1 is short hand for n>0? n: 1, so while it is less "code", it isn't less logic for the reader to mentally parse.

That is my five cents, though I have no problem making the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, both have equal cognitive load, but I'm less tempted to think about possible c++ weirdness with n ? n : 1 than max. (And their not precisely equivalent but I have no idea what the compiler could do to either one).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler should generate equally efficient code, it is just a matter of semantics I think.

https://godbolt.org/z/0ITuiT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man, I can never convince gcc to produce a conditional move when I want it to, damn it.

sup/gpu.cpp Outdated
#ifdef ARB_MPI_ENABLED
#include <mpi.h>
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code doesn't depend upon Arbor having been built with MPI or CUDA or not, but rather on the current configuration environment.

When we split sup up into a local-for-us library and an installed-for-general-consumption component, the latter will have to have similar CMake guff that the arbor library itself has to make this all hang together; in the meantime, we can hand-write the CMake stuff for sup that handles the ARB_WITH_MPI and ARB_WITH_CUDA stuff as required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see what you are saying.

You are suggesting another set of #defines, e.g. SUP_WITH_MPI, SUP_WITH_CUDA from CMake?
We can't use the ARB_WITH* defines, because they shouldn't go past the public/private split.

We always want consistency between sup and arbor, i.e. if Arbor was compiled with MPI support, then sup must have matching MPI support.
The libsup installed with libarbor must be consistent, and the libsup and libarbor "used locally" for tests and the like must also always be consistent.

So ARB_MPI_ENABLED and SUP_WITH_MPI must always agree. Which makes me wonder if it isn't just simpler to use the one, even if it isn't a perfect split in theory.

To play the devil's advocate: two arguments for keeping it this way

  • it is actually less confusing for users.
  • one single variable, ARB_XYZ_ENABLED, has to be configured in CMake for every feature, instead of two (and the less CMake wrangling, the better).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ARB_WITH_foo defines are provided by the CMake infrastructure, and reflect the configuration environment. They are never exposed to user code, but there is no reason why they cannot also be used in the sup sources (obviously not the headers).

I can't see any need for the sup code to provide arbor-like feature macros.

The CMake guff will come from splitting MPI and CUDA dependencies out from arbor-private-deps into an internal dependency target that will be a requirement for both arbor and sup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addendum: the only reason why I've been advocating not using ARB_WITH_foo in our test sources is that it allows us to catch interface issues — it keeps us more honest when it come to corralling our configuration state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point about minimising ARB_WITH_foo in the tests is a good one: we want to test the public interface whenever possible.

I am just trying to think about how to keep such #ifdef MPI out of the sup headers. When we have MPI support, there will be a header that exposes find_private_gpu(MPI_Comm), which will obviously need to include mpi.h, like sup/include/sup/with_mpi.hpp currently does.

I would have to put the find_private_gpu(MPI_Comm) in a separate file to the find_gpu() (or whatever we call it, you get what I am saying).

Is this correct reasoning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can put them in separate files, and only install the MPI one if MPI is configured, making a mismatch a compile error.

Or we can use a template in the header with MPI_Comm the type parameter, making a mismatch a link error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Currently we're not installing sup at all, so to make things work with our current code and minimize CMake nonsense, I'd go with the template option.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The template option is a bit easier to implement, and is a pattern that we use elsewhere.
But compile errors are always preferable to linker errors.
I will think about it some more.

#endif

} // namespace sup

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, find_gpu(MPI_Comm) does not require that arbor was built with MPI, but that MPI support is available at configure time. The MPI code can live in a different header+source that just includes mpi.h, but which is only conditionally included in the sup library based on the configure-time ARB_WITH_MPI.

The find_gpu names for these functions are too generic and don't really reflect what they do, which itself is very different in find_gpu() and find_gpu(MPI_Comm).

The non-distributed version just checks to see if any GPU is visible, and so should have a corresponding name and return true or false. Alternately, if the interface doesn't change, it should have a name like sup::default_gpu() or similar.

The distributed one does this negotiation between ranks, and perhaps should be called something like sup::find_private_gpu_id(MPI_Comm) or something else along these lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be for keeping the interfaces, and renaming them. sup::default_gpu and sup::find_private_gpu are quite good names for that purpose!

@bcumming
Copy link
Member Author

bcumming commented Dec 5, 2018

I have acted on all the suggested changes, except the logic that selects the number of threads to use in the examples:

if (auto nt = sup::get_env_num_threads()) {
    resources.num_threads = nt; 
}   
else {
    resources.num_threads = sup::thread_concurrency();
}   

Which I kept, because that is six of one, half a dozen of the other.

Copy link
Contributor

@halfflat halfflat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good!
I'm not really happy about the 17 or so lines of common context-making code in each example — this really feels like copy-and-paste code that should be wrapped — but we can debate the colour of that particular bike shed later!

@halfflat
Copy link
Contributor

halfflat commented Dec 5, 2018

I'm going to keep #647 open until we have the separation of sup into the example/test-only code and an installable adjunct utility library.

@halfflat halfflat merged commit 712070f into arbor-sim:master Dec 5, 2018
@bcumming bcumming deleted the mpi-gpus branch April 5, 2019 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants