-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor hardware detection to sup #654
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issues are:
- The naming/interface of
find_gpu(...)
. - The conflation of the status of the Arbor feature test macros with the configuration-time state of the
sup
library; these things are the same only contingently, and will make for an unnecessarily indirect configuration dependency chain when we splitsup
up into an installable library component.
else { | ||
resources.num_threads = sup::thread_concurrency(); | ||
} | ||
|
||
#ifdef ARB_MPI_ENABLED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This thread count code is repeated in each example, and will likely appear in pretty much every C++ code that uses the library. It's boilerplate code that could be wrapped up with e.g. sup::get_default_threads()
or similar; people can still use the more fundamental functions if they wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is repeated in each example, but I don't have a problem with this because:
- I think this makes it much clearer what is going on: test the environment variable, then attempt to inspect the CPU set if the environment variable isn't set.
- the examples are meant to be stand alone, so a bit of repeated code isn't a crime.
I really feel that combining this logic into a single function obscures what is going on too much.
This level of granularity feels "about right" for what the sup
library is trying to achieve: small well-defined tools/types/functions that are useful for all examples, tests, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think though that almost every such example will have similar code, and a bunch of user code will too. By providing an easy-to-use default function, we make it easy for library users also to do the reasonable thing by default.
sup
contains mainly things that we want to use in common across the examples and tests, which should not form part of the core interface. Wrapping boilerplate code is part of its very purpose — I really don't see this as a problem at all.
This thread counting code and the GPU negotiation code will be in whatever we will call our installed-with-Arbor helper library; any obscurity in its functionality can be well addressed simply with documentation, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit I am a bit uneasy suggesting that ARB_NUM_THREADS
is a reasonable environment variable for users ;)
More seriously, user code might be similar, but would in all likelihood use different environment variables.
The examples might share code via sup
, but at the same time they should be as simple to understand stand-alone. Because nobody reads the docs (they just create a ticket).
This feels a bit like 'half of one, six of the other'. I can change it, but I think that both of our positions are fairly reasonable.
example/bench/bench.cpp
Outdated
{ | ||
int rank = 0; | ||
int rank; | ||
MPI_Comm_rank(MPI_COMM_WORLD, &rank); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a minor nit, but if MPI_Comm_rank
were to fail, it wouldn't write to rank
; having a default initialized value seems like good practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a minor nit at all: I should be testing the return value from MPI_Comm_rank... but wait a moment, I think I see a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced with
root = arb::rank(context) == 0;
example/bench/bench.cpp
Outdated
MPI_Comm_rank(MPI_COMM_WORLD, &rank); | ||
is_root = rank==0; | ||
} | ||
#else | ||
auto context = arb::make_context(); | ||
resources.gpu_id = sup::find_gpu(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think find_gpu()
is not a good name; more on this below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I agree with the proposed changes below.
sup/concurrency.cpp
Outdated
@@ -25,7 +30,7 @@ namespace threading { | |||
// | |||
// Throws std::runtime_error: | |||
// ARB_NUM_THREADS or OMP_NUM_THREADS is set with invalid value. | |||
util::optional<size_t> get_env_num_threads() { | |||
arb::util::optional<size_t> get_env_num_threads() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're here, this would be a good opportunity to reduce the use of arb::util::optional
in public interfaces: this is going to be of a concern when we're in a mixed C++17 environment and we move to std::optional
. (We're currently only using optional
in one other public method, in arb::mc_segment
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change it to return 0 or -1 on failure to read an environment variable. Is that what you would suggest?
I have no problem with making the fix, but I don't see what the problem is with keeping it.
When we move to C++17, wouldn't we just change it to std::optional
?
If a user is currently using C++17, then how does our current set up cause a problem (besides potential confusion about the difference between std::optional
and util::optional
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it possible to lift a name into a new namespace? Have an arb::future
, where eventuall arb::future::optional = std::optional
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chief problem is one of interface: it's quite possible that we build Arbor with a C++17-capable compiler, but that users building against an installed version of it might want to use a different compiler with less robust C++ support (assuming all the ABIs are the same of course).
We should definitely move to std::optional
when we can rely upon a C++17 build environment, but we shouldn't force a C++17 dependency through the interface, at least until compiler support is near universal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is probably worth removing the optional
from this interface, because it wouldn't be a massive loss (the error state is easy enough to encode in the return value with 0 or -1).
We might have a look at removing the optional
on mc_segment
so that this becomes a discussion we don't have to have.
If we were to keep optional
in our interface (and similar types like any
), I think Sam's idea of just using our util::optional
until we move everything to C++17 is the way forward.
I want to avoid having multiple implementations of things in the library for different C++ standards: I would prefer to just implement everything in the minimum standard that we have selected for the library (currently C++14).
// If still zero, use one thread. | ||
n = std::max(n, 1u); | ||
|
||
return n; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep the comment, but using max
here just adds code to no effect: there's nothing wrong with n? n: 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer using std::max
for this sort of logic. It is arguable whether a ternary or max
is clearer... probably a matter of what you are used to.
The suggested ternary n? n: 1
is short hand for n>0? n: 1
, so while it is less "code", it isn't less logic for the reader to mentally parse.
That is my five cents, though I have no problem making the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, both have equal cognitive load, but I'm less tempted to think about possible c++ weirdness with n ? n : 1
than max
. (And their not precisely equivalent but I have no idea what the compiler could do to either one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler should generate equally efficient code, it is just a matter of semantics I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Man, I can never convince gcc to produce a conditional move when I want it to, damn it.
sup/gpu.cpp
Outdated
#ifdef ARB_MPI_ENABLED | ||
#include <mpi.h> | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code doesn't depend upon Arbor having been built with MPI or CUDA or not, but rather on the current configuration environment.
When we split sup
up into a local-for-us library and an installed-for-general-consumption component, the latter will have to have similar CMake guff that the arbor library itself has to make this all hang together; in the meantime, we can hand-write the CMake stuff for sup
that handles the ARB_WITH_MPI
and ARB_WITH_CUDA
stuff as required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see what you are saying.
You are suggesting another set of #defines
, e.g. SUP_WITH_MPI
, SUP_WITH_CUDA
from CMake?
We can't use the ARB_WITH*
defines, because they shouldn't go past the public/private split.
We always want consistency between sup and arbor, i.e. if Arbor was compiled with MPI support, then sup must have matching MPI support.
The libsup
installed with libarbor
must be consistent, and the libsup
and libarbor
"used locally" for tests and the like must also always be consistent.
So ARB_MPI_ENABLED
and SUP_WITH_MPI
must always agree. Which makes me wonder if it isn't just simpler to use the one, even if it isn't a perfect split in theory.
To play the devil's advocate: two arguments for keeping it this way
- it is actually less confusing for users.
- one single variable,
ARB_XYZ_ENABLED
, has to be configured in CMake for every feature, instead of two (and the less CMake wrangling, the better).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ARB_WITH_foo
defines are provided by the CMake infrastructure, and reflect the configuration environment. They are never exposed to user code, but there is no reason why they cannot also be used in the sup
sources (obviously not the headers).
I can't see any need for the sup
code to provide arbor
-like feature macros.
The CMake guff will come from splitting MPI and CUDA dependencies out from arbor-private-deps
into an internal dependency target that will be a requirement for both arbor
and sup
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addendum: the only reason why I've been advocating not using ARB_WITH_foo
in our test sources is that it allows us to catch interface issues — it keeps us more honest when it come to corralling our configuration state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point about minimising ARB_WITH_foo
in the tests is a good one: we want to test the public interface whenever possible.
I am just trying to think about how to keep such #ifdef MPI
out of the sup headers. When we have MPI support, there will be a header that exposes find_private_gpu(MPI_Comm)
, which will obviously need to include mpi.h
, like sup/include/sup/with_mpi.hpp
currently does.
I would have to put the find_private_gpu(MPI_Comm)
in a separate file to the find_gpu()
(or whatever we call it, you get what I am saying).
Is this correct reasoning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can put them in separate files, and only install the MPI one if MPI is configured, making a mismatch a compile error.
Or we can use a template in the header with MPI_Comm
the type parameter, making a mismatch a link error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Currently we're not installing sup at all, so to make things work with our current code and minimize CMake nonsense, I'd go with the template option.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The template option is a bit easier to implement, and is a pattern that we use elsewhere.
But compile errors are always preferable to linker errors.
I will think about it some more.
#endif | ||
|
||
} // namespace sup | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, find_gpu(MPI_Comm)
does not require that arbor
was built with MPI, but that MPI support is available at configure time. The MPI code can live in a different header+source that just includes mpi.h
, but which is only conditionally included in the sup
library based on the configure-time ARB_WITH_MPI
.
The find_gpu
names for these functions are too generic and don't really reflect what they do, which itself is very different in find_gpu()
and find_gpu(MPI_Comm)
.
The non-distributed version just checks to see if any GPU is visible, and so should have a corresponding name and return true or false. Alternately, if the interface doesn't change, it should have a name like sup::default_gpu()
or similar.
The distributed one does this negotiation between ranks, and perhaps should be called something like sup::find_private_gpu_id(MPI_Comm)
or something else along these lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be for keeping the interfaces, and renaming them. sup::default_gpu
and sup::find_private_gpu
are quite good names for that purpose!
I have acted on all the suggested changes, except the logic that selects the number of threads to use in the examples: if (auto nt = sup::get_env_num_threads()) {
resources.num_threads = nt;
}
else {
resources.num_threads = sup::thread_concurrency();
} Which I kept, because that is six of one, half a dozen of the other. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good!
I'm not really happy about the 17 or so lines of common context-making code in each example — this really feels like copy-and-paste code that should be wrapped — but we can debate the colour of that particular bike shed later!
I'm going to keep #647 open until we have the separation of |
Refactoring that moves logic for determing available concurrency and finding GPUs from the core Arbor library to the sup library.
Before this patch, Arbor's would attempt to use the first available GPU by default.
This caused problems on systems with multiple GPUs and multiple ranks on the same node, where by default all ranks would attempt to use the same GPU.
Now thread pools and execution contexts are always default initialized to use one thread and no GPU, making no effort to determine if more resources are available.
The sup library now provides an interface for
Fixes #647