Refactor hardware detection to sup #654

bcumming · 2018-11-28T08:29:07Z

Refactoring that moves logic for determing available concurrency and finding GPUs from the core Arbor library to the sup library.

Before this patch, Arbor's would attempt to use the first available GPU by default.
This caused problems on systems with multiple GPUs and multiple ranks on the same node, where by default all ranks would attempt to use the same GPU.

Now thread pools and execution contexts are always default initialized to use one thread and no GPU, making no effort to determine if more resources are available.

The sup library now provides an interface for

querying a best guess at the number of available threads.
finding a GPU, with and without MPI. Currently implements the old behavior of taking the first available GPU. The next PR will add logic for assigning GPUs in MPI jobs.

Fixes #647

halfflat

The main issues are:

The naming/interface of find_gpu(...).
The conflation of the status of the Arbor feature test macros with the configuration-time state of the sup library; these things are the same only contingently, and will make for an unnecessarily indirect configuration dependency chain when we split sup up into an installable library component.

halfflat · 2018-11-30T10:27:10Z

example/bench/bench.cpp

+        else {
+            resources.num_threads = sup::thread_concurrency();
+        }
+
 #ifdef ARB_MPI_ENABLED


This thread count code is repeated in each example, and will likely appear in pretty much every C++ code that uses the library. It's boilerplate code that could be wrapped up with e.g. sup::get_default_threads() or similar; people can still use the more fundamental functions if they wish.

It is repeated in each example, but I don't have a problem with this because:

I think this makes it much clearer what is going on: test the environment variable, then attempt to inspect the CPU set if the environment variable isn't set.

the examples are meant to be stand alone, so a bit of repeated code isn't a crime.

I really feel that combining this logic into a single function obscures what is going on too much.
This level of granularity feels "about right" for what the sup library is trying to achieve: small well-defined tools/types/functions that are useful for all examples, tests, etc.

I think though that almost every such example will have similar code, and a bunch of user code will too. By providing an easy-to-use default function, we make it easy for library users also to do the reasonable thing by default.

sup contains mainly things that we want to use in common across the examples and tests, which should not form part of the core interface. Wrapping boilerplate code is part of its very purpose — I really don't see this as a problem at all.

This thread counting code and the GPU negotiation code will be in whatever we will call our installed-with-Arbor helper library; any obscurity in its functionality can be well addressed simply with documentation, IMO.

I must admit I am a bit uneasy suggesting that ARB_NUM_THREADS is a reasonable environment variable for users ;)
More seriously, user code might be similar, but would in all likelihood use different environment variables.
The examples might share code via sup, but at the same time they should be as simple to understand stand-alone. Because nobody reads the docs (they just create a ticket).

This feels a bit like 'half of one, six of the other'. I can change it, but I think that both of our positions are fairly reasonable.

halfflat · 2018-11-30T10:27:51Z

example/bench/bench.cpp

        {
-            int rank = 0;
+            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);


Only a minor nit, but if MPI_Comm_rank were to fail, it wouldn't write to rank; having a default initialized value seems like good practice.

Not a minor nit at all: I should be testing the return value from MPI_Comm_rank... but wait a moment, I think I see a better solution.

replaced with

root = arb::rank(context) == 0;

halfflat · 2018-11-30T10:28:23Z

example/bench/bench.cpp

            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            is_root = rank==0;
        }
 #else
-        auto context = arb::make_context();
+        resources.gpu_id = sup::find_gpu();


I think find_gpu() is not a good name; more on this below.

Yep, I agree with the proposed changes below.

halfflat · 2018-11-30T10:31:08Z

sup/concurrency.cpp

@@ -25,7 +30,7 @@ namespace threading {
 //
 // Throws std::runtime_error:
 //  ARB_NUM_THREADS or OMP_NUM_THREADS is set with invalid value.
-util::optional<size_t> get_env_num_threads() {
+arb::util::optional<size_t> get_env_num_threads() {


While we're here, this would be a good opportunity to reduce the use of arb::util::optional in public interfaces: this is going to be of a concern when we're in a mixed C++17 environment and we move to std::optional. (We're currently only using optional in one other public method, in arb::mc_segment).

I could change it to return 0 or -1 on failure to read an environment variable. Is that what you would suggest?

I have no problem with making the fix, but I don't see what the problem is with keeping it.
When we move to C++17, wouldn't we just change it to std::optional?
If a user is currently using C++17, then how does our current set up cause a problem (besides potential confusion about the difference between std::optional and util::optional)?

Isn't it possible to lift a name into a new namespace? Have an arb::future, where eventuall arb::future::optional = std::optional?

The chief problem is one of interface: it's quite possible that we build Arbor with a C++17-capable compiler, but that users building against an installed version of it might want to use a different compiler with less robust C++ support (assuming all the ABIs are the same of course).

We should definitely move to std::optional when we can rely upon a C++17 build environment, but we shouldn't force a C++17 dependency through the interface, at least until compiler support is near universal.

I think it is probably worth removing the optional from this interface, because it wouldn't be a massive loss (the error state is easy enough to encode in the return value with 0 or -1).
We might have a look at removing the optional on mc_segment so that this becomes a discussion we don't have to have.

If we were to keep optional in our interface (and similar types like any), I think Sam's idea of just using our util::optional until we move everything to C++17 is the way forward.
I want to avoid having multiple implementations of things in the library for different C++ standards: I would prefer to just implement everything in the minimum standard that we have selected for the library (currently C++14).

halfflat · 2018-11-30T10:32:16Z

sup/concurrency.cpp

+    // If still zero, use one thread.
+    n = std::max(n, 1u);
+
+    return n;


We can keep the comment, but using max here just adds code to no effect: there's nothing wrong with n? n: 1.

I personally prefer using std::max for this sort of logic. It is arguable whether a ternary or max is clearer... probably a matter of what you are used to.
The suggested ternary n? n: 1 is short hand for n>0? n: 1, so while it is less "code", it isn't less logic for the reader to mentally parse.

That is my five cents, though I have no problem making the change.

For me, both have equal cognitive load, but I'm less tempted to think about possible c++ weirdness with n ? n : 1 than max. (And their not precisely equivalent but I have no idea what the compiler could do to either one).

The compiler should generate equally efficient code, it is just a matter of semantics I think.

https://godbolt.org/z/0ITuiT

Man, I can never convince gcc to produce a conditional move when I want it to, damn it.

halfflat · 2018-11-30T10:40:15Z

sup/gpu.cpp

+#ifdef ARB_MPI_ENABLED
+#include <mpi.h>
+#endif
+


This code doesn't depend upon Arbor having been built with MPI or CUDA or not, but rather on the current configuration environment.

When we split sup up into a local-for-us library and an installed-for-general-consumption component, the latter will have to have similar CMake guff that the arbor library itself has to make this all hang together; in the meantime, we can hand-write the CMake stuff for sup that handles the ARB_WITH_MPI and ARB_WITH_CUDA stuff as required.

I think I see what you are saying.

You are suggesting another set of #defines, e.g. SUP_WITH_MPI, SUP_WITH_CUDA from CMake?
We can't use the ARB_WITH* defines, because they shouldn't go past the public/private split.

We always want consistency between sup and arbor, i.e. if Arbor was compiled with MPI support, then sup must have matching MPI support.
The libsup installed with libarbor must be consistent, and the libsup and libarbor "used locally" for tests and the like must also always be consistent.

So ARB_MPI_ENABLED and SUP_WITH_MPI must always agree. Which makes me wonder if it isn't just simpler to use the one, even if it isn't a perfect split in theory.

To play the devil's advocate: two arguments for keeping it this way

it is actually less confusing for users.

one single variable, ARB_XYZ_ENABLED, has to be configured in CMake for every feature, instead of two (and the less CMake wrangling, the better).

The ARB_WITH_foo defines are provided by the CMake infrastructure, and reflect the configuration environment. They are never exposed to user code, but there is no reason why they cannot also be used in the sup sources (obviously not the headers).

I can't see any need for the sup code to provide arbor-like feature macros.

The CMake guff will come from splitting MPI and CUDA dependencies out from arbor-private-deps into an internal dependency target that will be a requirement for both arbor and sup.

Addendum: the only reason why I've been advocating not using ARB_WITH_foo in our test sources is that it allows us to catch interface issues — it keeps us more honest when it come to corralling our configuration state.

The point about minimising ARB_WITH_foo in the tests is a good one: we want to test the public interface whenever possible.

I am just trying to think about how to keep such #ifdef MPI out of the sup headers. When we have MPI support, there will be a header that exposes find_private_gpu(MPI_Comm), which will obviously need to include mpi.h, like sup/include/sup/with_mpi.hpp currently does.

I would have to put the find_private_gpu(MPI_Comm) in a separate file to the find_gpu() (or whatever we call it, you get what I am saying).

Is this correct reasoning?

We can put them in separate files, and only install the MPI one if MPI is configured, making a mismatch a compile error.

Or we can use a template in the header with MPI_Comm the type parameter, making a mismatch a link error.

(Currently we're not installing sup at all, so to make things work with our current code and minimize CMake nonsense, I'd go with the template option.)

The template option is a bit easier to implement, and is a pattern that we use elsewhere.
But compile errors are always preferable to linker errors.
I will think about it some more.

halfflat · 2018-11-30T10:49:04Z

sup/include/sup/gpu.hpp

+#endif
+
+} // namespace sup
+


As above, find_gpu(MPI_Comm) does not require that arbor was built with MPI, but that MPI support is available at configure time. The MPI code can live in a different header+source that just includes mpi.h, but which is only conditionally included in the sup library based on the configure-time ARB_WITH_MPI.

The find_gpu names for these functions are too generic and don't really reflect what they do, which itself is very different in find_gpu() and find_gpu(MPI_Comm).

The non-distributed version just checks to see if any GPU is visible, and so should have a corresponding name and return true or false. Alternately, if the interface doesn't change, it should have a name like sup::default_gpu() or similar.

The distributed one does this negotiation between ranks, and perhaps should be called something like sup::find_private_gpu_id(MPI_Comm) or something else along these lines.

I would be for keeping the interfaces, and renaming them. sup::default_gpu and sup::find_private_gpu are quite good names for that purpose!

bcumming · 2018-12-05T08:58:31Z

I have acted on all the suggested changes, except the logic that selects the number of threads to use in the examples:

if (auto nt = sup::get_env_num_threads()) {
    resources.num_threads = nt; 
}   
else {
    resources.num_threads = sup::thread_concurrency();
}

Which I kept, because that is six of one, half a dozen of the other.

halfflat

Changes look good!
I'm not really happy about the 17 or so lines of common context-making code in each example — this really feels like copy-and-paste code that should be wrapped — but we can debate the colour of that particular bike shed later!

halfflat · 2018-12-05T14:25:42Z

I'm going to keep #647 open until we have the separation of sup into the example/test-only code and an installable adjunct utility library.

bcumming added 6 commits November 27, 2018 11:33

remove auto-detection of cores/gpus from core library

2595d07

add find_gpu to sup: no fancy logic yest

f3243be

small fix

42d2378

Merge branch 'master' into mpi-gpus

d76b750

add CUDA headers to sup library

713520e

update examples to use new sup interface

2ac926e

bcumming changed the title ~~Mpi gpus~~ refactor hardware detection to sup Nov 28, 2018

bcumming changed the title ~~refactor hardware detection to sup~~ Refactor hardware detection to sup Nov 28, 2018

halfflat suggested changes Nov 30, 2018

View reviewed changes

bcumming added 3 commits December 3, 2018 15:50

refactoring for PR review

482d1cb

final clean up

5d7bec8

cleanup

7d3e2b7

halfflat approved these changes Dec 5, 2018

View reviewed changes

halfflat merged commit 712070f into arbor-sim:master Dec 5, 2018

bcumming deleted the mpi-gpus branch April 5, 2019 07:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor hardware detection to sup #654

Refactor hardware detection to sup #654

bcumming commented Nov 28, 2018

halfflat left a comment

halfflat Nov 30, 2018

bcumming Nov 30, 2018

halfflat Nov 30, 2018

bcumming Nov 30, 2018

halfflat Nov 30, 2018

bcumming Nov 30, 2018

bcumming Nov 30, 2018 •

edited

Loading

halfflat Nov 30, 2018

bcumming Nov 30, 2018

halfflat Nov 30, 2018

bcumming Nov 30, 2018 •

edited

Loading

apeyser Nov 30, 2018

halfflat Nov 30, 2018

bcumming Nov 30, 2018

halfflat Nov 30, 2018

bcumming Nov 30, 2018

apeyser Nov 30, 2018

bcumming Nov 30, 2018

halfflat Nov 30, 2018

halfflat Nov 30, 2018

bcumming Nov 30, 2018

halfflat Nov 30, 2018

halfflat Nov 30, 2018

bcumming Nov 30, 2018

halfflat Nov 30, 2018

halfflat Nov 30, 2018

bcumming Nov 30, 2018

halfflat Nov 30, 2018

bcumming Nov 30, 2018

bcumming commented Dec 5, 2018 •

edited

Loading

halfflat left a comment

halfflat commented Dec 5, 2018

Refactor hardware detection to sup #654

Refactor hardware detection to sup #654

Conversation

bcumming commented Nov 28, 2018

halfflat left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bcumming Nov 30, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bcumming Nov 30, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bcumming commented Dec 5, 2018 • edited Loading

halfflat left a comment

Choose a reason for hiding this comment

halfflat commented Dec 5, 2018

bcumming Nov 30, 2018 •

edited

Loading

bcumming Nov 30, 2018 •

edited

Loading

bcumming commented Dec 5, 2018 •

edited

Loading