Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix apex annotation for async dispatch #4257

Merged
merged 8 commits into from Dec 6, 2019
Merged

Fix apex annotation for async dispatch #4257

merged 8 commits into from Dec 6, 2019

Conversation

@biddisco
Copy link
Contributor

biddisco commented Dec 3, 2019

When an annotated task is passed though the asyn dispatch, we can strip
off the task 'name' or annotation and pass it through to the thread
creation so that it is used when the task is initialized and apex
get the right name. We make use of
traits::get_function_annotation<typename hpx::util::decay::type>::call(f)
to achieve this.

When an annotated task is passed though the async dispatch, we can strip
off the task 'name' or annotation and pass it through to the thread
creation so that it is used when the task is initialized and apex
get the right name. We make use of
traits::get_function_annotation<typename hpx::util::decay<F>::type>::call(f)
to achieve this.
@biddisco

This comment has been minimized.

Copy link
Contributor Author

biddisco commented Dec 3, 2019

This is an experimental patch that @khuck should try. It intercepts the annotated task function name and passes it through the async dispatch so that when the task is created, the task annotation is passed in. This solves the primary problem of tasks being renamed after the OTF2 trace info is created.

There may be other places where special handling is needed.

The code that handles stacksize, priority, policy, annotation &etc should be consolidated into a task_description type object. @hkaiser I welcome your feedback on that point. We could simplify a lot of the internal API by packing those items into a struct earlier on. Executors tat forward information through to other executors all need special treatment currently to handle these parameters. simplification would be a big help.

@hkaiser hkaiser added this to the 1.5.0 milestone Dec 3, 2019
@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 3, 2019

@biddisco @hkaiser I am testing it now...

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 3, 2019

@biddisco So very close! The only thing that didn't work is the "then" clause, for example:

auto f1 = hpx::async(HP_executor,
    hpx::util::annotated_function(&dummy_task, "HP task"), "HP task", hp_m).
    then(SYNCHRONIZATION, hpx::util::annotated_function([&](auto &&){
        counter--;
    }, "then after HP"));

My example is here: https://github.com/khuck/jb_error
The "correct" task graph is here: https://github.com/khuck/jb_error/blob/master/taskgraph_pre_sync.png
All trace events are correct except the "then" clauses.
Here's the vampir figure: https://github.com/khuck/jb_error/blob/master/trace.png
1000 of the "...post" tasks should have been "launch inner", and 500 of them should have been "then after HP". They are both launched with "then" clauses.

@msimberg

This comment has been minimized.

Copy link
Contributor

msimberg commented Dec 3, 2019

I think the .then tasks having the wrong annotations is related to this hiding the annotation in f_ (the deferred_call with a member function doesn't pick up the annotation from this->f_).

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 3, 2019

@msimberg @biddisco @hkaiser I fixed the "then" clause (in hpx/lcos/local/packaged_continuation.hpp) and now it's annotated correctly. I'll push that change in a few...

before creating the thread_description.
@msimberg

This comment has been minimized.

Copy link
Contributor

msimberg commented on hpx/lcos/local/packaged_continuation.hpp in 37de97f Dec 3, 2019

I think

hpx::util::thread_description desc(f_, "hpx::parallel::execution::parallel_executor::post");

will do the exact same thing.

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 3, 2019

@msimberg I just found the same thing. I'm making that change (it's cleaner) and testing it now...

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 3, 2019

@msimberg Interesting... it did NOT do the same thing. It grabbed the annotation from the hpx::async() function, not the "then" function. I'm going to leave my change as it is, and not revert it.

@msimberg

This comment has been minimized.

Copy link
Contributor

msimberg commented Dec 3, 2019

@msimberg Interesting... it did NOT do the same thing. It grabbed the annotation from the hpx::async() function, not the "then" function. I'm going to leave my change as it is, and not revert it.

@khuck note that

hpx::threads::thread_id_type id = hpx::threads::get_self_id();
if (id)
{
// get the current task description
thread_description desc = hpx::threads::get_thread_description(id);
type_ = desc.kind();
// if the current task has a description, use it.
if (type_ == data_type_description)
{
data_.desc_ = desc.get_description();
}
else
{
// if there is an alternate name passed in, use it.
if (altname != nullptr) {
type_ = data_type_description;
data_.desc_ = altname;
} else {
// otherwise, use the address of the task.
HPX_ASSERT(type_ == data_type_address);
data_.addr_ = desc.get_address();
}
}
} else {
type_ = data_type_description;
data_.desc_ = altname;
}

has to be changed to give priority to altname. With that change I'm 99% sure

hpx::util::thread_description desc(f_, "hpx::parallel::execution::parallel_executor::post");

will do the same thing.

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 3, 2019

@msimberg I just confirmed that in the debugger. I'll try that fix, too...

@biddisco

This comment has been minimized.

Copy link
Contributor Author

biddisco commented Dec 3, 2019

Apologies. I intentionally didn't fix the .then() case. The async case is used by most of the code paths and one would need to make another change around the parallel::post area to fix .then() - I was really just asking if this was the right approach from @hkaiser 's point of view. Since I've modified the async dispatch signatures in a few places I thought it would be rejected anyway.

and fixing the packaged_continuation to use the correct thread_description
constructor.
@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 3, 2019

@msimberg @biddisco OK, I pushed my changes... looks like annotations are correct now.

@biddisco

This comment has been minimized.

Copy link
Contributor Author

biddisco commented Dec 4, 2019

Nice teamwork everyone. Question. Since we have now fixed the 'real' problem of tasks being created with one name and then renamed later, does this mean that some of the 'fixes' and checks that have been added to apex can now be removed or simplified?

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 4, 2019

Nice teamwork everyone. Question. Since we have now fixed the 'real' problem of tasks being created with one name and then renamed later, does this mean that some of the 'fixes' and checks that have been added to apex can now be removed or simplified?

I was thinking the same thing. If the functions are annotated before starting, we don't need the update_task() calls, other than when annotate_function() is used (and should be refactored to be called "rename_function" to make it less ambiguous). Also, the annotated_function() shouldn't call annotate_function() because it isn't necessary now.

@msimberg

This comment has been minimized.

Copy link
Contributor

msimberg commented Dec 4, 2019

Just to put the idea out there: John was at one point considering adding overloads to async/apply that would take a function annotation as the first argument to make annotating functions a bit less verbose. In other words something like this:

hpx::async("my important task", f, arg1, arg2);

It would purely be a simple wrapper making f an annotated_function but it's quite a nice shorthand. It's obviously not standard, but I think it could be a worthy addition. I'll open a separate issue if someone doesn't object badly.

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 4, 2019

Just to put the idea out there: John was at one point considering adding overloads to async/apply that would take a function annotation as the first argument to make annotating functions a bit less verbose. In other words something like this:

hpx::async("my important task", f, arg1, arg2);

It would purely be a simple wrapper making f an annotated_function but it's quite a nice shorthand. It's obviously not standard, but I think it could be a worthy addition. I'll open a separate issue if someone doesn't object badly.

No objections!

Also, John and I talked about swapping the order of the arguments to annotated_function() to be name, function rather than function, name. Thoughts? It would make it easier to annotate lambdas, for example.

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 4, 2019

@biddisco The pycicle failures are due to some changes in HPX that exposed the fact that both APEX and HPX use concurrentqueue from https://github.com/cameron314/concurrentqueue, but unfortunately different versions. I patched APEX to use the concurrentqueue.hpp that is part of HPX instead of downloading the other one from GitHub. The latest APEX develop branch has this fix merged.

@biddisco

This comment has been minimized.

Copy link
Contributor Author

biddisco commented Dec 5, 2019

just for clarification, I'm not in favour of
hpx::async("function name", &task_rddr, args...)
but I am in favour of
hpx::async(task_descriptor("function name", ...), &task_rddr, args...)
where a task_descriptor could be some new type that is analogous to annotated_function

edit: NB. task_descriptor in this sense should be somehow related to execution policies and executor parameters

@msimberg msimberg modified the milestones: 1.5.0, 1.4.0 Dec 5, 2019
@hkaiser

This comment has been minimized.

Copy link
Member

hkaiser commented Dec 5, 2019

just for clarification, I'm not in favour of
hpx::async("function name", &task_rddr, args...)
but I am in favour of
hpx::async(task_descriptor("function name", ...), &task_rddr, args...)
where a task_descriptor could be some new type that is analogous to annotated_function

What would be the difference between task_descriptor and the existing annotated_function (besides the name)?

Copy link
Member

hkaiser left a comment

LGTM, except for the access-after-move problems.

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 5, 2019

@biddisco @msimberg @hkaiser I would like to push some more updates to this branch, please let me know if that's OK:

  1. renamed "annotate_function" to "rename_function"
  2. changed order of annotated_function(f, name) to annotated_function(name, f)
  3. added a directory of unit tests in tests/unit/apex with 8 new tests
  4. other minor fixes
    I'll push these when you give me the OK.
@hkaiser

This comment has been minimized.

Copy link
Member

hkaiser commented Dec 5, 2019

@khuck the changes sound sensible to me. I'd suggest to keep those on a separate PR, however.

khuck added a commit that referenced this pull request Dec 5, 2019
Based of the PR #4257, this branch contains more changes to refactor
the annotated_function() support in HPX.  In particular:
1) renamed "annotate_function" to "rename_function"
2) changed order of annotated_function(f, name) to annotated_function(name, f)
3) added a directory of unit tests in tests/unit/apex with 8 new tests
4) other minor fixes
@msimberg

This comment has been minimized.

Copy link
Contributor

msimberg commented Dec 6, 2019

@khuck also in favour of those changes (including them being on a separate branch). My only wish is that you keep the old overloads around and mark them deprecated with HPX_DEPRECATED(HPX_DEPRECATED_MSG).

@teonnik

This comment has been minimized.

Copy link

teonnik commented Dec 6, 2019

From a Slack conversation with John:

I am running a HPX + APEX (master with the recent #4261, networking is off) / MPI application on Piz Daint. When running on multiple processes, I only get a trace for a single process. Even with a clean directory, I get:

[OTF2] src/otf2_archive_int.c:3945: error: File does already exist: Could not create archive trace directory!
[OTF2] src/otf2_archive_int.c:1108: error: File does already exist: Couldn't create directories on root.
Closing OTF2 event files...
Writing OTF2 definition files...
Writing OTF2 Global definition file...
Writing OTF2 Node information...
Writing OTF2 Communicators...
Closing the archive...
done.
OTF2 Error: 17, File does already exist
....

from most processes. Only 1 process is able to write the OTF2 file.

@biddisco

This comment has been minimized.

Copy link
Contributor Author

biddisco commented Dec 6, 2019

@khuck So if we use N MPI ranks, but HPX_WITH_NETWORKING=OFF then I guess apex doesn't see each of the ranks as part of the same job, and it overwrites the trace files (or something similar). Is there an easy way to tell apex that it should create one trace file per rank/process and not clobber the other ones? (also the time sync for each rank might need some adjustment)

@msimberg

This comment has been minimized.

Copy link
Contributor

msimberg commented Dec 6, 2019

I pushed a few commits to take care of the remaining build failures and the potential use after move. In a small test the task names seem to still work as before. I also snuck in a small commit fixing the use of a deprecated affinity header. I can remove this if you prefer.

Copy link
Member

hkaiser left a comment

Thanks!

@msimberg

This comment has been minimized.

Copy link
Contributor

msimberg commented Dec 6, 2019

I had to make one final formatting commit. Besides the formatting the pycicle apex build and CircleCI were happy. Merging.

@msimberg msimberg merged commit 8d727c6 into master Dec 6, 2019
0 of 4 checks passed
0 of 4 checks passed
build-and-test Workflow: build-and-test
Details
Codacy/PR Quality Review Hang in there, Codacy is reviewing your Pull request.
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details
@msimberg msimberg deleted the async_annotation branch Dec 6, 2019
@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 7, 2019

@khuck So if we use N MPI ranks, but HPX_WITH_NETWORKING=OFF then I guess apex doesn't see each of the ranks as part of the same job, and it overwrites the trace files (or something similar). Is there an easy way to tell apex that it should create one trace file per rank/process and not clobber the other ones? (also the time sync for each rank might need some adjustment)

Then HPX is lying to APEX about how many ranks there are, and who is rank zero. see:

hpx/src/runtime_impl.cpp

Lines 408 to 411 in d001111

#ifdef HPX_HAVE_APEX
util::external_timer::init(nullptr, hpx::get_locality_id(),
hpx::get_initial_num_localities());
#endif

@biddisco

This comment has been minimized.

Copy link
Contributor Author

biddisco commented Dec 9, 2019

@khuck if I were to add command line support to tell apex that we had N localities and replace the hpx::get_initial_num_localities() and hpx::get_locality_id() with something else - would the rest of apex function as expected, or are there other places where apex would assume that communication with other localities was required.
In other words - I can spoof a value that tells apex we have 2,4,8 etc ranks and so it will initialize itself correctly and enable writing out of files per rank - but does apex use hpx actions or other types that involve communication anywhere at all in the code?

@khuck

This comment has been minimized.

Copy link
Contributor

khuck commented Dec 9, 2019

@khuck if I were to add command line support to tell apex that we had N localities and replace the hpx::get_initial_num_localities() and hpx::get_locality_id() with something else - would the rest of apex function as expected, or are there other places where apex would assume that communication with other localities was required.
In other words - I can spoof a value that tells apex we have 2,4,8 etc ranks and so it will initialize itself correctly and enable writing out of files per rank - but does apex use hpx actions or other types that involve communication anywhere at all in the code?

That should work. As of right now, APEX doesn't use any distributed features of HPX. When the OTF library is initialized each process has to know its "rank", otherwise they all assume they are rank 0.

@biddisco

This comment has been minimized.

Copy link
Contributor Author

biddisco commented Jan 16, 2020

@khuck

That should work. As of right now, APEX doesn't use any distributed features of HPX. When the OTF library is initialized each process has to know its "rank", otherwise they all assume they are rank 0.

I have implemented the patch to

hpx/src/runtime_impl.cpp

Lines 408 to 411 in d001111

#ifdef HPX_HAVE_APEX
util::external_timer::init(nullptr, hpx::get_locality_id(),
hpx::get_initial_num_localities());
#endif
so that we pass mpi rank and mpi size in instead of hpx localities etc when NETWORKING is OFF. Superficially this seems to be working and I am indeed getting trace files generated for N ranks and they can be loaded in vampir - however, they are incomplete/corrupted.

Initializing instrumentation with rank,size 0 2
Initializing instrumentation with rank,size 1 2
Rank 0 of 2.
Rank 1 of 2.
[OTF2] src/otf2_archive_int.c:1108: error: Unknown error code: Couldn't create directories on root.
OTF2 Error: INVALID, Unknown error code

I get messages of this kind when I do some debugging (I added the cout of rank/size) and the message appears to be emitted from https://github.com/khuck/xpress-apex/blob/d89100ba49511a16ab8c64f760f7d1bd2548e77b/src/apex/otf2_listener.cpp#L402-L405
the call to OTF2_Archive_SetCollectiveCallbacks

any idea what might be wrong?

@biddisco

This comment has been minimized.

Copy link
Contributor Author

biddisco commented Jan 17, 2020

I recompiled the OTF2 library in debug mode and traced the problem. The call to otf2 broadcast error was returning an unset pointer which said in effect "error" as rank 0 wrote the file/dir archive successfully. I've squashed that and now everything appears to proceed normally, but the trace files are not complete.
I do not know where to look next. Hints welcome please. since this issue is closed. I will copy these last two comments to a new one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.