Expand support for completing non-blocking requests #989

Open
gunney1 opened this issue Jul 28, 2022 · 8 comments

gunney1 commented Jul 28, 2022

I'd like to have more flexible ways to complete conduit::relay::mpi::Requests from non-blocking communications. Specifically,

  1. support the way applications store Requests and
  2. provide a choice between MPI_Waitsome and MPI_Testsome to complete the MPI side of the requests.

Conduit "wait" functions requires a C array of Requests. Building such an array requires reallocating memory when we run out of space. However, reallocating memory would invalidate the pointers given to MPI. We use a list<Request> in Axom so we don't invalidate any pointers to add more Requests. If the Conduit "wait" functions allowed Request *requests[] instead of just Request requests[] we could use them. We're currently writing our own "wait" code, but since we're using conduit::relay::mpi::Request, it would make sense to let Conduit complete the request.

The second part is for Conduit to support waitsome and testsome semantics for checking requests: the former blocks until at least one outstanding request completes (if there are any), while the latter is non-blocking. We currently do it this way:

void check_send_requests(std::list<conduit::relay::mpi::Request>& isendRequests,
                         bool atLeastOne) const
{
  std::vector<MPI_Request> reqs; // Intermediate request container for MPI.
  for(auto& isr : isendRequests) reqs.push_back(isr.m_request);

  int inCount = static_cast<int>(reqs.size());
  int outCount = 0;
  std::vector<int> indices(reqs.size(), -1);
  if(atLeastOne)
    MPI_Waitsome(inCount,
                 reqs.data(),
                 &outCount,
                 indices.data(),
                 MPI_STATUSES_IGNORE);
  else
    MPI_Testsome(inCount,
                 reqs.data(),
                 &outCount,
                 indices.data(),
                 MPI_STATUSES_IGNORE);
  if(outCount == MPI_UNDEFINED) outCount = 0; // No active requests in the list.
  indices.resize(outCount);
  // Remove the completed requests.  (Since they are sends, nothing else is required.)
  // Note: the walk below assumes MPI reported the indices in ascending order.
  auto reqIter = isendRequests.begin();
  int prevIdx = 0;
  for(const int idx : indices)
  {
    for(; prevIdx < idx; ++prevIdx) ++reqIter;
    reqIter = isendRequests.erase(reqIter);
    ++prevIdx;
  }
}

Obviously, if Conduit supported lists of Requests, we wouldn't need to build the intermediate container to pass to the wait functions. But there's probably no way to do this without templates.
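
One possibility, just as a sketch (this is not proposed Conduit API; it assumes each conduit::relay::mpi::Request exposes its MPI_Request as m_request, which is how we use it today):

#include <mpi.h>
#include <vector>

// Illustrative only: a container-generic wait_some that accepts any container
// of conduit::relay::mpi::Request (std::list, std::deque, ...).
template <typename RequestContainer>
int wait_some_sketch(RequestContainer &requests,
                     std::vector<int> &completed_indices)
{
  std::vector<MPI_Request> raw;
  raw.reserve(requests.size());
  for(auto &r : requests)
    raw.push_back(r.m_request);

  int outCount = 0;
  completed_indices.assign(raw.size(), -1);
  MPI_Waitsome(static_cast<int>(raw.size()),
               raw.data(),
               &outCount,
               completed_indices.data(),
               MPI_STATUSES_IGNORE);
  if(outCount == MPI_UNDEFINED) outCount = 0; // no active requests
  completed_indices.resize(outCount);
  return outCount;
}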


cyrush commented Jul 30, 2022

Instead of an all-in-one method, how about adding wait_some and test_some methods?

Would it be possible to also support this for recv, if we use std::list as the primary interface for passing the data?
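
Roughly something along these lines (strawman signatures only, nothing here exists yet):

// Strawman: both take the requests by std::list so pointers held by the
// caller stay valid; the return value is the number of requests completed.
int wait_some(std::list<conduit::relay::mpi::Request> &requests); // blocks until >= 1 completes
int test_some(std::list<conduit::relay::mpi::Request> &requests); // never blocks
// plus recv variants (hypothetical names: wait_some_recv / test_some_recv)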

cyrush added the feature label Aug 11, 2022
cyrush added this to the 0.9.0 milestone Aug 11, 2022

gunney1 commented Aug 15, 2022

Sorry for the late response.

Yes, I think supporting wait_some and test_some AND std::list as the primary interface for passing data would give us what we need.


cyrush commented Aug 16, 2022

@gunney1

Great, I started working on this. A question:

Your method checks which requests are fulfilled and removes the fulfilled ones from the list.
(We can detect the send vs. recv case and do the proper Node cleanup.)

For the general case, do we want to know which ones completed?

Do we want:

int wait_some(std::list<Request> &requests)

for rounds of communication, or something like:

int wait_some(std::list<Request> &requests,
              std::vector<int> &out_indices)

if we want to take some action on the Nodes between wait_some calls? The second form seems necessary for that, but if we shrink the list, the out_indices won't correspond to the adjusted list.
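
For concreteness, a sketch of how I'd imagine consuming the out_indices from the second form (the helper and its name are just illustrative; it assumes out_indices are ascending and refer to the list as it was passed in):

#include <iterator>
#include <list>
#include <vector>

// Visit each completed entry without erasing it, so the indices stay meaningful.
template <typename T, typename Visitor>
void visit_completed(std::list<T> &requests,
                     const std::vector<int> &out_indices,
                     Visitor visit)
{
  auto it = requests.begin();
  int pos = 0;
  for(const int idx : out_indices)
  {
    std::advance(it, idx - pos);
    pos = idx;
    visit(*it); // e.g. unpack the Node for a completed irecv
    ++it;
    ++pos;
  }
  // Erase completed entries in a separate pass afterwards; erasing here would
  // re-index the list and break the correspondence with out_indices.
}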


gunney1 commented Aug 18, 2022

I may want to keep the completed requests around for some reason, although I probably don't care about the out_indices. What do you think of moving the completed requests out of the input list and into another list that gets returned? Something like:

std::list<Request> check_recv_requests(std::list<Request>& inRequests)
{
  std::list<Request> completed;
  std::list<Request>::iterator aCompletedRequest = ...; // an entry MPI reports as complete
  completed.splice(completed.end(), inRequests, aCompletedRequest);
  ...
  return completed;
}

Would that work? Of course, you could also pass both lists as arguments and return void.
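
To flesh that out a bit, here's a rough, untested sketch of the splice-based version built on MPI_Testsome (the m_request member access is how we reach the underlying MPI_Request in our current code; it assumes MPI reports the completed indices in ascending order):

#include <iterator>
#include <list>
#include <vector>
#include <mpi.h>
#include "conduit_relay_mpi.hpp"

using conduit::relay::mpi::Request;

// Rough sketch only: completed requests are spliced into the returned list,
// so no Request is ever copied and no pointer is invalidated.
std::list<Request> check_recv_requests(std::list<Request> &inRequests)
{
  std::list<Request> completed;

  std::vector<MPI_Request> raw;
  raw.reserve(inRequests.size());
  for(auto &r : inRequests) raw.push_back(r.m_request);

  int outCount = 0;
  std::vector<int> indices(raw.size(), -1);
  MPI_Testsome(static_cast<int>(raw.size()), raw.data(),
               &outCount, indices.data(), MPI_STATUSES_IGNORE);
  if(outCount == MPI_UNDEFINED) outCount = 0; // no active requests
  indices.resize(outCount);

  auto it = inRequests.begin();
  int pos = 0;
  for(const int idx : indices)
  {
    std::advance(it, idx - pos);
    pos = idx;
    auto next = std::next(it);
    completed.splice(completed.end(), inRequests, it); // move, don't copy
    it = next;
    ++pos;
  }
  // (For recvs, any Node cleanup would then happen on the entries in `completed`.)
  return completed;
}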


cyrush commented Aug 18, 2022

Yes, that's a great idea! Style-wise, I would like to pass a ref to the list for the output as well.


cyrush commented Feb 17, 2023

@gunney1 Q: Did you look at

class CONDUIT_RELAY_API communicate_using_schema

I think it's a different comm pattern than you are using -- but it provides a very nice general interface for isends and irecvs.


gunney1 commented Mar 2, 2023

@cyrush Very sorry for taking so long. After calling execute, how would I determine which non-blocking operation completed?

cyrush modified the milestones: 0.9.0, 0.9.1 Jan 30, 2024

gunney1 commented Apr 26, 2024

@cyrush Just getting back to this. I'm still interested in this feature. I'd like to be able to specify whether I'm willing to wait for at least one request to finish or just want to check whether any have finished. Basically, it's the choice between MPI_Waitsome and MPI_Testsome. Testsome is for when I'm not willing to wait but will take whatever has completed, possibly nothing. Waitsome is for when I'm willing to wait the minimum time needed to get at least one completion. I don't necessarily want to wait for all of them to finish, because I want to interleave work and waiting. In my implementation, a bool atLeastOne parameter says whether I'm willing to wait for at least one to complete. Having both options lets me maximize the amount of local computation I can do while waiting; the sketch below shows the pattern I have in mind.
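
A minimal sketch of the usage pattern, written in terms of the check_send_requests function above (have_local_work and do_some_local_work are placeholders for application code):

// Interleave local computation with completing sends.
while(!isendRequests.empty())
{
  if(have_local_work())
  {
    do_some_local_work();
    // Not willing to wait: MPI_Testsome path, reap whatever happens to be done.
    check_send_requests(isendRequests, /*atLeastOne=*/false);
  }
  else
  {
    // Out of local work: MPI_Waitsome path, block until at least one completes.
    check_send_requests(isendRequests, /*atLeastOne=*/true);
  }
}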
