Expand support for completing non-blocking requests #989

Open
gunney1 opened this issue Jul 28, 2022 · 8 comments

gunney1 commented Jul 28, 2022

I'd like to have more flexible ways to complete conduit::relay::mpi::Requests from non-blocking communications. Specifically,

  1. support the way applications store Requests and
  2. provide a choice between MPI_Waitsome and MPI_Testsome to complete the MPI side of the requests.

Conduit "wait" functions requires a C array of Requests. Building such an array requires reallocating memory when we run out of space. However, reallocating memory would invalidate the pointers given to MPI. We use a list<Request> in Axom so we don't invalidate any pointers to add more Requests. If the Conduit "wait" functions allowed Request *requests[] instead of just Request requests[] we could use them. We're currently writing our own "wait" code, but since we're using conduit::relay::mpi::Request, it would make sense to let Conduit complete the request.

The second part is for Conduit to support waitsome and testsome semantics for checking requests: the former blocks until at least one outstanding request completes (if there are any), while the latter is non-blocking. We currently do it this way:

void check_send_requests(std::list<conduit::relay::mpi::Request>& isendRequests,
                         bool atLeastOne) const
{
  std::vector<MPI_Request> reqs; // Intermediate request container for MPI.
  for(auto& isr : isendRequests) reqs.push_back(isr.m_request);

  int inCount = static_cast<int>(reqs.size());
  int outCount = 0;
  std::vector<int> indices(reqs.size(), -1);
  if(atLeastOne)
    MPI_Waitsome(inCount,
                 reqs.data(),
                 &outCount,
                 indices.data(),
                 MPI_STATUSES_IGNORE);
  else
    MPI_Testsome(inCount,
                 reqs.data(),
                 &outCount,
                 indices.data(),
                 MPI_STATUSES_IGNORE);
  if(outCount == MPI_UNDEFINED) outCount = 0; // No active requests in the list.
  indices.resize(outCount);
  // Remove the completed requests.  (Since they are sends, nothing else is required.)
  // Note: the walk below assumes MPI reported the indices in ascending order.
  auto reqIter = isendRequests.begin();
  int prevIdx = 0;
  for(const int idx : indices)
  {
    for(; prevIdx < idx; ++prevIdx) ++reqIter;
    reqIter = isendRequests.erase(reqIter);
    ++prevIdx;
  }
}

Obviously, if Conduit supported lists of Requests, we wouldn't need to build the intermediate container to pass to the wait functions. But there's probably no way to do this without templates.
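
One possibility, just as a sketch (this is not proposed Conduit API; it assumes each conduit::relay::mpi::Request exposes its MPI_Request as m_request, which is how we use it today):

#include <mpi.h>
#include <vector>

// Illustrative only: a container-generic wait_some that accepts any container
// of conduit::relay::mpi::Request (std::list, std::deque, ...).
template <typename RequestContainer>
int wait_some_sketch(RequestContainer &requests,
                     std::vector<int> &completed_indices)
{
  std::vector<MPI_Request> raw;
  raw.reserve(requests.size());
  for(auto &r : requests)
    raw.push_back(r.m_request);

  int outCount = 0;
  completed_indices.assign(raw.size(), -1);
  MPI_Waitsome(static_cast<int>(raw.size()),
               raw.data(),
               &outCount,
               completed_indices.data(),
               MPI_STATUSES_IGNORE);
  if(outCount == MPI_UNDEFINED) outCount = 0; // no active requests
  completed_indices.resize(outCount);
  return outCount;
}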


cyrush commented Jul 30, 2022

Instead of an all-in-one method, how about adding wait_some and test_some methods?

Would it be possible to also support this for recv, if we use std::list as the primary interface for passing the data?
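
Roughly something along these lines (strawman signatures only, nothing here exists yet):

// Strawman: both take the requests by std::list so pointers held by the
// caller stay valid; the return value is the number of requests completed.
int wait_some(std::list<conduit::relay::mpi::Request> &requests); // blocks until >= 1 completes
int test_some(std::list<conduit::relay::mpi::Request> &requests); // never blocks
// plus recv variants (hypothetical names: wait_some_recv / test_some_recv)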

cyrush added the feature label Aug 11, 2022
cyrush added this to the 0.9.0 milestone Aug 11, 2022

gunney1 commented Aug 15, 2022

Sorry for the late response.

Yes, I think supporting wait_some and test_some AND std::list as the primary interface for passing data would give us what we need.


cyrush commented Aug 16, 2022

@gunney1

Great, I started working on this. A question:

Your method checks which requests are fulfilled and removes the fulfilled ones from the list.
(We can detect the send vs. recv case and do the proper Node cleanup.)

For the general case, do we want to know which ones completed?

Do we want:

int wait_some(std::list<Request> &requests)

for rounds of communication, or something like:

int wait_some(std::list<Request> &requests,
              std::vector<int> &out_indices)

if we want to take some action on the Nodes between wait_some calls? The second form seems necessary for that, but if we shrink the list, the out_indices won't correspond to the adjusted list.
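
For concreteness, a sketch of how I'd imagine consuming the out_indices from the second form (the helper and its name are just illustrative; it assumes out_indices are ascending and refer to the list as it was passed in):

#include <iterator>
#include <list>
#include <vector>

// Visit each completed entry without erasing it, so the indices stay meaningful.
template <typename T, typename Visitor>
void visit_completed(std::list<T> &requests,
                     const std::vector<int> &out_indices,
                     Visitor visit)
{
  auto it = requests.begin();
  int pos = 0;
  for(const int idx : out_indices)
  {
    std::advance(it, idx - pos);
    pos = idx;
    visit(*it); // e.g. unpack the Node for a completed irecv
    ++it;
    ++pos;
  }
  // Erase completed entries in a separate pass afterwards; erasing here would
  // re-index the list and break the correspondence with out_indices.
}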


gunney1 commented Aug 18, 2022

I may want to keep the completed requests around for some reason, although I probably don't care about the out_indices. What do you think of moving the completed requests out of the input list and into another list that gets returned? Something like:

std::list<Request> check_recv_requests(std::list<Request>& inRequests)
{
  std::list<Request> completed;
  std::list<Request>::iterator aCompletedRequest = ...; // an entry MPI reports as complete
  completed.splice(completed.end(), inRequests, aCompletedRequest);
  ...
  return completed;
}

Would that work? Of course, you could also pass both lists as arguments and return void.
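
To flesh that out a bit, here's a rough, untested sketch of the splice-based version built on MPI_Testsome (the m_request member access is how we reach the underlying MPI_Request in our current code; it assumes MPI reports the completed indices in ascending order):

#include <iterator>
#include <list>
#include <vector>
#include <mpi.h>
#include "conduit_relay_mpi.hpp"

using conduit::relay::mpi::Request;

// Rough sketch only: completed requests are spliced into the returned list,
// so no Request is ever copied and no pointer is invalidated.
std::list<Request> check_recv_requests(std::list<Request> &inRequests)
{
  std::list<Request> completed;

  std::vector<MPI_Request> raw;
  raw.reserve(inRequests.size());
  for(auto &r : inRequests) raw.push_back(r.m_request);

  int outCount = 0;
  std::vector<int> indices(raw.size(), -1);
  MPI_Testsome(static_cast<int>(raw.size()), raw.data(),
               &outCount, indices.data(), MPI_STATUSES_IGNORE);
  if(outCount == MPI_UNDEFINED) outCount = 0; // no active requests
  indices.resize(outCount);

  auto it = inRequests.begin();
  int pos = 0;
  for(const int idx : indices)
  {
    std::advance(it, idx - pos);
    pos = idx;
    auto next = std::next(it);
    completed.splice(completed.end(), inRequests, it); // move, don't copy
    it = next;
    ++pos;
  }
  // (For recvs, any Node cleanup would then happen on the entries in `completed`.)
  return completed;
}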


cyrush commented Aug 18, 2022

Yes, that's a great idea! Style-wise, I would like to pass a ref to the list for the output as well.


cyrush commented Feb 17, 2023

@gunney1 Q: Did you look at

class CONDUIT_RELAY_API communicate_using_schema

I think it's a different comm pattern than you are using -- but it provides a very nice general interface for isends and irecvs.


gunney1 commented Mar 2, 2023

@cyrush Very sorry for taking so long. After calling execute, how would I determine which non-blocking operation completed?

cyrush modified the milestones: 0.9.0, 0.9.1 Jan 30, 2024

gunney1 commented Apr 26, 2024

@cyrush Just getting back to this. I'm still interested in this feature. I'd like to be able to specify whether I'm willing to wait for at least one request to finish or just want to check whether any have finished. Basically, it's the choice between MPI_Waitsome and MPI_Testsome. Testsome is for when I'm not willing to wait but will take whatever has completed, possibly nothing. Waitsome is for when I'm willing to wait the minimum time needed to get at least one completion. I don't necessarily want to wait for all of them to finish, because I want to interleave work and waiting. In my implementation, a bool atLeastOne parameter says whether I'm willing to wait for at least one to complete. Having both options lets me maximize the amount of local computation I can do while waiting; the sketch below shows the pattern I have in mind.
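
A minimal sketch of the usage pattern, written in terms of the check_send_requests function above (have_local_work and do_some_local_work are placeholders for application code):

// Interleave local computation with completing sends.
while(!isendRequests.empty())
{
  if(have_local_work())
  {
    do_some_local_work();
    // Not willing to wait: MPI_Testsome path, reap whatever happens to be done.
    check_send_requests(isendRequests, /*atLeastOne=*/false);
  }
  else
  {
    // Out of local work: MPI_Waitsome path, block until at least one completes.
    check_send_requests(isendRequests, /*atLeastOne=*/true);
  }
}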
