Skip to content

Conversation

@devreal
Copy link
Member

@devreal devreal commented Apr 28, 2017

I tried to run our test suite on multiple nodes on our cluster and came across some bugs that don't seem to manifest as errors if running on one node only. In particular, this PR contains fixes to the handling of MPI_Request objects and an issue in our shared memory optimization.

Additionally, this PR adds dash::generate_with_index to enable parallel generation of container values in cases where the generated value depends on the index of the element. This function is applied to two tests in which the serial generation was really slow on 8 nodes.

Copy link
Member

@fmoessbauer fmoessbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes. See my comments inline.

DART_LOG_TRACE("dart_waitall_local: cannot free request %zu "
"mpi_sta[%zu] = %d (%s)",
r_n,
if (handle[i]->request) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that this even worked in the past...

// re-use previously allocated memory
if (segment->baseptr == NULL) {
segment->baseptr = malloc(sizeof(char *) * team_data->sharedmem_nodesize);
segment->baseptr = calloc(team_data->sharedmem_nodesize, sizeof(char *));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calloc should not be necessary here, but who cares ;)

typename ElementType,
class PatternType,
class UnaryFunction >
void generate_with_index(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I missed this function quite often.

/// Generator function
UnaryFunction gen) {
/// Global iterators to local index range:
auto index_range = dash::local_index_range(first, last);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use the new range based views, as this does not work with multiple local ranges AFAIK. Maybe @fuchsto can assist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any pointer on how to use them? I'm not sure range based views and I have crossed paths so far...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I even introduced you personally last time in Munich!
Anyways: yes, I will add documentation.
For usage you can refer to quite an abundance of tests in dash/test/view.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't even know where to start looking. I guess I could crawl through a host of undocumented code, still not knowing where to start or how to put the pieces together...

auto & pattern = first.pattern();
auto first_offset = first.pos();
// Iterate local index range:
for (auto lindex = lbegin_index;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem here.

@fuchsto fuchsto merged commit dd2f747 into development Apr 30, 2017
@devreal devreal deleted the bug-dart-multinode branch May 11, 2017 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants