allreduce_robust.cc: Allow num_global_replica to be 0 #38

Merged
merged 1 commit into from Nov 24, 2016

Projects

None yet

2 participants

@AbdealiJK
Contributor

In some cases, users may not want to have any global replica of
the data being broadcasted/all-reduced. In such cases, set the
result_buffer_round to -1 as a flag that this is not necessary
and check for it.

@tqchen
Member
tqchen commented Nov 23, 2016

please do a rebase so that travis can restarts checks, there a few lint errors that you might want to fix

@AbdealiJK
Contributor

@tqchen Do you mean the guard for the header files ?
I can fix them if you like (I thought you might have prefered it in the way it is there right now. Hence left it as is)

src/allreduce_robust.cc
@@ -33,7 +33,11 @@ AllreduceRobust::AllreduceRobust(void) {
}
void AllreduceRobust::Init(int argc, char* argv[]) {
AllreduceBase::Init(argc, argv);
- result_buffer_round = std::max(world_size / num_global_replica, 1);
+ if ( num_global_replica == 0 ) {
@tqchen
tqchen Nov 23, 2016 Member

extra space in if

@AbdealiJK
AbdealiJK Nov 23, 2016 Contributor

Ah sorry about that. Seems like cpplint doesn't catch it so I didn't realize.

@AbdealiJK AbdealiJK allreduce_robust.cc: Allow num_global_replica to be 0
In some cases, users may not want to have any global replica of
the data being broadcasted/all-reduced. In such cases, set the
result_buffer_round to -1 as a flag that this is not necessary
and check for it.
8c99280
@AbdealiJK
Contributor

@tqchen Wanted to confirm here - setting the num_global_replica to 0 shouldn't break anything else in the rest of the library right ? i.e. That is a valid config if I want no replicas of my data to be stored ?

@tqchen
Member
tqchen commented Nov 23, 2016

I think it is valid, except that there is no fault tolerance now

@AbdealiJK
Contributor

But there is still fault tolerance inside a single broadcast call and a single all reduce call. (As there is a loop inside Which checks and reinitiates the broadcast/all reduce) Which is what I want.

@tqchen
Member
tqchen commented Nov 24, 2016

I see, that is correct

@tqchen tqchen merged commit 21b5e12 into dmlc:master Nov 24, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment