Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent new host overloading #1822

Merged
merged 29 commits into from Aug 16, 2018
Merged

Prevent new host overloading #1822

merged 29 commits into from Aug 16, 2018

Conversation

@pschoenfelder
Copy link
Contributor

@pschoenfelder pschoenfelder commented Jul 17, 2018

No description provided.

@pschoenfelder pschoenfelder requested a review from ssalinas Jul 17, 2018
Copy link
Member

@ssalinas ssalinas left a comment

I think the overall strategy of refreshing metrics in the offer scoring for things is good. Definitely want to make sure we only do it when metrics are too old (or maybe also after some number of additional tasks/percent of resources allocated since last metric?) and that we determine what kind of impact this has on scheduling speed

utilizationPerRequestId.values().forEach(usageManager::saveRequestUtilization);

if (configuration.isShuffleTasksForOverloadedSlaves()) {
shuffleTasksOnOverloadedHosts(overLoadedHosts);
}
}

public CompletableFuture<Void> getSlaveUsage(SingularitySlave slave) {
return usageCollectionSemaphore.call(() ->

This comment has been minimized.

@ssalinas

ssalinas Jul 17, 2018
Member

For the individual method, not sure we want to make it completely async like the larger ones. This method will likely not be called from the same context as the poller itself, so it should probably fall under a different semaphore (e.g. the offer scoring one) if we want it to be async

&& t.getMesosTask().getSlaveId().getValue().equals(offerHolder.getSlaveId()))) {
Optional<SingularitySlave> maybeSlave = slaveManager.getSlave(offerHolder.getSlaveId());
if (maybeSlave.isPresent()) {
usagePoller.getSlaveUsage(maybeSlave.get());

This comment has been minimized.

@ssalinas

ssalinas Jul 17, 2018
Member

will probably want to put this in currentSlaveUsagesBySlaveId after it's calculated. Calling this alone won't update the underlying values we pass to the scoring functions

This comment has been minimized.

@ssalinas

ssalinas Jul 17, 2018
Member

Another note, this would be a good thing to stick inside the completable future below. It's a good candidate to make async since we are now io bound on an api call and cpu bound on the calculations

Copy link
Member

@ssalinas ssalinas left a comment

Added a few comments on the async bits. The main point I'm not sure on is whether or not we are kicking off metric collect for use in a future offer scoring run, or gathering metrics synchronously for use in the current offer scoring run. I could see the first being faster for scoring, but we need to be careful that we aren't kicking off a bunch in a row for the same slave (i.e. because the first call hasn't finished yet). The second is more reliable in terms of making sure we have the most up to date metrics, but is slower since we have to wait for them

@@ -240,6 +238,59 @@ public SingularityMesosOfferScheduler(MesosConfiguration mesosConfiguration,
return offerHolders.values();
}

private Void buildScoringFuture(

This comment has been minimized.

@ssalinas

ssalinas Jul 20, 2018
Member

nit on naming, this method isn't actually building the future, it's a synchronous method for doing the actual scoring. I'd either rename this to calculateScore (or something like that), or move the supplyAsync inside this method and have it return the actual CompletableFuture<Void>

}
});
}
return null;

This comment has been minimized.

@ssalinas

ssalinas Jul 20, 2018
Member

So, I want to mention the choice here, could go either way on this. Currently this looks like the newly updated slave metrics will not be taken into account because we are returning here. Wouldn't we want to continue on to the scoring since we've gathered new metrics and put them in the map that is fed to calculateScore?

This comment has been minimized.

@pschoenfelder

pschoenfelder Jul 24, 2018
Author Contributor

Yes, I'll go with the latter option. For some reason I was thinking only the slave usage is necessary but probably safer to update the score too.

@@ -121,6 +129,10 @@ SingularitySlaveUsage getSlaveUsage() {
return diskInUseScore;
}

long getTimestamp() {

This comment has been minimized.

@ssalinas

ssalinas Jul 20, 2018
Member

you should be able to just do getSlaveUsage().getTimestamp() instead of having to store it in two places

utilizationPerRequestId.values().forEach(usageManager::saveRequestUtilization);

if (configuration.isShuffleTasksForOverloadedSlaves()) {
shuffleTasksOnOverloadedHosts(overLoadedHosts);
}
}

public CompletableFuture<SingularitySlaveUsage> getSlaveUsage(SingularitySlave slave) {

This comment has been minimized.

@ssalinas

ssalinas Jul 20, 2018
Member

The only thing I find as a code smell here is that now the offer scoring flow will rely on the usage semaphore and executor having enough permits/threads. Since within the offer scoring we are already in a block that is executed async, it may be worth calling collectSlaveUsage directly when used from the offer context, to avoid an additional layer of async work

} catch (Throwable t) {
String message = String.format("Could not get slave usage for host %s", slave.getHost());
LOG.error(message, t);
exceptionNotifier.notify(message, t);
}
return null; // TODO: is this really okay?

This comment has been minimized.

@ssalinas

ssalinas Jul 20, 2018
Member

I don't think this method is called anywhere else that expects a return value. Could always wrap in an optional to make it more explicit that the result might not be there.

usage.get().getTimestamp()
));
} else {
throw new RuntimeException(throwable);

This comment has been minimized.

@ssalinas

ssalinas Jul 30, 2018
Member

Where is the handling for this runtime exception? We currently aren't calling a get or join for the future created here which causes two issues for us:

  • calculateScore below may end up being called before the metrics recollection has a chance to run
  • If a RuntimeException is thrown here it is lost to us since there it will not propagate out of the future and no catch block currently logs it
new ConcurrentHashMap<>(),
usageManager.getRequestUtilizations(),
new ConcurrentHashMap<>(),
new AtomicLong(),

This comment has been minimized.

@ssalinas

ssalinas Jul 30, 2018
Member

given these arguments will be the same each time, does it make sense to create an overloaded method to handle those bits in UsagePoller instead?

@@ -180,7 +191,8 @@ public SingularityMesosOfferScheduler(MesosConfiguration mesosConfiguration,
mesosConfiguration.getScoreUsingSystemLoad(),
getMaxProbableUsageForSlave(activeTaskIds, requestUtilizations, offerHolders.get(usageWithId.getSlaveId()).getSanitizedHost()),
mesosConfiguration.getLoad5OverloadedThreshold(),
mesosConfiguration.getLoad1OverloadedThreshold()
mesosConfiguration.getLoad1OverloadedThreshold(),
usageWithId.getTimestamp()

This comment has been minimized.

@ssalinas

ssalinas Jul 30, 2018
Member

Second thoughts about the setup. Would it make sense to instead collect additional usages in this block instead? I'm realizing that the loop below would be called for each pending task. If we hit a case where collecting a particular slave usage is throwing exceptions or timing out, we will continue to recheck it for each pending task. Whereas, if we check in this block instead, we can just omit that up front and leave the block below as it was previously.

If we move the usage collection here, we'll likely want to convert this from parallelStream to a list of CompletableFutures like below to have better control over the concurrency

pschoenfelder and others added 5 commits Aug 1, 2018
@ssalinas ssalinas changed the title WIP: Prevent new host overloading Prevent new host overloading Aug 6, 2018
@ssalinas
Copy link
Member

@ssalinas ssalinas commented Aug 6, 2018

🚢

1 similar comment
@pschoenfelder
Copy link
Contributor Author

@pschoenfelder pschoenfelder commented Aug 6, 2018

🚢

@ssalinas ssalinas added the hs_stable label Aug 6, 2018
if (slaveMetricsSnapshot != null) {
memoryMbReservedOnSlave = (long) slaveMetricsSnapshot.getSlaveMemUsed();
cpuReservedOnSlave = slaveMetricsSnapshot.getSlaveCpusUsed();
diskMbReservedOnSlave = (long) slaveMetricsSnapshot.getSlaveDiskUsed();

This comment has been minimized.

@pschoenfelder

pschoenfelder Aug 6, 2018
Author Contributor

Are these actually mapped correctly? I know we said /slave/* maps to "reserved", but it also says "used"

if (slaveMetricsSnapshot != null) {
memoryMbReservedOnSlave = (long) slaveMetricsSnapshot.getSlaveMemUsed();
cpuReservedOnSlave = slaveMetricsSnapshot.getSlaveCpusUsed();
diskMbReservedOnSlave = (long) slaveMetricsSnapshot.getSlaveDiskUsed();

This comment has been minimized.

@pschoenfelder

pschoenfelder Aug 6, 2018
Author Contributor

Also, casting doubles to long so we don't have to change the usage pojos? Seems smelly to me

This comment has been minimized.

@ssalinas

ssalinas Aug 6, 2018
Member

Open to using doubles everywhere. Can't remember which level it was where the longs were required

}

SingularityTaskUsage latestUsage = getUsage(taskUsage);
memoryBytesUsedOnSlave += latestUsage.getMemoryTotalBytes();

This comment has been minimized.

@ssalinas

ssalinas Aug 7, 2018
Member

Will these end up being any different than systemMemTotalBytes - systemMemFreeBytes ? I'm wondering if there isn't actually a need to total everything up for these cases. getUsage is still a zk call and it'd be nice to eliminate if we can. The totaled up by task values and the slave-reported totals seem fairly similar. I'm not certain why we have both in the POJO TBH, would have to look through commit history. (Or maybe @darcatron knows since he wrote the original versions of usage collection?)

SingularitySlaveUsage slaveUsage = new SingularitySlaveUsage(
cpuReservedOnSlave, cpuReservedOnSlave, cpusTotal,
memoryMbReservedOnSlave, memoryMbReservedOnSlave, memoryMbTotal,
diskMbReservedOnSlave, diskMbReservedOnSlave, diskMbTotal,

This comment has been minimized.

@pschoenfelder

pschoenfelder Aug 7, 2018
Author Contributor

Seems pretty funky to send in these same variables, but that's what reducing everything to use the metric snapshot resulted in (see the plain collectSlaveUsage below to compare).

This comment has been minimized.

@ssalinas

ssalinas Aug 7, 2018
Member

I guess the 'clean' way to do it could be to have a separate POJO with a more minimal set of fields, which gets used in the scheduling. The fuller class, which would extend that, has all of the fields is used in the poller + api

e.g. SingularitySlaveUsage extends SingularitySimpleSlaveUsage {}

ssalinas and others added 7 commits Aug 8, 2018
 Skip hosts which do not have valid metrics during offer processing
@ssalinas ssalinas added this to the 0.21.0 milestone Aug 9, 2018
@baconmania
Copy link
Contributor

@baconmania baconmania commented Aug 9, 2018

🚢

@ssalinas ssalinas merged commit 29c7199 into master Aug 16, 2018
0 of 2 checks passed
0 of 2 checks passed
continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
continuous-integration/travis-ci/push The Travis CI build could not complete due to an error
Details
@ssalinas ssalinas deleted the new-host-overloading branch Aug 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.