Better task balancing #1482
Conversation
double score = score(offerHolder, stateCache, tasksPerOfferPerRequest, taskRequestHolder, getUsagesPerRequestTypePerSlave());
if (score > 0) {
  // todo: can short circuit here if score is high enough
  scorePerOffer.put(offerHolder, score);
Thought we might want to have a value that's definitely good enough to just accept instead of continuing to evaluate.
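A minimal sketch of what that short circuit could look like; the GOOD_ENOUGH_SCORE threshold and the generic scoring callback are illustrative stand-ins, not part of this diff:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

class ShortCircuitSketch {
  // Illustrative threshold; the real value/name would live in configuration.
  private static final double GOOD_ENOUGH_SCORE = 0.95;

  // Scores candidate offers but stops early once one is "definitely good enough".
  static <T> Map<T, Double> scoreOffers(List<T> offers, ToDoubleFunction<T> scorer) {
    Map<T, Double> scorePerOffer = new HashMap<>();
    for (T offer : offers) {
      double score = scorer.applyAsDouble(offer);
      if (score > 0) {
        scorePerOffer.put(offer, score);
      }
      if (score >= GOOD_ENOUGH_SCORE) {
        break; // short circuit: take this offer without evaluating the rest
      }
    }
    return scorePerOffer;
  }
}
```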
I like the idea of a scoring system overall. Some comments on specific logic things I'll make later since this is still WIP. Overall comments though:
taskManager.createTaskAndDeletePendingTask(zkTask);
private double minScore(SingularityTaskRequest taskRequest) {
  double minScore = 0.80;
this can be adjusted as necessary. I thought an 80% match might be a good starting point, but we could def reduce it
This has been updated as follows:

Before: All offers would be scored and of all scores above …

After: All offers are still scored, but the minimum acceptable score depends on the task's overdue milliseconds and the number of offers the task has not accepted. Currently, the overdue time and offer count each have a max of …. The offer attempts count includes any offer that was considered, not just offers that scored too low. So an offer that didn't have enough resources to satisfy the task will still be counted. I picked these numbers based on generalizations. We will likely have to tune these.
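Roughly, the shape of that calculation might look like the sketch below. The caps and the way the two penalties are split are assumptions for illustration, not the exact values or formula in the PR:

```java
class MinScoreSketch {
  // Base value and attempt cap appear elsewhere in the diff; the overdue cap is assumed.
  private static final double BASE_MIN_SCORE = 0.80;
  private static final int MAX_OFFER_ATTEMPTS = 20;
  private static final long MAX_OVERDUE_MILLIS = 5 * 60 * 1000L; // assumed cap, not from the diff

  // The longer a task is overdue and the more offers it has already been matched
  // against, the lower the acceptable score becomes.
  static double minScore(long overdueMillis, int offerAttempts) {
    double overduePenalty = Math.min(overdueMillis, MAX_OVERDUE_MILLIS) / (double) MAX_OVERDUE_MILLIS;
    double attemptPenalty = Math.min(offerAttempts, MAX_OFFER_ATTEMPTS) / (double) MAX_OFFER_ATTEMPTS;
    // Each factor can remove up to half of the base minimum score.
    return Math.max(0, BASE_MIN_SCORE - (BASE_MIN_SCORE / 2) * overduePenalty - (BASE_MIN_SCORE / 2) * attemptPenalty);
  }
}
```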
return SlaveMatchState.SLAVE_ATTRIBUTES_DO_NOT_MATCH;
}

final SlavePlacement slavePlacement = taskRequest.getRequest().getSlavePlacement().or(configuration.getDefaultSlavePlacement());

if (!taskRequest.getRequest().isRackSensitive() && slavePlacement == SlavePlacement.GREEDY) {
  // todo: account for this or let this behavior continue?
  return SlaveMatchState.NOT_RACK_OR_SLAVE_PARTICULAR;
I didn't know if we would need to account for any rack sensitivity outside of the existing checks done in the scheduler
@ssalinas Got some specific logic tests in. Let me know if there's a piece I missed that should be added. I'm going to continue to try to get some full logic tests in as well.
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

public class SingularitySlaveUsage {

  public static final String CPU_USED = "cpusUsed";
  public static final String MEMORY_BYTES_USED = "memoryRssBytes";
  public static final long BYTES_PER_MEGABYTE = 1024L * 1024L;
was about to comment that there must be some type of easy class/enum for this like there is with TimeUnit, but apparently there isn't... weird...
Yeah, I was sad to see there wasn't a lib method for this too 😢
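For reference, a hand-rolled equivalent of the missing TimeUnit-style helper would only be a few lines; this MemoryUnit enum is purely illustrative and not part of the PR:

```java
// A tiny stand-in for the missing "MemoryUnit" analogue of TimeUnit.
enum MemoryUnit {
  BYTES(1L),
  MEGABYTES(1024L * 1024L),
  GIGABYTES(1024L * 1024L * 1024L);

  private final long bytes;

  MemoryUnit(long bytes) {
    this.bytes = bytes;
  }

  long toBytes(long amount) {
    return amount * bytes;
  }

  long fromBytes(long amountBytes) {
    return amountBytes / bytes;
  }
}
```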
}

public double getCpusUsedForRequestType(RequestType type) {
  return usagePerRequestType.get(type).get(CPU_USED).doubleValue();
Maybe another enum is more appropriate for CPU_USED/MEMORY_BYTES_USED?
agreed, mapping would be clearer then too 👍
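Something along these lines, assuming a hypothetical ResourceUsageType enum in place of the string constants (all names here are illustrative):

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative enum replacing the CPU_USED / MEMORY_BYTES_USED string keys.
enum ResourceUsageType {
  CPU_USED,
  MEMORY_BYTES_USED
}

class UsageMapSketch {
  // An EnumMap keyed by the enum keeps the mapping explicit and typo-proof.
  static Map<ResourceUsageType, Number> exampleUsage(double cpus, long memBytes) {
    Map<ResourceUsageType, Number> usage = new EnumMap<>(ResourceUsageType.class);
    usage.put(ResourceUsageType.CPU_USED, cpus);
    usage.put(ResourceUsageType.MEMORY_BYTES_USED, memBytes);
    return usage;
  }
}
```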
return usagePerRequestType;
}

public double getCpusUsedForRequestType(RequestType type) {
this and getMemBytesUsedForRequestType are unused methods
}

@Override
public void runActionOnPoll() {
  final long now = System.currentTimeMillis();
  Map<RequestType, Map<String, Number>> usagesPerRequestType = new HashMap<>();
wouldn't we want this to be per-slave, not overall?
This should be per slave. This poller loops through each slave and creates a new SingularitySlaveUsage with the stats for that slave.
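In other words, the poller builds one usage entry per slave rather than a single cluster-wide aggregate. A stripped-down sketch with stand-in types (not the real Singularity classes):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UsagePollerSketch {
  // Minimal stand-ins for the real Singularity types, for illustration only.
  static class SlaveUsage {
    final double cpusUsed;
    final long memoryBytesUsed;

    SlaveUsage(double cpusUsed, long memoryBytesUsed) {
      this.cpusUsed = cpusUsed;
      this.memoryBytesUsed = memoryBytesUsed;
    }
  }

  interface UsageClient {
    SlaveUsage fetchUsage(String slaveId);
  }

  // One usage entry per slave rather than a single cluster-wide aggregate.
  static Map<String, SlaveUsage> pollUsagePerSlave(List<String> slaveIds, UsageClient client) {
    Map<String, SlaveUsage> usagePerSlave = new HashMap<>();
    for (String slaveId : slaveIds) {
      usagePerSlave.put(slaveId, client.fetchUsage(slaveId));
    }
    return usagePerSlave;
  }
}
```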
Map<SingularityOfferHolder, Double> scorePerOffer = new HashMap<>();
double minScore = minScore(taskRequestHolder.getTaskRequest(), offerMatchAttemptsPerTask, System.currentTimeMillis());

LOG.info("Minimum score for task {} is {}", taskRequestHolder.getTaskRequest().getPendingTask().getPendingTaskId().getId(), minScore);
probably can be lower than info level here
continue;
}

double score = score(offerHolder, stateCache, tasksPerOfferPerRequest, taskRequestHolder, getSlaveUsage(currentSlaveUsages, offerHolder.getOffer().getSlaveId().getValue()));
for clarity, maybe something like 'hostScore' here? The score is for the particular slave, not necessarily about the offer
I'm not sure about the naming here. We do look at the slave's utilization to score the offer, but we are still scoring the offer itself since offers aren't uniquely 1:1 for a slave (e.g. 2 offers for the same slave).
The slave utilization weight will be the same for all offers on the same slave, but the offer resources will be different per offer. So, it seems to me that we're scoring the offer in this class rather than the slave itself
@VisibleForTesting
double minScore(SingularityTaskRequest taskRequest, Map<String, Integer> offerMatchAttemptsPerTask, long now) {
  double minScore = 0.80;
  int maxOfferAttempts = 20;
another one that would be nice to have configurable
final SingularityTask task = mesosTaskBuilder.buildTask(offerHolder.getOffer(), offerHolder.getCurrentResources(), taskRequest, taskRequestHolder.getTaskResources(), taskRequestHolder.getExecutorResources());
@VisibleForTesting
double score(Offer offer, SingularityTaskRequest taskRequest, Optional<SingularitySlaveUsageWithId> maybeSlaveUsage) {
  double requestTypeCpuWeight = 0.20;
Let's make these configurable, maybe another object in the configuration yaml?
Yup, I was in progress on this (now committed), but I kept the fields under SingularityConfiguration since I saw a lot of other stuff in there as well (e.g. caching). We could pull it into an OfferConfiguration file if you think that'd be better for organization
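For illustration, the knobs could look something like this in a Dropwizard-style configuration class; the field names and defaults here are assumptions, not the exact ones committed:

```java
import com.fasterxml.jackson.annotation.JsonProperty;

// Sketch of configurable scoring knobs; names and defaults are illustrative.
public class OfferScoringConfigurationSketch {

  @JsonProperty
  private double minOfferScore = 0.80;

  @JsonProperty
  private int maxOfferAttemptsPerTask = 20;

  @JsonProperty
  private double requestTypeCpuWeight = 0.20;

  public double getMinOfferScore() {
    return minOfferScore;
  }

  public int getMaxOfferAttemptsPerTask() {
    return maxOfferAttemptsPerTask;
  }

  public double getRequestTypeCpuWeight() {
    return requestTypeCpuWeight;
  }
}
```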
if (matchesResources && slaveMatchState.isMatchAllowed()) {
  final SingularityTask task = mesosTaskBuilder.buildTask(offerHolder.getOffer(), offerHolder.getCurrentResources(), taskRequest, taskRequestHolder.getTaskResources(), taskRequestHolder.getExecutorResources());
@VisibleForTesting
double score(Offer offer, SingularityTaskRequest taskRequest, Optional<SingularitySlaveUsageWithId> maybeSlaveUsage) {
Let's go over this one in-person, think we are getting close, just easier to chat than typing a novel in github ;)
@@ -31,22 +32,30 @@

private static final String SLAVE_PATH = ROOT_PATH + "/slaves";
private static final String TASK_PATH = ROOT_PATH + "/tasks";
private static final String USAGE_SUMMARY_PATH = ROOT_PATH + "/summary";
@ssalinas can you take a look at this piece? Just wanna make sure I got this set up right
@@ -123,6 +140,10 @@ public SingularityCreateResult saveSpecificTaskUsage(String taskId, SingularityT
  return save(getSpecificTaskUsagePath(taskId, usage.getTimestamp()), usage, taskUsageTranscoder);
}

public SingularityCreateResult saveSpecificClusterUtilization(SingularityClusterUtilization utilization) {
  return save(getSpecificClusterUtilizationPath(utilization.getTimestamp()), utilization, clusterUtilizationTranscoder);
is there any point at which we would want a history of these? If we don't need a history of summaries, we might as well save the data to the summary path (without timestamp) and just overwrite when there is new data.
I didn't see a reason to at this point. Saw the others were saving them so it seemed okay since we only save up to 5 points. I think it's safe to just have the one point. Keeps it simpler
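The two options amount to a timestamped child node versus a single fixed node that gets overwritten on each poll; a small sketch with assumed path constants:

```java
import org.apache.curator.utils.ZKPaths;

class SummaryPathSketch {
  // Assumed constants, mirroring the naming used in the diff.
  private static final String ROOT_PATH = "/usage";
  private static final String USAGE_SUMMARY_PATH = ROOT_PATH + "/summary";

  // Keeping history: one child node per timestamp.
  static String timestampedSummaryPath(long timestamp) {
    return ZKPaths.makePath(USAGE_SUMMARY_PATH, Long.toString(timestamp));
  }

  // No history: always write to the same node and overwrite when new data arrives.
  static String latestSummaryPath() {
    return USAGE_SUMMARY_PATH;
  }
}
```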
@@ -95,6 +104,10 @@ private String getCurrentSlaveUsagePath(String slaveId) {
  return ZKPaths.makePath(getSlaveUsagePath(slaveId), CURRENT_USAGE_NODE_KEY);
}

private String getSpecificClusterUtilizationPath(long timestamp) {
why SpecificClusterUtilization instead of just getUsageSummaryPath or something like that?
the "specific" keyword was the pattern the other items were using so I kept that since the timestamp specified which one to grab. I can drop the historical data and then rename it
} catch (InvalidSingularityTaskIdException e) {
  LOG.error("Couldn't get SingularityTaskId for {}", taskUsage);
  continue;
}
added this try catch for the potentially incorrect taskId types
Before: Min score was configurable by the user and would be reduced as tasks were delayed and offer match attempts were rejected.

After: Min score is calculated based on the overall utilization of the cluster.
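The thread doesn't show the exact formula, but the idea is that a mostly idle cluster can insist on a well-matched offer while a busy one accepts weaker matches. A purely illustrative sketch:

```java
class ClusterUtilizationMinScoreSketch {
  // Illustrative only: as cluster utilization climbs, the minimum acceptable
  // offer score drops toward zero.
  static double minScore(double clusterCpuUtilization, double clusterMemUtilization) {
    double utilization = Math.max(clusterCpuUtilization, clusterMemUtilization); // 0.0 - 1.0
    double maxMinScore = 0.80; // assumed ceiling, matching the earlier base value
    return maxMinScore * (1.0 - Math.min(1.0, utilization));
  }
}
```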
@@ -130,12 +131,13 @@ public SingularityMesosOfferScheduler(MesosConfiguration mesosConfiguration,

while (!pendingTaskIdToTaskRequest.isEmpty() && addedTaskInLastLoop && canScheduleAdditionalTasks(taskCredits)) {
  addedTaskInLastLoop = false;
  double maxTaskMillisPastDue = maxTaskMillisPastDue(SingularityScheduledTasksInfo.getInfo(taskManager.getPendingTasks(), configuration.getDeltaAfterWhichTasksAreLateMillis()).getMaxTaskLagMillis());
I think we'll need a better name for this variable. It's the max possible task lag before we decide we're going to take any offer we can get. I feel maxTaskMillisPastDue has some confusing overlap with maxTaskLag, which is the current highest lag time for pending tasks.
Before: We had a configured value for the max possible task lag before we accepted any offer that matched a task. The default was 5 minutes.

After: The max possible task lag (…) … A task with no lag will result in a 3 min … Since we reduce the … It's also important to note that we will actually start accepting any matching offer (minScore == 0) before the …
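As a rough illustration of that behavior (the actual scaling in the PR may differ), the minimum score could decay toward zero as a task's lag approaches the configured maximum, at which point any matching offer is accepted:

```java
class TaskLagMinScoreSketch {
  // Illustrative decay: the minimum acceptable score shrinks linearly with task
  // lag; other penalties (e.g. offer attempts) can push it to 0 even sooner.
  static double minScore(double baseMinScore, long taskLagMillis, long maxTaskLagMillis) {
    if (maxTaskLagMillis <= 0) {
      return 0;
    }
    double lagFraction = Math.min(1.0, (double) taskLagMillis / maxTaskLagMillis);
    return Math.max(0, baseMinScore * (1.0 - lagFraction));
  }
}
```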
@darcatron I think this one is good to merge. Going to get this in master so we can continue to build off of the resource usage updates without endless merge conflicts 👍. Thanks for all the work on this one
🚧 This is a WIP for task balancing.
The general idea is that a pendingTask will calculate the best offer by scoring each offer and choosing the best score. Right now, the top score is 1.00 assuming the slave has nothing on it.
Score is weighted based on two main criteria: current resource usage for the same request type and current resource availability. I chose weights based on what I thought might be important, but they can be changed. I thought mem would be the most important, so I weighted it a bit higher.
This only scores based on what's running on the slave. It does not look at the acceptedPendingTasks for an offer.
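A rough sketch of how such a weighted score could be combined; the weight values below are illustrative (only requestTypeCpuWeight = 0.20 appears in the diff), with memory weighted a bit higher per the description, and an empty slave scoring 1.00:

```java
class OfferScoreSketch {
  // Illustrative weights only; the PR makes the real ones configurable.
  private static final double REQUEST_TYPE_CPU_WEIGHT = 0.20;
  private static final double REQUEST_TYPE_MEM_WEIGHT = 0.30;
  private static final double FREE_CPU_WEIGHT = 0.20;
  private static final double FREE_MEM_WEIGHT = 0.30;

  // Each input is a 0.0-1.0 fraction; a slave with nothing on it scores 1.0.
  static double score(double cpuFreeFraction, double memFreeFraction,
                      double requestTypeCpuUsedFraction, double requestTypeMemUsedFraction) {
    return FREE_CPU_WEIGHT * cpuFreeFraction
        + FREE_MEM_WEIGHT * memFreeFraction
        + REQUEST_TYPE_CPU_WEIGHT * (1.0 - requestTypeCpuUsedFraction)
        + REQUEST_TYPE_MEM_WEIGHT * (1.0 - requestTypeMemUsedFraction);
  }
}
```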
@ssalinas