Better task balancing #1482

Merged
merged 73 commits into from Jun 8, 2017
+884 −141
@@ -171,13 +171,13 @@ These settings should live under the "mesos" field inside the root configuration
#### Resource Limits ####
| Parameter | Default | Description | Type |
|-----------|---------|-------------|------|
| defaultCpus | 1 | Number of CPUs to request for a task if none are specified | int |
| defaultMemory | 64 | MB of memory to request for a task if none is specified | int |
| maxNumInstancesPerRequest | 25 | Max instances (tasks) to allow for a request (requests using over this will return a 400) | int |
| maxNumCpusPerInstance | 50 | Max number of CPUs allowed on a given task | int |
| maxNumCpusPerRequest | 900 | Max number of CPUs allowed for a given request (cpus per task * task instance) | int |
| maxMemoryMbPerInstance | 24000 | Max MB of memory allowed on a given task | int |
| maxMemoryMbPerRequest | 450000 | Max MB of memory allowed for a given request (memoryMb per task * task instances) | int |
#### Racks ####
| Parameter | Default | Description | Type |
@@ -189,7 +189,21 @@ These settings should live under the "mesos" field inside the root configuration
| Parameter | Default | Description | Type |
|-----------|---------|-------------|------|
| slaveHttpPort | 5051 | The port to talk to slaves on | int |
| slaveHttpsPort | absent | The HTTPS port to talk to slaves on | Integer (Optional) |
#### Offers ####
| Parameter | Default | Description | Type |
|-----------|---------|-------------|------|
| minOfferScore | 0.80 | The starting minimum score a task will accept for a mesos offer. The best possible offer score is 1.00 | double |
| maxOfferAttemptsPerTask | 20 | The max number of matching attempts a task will take without accepting a possible offer | int |
| maxMillisPastDuePerTask | 600000 (10 min) | The max milliseconds a task can be past due when scoring an offer | long |
| longRunningUsedCpuWeightForOffer | 0.40 | The weight long running tasks' cpu utilization carries when scoring an offer (must add up to 1 with longRunningUsedMemWeightForOffer) | double |
| longRunningUsedMemWeightForOffer | 0.60 | The weight long running tasks' memory utilization carries when scoring an offer (must add up to 1 with longRunningUsedCpuWeightForOffer) | double |
| freeCpuWeightForOffer | 0.40 | The weight the slave's free cpu carries when scoring an offer (must add up to 1 with freeMemWeightForOffer) | double |
| freeMemWeightForOffer | 0.60 | The weight the slave's free memory carries when scoring an offer (must add up to 1 with freeCpuWeightForOffer) | double |
| defaultOfferScoreForMissingUsage | 0.10 | The default offer score used for offers without utilization metrics | double |
| considerNonLongRunningTaskLongRunningAfterRunningForSeconds | 21600 (6 hours) | If a non long running task runs, on average, this long or more, it's considered a long running task | long |
| maxNonLongRunningUsedResourceWeight | 0.50 | The max weight long running tasks' utilization can carry when scoring a non long running task for an offer | double |
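
To make the table above concrete, here is a minimal sketch of how a weighted offer score could be computed from a slave's free CPU and memory and compared against `minOfferScore`. It only illustrates the weighted-average idea using the documented defaults; the class `OfferScoreSketch`, its method names, and the "non-positive total means missing usage" convention are assumptions made for this example and are not taken from the Singularity source.

```java
// Hypothetical sketch; this is not Singularity's actual scoring code.
final class OfferScoreSketch {
  // Default values copied from the Offers table.
  static final double MIN_OFFER_SCORE = 0.80;
  static final double FREE_CPU_WEIGHT = 0.40;
  static final double FREE_MEM_WEIGHT = 0.60;
  static final double DEFAULT_SCORE_FOR_MISSING_USAGE = 0.10;

  /** Clamps a fraction into the range [0, 1]. */
  static double clamp(double value) {
    return Math.max(0.0, Math.min(1.0, value));
  }

  /**
   * Scores a slave by its free resources. A non-positive total is this
   * sketch's (assumed) way of saying "no utilization metrics available".
   */
  static double score(double cpusUsed, double cpuTotal, double memUsedMb, double memTotalMb) {
    if (cpuTotal <= 0 || memTotalMb <= 0) {
      return DEFAULT_SCORE_FOR_MISSING_USAGE;
    }
    double freeCpuFraction = clamp(1.0 - cpusUsed / cpuTotal);
    double freeMemFraction = clamp(1.0 - memUsedMb / memTotalMb);
    // The two weights add up to 1, so the best possible score is 1.00.
    return FREE_CPU_WEIGHT * freeCpuFraction + FREE_MEM_WEIGHT * freeMemFraction;
  }

  /** An offer is acceptable when its score reaches the minimum. */
  static boolean isAcceptable(double score) {
    return score >= MIN_OFFER_SCORE;
  }

  public static void main(String[] args) {
    double idle = score(0.0, 8.0, 0.0, 16000.0);
    double busy = score(6.0, 8.0, 12000.0, 16000.0);
    System.out.println("idle slave score=" + idle + " acceptable=" + isAcceptable(idle));
    System.out.println("busy slave score=" + busy + " acceptable=" + isAcceptable(busy));
  }
}
```

Because the CPU and memory weights sum to 1, a fully idle slave scores the documented maximum of 1.00, which is why the table requires each weight pair to add up to 1.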
## Database ##
@@ -1,22 +1,42 @@
package com.hubspot.singularity;
import java.util.Map;
import com.google.common.base.Optional;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
public class SingularitySlaveUsage {
public enum ResourceUsageType {
CPU_USED, MEMORY_BYTES_USED
}
public static final long BYTES_PER_MEGABYTE = 1024L * 1024L;

ssalinas (Member), Apr 20, 2017:

was about to comment that there must be some type of easy class/enum for this like there is with TimeUnit, but apparently there isn't... weird...

darcatron (Contributor), Apr 20, 2017:

Yeah, I was sad to see there wasn't a lib method for this too 😢
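
For illustration only, the kind of `TimeUnit`-style helper the reviewers were looking for could be sketched as a small enum. `ByteUnitSketch` is a hypothetical name invented here; it is not part of the JDK, Guava, or this PR, which simply defines the `BYTES_PER_MEGABYTE` constant. The sketch uses binary multiples (1024) to match that constant.

```java
// Hypothetical helper, loosely modelled on java.util.concurrent.TimeUnit.
enum ByteUnitSketch {
  BYTES(1L),
  KILOBYTES(1024L),
  MEGABYTES(1024L * 1024L),
  GIGABYTES(1024L * 1024L * 1024L);

  private final long bytesPerUnit;

  ByteUnitSketch(long bytesPerUnit) {
    this.bytesPerUnit = bytesPerUnit;
  }

  /** Converts an amount expressed in this unit to bytes; throws on overflow. */
  long toBytes(long amount) {
    return Math.multiplyExact(amount, bytesPerUnit);
  }

  /**
   * Converts an amount expressed in sourceUnit into this unit.
   * Like TimeUnit.convert, the result is truncated toward zero.
   */
  long convert(long amount, ByteUnitSketch sourceUnit) {
    return sourceUnit.toBytes(amount) / bytesPerUnit;
  }

  public static void main(String[] args) {
    // 64 MB is the documented defaultMemory value.
    System.out.println("64 MB in bytes: " + MEGABYTES.toBytes(64L));
    System.out.println("3 GB in MB: " + MEGABYTES.convert(3L, GIGABYTES));
  }
}
```

With such a helper, a conversion like the one in `getMemoryBytesTotal()` could read `MEGABYTES.toBytes(memoryMbTotal.get())` instead of multiplying by a constant, at the cost of one more class to maintain.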

private final long memoryBytesUsed;
private final int numTasks;
private final long timestamp;
private final double cpusUsed;
private final Optional<Long> memoryMbTotal;
private final Optional<Double> cpuTotal;
private final Map<ResourceUsageType, Number> longRunningTasksUsage;
@JsonCreator
-public SingularitySlaveUsage(@JsonProperty("memoryBytesUsed") long memoryBytesUsed, @JsonProperty("timestamp") long timestamp, @JsonProperty("cpusUsed") double cpusUsed,
-@JsonProperty("numTasks") int numTasks) {
+public SingularitySlaveUsage(@JsonProperty("memoryBytesUsed") long memoryBytesUsed,
+@JsonProperty("timestamp") long timestamp,
+@JsonProperty("cpusUsed") double cpusUsed,
+@JsonProperty("numTasks") int numTasks,
+@JsonProperty("memoryMbTotal") Optional<Long> memoryMbTotal,
+@JsonProperty("cpuTotal") Optional<Double> cpuTotal,
+@JsonProperty("longRunningTasksUsage") Map<ResourceUsageType, Number> longRunningTasksUsage) {
this.memoryBytesUsed = memoryBytesUsed;
this.timestamp = timestamp;
this.cpusUsed = cpusUsed;
this.numTasks = numTasks;
this.memoryMbTotal = memoryMbTotal;
this.cpuTotal = cpuTotal;
this.longRunningTasksUsage = longRunningTasksUsage;
}
public long getMemoryBytesUsed() {
@@ -35,6 +55,22 @@ public int getNumTasks() {
return numTasks;
}
public Optional<Long> getMemoryBytesTotal() {
return memoryMbTotal.isPresent() ? Optional.of(memoryMbTotal.get() * BYTES_PER_MEGABYTE) : Optional.absent();
}
public Optional<Long> getMemoryMbTotal() {
return memoryMbTotal.isPresent() ? Optional.of(memoryMbTotal.get()) : Optional.absent();
}
public Optional<Double> getCpuTotal() {
return cpuTotal;
}
public Map<ResourceUsageType, Number> getLongRunningTasksUsage() {
return longRunningTasksUsage;
}
@Override
public String toString() {
return "SingularitySlaveUsage [memoryBytesUsed=" + memoryBytesUsed + ", numTasks=" + numTasks + ", timestamp=" + timestamp + ", cpusUsed=" + cpusUsed + "]";
@@ -8,7 +8,7 @@
private final String slaveId;
public SingularitySlaveUsageWithId(SingularitySlaveUsage usage, String slaveId) {
-super(usage.getMemoryBytesUsed(), usage.getTimestamp(), usage.getCpusUsed(), usage.getNumTasks());
+super(usage.getMemoryBytesUsed(), usage.getTimestamp(), usage.getCpusUsed(), usage.getNumTasks(), usage.getMemoryMbTotal(), usage.getCpuTotal(), usage.getLongRunningTasksUsage());
this.slaveId = slaveId;
}
@@ -188,6 +188,26 @@
private int maxTasksPerOfferPerRequest = 0;
private double minOfferScore = 0.80;
private int maxOfferAttemptsPerTask = 20;
private long maxMillisPastDuePerTask = TimeUnit.MINUTES.toMillis(10);

ssalinas (Member), Apr 27, 2017:

thinking something shorter would be a better default. We can check in testing, but I think after even 5 minutes we shouldn't be worrying about score and just scheduling asap
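
The trade-off raised in this comment (stop being picky about score once a task is sufficiently overdue) can be sketched as a minimum score that relaxes over time. The linear relaxation below is purely an assumption for illustration: the diff shown here only exposes the settings `minOfferScore`, `maxOfferAttemptsPerTask`, and `maxMillisPastDuePerTask`, not the formula that consumes them, and `MinScoreSketch` is a hypothetical class.

```java
// Hypothetical sketch; the real relaxation formula is not shown in this diff.
final class MinScoreSketch {

  /** Clamps a fraction into the range [0, 1]. */
  static double clamp(double value) {
    return Math.max(0.0, Math.min(1.0, value));
  }

  /**
   * Lowers the starting minimum score linearly as a task uses up its
   * matching attempts or its allowed overdue time, whichever is further
   * along. At either limit the minimum reaches 0, so any offer is accepted.
   */
  static double minAcceptableScore(double startingMinScore,
                                   int attempts, int maxAttempts,
                                   long millisPastDue, long maxMillisPastDue) {
    if (maxAttempts <= 0 || maxMillisPastDue <= 0) {
      throw new IllegalArgumentException("limits must be positive");
    }
    double attemptFraction = clamp((double) attempts / maxAttempts);
    // A task that is not yet due has a negative overdue time; clamp treats it as 0.
    double overdueFraction = clamp((double) millisPastDue / maxMillisPastDue);
    double relaxation = Math.max(attemptFraction, overdueFraction);
    return startingMinScore * (1.0 - relaxation);
  }

  public static void main(String[] args) {
    long tenMinutes = 600_000L; // documented default for maxMillisPastDuePerTask
    System.out.println("fresh task: " + minAcceptableScore(0.80, 0, 20, 0L, tenMinutes));
    System.out.println("5 min overdue: " + minAcceptableScore(0.80, 0, 20, 300_000L, tenMinutes));
    System.out.println("10 min overdue: " + minAcceptableScore(0.80, 0, 20, tenMinutes, tenMinutes));
  }
}
```

Under this assumed formula, shortening `maxMillisPastDuePerTask` from 10 to 5 minutes would make an overdue task accept any offer twice as quickly, which is the effect the comment argues for.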

private double longRunningUsedCpuWeightForOffer = 0.40;
private double longRunningUsedMemWeightForOffer = 0.60;
private double freeCpuWeightForOffer = 0.40;
private double freeMemWeightForOffer = 0.60;
private double defaultOfferScoreForMissingUsage = 0.10;
private long considerNonLongRunningTaskLongRunningAfterRunningForSeconds = TimeUnit.HOURS.toSeconds(6);
private double maxNonLongRunningUsedResourceWeight = 0.50;
private int maxRequestIdSize = 100;
private int maxUserIdSize = 100;
@@ -618,6 +638,45 @@ public int getMaxTasksPerOfferPerRequest() {
return maxTasksPerOfferPerRequest;
}
public double getMinOfferScore() {
return minOfferScore;
}
public int getMaxOfferAttemptsPerTask() {
return maxOfferAttemptsPerTask;
}
public long getMaxMillisPastDuePerTask() {
return maxMillisPastDuePerTask;
}
public double getLongRunningUsedCpuWeightForOffer() {
return longRunningUsedCpuWeightForOffer;
}
public double getLongRunningUsedMemWeightForOffer() {
return longRunningUsedMemWeightForOffer;
}
public double getFreeCpuWeightForOffer() {
return freeCpuWeightForOffer;
}
public double getFreeMemWeightForOffer() {
return freeMemWeightForOffer;
}
public double getDefaultOfferScoreForMissingUsage() {
return defaultOfferScoreForMissingUsage;
}
public long getConsiderNonLongRunningTaskLongRunningAfterRunningForSeconds() {
return considerNonLongRunningTaskLongRunningAfterRunningForSeconds;
}
public double getMaxNonLongRunningUsedResourceWeight() {
return maxNonLongRunningUsedResourceWeight;
}
public MesosConfiguration getMesosConfiguration() {
return mesosConfiguration;
}
@@ -978,6 +1037,55 @@ public void setMaxTasksPerOfferPerRequest(int maxTasksPerOfferPerRequest) {
this.maxTasksPerOfferPerRequest = maxTasksPerOfferPerRequest;
}
public SingularityConfiguration setMinOfferScore(double minOfferScore) {
this.minOfferScore = minOfferScore;
return this;
}
public SingularityConfiguration setMaxOfferAttemptsPerTask(int maxOfferAttemptsPerTask) {
this.maxOfferAttemptsPerTask = maxOfferAttemptsPerTask;
return this;
}
public SingularityConfiguration setMaxMillisPastDuePerTask(long maxMillisPastDuePerTask) {
this.maxMillisPastDuePerTask = maxMillisPastDuePerTask;
return this;
}
public SingularityConfiguration setLongRunningUsedCpuWeightForOffer(double longRunningUsedCpuWeightForOffer) {
this.longRunningUsedCpuWeightForOffer = longRunningUsedCpuWeightForOffer;
return this;
}
public SingularityConfiguration setLongRunningUsedMemWeightForOffer(double longRunningUsedMemWeightForOffer) {
this.longRunningUsedMemWeightForOffer = longRunningUsedMemWeightForOffer;
return this;
}
public SingularityConfiguration setFreeCpuWeightForOffer(double freeCpuWeightForOffer) {
this.freeCpuWeightForOffer = freeCpuWeightForOffer;
return this;
}
public SingularityConfiguration setFreeMemWeightForOffer(double freeMemWeightForOffer) {
this.freeMemWeightForOffer = freeMemWeightForOffer;
return this;
}
public SingularityConfiguration setDefaultOfferScoreForMissingUsage(double defaultOfferScoreForMissingUsage) {
this.defaultOfferScoreForMissingUsage = defaultOfferScoreForMissingUsage;
return this;
}
public SingularityConfiguration setConsiderNonLongRunningTaskLongRunningAfterRunningForSeconds(long considerNonLongRunningTaskLongRunningAfterRunningForSeconds) {
this.considerNonLongRunningTaskLongRunningAfterRunningForSeconds = considerNonLongRunningTaskLongRunningAfterRunningForSeconds;
return this;
}
public SingularityConfiguration setMaxNonLongRunningUsedResourceWeight(double maxNonLongRunningUsedResourceWeight) {
this.maxNonLongRunningUsedResourceWeight = maxNonLongRunningUsedResourceWeight;
return this;
}
public void setMesosConfiguration(MesosConfiguration mesosConfiguration) {
this.mesosConfiguration = mesosConfiguration;
}
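
The new setters above return `this`, so they can be chained. The sketch below uses a stripped-down stand-in class (`OfferWeightsSketch`, hypothetical) rather than the real `SingularityConfiguration`, and adds a sanity check for the documented rule that the two free-resource weights must add up to 1. Whether and where Singularity itself enforces that rule is not visible in this diff, so the `weightsSumToOne` check is an assumption.

```java
// Hypothetical stand-in for the fluent setters on SingularityConfiguration.
final class OfferWeightsSketch {
  // Defaults copied from the diff above.
  private double freeCpuWeightForOffer = 0.40;
  private double freeMemWeightForOffer = 0.60;

  OfferWeightsSketch setFreeCpuWeightForOffer(double freeCpuWeightForOffer) {
    this.freeCpuWeightForOffer = freeCpuWeightForOffer;
    return this; // returning this is what allows chaining
  }

  OfferWeightsSketch setFreeMemWeightForOffer(double freeMemWeightForOffer) {
    this.freeMemWeightForOffer = freeMemWeightForOffer;
    return this;
  }

  /** Checks the documented constraint, with a small tolerance for rounding. */
  boolean weightsSumToOne() {
    return Math.abs(freeCpuWeightForOffer + freeMemWeightForOffer - 1.0) < 1e-9;
  }

  public static void main(String[] args) {
    OfferWeightsSketch balanced = new OfferWeightsSketch()
        .setFreeCpuWeightForOffer(0.50)
        .setFreeMemWeightForOffer(0.50);
    System.out.println("balanced weights valid: " + balanced.weightsSumToOne());

    // Changing only one weight breaks the constraint (0.70 + 0.60).
    OfferWeightsSketch lopsided = new OfferWeightsSketch().setFreeCpuWeightForOffer(0.70);
    System.out.println("lopsided weights valid: " + lopsided.weightsSumToOne());
  }
}
```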
@@ -158,10 +158,9 @@ public int compare(SingularityTaskUsage o1, SingularityTaskUsage o2) {
return children;
}
-public List<SingularitySlaveUsageWithId> getAllCurrentSlaveUsage() {
-List<String> slaves = getSlavesWithUsage();
-List<String> paths = new ArrayList<>(slaves.size());
-for (String slaveId : slaves) {
+public List<SingularitySlaveUsageWithId> getCurrentSlaveUsages(List<String> slaveIds) {
+List<String> paths = new ArrayList<>(slaveIds.size());
+for (String slaveId : slaveIds) {
paths.add(getCurrentSlaveUsagePath(slaveId));
}
@@ -174,6 +173,10 @@ public int compare(SingularityTaskUsage o1, SingularityTaskUsage o2) {
return slaveUsageWithIds;
}
public List<SingularitySlaveUsageWithId> getAllCurrentSlaveUsage() {
return getCurrentSlaveUsages(getSlavesWithUsage());
}
public List<Long> getSlaveUsageTimestamps(String slaveId) {
List<String> timestampStrings = getChildren(getSlaveUsageHistoryPath(slaveId));
List<Long> timestamps = new ArrayList<>(timestampStrings.size());