Adjust prewarm container dynamically #4871
Conversation
ansible/files/runtimes.json (outdated)
@@ -58,7 +58,8 @@
 "stemCells": [
 {
 "count": 2,
-"memory": "256 MB"
+"memory": "256 MB",
+"ttlMinutes": 10
How about adding finer granularity by changing the unit to seconds?
A unit of seconds seems very small, and a unit of hours seems too big, so I chose minutes (second < minute < hour). Does it make sense to use seconds?
Why not FiniteDuration? So that operators can make the value whatever is appropriate for them.
@tysonnorris, got it.
Already changed to Duration
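As a side note, here is a minimal sketch (not code from this PR) of how an operator-supplied duration string could be turned into a FiniteDuration, which is roughly the flexibility the change above enables:

import scala.concurrent.duration.{Duration, FiniteDuration}

// Hypothetical helper: accept strings such as "10 minutes" or "30 seconds"
// and reject non-finite values, so operators can pick whatever unit fits.
def parseTtl(raw: String): FiniteDuration = Duration(raw) match {
  case d: FiniteDuration => d
  case other => throw new IllegalArgumentException(s"ttl must be a finite duration, got: $other")
}

// parseTtl("10 minutes")  // -> 10 minutes
// parseTtl("30 seconds")  // -> 30 seconds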
}
}
for ((container, data) <- containers) {
if (Duration.between(data.lastUsed, Instant.now).compareTo(Duration.ofMinutes(ttlMinutes)) > 0) {
If there is no activation for a prewarm kind within the given TTL, the next invocation will always be a cold start.
Also, even if we can add more prewarm containers when cold starts happen, I think it may not fit some workloads well. If load spikes for some runtime kind, it will cause a lot of cold starts, and the pool will try to create many prewarm containers at the same time. As we know, Docker operations are quite expensive, so this will end up with a huge number of container creations and long wait times.
Yes, I agree with you.
I introduced a new mechanism to avoid the cold starts: I added threshold and increment to runtimes.json. After prewarmed containers are deleted because they were unused, if the number of activations for that kind/memory reached threshold or more in the previous minute, increment prewarmed containers are created.
Refer to the newly added commit: 8c36f33
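To illustrate the idea, here is a rough, hypothetical sketch of that calculation (illustrative names only, not the PR's exact code):

// If the cold starts observed for a kind/memory pair in the previous minute
// reached the configured threshold, create `increment` prewarm containers per
// multiple of the threshold; otherwise create none.
def supplementCount(coldStarts: Int, threshold: Int, increment: Int): Int =
  if (coldStarts >= threshold) (coldStarts / threshold) * increment
  else 0

Later comments in this thread refine this with minimum and maximum bounds.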
@style95 already fixed
Force-pushed from 2fd4dd1 to 8c36f33
.foreach { _ =>
val time = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(Calendar.getInstance().getTime)
val key = s"${kind},${memoryLimit.toMB},${time}"
coldStartCount.getOrElseUpdate(key, new AtomicInteger(0)).getAndIncrement()
Does this need AtomicInteger? Container allocation should happen in actor message processing, which is single threaded
Already fixed
@@ -80,8 +85,49 @@ class ContainerPool(childFactory: ActorRefFactory => ActorRef,
//periodically emit metrics (don't need to do this for each message!)
context.system.scheduler.schedule(30.seconds, 10.seconds, self, EmitMetrics)

// Key is `kind,memory,time`, value is the number of cold Start in minute
val coldStartCount = TrieMap[String, AtomicInteger]()
This can be a var immutable.Map instead, I think?
Actually, I would probably not create a separate map here, but rather add expiration: Option[Deadline] to the PreWarmedData class. Then you can track expiration AND more easily filter to find expired prewarms.
EDIT: sorry, to be clear: the expiration is required to remove unused ones, but there is a separate issue that needs to be tracked, which is a prewarm "miss", where the action matches a prewarm config but no prewarm was available. For the miss, you can use an immutable.Map that is reset every minute (or a configurable time period), without tracking timestamps or expirations there.
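A minimal sketch of the suggested shape (field names are illustrative; the real PreWarmedData carries more fields such as the container reference, kind and memory):

import scala.concurrent.duration.Deadline

case class PreWarmedDataSketch(kind: String, expires: Option[Deadline]) {
  // No expiration configured means the prewarm never expires.
  def isExpired: Boolean = expires.exists(_.isOverdue())
}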
Already fixed
Please also update the
ansible/files/runtimes.json (outdated)
"memory": "256 MB",
"ttlMinutes": 10,
"threshold": 5,
"increment": 1
In this scenario, I think "count" behaves like "initialCount", right? So now I think we should also have a "maxCount" so that additional prewarms cannot be created past the maxCount limit.
We could consider renaming "count" to "initialCount", but I'm not sure this is an issue for backwards compatibility.
Already fixed
Force-pushed from 98e06dd to cf30f7c
@style95 @tysonnorris
backfillPrewarms(true)

// check periodically every 1 minute, delete prewarmed container if unused for sometime
context.system.scheduler.schedule(10.seconds, 1.minute, self, DeleteUnusedPrewarmedContainer)
So the main idea is to remove unused prewarm containers every minute and create more prewarm containers based on the usage during the last minute.
I think it is a good starting point.
This is a kind of heuristic approach, predicting the future based on history.
One thing to consider is that we can't exactly predict the required prewarm types.
Any kind of activation can come at any time.
If an action is invoked sparsely or with a bigger interval than the prewarm TTL, it will always trigger a cold start.
So how about introducing a minimum limit as well?
We could configure the minimum number of prewarm containers while controlling the ratio of them based on usage.
Regarding "So how about introducing a minimum limit as well?": you mean adding minCount to runtimes.json?
Regarding "We can configure the minimum number of prewarm containers while controlling the ratio of them based on the usage": what does that mean? Can the minimum number change in some cases?
It is true that without a minimum, the prewarms may all go to zero, which is what you need for blackbox-dedicated invokers if you don't want any prewarms running there.
So the stemcell config could have:
- initialCount: this should probably be whatever value is used today for the number of prewarms of that config.
- minCount: 0 is required if you want to prevent unused prewarms from wasting resources, but it implies that you will get a cold start for that config on that invoker.
- maxCount: an upper limit on the number of prewarms for that config.
For future work, I would also consider:
- Does this invoker support only blackbox? If so, it can safely scale down to 0 prewarms, since blackbox actions are never prewarmed.
- Can new prewarm configs be completely inferred, so that if lots of actions of a particular config arrive, they eventually get the benefits of having prewarms?
- Can blackbox actions then be allowed to have prewarms, if they receive enough traffic at this invoker?
Already added minCount in runtimes.json
calendar.add(Calendar.MINUTE, -1)
val lastDate = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(calendar.getTime)
val key = s"${kind},${memory.toMB},${lastDate}"
coldStartCount.get(key) match {
Can we use coldStartCount.remove(key)?
If we changed from immutable.Map to TrieMap, we could use coldStartCount.remove. But here coldStartCount can only be accessed from a single thread because of the actor, so I think using immutable.Map is OK.
+1 for immutable.Map
val calendar = Calendar.getInstance()
calendar.add(Calendar.MINUTE, -1)
val lastDate = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(calendar.getTime)
val key = s"${kind},${memory.toMB},${lastDate}"
I think it would be better to use a typed key such as a case class rather than a String key with multiple fields.
ok
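As a hypothetical illustration of such a typed key (not the PR's exact code; the version that eventually lands in the PR appears later in this thread):

// A typed key replacing the "kind,memoryMB,time" string key. The real version
// uses OpenWhisk's ByteSize for the memory field.
case class ColdStartKey(kind: String, memoryMb: Long)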
@@ -80,8 +86,59 @@ class ContainerPool(childFactory: ActorRefFactory => ActorRef,
//periodically emit metrics (don't need to do this for each message!)
context.system.scheduler.schedule(30.seconds, 10.seconds, self, EmitMetrics)

// Key is `kind,memory,time`, value is the number of cold Start in minute
var coldStartCount = immutable.Map.empty[String, Int]
It seems this map can be used from multiple threads; should we use a concurrent map instead?
Oh, really? I think coldStartCount can only be accessed from a single thread.
immutable.Map should be fine if it is only modified during actor message processing
Oh OK, I thought it was being accessed by a scheduler thread and the actor thread at the same time, but it is not.
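For illustration, a minimal sketch (illustrative names, not the PR's exact code) of keeping the counter as plain actor state and only updating it while processing a message:

// Actor-local state: safe without synchronization because the actor processes
// one message at a time; scheduled ticks also arrive as ordinary messages.
var coldStartCount = Map.empty[(String, Long), Int]

def incrementColdStartCount(kind: String, memoryMb: Long): Unit = {
  val key = (kind, memoryMb)
  coldStartCount = coldStartCount.updated(key, coldStartCount.getOrElse(key, 0) + 1)
}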
// prewarmed container got removed
case PreWarmedContainerRemoved =>
prewarmedPool.get(sender()).foreach { _ =>
logging.info(this, "prewarmed container is deleted due to unused for long time")
Would be great to include the type of prewarm container.
Already added
@@ -313,12 +391,12 @@ class ContainerPool(childFactory: ActorRefFactory => ActorRef,
}
val startingCount = prewarmStartingPool.count(p => p._2._1 == kind && p._2._2 == memory)
val containerCount = currentCount + startingCount
-if (containerCount < config.count) {
+if (containerCount < config.initialCount) {
config.initialCount is being used multiple times. How about introducing a local variable for this?
Already added
case None => coldStartCount = coldStartCount + (key -> 1)
}
for ((k, v) <- coldStartCount) {
logging.info(this, s"===statistics the cold start, k: ${k}, v: ${v}")
Remove the ===?
Already removed
}

/** statistics the cold start */
def countColdStart(kind: String, memoryLimit: ByteSize): Unit = {
It seems this is to increase the count of cold starts. How about renaming it to something more intuitive such as incrementColdStartCount?
Already changed to incrementColdStartCount
}
.foreach { _ =>
val time = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(Calendar.getInstance().getTime)
val key = s"${kind},${memoryLimit.toMB},${time}"
Same as above: a typed key is preferred.
Changed
context.system.scheduler.schedule(10.seconds, 1.minute, self, DeleteUnusedPrewarmedContainer)

// check periodically for the cold start and create some increment containers automatically if activation >= threshold
context.system.scheduler.schedule(1.minute, 1.minute, self, SupplementPrewarmedContainer)
Use the same scheduled message to do both the delete and the supplement.
Already using the same message.
@@ -146,7 +146,7 @@ case class PreWarmedData(override val container: Container,
kind: String,
override val memoryLimit: ByteSize,
override val activeActivationCount: Int = 0)
-extends ContainerStarted(container, Instant.EPOCH, memoryLimit, activeActivationCount)
+extends ContainerStarted(container, Instant.now(), memoryLimit, activeActivationCount)
Overloading the lastUsed property is confusing. I would move that property to the ContainerInUse trait and create a dedicated property in PreWarmedData for tracking an expiration Instant or Deadline, so that every scheduled period can easily check (without recomputing the expiration) whether the expiration is in the past.
Maybe your approach (moving that property to the ContainerInUse trait and creating a dedicated property in PreWarmedData for tracking an expiration Instant or Deadline) is good, but when I tried to write it that way, I couldn't get it to work well. Can you check it locally?
I think it takes refactoring some or most of the uses of the generic ContainerData type. For now you can just:
- change prewarmedPool to type var prewarmedPool = immutable.Map.empty[ActorRef, PreWarmedData]
- add a val created = Instant.now() to the PreWarmedData type
Then leave the ContainerData.lastUsed property for use when the container is actually allocated to an action.
Updated accordingly
for ((container, data) <- containers) {
if (JDuration.between(data.lastUsed, Instant.now).compareTo(JDuration.ofSeconds(ttlSeconds)) > 0) {
// Don't recover a new one under this situation
container ! RemovePreWarmedContainer
Can you just use the Remove message here?
If we use the Remove message to remove an unused prewarmed container, another prewarmed container gets created after the unused one is deleted.
The logic is as follows: first, this code is executed: https://github.com/apache/openwhisk/blob/master/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/ContainerProxy.scala#L376; then, after context.parent ! ContainerRemoved is executed in def destroyContainer, this code is executed: https://github.com/apache/openwhisk/blob/master/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/ContainerPool.scala#L272
So I used another message, RemovePreWarmedContainer, to distinguish the two cases. It has another benefit: after the prewarmed container is deleted, its details can be printed via the prewarmedData field of PreWarmedContainerRemoved(prewarmedData: PreWarmedData).
I see. In that case, I suggest changing case object ContainerRemoved to case class ContainerRemoved(replacePrewarm: Boolean), so that we can limit the duplicated logic in ContainerProxy. Would that work?
Yes, that works; updated accordingly.
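A rough sketch of how the pool side of that flag could look (hypothetical and simplified; the actual ContainerPool handling differs):

// The flag tells the pool whether the removed container should trigger a
// prewarm backfill (normal removals) or not (expired prewarms being culled).
case class ContainerRemoved(replacePrewarm: Boolean)

// Inside ContainerPool's receive (sketch):
// case ContainerRemoved(replacePrewarm) =>
//   freePool = freePool - sender()
//   prewarmedPool = prewarmedPool - sender()
//   if (replacePrewarm) backfillPrewarms(false)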
@@ -375,6 +377,8 @@ class ContainerProxy(factory: (TransactionId,

case Event(Remove, data: PreWarmedData) => destroyContainer(data)

case Event(RemovePreWarmedContainer, data: PreWarmedData) => destoryPreWarmedContainer(data)
I think using the Remove message and the existing destroyContainer() function is simpler?
Updated accordingly
Force-pushed from cf30f7c to d6a6181
val currentCount = prewarmedPool.count {
case (_, PreWarmedData(_, `kind`, `memory`, _)) => true //done starting
case _ => false //started but not finished starting
}
val startingCount = prewarmStartingPool.count(p => p._2._1 == kind && p._2._2 == memory)
val containerCount = currentCount + startingCount
-if (containerCount < config.count) {
+if (containerCount < initialCount) {
I think this should be something like:
val requiredCount = if (init) { initialCount } else { minCount }
if (containerCount < requiredCount) {
Otherwise, if min is 0 and initial is 2, we can never scale down to 0.
Updated accordingly
// check periodically every 1 minute, adjust prewarmed container(delete if unused for some time and create some increment containers)
context.system.scheduler.schedule(1.minute, 1.minute, self, AdjustPrewarmedContainer)

def adjustPrewarmedContainer(): Unit = {
Can you DRY up the code in adjustPrewarmedContainer and backfillPrewarms, possibly using the same function for both?
Hmm, it seems adjustPrewarmedContainer's logic is not the same as backfillPrewarms, and it seems the common code inside them can't easily be extracted either.
There is a lot of duplicated delicate logic - these need to be consolidated, I think.
Here is a proposed version that:
- consolidates backfillPrewarms() and adjustPrewarmedContainer()
- removes prewarmContainerIfpossible()
def adjustPrewarmedContainer(init: Boolean, scheduled: Boolean): Unit = {
prewarmConfig.foreach { config =>
// Delete unused prewarmed container until minCount is reached
val kind = config.exec.kind
val memory = config.memoryLimit
val runningCount = prewarmedPool.count {
//done starting, and not expired
case (_, p @ PreWarmedData(_, `kind`, `memory`, _, _)) if !p.isExpired() => true
//started but not finished starting (or expired)
case _ => false
}
val startingCount = prewarmStartingPool.count(p => p._2._1 == kind && p._2._2 == memory)
val currentCount = runningCount + startingCount
//determine how many are needed
val desiredCount: Int =
if (init) config.initialCount
else {
if (scheduled) {
//scheduled/reactive config backfill
config.reactive
.map(c => getReactiveCold(c, kind, memory).getOrElse(c.minCount)) //reactive -> desired is either cold start driven, or minCount
.getOrElse(config.initialCount) //not reactive -> desired is always initial count
} else {
//normal backfill after removal - make sure at least minCount or initialCount is started
config.reactive.map(_.minCount).getOrElse(config.initialCount)
}
}
if (currentCount < desiredCount) {
logging.info(
this,
s"found ${currentCount} started and ${startingCount} starting; ${if (init) "initing" else "backfilling"} ${desiredCount - currentCount} pre-warms to desired count: ${desiredCount} for kind:${config.exec.kind} mem:${config.memoryLimit.toString}")(
TransactionId.invokerWarmup)
(currentCount until desiredCount).foreach { _ =>
prewarmContainer(config.exec, config.memoryLimit, config.reactive.map(_.ttl))
}
}
//remove expired
config.reactive.foreach { reactiveValue =>
prewarmedPool
.filter { warmInfo =>
warmInfo match {
case (_, p @ PreWarmedData(_, `kind`, `memory`, _, _)) if p.isExpired() => true
case _ => false
}
}
.drop(reactiveValue.minCount) //keep minCount even if expired
.map(_._1 ! Remove)
}
}
if (scheduled) {
//clear coldStartCounts each time scheduled event is processed to reset counts
coldStartCount = immutable.Map.empty[ColdStartKey, Int]
}
}
def getReactiveCold(config: ReactivePrewarmingConfig, kind: String, memory: ByteSize): Option[Int] = {
coldStartCount.get(ColdStartKey(kind, memory)).map { value =>
// Let's assume that threshold is `2`, increment is `1` in runtime.json
// if cold start number in previous minute is `2`, requireCount is `2/2 * 1 = 1`
// if cold start number in previous minute is `4`, requireCount is `4/2 * 1 = 2`
math.min(math.max(config.minCount, (value / config.threshold) * config.increment), config.maxCount)
}
}
Great! The code becomes much clearer. Updated accordingly.
@@ -32,10 +36,14 @@ sealed trait WorkerState
case object Busy extends WorkerState
case object Free extends WorkerState

case class ColdStartKey(kind: String, memory: Long, date: String)
What is date used for?
Memory should be ByteSize for consistency elsewhere.
Updated accordingly
}
case None =>
}
coldStartCount = coldStartCount - coldStartKey
Should this just reset the map to empty? Otherwise this map may grow without bound?
Yes, it resets the map's entry for that kind/memory.
I mean that it might be good to simply reset the map all at once, instead of removing each key separately.
Updated accordingly
/** statistics the cold start */
def incrementColdStartCount(kind: String, memoryLimit: ByteSize): Unit = {
prewarmConfig
I would clarify that this is only for cold starts of prewarm configs, e.g. not blackbox or other configs. Maybe just a comment is fine, but this is very important.
Updated accordingly
kind == config.exec.kind && memoryLimit == config.memoryLimit
}
.foreach { _ =>
val time = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(Calendar.getInstance().getTime)
We don't need this time field in the key, I think.
Yes, you are right. Because our schedule interval is one minute, we can remove it.
prewarmContainerIfpossible(kind, memory, count)
} else {
// at lease create 1 prewarmed container
prewarmContainerIfpossible(kind, memory, 1)
I don't think this else should be here. It is not clear from the configuration that when value == 5 and threshold == 5 you get 5/increment created, but when value == 1, 2, 3, or 4 you get 1 created. It should just always use math.max(1, value/increment).
Updated accordingly, you can check it again ^^
memoryLimit: ByteSize,
ttl: Duration,
threshold: Int,
increment: Int)
I think it may be good to separate the dynamic behavior into a separate config, which is disabled by default so that operators need to opt in to this new behavior, e.g.
case class PrewarmingConfig(initialCount: Int,
                            exec: CodeExec[_],
                            memoryLimit: ByteSize,
                            reactiveConfig: Option[ReactivePrewarmingConfig] = None)

case class ReactivePrewarmingConfig(minCount: Int,
                                    maxCount: Int,
                                    ttl: Duration,
                                    threshold: Int,
                                    increment: Int)
Add some comments to indicate what the relationship between threshold and increment is, i.e. how they determine the number of prewarms created and when.
Great suggestion, updated accordingly.
val memory = config.memoryLimit
val ttlSeconds = config.ttl.toSeconds
val minCount = config.minCount
val containers = prewarmedPool.filter { warmInfo =>
I suggest:
- change PreWarmedData.created: Instant to PreWarmedData.expires: Option[Deadline]
- add to PreWarmedData: def isExpired(): Boolean = expires.exists(_.isOverdue())
- simplify this removal logic to:
prewarmedPool
  .filter { warmInfo =>
    warmInfo match {
      case (_, p @ PreWarmedData(_, `kind`, `memory`, _, _)) if p.isExpired() => true
      case _ => false
    }
  }
  .drop(config.minCount)
  .foreach(_._1 ! Remove)
Updated accordingly.
BTW I think the
Force-pushed from 3d21308 to 7b761ce
val coldStartValue = coldStart._2
MetricEmitter.emitHistogramMetric(
LoggingMarkers.CONTAINER_POOL_COLDSTART(coldStartKey.memory.toString, coldStartKey.kind),
coldStartValue)
I would name this metric "prewarmColdstart" - to not confuse it with containerStart (which has a tag for containerState). These values are not just cold starts - but cold starts that matched prewarm configs (and did not have a prewarm available)
updated accordingly
}
// emit expired container counter metric with memory + kind
val removedCount = expiredPrewarmedContainer.size
MetricEmitter.emitHistogramMetric(LoggingMarkers.CONTAINER_POOL_EXPIRED(memory.toString, kind), removedCount)
I would name this metric "prewarmExpired"
Also, these metrics should be counters not histograms, I think.
updated accordingly
stream.reset()

// Make sure prewarmed containers can be deleted from prewarmedPool due to unused
Thread.sleep(3.seconds.toMillis)
should sleep an amount based on ttl (e.g. ttl + 1.millis)
updated accordingly
@@ -67,6 +68,9 @@ class ContainerPoolTests
// the values is done properly.
val exec = CodeExecAsString(RuntimeManifest("actionKind", ImageName("testImage")), "testCode", None)
val memoryLimit = 256.MB
val ttl = FiniteDuration(2, TimeUnit.SECONDS)
I would make ttl shorter - so that test runs don't take longer than necessary. Maybe 500.millis?
Great! updated accordingly
containers(1).send(pool, ContainerRemoved(false))

// Make sure prewarmed containers are deleted
Thread.sleep(2.seconds.toMillis)
I would leave out this sleep (and the log check) - receiving the Remove message should be enough to make sure it is removed
updated accordingly
pool ! AdjustPrewarmedContainer
// Make sure adjustPrewarmContainer run finished
Thread.sleep(2.seconds.toMillis)
I think you can avoid most of the Thread.sleep usage, e.g. use eventually or similar:
eventually {
  stream.toString should include(s"currentCount: 0 prewarmed container")
}
Great! updated accordingly
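For reference, a minimal sketch of the ScalaTest setup this implies (assumed, not this PR's test code): mix in Eventually and configure a patience window, so eventually retries the assertion instead of sleeping a fixed amount.

import org.scalatest.concurrent.Eventually
import org.scalatest.time.{Millis, Seconds, Span}

// Sketch only: a mixin so `eventually { ... }` polls the enclosed assertion
// until it passes or the patience timeout is exceeded.
trait EventuallyPatience extends Eventually {
  implicit override val patienceConfig: PatienceConfig =
    PatienceConfig(timeout = Span(5, Seconds), interval = Span(100, Millis))
}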
Thread.sleep(ttl.toMillis)
pool ! AdjustPrewarmedContainer
// Make sure adjustPrewarmContainer run finished
Thread.sleep(2.seconds.toMillis)
can you remove this sleep?
removed accordingly
// Make sure the created prewarmed containers are expired
stream.reset()
Thread.sleep(ttl.toMillis)
pool ! AdjustPrewarmedContainer
Shouldn't the pool send AdjustPrewarmedContainer to itself eventually, after the poolConfig.prewarmExpiredCheckPeriod?
I think it is needed, because the prewarmExpirationCheckInterval is 1 minute:
def poolConfig(userMemory: ByteSize) =
  ContainerPoolConfig(userMemory, 0.5, false, FiniteDuration(1, TimeUnit.MINUTES))
so the test needs to send AdjustPrewarmedContainer to the pool manually.
The reason I didn't change the value above to something smaller is that I want readers of the test code to see the details clearly.
But the test is written in a way that suggests the AdjustPrewarmedContainer message is explicitly sent by the production logic, when it never is; it is only ever sent via the schedule. So it would be clearer to rely on the timing (with a shorter value) and see that the results of the message occur without interference from the test logic.
Updated accordingly.
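A sketch of what that could look like in the test setup, reusing the poolConfig helper quoted above with a much shorter check interval (the ContainerPoolConfig constructor here follows the snippet above and may differ from the final PR):

import java.util.concurrent.TimeUnit
import scala.concurrent.duration.FiniteDuration

// Sketch: with a sub-second expiration check interval, the pool's own schedule
// sends AdjustPrewarmedContainer, so the test only needs to wait (e.g. via
// `eventually`) instead of sending the message itself.
def poolConfig(userMemory: ByteSize) =
  ContainerPoolConfig(userMemory, 0.5, false, FiniteDuration(500, TimeUnit.MILLISECONDS))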
@@ -278,6 +278,7 @@
"CONFIG_whisk_invoker_https_keystorePassword": "{{ invoker.ssl.keystore.password }}"
"CONFIG_whisk_invoker_https_keystoreFlavor": "{{ invoker.ssl.storeFlavor }}"
"CONFIG_whisk_invoker_https_clientAuth": "{{ invoker.ssl.clientAuth }}"
"CONFIG_whisk_containerPool_prewarmExpiredCheckPeriod": "{{ container_pool_prewarmExpiredCheckPeriod | default('1 minute') }}"
It's a bit picky, but how about changing the name to a more intuitive one such as container_pool_prewarm_expirationCheckInterval?
Great, updated accordingly.
.orElse(fields.get("count"))
.map(_.convertTo[Int])
.get
val memory = fields.get("memory").map(_.convertTo[ByteSize]).get
I would avoid using get on Option. Since these configurations are required, I think we can do something similar to this:
Suggested change:
-val memory = fields.get("memory").map(_.convertTo[ByteSize]).get
+override def read(value: JsValue): StemCell = {
+  val fields = value.asJsObject.fields
+  val initialCount: Option[Int] =
+    fields
+      .get("initialCount")
+      .orElse(fields.get("count"))
+      .map(_.convertTo[Int])
+  val memory: Option[ByteSize] = fields.get("memory").map(_.convertTo[ByteSize])
+  val config = fields.get("reactive").map(_.convertTo[ReactivePrewarmingConfig])
+  (initialCount, memory) match {
+    case (Some(c), Some(m)) => StemCell(c, m, config)
+    case (Some(c), None) => throw new IllegalArgumentException(s"blah blah: $c")
+    case (None, Some(m)) => throw new IllegalArgumentException(s"blah blah: $m")
+    case _ => throw new IllegalArgumentException(s"blah blah")
+  }
updated accordingly.
(py.kind, py.image, 2, 256.MB))
}

it should "parse manifest with reactive from JSON string" in {
Add another test case for "parse manifest with reactive from JSON string".
@ningyougang Thanks for all your work on this! I think this PR is close to done - do you want to solicit feedback on the dev list once more to let other people chime in on this before it gets merged?
Force-pushed from 2db7514 to 5ae5fa4
Codecov Report
@@ Coverage Diff @@
## master #4871 +/- ##
==========================================
- Coverage 83.19% 77.25% -5.95%
==========================================
Files 201 201
Lines 9351 9441 +90
Branches 397 396 -1
==========================================
- Hits 7780 7294 -486
- Misses 1571 2147 +576
Continue to review full report at Codecov.
LGTM 🎉 Thanks! Will wait a day for any additional feedback, and merge tomorrow if nobody else does it first.
Description
As we already know, when the invoker starts up, it creates some prewarmed containers in the prewarmed pool in advance.
If those prewarmed containers are not used for some time, they will exist forever, which may lead to a waste of resources, so there is a need for some mechanism to save memory resources, e.g. deleting prewarmed containers that stay unused past a TTL and re-creating them reactively when cold starts occur for that kind/memory.
Note: currently I didn't add test cases; after some reviews, I will add them.
Related issue and scope
#4725 (comment)
My changes affect the following components
Types of changes
Checklist: