
Adjust prewarm container dynamically #4871

Merged

Conversation

ningyougang
Contributor

@ningyougang ningyougang commented Apr 2, 2020

Description

When the invoker starts up, it creates some prewarmed containers in the prewarmed pool in advance.

Currently, if those prewarmed containers are never used, they stay around forever,
which can waste resources.

So we need some mechanism to save memory resources, e.g.:

  • If a prewarmed container is not used for some time, it should be deleted automatically.
  • When a cold start happens, we should supplement a prewarmed container matching that kind/memory at runtime.

Note: I haven't added test cases yet; I will add them after some reviews.
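For illustration, a stemcell entry in runtimes.json could then carry the extra knobs discussed below (field names are taken from the diffs in this thread; the exact layout changes during review):

  "stemCells": [
    {
      "count": 2,
      "memory": "256 MB",
      "ttlMinutes": 10,
      "threshold": 5,
      "increment": 1
    }
  ]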

Related issue and scope

#4725 (comment)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@@ -58,7 +58,8 @@
"stemCells": [
{
"count": 2,
"memory": "256 MB"
"memory": "256 MB",
"ttlMinutes": 10
Member

How about adding finer granularity by changing the unit to seconds?

Contributor Author

@ningyougang ningyougang Apr 3, 2020

It seems the unit of seconds is very small and the unit of hours is too big,
so I chose minutes (second < minute < hour).

Does it make sense if we use seconds instead?

Contributor

Why not FiniteDuration? So that operators can make the value whatever is appropriate for them.
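For example (a minimal sketch, not code from this PR), a duration string from the config maps onto one type regardless of the unit the operator picks:

  import scala.concurrent.duration._

  // "10 minutes", "600 seconds" and "1 hour" all parse into the same Duration type,
  // so the operator chooses the granularity rather than the schema.
  val ttl: Duration = Duration("10 minutes")
  assert(ttl == 10.minutes)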

Contributor Author

@tysonnorris, got it.

Contributor Author

Already changed to Duration

}
}
for ((container, data) <- containers) {
if (Duration.between(data.lastUsed, Instant.now).compareTo(Duration.ofMinutes(ttlMinutes)) > 0) {
Member

If there is no activation for a prewarm kind within the given TTL, the next invocation will always be a cold start.
Also, even if we can add more prewarm containers when cold starts happen, I think it may not fit well for some workloads. If load spikes for some runtime kind, it will cause a lot of cold starts, and the pool will try to create many prewarm containers at the same time. As we know, Docker operations are quite expensive, so it will end up with a huge number of container creations along with long wait times.

Contributor Author

@ningyougang ningyougang Apr 3, 2020

Yes, I agree with you.

I introduced a new mechanism to avoid those cold starts:
added threshold and increment to runtimes.json.
After prewarmed containers have been deleted because they were unused,
if the number of activations for that kind/memory in the previous minute is >= threshold, create increment prewarmed containers.

Refer to the newly added commit: 8c36f33

Contributor Author

@style95 already fixed

@ningyougang ningyougang force-pushed the adjust-prewarm-container-dynamically branch 3 times, most recently from 2fd4dd1 to 8c36f33 Compare April 8, 2020 05:48
.foreach { _ =>
val time = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(Calendar.getInstance().getTime)
val key = s"${kind},${memoryLimit.toMB},${time}"
coldStartCount.getOrElseUpdate(key, new AtomicInteger(0)).getAndIncrement()
Contributor

Does this need AtomicInteger? Container allocation should happen in actor message processing, which is single threaded

Contributor Author

Already fixed

@@ -80,8 +85,49 @@ class ContainerPool(childFactory: ActorRefFactory => ActorRef,
//periodically emit metrics (don't need to do this for each message!)
context.system.scheduler.schedule(30.seconds, 10.seconds, self, EmitMetrics)

// Key is `kind,memory,time`, value is the number of cold starts in that minute
val coldStartCount = TrieMap[String, AtomicInteger]()
Contributor

This can be a var immutable.Map instead, I think?

Contributor

@tysonnorris tysonnorris Apr 8, 2020

Actually, I would probably not create a separate map here, but rather add expiration: Option[Deadline] to the PreWarmedData class. Then you can track expiration AND more easily filter to find expired prewarms.
EDIT: sorry, to be clear: the expiration is required to remove unused ones, but there is a separate issue that needs to be tracked, which is a prewarm "miss", where the action matches a prewarm config but no prewarm was available. For the miss, you can use an immutable.Map that is reset every minute (or a configurable time period), without tracking timestamps or expirations there.
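A minimal sketch of that idea (simplified fields; the real PreWarmedData also carries the container, kind and memory limit):

  import scala.concurrent.duration._

  case class PreWarmedData(kind: String,
                           memoryMB: Long,
                           expires: Option[Deadline] = None) {
    // a prewarm without a configured TTL never expires
    def isExpired(): Boolean = expires.exists(_.isOverdue())
  }

  // a prewarm created with a 10 minute TTL
  val data = PreWarmedData("nodejs:10", 256, Some(10.minutes.fromNow))
  // expired prewarms can then be filtered out directly, without recomputing durations
  val expired = List(data).filter(_.isExpired())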

Contributor Author

Already fixed

@tysonnorris
Contributor

Please also update the ContainerPool.emitMetrics() function to include count per stemcell config, instead of just aggregate count of all stemcells - since with fluctuating numbers, these values will get very confusing.
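A rough sketch of the per-config counting (stand-in types; the real pool maps ActorRef -> PreWarmedData and memory is a ByteSize):

  case class PreWarm(kind: String, memoryMB: Long)

  val prewarmedPool = Map(
    "c1" -> PreWarm("nodejs:10", 256),
    "c2" -> PreWarm("nodejs:10", 256),
    "c3" -> PreWarm("python:3", 256))

  // one count per stemcell config instead of a single aggregate count,
  // which emitMetrics() can then report per (kind, memory) pair
  val countsPerConfig: Map[(String, Long), Int] =
    prewarmedPool.values
      .groupBy(d => (d.kind, d.memoryMB))
      .map { case (cfg, entries) => cfg -> entries.size }
  // Map((nodejs:10,256) -> 2, (python:3,256) -> 1)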

"memory": "256 MB",
"ttlMinutes": 10,
"threshold": 5,
"increment": 1
Contributor

In this scenario, I think "count" behaves like "initialCount", right? So now I think we should also have a "maxCount" so that additional prewarms cannot be created past the maxCount limit.

Contributor

We could consider renaming "count" to "initialCount", but I'm not sure this is an issue for backwards compatibility.

Contributor Author

Already fixed

@ningyougang ningyougang force-pushed the adjust-prewarm-container-dynamically branch 3 times, most recently from 98e06dd to cf30f7c Compare April 11, 2020 09:51
@ningyougang ningyougang reopened this Apr 12, 2020
@ningyougang
Contributor Author

@style95 @tysonnorris
Do you have any other review comments?

backfillPrewarms(true)

// check periodically every 1 minute, delete prewarmed container if unused for sometime
context.system.scheduler.schedule(10.seconds, 1.minute, self, DeleteUnusedPrewarmedContainer)
Member

So the main idea is removing unused prewarm containers every minute and creating more prewarm containers based on the usage during the last minute.

I think it is a good starting point.
This is a kind of heuristic approach, predicting the future based on history.

One thing to consider is that we can't exactly predict the required prewarm types.
Any kind of activation can come at any time.
If an action is sparsely invoked, or invoked with a bigger interval than the prewarm TTL, it will always trigger a cold start.
So how about introducing a minimum limit as well?
We can configure the minimum number of prewarm containers while controlling the ratio of them based on the usage.

Contributor Author

Regarding "So how about introducing a minimum limit as well?"
You mean adding minCount to runtimes.json?

Regarding "We can configure the minimum number of prewarm containers while controlling the ratio of them based on the usage."
What does that mean? Can the minimum number be changed in some cases?

Contributor

It is true that without a minimum, the prewarms may all go to zero, which is what you need on blackbox-dedicated invokers if you don't want any prewarms running there.
So the stemcell config could have:
initialCount: this should probably be whatever value is used today for the number of prewarms of that config.
minCount: 0 is required if you want to prevent the case where unused prewarms waste resources, but it implies that you will get a cold start for that config on that invoker.
maxCount: this is an upper limit on the max number of prewarms for that config.

Contributor

For future work, I would also consider:

  • Does this invoker support only blackbox? If so, it can safely scale down to 0 prewarms, since blackbox are never prewarmed.
  • Can new prewarm configs be completely inferred? So that if lots of actions of a particular config arrive, they eventually get benefits of having prewarms.
  • Can blackbox then be allowed to have prewarms, if they receive enough traffic at this invoker?

Contributor Author

Already added minCount in runtimes.json

calendar.add(Calendar.MINUTE, -1)
val lastDate = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(calendar.getTime)
val key = s"${kind},${memory.toMB},${lastDate}"
coldStartCount.get(key) match {
Member

Can we use coldStartCount.remove(key)?

Contributor Author

If we changed from immutable.Map to TrieMap, we could use coldStartCount.remove.
But here coldStartCount is only accessed from a single thread because of the actor, so I think using immutable.Map is OK.

Contributor

+1 for immutable.Map

val calendar = Calendar.getInstance()
calendar.add(Calendar.MINUTE, -1)
val lastDate = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(calendar.getTime)
val key = s"${kind},${memory.toMB},${lastDate}"
Member

I think it would be better to use a typed key such as a case class rather than a String key with multiple fields.
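For example (a simplified sketch; the PR's version uses ByteSize for memory and eventually drops the time component):

  case class ColdStartKey(kind: String, memoryMB: Long)

  var coldStartCount = Map.empty[ColdStartKey, Int]

  def incrementColdStartCount(kind: String, memoryMB: Long): Unit = {
    val key = ColdStartKey(kind, memoryMB)
    coldStartCount = coldStartCount + (key -> (coldStartCount.getOrElse(key, 0) + 1))
  }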

Contributor Author

ok

@@ -80,8 +86,59 @@ class ContainerPool(childFactory: ActorRefFactory => ActorRef,
//periodically emit metrics (don't need to do this for each message!)
context.system.scheduler.schedule(30.seconds, 10.seconds, self, EmitMetrics)

// Key is `kind,memory,time`, value is the number of cold starts in that minute
var coldStartCount = immutable.Map.empty[String, Int]
Member

It seems this map can be used from multiple threads; should we use a concurrent map instead?

Contributor Author

Oh, really? I think coldStartCount is only accessed from a single thread.

Contributor

immutable.Map should be fine if it is only modified during actor message processing

Member

Oh, OK. I thought it was being accessed by a scheduler thread and the actor thread at the same time,
but it is not.

// prewarmed container got removed
case PreWarmedContainerRemoved =>
prewarmedPool.get(sender()).foreach { _ =>
logging.info(this, "prewarmed container is deleted due to unused for long time")
Member

Would be great to include the type of prewarm container.

Contributor Author

Already added

@@ -313,12 +391,12 @@ class ContainerPool(childFactory: ActorRefFactory => ActorRef,
}
val startingCount = prewarmStartingPool.count(p => p._2._1 == kind && p._2._2 == memory)
val containerCount = currentCount + startingCount
if (containerCount < config.count) {
if (containerCount < config.initialCount) {
Member

config.initialCount is being used multiple times.
How about introducing a local variable for this?

Contributor Author

Already added

case None => coldStartCount = coldStartCount + (key -> 1)
}
for ((k, v) <- coldStartCount) {
logging.info(this, s"===statistics the cold start, k: ${k}, v: ${v}")
Member

Removing ===?

Contributor Author

Already removed

}

/** statistics the cold start */
def countColdStart(kind: String, memoryLimit: ByteSize): Unit = {
Member

It seems this is to increase a count for the number of cold-starts.
How about renaming this to something more intuitive such as incrementColdStartCount?

Contributor Author

Already changed to incrementColdStartCount

}
.foreach { _ =>
val time = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(Calendar.getInstance().getTime)
val key = s"${kind},${memoryLimit.toMB},${time}"
Member

Same with the above.
A typed key is preferred.

Contributor Author

Changed

context.system.scheduler.schedule(10.seconds, 1.minute, self, DeleteUnusedPrewarmedContainer)

// check periodically for the cold start and create some increment containers automatically if activation >= threshold
context.system.scheduler.schedule(1.minute, 1.minute, self, SupplementPrewarmedContainer)
Contributor

Use same scheduled message to do both delete + supplement

Contributor Author

Already used same message.

@@ -146,7 +146,7 @@ case class PreWarmedData(override val container: Container,
kind: String,
override val memoryLimit: ByteSize,
override val activeActivationCount: Int = 0)
extends ContainerStarted(container, Instant.EPOCH, memoryLimit, activeActivationCount)
extends ContainerStarted(container, Instant.now(), memoryLimit, activeActivationCount)
Contributor

Overloading the lastUsed property is confusing - I would move that property to the ContainerInUse trait, and create a dedicated property in PreWarmedData for tracking an expiration Instant or Deadline, so that every scheduled period can easily check (without calculating the expiration) whether the expiration is in the past.

Contributor Author

Your approach ("I would move that property to the ContainerInUse trait, and create a dedicated property in PreWarmedData for tracking an expiration Instant or Deadline") sounds good,

but when I tried to write it that way, it seems I couldn't get it to work well. Can you check it locally?

Contributor

I think it takes refactoring some or most of the uses of "ContainerData" generic type. For now you can just:

  • change prewarmedPool to type var prewarmedPool = immutable.Map.empty[ActorRef, PreWarmedData]
  • add a val created = Instant.now() to PrewarmedData type
    Then leave the ContainerData.lastUsed property for use when the container is actually allocated to an action.

Contributor Author

Updated accordingly

for ((container, data) <- containers) {
if (JDuration.between(data.lastUsed, Instant.now).compareTo(JDuration.ofSeconds(ttlSeconds)) > 0) {
// Don't recover a new one under this situation
container ! RemovePreWarmedContainer
Contributor

Can you just use Remove message here?

Contributor Author

@ningyougang ningyougang Apr 23, 2020

If we use the Remove message to remove an unused prewarmed container, it leads to a new prewarmed container being created after the unused one is deleted.

The logic is as follows:
first, the code here is executed: https://github.com/apache/openwhisk/blob/master/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/ContainerProxy.scala#L376; then, after context.parent ! ContainerRemoved is executed in destroyContainer, the code here is executed: https://github.com/apache/openwhisk/blob/master/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/ContainerPool.scala#L272

So I used another message, RemovePreWarmedContainer, to distinguish the two cases. It has another benefit: after the prewarmed container is deleted, we can print the container's details via the prewarmedData field of PreWarmedContainerRemoved(prewarmedData: PreWarmedData).

Contributor

I see. In that case, I suggest changing case object ContainerRemoved to case class ContainerRemoved(replacePrewarm: Boolean) so that we can limit the duplicated logic in ContainerProxy - would that work?
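A small sketch of that shape (the method name onContainerRemoved is just for illustration; in the PR the ContainerPool actor handles the message in its receive block):

  case class ContainerRemoved(replacePrewarm: Boolean)

  def onContainerRemoved(msg: ContainerRemoved): Unit =
    if (msg.replacePrewarm) {
      // normal container removal: top the prewarm pool back up
      println("backfill prewarms")
    } else {
      // removal of an expired prewarm: skip backfilling so the pool can actually shrink
      println("skip backfill")
    }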

Contributor Author

Yes, that works; updated accordingly.

@@ -375,6 +377,8 @@ class ContainerProxy(factory: (TransactionId,

case Event(Remove, data: PreWarmedData) => destroyContainer(data)

case Event(RemovePreWarmedContainer, data: PreWarmedData) => destoryPreWarmedContainer(data)
Contributor

I think using Remove message and existing destroyContainer() function is simpler?

Contributor Author

@ningyougang ningyougang Apr 23, 2020

Updated accordingly

@ningyougang ningyougang force-pushed the adjust-prewarm-container-dynamically branch from cf30f7c to d6a6181 Compare April 23, 2020 09:10
val currentCount = prewarmedPool.count {
case (_, PreWarmedData(_, `kind`, `memory`, _)) => true //done starting
case _ => false //started but not finished starting
}
val startingCount = prewarmStartingPool.count(p => p._2._1 == kind && p._2._2 == memory)
val containerCount = currentCount + startingCount
if (containerCount < config.count) {
if (containerCount < initialCount) {
Contributor

@tysonnorris tysonnorris Apr 23, 2020

I think this should be something like:

val requiredCount = if (init) { initialCount } else {minCount}
if (containerCount < requiredCount) {

Otherwise if min is 0 and initial is 2, we can never scale down to 0.

Contributor Author

Updated accordingly

// check periodically every 1 minute, adjust prewarmed container(delete if unused for some time and create some increment containers)
context.system.scheduler.schedule(1.minute, 1.minute, self, AdjustPrewarmedContainer)

def adjustPrewarmedContainer(): Unit = {
Contributor

Can you DRY up the code in adjustPrewarmedContainer and backfillPrewarms, possibly using the same function for both?

Contributor Author

Hmm... it seems adjustPrewarmedContainer's logic is not the same as backfillPrewarms, and internally it doesn't seem possible to extract the common code either.

Contributor

There is a lot of duplicated, delicate logic - it needs to be consolidated, I think.

Contributor

Here is a proposed version that:

  • consolidates backfillPrewarms() and adjustPrewarmedContainer()
  • removes prewarmContainerIfpossible()
  def adjustPrewarmedContainer(init: Boolean, scheduled: Boolean): Unit = {
    prewarmConfig.foreach { config =>
      // Delete unused prewarmed container until minCount is reached
      val kind = config.exec.kind
      val memory = config.memoryLimit

      val runningCount = prewarmedPool.count {
        //done starting, and not expired
        case (_, p @ PreWarmedData(_, `kind`, `memory`, _, _)) if !p.isExpired() => true
        //started but not finished starting (or expired)
        case _ => false
      }
      val startingCount = prewarmStartingPool.count(p => p._2._1 == kind && p._2._2 == memory)
      val currentCount = runningCount + startingCount
      //determine how many are needed
      val desiredCount: Int =
        if (init) config.initialCount
        else {
          if (scheduled) {
            //scheduled/reactive config backfill
            config.reactive
              .map(c => getReactiveCold(c, kind, memory).getOrElse(c.minCount)) //reactive -> desired is either cold start driven, or minCount
              .getOrElse(config.initialCount) //not reactive -> desired is always initial count
          } else {
            //normal backfill after removal - make sure at least minCount or initialCount is started
            config.reactive.map(_.minCount).getOrElse(config.initialCount)
          }
        }
      if (currentCount < desiredCount) {
        logging.info(
          this,
          s"found ${currentCount} started and ${startingCount} starting; ${if (init) "initing" else "backfilling"} ${desiredCount - currentCount} pre-warms to desired count: ${desiredCount} for kind:${config.exec.kind} mem:${config.memoryLimit.toString}")(
          TransactionId.invokerWarmup)
        (currentCount until desiredCount).foreach { _ =>
          prewarmContainer(config.exec, config.memoryLimit, config.reactive.map(_.ttl))
        }

      }

      //remove expired
      config.reactive.foreach { reactiveValue =>
        prewarmedPool
          .filter { warmInfo =>
            warmInfo match {
              case (_, p @ PreWarmedData(_, `kind`, `memory`, _, _)) if p.isExpired() => true
              case _                                                                  => false
            }
          }
          .drop(reactiveValue.minCount) //keep minCount even if expired
          .map(_._1 ! Remove)
      }
    }
    if (scheduled) {
      //clear coldStartCounts each time scheduled event is processed to reset counts
      coldStartCount = immutable.Map.empty[ColdStartKey, Int]
    }
  }

  def getReactiveCold(config: ReactivePrewarmingConfig, kind: String, memory: ByteSize): Option[Int] = {
    coldStartCount.get(ColdStartKey(kind, memory)).map { value =>
      // Let's assume that threshold is `2`, increment is `1` in runtime.json
      // if cold start number in previous minute is `2`, requireCount is `2/2 * 1 = 1`
      // if cold start number in previous minute is `4`, requireCount is `4/2 * 1 = 2`
      math.min(math.max(config.minCount, (value / config.threshold) * config.increment), config.maxCount)
    }
  }

Contributor Author

Great! The code is much clearer now.
Updated accordingly.

@@ -32,10 +36,14 @@ sealed trait WorkerState
case object Busy extends WorkerState
case object Free extends WorkerState

case class ColdStartKey(kind: String, memory: Long, date: String)
Contributor

What is date used for?
Memory should be ByteSize for consistency elsewhere.

Contributor Author

Updated accordingly

}
case None =>
}
coldStartCount = coldStartCount - coldStartKey
Contributor

Should this just reset the map to empty? Otherwise this map may grow without bound?

Contributor Author

Yes, it resets the map's entry for that kind/memory.

Contributor

I mean that it might be good to simply reset the map all at once, instead of removing each key separately.

Contributor Author

Updated accordingly


/** statistics the cold start */
def incrementColdStartCount(kind: String, memoryLimit: ByteSize): Unit = {
prewarmConfig
Contributor

I would clarify that this is only for cold start of prewarm configs, e.g. not blackbox or other configs. Maybe just a comment is fine, but this is very important.

Contributor Author

Updated accordingly

kind == config.exec.kind && memoryLimit == config.memoryLimit
}
.foreach { _ =>
val time = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(Calendar.getInstance().getTime)
Contributor

We don't need this time field in the key, I think.

Contributor Author

Yes, you are right. Because our schedule interval is every minute, we can remove it.

prewarmContainerIfpossible(kind, memory, count)
} else {
// at lease create 1 prewarmed container
prewarmContainerIfpossible(kind, memory, 1)
Contributor

I don't think this else should be here - it is not clear from configuration that when value==5 and threshold==5, you get 5/increment created, but if value== 1,2,3,or4, you get 1 created. It should just always use math.max(1, value/increment)

Contributor Author

Updated accordingly, you can check it again ^^

memoryLimit: ByteSize,
ttl: Duration,
threshold: Int,
increment: Int)
Contributor

I think it may be good to separate the dynamic effects into a separate config, which is disabled by default so that operators need to opt in to this new behavior, e.g.

case class PrewarmingConfig(initialCount: Int,
                            exec: CodeExec[_],
                            memoryLimit: ByteSize,
                            reactiveConfig: Option[ReactivePrewarmingConfig]=None)

case class ReactivePrewarmingConfig(minCount: Int,
                            maxCount: Int,
                            ttl: Duration,
                            threshold: Int,
                            increment: Int)

Add some comments to indicate what the relationship between threshold and increment is, i.e. how they determine the number of prewarms created and when.
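For illustration, a stemcell entry opting in to the reactive behavior could then look roughly like this (values are made up; field names follow the ReactivePrewarmingConfig above and the "reactive" JSON key used later in this thread):

  "stemCells": [
    {
      "initialCount": 2,
      "memory": "256 MB",
      "reactive": {
        "minCount": 1,
        "maxCount": 4,
        "ttl": "2 minutes",
        "threshold": 1,
        "increment": 1
      }
    }
  ]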

Contributor Author

Great suggestion, updated accordingly.

val memory = config.memoryLimit
val ttlSeconds = config.ttl.toSeconds
val minCount = config.minCount
val containers = prewarmedPool.filter { warmInfo =>
Contributor

suggest:

  • change PreWarmedData.created: Instant to PreWarmedData.expires: Option[Deadline]
  • add to PreWarmedData: def isExpired(): Boolean = expires.exists(_.isOverdue())
  • simplifying this removal logic to
      prewarmedPool
        .filter { warmInfo =>
          warmInfo match {
            case (_, p @ PreWarmedData(_, `kind`, `memory`, _, _)) if p.isExpired() => true
            case _                                                                  => false
          }
        }
        .drop(config.minCount)
        .foreach(_._1 ! Remove)

Contributor Author

updated accordingly.

@tysonnorris
Contributor

tysonnorris commented Apr 24, 2020

BTW I think takePrewarmContainer needs enhancement as well, so that the "oldest" prewarm is always used - otherwise, it could end up that a random but recent prewarm is taken, and then all the others suddenly expire and are removed, leaving only the min. But if we always take the oldest, then there will be more left when expiring ones are removed.
e.g. (assuming PrewarmedData.expires is a Option[Deadline] as commented elsewhere)

    val now = Deadline.now
    prewarmedPool
      .toSeq
      .sortBy(_._2.expires.getOrElse(now))
      .find {
      ...

@ningyougang ningyougang force-pushed the adjust-prewarm-container-dynamically branch from 3d21308 to 7b761ce Compare May 14, 2020 09:47
val coldStartValue = coldStart._2
MetricEmitter.emitHistogramMetric(
LoggingMarkers.CONTAINER_POOL_COLDSTART(coldStartKey.memory.toString, coldStartKey.kind),
coldStartValue)
Contributor

I would name this metric "prewarmColdstart" - to not confuse it with containerStart (which has a tag for containerState). These values are not just cold starts - but cold starts that matched prewarm configs (and did not have a prewarm available)

Contributor Author

updated accordingly

}
// emit expired container counter metric with memory + kind
val removedCount = expiredPrewarmedContainer.size
MetricEmitter.emitHistogramMetric(LoggingMarkers.CONTAINER_POOL_EXPIRED(memory.toString, kind), removedCount)
Contributor

I would name this metric "prewarmExpired"

Contributor

Also, these metrics should be counters not histograms, I think.

Contributor Author

updated accordingly

stream.reset()

// Make sure prewarmed containers can be deleted from prewarmedPool due to unused
Thread.sleep(3.seconds.toMillis)
Contributor

should sleep an amount based on ttl (e.g. ttl + 1.millis)

Contributor Author

updated accordingly

@@ -67,6 +68,9 @@ class ContainerPoolTests
// the values is done properly.
val exec = CodeExecAsString(RuntimeManifest("actionKind", ImageName("testImage")), "testCode", None)
val memoryLimit = 256.MB
val ttl = FiniteDuration(2, TimeUnit.SECONDS)
Contributor

I would make ttl shorter - so that test runs don't take longer than necessary. Maybe 500.millis?

Contributor Author

Great! updated accordingly

containers(1).send(pool, ContainerRemoved(false))

// Make sure prewarmed containers are deleted
Thread.sleep(2.seconds.toMillis)
Contributor

I would leave out this sleep (and the log check) - receiving the Remove message should be enough to make sure it is removed

Contributor Author

updated accordingly


pool ! AdjustPrewarmedContainer
// Make sure adjustPrewarmContainer run finished
Thread.sleep(2.seconds.toMillis)
Contributor

I think you can avoid most of the Thread.sleep usage - e.g. use eventually, or similar

eventually { 
  stream.toString should include(s"currentCount: 0 prewarmed container")
}

Contributor Author

Great! updated accordingly

Thread.sleep(ttl.toMillis)
pool ! AdjustPrewarmedContainer
// Make sure adjustPrewarmContainer run finished
Thread.sleep(2.seconds.toMillis)
Contributor

can you remove this sleep?

Contributor Author

@ningyougang ningyougang May 15, 2020

removed accordingly

// Make sure the created prewarmed containers are expired
stream.reset()
Thread.sleep(ttl.toMillis)
pool ! AdjustPrewarmedContainer
Contributor

shouldn't pool send AdjustPrewarmedContainer to itself eventually after the poolConfig.prewarmExpiredCheckPeriod?

Contributor Author

I think it is needed, because the prewarmExpirationCheckInterval is 1 minute:

  def poolConfig(userMemory: ByteSize) =
    ContainerPoolConfig(userMemory, 0.5, false, FiniteDuration(1, TimeUnit.MINUTES))

so we need to send AdjustPrewarmedContainer to the pool manually in the test code.
The reason I didn't change the value above to something smaller is that I wanted readers of the test code to see the details clearly.

Contributor

But the test is written in a way suggesting that the AdjustPrewarmedContainer message is explicitly sent by logic, but it never is - it is only ever sent via schedule. So it would be more clear to rely on the timing (with a shorter value) and see that the results of the message occur without interference of the test logic.

Contributor Author

Updated accordingly.

@@ -278,6 +278,7 @@
"CONFIG_whisk_invoker_https_keystorePassword": "{{ invoker.ssl.keystore.password }}"
"CONFIG_whisk_invoker_https_keystoreFlavor": "{{ invoker.ssl.storeFlavor }}"
"CONFIG_whisk_invoker_https_clientAuth": "{{ invoker.ssl.clientAuth }}"
"CONFIG_whisk_containerPool_prewarmExpiredCheckPeriod": "{{ container_pool_prewarmExpiredCheckPeriod | default('1 minute') }}"
Member

It's a bit picky but how about changing the name to a more intuitive one such as container_pool_prewarm_expirationCheckInterval?

Contributor Author

Great, updated accordingly.

.orElse(fields.get("count"))
.map(_.convertTo[Int])
.get
val memory = fields.get("memory").map(_.convertTo[ByteSize]).get
Member

I would avoid using get on Option.

Since these configurations are required, I think we can do something similar to this.

Suggested change
val memory = fields.get("memory").map(_.convertTo[ByteSize]).get
override def read(value: JsValue): StemCell = {
val fields = value.asJsObject.fields
val initialCount: Option[Int] =
fields
.get("initialCount")
.orElse(fields.get("count"))
.map(_.convertTo[Int])
val memory: Option[ByteSize] = fields.get("memory").map(_.convertTo[ByteSize])
val config = fields.get("reactive").map(_.convertTo[ReactivePrewarmingConfig])
(initialCount, memory) match {
case (Some(c), Some(m)) => StemCell(c, m, config)
case (Some(c), None) => throw new IllegalArgumentException(s"blah blah: $c")
case (None, Some(m)) => throw new IllegalArgumentException(s"blah blah: $m")
case _ => throw new IllegalArgumentException(s"blah blah")
}

Contributor Author

updated accordingly.

(py.kind, py.image, 2, 256.MB))
}

it should "parse manifest with reactive from JSON string" in {
Contributor Author

Added another test case for parsing a manifest with reactive config from a JSON string.

@tysonnorris
Contributor

@ningyougang Thanks for all your work on this! I think this PR is close to done - do you want to solicit feedback on the dev list once more to let other people chime in on this before it gets merged?

@ningyougang ningyougang force-pushed the adjust-prewarm-container-dynamically branch from 2db7514 to 5ae5fa4 Compare May 19, 2020 03:35
@codecov-commenter

codecov-commenter commented May 19, 2020

Codecov Report

Merging #4871 into master will decrease coverage by 5.94%.
The diff coverage is 93.12%.


@@            Coverage Diff             @@
##           master    #4871      +/-   ##
==========================================
- Coverage   83.19%   77.25%   -5.95%     
==========================================
  Files         201      201              
  Lines        9351     9441      +90     
  Branches      397      396       -1     
==========================================
- Hits         7780     7294     -486     
- Misses       1571     2147     +576     
Impacted Files Coverage Δ
...rg/apache/openwhisk/core/entity/ExecManifest.scala 91.83% <78.57%> (-5.39%) ⬇️
.../openwhisk/core/containerpool/ContainerProxy.scala 93.85% <85.00%> (+0.30%) ⬆️
...in/scala/org/apache/openwhisk/common/Logging.scala 76.41% <100.00%> (-8.41%) ⬇️
...penwhisk/core/containerpool/ContainerFactory.scala 87.50% <100.00%> (+0.83%) ⬆️
...e/openwhisk/core/containerpool/ContainerPool.scala 97.45% <100.00%> (+1.80%) ⬆️
...pache/openwhisk/core/invoker/InvokerReactive.scala 79.64% <100.00%> (ø)
...core/database/cosmosdb/RxObservableImplicits.scala 0.00% <0.00%> (-100.00%) ⬇️
...ore/database/cosmosdb/cache/CacheInvalidator.scala 0.00% <0.00%> (-100.00%) ⬇️
...e/database/cosmosdb/cache/ChangeFeedConsumer.scala 0.00% <0.00%> (-100.00%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0.00% <0.00%> (-96.23%) ⬇️
... and 19 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a4122fd...5ae5fa4. Read the comment docs.

Contributor

@tysonnorris tysonnorris left a comment

LGTM 🎉 Thanks! Will wait a day for any additional feedback, and merge tomorrow if nobody else does it first.
