
Understanding autoscaling target and target percentage #15861

Closed
@Sylphe88

Description

Running Knative 1.15 (OpenShift Serverless operator)

I'm struggling to understand how Knative scales when sending requests. I have a pod running a job (concurrency is 1) that takes a few minutes, and I'd like to scale up to 5 replicas. I have this configuration snippet:

spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/max-scale: '5'
        autoscaling.knative.dev/min-scale: '1'
        autoscaling.knative.dev/scale-down-delay: 60s
        autoscaling.knative.dev/target: '1'
        autoscaling.knative.dev/target-utilization-percentage: '50'
        autoscaling.knative.dev/window: 6s
      creationTimestamp: null
    spec:
      containerConcurrency: 1
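
For reference, my reading of the docs (please correct me if I'm wrong) is that the two annotations combine multiplicatively: the autoscaler tries to keep each pod at target * target-utilization-percentage = 1 * 50% = 0.5 in-flight requests, so the desired replica count should be roughly the observed concurrency divided by 0.5, i.e. about two pods per in-flight request, capped at max-scale.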

I have implemented a readiness probe that returns 503 as long as the job is not finished (probed every second).
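
Roughly, the probe is wired up like this (a minimal sketch; the /ready path and the image name are placeholders, not my actual application):

spec:
  template:
    spec:
      containerConcurrency: 1
      containers:
        - image: example.com/my-job-image   # placeholder
          readinessProbe:
            httpGet:
              path: /ready     # placeholder endpoint; returns 503 while a job is running
            periodSeconds: 1   # probe every second
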
In this scenario, here is what happens when I run curl requests every few seconds:

  • 1st curl request: routes the request to the available pod and spawns a new replica --> expected
  • 2nd curl request: routes the request to the available pod but does NOT spawn a new replica --> unexpected (because we're exceeding the target utilization percentage?)
  • 3rd: curl hangs (waits), no additional replicas are requested
  • 4th: the autoscaler spawns two new replicas; the 3rd curl is routed to one of the new pods, while the 4th curl is routed to an existing pod and gets an HTTP 503 (returned by the application), thus bypassing the readiness probe
  • Subsequent curls: routed to the pod that is wrongly reported as ready and result in an application-level 503

Is all of this expected? According to the documentation, the autoscaler should just add one new replica every time I run a curl request, so it is ready for a future request.
