Closed
Description
Running Knative 1.15 (OpenShift Serverless operator)
I'm struggling to understand how Knative scales when sending requests. I have a pod running a job (concurrency is 1) that takes a few minutes, and I'd like to scale out to 5 replicas. I have this configuration snippet:
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/max-scale: '5'
        autoscaling.knative.dev/min-scale: '1'
        autoscaling.knative.dev/scale-down-delay: 60s
        autoscaling.knative.dev/target: '1'
        autoscaling.knative.dev/target-utilization-percentage: '50'
        autoscaling.knative.dev/window: 6s
      creationTimestamp: null
    spec:
      containerConcurrency: 1
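For what it's worth, my rough mental model of how these annotations combine (a sketch only, based on my reading of the docs, not the actual autoscaler code): the observed concurrency is divided by an effective per-pod target of `target * target-utilization-percentage`, then clamped to the min/max bounds. With `target: 1` and 50% utilization, the effective target is 0.5, so even a single in-flight request should ask for 2 pods:

```python
import math

def desired_replicas(observed_concurrency: float,
                     target: float = 1.0,
                     utilization: float = 0.50,
                     max_scale: int = 5,
                     min_scale: int = 1) -> int:
    """Sketch of the sizing rule: observed concurrency divided by the
    effective per-pod target (target x utilization), clamped to bounds."""
    effective_target = target * utilization  # 1 * 0.5 = 0.5 here
    want = math.ceil(observed_concurrency / effective_target)
    return max(min_scale, min(max_scale, want))

print(desired_replicas(1))  # one in-flight request -> 2 pods
print(desired_replicas(3))  # 3 requests -> 6, clamped to max-scale 5
```

If this model is right, the first curl spawning a second replica makes sense, but I'd then expect the second curl to spawn more, which is not what I observe.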
I have implemented a readiness probe that returns 503 as long as the job is not finished (probed every second).
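For context, the probe is configured roughly like this (standard Kubernetes `readinessProbe` fields; the path and port here are placeholders, not my exact values):

```yaml
readinessProbe:
  httpGet:
    path: /ready        # placeholder endpoint; returns 503 while the job runs
    port: 8080          # placeholder container port
  periodSeconds: 1      # probed every second, as described above
  failureThreshold: 1
```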
In this scenario, here is what happens when I run curl requests every few seconds:
- 1st curl request: the request is routed to the available pod and a new replica is spawned --> expected
- 2nd curl request: the request is routed to the available pod but does NOT spawn a new replica --> unexpected (because we're exceeding the target utilization percentage?)
- 3rd: curl hangs (waits); no additional replicas are requested
- 4th: the autoscaler spawns two new replicas; the 3rd curl is routed to one of the new pods, while the 4th curl is routed to an existing pod and results in HTTP 503 (returned by the application), thus bypassing the readiness probe
- Subsequent curls: routed to the falsely-available pod and result in an application-level 503
Is all of this expected? According to the documentation, the autoscaler should add one new replica every time I run a curl request, so that it is ready for a future request.