[🐛 Bug]: Selenium tests remain stuck in queue (no sessions) when autoscaling with existing KEDA is enabled (GKE) #2682
Comments
@ValleJulien, thank you for creating this issue. We will troubleshoot it as soon as we can. |
With enableWithExistingKEDA: true, which KEDA core version are you using in the cluster? |
@VietND96 I am using the Docker image keda-operator:2.12.1 (KEDA core has the same version). This version is also deployed by a Helm chart. |
Okay, you are using the latest chart and image tag, so I recommend that KEDA core also be updated to the latest version. In the meantime, we have made fixes and improvements to the Selenium Grid scaler. https://keda.sh/docs/2.16/scalers/selenium-grid-scaler/ |
@VietND96 very well, I'll update keda and come back to you to let you know if it fixes my problem. |
@VietND96 I updated the keda chart to version 2.16.1 as you suggested. |
Can you also share the keda-operator pod logs during the requests in queue? |
These are the logs of the keda-operator during the requests in queue:
|
OK, I looked at your values YAML; it looks like you are scaling from 0.
extraEnvironmentVariables:
  - name: SE_REJECT_UNSUPPORTED_CAPS
    value: "true"
|
Also, it looks like your client is setting the platform to Linux. So, in the chart values, let's update this config:
chromeNode:
  hpa:
    platformName: "Linux" |
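To illustrate why the platform name on both sides needs to agree, here is a minimal Python sketch. It is a deliberately simplified model, not KEDA's actual implementation: it assumes the scaler counts a queued request only when the browser name and platform name requested by the client match the values configured for the node, compared case-insensitively, and that an empty value on the scaler side means "no constraint". The exact matching rules differ between KEDA versions. The function names `request_matches` and `count_matching` are hypothetical.

```python
def normalize(value):
    """Lower-case a capability value; treat a missing value as empty."""
    return (value or "").strip().lower()


def request_matches(requested, scaler):
    """Return True if a queued request counts toward this scaler (simplified model)."""
    for key in ("browserName", "platformName"):
        want = normalize(scaler.get(key))
        have = normalize(requested.get(key))
        # Assumption: an empty scaler value places no constraint on the request.
        if want and want != have:
            return False
    return True


def count_matching(queue, scaler):
    """Count the queued requests that this scaler would react to."""
    return sum(1 for caps in queue if request_matches(caps, scaler))


# Hypothetical queue: one client request that sets the platform to Linux.
queue = [{"browserName": "chrome", "platformName": "linux"}]

# Scaler configured like chromeNode.hpa.platformName: "Linux".
aligned = {"browserName": "chrome", "platformName": "Linux"}
# Scaler configured for a different platform (illustrative only).
mismatched = {"browserName": "chrome", "platformName": "Windows"}

print(count_matching(queue, aligned))
print(count_matching(queue, mismatched))
```

In this model, a scaler whose platform does not match the client's capabilities sees an empty queue and never scales up a node that can take the session, which is one plausible way for requests to stay pending.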
@VietND96 I have changed the values as you suggested, i.e., removed the env var. For my personal understanding, and to understand the autoscaling part technically, can you explain exactly why this configuration works? These are my values.yaml settings for the hub, chromeNode and autoscaling parts. Is this configuration optimal for autoscaling to work properly?
autoscaling:
  enableWithExistingKEDA: true
  enabled: true
  scaledJobOptions:
    scalingStrategy:
      strategy: default
  scaledOptions:
    maxReplicaCount: 24
    minReplicaCount: 0
    pollingInterval: 10
  scalingType: job
basicAuth:
  create: true
  embeddedUrl: true
  enabled: true
  password: $PASSWORD
  username: $USERNAME
chromeNode:
  affinity:
    node_affinity:
      required_during_scheduling_ignored_during_execution:
        node_selector_terms:
          - match_expressions:
              - key: my-organization.com/test-node
                operator: In
                values:
                  - test
  enabled: true
  hpa:
    platformName: Linux
  nameOverride: selenium-node-chrome
  scaledOptions:
    maxReplicaCount: 15
    minReplicaCount: 0
    pollingInterval: 10
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    seccompProfile:
      type: RuntimeDefault
  tolerations:
    - effect: NoSchedule
      key: my-organization.com/test-node
      operator: Equal
      value: test
hub:
  affinity:
    node_affinity:
      required_during_scheduling_ignored_during_execution:
        node_selector_terms:
          - match_expressions:
              - key: my-organization.com/test-node
                operator: In
                values:
                  - test
  nameOverride: selenium
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    seccompProfile:
      type: RuntimeDefault
  serviceType: NodePort
  tolerations:
    - effect: NoSchedule
      key: my-organization.com/test-node
      operator: Equal
      value: test
I tested the same configuration of the selenium-grid chart after reverting to the KEDA version I had before (2.12.1), because I use KEDA for other services in my cluster and such a big version upgrade could have an impact. Everything works with the same setup too. Does this mean that in addition to enable |
Configs under |
Thanks for all your explanations, the autoscaling part is clearer now between keda and selenium. |
What happened?
I have deployed the latest version of the selenium-grid chart. The deployment is done with Terraform, but I can provide all the values I configured in the chart.
I deploy the chart with autoscaling.enabled: true and autoscaling.enableWithExistingKEDA: true. I only have a Selenium deployment, since the chromeNode is managed by KEDA as a scaled job (autoscaling.scalingType: job). KEDA is deployed in another namespace.
This is the only configuration I supplied in my values.yaml file:
Note that I have an ingress that is configured and deployed outside of the selenium-grid chart by Terraform.
As I am working on GKE, I have some requirements to meet. This is the ingress setup:
My hub is available on the my-organization.com domain. I can access the hub using basic auth. Everything is working well on this side.
Now, when I run a test locally or from my CI/CD tool, I can see that KEDA is triggered and creates the job. But no sessions are running in the job. The queue is stuck in pending and nothing seems to run in the pod.
I'd like to point out that when I disable autoscaling and only configure the chromeNode in deployment mode, the same test and process work. So I don't think this is a network or a test configuration issue.
When I ran my test, I got this log in the keda-operator:
As you can see, a job is created and scales up a pod, but the queue is not consumed by the pod (no session is running in it).
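One way to see which capabilities the pending requests actually carry is to query the Grid's GraphQL endpoint. The following Python sketch shows the idea under a few assumptions: the endpoint is reachable at the hypothetical URL https://my-organization.com/graphql (the real path depends on the ingress setup), basic auth is enabled as described above, and each entry of `sessionQueueRequests` is a JSON string of the requested capabilities. The helper names `build_request` and `queued_capabilities` are illustrative, and the sample payload is made up.

```python
import base64
import json
import urllib.request

# GraphQL query asking the Grid for the capabilities of queued session requests.
QUEUE_QUERY = "{ sessionsInfo { sessionQueueRequests } }"


def build_request(graphql_url, username, password):
    """Build a POST request with basic auth for the Grid GraphQL endpoint."""
    body = json.dumps({"query": QUEUE_QUERY}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        graphql_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def queued_capabilities(payload):
    """Decode the queued requests (JSON strings) into capability dictionaries."""
    raw = payload["data"]["sessionsInfo"]["sessionQueueRequests"]
    return [json.loads(item) for item in raw]


# Made-up response payload, used here so the sketch runs without a live Grid.
sample_payload = {
    "data": {
        "sessionsInfo": {
            "sessionQueueRequests": [
                '{"browserName": "chrome", "platformName": "linux"}'
            ]
        }
    }
}

for caps in queued_capabilities(sample_payload):
    print(caps.get("browserName"), caps.get("platformName"))
```

Against a live Grid, the request returned by `build_request` could be sent with `urllib.request.urlopen` and the decoded JSON response passed to `queued_capabilities`. Comparing the printed values with the browser and platform configured for the node shows whether the two sides agree.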
Command used to start Selenium Grid with Docker (or Kubernetes)
Relevant log output
Operating System
GKE
Docker Selenium version (image tag)
4.29.0-20250222
Selenium Grid chart version (chart version)
0.40.0