
supersetWorker Liveness Probes always fail causing restarts #25225

Closed
saraangelmurphy opened this issue Sep 7, 2023 · 2 comments

Comments

@saraangelmurphy
A clear and concise description of what the bug is.

How to reproduce the bug

  1. Deploy superset with version 2.1.1 and the latest version of the helm chart with at least 2 workers
  2. Observe Kubernetes events
  3. superset worker liveness probes constantly fail (the probe command behind this error is sketched just below the list) with:
    Liveness probe failed: Loaded your LOCAL configuration at [/app/pythonpath/superset_config.py] ... <various warnings related to talisman omitted> Error: No nodes replied within time constraint
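
For context: the chart's worker liveness probe execs a Celery "inspect ping" against the pod's own worker, and "Error: No nodes replied within time constraint" is what that command prints when the worker does not answer in time. A rough sketch of the probe as I understand it (the exact command and timings differ between chart versions, so treat the numbers as illustrative; if the chart version in use exposes it, they can be overridden under supersetWorker.livenessProbe):

supersetWorker:
  livenessProbe:
    exec:
      command:
        - sh
        - -c
        # Ping this pod's own Celery worker; the probe fails if no node replies in time
        - celery -A superset.tasks.celery_app:app inspect ping -d celery@$HOSTNAME
    initialDelaySeconds: 120   # illustrative values, not necessarily the chart defaults
    periodSeconds: 60
    timeoutSeconds: 60
    failureThreshold: 3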

Expected results

Superset Celery Workers do not fail their liveness probes constantly

Actual results

Superset Workers always fail liveness probes and restart every 5 minutes as a result.

Screenshots

(screenshot not reproduced here)

Environment

(please complete the following information):

  • browser type and version: Google Chrome | 116.0.5845.179 (Official Build) (arm64)
  • superset version: Superset 2.1.1
  • python version: Python 3.8.18
  • node.js version: node is not installed on the superset webserver or worker pods
  • any feature flags active:
    "DYNAMIC_PLUGINS": True,
    "ENABLE_TEMPLATE_PROCESSING": True,
    "DASHBOARD_CROSS_FILTERS": True

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • [x] I have checked the superset logs for python stacktraces and included them here as text if there are any.
  • [x] I have reproduced the issue with at least the latest released version of superset.
  • [x] I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

My superset values file is as follows, with some organization-specific items omitted (secrets, oauth, the supersetNode connections, and the ingress):


serviceAccount:
  create: true

serviceAccountName: superset

image:
  tag: 2.1.1
  pullPolicy: IfNotPresent

service:
  loadBalancerIP: null


resources: 
  requests:
    cpu: 100m
    memory: 1500Mi
  limits:
    memory: 1500Mi

# Install additional packages and do any other bootstrap configuration in this script
bootstrapScript: |
  #!/bin/bash
  rm -rf /var/lib/apt/lists/* && \
  pip install \
    Authlib \
    trino \
    nodejs \
    elasticsearch-dbapi \
    psycopg2-binary==2.9.1 \
    pybigquery==0.10.2 \
    redis==3.5.3 && \
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
## Extra environment variables that will be passed into pods
##
extraEnv:
  # Extend timeout to allow long running queries.
  GUNICORN_TIMEOUT: 300

# SECRET AND OAUTH OVERRIDES OMITTED
configOverrides:
  my_override: |
    # This will make sure the redirect_uri is properly computed, even with SSL offloading
    ENABLE_PROXY_FIX = True
    FEATURE_FLAGS = {
        "DYNAMIC_PLUGINS": True,
        "ENABLE_TEMPLATE_PROCESSING": True,
        "DASHBOARD_CROSS_FILTERS": True
    }
    WTF_CSRF_ENABLED = False
    ROW_LIMIT = 5000
    CACHE_CONFIG = {
        'CACHE_TYPE': 'redis',
        'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
        'CACHE_KEY_PREFIX': 'superset_results',
        'CACHE_REDIS_URL': 'redis://{{ template "superset.fullname" . }}-redis-headless:6379/0',
    }
  extend_timeout: |
    # Extend timeout to allow long running queries.
    SUPERSET_WEBSERVER_TIMEOUT = 300

##
## Superset node configuration
supersetNode:
  deploymentAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  ## Annotations to be added to supersetNode pods
  podAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  resources:
    requests:
      cpu: 500m
      memory: 1500Mi
    limits:
      cpu: 500m
      memory: 1500Mi


## Superset worker configuration
supersetWorker:
  replicaCount: 2
  ## Annotations to be added to supersetWorker deployment
  deploymentAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  ## Annotations to be added to supersetWorker pods
  podAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  resources:
    requests:
      cpu: 1
      memory: 6Gi
    limits: 
      memory: 6Gi
##
## Superset beat configuration (to trigger scheduled jobs like reports)
supersetCeleryBeat:
  # This is only required if you intend to use alerts and reports
  enabled: true
  deploymentAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  ## Annotations to be added to supersetCeleryBeat pods
  podAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  resources:
    requests:
      cpu: 500m
      memory: 1000Mi
    limits:
      memory: 1000Mi

##
## Init job configuration
init:
  # Configure resources
  # Warning: the fab command consumes a lot of RAM and can
  # cause the process to be killed due to OOM if it exceeds the limit
  # Make sure you give the admin user a strong password (or change it after setup)
  # Also change the admin email to your own custom email.
  resources:
    requests:
      cpu: 100m
      memory: 1500Mi
    limits:
      memory: 1500Mi
  enabled: true
  ## Annotations to be added to init job pods
  podAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'

redis:
  ##
  ## Use the redis chart dependency.
  ##
  ## If you are bringing your own redis, you can set the host in supersetNode.connections.redis_host
  ##
  ## Set to false if bringing your own redis.
  enabled: true
  ##
  ## Set architecture to standalone/replication
  architecture: replication
  master:
    ##
    ## Image configuration
    # image:
      ##
      ## docker registry secret names (list)
      # pullSecrets: nil
    ##
    count: 3
    service: 
      sessionAffinity: ClientIP
    ## Configure persistence
    persistence:
      ##
      ## Use a PVC to persist data.
      enabled: false
      ##
      ## Persistent class
      # storageClass: classname
      ##
      ## Access mode:
      accessModes:
      - ReadWriteOnce
  replica:
    replicaCount: 0
#  sentinel:
#    enabled: true

ingress:
  # ingress.enabled -- Enable ingress controller resource
  enabled: true
  ingressClassName: nginx
  # ingress.annotations -- Ingress annotations configuration
  annotations:
  .... # environment specific details omitted

postgresql:
  ##
  ## Use the PostgreSQL chart dependency.
  ## Set to false if bringing your own PostgreSQL.
  enabled: true
  ## PostgreSQL Primary parameters
  primary:
    ##
    ## Persistent Volume Storage configuration.
    ## ref: https://kubernetes.io/docs/user-guide/persistent-volumes
    persistence:
      ##
      ## Enable PostgreSQL persistence using Persistent Volume Claims.
      enabled: true
      ##
      ## Persistent class
      storageClass: encrypted-gp2-allow-expansion
      ##
      ## Access modes:
      accessModes:
        - ReadWriteOnce
    ## PostgreSQL port
    service:
      ports:
        postgresql: "5432"
supersetNode:
  connections:
  ... # environment specific details omitted
saraangelmurphy changed the title from "supersetWorker Liveness Probes always fail" to "supersetWorkers restart every 5 minutes + Liveness Probes always fail" on Sep 7, 2023
saraangelmurphy changed the title from "supersetWorkers restart every 5 minutes + Liveness Probes always fail" to "supersetWorker Liveness Probes always fail causing restarts" on Sep 8, 2023
@oopjot

oopjot commented Sep 13, 2023

Hey @saraangelmurphy, I had the same issue and was able to fix it by configuring the Celery broker and backend to use Redis. Refer to this part of the documentation. Hope this helps.
I don't know why the default sqlite broker/backend causes Error: No nodes replied within time constraint after running inspect commands. Maybe someone else can shed light on that.
Edit: celery:4.4.7 (cliffs) kombu:4.6.11 py:3.7.9 works fine with sqlite in a different environment, while celery:5.2.2 (dawn-chorus) kombu:5.2.4 py:3.8.16 fails.
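
A minimal sketch of the kind of override oopjot describes, expressed as a configOverrides entry for this chart (the key name celery_redis is arbitrary, the Redis URL simply reuses the bundled Redis from the values above, database 1 is an arbitrary choice, and depending on chart version the generated superset_config.py may already define CELERY_CONFIG):

configOverrides:
  celery_redis: |
    # Point Celery's broker and result backend at Redis instead of the default
    # sqlite/SQLAlchemy transport, so `celery inspect ping` can reach the workers.
    class CeleryConfig(object):
        broker_url = 'redis://{{ template "superset.fullname" . }}-redis-headless:6379/1'
        result_backend = 'redis://{{ template "superset.fullname" . }}-redis-headless:6379/1'
        imports = ("superset.sql_lab",)
    CELERY_CONFIG = CeleryConfig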

@rusackas
Member

rusackas commented Mar 8, 2024

Hopefully, the answer above is sufficient to close this thread. I'll do so anyway since it's been so long since there was a comment here. If this needs revisiting/reopening (using Superset 3.x or newer), just say the word.
