
supersetWorker Liveness Probes always fail causing restarts #25225

Closed
saraangelmurphy opened this issue Sep 7, 2023 · 2 comments

Comments

@saraangelmurphy
A clear and concise description of what the bug is.

How to reproduce the bug

  1. Deploy superset with version 2.1.1 and the latest version of the helm chart with at least 2 workers
  2. Observe Kubernetes events
  3. superset worker liveness probes constantly fail (the probe command behind this error is sketched just below the list) with:
    Liveness probe failed: Loaded your LOCAL configuration at [/app/pythonpath/superset_config.py] ... <various warnings related to talisman omitted> Error: No nodes replied within time constraint
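
For context: the chart's worker liveness probe execs a Celery "inspect ping" against the pod's own worker, and "Error: No nodes replied within time constraint" is what that command prints when the worker does not answer in time. A rough sketch of the probe as I understand it (the exact command and timings differ between chart versions, so treat the numbers as illustrative; if the chart version in use exposes it, they can be overridden under supersetWorker.livenessProbe):

supersetWorker:
  livenessProbe:
    exec:
      command:
        - sh
        - -c
        # Ping this pod's own Celery worker; the probe fails if no node replies in time
        - celery -A superset.tasks.celery_app:app inspect ping -d celery@$HOSTNAME
    initialDelaySeconds: 120   # illustrative values, not necessarily the chart defaults
    periodSeconds: 60
    timeoutSeconds: 60
    failureThreshold: 3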

Expected results

Superset Celery Workers do not fail their liveness probes constantly

Actual results

Superset Workers always fail liveness probes and restart every 5 minutes as a result.

Screenshots

(screenshot not reproduced here)

Environment

(please complete the following information):

  • browser type and version: Google Chrome | 116.0.5845.179 (Official Build) (arm64)
  • superset version: Superset 2.1.1
  • python version: Python 3.8.18
  • node.js version: node is not installed on the superset webserver or worker pods
  • any feature flags active:
    "DYNAMIC_PLUGINS": True,
    "ENABLE_TEMPLATE_PROCESSING": True,
    "DASHBOARD_CROSS_FILTERS": True

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • [x] I have checked the superset logs for python stacktraces and included them here as text if there are any.
  • [x] I have reproduced the issue with at least the latest released version of superset.
  • [x] I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

My superset values file is as follows, with some organization-specific items omitted (secrets, oauth, the supersetNode connections, and the ingress):


serviceAccount:
  create: true

serviceAccountName: superset

image:
  tag: 2.1.1
  pullPolicy: IfNotPresent

service:
  loadBalancerIP: null


resources: 
  requests:
    cpu: 100m
    memory: 1500Mi
  limits:
    memory: 1500Mi

# Install additional packages and do any other bootstrap configuration in this script
bootstrapScript: |
  #!/bin/bash
  rm -rf /var/lib/apt/lists/* && \
  pip install \
    Authlib \
    trino \
    nodejs \
    elasticsearch-dbapi \
    psycopg2-binary==2.9.1 \
    pybigquery==0.10.2 \
    redis==3.5.3 && \
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
## Extra environment variables that will be passed into pods
##
extraEnv:
  # Extend timeout to allow long running queries.
  GUNICORN_TIMEOUT: 300

# SECRET AND OAUTH OVERRIDES OMITTED
configOverrides:
  my_override: |
    # This will make sure the redirect_uri is properly computed, even with SSL offloading
    ENABLE_PROXY_FIX = True
    FEATURE_FLAGS = {
        "DYNAMIC_PLUGINS": True,
        "ENABLE_TEMPLATE_PROCESSING": True,
        "DASHBOARD_CROSS_FILTERS": True
    }
    WTF_CSRF_ENABLED = False
    ROW_LIMIT = 5000
    CACHE_CONFIG = {
        'CACHE_TYPE': 'redis',
        'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
        'CACHE_KEY_PREFIX': 'superset_results',
        'CACHE_REDIS_URL': 'redis://{{ template "superset.fullname" . }}-redis-headless:6379/0',
    }
  extend_timeout: |
    # Extend timeout to allow long running queries.
    SUPERSET_WEBSERVER_TIMEOUT = 300

##
## Superset node configuration
supersetNode:
  deploymentAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  ## Annotations to be added to supersetNode pods
  podAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  resources:
    requests:
      cpu: 500m
      memory: 1500Mi
    limits:
      cpu: 500m
      memory: 1500Mi


## Superset worker configuration
supersetWorker:
  replicaCount: 2
  ## Annotations to be added to supersetWorker deployment
  deploymentAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  ## Annotations to be added to supersetWorker pods
  podAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  resources:
    requests:
      cpu: 1
      memory: 6Gi
    limits: 
      memory: 6Gi
##
## Superset beat configuration (to trigger scheduled jobs like reports)
supersetCeleryBeat:
  # This is only required if you intend to use alerts and reports
  enabled: true
  deploymentAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  ## Annotations to be added to supersetCeleryBeat pods
  podAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'
  resources:
    requests:
      cpu: 500m
      memory: 1000Mi
    limits:
      memory: 1000Mi

##
## Init job configuration
init:
  # Configure resources
  # Warning: the fab command consumes a lot of RAM and can
  # cause the process to be killed due to OOM if it exceeds the limit
  # Make sure you give the admin user a strong password (or change it after setup)
  # Also change the admin email to your own custom email.
  resources:
    requests:
      cpu: 100m
      memory: 1500Mi
    limits:
      memory: 1500Mi
  enabled: true
  ## Annotations to be added to init job pods
  podAnnotations:
    service.name: 'superset'
    service.owner: 'infra'
    environment: 'prod'

redis:
  ##
  ## Use the redis chart dependency.
  ##
  ## If you are bringing your own redis, you can set the host in supersetNode.connections.redis_host
  ##
  ## Set to false if bringing your own redis.
  enabled: true
  ##
  ## Set architecture to standalone/replication
  architecture: replication
  master:
    ##
    ## Image configuration
    # image:
      ##
      ## docker registry secret names (list)
      # pullSecrets: nil
    ##
    count: 3
    service: 
      sessionAffinity: ClientIP
    ## Configure persistence
    persistence:
      ##
      ## Use a PVC to persist data.
      enabled: false
      ##
      ## Persistent class
      # storageClass: classname
      ##
      ## Access mode:
      accessModes:
      - ReadWriteOnce
  replica:
    replicaCount: 0
#  sentinel:
#    enabled: true

ingress:
  # ingress.enabled -- Enable ingress controller resource
  enabled: true
  ingressClassName: nginx
  # ingress.annotations -- Ingress annotations configuration
  annotations:
  .... # environment specific details omitted

postgresql:
  ##
  ## Use the PostgreSQL chart dependency.
  ## Set to false if bringing your own PostgreSQL.
  enabled: true
  ## PostgreSQL Primary parameters
  primary:
    ##
    ## Persistent Volume Storage configuration.
    ## ref: https://kubernetes.io/docs/user-guide/persistent-volumes
    persistence:
      ##
      ## Enable PostgreSQL persistence using Persistent Volume Claims.
      enabled: true
      ##
      ## Persistent class
      storageClass: encrypted-gp2-allow-expansion
      ##
      ## Access modes:
      accessModes:
        - ReadWriteOnce
    ## PostgreSQL port
    service:
      ports:
        postgresql: "5432"
supersetNode:
  connections:
  ... # environment specific details omitted
saraangelmurphy changed the title from "supersetWorker Liveness Probes always fail" to "supersetWorkers restart every 5 minutes + Liveness Probes always fail" on Sep 7, 2023
saraangelmurphy changed the title from "supersetWorkers restart every 5 minutes + Liveness Probes always fail" to "supersetWorker Liveness Probes always fail causing restarts" on Sep 8, 2023
@oopjot

oopjot commented Sep 13, 2023

Hey @saraangelmurphy, I had the same issue and was able to fix it by configuring the Celery broker and backend to use Redis. Refer to this part of the documentation. Hope this helps.
I don't know why the default sqlite broker/backend causes Error: No nodes replied within time constraint after running inspect commands. Maybe someone else can shed light on that.
Edit: celery:4.4.7 (cliffs) kombu:4.6.11 py:3.7.9 works fine with sqlite in a different environment, while celery:5.2.2 (dawn-chorus) kombu:5.2.4 py:3.8.16 fails.
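
A minimal sketch of the kind of override oopjot describes, expressed as a configOverrides entry for this chart (the key name celery_redis is arbitrary, the Redis URL simply reuses the bundled Redis from the values above, database 1 is an arbitrary choice, and depending on chart version the generated superset_config.py may already define CELERY_CONFIG):

configOverrides:
  celery_redis: |
    # Point Celery's broker and result backend at Redis instead of the default
    # sqlite/SQLAlchemy transport, so `celery inspect ping` can reach the workers.
    class CeleryConfig(object):
        broker_url = 'redis://{{ template "superset.fullname" . }}-redis-headless:6379/1'
        result_backend = 'redis://{{ template "superset.fullname" . }}-redis-headless:6379/1'
        imports = ("superset.sql_lab",)
    CELERY_CONFIG = CeleryConfig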

@rusackas
Member

rusackas commented Mar 8, 2024

Hopefully, the answer above is sufficient to close this thread. I'll do so anyway since it's been so long since there was a comment here. If this needs revisiting/reopening (using Superset 3.x or newer), just say the word.
