TypeError: cannot marshal <class 'getset_descriptor'> objects #1141

Closed
bsolomon1124 opened this Issue Aug 28, 2018 · 7 comments


bsolomon1124 commented Aug 28, 2018

We are using Supervisor, Celery, and Redis (as the message broker) for task scheduling and async calls in a Django project. (All version info is given below.)

Thanks in advance for giving this a look. Please let me know if it would be more appropriate to ask on the Celery GitHub page, and what other info I can provide.

We are currently starting supervisord with just:

supervisord -c ./supervisord.conf

Initially it looks like the workers are up and running fine:

$ supervisorctl status
workers:celery_beat              RUNNING   pid 40XX, uptime 0:00:03
workers:celery_dashboard         RUNNING   pid 40XX, uptime 0:00:03
workers:celery_default           RUNNING   pid 40XX, uptime 0:00:03
workers:celery_google            RUNNING   pid 40XX, uptime 0:00:03
workers:celery_master            RUNNING   pid 40XX, uptime 0:00:03
workers:celery_webcrawler        RUNNING   pid 40XX, uptime 0:00:03

However, what actually seems to be happening is that workers are stopping and then auto-restarting very frequently:

$ supervisorctl status
workers:celery_beat              RUNNING   pid 40XX, uptime 0:00:07
workers:celery_dashboard         STARTING
workers:celery_default           RUNNING   pid 41XX, uptime 0:00:02
workers:celery_google            RUNNING   pid 41XX, uptime 0:00:02
workers:celery_master            RUNNING   pid 41XX, uptime 0:00:02
workers:celery_webcrawler        RUNNING   pid 40XX, uptime 0:00:07

$ supervisorctl status
workers:celery_beat              RUNNING   pid 40XX, uptime 0:00:12
workers:celery_dashboard         STARTING
workers:celery_default           STARTING
workers:celery_google            RUNNING   pid 42XX, uptime 0:00:01
workers:celery_master            RUNNING   pid 42XX, uptime 0:00:02
workers:celery_webcrawler        RUNNING   pid 40XX, uptime 0:00:12

It looks like the programs start, stay up for a few seconds, and then exit with a nonzero code:

$ tail -n 25 /tmp/supervisor.log
2018-08-28 16:20:52,057 INFO exited: celery_default (exit status 1; not expected)
2018-08-28 16:20:52,216 INFO spawned: 'celery_default' with pid 48XX
2018-08-28 16:20:52,883 INFO success: celery_google entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-08-28 16:20:52,946 INFO exited: celery_dashboard (exit status 1; not expected)
2018-08-28 16:20:53,253 INFO success: celery_default entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-08-28 16:20:53,255 INFO spawned: 'celery_dashboard' with pid 48XX
2018-08-28 16:20:54,760 INFO success: celery_dashboard entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-08-28 16:20:55,746 INFO exited: celery_master (exit status 1; not expected)
2018-08-28 16:20:55,796 INFO spawned: 'celery_master' with pid 48XX
2018-08-28 16:20:56,052 INFO exited: celery_google (exit status 1; not expected)
2018-08-28 16:20:56,369 INFO spawned: 'celery_google' with pid 48XX
2018-08-28 16:20:56,999 INFO success: celery_master entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-08-28 16:20:57,063 INFO exited: celery_default (exit status 1; not expected)
2018-08-28 16:20:57,370 INFO spawned: 'celery_default' with pid 48XX
2018-08-28 16:20:57,370 INFO success: celery_google entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-08-28 16:20:58,100 INFO exited: celery_dashboard (exit status 1; not expected)
2018-08-28 16:20:58,400 INFO success: celery_default entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

We can see more by looking at the individual logs, but the actual source of the error isn't present. Here is /tmp first:

$ ls -lahFG --sort=time /tmp/
-rw-r--r--.  1 deploy  15M Aug 28 16:21 supervisor.log
-rw-------.  1 root   1.1K Aug 28 16:21 celery_webcrawler-stdout---supervisor-XXXXXXXXX.log
-rw-------.  1 root    18K Aug 28 16:21 celery_google-stdout---supervisor-XXXXXXXXX.log
-rw-------.  1 root    17K Aug 28 16:21 celery_master-stdout---supervisor-XXXXXXXXX.log
-rw-------.  1 root    34K Aug 28 16:21 celery_master-stderr---supervisor-XXXXXXXXX.log
-rw-------.  1 root    16K Aug 28 16:21 celery_webcrawler-stderr---supervisor-XXXXXXXXX.log
-rw-------.  1 root    33K Aug 28 16:21 celery_google-stderr---supervisor-XXXXXXXXX.log
-rw-------.  1 root    435 Aug 28 16:20 celery_beat-stdout---supervisor-XXXXXXXXX.log
-rw-------.  1 root    203 Aug 28 16:20 celery_beat-stderr---supervisor-XXXXXXXXX.log
-rw-r--r--.  1 root    32K Aug 28 16:20 celerybeat-schedule-refactor
-rw-------.  1 root    17K Aug 28 16:20 celery_dashboard-stdout---supervisor-XXXXXXXXX.log
-rw-------.  1 root    29K Aug 28 16:20 celery_dashboard-stderr---supervisor-XXXXXXXXX.log
-rw-------.  1 root    16K Aug 28 16:20 celery_default-stdout---supervisor-XXXXXXXXX.log
-rw-------.  1 root    30K Aug 28 16:20 celery_default-stderr---supervisor-XXXXXXXXX.log
-rw-r--r--.  1 root      1 Aug 27 08:17 analytics.log

Every one of those stderr logs shows the same pattern, cycling over and over in parallel with supervisor.log:

/path/to/project_name/py36/lib64/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/path/to/project_name/py36/lib/python3.6/site-packages/celery/platforms.py:796: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

  uid=uid, euid=euid, gid=gid, egid=egid,

INFO:celery.worker.consumer.connection [connection.py start:24]
    asctime: 28/Aug/2018 20:20:52
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/worker/consumer/connection.py
Connected to redis://<elastic_cache_redis_server_name>.abcdef.ng.XXXX.use1.cache.amazonaws.com:6379/0


INFO:celery.worker.consumer.mingle [mingle.py sync:43]
    asctime: 28/Aug/2018 20:20:52
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/worker/consumer/mingle.py
mingle: searching for neighbors


INFO:celery.worker.consumer.mingle [mingle.py sync:47]
    asctime: 28/Aug/2018 20:20:53
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/worker/consumer/mingle.py
mingle: sync with 1 nodes


INFO:celery.worker.consumer.mingle [mingle.py sync:50]
    asctime: 28/Aug/2018 20:20:53
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/worker/consumer/mingle.py
mingle: sync complete


INFO:celery.apps.worker [worker.py on_consumer_ready:160]
    asctime: 28/Aug/2018 20:20:53
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/apps/worker.py
celery@masterip-XX-XX-XX-XX.ec2.internal-0 ready.


INFO:celery.worker.control [control.py hello:314]
    asctime: 28/Aug/2018 20:20:53
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/worker/control.py
sync with celery@defaultip-XX-XX-XX-XX.ec2.internal-0


WARNING:celery.redirected [log.py write:235]
    asctime: 28/Aug/2018 20:20:55
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/utils/log.py
Restoring 1 unacknowledged message(s)

Unfortunately, none of this hints at the actual cause of the "exit status 1" entries in supervisor.log, as far as I can tell.

As a result, the tasks scheduled via celery.schedules.crontab are not running.

I am filing this on the Supervisor GitHub rather than Celery's because the final two log messages do not appear if we run the Supervisor commands individually, i.e.:

$ /path/to/project_name/py36/bin/celery -A project worker --autoscale=1,1 --loglevel INFO -Q webcrawler -n webcrawler%%h-%%i
/path/to/project_name/py36/lib64/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/path/to/project_name/py36/lib/python3.6/site-packages/celery/platforms.py:796: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

  uid=uid, euid=euid, gid=gid, egid=egid,
 
 -------------- celery@webcrawler%ip-XX-XX-XX-XX.ec2.internal-%0 v4.2.1 (windowlicker)
---- **** ----- 
--- * ***  * -- Linux-XXXXXXXXXXXXX.el7.x86_64-x86_64-with-redhat-7.1-Maipo 2018-08-28 20:37:12
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         app:0x7faac0XXXXXXX
- ** ---------- .> transport:   redis://<elastic_cache_redis_server_name>.abcdef.ng.XXXX.use1.cache.amazonaws.com:6379/0
- ** ---------- .> results:     disabled://
- *** --- * --- .> concurrency: {min=1, max=1} (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> webcrawler       exchange=webcrawler(direct) key=webcrawler
                

[tasks]
  . dashboard.refresh_pickups
  . dashboard.refresh_view
  . google_analytics.events_all
  . google_analytics.events_day
  . google_analytics.metrics_all
  . google_analytics.metrics_day
  . master.insights
  . master.insights_acct
  . master.update
  . master.update_acct
  . singleton_write
  . url.multiprocess_links
  . urlanalytics.update_analytics


INFO:celery.worker.consumer.connection [connection.py start:24]
    asctime: 28/Aug/2018 20:37:12
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/worker/consumer/connection.py
Connected to redis://<elastic_cache_redis_server_name>.abcdef.ng.XXXX.use1.cache.amazonaws.com:6379/0


INFO:celery.worker.consumer.mingle [mingle.py sync:43]
    asctime: 28/Aug/2018 20:37:12
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/worker/consumer/mingle.py
mingle: searching for neighbors


INFO:celery.worker.consumer.mingle [mingle.py sync:52]
    asctime: 28/Aug/2018 20:37:13
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/worker/consumer/mingle.py
mingle: all alone


INFO:celery.apps.worker [worker.py on_consumer_ready:160]
    asctime: 28/Aug/2018 20:37:13
    pathname: /path/to/project_name/py36/lib/python3.6/site-packages/celery/apps/worker.py
celery@webcrawler%ip-XX-XX-XX-XX.ec2.internal-%0 ready.

Version info:

$ python --version
Python 3.6.5

$ python -c 'import django; print(django.__version__)'
2.1

$ celery --version
4.2.1 (windowlicker)

$ supervisord --version
4.0.0.dev0

$ redis-cli --version
redis-cli 4.0.11 (git:230489b2)

System: Red Hat 4.8.5-28 on a Linux Amazon EC2 instance


Project skeleton:

project_name
├── celeryconfig.py
├── manage.py
├── supervisord.conf
├── py36                # virtual environment
├── apps/               # houses Django apps with tasks.py, etc
└── project
    ├── __init__.py
    ├── celery_app.py
    └── settings.py

supervisord.conf (most comments removed):

[supervisord]
environment=LOGGING="%(ENV_LOGGING)s",DJ_LOGGING="%(ENV_DJ_LOGGING)s",SECRET_KEY="%(ENV_SECRET_KEY)s",DEBUG="%(ENV_DEBUG)s",DATABASE_ENGINE="%(ENV_DATABASE_ENGINE)s",DATABASE_NAME="%(ENV_DATABASE_NAME)s",DATABASE_USER="%(ENV_DATABASE_USER)s",DATABASE_PASSWORD="%(ENV_DATABASE_PASSWORD)s",DATABASE_HOST="%(ENV_DATABASE_HOST)s",USER_ID="%(ENV_USER_ID)s",DATABASE_PORT="%(ENV_DATABASE_PORT)s",EMAIL_HOST_USER="%(ENV_EMAIL_HOST_USER)s",EMAIL_HOST_PASSWORD="%(ENV_EMAIL_HOST_PASSWORD)s",GOOGLE_ANALYTICS_PROPERTY_ID="%(ENV_GOOGLE_ANALYTICS_PROPERTY_ID)s",GOOGLE_ANALYTICS_DOMAIN="%(ENV_GOOGLE_ANALYTICS_DOMAIN)s",REDIS_SERVER="%(ENV_REDIS_SERVER)s"
logfile=/tmp/supervisor.log           ; supervisord log file
logfile_maxbytes=50MB                 ; maximum size of logfile before rotation
logfile_backups=10                    ; number of backed up logfiles
loglevel=info                         ; info, debug, warn, trace
user=root


[unix_http_server]
file=/tmp/supervisor.sock               ; path to socket file

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock

[program:celery_beat]
command=/path/to/project_name/py36/bin/celery -A project beat --loglevel WARNING --pidfile=/tmp/celerybeat-refactor.pid --schedule=/tmp/celerybeat-schedule-refactor
autostart=true
autorestart=true

[program:celery_webcrawler]
command=/path/to/project_name/py36/bin/celery -A project worker --autoscale=1,1 --loglevel INFO -Q webcrawler -n webcrawler%%h-%%i
autostart=true
autorestart=true

[program:celery_default]
command=/path/to/project_name/py36/bin/celery -A project worker --autoscale=5,1 --loglevel INFO -Q default -n default%%h-%%i
autostart=true
autorestart=true

[program:celery_google]
command=/path/to/project_name/py36/bin/celery -A project worker --autoscale=2,1 --loglevel INFO -Q google_analytics -n google_analytics%%h-%%i
autostart=true
autorestart=true

[program:celery_master]
command=/path/to/project_name/py36/bin/celery -A project worker --autoscale=5,1 --loglevel INFO -Q master -n master%%h-%%i
autostart=true
autorestart=true

[program:celery_dashboard]
command=/path/to/project_name/py36/bin/celery -A project worker --autoscale=1,1 --loglevel INFO -Q dashboard -n dashboard%%h-%%i
autostart=true
autorestart=true

[group:workers]
programs=celery_default,celery_google,celery_master,celery_dashboard,celery_beat,celery_webcrawler

Module contents:

celeryconfig.py:

import os

from celery.schedules import crontab
from kombu import Exchange, Queue

from project.settings import _verboselogfmt

worker_hijack_root_logger = False
worker_log_format = _verboselogfmt
worker_task_log_format = _verboselogfmt

broker_url = 'redis://' + os.environ['REDIS_SERVER'] + ':6379/0'

task_default_queue = 'default'
task_default_exchange_type = 'direct'
task_default_routing_key = 'default'

task_serializer = 'pickle'
result_serializer = 'pickle'

task_queues = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('facebook', Exchange('facebook'), routing_key='facebook'),
    Queue('twitter', Exchange('twitter'), routing_key='twitter'),
    Queue('webcrawler', Exchange('webcrawler'), routing_key='webcrawler'),
    Queue('google_analytics', Exchange('google_analytics'), routing_key='google_analytics'),
    Queue('master', Exchange('master'), routing_key='master'),
    Queue('dashboard', Exchange('dashboard'), routing_key='dashboard'),
    Queue('urlanalytics', Exchange('urlanalytics'), routing_key='urlanalytics')
)

beat_schedule = {
    'daily_google_analytics_events_all': {
        'task': 'google_analytics.events_all',
        'schedule': crontab(hour='6', minute=5),
    },
    'daily_google_analytics_events_day': {
        'task': 'google_analytics.events_day',
        'schedule': crontab(hour='6', minute=0),
        'args': [1]
    },
    'daily_google_analytics_metrics_all': {
        'task': 'google_analytics.metrics_all',
        'schedule': crontab(hour='6', minute=10),
    },
    'daily_google_analytics_metrics_day': {
        'task': 'google_analytics.metrics_day',
        'schedule': crontab(hour='6', minute=15),
        'args': [1]
    },
    'daily_social': {
        'task': 'master.update',
        'schedule': crontab(hour='6', minute=0),
    },
    'webcrawler_crawl': {
        'task': 'webcrawler.crawl',
        'schedule': crontab(hour='5,10,15,20', minute=0),
    },
    'dashboard_refresh': {
        'task': 'dashboard.refresh_view',
        'schedule': crontab(hour='5,10,15,20', minute=0),
    },
    'update_url_clicks': {
        'task': 'urlanalytics.update_analytics',
        'schedule': crontab(hour='5', minute=0),
    },
}

task_routes = {
    'master.update': {
        'queue': 'master',
        'routing_key': 'master'
    },
    'master.update_acct': {
        'queue': 'master',
        'routing_key': 'master'
    },
    'webcrawler.crawl': {
        'queue': 'webcrawler',
        'routing_key': 'webcrawler'
    },
    'google_analytics.events_all': {
        'queue': 'google_analytics',
        'routing_key': 'google_analytics'
    },
    'google_analytics.events_day': {
        'queue': 'google_analytics',
        'routing_key': 'google_analytics'
    },
    'google_analytics.metrics_day': {
        'queue': 'google_analytics',
        'routing_key': 'google_analytics',
    },
    'google_analytics.metrics_all': {
        'queue': 'google_analytics',
        'routing_key': 'google_analytics'
    },
    'dashboard.refresh_view': {
        'queue': 'dashboard',
        'routing_key': 'dashboard'
    },
    'dashboard.refresh_pickups': {
        'queue': 'dashboard',
        'routing_key': 'dashboard'
    },
    'urlanalytics.update_analytics': {
        'queue': 'urlanalytics',
        'routing_key': 'urlanalytics'
    },
}

project/celery_app.py:

import os

from celery import Celery
import django

import celeryconfig

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project.settings')
django.setup()

app = Celery('app')
app.config_from_object(celeryconfig)
app.autodiscover_tasks(force=True)

project/__init__.py:

from .celery_app import app
__all__ = ['app']

Partial snippet from project/settings.py:

import logging
import logging.config
import os
import sys

_ENV = os.environ

_DJANGO_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'django.contrib.sites',
    'django.contrib.postgres'
]

_THIRD_PTY_APPS = [
    'allauth',
    'allauth.account',
    'allauth.socialaccount.providers.google',
    'allauth.socialaccount',
    'bootstrap3',
    'corsheaders',
    'envelope',
    'rest_framework',
]

_PROJECT_APPS = [
    'accounts',
    'dashboard',
    'google_analytics',
    'lib',
    'master',
    'translate',
    'url',
    'urlanalytics',
    'webcrawler',
    # and a few others
]

LOGGING_CONFIG = None

_verboselogfmt = """
%(levelname)s:%(name)s [%(filename)s %(funcName)s:%(lineno)s]
    asctime: %(asctime)s
    pathname: %(pathname)s
%(message)s
"""

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'verbose': {
            'format': _verboselogfmt,
            'datefmt': '%d/%b/%Y %H:%M:%S'
        },
        'simple': {
            'format': logging.BASIC_FORMAT
        },
    },
    'handlers': {
        'file': {
            'level': 'ERROR',
            'class': 'logging.FileHandler',
            'filename': '/tmp/analytics.log',
            'formatter': 'verbose'
        },
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'verbose'
        },
    },
    'loggers': {
        # Root
        '': {
            'level': 'WARNING',
            'handlers': ['console']
        },
        'django': {
            'handlers': ['console'],
            'level': _ENV.get('DJ_LOGGING', 'INFO').upper(),
            'propagate': True
        },
        'app': {
            'handlers': ['console', 'file'],
            'level': _ENV.get('LOGGING', 'INFO').upper(),
            'propagate': False
        },
    }
}

logging.config.dictConfig(LOGGING)

A few environment variables:

export DEBUG=False
export LOGGING='DEBUG'
export DJ_LOGGING='WARNING'
export REDIS_SERVER='<elastic_cache_redis_server_name>.abcdef.ng.XXXX.use1.cache.amazonaws.com'

@mnaberez mnaberez added the question label Aug 28, 2018


Member

mnaberez commented Aug 28, 2018

Please ask questions about installing or using Supervisor on the supervisor-users mailing list. We use this issue tracker mainly for bugs in Supervisor itself. As this question seems heavily related to Celery, you might also consider asking this question on a support forum for Celery.

@mnaberez mnaberez closed this Aug 28, 2018


bsolomon1124 commented Aug 28, 2018

@mnaberez I stumbled across the following within the supervisorctl shell:

$ supervisord -c supervisord.conf
$ supervisorctl
supervisor> avail
error: <class 'xmlrpc.client.ProtocolError'>, <ProtocolError for 127.0.0.1/RPC2: 500 Internal Server Error>: file: /path/to/project_name/py36/src/supervisor/supervisor/xmlrpc.py line: 542

Does this look more like a bug?

When I do maintail within the supervisorctl shell, there is no additional traceback, just the usual:

2018-08-28 18:49:53,973 INFO success: celery_google entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-08-28 18:49:54,982 INFO exited: celery_dashboard (exit status 1; not expected)
2018-08-28 18:49:54,989 INFO spawned: 'celery_dashboard' with pid 8006
2018-08-28 18:49:54,990 INFO exited: celery_master (exit status 1; not expected)
2018-08-28 18:49:55,177 INFO spawned: 'celery_master' with pid 8010
2018-08-28 18:49:56,420 INFO success: celery_master entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-08-28 18:49:56,420 INFO success: celery_dashboard entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

bsolomon1124 commented Aug 29, 2018

@mnaberez I now have a traceback about 200k lines into supervisor.log. I kindly ask that you consider reopening this issue. From `vim +291009 /tmp/supervisor.log`:

291008 2018-08-29 10:45:37,195 CRIT Handling XML-RPC request with data "<?xml version='1.0'?>\n<methodCall>\n<methodName>supervisor.getAllConfigInfo</methodName>\n<params>\n</params>\n</methodCall>\n" raised an unexpected exception: Traceback (most recent call last):
291009   File "/usr/lib64/python3.6/xmlrpc/client.py", line 510, in __dump
291010     f = self.dispatch[type(value)]
291011 KeyError: <class 'getset_descriptor'>
291012 
291013 During handling of the above exception, another exception occurred:
291014 
291015 Traceback (most recent call last):
291016   File "/path/to/project_name/py36/src/supervisor/supervisor/xmlrpc.py", line 432, in continue_request
291017     body = as_bytes(xmlrpc_marshal(value))
291018   File "/path/to/project_name/py36/src/supervisor/supervisor/xmlrpc.py", line 150, in xmlrpc_marshal
291019     body = xmlrpclib.dumps(value,  methodresponse=ismethodresponse)
291020   File "/usr/lib64/python3.6/xmlrpc/client.py", line 971, in dumps
291021     data = m.dumps(params)
291022   File "/usr/lib64/python3.6/xmlrpc/client.py", line 502, in dumps
291023     dump(v, write)
291024   File "/usr/lib64/python3.6/xmlrpc/client.py", line 524, in __dump
291025     f(self, value, write)
291026   File "/usr/lib64/python3.6/xmlrpc/client.py", line 577, in dump_array
291027     dump(v, write)
291028   File "/usr/lib64/python3.6/xmlrpc/client.py", line 524, in __dump
291029     f(self, value, write)
291030   File "/usr/lib64/python3.6/xmlrpc/client.py", line 595, in dump_struct
291031     dump(v, write)
291032   File "/usr/lib64/python3.6/xmlrpc/client.py", line 524, in __dump
291033     f(self, value, write)
291034   File "/usr/lib64/python3.6/xmlrpc/client.py", line 615, in dump_instance
291035     self.dump_struct(value.__dict__, write)
291036   File "/usr/lib64/python3.6/xmlrpc/client.py", line 595, in dump_struct
291037     dump(v, write)
291038   File "/usr/lib64/python3.6/xmlrpc/client.py", line 514, in __dump
291039     raise TypeError("cannot marshal %s objects" % type(value))
291040 TypeError: cannot marshal <class 'getset_descriptor'> objects

bsolomon1124 commented Aug 29, 2018

maintail from /tmp/supervisor.log after setting loglevel=debug in supervisord.conf:

2018-08-29 12:20:07,669 DEBG fd 18 closed, stopped monitoring <POutputDispatcher at 140356791694952 for <Subprocess at 140356791551032 with name celery_master in state RUNNING> (stdout)>
2018-08-29 12:20:07,669 DEBG fd 21 closed, stopped monitoring <POutputDispatcher at 140356791464232 for <Subprocess at 140356791551032 with name celery_master in state RUNNING> (stderr)>
2018-08-29 12:20:07,669 INFO exited: celery_master (exit status 1; not expected)
2018-08-29 12:20:07,669 DEBG received SIGCHLD indicating a child quit

@mnaberez mnaberez reopened this Sep 7, 2018

@mnaberez mnaberez changed the title from Supervisor programs start and then quickly exit to TypeError: cannot marshal <class 'getset_descriptor'> objects Sep 7, 2018

@mnaberez mnaberez added python 3 and removed question labels Sep 7, 2018


Member

mnaberez commented Sep 7, 2018

Log from above wrapped for readability:

291008 2018-08-29 10:45:37,195 CRIT Handling XML-RPC request with data 
"<?xml version='1.0'?>\n<methodCall>\n
<methodName>supervisor.getAllConfigInfo</methodName>\n
<params>\n</params>\n</methodCall>\n
" raised an unexpected exception: Traceback (most recent call last):

The exception occurred while handling a call to getAllConfigInfo(). The log shows it's running under Python 3.6, so this must not be a release version of Supervisor. On current git master (66038a1), it looks like the only place that supervisorctl calls getAllConfigInfo() is in the avail command. This bug is likely to be Python 3 specific.
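
For illustration, here is a minimal standalone sketch (not Supervisor code) of how xmlrpc.client refuses to marshal a value containing a getset_descriptor; if such a descriptor leaks into the getAllConfigInfo() result, the server hits the same TypeError shown in the log. The 'leaked' key below is purely hypothetical:

import xmlrpc.client

# Any getset_descriptor will do; here we borrow float's 'real' slot descriptor.
desc = vars(float)['real']
print(type(desc))   # <class 'getset_descriptor'>

try:
    # Marshaller.__dump() has no handler for descriptor types, so dumps()
    # raises the TypeError quoted above.
    xmlrpc.client.dumps(({'command': '/bin/cat', 'leaked': desc},),
                        methodresponse=True)
except TypeError as exc:
    print(exc)      # cannot marshal <class 'getset_descriptor'> objects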


bsolomon1124 commented Sep 7, 2018

Yes, actually I do remember seeing a snippet of this traceback when I called avail from within the supervisorctl shell.

That said, it seems like this exception might be unrelated to the actual core of this issue, which is the Celery workers suddenly quitting.

I mentioned in an issue on the Celery page that downgrading from Celery 4.2.1 to 3.1.x seemed to solve the issue. (#5035 is possibly related as well.)

Thanks for reopening, but at this point I would venture to say it looks to be more related to Celery than to supervisord.


Member

mnaberez commented Sep 7, 2018

Reproduce Instructions

Create a config file supervisord.conf:

[supervisord]
loglevel=debug 

[inet_http_server]
port=127.0.0.1:9001        

[supervisorctl]
serverurl=http://127.0.0.1:9001 

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

Start supervisord in the foreground with that config:

$ supervisord -n -c supervisord.conf

In a new terminal, add a program to the config file, then run reread and avail:

$ printf '[program:cat]\ncommand=/bin/cat\n' >> supervisord.conf
$ supervisorctl -c supervisord.conf
supervisor> reread
cat: available
supervisor> avail
error: <class 'xmlrpc.client.ProtocolError'>, <ProtocolError for 127.0.0.1/RPC2: 500 Internal Server Error>

The exception in the supervisord log will be the same as the one above. This bug only appears to happen on Python 3, and only when a new program is added to the config file.
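
The same failure can also be triggered without supervisorctl by issuing the XML-RPC calls directly. A minimal sketch against the config above (assuming supervisord is listening on 127.0.0.1:9001 and the [program:cat] section has already been appended):

import xmlrpc.client

# Talk to the inet_http_server from the reproduce config.
server = xmlrpc.client.ServerProxy('http://127.0.0.1:9001/RPC2')

server.supervisor.reloadConfig()          # what supervisorctl "reread" calls
try:
    server.supervisor.getAllConfigInfo()  # the call behind "avail"
except xmlrpc.client.ProtocolError as err:
    # supervisord fails to marshal its response and answers 500, so the client
    # sees a ProtocolError; the TypeError traceback lands in the supervisord log.
    print(err.errcode, err.errmsg)        # 500 Internal Server Error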

@mnaberez mnaberez closed this in abef0a2 Sep 7, 2018
