Server closed the connection unexpectedly #46

Closed
ghost opened this Issue Apr 14, 2011 · 24 comments


ghost commented Apr 14, 2011

I'm having a weird issue with PostgreSQL (8.4.7), celery (2.2.5), and Django (1.2.3).

Suddenly all my celery tasks that touch the database are randomly raising:

DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

Sometimes a task fails 3 times and then works just fine; other times it fails randomly.

If I run:

$ python manage.py shell

and create an object and save it to the database, everything works fine.
Here's the traceback:

[2011-04-14 16:59:43,071: INFO/MainProcess] Got task from broker:   WebTools.controlpanel.tasks.discoverDeviceTask[15dcea93-d3ff-416d-aedd-bc6fe80a6f6b]
[2011-04-14 16:59:43,122: WARNING/PoolWorker-25] server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
[2011-04-14 16:59:43,126: ERROR/MainProcess] Task WebTools.controlpanel.tasks.discoverDeviceTask[15dcea93-d3ff-416d-aedd-bc6fe80a6f6b] raised exception: DatabaseError('server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/celery/execute/trace.py", line 34, in trace
    return cls(states.SUCCESS, retval=fun(*args, **kwargs))
  File "/usr/local/lib/python2.6/dist-packages/celery/task/base.py", line 234, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/ast-tools/WebTools/../WebTools/controlpanel/tasks.py", line 54, in wrapper
    IP.objects.filter(ip=ip).update(busy="N")
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/query.py", line 467, in update
    rows = query.get_compiler(self.db).execute_sql(None)
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 861, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 727, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/dist-packages/django/db/backends/util.py", line 15, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/dist-packages/django/db/backends/postgresql_psycopg2/base.py", line 50, in execute
    return self.cursor.execute(query, args)
DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

Here's database log:

2011-04-14 18:26:37 CEST LOG:  connection received: host=127.0.0.1 port=40295
2011-04-14 18:26:37 CEST LOG:  connection authorized: user=ast database=astdb
2011-04-14 18:26:37 CEST LOG:  disconnection: session time: 0:00:00.044 user=ast database=astdb host=127.0.0.1 port=40295
2011-04-14 18:26:54 CEST LOG:  statement: UPDATE "djcelery_workerstate" SET "hostname" = E'ast-tool01', "last_heartbeat" = E'2011-04-14 18:26:52.081696' WHERE "djcelery_workerstate"."id" = 2 
2011-04-14 18:27:02 CEST LOG:  statement: INSERT INTO "djcelery_taskstate" ("state", "task_id", "name", "tstamp", "args", "kwargs", "eta", "expires", "result", "traceback", "runtime", "worker_id", "hidden") VALUES (E'FAILURE', E'352493f7-1fe5-4216-92aa-da44478233d2', E'WebTools.TestingReports.tasks.parseTestResultTaskv2', E'2011-04-14 18:27:00.068098', E'[u''/media/sdb/AST_TestLogs/testresults/'']', E'{}', NULL, NULL, E'DatabaseError(''server closed the connection unexpectedly\\n\\tThis probably means the server terminated abnormally\\n\\tbefore or while processing the request.\\n'',)', E'Traceback (most recent call last):
      File "/usr/local/lib/python2.6/dist-packages/celery/execute/trace.py", line 34, in trace
        return cls(states.SUCCESS, retval=fun(*args, **kwargs))
      File "/usr/local/lib/python2.6/dist-packages/celery/task/base.py", line 234, in __call__
        return self.run(*args, **kwargs)
      File "/usr/local/ast-tools/WebTools/../WebTools/TestingReports/tasks.py", line 45, in wrapper
        return func(*args, **kwargs)
      File "/usr/local/ast-tools/WebTools/../WebTools/TestingReports/tasks.py", line 243, in run
        tr.save()
      File "/usr/local/ast-tools/WebTools/../WebTools/TestingReports/models.py", line 60, in save
        super(TestReport, self).save(*args, **kwargs)
      File "/usr/local/lib/python2.6/dist-packages/django/db/models/base.py", line 434, in save
        self.save_base(using=using, force_insert=force_insert, force_update=force_update)
      File "/usr/local/lib/python2.6/dist-packages/django/db/models/base.py", line 527, in save_base
        result = manager._insert(values, return_id=update_pk, using=using)
      File "/usr/local/lib/python2.6/dist-packages/django/db/models/manager.py", line 195, in _insert
        return insert_query(self.model, values, **kwargs)
      File "/usr/local/lib/python2.6/dist-packages/django/db/models/query.py", line 1479, in insert_query
        return query.get_compiler(using=using).execute_sql(return_id)
      File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 783, in execute_sql
        cursor = super(SQLInsertCompiler, self).execute_sql(None)
      File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 727, in execute_sql
        cursor.execute(sql, params)
      File "/usr/local/lib/python2.6/dist-packages/django/db/backends/postgresql_psycopg2/base.py", line 50, in execute
        return self.cursor.execute(query, args)
    DatabaseError: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
    ', NULL, 2, false)

Does anybody know what's going on?
Thanks

ask commented Apr 14, 2011

On Thursday, April 14, 2011 at 5:31 PM, PonasNiekas wrote:

I'm having a weird issue with PostgreSQL (8.4.7) and celery (2.2.5) and Django (1.2.3).

Suddenly all my celery tasks, that are touching the database, are raising:

DatabaseError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Check the postgresql logs maybe? Often the reason is found in the logs.


ghost commented Apr 14, 2011

Hi ask, I updated my post with the database log; the connections are often closed immediately after being opened. So the question is why, as I'm not doing anything with connections/transactions or the like, just creating/updating objects.
That leaves the Django ORM and celery, I guess :/.

ask commented Apr 15, 2011

Does it work better with Celery 2.2.4?

If not, then I'm not sure we can rule out some other error just yet. What does "unexpectedly close the connection" mean here? Celery doesn't close the connection until after the task returns, so maybe the query timed out or similar? (Is the query very expensive?)

ghost commented Apr 17, 2011

I started having the problem while using Celery 2.2.3, then upgraded to 2.2.5 with no change.

Just to clarify:

  • reading from database works fine everywhere
  • reading/writing/modifying data in Django admin works fine
  • reading data in celery tasks works fine
  • writing/modifying data in celery tasks fails randomly with: "server closed the connection unexpectedly"

The queries that modify data are simple, and the amount of data is small.

After some debugging, I believe the request_finished signal (which closes the db connection) is being sent before the task has actually finished. (Everything works fine if I comment out the connection-closing code in django/db/__init__.py.)

Is there any way to find out which task emits the request_finished signal, and when it does so?
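One way to find out is to wrap the function the signal handler invokes so that every call prints a stack trace showing who triggered it. In Django 1.2 the receiver connected to request_finished is the close_connection function in django/db/__init__.py; that target is an assumption about this setup, but the tracing wrapper itself is a framework-free sketch:

```python
import traceback

def trace_calls(fn):
    """Wrap fn so every invocation prints its caller and a full stack
    trace, then runs fn normally. Useful for finding which code path
    fires a signal receiver such as Django's connection-closing
    close_connection handler (an assumed target for this setup)."""
    def wrapper(*args, **kwargs):
        # The last frame is this extract_stack call; [-2] is our caller.
        caller = traceback.extract_stack()[-2]
        print("%s called from %s:%s" % (getattr(fn, "__name__", fn),
                                        caller[0], caller[1]))
        traceback.print_stack()
        return fn(*args, **kwargs)
    return wrapper
```

Applied as `close_connection = trace_calls(close_connection)` before the worker starts, each premature emission of request_finished would then log the frame, and hence the task, that triggered it.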

ghost commented Apr 19, 2011

It looks like the source of the error I'm getting is in the psycopg2 module itself. For some reason it's raising DatabaseError: server closed the connection unexpectedly, even though the database logs show no disconnection entry.

I moved the data to MySQL and everything works as expected.

tasks.py

class TestTask(Task):
    ignore_result = True

    def run(self):
        IP.objects.filter(ip="0.0.1.1").update(status=time.strftime("%Y.%m.%d %H:%M:%S"))
        return "ok"

Execution (first time the task runs fine):

In [4]: from WebTools.controlpanel.tasks import TestTask
In [5]: TestTask.apply_async()
Out[5]: <AsyncResult: f2f82ed2-f79e-4fdd-a40b-629f6ac33517>
In [6]: TestTask.apply_async()
Out[6]: <AsyncResult: 7a65f3c2-7e50-4dd2-b2ca-5dabe8b2f508>

celeryd.log

[2011-04-19 11:38:20,870: INFO/MainProcess] Got task from broker: WebTools.controlpanel.tasks.TestTask[f2f82ed2-f79e-4fdd-a40b-629f6ac33517]
[2011-04-19 11:38:21,095: INFO/MainProcess] Task WebTools.controlpanel.tasks.TestTask[f2f82ed2-f79e-4fdd-a40b-629f6ac33517] succeeded in 0.181342840195s: 'ok'
[2011-04-19 11:38:29,358: INFO/MainProcess] Got task from broker: WebTools.controlpanel.tasks.TestTask[7a65f3c2-7e50-4dd2-b2ca-5dabe8b2f508]
[2011-04-19 11:38:29,438: ERROR/MainProcess] Task WebTools.controlpanel.tasks.TestTask[7a65f3c2-7e50-4dd2-b2ca-5dabe8b2f508] raised exception: DatabaseError('server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/celery/execute/trace.py", line 34, in trace
    return cls(states.SUCCESS, retval=fun(*args, **kwargs))
  File "/usr/local/lib/python2.6/dist-packages/celery/task/base.py", line 241, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/ast-tools/WebTools/../WebTools/controlpanel/tasks.py", line 352, in run
    IP.objects.filter(ip="0.0.1.1").update(status=time.strftime("%Y.%m.%d %H:%M:%S"))
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/query.py", line 467, in update
    rows = query.get_compiler(self.db).execute_sql(None)
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 861, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 727, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/dist-packages/django/db/backends/util.py", line 15, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/dist-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

None

PostgreSQL logs (from the run where the task succeeded):

2011-04-19 11:38:20 CEST 4dad5505.678 0 LOG:  statement: UPDATE "controlpanel_ip" SET "status" = E'2011.04.19 11:38:20' WHERE HOST("controlpanel_ip"."ip") = E'0.0.1.1' 
2011-04-19 11:38:21 CEST 4dad5507.69d 0 LOG:  statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
2011-04-19 11:38:21 CEST 4dad5507.69d 0 LOG:  statement: SELECT "djcelery_taskstate"."id",     "djcelery_taskstate"."state", "djcelery_taskstate"."task_id", "djcelery_taskstate"."name", "djcelery_taskstate"."tstamp", "djcelery_taskstate"."args", "djcelery_taskstate"."kwargs", "djcelery_taskstate"."eta", "djcelery_taskstate"."expires", "djcelery_taskstate"."result", "djcelery_taskstate"."traceback", "djcelery_taskstate"."runtime", "djcelery_taskstate"."worker_id", "djcelery_taskstate"."hidden" FROM "djcelery_taskstate" WHERE "djcelery_taskstate"."task_id" = E'f2f82ed2-f79e-4fdd-a40b-629f6ac33517' 
2011-04-19 11:38:21 CEST 4dad5505.678 1117441 LOG:  statement: COMMIT
2011-04-19 11:38:21 CEST 4dad5505.678 0 LOG:  disconnection: session time: 0:12:55.559 user=ast database=astdb host=10.47.247.155 port=36178
2011-04-19 11:38:21 CEST 4dad5507.69d 0 LOG:  statement: SELECT "djcelery_taskstate"."id", "djcelery_taskstate"."state", "djcelery_taskstate"."task_id", "djcelery_taskstate"."name", "djcelery_taskstate"."tstamp", "djcelery_taskstate"."args", "djcelery_taskstate"."kwargs", "djcelery_taskstate"."eta", "djcelery_taskstate"."expires", "djcelery_taskstate"."result", "djcelery_taskstate"."traceback", "djcelery_taskstate"."runtime", "djcelery_taskstate"."worker_id", "djcelery_taskstate"."hidden" FROM "djcelery_taskstate" WHERE "djcelery_taskstate"."task_id" = E'f2f82ed2-f79e-4fdd-a40b-629f6ac33517' 
2011-04-19 11:38:21 CEST 4dad5507.69d 0 LOG:  statement: SAVEPOINT s1225172112_x1
2011-04-19 11:38:21 CEST 4dad5507.69d 0 LOG:  statement: INSERT INTO "djcelery_taskstate" ("state", "task_id", "name", "tstamp", "args", "kwargs", "eta", "expires", "result", "traceback", "runtime", "worker_id", "hidden") VALUES (E'STARTED', E'f2f82ed2-f79e-4fdd-a40b-629f6ac33517', E'WebTools.controlpanel.tasks.TestTask', E'2011-04-19 11:38:20.914830', E'[]', E'{}', NULL, NULL, NULL, NULL, NULL, 1, false)
2011-04-19 11:38:21 CEST 4dad5507.69d 1117442 LOG:  statement: SELECT CURRVAL('"djcelery_taskstate_id_seq"')
2011-04-19 11:38:21 CEST 4dad5507.69d 1117442 LOG:  statement: RELEASE SAVEPOINT s1225172112_x1
2011-04-19 11:38:21 CEST 4dad5507.69d 1117442 LOG:  statement: COMMIT
2011-04-19 11:38:23 CEST 4dad5507.69d 0 LOG:  statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
2011-04-19 11:38:23 CEST 4dad5507.69d 0 LOG:  statement: SELECT "djcelery_taskstate"."id", "djcelery_taskstate"."state", "djcelery_taskstate"."task_id", "djcelery_taskstate"."name", "djcelery_taskstate"."tstamp", "djcelery_taskstate"."args", "djcelery_taskstate"."kwargs", "djcelery_taskstate"."eta", "djcelery_taskstate"."expires", "djcelery_taskstate"."result", "djcelery_taskstate"."traceback", "djcelery_taskstate"."runtime", "djcelery_taskstate"."worker_id", "djcelery_taskstate"."hidden" FROM "djcelery_taskstate" WHERE "djcelery_taskstate"."task_id" = E'f2f82ed2-f79e-4fdd-a40b-629f6ac33517' 
2011-04-19 11:38:23 CEST 4dad5507.69d 0 LOG:  statement: SELECT (1) AS "a" FROM "djcelery_taskstate" WHERE "djcelery_taskstate"."id" = 98785  LIMIT 1
2011-04-19 11:38:23 CEST 4dad5507.69d 0 LOG:  statement: UPDATE "djcelery_taskstate" SET "state" = E'SUCCESS', "task_id" = E'f2f82ed2-f79e-4fdd-a40b-629f6ac33517', "name" = E'WebTools.controlpanel.tasks.TestTask', "tstamp" = E'2011-04-19 11:38:21.095244', "args" = E'[]', "kwargs" = E'{}', "eta" = NULL, "expires" = NULL, "result" = E'''ok''', "traceback" = NULL, "runtime" = 0.18134284019470215, "worker_id" = 1, "hidden" = false WHERE "djcelery_taskstate"."id" = 98785 
2011-04-19 11:38:23 CEST 4dad5507.69d 1117444 LOG:  statement: COMMIT

Second run, where it fails:

2011-04-19 11:38:31 CEST 4dad5507.69d 0 LOG:  statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
2011-04-19 11:38:31 CEST 4dad5507.69d 0 LOG:  statement: SELECT "djcelery_workerstate"."id", "djcelery_workerstate"."hostname", "djcelery_workerstate"."last_heartbeat" FROM "djcelery_workerstate" WHERE "djcelery_workerstate"."hostname" = E'ast-tool02' 
2011-04-19 11:38:31 CEST 4dad5507.69d 0 LOG:  statement: SELECT (1) AS "a" FROM "djcelery_workerstate" WHERE "djcelery_workerstate"."id" = 1  LIMIT 1
2011-04-19 11:38:31 CEST 4dad5507.69d 0 LOG:  statement: UPDATE "djcelery_workerstate" SET "hostname" = E'ast-tool02', "last_heartbeat" = E'2011-04-19 11:38:29.438096' WHERE "djcelery_workerstate"."id" = 1 
2011-04-19 11:38:31 CEST 4dad5507.69d 1117445 LOG:  statement: COMMIT
2011-04-19 11:38:31 CEST 4dad5507.69d 0 LOG:  statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
2011-04-19 11:38:31 CEST 4dad5507.69d 0 LOG:  statement: SELECT "djcelery_taskstate"."id", "djcelery_taskstate"."state", "djcelery_taskstate"."task_id", "djcelery_taskstate"."name", "djcelery_taskstate"."tstamp", "djcelery_taskstate"."args", "djcelery_taskstate"."kwargs", "djcelery_taskstate"."eta", "djcelery_taskstate"."expires", "djcelery_taskstate"."result", "djcelery_taskstate"."traceback", "djcelery_taskstate"."runtime", "djcelery_taskstate"."worker_id", "djcelery_taskstate"."hidden" FROM "djcelery_taskstate" WHERE "djcelery_taskstate"."task_id" = E'7a65f3c2-7e50-4dd2-b2ca-5dabe8b2f508' 
2011-04-19 11:38:31 CEST 4dad5507.69d 0 LOG:  statement: SELECT "djcelery_taskstate"."id", "djcelery_taskstate"."state", "djcelery_taskstate"."task_id", "djcelery_taskstate"."name", "djcelery_taskstate"."tstamp", "djcelery_taskstate"."args", "djcelery_taskstate"."kwargs", "djcelery_taskstate"."eta", "djcelery_taskstate"."expires", "djcelery_taskstate"."result", "djcelery_taskstate"."traceback", "djcelery_taskstate"."runtime", "djcelery_taskstate"."worker_id", "djcelery_taskstate"."hidden" FROM "djcelery_taskstate" WHERE "djcelery_taskstate"."task_id" = E'7a65f3c2-7e50-4dd2-b2ca-5dabe8b2f508' 
2011-04-19 11:38:31 CEST 4dad5507.69d 0 LOG:  statement: SAVEPOINT s1225172112_x1
2011-04-19 11:38:31 CEST 4dad5507.69d 0 LOG:  statement: INSERT INTO "djcelery_taskstate" ("state", "task_id", "name", "tstamp", "args", "kwargs", "eta", "expires", "result", "traceback", "runtime", "worker_id", "hidden") VALUES (E'FAILURE', E'7a65f3c2-7e50-4dd2-b2ca-5dabe8b2f508', E'WebTools.controlpanel.tasks.TestTask', E'2011-04-19 11:38:29.438096', E'[]', E'{}', NULL, NULL, E'DatabaseError(''server closed the connection unexpectedly\\n\\tThis probably means the server terminated abnormally\\n\\tbefore or while processing the request.\\n'',)', E'Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/celery/execute/trace.py", line 34, in trace
    return cls(states.SUCCESS, retval=fun(*args, **kwargs))
  File "/usr/local/lib/python2.6/dist-packages/celery/task/base.py", line 241, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/ast-tools/WebTools/../WebTools/controlpanel/tasks.py", line 352, in run
    IP.objects.filter(ip="0.0.1.1").update(status=time.strftime("%Y.%m.%d %H:%M:%S"))
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/query.py", line 467, in update
    rows = query.get_compiler(self.db).execute_sql(None)
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 861, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 727, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/dist-packages/django/db/backends/util.py", line 15, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/dist-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

', NULL, 1, false)
2011-04-19 11:38:31 CEST 4dad5507.69d 1117446 LOG:  statement: SELECT CURRVAL('"djcelery_taskstate_id_seq"')
2011-04-19 11:38:31 CEST 4dad5507.69d 1117446 LOG:  statement: RELEASE SAVEPOINT s1225172112_x1
2011-04-19 11:38:31 CEST 4dad5507.69d 1117446 LOG:  statement: COMMIT

All this with latest psycopg2 (2.4).

chase-seibert commented Jun 1, 2011

Having the same issue. Also PostgreSQL 8.4.7, but with Django 1.3 and the latest celery. So far it's only happening on celery startup with scheduled tasks.

Edit: I resolved it on my local machine with a monkey patch in loaders.py:

# inside DjangoLoader
def on_task_init(self, task_id, task):
    self.close_database()
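For what it's worth, in later Celery/Django versions the usual equivalent of this patch is connecting `django.db.close_old_connections` to the `celery.signals.task_prerun` signal. The mechanism can be sketched without either library installed; `Signal`, `task_prerun` and `reset_db_connection` below are illustrative stand-ins, not Celery's implementation:

```python
# Minimal stand-in for the "close stale DB connections before every task"
# idea. In a real project the handler would call
# django.db.close_old_connections() and the signal would be
# celery.signals.task_prerun; the names here are illustrative only.

class Signal:
    def __init__(self):
        self._handlers = []

    def connect(self, fn):
        self._handlers.append(fn)
        return fn  # returning fn lets connect() work as a decorator

    def send(self, **kwargs):
        for fn in self._handlers:
            fn(**kwargs)

task_prerun = Signal()  # stands in for celery.signals.task_prerun
closed = []             # records which tasks triggered a connection reset

@task_prerun.connect
def reset_db_connection(task_id=None, **kwargs):
    # In Django this would be close_old_connections(), which discards
    # connections that are broken or older than CONN_MAX_AGE.
    closed.append(task_id)

task_prerun.send(task_id="abc")
assert closed == ["abc"]
```

The real handler is only a few lines: import the signal and `close_old_connections`, and connect one to the other in a module Celery imports at startup.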
aldarund commented Jun 28, 2011

Faced this issue too.
The fix of patching loaders.py works fine for me.

bradleyayers commented Jul 19, 2011

This issue is affecting me too.

@ask ask closed this in 3395d4e Jul 19, 2011

ask commented Jul 19, 2011

Ok, I added the patch given by @chase-seibert. Really would like to know why it's needed though.

sbywater commented Sep 8, 2011

We began getting this problem recently, after a significant code change. This happens with only some tasks, only with some arguments. For a task that fails, that task with the same arguments will always fail. These tasks have all been working well until now. This is our stack:

celery 2.3.2
djcelery 2.3.3
PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit
psycopg2 2.4.2
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
django 1.3.0

celeryd.log, slightly sanitized:

[2011-09-08 11:33:10,715: CRITICAL/MainProcess] Internal error <class 'django.db.utils.DatabaseError'>: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

Traceback (most recent call last):
  File "/home/django/myvirtualenv/lib/python2.6/site-packages/celery/worker/__init__.py", line 264, in process_task
    self.loglevel, self.logfile)
  File "/home/django/myvirtualenv/lib/python2.6/site-packages/celery/app/task/__init__.py", line 688, in execute
    request.execute_using_pool(pool, loglevel, logfile)
  File "/home/django/myvirtualenv/lib/python2.6/site-packages/celery/worker/job.py", line 366, in execute_using_pool
    timeout=self.task.time_limit)
  File "/home/django/myvirtualenv/lib/python2.6/site-packages/celery/concurrency/base.py", line 84, in apply_async
    target, args, kwargs))
  File "/home/django/myvirtualenv/src/django/django/db/models/base.py", line 370, in __repr__
    u = unicode(self)
  File "/home/django/myvirtualenv/myproject/myapp/models/thing_models.py", line 183, in __unicode__
    self.thing_type.thing_type_name, self.id,
  File "/home/django/myvirtualenv/src/django/django/db/models/fields/related.py", line 315, in __get__
    rel_obj = QuerySet(self.field.rel.to).using(db).get(**params)
  File "/home/django/myvirtualenv/src/django/django/db/models/query.py", line 344, in get
    num = len(clone)
  File "/home/django/myvirtualenv/src/django/django/db/models/query.py", line 82, in __len__
    self._result_cache = list(self.iterator())
  File "/home/django/myvirtualenv/src/django/django/db/models/query.py", line 273, in iterator
    for row in compiler.results_iter():
  File "/home/django/myvirtualenv/src/django/django/db/models/sql/compiler.py", line 680, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/home/django/myvirtualenv/src/django/django/db/models/sql/compiler.py", line 735, in execute_sql
    cursor.execute(sql, params)
  File "/home/django/myvirtualenv/src/django/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

How this error looks in PostgreSQL logs:

2011-09-08 13:37:52 UTC LOG:  SSL error: decryption failed or bad record mac
2011-09-08 13:37:52 UTC LOG:  could not receive data from client: Connection reset by peer
2011-09-08 13:37:52 UTC LOG:  unexpected EOF on client connection
sbywater commented Sep 9, 2011

Update: we were able to fix this by reinstalling the latest version of celery.

vibrant commented Oct 6, 2011

I had the same issue and upgrading to the newest celery, django-celery and psycopg didn't help. What helped was making one of my tasks stop calling sys.exit(-1). Turns out celery is not happy about such things ;)

mixmastamyk commented Dec 13, 2011

Having the same issue with postgres 9.1 and pgbouncer. Upgraded celery, djcelery and psycopg2 to the newest versions. The addition to loaders.py didn't help.

danawoodman commented May 10, 2012

For those that are Googling for this issue: Make sure you're running Django Celery 2.5.5+, that fixes the issue.

dpedowitz commented Nov 17, 2012

I'm running celery 3.0.11, no django and this is happening for me. I've attempted to add a custom loader but the loader does not appear to be registered.

mattmcgrew commented Feb 24, 2014

We recently ran into this issue (we're stuck on an older version of Celery) and it turned out to be a (admittedly rather dumb) non-lazy query in a Django form class definition's form field. The form was imported before the process forked, creating a connection in the parent that was then trashed in the child process. We changed the query to be lazy and it fixed the issue. We never noticed it before because the form was never imported early enough to cause the issue elsewhere. We started seeing it when we added a new task that happened to import something that imported something that imported the form in question.
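The eager-vs-lazy pattern described above can be illustrated without Django; `FakeQuerySet`, `EagerForm` and `LazyForm` are hypothetical stand-ins for a queryset and form field definitions, not the commenter's actual code:

```python
class FakeQuerySet:
    """Stand-in for a Django queryset: iterating it 'hits the database'."""
    def __init__(self):
        self.evaluated = False

    def __iter__(self):
        self.evaluated = True  # simulates opening/using a DB connection
        return iter([("1", "one")])

eager_qs = FakeQuerySet()

# BAD: list(...) in a class body evaluates the query at import time --
# in the parent process, before the worker forks -- so every child ends
# up sharing (and trashing) the parent's connection.
class EagerForm:
    choices = list(eager_qs)

assert eager_qs.evaluated  # already ran, before any fork could happen

lazy_qs = FakeQuerySet()

# GOOD: pass the queryset unevaluated (or use a callable), so the query
# only runs when the form is used, on the forked worker's own connection.
class LazyForm:
    choices = lazy_qs

assert not lazy_qs.evaluated
```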

DylanYoung commented Aug 19, 2016

This is still an issue in 3.0.17. Just had Celery go down on 6 instances running Django 1.3. Could be that a newer version of Django fixes it (none of our 1.6 instances went down).

auvipy commented Aug 19, 2016

Try to upgrade to Django 1.8.

DylanYoung commented Aug 28, 2016

If only that were as easy as you make it sound. Working on it though.

auvipy commented Aug 29, 2016

It is not a good idea to run unsupported software frameworks.

mrgaolei commented May 3, 2017

@ask This bug reproduces on my server.

Django: 1.9.7
Celery: 4.0.2
PostgreSQL: 9.4


Revocsid commented Jun 26, 2017

Same problem here with periodic tasks that need to fetch / update models stored in my PostgreSQL db.

Django : 1.11.1
Psycopg2 : 2.7.1
Celery : 4.0.2
PostgreSQL : 9.6.2

goinnn commented Jun 29, 2017

Me too, is it related?

psycopg/psycopg2#443

This will be fixed in the next psycopg2 release (2.7.2)

dojeda commented Feb 22, 2018

I've stumbled on this exact problem. Difficult to pin down since I could not reproduce it at first.

To reproduce it, I created a dummy task that would query the database with an artificially long query. Something like

@shared_task
def dummy():
    result = list(SomeModel.objects.raw('SELECT id, pg_sleep(1) FROM some_table LIMIT 10'))

This task keeps a database connection busy for about 10 seconds (pg_sleep(1) runs once per returned row).
Meanwhile, from a separate session, I severed that connection while the task was running with:

SELECT pid, pg_terminate_backend(pid), datid, client_addr, client_hostname, state FROM pg_stat_get_activity(NULL::integer) 
WHERE datid=(SELECT oid from pg_database where datname = 'django_db');

Where django_db is the database that I have for my Django application.

This procedure reproduced my problem and gave me something to work with. My best workaround is to add django.db.utils.OperationalError to the autoretry_for of my task. Since I had many tasks, and was already using a different base Task object, and did not want to change them all, I wrapped the celery.shared_task decorator:

import celery
import django.db.utils
def shared_task(*args, **kwargs):
    """Task decorator for robust tasks

    Robust tasks are tasks that will be retried 5 times if one of the
    exceptions in ...
    """
    kwargs['base'] = kwargs.get('base', None) or BaseTask  # BaseTask is my abstract base class that has some custom logging logic
    # Add the django.db.utils.OperationalError in the auto retry list of
    # exceptions. It seems that postgres is closing some connections and this
    # makes the task fail
    kwargs['autoretry_for'] = kwargs.get('autoretry_for', ()) + (django.db.utils.OperationalError,)
    if 'max_retries' not in kwargs:
        kwargs['max_retries'] = 5
    if 'retry_kwargs' not in kwargs:
        # autoretry_for hands retry_kwargs to Task.retry(), which accepts
        # countdown; a bare countdown kwarg is not a recognized task option.
        # TODO: the countdown might not be working on celery 4.1.0 ?
        # I haven't had any problems yet, but keep an eye on this issue:
        # https://github.com/celery/celery/issues/4221
        kwargs['retry_kwargs'] = {'countdown': 10}
    return celery.shared_task(*args, **kwargs)
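The retry behaviour this wrapper leans on can be sketched without Celery installed. `autoretry` and `flaky_task` below are stand-ins (Celery's real `autoretry_for` re-schedules the task via `Task.retry()` rather than looping in-process):

```python
import functools

class OperationalError(Exception):
    """Stand-in for django.db.utils.OperationalError."""

def autoretry(exceptions, max_retries=5):
    # Minimal in-process stand-in for autoretry_for: catch the listed
    # exceptions and call the task again, up to max_retries extra attempts.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the error
        return wrapper
    return decorator

calls = []

@autoretry((OperationalError,), max_retries=5)
def flaky_task():
    calls.append(1)
    if len(calls) < 3:  # fail twice, then succeed -- like a dropped connection
        raise OperationalError("server closed the connection unexpectedly")
    return "ok"

assert flaky_task() == "ok"
assert len(calls) == 3  # two failures + one success
```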