TypeError in main celery process #171
I was viewing #138 at the time and didn't notice that it was an issue in the kombu repo. I meant to submit this to the celery repo. I've not managed to reproduce this issue when running a single worker process.
Kombu is the right place to report this issue, so that's fine. Would you be able to print out the value it receives? (The value of response in "redis/client.py", line 128, in zset_score_pairs.)
The response value is 1.
I added logging to redis-py to track everything sent/received and on which socket fd to catch this in action:
I tracked this down to a race between the main celery thread, where the BRPOP command is sent, and the celery heartbeat thread, where a PUBLISH command is sent. If _in_poll is still False, the heartbeat thread will not use a new client. It should be True while blocking on BRPOP, but _in_poll is only set to True after BRPOP has already been sent, which leaves a window in which both threads write to the same connection.
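The interleaving can be sketched without a real Redis server. The toy `SingleConnection` class below is entirely hypothetical (none of these names come from kombu or redis-py); it models one shared socket whose replies come back in command-send order, so whichever thread reads next takes the oldest pending reply, even if it belongs to the other thread:

```python
from collections import deque

class SingleConnection:
    """Toy model of one shared Redis socket: replies arrive in the
    order commands were sent, regardless of which thread reads."""
    def __init__(self):
        self.replies = deque()

    def send(self, command, reply):
        # The server will eventually answer this command with `reply`.
        self.replies.append(reply)

    def read(self):
        # Whoever reads next gets the OLDEST pending reply.
        return self.replies.popleft()

conn = SingleConnection()

# Main thread sends BRPOP but has not read the reply yet
# (_in_poll is only flipped to True after the command is on the wire).
conn.send("BRPOP", reply=("celery", "{...task payload...}"))

# Heartbeat thread still sees _in_poll == False, reuses the same
# connection, sends PUBLISH, and reads what it thinks is its reply.
conn.send("PUBLISH", reply=1)
heartbeat_reply = conn.read()   # gets the BRPOP payload instead

# The main thread is then left with the integer 1 meant for PUBLISH,
# which later blows up when parsed as something else entirely.
main_reply = conn.read()

print(heartbeat_reply)  # ('celery', '{...task payload...}')
print(main_reply)       # 1
```

This matches the symptom above: a command's parser receiving the response value 1 that actually belonged to a different command.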
There is still a problem when PUBLISH and ZREVRANGEBYSCORE are interleaved.
This removes a race condition where _avail_client can return a polling client instead of a new client. Fixes issue #171
I think ZREVRANGEBYSCORE is slightly different. redis.Redis.execute_command pops a connection from the list of available connections or creates a new connection; AFAIK list.pop is atomic. KombuRedis overrides this to use a single connection created in __init__, so it's no longer safe to run redis commands from multiple threads. Actually, now that I think about it, this was the issue with BRPOP too.
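To illustrate why the pooled base class is safe where the single shared connection is not, here is a minimal stand-in pool (illustrative names only, not the actual redis-py classes). `list.pop()` and `list.append()` are atomic under CPython's GIL, so each concurrent caller ends up with a private connection:

```python
class ConnectionPool:
    """Minimal sketch of the pop-a-connection pattern described
    above (hypothetical class, not redis-py's implementation)."""
    def __init__(self):
        self._available = []

    def get_connection(self):
        try:
            return self._available.pop()   # atomic under the GIL
        except IndexError:
            return object()                # stand-in for a new socket

    def release(self, connection):
        self._available.append(connection)

pool = ConnectionPool()
a = pool.get_connection()   # pool empty -> new connection
b = pool.get_connection()   # still empty -> another new connection
assert a is not b           # two threads each get their own socket

pool.release(a)
c = pool.get_connection()   # a released connection is reused
assert c is a
```

With one connection per in-flight command there is nothing to interleave, which is exactly what the KombuRedis override takes away.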
The author of redis-py wrote the KombuRedis class. I'm not sure why it's there, but I think it was due to changes in the redis-py API.
Yeah, I spotted that in 7c362f6.
I've taken the easy option and used rabbitmq as the backend for now. I'll try to take another look at this when I have more time to spare.
kombu + RabbitMQ is not working for me, so I tried Redis and I have this issue...
It should work; it's the most common configuration. Have you created another issue for this?
Any details? Package versions, OS version, using celery? With celery events enabled?
yes: celery/celery#644
Kombu 2.5.3, celery 3.0.12, django-celery 3.0.11 on Debian testing.
Yes. Did you try to remove the client instance in
Remove KombuRedis.execute_command. It is not thread safe. The base class implementation is thread safe.
No, I was using 2.4.8; conn_or_acquire was added in the 2.5 branch... I'm almost certain that the problem is this: you need to be extremely unlucky for this to occur. I've not managed to reproduce it on a test system. However, it's easy to emulate what would happen by injecting a publish command:
I think it's safe to remove execute_command completely. The base redis.Redis implementation is thread safe (it takes a connection from the pool each time and returns it afterwards).
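That get/use/release shape can be sketched as follows, with fake stand-ins for the pool and connection (none of these class names are from redis-py itself; this is a hedged sketch of the pattern, not the library's actual code):

```python
class FakeConnection:
    """Hypothetical stand-in for a pooled socket connection."""
    def send_command(self, *args):
        self.last = args          # remember what was sent

    def read_response(self):
        return b"OK"              # canned server reply

class FakePool:
    """Hypothetical stand-in for a connection pool."""
    def __init__(self):
        self._free = []

    def get_connection(self, command_name):
        try:
            return self._free.pop()      # atomic list.pop()
        except IndexError:
            return FakeConnection()      # grow the pool on demand

    def release(self, conn):
        self._free.append(conn)

class SafeClient:
    """Shape of a thread-safe execute_command: one connection per
    command, returned to the pool when the reply has been read."""
    def __init__(self):
        self.connection_pool = FakePool()

    def execute_command(self, *args):
        conn = self.connection_pool.get_connection(args[0])
        try:
            conn.send_command(*args)
            return conn.read_response()
        finally:
            self.connection_pool.release(conn)

client = SafeClient()
print(client.execute_command("PUBLISH", "celeryev", "{}"))  # b'OK'
```

Because the connection is only ever visible to one command at a time, two threads can never interleave writes on the same socket, which is the property the KombuRedis override broke.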
We are seeing this issue a few times a day as well. Any plans to commit this fix?
@dknecht FYI, my forks of celery and kombu have been stable in production for about 3 months without any problems...
Using this fix? I haven't seen a pull request.
Btw, you could also try using the 3.0 branch, as I've removed a thread that was accidentally enabled. The worker should be thread-less when using Redis:
Want to apply this patch, but does anyone think we need the KombuRedis class at all? It was originally added by the author of redis-py, but I'm not entirely sure of the purpose. We definitely need Redis.connection to get at the underlying socket, but that's just for the connections used for polling.
where polling here refers to kqueue/select/epoll |
Patch merged into 2.5 and master |
Thanks. We will give this a try today. |
Fixed |
This occurs quite regularly (~once a day).
Worker (XP SP3 32bit):
2 worker processes.
CELERYD_PREFETCH_MULTIPLIER = 1
CELERY_ACKS_LATE = True
Python 2.7.3
celery==3.0.11
celery-with-redis==3.0
kombu==2.4.7
redis==2.7.1
I've not seen this issue on a similar Windows 7 x64 worker although it is not using CELERY_ACKS_LATE = True and has a single worker process.
Client debian squeeze x86_64:
Python 2.7.3
celery==3.0.11
celery-with-redis==3.0
kombu==2.4.7
redis==2.6.2
Redis server debian squeeze x86_64:
redis 2.4.13