-
Notifications
You must be signed in to change notification settings - Fork 761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loss of data/jobs with no possibility of exceptions or logs #228
Comments
There's a patch around someplace to return false instead of the generated job ID if job creation fails, but this project has been stalled for about a year, so it hasn't been pulled in. Would exceptions be better? Perhaps, but they wouldn't be consistent with the majority of how the project interfaces with external code. Either way, at the moment I'd feel better recommending illuminate/queue for queue management (the Redis driver operates very similarly to how PHP-Resque does) if you're looking for something actively maintained. |
@danhunsaker thank you for the speedy reply. Do you have contribution rights to this repo or is it only @chrisboulton? It would be really useful (for future teams/users) to mark this repo as abandoned if possible to avoid others running into this issue with no hope of resolution (save for an existing or new fork) |
It's not abandoned, just yet. Just stalled for a while. I believe Chris was planning to open push access to the repo at some point soon, when he has time to properly review applicants and get the selected individuals up to speed on the direction the project is intended to move forward. At such time, progress will likely leap forward, and PHP-Resque will again become a powerful contender in the queue engine landscape. |
Ah ok, thanks for the update. In the meantime is there anything we can do (other than this issue) to make users aware of this issue (and the potential lengthy timeline for a fix). Users who do not do their due diligence when assessing packages for inclusion in their projects will not likely be aware of the problem here - and when (google) searching for background queuing solutions this lib (and the symfony bundle) are high up if not first in the results. |
@peterjmit Nice work! +1! Thanks for sharing, I always wondered where that problem came from... :-) |
Thanks for spending the time tracking this down - I'm definitely eager to get something in place for this. I feel like surfacing an exception is the correct solution here - do we want to refactor everything to surface something like |
Unless there's a chance we'd have a non-connection Redis exception to raise. If there is, I'd call it It would also be good, at that point, to raise exceptions on other errors we currently do not, for consistency's sake. This is a non-BC change, though, so versioning will be affected. Returning false instead of raising an exception would be a decent patch for 1.2.1, while the exception would be better overall, but require tagging under 1.3. |
Credis doesn't really seem to differentiate between what is connection related and what's just normal error related - looks like |
Sounds good, then. :-) |
@danhunsaker @chrisboulton We deployed a similar fix (which has typically yet to report any errors since we looked into it) peterjmit@e11af37. Having an exception specific to redis is definitely an improvement |
@peterjmit can you pls clarify if your issue was that there were jobs that was not picked by threads and were you able to fix it. I am facing a similar situation where if i q 30 jobs to be processed in default q some where around 20 - 25 gers picked up and processed and remaining just doesnt get picked up at all and the q state probably is cold |
@raajeshnew we have not actually tracked down the source of the issue, and we are still facing problems with losing data - your experience is good to know (and will hopefully help us figure things out!) |
I am working on an application that is processing ~25k jobs per day, and we have started to notice that some jobs are not being processed or queued, more problematic is that these failed (or missing) jobs have no corresponding log entries in Symfony (via https://github.com/michelsalib/BCCResqueBundle), php-fpm, nginx or redis.
After some investigation, I believe this is being caused by the
CredisException
suppression inResque_Redis
.Resque_Redis
andCredis_Client
Inside
Resque_Redis
all methods calls are handled magically by__call
and supported methods are proxied toCredis_Client::__call
.Credis_Client::_call
has a few failure states represented in part by aCredisException
, this can happen during connection to redis (Credis_Client::connect
) or during the call to the corresponding redis method.If a
CredisException
is raised,Resque_Redis::_call
will returnfalse
, this is less than ideal (as it is harder for users of this library to detect/deal with redis errors). However it is ok as long as all upstream usages ofResque_Redis::_call
check for afalse
return value.Unfortunately this brings me to my specific issue of losing data in
Resque::enqueue
.Resque::enqueue
callsResque_Job::create
which in turn callsResque::push
.Resque::push()
The variable
$length
is confusing/misleading here, if there is an exception raised by redis/credis during this call$length
is actuallyfalse
and is indicative of a failure state due to an exception (rather than an incorrect queue length). Again this is ok if the usage of this function checks for afalse
return value which at this point could indicate an exception, or an erroneously empty queue.Resque_Job::create()
If we remove the code that deals with job status/input validation we have
Clearly we fail to check the return value of
Resque::push()
here.Job::create
is the only usage ofResque::push()
. This means if we experience any kind of Redis error, or Credis error during Job creation the end user will be none the wiser.Solutions?
Resque_Redis::rpush
returnsfalse
CredisException
insideResque_Redis
The only reason I have hesitated to open a pull request, to at the very least remove the exception suppression is that I do not know if it is there for a specific reason/use case for it being there.
Notes.
Any one evaluating the use of this library in their application should be very wary it due to the possibility of silent data loss as of c335bc3 (or v1.2)
The text was updated successfully, but these errors were encountered: