Faktory cannot boot, because Redis is slow to load (?) #225
Comments
I just copied my database volume and tried booting
If you can get me a copy of that database I'll see if I can reproduce the issue. Master has a fix for this LOADING issue and I'd like to verify it fixes the problem before releasing 1.0.1.
Oh, hey! There it is! I didn't keep reading down because I was feeling a bit lost in the sauce. Happy to send you that file in the morning; I don't currently have remote access. Now that's a fast fix!
🎉 THANK YOU! 🎉 The fix has been confirmed and will be in 1.0.1.
@phawzy I would consider it an error that the machine can't load its dataset in under a minute. IMO that signals that you need to upgrade to a faster machine or trim down the size of your jobs. I don't want to make this a config option -- the user should not have to care about this -- and I don't want to extend the timeout any further because any value is arbitrary. What do I extend it to: 2 min, 5 min, 10 min? A 6GB dataset indicates either many millions of jobs or that you are loading too much state into Faktory when you should be storing that state in another persistent store, like S3.
Oofa, this just bit us. Background job processing is the bread and butter of our app and we just ported over a huge portion of our Sidekiq jobs to Faktory. I had to build a custom Faktory binary and deploy it. It took 66 seconds for Redis to boot. Previously we used Sidekiq with four 16 GB Redis ElastiCache instances (just to give you a use case datapoint).
There's never a right value for a maximum timeout. I've lengthened the value on master from 60 to 600.
Faktory is unable to start up. I receive this error:
I'm using the Docker configuration, storing data in a separate docker volume:
This seems to be the relevant area of code I'm getting to: https://github.com/contribsys/faktory/blob/master/storage/redis.go#L48
I get the impression that there is not currently a mechanism to handle the LOADING status and try again. I don't know Go at all, so I'm kind of guessing. I see this being an issue in some other projects:
Here's some background about arriving here:
I had a system issue this weekend where a data cleanup job processor was not running and a lot of jobs were queued, totaling a somewhat sizable amount of data. Looking at various logs, I believe I ran out of memory at some point, as I have a chunk of lines like this in the Redis log:
I saw the following error on my worker processes:
This made me think I also ran out of disk space, but disk space appeared to be okay on the host (around 50%).
I've doubled the amount of memory the host VM has available, but Faktory still will not boot. The Redis log generated when attempting to boot is below. It seems to show Redis coming up, but there are a lot of warnings and I'm not totally fluent in what they imply. Looking at older logs, I feel that these may even be common in a successful run of Faktory.