Malformed yaml in handler could crash all delayed_job workers (and prevent recovery forever) #558
Comments
I'm still having this issue. I'm processing some text files that contain scantron answers, and the column that holds that text appears to be breaking the daemon. I'm using delayed_job_active_record 4.0.
The error message I'm receiving is:
Line 5, column 11 in the YAML doesn't look suspicious, but something about the format is tripping it up.
The YAML is here:
The raw text that is stored in the data column is here:
As a follow-up here: the issue I was having was caused by the MySQL TEXT column type for the delayed_jobs table being too small for my data. TEXT only holds 64 KB, and my data in this case was much longer than that, so the YAML string was truncated. Changing the column type to LONGTEXT fixed the problem.
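To illustrate why a truncated handler fails to load, here is a minimal sketch using only Ruby's standard library. The 65,535-byte figure is MySQL's maximum length for a TEXT column; the handler string is a made-up stand-in, not a real delayed_job payload, and `parse_error` is a hypothetical helper.

```ruby
require "yaml"

# MySQL's TEXT type stores at most 65,535 bytes.
text_limit = 65_535

# Made-up stand-in for a serialized job: one long double-quoted value.
handler = "---\ndata: \"" + ("x" * 70_000) + "\"\n"

# Simulate a non-strict MySQL silently cutting the value at the column limit.
stored = handler.byteslice(0, text_limit)

# Returns nil if the YAML parses, otherwise the Psych::SyntaxError raised.
def parse_error(yaml)
  YAML.parse(yaml) # builds a syntax tree only; no objects are instantiated
  nil
rescue Psych::SyntaxError => e
  e
end

full_error = parse_error(handler)      # the complete document parses
truncated_error = parse_error(stored)  # the closing quote was cut off
```

In this sketch the truncation lands inside a quoted string, so the parser hits the end of the input before the closing quote. A cut that happens to land on a valid boundary could parse without error and silently lose data instead.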
If you run into this issue and already have a lot of jobs in the database, you can try to identify the broken job with:
which should give you an error message (like the one above).
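The command itself did not survive in the comment above. As a rough sketch of one way such a check could work (not necessarily what the commenter ran), the following parses each handler string and collects the ones that are not valid YAML. `find_broken_handlers` and the `[id, handler]` row shape are assumptions made for illustration.

```ruby
require "yaml"

# Returns [id, error message] pairs for every handler that is not valid YAML.
# `rows` is any enumerable of [id, handler_string] pairs (a hypothetical shape).
def find_broken_handlers(rows)
  rows.each_with_object([]) do |(id, handler), broken|
    begin
      YAML.parse(handler) # syntax check only; does not load job classes
    rescue Psych::SyntaxError => e
      broken << [id, e.message]
    end
  end
end

rows = [
  [1, "--- \nmethod_name: :deliver\n"], # well-formed
  [2, "--- \nargs: [1, 2"],             # cut off inside a flow sequence
]
broken = find_broken_handlers(rows)
broken.each { |id, message| puts "job #{id}: #{message}" }
```

In a Rails console with the ActiveRecord backend, the rows could presumably come from something like `Delayed::Job.pluck(:id, :handler)`; that call is an assumption about your setup rather than something stated in this thread.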
Typical MySQL, just truncating data instead of throwing an error like a good database with actual data integrity concerns. I suspect this is actually addressed in MySQL 5.7, depending on the sql_mode setting.
Issue: One of our jobs was loaded with a big list of data (totally unintentional, and I'm fixing it on our end). The data was longer than the column width for handler, which resulted in malformed (or rather incomplete) YAML. This repeatedly crashed all the workers, and it took some time for me to figure out and isolate the issue. What worried me was that one job brought down all the workers and prevented recovery.
Steps to reproduce: Insert a job with a huge dataset
Diagnosis: The rescue in payload_object missed the exception raised.
Honestly, I'm not sure whether this is something that would happen in the real world or something delayed_job has to handle. However, the exception handling could be more robust, so that the broken job is marked as failed and further jobs are still processed.
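As a sketch of the behavior being asked for (not delayed_job's actual implementation), the loop below rescues the deserialization error per job, records it on that job, and carries on with the rest. `FakeJob` and `process_jobs` are hypothetical names, and `YAML.safe_load` stands in for whatever the worker really uses to build the payload.

```ruby
require "yaml"

# Hypothetical stand-in for a queued job row; not delayed_job's model.
FakeJob = Struct.new(:id, :handler, :failed, :last_error)

# Deserializes every job. A job whose handler is not valid YAML is marked
# as failed instead of aborting the whole loop.
def process_jobs(jobs)
  processed = []
  jobs.each do |job|
    begin
      payload = YAML.safe_load(job.handler)
      processed << [job.id, payload]
    rescue Psych::SyntaxError => e
      job.failed = true
      job.last_error = "#{e.class}: #{e.message}"
    end
  end
  processed
end

good = "---\nname: ok\n"
bad  = "---\nname: \"cut off" # incomplete YAML, like a truncated handler

jobs = [FakeJob.new(1, good), FakeJob.new(2, bad), FakeJob.new(3, good)]
result = process_jobs(jobs)
```

The point of the design is that the rescue sits inside the per-job loop, so one bad row costs one failed job rather than every worker.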