Malformed yaml in handler could crash all delayed_job workers (and prevent recovery forever) #558

Closed
sankara opened this issue Aug 5, 2013 · 4 comments


sankara commented Aug 5, 2013

Issue: One of our jobs was enqueued with a large list of data (entirely unintentional, and I'm fixing it on our end). The serialized data was longer than the width of the handler column, so the stored YAML was truncated and therefore malformed. This repeatedly crashed all the workers, and it took me some time to figure out and isolate the issue. What worried me was that a single job brought down every worker and prevented recovery.

Steps to reproduce: Insert a job whose serialized data is larger than the handler column can hold.

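Diagnosis: The rescue in payload_object does not cover the exception that was raised (Psych::SyntaxError), so it is never wrapped in a DeserializationError. As the backtrace shows, handle_failed_job then calls job.name, which calls payload_object again, so the same exception is raised inside the failure handler and takes the worker down.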

Honestly, I'm not sure whether this would happen often in the real world or whether it's something delayed_job has to handle. Still, the exception handling could be more robust, so that the job is marked as failed and the remaining jobs continue to be processed.

    def handle_failed_job(job, error)
      job.last_error = "{#{error.message}\n#{error.backtrace.join("\n")}"
      say "#{job.name} failed with #{error.class.name}: #{error.message} - #{job.attempts} failed attempts", Logger::ERROR
      reschedule(job)
    end

    def payload_object
      @payload_object ||= YAML.load(self.handler)
    rescue TypeError, LoadError, NameError, ArgumentError => e
      raise DeserializationError,
        "Job failed to load: #{e.message}. Handler: #{handler.inspect}"
    end

/data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/railties-3.2.12/lib/rails/commands/runner.rb:53:in `eval': (<unknown>): found unexpected end of stream while scanning a quoted scalar at line 5 column 5 (Psych::SyntaxError)
    from /usr/lib64/ruby/1.9.1/psych.rb:203:in `parse_stream'
    from /usr/lib64/ruby/1.9.1/psych.rb:151:in `parse'
    from /usr/lib64/ruby/1.9.1/psych.rb:127:in `load'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/backend/base.rb:84:in `payload_object'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/backend/base.rb:71:in `name'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/worker.rb:230:in `handle_failed_job'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/worker.rb:191:in `block in run'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/lifecycle.rb:60:in `call'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/lifecycle.rb:60:in `block in initialize'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/lifecycle.rb:65:in `call'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/lifecycle.rb:65:in `execute'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/lifecycle.rb:38:in `run_callbacks'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/worker.rb:191:in `rescue in run'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/worker.rb:181:in `run'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/worker.rb:238:in `block in reserve_and_run_one_job'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/lifecycle.rb:60:in `call'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/lifecycle.rb:60:in `block in initialize'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/lifecycle.rb:65:in `call'
    from /data/dolphin/shared/bundled_gems/ruby/1.9.1/gems/delayed_job-3.0.3/lib/delayed/lifecycle.rb:65:in `execute'
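
The more robust handling suggested above could look roughly like the following. This is only a minimal sketch, not delayed_job's actual code: `SketchJob` is a hypothetical stand-in for a job record, `DeserializationError` is redefined locally, and the fallback string in `name` is made up. The idea is to add `Psych::SyntaxError` to the rescue list and to keep `name` from raising while a failure is being reported.

```ruby
require "yaml"

# Local stand-in for Delayed::DeserializationError (assumption for this sketch).
class DeserializationError < StandardError; end

# Hypothetical, simplified job object; not the real Delayed::Job.
class SketchJob
  attr_reader :handler

  def initialize(handler)
    @handler = handler
  end

  # Same shape as the quoted payload_object, plus Psych::SyntaxError,
  # so that truncated YAML is wrapped instead of escaping the rescue.
  def payload_object
    @payload_object ||= YAML.load(handler)
  rescue TypeError, LoadError, NameError, ArgumentError, Psych::SyntaxError => e
    raise DeserializationError,
      "Job failed to load: #{e.message}. Handler: #{handler.inspect}"
  end

  # Falls back to a placeholder so that logging a failure cannot itself raise.
  def name
    payload_object.class.name
  rescue DeserializationError
    "(unloadable job)"
  end
end

good = SketchJob.new("--- hello\n")
bad  = SketchJob.new("--- \"cut off mid-string") # unterminated quoted scalar

puts good.name
puts bad.name
```

With something along these lines, a worker could mark the bad job as failed and move on to the next one instead of crashing.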
@matt-hwy1

I'm still having this issue. I'm processing some text files that contain scantron answers, and the column that holds that text appears to be breaking the daemon.

I'm using delayed_job_active_record 4.0.

The error message I'm receiving is:
(<unknown>): found unexpected end of stream while scanning a quoted scalar at line 5 column 11

Line 5, column 11 in the YAML doesn't look suspicious, but something about the format is tripping it up.

The YAML is here:
https://gist.github.com/matt-hwy1/8058426

The raw text that is stored in the data column is here:
https://gist.github.com/matt-hwy1/8058615

@matt-hwy1

As a follow-up, the issue I was having was caused by the MySQL TEXT column type on the delayed_jobs table being too small for my data. A TEXT column only holds 64 KB, and my data in this case was much longer than that, so the YAML string was truncated when it was stored. Changing the column type to LONGTEXT fixed the problem.
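
To illustrate the truncation described above, here is a small sketch that cuts a YAML string at the MySQL TEXT limit of 65,535 bytes and then tries to load it. The helper `fits_in_text_column?` and the hand-built handler string are assumptions made for the example; a real handler would be a serialized job object.

```ruby
require "yaml"

# Maximum size of a MySQL TEXT column in bytes (2**16 - 1).
MYSQL_TEXT_LIMIT = 65_535

# Hypothetical helper: true if the serialized handler fits in a TEXT column.
def fits_in_text_column?(handler)
  handler.bytesize <= MYSQL_TEXT_LIMIT
end

# Hand-built YAML document: one double-quoted scalar of 70,000 characters.
handler = "--- \"" + ("a" * 70_000) + "\"\n"

# Simulate what a silently truncating column would store.
stored = handler.byteslice(0, MYSQL_TEXT_LIMIT)

# The closing quote was cut off, so parsing the stored value fails.
truncated_error =
  begin
    YAML.load(stored)
    nil
  rescue Psych::SyntaxError => e
    e
  end

puts "fits in TEXT? #{fits_in_text_column?(handler)}"
puts "load error:   #{truncated_error.class}"
```

A size check like this before enqueueing would surface oversized payloads early, independent of how the column is defined.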

@fluxsaas

If you run into this issue and already have a lot of jobs in the database, you can try to identify the broken job with:

    Delayed::Job.find(JOB_ID).invoke_job

which should give you an error message like the one above.
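
If the ID of the broken job is not known, a variation on the same idea is to scan every handler for YAML syntax errors without running any job. The sketch below is an assumption-laden illustration: `JobRow` and `broken_handler_ids` are hypothetical names, and the rows are plain structs rather than real `Delayed::Job` records. It uses `YAML.parse`, which only checks the syntax and does not instantiate the serialized objects.

```ruby
require "yaml"

# Hypothetical stand-in for a row of the delayed_jobs table.
JobRow = Struct.new(:id, :handler)

# Returns the ids of all jobs whose handler is not syntactically valid YAML.
def broken_handler_ids(jobs)
  jobs.each_with_object([]) do |job, broken|
    begin
      YAML.parse(job.handler.to_s) # syntax check only, nothing is executed
    rescue Psych::SyntaxError
      broken << job.id
    end
  end
end

rows = [
  JobRow.new(1, "--- ok\n"),
  JobRow.new(2, "--- \"truncated in the middle"), # unterminated quoted scalar
  JobRow.new(3, "---\nkey: value\n")
]

broken_ids = broken_handler_ids(rows)
puts broken_ids.inspect
```

In an application, the same method could presumably be given the actual job records, as long as each one responds to `id` and `handler`.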


bjm88 commented Dec 11, 2015

Typical MySQL, just truncating data instead of throwing an error like a good database with actual data integrity concerns. I suspect MySQL 5.7 actually addresses this, depending on the sql_mode setting.
