"Jobs pending" #392

Closed
craigmayhew opened this Issue Jun 28, 2014 · 23 comments

Projects

None yet

4 participants

@craigmayhew

Following on from managing to get huginn to start as a service, I'm now finding that none of the agent jobs are actually running and just remain pending. How can I debug this?

The only debug info I have at present is that the CPU is at 100% all of the time running process, "/usr/bin/ruby1.9.1 bin/rails runner bin/threaded.rb". I've also restarted the service with the issue remaining after huginn comes back online.

The log for the agents show them as having run but not firing any events. Curiously, they run every 30 minutes but are only scheduled to run every 5 hours.

@cantino
Owner

That's very odd. Can you tell us more about how you're running Huginn? Are you using foreman start?

@craigmayhew

I'm running huginn as a service and start it with "sudo start huginn". I rebuilt my setup so I could run it as a service by adjusting the instructions from the wiki instructions to get it working on AWS: https://github.com/craigmayhew/scratchpad/blob/master/huginn%20install%20instructions. Mostly it was just packages that were needed.

Can you tell me how I can debug this? What log files might be relevant etc? I'm not too familiar with ruby but am used to debugging LAMP and LEMP.

@alias1
Collaborator
@craigmayhew

I've verified all tables have "DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci". I did import the data from another huginn install. It is possible that it's somehow damaged by my simple dump/import as I didn't specify a character set when importing. Although, I would expect there to be an error in a log somewhere that points towards the problem?

@dsander
Collaborator

@craigmayhew in the directory you installed huginn to you should have a log subdirectory, can you nopstie or gist the sidekiq.log and production.log?

@craigmayhew

production.log contains 19MB of this over and over:

I, [2014-06-29T21:55:55.972593 #31634] INFO -- : [1ec31f21-2c9a-439c-9794-f9ecc99dad1d] Processing by WorkerStatusController#show as JSON
I, [2014-06-29T21:55:55.986247 #31634] INFO -- : [1ec31f21-2c9a-439c-9794-f9ecc99dad1d] Completed 200 OK in 14ms (Views: 0.2ms | ActiveRecord: 8.9ms)
I, [2014-06-29T21:55:58.941426 #31631] INFO -- : [a27ba6a1-e9d3-421f-bc93-a4ffe1c78a71] Started GET "/worker_status" for 127.0.0.1 at 2014-06-29 21:55:58 +0000
I, [2014-06-29T21:55:58.942767 #31631] INFO -- : [a27ba6a1-e9d3-421f-bc93-a4ffe1c78a71] Processing by WorkerStatusController#show as JSON
I, [2014-06-29T21:55:58.955277 #31631] INFO -- : [a27ba6a1-e9d3-421f-bc93-a4ffe1c78a71] Completed 200 OK in 12ms (Views: 0.2ms | ActiveRecord: 4.1ms)
I, [2014-06-29T21:56:01.911989 #31634] INFO -- : [c123642d-6bad-4ba6-82a5-e40a6e51b4b4] Started GET "/worker_status" for 127.0.0.1 at 2014-06-29 21:56:01 +0000
I, [2014-06-29T21:56:01.913412 #31634] INFO -- : [c123642d-6bad-4ba6-82a5-e40a6e51b4b4] Processing by WorkerStatusController#show as JSON
I, [2014-06-29T21:56:01.925665 #31634] INFO -- : [c123642d-6bad-4ba6-82a5-e40a6e51b4b4] Completed 200 OK in 12ms (Views: 0.2ms | ActiveRecord: 7.8ms)
I, [2014-06-29T21:56:04.912401 #31631] INFO -- : [8ab5d807-054e-445d-9ac2-be4632df3793] Started GET "/worker_status" for 127.0.0.1 at 2014-06-29 21:56:04 +0000
I, [2014-06-29T21:56:04.913553 #31631] INFO -- : [8ab5d807-054e-445d-9ac2-be4632df3793] Processing by WorkerStatusController#show as JSON
I, [2014-06-29T21:56:04.927425 #31631] INFO -- : [8ab5d807-054e-445d-9ac2-be4632df3793] Completed 200 OK in 14ms (Views: 0.2ms | ActiveRecord: 8.7ms)
I, [2014-06-29T21:56:08.122558 #31634] INFO -- : [f2f1cc06-9d28-453e-8bad-88dabf7d6d6b] Started GET "/worker_status" for 127.0.0.1 at 2014-06-29 21:56:08 +0000
I, [2014-06-29T21:56:08.123954 #31634] INFO -- : [f2f1cc06-9d28-453e-8bad-88dabf7d6d6b] Processing by WorkerStatusController#show as JSON
I, [2014-06-29T21:56:08.134718 #31634] INFO -- : [f2f1cc06-9d28-453e-8bad-88dabf7d6d6b] Completed 200 OK in 11ms (Views: 0.2ms | ActiveRecord: 6.3ms)

sidekiq.log doesn't seem to exist anywhere on the server.

@cantino
Owner

It'd be nice to know what the CPU is doing when it's sitting there at 100%. Do you know how to use strace?

@craigmayhew

Yes but not how to effectively interpret the output in this case. I now have a 9MB file. It shows lot's of file reads, some of gems, some of documentation files. A spot check looks like it doesn't contain anything sensitive. How can I send this to you? I could upload it to github but I'm wary that I may have missed something.

@cantino
Owner

You could email it to me: andrew at andrewcantino.com

@cantino
Owner

Oops. "andrew"

@craigmayhew

Sent.

@dsander
Collaborator

@craigmayhew I am sorry, i actually ment delayed_job.log
You could also try to run bundle exec foreman start as the user you installed huginn and look at the output (make sure you stop your huginn instance before running it non daemonized)

@craigmayhew

delayed_job.log contains variations on this:

I, [2014-06-28T15:02:55.119162 #17031] INFO -- : 2014-06-28T15:02:55+0000: [Worker(host:ip-10-48-46-116 pid:17031)] Job Class#async_check_without_delay (id=13) RUNNING
I, [2014-06-28T15:04:37.583020 #17054] INFO -- : 2014-06-28T15:04:37+0000: [Worker(host:ip-10-48-46-116 pid:17054)] Starting job worker
I, [2014-06-28T15:04:44.892347 #17054] INFO -- : 2014-06-28T15:04:44+0000: [Worker(host:ip-10-48-46-116 pid:17054)] Job Class#receive! (id=8) RUNNING
I, [2014-06-28T15:08:50.974539 #17076] INFO -- : 2014-06-28T15:08:50+0000: [Worker(host:ip-10-48-46-116 pid:17076)] Starting job worker
I, [2014-06-28T15:08:56.553289 #17076] INFO -- : 2014-06-28T15:08:56+0000: [Worker(host:ip-10-48-46-116 pid:17076)] Job Class#receive! (id=6) RUNNING
I, [2014-06-28T15:11:39.126909 #17097] INFO -- : 2014-06-28T15:11:39+0000: [Worker(host:ip-10-48-46-116 pid:17097)] Starting job worker
I, [2014-06-28T15:11:40.632960 #17097] INFO -- : 2014-06-28T15:11:40+0000: [Worker(host:ip-10-48-46-116 pid:17097)] Job Class#receive! (id=4) RUNNING
I, [2014-06-28T15:13:08.485480 #17120] INFO -- : 2014-06-28T15:13:08+0000: [Worker(host:ip-10-48-46-116 pid:17120)] Starting job worker
I, [2014-06-28T15:13:14.669237 #17120] INFO -- : 2014-06-28T15:13:14+0000: [Worker(host:ip-10-48-46-116 pid:17120)] Job Class#async_check_without_delay (id=14) RUNNING
I, [2014-06-28T15:17:18.420897 #17142] INFO -- : 2014-06-28T15:17:18+0000: [Worker(host:ip-10-48-46-116 pid:17142)] Starting job worker
I, [2014-06-28T15:17:24.593250 #17142] INFO -- : 2014-06-28T15:17:24+0000: [Worker(host:ip-10-48-46-116 pid:17142)] Job Class#async_check_without_delay (id=12) RUNNING
I, [2014-06-28T15:20:21.070091 #17167] INFO -- : 2014-06-28T15:20:21+0000: [Worker(host:ip-10-48-46-116 pid:17167)] Starting job worker
I, [2014-06-28T15:20:22.582020 #17167] INFO -- : 2014-06-28T15:20:22+0000: [Worker(host:ip-10-48-46-116 pid:17167)] Job Class#receive! (id=11) RUNNING
I, [2014-06-28T15:21:23.155018 #17189] INFO -- : 2014-06-28T15:21:23+0000: [Worker(host:ip-10-48-46-116 pid:17189)] Starting job worker
I, [2014-06-28T15:21:29.049221 #17189] INFO -- : 2014-06-28T15:21:29+0000: [Worker(host:ip-10-48-46-116 pid:17189)] Job Class#async_check_without_delay (id=15) RUNNING
I, [2014-06-28T15:25:15.073233 #17211] INFO -- : 2014-06-28T15:25:15+0000: [Worker(host:ip-10-48-46-116 pid:17211)] Starting job worker

Nothing to indicate an error and nothing looking incredibly busy (e.g. job worker creation every other second).

I've tried running the "bundle exec foreman start" command from the /home/ubuntu/huginn/ directory and the /home/huginn/current/ directory (different .env file). They give the following errors respectively:

`raise_no_secret_key': Devise.secret_key was not set. Please add the following to your Devise initializer: (RuntimeError)

14:14:31 jobs.1 | /var/lib/gems/1.9.1/gems/devise-3.2.4/lib/devise/rails/routes.rb:487:in `raise_no_secret_key': Devise.secret_key was not set. Please add the following to your Devise initializer: (RuntimeError)
14:14:31 jobs.1 |
14:14:31 jobs.1 | config.secret_key = 'f3a820450dc5e2c40515f2513f1a9beac0dccae8046ac1cce04a72da86cff8a5117ad06f77bb4cfa313f2b46e2edad7b1c2440a2e2d5fad6bd2d6f80b8358130'
14:14:31 jobs.1 |
14:14:31 jobs.1 | Please ensure you restarted your application after installing Devise or setting the key.

and

14:15:37 web.1 | started with pid 1619
14:15:37 jobs.1 | started with pid 1622
14:16:24 jobs.1 | terminate called after throwing an instance of 'std::runtime_error'
14:16:24 jobs.1 | what(): Encryption not available on this event-machine
14:16:25 jobs.1 | exited with code 134
14:16:25 system | sending SIGTERM to all processes
14:16:25 web.1 | terminated by SIGTERM

So I've clearly damaged or changed my key. At a guess, could this mean that the huginn service will end up in a start, terminate, start, terminate loop?

@cantino
Owner

It seems like it's failing to source your .env file. Does bundle exec rails c work?

@craigmayhew

I fixed my key.

I ran the "bundle exec rails c" command from /home/huginn/current but the web interface maintains a "We're sorry, but something went wrong." during this time.

ubuntu@ip-172-30-0-63:/home/huginn/current$ sudo bundle exec rails c
sudo: unable to resolve host ip-172-30-0-63
Loading production environment (Rails 4.1.1)
irb(main):001:0>

As per the original issue, if I now use "sudo start huginn", the huginn interface is available but won't run any jobs.

@cantino
Owner

It sounds like something's strange with your deployment. How did you setup Huginn?

@craigmayhew

I used this list of instructions: https://github.com/craigmayhew/scratchpad/blob/master/huginn%20install%20instructions

For an additional test case, I've setup a fresh vm (this is the fourth) and this "jobs not running" issue occured after trying to use the twitter agent. From that moment onwards (could be a coincidence) no jobs on any agents ran. I will try pulling the latest git code and restart the server when I next have a moment.

@cantino
Owner

If you search your logs, do you see anything like "Terminating myself"?

@craigmayhew

Yes - but not recently. At the moment I get this over and over again:

terminate called after throwing an instance of 'std::runtime_error'
what(): Encryption not available on this event-machine

@cantino
Owner

Looks like an environment issue: ConradIrwin/em-imap#8

(Different project, similar issue.)

@craigmayhew

Solved. Thanks for all the help. I realise this ticket has been open for some time.

apt-get install libssl-dev
gem install eventmachine

@craigmayhew craigmayhew closed this Aug 5, 2014
@cantino
Owner
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment