Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Xenial] Execution stuck in 'requested' state on a fresh RabbitMQ #3290

Closed
armab opened this issue Mar 15, 2017 · 7 comments

Comments

Projects
None yet
4 participants
@armab
Copy link
Member

commented Mar 15, 2017

When executing any st2 action for the first time on a fresh & clean RabbitMQ immediately after st2 startup, - it runs forewer and stuck in requested state:

root@ubuntu16:~# st2 run core.local echo 123
....................................................................................................................................................
root@ubuntu16:~# st2 execution list
+--------------------------+------------+--------------+------------------------+------------------------+----------------------+
| id                       | action.ref | context.user | status                 | start_timestamp        | end_timestamp        |
+--------------------------+------------+--------------+------------------------+------------------------+----------------------+
| 58c96cadc8980518d7cf8ada | core.local | st2admin     | requested              | Wed, 15 Mar 2017       |                      |
|                          |            |              |                        | 16:32:45 UTC           |                      |
+--------------------------+------------+--------------+------------------------+------------------------+----------------------+


root@ubuntu16:~# st2 execution get 58c96cadc8980518d7cf8ada
id: 58c96cadc8980518d7cf8ada
status: requested
parameters: 
  cmd: echo 123
result: None

I can only guess that at early point st2 is busy with RabbitMQ bootstrapping and for some reason can't trigger an action (since topic/queue is not yet created/message is lost or something like that ?) when running things for the first time.

Reproduce

Requirements to reproduce:

  • OS is Ubuntu Xenial
  • RabbitMQ is clean
  • st2 was just started
  • action runs immediately after st2 start

Script to reproduce

I could reproduce it every time with this script:

#!/bin/bash

# output executed commands
set -o xtrace

sudo st2ctl stop
# emulate fresh & clean RabbitMQ
rabbitmqctl stop_app
sudo rabbitmqctl reset
rabbitmqctl start_app

# isolate & make sure the problem is not with RabbitMQ startup
sleep 30
# but with StackStorm startup itself
sudo st2ctl start

# The command is stuck and runs forever
# See: st2 execution list
st2 run core.local echo 123

This is similar to StackStorm/st2-packages#445 (comment)
The problem is more serious than it looks like, being blocker for Automation, when deploying StackStorm in prod. Stuck execution immediately after startup is pretty much a bad thing.

I originally thought this could be solved with packaging, but after repro it sounds like more about StackStorm core.
cc @m4dcoder @Kami @lakshmi-kannan.

armab added a commit to StackStorm/st2-packages that referenced this issue Mar 15, 2017

@armab armab added the deployment label Mar 15, 2017

armab added a commit to StackStorm/st2-packages that referenced this issue Mar 15, 2017

@lakshmi-kannan lakshmi-kannan self-assigned this Mar 16, 2017

@lakshmi-kannan

This comment has been minimized.

Copy link
Contributor

commented Mar 16, 2017

I'll look into this. cc: @Kami, @m4dcoder

@armab armab changed the title [Xenial] Execution stuck in 'requested' state on a clean RabbitMQ [Xenial] Execution stuck in 'requested' state on a fresh RabbitMQ Apr 11, 2017

@Kami

This comment has been minimized.

Copy link
Member

commented Aug 3, 2017

Interesting thing about this issue is that I can't replicate it if I set number of action runner workers to 1.

This makes it look like some kind of weird race inside action runners processes.

@armab

This comment has been minimized.

Copy link
Member Author

commented Aug 3, 2017

Great!
+1 step for better problem isolation

@Kami

This comment has been minimized.

Copy link
Member

commented Aug 4, 2017

Confirmed that #3648 fixes this.

@armab it also wouldn't be bad if you can confirm as well with the latest v2.4dev packages :)

@Kami Kami added this to the 2.4.0 milestone Aug 4, 2017

@Kami

This comment has been minimized.

Copy link
Member

commented Aug 4, 2017

Btw, now while testing the fix - we can also remove sleep 30 from the script above to speed up the testing a bit and it works fine (aka it seems RabbitMQ starts up quite fast so sleep is not really necessary).

@nmaludy

This comment has been minimized.

Copy link
Contributor

commented Aug 6, 2017

+1 this just bit me in the puppet module

i believe the lines @Kami is referring to is: https://github.com/StackStorm/st2-packages/blob/master/scripts/st2bootstrap-deb.sh#L557-L561

armab added a commit to StackStorm/st2-packages that referenced this issue Aug 7, 2017

@armab

This comment has been minimized.

Copy link
Member Author

commented Aug 7, 2017

@Kami Awesome job!

This allows removing horrible timeout 30 hack from curl|bash installer: StackStorm/st2-packages#480

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.