Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provisioning freezes during ansible playbook launch #21972

Open
kTipSSIoYv opened this issue Jul 3, 2022 · 20 comments
Open

Provisioning freezes during ansible playbook launch #21972

kTipSSIoYv opened this issue Jul 3, 2022 · 20 comments

Comments

@kTipSSIoYv
Copy link

kTipSSIoYv commented Jul 3, 2022

Provisioning freezes during ansible playbook (method) launch. It doesn't even launch/run the ansible-playbook command.

Any new provisioning requests also gets frozen. This issue exists on Morphy and Lasker.

image

image

[----] I, [2022-07-03T13:34:47.582215 #1382:2b07b5895958]  INFO -- automation: Q-task_id([r725_miq_provision_1900]) Invoking [playbook] method [/ManageIQ_Custom/Cloud/VM/Provisioning/StateMachines/Methods/testing_vm] with inputs [{"vm_id"=>"nil"}]
[----] I, [2022-07-03T13:34:48.736359 #1382:2b07b5895958]  INFO -- automation: Q-task_id([r725_miq_provision_1900]) Playbook Method passing options:
[----] I, [2022-07-03T13:34:48.736915 #1382:2b07b5895958]  INFO -- automation: Q-task_id([r725_miq_provision_1900])
---
:become_enabled: false
:execution_ttl: '3'
:extra_vars:
  vm_id: nil
  :manageiq:
    automate_workspace: automate_workspaces/19d36f9e-4722-481c-ac2d-f5a12478cfad
    api_url: https://172.17.0.2
    api_token: f5992ebdecd767efd2fcae46e753edb7
    user: users/1
    group: groups/2
    X_MIQ_Group: EvmGroup-super_administrator
    request_task: requests/725/request_tasks/1900
    request: requests/725
  :manageiq_connection:
    url: https://172.17.0.2
    token: f5992ebdecd767efd2fcae46e753edb7
    X_MIQ_Group: EvmGroup-super_administrator
:hosts:
- 10.10.10.10
:verbosity: '3'
:credential: 96
[----] I, [2022-07-03T13:34:49.434445 #1382:2b07b5895958]  INFO -- automation: Q-task_id([r725_miq_provision_1900]) Ansible job with ref 42342 was created.
[----] I, [2022-07-03T13:34:50.512541 #297:2b1baf6ad95c]  INFO -- automation: User [admin] with current group ID [1] name [Tenant My Company/1 access]
[----] I, [2022-07-03T13:34:50.515707 #297:2b1baf6ad95c]  INFO -- automation: Instantiating [/System/Process/Event?EventStream%3A%3Aevent_stream=19884&MiqServer%3A%3Amiq_server=4&User%3A%3Auser=1&event_id=19884&event_stream_id=19884&event_type=AWS_EC2_Subnet_UPDATE&object_name=Event&vmdb_object_type=event_stream]
[----] I, [2022-07-03T13:34:50.528811 #297:2b1baf6ad95c]  INFO -- automation: Updated namespace [/System/Process/Event?EventStream%3A%3Aevent_stream=19884&MiqServer%3A%3Amiq_server=4&User%3A%3Auser=1&event_id=19884&event_stream_id=19884&event_type=AWS_EC2_Subnet_UPDATE&object_name=Event&vmdb_object_type=event_stream  ManageIQ/System]
[----] I, [2022-07-03T13:34:50.549414 #297:2b1baf6ad95c]  INFO -- automation: Following Relationship [miqaedb:/System/Event/EmsEvent/AMAZON/AWS_EC2_Subnet_UPDATE#create]
[----] I, [2022-07-03T13:34:50.554425 #297:2b1baf6ad95c]  INFO -- automation: Updated namespace [miqaedb:/System/Event/EmsEvent/AMAZON/AWS_EC2_Subnet_UPDATE#create  ManageIQ/System/Event/EmsEvent]
[----] I, [2022-07-03T13:34:50.563418 #295:2ad159e6f974]  INFO -- automation: User [admin] with current group ID [1] name [Tenant My Company/1 access]
[----] I, [2022-07-03T13:34:50.564429 #297:2b1baf6ad95c]  INFO -- automation: Following Relationship [miqaedb:/System/event_handlers/refresh#create]
[----] I, [2022-07-03T13:34:50.565566 #295:2ad159e6f974]  INFO -- automation: Instantiating [/System/Process/Event?EventStream%3A%3Aevent_stream=19885&MiqServer%3A%3Amiq_server=4&User%3A%3Auser=1&event_id=19885&event_stream_id=19885&event_type=AWS_EC2_NetworkInterface_CREATE&object_name=Event&vmdb_object_type=event_stream]
[----] I, [2022-07-03T13:34:50.569891 #297:2b1baf6ad95c]  INFO -- automation: Updated namespace [miqaedb:/System/event_handlers/refresh#create  ManageIQ/System]
[----] I, [2022-07-03T13:34:50.575273 #295:2ad159e6f974]  INFO -- automation: Updated namespace [/System/Process/Event?EventStream%3A%3Aevent_stream=19885&MiqServer%3A%3Amiq_server=4&User%3A%3Auser=1&event_id=19885&event_stream_id=19885&event_type=AWS_EC2_NetworkInterface_CREATE&object_name=Event&vmdb_object_type=event_stream  ManageIQ/System]

Here is another example of provisioning freezing. It works fine when its just a single provisioning but when multiple VM counts are set, it freezes after the playbook options.

[----] I, [2022-07-12T03:46:20.248847 #307:2ad5997a5974]  INFO -- automation: Q-task_id([r821_miq_provision_2087]) Playbook Method passing options:
[----] I, [2022-07-12T03:46:20.249442 #307:2ad5997a5974]  INFO -- automation: Q-task_id([r821_miq_provision_2087])
---
:become_enabled: false
:execution_ttl: '3'
:extra_vars:
  vm_id: i-03bd18e4d331734b8
  :manageiq:
    automate_workspace: automate_workspaces/c866a302-ff83-420e-907f-371273f157c3
    api_url: https://172.17.0.2
    api_token: 88ad01ad237a6d0ac4c27f32098ea2fb
    user: users/1
    group: groups/2
    X_MIQ_Group: EvmGroup-super_administrator
    request_task: requests/821/request_tasks/2087
    request: requests/821
  :manageiq_connection:
    url: https://172.17.0.2
    token: 88ad01ad237a6d0ac4c27f32098ea2fb
    X_MIQ_Group: EvmGroup-super_administrator
:hosts:
- localhost
:verbosity: '3'
:credential: 96
[----] I, [2022-07-12T03:46:20.252724 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088]) Updated namespace [Cloud/VM/Provisioning/StateMachines/Methods/testing_vm  ManageIQ_Custom/Cloud/VM/Provisioning/StateMachines]
[----] I, [2022-07-12T03:46:20.261804 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088]) Invoking [playbook] method [/ManageIQ_Custom/Cloud/VM/Provisioning/StateMachines/Methods/testing_vm] with inputs [{"vm_id"=>"i-092a45b3bddd9a0f2"}]
[----] I, [2022-07-12T03:46:20.525966 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088]) Playbook Method passing options:
[----] I, [2022-07-12T03:46:20.526465 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088])
---
:become_enabled: false
:execution_ttl: '3'
:extra_vars:
  vm_id: i-092a45b3bddd9a0f2
  :manageiq:
    automate_workspace: automate_workspaces/546e38fc-bd83-41eb-8c8c-a701965eae20
    api_url: https://172.17.0.2
    api_token: c97d29074a3baac85a7aa2000ab84c8a
    user: users/1
    group: groups/2
    X_MIQ_Group: EvmGroup-super_administrator
    request_task: requests/821/request_tasks/2088
    request: requests/821
  :manageiq_connection:
    url: https://172.17.0.2
    token: c97d29074a3baac85a7aa2000ab84c8a
    X_MIQ_Group: EvmGroup-super_administrator
:hosts:
- localhost
:verbosity: '3'
:credential: 96
[----] I, [2022-07-12T03:46:20.581454 #307:2ad5997a5974]  INFO -- automation: Q-task_id([r821_miq_provision_2087]) Ansible job with ref 44048 was created.
[----] I, [2022-07-12T03:46:21.172005 #309:2abe5625b964]  INFO -- automation: Q-task_id([r821_miq_provision_2088]) Ansible job with ref 44049 was created.

If I restart the Docker instance, the rest of the provisioning continues.

@Fryguy Fryguy added the bug label Jul 6, 2022
@Fryguy
Copy link
Member

Fryguy commented Jul 6, 2022

@kbrock Can you take a look at this one?

@kTipSSIoYv
Copy link
Author

kTipSSIoYv commented Jul 20, 2022

@Fryguy @kbrock This issue still persist and it happens when you provisioning more than 1 VM. Playbook executes and provisioning continues when provisioning only 1 VM but we try 2 or 3 VM then the provisioning gets stuck.

I have to restart docker container in order for other requests to continue.

@kTipSSIoYv
Copy link
Author

kTipSSIoYv commented Aug 8, 2022

This is still an issue. Sometimes ansible job launches and sometimes it doesn't and other jobs ends up as either stuck in active state or ends up pending even if approved.

image

irb(main):001:0> pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694")
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
#<ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job:0x000055fe967719e8
 id: 559,
 name: "azure_sg_create.yaml",
 type: "ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job",
 description: nil,
 status: "Pre_execute",
 ems_ref: "66694",
 ancestry: nil,
 ems_id: 1,
 orchestration_template_id: nil,
 created_at: Mon, 08 Aug 2022 15:03:37 UTC +00:00,
 updated_at: Mon, 08 Aug 2022 15:03:37 UTC +00:00,
 retired: nil,
 retires_on: nil,
 retirement_warn: nil,
 retirement_last_warn: nil,
 retirement_state: nil,
 retirement_requester: nil,
 status_reason: nil,
 cloud_tenant_id: nil,
 resource_group: nil,
 start_time: nil,
 finish_time: nil,
 configuration_script_base_id: 126,
 verbosity: 3,
 hosts: nil,
 evm_owner_id: nil,
 miq_group_id: nil,
 tenant_id: nil>
=> #<ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job id: 559, name: "azure_sg_create.yaml", type: "ManageIQ::Providers::EmbeddedAnsible::AutomationMa...", description: nil, status: "Pre_execute", ems_ref: "66694", ancestry: nil, ems_id: 1, orchestration_template_id: nil, created_at: "2022-08-08 15:03:37", updated_at: "2022-08-08 15:03:37", retired: nil, retires_on: nil, retirement_warn: nil, retirement_last_warn: nil, retirement_state: nil, retirement_requester: nil, status_reason: nil, cloud_tenant_id: nil, resource_group: nil, start_time: nil, finish_time: nil, configuration_script_base_id: 126, verbosity: 3, hosts: nil, evm_owner_id: nil, miq_group_id: nil, tenant_id: nil>

@agrare
Copy link
Member

agrare commented Aug 8, 2022

@kTipSSIoYv what do you see for the playbook stdout?

pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694").raw_stdout_json

@kTipSSIoYv
Copy link
Author

kTipSSIoYv commented Aug 8, 2022

@agrare Is there a typo in the command? I see a syntax error.

SyntaxError ((irb):15: syntax error, unexpected local variable or method, expecting `end')
..._ref => "66694")raw_stdout_json
...                ^~~~~~~~~~~~~~~

@agrare
Copy link
Member

agrare commented Aug 8, 2022

Updated, you need the .

@kTipSSIoYv
Copy link
Author

kTipSSIoYv commented Aug 8, 2022

irb(main):016:0> pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694").raw_stdout_json
Traceback (most recent call last):
       15: from bin/rails:4:in `<main>'
       14: from bootsnap (1.11.1) lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:30:in `require'
       13: from bootsnap (1.11.1) lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:30:in `require'
       12: from railties (6.0.5.1) lib/rails/commands.rb:18:in `<main>'
       11: from railties (6.0.5.1) lib/rails/command.rb:46:in `invoke'
       10: from railties (6.0.5.1) lib/rails/command/base.rb:69:in `perform'
        9: from thor (1.2.1) lib/thor.rb:392:in `dispatch'
        8: from thor (1.2.1) lib/thor/invocation.rb:127:in `invoke_command'
        7: from thor (1.2.1) lib/thor/command.rb:27:in `run'
        6: from railties (6.0.5.1) lib/rails/commands/console/console_command.rb:102:in `perform'
        5: from railties (6.0.5.1) lib/rails/commands/console/console_command.rb:19:in `start'
        4: from railties (6.0.5.1) lib/rails/commands/console/console_command.rb:70:in `start'
        3: from (irb):15
        2: from (irb):16:in `rescue in irb_binding'
        1: from activemodel (6.0.5.1) lib/active_model/attribute_methods.rb:429:in `method_missing'
NoMethodError (private method `raw_stdout_json' called for #<ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job:0x000055fe95d4a050>)
Did you mean?  raw_stdout


irb(main):017:0> pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694").raw_stdout
""
=> ""

@agrare
Copy link
Member

agrare commented Aug 8, 2022

Okay someone who actually knows embedded Ansible is going to have to help you sorry @kTipSSIoYv

@kTipSSIoYv
Copy link
Author

kTipSSIoYv commented Aug 8, 2022

@kbrock @agrare Now the other jobs are running.

image

@jrafanie
Copy link
Member

jrafanie commented Aug 8, 2022

What did the stdout show? What changed to make it work?

@agrare
Copy link
Member

agrare commented Aug 8, 2022

@kTipSSIoYv do you mean they're working, or the older ones timed out and now the latest is hung

@kTipSSIoYv
Copy link
Author

kTipSSIoYv commented Aug 8, 2022

@jrafanie @agrare after running the pp ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.find_by(:ems_ref => "66694").raw_stdout command other stuck jobs started to work. It was stuck until then. Note: Playbook fails right now without the variables and that's why the status is error.

@agrare
Copy link
Member

agrare commented Aug 9, 2022

@kTipSSIoYv is that reproducible?

@kTipSSIoYv
Copy link
Author

kTipSSIoYv commented Aug 11, 2022

@agrare Yea its reproducible. Usually when you launch multiple count of servers using a single catalog it gets stuck during ansible playbook launch.

@kTipSSIoYv
Copy link
Author

kTipSSIoYv commented Sep 5, 2022

@Fryguy In the latest version of Narjdorf Ansible job isn't launching at all. I see the following log entry but ansible runner isn't launching at all.

[----] I, [2022-09-05T16:20:05.401602 #324:8cdc] INFO -- automation: Q-task_id([r1169_miq_provision_2912]) Ansible job with ref 92663 was created.

@miq-bot miq-bot added the stale label Feb 27, 2023
@miq-bot
Copy link
Member

miq-bot commented Feb 27, 2023

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

@kbrock kbrock removed the stale label Mar 9, 2023
@miq-bot miq-bot added the stale label Jun 12, 2023
@miq-bot
Copy link
Member

miq-bot commented Jun 12, 2023

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

@kbrock kbrock removed the stale label Jul 13, 2023
@miq-bot miq-bot added the stale label Oct 16, 2023
@miq-bot
Copy link
Member

miq-bot commented Oct 16, 2023

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

@miq-bot
Copy link
Member

miq-bot commented Jan 22, 2024

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

1 similar comment
@miq-bot
Copy link
Member

miq-bot commented Apr 29, 2024

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants