[BUG] Discovery got same problem with body variable #51

Closed
elwood218 opened this issue May 30, 2022 · 16 comments

@elwood218

elwood218 commented May 30, 2022

Describe the bug

There are two errors: one is in the module, and the other is, IMHO, a problem with the API.
The module error is, once again, that "body" is used but not defined. (If I run the task in a loop over "play_hosts", it works.)
The general problem is that the first few hosts get "service discovered" and then the API returns a 500 error or similar.

TASK [cmk_host_registration : Add/update/remove host] ****************************************************************
changed: [Host1 -> localhost]
changed: [Host2 -> localhost]
changed: [Host3 -> localhost]
changed: [Host4 -> localhost]
changed: [Host5 -> localhost]

RUNNING HANDLER [cmk_host_registration : service discovery] **********************************************************
changed: [Host4 -> localhost]
changed: [Host1 -> localhost]
changed: [Host3 -> localhost]
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'body'
fatal: [Host5 -> localhost]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n  File \"/Users/mathias.buresch/.ansible/tmp/ansible-tmp-1653903276.5778098-98682-264119765984817/AnsiballZ_cmk_discovery.py\", line 107, in <module>\n    _ansiballz_main()\n  File \"/Users/mathias.buresch/.ansible/tmp/ansible-tmp-1653903276.5778098-98682-264119765984817/AnsiballZ_cmk_discovery.py\", line 99, in _ansiballz_main\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n  File \"/Users/mathias.buresch/.ansible/tmp/ansible-tmp-1653903276.5778098-98682-264119765984817/AnsiballZ_cmk_discovery.py\", line 47, in invoke_module\n    runpy.run_module(mod_name='ansible.modules.cmk_discovery', init_globals=dict(_module_fqn='ansible.modules.cmk_discovery', _modlib_path=modlib_path),\n  File \"/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py\", line 209, in run_module\n    return _run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py\", line 96, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py\", line 86, in _run_code\n    exec(code, run_globals)\n  File \"/var/folders/6_/qjhhb7fn13g6r839l49tjg58s_y621/T/ansible_cmk_discovery_payload_vulpzc1w/ansible_cmk_discovery_payload.zip/ansible/modules/cmk_discovery.py\", line 170, in <module>\n  File \"/var/folders/6_/qjhhb7fn13g6r839l49tjg58s_y621/T/ansible_cmk_discovery_payload_vulpzc1w/ansible_cmk_discovery_payload.zip/ansible/modules/cmk_discovery.py\", line 166, in main\n  File \"/var/folders/6_/qjhhb7fn13g6r839l49tjg58s_y621/T/ansible_cmk_discovery_payload_vulpzc1w/ansible_cmk_discovery_payload.zip/ansible/modules/cmk_discovery.py\", line 147, in run_module\nKeyError: 'body'\n", "module_stdout": "", "msg": 
"MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'body'

Component Name

discovery

Ansible Version

$ ansible --version
ansible [core 2.12.5]

Checkmk Version

2.1.0 (CEE)

Collection Version

$ ansible-galaxy collection list
Tested with 0.3.2 and "devel"

Environment

To Reproduce
Steps to reproduce the behavior:

- name: "Add/update/remove host"
#  tribe29.checkmk.host:
  cmk_host:
    server_url: "https://{{ cmk_central }}/"
    site: "{{ cmk_central_site }}"
    automation_user: "{{ cmk_site_user }}"
    automation_secret: "{{ cmk_site_password }}"
    host_name: "{{ cmk_host_name | default(host_name) }}"
    attributes:
      "{{ cmk_host_attributes | default(omit) }}"
    folder: "{{ host_folder }}"
    state: "{{ cmk_host_state | default('present') }}"
  delegate_to: localhost
  become: no
  notify:
    - service discovery
    - activate changes
- name: service discovery
#  tribe29.checkmk.discovery:
  cmk_discovery:
    server_url: "https://{{ cmk_central }}/"
    site: "{{ cmk_central_site }}"
    automation_user: "{{ cmk_site_user }}"
    automation_secret: "{{ cmk_site_password }}"
    host_name: "{{ host_name }}"
#    host_name: "{{ hostvars[item]['cmk_host_name'] | default(hostvars[item]['host_name']) }}"
#    state: "fix_all"
    state: "new"
  delegate_to: localhost
#  loop: "{{ play_hosts }}"
##  loop_control:
##   pause: 3
#  run_once: true
  become: no

Expected behavior

  1. The module should return a proper error message (not a KeyError)
  2. API should not throw an error

Actual behavior

Screenshots

Additional context

@elwood218 elwood218 added the bug Something isn't working label May 30, 2022
lgetwan added a commit that referenced this issue Jun 2, 2022
lgetwan added a commit that referenced this issue Jun 2, 2022
lgetwan added a commit that referenced this issue Jun 3, 2022
@lgetwan
Contributor

lgetwan commented Jun 3, 2022

Hi @elwood218,

I fixed the discovery module in the devel branch. The error message still looks a bit ugly, but at least it's no longer an "Exception", and it contains the actual error message from the API.
Can you please test it?

Best regards
Lars

@elwood218
Author

@lgetwan This is the output now.

TASK [cmk_host_registration : service discovery] *********************************************************************
Friday 03 June 2022  11:09:39 +0200 (0:00:17.068)       0:01:15.787 ***********
Friday 03 June 2022  11:09:39 +0200 (0:00:17.068)       0:01:15.787 ***********
changed: [pgdb03 -> localhost]
changed: [pgdb02 -> localhost]
changed: [db01 -> localhost]
fatal: [pgdb01 -> localhost]: FAILED! => changed=false
  http_code: -1
  msg: 'Error calling API. HTTP Return Code is -1 Details: N/A'
fatal: [db02 -> localhost]: FAILED! => changed=false
  http_code: -1
  msg: 'Error calling API. HTTP Return Code is -1 Details: N/A'
fatal: [app01 -> localhost]: FAILED! => changed=false
  http_code: -1
  msg: 'Error calling API. HTTP Return Code is -1 Details: N/A'

@lgetwan
Contributor

lgetwan commented Jun 3, 2022

Hi @elwood218,

I definitely didn't expect a "-1". :-o
But when checking the sources of fetch_url, I found some further information about that case. I improved the error message. Can you please check again?
Thanks so far for your help!

Best regards
Lars
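
Editorial note: the KeyError and the later "-1" result can be illustrated with a small sketch. It assumes that the `info` dictionary returned by Ansible's `fetch_url` always carries `status` and `msg`, but only carries `body` when the server actually sent an HTTP error response. The function name `describe_failure` is hypothetical, and this is not the collection's actual fix, only one way to avoid indexing a key that may be absent.

```python
def describe_failure(info):
    """Build an error message from a fetch_url-style info dict.

    Assumption: 'body' is only present for real HTTP error responses;
    connection failures and timeouts only provide 'status' (-1) and 'msg'.
    """
    status = info.get("status", -1)
    # Prefer the API's response body, fall back to the transport message.
    detail = info.get("body") or info.get("msg") or "N/A"
    if isinstance(detail, bytes):
        detail = detail.decode("utf-8", errors="replace")
    return "Error calling API. HTTP Return Code is %d Details: %s" % (status, detail)


# A timeout: no 'body' key at all, so info["body"] would raise KeyError.
timeout_info = {"status": -1, "msg": "Connection failure: The read operation timed out"}
timeout_message = describe_failure(timeout_info)

# An HTTP error: the API's own error text is available in 'body'.
http_error_info = {"status": 500, "msg": "HTTP Error 500", "body": b'{"title": "Internal Server Error"}'}
http_error_message = describe_failure(http_error_info)
```

Using `dict.get` with a fallback keeps the message useful in both cases: the API's own error text when there is one, and the transport-level reason otherwise.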

@elwood218
Author

Hi, if that git pull contains your changes, then the output is the same:

From https://github.com/tribe29/ansible-collection-tribe29.checkmk
   3c52245..32ca914  devel                    -> origin/devel
   58aba7f..8315f77  fix-51-exceptionhandling -> origin/fix-51-exceptionhandling
Updating 3c52245..32ca914
 fatal: [pgdb03 -> localhost]: FAILED! => changed=false
  http_code: -1
  msg: 'Error calling API. HTTP Return Code is -1 Details: N/A'

@lgetwan
Contributor

lgetwan commented Jun 3, 2022

Hi,
for some reason, my fix wasn't merged into the devel branch. I did that again and verified it. Can you please try again?
Best regards
Lars

@elwood218
Author

Hi, now it looks like this:

fatal: [pgdb03 -> localhost]: FAILED! => changed=false
  http_code: -1
  msg: 'Error calling API. HTTP Return Code is -1 Details: Connection failure: The read operation timed out'

By the way, earlier, when I was getting a 500 from the API, a link to a crash report was sometimes shown directly, but that no longer seems to work. I just wanted to mention it; I don't know whether it is related.

@lgetwan
Contributor

lgetwan commented Jun 3, 2022

Hi,
for some reason, your API is not responding in time.
You could try making the call with curl for testing:

out=$(curl -v \
    --request POST \
    --write-out "\nxxx-status_code=%{http_code}\n" \
    --header "Authorization: Bearer <your_user> <your_password>" \
    --header "Accept: application/json" \
    --header "Content-Type: application/json" \
    --data '{ "alias": "SLES12_cgroup2", "name": "SLES12_cgroup2" }' \
    "http://<your_cmk_server>/<your_cmk_site>/check_mk/api/1.0/objects/host/<your_host>/actions/discover_services/invoke")

echo "$out"
resp=$( echo "${out}" | grep -v "xxx-status_code" )
code=$( echo "${out}" | awk -F"=" '/^xxx-status_code/ {print $2}')
echo "$resp" | jq
if [[ $code -lt 400 ]]; then
     echo "OK"
else
     echo "Request error"
fi

How long does this call take approximately?

@elwood218
Author

elwood218 commented Jun 3, 2022

Hi, I don't think that this is the problem.

xxx-status_code=302
OK

The problem only occurs after the first 2 or 3 hosts and only when called like I already described in #33.

But you can close this issue if you want, because it was more or less only about the body variable. At least we now get a "better" error message.
I don't really know how to debug the issue with the API, and for now my workaround is at least working.

Thank you for your help.

@elwood218
Author

I just realized that the curl call was missing a "follow" option for redirects, so I changed the URL to https manually. Now this is the output:

{"title": "Bad Request", "status": 400, "detail": "These fields have problems: name, alias", "fields": {"name": ["Unknown field."], "alias": ["Unknown field."]}}
xxx-status_code=400
{
  "title": "Bad Request",
  "status": 400,
  "detail": "These fields have problems: name, alias",
  "fields": {
    "name": [
      "Unknown field."
    ],
    "alias": [
      "Unknown field."
    ]
  }
}
Request error
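
Editorial note: the earlier run printed "OK" for a 302 because the test script treats every status code below 400 as success, even though a redirect that is not followed never reaches the API. A minimal sketch of a stricter check follows. The function names are hypothetical, and the `xxx-status_code=` marker is the one used by the curl script in this thread.

```python
def parse_status_marker(output, marker="xxx-status_code="):
    """Extract the HTTP status code from curl output that contains the marker line."""
    for line in output.splitlines():
        if line.startswith(marker):
            return int(line[len(marker):])
    raise ValueError("status marker not found in output")


def classify_status(code):
    """Distinguish real success from a redirect that was not followed."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect (not followed)"
    return "request error"


first_run = classify_status(parse_status_marker("xxx-status_code=302"))
second_run = classify_status(parse_status_marker('{"title": "Bad Request"}\nxxx-status_code=400'))
```

With this split, the first run is reported as a redirect rather than as a success, which points directly at the http/https mismatch.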

@elwood218
Author

I guess it must look like this:

   --data '{
          "mode": "new"
        }' \
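
Editorial note: the corrected call can also be sketched in Python. The URL path, the headers, and the `{"mode": "new"}` body are taken from the curl command and the correction in this thread. The function name, the example server, site, host, and credentials are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_discovery_request(server_url, site, host_name, user, secret, mode="new"):
    """Build (but do not send) a service discovery request for one host."""
    url = "%s/%s/check_mk/api/1.0/objects/host/%s/actions/discover_services/invoke" % (
        server_url.rstrip("/"),
        site,
        host_name,
    )
    headers = {
        "Authorization": "Bearer %s %s" % (user, secret),
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    # The endpoint rejected "name" and "alias"; the discovery mode is the expected field.
    body = json.dumps({"mode": mode}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values for illustration only.
req = build_discovery_request("https://cmk.example.com/", "mysite", "host1", "automation", "secret")
```

Passing an https URL from the start avoids the redirect seen in the first curl attempt. To actually send the request, it could be handed to `urllib.request.urlopen` with an explicit `timeout` argument.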

@elwood218
Author

           http_code:  200
     time_namelookup:  0.072466s
        time_connect:  0.087947s
     time_appconnect:  0.250204s
    time_pretransfer:  0.250424s
       time_redirect:  0.000000s
  time_starttransfer:  0.250430s
                     ----------
          time_total:  4.760772s

But service discovery could take longer, and as I said, it works when I "loop" over the hosts in Ansible, but it fails when I run it for each inventory host (which could happen in parallel).

@elwood218
Author

elwood218 commented Jun 3, 2022

I have just tested it in the shell.

discover-all-hosts.sh -> this script runs a "for" loop, so it is sequential, and it takes about 4 seconds per host, as in my last post.

But if I instead run these 4 scripts in parallel:

discover-host1.sh
discover-host2.sh
discover-host3.sh
discover-host4.sh

Then these are the times of each host:

      time_total:  4.544909s
      time_total:  9.610563s
      time_total:  13.358466s
      time_total:  17.429323s
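
Editorial note: these totals grow by roughly one single-host discovery time per request, which would be consistent with the server handling the parallel requests one after another. The sketch below works through that arithmetic on the measured values. The 10 second client timeout is an assumption (it is, as far as I know, the default of Ansible's `fetch_url`, but the thread does not state which timeout the module used).

```python
# time_total values measured for the four parallel runs above, in seconds.
MEASURED_TOTALS = [4.544909, 9.610563, 13.358466, 17.429323]

# Assumption: the client gives up after 10 seconds.
ASSUMED_TIMEOUT = 10.0


def per_request_increments(totals):
    """Time added by each additional request if they are processed one at a time."""
    ordered = sorted(totals)
    return [later - earlier for earlier, later in zip([0.0] + ordered, ordered)]


def split_by_timeout(totals, timeout):
    """Separate requests that finish within the timeout from those that do not."""
    finished = [t for t in totals if t <= timeout]
    timed_out = [t for t in totals if t > timeout]
    return finished, timed_out


increments = per_request_increments(MEASURED_TOTALS)
finished, timed_out = split_by_timeout(MEASURED_TOTALS, ASSUMED_TIMEOUT)
```

Under that assumed timeout, the first two requests finish in time and the last two do not, which would match the observation that the failure only appears after the first 2 or 3 hosts.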

@robin-checkmk
Member

If I understand this right, we are chasing another issue here, so @elwood218, would you be so kind as to close this issue and move your research on the timing problem into a new issue? There we can then investigate whether it is a collection issue or an API issue.

Thanks!

@elwood218
Author

@robin-tribe29 It is an issue with the API, because I have reproduced it in the shell. So I will write you an email at feedback if you can't open a ticket yourself. This ticket can be closed.

@robin-checkmk
Member

@elwood218 Alright, then please send an email, so all relevant information is in there. Thanks!

@elwood218
Author

But maybe the module could use bulk discovery, although I don't know how to solve that in Ansible. (And the discovery and "accepting" are not working anyway; I have already opened a FEED for that.)
