Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In some circumstances Events never get cleared from callback_receiver buffer. #13368

Open
4 of 9 tasks
gamuniz opened this issue Dec 21, 2022 · 5 comments
Open
4 of 9 tasks

Comments

@gamuniz
Copy link
Contributor

gamuniz commented Dec 21, 2022

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.

Bug Summary

Events of a certain type can cause issues with the callback's bulk create. In those cases, some events can make it into the database and will later cause duplicate key errors:

Database Error Saving individual Job Event, error A string literal cannot contain NUL (0x00) characters.

The above exception breaks the initial bulk create that then causes the other issue:

Database Error Flushing Job Events
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "main_inventoryupdateevent_20221220_14_pkey"
DETAIL:  Key (id, job_created)=(974510, 2022-12-20 14:54:09.658252+00) already exists.

Those events never get cleared from the buffer and are later attempted again:

Error in events bulk_create, will try indiviually up to 5 errors, error duplicate key value violates unique constraint "main_inventoryupdateevent_20221221_04_pkey"
DETAIL:  Key (id, job_created)=(979979, 2022-12-21 04:25:23.812477+00) already exists.

AWX version

devel

Select the relevant components

  • UI
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

docker development environment

Modifications

no

Ansible version

2.14.0

Operating system

RHEL

Web browser

No response

Steps to reproduce

  1. create a job template which would send null character as result data
  2. tail logs

Expected results

All event data is saved to the database

Actual results

some events never make it to the database and are not cleared from the callback receiver's buffer.

Additional information

This issue seems to be caused initially by bad result data from an event, which causes a bulk create to fail after inserting X number of events, those events are then reprocessed and raise a duplicate key error repeatedly.

@shanemcd
Copy link
Member

I reproduced this by crafting a stupid plugin that causes null characters to be printed to stdout:

shanemcd/test-playbooks@40edb76

Then used this inventory plugin by using an inventory source from a project:

image (12)

The key here is that the bad character needs to be printed to stdout so it's passed into ansible_runner's event_callback via the handler we pass to pexpect:

https://github.com/ansible/ansible-runner/blob/c6644a15ce915b82bbb759f7581f6c4c2c756f5e/ansible_runner/runner.py#L51

The full breadcrumb trail for those that are interested:

https://github.com/ansible/ansible-runner/blob/c6644a15ce915b82bbb759f7581f6c4c2c756f5e/ansible_runner/runner.py#L160

https://github.com/ansible/ansible-runner/blob/c6644a15ce915b82bbb759f7581f6c4c2c756f5e/ansible_runner/runner.py#L294

If the null character is passed through Ansible's callback system we don't see this because it's escaped when we load the event data here:

https://github.com/ansible/ansible-runner/blob/c6644a15ce915b82bbb759f7581f6c4c2c756f5e/ansible_runner/runner.py#L70

Upon running my inventory sync with this bad plugin, I see this in my logs:

tools_awx_1     | 2022-12-22 01:51:56,691 WARNING  [-] awx.main.commands.run_callback_receiver Error in events bulk_create, will try indiviually up to 5 errors, error A string literal cannot contain NUL (0x00) characters.
tools_awx_1     | 2022-12-22 01:51:56,704 INFO     [-] awx.main.commands.run_callback_receiver Database Error Saving individual Job Event, error A string literal cannot contain NUL (0x00) characters.

@AlanCoding
Copy link
Member

Connect #13398

This issue seems to have gotten lost, @jay-steurer looked at the PRs. I am fairly confident that addressed everything described in this issue.

@jay-steurer
Copy link
Contributor

I will double-check it. As Alan mentioned, we had a PR for this issue.

@jay-steurer jay-steurer self-assigned this Mar 15, 2023
@jay-steurer
Copy link
Contributor

I was able to reproduce this issue using @shanemcd playbook

tools_awx_1 | awx-receiver stderr | 
tools_awx_1 | awx-receiver stderr | 2023-04-18 14:05:53,673 INFO     [-] awx.main.commands.run_callback_receiver Database Error Saving individual Event uuid=6885b1a4-7088-45a4-a01e-b1b6c796177d try=1, error: A string literal cannot contain NUL (0x00) characters.

Moving this issue to the backlog

@AlanCoding
Copy link
Member

^ that output from the example inventory still get saved. The log shows try=1, and the maximum retries is 3. So if you saw more retries, and then a message that it gave up on the event, ("Hit max retries") then it would be a lost event.

Instead, this is just an error log that sends it to the fallback logic. If you look at the patch review comments, #13398 (comment), you see they're all about performance. We do a replacement for this specific character (because it's a postgres-level error), but only after an initial try has been used and had an error. This is because of the performance impact of doing a find and replace on all events in the buffer - many of which are error-free, but didn't get saved because they were in the buffer alongside of error events.

The logs are noisy, yes, but that's because what the inventory does isn't really valid. This is working as designed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants