Job Event processing slow when smart inventory is used #3106
Comments
TL;DR: While processing events, we look up the host for which each event was generated. For a smart inventory this lookup is expensive, and because we do it for every event, event processing slows down. https://github.com/ansible/awx/blob/devel/awx/main/models/events.py#L343
Non-Smart Inventory
The query is, as expected, very fast and looks the way we would expect.
Smart Inventory
The query is much more computationally complex.
Why is the smart inventory query so complex?
Smart inventory is dynamically generated. More concretely, the host_filter query is applied to any query that flows through a smart inventory. That is not the expensive part. Appending the queryset that removes duplicate host names is. https://github.com/ansible/awx/blob/devel/awx/main/managers.py#L53
New Smart Inventory
The query is simpler and much closer to the non-smart inventory query. This is desirable.
Note: The EXPLAIN ANALYZE timings were taken from a dev environment with 1/10 of the data set required to reproduce the 500 ms database query times we were seeing. However, the data set is still characteristic of the originally observed slowdown. The original smart inventory queries take about 5 ms, versus 0.3 ms after the optimizations.
Ending Thoughts
I tried removing the distinct to see if there was a speedup.
For Testing Correctness
For Docs
We should call out that a smart inventory will not include duplicate hosts. For example, suppose you have two inventories, EC2 East Coast and EC2 West Coast, and each has a host named production. A smart inventory that matches production will include only one of those production hosts. Which host is included is non-deterministic. The host_filter "preview" view in the UI WILL show duplicates. This gives the user a chance to refine their query so that duplicates are not included. If the user goes ahead and saves a filter that matches duplicate hosts, the smart inventory will NOT include multiple hosts with the same name. It will include only one of the hosts with that name.
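To make the documented behavior concrete, here is a minimal pure-Python sketch of the difference between the preview and the saved smart inventory. It is not AWX code: the `Host` class and the `preview` and `smart_inventory_hosts` functions are hypothetical names made up for illustration, and keeping the first host seen per name is an assumption of this sketch only (in AWX, which host survives is non-deterministic).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # Hypothetical stand-in for a host record; not the AWX model.
    id: int
    name: str
    inventory: str


def preview(hosts, predicate):
    """Mimic the host_filter preview: every matching host, duplicates included."""
    return [h for h in hosts if predicate(h)]


def smart_inventory_hosts(hosts, predicate):
    """Mimic the saved smart inventory: at most one host per name.

    This sketch keeps the first match it sees for each name. That choice is
    an assumption made here for simplicity; do not rely on any ordering.
    """
    by_name = {}
    for h in preview(hosts, predicate):
        by_name.setdefault(h.name, h)
    return sorted(by_name.values(), key=lambda h: h.name)


hosts = [
    Host(1, "production", "EC2 East Coast"),
    Host(2, "production", "EC2 West Coast"),
    Host(3, "staging", "EC2 East Coast"),
]


def matches_production(host):
    return host.name == "production"


print(len(preview(hosts, matches_production)))                # both production hosts
print(len(smart_inventory_hosts(hosts, matches_production)))  # only one of them
```

A user who needs both production hosts would have to refine the filter (or rename the hosts) so that the names no longer collide.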
Closing this; the actual issue is #3205.
ISSUE TYPE
COMPONENT NAME
SUMMARY
Pushing job events into the database, in the callback receiver, is slow when a smart inventory is used.
Note: Pushing events into RabbitMQ is fast, and pulling events from RabbitMQ is fast. Events end up queued in the run_callback_receiver subprocesses. It's nice to see that more than 40k events can sit in our callback receiver, waiting to be pushed into the database, without a problem.
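For a rough sense of scale, the sketch below estimates how long a backlog of queued events would take to drain if the per-event host lookup dominated. It combines the 40k backlog mentioned here with the per-query timings quoted in the comments (500 ms observed originally, about 5 ms and 0.3 ms in the reduced dev data set). It assumes one lookup per event and a single worker processing events serially, which is a simplification; `drain_seconds` is a hypothetical helper, not part of AWX.

```python
def drain_seconds(num_events, lookup_ms):
    """Estimated time to drain a backlog, assuming one host lookup per event
    and strictly serial processing (a simplifying assumption)."""
    return num_events * lookup_ms / 1000.0


QUEUED_EVENTS = 40_000  # backlog size mentioned in the issue

# Per-lookup timings quoted in the issue comments (milliseconds).
scenarios = [
    ("smart inventory, originally observed", 500.0),
    ("smart inventory, dev data set, before", 5.0),
    ("smart inventory, dev data set, after", 0.3),
]

for label, lookup_ms in scenarios:
    seconds = drain_seconds(QUEUED_EVENTS, lookup_ms)
    print(f"{label}: {seconds:,.0f} s (~{seconds / 3600:.2f} h)")
```

Under these assumptions the backlog takes hours to drain at 500 ms per lookup and only seconds at sub-millisecond lookups, which is why the per-event query cost matters far more than queue throughput.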
ENVIRONMENT
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS
ADDITIONAL INFORMATION
SELECT "main_host"."id",
       "main_host"."created",
       "main_host"."modified",
       "main_host"."description",
       "main_host"."created_by_id",
       "main_host"."modified_by_id",
       "main_host"."name",
       "main_host"."inventory_id",
       "main_host"."enabled",
       "main_host"."instance_id",
       "main_host"."variables",
       "main_host"."last_job_id",
       "main_host"."last_job_host_summary_id",
       "main_host"."has_active_failures",
       "main_host"."has_inventory_sources",
       "main_host"."ansible_facts",
       "main_host"."ansible_facts_modified",
       "main_host"."insights_system_id"
FROM "main_host"
WHERE ("main_host"."ansible_facts" @> '{"ansible_local": {"node": {"build": "beta"}}}'
       AND "main_host"."ansible_facts" @> '{"ansible_local": {"node": {"department": "engineering"}}}')
ORDER BY "main_host"."name" ASC