Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job Event processing slow when smart inventory is used #3106

Closed
chrismeyersfsu opened this issue Jan 30, 2019 · 2 comments
Closed

Job Event processing slow when smart inventory is used #3106

chrismeyersfsu opened this issue Jan 30, 2019 · 2 comments

Comments

@chrismeyersfsu
Copy link
Member

chrismeyersfsu commented Jan 30, 2019

ISSUE TYPE
  • Bug Report
COMPONENT NAME
  • API
SUMMARY

Pushing job events into the database, in the callback receiver, is slow when a smart inventory is used.

Note: Pushing events into rabbit is fast, pulling events from rabbit is fast. Events end up queued in run_callback_receiver subprocesses. It's nice to see that > 40k events can sit in our callback receiver waiting to be pushed into the database, no problem.

ENVIRONMENT
  • Tower 3.4.1
STEPS TO REPRODUCE
  • Create an inventory with at least 3,000 hosts
  • Create 60 smart inventories that matches a fraction of those hosts, say 300.
  • Run a playbook against any of the smart inventories.
EXPECTED RESULTS
  • Shortly after ansible job finishes, stdout shows in the UI
ACTUAL RESULTS
  • job events take a long time to be processed (> 500 ms per event). This is observable in the UI/API
ADDITIONAL INFORMATION
  • Postgres queries taking =>~ 500 ms. Confirmed they are issued from callback receiver via strace.
    SELECT "main_host"."id", "main_host"."created", "main_host"."modified", "main_host"."description", "main_host"."created_by_id", "main_host"."modified_by_id", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host_summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified", "main_host"."insights_system_id" FROM "main_host" WHERE ("main_host"."ansible_facts" @> '{"ansible_local": {"node": {"build": "beta"}}}' AND "main_host"."ansible_facts" @> '{"ansible_local": {"node": {"department": "engineering"}}}') ORDER BY "main_host"."name" ASC
@chrismeyersfsu
Copy link
Member Author

TL;DR While processing events we find the host for which the event was generated. For smart-inventory this is expensive. We do this for each event, thus the slowdown.

https://github.com/ansible/awx/blob/devel/awx/main/models/events.py#L343
host_qs = self.job.inventory.hosts.filter(name=self.host_name)

Non-Smart Inventory

The query is, as expected, very fast and how we would expect it to look.

explain analyze SELECT "main_host"."id", "main_host"."created", "main_host"."modified", "mai n_host"."description", "main_host"."created_by_id", "main_host"."modified_by_id", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host_summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified ", "main_host"."insights_system_id" FROM "main_host" WHERE ("main_host"."inventory_id" = 65 AND "main_ host"."name" = 'test_host_111') ORDER BY "main_host"."name" ASC;

                                                                      QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using main_host_name_inventory_id_45aecd68_uniq on main_host  (cost=0.28..8.30 rows=1 widt
h=229) (actual time=0.031..0.031 rows=1 loops=1)
   Index Cond: (((name)::text = 'test_host_111'::text) AND (inventory_id = 65))
 Planning time: 0.142 ms
 Execution time: 0.049 ms
(4 rows)

image
https://explain.depesz.com/s/iyT2

Smart Inventory

The query is much more computationally complex.
SELECT "main_host"."id", "main_host"."created", "main_host"."modified", "main_host"."description", "main_host"."created_by_id", "main_host"."modified_by_id", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host_summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified", "main_host"."insights_system_id" FROM "main_host" INNER JOIN "main_inventory" ON ("main_host"."inventory_id" = "main_inventory"."id") WHERE ("main_host"."ansible_facts" @> '{"ansible_system": "Linux"}' AND "main_host"."ansible_facts" @> '{"foo": "bar"}' AND "main_inventory"."organization_id" = 3 AND "main_host"."id" IN (SELECT DISTINCT ON (U0."name") U0."id" AS Col1 FROM "main_host" U0 INNER JOIN "main_inventory" U1 ON (U0."inventory_id" = U1."id") WHERE (U0."ansible_facts" @> '{"ansible_system": "Linux"}' AND U0."ansible_facts" @> '{"foo": "bar"}' AND U1."organization_id" = 3) ORDER BY U0."name" ASC, U0."id" ASC) AND "main_host"."name" = 'test_host_67') ORDER BY "main_host"."name" ASC;

image
https://explain.depesz.com/s/KIP

Why is the smart inventory query so complex?

Smart inventory is dynamically generated. More concretely, the host_filter query is applied to any query that flows through a smart inventory. That is not the expensive part. Appending the queryset that removes duplicate host names is. https://github.com/ansible/awx/blob/devel/awx/main/managers.py#L53
Note that we are taking the existing query and doing an __in on it. Instead, we can remove that and just add a distinct to the existing query.

New Smart Inventory

The query is simpler and much closer to the non-smart inventory query. This is desirable.

SELECT DISTINCT ON ("main_host"."name") "main_host"."id", "main_host"."created", "main_host"."modified", "main_host"."description", "main_host"."created_by_id", "main_host"."modified_by_id", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host_summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified", "main_host"."insights_system_id" FROM "main_host" INNER JOIN "main_inventory" ON ("main_host"."inventory_id" = "main_inventory"."id") WHERE ("main_host"."ansible_facts" @> '{"ansible_system": "Linux"}' AND "main_host"."ansible_facts" @> '{"foo": "bar"}' AND "main_inventory"."organization_id" = 3 AND "main_host"."name" = 'test_host_111') ORDER BY "main_host"."name" ASC;

image
https://explain.depesz.com/s/TnKR

Note: The explain analyze timings were taken from a dev env with 1/10 of the data-set required to invoke the 500ms database query times we were seeing. However, the data-set is still characteristic of the original observed slowdown. We see that the original smart inventory queries take =~ 5ms and 0.3ms after optimizations.

Ending Thoughts

I tried removing the distinct to see if there was a speed up.

SELECT "main_host"."id", "main_host"."created", "main_host"."modified", "main_host"."description", "main_host"."created_by_id", "main_host"."modified_by_i d", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host _summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified", "main_host"."insights_syste m_id" FROM "main_host" INNER JOIN "main_inventory" ON ("main_host"."inventory_id" = "main_inventory"."id") WHERE ("main_host"."ansible_facts" @> '{"ansible_system": "Linux"}' AND " main_host"."ansible_facts" @> '{"foo": "bar"}' AND "main_inventory"."organization_id" = 3 AND "main_host"."name" = 'test_host_111') ORDER BY "main_host"."name" ASC;

image
https://explain.depesz.com/s/qGH5

  • An index on hostname may speed up with bitmap heap scan phase? If nothing else, it has to be common that users add some variable of the hostname to smart inventory filters.

For Testing Correctness

  • Two inventories w/ same named host
  • Smart inventory matching on the non-unique hostname
  • Ensure 1 host ends up in the smart host
  • Run a job against the smart host
  • Ensure the job event host field gets related to THE host in the smart inventor

For Docs

We should call out that smart inventory will not include duplicate hosts. For example, if You have two inventories, EC2 East Coast and EC2 West Coast and you have a host named production in each. A smart inventory that matches production will only include one of those production hosts. Which host is non-deterministic.

The host_filter "preview" view in the UI WILL show duplicates. This gives the user a chance to refine their query so that duplicates are not included. If the user goes through with saving a filter that includes duplicates hosts, the smart inventory will NOT include multiple hosts with the same name. It will include one of the hosts with that name only.

@ryanpetrello
Copy link
Contributor

Closing this, the actual issue is #3205

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants