Job Event processing slow when smart inventory is used #3106

chrismeyersfsu · 2019-01-30T15:05:55Z

ISSUE TYPE

Bug Report

COMPONENT NAME

API

SUMMARY

Pushing job events into the database, in the callback receiver, is slow when a smart inventory is used.

Note: Pushing events into rabbit is fast, pulling events from rabbit is fast. Events end up queued in run_callback_receiver subprocesses. It's nice to see that > 40k events can sit in our callback receiver waiting to be pushed into the database, no problem.

ENVIRONMENT

Tower 3.4.1

STEPS TO REPRODUCE

Create an inventory with at least 3,000 hosts
Create 60 smart inventories that matches a fraction of those hosts, say 300.
Run a playbook against any of the smart inventories.

EXPECTED RESULTS

Shortly after ansible job finishes, stdout shows in the UI

ACTUAL RESULTS

job events take a long time to be processed (> 500 ms per event). This is observable in the UI/API

ADDITIONAL INFORMATION

Postgres queries taking =>~ 500 ms. Confirmed they are issued from callback receiver via strace.
SELECT "main_host"."id", "main_host"."created", "main_host"."modified", "main_host"."description", "main_host"."created_by_id", "main_host"."modified_by_id", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host_summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified", "main_host"."insights_system_id" FROM "main_host" WHERE ("main_host"."ansible_facts" @> '{"ansible_local": {"node": {"build": "beta"}}}' AND "main_host"."ansible_facts" @> '{"ansible_local": {"node": {"department": "engineering"}}}') ORDER BY "main_host"."name" ASC

The text was updated successfully, but these errors were encountered:

chrismeyersfsu · 2019-02-01T13:34:43Z

TL;DR While processing events we find the host for which the event was generated. For smart-inventory this is expensive. We do this for each event, thus the slowdown.

https://github.com/ansible/awx/blob/devel/awx/main/models/events.py#L343
host_qs = self.job.inventory.hosts.filter(name=self.host_name)

Non-Smart Inventory

The query is, as expected, very fast and how we would expect it to look.

explain analyze SELECT "main_host"."id", "main_host"."created", "main_host"."modified", "mai n_host"."description", "main_host"."created_by_id", "main_host"."modified_by_id", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host_summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified ", "main_host"."insights_system_id" FROM "main_host" WHERE ("main_host"."inventory_id" = 65 AND "main_ host"."name" = 'test_host_111') ORDER BY "main_host"."name" ASC;

                                                                      QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using main_host_name_inventory_id_45aecd68_uniq on main_host  (cost=0.28..8.30 rows=1 widt
h=229) (actual time=0.031..0.031 rows=1 loops=1)
   Index Cond: (((name)::text = 'test_host_111'::text) AND (inventory_id = 65))
 Planning time: 0.142 ms
 Execution time: 0.049 ms
(4 rows)

https://explain.depesz.com/s/iyT2

Smart Inventory

The query is much more computationally complex.
SELECT "main_host"."id", "main_host"."created", "main_host"."modified", "main_host"."description", "main_host"."created_by_id", "main_host"."modified_by_id", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host_summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified", "main_host"."insights_system_id" FROM "main_host" INNER JOIN "main_inventory" ON ("main_host"."inventory_id" = "main_inventory"."id") WHERE ("main_host"."ansible_facts" @> '{"ansible_system": "Linux"}' AND "main_host"."ansible_facts" @> '{"foo": "bar"}' AND "main_inventory"."organization_id" = 3 AND "main_host"."id" IN (SELECT DISTINCT ON (U0."name") U0."id" AS Col1 FROM "main_host" U0 INNER JOIN "main_inventory" U1 ON (U0."inventory_id" = U1."id") WHERE (U0."ansible_facts" @> '{"ansible_system": "Linux"}' AND U0."ansible_facts" @> '{"foo": "bar"}' AND U1."organization_id" = 3) ORDER BY U0."name" ASC, U0."id" ASC) AND "main_host"."name" = 'test_host_67') ORDER BY "main_host"."name" ASC;

https://explain.depesz.com/s/KIP

Why is the smart inventory query so complex?

Smart inventory is dynamically generated. More concretely, the host_filter query is applied to any query that flows through a smart inventory. That is not the expensive part. Appending the queryset that removes duplicate host names is. https://github.com/ansible/awx/blob/devel/awx/main/managers.py#L53
Note that we are taking the existing query and doing an __in on it. Instead, we can remove that and just add a distinct to the existing query.

New Smart Inventory

The query is simpler and much closer to the non-smart inventory query. This is desirable.

SELECT DISTINCT ON ("main_host"."name") "main_host"."id", "main_host"."created", "main_host"."modified", "main_host"."description", "main_host"."created_by_id", "main_host"."modified_by_id", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host_summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified", "main_host"."insights_system_id" FROM "main_host" INNER JOIN "main_inventory" ON ("main_host"."inventory_id" = "main_inventory"."id") WHERE ("main_host"."ansible_facts" @> '{"ansible_system": "Linux"}' AND "main_host"."ansible_facts" @> '{"foo": "bar"}' AND "main_inventory"."organization_id" = 3 AND "main_host"."name" = 'test_host_111') ORDER BY "main_host"."name" ASC;

https://explain.depesz.com/s/TnKR

Note: The explain analyze timings were taken from a dev env with 1/10 of the data-set required to invoke the 500ms database query times we were seeing. However, the data-set is still characteristic of the original observed slowdown. We see that the original smart inventory queries take =~ 5ms and 0.3ms after optimizations.

Ending Thoughts

I tried removing the distinct to see if there was a speed up.

SELECT "main_host"."id", "main_host"."created", "main_host"."modified", "main_host"."description", "main_host"."created_by_id", "main_host"."modified_by_i d", "main_host"."name", "main_host"."inventory_id", "main_host"."enabled", "main_host"."instance_id", "main_host"."variables", "main_host"."last_job_id", "main_host"."last_job_host _summary_id", "main_host"."has_active_failures", "main_host"."has_inventory_sources", "main_host"."ansible_facts", "main_host"."ansible_facts_modified", "main_host"."insights_syste m_id" FROM "main_host" INNER JOIN "main_inventory" ON ("main_host"."inventory_id" = "main_inventory"."id") WHERE ("main_host"."ansible_facts" @> '{"ansible_system": "Linux"}' AND " main_host"."ansible_facts" @> '{"foo": "bar"}' AND "main_inventory"."organization_id" = 3 AND "main_host"."name" = 'test_host_111') ORDER BY "main_host"."name" ASC;

https://explain.depesz.com/s/qGH5

An index on hostname may speed up with bitmap heap scan phase? If nothing else, it has to be common that users add some variable of the hostname to smart inventory filters.

For Testing Correctness

Two inventories w/ same named host
Smart inventory matching on the non-unique hostname
Ensure 1 host ends up in the smart host
Run a job against the smart host
Ensure the job event host field gets related to THE host in the smart inventor

For Docs

We should call out that smart inventory will not include duplicate hosts. For example, if You have two inventories, EC2 East Coast and EC2 West Coast and you have a host named production in each. A smart inventory that matches production will only include one of those production hosts. Which host is non-deterministic.

The host_filter "preview" view in the UI WILL show duplicates. This gives the user a chance to refine their query so that duplicates are not included. If the user goes through with saving a filter that includes duplicates hosts, the smart inventory will NOT include multiple hosts with the same name. It will include one of the hosts with that name only.

ryanpetrello · 2019-02-12T22:21:57Z

Closing this, the actual issue is #3205

ryanpetrello added flag:🎱 type:bug component:api labels Jan 30, 2019

awxbot added the state:needs_triage label Jan 30, 2019

chrismeyersfsu self-assigned this Jan 30, 2019

chrismeyersfsu added state:in_progress and removed state:needs_devel labels Jan 30, 2019

chrismeyersfsu mentioned this issue Feb 1, 2019

simplify smart inventory query #3133

Closed

chrismeyersfsu added state:needs_test and removed state:in_progress labels Feb 6, 2019

chrismeyersfsu mentioned this issue Feb 8, 2019

do not observe queries when constructing them #3191

Merged

ryanpetrello closed this as completed Feb 12, 2019

ryanpetrello reopened this Feb 12, 2019

ryanpetrello closed this as completed Feb 12, 2019

AlanCoding mentioned this issue Feb 19, 2020

fix smart inventory duplicate hosts #5972

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Job Event processing slow when smart inventory is used #3106

Job Event processing slow when smart inventory is used #3106

chrismeyersfsu commented Jan 30, 2019 •

edited

Loading

chrismeyersfsu commented Feb 1, 2019

ryanpetrello commented Feb 12, 2019

Job Event processing slow when smart inventory is used #3106

Job Event processing slow when smart inventory is used #3106

Comments

chrismeyersfsu commented Jan 30, 2019 • edited Loading

ISSUE TYPE

COMPONENT NAME

SUMMARY

ENVIRONMENT

STEPS TO REPRODUCE

EXPECTED RESULTS

ACTUAL RESULTS

ADDITIONAL INFORMATION

chrismeyersfsu commented Feb 1, 2019

Non-Smart Inventory

Smart Inventory

Why is the smart inventory query so complex?

New Smart Inventory

Ending Thoughts

For Testing Correctness

For Docs

ryanpetrello commented Feb 12, 2019

chrismeyersfsu commented Jan 30, 2019 •

edited

Loading