Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manually run subquery for parent event updates #14044

Merged
merged 1 commit into from
Jun 1, 2023

Conversation

AlanCoding
Copy link
Member

SUMMARY

This reduces the maximum time that may be spent in 2 bulk .update queries.

What about the size of set(changed)? Could that be large? It can scale with the number of tasks run by a playbook, but from experimentation it is very clear that it will not scale with the number of hosts. So imagine that you run 5 tasks (all changed) against 50,000 hosts. This will produce potentially 2,500,000 events with changed set to True. However, all those changed events will only have 5 unique parent_uuids.

Some people will run playbooks that produce a lot of changed tasks against 1 or 2 hosts, this is true (assume all tasks are changed). But "a lot" here only scales with the volume of automation that people write. 1,000 tasks wouldn't surprise me, and 10,000 might be possible, but I think this code will still work. Beyond that, you stretch credibility. In fact, we already do operations on this scale (and greater) with the recently-added /api/v2/jobs/132/job_events/children_summary/, justified by these same arguments.

This also adds a DEBUG log because this is only done 1 time for each job and this is frequently of interest.

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

@roberto-mello
Copy link

I don't know how to convert this to Django's ORM syntax, but it would be more efficient if the IN clause wasn't used to filter on a large number of ids.

I haven't tested this but it might be much better to have a CTE (Common Table Expression) that the UPDATE joins against, avoiding the need for an IN clause altogether. Would be interesting to see how the execution plan of this compared to the ones currently generated, particularly against large data sets:

WITH no_parent_uuids AS (
    SELECT DISTINCT U0."parent_uuid"
    FROM "main_jobevent" U0
    WHERE
        U0."job_created" = '2023-05-23T06:00:36.428083+00:00'::timestamptz
	AND U0."job_id" = 8306063
	AND U0."changed"
	AND U0."parent_uuid" IS NOT NULL
)
UPDATE "main_jobevent"
SET "changed" = true
FROM no_parent_uuids nu
WHERE 
  "main_jobevent"."uuid" = nu.parent_uuid
  AND "main_jobevent"."job_created" = '2023-05-23T06:00:36.428083+00:00'::timestamptz 
  AND "main_jobevent"."job_id" = 8306063

@TheRealHaoLiu TheRealHaoLiu requested a review from relrod May 25, 2023 18:22
@relrod
Copy link
Member

relrod commented May 25, 2023

@roberto-mello thanks for that, it looks like a cool solution. It doesn't look like Django has built-in support for generating CTE queries, though I see some third-party libraries that add that functionality.

I'm going to approve this PR as a stop-gap and we can explore the CTE solution as a more permanent solution.

@roberto-mello
Copy link

It looks like it's accomplished with django.db.models.SubQuery and django.db.models.F

@relrod
Copy link
Member

relrod commented May 25, 2023

It looks like it's accomplished with django.db.models.SubQuery and django.db.models.F

Hmm, I wasn't able to find a way to make it work (i.e. find anything to generate the WITH syntax), but I could very well be missing something - I'm definitely not a Django ORM expert.

I did do a quick grep of the source and didn't see anything in its SQL generator that looks like it would generate those kinds of queries.

I found a ticket upstream which seems stalled, and a mailing list thread from a few years ago where people were discussing adding CTE support to the ORM core, but I think it hasn't happened yet.

We could probably do something with .raw() but it would be really fragile if the models ever changed. 😕

@AlanCoding AlanCoding merged commit 2c320cb into ansible:devel Jun 1, 2023
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants