Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure pools via Helm chart #29401

Closed

Conversation

csp33
Copy link
Contributor

@csp33 csp33 commented Feb 7, 2023

closes #11707

Credits to @FloChehab (#15093)

image

image

image

With no pools defined
image


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added the area:helm-chart Airflow Helm Chart label Feb 7, 2023
@csp33 csp33 force-pushed the feature/configure-pools-via-helm-chart branch 4 times, most recently from 9a02977 to b1c40a5 Compare February 7, 2023 11:42
@csp33 csp33 marked this pull request as ready for review February 7, 2023 11:43
@eladkal eladkal added this to the Airflow Helm Chart 1.9.0 milestone Feb 7, 2023
@csp33 csp33 force-pushed the feature/configure-pools-via-helm-chart branch 4 times, most recently from 9ac5630 to 2232eea Compare February 7, 2023 12:52
@csp33 csp33 force-pushed the feature/configure-pools-via-helm-chart branch 8 times, most recently from 9416332 to a37dd08 Compare February 7, 2023 22:12
@csp33 csp33 force-pushed the feature/configure-pools-via-helm-chart branch from a37dd08 to 543cf3b Compare February 8, 2023 07:13
@csp33
Copy link
Contributor Author

csp33 commented Feb 8, 2023

@potiuk @jedcunningham test are (finally) passing (the failing ones are due to timeouts). Could you please review this PR?
Thanks!

Copy link
Member

@jedcunningham jedcunningham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really a fan of this approach, unfortunately. This has "source of truth" problems in my eyes, as one can still modify these values in the UI. At that point, is the chart right to overwrite it on the next update, or is the UI value the right one? Also, removing a pool here doesn't actually remove it from the db.

There isn't really a "better" option, short of doing some Airflow side enhancements of pools. I'd almost rather waiting for that and only supporting it on newer Airflows instead of supporting something with a number of nuances to it?

I'm curious what other maintainers think.

#############################################
## Airflow Import Pools Job ServiceAccount
##############################################
{{- if .Values.importPoolsJob.serviceAccount.create }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason do deploy anything when pools aren't defined.

@@ -757,6 +757,80 @@ scheduler:

env: []

# Pools that will be added to Airflow (Keys are pools names and values are pools settings)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be very clear here that this doesn't actually manage the pools, just does a one off, at time of deploy, import.

@kaxil
Copy link
Member

kaxil commented Feb 9, 2023

I'm not really a fan of this approach, unfortunately. This has "source of truth" problems in my eyes, as one can still modify these values in the UI. At that point, is the chart right to overwrite it on the next update, or is the UI value the right one? Also, removing a pool here doesn't actually remove it from the db.

There isn't really a "better" option, short of doing some Airflow side enhancements of pools. I'd almost rather waiting for that and only supporting it on newer Airflows instead of supporting something with a number of nuances to it?

I'm curious what other maintainers think.

+1 -- Agreed with @jedcunningham . That is one of the things we said the user-community chart did that we would never do when we start building the official helm chart for Airflow (or so I remember :) )

@dstandish
Copy link
Contributor

What if we just make pools optionally configurable with env vars? Sort of like var or conn now. Might screw up some queries but, maybe could be handled. Then you could in effect accomplish same from chart

@eladkal
Copy link
Contributor

eladkal commented Feb 9, 2023

What if we just make pools optionally configurable with env vars? Sort of like var or conn now. Might screw up some queries but, maybe could be handled. Then you could in effect accomplish same from chart

We had discussion on that. See #18582

@potiuk
Copy link
Member

potiuk commented Feb 9, 2023

+1 -- Agreed with @jedcunningham . That is one of the things we said the user-community chart did that we would never do when we start building the official helm chart for Airflow (or so I remember :) )

+1 too. Managing pools this way is a bad idea for the reasons @jedcunningham nicely laid out.

We had discussion on that. See #18582

As @eladkal mentioned, this would have to redefine the way our SQL queries are done in scheduler. There is a certain complexity of Pools in multi-scheduler context that makes it a bit complex (but not impossible I think). I once assesed this might be hard. But I looked closely how it works and I think possibly we could attempt to change how Pools table is used. I think either code was different when I looked at it in #18582, or maybe I have not thought about this idea before, but let me explain the context:

  • Pool table is currenlty locked during scheduling as critical section in _executable_task_instances_to_queued:
        # Get the pool settings. We get a lock on the pool rows, treating this as a "critical section"
        # Throws an exception if lock cannot be obtained, rather than blocking
        pools = Pool.slots_stats(lock_rows=True, session=session)
  • And the slot_stats method uses pool to lock the pool table
        query = session.query(Pool.pool, Pool.slots)

What happen next in the method then it builds the "PoolStats" dict and marks tasks as queued if they still have "free slots" (but it uses the in-memory Dict for that). Only after tasks are marked as queued then it releases the lock on Pools.

This is - in order to accomodate multiple schedulers and the critical section is to make sure schedulers will not mark dag runs as executable when Pools would be over-subscribed.

So while indeed Pools table is locked and used in the query, fundamentally I think we could take the "total" values from ENV variables rather than from Pools table. This is somewhat coincidental that we use "totals" from the Pools table - it's convenient and the code already has Pool objects returned by the query to build PoolStats, but not really necessary.
The main usage of the Pool table is that whole table is locked while the critical section happens. Locking Pool table also has nice side-effect that when we use UI or CLI to modify pools, we won't change the pool size while scheduler keeps the table locked in critical section - so pool size update will never happen while scheduler is in it. But with env variables it is impossible to change it without restarting airflow, so it is not really appealing argument.

We even could have a separate lock for this ciricial section in _executable_task_instances_to_queued - it does not have to be Pools table lock at all. It's just convenience that made it being used as we can both retrieve pool total counts and obtain the lock. We are not changing the Pools totals in the criticall section. We merely obtain the lock, read the total slots, see how many tasks we have "running" in every pool and we set the number of tasks to queued so that totals are not exceeded and we release the lock.

There is no referential identity between Pools and any other table it seems so I think we would just have to modify slots_stats function to retrieve totals from elsewhere (env vars) and things would work. The pool column in TaskInstance is just string (part of an index but not part of referential integrity with Pool table).

    pool = Column(String(256), nullable=False)

I still do not like the "multiple sources of truth" in this case. But there is potentially an option that we could disable the pool counts in the UI entirely and replace them with pool counts. managed in env vars only.

Also I think changing the approach in this area is something that we want anyway so we could kill two birds with the same stone - see #29416 - our pool count currently does not currently take into account deferred tasks. but there are cases where it would be desired.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 27, 2023
@github-actions github-actions bot closed this Apr 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:helm-chart Airflow Helm Chart stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Manage pools in Helm Chart
7 participants