
Helmchart: Provision Standalone Dag Processor #23711

Merged: 15 commits merged into apache:main on Jul 6, 2022
Conversation

@gmsantos (Contributor) commented May 14, 2022

Initial support to deploy the standalone DAG processor from the Helm chart on Airflow 2.3+.

It is disabled by default for now.
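A minimal opt-in looks like the sketch below. The rendered container command and environment variable are shown only as an assumption of how the Airflow 2.3+ standalone processor is usually wired up, not a guarantee of exactly what the chart renders.

# values.yaml -- opt in to the standalone DAG processor (disabled by default)
dagProcessor:
  enabled: true

# Conceptually, the chart then adds a Deployment whose container runs the
# Airflow 2.3+ dag-processor entrypoint, roughly:
#
#   command: ["airflow", "dag-processor"]
#   env:
#     - name: AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR
#       value: "True"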



@boring-cyborg boring-cyborg bot added the area:helm-chart Airflow Helm Chart label May 14, 2022
@gmsantos gmsantos marked this pull request as ready for review May 16, 2022 18:51
@jedcunningham (Member) left a comment

We no longer need the DAGs mounted in the scheduler if this is enabled, right?

Resolved review threads (on now-outdated diffs): chart/templates/_helpers.yaml, chart/templates/dagprocessor/dag-processor-deployment.yaml, tests/charts/test_basic_helm_chart.py

@gmsantos (Contributor, Author)

We no longer need the DAGs mounted in the scheduler if this is enabled, right?

True, that makes sense, I can remove the persistent DAG volume depending on dagProcessor.enabled.
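In template terms, that would look roughly like guarding the scheduler's DAGs volume mount on the new flag (an illustrative sketch, not the chart's actual helper):

# Scheduler pod spec, illustrative only: mount DAGs in the scheduler
# only while it still parses them itself.
{{- if not .Values.dagProcessor.enabled }}
- name: dags
  mountPath: /opt/airflow/dags   # assumed default DAGs path in the official image
{{- end }}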

@gmsantos (Contributor, Author)

@jedcunningham rebased and incorporated the review suggestions.

It's working as expected in a dev deployment:

> cat values.yaml
[...]

dagProcessor:
  enabled: true

[...]

> kubectl get pods
NAME                                                    READY   STATUS      RESTARTS   AGE
airflow-cleanup-27546945-55mjl                          0/1     Completed   0          41m
airflow-cleanup-27546960-mslp7                          0/1     Completed   0          26m
airflow-cleanup-27546975-zvh75                          0/1     Completed   0          11m
airflow-dag-processor-68479f4fb-8bfnx                   1/1     Running     0          3m52s
airflow-scheduler-699b79bc7d-59wc2                      2/2     Running     0          3m52s
airflow-scheduler-699b79bc7d-j7xb4                      2/2     Running     0          53s
airflow-statsd-579db47f48-nms6x                         1/1     Running     0          29h
airflow-triggerer-5df797bbf6-zrng6                      1/1     Running     0          3m51s
airflow-webserver-78bcfc575-btwc4                       1/1     Running     0          3m25s
airflow-webserver-78bcfc575-vjrv2                       1/1     Running     0          3m50s

This deployment does not use git-sync or a persistent volume mounted on the pods, but those setups should work according to the tests.
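For anyone reproducing this with DAGs delivered via git-sync instead, a minimal values sketch could look like the following (the repository URL is a placeholder; see the chart's dags.gitSync values for the full set of options):

dags:
  gitSync:
    enabled: true
    repo: https://github.com/example/my-dags.git   # placeholder repository
    branch: main
    subPath: dags

dagProcessor:
  enabled: true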

@potiuk (Member) commented May 19, 2022

Nice! @mhenc -> maybe you want to take a look too?

@potiuk (Member) left a comment

Cautious LGTM. But let's wait for the Helm Gurus.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label May 22, 2022
@github-actions (bot)

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk (Member) commented May 22, 2022

Would love to get some feedback from our Helm people :)

@jedcunningham (Member) left a comment

This all looks good to me, but someone needs to test this when DAGs aren't available in the scheduler (gitsync or PVC), and make sure DAG level callbacks work too. I'm happy to do it once I finish my summit talk prep.

@gmsantos (Contributor, Author)

Related #23872

We might need this patch before running the standalone dag processor with the official image.

@jedcunningham (Member)

@gmsantos, sorry for the delay, just got around to testing this. I overlooked the fact that with LocalExecutor, the scheduler is the worker 🤦‍♂️. So the condition to mount DAGs needs to take that into account as well. You can probably use $local for that purpose.

@gmsantos (Contributor, Author) commented Jun 2, 2022

@jedcunningham okay, let me take a look.

@gmsantos (Contributor, Author) commented Jun 6, 2022

@jedcunningham I have a question about the LocalKubernetesExecutor: I assume that if the tasks don't use the kubernetes_pool, the scheduler works as a worker as well, doesn't it?

I'm assuming that we should include LocalKubernetesExecutor in the $local variable as well.

@jedcunningham (Member)

Yes, exactly. Sorry, that was a half-baked comment, but that's what I was envisioning: expanding $local to cover both LocalExecutor and LocalKubernetesExecutor and using it to determine whether to mount DAGs in addition to the standalone flag.
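In template terms, the idea is roughly the following sketch (illustrative, not the exact helper the chart ended up with):

# Treat the scheduler as a worker for the "local" executors and keep DAGs
# mounted there unless parsing is fully delegated to the standalone processor.
{{- $local := or (eq .Values.executor "LocalExecutor") (eq .Values.executor "LocalKubernetesExecutor") }}
{{- if or $local (not .Values.dagProcessor.enabled) }}
# ... mount the DAGs volume in the scheduler pod ...
{{- end }}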

@gmsantos (Contributor, Author) commented Jun 7, 2022

Hey @jedcunningham, it should be ready for review. I'm just trying to make all the PR checks pass ✅

@gmsantos (Contributor, Author) commented Jun 8, 2022

@jedcunningham I could use some help retrying the failed CI checks here, please :)

@gmsantos (Contributor, Author) commented Jul 5, 2022

@jedcunningham I added some combinations of DAG Processor and executors in c8c5265

I assumed that git-sync and persistence were always enabled to make the assertions easier to understand.
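For illustration, one such combination expressed as values (not copied from the test file) might be:

executor: LocalKubernetesExecutor
dagProcessor:
  enabled: true
dags:
  gitSync:
    enabled: true
  persistence:
    enabled: true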

@jedcunningham jedcunningham added this to the Airflow Helm Chart 1.7.0 milestone Jul 6, 2022
@jedcunningham jedcunningham added the type:new-feature Changelog: New Features label Jul 6, 2022
@jedcunningham jedcunningham merged commit 6642f80 into apache:main Jul 6, 2022
@jedcunningham (Member)

Thanks @gmsantos 🎉🍺

@gmsantos gmsantos deleted the helm-dag-processor branch July 6, 2022 21:06
@ephraimbuddy ephraimbuddy removed the type:new-feature Changelog: New Features label Sep 13, 2022
@jtvmatos

Hi @gmsantos,

Sorry for the mention in a merged PR, but taking into account the new "Support multiple DagProcessors parsing files from different locations" (#25935), will it be possible to use this new Helm configuration to set up two DagProcessors for different subdirs? Or should I create a new issue/PR?

Thanks!

@potiuk (Member) commented Sep 19, 2022

Different subdirs are only available as of Airflow 2.4.0, and this is not yet available in the Helm chart. But feel free to add it as a PR.

@jtvmatos

Different subdirs are only available as of Airflow 2.4.0, and this is not yet available in the Helm chart. But feel free to add it as a PR.

I'm not completely sure what the best way to implement this feature is, but at least I'll start by opening an issue -> #26494
