Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permitting airflow kerberos to run in different modes #35146

Merged
merged 12 commits into from Oct 25, 2023

Conversation

amoghrajesh
Copy link
Contributor

@amoghrajesh amoghrajesh commented Oct 24, 2023

I was trying to run airflow kerberos as an init container for airflow worker pods (I am using K8SExecutor with K8S deployment)
Realised that there is no mode that can be defined for the airflow kerberos to run. This leads to issue where the kerberos container runs forever due to an while loop in the code!

Adding support for this by using a "mode" field - daemon vs one-time. In case it is not configured, the default one is taken as sidecar, hence older behaviour isn't changed.

Screenshots
Command:
image

Running in default mode:
image

Overriding run with --one-time mode
image


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@amoghrajesh amoghrajesh added security Security issues that must be fixed and removed security Security issues that must be fixed labels Oct 24, 2023
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two problems with this implementation:

  1. I would not call it sidecar/init. Those are k8s terms, and last time we checked as I remember, ~50% of airlfow installation managed individually by our users (not managed instances) do NOT use k8s. So we should rather be deployment-agnostic here.

How about "one-time", "daemon" (just a proposal possibly better names can be figured out.

  1. I think it is bad idea to use configuration. I can very easily imagine that kerberos sidecar in single deployment (even in k8s) is run in both modes:
  • as init container to refresh it before the main component starts
  • continously as sidecar in order to periodically refresh the token

In this case having a configuration makes little sense. It shoudl be a flag of the airflow kerberos CLI, so that in your deployment you will be able to choose how it is run without changing airflow configuration (airflow kerberos --mode one-time for example).

@amoghrajesh
Copy link
Contributor Author

Thanks for your feedback @potiuk
I like the idea of using "daemon" vs "one-time" better. I will start working on this.

@amoghrajesh
Copy link
Contributor Author

Oh, the reason why I liked this idea is because there is no terminology for "non daemons". They are just called as "normal" processes. One time sounds rather more interesting.

@amoghrajesh
Copy link
Contributor Author

@potiuk took another pass at this, can you take a look when you have some time?

@potiuk
Copy link
Member

potiuk commented Oct 24, 2023

Also one more request here / comment. Since our helm chart already supports kerberos as an option, I think we should also actually modify the Helm chart of ours to add kerberos init container in case kerberos is enabled - that would be not only good (the token would be refreshed when airflow starts) but also serve as an example to others who want to use it.

This would be perfect if it comes together, however we have to be careful to add this init container only if airflow >= 2.8.0 - this is a new feature that will come there (we do not release new features in patchlevel versions) and such init container will only work when airflow is >= 2.8.0. We have a few similar conditionals in our helm chart already - doing similar things.

@eladkal
Copy link
Contributor

eladkal commented Oct 24, 2023

Just noting that a nice workaround for this issue (till fixed) can be found in
https://medium.com/apache-airflow/enable-kerberos-with-airflow-kubernetesexecutor-6e86621e97a5

The idea is that base container will signal to the side car when to terminate via shared volume

@potiuk
Copy link
Member

potiuk commented Oct 24, 2023

Just noting that a nice workaround for this issue (till fixed) can be found in https://medium.com/apache-airflow/enable-kerberos-with-airflow-kubernetesexecutor-6e86621e97a5

The idea is that base container will signal to the side car when to terminate via shared volume

Very nasty wokaround.

@eladkal
Copy link
Contributor

eladkal commented Oct 24, 2023

I think we should also actually modify the Helm chart of ours to add kerberos init container in case kerberos is enabled - that would be not only good (the token would be refreshed when airflow starts) but also serve as an example to others who want to use it.

I dont know about this.
My use case for example requires side car. Init container wont be good enough. I use Hive and tasks can run for even 20 hours. Running kerberos one time in init container is not enough - it wont support refresh.
When deployed on Celery this wasnt an issue because we could just run it on workers that were always live.

@eladkal
Copy link
Contributor

eladkal commented Oct 24, 2023

Very nasty wokaround

Yet unavoidable till this PR is released :)
At least for long running tasks on K8s executor

@potiuk
Copy link
Member

potiuk commented Oct 24, 2023

My use case for example requires side car. Init container wont be good enough. I use Hive and tasks can run for even 20 hours. Running kerberos one time in init container is not enough - it wont support refresh.

Our chart uses sidecar already. My comment was to ALSO use init container to make sure the token is refreshed before the main container starts..

@eladkal
Copy link
Contributor

eladkal commented Oct 24, 2023

Our chart uses sidecar already. My comment was to ALSO use init container to make sure the token is refreshed before the main container starts..

That makes sense yes.

So what we need:

  1. Enable init container option
  2. When using side car solve the pod termination (base container is terminated but the side car is not)

@potiuk
Copy link
Member

potiuk commented Oct 24, 2023

When using side car solve the pod termination (base container is terminated but the side car is not

I think solving sidecar termination - if we will do it at all - is completely different problem (not related to this change).

@eladkal
Copy link
Contributor

eladkal commented Oct 24, 2023

I think solving sidecar termination - if we will do it at all - is completely different problem (not related to this change).

Yes separated issue

@potiuk
Copy link
Member

potiuk commented Oct 24, 2023

I think solving sidecar termination - if we will do it at all - is completely different problem (not related to this change).

Yes separated issue

Actually the issue is finaly easily solvable for k8s 1.28+ (and we already support 1.28) - the KEP-753 implementation has been released in k8s 1.28 and we can finally solve sidecar problems in generic way (implementing the intra-process /via volume communication as a "generic" solution in our chart was discarded as an option a long time ago as non-practical.

Created a "good first issue" for that for someone who would like to take a stab on it #35154

cc: @jedcunningham

@amoghrajesh
Copy link
Contributor Author

Also one more request here / comment. Since our helm chart already supports kerberos as an option, I think we should also actually modify the Helm chart of ours to add kerberos init container in case kerberos is enabled - that would be not only good (the token would be refreshed when airflow starts) but also serve as an example to others who want to use it.

This would be perfect if it comes together, however we have to be careful to add this init container only if airflow >= 2.8.0 - this is a new feature that will come there (we do not release new features in patchlevel versions) and such init container will only work when airflow is >= 2.8.0. We have a few similar conditionals in our helm chart already - doing similar things.

Completely agree with you. We should add this framework that naturally works out of box for such use cases. I will start working on this here

@amoghrajesh
Copy link
Contributor Author

I think we should also actually modify the Helm chart of ours to add kerberos init container in case kerberos is enabled - that would be not only good (the token would be refreshed when airflow starts) but also serve as an example to others who want to use it.

I dont know about this. My use case for example requires side car. Init container wont be good enough. I use Hive and tasks can run for even 20 hours. Running kerberos one time in init container is not enough - it wont support refresh. When deployed on Celery this wasnt an issue because we could just run it on workers that were always live.

We should be doing an init container + sidecar model. Init container ensures early exit in case kinit fails

@amoghrajesh
Copy link
Contributor Author

When using side car solve the pod termination (base container is terminated but the side car is not

I think solving sidecar termination - if we will do it at all - is completely different problem (not related to this change).

I was able to internally solve this in my company using shareProcessNamespace: https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/

Maybe we can take this approach, pretty neat

@amoghrajesh
Copy link
Contributor Author

@potiuk I am open to assigning #35154 to others but would like to lead this enter effort to completion. But, rest is up to you. :)

@amoghrajesh
Copy link
Contributor Author

(BTW. I think you should rebase as you have conflicts to solve :)

Just saw the merge conflict. Sure, will rebase now

@amoghrajesh
Copy link
Contributor Author

Oh the main branch seems to have diverged quite a bit. I will rebase it

@potiuk
Copy link
Member

potiuk commented Oct 24, 2023

That's why best to start with main always.

airflow/security/kerberos.py Outdated Show resolved Hide resolved
airflow/cli/cli_config.py Outdated Show resolved Hide resolved
@potiuk
Copy link
Member

potiuk commented Oct 24, 2023

What if we didn't have "mode", but a "one-time" flag instead. I doubt we will have another mode down the road and sidesteps the "normal/daemon/standard" naming problem.

❤️ it

@amoghrajesh
Copy link
Contributor Author

What if we didn't have "mode", but a "one-time" flag instead. I doubt we will have another mode down the road and sidesteps the "normal/daemon/standard" naming problem.

Great idea!

@amoghrajesh
Copy link
Contributor Author

I did rebase it. Now working on review comments!

@amoghrajesh
Copy link
Contributor Author

That's why best to start with main always.

When I started unfortunately the code was up to date. In a few hours time, the change was introduced!

@eladkal eladkal added this to the Airflow 2.8.0 milestone Oct 25, 2023
@eladkal eladkal added the type:improvement Changelog: Improvements label Oct 25, 2023
@amoghrajesh
Copy link
Contributor Author

@potiuk @eladkal @jedcunningham I updated the code and also the usage screenshots in the PR description. Can you take a look when you have soem time?

@potiuk
Copy link
Member

potiuk commented Oct 25, 2023

looks great, but docs and tests need fixing.

@amoghrajesh
Copy link
Contributor Author

Thank you @potiuk
Just pushed a fix for this

@amoghrajesh
Copy link
Contributor Author

@potiuk @eladkal @jedcunningham I'd like to run the PR again by all of you. CI is green now

Take a look when you have some time

@potiuk potiuk merged commit 36c5c11 into apache:main Oct 25, 2023
44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants