Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-1310] Basic operator to run docker container on Kubernetes #2456

Closed

Conversation

mdziemianko
Copy link
Contributor

Dear Airflow maintainers,

The PR contains implementation of the hook and operator that allow creation,
deployment and tracking of docker containers to Kubernetes pods.
The hook supports crating the specification, deploying,
and deleting the pod, while the operator allows for fire-forget
or wait for completion execution modes.
The feature was requested here https://issues.apache.org/jira/browse/AIRFLOW-1310

Suitable test cases provided.
The operator is being used for last few weeks without any major issues in our internal deployments.

@codecov-io
Copy link

codecov-io commented Jul 18, 2017

Codecov Report

Merging #2456 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2456   +/-   ##
=======================================
  Coverage   69.99%   69.99%           
=======================================
  Files         146      146           
  Lines       11454    11454           
=======================================
  Hits         8017     8017           
  Misses       3437     3437

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0bc248f...e65f7ee. Read the comment docs.

@mdziemianko mdziemianko force-pushed the AIRFLOW-1310-kubernetes branch 3 times, most recently from 197cca8 to 42bd04a Compare July 19, 2017 09:54
@bolkedebruin
Copy link
Contributor

What's the relationship with #2414 ?

@mdziemianko
Copy link
Contributor Author

As far as I understand #2414 is a larger effort to provide native K8s executor. This is a simple operator that allows spawning any docker based container on a (remote) K8s cluster. Does not require k8s executor so you might stay on your local/celery for general operation and just deploy selected containers to k8s. I imagine when the #2414 is ready this becomes less useful for people who use k8s executor as they will be able to run docker containers directly. I suppose both PRs might be integrated/or this replaced when 2414 is ready


sleep(self.poke_interval)
finally:
if (self.wait and self.should_do_cleanup(status)):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status may still be None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop should not exit in a normal way if the status is None. The only way it can happen is an uncaught exception - in this case the whole DAG blows up anyways, I was thinking what to do in such case - simplest way is to set status to Failed, but thats a lazy solution. A good suggestion welcome!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more that the pod will never be cleaned up if there was an exception of some sorts. Which will require manual intervention if pod names are not forced to be unique.

try:
status = None
while not (status in exit_statuses):
status = hook.get_pod_state(pod)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pod logs should be captured.

poke_interval=3,
*args, **kwargs):
super(KubernetesPodOperator, self).__init__(*args, **kwargs)
self.image = image

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably more flexible to allow passing in a pod manifest, so you can template more complex pods in the DAG file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning to add it in next step - currently supporting the basic case where you want to run simple container in a simple pod

*args, **kwargs):
super(KubernetesPodOperator, self).__init__(*args, **kwargs)
self.image = image
self.name = name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pod names must be unique within the namespace. If a pod failed and wasn't cleaned up, this might block execution. Not sure if it should be the executors task of guarannteeing uniqueness or if the user should take care of this in the DAG.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you say - hard to tell whose responsibility it is. I guess the best solution is to provide 2 params "name" and "name_prefix" or something like that - one is used directly, while the other is used with a random string appended - this way the dev has a choice what he want to do

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a unique_name flag which defaults to True is a more clear solution. If it's true, name may then even be optional, you can just construct it from available DAG data.

@mdziemianko mdziemianko force-pushed the AIRFLOW-1310-kubernetes branch 4 times, most recently from 8efe55e to 8810119 Compare August 6, 2017 14:02
The commit contains implementation of the hook and operator
that allow creation, deployment and tracking of docker containers
to Kubernetes pods.
The hook supports crating the specification, deploying,
and deleting the pod, while the operator allows for fire-forget
or wait for completion execution modes.
@stale
Copy link

stale bot commented Dec 10, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 10, 2018
@stale stale bot closed this Dec 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants