Containerize custom tasks #5
Hey @adlersantos! When the time comes we may also want to look at the …
Hey @leahecole. Thanks for the suggestion! We definitely need a way for pipelines to use their own GKE clusters in various contexts. I'll look into it as I add support for …
Happy to work with you on this. I started going down this road with my interns last summer, and I can look back on their notes and share them whenever you're ready.
@leahecole I'm prioritizing this as we're starting to receive onboarding requests with "heavier" workloads. Would be nice to chat with you about it and look at your interns' notes as well. See ya!
Closing this. We now support operators that can create and delete GKE clusters, plus start GKE pods in those clusters. |
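For reference, a minimal sketch of that create-run-delete pattern using the Google provider's GKE operators. The project ID, zone, cluster name, and image below are placeholder assumptions, not values from this issue:

```python
# Sketch: create a GKE cluster, run a task as a pod in it, then tear it down.
# All project/location/cluster/image values are hypothetical placeholders.
from airflow import DAG
from airflow.providers.google.cloud.operators.kubernetes_engine import (
    GKECreateClusterOperator,
    GKEDeleteClusterOperator,
    GKEStartPodOperator,
)
from airflow.utils.dates import days_ago

with DAG(
    "containerized_tasks_example",
    start_date=days_ago(1),
    schedule_interval=None,
) as dag:
    create_cluster = GKECreateClusterOperator(
        task_id="create_cluster",
        project_id="my-project",           # placeholder
        location="us-central1-a",          # placeholder
        body={"name": "pipeline-cluster", "initial_node_count": 1},
    )
    transform = GKEStartPodOperator(
        task_id="run_transform",
        project_id="my-project",
        location="us-central1-a",
        cluster_name="pipeline-cluster",
        name="csv-transform",
        namespace="default",
        image="gcr.io/my-project/csv-transform:latest",  # placeholder image
    )
    delete_cluster = GKEDeleteClusterOperator(
        task_id="delete_cluster",
        project_id="my-project",
        location="us-central1-a",
        name="pipeline-cluster",
        trigger_rule="all_done",  # tear down even if the pod task fails
    )
    create_cluster >> transform >> delete_cluster
```

Wrapping the delete step with `trigger_rule="all_done"` keeps a failed pod from leaking an orphaned cluster.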
Note: The following is taken from @tswast's recommendation on a separate thread.
What are you trying to accomplish?
One of the Airflow "gotchas" is that workers share resources with the scheduler, so any "real work" that uses CPU and/or memory can cause slowdowns in the scheduler or even instability if memory is used up.
The recommendation is to do any "real work" in one of:
What challenges are you running into?
In the generated DAG, I see the following operator:
I haven't looked closely at the csv_transform.py script yet, but I'd expect it to use non-trivial CPU/memory resources. For custom Python scripts such as this, I'd expect us to use the KubernetesPodOperator, where the work is scheduled on a separate node pool.
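A rough sketch of what that could look like with the KubernetesPodOperator, assuming the pipeline's cluster already exists. The image, namespace, script path, and node pool label are hypothetical placeholders:

```python
# Sketch: offload a custom Python script to its own pod so it can't starve
# the Airflow scheduler. Image, script path, and pool name are hypothetical.
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

transform = KubernetesPodOperator(
    task_id="csv_transform",
    name="csv-transform",
    namespace="default",
    image="python:3.9-slim",                       # placeholder image
    cmds=["python", "/scripts/csv_transform.py"],  # placeholder path to the script
    # Pin the pod to a dedicated node pool, away from the scheduler's nodes.
    node_selector={"cloud.google.com/gke-nodepool": "heavy-workloads"},
    is_delete_operator_pod=True,  # clean up the pod when the task finishes
)
```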
Checklist