Support for AWS Step functions #2

Open
romain-intel opened this issue Dec 2, 2019 · 10 comments

@romain-intel romain-intel commented Dec 2, 2019

Metaflow on AWS currently requires a human in the loop to execute flows, and flows cannot be scheduled automatically. Metaflow could be made to work with AWS Step Functions so that the orchestration of Metaflow steps is done by AWS.
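To make the proposal concrete, here is a minimal sketch of how a linear sequence of steps might map onto an Amazon States Language definition. It is not Metaflow's implementation. It assumes each step runs as an AWS Batch job (via the `batch:submitJob.sync` service integration), and the function name `linear_state_machine`, the queue name, and the job definition name are all hypothetical.

```python
import json


def linear_state_machine(step_names, job_queue, job_definition):
    """Build an Amazon States Language definition that runs steps in order.

    Assumption: every step is submitted as an AWS Batch job, and the state
    machine waits for each job to finish (the `.sync` integration pattern).
    """
    if not step_names:
        raise ValueError("at least one step is required")

    states = {}
    for i, name in enumerate(step_names):
        state = {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": name,
                "JobQueue": job_queue,            # hypothetical queue name
                "JobDefinition": job_definition,  # hypothetical job definition
            },
        }
        # Chain each state to the next one; the last state ends the execution.
        if i + 1 < len(step_names):
            state["Next"] = step_names[i + 1]
        else:
            state["End"] = True
        states[name] = state

    return {"StartAt": step_names[0], "States": states}


if __name__ == "__main__":
    definition = linear_state_machine(
        ["start", "train", "end"], "my-queue", "my-job-def"
    )
    print(json.dumps(definition, indent=2))
```

Branching flows would need `Parallel` or `Map` states rather than a plain chain; the sketch covers only the linear case.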

@savingoyal savingoyal self-assigned this Dec 2, 2019

@gonzalodiaz gonzalodiaz commented Dec 3, 2019

I just discovered Metaflow and I'm thrilled to give it a try at my company.
Currently we are using Airflow on Kubernetes to schedule workflows. Have you looked into the possibility of scheduling Metaflow flows on Airflow? And would it be possible to use K8s as the infrastructure to run the steps? Thanks!

@savingoyal savingoyal commented Dec 3, 2019

Hi @gonzalodiaz
Thanks for giving Metaflow a try. We follow a plugin-based architecture, so it is indeed possible to schedule flows on Airflow and use K8s as the compute substrate. This is something we would like to offer in the near future. We welcome feature requests, so please open one.

@thundergolfer thundergolfer commented Dec 4, 2019

Is your team familiar with https://github.com/argoproj/argo? In theory you could compile your Flows down into Argo's workflow spec format (JSON/YAML) and then Argo could take care of execution.
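As a rough illustration of that idea, the sketch below turns a step dependency graph into an Argo `Workflow` manifest with a DAG template. It is only a sketch under stated assumptions: the function `to_argo_workflow`, the container image, and the per-step command are hypothetical placeholders, and nothing here reflects how Metaflow actually launches a step.

```python
import json


def to_argo_workflow(flow_name, dag, image, command_for):
    """Compile a step graph into an Argo Workflow manifest (as a dict).

    dag: dict mapping each step name to the list of steps it depends on.
    command_for: callable returning the container command for a step
                 (hypothetical; the real per-step command is not shown here).
    """
    for step, parents in dag.items():
        for parent in parents:
            if parent not in dag:
                raise ValueError(f"{step!r} depends on unknown step {parent!r}")

    tasks, templates = [], []
    for step, parents in dag.items():
        task = {"name": step, "template": step}
        if parents:
            task["dependencies"] = list(parents)
        tasks.append(task)
        # One container template per step; the DAG template references it by name.
        templates.append({
            "name": step,
            "container": {"image": image, "command": command_for(step)},
        })

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": flow_name + "-"},
        "spec": {
            "entrypoint": "flow-dag",
            "templates": [{"name": "flow-dag", "dag": {"tasks": tasks}}] + templates,
        },
    }


if __name__ == "__main__":
    # A small branching flow: start fans out to a and b, which join before end.
    graph = {"start": [], "a": ["start"], "b": ["start"], "join": ["a", "b"], "end": ["join"]}
    manifest = to_argo_workflow(
        "myflow", graph, "python:3.8", lambda step: ["echo", step]
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be submitted to Argo as-is, since Argo accepts JSON as well as YAML manifests.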

@savingoyal savingoyal commented Dec 4, 2019

Thanks for the link. Yes, I am familiar with Argo but haven’t looked at it in depth.

@impredicative impredicative commented Dec 26, 2019

> Metaflow on AWS currently requires a human-in-the-loop to execute and cannot automatically be scheduled. Metaflow could be made to work with AWS Step functions to allow the orchestration of Metaflow steps to be done by AWS.

Given that Metaflow evidently lacks a scheduler, either Step Functions or, better yet, an open-source component of Metaflow itself could probably fill the gap. Without a scheduler, it does seem to be an incomplete solution.

@codypenta codypenta commented Jan 13, 2020

For step function integration, is it possible to incorporate https://github.com/aws/aws-step-functions-data-science-sdk-python?

@impredicative impredicative commented Jan 13, 2020

> For step function integration, is it possible to incorporate https://github.com/aws/aws-step-functions-data-science-sdk-python?

As an observer, I don't see any need for AWS Step Functions integration since Metaflow should be able to manage workflow steps directly. Why pay extra for Step Functions?

@hgahlot hgahlot commented Jan 13, 2020

AWS Step Functions executions need to be scheduled through CloudWatch, because Step Functions has no built-in scheduler. However, CloudWatch integrates directly with Step Functions. It might be better to look into how CloudWatch + Lambda could be leveraged to act as a scheduler for Metaflow, separate from Step Functions.

> Metaflow could be made to work with AWS Step functions to allow the orchestration of Metaflow steps to be done by AWS.

Metaflow is an orchestrator itself, so I think the only missing piece is the scheduling aspect. Using Step Functions as an orchestrator just because we need something to schedule Metaflow workflows is overkill, IMO.
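The CloudWatch-based scheduling described in this comment could look roughly like the sketch below, which builds the arguments for a scheduled CloudWatch Events rule and its target. The helper names, rule name, ARN, and payload are hypothetical. The pure functions only build dictionaries; `apply_schedule` passes them to boto3's `put_rule` and `put_targets`, which requires boto3 and AWS credentials. Whether the target is a Lambda function or a state machine is left open. A state machine target needs a role that CloudWatch Events can assume, which is why `role_arn` is optional.

```python
import json


def schedule_rule_kwargs(rule_name, schedule_expression):
    """Arguments for CloudWatch Events `put_rule`.

    schedule_expression is e.g. "rate(1 day)" or "cron(0 12 * * ? *)".
    """
    return {
        "Name": rule_name,
        "ScheduleExpression": schedule_expression,
        "State": "ENABLED",
    }


def schedule_target_kwargs(rule_name, target_arn, payload, role_arn=None):
    """Arguments for CloudWatch Events `put_targets`.

    payload is sent to the target as a constant JSON input on every trigger.
    """
    target = {
        "Id": rule_name + "-target",
        "Arn": target_arn,
        "Input": json.dumps(payload),
    }
    if role_arn is not None:
        target["RoleArn"] = role_arn
    return {"Rule": rule_name, "Targets": [target]}


def apply_schedule(rule_kwargs, target_kwargs):
    """Create or update the rule and its target in AWS (needs boto3 + credentials)."""
    import boto3  # imported lazily so the builders above work without it

    events = boto3.client("events")
    events.put_rule(**rule_kwargs)
    events.put_targets(**target_kwargs)


if __name__ == "__main__":
    rule = schedule_rule_kwargs("nightly-flow", "rate(1 day)")
    target = schedule_target_kwargs(
        "nightly-flow",
        # Hypothetical Lambda function that would kick off the flow.
        "arn:aws:lambda:us-east-1:123456789012:function:run-flow",
        {"flow": "MyFlow"},
    )
    print(json.dumps({"rule": rule, "target": target}, indent=2))
```

Keeping the builders separate from the AWS call makes the schedule definition easy to inspect or test before anything is created in an account.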

@impredicative impredicative commented Jan 13, 2020

Metaflow could then, in principle, manage those CloudWatch Events and Lambdas too, using a single combined job+schedule definition. This would be the simplest scheduler integration, assuming a scheduler cannot be built into Metaflow directly. I would still prefer integrating an open-source scheduler into Metaflow, though, to avoid relying on CloudWatch Events and Lambdas.

@steveash steveash commented Jan 15, 2020

Also, maybe check out Glue Workflows, which are a little more DAG-like than the Step Functions model: https://docs.aws.amazon.com/glue/latest/dg/workflows_overview.html
