AWS Batch backend #423

Open
wlandau opened this issue Oct 11, 2020 · 6 comments

Comments

@wlandau

wlandau commented Oct 11, 2020

I propose AWS Batch as a new clustermq scheduler. Batch has become extremely popular, especially as traditional HPC is waning. I have a strong personal interest in making Batch integrate nicely with R (ref: ropensci/targets#152, ropensci/tarchetypes#8, https://wlandau.github.io/targets-manual/cloud.html).

Batch is super easy to set up through the AWS web console, and I think it would fit nicely into future's ecosystem: maybe with something like future::plan(future.aws.batch::future_aws_batch, template = "batch.tmpl"), where batch.tmpl contains an AWS API call with the compute environment, job queue, job definition, and key pair. I think we could use curl directly instead of the much larger and rapidly developing paws package. The tricky part is how we retrieve the data back from an AWS Batch job. I'm not sure how to do that yet.
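To make the template idea concrete, here is a minimal sketch of the request body such a template might render for the AWS Batch SubmitJob API. It only builds and prints JSON; it does not sign or send anything. The field names (`jobName`, `jobQueue`, `jobDefinition`, `containerOverrides`) come from the SubmitJob API, but the helper function and every value below are hypothetical placeholders, not part of any existing package.

```python
import json


def build_submit_job_request(job_name, job_queue, job_definition, command):
    """Assemble a JSON-serializable body for an AWS Batch SubmitJob request.

    Hypothetical helper for illustration only: it builds the payload
    and leaves request signing and the HTTP call to the caller.
    """
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Override the container command so the worker runs our R code.
        "containerOverrides": {"command": list(command)},
    }


# All names below are made-up placeholders for resources that would
# already exist in the AWS account.
request = build_submit_job_request(
    job_name="future-demo",
    job_queue="my-job-queue",
    job_definition="my-r-job-definition",
    command=["Rscript", "-e", "cat(1 + 1)"],
)

print(json.dumps(request, indent=2))
```

Note that the compute environment and key pair do not appear in this payload: as far as I know, they are configured on the compute environment that the job queue points to, so a template would probably only need the queue and the job definition at submit time.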

@wlandau changed the title from "AWS Batch" to "AWS Batch backend" on Oct 11, 2020
@HenrikBengtsson
Owner

I'm fully supportive of this - AWS Lambda and AWS Batch have been on my radar for a while. My hope was that there would be a low-level R API that could be leveraged for this. There have been different efforts on AWS Lambda, but I don't think they've taken off.

Should we have another call on this? It'll help me clarify a few things related to the future roadmap.

@wlandau
Author

wlandau commented Oct 12, 2020

Awesome! I would love to chat about this, and I can definitely make time after R/Pharma (Oct 13-15).

paws is ostensibly capable of setting up the web API calls to submit jobs to Batch (https://github.com/paws-r/paws/blob/main/examples/batch.R). However, I am not sure how to communicate with Batch workers. I could easily see that as enough motivation for a new R API.

@wlandau
Author

wlandau commented Nov 11, 2020

From mschubert/clustermq#208 (comment), it seems possible for clustermq to support an AWS backend (Batch or similar), and then future could interact with it through future.clustermq.

> Should we have another call on this? It'll help me clarify a few things related to the future roadmap.

I would be happy to arrange something on Google Meet for us and @mschubert. Does that still sound good?

@wlandau
Author

wlandau commented Jan 21, 2021

Should I open a separate issue for Lambda? I think we agreed this may be easier to start with, especially with @davidkretch's nice demo.

@HenrikBengtsson
Owner

HenrikBengtsson commented Jan 28, 2021

I've created https://github.com/HenrikBengtsson/future.lambda with the goal of implementing support for plan(future.lambda::lambda).

@wlandau
Author

wlandau commented Jan 29, 2021

Fantastic! Eager to try when it is ready. (Currently working on getting access to my company's AWS resources.)
