AWS Fargate GPU Support: When is GPU support coming to Fargate? #88
Comments
Hi there, can you give us more details about your use case? Instance type, CUDA version, and more about what you're trying to do (workload, etc.)? Thanks.
We would like to run object detection on Fargate. Setup: Does Fargate have some concept of reserved instance discounts in EC2 or Sustained usage discounts?
No
I have a similar use case. I'd like to run deep learning inference tasks on CUDA-capable GPUs on Fargate (edit: or Lambda), and pay per second of usage. The specific use case is inference tasks that run fairly seldom but need to respond in seconds rather than minutes. In other words, waiting a few minutes for an EC2 instance to boot just doesn't cut the mustard. But neither does the application need to occupy a GPU 24/7 unproductively just to run an inference job for a minute or two, twice a day. Edit: By mid-2021, extremely easy quantization and optimization, along with better models, had removed my need for this use case, but I suppose the people giving this comment a thumbs up might still have something going on in this direction.
I also have an inference use case where we would like to autoscale SQS-driven inference workers on Fargate. We originally tried ECS, but found it too cumbersome to scale both the containers and the EC2 instances, so we currently just use EC2 instances in an Auto Scaling group. We considered SageMaker, but that would require some engineering effort to adapt our architecture and models.
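For an EC2 Auto Scaling group of SQS workers like the one described above, one common sizing approach (AWS documents it for SQS-based scaling) is "backlog per instance": divide the queue depth by how many messages a single worker can absorb while still meeting a latency target. The sketch below is a hypothetical helper, not an AWS API; the queue depth is assumed to come from the SQS ApproximateNumberOfMessages attribute, and the timings are your own measurements.

```python
import math


def desired_workers(queue_depth: int, avg_seconds_per_msg: float,
                    target_latency_s: float, min_size: int = 0,
                    max_size: int = 10) -> int:
    """Hypothetical helper: worker count that keeps the backlog within a latency target."""
    if avg_seconds_per_msg <= 0 or target_latency_s <= 0:
        raise ValueError("timings must be positive")
    # Messages one worker can work through before the latency target is exceeded.
    acceptable_backlog_per_worker = max(1, math.floor(target_latency_s / avg_seconds_per_msg))
    # Enough workers that each one's share of the queue stays within that limit.
    needed = math.ceil(queue_depth / acceptable_backlog_per_worker)
    # Clamp to the Auto Scaling group's configured bounds.
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    # 45 queued messages, 20 s each, 2-minute latency target.
    print(desired_workers(45, 20, 120))
```

With a 120 s target and 20 s per message, each worker can hold 6 messages, so 45 queued messages call for 8 workers. Scaling to zero (min_size=0) is what makes the approach attractive for sporadic GPU work, at the cost of the instance boot delay mentioned earlier in the thread.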
I'd be interested in this too and have similar use cases to those above.
I have a use case for this too, where we want to spin up GPU resources to do live video streaming of a WebGL application but be able to relinquish those completely after the stream ends, with minimal start-up time or over-metering. In our case, we would need the ability to run an X11 server with GPU hardware acceleration.
@mbnr85 I too am trying to do object detection on Fargate.
When training data science models, our workloads can take advantage of GPU compute. Initially those workloads will run in ECS, although eventually we'd likely migrate them to EKS. We'd like to be able to use Fargate to run GPU-accelerated workloads, but that is not currently supported. Does AWS have GPU compute on the Fargate roadmap, and if so, is there any timeline that can be shared?
Also interested, for machine learning.
Interested for ML training and inference as well. The overhead of moving to SageMaker is too high, so we just train models on EC2 GPU boxes and then use a CPU runtime for inference on Fargate. However, some models would benefit from a GPU at inference time (namely those trained with CUDA-specific implementations, which we currently don't use for lack of inference infrastructure). The inference use case is sporadic, so a full-time EC2 box is too pricey.
@romanovzky We're in the same boat, I guess; I'm in a similar situation.
I too am looking forward to this feature. My use case: I need to run jobs that benefit from GPU acceleration (mostly model inference, plus some CPU-bound tasks, e.g., embedding clustering, DB insertions, etc.). Each job takes around 10-15 minutes on a p2.xlarge. I receive 100-120 such jobs through the day (at most 8-10 jobs within a 30-second span). My requirement: a serverless GPU container solution. My current solution: my GPU-using containers run as custom SageMaker training jobs. Advantages:
Disadvantages:
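A back-of-envelope sizing of the workload just described, using only the figures quoted in that comment (100-120 jobs per day, 10-15 minutes each, bursts of up to 10 jobs within about 30 seconds). This is plain arithmetic, not a measurement, and it assumes one job occupies one GPU for its whole run.

```python
# Figures quoted in the comment; assumes one job occupies one GPU for its whole run.
JOBS_PER_DAY = (100, 120)
MINUTES_PER_JOB = (10, 15)
BURST_JOBS = 10  # up to ~10 jobs can arrive within ~30 seconds


def gpu_hours_per_day(jobs: int, minutes_per_job: float) -> float:
    """Total single-GPU busy time per day, in hours."""
    return jobs * minutes_per_job / 60


low = gpu_hours_per_day(JOBS_PER_DAY[0], MINUTES_PER_JOB[0])
high = gpu_hours_per_day(JOBS_PER_DAY[1], MINUTES_PER_JOB[1])

# Average number of GPUs busy at the same time over a 24-hour day.
avg_concurrency_low = low / 24
avg_concurrency_high = high / 24

if __name__ == "__main__":
    print(f"GPU-hours per day: {low:.1f} to {high:.1f}")
    print(f"average concurrent GPUs: {avg_concurrency_low:.2f} to {avg_concurrency_high:.2f}")
    # Finishing a burst within one job-length needs one GPU per job in the burst.
    print(f"GPUs needed to absorb a burst without queueing: {BURST_JOBS}")
```

On average the workload keeps only about 0.7 to 1.25 GPUs busy, yet a burst can need around 10 at once. A fixed fleet sized for the bursts sits mostly idle, while one sized for the average makes burst jobs queue, which is the gap a per-second, serverless GPU option would close.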
Also, for example (an InternalError that can occur when attempting to get RefineNet predictions on CPU): I too would like GPU support in Fargate.
We would like to launch, from a Docker container (RStudio), several other containers for distributed deep/machine learning training using Fargate/AWS Batch. The results should be saved to S3 and written back to the RStudio Docker container. Unfortunately, Fargate does not support GPUs.
I would also like to launch GPU containers from Fargate. I have two use cases: 1. spawning powerful deep learning JupyterHub development environments for our machine-learning group's researchers that effortlessly disappear when the individual JupyterHub kernel is killed; 2. infrequent, quickly scaled, deep (i.e., GPU use is justified) inference tasks. A thought: for 2., I hadn't considered the suggestion above of an auto-scaling EC2 group (which presumably then uses something like a scripted docker-machine command to provision the instance and launch a kernel container) to run the GPU containers, but this seems like a nasty, expensive (in time and currency) hack for something that should be more elegant.
Any news on this?
@ClaasBrueggemann I don't think they will provide this any time soon. AWS is heavily promoting SageMaker now, and in many/most cases that's the way to go. :)
What about 3D model rendering? We don't need this for machine learning.
+1 for this support.
In that case, wouldn't a GPU instance such as P2 or G3 help?
Any SLA for this? Currently, Fargate gives us general-purpose CPUs at 2.2 to 2.3 GHz, which are not capable of running CPU- or GPU-critical applications.
Fargate does not support GPUs yet; we can expect support in the near future.
|
FWIW, it'd be great to run a typical deep learning experiment queue on something like this. Upload code+configs to S3. Lambda picks up, stuffs it into a container, training runs to completion and saves back to S3. Super simple, very scalable.
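The S3-to-Lambda-to-container flow above can be sketched as a Lambda handler that turns an S3 upload notification into an ECS RunTask call. Because Fargate offers no GPUs, the sketch targets the EC2 launch type. The cluster, task definition, container name, and the CONFIG_URI/OUTPUT_PREFIX environment variables are hypothetical; the training container is assumed to read them and write its results back to S3.

```python
from urllib.parse import unquote_plus

# Hypothetical names: replace with your own cluster, task definition and container.
CLUSTER = "experiments"
TASK_DEFINITION = "train-gpu"
CONTAINER_NAME = "trainer"


def build_run_task_request(event: dict) -> dict:
    """Turn an S3 ObjectCreated notification into ECS RunTask parameters.

    Only the first record in the event is handled in this sketch.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded ("+" for spaces).
    key = unquote_plus(record["s3"]["object"]["key"])
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEFINITION,
        # Fargate has no GPUs, so a GPU training job has to run on EC2 capacity.
        "launchType": "EC2",
        "count": 1,
        "overrides": {
            "containerOverrides": [{
                "name": CONTAINER_NAME,
                # Hypothetical variables the training container is assumed to read.
                "environment": [
                    {"name": "CONFIG_URI", "value": f"s3://{bucket}/{key}"},
                    {"name": "OUTPUT_PREFIX", "value": f"s3://{bucket}/results/"},
                ],
            }]
        },
    }


def handler(event, context):
    import boto3  # imported here so the request builder can be tested offline

    ecs = boto3.client("ecs")
    response = ecs.run_task(**build_run_task_request(event))
    # Return only the task ARNs; the full response is not JSON-serializable.
    return [task["taskArn"] for task in response.get("tasks", [])]
```

Keeping the request construction in a pure function makes it testable without AWS credentials; only the handler touches the ECS API.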
Sounds much more like something SageMaker would do.
What is the status of this? I'm very interested in CUDA support in Fargate tasks.
Can you give a reference for this?
Any update on this?
Adding my nudge here on this historic ticket.
+1
For simple generative AI workloads, Fargate with GPUs would be really valuable, even more so now that so many companies are working on gen AI features.
The reality is there's not enough GPU capacity available for on-demand scenarios. Moreover, the available supply goes directly to big players. I just opened some AWS accounts for new clients: they start with a zero EC2 GPU quota, you have to raise it by submitting a quota increase request, and even after that AWS doesn't give you all the capacity you ask for; instead they tell you to wait and see whether you really need more. Looking at the trends over the years, GPUs will need to become either absurdly cheap or widely available (the same thing, most of the time) before we can move to an on-demand model and, later, have it available through IaC.
1911 days and counting...
Hi team, this feature could be very interesting for certain types of inference, especially considering the weight of ML and AI in AWS's overall direction. Thank you so much, team :)
Do we have any updates on this?
Sadly, Amazon are letting us down on the AI/ML front by not giving us the flexibility we need to advance. As a result, we are falling behind where we should be at this point. The model seems to be that cloud providers know best and will force us down their path. We need GPU access in multiple scenarios.
Custom Model Import was announced for Bedrock the other day. Not sure of all your use cases, but it might be an option for managed AI model hosting with pay-per-token pricing. https://aws.amazon.com/about-aws/whats-new/2024/04/custom-model-import-amazon-bedrock/
"Custom models can only be accessed using Provisioned Throughput." So no pay-per-token pricing.
I'm working on getting clarification from AWS, but this blog post says that custom model import uses the On-Demand mode.
What is gpu_count for on FargateTaskDefinition?
@johnwheeler https://nocd.hashnode.dev/registering-gpu-instance-w-aws-elastic-container-service-ecs
This is WIP for ECS
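On ECS with EC2 capacity (the setup the linked post covers), a container requests GPUs through the resourceRequirements field of its task definition, and the task must be placed on GPU container instances (for example, ones running the ECS GPU-optimized AMI). The sketch below builds the parameters for the ECS RegisterTaskDefinition API as a plain dict; the family name, image, and memory default are hypothetical placeholders.

```python
def gpu_task_definition(family: str, image: str, gpus: int = 1,
                        memory_mib: int = 8192) -> dict:
    """Build RegisterTaskDefinition parameters for a single GPU container on EC2."""
    if gpus < 1:
        raise ValueError("gpus must be >= 1")
    return {
        "family": family,
        # GPU resource requirements are not available on Fargate, only on EC2 capacity.
        "requiresCompatibilities": ["EC2"],
        "containerDefinitions": [{
            "name": family,
            "image": image,
            "memory": memory_mib,
            "essential": True,
            # ECS expects the GPU count as a string value.
            "resourceRequirements": [{"type": "GPU", "value": str(gpus)}],
        }],
    }


if __name__ == "__main__":
    import boto3

    # Hypothetical family and image names.
    params = gpu_task_definition("yolo-infer", "example/yolo:latest")
    boto3.client("ecs").register_task_definition(**params)
```

ECS then only places the task on a container instance that has enough unreserved GPUs, which is why the GPU instances have to be registered with the cluster first.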
|
Updated the comment to clarify that the feature covers GPUs and other advanced capacity.
Tell us about your request
What do you want us to build?
Which service(s) is this request for?
This could be Fargate, ECS, EKS, ECR
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.
Are you currently working around this issue?
How are you currently solving this problem?
Additional context
Anything else we should know?
Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)