[Fargate] [request]: Add higher vCPU / Memory options #164
Yes, please support more CPU and more EBS. @abby-fuller, does AWS have any plan for this?
Any SLA for this? Currently the Fargate implementation provides general-purpose CPU clock speeds (2.2-2.3 GHz) and is not capable of running CPU/GPU-critical applications.
Please extend the current vCPU limit. It would really help a lot of our customers.
Any idea if other CPU options are even being considered? This has been 'proposed' for a while without feedback from AWS.
Recently discovered Fargate, and the idea of not having to manually provision instances, keep the underlying instance up to date, etc. is very attractive. However, a particular workload we have, which involves some ML model evaluation, requires somewhere between 60 and 100 GB of RAM. We will likely fall back to a dedicated ECS/EC2 cluster with autoscaling, but would rather let Fargate handle the process...
I had a similar CPU spike issue when I migrated to Fargate. In the task definition I initially specified 4 cores (the maximum resources), which was not sufficient for my app. After setting the container-level cpu (container >> Environment >> cpu) to 4096, the CPU spikes are comparatively smaller. This cpu field is optional for the Fargate launch type, but without it I suspect the 4 cores are shared across the total number of running tasks, though I am not sure. @luxaritas Similarly for memory: setting the container-level memory (container >> Environment >> memory) to 32 GB might fix your issue. AWS definitely needs to raise the limit to a minimum of 8 cores.
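For reference, a minimal boto3 sketch of a task definition with the container-level cpu and memory set explicitly, as the comment above describes. The family name, image, and region are placeholders, and whether the container-level fields change behavior on Fargate is the commenter's observation, not documented behavior:

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")  # region is a placeholder

ecs.register_task_definition(
    family="my-app",                       # placeholder
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="4096",      # task level: 4 vCPU (the Fargate maximum at the time)
    memory="30720",  # task level: 30 GB
    containerDefinitions=[
        {
            "name": "app",
            "image": "my-repo/my-app:latest",  # placeholder
            # Container-level cpu/memory are optional on Fargate; setting
            # them explicitly reserves the full task allocation for this
            # single container rather than leaving it implicitly shared.
            "cpu": 4096,
            "memory": 30720,
            "essential": True,
        }
    ],
)
```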
@abby-fuller any update on this request? 8 vCPU would be a start!
Fargate compute-optimized CPUs FTW!
Increase the maximum allowed vCPU/memory resources of Fargate tasks to at least 8 vCPU.
Any update on this?
Can the newly launched AWS App Runner run on more CPUs? Is that an alternative to Fargate?
No. App Runner supports a subset of the native Fargate options (App Runner can run applications with up to 2 vCPU and 4 GB of memory per instance).
I actually suspect that App Runner is internally implemented on Fargate, just as Fargate is implemented on EC2, etc.
App Runner is indeed built on top of Fargate, but (today) it cannot be configured to take advantage of all the CPU/memory configurations that "raw" Fargate offers.
At least 8 vCPU would be helpful. This is holding us back from migrating fully to Fargate.
I second that, more vCPU would be very useful.
Second that, more vCPU on Fargate would be very useful.
I see that this issue is categorized in the "Researching" project; is there any estimate of when we might see some additional progress on this task? We would love to deploy one of our core services (with high CPU demand) to Fargate, but will likely only be able to do so once Fargate supports up to 12 vCPUs :)
The compute limits are far too restrictive for many applications (e.g. custom video/audio processing).
Thanks @peegee123 for the feedback. We heard this loud and clear and we want to lift this limitation. Stay tuned.
We are indeed also heavily reliant on Fargate and are constantly maxing out our containers; we would throw more money your way if we could get more vCPUs (16, maybe) and up to 64 GB of RAM :)! Some people have asked how we solve it now: basically, we have an ECS ASG set to a desired capacity of 0; if we add a task with a placement constraint of XXL or XXXL, it creates a new EC2 machine and places the task there (a sketch of the run-task side follows below). When the task finishes, the EC2 instance is destroyed, so you also only pay for what you use. When Fargate gets higher vCPU and memory options, we would drop this approach.
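A minimal sketch of the run-task side of that workaround, assuming the EC2 instances in the auto scaling group carry a custom `size` attribute; the cluster name, task definition, region, and attribute values are all placeholders:

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")  # region is a placeholder

# Place the task on an XXL container instance; the ASG (desired capacity 0)
# is expected to provide one on demand and scale back down afterwards.
ecs.run_task(
    cluster="my-cluster",          # placeholder
    taskDefinition="heavy-task",   # placeholder
    launchType="EC2",
    placementConstraints=[
        {
            "type": "memberOf",
            "expression": "attribute:size == XXL",  # custom instance attribute
        }
    ],
)
```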
The ability to be more flexible with the combinations would be nice as well. Currently it is not possible to do something like 2 cores and 2 GB of memory.
I am using 4 GB of RAM and 4 vCPU on Fargate. Is it possible to dynamically allocate hardware for every request?
@jbidinger we are actively working on it.
This is great. I've begun to implement this since I had similar problems when running a task on FARGATE. For reference, here is a snippet of my AWS CDK code.
EDIT: In particular step 4, which states that after 15 minutes (or 15 datapoints) it scales in (down). So after waiting 15 minutes I achieve the desired outcome. What I haven't found is how to configure this interval. Ideally I would like to scale in directly after a task finishes, but if I could configure this to 1-5 minutes I would be happy.
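On configuring the scale-in interval: if the policy is created through the CDK, the cooldowns can be set directly. A minimal sketch in CDK (Python), assuming `service` is an existing FargateService; note that target tracking's scale-in alarm still evaluates 15 one-minute datapoints by default, so this shortens but may not eliminate the delay:

```python
from aws_cdk import Duration

# "service" is assumed to be an existing aws_cdk.aws_ecs.FargateService
scaling = service.auto_scale_task_count(min_capacity=1, max_capacity=10)

scaling.scale_on_cpu_utilization(
    "CpuScaling",
    target_utilization_percent=70,
    # Shorter cooldowns so capacity is released sooner after tasks finish
    scale_in_cooldown=Duration.minutes(2),
    scale_out_cooldown=Duration.minutes(1),
)
```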
Quick update: this is a top priority for us, we're actively developing it, and we will move it to the next roadmap phase within a few weeks.
Would it be possible to provide a rough estimate of when this will be ready and what new memory and vCPU limits you are aiming for? It would be very useful if this information could be shared (the answer would not be taken as a commitment but rather as an indication). Thanks in advance.
Any updates on this? We are starting to consider the move to EC2 because of this limitation.
@luiszimmermann it's coming soon. If you have an AWS employee you work with (AM/TAM/SA/etc.), can you ask them to ping me referencing you and this thread? If you don't have AWS contacts you work with, can you send me an email at mreferre at amazon dot com? Thanks.
As @ghomem asked, can you release any information about how high the memory and vCPU limits might be set? It would be helpful to have even a ballpark estimate to know whether these changes could make Fargate a viable option for us.
@ascrookes we can't release more information publicly before the launch. Sorry about that.
Following |
We're halfway through 2022 and ECS Fargate still has a maximum of 4 vCPUs - this really doesn't play nicely with enterprise clients running Java in containers 🤣
Thanks for the patience on this. We'll announce increased vCPU and memory options for Fargate-based tasks and pods within a few weeks.
Would love to be able to run 60 GB tasks to spin up ephemeral databases for ad hoc data processing.
This limitation is holding us back too, as we're relying on Node worker threads to parallelise work that can't realistically be spread across network communications.
Hi @omieomye - it has been 25 days since you mentioned that the increased vCPU and memory options would be available in a few weeks. Can you share an update on the timing of the launch of these new options? Is it likely to be a few days, a few more weeks, or a few more months? I'm asking because one of my Fargate workloads is hitting the 30 GB memory limit, and I'm trying to decide whether to abandon Fargate and run it on an ECS cluster or EC2 instance, or to wait until the increased limits launch. Thanks. [Edit: grammar]
Looking forward to the increased vCPU options; it will definitely be one of the greatest milestones for the Fargate service since it launched! (After ECS Exec.)
Hi, everyone. As a workaround, I've built two callbacks in Apache Airflow that create an ECS cluster backed by an EC2 container instance. You can choose any hardware you want, but you are required to use an AMI with ECS capabilities (an ECS-optimized AMI). As an alternative, you can use boto3 directly to integrate with AWS. Declare these constants:

```python
# ECS-optimized AMI and the instance profile for the container instance
ECS_EC2_AMI = "ami-09ce6553a7f2ae75d"
ECS_EC2_IAM_ROLE = "ecsInstanceRole"

# Shared arguments for the Airflow ECS operator and the callbacks below
ECS_ARGS = {
    "region_name": "YOUR-REGION",
    "aws_conn_id": "aws_default",
    "task_definition": "taskdef",
    "cluster": "YOUR-CLUSTER-NAME",
    "launch_type": "FARGATE",
    "platform_version": "1.4.0",
    "network_configuration": {
        "awsvpcConfiguration": {
            "securityGroups": ["YOUR-SECURITY-GROUPS"],
            "subnets": ["YOUR-SUBNETS"],
            "assignPublicIp": "ENABLED",
        }
    },
    "retries": 0,
}
```

Function to start the cluster (creates the cluster, launches the EC2 instance, registers it as a container instance, and waits until it is ready):

```python
import pprint
import traceback
import time

import boto3


def on_success_callback_start_ecs_cluster_container_instance(kwargs, instance_type):
    try:
        has_container_instance = False
        if instance_type:
            # Force-delete any previous cluster/instance first, so we don't
            # end up creating many EC2 instances
            on_callback_stop_ecs_cluster_container_instance(kwargs)
            region_name = ECS_ARGS.get("region_name")
            ecs_client = boto3.client("ecs", region_name=region_name)
            ec2_client = boto3.client("ec2", region_name=region_name)
            ec2_resource = boto3.resource("ec2", region_name=region_name)
            cluster_name = kwargs["dag_run"].conf.get("cluster_name")
            ecs_client.create_cluster(
                clusterName=cluster_name,
                tags=[
                    {"key": "PROJECT", "value": str(cluster_name).upper()},
                ],
            )
            pprint.pprint(
                f"Cluster {cluster_name} created with success on {region_name}"
            )
            security_group_ids = (
                ECS_ARGS.get("network_configuration")
                .get("awsvpcConfiguration")
                .get("securityGroups")
            )
            subnet_id = (
                ECS_ARGS.get("network_configuration")
                .get("awsvpcConfiguration")
                .get("subnets")[1]
            )
            # Launch the container instance with the desired hardware; the
            # user data points the ECS agent at the new cluster
            ec2_response = ec2_client.run_instances(
                ImageId=ECS_EC2_AMI,
                MinCount=1,
                MaxCount=1,
                SecurityGroupIds=security_group_ids,
                SubnetId=subnet_id,
                InstanceType=instance_type,
                IamInstanceProfile={"Name": ECS_EC2_IAM_ROLE},
                TagSpecifications=[
                    {
                        "ResourceType": "instance",
                        "Tags": [
                            {
                                "Key": "PROJECT",
                                "Value": str(cluster_name).upper(),
                            },
                        ],
                    },
                ],
                UserData=f"""
#!/bin/bash
echo ECS_CLUSTER={cluster_name} >> /etc/ecs/ecs.config
echo ECS_AVAILABLE_LOGGING_DRIVERS='["json-file","awslogs"]' >> /etc/ecs/ecs.config
echo ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true >> /etc/ecs/ecs.config
""",
            )
            pprint.pprint(f"EC2 create response: {ec2_response}")
            instance_id = ec2_response.get("Instances")[0].get("InstanceId")
            instance = ec2_resource.Instance(instance_id)
            pprint.pprint(f"Waiting for instance {instance_id} to start on {region_name}...")
            instance.wait_until_running()
            pprint.pprint(f"Instance {instance_id} is running on {region_name}")
            # Poll until the instance has registered itself with the cluster
            while not has_container_instance:
                response = ecs_client.list_container_instances(cluster=cluster_name)
                if response["containerInstanceArns"]:
                    has_container_instance = True
                    print("Container instance registered on the cluster.")
                else:
                    print(
                        "No container instance registered on the cluster yet. Retrying..."
                    )
                    time.sleep(20)
    except Exception:
        traceback.print_exc()
raise Exception("Internal error") Stop Cluster (drain EC2, delete EC2, await EC2 be deleted, drop cluster): import pprint
import traceback

import boto3


def on_callback_stop_ecs_cluster_container_instance(kwargs):
    response = None
    region_name = ECS_ARGS.get("region_name")
    cluster_name = kwargs["dag_run"].conf.get("cluster_name")
    ecs_client = boto3.client("ecs", region_name=region_name)
    ec2_client = boto3.client("ec2", region_name=region_name)
    ec2_resource = boto3.resource("ec2", region_name=region_name)
    try:
        response = ecs_client.list_container_instances(cluster=cluster_name)
    except Exception:
        print(f"Cluster {cluster_name} not found. Skipping cluster deletion.")
    if response and response["containerInstanceArns"]:
        container_instance_resp = ecs_client.describe_container_instances(
            cluster=cluster_name, containerInstances=response["containerInstanceArns"]
        )
        # Terminate each registered container instance and wait until it is gone
        for ec2_instance in container_instance_resp["containerInstances"]:
            instance_id = ec2_instance["ec2InstanceId"]
            ec2_client.terminate_instances(
                DryRun=False,
                InstanceIds=[instance_id],
            )
            instance = ec2_resource.Instance(instance_id)
            pprint.pprint(
                f"Waiting for instance {instance_id} to be terminated on {region_name}..."
            )
            instance.wait_until_terminated()
            pprint.pprint(f"Instance {instance_id} was terminated on {region_name}")
        ecs_client.delete_cluster(cluster=cluster_name)
        pprint.pprint(f"Cluster {cluster_name} deleted with success on {region_name}")
```
When can we expect the increase in vCPUs? 4 vCPUs are not enough for a workload we are running.
Our testing and staging setups run on FARGATE, but production needs higher limits. We added EC2 for prod, but this is something we wanted to avoid. I hope AWS announces it this month.
Hitting the vCPU limit doing some video transcoding tasks every now and then. Badly need the vCPU increase.
Seems like the delay is because they first need to migrate the old Fargate quota system to the new vCPU-based quota system. The article explains good reasons for this.
If I'm right, then I would guess that the higher-vCPU feature will be rolled out in either October or November, based on the transitional rollout dates for the new quota system (dates available in the linked blog post). Hopefully someone from AWS can give us an official update, since we haven't had one since the start of July. @omieomye, do you have any new information you can share?
Yeah, this is promising...
Following.
The team (which has been working hard on this) appreciates the enthusiasm in this thread. Hang in there.
As many will notice, we've begun rolling out higher resource configurations (more vCPU and memory options). We'll make a formal announcement soon. Make sure you first opt your accounts in to vCPU-based quotas, which we announced last week (post, FAQs), before using 8 and 16 vCPU tasks on Fargate. Opting in to vCPU-based quotas as an ECS customer is simple; we recommend using the ECS PutAccountSettingDefault API call. For EKS customers, or if you don't want to use the API to move to vCPU-based quotas, file a request in the AWS Support Center console. Echoing what @mreferre said, thanks for the patience!
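For ECS accounts, the opt-in is a single call; a minimal boto3 sketch, assuming the transitional `fargateVCPULimit` setting name from the vCPU-based quota announcement applies to your account's migration window:

```python
import boto3

ecs = boto3.client("ecs")

# Set the account-wide default to vCPU-based Fargate quotas
# ("fargateVCPULimit" per the quota announcement; an assumption here)
ecs.put_account_setting_default(name="fargateVCPULimit", value="enabled")
```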
Thanks, that's all working well for me. Congrats on the launch.
Announcement. Thanks for the engagement, all! Please note that to use this, you must first opt your accounts in to vCPU-based quotas. ECS Fargate customers can easily opt in to vCPU-based quotas using the PutAccountSettingDefault API before their accounts run larger tasks. EKS Fargate customers can cut us a ticket. Closing.
Has anyone been successful in opting in to vCPU-based quotas on EKS? I created a customer support ticket for this on AWS, and they had no idea how to do it.
Tell us about your request
Increase maximum allowed vCPU / Memory resources of Fargate tasks.
Which service(s) is this request for?
Fargate
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I want to offload computationally heavy tasks that can only be locally parallelized to Fargate, without having to boot an EC2 instance and take on its associated maintenance overhead.
An example of such a task is the compilation of GHC (the Haskell compiler). Its build system allows parallel compilation, but no distribution.
Are you currently working around this issue?
Considering scripting the use of a bigger EC2 instance, with its associated maintenance overhead.
Additional context
None.