Background
This issue is a follow-up to #3740 (comment). As shown in #3740 (comment), when we were trying to solve the cost issue in Texera development Phase 2, we first tried AWS Lambda, which seemed to be the simplest way to deploy. However, we later found out that AWS Lambda functions cannot be invoked using HTTP/HTTPS requests, so Lambda is not suitable for our use case. We then moved on to AWS ECS and AWS Fargate to test their feasibility. The results are quite promising, and we present the details below.
Introduction
The architecture consists of four primary AWS components working in concert:
Application Load Balancer (ALB): A Layer 7 load balancer serving as the ingress point. It listens for HTTP requests and routes them to registered targets within a specified target group (this is one of the biggest differences from Lambda); a wiring sketch for these components follows this list.
Amazon ECS Service: A logical controller that maintains a declared number of task instantiations (Desired Count) from a specific task definition. It is responsible for service discovery and integrating with the ALB and Application Auto Scaling.
AWS Fargate: A serverless compute engine that provides the underlying compute plane for the ECS tasks. It abstracts away all server management.
Application Auto Scaling & CloudWatch: The monitoring and control loop. CloudWatch tracks the ECSServiceAverageCPUUtilization metric, and its alarms trigger scaling actions defined in Application Auto Scaling.
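To make the wiring above concrete, here is a minimal sketch (not our actual deployment script) of creating the ECS Service on Fargate and attaching it to the ALB's target group with boto3. All names, ARNs, subnets, and security groups below are hypothetical placeholders.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

# Create the ECS Service on Fargate and attach it to the ALB's target group.
# Every resource name/ARN below is a placeholder for illustration only.
response = ecs.create_service(
    cluster="texera-demo-cluster",              # hypothetical cluster
    serviceName="texera-web-service",           # hypothetical service name
    taskDefinition="texera-web-task:1",         # hypothetical task definition
    desiredCount=1,                             # the Desired Count the service maintains
    launchType="FARGATE",                       # Fargate provides the compute plane
    loadBalancers=[{
        "targetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/texera-tg/abc123",
        "containerName": "texera-web",
        "containerPort": 8080,
    }],
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-aaa", "subnet-bbb"],   # placeholder subnets
            "securityGroups": ["sg-ccc"],              # placeholder security group
            "assignPublicIp": "ENABLED",
        }
    },
)
print(response["service"]["serviceArn"])
```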
The Role of AWS Fargate
When the ECS Service needs to launch a task (either at initial deployment or during a scale-out event), it dispatches the request to Fargate. When a task is finished, the ECS Service instructs Fargate to terminate it, and Fargate then reclaims the compute resources. The key point is that Fargate handles the entire flow, with no need to manually allocate or de-allocate resources.
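For intuition, the sketch below reproduces the same launch/terminate flow manually for a single one-off task (in the real deployment the ECS Service scheduler makes these calls for us). The cluster, task definition, and network settings are hypothetical placeholders.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

# Ask Fargate to provision and launch one task
# (normally the ECS Service scheduler issues this call).
run = ecs.run_task(
    cluster="texera-demo-cluster",
    taskDefinition="texera-web-task:1",
    launchType="FARGATE",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-aaa"],
            "securityGroups": ["sg-ccc"],
            "assignPublicIp": "ENABLED",
        }
    },
)
task_arn = run["tasks"][0]["taskArn"]

# When the task is no longer needed, instruct Fargate to terminate it;
# Fargate then reclaims the underlying compute resources on its own.
ecs.stop_task(cluster="texera-demo-cluster", task=task_arn, reason="demo finished")
```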
Some Other Key Points
The following sequence details the auto-scaling process from load generation to system stabilization:
- Auto Scale-up/Scale-down: There are three metrics that can be used as the scale-up/scale-down threshold, and CPU usage is the most commonly used one. Users specify the maximum percentage of CPU usage before the task is deployed. If the usage later exceeds that threshold, additional tasks (more compute resources) will be launched. Other metrics include RAM usage and the number of incoming requests.
- Alarm State Transition: The Target Tracking scaling policy creates a CloudWatch alarm. When the ECSServiceAverageCPUUtilization metric exceeds the defined target value (e.g., 70%) for the configured number of evaluation periods (e.g., 3 consecutive periods of 1 minute), the alarm's state transitions from OK to IN_ALARM.
- Scaling Policy Invocation: The IN_ALARM state triggers the associated Application Auto Scaling policy. The policy calculates the number of new tasks needed to bring the average CPU back down to the target value. It then makes an API call to update the Desired Count of the ECS Service (see the configuration sketch after this list).
- Fargate Task Launch: The ECS Service scheduler detects that the Running Count is less than the new Desired Count. It invokes the Fargate RunTask API, providing the task definition and network configuration. Fargate then provisions and launches the new task.
- Load Rebalancing: Once the new Fargate task is running and passes the ALB's health checks, the ALB adds the task's ENI private IP to its list of active targets. It immediately begins routing a portion of the incoming requests to this new task, thereby reducing the CPU load on the original task and stabilizing the service's overall average CPU.
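The sketch below configures this behavior with Application Auto Scaling via boto3: it registers the service's Desired Count as a scalable target and attaches a target-tracking policy on ECSServiceAverageCPUUtilization. The 70% target, the 1–4 capacity range, and the resource IDs are illustrative assumptions, not settled values.

```python
import boto3

aas = boto3.client("application-autoscaling", region_name="us-west-2")

# The ECS Service's Desired Count is the dimension that Application Auto Scaling adjusts.
resource_id = "service/texera-demo-cluster/texera-web-service"   # hypothetical IDs

aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target tracking on average CPU: the CloudWatch alarms are created automatically,
# and while the metric stays above 70% the Desired Count is raised.
aas.put_scaling_policy(
    PolicyName="texera-cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 60,
    },
)
```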
Some More Things to Know
- A very common misunderstanding: if the threshold is set on CPU usage, one might think the Fargate service can never be woken up, because the service stays shut down and thus cannot take in any incoming requests (and with no requests arriving, nothing would ever push the CPU over the threshold). HOWEVER, that's wrong! AWS Fargate can be configured so that whenever a request comes in, it wakes the system up, even though the service is currently shut down and appears unable to accept requests.
- There are three, and only three, ways to auto scale a Fargate service up or down: the threshold can be set on CPU usage, RAM usage, or the number of requests coming into the system. Note that even when no user requests are coming in, CPU and RAM may still not be "idle" enough to meet the threshold, because the system itself uses resources for its own management and housekeeping. A more common and safer choice is therefore to set the threshold on the number of incoming requests (a sketch of such a policy is shown after this list).
- You can always test the scalability of the service by directly sending an HTTP request! In fact, you can even visit that HTTP address directly in a browser. Even if it most likely reports that the service is not there, something should still show up (instead of a blank page).
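To make the request-count option and the direct HTTP check concrete, here is a sketch that assumes the scalable target from the earlier snippet is already registered; the ResourceLabel, target value, and ALB DNS name are hypothetical placeholders.

```python
import urllib.error
import urllib.request

import boto3

aas = boto3.client("application-autoscaling", region_name="us-west-2")

# Scale on incoming requests per target instead of CPU/RAM.
# The ResourceLabel ties the metric to a specific ALB + target group (placeholder values).
aas.put_scaling_policy(
    PolicyName="texera-request-count-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/texera-demo-cluster/texera-web-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # illustrative target request count per task
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            "ResourceLabel": "app/texera-alb/1234567890abcdef/targetgroup/texera-tg/abc123",
        },
    },
)

# Quick availability check: hit the ALB's DNS name directly over HTTP.
# Even an error page (e.g., 503 with no healthy targets) confirms the ALB is reachable.
url = "http://texera-alb-123456789.us-west-2.elb.amazonaws.com/"
try:
    with urllib.request.urlopen(url) as resp:
        print(resp.status, resp.read()[:200])
except urllib.error.HTTPError as err:
    print("ALB reachable, returned HTTP", err.code)
```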