Serverless containers on AWS

Deploy, manage, and scale containers without managing infrastructure.

Scale realtime, batch, and async workloads

Realtime - respond to requests in real time and autoscale based on the number of in-flight requests.

Batch - run distributed and fault-tolerant batch processing jobs on-demand.

Async - process requests asynchronously and autoscale based on request queue length.

$ cortex deploy

creating realtime text-generator
creating batch image-classifier
creating async video-analyzer
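
As a rough illustration of the two autoscaling signals described above, the sketch below turns a load measurement (in-flight requests for realtime workloads, queue length for async workloads) into a replica count. It is not Cortex's actual autoscaler; `desired_replicas`, `target_per_replica`, `min_replicas`, and `max_replicas` are hypothetical names chosen for this example.

```python
import math


def desired_replicas(load, target_per_replica, min_replicas=1, max_replicas=100):
    """Return a replica count for the given load, clamped to [min_replicas, max_replicas].

    load: in-flight requests (realtime) or queued requests (async).
    target_per_replica: load one replica is assumed to handle (hypothetical knob).
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    # Round up so that the remaining fraction of load still gets a replica.
    wanted = math.ceil(load / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    print(desired_replicas(250, 8))                 # 250 in-flight requests, 8 per replica
    print(desired_replicas(0, 8, min_replicas=0))   # idle workload allowed to scale to zero
```

In this sketch, a `min_replicas` above zero keeps replicas warm while the workload is idle, and `min_replicas=0` lets an idle workload release all of its replicas.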

Allocate CPU, GPU, and memory without limits

No resource limits - allocate as much CPU, GPU, and memory as each workload requires.

No cold starts - keep a minimum number of replicas running so that requests are always handled in real time.

No timeouts - run workloads for as long as you want.

$ cortex get

WORKLOAD             TYPE         REPLICAS
text-generator       realtime     32
image-classifier     batch        64
video-analyzer       async        16

Control your AWS spend

Scale to zero - optimize the autoscaling behavior of each workload to minimize idle resources.

Multi-instance - run different workloads on different EC2 instances to ensure efficient resource utilization.

Spot instances - run workloads on spot instances and fall back to on-demand instances to ensure reliability.

$ cortex cluster up

INSTANCE        PRICE     SPOT     SCALE
c5.xlarge       $0.17     yes      0-100
g4dn.xlarge     $0.53     yes      0-100
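
The spot-with-fallback behavior described above can be pictured with a minimal sketch: try to obtain a spot instance first, and request an on-demand instance only if spot capacity is unavailable. This is an assumption-laden illustration, not how Cortex provisions instances; `CapacityError`, `launch_with_fallback`, and the two launcher callables are hypothetical and stand in for whatever provisioning calls a real cluster would make.

```python
class CapacityError(Exception):
    """Hypothetical error raised by a launcher when no capacity is available."""


def launch_with_fallback(instance_type, launch_spot, launch_on_demand):
    """Try a spot launch first; fall back to on-demand if spot capacity is missing.

    launch_spot and launch_on_demand are caller-supplied callables (hypothetical)
    that take an instance type and return a description of the launched instance.
    """
    try:
        return launch_spot(instance_type)
    except CapacityError:
        # Spot is cheaper but can be unavailable; on-demand keeps the workload running.
        return launch_on_demand(instance_type)


if __name__ == "__main__":
    def no_spot_capacity(instance_type):
        raise CapacityError(f"no spot capacity for {instance_type}")

    def on_demand(instance_type):
        return {"type": instance_type, "lifecycle": "on-demand"}

    print(launch_with_fallback("g4dn.xlarge", no_spot_capacity, on_demand))
```

Passing the launchers in as arguments keeps the sketch free of any real cloud API and makes the fallback path easy to exercise with stand-in functions.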