Production infrastructure for machine learning at scale
Deploy, manage, and scale machine learning models in production.
Realtime - respond to requests in real time and autoscale based on the number of in-flight requests.
Async - process requests asynchronously and autoscale based on request queue length.
Batch - run distributed, fault-tolerant batch processing jobs on demand.
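To make the two autoscaling signals above concrete, here is a minimal sketch of load-based replica scaling. It is not Cortex's actual autoscaler: the function name `desired_replicas`, its parameters, and the default limits are all hypothetical, chosen only for illustration. The same arithmetic applies whether the load is the in-flight request count (Realtime) or the request queue length (Async).

```python
import math


def desired_replicas(
    load: int,
    target_per_replica: int,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return a replica count for the observed load.

    `load` is the in-flight request count (Realtime) or the queue
    length (Async). All names and defaults here are illustrative
    assumptions, not Cortex configuration fields.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Enough replicas so that each one handles at most the target load.
    wanted = math.ceil(load / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    print(desired_replicas(17, 8))    # 3
    print(desired_replicas(0, 8))     # 1 (never below min_replicas)
    print(desired_replicas(1000, 8))  # 10 (capped at max_replicas)
```

A real autoscaler would typically also smooth the signal over a time window and delay scale-downs to avoid thrashing; those refinements are omitted here.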
Automated cluster management
Autoscaling - elastically scale clusters with CPU and GPU instances.
Spot instances - run workloads on spot instances with automatic fallback to on-demand instances.
Environments - create multiple clusters with different configurations.
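The spot-with-on-demand-fallback idea can be sketched as trying capacity pools in priority order. This is an illustrative assumption about the general pattern, not Cortex's implementation: `CapacityUnavailable`, `launch_with_fallback`, and the launcher callables are hypothetical names, and no real AWS API is called.

```python
from typing import Callable, Sequence, Tuple


class CapacityUnavailable(Exception):
    """Raised by a (hypothetical) launcher when its pool has no capacity."""


def launch_with_fallback(
    launchers: Sequence[Tuple[str, Callable[[], str]]],
) -> Tuple[str, str]:
    """Try each (pool_name, launch_fn) in order; return the first success.

    Returns (pool_name, instance_id). Raises CapacityUnavailable if
    every pool fails.
    """
    for pool_name, launch in launchers:
        try:
            return pool_name, launch()
        except CapacityUnavailable:
            # This pool is exhausted; fall through to the next one.
            continue
    raise CapacityUnavailable("no capacity in any pool")


if __name__ == "__main__":
    def launch_spot() -> str:
        # Simulate a spot pool with no capacity available.
        raise CapacityUnavailable("spot pool exhausted")

    def launch_on_demand() -> str:
        # Placeholder instance ID, for illustration only.
        return "i-hypothetical"

    pool, instance_id = launch_with_fallback(
        [("spot", launch_spot), ("on-demand", launch_on_demand)]
    )
    print(pool, instance_id)
```

Ordering the pools as a list keeps the preference explicit: cheaper spot capacity is tried first, and on-demand capacity is used only when spot is unavailable.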
CI/CD and observability integrations
Provisioning - provision clusters with declarative configuration or a Terraform provider.
Metrics - send metrics to any monitoring tool or use pre-built Grafana dashboards.
Logs - stream logs to any log management tool or use the pre-built CloudWatch integration.
Built for AWS
EKS - Cortex runs on top of EKS to scale workloads reliably and cost-effectively.
VPC - deploy clusters into a VPC in your AWS account to keep your data private.
IAM - integrate with IAM for authentication and authorization workflows.