Terraform module for creating EC2 spot instances with GPUs

buildstar-online/aws-tf-starter

AWS Starter-Project

Create and manage AWS resources using Terraform and GitHub Actions


TODO: customize the user-data files to use gaming drivers as well: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html#nvidia-gaming-driver

Prerequisites

  1. An AWS account

    This project uses non-free resources. You will need to sign up for an AWS account, verify your identity, and provide a payment method. One of the benefits of automating your cloud projects with Terraform is the ease with which you can re-create and destroy cloud resources. Make use of this feature to turn off your project when it is not in use.

  2. AWS CLI

    You will need the AWS CLI to authenticate your initial account and to create the base resources and permissions that allow Terraform to control your project.

  3. Terraform

    You will need Terraform to manage the resources defined in this repository. Older Terraform releases and some older provider builds lack native ARM64 support, so M1/M2 Mac users may need to use the Docker version of the CLI with the --platform linux/amd64 flag.

  4. Resource Quotas (Optional)

    AWS, like most other cloud providers, uses quotas to limit the amount of resources customers can create. This prevents abuse of the free tier and stops customers from accidentally letting autoscaling generate massive bills. If you plan on deploying GPU/TPU accelerators or more than a couple of VMs, you will need to request a quota increase for those resources. See below for more information.

  5. Infracost (Optional)

    Infracost shows cloud cost estimates for Terraform. It lets engineers see a cost breakdown and understand costs before making changes, either in the terminal, VS Code or pull requests.
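
The quota check from prerequisite 4 can be sketched with the AWS CLI's service-quotas commands. This is a minimal sketch, not part of the repository: the function names are hypothetical, and the filter assumes the relevant EC2 quota names contain "G and VT", which is how they appear in the Service Quotas console at the time of writing.

```shell
# Sketch: list the EC2 quotas that govern G-family (GPU) instances.
# Assumption: the quota names contain the text "G and VT".
list_gpu_quotas() {
  local region="${1:-eu-central-1}"
  aws service-quotas list-service-quotas \
    --service-code ec2 \
    --region "$region" \
    --query "Quotas[?contains(QuotaName, 'G and VT')].[QuotaName,QuotaCode,Value]" \
    --output table
}

# Sketch: request a higher value for one quota code taken from the listing.
request_quota_increase() {
  local quota_code="$1" desired_value="$2" region="${3:-eu-central-1}"
  aws service-quotas request-service-quota-increase \
    --service-code ec2 \
    --quota-code "$quota_code" \
    --desired-value "$desired_value" \
    --region "$region"
}
```

Calling `list_gpu_quotas eu-central-1` prints the matching quotas with their codes; one of those codes can then be passed to `request_quota_increase`. Quotas for these instance families are counted in vCPUs, so a single g5.12xlarge needs a value of at least 48.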

Get Started

  1. Create the Terraform state bucket

    export REGION="eu-central-1"
    export STATE_BUCKET="buildstar-terraform-state"
    
    aws s3api create-bucket --bucket $STATE_BUCKET \
      --region $REGION \
      --create-bucket-configuration \
      LocationConstraint=$REGION
    
    aws s3api put-bucket-encryption --bucket $STATE_BUCKET \
      --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'
    
    aws dynamodb create-table --table-name Terraform-backend-lock \
      --attribute-definitions AttributeName=LockID,AttributeType=S \
      --key-schema AttributeName=LockID,KeyType=HASH \
      --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
  2. Create IAM Policies

    aws iam create-policy --policy-name Terraform-Backend-Policy-S3 \
      --policy-document file://s3-policy.json
    
    aws iam create-policy --policy-name Terraform-Backend-Policy-DynamoDB \
      --policy-document file://dynamo-policy.json
  3. Create an IAM User

    export ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
    
    aws iam create-user --user-name terraform
    
    aws iam create-access-key --user-name terraform
    
    aws iam attach-user-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/Terraform-Backend-Policy-S3  \
      --user-name terraform
    
    aws iam attach-user-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/Terraform-Backend-Policy-DynamoDB  \
      --user-name terraform
  4. Initialize Terraform

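    # Fill in the values printed by `aws iam create-access-key` in step 3.
    export AWS_ACCESS_KEY_ID=""
    export AWS_SECRET_ACCESS_KEY=""
    
    # Quote the working directory so paths containing spaces mount correctly.
    docker run --platform linux/amd64 \
      -e "AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID" \
      -e "AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY" \
      -v "$(pwd)":/terraform -w /terraform hashicorp/terraform init -upgrade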
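
Step 2 reads s3-policy.json and dynamo-policy.json from the working directory, but this README does not show their contents. The sketch below writes a minimal pair of policy documents based on the permissions Terraform documents for an S3 backend with DynamoDB locking. The function name is hypothetical, the repository's actual files may differ, and the S3 object resource is left as a wildcard because the state key is not stated here.

```shell
# Sketch: write minimal IAM policy documents for the Terraform state backend.
# Assumption: these mirror the backend permissions in Terraform's docs;
# the repository's own policy files are not shown and may be stricter.
write_backend_policies() {
  local bucket="$1" region="$2" account_id="$3" out_dir="${4:-.}"

  # S3: list the bucket, then read/write/delete state objects inside it.
  cat > "${out_dir}/s3-policy.json" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF

  # DynamoDB: item-level access to the lock table created in step 1.
  cat > "${out_dir}/dynamo-policy.json" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:${region}:${account_id}:table/Terraform-backend-lock"
    }
  ]
}
EOF
}
```

Running `write_backend_policies "$STATE_BUCKET" "$REGION" "$ACCOUNT_ID"` before step 2 would produce both files in the current directory. Note that ACCOUNT_ID is only exported in step 3, so it would have to be set earlier for this to work.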

Instance Types

This project is tested using the G5, G4dn, and G3 instance families, which use NVIDIA A10G, T4, and M60 GPUs. If you do not need a GPU, consider Hetzner or Equinix, which have better prices on CPU-only instances. Buildstar Online has a quota limit of 48 vCPUs in eu-central.

  • G5 instances feature NVIDIA A10G Tensor Core GPUs and second generation AMD EPYC processors.

    Instance Size  GPUs  GPU RAM (GiB)  vCPUs  RAM (GiB)  Disk Size (GB)  Network (Gbps)  Price
    g5.xlarge      1     24             4      16         250             10              $0.3774
    g5.2xlarge     1     24             8      32         450             10              $0.4547
    g5.4xlarge     1     24             16     64         600             25              $0.6092
    g5.12xlarge    4     96             48     192        3800            40              $2.1278
  • G4dn instances feature NVIDIA T4 GPUs and custom Intel Cascade Lake CPUs.

    Instance Size  GPUs  GPU RAM (GiB)  vCPUs  RAM (GiB)  Disk Size (GB)  Network (Gbps)  Price
    g4dn.xlarge    1     16             4      16         125             25              $0.1974
    g4dn.2xlarge   1     16             8      32         225             25              $0.282
    g4dn.4xlarge   1     16             16     64         225             25              $0.462
    g4dn.12xlarge  4     64             48     192        900             50              $1.467
  • G3 instances provide access to NVIDIA Tesla M60 GPUs, each with up to 2,048 parallel processing cores and 8 GiB of GPU memory, in a dual-socket Intel Xeon E5 2686 v4 system.

    Instance Size  GPUs  GPU RAM (GiB)  vCPUs  RAM (GiB)  Disk Size (GB)  Network (Gbps)  Price
    g3s.xlarge     1     8              4      16         not included    20              $0.2814
    g3.4xlarge     1     8              16     32         not included    20              $0.4511
    g3.8xlarge     2     16             32     64         not included    20              $0.855
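
Prices change over time, so the figures in the tables above may be out of date. The README does not say whether they are spot or on-demand prices. Since the module creates spot instances, the following sketch queries the current spot price per availability zone with the AWS CLI. The function name is hypothetical and the "Linux/UNIX" product description is an assumption about the image in use.

```shell
# Sketch: show the current spot price of one instance type in each AZ.
# Passing "now" as the start time returns only the latest price per AZ.
current_spot_price() {
  local instance_type="$1" region="${2:-eu-central-1}"
  aws ec2 describe-spot-price-history \
    --instance-types "$instance_type" \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --region "$region" \
    --query 'SpotPriceHistory[].[AvailabilityZone,InstanceType,SpotPrice]' \
    --output table
}
```

For example, `current_spot_price g4dn.xlarge` lists one row per availability zone, which can help when choosing a value for max_spot_price.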

Requirements

Name Version
aws ~>5.14.0

Providers

Name Version
aws ~>5.14.0

Modules

Name            Source                                    Version
ec2_instance    terraform-aws-modules/ec2-instance/aws    n/a
security_group  terraform-aws-modules/security-group/aws  n/a
vpc             terraform-aws-modules/vpc/aws             n/a

Resources

Name                         Type
aws_ami.ubuntu               data source
aws_caller_identity.current  data source

Inputs

Name                      Description  Type          Default  Required
database_subnets          n/a          list(string)  n/a      yes
dhcp_options_domain_name  n/a          string        n/a      yes
ec2_instance_type         n/a          string        n/a      yes
environment_name          n/a          string        n/a      yes
ingress_rules             n/a          list(string)  n/a      yes
max_spot_price            n/a          number        n/a      yes
private_subnets           n/a          list(string)  n/a      yes
project_azs               n/a          list(string)  n/a      yes
project_name              n/a          string        n/a      yes
public_subnets            n/a          list(string)  n/a      yes
rbs_encrypted             n/a          string        n/a      yes
rbs_iops                  n/a          number        n/a      yes
rbs_size                  n/a          number        n/a      yes
rbs_type                  n/a          string        n/a      yes
region                    n/a          string        n/a      yes
spot_type                 n/a          string        n/a      yes
vpc_cidr                  n/a          string        n/a      yes
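
All seventeen inputs above are required and have no defaults, so a variables file is needed before planning. The sketch below writes one with placeholder values that match the declared types. Every value is hypothetical, as are the function name and the default file name, and how the module interprets each value (for example the accepted strings for spot_type, rbs_encrypted, or ingress_rules) is an assumption that should be checked against the module source.

```shell
# Sketch: write an example variables file covering every required input.
# All values are placeholders chosen only to match the declared types.
write_example_tfvars() {
  local out_file="${1:-example.tfvars}"
  cat > "$out_file" <<'EOF'
# Hypothetical values; adjust before use.
region                   = "eu-central-1"
project_name             = "example-project"
environment_name         = "dev"
project_azs              = ["eu-central-1a", "eu-central-1b"]
vpc_cidr                 = "10.0.0.0/16"
public_subnets           = ["10.0.1.0/24", "10.0.2.0/24"]
private_subnets          = ["10.0.11.0/24", "10.0.12.0/24"]
database_subnets         = ["10.0.21.0/24", "10.0.22.0/24"]
dhcp_options_domain_name = "example.internal"
ingress_rules            = ["ssh-tcp"]
ec2_instance_type        = "g4dn.xlarge"
spot_type                = "persistent"
max_spot_price           = 0.25
rbs_type                 = "gp3"
rbs_size                 = 100
rbs_iops                 = 3000
rbs_encrypted            = "true"
EOF
}
```

The resulting file can be passed to Terraform with `-var-file=example.tfvars` on `plan` or `apply`.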

Outputs

Name Description
public_ip_address n/a
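
Once the configuration has been applied, the public_ip_address output can be read back through the same Docker image used for `init`. The wrapper below is a sketch with a hypothetical function name `tf`. It assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are already exported, and relies on Docker forwarding a host variable when `-e` is given a name without a value.

```shell
# Sketch: run any Terraform subcommand inside the hashicorp/terraform image.
# "-e NAME" with no value passes the variable through from the host shell.
tf() {
  docker run --rm --platform linux/amd64 \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    -v "$(pwd)":/terraform -w /terraform \
    hashicorp/terraform "$@"
}
```

With this in place, `tf output -raw public_ip_address` prints the instance address, and `tf destroy` tears the project down when it is not in use.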
