This CI tool creates a virtual machine on AWS with a GPU-enabled Kubernetes master/node. It uses Terraform, Docker, kubeadm, the driver container, and the device plugin to set up the VM.
WARNING: This tool is in alpha.
This repository can be used to create and configure a development instance that mirrors the configuration used in CI. Run

```shell
terraform init
```

to initialize the local Terraform environment. The variables for the `plan`, `apply`, and `destroy` commands are explicitly overridden. This can be done on the command line as shown below, or by creating a `.local-dev.auto.tfvars` file with the required overrides, since Terraform automatically includes `*.auto.tfvars` files.
If running outside of the corp VPN, you first need to add your IP to the `ingress_ip_ranges` list in the `local-dev.tfvars` file. Then run `plan` to check that everything is ready:
```shell
terraform plan \
    -var "region=us-east-2" \
    -var "key_name=elezar" \
    -var "private_key=/Users/elezar/.ssh/elezar.pem" \
    -var-file=local-dev.tfvars
```

This will preview the changes that will be applied. Note that this assumes valid AWS credentials have been configured. It also assumes that an AWS key named `elezar` has already been created using the public key associated with the private key `/Users/elezar/.ssh/elezar.pem`, so change these values accordingly. Then create the instance by running `apply`:
```shell
terraform apply \
    -auto-approve \
    -var "region=us-east-2" \
    -var "key_name=elezar" \
    -var "private_key=/Users/elezar/.ssh/elezar.pem" \
    -var-file=local-dev.tfvars
```

This will create the required resources.
Once this is complete, the instance hostname can be obtained using:
```shell
export instance_hostname=$(terraform output -raw instance_hostname)
```

Assuming that the private key specified during creation has been added to the ssh agent:
```shell
eval $(ssh-agent)
ssh-add /Users/elezar/.ssh/elezar.pem
```

running:

```shell
ssh ${instance_hostname}
```

should connect to the created instance. Alternatively, the identity can be specified explicitly:

```shell
ssh -i /Users/elezar/.ssh/elezar.pem ${instance_hostname}
```

To start using the Kubernetes cluster, first retrieve the kubeconfig file:
```shell
scp -i /Users/elezar/.ssh/elezar.pem ${instance_hostname}:/home/ubuntu/.kube/config kubeconfig
```

Now we can use the Kubernetes cluster from our host:
```shell
kubectl --kubeconfig kubeconfig get node
```

```
NAME           STATUS   ROLES                  AGE     VERSION
ip-10-0-0-14   Ready    control-plane,master   7m51s   v1.23.10
```

To remove the created resources, run:
```shell
terraform destroy \
    -auto-approve \
    -var "region=us-east-2" \
    -var-file=local-dev.tfvars
```

By using the `container_runtime` variable, the provisioned node can be set up to run either `docker` or `containerd` as the container runtime. The default is `docker`; this can be changed by adding the `-var "legacy_setup=false" -var "container_runtime=containerd"` command-line arguments when running `terraform apply` (or `destroy`).
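Putting the runtime flags together with the earlier example, a full `apply` invocation for a containerd-based node might look as follows (a sketch; adjust the region, key name, and key path to your setup):

```shell
terraform apply \
    -auto-approve \
    -var "region=us-east-2" \
    -var "key_name=elezar" \
    -var "private_key=/Users/elezar/.ssh/elezar.pem" \
    -var "legacy_setup=false" \
    -var "container_runtime=containerd" \
    -var-file=local-dev.tfvars
```

The same two `-var` arguments apply to the matching `terraform destroy` invocation.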
To use this CI tool, you need to:
- Place this directory at the root of your repo, with `aws-kube-ci` as its name (you may want to use submodules). For example:

  ```shell
  git submodule add https://gitlab.com/nvidia/container-infrastructure/aws-kube-ci.git
  ```

- In your `.gitlab-ci.yml`, define the stages `aws_kube_setup` and `aws_kube_clean`.
- In your `.gitlab-ci.yml`, include the `aws-kube-ci.yml` file. It is strongly recommended to include a specific version. The version must be a git ref. For example:

  ```yaml
  include:
    - project: nvidia/container-infrastructure/aws-kube-ci
      file: aws-kube-ci.yml
      ref: vX.Y
  ```

- In your `.gitlab-ci.yml`, extend the `.aws_kube_setup` and `.aws_kube_clean` jobs. For example:

  ```yaml
  aws_kube_setup:
    extends: .aws_kube_setup

  aws_kube_clean:
    extends: .aws_kube_clean
  ```

  Note that we include `project` rather than `file` here because GitLab doesn't support including local files from submodules.

- Write a Terraform variables file with these variables:
  - `instance_type`: the AWS instance type
  - `project_name`: the name of your project

  For example:

  ```
  instance_type = "g2.2xlarge"
  project_name  = "my-project"
  ```

- In your `.gitlab-ci.yml`, set `TF_VAR_FILE` to the path of the previous file.
- Write your CI tasks in a stage that runs between `aws_kube_setup` and `aws_kube_clean`.
- Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variables in the CI/CD settings of the project.
The `aws_kube_setup` job will expose different files as artifacts:

- The private key to connect to the VM using ssh: `aws-kube-ci/key`
- The associated public key: `aws-kube-ci/key.pub`
- A file to source to get the `user@hostname` in the env: `aws-kube-ci/hostname`. The env variable is `instance_hostname`.
A Docker registry is running on the VM once the setup is done. You can access it at `127.0.0.1:5000`. This can be useful if you need to build a local image and use it in the Kubernetes cluster.
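For example, an image can be built on the VM and pushed to the local registry so that the cluster can pull it (a sketch assuming the default `docker` runtime; `my_image` and the build context are hypothetical names):

```shell
# Run on the VM (e.g. over ssh): build a local image, tag it for the
# local registry, and push it so Kubernetes can pull it.
docker build -t 127.0.0.1:5000/my_image .
docker push 127.0.0.1:5000/my_image
```

The image can then be referenced from a pod spec as `image: 127.0.0.1:5000/my_image`.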
A simple .gitlab-ci.yml could be:
```yaml
variables:
  GIT_SUBMODULE_STRATEGY: recursive
  TF_VAR_FILE: "$CI_PROJECT_DIR/variables.tfvars"

stages:
  - aws_kube_setup
  - my_stage
  - aws_kube_clean

aws_kube_setup:
  extends: .aws_kube_setup

job:
  stage: my_stage
  script:
    - source aws-kube-ci/hostname
    - ssh -i aws-kube-ci/key $instance_hostname "echo Hello world!"
    - ssh -i aws-kube-ci/key $instance_hostname "docker push 127.0.0.1:5000/my_image"
  dependencies:
    - aws_kube_setup

aws_kube_clean:
  extends: .aws_kube_clean

include:
  - project: nvidia/container-infrastructure/aws-kube-ci
    file: aws-kube-ci.yml
    ref: vX.Y
```