This CI tool aims to create a virtual machine on AWS with a GPU-enabled Kubernetes master/node. It uses Terraform, Docker, kubeadm, the driver container, and the device plugin to set up the VM.
WARNING: This tool is in alpha.
This repository can be used to create and configure a development instance that mirrors the configuration used in CI. Run `terraform init` to initialize the local terraform environment.

The variables for the `plan`, `apply`, and `destroy` commands are explicitly overridden. This can be done using the command line as shown below, or by creating a `.local-dev.auto.tfvars` file with the required overrides, as terraform automatically includes `*.auto.tfvars` files.
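For example, the overrides could be captured in a `.local-dev.auto.tfvars` file as sketched below; the region, key name, and private key path are placeholders matching the commands shown later and should be replaced with your own values:

```bash
# Sketch: capture the overrides in a .local-dev.auto.tfvars file so they no
# longer have to be passed on the command line. The values are placeholders.
cat > .local-dev.auto.tfvars <<'EOF'
region      = "us-east-2"
key_name    = "elezar"
private_key = "/Users/elezar/.ssh/elezar.pem"
EOF
```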
If running outside of the corporate VPN, you first need to add your IP address to the `ingress_ip_ranges` list in the `local-dev.tfvars` file. Then run `terraform plan` to check that everything is ready:
```bash
terraform plan \
    -var "region=us-east-2" \
    -var "key_name=elezar" \
    -var "private_key=/Users/elezar/.ssh/elezar.pem" \
    -var-file=local-dev.tfvars
```
This will preview the changes that will be applied. Note that this assumes valid AWS credentials have been configured, and that an AWS key pair named `elezar` has already been created using the public key associated with the private key `/Users/elezar/.ssh/elezar.pem`; change these values accordingly. Then create the instance by running `terraform apply`:
```bash
terraform apply \
    -auto-approve \
    -var "region=us-east-2" \
    -var "key_name=elezar" \
    -var "private_key=/Users/elezar/.ssh/elezar.pem" \
    -var-file=local-dev.tfvars
```
This will create the required resources.
Once this is complete, the instance hostname can be obtained using:
```bash
export instance_hostname=$(terraform output -raw instance_hostname)
```
Assuming that the private key specified during creation has been added to the ssh agent:
```bash
eval $(ssh-agent)
ssh-add /Users/elezar/.ssh/elezar.pem
```
running:
```bash
ssh ${instance_hostname}
```
should connect to the created instance. Alternatively, the identity can be explicitly specified:
```bash
ssh -i /Users/elezar/.ssh/elezar.pem ${instance_hostname}
```
To start using the Kubernetes cluster, first retrieve the kubeconfig file:
```bash
scp -i /Users/elezar/.ssh/elezar.pem ${instance_hostname}:/home/ubuntu/.kube/config kubeconfig
```
Now we can access the Kubernetes cluster from our host:
```
kubectl --kubeconfig kubeconfig get node
NAME           STATUS   ROLES                  AGE     VERSION
ip-10-0-0-14   Ready    control-plane,master   7m51s   v1.23.10
```
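Since the node is provisioned with the NVIDIA driver container and device plugin, it can also be useful to check that the GPU resource is advertised. A minimal sketch, assuming the device plugin has finished registering (the reported count depends on the instance type):

```bash
# Check that the node advertises the nvidia.com/gpu resource.
kubectl --kubeconfig kubeconfig describe node | grep -i "nvidia.com/gpu"
```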
In order to remove the created resources, run:
```bash
terraform destroy \
    -auto-approve \
    -var "region=us-east-2" \
    -var-file=local-dev.tfvars
```
By using the `container_runtime` variable, the provisioned node can be set up to run either docker or containerd as the container runtime. The default is docker and can be changed using the `-var "legacy_setup=false" -var "container_runtime=containerd"` command line arguments when running `terraform apply` (or `destroy`).
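As a sketch, the `apply` invocation from above could be switched to containerd by adding these flags (the region, key name, and private key path remain placeholders):

```bash
# Same apply command as above, with the node set up to use containerd.
terraform apply \
    -auto-approve \
    -var "legacy_setup=false" \
    -var "container_runtime=containerd" \
    -var "region=us-east-2" \
    -var "key_name=elezar" \
    -var "private_key=/Users/elezar/.ssh/elezar.pem" \
    -var-file=local-dev.tfvars
```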
To use this CI tool, you need to:
- Place this directory at the root of your repo, with `aws-kube-ci` as its name (you may want to use submodules). For example:

  ```bash
  git submodule add https://gitlab.com/nvidia/container-infrastructure/aws-kube-ci.git
  ```
- In your `.gitlab-ci.yml`, define the stages `aws_kube_setup` and `aws_kube_clean`.
- In your `.gitlab-ci.yml`, include the `aws-kube-ci.yml` file. It is strongly recommended to include a specific version. The version must be a git ref. For example:

  ```yaml
  include:
    project: nvidia/container-infrastructure/aws-kube-ci
    file: aws-kube-ci.yml
    ref: vX.Y
  ```
- In your `.gitlab-ci.yml`, extend the `.aws_kube_setup` and `.aws_kube_clean` jobs. For example:
  ```yaml
  aws_kube_setup:
    extends: .aws_kube_setup

  aws_kube_clean:
    extends: .aws_kube_clean
  ```
  Note that we include the project rather than a local file here because GitLab doesn't support including local files from submodules.
- Write a terraform variable file with these variables:
  - `instance_type`: The AWS instance type
  - `project_name`: The name of your project

  For example:

  ```
  instance_type = "g2.2xlarge"
  project_name = "my-project"
  ```
- In your `.gitlab-ci.yml`, set the `TF_VAR_FILE` variable to the path of the previous file.
- Write your CI task. You should write your tasks in a stage which is between `aws_kube_setup` and `aws_kube_clean`.
- Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variables in the CI/CD settings of the project.
The `aws_kube_setup` job will expose different files as artifacts:
- The private key to connect to the VM using ssh: `aws-kube-ci/key`
- The associated public key: `aws-kube-ci/key.pub`
- A file to source to get the `user@hostname` in the env: `aws-kube-ci/hostname`. The env variable is `instance_hostname`.
A Docker registry is running on the VM once the setup is done. You can access it at `127.0.0.1:5000`. This can be useful if you need to build a local image and use it in the Kubernetes cluster.
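For example (a sketch only; `my_image` and `my-pod` are placeholder names, with `my_image` matching the CI example below), an image built on the instance can be pushed to this registry and then referenced from the cluster:

```bash
# On the instance: build a local image, push it to the in-VM registry,
# and run it in the Kubernetes cluster. Image and pod names are placeholders.
docker build -t 127.0.0.1:5000/my_image .
docker push 127.0.0.1:5000/my_image
kubectl run my-pod --image=127.0.0.1:5000/my_image
```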
A simple `.gitlab-ci.yml` could be:
```yaml
variables:
  GIT_SUBMODULE_STRATEGY: recursive
  TF_VAR_FILE: "$CI_PROJECT_DIR/variables.tfvars"

stages:
  - aws_kube_setup
  - my_stage
  - aws_kube_clean

aws_kube_setup:
  extends: .aws_kube_setup

job:
  stage: my_stage
  script:
    - source aws-kube-ci/hostname
    - ssh -i aws-kube-ci/key $instance_hostname "echo Hello world!"
    - ssh -i aws-kube-ci/key $instance_hostname "docker push 127.0.0.1:5000/my_image"
  dependencies:
    - aws_kube_setup

aws_kube_clean:
  extends: .aws_kube_clean

include:
  project: nvidia/container-infrastructure/aws-kube-ci
  file: aws-kube-ci.yml
  ref: vX.Y
```