Note: this page is going to be extended.
If you need to test Knative functions in a stock Knative environment (i.e., in containers) or in gVisor MicroVMs instead of Firecracker MicroVMs, use the following commands to set up the environment.
git clone https://github.com/vhive-serverless/vhive
cd vhive
./scripts/install_go.sh; source /etc/profile # or install Go manually
pushd scripts && go build -o setup_tool && popd && mv scripts/setup_tool .
./setup_tool setup_node [stock-only|gvisor|firecracker]
# start containerd (it runs in the foreground, so use a separate terminal or a screen session)
sudo containerd
./setup_tool create_one_node_cluster [stock-only|gvisor|firecracker]
# wait for the containers to boot up using
watch kubectl get pods -A
# once all the containers are ready/complete, you may start deploying Knative functions
kn service apply -f <path-to-function>.yaml
To deploy a function in the stock or gVisor execution environment, please create and use your own YAML files following this example
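As a rough sketch of what such a YAML file can look like (the service name and image below are placeholders, not files shipped with vHive), a minimal Knative Service manifest is:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-function
  namespace: default
spec:
  template:
    spec:
      containers:
        # replace with your own OCI/Docker function image
        - image: docker.io/<your-repo>/<your-function-image>

It can then be deployed with kn service apply -f <path-to-this-file>.yaml.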
# clean up the cluster when you are done
./scripts/github_runner/clean_cri_runner.sh stock-only
You can use the vhiveease/vhive_dev_env image to build, test, and develop vHive inside a kind container. This image is preconfigured to run a single-node Kubernetes cluster inside a container and contains the packages needed to set up vHive on top of it.
# Set up the host (the same script as for the self-hosted GitHub CI runner)
./scripts/github_runner/setup_integ_runners_host.sh
# pull latest image
docker pull vhiveease/vhive_dev_env
# Start a container
kind create cluster --image vhiveease/vhive_dev_env
Before running a cluster, one might need to install additional tools, e.g., Golang, and check out the vHive repository manually.
# Enter the container
docker exec -it <container name> bash
# Inside the container, create a single-node cluster
./setup_tool create_one_node_cluster [stock-only]
Notes:
- When running a vHive or stock-Knative cluster inside a kind container, do not run the setup scripts; start the daemon(s) and create the cluster right away.
- Currently, with Firecracker, only a single-node cluster is supported (issue raised). Running a multi-node cluster with stock Knative should work but is not tested.
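For example, once the vHive repository is checked out and the setup_tool is built inside the container, the remaining steps may reduce to the following sketch for stock-only mode (reusing commands shown elsewhere on this page; whether sudo is required depends on the user the container runs as):

# start containerd in a detached screen session and give it a few seconds to come up
sudo screen -dmS containerd containerd; sleep 5
# create the single-node cluster without running the host setup scripts
./setup_tool create_one_node_cluster stock-only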
# list all kind clusters
kind get clusters
# delete a cluster
kind delete cluster --name <name>
We also offer self-hosted stock-Knative environments powered by KinD. To be able to use them, follow the instructions below:
- Set jobs.<job_id>.runs-on to stock-knative.
- For your GitHub workflow, define the TMPDIR environment variable in your manifest:

env:
  TMPDIR: /root/tmp

- As the first step of all jobs, create TMPDIR if it does not exist:

jobs:
  my-job:
    name: My Job
    runs-on: [stock-knative]
    steps:
      - name: Setup TMPDIR
        run: mkdir -p $TMPDIR

- Make sure to clean up and wait for the clean-up to end! This varies for each workload, but below are some examples:

jobs:
  my-job:
    name: My Job
    runs-on: [stock-knative]
    steps:
      # ...
      - name: Cleaning
        if: ${{ always() }}
        run: |
          # ...

  - If you have used kubectl apply -f ..., then use kubectl delete -f ...
  - If you have used kn service apply, then use kn service delete -f ... --wait
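Putting the pieces above together, a complete workflow manifest might look like the following sketch (the workflow name, trigger, checkout step, and clean-up command are placeholders for your own workload):

name: My Workflow
on: [push]

env:
  TMPDIR: /root/tmp

jobs:
  my-job:
    name: My Job
    runs-on: [stock-knative]
    steps:
      - name: Setup TMPDIR
        run: mkdir -p $TMPDIR
      - uses: actions/checkout@v4
      # ... build, deploy, and test your workload here ...
      - name: Cleaning
        if: ${{ always() }}
        run: kn service delete -f ./my-function.yaml --wait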
Assuming you rented a node using the vHive CloudLab profile:
- Set up the node for the desired sandbox:
./setup_tool setup_node [firecracker|gvisor]
- Set up the CRI test environment for the desired sandbox:
./scripts/github_runner/setup_cri_test_env.sh [firecracker|gvisor]
- Run CRI tests:
source /etc/profile && go clean -testcache && go test ./cri -v -race -cover
- Cleanup:
./scripts/github_runner/clean_cri_runner.sh [firecracker|gvisor]
- vHive supports vanilla Firecracker snapshots. Our advanced Record-and-Prefetch (REAP) snapshots feature is currently disabled for the latest Firecracker version (see GH-807), but it is available for an older Firecracker version in the legacy branch. We are also working on supporting remote Firecracker snapshots (GH-823).
- vHive integrates with Kubernetes and Knative via its built-in CRI support. Currently, only Knative Serving is supported.
- vHive supports arbitrary distributed setups of a serverless cluster.
- vHive supports arbitrary functions deployed as OCI (Docker) images.
- vHive has robust Continuous Integration, and our team is committed to delivering high-quality code.
# create a folder in the local storage (on <MINIO_NODE_NAME>, which is one of the Kubernetes nodes)
sudo mkdir -p <MINIO_PATH>
cd ./configs/storage/minio
# create a persistent volume (PV) and the corresponding PV claim
# specify the node name that would host the MinIO objects
# (use `hostname` command for the local node)
MINIO_NODE_NAME=<MINIO_NODE_NAME> MINIO_PATH=<MINIO_PATH> envsubst < pv.yaml | kubectl apply -f -
kubectl apply -f pv-claim.yaml
# create a storage app and the corresponding service
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
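Before using the bucket, it may be worth verifying that the claim is bound and the storage app is up (the resource names below are the ones used in the delete commands further down):

# verify that the persistent volume, its claim, the deployment, and the service are up
kubectl get pv,pvc
kubectl get deployment minio-deployment
kubectl get svc minio-service

To remove the MinIO storage, delete the corresponding resources: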
kubectl delete deployment minio-deployment
kubectl delete pvc minio-pv-claim
kubectl delete svc minio-service
kubectl delete pv minio-pv
Note that the files in the bucket persist in the node's local filesystem after the persistent volume is removed.
Currently, vHive supports two modes of operation that enable different types of performance analysis:
- Distributed setup. Allows analysis of the end-to-end performance based on the statistics provided by the invoker client.
- Single-node setup. A test integrated with the vHive-CRI orchestrator via a programmatic interface allows analyzing the latency breakdown of boot-based and snapshot cold starts, using detailed latency and memory footprint metrics.
Knative function call requests can now be traced and visualized using Zipkin, a distributed tracing system featuring easy collection and lookup of tracing data. Here are some useful commands (there are plenty of Zipkin tutorials online):
- Set up Zipkin with ./setup_tool setup_zipkin.
- Once the Zipkin container is running, start the dashboard using istioctl dashboard zipkin.
- To access requests remotely, run ssh -L 9411:127.0.0.1:9411 <Host_IP> for port forwarding.
- Go to your browser and enter localhost:9411 for the dashboard.
- vHive uses Firecracker binaries built from the firecracker-v1.4.1-vhive-integration branch of our fork of the upstream repository (see the official firecracker-containerd getting started guide for details on the build process). Currently, we are in the process of upstreaming VM snapshot support to the upstream repository.
- The current versions are Firecracker 1.4.1, Knative 1.9, Kubernetes 1.25.3, gVisor 20210622.0, and Istio 1.16.0. We plan to keep our code loosely up to date with the upstream Firecracker repository.
- vHive uses a fork of kind to speed up the setup of testing environments that require Kubernetes.
- The current eStargz version is 0.13.0.
Knative functions can use GPUs, although only the stock-only mode is supported.
Follow the guide to set up stock Knative.
./setup_tool setup_node stock-only
The script will install the NVIDIA CUDA driver and assumes that no NVIDIA driver is currently running.
You can use the provided script if containerd was installed with our script; otherwise, manually edit the containerd settings following NVIDIA's official documentation.
The script has been tested on Ubuntu 20.04 with NVIDIA A100, V100, and P100 GPUs.
./setup_tool setup_nvidia_gpu
sudo screen -dmS containerd containerd; sleep 5;
./setup_tool create_one_node_cluster stock-only
Use Helm to install the NVIDIA device plugin after all pods are running or completed.
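If the nvdp chart repository has not been registered on the node yet, it may need to be added first (the URL below is the standard location of the NVIDIA device plugin Helm charts, not something set up by the vHive scripts):

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update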
helm install --generate-name -n nvidia-device-plugin --create-namespace nvdp/nvidia-device-plugin
At this point, all pods should be successfully deployed.
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
nvidia-device-plugin nvidia-device-plugin-1684892866-rtgk9 1/1 Running 0 80s
You can use gpu-pod.yaml to test whether the NVIDIA device plugin works:
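For reference, such a test pod usually looks like the following sketch (this mirrors NVIDIA's standard CUDA vector-add sample and is only an assumption about what configs/gpu/gpu-pod.yaml contains; the pod name matches the log command below):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
    # CUDA vector-add sample image that requests a single GPU
    - name: cuda-vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      resources:
        limits:
          nvidia.com/gpu: 1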
kubectl apply -f ./configs/gpu/gpu-pod.yaml
After that, check that the log contains:
$ kubectl logs gpu-operator-test
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Using a GPU in Knative is simple and similar to using one in a regular Kubernetes service. The only change is to add a GPU limit to the YAML file.
resources:
  limits:
    nvidia.com/gpu: 1
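For instance, embedded in a Knative Service manifest, the limit may look like the following sketch (the service name and image are placeholders):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-gpu-function
spec:
  template:
    spec:
      containers:
        - image: docker.io/<your-repo>/<your-gpu-function-image>
          resources:
            limits:
              # request a single NVIDIA GPU for each function instance
              nvidia.com/gpu: 1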
You can also deploy our example container, which is a Golang function that calls nvidia-smi and returns its output.
kn service apply -f ./configs/gpu/gpu-function.yaml
Once the service has been deployed successfully, you can call it and check its response.
curl "$(kn service describe hello-gpu -o URL)"
Hello GPU!
Wed May 24 02:04:40 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
+---------------------------------------------------------------------------------------+