# Artifact Evaluation Instructions for "Acto: Push-Button End-to-End Testing of Kubernetes Operators and Controllers"

## Create experiment container

This container provides the following:

- One node of type "compute_cascadelake_r", matching the reservation cell below ([see all types](https://chameleoncloud.readthedocs.io/en/latest/technical/reservations.html#chameleon-node-types))
- One public IP

### Configuration

If you are not a member of project `CHI-231080`, replace it with your own project ID in the code cell below.

In [1]:
import chi

chi.use_site("CHI@UC")
chi.set("project_name", "CHI-231080")

print(f'Using Project {chi.get("project_name")}')

Now using CHI@UC:
URL: https://chi.uc.chameleoncloud.org
Location: Argonne National Laboratory, Lemont, Illinois, USA
Support contact: help@chameleoncloud.org
Using Project CHI-231080


### Create reservation

Chameleon resources need to be reserved before they can be used.
For now, we will reserve one bare metal node and one public IP address.

If you get an error such as "no host available", all of our nodes may already be reserved. Check the availability calendar to see if this is the case:
https://chi.uc.chameleoncloud.org/project/leases/calendar/host/

It may take around a minute or so for your lease to become active.
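
The cells below build the lease and server names from `os.getenv('USER')`, which returns `None` when the variable is unset and would produce a name such as `None-power-management`. A minimal sketch of a fallback follows; `resource_name` and the `"anonymous"` default are hypothetical and not part of `python-chi`.

```python
import os


def resource_name(suffix, env=None):
    """Build '<user>-<suffix>', falling back when USER is not set."""
    env = os.environ if env is None else env
    # Try USER first, then LOGNAME, then a fixed placeholder (an assumption).
    user = env.get("USER") or env.get("LOGNAME") or "anonymous"
    return f"{user}-{suffix}"
```

The result could be used wherever the cells below write `f"{os.getenv('USER')}-power-management"`.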

In [2]:
import os

USER = os.getenv('USER')

In [3]:
import os
import keystoneauth1, blazarclient
from chi import lease

reservations = []
lease_node_type = "compute_cascadelake_r"

try:
    print("Creating lease...")
    lease.add_fip_reservation(reservations, count=1)
    lease.add_node_reservation(reservations, node_type=lease_node_type, count=1)

    start_date, end_date = lease.lease_duration(hours=3)

    l = lease.create_lease(
        f"{os.getenv('USER')}-power-management", 
        reservations, 
        start_date=start_date, 
        end_date=end_date
    )
    lease_id = l["id"]

    print("Waiting for lease to start ...")
    lease.wait_for_active(lease_id)
    print("Lease is now active!")
except keystoneauth1.exceptions.http.Unauthorized as e:
    print("Unauthorized.\nDid you set your project name and run the code in the first cell?")
except blazarclient.exception.BlazarClientException as e:
    print(f"There is an issue making the reservation. Check the calendar to make sure a {lease_node_type} node is available.")
    print("https://chi.uc.chameleoncloud.org/project/leases/calendar/host/")
    print(e)
except Exception as e:
    print("An unexpected error happened.")
    print(e)

Creating lease...
Waiting for lease to start ...
Lease is now active!


### Provision bare metal node

Next, we will launch the reserved node with an image.
Provisioning the bare metal node takes approximately 10 minutes.

This step takes the longest. Our controller node first boots the requested node into a deploy image. That image then downloads the real image and copies it onto the hard drive, and the node is configured to reboot into the new OS.

You can browse the images we offer in our appliance catalog: http://chameleoncloud.org/appliances

In [4]:
from chi import server, lease

image = "CC-Ubuntu22.04"

s = server.create_server(
    f"{os.getenv('USER')}-power-management", 
    image_name=image,
    reservation_id=lease.get_node_reservation(lease_id)
)

print("Waiting for server to start ...")
server.wait_for_active(s.id)
print("Done")

Waiting for server to start ...
Done


In [5]:
floating_ip = lease.get_reserved_floating_ips(lease_id)[0]
with open("floating_ip.txt", "w") as f:
    f.write(f"{floating_ip}")
server.associate_floating_ip(s.id, floating_ip_address=floating_ip)

print(f"Waiting for SSH connectivity on {floating_ip} ...")
timeout = 60*2
import socket
import time
# Repeatedly try to connect via SSH.
start_time = time.perf_counter()
while True:
    try:
        with socket.create_connection((floating_ip, 22), timeout=timeout):
            print("Connection successful")
            break
    except OSError as ex:
        time.sleep(10)
        if time.perf_counter() - start_time >= timeout:
            print(f"After {timeout} seconds, could not connect via SSH. Please try again.")

Waiting for SSH connectivity on 192.5.87.208 ...
After 120 seconds, could not connect via SSH. Please try again.
After 120 seconds, could not connect via SSH. Please try again.
After 120 seconds, could not connect via SSH. Please try again.
Connection successful
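
In the cell above, the timeout message is printed once the 120 seconds have elapsed, but the loop keeps retrying, which is why the recorded output repeats the message before the connection succeeds. The sketch below shows one way to separate the per-attempt timeout from the overall deadline. The helper name `wait_for_port` and its default values (a 600-second budget, 5 seconds per attempt) are assumptions and are not taken from the artifact.

```python
import socket
import time


def wait_for_port(host, port=22, total_timeout=600, attempt_timeout=5, interval=10):
    """Return True once a TCP connection succeeds, False after total_timeout seconds."""
    deadline = time.monotonic() + total_timeout
    while True:
        try:
            # Each attempt gets its own short timeout.
            with socket.create_connection((host, port), timeout=attempt_timeout):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            # Sleep between attempts, but never past the deadline.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

With this helper, the cell could print a single success or failure message depending on the value returned by `wait_for_port(floating_ip)`.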


## Set up the environment on the node (~10 minutes)

In [6]:
import os
from chi import ssh
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "ansible"])
subprocess.run(["ansible-galaxy", "collection", "install", "ansible.posix"])
subprocess.run(["ansible-galaxy", "collection", "install", "community.general"])

with open("./ansible/ansible_hosts", mode="w") as f:
    f.write("{} ansible_connection=ssh ansible_user=cc ansible_port=22".format(floating_ip))
    
os.system("cd ./ansible && ansible-playbook -i ansible_hosts configure.yaml --key-file /work/.ssh/id_rsa")

Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'ansible.posix:1.5.4' to '/home/zhent6_illinois_edu/.ansible/collections/ansible_collections/ansible/posix'
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/ansible-posix-1.5.4.tar.gz to /home/zhent6_illinois_edu/.ansible/tmp/ansible-local-1294d8wt5hw/tmp8ul9hr_2
ansible.posix (1.5.4) was installed successfully
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'community.general:9.1.0' to '/home/zhent6_illinois_edu/.ansible/collections/ansible_collections/community/general'
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/community-general-9.1.0.tar.gz to /home/zhent6_illinois_edu/.ansible/tmp/ansible-local-130pewcxgzd/tmpj5m_a8mz
community.general (9.1.0) was installed successfully


# 192.5.87.208:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
# 192.5.87.208:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
# 192.5.87.208:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
# 192.5.87.208:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
# 192.5.87.208:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6



PLAY [Install everything] ******************************************************

TASK [Gathering Facts] *********************************************************
ok: [192.5.87.208]

PLAY [Mount file systems] ******************************************************

TASK [Gathering Facts] *********************************************************
ok: [192.5.87.208]

TASK [Mount tmpfs on docker path /var/lib/docker] ******************************
changed: [192.5.87.208]

TASK [Get the home directory] **************************************************
changed: [192.5.87.208]

TASK [create work dir] *********************************************************
changed: [192.5.87.208]

TASK [Change ownership of acto dir] ********************************************
changed: [192.5.87.208]

PLAY [Install go] **************************************************************

TASK [Gathering Facts] *********************************************************
ok: [192.5.87.208]

TASK [Remove old go] *****

If you need to use command because get_url or uri is insufficient you can add
ansible.cfg to get rid of this message.


changed: [192.5.87.208]

TASK [install kubectl] *********************************************************
changed: [192.5.87.208]

PLAY [Configure system inotify params] *****************************************

TASK [Gathering Facts] *********************************************************
ok: [192.5.87.208]

TASK [Configure fs.inotify.max_user_watches] ***********************************
changed: [192.5.87.208]

TASK [Configure fs.inotify.max_user_instances] *********************************
changed: [192.5.87.208]

TASK [Read parameter fs.inotify.max_user_watches value] ************************
changed: [192.5.87.208]

TASK [Read parameter fs.inotify.max_user_instances value] **********************
changed: [192.5.87.208]

TASK [Print configuration parameters] ******************************************
ok: [192.5.87.208] => {
    "msg": "fs.inotify.max_user_watches = 1048576 \n fs.inotify.max_user_instances = 1024"
}

PLAY [Install K3D] ********************************************

If you need to use command because get_url or uri is insufficient you can add
ansible.cfg to get rid of this message.


changed: [192.5.87.208]

PLAY [Install k9s] *************************************************************

TASK [Gathering Facts] *********************************************************
ok: [192.5.87.208]

TASK [download k9s tar.gz] *****************************************************
changed: [192.5.87.208]

TASK [extract k9s tar.gz] ******************************************************
changed: [192.5.87.208]

PLAY [Install htop] ************************************************************

TASK [Gathering Facts] *********************************************************
ok: [192.5.87.208]

TASK [install htop] ************************************************************
changed: [192.5.87.208]

PLAY RECAP *********************************************************************
192.5.87.208               : ok=62   changed=40   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   



CompletedProcess(args=['ansible-playbook', '-i', './ansible/ansible_hosts', './ansible/configure.yaml', '--key-file', '$HOME/work/.ssh/id_rsa'], returncode=0)
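
The cell above launches the playbook with `os.system`, which returns the exit status rather than raising on failure, so a failed playbook would not stop the notebook. A minimal sketch of an alternative using `subprocess.run(..., check=True)` follows. The helper names are hypothetical; the inventory fields, file names, and key path are copied from the cell above.

```python
import subprocess


def inventory_line(host, user="cc", port=22):
    """Format one Ansible inventory entry for the provisioned node."""
    return f"{host} ansible_connection=ssh ansible_user={user} ansible_port={port}"


def playbook_command(inventory="ansible_hosts", playbook="configure.yaml",
                     key_file="/work/.ssh/id_rsa"):
    """Build the ansible-playbook argument list used in this notebook."""
    return ["ansible-playbook", "-i", inventory, playbook, "--key-file", key_file]


def run_playbook(cwd="./ansible"):
    """Run the playbook; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(playbook_command(), cwd=cwd, check=True)
```

Passing the command as a list avoids shell quoting, and `cwd` replaces the `cd ./ansible &&` prefix.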

In [7]:
from chi import ssh

with ssh.Remote(floating_ip) as conn:
    conn.put("requirements.sh")
    conn.run("bash requirements.sh")



Defaulting to user installation because normal site-packages is not writeable
Collecting deepdiff~=6.3.0
  Using cached deepdiff-6.3.1-py3-none-any.whl (70 kB)
Collecting kubernetes==22.6.0
  Using cached kubernetes-22.6.0-py2.py3-none-any.whl (1.5 MB)
Collecting exrex~=0.11.0
  Using cached exrex-0.11.0-py2.py3-none-any.whl (23 kB)
Collecting jsonschema~=4.17.3
  Using cached jsonschema-4.17.3-py3-none-any.whl (90 kB)
Collecting jsonpatch~=1.33
  Using cached jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting pandas~=2.0.2
  Using cached pandas-2.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Collecting PyYAML~=6.0
  Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
Collecting requests~=2.31.0
  Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting pytest~=7.4.0
  Using cached pytest-7.4.4-py3-none-any.whl (325 kB)
Collecting pydantic~=1.10.9
  Using cached pydantic-1.10.17-cp310-cp310-manylinux_2_17_



Successfully installed PyYAML-6.0.1 cachetools-5.3.3 charset-normalizer-3.3.2 coverage-7.5.4 deepdiff-6.3.1 exceptiongroup-1.2.1 exrex-0.11.0 google-auth-2.31.0 iniconfig-2.0.0 jsonpatch-1.33 jsonschema-4.17.3 kubernetes-22.6.0 numpy-2.0.0 ordered-set-4.1.0 packaging-24.1 pandas-2.0.3 pluggy-1.5.0 pydantic-1.10.17 pytest-7.4.4 pytest-cov-4.1.0 python-dateutil-2.9.0.post0 requests-2.31.0 requests-oauthlib-2.0.0 rsa-4.9 tabulate-0.9.0 tomli-2.0.1 typing-extensions-4.12.2 tzdata-2024.1 websocket-client-1.8.0
Defaulting to user installation because normal site-packages is not writeable
Collecting numpy<1.24
  Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 61.1 MB/s eta 0:00:00
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 2.0.0
    Uninstalling numpy-2.0.0:
      Successfully uninstalled numpy-2.0.0




Successfully installed numpy-1.23.5
make: Entering directory '/home/cc/workdir/acto'
(cd acto/k8s_util/lib && make)
make[1]: Entering directory '/home/cc/workdir/acto/acto/k8s_util/lib'
go build -buildmode=c-shared -o k8sutil.so k8sutil.go


go: downloading k8s.io/apimachinery v0.24.0
go: downloading gopkg.in/inf.v0 v0.9.1
go: downloading github.com/gogo/protobuf v1.3.2


gcc test.c -o test ./k8sutil.so
make[1]: Leaving directory '/home/cc/workdir/acto/acto/k8s_util/lib'
(cd ssa && make)
make[1]: Entering directory '/home/cc/workdir/acto/ssa'
go build -buildmode=c-shared -o libanalysis.so ssa.go


go: downloading golang.org/x/tools v0.1.10
go: downloading github.com/goki/ki v1.1.8
go: downloading github.com/jinzhu/copier v0.3.2
go: downloading golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a
go: downloading golang.org/x/xerrors v0.0.0-20220517211312-f3a8303e98df
go: downloading golang.org/x/mod v0.6.0-dev.0.20220106191415-9b9b3d81d5e3


make[1]: Leaving directory '/home/cc/workdir/acto/ssa'
make: Leaving directory '/home/cc/workdir/acto'


## Run the experiment
By following the instructions below, you will reproduce all 56 bugs that were found by Acto and confirmed by developers.

The process will take approximately 6 hours. 
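
The reservation cell requests the lease with `lease.lease_duration(hours=3)`, while this notebook quotes roughly 10 minutes for provisioning, 10 minutes for setup, and 6 hours for the experiment. It may be worth checking that the lease covers the whole run. A minimal sketch of that arithmetic follows, assuming the estimates above; the 30-minute margin and the names `STEPS` and `required_lease_hours` are illustrative and not part of the artifact.

```python
import math
from datetime import timedelta

# Durations as quoted in this notebook (approximate).
STEPS = {
    "provision": timedelta(minutes=10),
    "setup": timedelta(minutes=10),
    "experiment": timedelta(hours=6),
}


def required_lease_hours(steps, margin=timedelta(minutes=30)):
    """Sum the step durations plus a safety margin and round up to whole hours."""
    total = sum(steps.values(), timedelta()) + margin
    return math.ceil(total / timedelta(hours=1))


print(required_lease_hours(STEPS))  # 7
```

The result could be passed as the `hours` argument of `lease.lease_duration` when creating the lease.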

In [8]:
from chi import ssh

with ssh.Remote(floating_ip) as conn:
    conn.put("start_acto.sh")
    print("Start reproducing all bugs...")
    print("Please wait 6 hours...")
    conn.run("bash start_acto.sh", disown=True)

Start reproducing all bugs...
Please wait 6 hours...


## Generate results

The following cell can be run independently of the cells above: it reads the node's address from `floating_ip.txt`, gathers all the results from the reproduction, and generates Tables 5, 6, 7, and 8 of the paper.
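
Once the cell below has downloaded `table5.txt`, the per-operator totals can be extracted and compared against the paper. The sketch below is one way to do this, assuming the plain-text layout shown in the recorded output (one operator per row, operator names without spaces, the total in the last column). `totals_by_operator` is a hypothetical helper; if the file ends with a summary row, that row would need to be excluded before summing.

```python
def totals_by_operator(table_text):
    """Map each operator name to the integer in the last column of its row."""
    totals = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows end in an integer; header and separator rows do not.
        if len(fields) >= 2 and fields[-1].isdigit():
            totals[fields[0]] = int(fields[-1])
    return totals
```

For example, `sum(totals_by_operator(open("table5.txt").read()).values())` gives the overall count for the rows that were parsed.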

In [1]:
from chi import ssh

with open("floating_ip.txt", "r") as f:
    floating_ip = f.read().strip()
    
with ssh.Remote(floating_ip) as conn:
    conn.get("./workdir/acto/table5.txt")
    conn.get("./workdir/acto/table6.txt")
    conn.get("./workdir/acto/table7.txt")
    
with open('table5.txt', 'r') as f:
    print("Table 5:\n" + f.read() + "\n")
    
with open('table6.txt', 'r') as f:
    print("Table 6:\n" + f.read() + "\n")
    
with open('table7.txt', 'r') as f:
    print("Table 7:\n" + f.read() + "\n")
    
with ssh.Remote(floating_ip) as conn:
    print("Table 8:")
    conn.run("cd ./workdir/acto/ && python3 collect_number_of_ops.py")

Table 5:
Operator         Undesired State    System Error    Operator Error    Recovery Failure    Total
-------------  -----------------  --------------  ----------------  ------------------  -------
CassOp                         2               0                 0                   2        4
CockroachOp                    3               0                 2                   0        5
KnativeOp                      1               0                 2                   0        3
OCK-RedisOp                    4               1                 3                   1        9
OFC-MongoDBOp                  3               1                 2                   2        8
PCN-MongoDBOp                  4               0                 0                   1        5
RabbitMQOp                     3               0                 0                   0        3
SAH-RedisOp                    2               0                 0                   1        3
TiDBOp                         