# OS Task Replication with GPT-4
In this notebook I aim to replicate the OS task results from the AgentBench paper, using `gpt-4-0613` as the agent.


### 1. Setup
Note: this assumes Docker is installed locally. 

In [2]:
# check if docker is installed
!docker ps

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


In [7]:
# install the required packages into the current environment (e.g. an activated conda env)
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [2]:
# build docker images for the os task
!docker pull mysql
!docker pull ubuntu
!docker build -f data/os_interaction/res/dockerfiles/default data/os_interaction/res/dockerfiles --tag local-os/default
!docker build -f data/os_interaction/res/dockerfiles/packages data/os_interaction/res/dockerfiles --tag local-os/packages
!docker build -f data/os_interaction/res/dockerfiles/ubuntu data/os_interaction/res/dockerfiles --tag local-os/ubuntu

Using default tag: latest
latest: Pulling from library/mysql
Digest: sha256:4a4e5e2a19aab7a67870588952e8f401e17a330466ecfc55c9acf51196da5bd0
Status: Image is up to date for mysql:latest
docker.io/library/mysql:latest
Using default tag: latest
latest: Pulling from library/ubuntu
Digest: sha256:3f85b7caad41a95462cf5b787d8a04604c8262cdcdf9a472b8c52ef83375fe15
Status: Image is up to date for ubuntu:latest
docker.io/library/ubuntu:latest
[+] Building 0.1s (6/6) FINISHED                                 docker:default
 => [internal] load build definition from default                          0.0s
 => => transferring dockerfile: 291B                                       0.0s
 => [internal] load metadata for docker.io/library/ubuntu:latest           0.0s
 => [internal] load .dockerignore

### 2. Set API Key
Add your OpenAI API key to the `configs/agents/openai-chat.yaml` file. The line should read
`Authorization: Bearer YOUR_KEY_HERE`, with `YOUR_KEY_HERE` replaced by your key.
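
Before starting a long run, it can help to confirm that the placeholder was actually replaced. The following is a minimal sketch that only scans the file's text for the `Authorization: Bearer ...` line described above; the helper name `find_bearer_token_problem` is hypothetical and not part of AgentBench, and the check makes no assumption about the rest of the YAML structure.

```python
import re

# Placeholder text used in the instructions above.
PLACEHOLDER = "YOUR_KEY_HERE"


def find_bearer_token_problem(config_text):
    """Return a short description of what is wrong with the
    Authorization line, or None if it looks filled in.

    Hypothetical helper: it inspects raw text only and does not
    check whether the key is accepted by the API.
    """
    # Allow an optional quote before "Bearer" in case the value is quoted.
    match = re.search(r"Authorization:\s*['\"]?Bearer[ \t]*(\S*)", config_text)
    if match is None:
        return "no 'Authorization: Bearer ...' line found"
    token = match.group(1).strip("'\"")
    if not token:
        return "the Bearer token is empty"
    if token == PLACEHOLDER:
        return "the placeholder has not been replaced"
    return None


if __name__ == "__main__":
    sample = "headers:\n  Authorization: Bearer YOUR_KEY_HERE\n"
    print(find_bearer_token_problem(sample))
```

To check the real file, read `configs/agents/openai-chat.yaml` into a string and pass it to the function; a return value of `None` means the line looks filled in.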

### 3. Start task server
This step needs to be done in a _separate_ terminal window. Open a terminal, navigate to the repo directory and run 

`python -m src.start_task --auto-controller --start os-std 2`

Note: leave the terminal open.
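
Since the server runs in another terminal, a quick way to confirm it is up before starting the assigner is to try opening a TCP connection to it. This is a rough sketch: the host and port (`localhost:5000`) are taken from the assigner log further down (`http://localhost:5000/api`) and may differ in your setup, and the function name `server_is_listening` is hypothetical. It only checks that something accepts connections on the port, not that the task workers are ready.

```python
import socket


def server_is_listening(host="localhost", port=5000, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Assumption: the task server listens on localhost:5000, as the
    assigner log in this notebook suggests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved hosts.
        return False


if __name__ == "__main__":
    print("task server reachable:", server_is_listening())
```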

### 4. Create assigner config file

In [16]:
import yaml

cfg = {
    "import": "definition.yaml",
    "concurrency": {
        "task": {"os-std": 5},
        "agent": {"gpt-4-0613": 5},
    },
    "assignments": [
        {"agent": "gpt-4-0613", "task": "os-std"},
    ],
    "output": "outputs/{TIMESTAMP}",
}

# write the assigner config; sort_keys=False keeps the key order used above
with open("./configs/assignments/custom.yaml", "w") as outfile:
    yaml.dump(cfg, outfile, default_flow_style=False, sort_keys=False)
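
A small consistency check on the dictionary can catch typos in agent or task names before the assigner is started. The sketch below assumes that every agent and task named under `assignments` should also have a positive integer entry under `concurrency`; that rule is my reading of the config above, not a documented AgentBench requirement, and `check_assignments` is a hypothetical helper.

```python
def check_assignments(cfg):
    """Return a list of problems found in an assigner config dict.

    Assumption (not confirmed by AgentBench docs): each agent and
    task used in "assignments" needs a positive integer limit under
    "concurrency".
    """
    problems = []
    concurrency = cfg.get("concurrency", {})
    for i, assignment in enumerate(cfg.get("assignments", [])):
        for kind in ("agent", "task"):
            name = assignment.get(kind)
            limit = concurrency.get(kind, {}).get(name)
            if limit is None:
                problems.append(
                    f"assignment {i}: {kind} {name!r} has no concurrency entry"
                )
            elif not isinstance(limit, int) or limit < 1:
                problems.append(
                    f"assignment {i}: {kind} {name!r} has invalid limit {limit!r}"
                )
    return problems


if __name__ == "__main__":
    example = {
        "import": "definition.yaml",
        "concurrency": {"task": {"os-std": 5}, "agent": {"gpt-4-0613": 5}},
        "assignments": [{"agent": "gpt-4-0613", "task": "os-std"}],
        "output": "outputs/{TIMESTAMP}",
    }
    print(check_assignments(example))
```

An empty list means the names in `assignments` and `concurrency` line up.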

### 5. Start assigner
Once you have the task server running, you can start an assigner which coordinates tasks and agents:

In [23]:
!python -m src.assigner --config configs/assignments/custom.yaml

    Agent: {'vicuna-13b', 'gpt-3.5-turbo-0613', 'vicuna-33b', 'text-davinci-003', 'vicuna-7b', 'text-davinci-002', 'wizard-30b'}
    Task: {'ltp-std', 'kg-std', 'cg-dev', 'webshop-dev', 'dbbench-dev', 'dbbench-std', 'cg-std', 'alfworld-dev', 'alfworld-std', 'avalon-dev-naive', 'ltp-dev', 'webshop-std', 'kg-dev', 'm2w-dev', 'os-dev', 'avalon-dev-single', 'm2w-std'}
creating os-std client...
TaskClient created: os-std (http://localhost:5000/api)
Message: 144 samples remaining.
Agent "gpt-4-0613" needs to run 1 tasks with total 144 samples:
    Task "os-std": 144
Total:   0%|                                            | 0/144 [00:00<?, ?it/s]
Running Count: 0

#### Print the analysis results and the config file

In [40]:
import os

# find the most recently modified directory under outputs/, in case there are
# multiple output directories; os.walk also visits nested directories, so this
# is expected to be the results folder of the latest run
last_modified = None
last_modified_folder = None

for root, dirs, files in os.walk(os.path.join(os.getcwd(), "outputs")):
    for dir_name in dirs:
        dir_path = os.path.join(root, dir_name)
        modified_time = os.path.getmtime(dir_path)
        if last_modified is None or modified_time > last_modified:
            last_modified = modified_time
            last_modified_folder = dir_path

print("------------------------")
print("overall.json")
print("------------------------")
with open(os.path.join(last_modified_folder, "overall.json")) as f:
    print(f.read())

print("------------------------")
print("config.yaml")
print("------------------------")
# config.yaml sits two levels above the folder that holds overall.json
with open(os.path.abspath(os.path.join(last_modified_folder, "..", "..", "config.yaml"))) as f:
    for line in f:
        # hide the key but print the rest of the config
        if "Authorization" not in line:
            print(line, end="")


------------------------
overall.json
------------------------
{
    "total": 144,
    "validation": {
        "running": 0.0,
        "completed": 0.5763888888888888,
        "agent context limit": 0.4236111111111111,
        "agent validation failed": 0.0,
        "agent invalid action": 0.0,
        "task limit reached": 0.0,
        "unknown": 0.0,
        "task error": 0.0,
        "average_history_length": 10.25,
        "max_history_length": 20,
        "min_history_length": 8
    },
    "custom": {
        "overall": {
            "total": 144,
            "pass": 34,
            "wrong": 110,
            "acc": 0.2361111111111111
        }
    }
}
------------------------
config.yaml
------------------------
assignments:
- agent: gpt-4-0613
  task: os-std
concurrency:
  agent:
    gpt-4-0613: 5
  task:
    os-std: 5
definition:
  agent:
    gpt-4-0613:
      module: src.client.agents.HTTPAgent
      parameters:
        body:
          max_tokens: 512
          model: gpt-4-061