# Computer-Use Agents SOTA Challenge

Congrats on joining the Cua + HUD hackathon at Hack The North 2025!

This notebook will show you how to create a computer use agent with Cua and evaluate it using HUD.

## ☁️ Connect to cloud services

Create Cua and HUD accounts and load your API keys. 

1. Create a Cua account at https://www.trycua.com/
2. Start a Cua container at https://www.trycua.com/dashboard/containers
3. Create a HUD account at https://www.hud.so/
4. Create a .env file:

In [None]:
# Create a .env file if it doesn't exist

ENV_TEMPLATE = """# Required environment variables:
CUA_API_KEY=
CUA_CONTAINER_NAME=
HUD_API_KEY=

# Any LLM provider will work:
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
"""

import os
if not os.path.exists(".env"):
    with open(".env", "w") as f:
        f.write(ENV_TEMPLATE)
    print("A .env file was created! Fill in the empty values.")

5. Fill in all missing values in the .env file

In [None]:
# Read the .env file into the process environment
# HUD requires the .env file to be in the same directory as this notebook

from dotenv import load_dotenv
load_dotenv(dotenv_path='.env', override=True)
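
Before moving on, it can help to confirm that the values were actually loaded. The following is a minimal sketch, not part of the Cua or HUD SDKs: `find_missing_env` is a hypothetical helper, and the variable names are taken from the `.env` template above. It assumes you are using one of the two LLM providers listed in the template; adjust `LLM_VARS` if you use a different one.

```python
import os

# Names taken from the .env template above.
REQUIRED_VARS = ["CUA_API_KEY", "CUA_CONTAINER_NAME", "HUD_API_KEY"]
# ASSUMPTION: you use one of the two providers listed in the template.
LLM_VARS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]


def find_missing_env(env=os.environ):
    """Return the names of variables that are unset or empty (hypothetical helper)."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # Only one LLM provider key is needed, so report them as a group.
    if not any(env.get(name) for name in LLM_VARS):
        missing.append(" or ".join(LLM_VARS))
    return missing


missing = find_missing_env()
if missing:
    print("Missing values in .env:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```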

## 🤖 Create a Computer Use Agent

Use the Cua SDK to connect to your running Cua container and initialize a computer use agent.

In [None]:
import logging
import os
from pathlib import Path
from agent import ComputerAgent
from computer import Computer, VMProviderType

api_key = os.getenv("CUA_API_KEY")
container_name = os.getenv("CUA_CONTAINER_NAME")
assert api_key and container_name, "Set CUA_API_KEY and CUA_CONTAINER_NAME in .env"

# Connect to your existing cloud container
computer = Computer(
    os_type="linux",
    provider_type=VMProviderType.CLOUD,
    api_key=api_key,
    name=container_name,
    verbosity=logging.INFO
)

agent_config = {
    "model": "openai/computer-use-preview",
    "tools": [computer],
    "trajectory_dir": str(Path("trajectories")),
    "only_n_most_recent_images": 3,
    "verbosity": logging.INFO
}

# Create agent
agent = ComputerAgent(**agent_config)

Try running the computer use agent on a simple task.

Trajectories are saved under paths of the form `trajectories/YYYY-MM-DD_computer-use-pre_XXX`.

To view a replay of the agent's actions, upload the trajectory to the [trajectory viewer](https://www.trycua.com/trajectory-viewer).

You can also connect to your container through VNC on the [Cua Dashboard](https://www.trycua.com/dashboard).
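
After a run, you may want to find the trajectory that was just written so you can upload it. The sketch below uses only the standard library; `latest_trajectory` is a hypothetical helper, and it assumes each trajectory is stored as its own subdirectory of `trajectories/`, which this notebook does not confirm.

```python
from pathlib import Path


def latest_trajectory(root="trajectories"):
    """Return the most recently modified subdirectory of root, or None (hypothetical helper)."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    # ASSUMPTION: each trajectory is saved as its own subdirectory.
    entries = [p for p in root_path.iterdir() if p.is_dir()]
    return max(entries, key=lambda p: p.stat().st_mtime, default=None)


print(latest_trajectory())
```

The function returns `None` when the directory does not exist yet or is empty, so it is safe to call before the first run.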

In [None]:
tasks = [
    "Look for a repository named trycua/cua on GitHub."
]

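# Number tasks from 1 so the "Executing" and "completed" messages agree
for i, task in enumerate(tasks, start=1):
    print(f"\nExecuting task {i}/{len(tasks)}: {task}")
    # agent.run() is iterated asynchronously; print each result as it arrives
    async for result in agent.run(task):
        print(result)

    print(f"\n✅ Task {i}/{len(tasks)} completed: {task}")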

## 🧐 Evaluate the Agent with HUD

Test your agent's performance on a selection of tasks from the OSWorld benchmark.

In [None]:
import uuid
from pprint import pprint
from agent.integrations.hud import run_full_dataset

# Full dataset evaluation (runs via HUD's run_dataset under the hood)
job_name = f"osworld-test-{str(uuid.uuid4())[:4]}"

results = await run_full_dataset(
    dataset="ddupont/OSWorld-Tiny-Public",  # You can also pass a Dataset or a list[dict]
    job_name=job_name,                      # Optional; defaults to a timestamp for custom datasets
    **agent_config,
    max_concurrent=20,                      # Tune to your infra
    max_steps=50,                           # Safety cap per task
    # split="train[:5]",                    # Uncomment to limit the run to just 5 tasks
)

# results is a list from hud.datasets.run_dataset; inspect/aggregate as needed
print(f"Job: {job_name}")
print(f"Total results: {len(results)}")
pprint(results[:3])
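
To turn the raw list into a single score, one option is to average a per-task reward. This is only a sketch under a stated assumption: that each result exposes a numeric `reward`, either as an attribute or as a dict key. The notebook does not specify the result structure, so inspect the `pprint` output above and adjust the field name before relying on it. `summarize_rewards` is a hypothetical helper, and the sample data is made up for illustration.

```python
def summarize_rewards(results):
    """Average the `reward` field over results that have one (hypothetical helper)."""
    rewards = []
    for r in results:
        # ASSUMPTION: each result has a numeric `reward`, as an attribute or a dict key.
        reward = r.get("reward") if isinstance(r, dict) else getattr(r, "reward", None)
        if reward is not None:
            rewards.append(float(reward))
    mean = sum(rewards) / len(rewards) if rewards else None
    return {"scored": len(rewards), "total": len(results), "mean_reward": mean}


# Illustrative sample only; in the notebook, pass the real `results` list instead.
sample = [{"reward": 1.0}, {"reward": 0.0}, {}]
print(summarize_rewards(sample))
```

Results without a reward are counted in `total` but excluded from the mean, which keeps failed or incomplete tasks visible in the summary.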

## 🦾 Improve your Agent

To improve your agent for OSWorld-Verified, experiment with different models and add custom tools that fit your use case. You can also dive into the ComputerAgent source code to design an improved version or subclass tailored to your needs.

Learn more about [Customizing Your ComputerAgent](https://docs.trycua.com/docs/agent-sdk/customizing-computeragent) in the docs.
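
As a starting point for a custom tool, here is a minimal sketch. It assumes that `ComputerAgent` accepts plain Python functions in its `tools` list and uses the function's type hints and docstring to describe the tool to the model; check the customization docs linked above to confirm this before building on it. `word_count` is a hypothetical example tool, not part of the Cua SDK.

```python
def word_count(text: str) -> int:
    """Count the whitespace-separated words in a piece of text."""
    return len(text.split())


# ASSUMPTION: plain functions can be passed alongside the computer in `tools`.
# Uncomment in the notebook, where `agent_config` and `computer` are already defined:
# agent = ComputerAgent(**{**agent_config, "tools": [computer, word_count]})

print(word_count("Look for a repository named trycua/cua on GitHub."))
```

Keeping the tool a pure function with a clear docstring makes it easy to test on its own before handing it to the agent.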