# Using Ollama with Cua (Docker Edition)

This notebook demonstrates multiple ways to use Ollama with the Cua ComputerAgent, mirroring the structure of `notebooks/sota_hackathon.ipynb` while running the computer inside Docker.

We'll cover three patterns:

1. Use an all-in-one CUA model served by Ollama (e.g. `model="ollama/blaifa/InternVL3_5:8b"`).
2. Use a strong CUA grounding model composed with an Ollama VLM (e.g. `model="openai/computer-use-preview+ollama/gemma3:4b"`).
3. Conceptual: different ways to customize/extend your agent (link + outline only).


## 💻 Prerequisites

The easiest way to get started is to set up the Cua development repository.

Install [Docker](https://www.docker.com/products/docker-desktop/) and [pdm](https://pdm-project.org/en/latest/#recommended-installation-method).

Clone the Cua repository:

`git clone https://github.com/trycua/cua`

Install the project dependencies:

`cd cua && pdm install`

You should now be able to run this notebook in VS Code with the `.venv` virtual environment selected.

## 🔑 Environment Setup (.env)

Create a `.env` file with the API keys for any providers you plan to compose with Ollama. For example, if you compose with OpenAI or Anthropic, add those keys.

Add these entries as needed (empty values are fine if not used):

- `OPENAI_API_KEY` (if composing with OpenAI)
- `ANTHROPIC_API_KEY` (if composing with Anthropic)
- `OLLAMA_API_BASE` (defaults to `http://localhost:11434`)

Note: For Cua Cloud computers, you would also set `CUA_API_KEY` and `CUA_CONTAINER_NAME`, but this notebook uses Docker for the computer.
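
As a minimal sketch of how the entries above map to providers, the helper below reports which API-key variables are unset or empty for a given set of providers. The `REQUIRED_KEYS` table and the `missing_keys` function are hypothetical names introduced for illustration; they are not part of Cua. Ollama needs no key, so it has no entry in the table.

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical mapping, taken from the variable list above; extend as needed.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_keys(providers: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the API-key variable names that are unset or empty for the given providers."""
    missing = []
    for provider in providers:
        key = REQUIRED_KEYS.get(provider)
        # Providers without an entry (such as "ollama") need no key.
        if key and not env.get(key):
            missing.append(key)
    return missing


# An empty value counts as missing, matching the "empty values are fine if not used" rule.
print(missing_keys(["anthropic", "ollama"], env={"ANTHROPIC_API_KEY": ""}))
```

Calling `missing_keys(["openai"])` after loading `.env` checks the real environment instead of the sample mapping.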


In [None]:
# Create a .env template if it doesn't exist
ENV_TEMPLATE = """# Optional environment variables for composition:
OPENAI_API_KEY=
ANTHROPIC_API_KEY=

# Ollama endpoint (default shown)
OLLAMA_API_BASE=http://localhost:11434
"""

from pathlib import Path
if not Path('.env').exists():
    Path('.env').write_text(ENV_TEMPLATE)
    print('A .env file was created! Fill in the empty values you need.')
else:
    print('.env already exists')


In [None]:
# Load .env into environment
import os
from dotenv import load_dotenv
load_dotenv(dotenv_path='.env', override=True)
print('OPENAI_API_KEY set:', bool(os.getenv('OPENAI_API_KEY')))
print('ANTHROPIC_API_KEY set:', bool(os.getenv('ANTHROPIC_API_KEY')))
print('OLLAMA_API_BASE:', os.getenv('OLLAMA_API_BASE', 'http://localhost:11434'))


## 🐳 Run Ollama via Docker (recommended)

If you don't already have Ollama running locally, you can run it with Docker. 
Run the following command in your terminal (outside the notebook):

```bash
docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama \
  ollama/ollama:latest
```

Then pull any models you need, for example (terminal):

```bash
docker exec -it ollama ollama pull gemma3:4b
docker exec -it ollama ollama pull blaifa/InternVL3_5:8b
```

Make sure `OLLAMA_API_BASE` in your `.env` points to `http://localhost:11434`.
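
Before running the agent, it can help to confirm that the Ollama server is reachable and that the models were pulled. The sketch below assumes Ollama's `GET /api/tags` endpoint, which returns a JSON object with a `models` list. The helper names `parse_model_names` and `list_ollama_models` are hypothetical and not part of Cua or Ollama.

```python
import json
import os
import urllib.request
from typing import List


def parse_model_names(payload: dict) -> List[str]:
    """Extract model names from an /api/tags style payload."""
    return [model["name"] for model in payload.get("models", [])]


def list_ollama_models(base_url: str = "", timeout: float = 5.0) -> List[str]:
    """Query the Ollama server for locally available models (requires a running server)."""
    base = (base_url or os.getenv("OLLAMA_API_BASE", "http://localhost:11434")).rstrip("/")
    with urllib.request.urlopen(f"{base}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))


# Offline demonstration of the parsing step with a hand-written payload.
sample = {"models": [{"name": "gemma3:4b"}, {"name": "blaifa/InternVL3_5:8b"}]}
print(parse_model_names(sample))
```

With the container running, `list_ollama_models()` should include the models pulled above; a connection error usually means the port mapping or `OLLAMA_API_BASE` is wrong.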


## 🖥️ Launch a Docker Computer

We'll run the computer using the Cua Docker provider.
You can watch the live VNC stream at `http://localhost:8006/`.


In [None]:
import logging
from computer import Computer, VMProviderType
import webbrowser

computer = Computer(
    os_type="linux",
    provider_type=VMProviderType.DOCKER,
    verbosity=logging.INFO
)
await computer.run()

# Optional: open the VNC page in your browser
webbrowser.open('http://localhost:8006/', new=0, autoraise=True)


## 1) All-in-one CUA model via Ollama

Some community models on Ollama are trained for computer use end-to-end.
Point the agent's model to an Ollama-served model using the `ollama/` prefix.

Example: `model="ollama/blaifa/InternVL3_5:8b"`.


In [None]:
import logging
from pathlib import Path
from agent import ComputerAgent

agent_all_in_one = ComputerAgent(
    model="ollama/blaifa/InternVL3_5:8b",
    tools=[computer],
    trajectory_dir=str(Path('trajectories')),
    only_n_most_recent_images=3,
    verbosity=logging.INFO,
    # instructions="You are a helpful assistant." # Editable instructions for prompt engineering
)

print('Running all-in-one Ollama CUA model...')
async for _ in agent_all_in_one.run("Open the web browser and go to example.com"):
    pass
print('✅ Done')


## 2) Compose a strong CUA UI grounding model with an Ollama VLM

You can compose a UI grounding (element localization) model with a local Ollama VLM that handles reasoning, tool use, and planning.
To compose, append a `+ollama/<model>` suffix to the grounding model's name.

Examples:
- `openai/computer-use-preview+ollama/gemma3:4b`
- `anthropic/claude-3-5-sonnet-20241022+ollama/gemma3:4b`
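
To make the string convention concrete, the sketch below splits a model string on `+` into its two parts. It only illustrates the format used in the examples above, assuming the grounding model comes first and the Ollama VLM second; it is not how Cua parses the string internally, and `ModelSpec` and `split_model` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    grounding: str           # first part; for an all-in-one model this is the only part
    planner: Optional[str]   # part after "+", or None when the string is not composed


def split_model(model: str) -> ModelSpec:
    """Split 'grounding+planner' into its parts; a plain model string has no planner."""
    grounding, sep, planner = model.partition("+")
    if not grounding or (sep and not planner):
        raise ValueError(f"malformed model string: {model!r}")
    return ModelSpec(grounding, planner or None)


print(split_model("openai/computer-use-preview+ollama/gemma3:4b"))
print(split_model("ollama/blaifa/InternVL3_5:8b"))
```

The same helper accepts the all-in-one string from section 1, where `planner` is `None`.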


In [None]:
from agent import ComputerAgent
import logging

agent_composed = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022+ollama/gemma3:4b",
    tools=[computer],
    trajectory_dir='trajectories',
    only_n_most_recent_images=3,
    verbosity=logging.INFO,
)

print('Running composed agent (Anthropic grounding + Ollama VLM)...')
async for _ in agent_composed.run("Open a text editor and type: Hello from composed model!"):
    pass
print('✅ Done')


## 3) Customize your agent 🛠️

For a few customization options, see: https://docs.trycua.com/docs/agent-sdk/customizing-computeragent

Levels of customization you can explore:

1) Simple — Prompt engineering
2) Easy — Tools
3) Intermediate — Callbacks
4) Expert — Custom agent via `register_agent` (see `libs/python/agent/agent/decorators.py` → `register_agent`)

Or incorporate the ComputerAgent into your own agent framework!


## ✅ Summary

- You ran the computer in Docker via the Cua Docker provider and viewed it at `http://localhost:8006/`.
- You tried two runnable ways to leverage Ollama and reviewed a conceptual path to go further:
  - All-in-one computer-use model served by Ollama.
  - A composed agent using a strong grounding model + an Ollama VLM.
  - A link + outline for further customization paths (prompting, tools, callbacks, custom agent via `register_agent`).

Explore more configurations and models in the Cua docs.
