# Deep Research Agent — Quick Tutorial

This notebook shows how to run the lightweight training scaffold, call the FastAPI agent, and use the Streamlit UI. Keep a terminal handy for background services.


## 1) Environment & Imports

Install dependencies in the project's virtual environment and import key libraries.

```bash
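python -m venv .venv
# Windows (cmd / PowerShell):
.venv\Scripts\activate
# macOS / Linux (use this instead of the line above):
# source .venv/bin/activate
python -m pip install -r requirements.txt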
```

Python checks (run in a cell):

```python
import sys
import platform
import torch
import requests

print('Python', sys.version)
print('Platform', platform.platform())
print('Torch', torch.__version__)
print('Requests', requests.__version__)
```

## 2) Project Configuration (VSCode)

- Use the integrated terminal to run the Streamlit app and FastAPI backend.
- Sample `tasks.json` and `launch.json` configurations are included in the repo's README; use them to run or debug the server and notebook cells.


## 3) Data Loading

The repository stores run history in `runs.json`. We can load and preview recent runs.

```python
import json
from pathlib import Path
p = Path('runs.json')
if p.exists():
    runs = json.loads(p.read_text(encoding='utf-8'))
    print('Total runs:', len(runs))
    display(runs[:3])
else:
    print('No runs.json found (run the UI to create runs).')
```
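
Once the history grows, a one-line-per-run preview is easier to scan than raw dicts. The sketch below assumes each entry in `runs.json` is a dict. The `report` key is the one used in the Visualization section of this notebook, while `topic` is a hypothetical field name, so adjust it to the real schema.

```python
from typing import Any


def summarize_runs(runs: list[dict[str, Any]], limit: int = 3, width: int = 60) -> list[str]:
    """Return one short line per run for the first `limit` entries."""
    lines = []
    for i, run in enumerate(runs[:limit]):
        topic = run.get('topic', '<no topic>')       # 'topic' is an assumed key
        report = str(run.get('report', ''))          # 'report' is read in section 8
        snippet = ' '.join(report.split())[:width]   # collapse whitespace, then truncate
        lines.append(f'{i}: {topic} | {snippet}')
    return lines


# Illustrative in-memory data; replace with the `runs` list loaded above.
sample = [{'topic': 'demo', 'report': 'Line one\nline two'}, {'report': 'x'}]
for line in summarize_runs(sample):
    print(line)
```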

## 4) Data Preprocessing

Add any preprocessing steps you need for your dataset. For the synthetic dataset used by the trainer no preprocessing is required. Example placeholder:

```python
def preprocess_text(s: str):
    s = s.lower().strip()
    return s

# example
print(preprocess_text('  Hello WORLD  '))
```

## 5) Core Functions / Agent API Example

POST to the FastAPI `/run` endpoint to run a topic synchronously.

```python
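import requests

url = 'http://127.0.0.1:8000/run'
# The call blocks until the agent finishes, so allow a generous timeout.
resp = requests.post(url, json={'topic': 'Compare NVLink topologies'}, timeout=300)
print(resp.status_code)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
print(resp.json())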
```
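
Because the backend runs as a background service, the first request can fail simply because the server has not finished starting. A minimal sketch of a readiness check follows. The helper names `http_probe` and `wait_until_ready` are hypothetical, and the probe treats any HTTP response (even a 404) as proof that the server is up.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if something answers HTTP at `url`, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except urllib.error.HTTPError:
        return True   # the server replied, just not with a 2xx status
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...


def wait_until_ready(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call `probe` repeatedly until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage (assumes the backend listens on the address used above):
# wait_until_ready(lambda: http_probe('http://127.0.0.1:8000/'))
```

Passing the probe as a callable keeps the polling loop independent of the transport, so the same loop can wait for the Streamlit UI or any other service.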

WebSocket example (async) to receive streaming messages from `/ws/run`:

```python
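import asyncio
import json

import websockets
from websockets.exceptions import ConnectionClosedError

async def ws_run(topic):
    uri = 'ws://127.0.0.1:8000/ws/run'
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({'topic': topic}))
        try:
            # Iteration ends on its own when the server closes the connection normally.
            async for msg in ws:
                print('MSG:', msg)
        except ConnectionClosedError as exc:
            # Report abnormal closes instead of silently swallowing every exception.
            print('Connection closed with error:', exc)

# In a notebook cell (an event loop is already running): await ws_run('Compare NVLink topologies')
# In a plain script: asyncio.run(ws_run('Compare NVLink topologies'))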
```
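
The loop above prints each frame as raw text. If the server happens to send JSON objects (an assumption; the message format is not specified here), a small decoder makes the stream easier to work with. The helper name `decode_message` is hypothetical. Anything that is not a JSON object is passed through under a `raw` key rather than raising.

```python
import json
from typing import Any, Union


def decode_message(msg: Union[str, bytes]) -> dict[str, Any]:
    """Parse a WebSocket frame as a JSON object, falling back to {'raw': text}."""
    text = msg.decode('utf-8', errors='replace') if isinstance(msg, bytes) else msg
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return {'raw': text}
    # Only dicts are treated as structured messages; lists, numbers, etc. stay raw.
    return parsed if isinstance(parsed, dict) else {'raw': text}


print(decode_message('{"type": "status", "text": "searching"}'))
print(decode_message('plain text'))
```

Inside `ws_run`, `print('MSG:', msg)` could then become `print('MSG:', decode_message(msg))`.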


## 6) Processing Pipeline / Model Training

Run the tiny training scaffold from the repo. Start with a quick **dry run**:

```bash
python train/train.py --dry-run
```

Run a short training run and view TensorBoard:

```bash
python train/train.py --epochs 3 --batch-size 16
tensorboard --logdir runs --port 6006
```


## 7) Evaluation & Metrics

For the tiny LM, we track training and validation loss. For the LLM-based agent, collect human ratings for relevance and completeness.

(Example evaluation code would go here if you have labeled data.)
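
Even without labeled data, the human ratings can be aggregated per criterion. The sketch below assumes each rating is a dict with numeric `relevance` and `completeness` scores. The `run_id` field and the 1 to 5 scale in the sample are illustrative assumptions, not part of the repo.

```python
from statistics import mean
from typing import Any

CRITERIA = ('relevance', 'completeness')


def aggregate_ratings(ratings: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return the count and mean score for each criterion, skipping missing values."""
    summary = {}
    for criterion in CRITERIA:
        scores = [r[criterion] for r in ratings if r.get(criterion) is not None]
        summary[criterion] = {'n': len(scores), 'mean': mean(scores) if scores else None}
    return summary


# Hypothetical ratings on an assumed 1-5 scale.
ratings = [
    {'run_id': 'a', 'relevance': 4, 'completeness': 3},
    {'run_id': 'b', 'relevance': 5, 'completeness': 4},
    {'run_id': 'c', 'relevance': 3},  # rater skipped completeness
]
print(aggregate_ratings(ratings))
```

Reporting `n` next to each mean makes it visible when a criterion was rated less often than the others.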

## 8) Visualization

You can open TensorBoard to inspect training curves (see above). For agent outputs use `display()` or `print()` for quick inspection inside the notebook.

```python
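# Show a saved run report (if any); runs[0] is the first entry in runs.json.
import json
from pathlib import Path

from IPython.display import Markdown, display

p = Path('runs.json')
runs = json.loads(p.read_text(encoding='utf-8')) if p.exists() else []
if runs:
    # Truncate to 1000 characters to keep the cell output short.
    display(Markdown(runs[0].get('report', '')[:1000]))
else:
    # Covers both a missing file and an empty run list.
    print('No runs to show.')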
```

## 9) Unit Tests & Test Runner

Add `pytest` tests in `tests/` for core functions. Example quick test to validate dataset shapes:

```python
# tests/test_dataset.py
from train.dataset import SyntheticSeqDataset

def test_dataset_len_and_shape():
    ds = SyntheticSeqDataset(num_samples=10, seq_len=16, vocab_size=100)
    x, y = ds[0]
    assert len(ds) == 10
    assert x.shape[0] == 16
    assert y.shape[0] == 16
```

Run tests:
```bash
python -m pytest -q
```

## 10) Save, Export & Reproducibility

Save artifacts with `torch.save` or `joblib`, and record the library versions and random seed alongside them.

```python
import json
import random
from pathlib import Path

import numpy as np
import torch

meta = {
    'seed': 42,
}
# Seed every RNG the training code may draw from.
random.seed(meta['seed'])
np.random.seed(meta['seed'])
torch.manual_seed(meta['seed'])

# Example: load a checkpoint
ck = Path('checkpoints/ckpt_epoch_1.pt')
if ck.exists():
    # map_location='cpu' lets a checkpoint saved on a GPU load on a CPU-only machine.
    d = torch.load(ck, map_location='cpu')
    print('Loaded checkpoint, epoch=', d.get('epoch'))
else:
    print('No checkpoint found.')

# Save experiment metadata
Path('experiment_meta.json').write_text(json.dumps(meta, indent=2), encoding='utf-8')
```
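
The metadata above stores only the seed. To record versions as well, one option is to query the installed distributions with the standard library's `importlib.metadata`. This is a sketch: the helper name `collect_meta` and the default package list are assumptions to adapt to the project's `requirements.txt`.

```python
import json
import platform
import sys
from importlib import metadata
from typing import Any, Iterable


def collect_meta(seed: int, packages: Iterable[str] = ('torch', 'numpy', 'requests')) -> dict[str, Any]:
    """Gather the seed, interpreter, platform, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        'seed': seed,
        'python': sys.version.split()[0],
        'platform': platform.platform(),
        'packages': versions,
    }


print(json.dumps(collect_meta(42), indent=2))
```

The resulting dict is JSON-serializable, so it can be written to `experiment_meta.json` in place of the seed-only dict.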