# Clarification: Notebook scope & requirements

Before I generate the full Colab runbook, please confirm the following so I tailor the notebook precisely for your needs.

- Is the notebook name `Colab_Run_FarmFederate.ipynb` (I created it) or did you mean the single-letter `y` as the notebook name/topic?
- Programming language: **Python 3.10+** (confirm or specify another version)
- Dataset/source: use the repository files and the synthetic high-contrast generator (default), or point to external datasets (e.g., PlantVillage, IP102)? If external, provide paths or links.
- Primary goals (select all that apply): EDA / preprocessing / train smoke / full training on GPU / Qdrant demo / plotting / RAG demo.
- Required libraries: confirm I should install from `requirements.txt` plus `qdrant-client`, `sentence-transformers`, and `faiss-cpu`.
- Desired outputs: plots in `plots/`, results in `results/`, checkpoints in `checkpoints/` (optionally persisted to Google Drive path).
- Target audience: beginner / intermediate / advanced (affects explanations and comments).
- Include unit tests and a short VS Code integration guide? (yes/no)
- Kernel / Python version to use on Colab (default: latest Colab Python; confirm if you need a specific version).
- Constraints: maximum runtime (e.g., smoke only vs full long-run), use Colab Pro (yes/no), prefer single-GPU.

Please reply with the answers (concise) and any extra notes you want included in the notebook (e.g., specific flags: `--use-qdrant`, `--checkpoint-dir`, custom epochs).

# FarmFederate — Colab Runbook (GPU)

This notebook prepares and runs the full FarmFederate pipeline on Google Colab (GPU runtime). It includes:

- Environment setup and dependency installation
- Optional Google Drive mounting for checkpoints and outputs
- Quick smoke test (fast verification)
- Full training run (GPU) with Qdrant integration option

Prerequisites:
- This notebook assumes the repository root contains `FarmFederate_Colab.py`; the clone cell in step 2 fetches the repository from GitHub if it is not already present.
- Target audience: intermediate users familiar with Colab and basic ML workflows.

Primary goals: full training on Colab (training, evaluation, plots, optional Qdrant demo).


## 1) Set runtime to GPU ⚙️

Runtime → Change runtime type → **GPU** (e.g., Tesla T4/P100/V100). Use Colab Pro/Pro+ for longer unbroken sessions.

In [None]:
# 2) Clone the repository (choose one option)

# Public repo (fast):
!git clone https://github.com/Solventerritory/FarmFederate-Advisor.git || true
%cd FarmFederate-Advisor

# OR: Upload a zip to Colab and unzip, or mount Drive and copy files into the working directory.

In [None]:
# 3) Install dependencies
!pip install -r requirements.txt
!pip install qdrant-client sentence-transformers faiss-cpu

# NOTE: If CUDA/Torch seems unavailable after install, restart the runtime (Runtime -> Restart runtime).


In [None]:
# 4) Mount Google Drive (optional) to persist checkpoints & outputs
from google.colab import drive
drive.mount('/content/drive')

# Example: set env var for checkpoint dir
# %env CHECKPOINT_DIR=/content/drive/MyDrive/FarmFederate/checkpoints
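If Drive is not mounted, paths under `/content/drive/` do not point at persistent storage. The sketch below picks a checkpoint directory based on whether the mount exists. It is a minimal illustration, not part of the repository: `pick_checkpoint_dir` is a hypothetical helper, the local `checkpoints` fallback is an assumed default, and whether `FarmFederate_Colab.py` actually reads `CHECKPOINT_DIR` is an assumption carried over from the example above.

```python
import os

def pick_checkpoint_dir(drive_root="/content/drive/MyDrive", local_dir="checkpoints"):
    """Return a Drive-backed checkpoint dir if Drive is mounted, else a local one.

    Hypothetical helper: the FarmFederate/checkpoints layout mirrors the example
    path used in this notebook.
    """
    if os.path.isdir(drive_root):
        path = os.path.join(drive_root, "FarmFederate", "checkpoints")
    else:
        path = local_dir  # ephemeral: lost when the Colab session ends
    os.makedirs(path, exist_ok=True)
    return path

checkpoint_dir = pick_checkpoint_dir()
# Assumption: the script (or later cells) reads CHECKPOINT_DIR from the environment.
os.environ["CHECKPOINT_DIR"] = checkpoint_dir
print("Checkpoints will be written to:", checkpoint_dir)
```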


In [None]:
# 5) GPU / CUDA check
import torch
print('PyTorch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('Device count:', torch.cuda.device_count())
if torch.cuda.is_available():
    print('Device name:', torch.cuda.get_device_name(0))


## 6) Optional: Qdrant Cloud setup

If you plan to use `--use-qdrant`, create a Qdrant Cloud project and store the endpoint and API key as environment variables in Colab, for example:

```python
import os
os.environ['QDRANT_URL'] = 'https://your-qdrant-xxxx.a.region.qdrant.cloud'
os.environ['QDRANT_API_KEY'] = 'your_api_key_here'
```

Alternatively, pass the `--qdrant-url` / `--qdrant-api-key` flags to the script, if it supports them.
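
Before a run with `--use-qdrant`, it can help to confirm that both variables are actually set in the current session. This is a minimal sketch using only the standard library; `missing_qdrant_vars` is a hypothetical helper, and treating whitespace-only values as missing is an assumption.

```python
import os

REQUIRED_QDRANT_VARS = ("QDRANT_URL", "QDRANT_API_KEY")

def missing_qdrant_vars(env=None):
    """Return the names of required Qdrant variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_QDRANT_VARS if not env.get(name, "").strip()]

missing = missing_qdrant_vars()
if missing:
    print("Qdrant is not configured; missing:", ", ".join(missing))
else:
    print("Qdrant variables are set.")
```

This only checks that the values are present; it does not verify that the endpoint is reachable or that the key is valid.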

In [None]:
# 7) Run script setup (installs, checks)
!python FarmFederate_Colab.py --setup


In [None]:
# 8) Verify script version/header before a full run
print('=== FarmFederate_Colab.py header ===')
with open('FarmFederate_Colab.py','r',encoding='utf-8') as f:
    for i, line in enumerate(f):
        if i>=20:
            break
        print(line.rstrip())


In [None]:
# 9) Quick smoke test (fast verification)
!python FarmFederate_Colab.py --auto-smoke --smoke-samples 10


### LLM-only smoke (very fast)
Run a small 2-epoch LLM-only smoke test to quickly verify the text-model training loop and capture logs. This is useful if you only want to validate LLM training without running the full multi-model smoke test.

In [None]:
# 9.1) Run LLM-only smoke (2 epochs) and save logs
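!python scripts/smoke_train_llm.py 2>&1 | tee smoke_llm.log
# Optional: copy logs to Google Drive if mounted (skipped when the mount point is absent)
![ -d /content/drive/MyDrive ] && mkdir -p /content/drive/MyDrive/FarmFederate/logs && cp smoke_llm.log /content/drive/MyDrive/FarmFederate/logs/smoke_llm.log || true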
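
To inspect the end of the captured log without scrolling through the full output, a small tail helper is enough. This is a sketch: `tail` is a hypothetical helper, and the only thing it assumes about `smoke_llm.log` is that it is a text file written by the cell above.

```python
from collections import deque
from pathlib import Path

def tail(path, n=20):
    """Return the last n lines of a text file, or an empty list if it is missing."""
    p = Path(path)
    if not p.is_file():
        return []
    with p.open("r", encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the final n lines while streaming the file
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]

for line in tail("smoke_llm.log"):
    print(line)
```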


In [None]:
# 10) Show results/plots produced by the smoke run (if any)
import os
from IPython.display import display, Image

print('\n== results/ contents ==')
for root, dirs, files in os.walk('results'):
    for f in files:
        print(os.path.join(root,f))
    break

print('\n== plots/ thumbnails ==')
if os.path.isdir('plots'):
    imgs = [os.path.join('plots',p) for p in os.listdir('plots') if p.lower().endswith(('.png','.jpg','.jpeg'))]
    for img in imgs[:6]:
        display(Image(img, width=420))
else:
    print('No plots/ found yet.')


In [None]:
# 11) Full training (long run) — recommended flags for v7.0
# Use Google Drive path for checkpoints if mounted. This cell runs the full pipeline (training, evaluation, plots, results) and saves artifacts to Drive if mounted.
%env CHECKPOINT_DIR=/content/drive/MyDrive/FarmFederate/checkpoints
import os
# %env does not perform shell-style ${VAR:-} expansion, so keep any Qdrant values set earlier and default to empty
os.environ.setdefault('QDRANT_URL', '')
os.environ.setdefault('QDRANT_API_KEY', '')
# Ensure checkpoint path exists
os.makedirs('/content/drive/MyDrive/FarmFederate/checkpoints', exist_ok=True)

# 11.1) Install (again) to ensure optional deps are present then run full training
!pip install -r requirements.txt
!pip install qdrant-client sentence-transformers faiss-cpu

# 11.2) Run the full training (this will take a long time on Colab-Free)
!python FarmFederate_Colab.py --setup --train --epochs 12 --max-samples 600 --use-qdrant --checkpoint-dir /content/drive/MyDrive/FarmFederate/checkpoints 2>&1 | tee full_train.log

# 11.3) After completion: copy results and plots to Drive (if mounted)
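# Copying into the parent directory avoids nesting results/results on repeated runs
!cp -r results /content/drive/MyDrive/FarmFederate/ || true
!cp -r plots /content/drive/MyDrive/FarmFederate/ || true
!mkdir -p /content/drive/MyDrive/FarmFederate/logs && cp full_train.log /content/drive/MyDrive/FarmFederate/logs/full_train.log || true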

# Notes: Use Colab Pro for longer, stable runs. Monitor RAM/GPU during the run.
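
The full-training command in this cell always passes `--use-qdrant`. One way to make that conditional is to assemble the argument list in Python and add the flag only when both Qdrant variables are set. This is a sketch under assumptions: `build_train_command` is a hypothetical helper, the flags are copied from the command above, and it assumes the script also runs without `--use-qdrant`.

```python
import os
import shlex

def build_train_command(checkpoint_dir, use_qdrant, epochs=12, max_samples=600):
    """Assemble the training command as an argument list (flags taken from this notebook)."""
    cmd = [
        "python", "FarmFederate_Colab.py", "--setup", "--train",
        "--epochs", str(epochs),
        "--max-samples", str(max_samples),
        "--checkpoint-dir", checkpoint_dir,
    ]
    if use_qdrant:
        cmd.append("--use-qdrant")
    return cmd

# Only request Qdrant when both variables are non-empty.
qdrant_ready = bool(os.environ.get("QDRANT_URL") and os.environ.get("QDRANT_API_KEY"))
cmd = build_train_command(os.environ.get("CHECKPOINT_DIR", "checkpoints"), qdrant_ready)
print(shlex.join(cmd))
```

The printed line can be pasted after `!` in a cell, or the list can be passed to `subprocess.run(cmd, check=True)` so that a failing run raises instead of passing silently.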

## 12) Troubleshooting & Tips

- If CUDA disappears after installing packages, **restart the runtime** (Runtime → Restart runtime).  
- Ensure `torch` + CUDA versions are compatible with the runtime. If needed, reinstall a specific torch wheel from https://pytorch.org.  
- For Qdrant: if `--use-qdrant` fails, check that `QDRANT_URL` and `QDRANT_API_KEY` are set as environment variables and point to a reachable Qdrant Cloud cluster.  
- To persist work across sessions, save checkpoints to Drive and download important artifacts.
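
For the last tip, copying artifacts to Drive can also be done from Python, which makes repeated syncs merge into the same folders. This is a minimal sketch: `sync_artifacts` is a hypothetical helper, the `results` and `plots` names come from this notebook, and `dirs_exist_ok` requires Python 3.8 or newer.

```python
import shutil
from pathlib import Path

def sync_artifacts(dest_root, sources=("results", "plots")):
    """Copy each existing source directory into dest_root, merging with earlier copies."""
    dest_root = Path(dest_root)
    copied = []
    for name in sources:
        src = Path(name)
        if not src.is_dir():
            continue  # nothing produced yet for this artifact type
        shutil.copytree(src, dest_root / src.name, dirs_exist_ok=True)
        copied.append(src.name)
    return copied

drive_root = Path("/content/drive/MyDrive")
if drive_root.is_dir():  # only sync when Drive is actually mounted
    print("Synced:", sync_artifacts(drive_root / "FarmFederate"))
else:
    print("Drive not mounted; skipping sync.")
```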


## 13) Open in Colab link

Click the link below to open this notebook directly in Colab:

https://colab.research.google.com/github/Solventerritory/FarmFederate-Advisor/blob/feature/multimodal-work/notebooks/Colab_Run_FarmFederate.ipynb

(If your repo/branch differs, replace the path accordingly.)