# Clarification: Notebook scope & requirements

Before I generate the full Colab runbook, please confirm the following so I can tailor the notebook to your needs.

- Notebook name: is it `Colab_Run_FarmFederate.ipynb` (which I created), or did you mean the single letter `y` as the notebook name or topic?
- Python version: **Python 3.10+** (default: the latest Colab Python; confirm or specify another version).
- Dataset/source: use the repository files and the synthetic high-contrast generator (default), or point to external datasets (e.g., PlantVillage, IP102)? If external, provide paths or links.
- Primary goals (select all that apply): EDA / preprocessing / smoke-test training / full training on GPU / Qdrant demo / plotting / RAG demo.
- Required libraries: confirm that I should install from `requirements.txt` plus `qdrant-client`, `sentence-transformers`, and `faiss-cpu`.
- Desired outputs: plots in `plots/`, results in `results/`, checkpoints in `checkpoints/` (optionally persisted to a Google Drive path).
- Target audience: beginner / intermediate / advanced (affects explanations and comments).
- Include unit tests and a short VS Code integration guide? (yes/no)
- Constraints: maximum runtime (smoke test only vs. a full long run), Colab Pro (yes/no), single GPU preferred (yes/no).

Please reply with concise answers and any extra notes you want included in the notebook (e.g., specific flags such as `--use-qdrant` or `--checkpoint-dir`, or a custom number of epochs).

# FarmFederate — Colab Runbook (GPU)

This notebook prepares and runs the full FarmFederate pipeline on Google Colab (GPU runtime). It includes:

- Environment setup and dependency installation
- Optional Google Drive mounting for checkpoints and outputs
- Quick smoke test (fast verification)
- Full training run (GPU) with Qdrant integration option

Prerequisites:
- This notebook assumes the repository root contains `FarmFederate_Colab.py` (it will clone from GitHub if not present).
- Target audience: intermediate users familiar with Colab and basic ML workflows.

Primary goals: full training on Colab (training, evaluation, plots, optional Qdrant demo).


## 1) Set runtime to GPU ⚙️

Runtime → Change runtime type → **GPU** (e.g., Tesla T4/P100/V100). Use Colab Pro/Pro+ for longer unbroken sessions.

In [1]:
# 2) Clone the repository (choose one option)

# Public repo (fast):
!git clone https://github.com/Solventerritory/FarmFederate-Advisor.git || true
%cd FarmFederate-Advisor

# OR: Upload a zip to Colab and unzip, or mount Drive and copy files into the working directory.

Cloning into 'FarmFederate-Advisor'...
remote: Enumerating objects: 2666, done.
remote: Counting objects: 100% (141/141), done.
remote: Compressing objects: 100% (99/99), done.
remote: Total 2666 (delta 50), reused 94 (delta 41), pack-reused 2525 (from 2)
Receiving objects: 100% (2666/2666), 184.07 MiB | 15.63 MiB/s, done.
Resolving deltas: 100% (609/609), done.
Updating files: 100% (2179/2179), done.
Downloading backend/checkpoints/global_central.pt (847 MB)
Error downloading object: backend/checkpoints/global_central.pt (34282b1): Smudge error: Error downloading backend/checkpoints/global_central.pt (34282b12cc6d3ec62f61bb33cbade7f714501421bbb6c068abc0e0fe77b0c550): batch response: This repository exceeded its LFS budget. The account responsible for the budget should increase it to restore access.

Errors logged to /content/FarmFederate-Advisor/.git/lfs/logs/20260122T195034.004770672.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-proce
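
The clone log above ends in a Git LFS smudge error: the repository has exceeded its LFS budget, so `backend/checkpoints/global_central.pt` cannot be downloaded during checkout. One possible workaround, sketched below, is to clone with the `GIT_LFS_SKIP_SMUDGE=1` environment variable, which makes Git LFS leave small pointer files in place instead of fetching the large objects. The helper names `lfs_skip_env` and `clone_without_lfs` are hypothetical, and the sketch assumes the pretrained checkpoint is not needed for the run.

```python
import os
import subprocess

REPO_URL = "https://github.com/Solventerritory/FarmFederate-Advisor.git"


def lfs_skip_env(base_env=None):
    """Return a copy of the environment with Git LFS downloads disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_LFS_SKIP_SMUDGE"] = "1"
    return env


def clone_without_lfs(url=REPO_URL, dest="FarmFederate-Advisor"):
    """Clone `url` into `dest` unless a clone already exists; return the exit code."""
    if os.path.isdir(os.path.join(dest, ".git")):
        return 0  # a clone is already present, nothing to do
    result = subprocess.run(["git", "clone", url, dest], env=lfs_skip_env())
    return result.returncode


# In Colab (not executed here):
# clone_without_lfs()
# %cd FarmFederate-Advisor
```

If an earlier failed clone left a partial `FarmFederate-Advisor` directory behind, the helper treats it as an existing clone, so delete that directory first if you want a fresh checkout.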

In [2]:
# 3) Install dependencies
!pip install -r requirements.txt
!pip install qdrant-client sentence-transformers faiss-cpu

# NOTE: If CUDA/Torch seems unavailable after install, restart the runtime (Runtime -> Restart runtime).


Collecting qdrant-client
  Downloading qdrant_client-1.16.2-py3-none-any.whl.metadata (11 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Collecting portalocker<4.0,>=2.7.0 (from qdrant-client)
  Downloading portalocker-3.2.0-py3-none-any.whl.metadata (8.7 kB)
Downloading qdrant_client-1.16.2-py3-none-any.whl (377 kB)
   377.2/377.2 kB 11.3 MB/s eta 0:00:00
Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.8 MB)
   23.8/23.8 MB 101.5 MB/s eta 0:00:00
Downloading portalocker-3.2.0-py3-none-any.whl (22 kB)
Installing collected packages: portalocker, faiss-cpu, qdrant-client
Successfully installed faiss-cpu-1.13.2 portalocker-3.2.0 qdrant-client-1.16.2


In [3]:
# 4) Mount Google Drive (optional) to persist checkpoints & outputs
from google.colab import drive
drive.mount('/content/drive')

# Example: set env var for checkpoint dir
# %env CHECKPOINT_DIR=/content/drive/MyDrive/FarmFederate/checkpoints


Mounted at /content/drive
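
With Drive mounted, it can help to pick the checkpoint directory in one place and fall back to a local folder when Drive is unavailable. The following is a minimal sketch: the helper name `resolve_checkpoint_dir` is hypothetical, the Drive subfolder mirrors the example path in the cell above, and the notebook does not confirm whether `FarmFederate_Colab.py` reads a `CHECKPOINT_DIR` environment variable.

```python
import os

DRIVE_ROOT = "/content/drive/MyDrive"  # where Colab exposes a mounted Drive


def resolve_checkpoint_dir(drive_root=DRIVE_ROOT, local_dir="checkpoints"):
    """Return a Drive-backed checkpoint directory if Drive is mounted, else a local one."""
    if os.path.isdir(drive_root):
        path = os.path.join(drive_root, "FarmFederate", "checkpoints")
    else:
        path = local_dir  # Drive not mounted: keep checkpoints in the session
    os.makedirs(path, exist_ok=True)
    return path


# Typical use in a later cell (not executed here):
# checkpoint_dir = resolve_checkpoint_dir()
# print("Checkpoints will be written to:", checkpoint_dir)
```

The returned path can then be passed to the `--checkpoint-dir` flag used in the full training cell. Checkpoints written to the local fallback are lost when the Colab session ends.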


In [4]:
# 5) GPU / CUDA check
import torch
print('PyTorch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('Device count:', torch.cuda.device_count())
if torch.cuda.is_available():
    print('Device name:', torch.cuda.get_device_name(0))


PyTorch version: 2.9.0+cu126
CUDA available: True
Device count: 1
Device name: Tesla T4


## 6) Optional: Qdrant Cloud setup

If you plan to use `--use-qdrant`, create a Qdrant Cloud project and store the endpoint and API key as environment variables in Colab, for example:

```python
import os
os.environ['QDRANT_URL'] = 'https://your-qdrant-xxxx.a.region.qdrant.cloud'
os.environ['QDRANT_API_KEY'] = 'your_api_key_here'
```

Alternatively, pass the `--qdrant-url` and `--qdrant-api-key` flags to the script, if it supports them.
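
Before launching a long run with `--use-qdrant`, it may be worth checking that both variables are actually set. The sketch below does only that; the helper name `qdrant_settings` is hypothetical, and how `FarmFederate_Colab.py` itself consumes these variables is not shown in this notebook.

```python
import os


def qdrant_settings(environ=None):
    """Read QDRANT_URL and QDRANT_API_KEY, failing early if either is missing."""
    environ = os.environ if environ is None else environ
    url = environ.get("QDRANT_URL", "").strip()
    api_key = environ.get("QDRANT_API_KEY", "").strip()
    missing = [
        name
        for name, value in (("QDRANT_URL", url), ("QDRANT_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing Qdrant settings: " + ", ".join(missing))
    return {"url": url, "api_key": api_key}


# Optional connectivity check once both variables are set (requires qdrant-client):
# from qdrant_client import QdrantClient
# client = QdrantClient(**qdrant_settings())
# print(client.get_collections())
```

To keep the API key out of the saved notebook, one option is to read it interactively, for example with `getpass.getpass()`, and assign the result to `os.environ['QDRANT_API_KEY']` instead of hard-coding it.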

In [5]:
# 7) Run script setup (installs, checks)
!python FarmFederate_Colab.py --setup


Traceback (most recent call last):
  File "/content/FarmFederate-Advisor/FarmFederate_Colab.py", line 331, in <module>
    def generate_synthetic_text_data(n_samples: int = 500) -> pd.DataFrame:
                                                              ^^
NameError: name 'pd' is not defined. Did you mean: 'id'?
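
The traceback above shows the script failing at import time: the return annotation `-> pd.DataFrame` on line 331 is evaluated when the function is defined, and at that point the name `pd` is not bound. The proper fix belongs in the repository, but as a stopgap the sketch below adds `import pandas as pd` near the top of a source string if no such import appears before the first top-level definition. The helper name `ensure_pandas_import` is hypothetical, and the sketch assumes a missing or late import is the only problem.

```python
import ast


def _imports_pandas_as_pd(node):
    """True if `node` is a top-level `import pandas as pd` statement."""
    return isinstance(node, ast.Import) and any(
        alias.name == "pandas" and alias.asname == "pd" for alias in node.names
    )


def ensure_pandas_import(source):
    """Return `source` with `import pandas as pd` added near the top if needed."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            break  # reached the first definition without seeing the import
        if _imports_pandas_as_pd(node):
            return source  # already imported early enough

    # Skip the module docstring and `from __future__` imports, which must
    # stay ahead of ordinary imports.
    insert_at = 0
    for index, node in enumerate(tree.body):
        is_docstring = (
            index == 0
            and isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        )
        is_future = isinstance(node, ast.ImportFrom) and node.module == "__future__"
        if not (is_docstring or is_future):
            break
        insert_at = node.end_lineno

    lines = source.splitlines(keepends=True)
    # Keep a shebang or coding comment on the first two lines.
    while insert_at < min(2, len(lines)) and lines[insert_at].startswith("#"):
        insert_at += 1
    if insert_at > 0 and not lines[insert_at - 1].endswith("\n"):
        lines[insert_at - 1] += "\n"
    lines.insert(insert_at, "import pandas as pd\n")
    return "".join(lines)
```

To apply it, read `FarmFederate_Colab.py` as UTF-8 text, pass the contents through `ensure_pandas_import`, and write the result back only if it changed. Keep a copy of the original file, since other imports may be missing as well.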


In [6]:
# 8) Verify script version/header before a full run
print('=== FarmFederate_Colab.py header ===')
with open('FarmFederate_Colab.py','r',encoding='utf-8') as f:
    for i, line in enumerate(f):
        if i>=20:
            break
        print(line.rstrip())


=== FarmFederate_Colab.py header ===
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
FarmFederate - Comprehensive Crop Stress Detection with Federated Learning + Qdrant

A complete Colab/Kaggle script for training and comparing multimodal models with
Qdrant-powered vector search, memory, and recommendations for societal impact.

Models:
- 5 LLM variants (DistilBERT, BERT-tiny, RoBERTa-tiny, ALBERT-tiny, MobileBERT)
- 5 ViT variants (ViT-Base, DeiT-tiny, Swin-tiny, ConvNeXT-tiny, EfficientNet)
- 8 VLM fusion architectures (concat, attention, gated, CLIP, Flamingo, BLIP2, CoCa, Unified-IO)

Comparisons:
- Intra-model: Same model type with different configurations (learning rates, architectures)
- Inter-model: Cross-comparison between LLM, ViT, and VLM approaches
- Dataset comparison: PlantVillage, PlantDoc, IP102, synthetic data
- Federated vs Centralized training



In [7]:
# 9) Quick smoke test (fast verification)
!python FarmFederate_Colab.py --auto-smoke --smoke-samples 10


Traceback (most recent call last):
  File "/content/FarmFederate-Advisor/FarmFederate_Colab.py", line 331, in <module>
    def generate_synthetic_text_data(n_samples: int = 500) -> pd.DataFrame:
                                                              ^^
NameError: name 'pd' is not defined. Did you mean: 'id'?


In [8]:
# 10) Show results/plots produced by the smoke run (if any)
import os
from IPython.display import display, Image

print('\n== results/ contents ==')
for root, dirs, files in os.walk('results'):
    for f in files:
        print(os.path.join(root,f))
    break

print('\n== plots/ thumbnails ==')
if os.path.isdir('plots'):
    imgs = [os.path.join('plots',p) for p in os.listdir('plots') if p.lower().endswith(('.png','.jpg','.jpeg'))]
    for img in imgs[:6]:
        display(Image(img, width=420))
else:
    print('No plots/ found yet.')



== results/ contents ==
results/github_mirrors_proposed.json
results/github_clone_summary.json
results/github_api_suggestions.json
results/demo_farm_1_history.csv
results/github_api_query_keys.json
results/dataset_discovery_manifest.json
results/run_status.json

== plots/ thumbnails ==
No plots/ found yet.


In [9]:
# 11) Full training (long run) — recommended flags for v7.0
# Use Google Drive path for checkpoints if mounted
!python FarmFederate_Colab.py --train --epochs 12 --max-samples 600 --use-qdrant --checkpoint-dir /content/drive/MyDrive/FarmFederate/checkpoints

# Notes: Use Colab Pro for longer, stable runs. Monitor RAM/GPU during the run.

Traceback (most recent call last):
  File "/content/FarmFederate-Advisor/FarmFederate_Colab.py", line 331, in <module>
    def generate_synthetic_text_data(n_samples: int = 500) -> pd.DataFrame:
                                                              ^^
NameError: name 'pd' is not defined. Did you mean: 'id'?
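
Once the script imports cleanly, the training command can also be assembled in Python so that `--use-qdrant` is added only when Qdrant credentials are present. This is a sketch under assumptions: the helper name `build_train_command` is hypothetical, the flags are the ones used in the cell above, and whether the script fails without credentials is not confirmed by this notebook.

```python
import os
import shlex
import sys


def build_train_command(epochs=12, max_samples=600, checkpoint_dir=None, environ=None):
    """Build the argument list for a full training run."""
    environ = os.environ if environ is None else environ
    cmd = [
        sys.executable, "FarmFederate_Colab.py", "--train",
        "--epochs", str(epochs),
        "--max-samples", str(max_samples),
    ]
    # Only request Qdrant when both credentials are available.
    if environ.get("QDRANT_URL") and environ.get("QDRANT_API_KEY"):
        cmd.append("--use-qdrant")
    if checkpoint_dir:
        cmd += ["--checkpoint-dir", checkpoint_dir]
    return cmd


cmd = build_train_command(
    checkpoint_dir="/content/drive/MyDrive/FarmFederate/checkpoints"
)
print(shlex.join(cmd))  # inspect the command before launching it

# To launch it (not executed here):
# import subprocess
# subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to confirm the flags before committing GPU time to a long run.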


## 12) Troubleshooting & Tips

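- If CUDA disappears after installing packages, **restart the runtime** (Runtime → Restart runtime).
- Ensure `torch` + CUDA versions are compatible with the runtime. If needed, reinstall a specific torch wheel from https://pytorch.org.
- For Qdrant: if `--use-qdrant` fails, set `QDRANT_URL` and `QDRANT_API_KEY` as environment variables or use Qdrant Cloud.
- If every `FarmFederate_Colab.py` invocation fails with `NameError: name 'pd' is not defined` (as in the runs above), the annotation `-> pd.DataFrame` on line 331 is being evaluated before `pandas` has been imported as `pd`. Make sure `import pandas as pd` runs at module level before that definition, then rerun.
- If `git clone` reports a Git LFS smudge error ("This repository exceeded its LFS budget"), LFS-tracked files such as `backend/checkpoints/global_central.pt` were not downloaded. Cloning with `GIT_LFS_SKIP_SMUDGE=1` set skips those downloads and leaves pointer files in their place.
- To persist work across sessions, save checkpoints to Drive and download important artifacts.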
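
For the torch/CUDA compatibility tip, the version string printed by the GPU check already encodes the CUDA build of the installed wheel. The sketch below splits such a string into its parts; the helper name `describe_torch_build` is hypothetical, and it assumes the usual `+cuXYZ` / `+cpu` local-version suffix used by PyTorch wheels.

```python
import re


def describe_torch_build(version):
    """Split a torch version string such as '2.9.0+cu126' into its parts."""
    release, _, build = version.partition("+")
    # 'cu126' means CUDA 12.6, 'cu118' means CUDA 11.8; other suffixes
    # (for example 'cpu') carry no CUDA version.
    match = re.fullmatch(r"cu(\d+)(\d)", build)
    cuda = f"{match.group(1)}.{match.group(2)}" if match else None
    return {"release": release, "build": build or None, "cuda": cuda}


print(describe_torch_build("2.9.0+cu126"))
# {'release': '2.9.0', 'build': 'cu126', 'cuda': '12.6'}
```

Inside a live session, `torch.version.cuda` reports the same CUDA version directly; a `cpu` build or a `None` value there means a CPU-only wheel was installed.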


## 13) Open in Colab link

Click the link below to open this notebook directly in Colab:

https://colab.research.google.com/github/Solventerritory/FarmFederate-Advisor/blob/feature/multimodal-work/notebooks/Colab_Run_FarmFederate.ipynb

(If your repo/branch differs, replace the path accordingly.)