# Rice Diseases: Image-to-Graph Conversion (Data Preparation)

**Purpose**: Convert rice disease images to graph `.pt` files and package them in a zip archive

**Output**: `rice_diseases_graphs.zip` ready for Graphormer training

**Note**: This notebook does NOT need Graphormer/fairseq installed!

## 1. Setup

Mount Google Drive and clone the repository.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

print("✓ Drive mounted")

In [None]:
# Clone Graphormer (only need the examples code, not full install)
import os

if not os.path.exists('/content/Graphormer'):
    !git clone https://github.com/microsoft/Graphormer.git
    print("✓ Cloned repository")
else:
    print("✓ Repository exists")

%cd /content/Graphormer

## 2. Install ONLY Required Packages

**No Graphormer/fairseq needed** - only basic packages for image processing and graph construction.

In [None]:
# Install minimal dependencies
!pip install -q torch-geometric scikit-image pillow scipy matplotlib networkx tqdm

print("✓ Packages installed")

## 3. Copy Dataset from Drive

In [None]:
from examples.rice_diseases.colab_setup import copy_and_extract_dataset

# Extract dataset
data_dir = copy_and_extract_dataset(
    drive_zip_path="MyDrive/Rice_Diseases_Dataset/rice-diseases-image-dataset.zip",
    temp_dir="/tmp",
    extract_dir="/content/rice_diseases_data"
)

print(f"\n✓ Dataset at: {data_dir}")

## 4. Process Images → Graph .pt Files

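This step runs the standalone script `process_images.py` - no Graphormer dependencies! `--n_segments` sets the number of superpixels per image, and `--create_zip` also writes `rice_diseases_graphs.zip`.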

In [None]:
%cd /content/Graphormer/examples/rice_diseases

# Process images using standalone script
!python process_images.py \
  --image_dir /content/rice_diseases_data \
  --output_dir /content/rice_diseases_graphs \
  --n_segments 75 \
  --seed 42 \
  --create_zip

## 5. Verify Output

In [None]:
from pathlib import Path
import json

# Check files
processed_dir = Path("/content/rice_diseases_graphs/processed")
zip_file = Path("/content/rice_diseases_graphs.zip")

pt_files = list(processed_dir.glob("data_*.pt"))
metadata_file = processed_dir / "metadata.json"
split_file = processed_dir / "split_indices.pt"

print("=" * 60)
print("Output Files:")
print("=" * 60)
print(f"Graph files (.pt):  {len(pt_files)}")
print(f"metadata.json:      {'✓' if metadata_file.exists() else '✗'}")
print(f"split_indices.pt:   {'✓' if split_file.exists() else '✗'}")
print(f"Zip archive:        {'✓' if zip_file.exists() else '✗'}")

if zip_file.exists():
    zip_size_mb = zip_file.stat().st_size / (1024 * 1024)
    print(f"Zip size:           {zip_size_mb:.2f} MB")

# Load metadata
if metadata_file.exists():
    with open(metadata_file) as f:
        meta = json.load(f)
    
    print("\n" + "=" * 60)
    print("Dataset Statistics:")
    print("=" * 60)
    print(f"Total graphs:    {meta['num_graphs']}")
    print(f"Classes:         {', '.join(meta['class_names'])}")
    print(f"Train samples:   {meta['num_train']}")
    print(f"Val samples:     {meta['num_val']}")
    print(f"Test samples:    {meta['num_test']}")
    print(f"Superpixels:     {meta['n_segments']} per image")

print("\n" + "=" * 60)
print("✓ Data processing complete!")
print("=" * 60)

## 6. Generate Sample Visualizations

Create visualizations showing image → superpixels → graph conversion.
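
The superpixels → graph step can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `rice_image_to_graph.py`: the function name `segments_to_graph` is hypothetical, and it assumes node features are the mean RGB of each superpixel, edges connect superpixels that touch horizontally or vertically, and the edge feature is the Euclidean distance between the two mean colors. The label map would come from a segmentation step such as `skimage.segmentation.slic(image, n_segments=75, start_label=0)`.

```python
import numpy as np

def segments_to_graph(image, segments):
    """Hypothetical helper: build a region-adjacency graph.

    image:    (H, W, 3) array of RGB values
    segments: (H, W) integer label map with labels 0..K-1
    Returns (x, edge_index, edge_attr).
    """
    n_nodes = int(segments.max()) + 1
    flat_labels = segments.ravel()
    flat_pixels = image.reshape(-1, 3).astype(float)

    # Node features (assumed): mean RGB color of each superpixel.
    counts = np.bincount(flat_labels, minlength=n_nodes)
    sums = np.stack(
        [np.bincount(flat_labels, weights=flat_pixels[:, c], minlength=n_nodes)
         for c in range(3)],
        axis=1,
    )
    x = sums / np.maximum(counts, 1)[:, None]

    # Edges (assumed): label pairs that touch horizontally or vertically.
    right = np.stack([segments[:, :-1].ravel(), segments[:, 1:].ravel()], axis=1)
    down = np.stack([segments[:-1, :].ravel(), segments[1:, :].ravel()], axis=1)
    pairs = np.concatenate([right, down], axis=0)
    pairs = pairs[pairs[:, 0] != pairs[:, 1]]

    # Store both directions and drop duplicates.
    pairs = np.unique(np.concatenate([pairs, pairs[:, ::-1]], axis=0), axis=0)
    edge_index = pairs.T  # shape (2, E)

    # Edge feature (assumed): Euclidean distance between mean colors.
    diff = x[edge_index[0]] - x[edge_index[1]]
    edge_attr = np.linalg.norm(diff, axis=1, keepdims=True)
    return x, edge_index, edge_attr

# Toy example: a 4x4 image split into four 2x2 "superpixels".
toy_segments = np.array([[0, 0, 1, 1],
                         [0, 0, 1, 1],
                         [2, 2, 3, 3],
                         [2, 2, 3, 3]])
toy_colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
toy_image = toy_colors[toy_segments]  # shape (4, 4, 3)

x, edge_index, edge_attr = segments_to_graph(toy_image, toy_segments)
print(x.shape, edge_index.shape, edge_attr.shape)
```

The three arrays map directly onto the `x`, `edge_index`, and `edge_attr` fields of a `torch_geometric.data.Data` object once converted to tensors.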

In [None]:
import json
import os
import sys
sys.path.append('/content/Graphormer')

from examples.rice_diseases.visualize_graphs import visualize_image_to_graph
from examples.rice_diseases.rice_image_to_graph import ImageToGraphConverter
import matplotlib.pyplot as plt
from IPython.display import Image as IPImage, display

# Load metadata to find sample images
with open('/content/rice_diseases_graphs/processed/metadata.json') as f:
    metadata = json.load(f)

CLASS_NAMES = metadata['class_names']
image_paths = metadata['image_paths']
labels = metadata['labels']

print(f"Total images: {len(image_paths)}")
print(f"Classes: {CLASS_NAMES}")

In [None]:
# Create visualizations (2 samples per class)
converter = ImageToGraphConverter(n_segments=75)
viz_dir = "/content/rice_diseases_visualizations"
os.makedirs(viz_dir, exist_ok=True)

samples_per_class = 2
viz_files = []

print("Creating visualizations...")
print("-" * 60)

for class_idx, class_name in enumerate(CLASS_NAMES):
    # Find images of this class
    class_images = [path for path, label in zip(image_paths, labels)
                    if label == class_idx]

    for j, img_path in enumerate(class_images[:samples_per_class]):
        save_path = f"{viz_dir}/{class_name}_sample_{j+1}.png"
        try:
            fig = visualize_image_to_graph(img_path, converter, save_path=save_path)
            viz_files.append(save_path)
            plt.close(fig)
        except Exception as e:
            print(f"  Warning: Could not visualize {img_path}: {e}")

print(f"✓ Created {len(viz_files)} visualizations")

In [None]:
# Display visualizations
print("\n" + "=" * 60)
print("Sample Visualizations (Image → Superpixels → Graph)")
print("=" * 60)

for viz_file in viz_files:
    if os.path.exists(viz_file):
        # Strip the "_sample_N.png" suffix; also works if a class name contains "_"
        class_name = os.path.basename(viz_file).rsplit('_sample_', 1)[0]
        print(f"\n**{class_name}**")
        display(IPImage(filename=viz_file, width=900))

## 7. Download Zip File

Download `rice_diseases_graphs.zip` for use in the Graphormer training environment.

In [None]:
from google.colab import files

# Download zip file
print("Starting download...")
files.download('/content/rice_diseases_graphs.zip')

print("✓ Download started - check your browser Downloads folder")

## Summary

### What You Have Now:

1. **rice_diseases_graphs.zip** containing:
   - `data_0.pt`, `data_1.pt`, ... (individual graph files)
   - `split_indices.pt` (train/val/test splits)
   - `metadata.json` (dataset info)

2. **Visualizations** showing the conversion process
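
Before uploading the archive elsewhere, its contents can be checked with the standard library alone. The sketch below is illustrative: `summarize_archive` is a hypothetical helper, and because the folder layout inside the zip is not documented here, it matches on file names only and ignores any directory prefix.

```python
import zipfile
from pathlib import PurePosixPath

def summarize_archive(zip_path):
    """Hypothetical helper: count graph files and check for the two index files."""
    with zipfile.ZipFile(zip_path) as zf:
        # Keep base names only, so any folder prefix inside the zip is ignored.
        names = [PurePosixPath(n).name for n in zf.namelist() if not n.endswith("/")]

    graph_files = [n for n in names if n.startswith("data_") and n.endswith(".pt")]
    return {
        "num_graphs": len(graph_files),
        "has_split_indices": "split_indices.pt" in names,
        "has_metadata": "metadata.json" in names,
    }
```

In this notebook it would be called as `summarize_archive("/content/rice_diseases_graphs.zip")`; the reported `num_graphs` should agree with `num_graphs` in `metadata.json`.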

### Dataset Statistics:

- **4 classes**: BrownSpot, Healthy, Hispa, LeafBlast
- **Superpixels**: 75 per image (nodes in graph)
- **Node features**: RGB color (3D)
- **Edge features**: Color difference (1D)
- **Splits**: 70% train / 15% val / 15% test
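
A 70/15/15 split with a fixed seed can be produced along these lines. This is a sketch under assumptions, not the logic of `process_images.py`: the function name `split_indices` is hypothetical, and it uses a plain seeded shuffle, whereas the actual script may stratify by class or round differently.

```python
import random

def split_indices(n, train_frac=0.70, val_frac=0.15, seed=42):
    """Hypothetical helper: shuffle 0..n-1 with a fixed seed and cut into three parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so global state is untouched

    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],  # whatever remains
    }

splits = split_indices(100)
print({name: len(idx) for name, idx in splits.items()})
```

Using the same seed reproduces the same partition, which is why the split indices can be saved once and reused in the training environment.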

### Next Steps (in Graphormer environment):

1. Upload `rice_diseases_graphs.zip` to your Graphormer environment
2. Extract to `/content/rice_diseases_graphs/`
3. Run training:
   ```bash
   cd /content/Graphormer/examples/rice_diseases
   bash rice_diseases.sh
   ```

**That's it!** No need to reprocess images in the training environment.