<a href="https://colab.research.google.com/github/ChiriKamau/limaAI/blob/main/notebooks/CSVtoYMAL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚀 **YOLO Dataset YAML Generator Notebook**
## **Auto-creates a dataset YAML for EVERY annotation folder with COMPLETE class detection**

**Purpose:** Scan ALL CSV files → **Extract ALL unique classes** → **Generate `<folder_name>.yaml`** in each folder's `annotations/` subfolder  
**Result:** Every folder keeps its own reusable YAML, so you can train from any folder without regenerating it.

---

## **📋 TABLE OF CONTENTS**
1. [Setup & Mount Drive](#setup)
2. [CSV Path Mapping](#mapping)
3. [Auto-Detect ALL Classes](#classes)
4. [Generate data.yaml Files](#generate)
5. [Verify & Test](#verify)

<a id="setup"></a>
## 1️⃣ SETUP & MOUNT DRIVE

**Mounts Google Drive and imports required libraries**

In [1]:
from google.colab import drive
import pandas as pd
import os
import yaml

# Mount Google Drive
drive.mount('/content/drive')
print("✅ Drive mounted!")

Mounted at /content/drive
✅ Drive mounted!


<a id="mapping"></a>
## 2️⃣ CSV PATH MAPPING

**Defines ALL 10 annotation folders (same as your main notebook)**

In [2]:
# ALL your annotation folders
csv_image_mapping = [
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/phone1(ripe)/phone1(ripe).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/phone1(ripe)'},
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/Phone1(green)/phone1(green).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/Phone1(green)'},
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/phone2(batch1)/annotations/phone2(batch1).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/phone2(batch1)'},
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/phone2(batch2)/annotations/phone2(batch2).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/phone2(batch2)'},
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/phone2(batch3)/annotations/phone2(batch3).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/phone2(batch3)'},
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/phone2(batch4)/annotations/phone2(batch4).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/phone2(batch4)'},
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/phone2(batch5)/annotations/phone2(batch5).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/phone2(batch5)'},
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/camera_ripe(batch1)/annotations/camera_ripe(batch1).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/camera_ripe(batch1)'},
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/camera_ripe(batch2)/annotations/camera_ripe(batch2).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/camera_ripe(batch2)'},
    {'csv_path': '/content/drive/MyDrive/Tomato_dataset/camera_ripe(batch3)/annotations/camera_ripe(batch3).csv', 'folder': '/content/drive/MyDrive/Tomato_dataset/camera_ripe(batch3)'}
]

print(f"✅ Loaded {len(csv_image_mapping)} annotation folders")

✅ Loaded 10 annotation folders
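
Before scanning, it may help to confirm that every `csv_path` in the mapping actually exists, since a mistyped folder name (note the mixed `phone1`/`Phone1` casing above) would otherwise only surface as a read error in the next step. A minimal sketch, assuming the same list-of-dicts layout; `find_missing_csvs` is a hypothetical helper that is not part of the original notebook, and the demo uses throwaway placeholder paths rather than the real Drive folders:

```python
import os
import tempfile

def find_missing_csvs(mapping):
    """Return the csv_path of every entry whose CSV file is not on disk."""
    return [m['csv_path'] for m in mapping if not os.path.isfile(m['csv_path'])]

# Demo on a temporary directory (placeholder paths, not the real dataset)
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, 'present.csv')
    open(present, 'w').close()  # create an empty CSV so one entry exists
    demo_mapping = [
        {'csv_path': present, 'folder': tmp},
        {'csv_path': os.path.join(tmp, 'absent.csv'), 'folder': tmp},
    ]
    missing = find_missing_csvs(demo_mapping)
    missing_names = [os.path.basename(p) for p in missing]
    print(missing_names)
```

In the notebook itself you would call `find_missing_csvs(csv_image_mapping)` and fix any reported paths before moving on.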


<a id="classes"></a>
## 3️⃣ AUTO-DETECT ALL CLASSES

**Scans EVERY CSV → Extracts ALL unique `label_name` values → Creates class mapping**

In [3]:
# Collect ALL unique classes from ALL CSVs
all_classes = set()

for mapping in csv_image_mapping:
    try:
        df = pd.read_csv(mapping['csv_path'])
        unique_labels = df['label_name'].dropna().unique()
        all_classes.update(unique_labels)
        print(f"📁 {os.path.basename(mapping['folder'])}: {len(unique_labels)} classes")
    except Exception as e:
        print(f"⚠️ Error in {mapping['folder']}: {e}")

# Sort and create class IDs
class_names = sorted(all_classes)
class_map = {name: idx for idx, name in enumerate(class_names)}

print(f"\n🎯 **TOTAL UNIQUE CLASSES DETECTED: {len(class_names)}**")
print("\n📋 **COMPLETE CLASS MAPPING:**")
for i, cls in enumerate(class_names):
    print(f"  {i}: '{cls}'")

📁 phone1(ripe): 9 classes
📁 Phone1(green): 8 classes
📁 phone2(batch1): 9 classes
📁 phone2(batch2): 9 classes
📁 phone2(batch3): 8 classes
📁 phone2(batch4): 8 classes
📁 phone2(batch5): 8 classes
📁 camera_ripe(batch1): 9 classes
📁 camera_ripe(batch2): 9 classes
📁 camera_ripe(batch3): 8 classes

🎯 **TOTAL UNIQUE CLASSES DETECTED: 13**

📋 **COMPLETE CLASS MAPPING:**
  0: 'G.ber'
  1: 'G.healthy'
  2: 'G.lateblight'
  3: 'G.latebright'
  4: 'G.pests'
  5: 'G.spots'
  6: 'R.ber'
  7: 'R.healthy'
  8: 'R.lateblight'
  9: 'R.latebright'
  10: 'R.pest'
  11: 'R.pests'
  12: 'R.spots'
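
The mapping above contains pairs that look like spelling variants of one class (`G.latebright`/`G.lateblight`, `R.latebright`/`R.lateblight`, `R.pest`/`R.pests`). If they really are the same class, which is an assumption to confirm against the annotations, a small alias table could merge them before IDs are assigned. The sketch below is illustrative only; `LABEL_ALIASES` and `canonical_classes` are hypothetical names, not part of the original notebook:

```python
# Hypothetical alias table: variant spelling -> canonical label.
# ASSUMPTION: these variants are typos of the same class; verify before using.
LABEL_ALIASES = {
    'G.latebright': 'G.lateblight',
    'R.latebright': 'R.lateblight',
    'R.pest': 'R.pests',
}

def canonical_classes(labels, aliases=LABEL_ALIASES):
    """Replace known variants with their canonical label, then dedupe and sort."""
    return sorted({aliases.get(label, label) for label in labels})

# The 13 labels reported by the detection step above
detected = [
    'G.ber', 'G.healthy', 'G.lateblight', 'G.latebright', 'G.pests', 'G.spots',
    'R.ber', 'R.healthy', 'R.lateblight', 'R.latebright', 'R.pest', 'R.pests',
    'R.spots',
]

merged = canonical_classes(detected)
merged_map = {name: idx for idx, name in enumerate(merged)}
print(len(merged), merged)
```

Merging changes the class IDs, so the same replacement would also have to be applied to the `label_name` column before any label files are produced from the CSVs.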


<a id="generate"></a>
## 4️⃣ GENERATE data.yaml FILES

**Creates `<folder_name>.yaml` in EVERY folder's `annotations/` subfolder with:**
- `train`/`val` written as **relative paths** (`../images/train`, `../images/val`)
- **ALL detected classes**
- **Ready for YOLOv8 training**

In [12]:
# Generate FOLDER-NAMED YAML for EACH folder → SAVE IN /annotations/
successful_yamls = 0

for mapping in csv_image_mapping:
    folder = mapping['folder']
    folder_name = os.path.basename(folder)

    # Create /annotations/ path
    annotations_dir = os.path.join(folder, 'annotations')
    os.makedirs(annotations_dir, exist_ok=True)

    # Create FOLDER-NAMED yaml file
    yaml_filename = f"{folder_name}.yaml"
    yaml_path = os.path.join(annotations_dir, yaml_filename)

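    # Standard YOLO dataset config; train/val are relative paths
    yaml_data = {
        'train': '../images/train',
        'val': '../images/val',
        'nc': len(class_names),
        # Cast to plain str so PyYAML writes ordinary YAML strings
        'names': [str(name) for name in class_names],
    }

    try:
        with open(yaml_path, 'w') as f:
            # safe_dump handles quoting/escaping; sort_keys=False keeps the key order above
            yaml.safe_dump(yaml_data, f, sort_keys=False)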
        successful_yamls += 1

        print(f"✅ {folder_name}/annotations/{yaml_filename} CREATED")
        print(f"   📁 Path: {yaml_path}")
        print(f"   🎯 Classes: {len(class_names)}")

    except Exception as e:
        print(f"❌ Error creating YAML for {folder_name}: {e}")

print(f"\n🎉 **SUMMARY: {successful_yamls}/{len(csv_image_mapping)} FOLDER-NAMED .yaml files created in /annotations/!**")

✅ phone1(ripe)/annotations/phone1(ripe).yaml CREATED
   📁 Path: /content/drive/MyDrive/Tomato_dataset/phone1(ripe)/annotations/phone1(ripe).yaml
   🎯 Classes: 13
✅ Phone1(green)/annotations/Phone1(green).yaml CREATED
   📁 Path: /content/drive/MyDrive/Tomato_dataset/Phone1(green)/annotations/Phone1(green).yaml
   🎯 Classes: 13
✅ phone2(batch1)/annotations/phone2(batch1).yaml CREATED
   📁 Path: /content/drive/MyDrive/Tomato_dataset/phone2(batch1)/annotations/phone2(batch1).yaml
   🎯 Classes: 13
✅ phone2(batch2)/annotations/phone2(batch2).yaml CREATED
   📁 Path: /content/drive/MyDrive/Tomato_dataset/phone2(batch2)/annotations/phone2(batch2).yaml
   🎯 Classes: 13
✅ phone2(batch3)/annotations/phone2(batch3).yaml CREATED
   📁 Path: /content/drive/MyDrive/Tomato_dataset/phone2(batch3)/annotations/phone2(batch3).yaml
   🎯 Classes: 13
✅ phone2(batch4)/annotations/phone2(batch4).yaml CREATED
   📁 Path: /content/drive/MyDrive/Tomato_dataset/phone2(batch4)/annotations/phone2(batch4).yaml
   🎯 Clas
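
Because each YAML is saved inside `annotations/`, the `../images/train` and `../images/val` entries point one level up from that subfolder. The sketch below shows where they land, assuming the trainer resolves relative entries against the YAML file's own directory (worth confirming for your Ultralytics version). `resolve_from_yaml` is a hypothetical helper, and `posixpath` is used because Colab paths are POSIX-style:

```python
import posixpath

def resolve_from_yaml(yaml_path, relative_entry):
    """Resolve a relative train/val entry against the YAML file's directory."""
    yaml_dir = posixpath.dirname(yaml_path)
    return posixpath.normpath(posixpath.join(yaml_dir, relative_entry))

yaml_path = '/content/drive/MyDrive/Tomato_dataset/phone1(ripe)/annotations/phone1(ripe).yaml'
train_dir = resolve_from_yaml(yaml_path, '../images/train')
val_dir = resolve_from_yaml(yaml_path, '../images/val')
print(train_dir)
print(val_dir)
```

Under that assumption, each folder needs `images/train` and `images/val` directories sitting next to `annotations/` for the generated YAML to be usable.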

<a id="verify"></a>
## 5️⃣ VERIFY & TEST

**Displays sample YAML content + confirms files exist**

In [13]:
# Verify all YAML files
print("📋 **VERIFICATION REPORT:**")
print("="*60)
print(f"{'FOLDER':<25} | {'YAML FILE':<25} | {'STATUS'}")
print("-"*60)

sample_path = None  # first existing phone1 YAML, printed after the table

for mapping in csv_image_mapping:
    folder = mapping['folder']
    folder_name = os.path.basename(folder)
    yaml_filename = f"{folder_name}.yaml"
    yaml_path = os.path.join(folder, 'annotations', yaml_filename)

    exists = os.path.exists(yaml_path)
    status = "✅ EXISTS" if exists else "❌ MISSING"
    print(f"{folder_name:<25} | {yaml_filename:<25} | {status}")

    # Remember one sample instead of breaking, so every folder is still reported
    if exists and sample_path is None and 'phone1' in folder:
        sample_path = yaml_path

# Show the sample only after the full table has been printed
if sample_path is not None:
    print(f"\n📄 **SAMPLE {os.path.basename(sample_path)}:**\n")
    with open(sample_path, 'r') as f:
        print(f.read())

print(f"\n🚀 **ALL SET! Train YOLOv8 with:**")
print(f"yolo train data=/content/drive/MyDrive/Tomato_dataset/phone1(ripe)/annotations/phone1(ripe).yaml model=yolov8n.pt epochs=100")

📋 **VERIFICATION REPORT:**
FOLDER                    | YAML FILE                 | STATUS
------------------------------------------------------------
phone1(ripe)              | phone1(ripe).yaml         | ❌ MISSING
Phone1(green)             | Phone1(green).yaml        | ❌ MISSING
phone2(batch1)            | phone2(batch1).yaml       | ✅ EXISTS
phone2(batch2)            | phone2(batch2).yaml       | ❌ MISSING
phone2(batch3)            | phone2(batch3).yaml       | ❌ MISSING
phone2(batch4)            | phone2(batch4).yaml       | ❌ MISSING
phone2(batch5)            | phone2(batch5).yaml       | ❌ MISSING
camera_ripe(batch1)       | camera_ripe(batch1).yaml  | ❌ MISSING
camera_ripe(batch2)       | camera_ripe(batch2).yaml  | ✅ EXISTS
camera_ripe(batch3)       | camera_ripe(batch3).yaml  | ✅ EXISTS

🚀 **ALL SET! Train YOLOv8 with:**
yolo train data=/content/drive/MyDrive/Tomato_dataset/phone1(ripe)/annotations/phone1(ripe).yaml model=yolov8n.pt epochs=100
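
The folder names contain parentheses, which POSIX shells treat as special characters, so the command above would likely fail with a syntax error if pasted into a terminal unquoted. A minimal sketch of building a shell-safe version with the standard-library `shlex.quote`; the path and the `yolo train` arguments are the ones printed above:

```python
import shlex

yaml_path = '/content/drive/MyDrive/Tomato_dataset/phone1(ripe)/annotations/phone1(ripe).yaml'

# shlex.quote wraps the path in single quotes because it contains ( and )
command = f"yolo train data={shlex.quote(yaml_path)} model=yolov8n.pt epochs=100"
print(command)
```

The printed command can then be pasted into a shell, or prefixed with `!` in a Colab cell.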
