# 🍎 Apple Detection - Google Colab Setup (Complete Guide)

This notebook provides a **complete setup from scratch** - no Kaggle account needed!

## 📋 Steps:
1. **Enable GPU runtime** (Important!)
2. **Upload project files** (or clone from GitHub)
3. **Install dependencies**
4. **Download Fruit Detection Dataset** (no account needed)
5. **Filter dataset** to extract only apple images
6. **Train the model**
7. **Run inference**
8. **Save to Google Drive**

---

## 🎯 What This Notebook Does:
- ✅ Downloads the Fruit Detection Dataset (8479 images, 6 fruits)
- ✅ Filters to extract ONLY apple images automatically
- ✅ Splits into train/val/test sets
- ✅ Prepares everything for training
- ✅ No Kaggle account required!


## Step 1: Enable GPU Runtime

**Important:** Before running any cells, enable GPU:
1. Go to **Runtime** → **Change runtime type**
2. Set **Hardware accelerator** to **GPU** (T4)
3. Click **Save**

Let's verify GPU is available:


In [None]:
import torch

# Check if GPU is available
if torch.cuda.is_available():
    print(f"✅ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"   CUDA Version: {torch.version.cuda}")
    print(f"   GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
else:
    print("⚠️  GPU not available. Please enable GPU runtime.")
    print("   Go to Runtime → Change runtime type → GPU")


## Step 2: Clone or Upload Project

### Option A: Clone from GitHub (if you have a repository)


In [None]:
# Option 1: Clone from GitHub (if you have a repository)
# Uncomment and modify (later cells expect PROJECT_DIR to be a pathlib.Path):
# !git clone https://github.com/yourusername/apple-detection.git
# %cd apple-detection
# from pathlib import Path
# PROJECT_DIR = Path('/content/apple-detection')

# Option 2: Upload project as ZIP (see next cell)
print("📦 If you have a GitHub repo, uncomment the lines above.")
print("   Otherwise, upload your project as a ZIP file in the next cell.")


### Option B: Upload Project Files

If you don't have a GitHub repo, you can upload your project files:
1. Create a zip file of your project (excluding `venv/`, `__pycache__/`, etc.)
2. Upload it using the cell below
3. Extract it
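
The ZIP described in step 1 can also be built with a short script on your own machine. This is a minimal sketch, not part of the project: `zip_project` and `EXCLUDED_DIRS` are hypothetical names, and the excluded folders are simply the ones mentioned above.

```python
import os
import zipfile
from pathlib import Path

# Directory names to leave out of the archive (assumption: same list as in the steps above)
EXCLUDED_DIRS = {"venv", "__pycache__", ".git"}

def zip_project(project_root, zip_path, excluded=EXCLUDED_DIRS):
    """Write project_root into zip_path, skipping excluded directory names."""
    project_root = Path(project_root).resolve()
    zip_path = Path(zip_path).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(project_root):
            # Prune excluded directories in place so os.walk never descends into them
            dirnames[:] = [d for d in dirnames if d not in excluded]
            for name in filenames:
                file_path = Path(dirpath) / name
                if file_path == zip_path:
                    continue  # do not add the archive to itself
                # Keep the project folder name as the top-level entry in the archive
                arcname = Path(project_root.name) / file_path.relative_to(project_root)
                zf.write(file_path, arcname.as_posix())
    return zip_path
```

Keeping the project folder as the single top-level entry means the archive extracts to one directory under `/content`, which makes the project directory easier to locate after upload.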


In [None]:
from google.colab import files
import zipfile
import os
from pathlib import Path

print("📤 Upload your project ZIP file:")
print("   1. Create a ZIP of your project (exclude venv/, __pycache__/, .git/)")
print("   2. Click 'Choose Files' below")
print("   3. Select your ZIP file")

# Upload zip file
uploaded = files.upload()

# Extract if zip file was uploaded
project_dir = None
# Folders that can exist in /content before any upload and must not be mistaken for the project
preexisting = {'sample_data', 'drive', '__MACOSX'}
for filename in uploaded.keys():
    if filename.endswith('.zip'):
        print(f"\n📦 Extracting {filename}...")
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall('/content')
        print("✅ Extracted to /content/")

        # Try to find project directory (skip hidden and pre-existing Colab folders)
        extracted_dirs = sorted(
            d for d in Path('/content').iterdir()
            if d.is_dir() and not d.name.startswith('.') and d.name not in preexisting
        )
        if extracted_dirs:
            project_dir = extracted_dirs[0]
            print(f"✅ Project directory: {project_dir}")

        # Remove zip file
        os.remove(filename)

# Set project directory
if project_dir:
    PROJECT_DIR = project_dir
    print("✅ Project uploaded successfully!")
else:
    PROJECT_DIR = Path('/content/apple-detection')
    PROJECT_DIR.mkdir(exist_ok=True)
    print("⚠️  No project ZIP was extracted; using an empty project directory.")

print(f"\n📁 Project directory: {PROJECT_DIR}")


## Step 3: Install Dependencies

Install all required packages for the project:


In [None]:
# Install PyTorch with CUDA support (Colab usually has this preinstalled;
# the cu118 index may not match the runtime's CUDA version, so skip this line if Step 1 already found the GPU)
print("📦 Installing PyTorch with CUDA...")
%pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install other dependencies
print("\n📦 Installing project dependencies...")
import subprocess
import sys

req_file = PROJECT_DIR / 'requirements.txt'
if req_file.exists():
    # Use the notebook's own interpreter so packages land in the active environment
    subprocess.run([sys.executable, '-m', 'pip', 'install', '-r', str(req_file)], check=True)
else:
    # If requirements.txt doesn't exist, install manually
    print("⚠️  requirements.txt not found, installing manually...")
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'numpy', 'opencv-python', 'pillow', 'pyyaml',
                    'tqdm', 'matplotlib', 'seaborn', 'albumentations'], check=True)

# Verify installation
print("\n✅ Verifying installation...")
import torch
import torchvision
import numpy as np
import cv2
import yaml

print("\n✅ All dependencies installed successfully!")
print(f"   PyTorch: {torch.__version__}")
print(f"   Torchvision: {torchvision.__version__}")
print(f"   NumPy: {np.__version__}")
print(f"   OpenCV: {cv2.__version__}")


## Step 4: Download Fruit Detection Dataset

**No Kaggle account needed!** We'll download the dataset directly.

### Option A: Download from Direct Link (Recommended)

If you have a direct download link to the dataset ZIP file:


In [None]:
import os
import zipfile
from pathlib import Path

# ============================================
# METHOD 1: Direct Download Link
# ============================================
# If you have a direct download link, paste it here:
DATASET_URL = ""  # Paste your dataset download URL here

if DATASET_URL:
    print("📥 Downloading dataset from URL...")
    import subprocess
    subprocess.run(['wget', '-O', 'fruit-dataset.zip', DATASET_URL], check=True)
    
    print("📦 Extracting dataset...")
    with zipfile.ZipFile('fruit-dataset.zip', 'r') as zip_ref:
        zip_ref.extractall('/content')
    
    os.remove('fruit-dataset.zip')
    print("✅ Dataset downloaded and extracted!")
    
    # Find extracted directory
    extracted_dirs = [d for d in Path('/content').iterdir() 
                      if d.is_dir() and 'fruit' in d.name.lower()]
    if extracted_dirs:
        DATASET_PATH = extracted_dirs[0]
    else:
        DATASET_PATH = Path('/content/fruit-dataset')
    print(f"✅ Dataset path: {DATASET_PATH}")
else:
    print("⚠️  No URL provided. Use Option B or C below.")
    DATASET_PATH = None
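
The cell above guesses the dataset folder by looking for `fruit` in directory names under `/content`. A less fragile option is to read the archive's own top-level entries before extracting. The sketch below uses only the standard `zipfile` module; `top_level_dirs` is a hypothetical helper name, not part of the project.

```python
import zipfile
from pathlib import PurePosixPath

def top_level_dirs(zip_path):
    """Return the sorted top-level directory names stored in a ZIP archive."""
    names = set()
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            parts = PurePosixPath(member).parts
            # Only entries nested at least one level deep imply a top-level folder
            if len(parts) > 1 and parts[0] != "__MACOSX":
                names.add(parts[0])
    return sorted(names)
```

If the function returns exactly one name, `Path('/content') / name` is where `extractall('/content')` will place the dataset. If it returns several names or none, the archive has no single root folder and `DATASET_PATH` needs to be set by hand.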


### Option B: Upload Dataset ZIP File

Upload the dataset ZIP file from your computer:


In [None]:
from google.colab import files
from pathlib import Path
import zipfile
import os

print("📤 Upload the Fruit Detection Dataset ZIP file:")
print("   1. Download the dataset from the source")
print("   2. Click 'Choose Files' below")
print("   3. Select the ZIP file")

# Upload dataset zip file
uploaded = files.upload()

# Extract dataset
DATASET_PATH = None
for filename in uploaded.keys():
    if filename.endswith('.zip'):
        print(f"\n📦 Extracting {filename}...")
        extract_dir = Path('/content') / filename.replace('.zip', '')
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall('/content')
        
        # Find extracted directory
        extracted_dirs = [d for d in Path('/content').iterdir() 
                          if d.is_dir() and d.name != '__MACOSX' 
                          and (d.name == filename.replace('.zip', '') or 'fruit' in d.name.lower())]
        if extracted_dirs:
            DATASET_PATH = extracted_dirs[0]
        else:
            DATASET_PATH = extract_dir
        
        print(f"✅ Dataset extracted to: {DATASET_PATH}")
        os.remove(filename)
        break

if DATASET_PATH:
    print(f"\n✅ Dataset ready at: {DATASET_PATH}")
else:
    print("\n⚠️  No dataset found. Please upload the ZIP file.")


### Option C: Mount Google Drive

If your dataset is already in Google Drive:


In [None]:
from google.colab import drive
from pathlib import Path

# Mount Google Drive
print("🔗 Mounting Google Drive...")
drive.mount('/content/drive')

# Set path to dataset in Drive (adjust as needed)
DRIVE_DATASET_PATH = "/content/drive/MyDrive/fruit-detection-dataset"  # Adjust this path

if Path(DRIVE_DATASET_PATH).exists():
    DATASET_PATH = Path(DRIVE_DATASET_PATH)
    print(f"✅ Found dataset in Drive: {DATASET_PATH}")
else:
    print(f"⚠️  Dataset not found at: {DRIVE_DATASET_PATH}")
    print("   Please adjust DRIVE_DATASET_PATH above or upload the dataset.")
    DATASET_PATH = None


## Step 5: Filter Dataset for Apple Images Only

Now we'll filter the multi-fruit dataset to extract ONLY apple images:


In [None]:
# First, let's check if we have the dataset
if 'DATASET_PATH' not in locals() or DATASET_PATH is None:
    print("❌ Error: Dataset not found!")
    print("   Please complete Step 4 first (download/upload dataset)")
else:
    print(f"📁 Dataset path: {DATASET_PATH}")

    # Check dataset structure
    print("\n🔍 Checking dataset structure...")
    if DATASET_PATH.exists():
        items = list(DATASET_PATH.iterdir())[:10]
        print(f"   Found {len(list(DATASET_PATH.iterdir()))} items")
        for item in items:
            if item.is_dir():
                file_count = len(list(item.rglob('*.jpg')) + list(item.rglob('*.png')))
                print(f"   📁 {item.name}/ ({file_count} images)")
            else:
                print(f"   📄 {item.name}")
    else:
        print(f"   ❌ Path does not exist: {DATASET_PATH}")

    # Now run the filtering script
    print("\n🍎 Starting to filter for apple images...")
    print("   This will extract only images containing apples.")
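
The filtering step relies on each label line starting with a class ID. The sketch below shows that idea on an in-memory string. It assumes YOLO-style text labels (`class x_center y_center width height`) with class `0` standing for apples, as in the class mapping used by this notebook's filtering script. The sample values are made up, and `keep_class` is a hypothetical helper.

```python
def keep_class(label_text, class_id="0"):
    """Return only the label lines whose first token equals class_id."""
    kept = []
    for line in label_text.splitlines():
        parts = line.split()
        if parts and parts[0] == class_id:
            kept.append(line.strip())
    return kept

# Made-up label file content: two apple boxes (class 0) and one box of class 3
sample = """0 0.50 0.50 0.20 0.30
3 0.25 0.40 0.10 0.10
0 0.70 0.20 0.15 0.15
"""
apple_lines = keep_class(sample)
```

An image counts as an apple image when this list is non-empty, and only the kept lines are written to the filtered annotation file.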


In [None]:
# Copy the filtering script to Colab
import sys
from pathlib import Path

# Create scripts directory if it doesn't exist
scripts_dir = PROJECT_DIR / 'scripts'
scripts_dir.mkdir(exist_ok=True)

# Read the filtering script (we'll create it inline)
filter_script = """
import os
import shutil
import random
from pathlib import Path
from collections import Counter

def find_dataset_structure(dataset_path):
    dataset_path = Path(dataset_path)
    possible_structures = [
        (dataset_path / 'images', dataset_path / 'labels'),
        (dataset_path / 'train' / 'images', dataset_path / 'train' / 'labels'),
        (dataset_path / 'images' / 'train', dataset_path / 'labels' / 'train'),
        (dataset_path / 'train', dataset_path / 'train'),
    ]
    
    for img_path, lbl_path in possible_structures:
        if img_path.exists():
            label_locations = [
                lbl_path,
                img_path.parent / 'labels',
                dataset_path / 'labels',
                img_path.parent.parent / 'labels',
            ]
            for label_path in label_locations:
                if label_path.exists():
                    return img_path, label_path
    
    if (dataset_path / 'images').exists():
        return dataset_path / 'images', dataset_path / 'labels'
    return None, None

def analyze_classes(labels_dir, sample_size=100):
    print("\\n📊 Analyzing class distribution...")
    class_counts = Counter()
    annotation_files = list(labels_dir.glob('*.txt')) or list(labels_dir.rglob('*.txt'))
    sample_files = annotation_files[:min(sample_size, len(annotation_files))]
    
    for ann_file in sample_files:
        try:
            with open(ann_file, 'r') as f:
                for line in f:
                    parts = line.strip().split()
                    if parts:
                        class_counts[parts[0]] += 1
        except (OSError, UnicodeDecodeError):
            # Skip label files that cannot be read
            continue
    
    fruit_names = {'0': 'Apple', '1': 'Grapes', '2': 'Pineapple', '3': 'Orange', '4': 'Banana', '5': 'Watermelon'}
    for class_id in sorted(class_counts.keys()):
        fruit_name = fruit_names.get(class_id, f'Class {class_id}')
        print(f"  Class {class_id} ({fruit_name}): {class_counts[class_id]} boxes")
    
    return '0'

def filter_apple_annotations(input_ann_path, output_ann_path, apple_class_id='0'):
    apple_boxes = []
    try:
        with open(input_ann_path, 'r') as f:
            for line in f:
                parts = line.strip().split()
                if parts and parts[0] == apple_class_id:
                    apple_boxes.append(line.strip())
        if apple_boxes:
            with open(output_ann_path, 'w') as f:
                for box in apple_boxes:
                    f.write(box + '\\n')
            return True
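    except (OSError, UnicodeDecodeError):
        # Treat unreadable or unwritable annotation files as containing no apples
        pass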
    return False

def find_corresponding_annotation(image_path, labels_dir):
    possible_paths = [
        labels_dir / (image_path.stem + '.txt'),
        labels_dir / image_path.name.replace(image_path.suffix, '.txt'),
        image_path.parent.parent / 'labels' / (image_path.stem + '.txt'),
        image_path.parent / (image_path.stem + '.txt'),
    ]
    for path in possible_paths:
        if path.exists():
            return path
    return None

# Main filtering function
def filter_and_prepare_dataset(dataset_path, output_dir, apple_class_id='0', seed=42):
    dataset_path = Path(dataset_path)
    output_dir = Path(output_dir)
    
    for split in ['train', 'val', 'test']:
        (output_dir / 'images' / split).mkdir(parents=True, exist_ok=True)
        (output_dir / 'annotations' / split).mkdir(parents=True, exist_ok=True)
    
    images_dir, labels_dir = find_dataset_structure(dataset_path)
    if not images_dir or not images_dir.exists():
        print("❌ Error: Could not find images directory")
        return False
    
    if not labels_dir or not labels_dir.exists():
        label_files = list(dataset_path.rglob('*.txt'))
        if label_files:
            labels_dir = label_files[0].parent
        else:
            print("❌ Error: No annotation files found!")
            return False
    
    print(f"✅ Found images in: {images_dir}")
    print(f"✅ Found labels in: {labels_dir}")
    
    # Report the class distribution only; keep the apple_class_id passed by the caller
    analyze_classes(labels_dir)
    
    all_images = list(set(
        list(images_dir.glob('*.jpg')) + list(images_dir.glob('*.png')) +
        list(images_dir.glob('*.JPG')) + list(images_dir.glob('*.PNG')) +
        list(images_dir.rglob('*.jpg')) + list(images_dir.rglob('*.png'))
    ))
    print(f"\\nFound {len(all_images)} total images")
    
    apple_images = []
    for img_path in all_images:
        ann_path = find_corresponding_annotation(img_path, labels_dir)
        if ann_path and ann_path.exists():
            temp_ann = output_dir / 'temp_check.txt'
            if filter_apple_annotations(ann_path, temp_ann, apple_class_id):
                apple_images.append((img_path, ann_path))
            if temp_ann.exists():
                temp_ann.unlink()
    
    print(f"✅ Found {len(apple_images)} images containing apples")
    
    if len(apple_images) == 0:
        print("❌ Error: No apple images found!")
        return False
    
    random.seed(seed)
    random.shuffle(apple_images)
    
    train_count = int(0.7 * len(apple_images))
    val_count = int(0.15 * len(apple_images))
    
    train_data = apple_images[:train_count]
    val_data = apple_images[train_count:train_count + val_count]
    test_data = apple_images[train_count + val_count:]
    
    print(f"\\n📦 Split: Train={len(train_data)}, Val={len(val_data)}, Test={len(test_data)}")
    
    def copy_split(data_list, split_name):
        for img_path, ann_path in data_list:
            dest_img = output_dir / 'images' / split_name / img_path.name
            shutil.copy(img_path, dest_img)
            dest_ann = output_dir / 'annotations' / split_name / (img_path.stem + '.txt')
            filter_apple_annotations(ann_path, dest_ann, apple_class_id)
        print(f"  ✅ {split_name}: {len(data_list)} images")
    
    copy_split(train_data, 'train')
    copy_split(val_data, 'val')
    copy_split(test_data, 'test')
    
    print("\\n🎉 Dataset ready!")
    return True

# Run the filtering
if 'DATASET_PATH' in locals() and DATASET_PATH:
    OUTPUT_DIR = PROJECT_DIR / 'data'
    success = filter_and_prepare_dataset(DATASET_PATH, OUTPUT_DIR)
    if success:
        print(f"\\n✅ Filtered dataset saved to: {OUTPUT_DIR}")
    else:
        print("\\n❌ Filtering failed. Please check the dataset structure.")
else:
    print("❌ Please complete Step 4 first to download/upload the dataset.")
"""

# Save and execute the script
script_path = scripts_dir / 'filter_dataset.py'
with open(script_path, 'w') as f:
    f.write(filter_script)

print("✅ Filtering script created")
exec(filter_script)
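
The filtering script splits the apple images 70% / 15% / 15%, with the test set taking whatever remains after rounding down. The sketch below reproduces that arithmetic in isolation so the split sizes can be checked before copying files. `split_70_15_15` is a hypothetical name, and it uses a local `random.Random(seed)` instead of the script's global `random.seed`, so the shuffle order is not guaranteed to match the script's.

```python
import random

def split_70_15_15(items, seed=42):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    items = list(items)
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    rng.shuffle(items)
    train_count = int(0.7 * len(items))
    val_count = int(0.15 * len(items))
    train = items[:train_count]
    val = items[train_count:train_count + val_count]
    test = items[train_count + val_count:]  # remainder goes to the test set
    return train, val, test

train, val, test = split_70_15_15(range(100))
```

Because both counts are rounded down, small datasets end up with a test share slightly above 15%.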


### Verify Filtered Dataset

Let's check the final dataset structure:


In [None]:
from pathlib import Path

data_dir = PROJECT_DIR / 'data'

print("📁 Final Dataset Structure:")
print(f"   Root: {data_dir}")

total_images = 0
total_boxes = 0

for split in ['train', 'val', 'test']:
    img_dir = data_dir / 'images' / split
    ann_dir = data_dir / 'annotations' / split
    
    img_count = len(list(img_dir.glob('*.jpg')) + list(img_dir.glob('*.png'))) if img_dir.exists() else 0
    ann_count = len(list(ann_dir.glob('*.txt'))) if ann_dir.exists() else 0
    
    # Count total apple boxes
    boxes = 0
    if ann_dir.exists():
        for ann_file in ann_dir.glob('*.txt'):
            with open(ann_file, 'r') as f:
                boxes += len([l for l in f if l.strip()])
    
    total_images += img_count
    total_boxes += boxes
    
    print(f"\n{split.upper()}:")
    print(f"  Images: {img_count}")
    print(f"  Annotations: {ann_count}")
    print(f"  Apple bounding boxes: {boxes}")
    if img_count > 0:
        print(f"  Avg boxes per image: {boxes/img_count:.2f}")

print("\n📊 Summary:")
print(f"  Total images: {total_images}")
print(f"  Total apple boxes: {total_boxes}")
if total_images > 0:
    print(f"  Average boxes per image: {total_boxes/total_images:.2f}")

if total_images > 0:
    print("\n✅ Dataset is ready for training!")
else:
    print("\n⚠️  No images found. Please check the filtering process.")
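
Matching image and annotation counts do not guarantee that every image has its own label file. A minimal sketch of a pairing check follows, assuming the `images/<split>` and `annotations/<split>` layout produced in Step 5. `find_unpaired` and `IMAGE_SUFFIXES` are hypothetical names.

```python
from pathlib import Path

# Image extensions to consider (assumption: adjust to match your dataset)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def find_unpaired(images_dir, annotations_dir):
    """Return (images without a label file, label files without an image) as sorted stems."""
    images_dir, annotations_dir = Path(images_dir), Path(annotations_dir)
    image_stems = {p.stem for p in images_dir.iterdir()
                   if p.suffix.lower() in IMAGE_SUFFIXES}
    label_stems = {p.stem for p in annotations_dir.glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For example, `find_unpaired(data_dir / 'images' / 'train', data_dir / 'annotations' / 'train')` should return two empty lists for a clean split.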


## Step 6: Train the Model

Now you're ready to train! Use the Colab-optimized configuration:


In [None]:
# Add src to path
import sys
sys.path.insert(0, str(PROJECT_DIR / 'src'))

# Check if config file exists
config_path = PROJECT_DIR / 'configs' / 'config_colab.yaml'
if not config_path.exists():
    print("⚠️  config_colab.yaml not found.")
    print("   Create it manually or copy an existing config into the configs/ folder.")
    config_path.parent.mkdir(parents=True, exist_ok=True)

print(f"📁 Project directory: {PROJECT_DIR}")
print(f"📁 Config file: {config_path}")

# Train the model
print("\n🚀 Ready to train!")
print("   Uncomment the line below to start training:")
print(f"   !python {PROJECT_DIR}/src/train.py --config {config_path}")

# Uncomment to start training:
# !python {PROJECT_DIR}/src/train.py --config {config_path}

# Or, if your train.py exposes a train_model function (src/ is already on sys.path):
# from train import train_model
# train_model(config_path=str(config_path))


## Step 7: Run Inference

Test your trained model on new images:


In [None]:
# Upload a test image
from google.colab import files
from IPython.display import Image, display

print("📤 Upload a test image to run inference:")
uploaded = files.upload()

# Run inference (adjust paths as needed)
# Uncomment to run inference:
# !python {PROJECT_DIR}/src/inference.py \
#     --image /content/your_test_image.jpg \
#     --checkpoint {PROJECT_DIR}/checkpoints/best_model.pth \
#     --output {PROJECT_DIR}/results/detection.jpg \
#     --config {PROJECT_DIR}/configs/config_colab.yaml

# Display result
# display(Image(f'{PROJECT_DIR}/results/detection.jpg'))


## Step 8: Save Your Work

**Important:** Colab sessions are temporary. Save your checkpoints and results to Google Drive:


In [None]:
# Mount drive if not already mounted
from google.colab import drive
from pathlib import Path
import shutil

try:
    drive.mount('/content/drive')
    print("✅ Google Drive mounted")
except Exception as err:
    print(f"⚠️  Drive mount failed: {err}")

# Create backup directory in Drive
backup_dir = Path('/content/drive/MyDrive/apple-detection-backup')
backup_dir.mkdir(parents=True, exist_ok=True)

print(f"\n💾 Saving to: {backup_dir}")

# Copy checkpoints
checkpoints_dir = PROJECT_DIR / 'checkpoints'
if checkpoints_dir.exists():
    dest_checkpoints = backup_dir / 'checkpoints'
    if dest_checkpoints.exists():
        shutil.rmtree(dest_checkpoints)
    shutil.copytree(checkpoints_dir, dest_checkpoints, dirs_exist_ok=True)
    print("✅ Checkpoints saved to Drive!")

# Copy results
results_dir = PROJECT_DIR / 'results'
if results_dir.exists():
    dest_results = backup_dir / 'results'
    if dest_results.exists():
        shutil.rmtree(dest_results)
    shutil.copytree(results_dir, dest_results, dirs_exist_ok=True)
    print("✅ Results saved to Drive!")

# Copy config for reference
config_file = PROJECT_DIR / 'configs' / 'config_colab.yaml'
if config_file.exists():
    shutil.copy(config_file, backup_dir / 'config_colab.yaml')
    print("✅ Config saved to Drive!")

print(f"\n✅ All files saved to: {backup_dir}")


## 💡 Tips for Colab

1. **Session Timeout**: Colab disconnects idle sessions (often cited as roughly 90 minutes, though the limit varies), and every session also has a maximum lifetime. Keep the tab active during training.

2. **GPU Limits**: Free Colab has usage limits. If you hit them, wait a few hours or consider Colab Pro.

3. **Save Frequently**: Always save checkpoints to Google Drive to avoid losing progress.

4. **Large Datasets**: For large datasets, consider using Google Drive instead of uploading directly.

5. **Monitor Training**: Use TensorBoard or print statements to monitor training progress.
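
For tip 3, one way to save frequently is to copy each checkpoint to Drive as soon as it is written, not just once at the end. The sketch below uses only the standard library. `backup_checkpoint` is a hypothetical helper, and how it is called from your training loop depends on your own `train.py`.

```python
import shutil
from pathlib import Path

def backup_checkpoint(checkpoint_path, backup_dir):
    """Copy one checkpoint file into backup_dir and return the copied path."""
    checkpoint_path = Path(checkpoint_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    destination = backup_dir / checkpoint_path.name
    # copy2 also preserves the file's modification time
    shutil.copy2(checkpoint_path, destination)
    return destination
```

With Drive mounted, calling this after each epoch with a `backup_dir` such as `/content/drive/MyDrive/apple-detection-backup/checkpoints` keeps the latest weights safe if the session disconnects.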

## 🎉 You're All Set!

Happy training! 🍎🔍
