# 🩺 Skin Disease Classification: Exploration & Training

This notebook explores the `skin-ga5ww` dataset from Roboflow and trains a **YOLO11** classification model using the Ultralytics framework.

**Objectives:**
1.  **Explore:** Visualize images, count samples, identify classes, check splits, and analyze class imbalance.
2.  **Train:** Train a YOLO11-cls model on the dataset.

## 1. Setup and Installation
Install the necessary libraries: `roboflow` for dataset management, `ultralytics` for the model, and `matplotlib` and `seaborn` for plotting.

In [None]:
!pip install roboflow ultralytics matplotlib seaborn
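
After installation, it can help to confirm which versions were actually picked up, since the rest of the notebook depends on them. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is illustrative and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the packages installed by the pip command above.
for name, ver in installed_versions(
    ["roboflow", "ultralytics", "matplotlib", "seaborn"]
).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```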

## 2. Download Data from Roboflow
We will download the dataset using the Roboflow API. 

**⚠️ IMPORTANT:** You must replace `"INSERT_ROBOFLOW_API_KEY_HERE"` with your actual Private API Key from your Roboflow settings.

In [None]:
from roboflow import Roboflow
import os

# --- CONFIGURATION ---
API_KEY = "INSERT_ROBOFLOW_API_KEY_HERE"  # ⚠️ PASTE YOUR KEY HERE
WORKSPACE = "cxrdataset"
PROJECT = "skin-ga5ww"
VERSION = 1  # Assumes version 1; change this to the dataset version you want
# ---------------------

try:
    rf = Roboflow(api_key=API_KEY)
    project = rf.workspace(WORKSPACE).project(PROJECT)
    dataset = project.version(VERSION).download("folder")
    dataset_path = dataset.location
    print(f"\n✅ Dataset downloaded to: {dataset_path}")
except Exception as e:
    print("\n❌ Error downloading dataset. Please check your API Key and Project permissions.")
    print(e)
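
Hard-coding a private key in a notebook makes it easy to leak when the notebook is shared. One alternative, sketched below, reads the key from an environment variable and fails early if it is missing or still set to the placeholder. The variable name `ROBOFLOW_API_KEY` and the helper `load_api_key` are assumptions made for this example, not something the Roboflow client requires.

```python
import os

PLACEHOLDER = "INSERT_ROBOFLOW_API_KEY_HERE"


def load_api_key(env_var="ROBOFLOW_API_KEY", env=None):
    """Read the API key from the environment; raise if absent or a placeholder.

    `env` defaults to os.environ and can be replaced by a dict for testing.
    """
    env = os.environ if env is None else env
    key = (env.get(env_var) or "").strip()
    if not key or key == PLACEHOLDER:
        raise ValueError(
            f"Set the {env_var} environment variable to your Roboflow API key."
        )
    return key
```

With this in place, `API_KEY = load_api_key()` could stand in for the hard-coded string in the configuration cell.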

## 3. Dataset Exploration & Analysis
The function below walks the split folders, counts the images in each class, plots the class distribution, and displays one sample image per class.

In [None]:
import glob
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from PIL import Image
import random

def explore_dataset(root_path):
    splits = ['train', 'valid', 'test']
    data_stats = []

    # 1. Parse Directory Structure
    print(f"Analyzing dataset at: {root_path}...")
    
    for split in splits:
        split_path = os.path.join(root_path, split)
        if not os.path.exists(split_path):
            # The validation folder may be named 'val' instead of 'valid'
            alt_path = os.path.join(root_path, 'val')
            if split == 'valid' and os.path.exists(alt_path):
                split, split_path = 'val', alt_path
            else:
                print(f"⚠️ Split '{split}' not found, skipping.")
                continue

        # Get classes (subfolders)
        classes = [d for d in os.listdir(split_path) if os.path.isdir(os.path.join(split_path, d))]
        
        for cls in classes:
            class_path = os.path.join(split_path, cls)
            num_images = len(glob.glob(os.path.join(class_path, '*.*')))
            data_stats.append({'Split': split, 'Class': cls, 'Count': num_images})

    df = pd.DataFrame(data_stats)
    
    # 2. Display Statistics
    if df.empty:
        print("❌ No data found. Check the dataset path.")
        return
        
    print("\n📊 Dataset Statistics:")
    total_images = df['Count'].sum()
    print(f"Total Images: {total_images}")
    print(f"Classes Detected: {df['Class'].unique()}")
    print("\nBreakdown by Split and Class:")
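    # Missing split/class combinations become 0; cast back to int so counts are not shown as floats
    print(df.pivot(index='Class', columns='Split', values='Count').fillna(0).astype(int))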

    # 3. Check for Imbalance
    plt.figure(figsize=(12, 6))
    sns.barplot(data=df, x='Class', y='Count', hue='Split')
    plt.title('Class Distribution across Splits')
    plt.xticks(rotation=45)
    plt.show()

    # 4. Visualize Samples
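    print("\n🖼️ Sample Images (One per class from Train split):")
    train_split = 'train'
    classes = df[df['Split'] == train_split]['Class'].unique()
    if len(classes) == 0:
        # plt.subplots raises an error when asked for zero columns
        print("No classes found in the train split; skipping sample images.")
        return

    fig, axes = plt.subplots(1, len(classes), figsize=(15, 5))
    if len(classes) == 1:
        axes = [axes]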
    
    for i, cls in enumerate(classes):
        cls_path = os.path.join(root_path, train_split, cls)
        images = glob.glob(os.path.join(cls_path, '*.*'))
        if images:
            img_path = random.choice(images)
            img = Image.open(img_path)
            axes[i].imshow(img)
            axes[i].set_title(cls)
            axes[i].axis('off')
    plt.show()

# Run exploration
if 'dataset_path' in locals():
    explore_dataset(dataset_path)
else:
    print("Please download the dataset first.")
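
The bar chart shows imbalance visually; a single number per split is easier to compare across dataset versions. The sketch below takes records shaped like the `data_stats` list built inside `explore_dataset` and reports the ratio of the largest to the smallest class count in each split. The helper name `imbalance_ratios` and the sample counts are made up for illustration.

```python
from collections import defaultdict


def imbalance_ratios(records):
    """Map each split to (largest class count) / (smallest class count).

    A ratio of 1.0 means perfectly balanced; a split that contains an
    empty class maps to float('inf').
    """
    counts_by_split = defaultdict(list)
    for row in records:
        counts_by_split[row["Split"]].append(row["Count"])

    ratios = {}
    for split, counts in counts_by_split.items():
        smallest, largest = min(counts), max(counts)
        ratios[split] = float("inf") if smallest == 0 else largest / smallest
    return ratios


# Illustrative counts only, not taken from the real dataset.
example = [
    {"Split": "train", "Class": "class_a", "Count": 300},
    {"Split": "train", "Class": "class_b", "Count": 100},
    {"Split": "valid", "Class": "class_a", "Count": 40},
    {"Split": "valid", "Class": "class_b", "Count": 40},
]
print(imbalance_ratios(example))  # {'train': 3.0, 'valid': 1.0}
```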

## 4. Train YOLO11 Model
We will now train a **YOLO11** nano model (`yolo11n-cls`) for image classification. YOLO11 is a recent model family from Ultralytics.

**Settings:**
- **Model:** `yolo11n-cls.pt` (Nano version, pretrained on ImageNet, fastest training)
- **Epochs:** `20` (Adjust based on your needs)
- **Image Size:** `224` (Standard for classification)
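- **Device:** not set in the training call, so Ultralytics picks a GPU automatically when one is available (on Colab, enable a GPU runtime; pass `device=0` to select the first GPU explicitly)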

In [None]:
from ultralytics import YOLO

# Initialize YOLO11 classifier
# 'n' stands for nano (smallest/fastest). You can also use yolo11s-cls.pt (small), yolo11m-cls.pt (medium)
model = YOLO('yolo11n-cls.pt') 

# Train the model
# We point 'data' to the dataset folder downloaded by Roboflow
results = model.train(
    data=dataset_path, 
    epochs=20, 
    imgsz=224, 
    project="skin_disease_project",
    name="yolo11n_skin_cls"
)

## 5. Validate and Inference
Evaluate accuracy with `model.val()`, then run a prediction on a random image from the test split.

In [None]:
# Evaluate on the split Ultralytics resolves by default (normally validation);
# pass split='test' to request the test split instead
metrics = model.val()
print(f"Top-1 Accuracy: {metrics.top1}")
print(f"Top-5 Accuracy: {metrics.top5}")

# Run Inference on a random image from the test set
import glob
import random

test_images = glob.glob(f"{dataset_path}/test/*/*.*")
if test_images:
    test_image = random.choice(test_images)
    
    # Predict
    predictions = model.predict(test_image)  # separate name so the training `results` is not overwritten
    
    # Show results
    for result in predictions:
        result.show()  # Display the image with class label
        print(f"Prediction: {result.names[result.probs.top1]} ({float(result.probs.top1conf):.2f})")
else:
    print("No test images found.")
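
For reference, top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The sketch below computes it in plain Python on made-up scores; it illustrates the metric and is not how Ultralytics computes `metrics.top1` and `metrics.top5` internally. Note that for a dataset with five or fewer classes, the top five always include the true class, so top-5 accuracy is 1.0 and carries no information.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Made-up scores for 3 samples over 3 classes.
scores = [
    [0.7, 0.2, 0.1],  # true class 0: top-1 hit
    [0.5, 0.4, 0.1],  # true class 1: top-1 miss, top-2 hit
    [0.1, 0.3, 0.6],  # true class 0: miss at both k=1 and k=2
]
labels = [0, 1, 0]
print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 correct
print(top_k_accuracy(scores, labels, k=2))  # 2 of 3 correct
```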