# Dataset Split – SPCBViTNet Preprocessed Metadata-Based Split

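📌 This notebook performs the final **dataset splitting** step after preprocessing, using the class labels read from the metadata file.  
It prepares the training, validation, and test sets for experiments on the **HAM10000** and **PAD-UFES-20** datasets. The cell below handles PAD-UFES-20: it sorts the preprocessed 224×224 images into one folder per `diagnostic` class.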

🧪 Part of the submission titled:  
**"SPCBViTNet: Enhancing Skin Cancer Diagnosis with Multi-Scale Vision Transformers and Adaptive Spatial-Channel Attention"**  
🔒 Not peer-reviewed. Please do not cite or reuse until publication.

📁 Repository: [https://github.com/diyagoyal31/SPCBViT](https://github.com/diyagoyal31/SPCBViT)


In [2]:
import pandas as pd
import os
import shutil
from PIL import Image

# Load dataset CSV
csv_path = "/Users/diyagoyal/skin_cancer/hello/metadata_PAD.csv"  # Update path
df = pd.read_csv(csv_path)

# Define source image folder & output dataset folder
image_folder = "/Users/diyagoyal/skin_cancer/hello/Preprocessed_PAD_224x224_Normalized"  # Update path
output_dataset = "/Users/diyagoyal/skin_cancer/hello/dataset_split_2/"  # Single dataset folder

# Ensure output directory exists
os.makedirs(output_dataset, exist_ok=True)

# Pre-create class-wise subdirectories
for class_name in df['diagnostic'].unique():
    os.makedirs(os.path.join(output_dataset, class_name), exist_ok=True)

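# Function to move images to class folders.
# Note: it reads the module-level `image_folder`, and because files are moved
# (not copied), re-running this cell will report every image as missing.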
def move_images(df, destination_folder):
    missing_count = 0  # Track missing files
    wrong_size_count = 0  # Track incorrectly sized images
    expected_size = (224, 224)  # Expected image size

    for _, row in df.iterrows():
        # img_id is used as the file name as-is, so it must already include the extension
        src_path = os.path.join(image_folder, row['img_id'])
        dest_path = os.path.join(destination_folder, row['diagnostic'], row['img_id'])

        if os.path.exists(src_path):
            # Open the image and check its size
            with Image.open(src_path) as img:
                if img.size != expected_size:
                    print(f"Warning: {src_path} is {img.size}, expected {expected_size}. Skipping...")
                    wrong_size_count += 1
                    continue  # Skip moving this image
            
            shutil.move(src_path, dest_path)  # Move image if size is correct
        else:
            missing_count += 1
            print(f"Warning: Image {src_path} not found!")

    print(f"Finished moving images. Missing: {missing_count}, Wrong size: {wrong_size_count}.")

# Move all images class-wise and check size
move_images(df, output_dataset)

print("Dataset with class-wise subfolders created successfully!")


Finished moving images. Missing: 0, Wrong size: 0.
Dataset with class-wise subfolders created successfully!
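
The cell above only sorts images into class folders. Assigning them to training, validation, and test sets is not shown here. A minimal sketch of a per-class (stratified) split follows, assuming a mapping from image id to class label. The function name `stratified_split`, the 70/15/15 ratios, and the seed are hypothetical choices for illustration, not values taken from the submission.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Assign each item to 'train', 'val', or 'test', class by class.

    `labels` maps an item id (e.g. an img_id) to its class label.
    `ratios` and `seed` are illustrative defaults (assumptions).
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Group item ids by class so each class is split independently.
    by_class = defaultdict(list)
    for item_id, label in labels.items():
        by_class[label].append(item_id)

    rng = random.Random(seed)
    assignment = {}
    for label in sorted(by_class):
        ids = sorted(by_class[label])  # fixed order before shuffling, for reproducibility
        rng.shuffle(ids)
        n_train = int(len(ids) * ratios[0])
        n_val = int(len(ids) * ratios[1])
        for i, item_id in enumerate(ids):
            if i < n_train:
                assignment[item_id] = "train"
            elif i < n_train + n_val:
                assignment[item_id] = "val"
            else:
                assignment[item_id] = "test"  # remainder of the class goes to test
    return assignment


# Toy example with made-up ids and two class names; with the notebook's data
# the input could be dict(zip(df["img_id"], df["diagnostic"])).
toy_labels = {f"img_{i}.png": ("BCC" if i < 20 else "MEL") for i in range(30)}
split = stratified_split(toy_labels)
```

The returned mapping can then drive a move or copy into `train/<class>/`, `val/<class>/`, and `test/<class>/` folders. Splitting within each class keeps the class proportions roughly equal across the three sets, which matters for imbalanced skin-lesion data.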
