Author: Md Redwan Hossain
Date: 02/08/2025

Objective: Copies image data from a user-specified dataset folder to a "ToAnnotate" folder located in the same directory as the script. Organises the dataset for auto-annotation and YOLO model training.

Features:
    - Recursively searches all species subfolders for images in a user-specified dataset folder.
    - Creates or looks for a ToAnnotate folder next to your notebook/script.
    - Creates or looks for species folders, each with a nested <Species>_Images folder.
    - Renames files with timestamp for uniqueness.
    - Normalizes folder matching (ignores case, spaces, underscores, and dashes).
    - Prints clear progress and summary report.


In [3]:
# Copies images from the selected dataset to the annotated folder, avoiding photos in the "not good images" folder. 
# Run the cell; it will ask you to enter the path to the dataset folder.

from pathlib import Path

# ===============================
# User Input
# ===============================
dataset_input = input("Enter the path to your dataset folder: ").strip()
dataset_folder = Path(dataset_input)

if not dataset_folder.exists() or not dataset_folder.is_dir():
    print(f"[!] Dataset folder '{dataset_input}' does not exist or is not a directory.")
else:
    from backend_AutoAnnotateAndStructre.image_copier import ImageCopier  # <- import your class (save it as image_copier.py)
    
    copier = ImageCopier(dataset_folder)  # Initialize the class
    copier.copy_images()                  # Run the copy process


Enter the path to your dataset folder:  Not annotated


[•] Processing species: Camponotus_sp_9
[•] Processing species: Iridomyrmex_anceps
[•] Processing species: Iridomyrmex_sanguineus
[•] Processing species: Linepithema_humile
[•] Processing species: Mixed_species
[•] Processing species: Monomorium_sp_laeve_gp
[•] Processing species: Monomorium_sp_nigrius_gp
[•] Processing species: Podomyrma_maculiventris
[•] Processing species: Polyrhachis_ammon
[•] Processing species: Polyrhachis_cyrtomyrma
[•] Processing species: Polyrhachis_cyrus
[•] Processing species: Polyrhachis_diversa
[•] Processing species: Polyrhachis_lownei
[•] Processing species: Polyrhachis_penelope
[•] Processing species: Polyrhachis_robsoni
[•] Processing species: Polyrhachis_turneri
[•] Processing species: Pristomyrmex_foveolatus
[•] Processing species: Prolasius_spp
[•] Processing species: Pseudoneoponera_porcata
[•] Processing species: Rhytidoponera_aurata
[•] Processing species: Rhytidoponera_convexa
[•] Processing species: Rhytidoponera_metallica
[•] Processing specie

# Detailed code and description

## Detailed description of code:

Specification for the Jupyter notebook Python script:

Objective:
Copies image data from a user-specified dataset folder to a "ToAnnotate" folder located in the same directory as the script.

Detailed Requirements:

User Input:

Prompt the user to enter the path to the dataset folder.

Directory Structure:

Dataset Folder: Contains subfolders named after ant species.

Each ant species folder may contain images nested within multiple subfolders.

Copy Operation:

Recursively search each species folder to locate image files.

Copy images from the dataset folder to the "ToAnnotate" folder.
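
The recursive search can be sketched as follows. This is a minimal illustration, not the backend's actual code: `find_images` and its `skip_dirs` parameter are hypothetical names, and skipping a folder such as "not good images" is an assumption about how the exclusion mentioned in the first cell might be done.

```python
from pathlib import Path

# Extensions treated as images; the same set the notebook's code cell uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def find_images(species_folder: Path, skip_dirs=()):
    """Return image files under species_folder, at any depth, sorted by path.

    skip_dirs is a hypothetical parameter: names of subfolders to ignore,
    compared case-insensitively.
    """
    skip = {name.lower() for name in skip_dirs}
    found = []
    for path in species_folder.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Folder names between species_folder and the file itself.
        parent_names = path.relative_to(species_folder).parts[:-1]
        if any(name.lower() in skip for name in parent_names):
            continue
        found.append(path)
    return sorted(found)
```

For example, `find_images(Path("DatasetFolder/AntSpecies1"), skip_dirs=["not good images"])` would return every image in that species folder except those stored under a subfolder with that name.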

Destination Folder Structure:

Look for a folder named ToAnnotate in the same directory as the Python script, and create it if it does not exist.

Within ToAnnotate, look for or create individual species folders named exactly as they appear in the dataset, considering that:

Folder name matching is case-insensitive.

Ignore spaces, dashes (-), and underscores (_) when checking for existing folders.

Within each species folder, look for or create another folder named <SpeciesName>_Images.
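
One way to read these matching rules is sketched below. `resolve_images_folder` is a hypothetical helper introduced only for illustration. It assumes that when a matching folder already exists, its existing name (not the dataset's spelling) is used for the nested `_Images` folder, which is what the example directory structure suggests.

```python
from pathlib import Path


def normalize_name(name: str) -> str:
    """Lowercase a folder name and drop spaces, dashes, and underscores."""
    return name.replace(" ", "").replace("-", "").replace("_", "").lower()


def resolve_images_folder(to_annotate: Path, species_name: str) -> Path:
    """Return <match>/<match>_Images under to_annotate, creating what is missing."""
    to_annotate.mkdir(parents=True, exist_ok=True)
    wanted = normalize_name(species_name)
    # Reuse an existing folder whose normalized name matches, if there is one.
    match = next(
        (d for d in sorted(to_annotate.iterdir())
         if d.is_dir() and normalize_name(d.name) == wanted),
        None,
    )
    species_dir = match if match is not None else to_annotate / species_name
    images_dir = species_dir / f"{species_dir.name}_Images"
    images_dir.mkdir(parents=True, exist_ok=True)
    return images_dir
```

With this sketch, calling the helper first with `"AntSpecies1"` and then with `"ant_species1"` resolves to the same `AntSpecies1/AntSpecies1_Images` folder, so differently spelled source folders are merged rather than duplicated.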

File Naming Convention:

Each copied image file should be renamed uniquely using the format:

<SpeciesFolderName>-<OriginalImageFolderName>-<OriginalImageFileName>-<DatetimeStamp>

<DatetimeStamp> format: YYYYMMDD_HHMMSS.
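
The naming convention can be illustrated with a short sketch. `build_copy_name` is a hypothetical function name; re-appending the original extension after the timestamp is an assumption taken from the file names in the example directory structure.

```python
from datetime import datetime
from pathlib import Path


def build_copy_name(species_name: str, image_path: Path, when: datetime) -> str:
    """Build <Species>-<ParentFolder>-<OriginalFileName>-<YYYYMMDD_HHMMSS><ext>."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    # Re-appending the extension keeps the copy recognizable as an image file.
    return (
        f"{species_name}-{image_path.parent.name}-"
        f"{image_path.name}-{stamp}{image_path.suffix}"
    )
```

For instance, `build_copy_name("AntSpecies1", Path("DatasetFolder/AntSpecies1/SubFolderA/img2.jpg"), datetime(2025, 8, 2, 14, 45, 30))` gives `AntSpecies1-SubFolderA-img2.jpg-20250802_144530.jpg`. Note that a one-second timestamp does not by itself guarantee uniqueness: two files with the same name and parent folder name, copied within the same second, would receive the same name.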

Display Progress:

Clearly indicate which species folder is currently being processed.

Post-operation Report:

After completion, output a summary report displaying:

Total number of species processed.

Total number of images copied.

List of created/updated species folders.

Example Directory Structure:

Before:

DatasetFolder/
├── AntSpecies1/
│   ├── img1.jpg
│   └── SubFolderA/
│       └── img2.jpg
├── ant_species1/
│   └── img3.png
└── AntSpecies2/
    └── img4.jpeg

After:

ToAnnotate/
├── AntSpecies1/
│   └── AntSpecies1_Images/
│       ├── AntSpecies1-SubFolderA-img2.jpg-20250802_144530.jpg
│       ├── AntSpecies1-AntSpecies1-img1.jpg-20250802_144531.jpg
│       └── AntSpecies1-ant_species1-img3.png-20250802_144532.png
└── AntSpecies2/
    └── AntSpecies2_Images/
        └── AntSpecies2-AntSpecies2-img4.jpeg-20250802_144533.jpeg

Ensure robustness, readability, and clear logging for ease of monitoring and troubleshooting.


In [None]:


import hashlib
import shutil
from pathlib import Path

# ===============================
# Configuration
# ===============================
VALID_EXTENSIONS = [".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"]

def normalize_name(name: str) -> str:
    """Normalize folder names for comparison by ignoring spaces, underscores, and dashes, and lowercasing."""
    return name.replace(" ", "").replace("-", "").replace("_", "").lower()

def is_image_file(file_path: Path) -> bool:
    """Check if the file is an image based on extension."""
    return file_path.suffix.lower() in VALID_EXTENSIONS

def short_hash(path: Path, length=6):
    """Generate a short hash for the file based on full source path."""
    return hashlib.md5(str(path).encode()).hexdigest()[:length]

def get_hashed_filename(file_path: Path):
    """Return filename with short hash appended before extension."""
    hash_suffix = short_hash(file_path)
    return f"{file_path.stem}_{hash_suffix}{file_path.suffix.lower()}"

def copy_images_to_toannotate(dataset_folder: Path):
    """Copy images to ToAnnotate folder maintaining species-based folder structure."""
    # Use current working directory for Jupyter Notebook
    to_annotate_dir = Path.cwd() / "ToAnnotate"
    to_annotate_dir.mkdir(exist_ok=True)

    # Track summary
    species_processed = 0
    total_images_copied = 0
    created_species_folders = []

    # Map normalized names to existing folders in ToAnnotate
    existing_folders = {normalize_name(f.name): f for f in to_annotate_dir.iterdir() if f.is_dir()}

    for species_folder in dataset_folder.iterdir():
        if not species_folder.is_dir():
            continue
        
        species_name = species_folder.name
        species_norm = normalize_name(species_name)

        # Find or create species folder in ToAnnotate
        if species_norm in existing_folders:
            target_species_folder = existing_folders[species_norm]
        else:
            target_species_folder = to_annotate_dir / species_name
            target_species_folder.mkdir(exist_ok=True)
            created_species_folders.append(species_name)
            existing_folders[species_norm] = target_species_folder

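        # Create the Images folder, named after the matched target folder so that
        # differently spelled source folders (e.g. "ant_species1") share one folder
        images_folder = target_species_folder / f"{target_species_folder.name}_Images"
        images_folder.mkdir(exist_ok=True)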

        print(f"[•] Processing species: {species_folder.name}")
        species_processed += 1

        # Recursively copy images with hash in filename
        for img_path in species_folder.rglob("*"):
            if img_path.is_file() and is_image_file(img_path):
                new_filename = get_hashed_filename(img_path)
                dest_path = images_folder / new_filename

                shutil.copy2(img_path, dest_path)
                total_images_copied += 1

    # Summary
    print("\n========== Summary Report ==========")
    print(f"Total species processed: {species_processed}")
    print(f"Total images copied: {total_images_copied}")
    print(f"Newly added species folders: {created_species_folders}")
    print("===================================")

# ===============================
# User Input
# ===============================
dataset_input = input("Enter the path to your dataset folder: ").strip()
dataset_folder = Path(dataset_input)

if not dataset_folder.exists() or not dataset_folder.is_dir():
    print(f"[!] Dataset folder '{dataset_input}' does not exist or is not a directory.")
else:
    copy_images_to_toannotate(dataset_folder)
