# Class ID Remapping Automation Script

This notebook remaps class IDs in YOLO annotation files.
It reads every `.txt` annotation file in the `data/labels/` directory and writes the updated files to `data/new_labels/`.

## Features:
- Processes all .txt annotation files in the labels directory
- Remaps class IDs according to the provided mapping dictionary
- Writes results to a separate directory, leaving the original files untouched
- Provides detailed logging and progress tracking
- Handles errors gracefully
- Generates a summary report of changes made

In [1]:
import os
import shutil
from pathlib import Path
import logging
from tqdm import tqdm
from collections import defaultdict
import pandas as pd

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

print("Class ID Remapping Script Initialized")
print("=====================================")

Class ID Remapping Script Initialized


In [2]:
# Define the class mapping dictionary
CLASS_MAPPING = {
    0: (0, 'Aluminium foil'),
    1: (1, 'Battery'),
    2: (2, 'Blister pack'),
    3: (2, 'Blister pack'),
    4: (3, 'Bottle'),
    5: (3, 'Bottle'),
    6: (3, 'Bottle'),
    7: (4, 'Bottle cap'),
    8: (4, 'Bottle cap'),
    9: (5, 'Broken glass'),
    10: (6, 'Can'),
    11: (6, 'Can'),
    12: (6, 'Can'),
    13: (7, 'Carton'),
    14: (7, 'Carton'),
    15: (7, 'Carton'),
    16: (7, 'Carton'),
    17: (7, 'Carton'),
    18: (7, 'Carton'),
    19: (7, 'Carton'),
    20: (8, 'Cup'),
    21: (8, 'Cup'),
    22: (8, 'Cup'),
    23: (8, 'Cup'),
    24: (8, 'Cup'),
    25: (9, 'Food waste'),
    26: (10, 'Glass jar'),
    27: (11, 'Lid'),
    28: (11, 'Lid'),
    29: (12, 'Other plastic'),
    30: (13, 'Paper'),
    31: (13, 'Paper'),
    32: (13, 'Paper'),
    33: (13, 'Paper'),
    34: (14, 'Paper bag'),
    35: (14, 'Paper bag'),
    36: (15, 'Plastic bag & wrapper'),
    37: (15, 'Plastic bag & wrapper'),
    38: (15, 'Plastic bag & wrapper'),
    39: (15, 'Plastic bag & wrapper'),
    40: (15, 'Plastic bag & wrapper'),
    41: (15, 'Plastic bag & wrapper'),
    42: (15, 'Plastic bag & wrapper'),
    43: (16, 'Plastic container'),
    44: (16, 'Plastic container'),
    45: (16, 'Plastic container'),
    46: (16, 'Plastic container'),
    47: (16, 'Plastic container'),
    48: (17, 'Plastic glooves'),
    49: (18, 'Plastic utensils'),
    50: (19, 'Pop tab'),
    51: (20, 'Rope & strings'),
    52: (21, 'Scrap metal'),
    53: (22, 'Shoe'),
    54: (23, 'Squeezable tube'),
    55: (24, 'Straw'),
    56: (24, 'Straw'),
    57: (25, 'Styrofoam piece'),
    58: (26, 'Unlabeled litter'),
    59: (27, 'Cigarette')
}

print(f"Loaded class mapping with {len(CLASS_MAPPING)} entries")
print(f"Original classes: {min(CLASS_MAPPING.keys())} to {max(CLASS_MAPPING.keys())}")
print(f"New classes: {min([v[0] for v in CLASS_MAPPING.values()])} to {max([v[0] for v in CLASS_MAPPING.values()])}")

Loaded class mapping with 60 entries
Original classes: 0 to 59
New classes: 0 to 27
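
Because several old IDs collapse into one new ID, it can help to check the mapping itself before touching any files. The following is a minimal sketch, not part of the original notebook: `validate_mapping` is a hypothetical helper, and the two small dictionaries are illustrative stand-ins for `CLASS_MAPPING`.

```python
def validate_mapping(mapping):
    """Return a list of problems found in an {old_id: (new_id, name)} mapping."""
    problems = []
    old_ids = sorted(mapping)
    new_ids = sorted({new_id for new_id, _ in mapping.values()})

    # Old and new IDs should each form a gap-free range starting at 0.
    if old_ids != list(range(len(old_ids))):
        problems.append("old class IDs are not contiguous from 0")
    if new_ids != list(range(len(new_ids))):
        problems.append("new class IDs are not contiguous from 0")

    # Every new ID should carry exactly one class name.
    names_by_id = {}
    for new_id, name in mapping.values():
        names_by_id.setdefault(new_id, set()).add(name)
    for new_id, names in sorted(names_by_id.items()):
        if len(names) > 1:
            problems.append(f"new class {new_id} has conflicting names: {sorted(names)}")
    return problems


# Small illustrative mappings (not the full CLASS_MAPPING from the notebook).
good = {0: (0, 'Aluminium foil'), 1: (1, 'Battery'),
        2: (2, 'Blister pack'), 3: (2, 'Blister pack')}
bad = {0: (0, 'Bottle'), 1: (0, 'Can'), 3: (2, 'Cup')}

print(validate_mapping(good))
print(validate_mapping(bad))
```

An empty list means the mapping passed all three checks. Contiguous new IDs matter for a `classes.txt` file in which the line position is meant to match the class ID.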


In [3]:
# Define paths
LABELS_DIR = Path('data/labels')
NEW_LABELS_DIR = Path('data/new_labels')
BACKUP_DIR = Path('data/labels_backup')

# Create output directory if it doesn't exist
NEW_LABELS_DIR.mkdir(parents=True, exist_ok=True)
print(f"Output directory: {NEW_LABELS_DIR}")

# Verify input directory exists
if not LABELS_DIR.exists():
    raise FileNotFoundError(f"Labels directory not found: {LABELS_DIR}")

print(f"Input directory: {LABELS_DIR}")
print(f"Directory exists: {LABELS_DIR.exists()}")

Output directory: data\new_labels
Input directory: data\labels
Directory exists: True
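
The cell above defines `BACKUP_DIR`, and `shutil` is imported in the first cell, but no cell in this notebook copies the original files. If a backup copy is wanted, one possible approach is sketched below. `backup_labels` is a hypothetical helper, and the demonstration runs on a temporary directory rather than the real `data/labels`.

```python
import shutil
import tempfile
from pathlib import Path


def backup_labels(labels_dir, backup_dir):
    """Copy everything under labels_dir into backup_dir; return the number of .txt files there."""
    labels_dir, backup_dir = Path(labels_dir), Path(backup_dir)
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run over an existing backup.
    shutil.copytree(labels_dir, backup_dir, dirs_exist_ok=True)
    return sum(1 for _ in backup_dir.glob('*.txt'))


# Demonstration on a throwaway directory, not the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'labels'
    src.mkdir()
    (src / 'sample.txt').write_text('5 0.5 0.5 0.1 0.1\n')
    copied = backup_labels(src, Path(tmp) / 'labels_backup')
    print(f"Backed up {copied} annotation file(s)")
```

In the notebook this would be called as `backup_labels(LABELS_DIR, BACKUP_DIR)` before the processing loop.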


In [4]:
def process_annotation_file(input_path, output_path, class_mapping):
    """
    Process a single YOLO annotation file and remap class IDs.
    
    Args:
        input_path (Path): Path to input annotation file
        output_path (Path): Path to output annotation file
        class_mapping (dict): Dictionary mapping old class IDs to (new_id, class_name)
    
    Returns:
        dict: Statistics about the processing (lines processed, changes made, etc.)
    """
    stats = {
        'lines_processed': 0,
        'lines_changed': 0,
        'original_classes': set(),
        'new_classes': set(),
        'errors': []
    }
    
    try:
        with open(input_path, 'r') as infile, open(output_path, 'w') as outfile:
            for line_num, line in enumerate(infile, 1):
                line = line.strip()
                
                # Skip empty lines
                if not line:
                    outfile.write('\n')
                    continue
                
                stats['lines_processed'] += 1
                
                try:
                    # Parse YOLO format: class_id x_center y_center width height
                    parts = line.split()
                    if len(parts) != 5:
                        stats['errors'].append(f"Line {line_num}: Invalid format - expected 5 values, got {len(parts)}")
                        outfile.write(line + '\n')
                        continue
                    
                    original_class_id = int(parts[0])
                    stats['original_classes'].add(original_class_id)
                    
                    # Check if class ID exists in mapping
                    if original_class_id not in class_mapping:
                        stats['errors'].append(f"Line {line_num}: Class ID {original_class_id} not found in mapping")
                        outfile.write(line + '\n')
                        continue
                    
                    # Get new class ID
                    new_class_id, class_name = class_mapping[original_class_id]
                    stats['new_classes'].add(new_class_id)
                    
                    # Update the line with new class ID
                    parts[0] = str(new_class_id)
                    new_line = ' '.join(parts)
                    outfile.write(new_line + '\n')
                    
                    if original_class_id != new_class_id:
                        stats['lines_changed'] += 1
                        
                except ValueError as e:
                    stats['errors'].append(f"Line {line_num}: Error parsing class ID - {str(e)}")
                    outfile.write(line + '\n')
                    continue
                    
    except Exception as e:
        stats['errors'].append(f"File processing error: {str(e)}")
        
    return stats
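
The per-line rule inside `process_annotation_file` can also be tried in isolation, which makes it easier to test. The sketch below restates that rule as a pure function. `remap_line` is a hypothetical name, and the two-entry mapping is only an excerpt used for illustration.

```python
def remap_line(line, class_mapping):
    """Return a YOLO annotation line with its class ID remapped.

    Lines that do not have five whitespace-separated fields, or whose class ID
    is not an integer key of class_mapping, are returned unchanged.
    """
    parts = line.split()
    if len(parts) != 5:
        return line
    try:
        old_id = int(parts[0])
    except ValueError:
        return line
    if old_id not in class_mapping:
        return line
    new_id, _name = class_mapping[old_id]
    # Only the first field changes; the box coordinates are passed through as text.
    return ' '.join([str(new_id)] + parts[1:])


mapping = {5: (3, 'Bottle'), 58: (26, 'Unlabeled litter')}
print(remap_line('5 0.512 0.430 0.120 0.300', mapping))
print(remap_line('99 0.5 0.5 0.1 0.1', mapping))
```

Keeping the coordinates as strings avoids any float round-trip, so the box values in the output match the input character for character.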

In [5]:
# Get list of all annotation files, sorted for a reproducible processing order
annotation_files = sorted(LABELS_DIR.glob('*.txt'))
print(f"Found {len(annotation_files)} annotation files to process")

if len(annotation_files) == 0:
    print("No .txt files found in the labels directory!")
else:
    print(f"First few files: {[f.name for f in annotation_files[:5]]}")
    if len(annotation_files) > 5:
        print(f"... and {len(annotation_files) - 5} more files")

Found 1500 annotation files to process
First few files: ['batch_10_000000.txt', 'batch_10_000001.txt', 'batch_10_000002.txt', 'batch_10_000003.txt', 'batch_10_000004.txt']
... and 1495 more files


In [6]:
# Initialize tracking variables
total_stats = {
    'files_processed': 0,
    'files_with_errors': 0,
    'total_lines_processed': 0,
    'total_lines_changed': 0,
    'all_original_classes': set(),
    'all_new_classes': set(),
    'all_errors': []
}

file_stats = []

print("Starting file processing...")
print("=" * 50)

# Process each annotation file
for input_file in tqdm(annotation_files, desc="Processing files"):
    output_file = NEW_LABELS_DIR / input_file.name
    
    try:
        # Process the file
        stats = process_annotation_file(input_file, output_file, CLASS_MAPPING)
        
        # Update total statistics
        total_stats['files_processed'] += 1
        total_stats['total_lines_processed'] += stats['lines_processed']
        total_stats['total_lines_changed'] += stats['lines_changed']
        total_stats['all_original_classes'].update(stats['original_classes'])
        total_stats['all_new_classes'].update(stats['new_classes'])
        
        if stats['errors']:
            total_stats['files_with_errors'] += 1
            total_stats['all_errors'].extend([f"{input_file.name}: {error}" for error in stats['errors']])
        
        # Store individual file stats
        file_stats.append({
            'filename': input_file.name,
            'lines_processed': stats['lines_processed'],
            'lines_changed': stats['lines_changed'],
            'original_classes': len(stats['original_classes']),
            'new_classes': len(stats['new_classes']),
            'errors': len(stats['errors'])
        })
        
    except Exception as e:
        logger.error(f"Failed to process {input_file.name}: {str(e)}")
        total_stats['files_with_errors'] += 1
        total_stats['all_errors'].append(f"{input_file.name}: {str(e)}")

print("\nFile processing completed!")

Starting file processing...


Processing files: 100%|██████████| 1500/1500 [00:16<00:00, 91.63it/s]


File processing completed!





In [7]:
# Display summary statistics
print("\n" + "=" * 60)
print("PROCESSING SUMMARY")
print("=" * 60)

print(f"Files processed: {total_stats['files_processed']}")
print(f"Files with errors: {total_stats['files_with_errors']}")
print(f"Total annotation lines processed: {total_stats['total_lines_processed']}")
print(f"Total annotation lines changed: {total_stats['total_lines_changed']}")
print(f"Change percentage: {(total_stats['total_lines_changed'] / max(total_stats['total_lines_processed'], 1)) * 100:.2f}%")

print(f"\nOriginal classes found: {sorted(total_stats['all_original_classes'])}")
print(f"New classes created: {sorted(total_stats['all_new_classes'])}")

if total_stats['all_errors']:
    print(f"\nErrors encountered ({len(total_stats['all_errors'])}):") 
    for error in total_stats['all_errors'][:10]:  # Show first 10 errors
        print(f"  - {error}")
    if len(total_stats['all_errors']) > 10:
        print(f"  ... and {len(total_stats['all_errors']) - 10} more errors")
else:
    print("\n✅ No errors encountered!")


PROCESSING SUMMARY
Files processed: 1500
Files with errors: 0
Total annotation lines processed: 4784
Total annotation lines changed: 4714
Change percentage: 98.54%

Original classes found: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
New classes created: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]

✅ No errors encountered!
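
Two details can be read off this summary. ID 35 is missing from the "Original classes found" list, so that mapping entry was never exercised. Also, 4784 - 4714 = 70 lines were left unchanged. With zero errors, these must be lines whose ID maps to itself, which in `CLASS_MAPPING` is only 0, 1 and 2. The sketch below shows one way to compute both sets. `mapping_coverage` is a hypothetical helper, and the inputs are abbreviated stand-ins for `CLASS_MAPPING` and `total_stats['all_original_classes']`.

```python
def mapping_coverage(mapping, observed_ids):
    """Return (mapping keys never observed, keys that map to themselves)."""
    never_seen = sorted(set(mapping) - set(observed_ids))
    identity = sorted(old for old, (new, _name) in mapping.items() if old == new)
    return never_seen, identity


# Abbreviated stand-ins for the notebook's data.
mapping = {0: (0, 'Aluminium foil'), 1: (1, 'Battery'), 2: (2, 'Blister pack'),
           3: (2, 'Blister pack'), 34: (14, 'Paper bag'), 35: (14, 'Paper bag')}
observed = {0, 1, 2, 3, 34}

never_seen, identity = mapping_coverage(mapping, observed)
print(f"Mapped but never seen: {never_seen}")
print(f"IDs left unchanged by the mapping: {identity}")
```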


In [8]:
# Create detailed statistics DataFrame
if file_stats:
    df_stats = pd.DataFrame(file_stats)
    
    print("\n" + "=" * 60)
    print("DETAILED FILE STATISTICS")
    print("=" * 60)
    
    print("\nSummary statistics:")
    print(df_stats.describe())
    
    print("\nFiles with most changes:")
    top_changed = df_stats.nlargest(10, 'lines_changed')[['filename', 'lines_processed', 'lines_changed']]
    print(top_changed.to_string(index=False))
    
    if df_stats['errors'].sum() > 0:
        print("\nFiles with errors:")
        error_files = df_stats[df_stats['errors'] > 0][['filename', 'errors']]
        print(error_files.to_string(index=False))
else:
    print("No file statistics available")


DETAILED FILE STATISTICS

Summary statistics:
       lines_processed  lines_changed  original_classes  new_classes  errors
count      1500.000000    1500.000000       1500.000000  1500.000000  1500.0
mean          3.189333       3.142667          2.082000     1.984667     0.0
std           4.691712       4.684258          1.491563     1.320705     0.0
min           1.000000       0.000000          1.000000     1.000000     0.0
25%           1.000000       1.000000          1.000000     1.000000     0.0
50%           2.000000       2.000000          2.000000     2.000000     0.0
75%           3.000000       3.000000          2.000000     2.000000     0.0
max          90.000000      90.000000         13.000000    11.000000     0.0

Files with most changes:
           filename  lines_processed  lines_changed
 batch_6_000072.txt               90             90
 batch_8_000021.txt               54             54
batch_12_000061.txt               38             38
batch_12_000062.txt       

In [9]:
# Create class mapping summary
print("\n" + "=" * 60)
print("CLASS MAPPING SUMMARY")
print("=" * 60)

# Count how many original classes map to each new class
new_class_counts = defaultdict(list)
for old_id, (new_id, class_name) in CLASS_MAPPING.items():
    new_class_counts[new_id].append((old_id, class_name))

print("\nClass consolidation summary:")
for new_id in sorted(new_class_counts.keys()):
    old_ids, class_name = zip(*new_class_counts[new_id])
    class_name = class_name[0]  # All should be the same
    print(f"Class {new_id:2d} ({class_name:25s}): {len(old_ids):2d} original classes {sorted(old_ids)}")

print(f"\nTotal reduction: {len(CLASS_MAPPING)} → {len(new_class_counts)} classes")
print(f"Reduction percentage: {((len(CLASS_MAPPING) - len(new_class_counts)) / len(CLASS_MAPPING)) * 100:.1f}%")


CLASS MAPPING SUMMARY

Class consolidation summary:
Class  0 (Aluminium foil           ):  1 original classes [0]
Class  1 (Battery                  ):  1 original classes [1]
Class  2 (Blister pack             ):  2 original classes [2, 3]
Class  3 (Bottle                   ):  3 original classes [4, 5, 6]
Class  4 (Bottle cap               ):  2 original classes [7, 8]
Class  5 (Broken glass             ):  1 original classes [9]
Class  6 (Can                      ):  3 original classes [10, 11, 12]
Class  7 (Carton                   ):  7 original classes [13, 14, 15, 16, 17, 18, 19]
Class  8 (Cup                      ):  5 original classes [20, 21, 22, 23, 24]
Class  9 (Food waste               ):  1 original classes [25]
Class 10 (Glass jar                ):  1 original classes [26]
Class 11 (Lid                      ):  2 original classes [27, 28]
Class 12 (Other plastic            ):  1 original classes [29]
Class 13 (Paper                    ):  4 original classes [30, 31, 32,

In [10]:
# Verify a few processed files
print("\n" + "=" * 60)
print("VERIFICATION SAMPLES")
print("=" * 60)

# Show before/after comparison for first few files
sample_files = annotation_files[:3]

for sample_file in sample_files:
    print(f"\n📁 File: {sample_file.name}")
    print("-" * 40)
    
    # Read original file
    try:
        with open(sample_file, 'r') as f:
            original_lines = [line.strip() for line in f.readlines() if line.strip()]
        
        # Read processed file
        processed_file = NEW_LABELS_DIR / sample_file.name
        with open(processed_file, 'r') as f:
            processed_lines = [line.strip() for line in f.readlines() if line.strip()]
        
        print(f"Original annotations: {len(original_lines)}")
        print(f"Processed annotations: {len(processed_lines)}")
        
        # Show first few lines comparison
        for i, (orig, proc) in enumerate(zip(original_lines[:3], processed_lines[:3])):
            orig_class = orig.split()[0]
            proc_class = proc.split()[0]
            change_indicator = "→" if orig_class != proc_class else "="
            print(f"  Line {i+1}: {orig_class} {change_indicator} {proc_class}")
        
        if len(original_lines) > 3:
            print(f"  ... and {len(original_lines) - 3} more lines")
            
    except Exception as e:
        print(f"  Error reading files: {str(e)}")


VERIFICATION SAMPLES

📁 File: batch_10_000000.txt
----------------------------------------
Original annotations: 1
Processed annotations: 1
  Line 1: 5 → 3

📁 File: batch_10_000001.txt
----------------------------------------
Original annotations: 13
Processed annotations: 13
  Line 1: 4 → 3
  Line 2: 7 → 4
  Line 3: 58 → 26
  ... and 10 more lines

📁 File: batch_10_000002.txt
----------------------------------------
Original annotations: 6
Processed annotations: 6
  Line 1: 58 → 26
  Line 2: 29 → 12
  Line 3: 5 → 3
  ... and 3 more lines
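
The samples above spot-check three files and only compare class IDs. A fuller check compares every line and also confirms that the box coordinates were not altered. The sketch below assumes well-formed five-field lines whose class IDs all appear in the mapping. `verify_remap` is a hypothetical helper, and the files are created in a temporary directory for illustration.

```python
import tempfile
from pathlib import Path


def read_fields(path):
    """Read a YOLO file as a list of field lists, skipping blank lines."""
    return [ln.split() for ln in Path(path).read_text().splitlines() if ln.strip()]


def verify_remap(original_path, remapped_path, class_mapping):
    """Return a list of mismatches between an original and a remapped file."""
    orig, new = read_fields(original_path), read_fields(remapped_path)
    if len(orig) != len(new):
        return [f"line count differs: {len(orig)} vs {len(new)}"]
    mismatches = []
    for i, (o, n) in enumerate(zip(orig, new), 1):
        expected_id = str(class_mapping[int(o[0])][0])
        if n[0] != expected_id:
            mismatches.append(f"line {i}: class {n[0]}, expected {expected_id}")
        if n[1:] != o[1:]:
            mismatches.append(f"line {i}: coordinates changed")
    return mismatches


mapping = {5: (3, 'Bottle'), 58: (26, 'Unlabeled litter')}
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / 'orig.txt').write_text('5 0.5 0.5 0.1 0.1\n58 0.2 0.2 0.05 0.05\n')
    (tmp / 'good.txt').write_text('3 0.5 0.5 0.1 0.1\n26 0.2 0.2 0.05 0.05\n')
    (tmp / 'bad.txt').write_text('5 0.5 0.5 0.1 0.1\n26 0.2 0.2 0.05 0.06\n')
    good_result = verify_remap(tmp / 'orig.txt', tmp / 'good.txt', mapping)
    bad_result = verify_remap(tmp / 'orig.txt', tmp / 'bad.txt', mapping)

print(good_result)
print(bad_result)
```

In the notebook, the same function could be looped over `annotation_files`, pairing each input with `NEW_LABELS_DIR / input_file.name`.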


In [12]:
# Save processing report
report_path = Path('class_remapping_report.txt')

with open(report_path, 'w') as report_file:
    report_file.write("Class ID Remapping Report\n")
    report_file.write("=" * 50 + "\n\n")
    
    report_file.write(f"Processing Date: {pd.Timestamp.now()}\n")
    report_file.write(f"Input Directory: {LABELS_DIR}\n")
    report_file.write(f"Output Directory: {NEW_LABELS_DIR}\n\n")
    
    report_file.write("Summary Statistics:\n")
    report_file.write(f"  Files processed: {total_stats['files_processed']}\n")
    report_file.write(f"  Files with errors: {total_stats['files_with_errors']}\n")
    report_file.write(f"  Total lines processed: {total_stats['total_lines_processed']}\n")
    report_file.write(f"  Total lines changed: {total_stats['total_lines_changed']}\n")
    report_file.write(f"  Change percentage: {(total_stats['total_lines_changed'] / max(total_stats['total_lines_processed'], 1)) * 100:.2f}%\n\n")
    
    report_file.write("Class Mapping Applied:\n")
    for old_id, (new_id, class_name) in sorted(CLASS_MAPPING.items()):
        report_file.write(f"  {old_id:2d} -> {new_id:2d} ({class_name})\n")
    
    if total_stats['all_errors']:
        report_file.write("\nErrors Encountered:\n")
        for error in total_stats['all_errors']:
            report_file.write(f"  {error}\n")

print(f"\n📄 Detailed report saved to: {report_path}")
print(f"\n✅ Processing complete! Check the '{NEW_LABELS_DIR}' directory for updated annotation files.")


📄 Detailed report saved to: class_remapping_report.txt

✅ Processing complete! Check the 'data\new_labels' directory for updated annotation files.


In [None]:
# Optional: Create a classes.txt file with the new class names
classes_file = NEW_LABELS_DIR / 'classes.txt'

# Get unique new classes and their names
unique_new_classes = {}
for old_id, (new_id, class_name) in CLASS_MAPPING.items():
    unique_new_classes[new_id] = class_name

with open(classes_file, 'w') as f:
    for class_id in sorted(unique_new_classes.keys()):
        f.write(f"{unique_new_classes[class_id]}\n")

print(f"\n📝 Class names file created: {classes_file}")
print("\nNew class list:")
for class_id in sorted(unique_new_classes.keys()):
    print(f"  {class_id:2d}: {unique_new_classes[class_id]}")


📝 Class names file created: data\new_labels\classes.txt

New class list:
   0: Aluminium foil
   1: Battery
   2: Blister pack
   3: Bottle
   4: Bottle cap
   5: Broken glass
   6: Can
   7: Carton
   8: Cup
   9: Food waste
  10: Glass jar
  11: Lid
  12: Other plastic
  13: Paper
  14: Paper bag
  15: Plastic bag & wrapper
  16: Plastic container
  17: Plastic glooves
  18: Plastic utensils
  19: Pop tab
  20: Rope & strings
  21: Scrap metal
  22: Shoe
  23: Squeezable tube
  24: Straw
  25: Styrofoam piece
  26: Unlabeled litter
  27: Cigarette


## Usage Instructions

1. **Run all cells sequentially** - Each cell builds on the previous ones
2. **Check the output** - The script provides detailed progress and statistics
3. **Review the report** - A detailed report is saved as `class_remapping_report.txt`
4. **Verify results** - Sample comparisons are shown to verify the remapping worked correctly
5. **Use new files** - Updated annotation files are in `data/new_labels/`

## Output Files

- **`data/new_labels/*.txt`** - Updated annotation files with remapped class IDs
- **`data/new_labels/classes.txt`** - New class names, one per line, ordered by new class ID
- **`class_remapping_report.txt`** - Detailed processing report

## class_remapping_report.txt

Class ID Remapping Report
==================================================

Processing Date: 2025-08-06 19:06:40.942912
Input Directory: data\labels
Output Directory: data\new_labels

Summary Statistics:
  Files processed: 1500
  Files with errors: 0
  Total lines processed: 4784
  Total lines changed: 4714
  Change percentage: 98.54%

Class Mapping Applied:
   0 ->  0 (Aluminium foil)
   1 ->  1 (Battery)
   2 ->  2 (Blister pack)
   3 ->  2 (Blister pack)
   4 ->  3 (Bottle)
   5 ->  3 (Bottle)
   6 ->  3 (Bottle)
   7 ->  4 (Bottle cap)
   8 ->  4 (Bottle cap)
   9 ->  5 (Broken glass)
  10 ->  6 (Can)
  11 ->  6 (Can)
  12 ->  6 (Can)
  13 ->  7 (Carton)
  14 ->  7 (Carton)
  15 ->  7 (Carton)
  16 ->  7 (Carton)
  17 ->  7 (Carton)
  18 ->  7 (Carton)
  19 ->  7 (Carton)
  20 ->  8 (Cup)
  21 ->  8 (Cup)
  22 ->  8 (Cup)
  23 ->  8 (Cup)
  24 ->  8 (Cup)
  25 ->  9 (Food waste)
  26 -> 10 (Glass jar)
  27 -> 11 (Lid)
  28 -> 11 (Lid)
  29 -> 12 (Other plastic)
  30 -> 13 (Paper)
  31 -> 13 (Paper)
  32 -> 13 (Paper)
  33 -> 13 (Paper)
  34 -> 14 (Paper bag)
  35 -> 14 (Paper bag)
  36 -> 15 (Plastic bag & wrapper)
  37 -> 15 (Plastic bag & wrapper)
  38 -> 15 (Plastic bag & wrapper)
  39 -> 15 (Plastic bag & wrapper)
  40 -> 15 (Plastic bag & wrapper)
  41 -> 15 (Plastic bag & wrapper)
  42 -> 15 (Plastic bag & wrapper)
  43 -> 16 (Plastic container)
  44 -> 16 (Plastic container)
  45 -> 16 (Plastic container)
  46 -> 16 (Plastic container)
  47 -> 16 (Plastic container)
  48 -> 17 (Plastic glooves)
  49 -> 18 (Plastic utensils)
  50 -> 19 (Pop tab)
  51 -> 20 (Rope & strings)
  52 -> 21 (Scrap metal)
  53 -> 22 (Shoe)
  54 -> 23 (Squeezable tube)
  55 -> 24 (Straw)
  56 -> 24 (Straw)
  57 -> 25 (Styrofoam piece)
  58 -> 26 (Unlabeled litter)
  59 -> 27 (Cigarette)