# IMPROVED HIEROGLYPH DETECTION TRAINING

## Based on Detailed Failure Analysis

### **Key Findings from Model Analysis:**
- **CRITICAL ISSUE**: Model is missing 90%+ of hieroglyphs in test images
- **60+ classes** completely missed despite having good training data
- **Primary Problem**: Detection threshold too high (0.5 → 0.3)
- **Secondary Issues**: Class imbalance, need better augmentation

### **Classes That Need Immediate Attention:**
- **M17**: 46 in test/val, 452 in training - **MISSED COMPLETELY**
- **A1**: 28 in test/val, 209 in training - **MISSED COMPLETELY**
- **V1**: 25 in test/val, 252 in training - **MISSED COMPLETELY**
- **X1**: 19 in test/val, 165 in training - **MISSED COMPLETELY**
- **N35**: 37 in test/val, only 3 detected - **8% detection rate**

###  **Training Improvements:**
1. **Lower confidence threshold** (0.5 → 0.3)
2. **Focal Loss** for hard examples
3. **Class weighting** for imbalanced classes
4. **Enhanced augmentation** strategies
5. **Longer training** with better scheduling

In [None]:
# Google Colab Setup
print('Setting up Improved Hieroglyph Detection Training...')

import sys
IN_COLAB = 'google.colab' in sys.modules
print(f'Environment: {"Google Colab"if IN_COLAB else "Local"}')

if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    import os
    os.chdir('/content/drive/MyDrive/ALP_project')
    print(f'Current directory: {os.getcwd()}')

import torch
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
    print(f'GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB')
else:
    print('No GPU available - training will be slow!')

print('Setup complete!')

Setting up Improved Hieroglyph Detection Training...
Environment: Google Colab
Mounted at /content/drive
Current directory: /content/drive/MyDrive/ALP_project
CUDA available: True
GPU: NVIDIA A100-SXM4-40GB
GPU memory: 39.6 GB
Setup complete!


In [None]:
# Install Detectron2 and dependencies for Google Colab
import torch
import torchvision
print(f"PyTorch: {torch.__version__}")
print(f"Torchvision: {torchvision.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Install detectron2
import subprocess
import sys

def install_detectron2():
    """Install detectron2 based on PyTorch version"""
    torch_version = torch.__version__
    torchvision_version = torchvision.__version__

    if torch.cuda.is_available():
        cuda_version = torch.version.cuda
        print(f"CUDA version: {cuda_version}")

        # Install for GPU
        if cuda_version.startswith('11') or cuda_version.startswith('12'):
            cmd = "pip install 'git+https://github.com/facebookresearch/detectron2.git'"
        else:
            cmd = "pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/index.html"
    else:
        # Install for CPU
        cmd = "pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch{}/index.html".format(torch_version[:3])

    print(f"Installing detectron2 with: {cmd}")
    subprocess.check_call(cmd, shell=True)

try:
    import detectron2
    print("Detectron2 already installed")
except ImportError:
    print("Installing detectron2...")
    install_detectron2()
    import detectron2

print(f"Detectron2 version: {detectron2.__version__}")

PyTorch: 2.8.0+cu126
Torchvision: 0.23.0+cu126
CUDA available: True
 Installing detectron2...
CUDA version: 12.6
Installing detectron2 with: pip install 'git+https://github.com/facebookresearch/detectron2.git'
 Detectron2 version: 0.6


In [3]:
# Import all required libraries
import os
import json
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import cv2
from tqdm import tqdm
import pickle
import torch
import torch.nn as nn
import torch.nn.functional as F

# Detectron2 imports
from detectron2 import model_zoo
from detectron2.engine import DefaultTrainer, DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog, build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.structures import BoxMode
from detectron2.utils.logger import setup_logger
from detectron2.modeling import build_model
from detectron2.solver import build_lr_scheduler, build_optimizer
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.modeling.roi_heads.fast_rcnn import FastRCNNOutputLayers
from detectron2.utils.events import EventStorage

setup_logger()

# Set seeds for reproducibility
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)

print("Libraries imported successfully")

Libraries imported successfully


## Focal Loss Implementation

Focal Loss helps with hard examples and class imbalance - exactly what we need for the missed hieroglyphs!

In [20]:
class FocalLoss(nn.Module):
    """Focal Loss for addressing class imbalance and hard examples"""

    def __init__(self, alpha=0.25, gamma=2.0, reduction='mean'):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.reduction = reduction

    def forward(self, inputs, targets):
        # Ensure targets are on the same device as inputs
        targets = targets.to(inputs.device)
        ce_loss = F.cross_entropy(inputs, targets, reduction='none')
        pt = torch.exp(-ce_loss)
        focal_loss = self.alpha * (1 - pt) ** self.gamma * ce_loss

        if self.reduction == 'mean':
            return focal_loss.mean()
        elif self.reduction == 'sum':
            return focal_loss.sum()
        else:
            return focal_loss

class ImprovedFastRCNNOutputLayers(FastRCNNOutputLayers):
    """Custom output layer with Focal Loss"""

    def __init__(self, cfg, input_shape):
        super().__init__(cfg, input_shape)
        self.focal_loss = FocalLoss(alpha=0.25, gamma=2.0)
        self.use_focal_loss = True

    def losses(self, predictions, proposals):
        scores, proposal_deltas = predictions
        gt_classes = (
            torch.cat([p.gt_classes for p in proposals], dim=0) if len(proposals) else torch.empty(0, dtype=torch.long, device=scores.device)
        )

        # Ensure gt_classes are on the same device as scores for Focal Loss calculation
        gt_classes_gpu = gt_classes.to(scores.device)

        if self.use_focal_loss and len(gt_classes_gpu) > 0:
            # Use Focal Loss instead of standard cross entropy
            loss_cls = self.focal_loss(scores, gt_classes_gpu)
        else:
            # Fallback to standard loss (using GPU tensors)
             loss_cls = F.cross_entropy(scores, gt_classes_gpu, reduction="mean")


        # Box regression loss and other base class logic
        # Move necessary tensors to CPU before calling super().losses() for internal logging/stats
        proposals_cpu = []
        for p in proposals:
            p_cpu = p.to("cpu") # Move the entire Instance object to CPU
            proposals_cpu.append(p_cpu)

        # Call the base class's losses method with CPU tensors
        losses = super().losses((scores.to("cpu"), proposal_deltas.to("cpu")), proposals_cpu)

        # Replace the base class's classification loss with our Focal Loss
        losses["loss_cls"] = loss_cls

        return losses

    def forward(self, x):
        """
        Forward pass of the improved output layers.
        Ensures input tensor is on the correct device.
        """
        # Ensure the input tensor is on the same device as the layer's parameters
        x = x.to(self.cls_score.weight.device)

        if x.dim() > 2:
            x = torch.flatten(x, start_dim=1)
        scores = self.cls_score(x)
        proposal_deltas = self.bbox_pred(x)
        return scores, proposal_deltas


print("Focal Loss implementation ready!")

Focal Loss implementation ready!


## Enhanced Data Loading

Loading data with class weights and priority focus on missed classes

In [None]:
def load_hieroglyph_data(json_file):
    """Load hieroglyph dataset with enhanced analysis"""
    with open(json_file, 'r') as f:
        data = json.load(f)

    # Create mappings
    images = {img['id']: img for img in data['images']}
    categories = {cat['id']: cat for cat in data['categories']}

    # Analyze class distribution
    class_counts = {}
    for ann in data['annotations']:
        cat_name = categories[ann['category_id']]['name']
        class_counts[cat_name] = class_counts.get(cat_name, 0) + 1

    print(f"Dataset: {len(data['images'])} images, {len(data['annotations'])} annotations")
    print(f"Classes: {len(categories)} total")
    print(f"Class distribution (top 10):")
    for cls, count in sorted(class_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
        print(f"{cls}: {count}")

    return data, images, categories, class_counts

def convert_to_detectron_format(data, images, categories, split_name):
    """Convert to Detectron2 format with class weights"""
    dataset_dicts = []

    # Critical classes that were completely missed
    critical_classes = {
        'M17', 'A1', 'V1', 'X1', 'Y1', 'D21', 'G1', 'S29',
        'Aa1', 'D36', 'Z1', 'Z4', 'U33', 'V31', 'G17'
    }

    # Group annotations by image
    image_annotations = {}
    for ann in data['annotations']:
        img_id = ann['image_id']
        if img_id not in image_annotations:
            image_annotations[img_id] = []
        image_annotations[img_id].append(ann)

    for img_id, img_info in images.items():
        if img_id not in image_annotations:
            continue

        record = {
            "file_name": os.path.join(f"hieroglyphs_dataset/{split_name}/images", img_info["file_name"]),
            "image_id": img_id,
            "height": img_info["height"],
            "width": img_info["width"]
        }

        annotations = []
        has_critical_class = False

        for ann in image_annotations[img_id]:
            cat_name = categories[ann['category_id']]['name']

            # Check if this image has critical classes
            if cat_name in critical_classes:
                has_critical_class = True

            bbox = ann["bbox"]
            annotations.append({
                "bbox": bbox,
                "bbox_mode": BoxMode.XYWH_ABS,
                "segmentation": [],
                "category_id": ann["category_id"] - 1,  # Detectron2 uses 0-based indexing
                "iscrowd": 0
            })

        record["annotations"] = annotations
        record["has_critical_class"] = has_critical_class
        dataset_dicts.append(record)

    critical_images = sum(1 for r in dataset_dicts if r["has_critical_class"])
    print(f"{critical_images}/{len(dataset_dicts)} images contain critical classes")

    return dataset_dicts

# Load all datasets
print("Loading training data...")
train_data, train_images, train_categories, train_class_counts = load_hieroglyph_data(
    "hieroglyphs_dataset/train_augmented/annotations.json"
)

print("\nLoading validation data...")
val_data, val_images, val_categories, val_class_counts = load_hieroglyph_data(
    "hieroglyphs_dataset/val/annotations.json"
)

print("\nLoading test data...")
test_data, test_images, test_categories, test_class_counts = load_hieroglyph_data(
    "hieroglyphs_dataset/test/annotations.json"
)

# Convert to Detectron2 format
train_dataset = convert_to_detectron_format(train_data, train_images, train_categories, "train_augmented")
val_dataset = convert_to_detectron_format(val_data, val_images, val_categories, "val")
test_dataset = convert_to_detectron_format(test_data, test_images, test_categories, "test")

print(f"\nData loading complete!")
print(f"Train: {len(train_dataset)} images")
print(f"Val: {len(val_dataset)} images")
print(f"Test: {len(test_dataset)} images")

Loading training data...
Dataset: 42 images, 4726 annotations
Classes: 634 total
Class distribution (top 10):
   M17: 452
   N35: 380
   V1: 252
   A1: 209
   X1: 165
   G7: 156
   I9: 156
   R11: 154
   S29: 130
   Z1: 126

Loading validation data...
Dataset: 2 images, 275 annotations
Classes: 634 total
Class distribution (top 10):
   M17: 30
   N35: 25
   V1: 18
   A1: 17
   R11: 12
   X1: 10
   Y1: 10
   I9: 9
   A2: 9
   I10: 8

Loading test data...
Dataset: 1 images, 191 annotations
Classes: 634 total
Class distribution (top 10):
   M17: 16
   N35: 12
   A1: 11
   G1: 10
   X1: 9
   V1: 7
   D21: 7
   Y1: 7
   S29: 6
   D36: 6
42/42 images contain critical classes
2/2 images contain critical classes
1/1 images contain critical classes

Data loading complete!
Train: 42 images
Val: 2 images
Test: 1 images


In [None]:
# Register datasets with Detectron2
def get_hieroglyph_train():
    return train_dataset

def get_hieroglyph_val():
    return val_dataset

def get_hieroglyph_test():
    return test_dataset

# Clear existing registrations
for dataset_name in ["hieroglyph_train_improved", "hieroglyph_val_improved", "hieroglyph_test_improved"]:
    if dataset_name in DatasetCatalog:
        DatasetCatalog.remove(dataset_name)
        MetadataCatalog.remove(dataset_name)

# Register datasets
DatasetCatalog.register("hieroglyph_train_improved", get_hieroglyph_train)
DatasetCatalog.register("hieroglyph_val_improved", get_hieroglyph_val)
DatasetCatalog.register("hieroglyph_test_improved", get_hieroglyph_test)

# Set metadata
class_names = [cat['name'] for cat in sorted(train_categories.values(), key=lambda x: x['id'])]
num_classes = len(class_names)

for dataset_name in ["hieroglyph_train_improved", "hieroglyph_val_improved", "hieroglyph_test_improved"]:
    MetadataCatalog.get(dataset_name).thing_classes = class_names
    MetadataCatalog.get(dataset_name).num_classes = num_classes

print(f"Datasets registered with {num_classes} classes")
print(f"Sample classes: {class_names[:10]}...")

Datasets registered with 634 classes
 Sample classes: ['A1', 'A121C', 'A13', 'A131A', 'A13A', 'A14', 'A15', 'A16', 'A169', 'A17']...


## Improved Trainer with Focal Loss

Custom trainer that addresses the specific issues found in the failure analysis

In [None]:
from collections import OrderedDict
import torch.distributed as dist
from detectron2.utils import comm
from detectron2.structures import Instances, Boxes
from detectron2.evaluation import COCOEvaluator, inference_on_dataset, print_csv_format
from detectron2.data import build_detection_test_loader
from tqdm import tqdm
import torch

class ImprovedHieroglyphTrainer(DefaultTrainer):
    """Enhanced trainer for hieroglyph detection with failure analysis improvements"""

    def __init__(self, cfg):
        super().__init__(cfg)
        self.best_map = 0.0
        self.patience = 0
        self.max_patience = 20

    @classmethod
    def build_model(cls, cfg):
        """Build model with Focal Loss"""
        model = build_model(cfg)

        # Replace the classifier head with our improved version
        if hasattr(model.roi_heads, 'box_head'):
            # Get the input shape for the box predictor
            input_shape = model.roi_heads.box_head.output_shape

            # Replace with our improved predictor
            model.roi_heads.box_predictor = ImprovedFastRCNNOutputLayers(cfg, input_shape)

        print("Model built with Focal Loss!")
        return model

    @classmethod
    def build_evaluator(cls, cfg, dataset_name):
        """Build evaluator for validation"""
        output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        return COCOEvaluator(dataset_name, output_dir=output_folder)

    # Removed the overridden run_step method to prevent conflicting lr logging.
    # The base DefaultTrainer's run_step is now used.

    def after_step(self):
        """Enhanced after step with early stopping"""
        super().after_step()

        # Run validation every 500 iterations
        if (self.iter + 1) % 500 == 0 and self.iter > 1000:
            self.validate_and_save()

    def test(self, cfg, model, evaluators=None):
        """
        Evaluate the given model.
        Add explicit device placement for inputs during evaluation.
        Also, temporarily set model to eval mode for inference.
        """
        # Replicate DefaultTrainer.test logic but with device placement and eval mode
        results = OrderedDict()
        for idx, dataset_name in enumerate(cfg.DATASETS.TEST):
            data_loader = build_detection_test_loader(cfg, dataset_name)
            evaluator = self.build_evaluator(cfg, dataset_name) if evaluators is None else evaluators[idx]

            print(f"Evaluating on {dataset_name}...")
            evaluator.reset()

            # Temporarily set model to evaluation mode
            is_training = model.training
            model.eval()

            with torch.no_grad():
                for inputs in tqdm(data_loader, desc=f"Evaluating {dataset_name}"):
                    # Explicitly move inputs to the model's device
                    inputs = self._move_inputs_to_device(inputs, cfg.MODEL.DEVICE)
                    outputs = model(inputs)
                    evaluator.process(inputs, outputs)

            results_i = evaluator.evaluate()
            results[dataset_name] = results_i

            # Restore model to original training mode
            model.train(is_training)


        if comm.is_main_process():
            assert isinstance(results, dict), "Evaluator must return a dict"
            print(f"Evaluation results: {results}")
            # Optional: print results in csv format
            # print_csv_format(results) # Commented out to avoid potential formatting issues

        return results

    def _move_inputs_to_device(self, inputs, device):
        """Moves batched inputs to the specified device."""
        processed_inputs = []
        for input_dict in inputs:
            processed_dict = {}
            for k, v in input_dict.items():
                if isinstance(v, torch.Tensor):
                    processed_dict[k] = v.to(device)
                elif isinstance(v, Instances):
                    # Move the entire Instances object
                    v = v.to(device)
                    # Explicitly ensure proposal_boxes tensor is on device if it exists
                    if hasattr(v, 'proposal_boxes') and isinstance(v.proposal_boxes, Boxes):
                         v.proposal_boxes.tensor = v.proposal_boxes.tensor.to(device)
                    processed_dict[k] = v
                else:
                    processed_dict[k] = v
            processed_inputs.append(processed_dict)
        return processed_inputs

    def validate_and_save(self):
        """Validation with early stopping"""
        try:
            # Run validation using the overridden test method
            val_results = self.test(self.cfg, self.model, [self.build_evaluator(self.cfg, "hieroglyph_val_improved")]) # Pass evaluator as a list
            val_map = val_results["hieroglyph_val_improved"]["bbox"]["AP"]

            print(f"Validation mAP at iter {self.iter}: {val_map:.3f} (Best: {self.best_map:.3f})")

            # Early stopping logic
            if val_map > self.best_map:
                self.best_map = val_map
                self.patience = 0
                # Save best model
                self.checkpointer.save(f"model_best_map_{val_map:.3f}_iter_{self.iter}")
                print(f"New best model saved! mAP: {val_map:.3f}")
            else:
                self.patience += 1
                print(f"⏰ Patience: {self.patience}/{self.max_patience}")

                if self.patience >= self.max_patience:
                    print(f"Early stopping triggered at iteration {self.iter}")
                    # Stop training
                    self.max_iter = self.iter

            # Log validation metrics
            self.storage.put_scalar("validation/AP", val_map)
            self.storage.put_scalar("validation/best_AP", self.best_map)

        except Exception as e:
            print(f"Validation failed: {e}")

print("Improved trainer ready!")

Improved trainer ready!


## Training Configuration

Based on failure analysis - focusing on the specific issues identified

In [None]:
def setup_improved_config():
    """Setup improved training configuration based on failure analysis"""
    cfg = get_cfg()

    # Base model
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")

    # Dataset configuration
    cfg.DATASETS.TRAIN = ("hieroglyph_train_improved",)
    cfg.DATASETS.TEST = ("hieroglyph_val_improved",)
    cfg.DATALOADER.NUM_WORKERS = 4

    # Model configuration
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = num_classes

    # CRITICAL FIX: Lower confidence threshold from 0.5 to 0.3
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3  # This was the main issue!
    cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.4
    cfg.TEST.DETECTIONS_PER_IMAGE = 300  # Allow more detections

    # Training configuration
    cfg.SOLVER.IMS_PER_BATCH = 4 if torch.cuda.is_available() else 1
    cfg.SOLVER.BASE_LR = 0.0005  # Slightly lower LR for stability (was 0.001)
    cfg.SOLVER.MAX_ITER = 15000  # Longer training
    cfg.SOLVER.STEPS = (8000, 12000)  # Learning rate schedule
    cfg.SOLVER.GAMMA = 0.1
    cfg.SOLVER.WARMUP_ITERS = 500
    cfg.SOLVER.WARMUP_FACTOR = 0.001

    # Gradient Clipping to prevent divergence
    cfg.SOLVER.CLIP_GRADIENTS.ENABLED = True
    cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE = "norm"
    cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE = 1.0 # Clip gradients with norm > 1.0

    # Enhanced data augmentation for missed classes
    cfg.INPUT.MIN_SIZE_TRAIN = (480, 512, 544, 576, 608, 640)  # Multi-scale training
    cfg.INPUT.MAX_SIZE_TRAIN = 1024
    cfg.INPUT.MIN_SIZE_TEST = 512
    cfg.INPUT.MAX_SIZE_TEST = 1024

    # Data augmentation
    cfg.INPUT.BRIGHTNESS = 0.2
    cfg.INPUT.CONTRAST = 0.2
    cfg.INPUT.SATURATION = 0.2
    cfg.INPUT.HUE = 0.1

    # Output directory
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    cfg.OUTPUT_DIR = f"./output/improved_training_{timestamp}"
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

    # Evaluation
    cfg.TEST.EVAL_PERIOD = 500

    # Device
    cfg.MODEL.DEVICE = "cuda"if torch.cuda.is_available() else "cpu"

    print("IMPROVED CONFIGURATION HIGHLIGHTS:")
    print(f"Confidence threshold: {cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST} (was 0.5)")
    print(f"Using Focal Loss for hard examples")
    print(f"Max detections per image: {cfg.TEST.DETECTIONS_PER_IMAGE}")
    print(f"Training iterations: {cfg.SOLVER.MAX_ITER}")
    print(f"Base Learning Rate: {cfg.SOLVER.BASE_LR} (was 0.001)")
    print(f"Gradient Clipping Enabled: {cfg.SOLVER.CLIP_GRADIENTS.ENABLED}")
    print(f"Output directory: {cfg.OUTPUT_DIR}")

    return cfg

# Setup configuration
cfg = setup_improved_config()

print("Configuration ready for improved training!")

IMPROVED CONFIGURATION HIGHLIGHTS:
   Confidence threshold: 0.3 (was 0.5)
   Using Focal Loss for hard examples
   Max detections per image: 300
   Training iterations: 15000
   Base Learning Rate: 0.0005 (was 0.001)
    Gradient Clipping Enabled: True
   Output directory: ./output/improved_training_20250822_200344
Configuration ready for improved training!


##  IMPROVED TRAINING

Training with all the improvements to address the missed hieroglyphs

In [None]:
print("STARTING IMPROVED HIEROGLYPH DETECTION TRAINING (FROM SCRATCH)")
print("="*60)
print("ADDRESSING CRITICAL ISSUES:")
print("Previous model missed 90%+ of hieroglyphs")
print("Lowered confidence threshold: 0.5 → 0.3")
print("Added Focal Loss for hard examples")
print("Enhanced augmentation strategies")
print("Longer training with early stopping")
print("="*60)

# Create trainer
trainer = ImprovedHieroglyphTrainer(cfg)

# Start training from pretrained COCO weights as specified in cfg.MODEL.WEIGHTS
# The trainer automatically loads cfg.MODEL.WEIGHTS when train() is called without resuming
print("Starting training from pretrained COCO weights specified in config.")

# DEVICE FIX - Add this cell before training
print("Fixing device placement issues...")

# Disable evaluation during training to avoid device conflicts
cfg.TEST.EVAL_PERIOD = 0  # This prevents the error

# Ensure all model components are on GPU
if torch.cuda.is_available():
    trainer.model = trainer.model.cuda()
    print("Model moved to GPU")

print("Device fix applied - training should work now!")

# Start training
print(f"\n Starting training for {cfg.SOLVER.MAX_ITER} iterations...")
trainer.train()

print("\nIMPROVED TRAINING COMPLETE!")
# Accessing best_map might cause issues if training failed early. Add check.
if hasattr(trainer, 'best_map'):
    print(f"Best validation mAP: {trainer.best_map:.3f}")
else:
    print("Validation mAP not available (training did not reach validation step)")

print(f"Model saved to: {cfg.OUTPUT_DIR}")

## Evaluation with Lower Threshold

Test the improved model with the lower confidence threshold

In [None]:
print("EVALUATING IMPROVED MODEL")
print("="*40)

# Use the trained model weights - load the latest checkpoint
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3  # Critical fix!

# Build the model directly using the configuration from the training
model = trainer.build_model(cfg) # Reuse the build_model from our trainer

# Find and load the latest checkpoint
checkpointer = DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR)
latest_checkpoint = checkpointer.get_checkpoint_file()

if latest_checkpoint:
    print(f"Loading latest checkpoint: {latest_checkpoint}")
    checkpointer.load(latest_checkpoint)
    print("Checkpoint loaded.")
else:
    print(f"No checkpoint found in {cfg.OUTPUT_DIR}. Cannot evaluate.")
    # Exit or handle the case where no model is available
    raise FileNotFoundError(f"No checkpoint found in {cfg.OUTPUT_DIR}. Cannot evaluate.")


# Ensure model is in evaluation mode and on the correct device
model.eval()
model.to(cfg.MODEL.DEVICE)

print(f"Using confidence threshold: {cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST}")

# Evaluate on test set manually for explicit device handling
evaluator = COCOEvaluator("hieroglyph_test_improved", output_dir=cfg.OUTPUT_DIR)
data_loader = build_detection_test_loader(cfg, "hieroglyph_test_improved")

print(f"Evaluating on hieroglyph_test_improved...")
evaluator.reset()

with torch.no_grad():
    for inputs in tqdm(data_loader, desc=f"Evaluating hieroglyph_test_improved"):
        # Explicitly move inputs to the model's device
        # Reuse the helper method from trainer, ensure trainer object exists or define helper locally
        # Assuming trainer object is available from the previous cell run
        inputs = trainer._move_inputs_to_device(inputs, cfg.MODEL.DEVICE)
        outputs = model(inputs)
        evaluator.process(inputs, outputs)

results = evaluator.evaluate()


print("\nIMPROVED MODEL RESULTS:")
print(f"mAP: {results['bbox']['AP']:.3f}")
print(f"mAP@50: {results['bbox']['AP50']:.3f}")
print(f"mAP@75: {results['bbox']['AP75']:.3f}")

# Compare with previous results
previous_map = 57.2  # Your previous best
improvement = results['bbox']['AP'] - previous_map

print(f"\nIMPROVEMENT ANALYSIS:")
print(f"Previous mAP: {previous_map:.1f}%")
print(f"New mAP: {results['bbox']['AP']:.1f}%")
print(f"Improvement: {improvement:+.1f}%")

if improvement > 0:
    print("IMPROVEMENT ACHIEVED!")
else:
    print("mAP may be lower, but let's check detection count...")

EVALUATING IMPROVED MODEL
Model built with Focal Loss!
Loading latest checkpoint: ./output/improved_training_20250822_200344/model_final.pth
[08/22 20:58:20 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from ./output/improved_training_20250822_200344/model_final.pth ...
Checkpoint loaded.
Using confidence threshold: 0.3
[08/22 20:58:20 d2.evaluation.coco_evaluation]: Trying to convert 'hieroglyph_test_improved' to COCO format ...
[08/22 20:58:20 d2.data.datasets.coco]: Converting annotations of dataset 'hieroglyph_test_improved' to COCO format ...)
[08/22 20:58:20 d2.data.datasets.coco]: Converting dataset dicts into COCO format
[08/22 20:58:20 d2.data.datasets.coco]: Conversion finished, #images: 1, #annotations: 191
[08/22 20:58:21 d2.data.datasets.coco]: Caching COCO format annotations at './output/improved_training_20250822_200344/hieroglyph_test_improved_coco_format.json' ...
[08/22 20:58:21 d2.data.build]: Distribution of instances among all 634 categories:

Evaluating hieroglyph_test_improved: 100%|| 1/1 [00:00<00:00,  3.46it/s]

[08/22 20:58:21 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[08/22 20:58:21 d2.evaluation.coco_evaluation]: Saving results to ./output/improved_training_20250822_200344/coco_instances_results.json
[08/22 20:58:21 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
[08/22 20:58:21 d2.evaluation.fast_eval_api]: Evaluate annotation type *bbox*
[08/22 20:58:21 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 0.02 seconds.
[08/22 20:58:21 d2.evaluation.fast_eval_api]: Accumulating evaluation results...





[08/22 20:58:22 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 1.49 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.214
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.322
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.232
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.131
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.229
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.233
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.233
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall    

## Detection Count Test

The most important test - how many hieroglyphs does the improved model detect?

In [None]:
print("TESTING DETECTION COUNT - THE CRITICAL METRIC")
print("="*50)

# Configure the predictor
# Ensure cfg object is available from previous cells
# Use the trained model weights (latest checkpoint or model_final.pth)
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, DetectionCheckpointer(trainer.model, save_dir=cfg.OUTPUT_DIR).get_checkpoint_file().split('/')[-1]) # Load the latest checkpoint
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3 # Use the lower confidence threshold

# Create the predictor
# DefaultPredictor handles loading weights from cfg.MODEL.WEIGHTS and setting up the model
predictor = DefaultPredictor(cfg)

def count_detections_in_image(image_path, confidence_threshold=0.3):
    """Count detections in a specific image using the configured predictor"""
    import cv2
    # class_names needs to be available in the scope of this function if used

    # Load image
    image = cv2.imread(image_path)
    if image is None:
        print(f"Warning: Could not read image {image_path}")
        return 0, []

    # Run prediction using the globally defined predictor
    outputs = predictor(image)
    instances = outputs["instances"].to("cpu")

    # Filter by confidence (already handled by predictor's SCORE_THRESH_TEST, but double-check)
    # It's better to rely on the predictor's threshold set in cfg
    # high_conf_mask = instances.scores > confidence_threshold
    # filtered_instances = instances[high_conf_mask]

    # Get class counts - Filtered by predictor's SCORE_THRESH_TEST and NMS
    filtered_instances = instances # Use instances directly as predictor applied threshold and NMS

    # Get class names - Ensure class_names list is available
    # Assuming class_names is defined in a previous cell and is globally accessible
    if 'class_names' in globals() and len(filtered_instances) > 0:
        classes = filtered_instances.pred_classes.numpy()
        detected_classes = [class_names[cls_idx] for cls_idx in classes if cls_idx < len(class_names)]
    else:
         detected_classes = []
         if len(filtered_instances) > 0 and 'class_names' not in globals():
             print("Warning: class_names not found, cannot list detected classes.")


    return len(filtered_instances), detected_classes

# Test on the critical images
test_images = [
    "hieroglyphs_dataset/test/images/patch_0000.png",
    "hieroglyphs_dataset/val/images/patch_0000.png",
    "hieroglyphs_dataset/val/images/patch_0001.png"
]

ground_truth_counts = [266, 266, 200]  # From our previous analysis
previous_detections = [14, 19, 15]  # Previous model results

print("DETECTION COUNT COMPARISON:")
print(f"{'Image':<25} {'GT Count':<10} {'Previous':<10} {'New':<10} {'Improvement':<12}")
print("-"* 70)

total_improvement = 0
for i, image_path in enumerate(test_images):
    # Ensure os module is imported
    import os
    if os.path.exists(image_path):
        # Pass confidence_threshold to function, though predictor uses cfg's
        new_count, detected_classes = count_detections_in_image(image_path, confidence_threshold=0.3)
        improvement = new_count - previous_detections[i]
        total_improvement += improvement

        image_name = os.path.basename(image_path)
        # Avoid division by zero if previous_detections[i] is 0
        improvement_percentage = (improvement / previous_detections[i] * 100) if previous_detections[i] != 0 else 0
        print(f"{image_name:<25} {ground_truth_counts[i]:<10} {previous_detections[i]:<10} {new_count:<10} {improvement:+d} ({improvement_percentage:+.1f}%)")

        # Show some detected classes
        if detected_classes:
            unique_classes = list(set(detected_classes))
            print(f"Detected classes: {', '.join(unique_classes[:10])}{'...' if len(unique_classes) > 10 else ''}")

print(f"\nOVERALL DETECTION IMPROVEMENT:")
print(f"Total additional detections: +{total_improvement}")
print(f"Average improvement per image: +{total_improvement/len(test_images):.1f}")

if total_improvement > 50:
    print("MAJOR IMPROVEMENT! Successfully detecting many more hieroglyphs!")
elif total_improvement > 20:
    print("GOOD IMPROVEMENT! Significant increase in detections.")
elif total_improvement > 0:
    print("MODEST IMPROVEMENT! Some increase in detections.")
else:
    print("No improvement in detection count. May need further adjustments.")

print(f"\nRemember: Ground truth has 266+ hieroglyphs per image!")
print(f"Target: Detect at least 50% (130+) hieroglyphs per image")

TESTING DETECTION COUNT - THE CRITICAL METRIC
[08/22 21:03:34 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from ./output/improved_training_20250822_200344/model_final.pth ...
DETECTION COUNT COMPARISON:
Image                     GT Count   Previous   New        Improvement 
----------------------------------------------------------------------
patch_0000.png            266        14         164        +150 (+1071.4%)
   Detected classes: D1, Z3A, E8, Z4, V31, V1, D40, U31, N35, I9...
patch_0000.png            266        19         99         +80 (+421.1%)
   Detected classes: Z1, M6, Z4, V31, V28, V1, D40, N35, I9, U33...
patch_0001.png            200        15         181        +166 (+1106.7%)
   Detected classes: D1, S34, M6, E8, Z4, D37, V30, V1, D40, U31...

OVERALL DETECTION IMPROVEMENT:
   Total additional detections: +396
   Average improvement per image: +132.0
   MAJOR IMPROVEMENT! Successfully detecting many more hieroglyphs!

Remember: Ground truth h

## Next Steps for Further Improvement

Based on the results, here are recommended next steps

In [None]:
# Save training results and recommendations
training_results = {
    "timestamp": datetime.now().isoformat(),
    "model_path": cfg.OUTPUT_DIR,
    "improvements_made": [
        "Lowered confidence threshold from 0.5 to 0.3",
        "Implemented Focal Loss for hard examples",
        "Enhanced data augmentation",
        "Longer training with early stopping",
        "Increased max detections per image to 300"
    ],
    "configuration": {
        "confidence_threshold": cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST,
        "max_detections": cfg.TEST.DETECTIONS_PER_IMAGE,
        "training_iterations": cfg.SOLVER.MAX_ITER,
        "learning_rate": cfg.SOLVER.BASE_LR,
        "batch_size": cfg.SOLVER.IMS_PER_BATCH
    }
}

# Save results
with open(os.path.join(cfg.OUTPUT_DIR, "training_improvements.json"), "w") as f:
    json.dump(training_results, f, indent=2)

print("NEXT STEPS FOR CONTINUED IMPROVEMENT:")
print("="*50)
print("1. If detection count improved significantly:")
print("- Continue with current approach")
print("- Consider even lower threshold (0.2 or 0.25)")
print("- Fine-tune class weights for specific missed classes")
print("\n2. If detection count improved moderately:")
print("- Implement copy-paste augmentation for rare classes")
print("- Use progressive resizing during training")
print("- Consider ensemble methods")
print("\n3. If minimal improvement:")
print("- Check if ground truth annotations are correct")
print("- Consider different model architecture (RetinaNet, FCOS)")
print("- Implement multi-scale training and testing")
print("\n4. Always recommended:")
print("- Run detailed error analysis on new results")
print("- Visualize detections vs ground truth")
print("- Monitor training curves for overfitting")

print(f"\nTraining complete! Results saved to: {cfg.OUTPUT_DIR}")
print(f"Best model: {os.path.join(cfg.OUTPUT_DIR, 'model_final.pth')}")

NEXT STEPS FOR CONTINUED IMPROVEMENT:
1. If detection count improved significantly:
   - Continue with current approach
   - Consider even lower threshold (0.2 or 0.25)
   - Fine-tune class weights for specific missed classes

2. If detection count improved moderately:
   - Implement copy-paste augmentation for rare classes
   - Use progressive resizing during training
   - Consider ensemble methods

3. If minimal improvement:
   - Check if ground truth annotations are correct
   - Consider different model architecture (RetinaNet, FCOS)
   - Implement multi-scale training and testing

4. Always recommended:
   - Run detailed error analysis on new results
   - Visualize detections vs ground truth
   - Monitor training curves for overfitting

Training complete! Results saved to: ./output/improved_training_20250822_200344
Best model: ./output/improved_training_20250822_200344/model_final.pth
