### We will start from the pretrained YOLOv11 model and apply transfer learning with datasets available on Roboflow to build a more generalised model.

I will implement a multi-stage transfer learning approach:
1. Start with the YOLOv11 pretrained model as the foundation
2. First transfer learning phase: Train on research paper layouts to build general document understanding
3. Second transfer learning phase: Fine-tune on early modern document layouts
4. Final specialization phase: Fine-tune specifically for the provided early modern Spanish documents
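The hand-off between the phases above can be sketched as a small chain in which each phase starts from the weights produced by the previous one. This is only an illustration of the idea, not the code used in this notebook: `Stage`, `run_stages` and `fake_train` are hypothetical names, the third dataset path and its epoch count are assumptions, and `fake_train` stands in for a real training call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str       # human-readable label for the phase
    data_yaml: str  # dataset config used in this phase
    epochs: int     # training budget for this phase


def run_stages(
    base_weights: str,
    stages: List[Stage],
    train_fn: Callable[[str, Stage], str],
) -> List[str]:
    """Train each stage from the previous stage's weights.

    train_fn(weights, stage) must return the path of the weights it produced.
    """
    history = [base_weights]
    for stage in stages:
        history.append(train_fn(history[-1], stage))
    return history


def fake_train(weights: str, stage: Stage) -> str:
    # Stand-in for a real training call; it only records the hand-off.
    return f"runs/{stage.name}/weights/best.pt"


stages = [
    Stage("research_papers", "TFT-ID-1/data.yaml", epochs=25),
    Stage("early_modern", "Macro-segmentation-1/data.yaml", epochs=10),
    # Path and epoch count for the last phase are assumptions.
    Stage("spanish_docs", "Layout_organistion-4/data.yaml", epochs=10),
]
history = run_stages("yolo11n.pt", stages, fake_train)
print(history)
```

Passing the training function in as an argument keeps the chaining logic separate from the training library, so the same loop works with a real trainer or with a stub like the one above.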

#### This progressive specialization strategy aims to produce a model capable of extracting layouts from both general documents and our specific target domain of early modern Spanish manuscripts.

#### Step 1: Import Dependencies

In [27]:
import os
from roboflow import Roboflow
from ultralytics import YOLO
import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2

In [16]:
if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'

print(f'Using device: {device}')

Using device: cuda


### Downloading the Custom Dataset
Training our model was made possible by the [TFT-ID](https://universe.roboflow.com/huyifei/tft-id) dataset, provided by [huyifei](https://universe.roboflow.com/huyifei), which contains *8606 images*.

In [34]:
from roboflow import Roboflow
rf = Roboflow(api_key="WQoYV1hNmIvbs7cVK9KZ")
project = rf.workspace("huyifei").project("tft-id")
version = project.version(1)
dataset = version.download("yolov11")


loading Roboflow workspace...
loading Roboflow project...
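The download cell passes the API key as a string literal. One common alternative is to read it from an environment variable so the key stays out of the notebook. The sketch below assumes a variable named `ROBOFLOW_API_KEY`; both that name and the helper `get_roboflow_api_key` are my own choices, not something Roboflow requires.

```python
import os
from typing import Mapping


def get_roboflow_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the (assumed) ROBOFLOW_API_KEY variable."""
    key = env.get("ROBOFLOW_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set ROBOFLOW_API_KEY before running the download cell.")
    return key
```

With this helper in place, the download cell could call `Roboflow(api_key=get_roboflow_api_key())` instead of embedding the key.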


In [35]:
train_images_path = os.path.join(dataset.location, "train/images")
# Note: despite the variable name, this points at the validation split ("valid").
test_images_path = os.path.join(dataset.location, "valid/images")


### 🔍 Inspecting Dataset
Let's explore the downloaded dataset and display some sample images.

In [37]:
sample_images = os.listdir(train_images_path)[:5]
plt.figure(figsize=(15, 10))
for i, img_name in enumerate(sample_images):
    img_path = os.path.join(train_images_path, img_name)
    img = cv2.imread(img_path)
    if img is None:
        continue
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.subplot(2, 3, i+1)
    plt.imshow(img)
    plt.title(img_name)
    plt.axis('off')
plt.show()

<Figure size 1500x1000 with 5 Axes>
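Alongside each image, a YOLO-format export stores a text file of annotations. Assuming the plain detection layout, where each line is `class x_center y_center width height` with coordinates normalised to the image size, a line can be converted to pixel corners as sketched below. `yolo_to_pixels` is a hypothetical helper, and the sample line is made up for illustration.

```python
from typing import Tuple


def yolo_to_pixels(
    line: str, img_w: int, img_h: int
) -> Tuple[int, float, float, float, float]:
    """Convert 'cls cx cy w h' (normalised) to (cls, x1, y1, x2, y2) in pixels."""
    cls, cx, cy, w, h = line.split()
    cx_px, w_px = float(cx) * img_w, float(w) * img_w
    cy_px, h_px = float(cy) * img_h, float(h) * img_h
    return (
        int(cls),
        cx_px - w_px / 2,  # left edge
        cy_px - h_px / 2,  # top edge
        cx_px + w_px / 2,  # right edge
        cy_px + h_px / 2,  # bottom edge
    )


# Made-up label line: a box centred in a 640x480 image.
box = yolo_to_pixels("0 0.5 0.5 0.2 0.4", img_w=640, img_h=480)
print(box)
```

The resulting corners could be passed to `cv2.rectangle` to draw the annotations over the sample images.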

## 📊 Displaying Dataset Structure

In [38]:
import pandas as pd

def count_files(directory):
    """Return the number of entries in a directory."""
    return len(os.listdir(directory))

train_count = count_files(train_images_path)
test_count = count_files(test_images_path)

dataset_summary = pd.DataFrame({
    'Dataset Split': ['Train', 'Test'],
    'Number of Images': [train_count, test_count]
})

dataset_summary

|   | Dataset Split | Number of Images |
|---|---------------|------------------|
| 0 | Train         | 7745             |
| 1 | Test          | 431              |
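The two counts above add up to 8176 images, fewer than the 8606 quoted for the dataset, so the remainder presumably sits in a separate test split. A slightly more defensive count that only considers image files and covers all three folders might look like the sketch below. The helper names, the extension list and the `train`/`valid`/`test` folder layout are assumptions about the export, not something verified here.

```python
import os
from typing import Dict

# Extensions treated as images; adjust if the export uses other formats.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(directory: str) -> int:
    """Count files with an image extension; return 0 if the folder is missing."""
    if not os.path.isdir(directory):
        return 0
    return sum(
        1
        for name in os.listdir(directory)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTS
    )


def split_counts(root: str) -> Dict[str, int]:
    """Image counts for the train/valid/test folders of a YOLO-style export."""
    return {
        split: count_images(os.path.join(root, split, "images"))
        for split in ("train", "valid", "test")
    }
```

Calling `split_counts(dataset.location)` would then report all three splits in one dictionary.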



## 🚀 Training the YOLOv11 Model
Now, let's train the model on our dataset.

In [40]:
from ultralytics import YOLO
model = YOLO('yolo11n.pt')
model.train(
    data=f'{dataset.location}/data.yaml',
    epochs=25,
    imgsz=640,
    project='yolov11-custom-training',
    name='yolov11-trained-model'
)

Ultralytics 8.3.92  Python-3.12.4 torch-2.5.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3080 Ti Laptop GPU, 16384MiB)
engine\trainer: task=detect, mode=train, model=yolo11n.pt, data=C:\Users\Asus\PycharmProjects\TextExtraction\TFT-ID-1/data.yaml, epochs=25, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=yolov11-custom-training, name=yolov11-trained-model, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, s

train: Scanning C:\Users\Asus\PycharmProjects\TextExtraction\TFT-ID-1\train\labels.cache... 7745 images, 26 backgrounds, 0 corrupt
val: Scanning C:\Users\Asus\PycharmProjects\TextExtraction\TFT-ID-1\valid\labels.cache... 431 images, 2 backgrounds, 0 corrupt


Plotting labels to yolov11-custom-training\yolov11-trained-model\labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.001429, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
TensorBoard: model graph visualization added
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to yolov11-custom-training\yolov11-trained-model
Starting training for 25 epochs...

| Epoch | GPU_mem | box_loss | cls_loss | dfl_loss | Instances | Size | Box(P) | R | mAP50 | mAP50-95 |
|-------|---------|----------|----------|----------|-----------|------|--------|-------|-------|----------|
| 1/25  | 2.27G | 0.6353 | 1.47   | 1.044  | 8  | 640 | 0.879 | 0.77  | 0.878 | 0.77  |
| 2/25  | 2.62G | 0.4961 | 0.8991 | 0.9623 | 12 | 640 | 0.842 | 0.86  | 0.905 | 0.783 |
| 3/25  | 2.62G | 0.4577 | 0.8046 | 0.9459 | 3  | 640 | 0.873 | 0.798 | 0.895 | 0.784 |
| 4/25  | 2.62G | 0.4286 | 0.7408 | 0.9336 | 13 | 640 | 0.893 | 0.883 | 0.931 | 0.854 |
| 5/25  | 2.62G | 0.3944 | 0.6961 | 0.9222 | 14 | 640 | 0.863 | 0.809 | 0.908 | 0.815 |
| 6/25  | 2.62G | 0.3781 | 0.6725 | 0.9185 | 6  | 640 | 0.876 | 0.899 | 0.936 | 0.881 |
| 7/25  | 2.62G | 0.3548 | 0.6441 | 0.9087 | 27 | 640 | 0.897 | 0.821 | 0.919 | 0.843 |
| 8/25  | 2.62G | 0.3449 | 0.624  | 0.9077 | 12 | 640 | 0.855 | 0.849 | 0.905 | 0.766 |
| 9/25  | 2.62G | 0.33   | 0.6003 | 0.901  | 5  | 640 | 0.927 | 0.909 | 0.961 | 0.918 |
| 10/25 | 2.62G | 0.3208 | 0.5872 | 0.8993 | 5  | 640 | 0.901 | 0.809 | 0.913 | 0.778 |
| 11/25 | 2.62G | 0.3111 | 0.5777 | 0.8941 | 14 | 640 | 0.93  | 0.921 | 0.961 | 0.913 |
| 12/25 | 2.62G | 0.299  | 0.566  | 0.8892 | 6  | 640 | 0.542 | 0.325 | 0.396 | 0.279 |
| 13/25 | 2.62G | 0.2949 | 0.5576 | 0.8882 | 5  | 640 | 0.907 | 0.89  | 0.948 | 0.893 |
| 14/25 | 2.62G | 0.2855 | 0.5446 | 0.8865 | 5  | 640 | 0.929 | 0.916 | 0.966 | 0.925 |
| 15/25 | 2.62G | 0.2794 | 0.5375 | 0.8813 | 3  | 640 | 0.936 | 0.925 | 0.966 | 0.929 |

Closing dataloader mosaic

| Epoch | GPU_mem | box_loss | cls_loss | dfl_loss | Instances | Size | Box(P) | R | mAP50 | mAP50-95 |
|-------|---------|----------|----------|----------|-----------|------|--------|-------|-------|----------|
| 16/25 | 2.62G | 0.2523 | 0.5074 | 0.846  | 5 | 640 | 0.929 | 0.939 | 0.968 | 0.93  |
| 17/25 | 2.62G | 0.2372 | 0.4852 | 0.837  | 6 | 640 | 0.93  | 0.923 | 0.964 | 0.924 |
| 18/25 | 2.62G | 0.232  | 0.4674 | 0.8354 | 4 | 640 | 0.909 | 0.9   | 0.96  | 0.923 |
| 19/25 | 2.62G | 0.2193 | 0.4514 | 0.8306 | 6 | 640 | 0.947 | 0.939 | 0.971 | 0.932 |
| 20/25 | 2.62G | 0.2128 | 0.4399 | 0.8282 | 3 | 640 | 0.938 | 0.923 | 0.971 | 0.937 |
| 21/25 | 2.62G | 0.2078 | 0.4304 | 0.8253 | 3 | 640 | 0.93  | 0.938 | 0.968 | 0.936 |
| 22/25 | 2.62G | 0.2006 | 0.4183 | 0.8225 | 4 | 640 | 0.949 | 0.919 | 0.971 | 0.938 |
| 23/25 | 2.62G | 0.193  | 0.3989 | 0.8204 | 5 | 640 | 0.923 | 0.939 | 0.974 | 0.942 |
| 24/25 | 2.62G | 0.1873 | 0.3929 | 0.819  | 4 | 640 | 0.958 | 0.942 | 0.977 | 0.95  |
| 25/25 | 2.62G | 0.1816 | 0.3744 | 0.8179 | 3 | 640 | 0.956 | 0.939 | 0.977 | 0.948 |

Each epoch was validated on 431 images with 2084 instances.

25 epochs completed in 0.973 hours.
Optimizer stripped from yolov11-custom-training\yolov11-trained-model\weights\last.pt, 5.5MB
Optimizer stripped from yolov11-custom-training\yolov11-trained-model\weights\best.pt, 5.5MB

Validating yolov11-custom-training\yolov11-trained-model\weights\best.pt...
Ultralytics 8.3.92  Python-3.12.4 torch-2.5.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3080 Ti Laptop GPU, 16384MiB)
YOLO11n summary (fused): 100 layers, 2,582,737 parameters, 0 gradients, 6.3 GFLOPs

| Class  | Images | Instances | Box(P) | R     | mAP50 | mAP50-95 |
|--------|--------|-----------|--------|-------|-------|----------|
| all    | 431    | 2084      | 0.959  | 0.941 | 0.977 | 0.95     |
| figure | 196    | 250       | 0.992  | 0.984 | 0.991 | 0.974    |
| table  | 106    | 149       | 0.935  | 0.94  | 0.968 | 0.958    |
| text   | 422    | 1685      | 0.949  | 0.9   | 0.973 | 0.918    |

Speed: 0.2ms preprocess, 1.9ms inference, 0.0ms loss, 1.8ms postprocess per image
Results saved to yolov11-custom-training\yolov11-trained-model
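Validation metrics fluctuate from epoch to epoch during this run (epoch 12, for example, drops to an mAP50-95 of 0.279), so it can be useful to look up the strongest epoch programmatically. The sketch below assumes the run folder contains a `results.csv` with columns named `epoch` and `metrics/mAP50-95(B)`; I have not checked those names against this particular run, and `best_epoch` is a hypothetical helper. The sample rows are copied from the training log of this notebook.

```python
import csv
import io
from typing import Dict, TextIO, Union


def best_epoch(
    csv_file: TextIO, metric: str = "metrics/mAP50-95(B)"
) -> Dict[str, Union[int, float]]:
    """Return the epoch with the highest value of `metric`."""
    best = None
    for row in csv.DictReader(csv_file):
        # Column names may be padded with spaces in some exports, so strip them.
        clean = {key.strip(): value.strip() for key, value in row.items()}
        value = float(clean[metric])
        if best is None or value > best[metric]:
            best = {"epoch": int(float(clean["epoch"])), metric: value}
    if best is None:
        raise ValueError("no rows found")
    return best


# Four rows taken from the training log, in the assumed CSV layout.
sample = io.StringIO(
    "epoch,metrics/mAP50(B),metrics/mAP50-95(B)\n"
    "11,0.961,0.913\n"
    "12,0.396,0.279\n"
    "24,0.977,0.950\n"
    "25,0.977,0.948\n"
)
result = best_epoch(sample)
print(result)
```

For a real run, the same function could be called with `open(".../results.csv", newline="")` in place of the in-memory sample.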


ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: array([0, 1, 2])
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x000001C91F8AFD40>
curves: ['Precision-Recall(B)', 'F1-Confidence(B)', 'Precision-Confidence(B)', 'Recall-Confidence(B)']

In [42]:
metrics = model.val()
print(metrics)

Ultralytics 8.3.92  Python-3.12.4 torch-2.5.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3080 Ti Laptop GPU, 16384MiB)
YOLO11n summary (fused): 100 layers, 2,582,737 parameters, 0 gradients, 6.3 GFLOPs

val: Scanning C:\Users\Asus\PycharmProjects\TextExtraction\TFT-ID-1\valid\labels.cache... 431 images, 2 backgrounds, 0 corrupt

| Class  | Images | Instances | Box(P) | R     | mAP50 | mAP50-95 |
|--------|--------|-----------|--------|-------|-------|----------|
| all    | 431    | 2084      | 0.957  | 0.942 | 0.977 | 0.95     |
| figure | 196    | 250       | 0.991  | 0.984 | 0.991 | 0.975    |
| table  | 106    | 149       | 0.934  | 0.94  | 0.968 | 0.957    |
| text   | 422    | 1685      | 0.946  | 0.902 | 0.973 | 0.918    |

Speed: 0.5ms preprocess, 4.3ms inference, 0.0ms loss, 1.6ms postprocess per image
Results saved to yolov11-custom-training\yolov11-trained-model2
ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: array([0, 1, 2])
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x000001C95FE46B10>
curves: ['Precision-Recall(B)', 'F1-Confidence(B)', 'Precision-Confidence(B)', 'Recall-Confidence(B)']

## Some of the labeled and predicted images
<img src="yolov11-custom-training/yolov11-trained-model/val_batch1_labels.jpg" width="350" height="200" alt="Validation batch 1 with ground-truth labels">

<img src="yolov11-custom-training/yolov11-trained-model/val_batch1_pred.jpg" width="350" height="200" alt="Validation batch 1 with model predictions">

First image: labeled. Second image: predicted.
Note: text regions are shown in grey boxes.

### 📌 Evaluation Results
Below are the evaluation results of our model training, visualized through various metrics.

- **F1 Score, Precision, and Recall:**
<img src="yolov11-custom-training/yolov11-trained-model/F1_curve.png" width="350" height="200" alt="F1-confidence curve">
<img src="yolov11-custom-training/yolov11-trained-model/P_curve.png" width="350" height="200" alt="Precision-confidence curve">
<img src="yolov11-custom-training/yolov11-trained-model/R_curve.png" width="350" height="200" alt="Recall-confidence curve">

- **Confusion Matrix:**
 <img src="yolov11-custom-training/yolov11-trained-model/confusion_matrix.png" width="350" height="200" alt="Confusion matrix">



### Importing another dataset from Roboflow.

In [43]:

from roboflow import Roboflow
rf = Roboflow(api_key="WQoYV1hNmIvbs7cVK9KZ")
project = rf.workspace("layoutorganisation").project("macro-segmentation-2pwqv")
version = project.version(1)
dataset = version.download("yolov11")


loading Roboflow workspace...
loading Roboflow project...




### Next, we continue the transfer learning with another custom dataset.
The dataset is [macro-segmentation-2pwqv](https://universe.roboflow.com/layoutorganisation/macro-segmentation-2pwqv/dataset/1), a filtered version of the [macro-segmentation](https://universe.roboflow.com/rf-100-vl/macro-segmentation-kaer8-yajkb-blok/browse?queryText=&pageSize=50&startingIndex=0&browseQuery=true) dataset created by the [RF 100 VL](https://universe.roboflow.com/rf-100-vl) user.
Training on it further fine-tunes the model to recognise the formats found in ancient texts.
Some of the images from the dataset are:


<img src="Macro-segmentation-1/test/images/rollin_12148-bpt6k97791900_f20_jpg.rf.bb1cb641b3f1f51fcd101482d34adc27.jpg" width="150" height="100" alt="Ancient text document">
<img src="Macro-segmentation-1/test/images/1881_11_RDA_N070-6_png.rf.72246703af5b18dd26f25abf5f9dce84.jpg" width="150" height="100" alt="Ancient text document">
<img src="Macro-segmentation-1/test/images/1899_02_LAD_N293_gt_0005_JPG.rf.ffd01c5bdf326d8fe996aaeefff6870c.jpg" width="150" height="100" alt="Ancient text document">


In [51]:
import yaml

yaml_path = os.path.join(dataset.location, "data.yaml")
with open(yaml_path, 'r') as f:
    data_yaml = yaml.safe_load(f)

print("Dataset information:")
print(f"Number of classes: {data_yaml.get('nc', 0)}")
print(f"Classes: {data_yaml.get('names', [])}")


Dataset information:
Number of classes: 2
Classes: ['Image', 'Text']
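The first-stage model was validated on three classes (`figure`, `table`, `text`), whereas this dataset has two (`Image`, `Text`). Because the class list changes, the class predictions of the earlier model do not carry over one-to-one. The logs are consistent with this: the fused model reports 2,582,737 parameters in the first stage and 2,582,542 in this one. A quick, case-insensitive comparison of the two class lists can make the mismatch explicit before training. `compare_classes` is a hypothetical helper written for this illustration.

```python
from typing import Dict, List


def compare_classes(previous: List[str], current: List[str]) -> Dict[str, List[str]]:
    """Case-insensitive comparison of two class-name lists."""
    prev = {name.lower() for name in previous}
    curr = {name.lower() for name in current}
    return {
        "shared": sorted(prev & curr),   # present in both stages
        "dropped": sorted(prev - curr),  # only in the earlier stage
        "new": sorted(curr - prev),      # only in the current stage
    }


diff = compare_classes(["figure", "table", "text"], ["Image", "Text"])
print(diff)
```

Only `text` is shared by name; `Image` is arguably close to `figure` in meaning, but that correspondence is a judgment call rather than something the class names encode.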


In [52]:
PRETRAINED_MODEL_PATH = "yolov11-custom-training/yolov11-trained-model/weights/best.pt"

In [53]:
model = YOLO(PRETRAINED_MODEL_PATH)

In [55]:
EPOCHS = 10  # Fewer epochs to reduce the risk of overfitting
BATCH_SIZE = 16
IMAGE_SIZE = 640
WORKERS = 4
PATIENCE = 10  # Early stopping patience (equal to EPOCHS, so it cannot trigger in this run)
SAVE_PERIOD = 10  # Save a checkpoint every 10 epochs

In [56]:
INITIAL_LR = 0.001
FINAL_LR = 0.0001

In [57]:
RESULTS_DIR = "yolov11n_transfer_learning_results"
os.makedirs(RESULTS_DIR, exist_ok=True)

In [58]:
results = model.train(
    data=yaml_path,
    epochs=EPOCHS,
    imgsz=IMAGE_SIZE,
    batch=BATCH_SIZE,
    workers=WORKERS,
    patience=PATIENCE,
    save_period=SAVE_PERIOD,
    lr0=INITIAL_LR,
    lrf=FINAL_LR / INITIAL_LR,  # Final learning rate as a fraction of lr0
    pretrained=True,  # Start from the loaded weights (our custom model)
    optimizer="Adam",  # Optimizer (SGD, Adam, AdamW, etc.)
    project=RESULTS_DIR,
    name="transfer_learning",
    exist_ok=True,
    cache=False  # Do not cache images in memory
)
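Note that `lrf` is a ratio rather than an absolute learning rate, which is why the call passes `FINAL_LR / INITIAL_LR`. As I understand the default linear schedule (the log shows `cos_lr=False`), the learning rate decays from `lr0` to `lr0 * lrf` over the run. The sketch below reproduces that decay while ignoring warmup; `linear_lr_schedule` is a hypothetical helper, not part of the Ultralytics API.

```python
from typing import List


def linear_lr_schedule(lr0: float, lrf: float, epochs: int) -> List[float]:
    """Learning rate at the start of each epoch, plus the final value.

    Linear decay from lr0 to lr0 * lrf; warmup is ignored in this sketch.
    """
    return [
        lr0 * ((1 - epoch / epochs) * (1.0 - lrf) + lrf)
        for epoch in range(epochs + 1)
    ]


INITIAL_LR = 0.001
FINAL_LR = 0.0001
schedule = linear_lr_schedule(INITIAL_LR, FINAL_LR / INITIAL_LR, epochs=10)
print([f"{lr:.5f}" for lr in schedule])
```

The first entry equals `INITIAL_LR` and the last equals `FINAL_LR`, which is the behaviour the two constants are meant to express.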

Ultralytics 8.3.92  Python-3.12.4 torch-2.5.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3080 Ti Laptop GPU, 16384MiB)
engine\trainer: task=detect, mode=train, model=yolov11-custom-training/yolov11-trained-model/weights/best.pt, data=C:\Users\Asus\PycharmProjects\TextExtraction\Macro-segmentation-1\data.yaml, epochs=10, time=None, patience=10, batch=16, imgsz=640, save=True, save_period=10, cache=False, device=None, workers=4, project=yolov11n_transfer_learning_results, name=transfer_learning, exist_ok=True, pretrained=True, optimizer=Adam, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nm

train: Scanning C:\Users\Asus\PycharmProjects\TextExtraction\Macro-segmentation-1\train\labels.cache... 725 images, 0 backgrounds, 0 corrupt
val: Scanning C:\Users\Asus\PycharmProjects\TextExtraction\Macro-segmentation-1\valid\labels.cache... 195 images, 0 backgrounds, 0 corrupt


Plotting labels to yolov11n_transfer_learning_results\transfer_learning\labels.jpg...
optimizer: Adam(lr=0.001, momentum=0.937) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
TensorBoard: model graph visualization added
Image sizes 640 train, 640 val
Using 4 dataloader workers
Logging results to yolov11n_transfer_learning_results\transfer_learning
Starting training for 10 epochs...
Closing dataloader mosaic

| Epoch | GPU_mem | box_loss | cls_loss | dfl_loss | Instances | Size | Box(P) | R | mAP50 | mAP50-95 |
|-------|---------|----------|----------|----------|-----------|------|--------|-------|-------|----------|
| 1/10  | 2.58G | 1.614 | 2.11   | 1.364  | 105 | 640 | 0.211 | 0.233 | 0.187 | 0.0928 |
| 2/10  | 3G    | 1.366 | 1.373  | 1.147  | 130 | 640 | 0.325 | 0.32  | 0.349 | 0.212  |
| 3/10  | 3G    | 1.24  | 1.23   | 1.082  | 38  | 640 | 0.411 | 0.416 | 0.413 | 0.277  |
| 4/10  | 3G    | 1.187 | 1.131  | 1.063  | 74  | 640 | 0.514 | 0.446 | 0.428 | 0.278  |
| 5/10  | 3G    | 1.15  | 1.081  | 1.037  | 123 | 640 | 0.517 | 0.463 | 0.458 | 0.31   |
| 6/10  | 3G    | 1.13  | 1.022  | 1.013  | 87  | 640 | 0.481 | 0.461 | 0.465 | 0.318  |
| 7/10  | 3G    | 1.113 | 0.9838 | 1.008  | 77  | 640 | 0.501 | 0.488 | 0.464 | 0.315  |
| 8/10  | 3G    | 1.079 | 0.948  | 1.008  | 114 | 640 | 0.478 | 0.525 | 0.497 | 0.351  |
| 9/10  | 3G    | 1.069 | 0.9291 | 0.995  | 59  | 640 | 0.568 | 0.477 | 0.512 | 0.363  |
| 10/10 | 3G    | 1.041 | 0.8931 | 0.9869 | 71  | 640 | 0.558 | 0.491 | 0.514 | 0.365  |

Each epoch was validated on 195 images with 3175 instances.

10 epochs completed in 0.090 hours.
Optimizer stripped from yolov11n_transfer_learning_results\transfer_learning\weights\last.pt, 5.5MB
Optimizer stripped from yolov11n_transfer_learning_results\transfer_learning\weights\best.pt, 5.5MB

Validating yolov11n_transfer_learning_results\transfer_learning\weights\best.pt...
Ultralytics 8.3.92  Python-3.12.4 torch-2.5.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3080 Ti Laptop GPU, 16384MiB)
YOLO11n summary (fused): 100 layers, 2,582,542 parameters, 0 gradients, 6.3 GFLOPs

| Class | Images | Instances | Box(P) | R     | mAP50 | mAP50-95 |
|-------|--------|-----------|--------|-------|-------|----------|
| all   | 195    | 3175      | 0.561  | 0.489 | 0.514 | 0.365    |
| Image | 17     | 67        | 0.284  | 0.209 | 0.153 | 0.107    |
| Text  | 195    | 3108      | 0.838  | 0.768 | 0.875 | 0.624    |

Speed: 0.2ms preprocess, 3.3ms inference, 0.0ms loss, 2.2ms postprocess per image
Results saved to yolov11n_transfer_learning_results\transfer_learning
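The `all` row is an unweighted mean over classes, so the `Image` class, with only 67 instances, pulls the overall mAP50 down to 0.514 even though `Text` reaches 0.875. The sketch below recomputes that mean from the per-class values above and, purely for comparison, shows an instance-weighted mean. The weighted figure is not a metric the validator reports; it is only my illustration of how much the rare class influences the headline number.

```python
from typing import Dict


def macro_average(per_class: Dict[str, float]) -> float:
    """Unweighted mean over classes, so every class counts equally."""
    return sum(per_class.values()) / len(per_class)


def instance_weighted(per_class: Dict[str, float], instances: Dict[str, int]) -> float:
    """Mean weighted by instance count, shown only for comparison."""
    total = sum(instances.values())
    return sum(per_class[name] * instances[name] for name in per_class) / total


# Per-class mAP50 and instance counts from the validation output above.
map50 = {"Image": 0.153, "Text": 0.875}
instances = {"Image": 67, "Text": 3108}

macro = macro_average(map50)
weighted = instance_weighted(map50, instances)
print(f"macro mAP50:    {macro:.3f}")
print(f"weighted mAP50: {weighted:.3f}")
```

Reading the two numbers side by side suggests that the low overall score mostly reflects the `Image` class rather than a general failure on text regions.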


## Some of the labeled and predicted images
<img src="yolov11n_transfer_learning_results/transfer_learning/val_batch0_labels.jpg" width="350" height="200" alt="Validation batch 0 with ground-truth labels">

<img src="yolov11n_transfer_learning_results/transfer_learning/val_batch0_pred.jpg" width="350" height="200" alt="Validation batch 0 with model predictions">

First image: labeled. Second image: predicted.


### 📌 Evaluation Results
Below are the evaluation results of our model training, visualized through various metrics.

- **F1 Score, Precision, and Recall:**
<img src="yolov11n_transfer_learning_results/transfer_learning/F1_curve.png" width="350" height="200" alt="F1-confidence curve">
<img src="yolov11n_transfer_learning_results/transfer_learning/P_curve.png" width="350" height="200" alt="Precision-confidence curve">
<img src="yolov11n_transfer_learning_results/transfer_learning/R_curve.png" width="350" height="200" alt="Recall-confidence curve">

- **Confusion Matrix:**
 <img src="yolov11n_transfer_learning_results/transfer_learning/confusion_matrix.png" width="350" height="200" alt="Confusion matrix">


### Next, we fine-tune the model on the provided dataset.
I have taken the liberty of annotating the provided dataset and adding a few additional images inspired by the GERMANA database. The data was then augmented to improve generalization. Feel free to use this dataset.

The dataset is available at [Layout_organistion](https://universe.roboflow.com/layoutorganisation/layout_organistion/dataset/4); I created it myself.
Training on it further fine-tunes the model to recognise the provided documents.
Some of the images from the dataset are:


<img src="Macro-segmentation-1/test/images/rollin_12148-bpt6k97791900_f20_jpg.rf.bb1cb641b3f1f51fcd101482d34adc27.jpg" width="150" height="100" alt="Ancient text document">
<img src="Macro-segmentation-1/test/images/1881_11_RDA_N070-6_png.rf.72246703af5b18dd26f25abf5f9dce84.jpg" width="150" height="100" alt="Ancient text document">
<img src="Macro-segmentation-1/test/images/1899_02_LAD_N293_gt_0005_JPG.rf.ffd01c5bdf326d8fe996aaeefff6870c.jpg" width="150" height="100" alt="Ancient text document">

In [59]:
from roboflow import Roboflow
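# Read the API key from the environment instead of hard-coding it
# (assumes it has been exported as ROBOFLOW_API_KEY).
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])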
project = rf.workspace("layoutorganisation").project("layout_organistion")
version = project.version(4)
dataset = version.download("yolov11")


loading Roboflow workspace...
loading Roboflow project...


Some of the images from the dataset are:


<img src="Layout_organistion-4/test/images/Buendia-Instruccion_pdf_page_4_png.rf.06c1845d87a3012e1498a23e78ebd43e.jpg" width="150" height="100" alt="Ancient text document">
<img src="Layout_organistion-4/train/images/PORCONES_228_35-1636_pdf_page_8_png.rf.c26676035d3c5fe53d61ad3038cfdfa0.jpg" width="150" height="100" alt="Ancient text document">
<img src="Layout_organistion-4/train/images/Screenshot-2025-03-18-195342_png.rf.aba8dd804a296b1ee2aec1b1131f3875.jpg" width="150" height="100" alt="Ancient text document">

In [60]:
import yaml

yaml_path = os.path.join(dataset.location, "data.yaml")
with open(yaml_path, 'r') as f:
    data_yaml = yaml.safe_load(f)

print("Dataset information:")
print(f"Number of classes: {data_yaml.get('nc', 0)}")
print(f"Classes: {data_yaml.get('names', [])}")



Dataset information:
Number of classes: 2
Classes: ['Image', 'text']


In [61]:
PRETRAINED_MODEL_PATH = "yolov11-custom-training/yolov11-trained-model/weights/best.pt"

In [63]:
model = YOLO(PRETRAINED_MODEL_PATH)

In [64]:
EPOCHS = 20
BATCH_SIZE = 16
IMAGE_SIZE = 640
WORKERS = 4
PATIENCE = 10  # Early stopping patience
SAVE_PERIOD = 10
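
To make the `PATIENCE` setting concrete, here is a minimal sketch of patience-based early stopping. It illustrates the general idea only and is not Ultralytics' internal implementation; the `EarlyStopper` class and the scores in the demo are hypothetical.

```python
class EarlyStopper:
    """Hypothetical helper: stop when the score has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = 0

    def should_stop(self, epoch: int, score: float) -> bool:
        # Record a new best whenever the score improves.
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
        # Stop once `patience` epochs have passed since the last improvement.
        return (epoch - self.best_epoch) >= self.patience


stopper = EarlyStopper(patience=10)
# Validation scores for illustration only (made-up numbers).
scores = [0.10, 0.20, 0.25] + [0.24] * 12
stopped_at = None
for epoch, score in enumerate(scores, start=1):
    if stopper.should_stop(epoch, score):
        stopped_at = epoch
        break
print(f"best epoch: {stopper.best_epoch}, stopped at epoch: {stopped_at}")
```

Under this rule, with `EPOCHS = 20` and `PATIENCE = 10`, training can only end early if the best epoch falls at or before epoch 10.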

In [66]:
INITIAL_LR = 0.001
FINAL_LR = 0.0001
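
The training call later in this notebook passes `lrf` as a fraction of `lr0`, which is why it receives the ratio `FINAL_LR / INITIAL_LR` rather than `FINAL_LR` itself. The sketch below works through that arithmetic with a linear decay from `lr0` to `lr0 * lrf`. The linear form is an assumption for illustration (it corresponds to `cos_lr=False` and ignores warmup); `linear_lr` and `LR_FACTOR` are hypothetical names.

```python
INITIAL_LR = 0.001
FINAL_LR = 0.0001
EPOCHS = 20

# lrf is a multiplier on lr0, not an absolute learning rate.
LR_FACTOR = FINAL_LR / INITIAL_LR


def linear_lr(epoch: int, epochs: int = EPOCHS, lr0: float = INITIAL_LR,
              lrf: float = LR_FACTOR) -> float:
    """Hypothetical helper: linear decay from lr0 (epoch 0) to lr0 * lrf (epoch == epochs)."""
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


for epoch in (0, 10, 20):
    print(f"epoch {epoch:2d}: lr = {linear_lr(epoch):.6f}")
```

Passing `FINAL_LR` directly as `lrf` would instead give a final rate of `0.001 * 0.0001`, far smaller than intended.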

In [67]:
RESULTS_DIR = "yolov11n_transfer_learning_resultsFinal"
os.makedirs(RESULTS_DIR, exist_ok=True)

In [68]:
results = model.train(
    data=yaml_path,
    epochs=EPOCHS,
    imgsz=IMAGE_SIZE,
    batch=BATCH_SIZE,
    workers=WORKERS,
    patience=PATIENCE,
    save_period=SAVE_PERIOD,
    lr0=INITIAL_LR,
    lrf=FINAL_LR/INITIAL_LR,  # Final learning rate factor
    pretrained=True,  # Use pretrained weights (our custom model)
    optimizer="Adam",  # Optimizer (SGD, Adam, AdamW, etc.)
    project=RESULTS_DIR,
    name="transfer_learning",
    exist_ok=True,
    cache=False  # Do not cache images (enabling the cache can speed up training)
)

Ultralytics 8.3.92  Python-3.12.4 torch-2.5.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3080 Ti Laptop GPU, 16384MiB)
[34m[1mengine\trainer: [0mtask=detect, mode=train, model=yolov11-custom-training/yolov11-trained-model/weights/best.pt, data=C:\Users\Asus\PycharmProjects\TextExtraction\Layout_organistion-4\data.yaml, epochs=20, time=None, patience=10, batch=16, imgsz=640, save=True, save_period=10, cache=False, device=None, workers=4, project=yolov11n_transfer_learning_resultsFinal, name=transfer_learning, exist_ok=True, pretrained=True, optimizer=Adam, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnost

[34m[1mtrain: [0mScanning C:\Users\Asus\PycharmProjects\TextExtraction\Layout_organistion-4\train\labels.cache... 150 images, 0 backgrounds, 0 corrupt: 100%|██████████| 150/150 [00:00<?, ?it/s]
[34m[1mval: [0mScanning C:\Users\Asus\PycharmProjects\TextExtraction\Layout_organistion-4\valid\labels.cache... 16 images, 0 backgrounds, 0 corrupt: 100%|██████████| 16/16 [00:00<?, ?it/s]


Plotting labels to yolov11n_transfer_learning_resultsFinal\transfer_learning\labels.jpg... 
[34m[1moptimizer:[0m Adam(lr=0.001, momentum=0.937) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
[34m[1mTensorBoard: [0mmodel graph visualization added 
Image sizes 640 train, 640 val
Using 4 dataloader workers
Logging results to [1myolov11n_transfer_learning_resultsFinal\transfer_learning[0m
Starting training for 20 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       1/20      2.34G       1.35      2.603      1.636         34        640: 100%|██████████| 10/10 [00:02<00:00,  4.39it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.03it/s]

                   all         16         62      0.662     0.0833      0.069     0.0174






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       2/20      2.35G      1.107      1.677      1.323         86        640: 100%|██████████| 10/10 [00:01<00:00,  5.02it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  5.50it/s]

                   all         16         62      0.111     0.0583     0.0349    0.00991






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       3/20      2.35G      1.043      1.404      1.259         60        640: 100%|██████████| 10/10 [00:02<00:00,  4.98it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.02it/s]

                   all         16         62      0.855      0.167      0.204     0.0983






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       4/20      2.35G     0.9968      1.256       1.22         32        640: 100%|██████████| 10/10 [00:02<00:00,  4.89it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.62it/s]

                   all         16         62      0.846      0.233      0.256      0.135






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       5/20      2.35G      1.014      1.145      1.211         58        640: 100%|██████████| 10/10 [00:02<00:00,  4.95it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  5.76it/s]

                   all         16         62      0.838      0.217      0.285      0.126






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       6/20      2.35G     0.9796      1.072      1.198         61        640: 100%|██████████| 10/10 [00:01<00:00,  5.17it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.63it/s]

                   all         16         62      0.753      0.192      0.527      0.237






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       7/20      2.35G     0.9308      1.034      1.183         47        640: 100%|██████████| 10/10 [00:01<00:00,  5.40it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.20it/s]

                   all         16         62      0.702       0.15      0.563       0.33






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       8/20      2.35G     0.9465       1.06      1.159         51        640: 100%|██████████| 10/10 [00:01<00:00,  5.43it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.22it/s]


                   all         16         62       0.66      0.267      0.541      0.314

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       9/20      2.35G     0.9388     0.9787      1.163         63        640: 100%|██████████| 10/10 [00:01<00:00,  5.49it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  5.86it/s]

                   all         16         62      0.606      0.558      0.707      0.426






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      10/20      2.35G     0.9031     0.9609      1.157         52        640: 100%|██████████| 10/10 [00:01<00:00,  5.20it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.35it/s]

                   all         16         62      0.747      0.517      0.674      0.406





Closing dataloader mosaic

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      11/20      2.44G     0.9342      1.176      1.217         20        640: 100%|██████████| 10/10 [00:02<00:00,  4.58it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  5.93it/s]

                   all         16         62       0.73      0.489      0.675      0.403






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      12/20      2.44G     0.8674       1.09      1.147         33        640: 100%|██████████| 10/10 [00:01<00:00,  5.18it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.59it/s]

                   all         16         62      0.501      0.758      0.638      0.385






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      13/20      2.44G     0.9122       1.02      1.162         33        640: 100%|██████████| 10/10 [00:01<00:00,  5.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  7.03it/s]

                   all         16         62       0.55      0.725       0.62      0.377






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      14/20      2.44G     0.8915     0.9667      1.146         12        640: 100%|██████████| 10/10 [00:01<00:00,  5.49it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  5.77it/s]


                   all         16         62      0.629      0.722      0.694      0.429

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      15/20      2.44G     0.8478     0.9453      1.135         24        640: 100%|██████████| 10/10 [00:01<00:00,  5.69it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.63it/s]


                   all         16         62      0.582      0.735      0.707      0.442

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      16/20      2.44G     0.8626     0.9985      1.169         11        640: 100%|██████████| 10/10 [00:01<00:00,  5.72it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.64it/s]

                   all         16         62      0.624      0.746      0.717      0.448






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      17/20      2.44G     0.8443     0.8801      1.154         18        640: 100%|██████████| 10/10 [00:01<00:00,  5.79it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  7.13it/s]

                   all         16         62      0.627      0.767      0.813      0.521






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      18/20      2.44G     0.8531      0.907      1.153         35        640: 100%|██████████| 10/10 [00:01<00:00,  5.46it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  5.75it/s]

                   all         16         62        0.6      0.758      0.798      0.475






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      19/20      2.44G     0.7772     0.8322      1.122         12        640: 100%|██████████| 10/10 [00:01<00:00,  5.79it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  7.16it/s]


                   all         16         62      0.667      0.678      0.793      0.512

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      20/20      2.44G     0.8427     0.8444      1.115         28        640: 100%|██████████| 10/10 [00:01<00:00,  5.54it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  6.80it/s]

                   all         16         62      0.666      0.711      0.793      0.515






20 epochs completed in 0.025 hours.
Optimizer stripped from yolov11n_transfer_learning_resultsFinal\transfer_learning\weights\last.pt, 5.5MB
Optimizer stripped from yolov11n_transfer_learning_resultsFinal\transfer_learning\weights\best.pt, 5.5MB

Validating yolov11n_transfer_learning_resultsFinal\transfer_learning\weights\best.pt...
Ultralytics 8.3.92  Python-3.12.4 torch-2.5.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3080 Ti Laptop GPU, 16384MiB)
YOLO11n summary (fused): 100 layers, 2,582,542 parameters, 0 gradients, 6.3 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  5.92it/s]


                   all         16         62      0.638      0.767      0.814      0.521
                 Image          2          2      0.389          1      0.995      0.746
                  text         16         60      0.888      0.533      0.632      0.297
Speed: 0.2ms preprocess, 1.7ms inference, 0.0ms loss, 1.9ms postprocess per image
Results saved to [1myolov11n_transfer_learning_resultsFinal\transfer_learning[0m


## Some of the labeled and the predicted images
<img src="yolov11n_transfer_learning_resultsFinal/transfer_learning/val_batch0_labels.jpg" width="450" height="200" alt="Ancient text document">

<img src="yolov11n_transfer_learning_resultsFinal/transfer_learning/val_batch0_pred.jpg" width="450" height="200" alt="Ancient text document">

Labeled Image__________________________________Predicted Image

### 📌 Evaluation Results
Below are the evaluation results of our model training, visualized through various metrics.

- **F1 Score, Precision, and Recall:**
<img src="yolov11n_transfer_learning_resultsFinal/transfer_learning/F1_curve.png" width="350" height="200" alt="Ancient text document">
<img src="yolov11n_transfer_learning_resultsFinal/transfer_learning/P_curve.png" width="350" height="200" alt="Ancient text document">
<img src="yolov11n_transfer_learning_resultsFinal/transfer_learning/R_curve.png" width="350" height="200" alt="Ancient text document">

- **Labels and Labels Correlogram:**
<img src="yolov11n_transfer_learning_resultsFinal/transfer_learning/labels.jpg" width="350" height="200" alt="Ancient text document">
<img src="yolov11n_transfer_learning_resultsFinal/transfer_learning/labels_correlogram.jpg" width="350" height="200" alt="Ancient text document">

- **Confusion Matrix:**
 <img src="yolov11n_transfer_learning_resultsFinal/transfer_learning/confusion_matrix.png" width="350" height="200" alt="Ancient text document">


### Observations from the Data Distribution and Confusion Matrix
* Note: The "Image" class refers to the highlighted initial character at the start of a page. Since this character cannot be recognised as text, it is easier to extract it as an image.
#### Class Distribution:

- There is a significant imbalance between classes: the "text" class has many more instances than the "Image" class (60 versus 2 in the validation split).
- This imbalance could be affecting model performance.
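
One way to quantify the imbalance is to count instances per class directly from the YOLO label files. This is a minimal sketch, assuming the standard layout in which each line of a `.txt` label file starts with the class index. `count_class_instances` is a hypothetical helper, and the demo writes two small throwaway label files rather than reading the real dataset.

```python
import os
import tempfile
from collections import Counter


def count_class_instances(labels_dir, class_names):
    """Hypothetical helper: count labelled instances per class in a YOLO labels folder."""
    counts = Counter()
    for file_name in os.listdir(labels_dir):
        if not file_name.endswith(".txt"):
            continue
        with open(os.path.join(labels_dir, file_name), "r") as f:
            for line in f:
                parts = line.split()
                if parts:  # skip empty lines
                    counts[int(parts[0])] += 1
    return {class_names[i]: counts.get(i, 0) for i in range(len(class_names))}


# Demo on throwaway label files (made-up boxes, not the real dataset).
with tempfile.TemporaryDirectory() as labels_dir:
    with open(os.path.join(labels_dir, "page_1.txt"), "w") as f:
        f.write("1 0.5 0.5 0.8 0.2\n1 0.5 0.8 0.8 0.2\n0 0.1 0.1 0.1 0.1\n")
    with open(os.path.join(labels_dir, "page_2.txt"), "w") as f:
        f.write("1 0.5 0.5 0.9 0.9\n")
    demo_counts = count_class_instances(labels_dir, ["Image", "text"])

print(demo_counts)
```

On the real data, the same helper could be pointed at `os.path.join(dataset.location, "train", "labels")` with the class list read from `data.yaml`, assuming `names` is stored as a list.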


#### Spatial Distribution:

- Several scatter plots show the spatial distribution of the annotated elements.
- Text elements appear to vary more in position and size.


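#### Confusion Matrix:

- The model performs best at identifying text regions (high true positives).
- There is some confusion between classes, particularly background being misclassified as text.
- Text detection is more precise than image detection (precision 0.888 versus 0.389 on the validation split), although text recall is lower (0.533 versus 1.0) and the "Image" figures rest on only 2 instances.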
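
The per-class precision and recall printed in the validation table above can be folded into a single F1 score, which makes the two classes easier to compare. The values below are copied from that table; `f1_from_pr` is a small hypothetical helper.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the final validation table (best.pt, 16 images).
per_class = {
    "Image": (0.389, 1.0),
    "text": (0.888, 0.533),
}

f1_scores = {name: f1_from_pr(p, r) for name, (p, r) in per_class.items()}
for name, f1 in f1_scores.items():
    print(f"{name}: F1 = {f1:.3f}")
```

Because the "Image" row is based on only 2 validation instances, its F1 should be read as a rough indication rather than a stable estimate.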

In [71]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import seaborn as sns

In [78]:
def calculate_model_accuracy(model_path, dataset_path, conf_threshold=0.25, iou_threshold=0.5):
    """
    Calculate model accuracy on a Roboflow dataset

    Args:
        model_path (str): Path to the trained YOLO model (.pt file)
        dataset_path (str): Path to the dataset directory (containing test folder)
        conf_threshold (float): Confidence threshold for predictions
        iou_threshold (float): IoU threshold for predictions

    Returns:
        float: Accuracy of the model
        dict: Detailed metrics
    """
    # Load the model
    model = YOLO(model_path)

    # Get test images path
    test_images_path = os.path.join(dataset_path, 'test', 'images')

    # Get list of test images
    test_images = [os.path.join(test_images_path, img) for img in os.listdir(test_images_path)
                  if img.endswith(('.jpg', '.jpeg', '.png'))]

    # Prepare lists to store ground truth and predictions
    y_true_classes = []
    y_pred_classes = []
    y_true_segments = []
    y_pred_segments = []

    # Get class names - properly parse YAML
    yaml_path = os.path.join(dataset_path, 'data.yaml')
    try:
        with open(yaml_path, 'r') as f:
            yaml_data = yaml.safe_load(f)
            class_names = yaml_data.get('names', {})
            if isinstance(class_names, dict):
                # Convert dict format to list format if needed
                max_key = max(int(k) for k in class_names.keys()) if class_names else -1
                class_names_list = ["" for _ in range(max_key + 1)]
                for k, v in class_names.items():
                    class_names_list[int(k)] = v
                class_names = class_names_list
    except Exception as e:
        print(f"Error loading class names from YAML: {e}")
        # Fallback to basic class names
        class_names = ["class_0", "class_1", "class_2"]  # Default names

    print(f"Found class names: {class_names}")

    # Process each test image
    for img_path in test_images:
        # Get corresponding label path
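        # Build the label path from the file name so that other occurrences of
        # "images" or the extension elsewhere in the path are left untouched
        label_name = os.path.splitext(os.path.basename(img_path))[0] + '.txt'
        label_path = os.path.join(dataset_path, 'test', 'labels', label_name)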

        if not os.path.exists(label_path):
            print(f"Warning: No label file found for {img_path}")
            continue

        # Load image
        img = cv2.imread(img_path)
        if img is None:
            print(f"Warning: Could not load image {img_path}")
            continue

        img_height, img_width = img.shape[:2]

        # Get ground truth
        with open(label_path, 'r') as f:
            gt_lines = f.readlines()

        gt_classes = []
        gt_segments = []

        for line in gt_lines:
            data = line.strip().split()
            if not data:  # Skip empty lines
                continue

            try:
                class_id = int(data[0])
                gt_classes.append(class_id)

                # Extract normalized bounding box (for segmentation tasks, adapt as needed)
                if len(data) > 5:  # For segmentation, there will be more than 5 values
                    # Handle segmentation data - this is simplified
                    segment_coords = []
                    for i in range(1, len(data), 2):
                        if i+1 < len(data):
                            # Convert normalized coordinates to pixel coordinates
                            x = float(data[i]) * img_width
                            y = float(data[i+1]) * img_height
                            segment_coords.append((x, y))
                    gt_segments.append((class_id, segment_coords))
            except (ValueError, IndexError) as e:
                print(f"Warning: Error parsing line in {label_path}: {line}, Error: {e}")
                continue

        # Get predictions
        try:
            results = model(img_path, conf=conf_threshold, iou=iou_threshold, verbose=False)[0]

            pred_classes = []
            pred_segments = []

            if results.boxes is not None:
                for idx, box in enumerate(results.boxes):
                    pred_class = int(box.cls.item())
                    pred_classes.append(pred_class)

                    # For segmentation, get the mask if available
                    if hasattr(results, 'masks') and results.masks is not None:
                        # Masks are aligned with boxes, so index by detection position
                        try:
                            mask = results.masks[idx].xy
                            pred_segments.append((pred_class, mask))
                        except (IndexError, AttributeError) as e:
                            print(f"Warning: Error extracting mask: {e}")

            # Add classes to global lists
            y_true_classes.extend(gt_classes)
            y_pred_classes.extend(pred_classes)

            # Add segments to global lists
            y_true_segments.extend(gt_segments)
            y_pred_segments.extend(pred_segments)

        except Exception as e:
            print(f"Error processing image {img_path}: {e}")
            continue

    # If we don't have any predictions or ground truth, return zero accuracy
    if not y_true_classes or not y_pred_classes:
        print("Warning: No valid predictions or ground truth data found")
        return 0.0, {"accuracy": 0.0, "confusion_matrix": None, "classification_report": None}

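    # Calculate accuracy (class-level)
    # Approach 1: truncate both lists to the same length and compare them position by position.
    # Caveat: predictions are not matched to ground-truth boxes by IoU, so this is a rough
    # label-agreement score rather than a true detection accuracy.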
    min_len = min(len(y_true_classes), len(y_pred_classes))
    if min_len == 0:
        print("Warning: No overlapping predictions and ground truth")
        return 0.0, {"accuracy": 0.0, "confusion_matrix": None, "classification_report": None}

    y_true_sample = y_true_classes[:min_len]
    y_pred_sample = y_pred_classes[:min_len]

    try:
        accuracy = accuracy_score(y_true_sample, y_pred_sample)
        conf_mat = confusion_matrix(y_true_sample, y_pred_sample,
                                   labels=list(range(len(class_names))))
        # Try to use class_names, but fallback to indices if there's an issue
        try:
            report = classification_report(y_true_sample, y_pred_sample,
                                         target_names=class_names)
        except Exception:
            print("Warning: Could not use class_names for report, using indices instead")
            report = classification_report(y_true_sample, y_pred_sample)
    except Exception as e:
        print(f"Error calculating metrics: {e}")
        return 0.0, {"accuracy": 0.0, "confusion_matrix": None, "classification_report": "Error"}

    # Plot confusion matrix
    try:
        plt.figure(figsize=(10, 8))
        sns.heatmap(conf_mat, annot=True, fmt='d', cmap='Blues',
                   xticklabels=class_names[:len(conf_mat)],
                   yticklabels=class_names[:len(conf_mat)])
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.title('Confusion Matrix')
        plt.savefig('confusion_matrix.png')
        plt.close()
    except Exception as e:
        print(f"Warning: Could not plot confusion matrix: {e}")

    detailed_metrics = {
        'accuracy': accuracy,
        'confusion_matrix': conf_mat,
        'classification_report': report
    }

    # Return the calculated accuracy and detailed metrics
    return accuracy, detailed_metrics
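
Because the function above pairs predictions and labels by list position, a stricter alternative is to match each predicted box to a ground-truth box by IoU before comparing classes. The following is a minimal sketch of greedy IoU matching on `(x1, y1, x2, y2)` boxes. The function names and the toy boxes are hypothetical, predictions are taken in the order given rather than sorted by confidence, and this is not how Ultralytics computes its own metrics.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_by_iou(gt, preds, iou_threshold=0.5):
    """Greedily pair each prediction with the best unmatched ground-truth box.

    gt and preds are lists of (class_id, box). Returns (true_class, pred_class)
    pairs for matches with IoU >= iou_threshold, plus counts of unmatched boxes.
    """
    unmatched_gt = set(range(len(gt)))
    pairs, false_positives = [], 0
    for pred_class, pred_box in preds:
        best_iou, best_idx = 0.0, None
        for idx in unmatched_gt:
            iou = box_iou(gt[idx][1], pred_box)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            pairs.append((gt[best_idx][0], pred_class))
            unmatched_gt.remove(best_idx)
        else:
            false_positives += 1
    return pairs, false_positives, len(unmatched_gt)


# Toy example with made-up boxes: class 0 = Image, class 1 = text.
gt = [(1, (0, 0, 100, 50)), (0, (0, 60, 40, 100))]
preds = [(1, (2, 1, 98, 52)), (1, (200, 200, 250, 250))]
pairs, false_positives, missed = match_by_iou(gt, preds)
print(pairs, false_positives, missed)
```

Only the matched `pairs` would then be passed to `accuracy_score`, while unmatched predictions and missed ground-truth boxes are reported separately instead of being silently truncated.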

In [79]:
def validate_with_yolo(model_path, data_yaml_path):
    """
    Use Ultralytics' built-in validation metrics

    Args:
        model_path (str): Path to the trained YOLO model (.pt file)
        data_yaml_path (str): Path to the data.yaml file

    Returns:
        dict: Validation metrics
    """
    try:
        model = YOLO(model_path)
        metrics = model.val(data=data_yaml_path)
        return metrics
    except Exception as e:
        print(f"Error during YOLO validation: {e}")
        return None

In [81]:
if __name__ == "__main__":
    model_path = "yolov11n_transfer_learning_resultsFinal/transfer_learning/weights/best.pt"
    dataset_path = dataset.location  # Update this with the actual path
    data_yaml_path = f"{dataset_path}/data.yaml"

    # Calculate accuracy using our function
    accuracy, metrics = calculate_model_accuracy(model_path, dataset_path)
    print(f"Model accuracy: {accuracy:.4f}")
    print("Classification Report:")
    print(metrics['classification_report'])

    # Alternatively, use the built-in Ultralytics validation (runs on the validation split)
    print("\nRunning YOLOv8 validation...")
    yolo_metrics = validate_with_yolo(model_path, data_yaml_path)

    if yolo_metrics is not None:
        print(f"mAP50: {yolo_metrics.box.map50:.4f}")
        print(f"mAP50-95: {yolo_metrics.box.map:.4f}")

        # For segmentation task, use the mask metrics
        if hasattr(yolo_metrics, 'seg'):
            print(f"Segmentation mAP50: {yolo_metrics.seg.map50:.4f}")
            print(f"Segmentation mAP50-95: {yolo_metrics.seg.map:.4f}")
    else:
        print("YOLO validation failed")

Found class names: ['Image', 'text']
Model accuracy: 0.8784
Classification Report:
              precision    recall  f1-score   support

       Image       0.45      0.62      0.53         8
        text       0.95      0.91      0.93        66

    accuracy                           0.88        74
   macro avg       0.70      0.77      0.73        74
weighted avg       0.90      0.88      0.89        74


Running YOLOv8 validation...
Ultralytics 8.3.92  Python-3.12.4 torch-2.5.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3080 Ti Laptop GPU, 16384MiB)
YOLO11n summary (fused): 100 layers, 2,582,542 parameters, 0 gradients, 6.3 GFLOPs


[34m[1mval: [0mScanning C:\Users\Asus\PycharmProjects\TextExtraction\Layout_organistion-4\valid\labels.cache... 16 images, 0 backgrounds, 0 corrupt: 100%|██████████| 16/16 [00:00<?, ?it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:08<00:00,  8.10s/it]


                   all         16         62      0.626      0.764      0.813      0.521
                 Image          2          2       0.39          1      0.995      0.746
                  text         16         60      0.863      0.527      0.631      0.295
Speed: 2.3ms preprocess, 21.7ms inference, 0.0ms loss, 1.4ms postprocess per image
Results saved to [1mruns\detect\val9[0m
mAP50: 0.8128
mAP50-95: 0.5205


## Conclusion
The label-level accuracy on the test split is good (0.88), but overfitting is a real risk because the provided dataset is small (150 training and 16 validation images). The model went through transfer learning on two custom datasets to make it more general, yet that risk remains; enlarging the dataset is the most direct way to reduce it.

For practical applications in ancient text document processing, the current model localizes text regions precisely (text precision 0.86, recall 0.53 on the validation split), but it would benefit from targeted improvements in image detection and from an expanded training dataset to generalize across diverse document layouts. Despite these limitations, its text detection makes it useful right away for text extraction and analysis in historical document processing workflows.