In [None]:
# Mount Google drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [17]:
# Install Tesseract OCR, its Python wrapper pytesseract, python-Levenshtein, Pillow, OpenCV, PyTorch, and Ultralytics
!sudo apt install -y tesseract-ocr
!pip install pytesseract
!pip install python-Levenshtein
!pip install pillow
!pip install opencv-python
!pip install torch torchvision ultralytics

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
tesseract-ocr is already the newest version (4.1.1-2.1build1).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.
Collecting ultralytics
  Downloading ultralytics-8.2.98-py3-none-any.whl.metadata (39 kB)
Collecting ultralytics-thop>=2.0.0 (from ultralytics)
  Downloading ultralytics_thop-2.0.6-py3-none-any.whl.metadata (9.1 kB)
Downloading ultralytics-8.2.98-py3-none-any.whl (873 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 873.6/873.6 kB 15.9 MB/s eta 0:00:00
Downloading ultralytics_thop-2.0.6-py3-none-any.whl (26 kB)
Installing collected packages: ultralytics-thop, ultralytics
Successfully installed ultralytics-8.2.98 ultralytics-thop-2.0.6


In [195]:
# Import necessary libraries and packages
import os
import re
import shutil
import random
import pandas as pd
import numpy as np

import albumentations as A
import torch
import cv2
import matplotlib.pyplot as plt
import pytesseract
import Levenshtein

from google.colab.patches import cv2_imshow
from PIL import Image
from ultralytics import YOLO
from albumentations.pytorch import ToTensorV2

%matplotlib inline

# Import the invoice scanner script from Google Drive
!cp '/content/drive/My Drive/YOLO_OCR_InvoiceScanner/invoice_scanner.py' .

from invoice_scanner import process_invoice

# Check if GPU is available for YOLO
print('GPU available: ', torch.cuda.is_available())

GPU available:  True


In [25]:
# Verify the CUDA installation
if torch.cuda.is_available():
  print('CUDA is available!')

  # Verify CUDA version
  !nvcc --version
else:
  print('CUDA is not available.')

CUDA is available!
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


## YOLOv5 and Pretrained Weights

**YOLOv5** (You Only Look Once, version 5) is a widely used object detection model known for its efficiency and strong accuracy. It performs real-time detection, which makes it suitable for a wide range of computer vision tasks, including locating fields on invoices before OCR is applied.

YOLOv5 balances detection speed against accuracy by offering several model sizes (nano, small, medium, large, and extra-large), so the model can be matched to the available resources and the task. The **`yolov5s.pt`** file contains the small version of YOLOv5, pre-trained on the COCO dataset, which covers 80 object categories. This pre-trained model is a strong starting point and can be fine-tuned on a custom dataset to detect domain-specific objects.

Starting from pre-trained weights such as `yolov5s.pt` reuses the features learned on COCO. This saves compute and training time and usually improves performance on a custom dataset. Fine-tuning a pre-trained model also typically needs fewer epochs and less data than training from scratch.

For detection tasks that need precise localization under real-world conditions, YOLOv5 is a strong choice because it combines real-time speed with high accuracy.
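As a rough sketch of how fine-tuning from pre-trained weights looks through the Ultralytics Python API, the snippet below mirrors the settings used in this notebook's training command (`epochs=600`, `imgsz=640`, `patience=50`). The helper names `build_train_args` and `fine_tune` are hypothetical and exist only for illustration; `YOLO(...)` and `.train(...)` are the library calls.

```python
def build_train_args(data_yaml):
    """Collect the training settings used in this notebook in one place."""
    return {
        "data": data_yaml,   # path to the dataset description file
        "epochs": 600,       # upper bound on training epochs
        "imgsz": 640,        # training image size
        "patience": 50,      # stop after 50 epochs without improvement
    }


def fine_tune(weights, data_yaml):
    """Fine-tune pre-trained weights on a custom dataset (hypothetical helper)."""
    # Imported lazily so the settings helper works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # load the pre-trained checkpoint
    return model.train(**build_train_args(data_yaml))


# Example call (not executed here, since it starts a full training run):
# fine_tune("yolov5s.pt", "/content/drive/My Drive/YOLO_OCR_InvoiceScanner/data.yaml")
```

Keeping the settings in a plain dictionary makes it easy to log them or reuse them for later runs.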


In [26]:
# Download the YOLOv5 pre-trained weights to Google Drive
!wget https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt -O /content/drive/My\ Drive/YOLO_OCR_InvoiceScanner/yolov5s.pt

--2024-09-22 13:49:32--  https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/eab38592-7168-4731-bdff-ad5ede2002be?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20240922%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240922T134932Z&X-Amz-Expires=300&X-Amz-Signature=3e53380a82a667209026bcef61d434df3aa5ac8314c557eb0db222e97ff0c220&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Dyolov5s.pt&response-content-type=application%2Foctet-stream [following]
--2024-09-22 13:49:32--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/eab38592-7168-4731-bdff-ad5ede2002be?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=rele

In [191]:
# Fine-tune YOLOv5 model on my custom dataset
!yolo task=detect mode=train model=yolov5s.pt data='/content/drive/My Drive/YOLO_OCR_InvoiceScanner/data.yaml' epochs=600 imgsz=640 patience=50

PRO TIP 💡 Replace 'model=yolov5s.pt' with new 'model=yolov5su.pt'.
YOLOv5 'u' models are trained with https://github.com/ultralytics/ultralytics and feature improved performance vs standard YOLOv5 models trained with https://github.com/ultralytics/yolov5.

Ultralytics YOLOv8.2.98 🚀 Python-3.10.12 torch-2.4.1+cu121 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=detect, mode=train, model=yolov5s.pt, data=/content/drive/My Drive/YOLO_OCR_InvoiceScanner/data.yaml, epochs=600, time=None, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train10, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300

# YOLOv5 Invoice OCR - Results Analysis

## Overview

The YOLOv5 model was fine-tuned to locate three key fields on invoice images: `invoice_number`, `total_amount`, and `billing_date`, so that their text can then be read by OCR. Training stopped after 532 of the 600 configured epochs, and the model reached high precision and recall. The sections below break down the training setup and the evaluation results.

## Training Details
- **Model**: YOLOv5s
- **Total Epochs**: 532 of 600 configured (early stopping with `patience=50`)
- **Batch Size**: 16
- **Image Size**: 640x640
- **Classes Detected**: 3 (invoice_number, total_amount, billing_date)
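Early stopping with a patience value can be pictured as follows. This is a minimal sketch of the general idea, not the Ultralytics implementation: the function name `early_stop_epoch` and the toy fitness values are assumptions made for illustration. Training halts once the validation fitness has not improved for `patience` consecutive epochs.

```python
def early_stop_epoch(fitness_per_epoch, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops at the first epoch where `patience` epochs have
    passed since the best fitness value was observed.
    """
    best = float("-inf")
    best_epoch = 0
    for epoch, fitness in enumerate(fitness_per_epoch, start=1):
        if fitness > best:
            # New best score: remember it and reset the waiting period.
            best, best_epoch = fitness, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_epoch
    # The epoch budget ran out before patience was exhausted.
    return len(fitness_per_epoch), best_epoch


# Toy fitness curve (made-up values): the best score occurs at epoch 3.
toy_fitness = [0.10, 0.30, 0.50, 0.40, 0.45, 0.48, 0.49]
print(early_stop_epoch(toy_fitness, patience=3))
```

Under this logic, a run that stops at epoch 532 with `patience=50` would have seen its best validation score roughly 50 epochs earlier.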

## Evaluation Metrics

| Metric            | All Classes  | Invoice Number | Total Amount  | Billing Date  |
|-------------------|--------------|----------------|---------------|---------------|
| **Precision (P)**  | 0.991        | 0.982          | 1.0           | 0.99          |
| **Recall (R)**     | 0.996        | 1.0            | 0.988         | 1.0           |
| **mAP@50**         | 0.995        | 0.995          | 0.995         | 0.995         |
| **mAP@50-95**      | 0.95         | 0.946          | 0.953         | 0.95          |
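The "All Classes" column appears to be the unweighted mean of the three per-class values. The short check below, using only the numbers from the table, illustrates this; the variable names are chosen for this example.

```python
# Per-class values copied from the table, in the order:
# invoice_number, total_amount, billing_date.
per_class = {
    "precision": [0.982, 1.0, 0.99],
    "recall": [1.0, 0.988, 1.0],
    "mAP@50": [0.995, 0.995, 0.995],
    "mAP@50-95": [0.946, 0.953, 0.95],
}

# Values reported in the "All Classes" column.
reported_all = {
    "precision": 0.991,
    "recall": 0.996,
    "mAP@50": 0.995,
    "mAP@50-95": 0.95,
}

# Unweighted mean over the three classes for each metric.
class_means = {name: sum(vals) / len(vals) for name, vals in per_class.items()}

for name, mean in class_means.items():
    print(f"{name}: mean of classes = {mean:.4f}, reported = {reported_all[name]}")
```

Each mean matches the reported value to within rounding at three decimal places.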

### Key Metric Definitions:
- **Precision (P)**: The fraction of predicted boxes that are correct, TP / (TP + FP). High precision means few false detections.
- **Recall (R)**: The fraction of ground-truth objects that the model detects, TP / (TP + FN). High recall means few missed fields.
- **mAP@50**: Mean Average Precision at an IoU (Intersection over Union) threshold of 0.50. A prediction counts as correct if it overlaps the ground-truth box with IoU of at least 0.50; average precision is computed per class and then averaged over classes.
- **mAP@50-95**: Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05. Because the higher thresholds demand tighter boxes, this metric also reflects localization quality.
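The definitions above can be made concrete with a small sketch. The box coordinates below are made-up values in `(x1, y1, x2, y2)` pixel format, and the function names are chosen for this example; they are not taken from the training code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# The ten IoU thresholds used by mAP@50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Hypothetical ground-truth and predicted boxes for one invoice field.
gt_box = (100, 50, 300, 90)
pred_box = (110, 52, 300, 92)

score = iou(gt_box, pred_box)
passed = [t for t in IOU_THRESHOLDS if score >= t]
print(f"IoU = {score:.3f}, counts as a match at {len(passed)} of 10 thresholds")
print(precision_recall(tp=99, fp=1, fn=0))
```

A box that is only slightly offset still matches at the looser thresholds but fails the strictest ones, which is why mAP@50-95 is typically lower than mAP@50.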

## Key Observations
1. **High Precision & Recall**:
   - The model has very high precision (0.991) and recall (0.996) across all classes, so it produces few false detections and misses almost none of the target fields in the invoices.
   
2. **Class-Specific Performance**:
   - **Invoice Number**: Precision is 0.982 and recall is 1.0. The model finds every invoice number, while roughly 1.8% of its invoice-number predictions are false positives.
   - **Total Amount**: Precision is 1.0 and recall is 0.988. The model produces no false positives for this class and misses about 1.2% of the total-amount instances.
   - **Billing Date**: Precision is 0.99 and recall is 1.0, so the model finds every billing date with very few false positives.

3. **mAP Metrics**:
   - The model achieves a **mAP@50 of 0.995**, which means nearly every field is detected with a box that overlaps the ground truth at an IoU of at least 0.50.
   - The **mAP@50-95** is 0.95, which shows that the predicted boxes stay well aligned with the ground truth even at stricter IoU thresholds.

Overall, the fine-tuned YOLOv5 model achieves near-perfect precision and recall on the validation split when detecting `invoice_number`, `total_amount`, and `billing_date`. With an mAP@50 of 0.995 and an mAP@50-95 of 0.95, it provides a reliable detection stage for invoice field extraction.


