# MMOCR Training

This notebook contains all the source code needed to train text detection and text recognition models. You only need to change the dataset paths and the config files. Please use a GPU runtime.

## Setup for Training

In [None]:
!pip install torch==1.13.1+cu117 \
  torchvision==0.14.1+cu117 \
  --extra-index-url https://download.pytorch.org/whl/cu117
!pip install -U openmim
!mim install "mmengine>=0.7.1,<1.1.0"
!mim install "mmcv>=2.0.0rc4,<2.1.0"
!mim install "mmdet>=3.0.0rc5,<3.2.0"
!git clone https://github.com/open-mmlab/mmocr.git
!cd mmocr && pip install -v -e .
!apt install tree
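Optionally, confirm which versions the cell above actually installed. This small sketch uses only the standard library's `importlib.metadata`. `installed_versions` is a hypothetical helper, not part of MMOCR, and it looks packages up by their pip distribution names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


for pkg, ver in installed_versions(["torch", "mmengine", "mmcv", "mmdet", "mmocr"]).items():
    print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```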

## Dataset Preparation

First, we need to convert the Label Studio annotations to the MMOCR annotation format.

Copy the dataset to a local directory. **Adapt the cell below to load the dataset you exported from Label Studio, but for convenience, put the dataset root at `/content/handwriting`**. Ideally, the directory is structured like this (file names may differ):

```text
./handwriting
├── label-studio-anno.json
├── test
│   └── IMG_2347.jpg
└── training
    ├── IMG_2346.jpg
    ├── IMG_2348.jpg
    ├── IMG_2349.jpg
    └── IMG_2350.jpg
```
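Before converting anything, a quick check can catch a misplaced upload. The helper below (`check_dataset_layout`, hypothetical) only assumes the names shown in the tree above (`label-studio-anno.json`, `training/`, `test/`) and a few common image extensions; adjust them if your export uses different names.

```python
from pathlib import Path
from typing import List

# Assumed image extensions; extend if your images use another format
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_dataset_layout(root: Path) -> List[str]:
    """Return the problems found in the expected dataset layout (empty list if OK)."""
    problems = []
    if not (root / "label-studio-anno.json").is_file():
        problems.append("missing label-studio-anno.json")
    for split in ("training", "test"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing {split}/ directory")
            continue
        images = [p for p in split_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
        if not images:
            problems.append(f"no images in {split}/")
    return problems


print(check_dataset_layout(Path("./handwriting")) or "layout looks OK")
```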

In [None]:
!cp -r "/content/drive/MyDrive/Public/Dibimbing/25 - OCR/Assignment/handwriting" "./handwriting"
!tree "./handwriting"

Run the remaining cells of this section without modification.

In [None]:
import cv2
import json
import numpy as np
import os
import shutil
from pathlib import Path
from typing import Dict, List, Tuple

Functions for text detection dataset preparation. The main function is `create_mmocr_det_anno`.
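Label Studio stores each rectangle as `x`, `y`, `width`, `height` in percent of the original image size, which is why the functions in the next cell multiply by the image size and divide by 100. The standalone sketch below restates that arithmetic for a hypothetical 1000 x 800 image; `ls_percent_box_to_pixels` and `box_to_polygon` are illustrative stand-ins for `xywh2xyxy` and `xyxy2poly`, not extra notebook functions.

```python
from typing import List


def ls_percent_box_to_pixels(x: float, y: float, w: float, h: float,
                             img_width: int, img_height: int) -> List[int]:
    """Convert a Label Studio box (percent of image size) to pixel [x1, y1, x2, y2]."""
    x_px = x * img_width / 100
    y_px = y * img_height / 100
    w_px = w * img_width / 100
    h_px = h * img_height / 100
    return [int(x_px), int(y_px), int(x_px + w_px), int(y_px + h_px)]


def box_to_polygon(box: List[int]) -> List[int]:
    """Four corners of an axis-aligned box as a flat [x, y, x, y, ...] list."""
    x1, y1, x2, y2 = box
    return [x1, y1, x1, y2, x2, y2, x2, y1]


box = ls_percent_box_to_pixels(10, 20, 30, 5, img_width=1000, img_height=800)
print(box)                  # [100, 160, 400, 200]
print(box_to_polygon(box))  # [100, 160, 100, 200, 400, 200, 400, 160]
```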

In [None]:
def xywh2xyxy(xywh: List[float], img_width: int, img_height: int) -> List[int]:
    """
    Convert a Label Studio bounding box (x, y, width, height as
    percentages of the image size) to pixel coordinates [x1, y1, x2, y2]
    """
    x, y, w, h = xywh
    x = x * img_width / 100
    y = y * img_height / 100
    w = w * img_width / 100
    h = h * img_height / 100
    return [
        int(x),
        int(y),
        int(x + w),
        int(y + h),
    ]

def xyxy2poly(xyxy: List[int]) -> List[int]:
    """
    Convert a bounding box from xyxy format to a polygon given as
    a flat list of corner coordinates [x, y, x, y, ...]
    """
    x1, y1, x2, y2 = xyxy
    return [
        x1, y1, x1, y2, x2, y2, x2, y1
    ]


def create_instance_mmocr_anno(
    label_ls: Dict,
    text: str,
    img_width: int,
    img_height: int,
) -> Dict:
    """
    Convert the annotation of a text instance from Label Studio format
    to MMOCR format
    """
    bbox = xywh2xyxy(
        [
            label_ls["x"],
            label_ls["y"],
            label_ls["width"],
            label_ls["height"],
        ],
        img_width,
        img_height,
    )
    instance_anno = {}
    instance_anno["bbox"] = bbox
    instance_anno["bbox_label"] = 0
    instance_anno["polygon"] = xyxy2poly(bbox)
    instance_anno["text"] = text
    instance_anno["ignore"] = False
    return instance_anno

def create_image_mmocr_anno(image_name: str, image_ls: Dict) -> Dict:
    """
    Convert the annotation of an image from Label Studio format
    to MMOCR format
    """
    img_width = image_ls["label"][0]["original_width"]
    img_height = image_ls["label"][0]["original_height"]
    image_anno = {}
    image_anno["img_path"] = image_name
    image_anno["height"] = img_height
    image_anno["width"] = img_width
    image_anno["instances"] = [
        create_instance_mmocr_anno(lbl, txt, img_width, img_height)
        for lbl, txt in zip(image_ls["label"], image_ls["transcription"])
    ]
    return image_anno

def create_metainfo_det() -> Dict:
    """
    Metainfo for MMOCR text detection dataset
    """
    return {
        "dataset_type": "TextDetDataset",
        "task_name": "textdet",
        "category": [{"id": 0, "name": "text"}],
    }

def create_output_json(
    annotations: List[Dict],
    metainfo: Dict,
    output_path: Path
) -> None:
    """
    Dump MMOCR annotation JSON
    """
    output = {
        "metainfo": metainfo,
        "data_list": annotations
    }
    with open(output_path, "w") as f:
        json.dump(output, f)

def get_image_name(ls_image_path: str) -> str:
    """
    Label studio will write the image file name in format of
    '{random_id}-{original_image_name}'. So we only want to
    get the original image name, since that is the name that
    we have.
    """
    name = os.path.basename(ls_image_path)
    name = name[(name.find("-") + 1):]
    return name

def create_mmocr_det_anno(
    ls_anno_path: Path,
    train_images_dir: Path,
    test_images_dir: Path,
    output_dir: Path,
):
    """
    Create text detection dataset in MMOCR format
    """
    train_images = [p for p in train_images_dir.glob("*")]
    test_images = [p for p in test_images_dir.glob("*")]
    with open(ls_anno_path, "r") as f:
        ls_anno = json.load(f)
    image_annos = {}
    for ann in ls_anno:
        img_name = get_image_name(ann["ocr"])
        image_annos[img_name] = create_image_mmocr_anno(img_name, ann)

    output_dir.mkdir(parents=True, exist_ok=True)
    for p in [*train_images, *test_images]:
        shutil.copy(p, output_dir / p.name)
    create_output_json(
        annotations=[image_annos[p.name] for p in train_images],
        metainfo=create_metainfo_det(),
        output_path=output_dir / "textdet_train.json"
    )
    create_output_json(
        annotations=[image_annos[p.name] for p in test_images],
        metainfo=create_metainfo_det(),
        output_path=output_dir / "textdet_test.json"
    )
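Once the conversion cell further down has been run, you may want to sanity-check the generated detection annotations. A zero-width or zero-height box produces an empty crop in the recognition step, which OpenCV generally refuses to write, and a box outside the image is usually a labeling mistake. `find_bad_boxes` is a hypothetical helper that reads the JSON structure written by `create_output_json`.

```python
import json
from pathlib import Path
from typing import Dict, List


def find_bad_boxes(det_anno: Dict) -> List[str]:
    """List boxes that are empty or fall outside their image in an MMOCR det annotation."""
    issues = []
    for item in det_anno["data_list"]:
        w, h = item["width"], item["height"]
        for i, inst in enumerate(item["instances"]):
            x1, y1, x2, y2 = inst["bbox"]
            if x2 <= x1 or y2 <= y1:
                issues.append(f"{item['img_path']}#{i}: empty box {inst['bbox']}")
            elif x1 < 0 or y1 < 0 or x2 > w or y2 > h:
                issues.append(f"{item['img_path']}#{i}: box {inst['bbox']} outside {w}x{h}")
    return issues


for split in ("textdet_train.json", "textdet_test.json"):
    path = Path("dataset-det") / split
    if path.exists():  # only after the conversion has been run
        with open(path) as f:
            print(split, find_bad_boxes(json.load(f)) or "OK")
```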

Functions for text recognition dataset preparation. The main function is `create_mmocr_rec_anno`.

In [None]:
def create_metainfo_rec() -> Dict:
    """
    Metainfo for MMOCR text recognition dataset
    """
    return {
        "dataset_type": "TextRecogDataset",
        "task_name": "textrecog",
    }

def crop_images(
    src_annos: Dict,
    image_src_dir: Path,
    image_dst_dir: Path,
) -> List[Dict]:
    """
    Crop text images and extract the text annotations
    """
    image_path = image_src_dir / src_annos["img_path"]
    image = cv2.imread(str(image_path))
    image_name = image_path.stem

    anns = []
    for i, src_txt_anno in enumerate(src_annos["instances"]):
        dst_image_file = f"{image_name}_{i:05}.jpg"
        x1, y1, x2, y2 = src_txt_anno["bbox"]
        crop = image[y1:y2, x1:x2]
        cv2.imwrite(str(image_dst_dir / dst_image_file), crop)

        instance = [{"text": src_txt_anno["text"]}]
        crop_ann = {
            "img_path": dst_image_file,
            "height": crop.shape[0],
            "width": crop.shape[1],
            "instances": instance
        }
        anns.append(crop_ann)
    return anns


def create_split_anno(
    det_anno_path: Path,
    det_images_dir: Path,
    output_dir: Path,
    json_name: str,
):
    """
    Create formatted text recognition dataset for
    a dataset split.
    """
    with open(det_anno_path, "r") as f:
        det_anno = json.load(f)
    new_data_list = []
    for src_anno in det_anno["data_list"]:
        new_data_list += crop_images(
            src_anno,
            det_images_dir,
            output_dir,
        )
    new_anno = {
        "metainfo": create_metainfo_rec(),
        "data_list": new_data_list,
    }
    with open(output_dir / json_name, "w") as f:
        json.dump(new_anno, f)

def create_mmocr_rec_anno(
    det_root_dir: Path,
    output_dir: Path,
):
    """
    Create text recognition dataset in MMOCR format
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    create_split_anno(
        det_root_dir / "textdet_train.json",
        det_root_dir,
        output_dir,
        "textrecog_train.json"
    )
    create_split_anno(
        det_root_dir / "textdet_test.json",
        det_root_dir,
        output_dir,
        "textrecog_test.json"
    )
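`crop_images` slices the image with the raw box coordinates. With NumPy slicing, a negative start index counts from the end of the axis and an out-of-range stop is silently truncated, so a slightly misplaced box can give a wrong or empty crop. A defensive sketch that clips the box to the image first; `clamp_box` is hypothetical and is not used by the notebook as written.

```python
from typing import List, Optional


def clamp_box(box: List[int], img_width: int, img_height: int) -> Optional[List[int]]:
    """Clip [x1, y1, x2, y2] to the image; return None if nothing is left to crop."""
    x1, y1, x2, y2 = box
    # Keep every coordinate inside [0, width] / [0, height]
    x1, x2 = max(0, min(x1, img_width)), max(0, min(x2, img_width))
    y1, y2 = max(0, min(y1, img_height)), max(0, min(y2, img_height))
    if x2 <= x1 or y2 <= y1:
        return None  # degenerate box: would produce an empty crop
    return [x1, y1, x2, y2]


print(clamp_box([-5, 10, 50, 60], img_width=40, img_height=50))  # [0, 10, 40, 50]
print(clamp_box([10, 10, 10, 20], img_width=100, img_height=100))  # None
```

To use it, you could call `clamp_box(src_txt_anno["bbox"], image.shape[1], image.shape[0])` inside `crop_images` and `continue` when it returns `None`.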

Do the actual format conversions.

In [None]:
# change this to the path of your Label Studio annotation JSON
LABEL_STUDIO_ANN = Path("handwriting/label-studio-anno.json")
# change this to the path of your training images folder
TRAIN_IMGS = Path("handwriting/training")
# change this to the path of your test images folder
TEST_IMGS = Path("handwriting/test")
# the formatted text detection dataset will be saved in this directory
OUTPUT_DET_DIR = Path("dataset-det")
# the formatted text recognition dataset will be saved in this directory
OUTPUT_REC_DIR = Path("dataset-rec")

create_mmocr_det_anno(
    LABEL_STUDIO_ANN,
    TRAIN_IMGS,
    TEST_IMGS,
    OUTPUT_DET_DIR,
)
create_mmocr_rec_anno(
    OUTPUT_DET_DIR,
    OUTPUT_REC_DIR,
)

You should see two new folders `dataset-det` and `dataset-rec` now.
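Each recognition sample is one cropped text instance, so for each split the number of images in `textrecog_*.json` should equal the number of instances in the matching `textdet_*.json`. A quick way to compare them (the helper name is hypothetical):

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def count_images_and_instances(anno: Dict) -> Tuple[int, int]:
    """Return (number of images, number of text instances) in an MMOCR annotation."""
    data_list = anno["data_list"]
    return len(data_list), sum(len(item["instances"]) for item in data_list)


for root in (Path("dataset-det"), Path("dataset-rec")):
    for path in sorted(root.glob("*.json")):
        with open(path) as f:
            n_images, n_instances = count_images_and_instances(json.load(f))
        print(f"{path}: {n_images} images, {n_instances} instances")
```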

## Text Detection Training

I have already provided the config file and scripts you need; just run the cells one by one to start training. Please upload `handwriting-dbnet-config.py` to the notebook's working directory (`/content`).

### Config Details

The rest of this subsection explains the config file. These changes are already implemented in the provided `.py` file you uploaded, so this is for reference only.

We use `/content/mmocr/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py` as the main model config. Note that many parameters are defined in other `.py` files, because MMOCR configs inherit from base configs. Check the `_base_` list of the main config.

Changes to the configuration:

- Set the data root of the ICDAR2015 dataset config to `dataset-det`
- Number of training epochs (`max_epochs`): try at least 50, but be careful not to overfit
- Validation interval (`val_interval`): try around 10 epochs
- TensorBoard visualizer

  ```
  vis_backends = [dict(type='LocalVisBackend'),
                  dict(type='TensorboardVisBackend')]
  ```

- Only keep the latest checkpoint (in `default_hooks`)

  ```
      checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=1)
  ```
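For reference, the DBNet changes listed above might be collected into a config like the sketch below. It is not the provided `handwriting-dbnet-config.py`, and the absolute paths assume Colab's `/content` layout. Because MMEngine merges a child config's dicts into the inherited ones, overriding `data_root` inside each dataloader's `dataset` should be enough to point all splits at `dataset-det`. The `1200e` in the base config name refers to epochs, so the training length is set through `train_cfg`. `visualizer` is redefined too, on the assumption that the inherited visualizer already holds the old `vis_backends` list.

```python
# Sketch of a DBNet config for the handwriting dataset (assumed paths and values)
_base_ = ["/content/mmocr/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py"]

# Point every split at the converted detection dataset
data_root = "/content/dataset-det"
train_dataloader = dict(dataset=dict(data_root=data_root))
val_dataloader = dict(dataset=dict(data_root=data_root))
test_dataloader = dict(dataset=dict(data_root=data_root))

# Train for 50 epochs and validate every 10 epochs
train_cfg = dict(max_epochs=50, val_interval=10)

# Log to TensorBoard in addition to local files
vis_backends = [dict(type="LocalVisBackend"), dict(type="TensorboardVisBackend")]
visualizer = dict(type="TextDetLocalVisualizer", vis_backends=vis_backends, name="visualizer")

# Save every 10 epochs but keep only the most recent checkpoint
default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=10, max_keep_ckpts=1))
```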

### Training

Run the following cell, then check `/content/vis` to make sure the visualizations are correct. If they are not, something is probably wrong with your config.

In [None]:
!python /content/mmocr/tools/visualizations/browse_dataset.py \
  "/content/handwriting-dbnet-config.py" \
  -o "/content/vis" \
  -m original

In [None]:
%reload_ext tensorboard
%tensorboard --logdir "/content/work_dir_det"

In [None]:
!python "/content/mmocr/tools/train.py" \
  "/content/handwriting-dbnet-config.py" \
  --work-dir "/content/work_dir_det"
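TensorBoard is the easiest way to follow the training curves. If you prefer the raw numbers, the sketch below assumes MMEngine's `LocalVisBackend` writes one JSON object per line to `<work_dir>/<timestamp>/vis_data/scalars.json`; that layout is an assumption, so verify it in your work dir. `read_scalar` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import List, Optional, Tuple


def read_scalar(lines: List[str], key: str) -> List[Tuple[Optional[int], float]]:
    """Collect (step, value) pairs for `key` from JSON-lines scalar records."""
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if key in record:
            points.append((record.get("step"), record[key]))
    return points


# Assumed layout: <work_dir>/<timestamp>/vis_data/scalars.json
for scalars_file in sorted(Path("/content/work_dir_det").glob("*/vis_data/scalars.json")):
    losses = read_scalar(scalars_file.read_text().splitlines(), "loss")
    print(scalars_file, "latest (step, loss):", losses[-1] if losses else None)
```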

Optionally, save the results to your Google Drive.

In [None]:
!cp -r "/content/work_dir_det" "/content/drive/MyDrive/Public/Dibimbing/25 - OCR/Kunci Jawaban/dbnet_training"

## Text Recognition Training

I have already provided the config file and scripts you need; just run the cells one by one to start training. Please upload `handwriting-svtr-config.py` to the notebook's working directory (`/content`).

### Config Details

The rest of this subsection explains the config file. These changes are already implemented in the provided `.py` file you uploaded, so this is for reference only.

We use `/content/mmocr/configs/textrecog/svtr/svtr-base_20e_st_mj.py` as the main model config. Note that many parameters are defined in other `.py` files, because MMOCR configs inherit from base configs. Check the `_base_` list of the main config.

Changes to the configuration:

- Set the data root of the ICDAR2015 dataset config to `dataset-rec`
- Number of training epochs: try the default first
- TensorBoard visualizer

  ```
  vis_backends = [dict(type='LocalVisBackend'),
                  dict(type='TensorboardVisBackend')]
  ```

- Only keep the latest checkpoint (in `default_hooks`)

  ```
      checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=1)
  ```

- Validation evaluator

  ```
  val_evaluator = dict(
      _delete_=True,
      type='Evaluator',
      metrics=[
          dict(
              type='WordMetric',
              mode=['exact', 'ignore_case', 'ignore_case_symbol']),
          dict(type='CharMetric')
      ])
  test_evaluator = val_evaluator
  ```

- Train/test dataset list

  ```
  train_list = [_base_.icdar2015_textrecog_train]
  test_list = [_base_.icdar2015_textrecog_test]
  ```

- Load the pre-trained SVTR-Base weights

  ```
  load_from = "https://download.openmmlab.com/mmocr/textrecog/svtr/svtr-base_20e_st_mj/svtr-base_20e_st_mj-ea500101.pth"
  ```

- Reduce the batch size (e.g. to 128) if you get a CUDA out-of-memory error
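To get a feel for the three `WordMetric` modes configured above, the sketch below computes word accuracy under each one. It is a simplified re-implementation for intuition, not MMOCR's code; treating "symbols" as anything outside `A-Z`, `a-z`, `0-9` is an assumption, and MMOCR's exact character filter may differ.

```python
import re
from typing import Dict, List

# Assumption: "symbols" are anything outside A-Z, a-z and 0-9
_NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def word_accuracy(preds: List[str], gts: List[str]) -> Dict[str, float]:
    """Word-level accuracy of predictions vs. ground truth under three comparison modes."""
    hits = {"exact": 0, "ignore_case": 0, "ignore_case_symbol": 0}
    for pred, gt in zip(preds, gts):
        hits["exact"] += pred == gt
        hits["ignore_case"] += pred.lower() == gt.lower()
        hits["ignore_case_symbol"] += (
            _NON_ALNUM.sub("", pred.lower()) == _NON_ALNUM.sub("", gt.lower())
        )
    total = max(len(gts), 1)  # avoid division by zero on empty input
    return {mode: count / total for mode, count in hits.items()}


print(word_accuracy(["Hello", "world!", "OCR"], ["Hello", "World", "ocr."]))
# {'exact': 0.3333333333333333, 'ignore_case': 0.3333333333333333, 'ignore_case_symbol': 1.0}
```

The looser modes are useful for handwriting, where a model may read the right word but miss capitalization or trailing punctuation.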

### Training

Run the following cell, then check `/content/vis` to make sure the visualizations are correct. If they are not, something is probably wrong with your config.

In [None]:
!python /content/mmocr/tools/visualizations/browse_dataset.py \
  "/content/handwriting-svtr-config.py" \
  -o "/content/vis" \
  -m original

In [None]:
%reload_ext tensorboard
%tensorboard --logdir "/content/work_dir_rec"

In [None]:
!python "/content/mmocr/tools/train.py" \
  "/content/handwriting-svtr-config.py" \
  --work-dir "/content/work_dir_rec"

In [None]:
!cp -r "/content/work_dir_rec" "/content/drive/MyDrive/Public/Dibimbing/25 - OCR/Kunci Jawaban/svtr_training"