


# PaddlePaddle OCR Training by Mohsin Ali Mirza (k200353)

In this notebook we'll fine-tune a text detection model using PaddleOCR, an OCR framework from Baidu's PaddlePaddle ecosystem.

PaddleOCR also requires the `paddlepaddle` framework itself. Since we have a GPU, we'll install the GPU build, `paddlepaddle-gpu`.



## 1. Installing Libraries and Dependencies

In [None]:
%%shell
git clone https://github.com/PaddlePaddle/PaddleOCR
pip install -qqq paddlepaddle-gpu pyclipper attrdict
cd PaddleOCR
pip install -r requirements.txt

fatal: destination path 'PaddleOCR' already exists and is not an empty directory.




## 2. Install `wandb` and log in

Weights & Biases (`wandb`) is an experiment-tracking tool for logging and visualizing training and evaluation metrics. If you don't have an account yet, you can create one here: https://wandb.ai/login?signup=true

In [None]:
!pip install -qqq wandb


In [None]:
import wandb
wandb.login()

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:


True

## 3. Downloading and Extracting the Dataset

In [None]:
%cd PaddleOCR/

[Errno 2] No such file or directory: 'PaddleOCR/'
/content/PaddleOCR


PaddleOCR expects the data in the following directory layout, per the [documentation](https://github-com.translate.goog/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/detection.md?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp):
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         ICDAR dataset training images
  └─ ch4_test_images/             ICDAR dataset test images
  └─ train_icdar2015_label.txt    ICDAR dataset training annotations
  └─ test_icdar2015_label.txt     ICDAR dataset test annotations
```

In [None]:
import os

# Create necessary directories
label_directory = "./train_data/icdar2015/text_localization/"
os.makedirs(label_directory, exist_ok=True)

# Specify the URLs for the label files
train_label_url = "https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt"
test_label_url = "https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt"

# Specify where to save the downloaded label files
train_label_path = label_directory + "train_icdar2015_label.txt"
test_label_path = label_directory + "test_icdar2015_label.txt"

# Download train_icdar2015_label.txt
!wget $train_label_url -O $train_label_path

# Download test_icdar2015_label.txt
!wget $test_label_url -O $test_label_path

print("ICDAR 2015 label files downloaded successfully.")

--2023-12-09 16:08:34--  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 103.235.46.61, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1063118 (1.0M) [text/plain]
Saving to: ‘./train_data/icdar2015/text_localization/train_icdar2015_label.txt’


2023-12-09 16:08:35 (1.69 MB/s) - ‘./train_data/icdar2015/text_localization/train_icdar2015_label.txt’ saved [1063118/1063118]

--2023-12-09 16:08:35--  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 103.235.46.61, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 468453 (457K) [text/plain]
Saving to: ‘./train_data
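Each line of a PaddleOCR detection label file pairs an image path (relative to the data directory) with a JSON list of text regions, separated by a tab; each region has a `transcription` and four `points`. A minimal sketch of a reader for one line, assuming that format and treating `###` transcriptions as the ICDAR "unreadable, ignore" marker. The function name and the sample line are hypothetical, not taken from PaddleOCR or the real label file:

```python
import json
from typing import List, Tuple

Point = Tuple[float, float]


def parse_det_label_line(line: str) -> Tuple[str, List[List[Point]], List[bool]]:
    """Split one detection label line into (image path, polygons, ignore flags).

    Assumes the layout: <image path> TAB <JSON list of {"transcription", "points"}>.
    """
    image_path, annotation_json = line.rstrip("\n").split("\t", 1)
    regions = json.loads(annotation_json)
    polygons = [[(float(x), float(y)) for x, y in region["points"]] for region in regions]
    # "###" marks unreadable text in ICDAR annotations; such regions are usually ignored
    ignore_flags = [region["transcription"] == "###" for region in regions]
    return image_path, polygons, ignore_flags


# Hypothetical line in the assumed format (not copied from the real label file)
sample = (
    'icdar_c4_train_imgs/img_1.jpg\t'
    '[{"transcription": "Genaxis Theatre", "points": [[377, 117], [463, 117], [465, 130], [378, 130]]}, '
    '{"transcription": "###", "points": [[493, 115], [519, 115], [519, 131], [493, 131]]}]'
)
path, polygons, ignore = parse_det_label_line(sample)
print(path, len(polygons), ignore)
```

Reading a few real lines this way is a quick check that the downloaded label files match the directory names used below.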

In [None]:
import os
import zipfile

# Create necessary directories
label_directory = "./train_data/icdar2015/text_localization/"
os.makedirs(label_directory, exist_ok=True)

# Specify where the zip files are located
training_images_zip_path = "./ch4_training_images.zip"
test_images_zip_path = "./ch4_test_images.zip"

# Specify where to unzip the image files
training_images_path = os.path.join(label_directory, "icdar_c4_train_imgs")
# Must match the test-image directory name in the layout above
test_images_path = os.path.join(label_directory, "ch4_test_images")

# Unzip ch4_training_images.zip
with zipfile.ZipFile(training_images_zip_path, 'r') as zip_ref:
    zip_ref.extractall(training_images_path)

# Unzip ch4_test_images.zip
with zipfile.ZipFile(test_images_zip_path, 'r') as zip_ref:
    zip_ref.extractall(test_images_path)

In [None]:
import os
from google.colab import drive
import zipfile

# Mount Google Drive
drive.mount('/content/drive')

# Set the directory in Google Drive where your files are located
drive_directory = '/content/drive/MyDrive/AML_Project/'

# Create necessary directories
label_directory = "./train_data/icdar2015/text_localization/"
os.makedirs(label_directory, exist_ok=True)

# Specify where to unzip the image files
training_images_path = os.path.join(label_directory, "icdar_c4_train_imgs")
test_images_path = os.path.join(label_directory, "ch4_test_images")

# Specify the file paths in Google Drive
training_images_zip_path = os.path.join(drive_directory, "ch4_training_images.zip")
test_images_zip_path = os.path.join(drive_directory, "ch4_test_images.zip")

# Unzip ch4_training_images.zip
with zipfile.ZipFile(training_images_zip_path, 'r') as zip_ref:
    zip_ref.extractall(training_images_path)

# Unzip ch4_test_images.zip
with zipfile.ZipFile(test_images_zip_path, 'r') as zip_ref:
    zip_ref.extractall(test_images_path)

print("Image files extracted successfully.")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Image files extracted successfully.
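Because the label files reference images by relative path, a mismatch between the extraction folders and the paths in the labels only shows up later as a training-time error. A small sketch of a sanity check, assuming the tab-separated label format and that image paths are resolved relative to `./train_data/icdar2015/text_localization/`; `find_missing_images` is a hypothetical helper, not part of PaddleOCR:

```python
import os
from typing import List


def find_missing_images(label_file: str, data_root: str) -> List[str]:
    """Return image paths listed in `label_file` that don't exist under `data_root`."""
    missing = []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            image_path = line.split("\t", 1)[0]
            if not os.path.isfile(os.path.join(data_root, image_path)):
                missing.append(image_path)
    return missing
```

For example, `find_missing_images(label_directory + "train_icdar2015_label.txt", label_directory)` should return an empty list if the training images were extracted where the labels expect them; repeat with `test_icdar2015_label.txt` for the test split.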


## 4. Download pretrained models

ResNet50 is a strong backbone and a good choice when inference speed isn't a major constraint; its pretrained weights are listed here: https://github.com/PaddlePaddle/Paddleclas/tree/dygraph-dev#resnet-and-vd-series

For a more lightweight model, use the MobileNet series instead: https://github.com/PaddlePaddle/Paddleclas/tree/dygraph-dev#mobile-series

This notebook uses MobileNetV3: the cell below downloads its weights, while the ResNet18_vd and ResNet50_vd downloads are left commented out.


In [None]:
# Download the pretrained weights for the backbone you plan to use (uncomment the one you need)

# Download the pre-trained model of MobileNetV3
!wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
# # Download the pre-trained model of ResNet18_vd
# !wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams
# # Download the pre-trained model of ResNet50_vd
# !wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams


--2023-12-09 16:19:46--  https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
Resolving paddle-imagenet-models-name.bj.bcebos.com (paddle-imagenet-models-name.bj.bcebos.com)... 103.235.46.61, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to paddle-imagenet-models-name.bj.bcebos.com (paddle-imagenet-models-name.bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16255295 (16M) [application/octet-stream]
Saving to: ‘./pretrain_models/MobileNetV3_large_x0_5_pretrained.pdparams’


2023-12-09 16:19:53 (2.74 MB/s) - ‘./pretrain_models/MobileNetV3_large_x0_5_pretrained.pdparams’ saved [16255295/16255295]
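The training cell later passes the downloaded weights as `Global.pretrained_model`, using the file path without its `.pdparams` suffix. If you switch backbones, a small helper keeps the download URL and that override in sync. This is a sketch: the URLs are copied from the cell above, while the dictionary keys and the `pretrained_override` name are hypothetical:

```python
import posixpath

# URLs copied from the download cell; the keys are illustrative labels
PRETRAINED_URLS = {
    "MobileNetV3_large_x0_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams",
    "ResNet18_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams",
    "ResNet50_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams",
}


def pretrained_override(backbone: str, download_dir: str = "./pretrain_models") -> str:
    """Build the `Global.pretrained_model=...` override for a downloaded backbone.

    Strips the `.pdparams` suffix, mirroring how the training cell passes the path.
    """
    filename = posixpath.basename(PRETRAINED_URLS[backbone])
    suffix = ".pdparams"
    stem = filename[: -len(suffix)] if filename.endswith(suffix) else filename
    return "Global.pretrained_model=" + posixpath.join(download_dir, stem)


print(pretrained_override("MobileNetV3_large_x0_5"))
```

Changing the backbone also means switching to its matching config file (see the next step), not just the weights.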



## 5. Verify your hyperparameters: edit your `config` file

For the MobileNetV3 model, the hyperparameters are in the configuration file `configs/det/det_mv3_db.yml`. Each model has a config file whose name matches it, so if you use a model other than MobileNetV3, make sure to use its matching YAML file.

In [None]:
!head /content/PaddleOCR/configs/det/det_mv3_db.yml

Global:
  use_gpu: true
  use_xpu: false
  epoch_num: 1200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/db_mv3/
  save_epoch_step: 1200
  # evaluation is run every 2000 iterations
  eval_batch_step: [0, 2000]


In [None]:
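import yaml

with open("configs/det/det_mv3_db.yml", "r") as f:
    config = yaml.safe_load(f)

# A top-level `wandb` section enables PaddleOCR's Weights & Biases logging
config.update({
    'wandb': {
        'project': 'OCR_with_Paddle'
    }
})

# Train for only 5 epochs and also compute metrics during training
config['Global'].update({
    'epoch_num': 5,
    'cal_metric_during_train': True
})

# Note: this overwrites the original config file in place
with open("configs/det/det_mv3_db.yml", "w") as f:
    yaml.safe_dump(config, f)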
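Lowering `epoch_num` to 5 interacts with `eval_batch_step: [0, 2000]` shown above, which schedules evaluation by iteration, not by epoch. A back-of-the-envelope sketch, assuming the 1,000 ICDAR 2015 training images, a batch size of 16 on a single GPU, that the last partial batch is kept, and that evaluation fires when the global step is past the first value and the difference is a multiple of the second. Check `tools/program.py` in your PaddleOCR version for the exact rule; `eval_schedule` is a hypothetical helper:

```python
import math
from typing import List, Tuple


def eval_schedule(num_images: int, batch_size: int, epochs: int,
                  start_eval_step: int, eval_every: int) -> Tuple[int, List[int]]:
    """Return (total training steps, steps at which evaluation would run).

    Assumes one GPU, ceil(num_images / batch_size) steps per epoch, global_step
    counting from 1, and evaluation when
    global_step > start_eval_step and (global_step - start_eval_step) % eval_every == 0.
    """
    steps_per_epoch = math.ceil(num_images / batch_size)
    total_steps = steps_per_epoch * epochs
    eval_at = [step for step in range(1, total_steps + 1)
               if step > start_eval_step and (step - start_eval_step) % eval_every == 0]
    return total_steps, eval_at


# 1,000 training images; the batch size of 16 is an assumption
total, evals = eval_schedule(1000, 16, epochs=5, start_eval_step=0, eval_every=2000)
print(total, evals)  # prints: 315 []
```

Under these assumptions no evaluation (and so no best-model checkpoint) would happen within a 5-epoch run. If that matches your setup, a smaller interval, for example adding `'eval_batch_step': [0, 100]` to the `config['Global'].update(...)` call, would evaluate a few times.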

## 6. Beginning the fine-tuning process

In [None]:
# Fine-tune the DB text detector with a MobileNetV3 backbone, starting from the pretrained backbone weights
!python3 /content/PaddleOCR/tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained

[34m[1mwandb[0m: Currently logged in as: [33mk200353-fast[0m ([33mfast-k203053[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Tracking run with wandb version 0.16.1
[34m[1mwandb[0m: Run data is saved locally in [35m[1m./output/db_mv3/wandb/run-20231209_162439-gav9ely8[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33mMyOCRModel[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/fast-k203053/CoolOCR[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/fast-k203053/CoolOCR/runs/gav9ely8[0m
[2023/12/09 16:24:40] ppocr INFO: Architecture : 
[2023/12/09 16:24:40] ppocr INFO:     Backbone : 
[2023/12/09 16:24:40] ppocr INFO:         model_name : large
[2023/12/09 16:24:40] ppocr INFO:         name : MobileNetV3
[2023/12/09 16:24:40] ppocr INFO:         scale : 0.5
[2023/12/09 16:24:40] ppocr INFO:     Head : 
[2023/12/09 16:24:40] ppocr INFO:         k : 50
[20
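The `-o Global.pretrained_model=...` flag overrides one nested config key at launch without editing the YAML file, an alternative to rewriting `det_mv3_db.yml` as in step 5. A minimal sketch of how dotted `key.sub=value` overrides can be merged into a nested config dictionary. This illustrates the idea only and is not PaddleOCR's own argument parser; in particular, decoding values as JSON is an assumption made here to stay within the standard library:

```python
import copy
import json
from typing import Dict, List


def apply_overrides(config: Dict, overrides: List[str]) -> Dict:
    """Return a copy of `config` with `a.b.c=value` overrides applied.

    Values are decoded as JSON when possible (so `5` -> 5, `[0, 100]` -> list);
    anything else, such as a file path, is kept as a plain string.
    """
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        try:
            value = json.loads(raw_value)
        except json.JSONDecodeError:
            value = raw_value
        *parents, leaf = dotted_key.split(".")
        node = merged
        for key in parents:
            node = node.setdefault(key, {})  # create missing sections on the way down
        node[leaf] = value
    return merged


base = {"Global": {"epoch_num": 1200, "use_gpu": True}}
print(apply_overrides(base, ["Global.epoch_num=5",
                             "Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained"]))
```

Deep-copying keeps the original config untouched, so the same base dictionary can be reused for several runs with different overrides.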

## 7. Download the Best Model from W&B

In [None]:
import wandb
artifact = wandb.Api().artifact('fast-k203053/CoolOCR/model-gav9ely8:best', type='model')
artifact_dir = artifact.download()

[34m[1mwandb[0m:   1 of 1 files downloaded.  


In [None]:
artifact_dir

'/content/PaddleOCR/artifacts/model-gav9ely8:v2'
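The evaluation cell below hard-codes the artifact path, including the `:v2` version. A sketch that builds the same command from `artifact_dir` instead, so it keeps working when W&B resolves `:best` to a different version. It assumes the artifact stores weights under the `model_ckpt` prefix, as the hard-coded path does; `eval_command` is a hypothetical helper:

```python
import os
import shlex


def eval_command(artifact_dir: str, box_thresh: float = 0.6, unclip_ratio: float = 1.5) -> str:
    """Build the tools/eval.py command line for a downloaded model artifact."""
    checkpoint = os.path.join(artifact_dir, "model_ckpt")  # assumed weights prefix
    args = [
        "python3", "tools/eval.py",
        "-c", "configs/det/det_mv3_db.yml",
        "-o", f"Global.checkpoints={checkpoint}",
        f"PostProcess.box_thresh={box_thresh}",
        f"PostProcess.unclip_ratio={unclip_ratio}",
    ]
    # shlex.join quotes any argument containing shell-special characters
    return shlex.join(args)


print(eval_command("/content/PaddleOCR/artifacts/model-gav9ely8:v2"))
```

In the notebook, IPython's `!{...}` interpolation can run it directly: `!{eval_command(artifact_dir)}`.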

In [None]:
!python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="./artifacts/model-gav9ely8:v2/model_ckpt" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5

[34m[1mwandb[0m: Currently logged in as: [33mk200353-fast[0m ([33mfast-k203053[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Tracking run with wandb version 0.16.1
[34m[1mwandb[0m: Run data is saved locally in [35m[1m./output/db_mv3/wandb/run-20231209_164706-y7yb8lr9[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33mMyOCRModel[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/fast-k203053/CoolOCR[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/fast-k203053/CoolOCR/runs/y7yb8lr9[0m
[2023/12/09 16:47:07] ppocr INFO: Architecture : 
[2023/12/09 16:47:07] ppocr INFO:     Backbone : 
[2023/12/09 16:47:07] ppocr INFO:         model_name : large
[2023/12/09 16:47:07] ppocr INFO:         name : MobileNetV3
[2023/12/09 16:47:07] ppocr INFO:         scale : 0.5
[2023/12/09 16:47:07] ppocr INFO:     Head : 
[2023/12/09 16:47:07] ppocr INFO:         k : 50
[20
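`PostProcess.box_thresh` and `PostProcess.unclip_ratio` tune the DB (Differentiable Binarization) post-processing. Candidate boxes whose mean text probability falls below `box_thresh` are discarded, and each surviving region, which the model predicts in shrunk form, is expanded outward by an offset `D = A * r / L`, where `A` is the polygon's area, `L` its perimeter and `r` the unclip ratio (the formula from the DB paper). PaddleOCR performs the actual expansion with `pyclipper`; the sketch below only computes the offset, using the shoelace formula for the area:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area_perimeter(points: List[Point]) -> Tuple[float, float]:
    """Shoelace area and perimeter of a simple polygon with vertices in order."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
        perimeter += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return abs(twice_area) / 2.0, perimeter


def unclip_distance(points: List[Point], unclip_ratio: float = 1.5) -> float:
    """Offset D = A * r / L used to grow a shrunk text region back to full size."""
    area, perimeter = polygon_area_perimeter(points)
    return area * unclip_ratio / perimeter


# A 100 x 20 box: A = 2000, L = 240, so D = 2000 * 1.5 / 240
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(box, 1.5))  # prints: 12.5
```

A larger `unclip_ratio` therefore produces looser boxes around the text, while a higher `box_thresh` keeps fewer, more confident detections.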

## 8. Inference Testing

In [None]:
!python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./artifacts/model-gav9ely8:v2/model_ckpt"

[34m[1mwandb[0m: Currently logged in as: [33mk200353-fast[0m ([33mfast-k203053[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Tracking run with wandb version 0.16.1
[34m[1mwandb[0m: Run data is saved locally in [35m[1m./output/db_mv3/wandb/run-20231209_164846-t59m1by8[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33mMyOCRModel[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/fast-k203053/CoolOCR[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/fast-k203053/CoolOCR/runs/t59m1by8[0m
[2023/12/09 16:48:47] ppocr INFO: Architecture : 
[2023/12/09 16:48:47] ppocr INFO:     Backbone : 
[2023/12/09 16:48:47] ppocr INFO:         model_name : large
[2023/12/09 16:48:47] ppocr INFO:         name : MobileNetV3
[2023/12/09 16:48:47] ppocr INFO:         scale : 0.5
[2023/12/09 16:48:47] ppocr INFO:     Head : 
[2023/12/09 16:48:47] ppocr INFO:         k : 50
[20

## 9. Log the Inference Results to W&B

In [None]:
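wandb.init(project="CoolOCR")
# The artifact path must use your own entity. The original call referenced
# 'manan-goel/CoolOCR/model-gav9ely8:best', another user's project, which is
# the likely cause of the CommError below.
wandb.use_artifact('fast-k203053/CoolOCR/model-gav9ely8:best')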

[34m[1mwandb[0m: Currently logged in as: [33mk200353-fast[0m ([33mfast-k203053[0m). Use [1m`wandb login --relogin`[0m to force relogin


CommError: ignored

In [None]:
table = wandb.Table(columns=["Input Image", "Annotated Image"])

In [None]:
import glob
inp_imgs = sorted(glob.glob("./doc/imgs_en/*.jpg"), key=lambda x: x.split("/")[-1])
out_imgs = sorted(glob.glob("./output/det_db/det_results/*.jpg"), key=lambda x: x.split("/")[-1])

In [None]:
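import os

# Pair each input image with the detection result that has the same file name
out_by_name = {os.path.basename(path): path for path in out_imgs}
for inp in inp_imgs:
    out = out_by_name.get(os.path.basename(inp))
    if out is None:
        continue
    table.add_data(wandb.Image(inp), wandb.Image(out))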

In [None]:
wandb.log({
    "Predictions": table
})

In [None]:
wandb.finish()
