## Introduction
Optical Character Recognition (OCR) is a critical component in Computer Vision and Artificial Intelligence. Its applications span a variety of sectors, including document digitization, automated data entry, and license plate recognition. Unlike typical image processing tasks, OCR identifies, extracts, and digitizes written or printed characters from images or documents. This technology bridges the gap between the physical text world and the digital realm, allowing computers to understand and utilize written text within images.

In this guide, we show how to train PaddleOCR, a powerful OCR toolkit. You can read more about PaddleOCR in the [official repository](https://github.com/PaddlePaddle/PaddleOCR).

## Prerequisites

If you are running this notebook in Google Colab, select a GPU runtime before running any code:

`(Runtime --> Change Runtime Type --> Hardware accelerator --> GPU)`

We can validate this by running the `nvidia-smi` command, which shows the GPU driver version, the maximum supported CUDA version, each GPU's power and memory usage, and the currently running processes.

To get the installed CUDA toolkit version, use `nvcc --version`.

In [None]:
!nvidia-smi

In [None]:
!nvcc --version
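
If you prefer to read the CUDA release programmatically, the following is a minimal sketch. It assumes the `nvcc --version` output contains a `release X.Y` fragment; the helper names `parse_cuda_release` and `detect_cuda_release` are ours and are not part of any library.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Return the CUDA release (e.g. "11.7") found in `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def detect_cuda_release() -> Optional[str]:
    """Run nvcc if it is on PATH; return None when it is missing or fails."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_cuda_release(result.stdout) if result.returncode == 0 else None


# Sample line in the shape nvcc prints (assumed here for illustration).
sample = "Cuda compilation tools, release 11.7, V11.7.99"
print(parse_cuda_release(sample))
print(detect_cuda_release())
```

`detect_cuda_release()` returns `None` on machines without the CUDA toolkit, so the cell is safe to run on a CPU-only runtime.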

## Dependencies

First, we must install the PaddlePaddle framework to run the authors' training procedure. You can find the official installation instructions [here](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html).

There are pre-built pip packages for the following CUDA versions:

- CUDA toolkit 10.2
- CUDA toolkit 11.2
- CUDA toolkit 11.6
- CUDA toolkit 11.7

If one of these matches your setup, you can install the corresponding package with pip. In our case, CUDA 11.7 is installed.

In [None]:
# Uncomment the line that matches your setup

# CUDA 10.2
# !python3 -m pip install paddlepaddle-gpu==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple

# CUDA 11.2
# !python3 -m pip install paddlepaddle-gpu==2.4.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

# CUDA 11.6
# !python3 -m pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

# CUDA 11.7
!python3 -m pip install paddlepaddle-gpu==2.5.2.post117 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
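
The mapping between CUDA release and package can also be written down as a small lookup, which is handy when the same notebook runs on different machines. This is only a sketch: the package specs are copied from the commented install lines in this guide, the index URLs are left out, and `select_paddle_package` is a hypothetical helper name.

```python
from typing import Optional

# Package specs copied from the install commands in this guide.
PADDLE_GPU_PACKAGES = {
    "10.2": "paddlepaddle-gpu==2.4.2",
    "11.2": "paddlepaddle-gpu==2.4.2.post112",
    "11.6": "paddlepaddle-gpu==2.4.2.post116",
    "11.7": "paddlepaddle-gpu==2.5.2.post117",
}


def select_paddle_package(cuda_release: str) -> Optional[str]:
    """Return the pip spec for a CUDA release, or None if no wheel is listed."""
    return PADDLE_GPU_PACKAGES.get(cuda_release.strip())


for release in ("11.7", "12.1"):
    spec = select_paddle_package(release)
    print(release, "->", spec if spec else "no pre-built package listed; build from source")
```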


Otherwise, you need to build PaddlePaddle from source; please refer to [the installation guide](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html). You can also try later versions of the Paddle framework, but we have not tested them.
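
Whichever route you take, a quick check of the installation can save time later. The sketch below reports the installed version and whether the build has CUDA support, and it returns a plain dictionary instead of failing when `paddle` is missing. `check_paddle` is a hypothetical helper name.

```python
import importlib.util


def check_paddle() -> dict:
    """Summarize the local PaddlePaddle install without failing if it is absent."""
    if importlib.util.find_spec("paddle") is None:
        return {"installed": False}
    import paddle

    return {
        "installed": True,
        "version": paddle.__version__,
        "cuda_build": paddle.device.is_compiled_with_cuda(),
    }


info = check_paddle()
print(info)
# For a fuller self-test, PaddlePaddle also provides paddle.utils.run_check().
```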

Once the `paddle` package is installed, we can clone the PaddleOCR repository and install all necessary dependencies.

In [None]:
!git clone https://github.com/PaddlePaddle/PaddleOCR.git
!cd PaddleOCR && pip install -r requirements.txt
!pip install protobuf~=3.20 gdown

## Training

The PaddleOCR pipeline consists of several stages, including text detection and text recognition, and each stage is available in several model versions. In this guide, we train detection version 3 and recognition version 1.

### Data Preparation

#### Detection

The detection model is trained on the ICDAR2015 dataset. We have prepared it in the required format; you can download it [here](https://drive.google.com/file/d/1YvNp1HAfUGI5ao17zzuPR5ai2R14NrNf).

#### Recognition
The recognition training config in the PaddleOCR repository also suggests fine-tuning on ICDAR2015; a subset prepared for recognition training is included in [the same archive](https://drive.google.com/file/d/1YvNp1HAfUGI5ao17zzuPR5ai2R14NrNf).


In [None]:
%cd PaddleOCR
!gdown 1YvNp1HAfUGI5ao17zzuPR5ai2R14NrNf && unzip -q detection_recognition_train_data.zip
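
Before training, it can be worth sanity-checking a few label lines. The sketch below assumes the usual PaddleOCR detection label layout: an image path, a tab, and a JSON list of boxes with `transcription` and `points` fields. The sample line and the helper name `parse_det_label_line` are made up for illustration; check the files in the unpacked archive for the actual paths.

```python
import json
from typing import List, Tuple


def parse_det_label_line(line: str) -> Tuple[str, List[dict]]:
    """Split one detection label line into (image_path, list of box dicts)."""
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(annotation)
    for box in boxes:
        # Each box needs a text field and a polygon with at least four points.
        if "transcription" not in box or len(box["points"]) < 4:
            raise ValueError(f"malformed box in {image_path}: {box}")
    return image_path, boxes


# Hypothetical sample line in the assumed format.
sample_line = (
    'imgs/img_1.jpg\t[{"transcription": "EXIT", '
    '"points": [[10, 10], [90, 10], [90, 40], [10, 40]]}]'
)
path, boxes = parse_det_label_line(sample_line)
print(path, len(boxes), boxes[0]["transcription"])
```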

### Detection Training

Detection version 3 requires a multi-step training procedure. First, we need to train a teacher model. Let's begin by downloading the pre-trained weights.

In [None]:
!mkdir -p ./pretrain_models
# Download the pre-trained model of ResNet50_vd
!wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams

# Download the pre-trained model of MobileNetV3
!wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams

Now we can train the teacher model.

In [None]:
# Single GPU training
# You can pass Global.eval_batch_step to control the evaluation frequency
!python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Global.save_model_dir=./output/detection_teacher \
       Global.print_batch_step=20 \
       Train.loader.batch_size_per_card=2 # You can adjust the batch size based on your GPU VRAM
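
The `-o` options override individual entries of the YAML config by dotted path. The sketch below is a simplified illustration of that idea on a plain dictionary; it is not PaddleOCR's actual implementation, and `apply_override` is a hypothetical helper.

```python
import ast
from typing import Any, Dict


def apply_override(config: Dict[str, Any], option: str) -> Dict[str, Any]:
    """Apply one 'A.B.C=value' override to a nested dict, in place."""
    dotted_key, raw_value = option.split("=", 1)
    try:
        # Interpret numbers, booleans, and lists; keep anything else as a string.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Toy config with made-up default values.
config = {
    "Global": {"print_batch_step": 10},
    "Train": {"loader": {"batch_size_per_card": 8}},
}
apply_override(config, "Train.loader.batch_size_per_card=2")
apply_override(config, "Global.save_model_dir=./output/detection_teacher")
print(config)
```

Only the keys named on the command line change; everything else keeps the value from the YAML file.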

After training finishes, we extract the `Student` parameters from the checkpoint and save them as the teacher weights. We can then train the lightweight student model.

In [None]:
import paddle
# load pretrained model
all_params = paddle.load("output/detection_teacher/best_accuracy.pdparams")
# View the keys of the weight parameter
# print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if key.startswith("Student.")}
# View the keys of the model weight parameters
# print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/dml_teacher.pdparams")
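
To see what the extraction does without loading a real checkpoint, here is the same prefix-stripping idea on a toy dictionary. The key names are hypothetical stand-ins for real parameter names, and `extract_submodel` is our own helper. The sketch uses `startswith` so that only keys beginning with the prefix are kept.

```python
from typing import Any, Dict


def extract_submodel(params: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Keep only keys that start with `prefix` and strip the prefix from them."""
    return {
        key[len(prefix):]: value
        for key, value in params.items()
        if key.startswith(prefix)
    }


# Toy checkpoint: integers stand in for weight tensors.
toy_params = {
    "Student.backbone.conv1.weight": 1,
    "Student.head.conv.weight": 2,
    "Student2.backbone.conv1.weight": 3,
}
student = extract_submodel(toy_params, "Student.")
print(sorted(student))
```

Note that `Student2.` keys are not picked up, because the trailing dot is part of the prefix.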

In [None]:
# Single GPU training
# You can pass Global.eval_batch_step to control the evaluation frequency
!python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
       Global.save_model_dir=./output/detection_lightweight \
       Train.loader.batch_size_per_card=2

Checkpoints saved during training are stored in the `./output/detection_lightweight` directory. You can evaluate the best checkpoint as follows:

In [None]:
!python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.checkpoints=./output/detection_lightweight/best_accuracy
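
Text detection results are typically summarized by precision, recall, and their harmonic mean (often labeled hmean). As a reminder of how the three relate, here is a minimal sketch; the numbers are illustrative and are not results from this training run.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only.
print(round(hmean(0.8, 0.6), 4))
```

Because the harmonic mean is dominated by the smaller value, a model with high precision but poor recall still scores low.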

The `best_accuracy` checkpoint contains three sets of model parameters, corresponding to `Student`, `Student2`, and `Teacher` in the configuration file. The `Student` parameters can be extracted as follows:

In [None]:
import paddle
# load pretrained model
all_params = paddle.load("output/detection_lightweight/best_accuracy.pdparams")
# View the keys of the weight parameter
# print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if key.startswith("Student.")}
# View the keys of the model weight parameters
# print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/detection_v3_cml.pdparams")
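
To double-check which sub-models a checkpoint holds, you can group its keys by their first path component. The sketch below runs on hypothetical key names; with a real checkpoint you would pass `all_params.keys()` instead. `count_submodels` is our own helper.

```python
from collections import Counter
from typing import Iterable


def count_submodels(keys: Iterable[str]) -> Counter:
    """Count parameter keys per top-level prefix (the text before the first dot)."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Hypothetical keys standing in for a real checkpoint.
toy_keys = [
    "Student.backbone.conv1.weight",
    "Student.head.conv.weight",
    "Student2.backbone.conv1.weight",
    "Teacher.backbone.conv1.weight",
]
print(count_submodels(toy_keys))
```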

### Recognition Training

Recognition version 1 is trained in a single step.

Note that the Chinese and English models differ only in their training data, so we start from the Chinese config and override the data and dictionary paths.

In [None]:
!python3 tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml \
                        -o Global.save_model_dir=./output/rec_en_v1 \
                           Global.infer_image=./doc/imgs_words/en/word_1.jpg \
                           Global.character_dict_path=./ppocr/utils/en_dict.txt \
                           Global.save_res_path=./output/rec_en_v1/predicts_en_common_v2.0.txt \
                           Train.dataset.data_dir=./train_data/ic15_data/ \
                           Train.dataset.label_file_list=./train_data/ic15_data/rec_gt_train.txt \
                           Eval.dataset.data_dir=./train_data/ic15_data/ \
                           Eval.dataset.label_file_list=./train_data/ic15_data/rec_gt_test.txt
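
Since the character dictionary is swapped for the English one, it is worth confirming that every character in the labels appears in the dictionary before a long run. The sketch below assumes recognition label lines of the form `image_path<TAB>text` and a dictionary file with one character per line. The sample data and helper names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def parse_rec_label_line(line: str) -> Tuple[str, str]:
    """Split one recognition label line into (image_path, text)."""
    image_path, text = line.rstrip("\n").split("\t", 1)
    return image_path, text


def unknown_characters(texts: Iterable[str], dictionary: Set[str]) -> Set[str]:
    """Return the characters used in `texts` that are missing from `dictionary`."""
    return {ch for text in texts for ch in text if ch not in dictionary}


# Toy dictionary and label lines; a real run would read them from the
# dictionary file and the label files passed to the training command.
toy_dictionary = set("abcdefghijklmnopqrstuvwxyz0123456789")
toy_lines = ["word_1.png\tjoint", "word_2.png\tHello!"]
texts = [parse_rec_label_line(line)[1] for line in toy_lines]
print(sorted(unknown_characters(texts, toy_dictionary)))
```

An empty result means every label character is covered by the dictionary.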

## Conclusion
Once your models are trained, you can follow our [inference](https://colab.research.google.com/drive/1Rru8y0CKrBo1_Wj6RV0hqyrur-eX91vy?authuser=0#scrollTo=YHrilj060QYr) notebook to convert them to ONNX format and test them. Make sure to follow [the instructions](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/inference_en.md#1-convert-training-model-to-inference-model) for exporting the training checkpoint as an inference model, which is required for testing.