## Dataset Prep, Fine-Tuning and Model Export

This notebook demonstrates how to **prepare a dataset**, **fine-tune** a license plate OCR model using `fast-plate-ocr` and **export** the trained model for deployment using the `fast-plate-ocr` ecosystem (**ONNX**) or to other formats like **TFLite** and **CoreML**.

### Setup

Let's install `fast-plate-ocr` with the `train` and `onnx` extras, as well as the `tensorflow` backend for training (**JAX** and **PyTorch** can also be used for **training**).

In [None]:
!pip install fast-plate-ocr[train,onnx]
!pip install tensorflow[and-cuda]

In [None]:
# Silence TensorFlow debugging logs
%env TF_CPP_MIN_LOG_LEVEL=3
# Use TensorFlow as the Keras backend (JAX and PyTorch are also supported!)
%env KERAS_BACKEND=tensorflow

Let's also **download the dataset** we will use for **fine-tuning**. This dataset consists of **Colombian** vehicle plates. All credit for this dataset goes to https://github.com/jdbravo; it was taken from https://gitlab.com/jdbravo/plates-ocr-train.

*Note: This dataset wasn't used to train the pre-trained models, nor did we use any data from Colombia when pre-training the official `fast-plate-ocr` models.*

The downloaded dataset is **already** in the **format expected** by `fast-plate-ocr`. For details on **creating your own dataset**, see the [docs](https://ankandrew.github.io/fast-plate-ocr/1.0/training/dataset/).
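
As a rough illustration of that format, the sketch below parses a tiny in-memory annotations file. It assumes the two-column layout (`image_path`, `plate_text`) described in the linked docs; the file names and plate strings are made up for the example and are not taken from the Colombian dataset.

```python
import csv
import io

# Hypothetical annotations content, used only to illustrate the assumed layout.
sample_csv = """image_path,plate_text
train/0001.jpg,ABC123
train/0002.jpg,XYZ987
"""

# Each row maps an image (path relative to the CSV) to its plate text.
rows = list(csv.DictReader(io.StringIO(sample_csv)))
for row in rows:
    print(f"{row['image_path']} -> {row['plate_text']}")
```

The same loop should work on a real annotations file by swapping `io.StringIO(sample_csv)` for an opened file handle.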

In [None]:
!wget -q https://github.com/ankandrew/fast-plate-ocr/releases/download/arg-plates/colombia_dataset_example.zip && unzip -q colombia_dataset_example.zip -d colombia_dataset

Next, **download** the **Keras model** that we will **fine-tune**. Keep in mind that we provide a `.keras` counterpart for each of the `.onnx` models used for inference. This way, we can easily **fine-tune** any of the models supported by the library. For this example, we will **fine-tune** the `cct_xs_v1_global` model.

TIP: All the models can be found in this [**release**](https://github.com/ankandrew/fast-plate-ocr/releases/tag/arg-plates).

In [None]:
# Download the .keras model used for fine-tuning
!wget -q https://github.com/ankandrew/fast-plate-ocr/releases/download/arg-plates/cct_xs_v1_global.keras
# Download the plate config. This defines how license plate images and text should be preprocessed for OCR
# Although you can modify this to suit your needs, since we will be fine-tuning, we will use the exact same
# config that was used originally for training cct_xs_v1_global model
!wget -q https://github.com/ankandrew/fast-plate-ocr/releases/download/arg-plates/cct_xs_v1_global_plate_config.yaml
# Also download the model config. Since we will be fine-tuning the model above, we download the same config that
# was originally used to build and train the cct_xs_v1_global model
!wget -q https://github.com/ankandrew/fast-plate-ocr/releases/download/arg-plates/cct_xs_v1_global_model_config.yaml

### Inspecting the Dataset

Below we will use **scripts** that ship with `fast-plate-ocr` lib, which are immediately **available** after **installation**.

Let's first validate the dataset to see if anything is wrong with it. See more in [docs](https://ankandrew.github.io/fast-plate-ocr/1.0/training/cli/validate_dataset).

In [None]:
# Check the train annotations
!fast-plate-ocr validate-dataset \
  --annotations-file ./colombia_dataset/train_annotations.csv \
  --plate-config-file cct_xs_v1_global_plate_config.yaml

In [None]:
# Check the validation annotations
!fast-plate-ocr validate-dataset \
  --annotations-file ./colombia_dataset/valid_annotations.csv \
  --plate-config-file cct_xs_v1_global_plate_config.yaml

If you see errors when **validating** the dataset, you can pass the `--export-fixed annotations_fixed.csv` option. This creates an annotations `.csv` containing only the valid entries, skipping corrupted rows and malformed labels.
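
To give an intuition for what a "malformed label" can mean, here is a minimal sketch of a label check. It is not the library's implementation: the function `find_invalid_labels` is hypothetical, and the `alphabet` and `max_plate_slots` values are illustrative stand-ins for whatever the plate config YAML actually defines.

```python
def find_invalid_labels(labels, alphabet, max_plate_slots):
    """Return (index, label, reason) for labels that would not fit the config."""
    problems = []
    for i, label in enumerate(labels):
        if not label:
            problems.append((i, label, "empty label"))
        elif len(label) > max_plate_slots:
            problems.append((i, label, "too long"))
        elif any(ch not in alphabet for ch in label):
            problems.append((i, label, "character outside alphabet"))
    return problems


# Illustrative values only; the real ones come from the plate config YAML.
alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
max_plate_slots = 9
labels = ["ABC123", "abc123", "ABCDEFGHIJ", ""]

problems = find_invalid_labels(labels, alphabet, max_plate_slots)
for problem in problems:
    print(problem)
```

In practice, the `validate-dataset` command remains the reference check; a sketch like this is only handy for quick, ad hoc filtering.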

Next let's do a **sanity check** and show statistics about the data that will be used for **training**/**validation**.

In [None]:
!fast-plate-ocr dataset-stats \
  --annotations ./colombia_dataset/train_annotations.csv \
  --plate-config-file cct_xs_v1_global_plate_config.yaml

### Prepare for Training

Now that we have **validated** the dataset and **inspected its stats**, we can start **preparing for training**.

First, we will visualize what the model will **actually see** during training. This is a very important step in the workflow, as it is crucial to the model's ability to generalize and work well.

In [None]:
# We invoke it this way so we can visualize it properly in this notebook
%matplotlib inline
%run -m fast_plate_ocr.cli.visualize_augmentation -- \
      --img-dir ./colombia_dataset/train \
      --columns 2 \
      --rows 4 \
      --show-original \
      --num-images 50 \
      --plate-config-file cct_xs_v1_global_plate_config.yaml

A **default** data augmentation is applied, but that doesn't mean you can't customize and use your own **augmentation pipeline**. We use [albumentations](https://albumentations.ai), so you can use any augmentation available in that library (there are a lot!).

Let's **write** a new **augmentation pipeline**, which will be later used when **training** the model.

In [None]:
import albumentations as A
import cv2

A.save(
    A.Compose(
        [
            A.Affine(
                translate_percent=(-0.02, 0.02),
                scale=(0.75, 1.10),
                rotate=(-15, 15),
                border_mode=cv2.BORDER_CONSTANT,
                fill=(0, 0, 0),
                shear=(0.0, 0.0),
                p=0.75,
            ),
            A.RandomBrightnessContrast(brightness_limit=0.10, contrast_limit=0.10, p=0.5),
            A.OneOf(
                [
                    A.HueSaturationValue(
                        hue_shift_limit=5, sat_shift_limit=10, val_shift_limit=10, p=0.7
                    ),
                    A.RGBShift(r_shift_limit=10, g_shift_limit=10, b_shift_limit=10, p=0.3),
                ],
                p=0.3,
            ),
            A.RandomGamma(gamma_limit=(95, 105), p=0.20),
            A.ToGray(p=0.05),
            A.OneOf(
                [
                    A.GaussianBlur(sigma_limit=(0.2, 0.5), p=0.5),
                    A.MotionBlur(blur_limit=(3, 3), p=0.5),
                ],
                p=0.2,
            ),
            A.OneOf(
                [
                    A.GaussNoise(std_range=(0.01, 0.03), p=0.2),
                    A.MultiplicativeNoise(multiplier=(0.98, 1.02), p=0.1),
                    A.ISONoise(intensity=(0.005, 0.02), p=0.1),
                    A.ImageCompression(quality_range=(55, 90), p=0.1),
                ],
                p=0.3,
            ),
            A.OneOf(
                [
                    A.CoarseDropout(
                        num_holes_range=(1, 14),
                        hole_height_range=(1, 5),
                        hole_width_range=(1, 5),
                        p=0.2,
                    ),
                    A.PixelDropout(dropout_prob=0.02, p=0.3),
                    A.GridDropout(ratio=0.3, fill="random", p=0.3),
                ],
                p=0.5,
            ),
        ]
    ),
    filepath_or_buffer="custom_augmentation.yaml",
    data_format="yaml",
)


Feel free to **explore** and play with different **augmentations** for even better results!
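
When tuning the `p` values inside the `A.OneOf` blocks of the pipeline, it may help to estimate how often each child transform actually fires. The sketch below assumes the behavior documented by albumentations for `OneOf`: the children's `p` values are normalized and act as selection weights, and the selected child is then always applied. The helper `one_of_probabilities` is hypothetical and only does the arithmetic.

```python
def one_of_probabilities(block_p, child_ps):
    """Chance that each child of a OneOf block is applied to a given image."""
    total = sum(child_ps.values())
    return {name: block_p * p / total for name, p in child_ps.items()}


# Values copied from the blur and noise blocks of the pipeline above.
blur = one_of_probabilities(0.2, {"GaussianBlur": 0.5, "MotionBlur": 0.5})
noise = one_of_probabilities(
    0.3,
    {
        "GaussNoise": 0.2,
        "MultiplicativeNoise": 0.1,
        "ISONoise": 0.1,
        "ImageCompression": 0.1,
    },
)

for name, prob in {**blur, **noise}.items():
    print(f"{name}: {prob:.3f}")
```

Under that assumption, the children's effective probabilities always sum to the `p` of the enclosing `OneOf` block.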

The best way to validate the augmentation pipeline is to actually **visualize** the **results** applied to our training images. To do that, you can run the `visualize_augmentation` script (used above) with the `--augmentation-path` option pointing to the newly created **augmentation pipeline**.

Great, now we have our `custom_augmentation.yaml` that we can later use with the **training** script 🚀.

### Training the Model

We are now **ready** to **fine-tune** the **model** on the Colombian plates dataset!

Before running the training script, let's see how the **pre-trained** model performs on the Colombian dataset, so we have a baseline to compare against after fine-tuning. Note that **not a single** Colombian **plate** was used **to train** the original **pre-trained** model!

In [None]:
!fast-plate-ocr valid \
  --model ./cct_xs_v1_global.keras \
  --plate-config-file ./cct_xs_v1_global_plate_config.yaml \
  --annotations ./colombia_dataset/valid_annotations.csv

Not bad! We can see a `plate_acc: 0.8881`, which means that roughly **88.8%** of plates were **correctly** classified.

Keep in mind that `plate_acc` is a **strict** metric: it measures the fraction of license plates for which every character was predicted correctly. For a single plate, if the ground truth is ABC123 and the prediction is also ABC123, it would **score 1**. However, if the prediction was ABD123, it would **score 0**, as **not all characters** were correctly classified.
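
The strictness of `plate_acc` can be sketched in a few lines. This is an illustration of the idea rather than the library's metric code: `plate_accuracy` and `char_accuracy` are hypothetical helpers, and the per-character score is included only for contrast.

```python
def plate_accuracy(truths, predictions):
    """Fraction of plates whose predicted text matches the ground truth exactly."""
    if len(truths) != len(predictions):
        raise ValueError("truths and predictions must have the same length")
    return sum(t == p for t, p in zip(truths, predictions)) / len(truths)


def char_accuracy(truths, predictions):
    """Fraction of individual characters that match, position by position."""
    pairs = [(a, b) for t, p in zip(truths, predictions) for a, b in zip(t, p)]
    return sum(a == b for a, b in pairs) / len(pairs)


truths = ["ABC123", "XYZ987"]
predictions = ["ABC123", "XYZ937"]  # one wrong character in the second plate

plate_score = plate_accuracy(truths, predictions)
char_score = char_accuracy(truths, predictions)
print(f"plate accuracy: {plate_score:.3f}")
print(f"char accuracy:  {char_score:.3f}")
```

A single wrong character costs the whole plate under the strict metric, while the per-character score barely moves.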

See more in [Metrics](https://ankandrew.github.io/fast-plate-ocr/1.0/training/metrics/) for full details.

Now, let's **improve** that number by **fine-tuning** the **model**!

In [None]:
!fast-plate-ocr train \
  --model-config-file ./cct_xs_v1_global_model_config.yaml \
  --plate-config-file ./cct_xs_v1_global_plate_config.yaml \
  --annotations ./colombia_dataset/train_annotations.csv \
  --val-annotations ./colombia_dataset/valid_annotations.csv \
  --augmentation-path custom_augmentation.yaml \
  --epochs 30 \
  --batch-size 32 \
  --output-dir trained_models/ \
  --weights-path cct_xs_v1_global.keras \
  --label-smoothing 0.0 \
  --weight-decay 0.0005 \
  --lr 0.0005

We now see a val `plate_acc` of `0.97917`, meaning almost **98%** of the plates in the validation set were **correctly classified**! That's a gain of about **9 percentage points** (roughly **10%** in relative terms) **over** the **baseline** 🎉.

*Note: you might get slightly different results depending on your run, but they should roughly match these numbers.*
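
To make the comparison with the baseline explicit, the snippet below works through the arithmetic using the two `plate_acc` values reported in this notebook. The variable names are ours, and your own run will likely produce slightly different inputs.

```python
baseline_acc = 0.8881    # plate_acc of the pre-trained model
finetuned_acc = 0.97917  # val plate_acc after fine-tuning

# Difference in accuracy, expressed as a fraction (multiply by 100 for points).
absolute_gain = finetuned_acc - baseline_acc
# Gain relative to where the baseline started.
relative_gain = absolute_gain / baseline_acc
# Share of the baseline's misread plates that are now read correctly.
error_reduction = absolute_gain / (1 - baseline_acc)

print(f"absolute gain:   {absolute_gain * 100:.1f} percentage points")
print(f"relative gain:   {relative_gain * 100:.1f}%")
print(f"error reduction: {error_reduction * 100:.1f}%")
```

Looking at the error reduction is often the most telling view, since it shows how much of the remaining mistakes fine-tuning removed.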

### Export the model

Now that we've trained the model, we can **export** it to **ONNX** to use it within the `fast-plate-ocr` ecosystem or export it to **other formats** (e.g. **TFLite**, **CoreML**, etc.).

In [None]:
best_keras_model = "/content/trained_models/2025-07-06_15-29-13/ckpt-epoch_25-acc_0.981.keras"   # <--- Make sure to change this, yours will be different
exported_onnx = best_keras_model.replace(".keras", ".onnx")
!fast-plate-ocr export \
  --format onnx \
  --plate-config-file ./cct_xs_v1_global_plate_config.yaml \
  --simplify \
  --model {best_keras_model}

To export the newly trained model to other formats, check out the [docs](https://ankandrew.github.io/fast-plate-ocr/1.0/training/cli/export/).
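
Since the checkpoint path has to be edited by hand for every run, a small helper can pick the best checkpoint automatically. This is a sketch under one assumption: that all checkpoints follow the `ckpt-epoch_<N>-acc_<X>.keras` naming seen in the single example path above. The function `best_checkpoint` and the sample file names are hypothetical.

```python
import re

# Pattern inferred from the one checkpoint name shown in this notebook.
CKPT_PATTERN = re.compile(r"ckpt-epoch_(\d+)-acc_(\d+\.\d+)\.keras$")


def best_checkpoint(paths):
    """Return the path with the highest accuracy encoded in its file name."""
    scored = []
    for path in paths:
        match = CKPT_PATTERN.search(str(path))
        if match:
            epoch, acc = int(match.group(1)), float(match.group(2))
            # Ties on accuracy are broken in favor of the later epoch.
            scored.append((acc, epoch, str(path)))
    if not scored:
        raise ValueError("no checkpoint files matched the expected pattern")
    return max(scored)[2]


# Made-up file names for illustration.
candidates = [
    "trained_models/run/ckpt-epoch_10-acc_0.962.keras",
    "trained_models/run/ckpt-epoch_25-acc_0.981.keras",
    "trained_models/run/ckpt-epoch_30-acc_0.979.keras",
]
best = best_checkpoint(candidates)
print(best)
```

With real files, something like `best_checkpoint(Path("trained_models").rglob("*.keras"))` (using `pathlib.Path`) could replace the hard-coded list, provided the naming assumption holds.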

### Running Inference

Now we have the **ONNX** model **ready** to use with the [LicensePlateRecognizer](https://ankandrew.github.io/fast-plate-ocr/1.0/reference/inference/inference_class/) class! Running **inference** with it takes only a few lines of code.

In [None]:
from fast_plate_ocr import LicensePlateRecognizer

plate_recognizer = LicensePlateRecognizer(
    onnx_model_path=exported_onnx,
    plate_config_path="cct_xs_v1_global_plate_config.yaml",
)

To run inference we can simply call the `.run(...)` method, but **remember** `fast-plate-ocr` **expects** the **cropped plate**.

To **use** the **trained model** together with an actual **plate detector** (which **localizes** and **crops** the plate into the format expected by `fast-plate-ocr`), we can plug our newly trained and exported model into [**FastALPR**](https://github.com/ankandrew/fast-alpr).


Next, install fast-alpr:

In [None]:
!pip install fast-alpr[onnx]  # or fast-alpr[onnx-gpu] for GPU support!

We can easily integrate our custom ONNX model (trained on the Colombian dataset), with the following:

In [None]:
import cv2
from fast_alpr import ALPR

# Initialize the ALPR
alpr = ALPR(
    detector_model="yolo-v9-t-384-license-plate-end2end",
    ocr_model_path=exported_onnx,
    ocr_config_path="cct_xs_v1_global_plate_config.yaml",
)

You can find more details and options for the `ALPR` class in the FastALPR [docs](https://ankandrew.github.io/fast-alpr/latest/).

Let's try it with a random image grabbed from the web:

In [None]:
!wget https://upload.wikimedia.org/wikipedia/commons/7/71/2020_Renault_Logan_Intens_%28Colombia%29_front_view_02.png \
  -O test_plate.png

In [None]:
from google.colab.patches import cv2_imshow

# Load the image
image_path = "test_plate.png"
frame = cv2.imread(image_path)

# Draw predictions on the image
annotated_frame = alpr.draw_predictions(frame)

# Display the result
cv2_imshow(annotated_frame)

That's it! You now have your own plate recognition pipeline that you can use with a couple of lines of code 🚀.