# Test dotsOCR on the RBNR dataset

This is a ready-to-use notebook for running dotsOCR on the RBNR dataset. Just run the cells in order and see how it goes: the notebook also takes care of downloading the dataset and the model. You can edit the parameters and the dataset paths if you want to test dotsOCR on a different dataset.

## SECTION 1 - parameters and dependencies

*First* of all we need to install the dependencies:


**File and Dataset**

* **os** – filesystem operations
* **gdown** – download dataset from Google Drive
* **zipfile / tarfile** – extract compressed files
* **pathlib** – manage file paths

**Model and Inference**

* **dots_ocr** – our model, installed from GitHub
* **dots_ocr.utils** – helper functions for prompts
* **torch** – deep learning framework
* **transformers** – load model, processor, tokenizer
* **qwen_vl_utils** – prepare image + text input

**Text and Regex**

* **re** – extract digits from output
* **json** – read/write config files

**Evaluation**

* **scikit-learn (metrics)** – compute precision, recall, F1
* **matplotlib** – plot results, confusion matrices





In [None]:
!git clone https://github.com/rednote-hilab/dots.ocr.git

In [None]:
!python3 dots.ocr/tools/download_model.py  # download the dots.ocr model weights

In [None]:
!pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
%cd /content/dots.ocr
!pip install -e .
%cd ../

In [None]:
import os
import gdown
import tarfile
import zipfile

import torch
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
from qwen_vl_utils import process_vision_info

%cd /content/dots.ocr
from dots_ocr.utils import dict_promptmode_to_prompt
%cd ../

import json
import re
import pathlib

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt

Now let's define some **variables** that are used throughout the notebook. The **dataset paths**, the **prompt**, and **max_token** can be modified here.

In [None]:

#DATASET PARAMETERS
dataset_link = 'https://drive.google.com/uc?id=12W-bY7SuctltDqHhl-OkI1DzwrcGnvJM'
dataset_extract_path = './'
dataset_images_subfolder = '/content/cropped_RBNR_bib_dataset/images'
labels_path = '/content/cropped_RBNR_bib_dataset/labels.txt'

#MODEL PARAMETER
model_path = "dots.ocr/weights/DotsOCR"  # should not be modified (it is a path from the dotsOCR library)
attn_impl = "flash_attention_2"
device = 'cuda'
dtype = torch.bfloat16

#INFERENCE PARAMETERS
prompt_text = """What number is visible on the racing bib in this image?"""  # prompt that dotsOCR receives for inference
digit_regex = r'\b\d{2,6}\b'
max_token = 16  # maximum number of tokens to generate
temperature = 1
repetition_penalty = 1
max_digit_length = 6
problematic_img = ['set3_06_0.JPG']  # some images make the script crash

#EVALUATION PARAMETERS
predictions_output_path = '../predictions.txt'


## SECTION 2 ( optional ) - download dataset

I am kindly hosting the dataset on my Google Drive, though I don't know for how long. To download it from there, I use **gdown** to fetch the zip file, then the **zipfile** library to extract it.

In [None]:
os.makedirs(dataset_extract_path, exist_ok=True)

# Function to download a file if it is not already present
def download_if_needed(filename, url):
    file_path = os.path.join(dataset_extract_path, filename)
    if not os.path.exists(file_path):
        print(f"📥 Downloading {filename} from Google Drive...")
        gdown.download(url, file_path, quiet=False)
    return file_path

# Download the files
x_dev_path_compressed = download_if_needed("./dataset.zip", dataset_link)


In [None]:
# Path to the downloaded file
compressed_file = x_dev_path_compressed

os.makedirs(dataset_extract_path, exist_ok=True)

# Extract everything
# The downloaded archive is a zip file, so it is opened with zipfile (not tarfile)
with zipfile.ZipFile(compressed_file, "r") as zip_ref:
    zip_ref.extractall(path=dataset_extract_path)

print(f"✅ Files extracted to: {dataset_extract_path}")
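
The notebook imports both **zipfile** and **tarfile**, but the cell above only handles zip archives. If you point `dataset_link` at a different dataset, a minimal sketch like the one below could pick the right extractor. The helper name `extract_archive` and the demo file names are hypothetical and not part of the original notebook.

```python
import os
import tarfile
import tempfile
import zipfile


def extract_archive(archive_path: str, target_dir: str) -> str:
    """Extract a zip or tar archive and return the detected type.

    Hypothetical helper: not part of the original notebook.
    """
    os.makedirs(target_dir, exist_ok=True)
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path, "r") as zip_ref:
            zip_ref.extractall(path=target_dir)
        return "zip"
    if tarfile.is_tarfile(archive_path):
        # "r:*" lets tarfile detect the compression (gz, bz2, xz, none)
        with tarfile.open(archive_path, "r:*") as tar_ref:
            tar_ref.extractall(path=target_dir)
        return "tar"
    raise ValueError(f"Unsupported archive format: {archive_path}")


# Self-contained demo on a throwaway zip file (no download needed)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("images/example.txt", "hello")
    kind = extract_archive(demo_zip, os.path.join(tmp, "out"))
    extracted_ok = os.path.exists(
        os.path.join(tmp, "out", "images", "example.txt")
    )
    print(kind, extracted_ok)
```

In the notebook, this would be called as `extract_archive(x_dev_path_compressed, dataset_extract_path)` in place of the `zipfile.ZipFile` block.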


## SECTION 3 - load model and inference

In [None]:
# model_path, attn_impl and dtype come from the parameters cell in Section 1
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    attn_implementation=attn_impl,
    torch_dtype=dtype,
    device_map="auto",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

In [None]:
def dots_ocr_inference(image_path):

  prompt = prompt_text
  messages = [
          {
              "role": "user",
              "content": [
                  {
                      "type": "image",
                      "image": image_path
                  },
                  {"type": "text", "text": prompt}
              ]
          }
      ]

  # Prepare the image + text input
  text = processor.apply_chat_template(
      messages,
      tokenize=False,
      add_generation_prompt=True
  )

  image_inputs, video_inputs = process_vision_info(messages)
  inputs = processor(
      text=[text],
      images=image_inputs,
      videos=video_inputs,
      padding=True,
      return_tensors="pt",
  )

  inputs = inputs.to(device)  # use the device set in the parameters cell

  # Inference: Generation of the output
  generated_ids = model.generate(**inputs,
                                 max_new_tokens=max_token,
                                 temperature = temperature,
                                 repetition_penalty = repetition_penalty)

  generated_ids_trimmed = [
      out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
  ]

  output_text = processor.batch_decode(
      generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
  )

  # keep only the digit groups that satisfy the regex (2 to 6 digits in this notebook)
  digits = re.findall(digit_regex, output_text[0])
  pred = "".join(digits)

  return pred
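
To make the post-processing step above more concrete, here is a small sketch of how `digit_regex` behaves on a few made-up model outputs. The helper name `extract_bib_number` and the sample strings are hypothetical. Only the regex and the `re.findall` + `"".join` logic come from the notebook.

```python
import re

digit_regex = r'\b\d{2,6}\b'  # same pattern as in the parameters cell


def extract_bib_number(raw_text: str) -> str:
    """Hypothetical helper mirroring the last lines of dots_ocr_inference."""
    digits = re.findall(digit_regex, raw_text)
    return "".join(digits)


# Made-up outputs, just to show the behaviour of the pattern
samples = [
    "The number is 1234.",  # one clean group of digits
    "7",                    # a single digit is too short for {2,6}
    "Bib 12 and 345",       # two groups are concatenated
    "1234567",              # seven digits in a row never match
    "A123",                 # no word boundary between 'A' and '1'
]
results = [extract_bib_number(s) for s in samples]
for sample, result in zip(samples, results):
    print(f"{sample!r} -> {result!r}")
```

Note that several digit groups in one answer are joined into a single string. That is why the inference loop still needs the `max_digit_length` check.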

In [None]:
images_path = pathlib.Path(dataset_images_subfolder)
predictions = []

i = 0
for bib in sorted(os.listdir(images_path)):
    i += 1
    if bib in problematic_img:
      predictions.append('nan')
      continue
    print(f'computing {bib}')

    img_path = os.path.join(images_path, bib)
    result = dots_ocr_inference(img_path)

    # if the number has more than max_digit_length (6) digits, mark it as nan
    if len(result) > max_digit_length:
      result = 'nan'

    print(f'[{i}/{len(os.listdir(images_path))}] SUCCESS: {result}')
    predictions.append(result)

#save predictions
with open(predictions_output_path, 'w') as f:
    for line in predictions:
        f.write("".join(line) + "\n")



## SECTION 4 - evaluation

Now that we have our predictions, we evaluate the results in two ways:


*   **complete number**: a prediction counts as a True Positive only if the predicted number and the label match exactly
*   **by digit**: instead of evaluating the full number, we compare the single digits of each number position by position
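
As a quick illustration of the difference between the two modes, the sketch below counts TP / FP / FN on four made-up label/prediction pairs. The function names `count_full` and `count_digits` and the toy data are hypothetical. The counting rules are meant to mirror the evaluation functions of this notebook, working on in-memory lists instead of files.

```python
def count_full(labels, predictions):
    """Full-number counting: exact match or nothing."""
    tp = fp = fn = 0
    for label, pred in zip(labels, predictions):
        if pred == "nan":
            fn += 1          # no usable prediction
        elif pred == label:
            tp += 1          # exact match
        else:
            fp += 1          # wrong number
    return tp, fp, fn


def count_digits(labels, predictions):
    """Digit-level counting: compare position by position."""
    tp = fp = fn = 0
    for label, pred in zip(labels, predictions):
        if pred == "nan":
            fn += 1          # a skipped image counts as a single miss
            continue
        for i in range(max(len(label), len(pred))):
            true_digit = label[i] if i < len(label) else None
            pred_digit = pred[i] if i < len(pred) else None
            if true_digit is None:
                fp += 1      # extra predicted digit
            elif pred_digit is None:
                fn += 1      # missing digit
            elif true_digit == pred_digit:
                tp += 1      # correct digit
            else:
                fp += 1      # wrong digit
    return tp, fp, fn


# Toy data (made up for this example)
toy_labels = ["123", "45", "678", "1234"]
toy_predictions = ["123", "46", "nan", "123"]

full_counts = count_full(toy_labels, toy_predictions)
digit_counts = count_digits(toy_labels, toy_predictions)
print("full number (TP, FP, FN):", full_counts)
print("by digit    (TP, FP, FN):", digit_counts)
```

On this toy data, `"46"` and the truncated `"123"` are complete failures in the full-number mode, while the digit mode still gives credit for the digits that are right.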



In [None]:

def evaluate(labels_path: str, predictions_path: str) -> tuple[float, float, float]:

    labels = []
    with open(labels_path, 'r') as f:
        for line in f:
            labels.append(line.strip())

    predictions = []
    with open(predictions_path, 'r') as f:
        for line in f:
            predictions.append(line.strip())

    TP = 0
    FP = 0
    FN = 0

    for label, prediction in zip(labels, predictions):
        if prediction == 'nan':
            FN +=1
            continue

        if prediction == label:
            TP+=1
            continue

        FP+=1

    P = TP / (TP + FP) if (TP + FP) > 0 else 0.0
    R = TP / (TP + FN) if (TP + FN) > 0 else 0.0
    F1 = 2 * (R * P) / (R + P) if (R + P) > 0 else 0.0
    return P, R, F1
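
One thing to keep in mind: `zip(labels, predictions)` stops at the shorter of the two lists, so a missing line in either file would be dropped silently. A minimal sketch of a stricter pairing is shown below. The helper name `pair_strict` is hypothetical and not part of the original notebook.

```python
def pair_strict(labels, predictions):
    """Pair labels with predictions, refusing to silently drop extra lines.

    Hypothetical helper: not part of the original notebook.
    """
    if len(labels) != len(predictions):
        raise ValueError(
            f"{len(labels)} labels but {len(predictions)} predictions"
        )
    return list(zip(labels, predictions))


# Same length: behaves like zip
pairs = pair_strict(["123", "45"], ["123", "46"])
print(pairs)

# Different length: raises instead of truncating
try:
    pair_strict(["123", "45"], ["123"])
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(f"mismatch detected: {err}")
```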


In [None]:

def evaluate_digit(labels_path: str, predictions_path: str) -> tuple[float, float, float]:

    labels = []
    with open(labels_path, 'r') as f:
        for line in f:
            labels.append(line.strip())

    predictions = []
    with open(predictions_path, 'r') as f:
        for line in f:
            predictions.append(line.strip())

    TP = 0
    FP = 0
    FN = 0

    for label, prediction in zip(labels, predictions):
        if prediction == 'nan':
            FN +=1
            continue

        max_len = max(len(label), len(prediction))

        for i in range(max_len):
            true_digit = label[i] if i < len(label) else None
            pred_digit = prediction[i] if i < len(prediction) else None

            if true_digit is not None and pred_digit is not None:  # both digits exist, so they can be compared
                if true_digit == pred_digit:
                    TP += 1  # correct prediction -> TP
                else:
                    FP += 1  # wrong prediction -> FP
            elif true_digit is not None and pred_digit is None:  # a digit was not predicted -> FN
                FN += 1
            elif pred_digit is not None and true_digit is None:
                FP += 1  # a predicted digit that does not exist in the label -> FP

    P = TP / (TP + FP) if (TP + FP) > 0 else 0.0
    R = TP / (TP + FN) if (TP + FN) > 0 else 0.0
    F1 = 2 * (R * P) / (R + P) if (R + P) > 0 else 0.0
    return P, R, F1


In [27]:
P_digit, R_digit, F1_digit = evaluate_digit(labels_path, predictions_output_path)
P_full, R_full, F1_full = evaluate(labels_path, predictions_output_path)

print("===== 📊 DOTS OCR RESULTS =====")
print("\nFull number evaluation:")
print(f"Precision:  {P_full*100:.2f}%")
print(f"Recall:     {R_full*100:.2f}%")
print(f"F1-score:   {F1_full*100:.2f}%")

print("\nDigit evaluation:")
print(f"Precision:  {P_digit*100:.2f}%")
print(f"Recall:     {R_digit*100:.2f}%")
print(f"F1-score:   {F1_digit*100:.2f}%")

print("\n=================================")


===== 📊 DOTS OCR RESULTS =====

Full number evaluation:
Precision:  84.19%
Recall:     92.71%
F1-score:   88.25%

Digit evaluation:
Precision:  94.58%
Recall:     90.80%
F1-score:   92.65%

