# Test MonkeyOCR on the RBNR dataset

This is a ready-to-use notebook for testing MonkeyOCR on the RBNR dataset. You only need to run the cells: the notebook also downloads the dataset and the model. You can edit the parameters and the dataset path if you want to test MonkeyOCR on a different dataset.

## SECTION 1 - parameters and dependencies

First of all, here are the dependencies used in this notebook:


**File and Archive Handling**

* **os** – basic filesystem operations
* **gdown** – download files from Google Drive
* **zipfile** – extract `.zip` archives
* **pathlib** – manage file paths

**Regex**

* **re** – pattern matching and digit extraction

**Evaluation (scikit-learn)**

* **precision_score** – compute precision
* **recall_score** – compute recall
* **f1_score** – compute F1



Let's start by cloning the library from GitHub, downloading the model and installing the requirements.

In [None]:
!git clone https://github.com/Yuliang-Liu/MonkeyOCR.git
!python MonkeyOCR/tools/download_model.py -n MonkeyOCR-pro-3B
!pip install -r MonkeyOCR/requirements.txt

We also need a few other libraries:

In [None]:
import os
import pathlib
import re
import zipfile

import gdown

from sklearn.metrics import precision_score, recall_score, f1_score

Now let's define some **variables** that are used throughout the notebook. The **Google Drive link** of the dataset and the **paths** can be modified here.

In [None]:
# dataset
data_link = 'https://drive.google.com/uc?id=12W-bY7SuctltDqHhl-OkI1DzwrcGnvJM'
dataset_folder = "./"  # where the archive is downloaded
extract_path = "./"  # where the archive is extracted
images_folder = "../cropped_RBNR_bib_dataset/images/"  # relative to MonkeyOCR/, where inference is run

# evaluation
labels_path = "./labels.txt"
predictions_output_path = "./predictions.txt"

In [None]:
os.makedirs(dataset_folder, exist_ok=True)

# Download a file from Google Drive if it is not already present
def download_if_needed(filename, url):
    file_path = os.path.join(dataset_folder, filename)
    if not os.path.exists(file_path):
        print(f"📥 Downloading {filename} from Google Drive...")
        gdown.download(url, file_path, quiet=False)
    return file_path

# Download the dataset archive
x_dev_path_compressed = download_if_needed("dataset.zip", data_link)


In [None]:
# Path to the downloaded archive
compressed_file = x_dev_path_compressed

os.makedirs(extract_path, exist_ok=True)

# Extract everything from the zip archive
with zipfile.ZipFile(compressed_file, "r") as zip_ref:
    zip_ref.extractall(path=extract_path)

print(f"✅ Files extracted to: {extract_path}")
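The extraction step assumes that the downloaded file is a valid zip archive. If the download fails, the saved file may not be one. A minimal sketch of a guarded variant, using a hypothetical helper named `extract_zip` (not part of the notebook), checks the file with `zipfile.is_zipfile` first:

```python
import zipfile
from pathlib import Path


def extract_zip(archive_path: str, target_dir: str) -> list[str]:
    """Extract a zip archive and return the names of its members.

    Hypothetical helper for illustration only; it raises ValueError
    when the file is not a valid zip archive.
    """
    if not zipfile.is_zipfile(archive_path):
        raise ValueError(f"{archive_path} is not a valid zip archive")
    # Create the target folder if needed, then extract everything into it
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        zip_ref.extractall(path=target_dir)
        return zip_ref.namelist()
```

In this notebook it could be called as `extract_zip(compressed_file, extract_path)`.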


Here we run inference via the CLI. To change parameters such as the **prompt**, you need to edit the `MonkeyOCR/parse.py` file.

In [None]:
%cd MonkeyOCR/
!python ./parse.py $images_folder -t text
%cd ../
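Outside a notebook, the `%cd` and `!` magics are not available. A rough equivalent with `subprocess` is sketched below, assuming the same `parse.py` arguments as in the cell above. The helper names `build_parse_command` and `run_inference` are hypothetical and not part of MonkeyOCR.

```python
import subprocess
import sys


def build_parse_command(images_folder: str) -> list[str]:
    """Mirror the command line of the notebook cell: parse.py <folder> -t text."""
    return [sys.executable, "parse.py", images_folder, "-t", "text"]


def run_inference(images_folder: str, repo_dir: str = "MonkeyOCR") -> None:
    """Run parse.py from inside repo_dir, as the %cd magic does in the notebook."""
    subprocess.run(build_parse_command(images_folder), cwd=repo_dir, check=True)
```

Because the command runs with `cwd=repo_dir`, `images_folder` must still be given relative to the `MonkeyOCR/` folder.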

Since the outputs are stored in the `MonkeyOCR/output` directory, we need to read the predictions from there.

In [None]:
predictions = []
output_path = pathlib.Path('MonkeyOCR/output')
for res in sorted(os.listdir(output_path)):  # for each result folder in the output directory
    markdown = res + "_text_result.md"  # name of the markdown file that contains the recognized text
    markdown_path = output_path / res / markdown

    pred = 'nan'  # placeholder used when no number is found
    with open(markdown_path, 'r') as f:
        for text in f:
            text = text.strip()
            if re.fullmatch(r'\d+', text):  # the line contains only digits
                pred = text  # keep the first number predicted
                break
    predictions.append(pred)
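The extraction rule in the cell above (keep the first line made only of digits, otherwise fall back to `'nan'`) can be tried in isolation. This is a minimal sketch that assumes whitespace around the number should be ignored; `first_number` is a hypothetical helper, not part of the notebook.

```python
import re


def first_number(lines):
    """Return the first line made only of digits, or 'nan' if there is none."""
    for line in lines:
        candidate = line.strip()
        if re.fullmatch(r"\d+", candidate):
            return candidate
    return "nan"


print(first_number(["# Marathon\n", "1234\n", "56\n"]))  # prints 1234
print(first_number(["no digits here\n"]))                # prints nan
```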

In [None]:
# save the predictions to a file

with open(predictions_output_path, 'w') as f:
    for line in predictions:
        f.write(line + "\n")

## SECTION 4 - evaluation

Now that we have our predictions, we evaluate the results in two ways:


*   **complete number**: a prediction counts as a True Positive only if the predicted number matches the label exactly
*   **by digit**: instead of evaluating the full number, we compare the individual digits of each number, position by position
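To make the difference between the two modes concrete, here is a small worked sketch for a single label/prediction pair. It assumes the position-by-position counting rules described above; `digit_counts` is a hypothetical helper used only for illustration.

```python
def digit_counts(label: str, prediction: str) -> tuple[int, int, int]:
    """Count (TP, FP, FN) digit by digit for one label/prediction pair."""
    tp = fp = fn = 0
    for i in range(max(len(label), len(prediction))):
        true_digit = label[i] if i < len(label) else None
        pred_digit = prediction[i] if i < len(prediction) else None
        if true_digit is None:
            fp += 1  # extra predicted digit
        elif pred_digit is None:
            fn += 1  # missing digit
        elif true_digit == pred_digit:
            tp += 1  # correct digit
        else:
            fp += 1  # wrong digit
    return tp, fp, fn


label, prediction = "1234", "1284"
print(prediction == label)              # prints False
print(digit_counts(label, prediction))  # prints (3, 1, 0)
print(digit_counts("1234", "12"))       # prints (2, 0, 2)
print(digit_counts("12", "1234"))       # prints (2, 2, 0)
```

For the pair `"1234"` / `"1284"`, the complete-number mode records one False Positive, while the by-digit mode still credits the three correct digits.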



In [None]:

def evaluate_digit(labels_path: str, predictions_path: str) -> tuple[float, float, float]:
    """Compute precision, recall and F1 by comparing labels and predictions digit by digit."""

    labels = []
    with open(labels_path, 'r') as f:
        for line in f:
            labels.append(line.strip())

    predictions = []
    with open(predictions_path, 'r') as f:
        for line in f:
            predictions.append(line.strip())

    TP = 0
    FP = 0
    FN = 0

    for label, prediction in zip(labels, predictions):
        if prediction == 'nan':  # no number was predicted: counted as a single FN
            FN += 1
            continue

        max_len = max(len(label), len(prediction))

        for i in range(max_len):
            true_digit = label[i] if i < len(label) else None
            pred_digit = prediction[i] if i < len(prediction) else None

            if true_digit is not None and pred_digit is not None:  # both digits exist, so they can be compared
                if true_digit == pred_digit:
                    TP += 1  # right prediction -> TP
                else:
                    FP += 1  # wrong prediction -> FP
            elif true_digit is not None and pred_digit is None:
                FN += 1  # a digit of the label was not predicted -> FN
            elif pred_digit is not None and true_digit is None:
                FP += 1  # a predicted digit does not exist in the label -> FP

    P = TP / (TP + FP) if (TP + FP) > 0 else 0.0
    R = TP / (TP + FN) if (TP + FN) > 0 else 0.0
    F1 = 2 * (R * P) / (R + P) if (R + P) > 0 else 0.0
    return P, R, F1


In [None]:

def evaluate(labels_path: str, predictions_path: str) -> tuple[float, float, float]:
    """Compute precision, recall and F1 on complete numbers (exact match with the label)."""

    labels = []
    with open(labels_path, 'r') as f:
        for line in f:
            labels.append(line.strip())

    predictions = []
    with open(predictions_path, 'r') as f:
        for line in f:
            predictions.append(line.strip())

    TP = 0
    FP = 0
    FN = 0

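    for label, prediction in zip(labels, predictions):
        if prediction == 'nan':  # no number was predicted -> FN
            FN += 1
            continue

        if prediction == label:  # exact match -> TP
            TP += 1
            continue

        FP += 1  # a number was predicted but it does not match the label -> FP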

    P = TP / (TP + FP) if (TP + FP) > 0 else 0.0
    R = TP / (TP + FN) if (TP + FN) > 0 else 0.0
    F1 = 2 * (R * P) / (R + P) if (R + P) > 0 else 0.0
    return P, R, F1
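
Both evaluation functions use the standard definitions of the metrics, with each metric set to 0 when its denominator is 0:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot P \cdot R}{P + R}
$$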


In [None]:
P_digit, R_digit, F1_digit = evaluate_digit(labels_path, predictions_output_path)
P_full, R_full, F1_full = evaluate(labels_path, predictions_output_path)

print("===== 📊 MONKEYOCR RESULTS =====")
print("\nFull number evaluation:")
print(f"Precision:  {P_full*100:.2f}%")
print(f"Recall:     {R_full*100:.2f}%")
print(f"F1-score:   {F1_full*100:.2f}%")

print("\nDigit evaluation:")
print(f"Precision:  {P_digit*100:.2f}%")
print(f"Recall:     {R_digit*100:.2f}%")
print(f"F1-score:   {F1_digit*100:.2f}%")

print("\n=================================")
