<a href="https://colab.research.google.com/github/Bakir-Bhuyain/prescription_analyzer_model/blob/main/Untitled10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install transformers torch opencv-python-headless pillow



In [2]:
import torch
import cv2
import numpy as np
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from google.colab import files

# --- 1. Load Model (Do this once) ---
# We load the model and processor once to save time
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-handwritten')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-handwritten')
print("✅ Model and Processor Loaded.")

def extract_multi_line_text(image_bytes):
    """
    Combines OpenCV for box detection and TrOCR for text recognition
    to extract multi-line, ordered text from an image.

    Args:
        image_bytes (bytes): The image file as raw bytes.

    Returns:
        str: The extracted and organized text.
    """

    # --- 2. OpenCV Part: Find Text Boxes ---

    # Load image with OpenCV
    # Convert bytes to a numpy array
    nparr = np.frombuffer(image_bytes, np.uint8)
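    # Decode image (cv2.imdecode returns None if the bytes are not a readable image)
    orig_image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    if orig_image is None:
        return "Could not decode the image. Please upload a valid image file."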

    # We'll work on a grayscale version
    gray = cv2.cvtColor(orig_image, cv2.COLOR_BGR2GRAY)

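    # Apply adaptive thresholding to get a clean, binary image (text becomes white on black)
    # This handles varied lighting (like shadows on a prescription) better than a single global threshold
    # The last two arguments are the neighborhood block size (11 px) and the constant C (10)
    # subtracted from the local weighted mean; both need tuning per image set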
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 10
    )

    # Find outer contours only (RETR_EXTERNAL); the hierarchy output is not needed
    contours, _ = cv2.findContours(
        binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    # Store bounding boxes
    boxes = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)

        # --- Filter out noise ---
        # You must tune these values for your images
        # Here, we ignore boxes that are too small
        if w > 20 and h > 20: # Example filter: must be at least 20px wide and high
            boxes.append((x, y, w, h))

    if not boxes:
        return "No text boxes found. Try adjusting OpenCV filters."

    # --- 3. Sorting Part: The "Scattered Text" Fix ---

    # Sort boxes:
    # 1. Primary sort: top-to-bottom (by 'y' coordinate)
    # 2. Secondary sort: left-to-right (by 'x' coordinate)
    # We use a threshold to group boxes into "lines"

    boxes.sort(key=lambda b: b[1]) # Sort by 'y' coordinate (top)

    sorted_lines = []
    current_line = [boxes[0]]

    for box in boxes[1:]:
        # Get the average 'y' of the current line
        avg_y = sum([b[1] for b in current_line]) / len(current_line)

        # Get the height of the current box
        box_h = box[3]

        # Check if the box is on the same line (y-coordinate is close)
        # We use the box height as a threshold (e.g., within 70% of its height)
        if abs(box[1] - avg_y) < (box_h * 0.7):
            current_line.append(box)
        else:
            # New line detected
            # Sort the completed line by 'x' (left-to-right)
            current_line.sort(key=lambda b: b[0])
            sorted_lines.append(current_line)
            current_line = [box] # Start a new line

    # Add the last line
    current_line.sort(key=lambda b: b[0])
    sorted_lines.append(current_line)


    # --- 4. TrOCR Part: Read Text in Order ---

    # Convert the original *color* OpenCV image to PIL Image for TrOCR
    pil_image = Image.fromarray(cv2.cvtColor(orig_image, cv2.COLOR_BGR2RGB))

    # Define a small padding
    padding = 5

    full_text = []

    for line in sorted_lines:
        line_text = []
        for (x, y, w, h) in line:
            # Crop the *original* image
            # Apply padding to give TrOCR context
            cropped_img = pil_image.crop((
                max(x - padding, 0),
                max(y - padding, 0),
                min(x + w + padding, pil_image.width),
                min(y + h + padding, pil_image.height)
            ))

            # Run TrOCR
            pixel_values = processor(images=cropped_img, return_tensors="pt").pixel_values
            generated_ids = model.generate(pixel_values)
            generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

            line_text.append(generated_text)

        # Join the recognized pieces of this line with spaces; lines are joined with newlines on return
        full_text.append(" ".join(line_text))

    return "\n".join(full_text)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-base-handwritten and are newly initialized: ['encoder.pooler.dense.bias', 'encoder.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Model and Processor Loaded.
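
The line-grouping step inside `extract_multi_line_text` is easier to tune when it can be run on hand-made boxes without loading an image or the model. The sketch below pulls the same logic into a standalone helper; the name `group_boxes_into_lines` and the `tolerance` parameter are illustrative and not part of the notebook above.

```python
def group_boxes_into_lines(boxes, tolerance=0.7):
    """Group (x, y, w, h) boxes into lines, top-to-bottom then left-to-right.

    `tolerance` is the fraction of a box's height within which its top edge
    must fall from the running average top of the current line (0.7 mirrors
    the value used in extract_multi_line_text).
    """
    if not boxes:
        return []

    # Primary sort: top edge (y coordinate)
    ordered = sorted(boxes, key=lambda b: b[1])
    lines = [[ordered[0]]]

    for box in ordered[1:]:
        current = lines[-1]
        avg_y = sum(b[1] for b in current) / len(current)
        if abs(box[1] - avg_y) < box[3] * tolerance:
            current.append(box)   # same line
        else:
            lines.append([box])   # start a new line

    # Secondary sort: left edge (x coordinate) within each line
    return [sorted(line, key=lambda b: b[0]) for line in lines]


# Hypothetical boxes: two on a first line (given out of order), two on a second
demo = [(120, 12, 40, 30), (10, 10, 50, 30), (15, 80, 60, 30), (90, 84, 30, 30)]
lines = group_boxes_into_lines(demo)
for line in lines:
    print(line)
```

With this helper, different `tolerance` values can be tried against box lists printed from a real image before re-running the much slower TrOCR step.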


In [3]:
# --- 5. Run the Extraction ---

# Upload a file
uploaded = files.upload()

if not uploaded:
    print("No file uploaded.")
else:
    # Get the first uploaded file
    file_name = next(iter(uploaded))
    image_bytes = uploaded[file_name]

    print("Processing image... this may take a moment.")

    # Run our function
    organized_text = extract_multi_line_text(image_bytes)

    print("\n--- Extracted Text ---")
    print(organized_text)

Saving pres.jpg to pres.jpg
Processing image... this may take a moment.

--- Extracted Text ---
fr spizzed
thrift . prom . mu.
sun
al n. ayr. 50 oomy
comm ebd 0 1 ply sp 4 phantily
ozd q
tab propying 1. la
onon
of cf ) fy of 9
0
te b. pis expro 20 ny
evil
rt. 0 of in. finct .
ta b. 2 w. spa
a sant
yt of of
0 tc. ls le evil 42 50. ory .
qt at max-
tab . 2. ji s. 8
race- neo
quartermasterly 330
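
The output above is heavily fragmented. One plausible cause, not verified here, is that every contour larger than 20×20 px is sent to TrOCR on its own, so the model often sees isolated strokes rather than words, while the `microsoft/trocr-base-handwritten` model card describes the checkpoint as intended for single text-line images. A minimal sketch of one possible mitigation is to merge horizontally adjacent boxes within each detected line before cropping. The helper name `merge_close_boxes` and the `max_gap` default are assumptions for illustration and would need tuning.

```python
def merge_close_boxes(line, max_gap=15):
    """Merge (x, y, w, h) boxes on one line whose horizontal gap is <= max_gap pixels.

    Returns a left-to-right list of merged boxes. Use a very large max_gap to
    collapse the whole line into a single crop.
    """
    merged = []
    for x, y, w, h in sorted(line, key=lambda b: b[0]):
        if merged:
            mx, my, mw, mh = merged[-1]
            # Gap between the right edge of the previous box and this box's left edge
            if x - (mx + mw) <= max_gap:
                left = min(mx, x)
                top = min(my, y)
                right = max(mx + mw, x + w)
                bottom = max(my + mh, y + h)
                merged[-1] = (left, top, right - left, bottom - top)
                continue
        merged.append((x, y, w, h))
    return merged


# Hypothetical line: two nearby fragments and one distant box
line = [(10, 10, 30, 30), (45, 12, 30, 28), (200, 8, 40, 32)]
words = merge_close_boxes(line)
print(words)
```

Applying such a helper to each entry of `sorted_lines` before the cropping loop would reduce the number of TrOCR calls and give each crop more context; whether it improves accuracy on a given prescription image has to be checked empirically.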
