Install the required packages. Also make sure you are connected to a T4 GPU or another GPU runtime in Colab.
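
One way to confirm that a GPU runtime is attached before loading the models is a quick check like the one below. This is only a sketch: `describe_runtime` is a hypothetical helper name, not part of this notebook, and it assumes `torch` is already available in the runtime (it reports so if it is not).

```python
import importlib.util


def describe_runtime() -> str:
    """Return a short message describing whether a CUDA GPU is visible."""
    # Avoid an ImportError if torch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed; run the pip install cell first."

    import torch

    if torch.cuda.is_available():
        # Device 0 is the first (and on Colab usually the only) GPU.
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No GPU detected; switch the runtime type to a GPU."


print(describe_runtime())
```

If the message reports that no GPU was detected, the model loading cell further down is likely to fail, because it places the GOT model on `cuda`.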

In [1]:
! pip install gradio transformers torch tiktoken verovio easyocr

Collecting gradio
  Downloading gradio-4.44.0-py3-none-any.whl.metadata (15 kB)
Collecting tiktoken
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting verovio
  Downloading verovio-4.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.0 kB)
Collecting easyocr
  Downloading easyocr-1.7.2-py3-none-any.whl.metadata (10 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting fastapi<1.0 (from gradio)
  Downloading fastapi-0.115.0-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.4.0-py3-none-any.whl.metadata (2.9 kB)
Collecting gradio-client==1.3.0 (from gradio)
  Downloading gradio_client-1.3.0-py3-none-any.whl.metadata (7.1 kB)
Collecting httpx>=0.24.1 (from gradio)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting orjson~=3.0 (from gradio)
  Downloading orjson-3.10.7-cp31

The application runs GOT OCR alongside EasyOCR and highlights a given keyword in the extracted text. The user uploads an image and enters a keyword, and gets results from both EasyOCR and GOT (a 580M end-to-end OCR 2.0 model). At the moment EasyOCR gives the best results for Hindi text and GOT for English text, so the application uses both.
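
As a rough sketch of the keyword highlighting step described above, the helper below wraps every match of the keyword in a `<mark>` tag. It is an illustration, not the notebook's own code: the name `highlight_keyword` and its `color` parameter are hypothetical, and it makes two choices of its own, namely case-insensitive matching and HTML-escaping of the OCR text before it is rendered.

```python
import html
import re


def highlight_keyword(text: str, keyword: str, color: str = "#87CEEB") -> str:
    """Return HTML-escaped text with every match of keyword wrapped in <mark>."""
    if not keyword:
        return html.escape(text)

    # re.escape treats the keyword as literal text, not as a regex pattern.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches so OCR output cannot inject HTML.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(
            f"<mark style='background-color:{color};'>"
            f"{html.escape(match.group(0))}</mark>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight_keyword("Invoice total: 42 < 50", "total"))
```

Escaping matters here because the results are shown in HTML components, so characters such as `<` in the recognized text would otherwise be interpreted as markup.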

In [2]:
import gradio as gr
from transformers import AutoTokenizer, AutoModel
import torch
import easyocr
from PIL import Image

def load_got_model():
    # trust_remote_code=True runs the model code shipped in the Hub repository.
    tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True)
    model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True, low_cpu_mem_usage=True, device_map='cuda', use_safetensors=True, pad_token_id=tokenizer.eos_token_id)
    # Switch to inference mode and make sure the model is on the GPU.
    return tokenizer, model.eval().cuda()

def load_easyocr():
    reader = easyocr.Reader(['en', 'hi'])
    return reader

tokenizer, got_model = load_got_model()
easyocr_reader = load_easyocr()

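def perform_ocr(image, keyword):
    # Gradio passes None when no image has been uploaded.
    if image is None:
        message = "Please upload an image."
        return message, message

    # Save the upload to disk so both engines can read it from a file path.
    image.save("temp_image.png")

    # GOT OCR 2.0 in plain-text OCR mode.
    got_res = got_model.chat(tokenizer, "temp_image.png", ocr_type='ocr')

    # EasyOCR: detail=0 returns only the recognized strings.
    easyocr_res = easyocr_reader.readtext("temp_image.png", detail=0)
    easyocr_res = ' '.join(easyocr_res)

    highlight_color = "#87CEEB"

    if keyword:
        # Wrap each exact (case-sensitive) match of the keyword in a <mark> tag.
        marked = f"<mark style='background-color:{highlight_color};'>{keyword}</mark>"
        got_highlighted = got_res.replace(keyword, marked)
        easyocr_highlighted = easyocr_res.replace(keyword, marked)
        return got_highlighted, easyocr_highlighted

    return got_res, easyocr_res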

interface = gr.Interface(
    fn=perform_ocr,
    inputs=[
        gr.Image(type="pil", label="Upload Image"),
        gr.Textbox(label="Keyword to Search", placeholder="Enter keyword to highlight")
    ],
    outputs=[
        gr.HTML(label="GOT OCR Result"),
        gr.HTML(label="EasyOCR Result")
    ],
    title="OCR Application using GOT 2.0 and EasyOCR",
    description="Upload an image to extract text using the GOT OCR 2.0 model and EasyOCR (English and Hindi), and search for keywords in the extracted text."
)

if __name__ == "__main__":
    interface.launch()


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/300 [00:00<?, ?B/s]

tokenization_qwen.py:   0%|          | 0.00/9.47k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/ucaslcl/GOT-OCR2_0:
- tokenization_qwen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


qwen.tiktoken:   0%|          | 0.00/2.56M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/149 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/986 [00:00<?, ?B/s]

modeling_GOT.py:   0%|          | 0.00/33.8k [00:00<?, ?B/s]

got_vision_b.py:   0%|          | 0.00/16.1k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/ucaslcl/GOT-OCR2_0:
- got_vision_b.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


render_tools.py:   0%|          | 0.00/1.99k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/ucaslcl/GOT-OCR2_0:
- render_tools.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/ucaslcl/GOT-OCR2_0:
- modeling_GOT.py
- got_vision_b.py
- render_tools.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


model.safetensors:   0%|          | 0.00/1.43G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/117 [00:00<?, ?B/s]



Progress: |██████████████████████████████████████████████████| 100.0% Complete



Progress: |██████████████████████████████████████████████████| 100.0% Complete
Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://309056624da9aa8c04.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
