Reference: https://huggingface.co/nanonets/Nanonets-OCR-s

In [1]:
from PIL import Image
from transformers import AutoTokenizer, AutoProcessor, AutoModelForImageTextToText

In [2]:
import os 
print(os.getenv("CONDA_DEFAULT_ENV"))

stable_env


### Load the Model

In [3]:
model_path = "nanonets/Nanonets-OCR-s"

In [4]:
model = AutoModelForImageTextToText.from_pretrained(
    model_path, 
    dtype="auto", 
    device_map="auto", 
    attn_implementation="flash_attention_2"
).eval()

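The load call above requests `attn_implementation="flash_attention_2"`, which relies on the separate `flash_attn` package being installed. The sketch below shows one way to fall back to `"sdpa"` when that package is missing. The helper name `pick_attn_implementation` is hypothetical, and the check only tests whether the package can be imported; it does not verify that the GPU supports FlashAttention.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash_attn is importable, else "sdpa"."""
    # find_spec returns None when the package cannot be located.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


attn_impl = pick_attn_implementation()
print(attn_impl)
```

The returned string could then be passed as `attn_implementation=attn_impl` in the `from_pretrained` call.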

In [5]:
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)

The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.



### Load the Image

In [6]:
image_path = "/home/aritrad/test_images/electric.PNG"
image = Image.open(image_path)

In [7]:
prompt = """Extract the text from the above document as if you were reading it naturally. Return the tables in html format. Return the equations in LaTeX representation. If there is an image in the document and image caption is not present, add a small description of the image inside the <img></img> tag; otherwise, add the image caption inside <img></img>. Watermarks should be wrapped in brackets. Ex: <watermark>OFFICIAL COPY</watermark>. Page numbers should be wrapped in brackets. Ex: <page_number>14</page_number> or <page_number>9/22</page_number>. Prefer using ☐ and ☑ for check boxes."""

### Create Chat Message

In [8]:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image", "image": f"file://{image_path}"},
        {"type": "text", "text": prompt},
    ]},
]

In [9]:
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt")
inputs = inputs.to(model.device)

### Generate

In [10]:
%%time
output_ids = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,   # sample instead of greedy decoding; output can vary between runs
    temperature=0.7,  # sampling temperature; only used when do_sample=True
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, output_ids)]

CPU times: user 9.18 s, sys: 251 ms, total: 9.43 s
Wall time: 6.66 s
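The slicing step in the cell above assumes that each generated sequence starts with its prompt tokens, so the first `len(input_ids)` entries are cut off before decoding. A minimal sketch of the same idea on plain Python lists, with made-up token ids and a hypothetical helper `trim_prompt`:

```python
def trim_prompt(prompt_ids, full_ids):
    """Return the part of full_ids that follows the prompt."""
    return full_ids[len(prompt_ids):]


# Made-up token ids: each full sequence is its prompt plus new tokens.
prompt_batch = [[101, 7, 8], [101, 9, 10]]
full_batch = [[101, 7, 8, 42, 43], [101, 9, 10, 44]]

new_tokens = [trim_prompt(p, f) for p, f in zip(prompt_batch, full_batch)]
print(new_tokens)  # [[42, 43], [44]]
```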


In [11]:
output_text = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(output_text[0])

# WHY STUDY ELECTRICAL ENGINEERING?

The engineering discipline of electrical engineering deals with the study and application of systems which use electricity, electronics and electromagnetism.

*Electrical engineering overlaps with a wide array of disciplines as shown here.*

- *Any student / professional interested in working in these domains / related areas must require the understanding of*
  - Electrical Engineering
  - Computer Science (Application and Solution)
  - Electronics Engineering
  - Mechanical Engineering
  - Material Science
  - Communication Technology
  - Mathematics (ODEs, PDEs, Matrices, Linear Algebra)

<img>A Venn diagram showing overlap between Electrical Engineering and other fields.</img>
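The prompt asks the model to wrap image descriptions, watermarks, and page numbers in `<img>`, `<watermark>`, and `<page_number>` tags, so these can be pulled out of the output afterwards. The following is a rough sketch using a regular expression; `extract_tags` is a hypothetical helper, and it assumes the tags are well formed and not nested.

```python
import re

# Matches <img>...</img>, <watermark>...</watermark>, <page_number>...</page_number>.
TAG_PATTERN = re.compile(r"<(img|watermark|page_number)>(.*?)</\1>", re.DOTALL)


def extract_tags(text):
    """Group the tagged spans found in text by tag name."""
    found = {"img": [], "watermark": [], "page_number": []}
    for tag, body in TAG_PATTERN.findall(text):
        found[tag].append(body.strip())
    return found


sample = (
    "Intro text\n"
    "<img>A Venn diagram showing overlap between Electrical Engineering "
    "and other fields.</img>\n"
    "<page_number>14</page_number>"
)
tags = extract_tags(sample)
print(tags["img"])
print(tags["page_number"])
```

In the notebook, `extract_tags(output_text[0])` would apply the same parsing to the decoded model output.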
