<a href="https://colab.research.google.com/github/VRX-Work/doc-info-extractor/blob/main/qwen_parser.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Main Implementation

In [1]:
!apt-get install -y poppler-utils
!pip install pdf2image --no-cache-dir
!pip install transformers
!pip install qwen-vl-utils
!pip install -q streamlit

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
poppler-utils is already the newest version (22.02.0-2ubuntu0.5).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.


In [2]:
!npm install localtunnel

[1G[0K⠙[1G[0K⠹[1G[0K⠸[1G[0K⠼[1G[0K⠴[1G[0K⠦[1G[0K⠧[1G[0K⠇[1G[0K
up to date, audited 23 packages in 2s
[1G[0K⠇[1G[0K
[1G[0K⠇[1G[0K3 packages are looking for funding
[1G[0K⠇[1G[0K  run `npm fund` for details
[1G[0K⠇[1G[0K
2 [33m[1mmoderate[22m[39m severity vulnerabilities

To address all issues (including breaking changes), run:
  npm audit fix --force

Run `npm audit` for details.
[1G[0K⠇[1G[0K

In [5]:
%%writefile app.py
import streamlit as st
from pdf2image import convert_from_bytes

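# Load model and processor directly from the Hugging Face Hub
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch
from qwen_vl_utils import process_vision_info

MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"

# Bound the visual token budget per page (each 28x28 pixel region becomes one token)
min_pixels = 256*28*28
max_pixels = 720*28*28

# Streamlit re-runs the whole script on every interaction; cache_resource keeps
# a single copy of the model and processor alive instead of reloading them each time
@st.cache_resource
def load_model():
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        torch_dtype=torch.float16
    ).to("cuda").eval()
    processor = AutoProcessor.from_pretrained(MODEL_ID, min_pixels=min_pixels, max_pixels=max_pixels)
    return model, processor

model, processor = load_model()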

st.title("Document Data Extractor")
pdf_file = st.file_uploader("Upload Document", type="pdf")

# Inference loop: one generation pass per PDF page

responses = {
    "full": [],    # decoded output including the echoed chat prompt
    "sliced": []   # decoded output with the prompt tokens removed
}

IMAGE_PATH = "/content/sample.png"  # scratch file for the page currently being processed

if pdf_file is not None:
    images = convert_from_bytes(pdf_file.read())  # one PIL image per PDF page
    bar = st.progress(0)

    for idx, image in enumerate(images):
        image.save(IMAGE_PATH, "PNG")

        # Pre-process: a single user turn holding the page image and the extraction prompt
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "image": IMAGE_PATH
                    },
                    {"type": "text", "text": "From this image extract every single field and value and provide the output in this csv form file,value"},
                ],
            },
        ]

        text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        image_inputs, video_inputs = process_vision_info(messages)
        inputs = processor(
            text=[text],
            images=image_inputs,
            videos=video_inputs,
            padding=True,
            return_tensors="pt",
        ).to("cuda")

        # Inference
        generate_ids = model.generate(**inputs, max_new_tokens=1024)
        # Decoding the full sequence also returns the prompt ("system ... user ... assistant")
        response_full = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
        # Slicing off the prompt tokens keeps only the model's answer
        response_sliced = processor.decode(generate_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True, clean_up_tokenization_spaces=False)
        bar.progress((idx + 1) / len(images))  # fraction of pages done, 0.0 to 1.0
        responses["full"].append(response_full)
        responses["sliced"].append(response_sliced)

        st.markdown(f"**Page {idx + 1}**")
        st.markdown(response_sliced)

        # Drop this page's GPU tensors, then release cached memory they no longer occupy
        del inputs, generate_ids
        torch.cuda.empty_cache()

Overwriting app.py
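The `min_pixels` / `max_pixels` arguments decide how many visual tokens each page costs. The sketch below is a re-implementation modelled on the `smart_resize` helper in `qwen_vl_utils`, not a copy of it (the library version may add further checks, such as an aspect-ratio limit). It assumes a factor of 28, because Qwen2-VL merges 2x2 groups of 14-pixel patches so that each 28x28 region becomes one token. The example page size assumes an A4 page rendered at pdf2image's default `dpi=200`.

```python
import math

FACTOR = 28  # assumed: 14-px patches merged 2x2 -> one visual token per 28x28 region


def round_by_factor(x: float, factor: int) -> int:
    return round(x / factor) * factor


def floor_by_factor(x: float, factor: int) -> int:
    return math.floor(x / factor) * factor


def ceil_by_factor(x: float, factor: int) -> int:
    return math.ceil(x / factor) * factor


def smart_resize(height: int, width: int, min_pixels: int, max_pixels: int,
                 factor: int = FACTOR) -> tuple[int, int]:
    """Snap (height, width) to multiples of `factor`, keeping the area in [min_pixels, max_pixels]."""
    h = max(factor, round_by_factor(height, factor))
    w = max(factor, round_by_factor(width, factor))
    if h * w > max_pixels:
        # Shrink both sides by the same ratio, rounding down to stay under the cap
        beta = math.sqrt(height * width / max_pixels)
        h = floor_by_factor(height / beta, factor)
        w = floor_by_factor(width / beta, factor)
    elif h * w < min_pixels:
        # Enlarge both sides by the same ratio, rounding up to reach the floor
        beta = math.sqrt(min_pixels / (height * width))
        h = ceil_by_factor(height * beta, factor)
        w = ceil_by_factor(width * beta, factor)
    return h, w


def visual_tokens(height: int, width: int, min_pixels: int, max_pixels: int) -> int:
    """Number of visual tokens a page of this size would use after resizing."""
    h, w = smart_resize(height, width, min_pixels, max_pixels)
    return (h // FACTOR) * (w // FACTOR)


min_pixels = 256 * 28 * 28
max_pixels = 720 * 28 * 28

# An A4 page at 200 dpi is roughly 1654 x 2339 pixels (width x height)
h, w = smart_resize(2339, 1654, min_pixels, max_pixels)
print(h, w, visual_tokens(2339, 1654, min_pixels, max_pixels))  # 868 616 682
```

Under these assumptions a full A4 page is downscaled to about 616x868 pixels, or 682 tokens, just inside the 720-token cap; small print on dense forms may therefore need a higher `max_pixels`.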


In [6]:
!streamlit run app.py &>/content/logs.txt &
!npx localtunnel --port 8501

[1G[0K⠙[1G[0Kyour url is: https://wild-queens-try.loca.lt
^C
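`app.py` runs inside the `streamlit run` process, not the notebook kernel, so the `responses` dict it builds is not automatically visible to a later `print(responses)` cell. One way to bridge the two, sketched here under the assumption that both processes share a file system (as they do on a single Colab VM), is to have the app dump `responses` to JSON after each page and have the notebook read it back. The path and the helper names `save_responses` / `load_responses` are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical file shared by the Streamlit process and the notebook kernel
RESPONSES_PATH = Path("/content/responses.json")


def save_responses(responses: dict, path: Path = RESPONSES_PATH) -> None:
    """Write the {'full': [...], 'sliced': [...]} dict via a temp file plus rename."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(responses, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over the old file so a reader never sees a half-written one


def load_responses(path: Path = RESPONSES_PATH) -> dict:
    """Read the dict back, or return empty lists if the app has not written anything yet."""
    if not path.exists():
        return {"full": [], "sliced": []}
    return json.loads(path.read_text(encoding="utf-8"))
```

In `app.py` one would call `save_responses(responses)` right after the two `append` calls; in the notebook, `responses = load_responses()` replaces the direct reference before `print(responses)`.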


In [11]:
print(responses)

{'full': ["system\nYou are a helpful assistant.\nuser\nFrom this image extract every single field and value and provide the output in this csv form file,value\nassistant\n|Field|Value|\n|---|---|\n|Form No.| PAS-3|\n|Language| English|\n|Reference| Hindi|\n|Instructions| Pre-fill|\n|Corporate Identity Number (CIN)| U74999HR2015FTC056386|\n|Global Location Number (GLN)| CARS24 SERVICES PRIVATE LIMITED|\n|Address of the Registered office of the company| 10th Floor, Tower – B, Unitech Cyber Park, Sector - 39, Gurugram, Gurgaon, Haryana 122001|\n|Email Id of the company| roc@cars24.com|\n|Number of allotments| 1|\n|Date of allotment| 24/06/2022|\n|Date of passing shareholders' resolution| 24/06/2022|\n|SRN of Form No. MGT-14| MGT-14|\n|Preference shares| 1|\n|Equity shares without Differential rights| 10|\n|Equity Shares with Differential rights| 10|\n|Debentures| 0|\n|Brief particulars of terms and conditions| Pari-Passu to Existing Equity Shares|\n|Number of securities allotted| 879435|\

|Field|Value|
|---|---|
|Form No.| PAS-3|
|Language| English|
|Reference| Hindi|
|Instructions| Pre-fill|
|Corporate Identity Number (CIN)| U74999HR2015FTC056386|
|Global Location Number (GLN)| CARS24 SERVICES PRIVATE LIMITED|
|Address of the Registered office of the company| 10th Floor, Tower – B, Unitech Cyber Park, Sector - 39, Gurugram, Gurgaon, Haryana 122001|
|Email Id of the company| roc@cars24.com|
|Number of allotments| 1|
|Date of allotment| 24/06/2022|
|Date of passing shareholders' resolution| 24/06/2022|
|SRN of Form No. MGT-14| MGT-14|
|Preference shares| 1|
|Equity shares without Differential rights| 10|
|Equity Shares with Differential rights| 10|
|Debentures| 0|
|Brief particulars of terms and conditions| Pari-Passu to Existing Equity Shares|
|Number of securities allotted| 879435|
|Nominal amount per security| 10|
|Total nominal amount| 8794350|
|Amount paid per security on application (excluding premium)| 10|
|Total amount paid on application (excluding premium)| 8794350|
|Amount due and payable on allotment per security (excluding premium)| 0|
|Total Amount payable on allotment (excluding premium)| 0|
|Premium amount per security due and payable (if any)| 6661.87|
|Total premium amount due and payable (if any)| 5858680650|
|Premium amount paid per security (if any)| 6661.87|
|Total premium amount paid (if any)| 5858680650|
|Amount of discount per security (if any)| 0|
|Total discount amount (if any)| 0|
|Amount to be paid on calls per security (if any) (excluding premium)| 0|
|Total amount to be paid on calls (if any) (excluding premium)| 0|
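Although the prompt asks for `field,value` CSV, the page above came back as a Markdown table. The sketch below is a hypothetical post-processing step, not part of the original pipeline: it accepts either shape and returns `(field, value)` pairs that Python's `csv` module can write out. It assumes table cells contain no `|`, and it splits CSV-style lines on the first comma only, so values such as addresses may contain commas but field names may not.

```python
import csv
import io


def parse_fields(text: str) -> list[tuple[str, str]]:
    """Turn model output (Markdown table or 'field,value' lines) into (field, value) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("|"):
            cells = [c.strip() for c in line.strip("|").split("|")]
            # Skip malformed rows, the header row, and the |---|---| separator row
            if len(cells) != 2 or cells == ["Field", "Value"] or set(cells[0]) <= set("-: "):
                continue
            pairs.append((cells[0], cells[1]))
        elif "," in line:
            field, value = line.split(",", 1)  # assume the field name itself has no comma
            if field.strip().lower() == "field":  # skip a 'field,value' header line
                continue
            pairs.append((field.strip(), value.strip()))
    return pairs


def to_csv(pairs: list[tuple[str, str]]) -> str:
    """Serialize pairs as CSV; csv.writer quotes values that contain commas."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["field", "value"])
    writer.writerows(pairs)
    return buf.getvalue()
```

This should be fed `response_sliced` rather than `response_full`: the echoed prompt in the full output itself contains a comma and would be picked up as a spurious pair.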

