#### Initial Setup:  
- conda env remove -n qwen-local-ocr  
  
- conda create -n qwen-local-ocr -c conda-forge -y  
- conda activate qwen-local-ocr   
- conda install python requests huggingface_hub "accelerate>=0.26.0" pillow ipykernel jupyter nb_conda_kernels ipywidgets -c conda-forge -y  

- pip install git+https://github.com/huggingface/transformers  
- pip install qwen-vl-utils  
  
#### Install pytorch locally:  
- https://pytorch.org/get-started/locally/  
- pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118  
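
A quick way to confirm that the installs above landed in the active environment is to query the installed distributions. This is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and the package list is simply copied from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the install commands above.
report = installed_versions(
    ["torch", "torchvision", "transformers", "qwen-vl-utils", "accelerate"]
)
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any line reporting NOT INSTALLED usually means the command was run in a different environment than the one the notebook kernel uses.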
  
Register the environment as a Jupyter kernel:  
python -m ipykernel install [--user] --prefix=C:\Users\tech_expert\.conda\envs\qwen-local-ocr --name qwen-local-ocr  
  
Run VS Code as administrator so that the model cache can create the symbolic links it relies on.  
  
#### Model cache:  
- C:\Users\ [username] \.cache\huggingface\hub\models--Qwen--Qwen2*  
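
To see which Qwen models are already cached, the folder above can be listed from Python. The sketch below assumes the default cache layout (`models--<org>--<name>` folders) and honors the `HF_HUB_CACHE` and `HF_HOME` environment variables if they are set; the function names `hub_cache_dir` and `cached_qwen_models` are hypothetical helpers, not part of huggingface_hub.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Return the hub cache folder, honoring HF_HUB_CACHE / HF_HOME when set."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def cached_qwen_models(cache_dir: Path) -> list[str]:
    """Return repo ids such as 'Qwen/Qwen2.5-VL-3B-Instruct' found in cache_dir."""
    if not cache_dir.is_dir():
        return []
    return sorted(
        # 'models--Qwen--Name' -> 'Qwen/Name'
        p.name.removeprefix("models--").replace("--", "/", 1)
        for p in cache_dir.glob("models--Qwen--*")
        if p.is_dir()
    )


for repo_id in cached_qwen_models(hub_cache_dir()):
    print(repo_id)
```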

In [None]:
from GetModelList import get_qwen_models 
import huggingface_hub
from my_timer import MyTimer, my_timer
from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info

In [None]:
get_qwen_models("Qwen")

device_map selects the device the model is loaded onto: cpu or cuda (GPU).  
Use auto when the code will run on different machines and you are not sure whether a GPU will be available.
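
If you prefer to choose the device explicitly instead of passing auto, one option is to ask PyTorch whether a CUDA GPU is usable and fall back to the CPU otherwise. This is a minimal sketch; `pick_device_map` is a hypothetical helper name, and it returns "cpu" when PyTorch is not installed at all.

```python
import importlib.util


def pick_device_map(prefer_gpu: bool = True) -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    if prefer_gpu and importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the helper also works without PyTorch

        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


device_map = pick_device_map()
print(f"device_map = {device_map}")
```

The resulting string can be passed as the device_map argument of from_pretrained and reused later when moving the inputs to the same device.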

This command creates an instance of the Qwen2_5_VLForConditionalGeneration class from the transformers library and loads a pre-trained Qwen2.5-VL model from the Hugging Face model hub for image analysis.

In [None]:
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    # "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype="auto", device_map="auto"
    # "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype="auto", device_map="cpu"
    "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype="auto", device_map="cuda"
)

In [None]:
# pip install git+https://github.com/huggingface/transformers.git@1931a351408dbd1d0e2c4d6d7ee0eb5e8807d7bf AutoProcessor

This command uses the Hugging Face transformers library to load the processor that matches the pre-trained model.

In [None]:
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")


In [None]:
# image = r"..\images\dl1.jpg"
# prompt = "Extract all text found on the image, including handwritten signatures"

image = r"..\images\WalmartReceipt.png"
prompt = "Extract all text found on the image, including handwritten signatures"

# image = r"..\images\WalmartReceipt.png"
# prompt = "What is the account number shown on this image?"

In [None]:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image,
            },
            {"type": "text", "text": prompt},
        ],
    }
]

This statement uses the processor's apply_chat_template method to format the list of messages into the input format the model expects.

In [None]:
# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
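
To make the effect of the chat template concrete, the sketch below imitates it in plain Python. It is an approximation only: `render_chatml` is a hypothetical function, the real template ships with the processor and may add details (such as a default system message) that are not reproduced here. Printing the `text` variable from the cell above shows the actual result.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate a ChatML-style prompt; NOT the processor's real template."""
    parts = []
    for message in messages:
        body = ""
        for item in message["content"]:
            if item["type"] == "image":
                # Placeholder the processor later expands into image tokens.
                body += "<|vision_start|><|image_pad|><|vision_end|>"
            elif item["type"] == "text":
                body += item["text"]
        parts.append(f"<|im_start|>{message['role']}\n{body}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


demo_messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "receipt.png"},  # made-up file name
            {"type": "text", "text": "Extract all text found on the image"},
        ],
    }
]
rendered = render_chatml(demo_messages)
print(rendered)
```

Note that tokenize=False returns this kind of plain string; the image itself is only attached later, when the processor is called with the images argument.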

Prepare the image(s) or video for the model by converting them into a format the model can understand. This may include normalization, resizing, and other transformations.

In [None]:
image_inputs, video_inputs = process_vision_info(messages)

Use the processor to prepare the input data (text, images, and potentially videos) for the model.

In [None]:
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)

Move the tensors contained in the inputs dictionary to GPU (Graphics Processing Unit) memory. CUDA is used for NVIDIA GPUs.  
TODO: test the model.to method.

In [None]:
inputs = inputs.to("cuda")

# New, still untested: also move the model itself to the GPU
model = model.to("cuda")

For inference, generate output from the model using our message and prompt. This is the slowest part of the code.  
max_new_tokens caps the number of tokens the model may generate; too large a value can lead to looping output, where the same text is repeated several times.

In [None]:

# generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids = model.generate(**inputs, max_new_tokens=1024)
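
One way to notice the looping output mentioned above is to check the decoded text for the same line repeated many times in a row. The sketch below is a heuristic, not a transformers feature: `longest_line_run` is a hypothetical helper, the sample text is made up, and the threshold of 3 is an arbitrary assumption (a receipt can legitimately repeat a line).

```python
from itertools import groupby


def longest_line_run(text: str) -> tuple[str, int]:
    """Return the non-empty line repeated most often in a row and its run length."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    best_line, best_count = "", 0
    for line, group in groupby(lines):
        count = sum(1 for _ in group)
        if count > best_count:
            best_line, best_count = line, count
    return best_line, best_count


# Made-up sample standing in for decoded model output.
sample = "TOTAL 12.50\nTHANK YOU\nTHANK YOU\nTHANK YOU\nTHANK YOU"
line, count = longest_line_run(sample)
if count >= 3:  # arbitrary threshold, tune for your documents
    print(f"possible loop: {line!r} repeated {count} times")
```

If this fires regularly, lowering max_new_tokens is the adjustment the note above suggests.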

Strip the prompt tokens from each generated sequence: zip pairs each input with its generated ids, and the slice keeps only the newly generated tokens.

In [None]:
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
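
The same slicing can be tried on plain Python lists to see what it does. The token ids below are made up for illustration; in the notebook the sequences are PyTorch tensors, but the zip and slice logic is the same.

```python
# Made-up token ids: each generated sequence starts with a copy of its prompt.
prompt_ids = [[101, 7, 8], [101, 9, 10]]
generated = [[101, 7, 8, 42, 43], [101, 9, 10, 55]]

# Drop the first len(prompt) ids from each generated sequence.
trimmed = [out[len(inp):] for inp, out in zip(prompt_ids, generated)]
print(trimmed)  # [[42, 43], [55]]
```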

Decode the model output into human-readable text.  
generated_ids_trimmed is the model output: a list of PyTorch tensors containing token ids.

In [None]:
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

In [None]:
print(output_text)