# Fixed NumPy Compatibility Issues

**IMPORTANT:** Before running this notebook, restart your kernel to ensure the NumPy downgrade takes effect:
- In Jupyter: Kernel → Restart Kernel
- In VS Code: Restart the Python interpreter

## Changes Made:
1. **Fixed NumPy compatibility**: Downgraded NumPy from 2.2.6 to 1.26.4, because PyTorch 2.1.0 is built against NumPy 1.x and does not support NumPy 2.x
2. **Added error handling**: More detailed debugging output throughout the notebook
3. **Improved model loading**: Added progress messages and the `trust_remote_code` parameter
4. **Enhanced inference function**: Better error handling and device management

## Next Steps:
Run all cells in order after restarting the kernel.


In [None]:
# Verify NumPy compatibility
import numpy as np
import torch

print("🔍 Checking system compatibility...")
print(f"✅ NumPy version: {np.__version__}")
print(f"✅ PyTorch version: {torch.__version__}")
print(f"✅ CUDA available: {torch.cuda.is_available()}")

# Test NumPy-PyTorch compatibility
try:
    # torch.from_numpy is the call that failed under NumPy 2.2.6
    test_array = np.array([1, 2, 3])
    test_tensor = torch.from_numpy(test_array)
    print("✅ NumPy-PyTorch compatibility: WORKING")
except Exception as e:
    print(f"❌ NumPy-PyTorch compatibility: FAILED - {e}")

print("\n🚀 Ready to proceed with model loading!")
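The cell above prints the versions and catches any `from_numpy` error, but it still reports "Ready to proceed" if NumPy 2.x is active (for example, when the kernel was not restarted after the downgrade). A minimal guard could look like the sketch below. It assumes the NumPy 1.x ceiling stated at the top of this notebook; `numpy_is_compatible` is a hypothetical helper, not part of NumPy or PyTorch.

```python
# Hypothetical guard: warn when NumPy 2.x is loaded alongside a PyTorch build
# (such as 2.1.0) that was compiled against NumPy 1.x.
def numpy_is_compatible(numpy_version: str, max_major: int = 1) -> bool:
    """Return True if the major component of numpy_version is <= max_major."""
    major = int(numpy_version.split(".")[0])
    return major <= max_major


try:
    import numpy as np
except ImportError:
    np = None

if np is None:
    print("⚠️ NumPy is not installed")
elif not numpy_is_compatible(np.__version__):
    print(f"⚠️ NumPy {np.__version__} detected; install numpy==1.26.4 and restart the kernel.")
else:
    print(f"✅ NumPy {np.__version__} is within the supported 1.x range")
```

Printing a warning instead of raising keeps the cell usable for inspection; switching to `raise RuntimeError(...)` would stop a "Run All" before the model cells start.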


In [2]:
!pip install --upgrade huggingface_hub pillow transformers qwen-vl-utils datasets peft 

[notice] A new release of pip is available: 23.3.1 -> 25.1.1
[notice] To update, run: python -m pip install --upgrade pip


In [3]:
# Load token from environment variable (safer than hardcoding)
import os
kahua_token = os.getenv('HUGGINGFACE_TOKEN', 'your-token-here')

# If no environment variable is set, you'll need to set it
if kahua_token == 'your-token-here':
    print("⚠️  Please set your HUGGINGFACE_TOKEN environment variable")
    print("   export HUGGINGFACE_TOKEN='your-actual-token'")
    # For now, you can uncomment and set your token here temporarily:
    # kahua_token = 'your-actual-token-here'

In [4]:
import datasets
import huggingface_hub

In [5]:
huggingface_hub.login(kahua_token)
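The cell above calls `huggingface_hub.login` even when `kahua_token` is still the `'your-token-here'` placeholder, which the Hub will reject instead of simply skipping login. A guarded variant might look like this sketch. `resolve_hf_token` and `login_if_possible` are hypothetical helpers; falling back to `HF_TOKEN` is an assumption based on that being the variable `huggingface_hub` reads on its own.

```python
import os
from typing import Mapping, Optional

# Placeholder used by the token cell when no environment variable is set.
PLACEHOLDER = "your-token-here"


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a usable token, or None if only the placeholder (or nothing) is set.

    Checks HUGGINGFACE_TOKEN (used in this notebook) first, then HF_TOKEN.
    """
    env = os.environ if env is None else env
    for name in ("HUGGINGFACE_TOKEN", "HF_TOKEN"):
        token = env.get(name)
        if token and token != PLACEHOLDER:
            return token
    return None


def login_if_possible(env: Optional[Mapping[str, str]] = None) -> bool:
    """Log in only when a real token is found; return whether login was attempted."""
    token = resolve_hf_token(env)
    if token is None:
        print("⚠️  No Hugging Face token found; skipping login.")
        return False
    import huggingface_hub  # imported lazily so resolve_hf_token has no hard dependency

    huggingface_hub.login(token=token)
    return True
```

Calling `login_if_possible()` in place of `huggingface_hub.login(kahua_token)` keeps public repositories downloadable without a token, while private ones (possibly including the adapter repo) will still fail later with an authentication error.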

In [6]:
import huggingface_hub
import datasets
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.1.0+cu118
CUDA available: True
CUDA device: NVIDIA A100 80GB PCIe


In [7]:
print("Loading Qwen2.5-VL-3B-Instruct model...")
# flash_attention_2 is recommended for better speed and memory savings, especially in multi-image and video scenarios.
# It is commented out below; uncomment it only if the flash-attn package is installed.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    # attn_implementation="flash_attention_2",
    device_map="auto",
    trust_remote_code=True
)
print("Model loaded successfully!")

# Load the default processor
print("Loading processor...")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct", trust_remote_code=True)
print("Processor loaded successfully!")

Loading Qwen2.5-VL-3B-Instruct model...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


Model loaded successfully!
Loading processor...


You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.


Processor loaded successfully!
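The loading cell keeps `attn_implementation="flash_attention_2"` commented out. Rather than toggling the comment by hand, one option is to detect the `flash-attn` package at runtime. This is a sketch; `pick_attn_implementation` is a hypothetical helper. Returning `None` leaves the choice to `transformers`' default, and FlashAttention-2 additionally requires fp16/bf16 weights on an Ampere-or-newer GPU, which the bfloat16 A100 setup here satisfies.

```python
import importlib.util
from typing import Optional


def pick_attn_implementation(prefer_flash: bool = True) -> Optional[str]:
    """Choose a value for the attn_implementation argument of from_pretrained.

    Returns "flash_attention_2" only if the flash_attn package is importable;
    otherwise returns None so transformers keeps its default implementation.
    """
    if prefer_flash and importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return None


print(f"Selected attention implementation: {pick_attn_implementation()}")
```

The result can be passed straight through as `attn_implementation=pick_attn_implementation()` in the `from_pretrained` call, since `None` is that argument's default.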


In [8]:
from peft import PeftModel, PeftConfig

# Load the config
peft_model_id = "kahua-ml/invoice1"
print(f"Loading PEFT config from {peft_model_id}...")

peft_config = PeftConfig.from_pretrained(peft_model_id)
print("PEFT config loaded successfully!")

print("Enabling input gradients...")
# Required when training with gradient checkpointing; it does not change inference results.
model.enable_input_require_grads()

# Attach the PEFT model
print("Loading PEFT model...")
peft_model = PeftModel.from_pretrained(model, peft_model_id)
print("PEFT model loaded successfully!")

print(f"Model device: {next(peft_model.parameters()).device}")
print(f"Model dtype: {next(peft_model.parameters()).dtype}")

Loading PEFT config from kahua-ml/invoice1...
PEFT config loaded successfully!
Enabling input gradients...
Loading PEFT model...
PEFT model loaded successfully!
Model device: cuda:0
Model dtype: torch.bfloat16
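The PEFT cell loads `peft_config` but never inspects it. Its `base_model_name_or_path` field records which checkpoint the adapter was trained on, so it can be compared with the base model loaded earlier before the adapter is trusted. A minimal sketch; `adapter_matches_base` is hypothetical, and comparing only the last path component (so a local copy of the checkpoint still matches the Hub ID) is a simplifying assumption.

```python
def adapter_matches_base(adapter_base: str, base_model_id: str) -> bool:
    """Compare an adapter's recorded base model with the loaded base model ID.

    Only the final path component is compared (case-insensitively), so
    '/models/Qwen2.5-VL-3B-Instruct' matches 'Qwen/Qwen2.5-VL-3B-Instruct'.
    """

    def tail(name: str) -> str:
        return name.rstrip("/").split("/")[-1].lower()

    return tail(adapter_base) == tail(base_model_id)
```

In the notebook this would be called as `adapter_matches_base(peft_config.base_model_name_or_path or "", "Qwen/Qwen2.5-VL-3B-Instruct")`, printing a warning when it returns `False`.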


In [9]:
def infer(image, model, processor):
    """Extract structured JSON data from a receipt or invoice image."""
    print("🚀 Starting inference...")
    
    # Use the EXACT training prompt for better results
    query = """You are an expert at extracting structured data from receipts and invoices. 
Analyze the image and return in JSON format all metadata seen including company details, items, prices, totals, and dates.

Expected JSON format:
{
  "company": "Company Name",
  "address": "Full Address", 
  "date": "YYYY-MM-DD",
  "total": "XX.XX",
  "tax": "XX.XX",
  "items": [
    {
      "description": "Item description",
      "quantity": "X",
      "price": "XX.XX",
      "total": "XX.XX"
    }
  ]
}

JSON Output:"""
    
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": query}
            ],
        }
    ]

    print("📝 Applying chat template...")
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    
    print("🖼️ Processing vision info...")
    image_inputs, video_inputs = process_vision_info(messages)
    
    print(f"📊 Image inputs type: {type(image_inputs)}")
    print(f"📊 Number of images: {len(image_inputs) if image_inputs else 0}")
    
    print("⚙️ Processing inputs...")
    # Text and images must go through the processor together: it expands each
    # <|image_pad|> placeholder inserted by the chat template to the number of
    # visual tokens the image produces. Tokenizing the text separately leaves a
    # single placeholder per image, which no longer matches the image features
    # and makes generation fail, so there is no safe "manual" fallback here.
    try:
        inputs = processor(
            text=[text],
            images=image_inputs,
            videos=video_inputs,
            padding=True,
            return_tensors="pt",
        )
    except Exception as e:
        raise RuntimeError(f"Processor could not build model inputs: {e}") from e
    print("✅ Inputs processed successfully!")

    print("🔄 Moving inputs to device...")
    device = next(model.parameters()).device
    inputs = inputs.to(device)

    print("🧠 Generating response...")
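    with torch.no_grad():
        # Greedy decoding: with do_sample=False, sampling parameters such as
        # temperature and top_p are not used, so they are not passed here.
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
        )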
    
    print("📖 Decoding response...")
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )

    print("✅ Inference completed!")
    return output_text[0]
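`infer` returns the raw decoded text. Because the prompt asks for JSON, downstream code will usually want a dictionary, and the model may wrap its answer in Markdown code fences or stop mid-object when `max_new_tokens` runs out. A defensive parser, sketched under those assumptions (`parse_json_output` is a hypothetical helper, not part of `transformers` or `qwen_vl_utils`):

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # Markdown code-fence delimiter
# Matches an optional ```json ... ``` wrapper and captures its contents.
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_output(text: str) -> Optional[Any]:
    """Extract and parse the JSON object in a model response.

    Strips an optional Markdown code fence, then parses the span between the
    first '{' and the last '}'. Returns None if that span is not valid JSON,
    for example when generation was cut off before the object was closed.
    """
    fenced = FENCED_BLOCK.search(text)
    candidate = fenced.group(1) if fenced else text

    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(candidate[start : end + 1])
    except json.JSONDecodeError:
        return None
```

Typical use is `data = parse_json_output(result)`; a `None` result signals output that should be inspected by hand or regenerated with a larger `max_new_tokens`.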

In [11]:
from PIL import Image
import os

# Check if image file exists
image_path = r"/root/video/rryalsty.png"
if not os.path.exists(image_path):
    print(f"Error: Image file '{image_path}' not found!")
    print("Image files in the current directory:")
    print([f for f in os.listdir(".") if f.lower().endswith(('.png', '.jpg', '.jpeg'))])
else:
    print(f"Loading image: {image_path}")
    image = Image.open(image_path).convert("RGB")
    print(f"Image size: {image.size}")
    
    try:
        result = infer(image, peft_model, processor)
        print("\n" + "="*50)
        print("INFERENCE RESULT:")
        print("="*50)
        print(result)
    except Exception as e:
        print(f"Error during inference: {e}")
        import traceback
        traceback.print_exc()

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Loading image: /root/video/rryalsty.png
Image size: (938, 548)
Starting inference...
Applying chat template...
Processing vision info...
Image inputs type: <class 'list'>
Number of images: 1
Processing inputs...
Inputs processed successfully!
Moving inputs to device...
Generating response...
Decoding response...
Inference completed!

INFERENCE RESULT:
{
  "company": "IRONHORSE",
  "product": "INDUSTRIAL DUTY FRACTIONAL MOTOR",
  "model": "MTRJ-P33-3BD36J",
  "frame": "56J",
  "frequency": {
    "60": "",
    "50": ""
  },
  "horsepower": {
    "1/3": "",
    "1/4": ""
  },
  "phase": "3",
  "rpm": {
    "3450": "",
    "2850": ""
  },
  "duty": "CONT",
  "voltage": {
    "230/460": "",
    "190/380": ""
  },
  "amps": {
    "1.3/0.65": "",
    "1.2/0.6": ""
  },
  "insulation": "F",
  "ip rating": "IP43",
  "s.f": "1.15",
  "sfa": {
    "1.5/0.75": "",
    "1.2/0.6": ""
  },
  "lb/wt": "18",
  "max. ambient temperature": "40°C",
  "date code": "01/2022",
  "serial number": "2022010008"