# Qwen2-VL-2B-Instruct Model Testing

Testing the **Qwen/Qwen2-VL-2B-Instruct** model with **4-bit quantization** for chess piece recognition.

**Key Benefits:**
- 2B parameters (vs 7B in UI-TARS)
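- 4-bit NF4 quantization via bitsandbytes (designed primarily for CUDA GPUs; CPU support depends on the installed bitsandbytes version)
- ~1.5GB memory for the weights (vs ~14GB for a 7B model in 16-bit precision)
- Faster download, loading, and inference than the 7B model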
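The memory figures above follow from simple arithmetic: weight storage is roughly parameter count × bits per parameter ÷ 8 bytes. Below is a minimal sketch that assumes round parameter counts (2B, 7B). It ignores activations, the KV cache, and layers that usually stay in 16-bit under bitsandbytes (such as embeddings and the output head), which is why the ~1.5GB figure is higher than the ~1.0GB weight-only estimate. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9

print(weight_memory_gb(7e9, 16))  # 7B model in float16/bfloat16 -> 14.0
print(weight_memory_gb(2e9, 16))  # 2B model in float16/bfloat16 -> 4.0
print(weight_memory_gb(2e9, 4))   # 2B model at 4 bits per weight -> 1.0
```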

In [None]:
# Load Qwen2-VL-2B-Instruct model directly with 4-bit quantization
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig
import torch

print("Loading Qwen2-VL-2B-Instruct with 4-bit quantization...")

# 4-bit NF4 quantization config (bitsandbytes; intended mainly for CUDA GPUs)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

# Load processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", trust_remote_code=True)
print("✓ Processor loaded")

# Load model with quantization
model = AutoModelForVision2Seq.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)
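# Report the actual footprint (in bytes, converted to GB) instead of a hard-coded estimate
print(f"✓ Model loaded with 4-bit quantization ({model.get_memory_footprint() / 1e9:.2f} GB)")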


  from .autonotebook import tqdm as notebook_tqdm
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Fetching 2 files: 100%|██████████| 2/2 [02:53<00:00, 86.77s/it] 
Loading checkpoint shards: 100%|██████████| 2/2 [00:10<00:00,  5.01s/it]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.


[{'input_text': [{'role': 'user',
    'content': [{'type': 'image',
      'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG'},
     {'type': 'text', 'text': 'What animal is on the candy?'}]}],
  'generated_text': [{'role': 'user',
    'content': [{'type': 'image',
      'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG'},
     {'type': 'text', 'text': 'What animal is on the candy?'}]},
   {'role': 'assistant',
    'content': "The candy in the image features a black bird with outstretched wings, which is a distinctive design element on the colorful M&M's."}]}]
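The `bnb_4bit_use_double_quant=True` flag in the quantization config also quantizes the per-block scaling constants, which shrinks their overhead. The sketch below does the arithmetic under two assumptions: the block sizes from the QLoRA paper (64 weights per fp32 scale, then 256 scales per second-level fp32 constant when the scales themselves are stored in 8 bits), and that these match your bitsandbytes defaults. `quant_constant_bits_per_param` is a hypothetical helper.

```python
def quant_constant_bits_per_param(block_size: int = 64,
                                  double_quant: bool = False,
                                  second_block_size: int = 256) -> float:
    """Extra bits per weight spent storing quantization scales (assumed block sizes)."""
    if not double_quant:
        # One fp32 scale for every `block_size` weights.
        return 32 / block_size
    # Scales stored as 8-bit values, plus one fp32 constant per `second_block_size` scales.
    return 8 / block_size + 32 / (block_size * second_block_size)

single = quant_constant_bits_per_param()
double = quant_constant_bits_per_param(double_quant=True)
print(single, double, single - double)  # 0.5 0.126953125 0.373046875
```

At roughly 0.37 bits saved per weight, a model with ~2B quantized weights would save about 93MB. In practice the saving is smaller because not every layer is quantized.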

In [None]:
from PIL import Image

# Load your local chess board image; convert to RGB in case the PNG has an alpha channel
image = Image.open(r"C:\Users\muhammadahmad5\Desktop\chess autoamtion\chessboard_capture.png").convert("RGB")

# Create prompt for chess piece recognition
text_prompt = """Analyze this chess board image and identify all pieces.

For each piece, provide its square location (a1-h8), piece type (pawn, knight, bishop, rook, queen, king), and color (white, black).

Return ONLY a JSON object like this:
{"a1": {"piece": "rook", "color": "white"}, "e4": {"piece": "pawn", "color": "black"}}

Use lowercase values and algebraic notation, and include only occupied squares."""

# Prepare messages in Qwen2VL format
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text_prompt}
        ]
    },
]

# Process with Qwen2-VL
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate response
print("Analyzing chessboard with Qwen2-VL (quantized)...")
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=1024)

# Trim and decode
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs["input_ids"], generated_ids)
]
response = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

print("Response:")
print(response)


Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.


[{'input_text': [{'role': 'user', 'content': [{'type': 'image', 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=800x800 at 0x2A0B2437380>}, {'type': 'text', 'text': 'What is on the chessboard?'}]}], 'generated_text': [{'role': 'user', 'content': [{'type': 'image', 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=800x800 at 0x2A0B2437380>}, {'type': 'text', 'text': 'What is on the chessboard?'}]}, {'role': 'assistant', 'content': 'The chessboard features a classic layout with rows and columns, each containing a variety of chess pieces. The pieces are arranged in a grid, with some pieces in the top row and others in the bottom row. The pieces are colored in shades of gray and white, with some pieces having a slight shadow effect.'}]}]
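The prompt asks for JSON only, but small VLMs often wrap the object in prose or markdown fences and may emit invalid squares or piece names. The following is a minimal post-processing sketch that assumes the response contains a single JSON object shaped like the example in the prompt. `parse_board_json` is a hypothetical helper, not part of transformers or Qwen's tooling.

```python
import json
import re

VALID_PIECES = {"pawn", "knight", "bishop", "rook", "queen", "king"}
VALID_COLORS = {"white", "black"}
SQUARE_RE = re.compile(r"^[a-h][1-8]$")

def parse_board_json(response: str) -> dict:
    """Extract the outermost {...} span from a model response and validate entries.

    Returns {square: {"piece": ..., "color": ...}}; malformed entries are dropped.
    Raises ValueError (json.JSONDecodeError is a subclass) if no valid object is found.
    """
    # Grab everything from the first "{" to the last "}" to skip surrounding prose/fences.
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    raw = json.loads(response[start:end + 1])

    board = {}
    for square, info in raw.items():
        square = square.strip().lower()
        if not SQUARE_RE.match(square) or not isinstance(info, dict):
            continue  # skip off-board squares like "e9" or non-object values
        piece = str(info.get("piece", "")).lower()
        color = str(info.get("color", "")).lower()
        if piece in VALID_PIECES and color in VALID_COLORS:
            board[square] = {"piece": piece, "color": color}
    return board

sample = 'Sure! {"a1": {"piece": "rook", "color": "white"}, "e9": {"piece": "pawn", "color": "black"}}'
print(parse_board_json(sample))  # {'a1': {'piece': 'rook', 'color': 'white'}}
```

With the notebook above, this would be called as `board = parse_board_json(response)`. Dropping invalid entries rather than failing keeps one hallucinated square from discarding the whole board, though it also means missing pieces go unreported.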
