# Let's start

### Important links
1. https://www.cbe.org.eg/ar/economic-research/economic-reports
2. https://www.pif.gov.sa/en/investors/annual-reports/
3. https://www.pif.gov.sa/en/investors/
4. https://www.pif.gov.sa/en/investors/credit-rating/ -> a perfect use case  

##### https://www.cbe.org.eg/ar/economic-research/economic-reports/annual-report -> cbe image pdf

In [1]:
!nvcc --version
!echo $CUDA_HOME

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0



In [2]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

Looking in indexes: https://download.pytorch.org/whl/cu124
Collecting torch
  Downloading https://download.pytorch.org/whl/cu124/torch-2.6.0%2Bcu124-cp310-cp310-linux_x86_64.whl (768.4 MB)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu124/torchvision-0.21.0%2Bcu124-cp310-cp310-linux_x86_64.whl (7.3 MB)
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cu124/torchaudio-2.6.0%2Bcu124-cp310-cp310-linux_x86_64.whl (3.4 MB)
Collecting jinja2
  Downloading https://download.pytorch.org/whl/Jinja2-3.1.4-py3-none-any.whl (133 kB)

# Installing what we need for Qwen2.5-VL

In [4]:
!pip install git+https://github.com/huggingface/transformers accelerate 
!pip install qwen-vl-utils[decord]==0.0.8 
!pip install bitsandbytes flash-attn 

Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-p0k63ul4
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-p0k63ul4
  Resolved https://github.com/huggingface/transformers to commit 94ae1ba5b55e79ba766582de8a199d8ccf24a021
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [5]:
import torch
import sys

def check_environment():
    print(f"Python version: {sys.version}")
    print(f"PyTorch version: {torch.__version__}")
    
    # Check CUDA availability
    cuda_available = torch.cuda.is_available()
    print(f"CUDA available: {cuda_available}")
    
    if cuda_available:
        print(f"CUDA version: {torch.version.cuda}")
        print(f"Current CUDA device: {torch.cuda.current_device()}")
        print(f"Device name: {torch.cuda.get_device_name(0)}")
        print(f"Device count: {torch.cuda.device_count()}")
    
        # Alternative check for flash attention
        try:
            import flash_attn
            print(f"flash_attn package is installed: version {flash_attn.__version__}")
        except ImportError:
            print("flash_attn package is not installed")
    
    # Memory info if CUDA is available
    if cuda_available:
        print("\nGPU Memory Information:")
        print(f"Total memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
        print(f"Allocated memory: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
        print(f"Cached memory: {torch.cuda.memory_reserved(0) / 1e9:.2f} GB")

if __name__ == "__main__":
    check_environment()

Python version: 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
PyTorch version: 2.6.0+cu124
CUDA available: True
CUDA version: 12.4
Current CUDA device: 0
Device name: NVIDIA GeForce RTX 4090
Device count: 1
flash_attn package is installed: version 2.7.4.post1

GPU Memory Information:
Total memory: 25.39 GB
Allocated memory: 0.00 GB
Cached memory: 0.00 GB


In [1]:
# Run in a terminal: huggingface-cli login --token <token>

### Model card on HF
https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct

In [2]:
import os
import base64
from typing import List, Union, Dict
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image

class QwenVLProcessor:
    def __init__(
        self,
        model_name: str = "Qwen/Qwen2.5-VL-7B-Instruct",
        device: str = "cuda",
        min_pixels: int = 128*16*16,
        max_pixels: int = 1024*16*16,
        cache_dir: str = None  # Add cache_dir parameter
    ):
        """
        Initialize the QwenVL processor with custom configuration.

        Args:
            model_name: Name or path of the model to load
            device: Device to run the model on ('cuda' or 'cpu')
            min_pixels: Minimum number of pixels for image processing
            max_pixels: Maximum number of pixels for image processing
            cache_dir: Optional directory for caching the downloaded weights
        """
        # Configure CUDA memory allocation
        os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

        # Clear CUDA cache
        if device == "cuda":
            torch.cuda.empty_cache()

        # Load model and assign to self
        self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
            model_name,
            torch_dtype=torch.bfloat16,
            device_map=device,
            attn_implementation="flash_attention_2", 
            use_cache=True,
            cache_dir=cache_dir,
        )

        # Load processor and assign to self
        self.processor = AutoProcessor.from_pretrained(
            model_name,
            min_pixels=min_pixels,
            max_pixels=max_pixels,
            use_fast=True
        )

        self.device = device

    def _encode_image(self, image_path: str) -> str:
        """
        Encode a local image file to base64.

        Args:
            image_path: Path to the local image file

        Returns:
            Base64 encoded string of the image
        """
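        import mimetypes  # local import keeps this method self-contained

        # Derive the MIME type from the file extension instead of assuming JPEG
        mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
        with open(image_path, "rb") as image_file:
            encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
        return f"data:{mime_type};base64,{encoded_string}"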

    def prepare_messages(
        self,
        image_paths: Union[str, List[str]],
        prompt: str
    ) -> List[Dict]:
        """
        Prepare messages for the model using local image paths.

        Args:
            image_paths: Single path or list of paths to local images
            prompt: Text prompt to process with the images

        Returns:
            List of formatted messages for the model
        """
        if isinstance(image_paths, str):
            image_paths = [image_paths]

        messages = []
        for path in image_paths:
            encoded_image = self._encode_image(path)
            messages.append({
                "role": "user",
                "content": [
                    {"type": "image", "image": encoded_image},
                    {"type": "text", "text": prompt}
                ]
            })
        return messages

    def process_images(
        self,
        image_paths: Union[str, List[str]],
        prompt: str,
        max_new_tokens: int = 2000,
        temperature: float = 0.01,
        top_p: float = 0.9  # nucleus sampling: sample only from the most likely tokens whose cumulative probability reaches top_p
    ) -> List[str]:
        """
        Process local images with the given prompt.

        Args:
            image_paths: Single path or list of paths to local images
            prompt: Text prompt to process with the images
            max_new_tokens: Maximum number of tokens to generate
            temperature: Sampling temperature
            top_p: Top-p sampling parameter

        Returns:
            List of decoded responses. All images are sent as turns of a single
            conversation, so the list holds one response.
        """
        messages = self.prepare_messages(image_paths, prompt)

        with torch.inference_mode():  # disables autograd tracking; see the PyTorch "Autograd mechanics" notes
            text = self.processor.apply_chat_template(
                messages,
                tokenize=False,
                add_generation_prompt=True
            )

            image_inputs, video_inputs = process_vision_info(messages)
            inputs = self.processor(
                text=[text],
                images=image_inputs,
                videos=video_inputs,
                padding=True,
                return_tensors="pt"
            )

            # Move the inputs to the target device
            inputs = inputs.to(self.device)

            
            generated_ids = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=temperature,
                top_p=top_p,
                pad_token_id=self.processor.tokenizer.pad_token_id,
                eos_token_id=self.processor.tokenizer.eos_token_id
            )

            generated_ids_trimmed = [
                out_ids[len(in_ids):]
                for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
            ]

            output_text = self.processor.batch_decode(
                generated_ids_trimmed,
                skip_special_tokens=True,
                clean_up_tokenization_spaces=False
            )

        return output_text

if __name__ == "__main__":

    processor = QwenVLProcessor()

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
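
`process_images` samples with `temperature=0.01` and `top_p=0.9`. The sketch below is a toy, pure-Python illustration of what those two knobs do to a made-up set of logits. It is not the Transformers implementation, and `top_p_filter` and `toy_logits` are hypothetical names introduced only for this example.

```python
import math

def top_p_filter(logits, temperature=1.0, top_p=0.9):
    """Return {token_index: probability} for the tokens kept by top-p sampling."""
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalise over the kept tokens.
    return {i: probs[i] / mass for i in kept}

toy_logits = [2.0, 1.5, 0.5, -1.0]  # made-up scores for four candidate tokens

# Moderate temperature: several tokens stay in the sampling pool.
print(top_p_filter(toy_logits, temperature=1.0, top_p=0.9))

# Very low temperature: the pool collapses to the single best token.
print(top_p_filter(toy_logits, temperature=0.01, top_p=0.9))
```

With a temperature this low, sampling behaves almost like greedy decoding, which is usually what you want for OCR-style extraction.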

In [3]:
import time
# Process single image
image_path = "./image.jpeg"

start_time = time.time()

result = processor.process_images(
    image_path,
    prompt="""You are an expert OCR model who can read and interpret hard images in details
    and in great precision. Given these images extract every detail of it in an organized format."""
)
print(f"Single image result: {result[0]}")

elapsed = time.time() - start_time
print(f"time is : {elapsed:.2f} seconds")

Single image result: The image contains a handwritten quote on lined notebook paper. The text reads:

"‘Don’t ever let someone tell you, you can’t do something. Not even me. You got a dream, you got to protect it. People can’t do something themselves, they want to tell you, you can’t do it. You want something, go get it, period.
All right?’
- From Pursuit of Happiness"

The quote is attributed to the movie "Pursuit of Happiness." The handwriting appears neat and legible, with the lines of the notebook paper providing a structured background for the text.
time is : 3.92 seconds
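
Most cells in this notebook repeat the same start/stop timing pattern. One possible way to avoid the repetition is a small context manager. This is only a sketch: `timed` and `timings` are hypothetical helpers, and `time.sleep` stands in for the call to `processor.process_images(...)`.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str, sink: dict):
    """Store the wall-clock duration of the enclosed block in sink[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed runs are timed too.
        sink[label] = time.perf_counter() - start

timings = {}
with timed("single image", timings):
    time.sleep(0.05)  # stand-in for processor.process_images(...)

print(f"single image: {timings['single image']:.2f} seconds")
```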


# Let's look at some English image examples

## Interpreting visualizations

In [4]:
image_path = "./visualization_eng.png"
result = processor.process_images(
    image_path,
    prompt="""You are an expert OCR model who can read and interpret hard images in details
    and in great precision. Given these images extract every detail of it in an organized format."""
)
print(f"Single image result: {result[0]}")

Single image result: The image is a bar chart titled "Fintech Market Growth Projections." The chart shows the projected market valuation of the fintech industry based on Compound Annual Growth Rate (CAGR) from 2021 to 2029.

### Key Details:
- **Title**: Fintech Market Growth Projections
- **Y-Axis**: Market Valuation Calculated Based on CAGR
- **X-Axis**: Year (from 2021 to 2029)
- **Data Points**:
  - **2021**: Approximately $100B
  - **2022**: Approximately $120B
  - **2023**: Approximately $140B
  - **2024**: Approximately $180B
  - **2025**: Approximately $260B
  - **2026**: Approximately $320B
  - **2027**: Approximately $400B
  - **2028**: Approximately $500B
  - **2029**: Approximately $620B

The bars increase steadily each year, indicating a consistent growth trend in the fintech market valuation over the period shown.


## Prompting helps: the more specific the prompt, the more accurate the results

In [5]:
image_path = "./pif.png"
result = processor.process_images(
    image_path,
    prompt="""You are an expert OCR model who can read and interpret hard images in details
    and in great precision. Given these images extract every detail of it in an organized format."""
)
print(f"Single image result: {result[0]}")

Single image result: Certainly! Below is the detailed information extracted from the provided image, organized into sections:

---

### **PIF Vision Realization Program**

#### **Introduction**
The Public Investment Fund (PIF) has committed to Vision 2030, with a focus on driving sustainable and transformative economic change through diversification of the Saudi Arabian economy and building its international asset portfolio. The program aims at generating sustainable returns and fostering economic diversification.

#### **Strategic Review | PIF Vision Realization Program**

#### **In 2021, PIF Launched Its Vision Realization Program 2021-2025:**
- **Vision:** The program is designed to guide the Fund's evolution and align the Fund's strategy with the Kingdom's vision and ambitions.
- **Objectives:** The VIP emphasizes PIF's role in achieving Vision 2030 by building a diversified and sustainable future, positioning the Fund as a practical player in Saudi Arabia's broader economic narrat

In [6]:
image_path = "./pif.png"
result = processor.process_images(
    image_path,
    prompt="""You are an expert OCR model who can read and interpret hard images in details
    and in great precision. just extract the text / numbers you see ."""
)
print(f"Single image result: {result[0]}")

Single image result: Here is the extracted text and numbers from the image:

**PIF VISION REALIZATION PROGRAM**

- **Strategic Review | PIF Vision Realization Program**
- **The Public Investment Fund's cornerstone in Vision 2030, tasked with driving sustainable and transformative economic change through investments in the Saudi economy and building its international asset portfolio, to achieve long-term sustainable returns and fostering economic diversification.**
- **In 2021, PIF launched its Vision Realization Program 2021-2025, redefining a critical step in the Fund’s evolution and the alignment of its strategy with the Kingdom’s objectives and ambitions. The VIP emphasizes PIF’s role in achieving Vision 2030 by building a diversified and sustainable future, positioning the Fund as a practical player in Saudi Arabia’s broader economic narrative.**
- **EXPECTED IMPACT BY 2025**
  - **Cumulative Non-of GDP Contribution:** SAR 1.2 TN
  - **Job Creation:** 1.8 MN (Direct, Indirect and I

In [7]:
image_path = "./pif.png"
result = processor.process_images(
    image_path,
    prompt="""Extract the page numbers."""
)
print(f"Single image result: {result[0]}")

Single image result: The page numbers in the image are 26 and 27.


In [8]:
from time import time

start_time = time()

image_path = "./pif.png"
result = processor.process_images(
    image_path,
    prompt="""You are an expert OCR model who can read and interpret hard images in details
    and in great precision. Given these images extract every detail of it in an organized format,
    include any numbers you see .. page numbers also"""
)

end_time = time()
execution_time = end_time - start_time

print(f"Single image result: {result[0]}")
print(f"Execution time: {execution_time:.2f} seconds")

Single image result: Certainly! Here is the extracted information from the image:

---

**PIF VISION REALIZATION PROGRAM**

**Strategic Review | PIF Vision Realization Program**

**The Public Investment Fund's commitment in Saudi Vision 2030: Working towards driving sustainable and transformative economic change through the development of the Saudi economy and building its international asset portfolio, with the aim of achieving long-term sustainable returns and fostering economic diversification.**

**In 2021, PIF launched its Vision Realization Program 2021-2025, redefining a critical step in the Fund’s evolution and the alignment of its strategic objectives with its resources and ambitions. The VIP emphasizes PIF’s role in shaping Saudi Arabia’s vision for a diversified and sustainable future, positioning the Fund as a practical player in Saudi Arabia’s broader economic narrative.**

**EXPECTED IMPACT BY 2025**

- **Cumulative Non-of GDP Contribution:** SAR 1.2 TN (cumulative)
- **J

# Arabic images

# Well, let's try it out. What do you think?
(N/G)

In [9]:
from time import time

start_time = time()

image_path = "./arabic_cbe.png"
result = processor.process_images(
    image_path,
    prompt="""You are an expert OCR model who can read and interpret hard images in details
    and in great precision. Given these images extract every detail of it in an organized format,
    include any numbers you see .. page numbers also"""
)

end_time = time()
execution_time = end_time - start_time

print(f"Single image result: {result[0]}")
print(f"Execution time: {execution_time:.2f} seconds")

Single image result: The image appears to be a pie chart or a circular diagram with various segments, each labeled with text and numerical values. Here is the detailed breakdown:

1. **Title**: 
   - The title at the top reads: "ةيفرصملا تاروطتلا مهأ" which translates to "The Impact of Financial Policies on Economic Growth."

2. **Segments**:
   - **Segment 1**: 
     - Label: "هينج رايلم" (Economic Growth)
     - Value: 4798,9%
   - **Segment 2**: 
     - Label: "هينج رايلم" (Economic Growth)
     - Value: 9450,8%
   - **Segment 3**: 
     - Label: "ضورفلا لامجإ" (Public Sector Performance)
     - Value: 50,8%
   - **Segment 4**: 
     - Label: "عئادولا لامجإ" (Private Sector Performance)
     - Value: 50,8%
   - **Segment 5**: 
     - Label: "ضورفلا ةظفحم لامجإ" (Public Sector Performance)
     - Value: 17,7%
   - **Segment 6**: 
     - Label: "ضورفلا ةظفحم لامجإ" (Public Sector Performance)
     - Value: 1,2%
   - **Segment 7**: 
     - Label: "ضورفلا ةظفحم لامجإ" (Public Sector Per

In [11]:
from time import time

start_time = time()

image_path = "./arabic_cbe.png"
result = processor.process_images(
    image_path,
    prompt="""You are an expert Arabic OCR model who can read and interpret hard images in details
    and in great precision. Given these images extract every detail of it in an organized format,
    include any numbers you see .. page numbers also .. Generate in arabic text only"""
)

end_time = time()
execution_time = end_time - start_time

print(f"Single image result: {result[0]}")
print(f"Execution time: {execution_time:.2f} seconds")

Single image result: فيما يلي تفاصيل الصورة المقدمة:

- **العنوان الرئيسي**: ةيفرصملا تاروطتلا مهأ

- **البيانات الرئيسية**:
  - **4798,9**: هينج رايلم
  - **9450,8**: هينج رايلم
  - **٪50,8**: عئادولا لىإ ضورقلا ةبسن
  - **٪17,7**: ىلع دئاعلا لدعم ةيكمللا قوقح طسوتم فيرصملا زاهجلل
  - **٪1,2**: ىلع دئاعلا لدعم لوصلأا طسوتم
  - **٪2,3**: ضورقلا ةظفحم لامج ةيامحلإا تلايهستلاو تلايهستلاو ضورقلا لىإ ةقطنملا ريغ ضورقلا لامجو تلايهستلاو
  - **٪91,6**: تامصصخم ةبسن لىإ تلايهستلاو ضورقلا تلايهستلاو ضورقلا ةقطنملا ريغ

- **البيانات الإجمالية**: عئادولا لامجإ و ضورقلا لامجإ.
Execution time: 7.39 seconds


# Let's increase the image resolution
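
The next cell raises `min_pixels` and `max_pixels`. To get a feel for how much larger the new budget is, the arithmetic can be spelled out. The 28x28 pixels-per-token figure is an assumption taken from the way the Qwen2.5-VL model card expresses its pixel limits, and `approx_visual_tokens` is a hypothetical helper, so treat the token counts as rough estimates.

```python
# Assumption: one visual token covers roughly a 28x28 pixel area.
PIXELS_PER_TOKEN = 28 * 28

def approx_visual_tokens(pixel_budget: int) -> int:
    """Rough number of visual tokens that fit in a pixel budget."""
    return pixel_budget // PIXELS_PER_TOKEN

budgets = {
    "first run min": 128 * 16 * 16,
    "first run max": 1024 * 16 * 16,
    "second run min": 128 * 25 * 25,
    "second run max": 1500 * 35 * 35,
}

for name, pixels in budgets.items():
    print(f"{name}: {pixels:,} px, about {approx_visual_tokens(pixels)} visual tokens")

ratio = budgets["second run max"] / budgets["first run max"]
print(f"max budget grows by about {ratio:.1f}x")
```

A larger `max_pixels` lets the processor keep more of the original resolution, at the cost of more visual tokens and more GPU memory per image.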

In [1]:
import os
import base64
from typing import List, Union, Dict
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image

class QwenVLProcessor:
    def __init__(
        self,
        model_name: str = "Qwen/Qwen2.5-VL-7B-Instruct",
        device: str = "cuda",
        min_pixels: int = 128*25*25,
        max_pixels: int = 1500*35*35,
        cache_dir: str = None
    ):
        """
        Initialize the QwenVL processor with custom configuration.

        Args:
            model_name: Name or path of the model to load
            device: Device to run the model on ('cuda' or 'cpu')
            min_pixels: Minimum number of pixels for image processing
            max_pixels: Maximum number of pixels for image processing
            cache_dir: Optional directory for caching the downloaded weights
        """
        # Configure CUDA memory allocation
        os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

        # Clear CUDA cache
        if device == "cuda":
            torch.cuda.empty_cache()

        # Load model and assign to self
        self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
            model_name,
            torch_dtype=torch.bfloat16,
            device_map=device,
            attn_implementation="flash_attention_2",  # Changed from use_flash_attention_2
            use_cache=True,
            cache_dir=cache_dir,
        )

        # Load processor and assign to self
        self.processor = AutoProcessor.from_pretrained(
            model_name,
            min_pixels=min_pixels,
            max_pixels=max_pixels,
            use_fast=True
        )

        self.device = device

    def _encode_image(self, image_path: str) -> str:
        """
        Encode a local image file to base64.

        Args:
            image_path: Path to the local image file

        Returns:
            Base64 encoded string of the image
        """
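        import mimetypes  # local import keeps this method self-contained

        # Derive the MIME type from the file extension instead of assuming JPEG
        mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
        with open(image_path, "rb") as image_file:
            encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
        return f"data:{mime_type};base64,{encoded_string}"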

    def prepare_messages(
        self,
        image_paths: Union[str, List[str]],
        prompt: str
    ) -> List[Dict]:
        """
        Prepare messages for the model using local image paths.

        Args:
            image_paths: Single path or list of paths to local images
            prompt: Text prompt to process with the images

        Returns:
            List of formatted messages for the model
        """
        if isinstance(image_paths, str):
            image_paths = [image_paths]

        messages = []
        for path in image_paths:
            encoded_image = self._encode_image(path)
            messages.append({
                "role": "user",
                "content": [
                    {"type": "image", "image": encoded_image},
                    {"type": "text", "text": prompt}
                ]
            })
        return messages

    def process_images(
        self,
        image_paths: Union[str, List[str]],
        prompt: str,
        max_new_tokens: int = 2000,
        temperature: float = 0.1,
        top_p: float = 0.9
    ) -> List[str]:
        """
        Process local images with the given prompt.

        Args:
            image_paths: Single path or list of paths to local images
            prompt: Text prompt to process with the images
            max_new_tokens: Maximum number of tokens to generate
            temperature: Sampling temperature
            top_p: Top-p sampling parameter

        Returns:
            List of decoded responses. All images are sent as turns of a single
            conversation, so the list holds one response.
        """
        messages = self.prepare_messages(image_paths, prompt)

        with torch.inference_mode():
            text = self.processor.apply_chat_template(
                messages,
                tokenize=False,
                add_generation_prompt=True
            )

            image_inputs, video_inputs = process_vision_info(messages)
            inputs = self.processor(
                text=[text],
                images=image_inputs,
                videos=video_inputs,
                padding=True,
                return_tensors="pt"
            )

            inputs = inputs.to(self.device)

            generated_ids = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=temperature,
                top_p=top_p,
                pad_token_id=self.processor.tokenizer.pad_token_id,
                eos_token_id=self.processor.tokenizer.eos_token_id
            )

            generated_ids_trimmed = [
                out_ids[len(in_ids):]
                for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
            ]

            output_text = self.processor.batch_decode(
                generated_ids_trimmed,
                skip_special_tokens=True,
                clean_up_tokenization_spaces=False
            )

        return output_text

if __name__ == "__main__":

    processor = QwenVLProcessor()

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]

In [2]:
from time import time

start_time = time()

image_path = "./arabic_cbe.png"
result = processor.process_images(
    image_path,
    prompt="""You are an expert Arabic OCR model who can read and interpret hard images in details
    and in great precision. Given these images extract every detail of it in an organized format,
    include any numbers you see .. page numbers also .. Generate in arabic text"""
)

end_time = time()
execution_time = end_time - start_time

print(f"Single image result: {result[0]}")
print(f"Execution time: {execution_time:.2f} seconds")

Single image result: الصورة تقدم معلومات عن أهم التطورات المصرفية في مصر، وتشمل البيانات التالية:

1. إجمالي القروض: 4798,9 مليار جنيه.
2. إجمالي الودائع: 9450,8 مليار جنيه.
3. نسبة القروض إلى الودائع: 50,8%.

بالإضافة إلى ذلك، هناك بعض الأرقام الأخرى التي تم تقديمها في الصورة:

- معدل العائد على متوسط حقوق الملكية للجهاز المصرفي: 17,7%.
- معدل العائد على متوسط الأصول: 1,2%.
- إجمالي محفظة القروض والتسهيلات غير المنتظمة إلى إجمالي القروض والتسهيلات: 20,3%.
- نسبة مخصصات القروض والتسهيلات إلى القروض والتسهيلات غير المنتظمة: 91,6%.
Execution time: 6.23 seconds
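
Comparing the two Arabic runs, the low-resolution output appears to contain the right letters in reversed (visual) order, while the higher-resolution run returns them in normal reading order. The sketch below checks that observation on the title string. `restore_logical_order` is a hypothetical helper, and plain reversal is only a rough check: it would also flip digits and Latin text, so it is not a general fix.

```python
def restore_logical_order(text: str) -> str:
    """Reverse a string that was emitted in visual (flipped) order."""
    return text[::-1]

# Title as returned by the low-resolution run.
low_res_title = "ةيفرصملا تاروطتلا مهأ"
# Matching phrase from the higher-resolution run.
high_res_title = "أهم التطورات المصرفية"

restored = restore_logical_order(low_res_title)
print(restored)
print("matches higher-resolution reading:", restored == high_res_title)
```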


In [3]:
from time import time

start_time = time()

image_path = "./pif.png"
result = processor.process_images(
    image_path,
    prompt="""You are an expert OCR model who can read and interpret hard images in details
    and in great precision. Given these images extract every detail of it in an organized format,
    include any numbers you see .. page numbers also"""
)

end_time = time()
execution_time = end_time - start_time

print(f"Single image result: {result[0]}")
print(f"Execution time: {execution_time:.2f} seconds")

Single image result: Certainly! Here is the detailed information extracted from the image:

---

### PIF Vision Realization Program

#### Strategic Review | PIF Vision Realization Program

**PIF Vision Realization Program:**
- **The Public Investment Fund (PIF) is a cornerstone in Saudi Vision 2030, tasked with driving sustainable and transformative economic change.**
- **Focused on strengthening the local economy and building its international asset portfolio, PIF is dedicated to maximizing sustainable returns and fostering economic diversification.**

#### Source of Funding:
- Capital injections from the government
- Government assets transferred to PIF
- Loans and debt instruments
- Retained earnings from investments

#### Direct Objectives:
- Grow the assets of PIF
- Unlock new sectors through PIF
- Build strategic economic partnerships through PIF
- Localize cutting-edge technology and knowledge through PIF

#### Strategic Pillars:
- Launch and grow domestic sectors
- Develop dome

# Always good? No :/

In [4]:
from time import time

start_time = time()

image_path = "./friend.png"
result = processor.process_images(
    image_path,
    prompt="""You are an expert Arabic OCR model who can read and interpret hard images in details
    and in great precision. Given these images extract every detail of it in an organized format,
    include any numbers you see .. Generate in arabic text"""
)

end_time = time()
execution_time = end_time - start_time

print(f"Single image result: {result[0]}")
print(f"Execution time: {execution_time:.2f} seconds")

Single image result: الصداقة مواقف وليس عشرون عاماً
Execution time: 0.45 seconds


## Section 2
## Can I embed an image directly?
## Can the model see?

### 1. VisRAG

### Let's take the PDF and convert it to images

# Now use VisRAG-Ret to retrieve relevant information from the images

### https://huggingface.co/openbmb/VisRAG-Ret
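
Retrieval over page images generally comes down to embedding the query and every page, then ranking pages by similarity. The sketch below shows only that ranking step, with made-up three-dimensional vectors. It does not call VisRAG-Ret, and `cosine`, `rank_pages`, and the toy embeddings are hypothetical stand-ins for whatever the real model returns.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_pages(query_vec, page_vecs, top_k=2):
    """Return the top_k (page, score) pairs, best match first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in page_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

# Made-up embeddings; a real retriever would produce much longer vectors.
page_vecs = {
    "page_1.jpg": [0.9, 0.1, 0.0],
    "page_2.jpg": [0.1, 0.8, 0.1],
    "page_3.jpg": [0.0, 0.2, 0.9],
}
query_vec = [0.2, 0.9, 0.0]

for name, score in rank_pages(query_vec, page_vecs):
    print(f"{name}: {score:.3f}")
```

The top-ranked pages are then the ones worth passing to the vision-language model together with the question.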

#### First, prepare the PDF and convert it to images

In [6]:
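!pip install fitz pymupdf pdf2image 
# Note: the PyPI package `fitz` is unrelated to PyMuPDF and can shadow PyMuPDF's `import fitz`;
# only pdf2image is used below, so `pip install pymupdf pdf2image` would be enough.
# pdf2image needs poppler; in a terminal: sudo apt-get install poppler-utils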

Collecting fitz
  Downloading fitz-0.0.1.dev2-py2.py3-none-any.whl (20 kB)
Collecting pymupdf
  Downloading pymupdf-1.25.3-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (20.0 MB)
Collecting pdf2image
  Downloading pdf2image-1.17.0-py3-none-any.whl (11 kB)
Collecting httplib2
  Downloading httplib2-0.22.0-py3-none-any.whl (96 kB)
Collecting scipy
  Downloading scipy-1.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.6 MB)
Collecting nibabel
  Downloading nibabel-5.3.2-py3-none-any.whl (3.3 MB)

In [7]:
import requests
import os

def download_pdf(url, output_filename):
    """
    Download a PDF file from a URL and save it with the specified filename
    
    Args:
        url (str): URL of the PDF file
        output_filename (str): Name to save the file as
    
    Returns:
        bool: True if successful, False otherwise
    """
    try:
        # Send GET request to the URL (the timeout avoids hanging on a stalled connection)
        response = requests.get(url, stream=True, timeout=30)
        
        # Raise an exception if the request was unsuccessful
        response.raise_for_status()
        
        # Check if the content type is PDF
        content_type = response.headers.get('Content-Type', '')
        if 'application/pdf' not in content_type and '.pdf' not in url:
            print(f"Warning: The content might not be a PDF. Content-Type: {content_type}")
        
        # Save the file
        with open(output_filename, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    file.write(chunk)
        
        print(f"Successfully downloaded: {output_filename}")
        print(f"File size: {os.path.getsize(output_filename) / 1024:.2f} KB")
        return True
    
    except requests.exceptions.RequestException as e:
        print(f"Error downloading the file: {e}")
        return False

# URL of the PDF to download
url = "https://www.pif.gov.sa/-/media/project/pif-corporate/pif-corporate-site/investors/credit-rating/pdf/moodys-rating-report.pdf"

# Output filename
output_filename = "moodys-rating-report.pdf"

# Download the PDF
download_pdf(url, output_filename)

Successfully downloaded: moodys-rating-report.pdf
File size: 132.03 KB


True

In [8]:
import os
from pdf2image import convert_from_path

def convert_pdf_to_jpg(pdf_path: str, output_folder: str, dpi: int = 300) -> list:
    """
    Convert PDF pages to JPG images using pdf2image.
    
    Parameters:
    -----------
    pdf_path : str
        Path to the input PDF file
    output_folder : str
        Path to the folder where JPG images will be saved
    dpi : int, optional
        DPI for rendering (higher means better quality but larger files)
        
    Returns:
    --------
    list
        List of paths to the generated JPG files
    """
    
    # Validate input PDF file
    if not os.path.exists(pdf_path):
        raise FileNotFoundError(f"PDF file not found: {pdf_path}")
    
    # Create output folder if it doesn't exist
    os.makedirs(output_folder, exist_ok=True)
    
    try:
        # Convert PDF to list of images
        images = convert_from_path(pdf_path, dpi=dpi)
        output_files = []
        
        # Save each image
        for i, image in enumerate(images):
            output_path = os.path.join(output_folder, f"page_{i+1}.jpg")
            image.save(output_path, "JPEG")
            output_files.append(output_path)
            print(f"Converted page {i+1} to {output_path}")
            
        return output_files
    
    except Exception as e:
        # Chain the original exception so the underlying traceback is preserved
        raise RuntimeError(f"Error converting PDF: {e}") from e

# Example usage
if __name__ == "__main__":
    try:
        # Convert a sample PDF
        pdf_file = "moodys-rating-report.pdf"
        output_dir = "moodys-rating-report"
        
        # Convert PDF to images (higher DPI for better quality)
        image_files = convert_pdf_to_jpg(pdf_file, output_dir, dpi=400)
        
        print(f"\nSuccessfully converted {len(image_files)} pages")
        print("Output files:", image_files)
        
    except Exception as e:
        print(f"Error: {str(e)}")

Converted page 1 to moodys-rating-report/page_1.jpg
Converted page 2 to moodys-rating-report/page_2.jpg
Converted page 3 to moodys-rating-report/page_3.jpg
Converted page 4 to moodys-rating-report/page_4.jpg
Converted page 5 to moodys-rating-report/page_5.jpg
Converted page 6 to moodys-rating-report/page_6.jpg
Converted page 7 to moodys-rating-report/page_7.jpg
Converted page 8 to moodys-rating-report/page_8.jpg
Converted page 9 to moodys-rating-report/page_9.jpg
Converted page 10 to moodys-rating-report/page_10.jpg
Converted page 11 to moodys-rating-report/page_11.jpg

Successfully converted 11 pages
Output files: ['moodys-rating-report/page_1.jpg', 'moodys-rating-report/page_2.jpg', 'moodys-rating-report/page_3.jpg', 'moodys-rating-report/page_4.jpg', 'moodys-rating-report/page_5.jpg', 'moodys-rating-report/page_6.jpg', 'moodys-rating-report/page_7.jpg', 'moodys-rating-report/page_8.jpg', 'moodys-rating-report/page_9.jpg', 'moodys-rating-report/page_10.jpg', 'moodys-rating-report/pag
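
To get a feel for what the `dpi` argument costs, here is a small sketch of the arithmetic. It assumes a US Letter page (612 x 792 PDF points); the actual page size of the Moody's report is not checked here, and `rendered_size` is a hypothetical helper, not a pdf2image function.

```python
def rendered_size(width_pt, height_pt, dpi):
    """Pixel size of a page rendered at `dpi` (PDF points are 1/72 inch)."""
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


# Assumed page size: US Letter, 612 x 792 points
for dpi in (300, 400):
    w, h = rendered_size(612, 792, dpi)
    print(f"{dpi} dpi -> {w} x {h} px ({w * h / 1e6:.1f} megapixels)")
```

Pixel count grows with the square of the DPI, so going from 300 to 400 multiplies the number of pixels per page by roughly 1.78.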

In [10]:
!pip install SentencePiece timm
# restart required



In [1]:
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F
from PIL import Image
import os
from tqdm import tqdm
import numpy as np
import json
import pickle
import time


class ImageRetriever:
    def __init__(self):
        """Initialize basic attributes without loading the model."""
        self.images = []
        self.image_paths = []
        self.embeddings = None
        self.model = None
        self.tokenizer = None
        
    def _init_model(self, model_name="openbmb/VisRAG-Ret", use_cuda=True):
        """Initialize the model only when needed."""
        if self.model is None:
            self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
            device = 'cuda' if use_cuda and torch.cuda.is_available() else 'cpu'
            self.model = AutoModel.from_pretrained(
                model_name, 
                torch_dtype=torch.bfloat16 if device == 'cuda' else torch.float32,
                trust_remote_code=True
            ).to(device)
            self.model.eval()

    def weighted_mean_pooling(self, hidden, attention_mask):
        """Apply weighted mean pooling to the hidden states."""
        attention_mask_ = attention_mask * attention_mask.cumsum(dim=1)
        s = torch.sum(hidden * attention_mask_.unsqueeze(-1).float(), dim=1)
        d = attention_mask_.sum(dim=1, keepdim=True).float()
        return s / d

    @torch.no_grad()
    def encode(self, text_or_image_list):
        """Encode text queries or images into embeddings."""
        self._init_model()  # Initialize model only if needed
            
        if isinstance(text_or_image_list[0], str):
            inputs = {
                "text": text_or_image_list,
                'image': [None] * len(text_or_image_list),
                'tokenizer': self.tokenizer
            }
        else:
            inputs = {
                "text": [''] * len(text_or_image_list),
                'image': text_or_image_list,
                'tokenizer': self.tokenizer
            }
        
        outputs = self.model(**inputs)
        attention_mask = outputs.attention_mask
        hidden = outputs.last_hidden_state
        
        reps = self.weighted_mean_pooling(hidden, attention_mask)
        embeddings = F.normalize(reps, p=2, dim=1).detach().cpu().numpy()
        return embeddings

    def load_images(self, image_dir, save_dir=None):
        """Load images and embeddings, computing only if necessary."""
        print(f"\nAttempting to load images from directory: {image_dir}")
        print(f"Embeddings directory: {save_dir}")

        if not save_dir:
            print("No save_dir provided, will compute embeddings without saving")
            should_compute = True
        else:
            # Check for existing embeddings
            embeddings_path = os.path.join(save_dir, 'embeddings.pkl')
            paths_file = os.path.join(save_dir, 'image_paths.json')
            
            print(f"Checking for existing embeddings at: {embeddings_path}")
            print(f"Checking for paths file at: {paths_file}")

            if os.path.exists(embeddings_path) and os.path.exists(paths_file):
                try:
                    # Load embeddings and paths
                    print("Found existing embedding files, attempting to load...")
                    with open(embeddings_path, 'rb') as f:
                        self.embeddings = pickle.load(f)
                    with open(paths_file, 'r') as f:
                        self.image_paths = json.load(f)['image_paths']
                    
                    # Verify image paths still exist
                    missing_images = [p for p in self.image_paths if not os.path.exists(p)]
                    if missing_images:
                        print(f"Found {len(missing_images)} missing images, will recompute")
                        should_compute = True
                    else:
                        # Load images
                        print("Loading images from saved paths...")
                        self.images = []
                        for path in self.image_paths:
                            image = Image.open(path).convert('RGB')
                            self.images.append(image)
                        
                        print(f"Successfully loaded {len(self.images)} images and their embeddings")
                        return
                        
                except Exception as e:
                    print(f"Error loading saved embeddings: {e}")
                    print("Will recompute embeddings")
                    should_compute = True
            else:
                print("No existing embedding files found")
                should_compute = True

        # If we get here, we need to compute embeddings
        print("\nComputing new embeddings...")
        supported_formats = {'.jpg', '.jpeg', '.png', '.bmp', '.gif', '.tiff'}
        self.images = []
        self.image_paths = []

        # Load images
        for filename in os.listdir(image_dir):
            if os.path.splitext(filename)[1].lower() in supported_formats:
                image_path = os.path.join(image_dir, filename)
                try:
                    image = Image.open(image_path).convert('RGB')
                    self.images.append(image)
                    self.image_paths.append(image_path)
                except Exception as e:
                    print(f"Error loading {filename}: {str(e)}")

        if not self.images:
            raise ValueError(f"No valid images found in {image_dir}")

        # Compute embeddings
        print(f"Computing embeddings for {len(self.images)} images...")
        self.embeddings = self.encode(self.images)
        
        # Save if requested
        if save_dir:
            os.makedirs(save_dir, exist_ok=True)
            with open(os.path.join(save_dir, 'embeddings.pkl'), 'wb') as f:
                pickle.dump(self.embeddings, f)
            with open(os.path.join(save_dir, 'image_paths.json'), 'w') as f:
                json.dump({'image_paths': self.image_paths}, f)
            print(f"Saved new embeddings to {save_dir}")

    def query(self, question, k=3):
        """Query the images with a question and return top-k most relevant images."""
        if self.embeddings is None:
            raise ValueError("No images loaded. Please load images first using load_images()")
            
        # Prepare and encode query
        query = ["Represent this query for retrieving relevant documents: " + question]
        query_embedding = self.encode(query)
        
        # Get top-k results
        scores = (query_embedding @ self.embeddings.T)[0]
        top_k_indices = np.argsort(scores)[-k:][::-1]
        
        return [
            {
                'image_path': self.image_paths[idx],
                'score': float(scores[idx]),
                'image': self.images[idx]
            }
            for idx in top_k_indices
        ]

def main():
    # Initialize retriever
    start_time = time.time()
    retriever = ImageRetriever()
    
    # Define directories
    image_dir = "moodys-rating-report"  # Replace with your image directory
    embeddings_dir = "embeddings"  # Directory to save/load embeddings
    
    # Load images and compute/load embeddings
    retriever.load_images(image_dir, save_dir=embeddings_dir)
    
    # Example queries
    questions = [
        "How does PIF's investment strategy align with Saudi Arabia's Vision 2030 goals?",
    ]
    
    # Process each query
    for question in questions:
        print(f"\nQuery: {question}")
        results = retriever.query(question, k=10)
        
        # Print results
        for i, result in enumerate(results, 1):
            print(f"\nResult {i}:")
            print(f"Image: {os.path.basename(result['image_path'])}")
            print(f"Score: {result['score']:.4f}")
    
    total_execution_time = time.time() - start_time
    print(" ")
    print(f"time: {total_execution_time:.4f} second")


if __name__ == "__main__":
    main()


Attempting to load images from directory: moodys-rating-report
Embeddings directory: embeddings
Checking for existing embeddings at: embeddings/embeddings.pkl
Checking for paths file at: embeddings/image_paths.json
No existing embedding files found

Computing new embeddings...
Computing embeddings for 11 images...



A new version of the following files was downloaded from https://huggingface.co/openbmb/VisRAG-Ret:
- tokenizer.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



MiniCPMForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.



Saved new embeddings to embeddings

Query: How does PIF's investment strategy align with Saudi Arabia's Vision 2030 goals?

Result 1:
Image: page_3.jpg
Score: 0.3717

Result 2:
Image: page_4.jpg
Score: 0.3493

Result 3:
Image: page_2.jpg
Score: 0.3273

Result 4:
Image: page_1.jpg
Score: 0.3074

Result 5:
Image: page_5.jpg
Score: 0.2841

Result 6:
Image: page_6.jpg
Score: 0.2780

Result 7:
Image: page_8.jpg
Score: 0.2461

Result 8:
Image: page_7.jpg
Score: 0.1820

Result 9:
Image: page_9.jpg
Score: 0.1339

Result 10:
Image: page_10.jpg
Score: 0.0859
 
time: 88.5887 second
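
The scores above come from two numerical steps in `ImageRetriever`: position-weighted mean pooling of the hidden states, then a dot product between L2-normalised vectors (which equals cosine similarity). The NumPy sketch below reproduces both steps on toy arrays so the mechanics are easier to follow. It is only an illustration under assumed shapes; the function names are hypothetical and no model is loaded.

```python
import numpy as np


def weighted_mean_pool(hidden, mask):
    """hidden: (batch, seq, dim); mask: (batch, seq) with 1 for real tokens."""
    # Weights are 1, 2, 3, ... over real tokens and 0 on padding,
    # so later tokens contribute more to the pooled vector.
    weights = mask * np.cumsum(mask, axis=1)
    summed = (hidden * weights[..., None]).sum(axis=1)
    return summed / weights.sum(axis=1, keepdims=True)


def l2_normalize(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def top_k(query_vec, doc_vecs, k):
    """Return (index, score) pairs for the k highest dot-product scores."""
    scores = doc_vecs @ query_vec
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy data: one sequence of three tokens, the last one is padding
hidden = np.array([[[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(weighted_mean_pool(hidden, mask))

# Toy "page" embeddings and one query embedding
docs = l2_normalize(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
query = l2_normalize(np.array([[1.0, 0.2]]))[0]
print(top_k(query, docs, k=2))
```

Because both sides are unit length, every score lies in [-1, 1], which is why the page scores in the output are directly comparable with each other.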


## Check whether it matches Claude 3.7
https://claude.ai/share/cbc8e34d-11d8-40ec-9a09-9b39f634cc0b 

# Now let's try ColQwen2.5 for retrieval

https://github.com/illuin-tech/colpali?tab=readme-ov-file

In [2]:
!pip install colpali-engine 

Collecting colpali-engine
  Downloading colpali_engine-0.3.8-py3-none-any.whl (48 kB)
Collecting gputil
  Downloading GPUtil-1.4.0.tar.gz (5.5 kB)
  Preparing metadata (setup.py) ... done
Collecting peft<0.15.0,>=0.14.0
  Downloading peft-0.14.0-py3-none-any.whl (374 kB)
Collecting transformers<4.48.0,>=4.47.0
  Downloading transformers-4.47.1-py3-none-any.whl (10.1 MB)
Building wheels for collected packages: gputil
  Building wheel for gputil (setup.py) ... done
  Created wheel for gputil: filename=GPUtil-1.4.0-py3-none-any.whl size=7410 sha256=bb9eeac36c803cf641f6cbd

In [3]:
!pip install git+https://github.com/illuin-tech/colpali 

Collecting git+https://github.com/illuin-tech/colpali
  Cloning https://github.com/illuin-tech/colpali to /tmp/pip-req-build-0ilt_7z0
  Running command git clone --filter=blob:none --quiet https://github.com/illuin-tech/colpali /tmp/pip-req-build-0ilt_7z0
  Resolved https://github.com/illuin-tech/colpali to commit 71af1932ab57c79dc6ad4b4e2f8e8339754d4bc9
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting transformers<4.50.0,>=4.49.0
  Downloading transformers-4.49.0-py3-none-any.whl (10.0 MB)
Building wheels for collected packages: colpali_engine
  Building wheel for colpali_engine (pyproject.toml) ... done
  Created wheel for colpali_engine: filename=colpali_engine-0.3.9.dev35+g71af193-py3-none-any.whl size=54073 sha256=87c23204d42a5564d

## This needs a base model because it's an adapter

An adapter is a small set of trainable components added to a pre-trained model by a lightweight fine-tuning technique. The original model weights are left unmodified, so the adapter only works when it is loaded on top of its base model.

### Let's talk about adapters for a couple of minutes
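
As a concrete picture, here is a minimal NumPy sketch of one common adapter style, a LoRA-like low-rank update. This is an assumption for illustration: the notebook does not state which adapter type the ColQwen2.5 checkpoint uses, and the shapes, names, and initialisation below are made up rather than taken from `peft`.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 6, 2  # toy sizes chosen for illustration

W = rng.normal(size=(d_in, d_out))         # frozen base weights (never updated)
A = rng.normal(size=(d_in, rank)) * 0.01   # trainable adapter matrix
B = np.zeros((rank, d_out))                # trainable, zero-initialised


def lora_forward(x, W, A, B, scale=1.0):
    """Base projection plus a low-rank correction learned by the adapter."""
    return x @ W + scale * (x @ A @ B)


x = rng.normal(size=(3, d_in))
y = lora_forward(x, W, A, B)

print("base parameters:   ", W.size)
print("adapter parameters:", A.size + B.size)
print("matches base output before training:", np.allclose(y, x @ W))
```

The adapter holds only `A` and `B`, so on its own it cannot produce an output; it needs `W` from the base model, which is the point of the heading above.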

In [15]:
!pip install transformers==4.49.0 peft==0.14.0  #-> restart kernel
# !pip install transformers==4.34.0 
# !pip uninstall transformers -y


Collecting transformers==4.49.0
  Using cached transformers-4.49.0-py3-none-any.whl (10.0 MB)
Collecting tokenizers<0.22,>=0.21
  Using cached tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Collecting huggingface-hub<1.0,>=0.26.0
  Downloading huggingface_hub-0.29.2-py3-none-any.whl (468 kB)
Installing collected packages: huggingface-hub, tokenizers, transformers
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.17.3
    Uninstalling huggingface-hub-0.17.3:
      Successfully uninstalled huggingface-hub-0.17.3
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.14.1
    Uninstalling tokenizers-0.14.1:
      Successfully uninstalled tokenizers-0.14.1
  Attempting uninstall: transformers
    Found existing installation: transformers 4.34.0
    Un

https://www.pif.gov.sa/-/media/project/pif-corporate/pif-corporate-site/our-financials/annual-reports/pdf/pif-2023-annual-report-ar.pdf -> Saudi PIF 2023 annual report (Arabic)

In [8]:
import requests
import os
from tqdm import tqdm

def download_pdf(url, output_filename):
    """
    Download a PDF file from a URL and save it with the specified filename
    
    Args:
        url (str): URL of the PDF file
        output_filename (str): Name to save the file as
    
    Returns:
        bool: True if successful, False otherwise
    """
    try:
        # Send GET request to the URL (the timeout stops a stalled connection from hanging forever)
        response = requests.get(url, stream=True, timeout=30)
        
        # Raise an exception if the request was unsuccessful
        response.raise_for_status()
        
        # Check if the content type is PDF
        content_type = response.headers.get('Content-Type', '')
        if 'application/pdf' not in content_type and '.pdf' not in url:
            print(f"Warning: The content might not be a PDF. Content-Type: {content_type}")
        
        # Get the total file size if available (0 means the server did not report it)
        total_size = int(response.headers.get('content-length', 0))
        
        # Save the file, updating the progress bar per chunk. The context
        # managers close both the file and the bar even if a write fails.
        with open(output_filename, 'wb') as file, tqdm(
            total=total_size or None, unit='B', unit_scale=True, desc=output_filename
        ) as progress_bar:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    file.write(chunk)
                    progress_bar.update(len(chunk))
        
        print(f"Successfully downloaded: {output_filename}")
        print(f"File size: {os.path.getsize(output_filename) / 1024:.2f} KB")
        return True
    
    except requests.exceptions.RequestException as e:
        print(f"Error downloading the file: {e}")
        return False

# URL of the PDF to download
url = "https://www.pif.gov.sa/-/media/project/pif-corporate/pif-corporate-site/our-financials/annual-reports/pdf/pif-2023-annual-report-ar.pdf"
# Output filename
output_filename = "pif_ar.pdf"
# Download the PDF
download_pdf(url, output_filename)

pif_ar.pdf: 100%|██████████| 19.4M/19.4M [00:22<00:00, 862kB/s]

Successfully downloaded: pif_ar.pdf
File size: 18908.05 KB





True

In [9]:
import os
import concurrent.futures
from pdf2image import convert_from_path, pdfinfo_from_path
from tqdm import tqdm

def convert_pdf_to_jpg(pdf_path: str, output_folder: str, dpi: int = 400, 
                       threads: int = 4, batch_size: int = 10) -> list:
    """
    Convert PDF pages to JPG images using pdf2image with parallel processing.
    
    Parameters:
    -----------
    pdf_path : str
        Path to the input PDF file
    output_folder : str
        Path to the folder where JPG images will be saved
    dpi : int, optional
        DPI for rendering (higher means better quality but larger files)
    threads : int, optional
        Number of worker threads to use for parallel processing
    batch_size : int, optional
        Number of pages to process in each batch
        
    Returns:
    --------
    list
        List of paths to the generated JPG files
    """
    
    # Validate input PDF file
    if not os.path.exists(pdf_path):
        raise FileNotFoundError(f"PDF file not found: {pdf_path}")
    
    # Create output folder if it doesn't exist
    os.makedirs(output_folder, exist_ok=True)
    
    try:
        # Read the page count from the PDF metadata instead of rendering
        # every page once just to count them
        num_pages = pdfinfo_from_path(pdf_path)["Pages"]
        print(f"PDF has {num_pages} pages. Starting conversion...")
        
        output_files = []
        
        # Define a function to convert a batch of pages
        def convert_batch(batch):
            start_page, end_page = batch
            batch_images = convert_from_path(
                pdf_path,
                dpi=dpi,
                first_page=start_page,
                last_page=end_page,
                thread_count=1  # Use 1 thread per worker as we're already parallelizing
            )
            
            batch_output_files = []
            for i, image in enumerate(batch_images):
                page_num = start_page + i
                output_path = os.path.join(output_folder, f"page_{page_num}.jpg")
                image.save(output_path, "JPEG")
                batch_output_files.append(output_path)
            
            return batch_output_files
        
        # Create batches
        batches = []
        for i in range(1, num_pages + 1, batch_size):
            batches.append((i, min(i + batch_size - 1, num_pages)))
        
        # Process batches in parallel
        with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as executor:
            # Submit all batches to the executor
            future_to_batch = {executor.submit(convert_batch, batch): batch for batch in batches}
            
            # Process results as they complete with progress bar
            with tqdm(total=len(batches), desc="Converting PDF pages") as pbar:
                for future in concurrent.futures.as_completed(future_to_batch):
                    batch_files = future.result()
                    output_files.extend(batch_files)
                    pbar.update(1)
        
        # Sort output files by page number
        output_files.sort(key=lambda x: int(os.path.basename(x).split('_')[1].split('.')[0]))
        
        return output_files
    
    except Exception as e:
        # Chain the original exception so the underlying traceback is preserved
        raise RuntimeError(f"Error converting PDF: {e}") from e

# Example usage
if __name__ == "__main__":
    try:
        # Convert a sample PDF
        pdf_file = "pif_ar.pdf"
        output_dir = "pif_ar"
        
        # Convert PDF to images with parallel processing
        image_files = convert_pdf_to_jpg(
            pdf_file, 
            output_dir, 
            dpi=400,
            threads=os.cpu_count(),  # Use all available CPU cores
            batch_size=5            # Process 5 pages at a time
        )
        
        print(f"\nSuccessfully converted {len(image_files)} pages")
        
    except Exception as e:
        print(f"Error: {str(e)}")

PDF has 80 pages. Starting conversion...


Converting PDF pages: 100%|██████████| 16/16 [00:51<00:00,  3.23s/it]



Successfully converted 80 pages
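
The progress bar counts batches, not pages: 80 pages with `batch_size=5` gives the 16 steps shown above. The batching logic can be sketched on its own as follows; `page_batches` is a hypothetical helper that mirrors the loop inside `convert_pdf_to_jpg`, using the same 1-indexed, inclusive page ranges that `first_page` and `last_page` expect.

```python
def page_batches(num_pages, batch_size):
    """Split pages 1..num_pages into inclusive (first_page, last_page) ranges."""
    return [
        (start, min(start + batch_size - 1, num_pages))
        for start in range(1, num_pages + 1, batch_size)
    ]


batches = page_batches(80, 5)
print(len(batches), batches[0], batches[-1])

# The last batch is shorter when the page count is not a multiple of the batch size
print(page_batches(11, 5))
```

Smaller batches spread work more evenly across threads, while larger ones reduce the number of separate `convert_from_path` calls.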


### VisRAG-Ret with Arabic: will it perform well?

In [10]:
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F
from PIL import Image
import os
from tqdm import tqdm
import numpy as np
import json
import pickle
import time


class ImageRetriever:
    def __init__(self):
        """Initialize basic attributes without loading the model."""
        self.images = []
        self.image_paths = []
        self.embeddings = None
        self.model = None
        self.tokenizer = None
        
    def _init_model(self, model_name="openbmb/VisRAG-Ret", use_cuda=True):
        """Initialize the model only when needed."""
        if self.model is None:
            self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
            device = 'cuda' if use_cuda and torch.cuda.is_available() else 'cpu'
            self.model = AutoModel.from_pretrained(
                model_name, 
                torch_dtype=torch.bfloat16 if device == 'cuda' else torch.float32,
                trust_remote_code=True
            ).to(device)
            self.model.eval()

    def weighted_mean_pooling(self, hidden, attention_mask):
        """Apply weighted mean pooling to the hidden states."""
        attention_mask_ = attention_mask * attention_mask.cumsum(dim=1)
        s = torch.sum(hidden * attention_mask_.unsqueeze(-1).float(), dim=1)
        d = attention_mask_.sum(dim=1, keepdim=True).float()
        return s / d

    @torch.no_grad()
    def encode(self, text_or_image_list):
        """Encode text queries or images into embeddings."""
        self._init_model()  # Initialize model only if needed
            
        if isinstance(text_or_image_list[0], str):
            inputs = {
                "text": text_or_image_list,
                'image': [None] * len(text_or_image_list),
                'tokenizer': self.tokenizer
            }
        else:
            inputs = {
                "text": [''] * len(text_or_image_list),
                'image': text_or_image_list,
                'tokenizer': self.tokenizer
            }
        
        outputs = self.model(**inputs)
        attention_mask = outputs.attention_mask
        hidden = outputs.last_hidden_state
        
        reps = self.weighted_mean_pooling(hidden, attention_mask)
        embeddings = F.normalize(reps, p=2, dim=1).detach().cpu().numpy()
        return embeddings

    def load_images(self, image_dir, save_dir=None):
        """Load images and embeddings, computing only if necessary."""
        print(f"\nAttempting to load images from directory: {image_dir}")
        print(f"Embeddings directory: {save_dir}")

        if not save_dir:
            print("No save_dir provided, will compute embeddings without saving")
            should_compute = True
        else:
            # Check for existing embeddings
            embeddings_path = os.path.join(save_dir, 'embeddings.pkl')
            paths_file = os.path.join(save_dir, 'image_paths.json')
            
            print(f"Checking for existing embeddings at: {embeddings_path}")
            print(f"Checking for paths file at: {paths_file}")

            if os.path.exists(embeddings_path) and os.path.exists(paths_file):
                try:
                    # Load embeddings and paths
                    print("Found existing embedding files, attempting to load...")
                    with open(embeddings_path, 'rb') as f:
                        self.embeddings = pickle.load(f)
                    with open(paths_file, 'r') as f:
                        self.image_paths = json.load(f)['image_paths']
                    
                    # Verify image paths still exist
                    missing_images = [p for p in self.image_paths if not os.path.exists(p)]
                    if missing_images:
                        print(f"Found {len(missing_images)} missing images, will recompute")
                        should_compute = True
                    else:
                        # Load images
                        print("Loading images from saved paths...")
                        self.images = []
                        for path in self.image_paths:
                            image = Image.open(path).convert('RGB')
                            self.images.append(image)
                        
                        print(f"Successfully loaded {len(self.images)} images and their embeddings")
                        return
                        
                except Exception as e:
                    print(f"Error loading saved embeddings: {e}")
                    print("Will recompute embeddings")
                    should_compute = True
            else:
                print("No existing embedding files found")
                should_compute = True

        # If we get here, we need to compute embeddings
        print("\nComputing new embeddings...")
        supported_formats = {'.jpg', '.jpeg', '.png', '.bmp', '.gif', '.tiff'}
        self.images = []
        self.image_paths = []

        # Load images
        for filename in os.listdir(image_dir):
            if os.path.splitext(filename)[1].lower() in supported_formats:
                image_path = os.path.join(image_dir, filename)
                try:
                    image = Image.open(image_path).convert('RGB')
                    self.images.append(image)
                    self.image_paths.append(image_path)
                except Exception as e:
                    print(f"Error loading {filename}: {str(e)}")

        if not self.images:
            raise ValueError(f"No valid images found in {image_dir}")

        # Compute embeddings
        print(f"Computing embeddings for {len(self.images)} images...")
        self.embeddings = self.encode(self.images)
        
        # Save if requested
        if save_dir:
            os.makedirs(save_dir, exist_ok=True)
            with open(os.path.join(save_dir, 'embeddings.pkl'), 'wb') as f:
                pickle.dump(self.embeddings, f)
            with open(os.path.join(save_dir, 'image_paths.json'), 'w') as f:
                json.dump({'image_paths': self.image_paths}, f)
            print(f"Saved new embeddings to {save_dir}")

    def query(self, question, k=3):
        """Query the images with a question and return top-k most relevant images."""
        if self.embeddings is None:
            raise ValueError("No images loaded. Please load images first using load_images()")
            
        # Prepare and encode query
        query = ["Represent this query for retrieving relevant documents: " + question]
        query_embedding = self.encode(query)
        
        # Get top-k results
        scores = (query_embedding @ self.embeddings.T)[0]
        top_k_indices = np.argsort(scores)[-k:][::-1]
        
        return [
            {
                'image_path': self.image_paths[idx],
                'score': float(scores[idx]),
                'image': self.images[idx]
            }
            for idx in top_k_indices
        ]

def main():
    # Initialize retriever
    start_time = time.time()
    retriever = ImageRetriever()
    
    # Define directories
    image_dir = "pif_ar"  # Replace with your image directory
    embeddings_dir = "embeddings"  # Directory to save/load embeddings
    
    # Load images and compute/load embeddings
    retriever.load_images(image_dir, save_dir=embeddings_dir)
    
    # Example queries
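    questions = [
        # English: "What are the Public Investment Fund's total assets under management as of 2023?"
        "ما هو إجمالي الأصول المدارة لصندوق الاستثمارات العامة حتى عام 2023؟",
        # English: "What is the Public Investment Fund's total shareholder return since the start of the Vision Realization Program?"
        "ما هو إجمالي عائد المساهمين لصندوق الاستثمارات العامة منذ بداية برنامج تحقيق الرؤية؟",
    ]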
    
    # Process each query
    for question in questions:
        print(f"\nQuery: {question}")
        results = retriever.query(question, k=10)
        
        # Print results
        for i, result in enumerate(results, 1):
            print(f"\nResult {i}:")
            print(f"Image: {os.path.basename(result['image_path'])}")
            print(f"Score: {result['score']:.4f}")
    
    total_execution_time = time.time() - start_time
    print(" ")
    print(f"time: {total_execution_time:.4f} second")

if __name__ == "__main__":
    main()


Attempting to load images from directory: pif_ar
Embeddings directory: embeddings
Checking for existing embeddings at: embeddings/embeddings.pkl
Checking for paths file at: embeddings/image_paths.json
Found existing embedding files, attempting to load...
Loading images from saved paths...
Successfully loaded 11 images and their embeddings

Query: ما هو إجمالي الأصول المدارة لصندوق الاستثمارات العامة حتى عام 2023؟



Result 1:
Image: page_1.jpg
Score: 0.2109

Result 2:
Image: page_4.jpg
Score: 0.2034

Result 3:
Image: page_3.jpg
Score: 0.1851

Result 4:
Image: page_8.jpg
Score: 0.1705

Result 5:
Image: page_2.jpg
Score: 0.1602

Result 6:
Image: page_6.jpg
Score: 0.1592

Result 7:
Image: page_5.jpg
Score: 0.1428

Result 8:
Image: page_10.jpg
Score: 0.1279

Result 9:
Image: page_7.jpg
Score: 0.1016

Result 10:
Image: page_9.jpg
Score: 0.1011

Query: ما هو إجمالي عائد المساهمين لصندوق الاستثمارات العامة منذ بداية برنامج تحقيق الرؤية؟

Result 1:
Image: page_1.jpg
Score: 0.2240

Result 2:
Image: page_4.jpg
Score: 0.2166

Result 3:
Image: page_3.jpg
Score: 0.2145

Result 4:
Image: page_2.jpg
Score: 0.1876

Result 5:
Image: page_5.jpg
Score: 0.1836

Result 6:
Image: page_8.jpg
Score: 0.1806

Result 7:
Image: page_6.jpg
Score: 0.1601

Result 8:
Image: page_10.jpg
Score: 0.1134

Result 9:
Image: page_9.jpg
Score: 0.1087

Result 10:
Image: page_7.jpg
Score: 0.1013
 
time: 13.8202 second
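
The `weighted_mean_pooling` method above weights each token by its position among the non-padding tokens (`attention_mask * attention_mask.cumsum(dim=1)`), so later tokens contribute more to the pooled vector. Here is a minimal pure-Python sketch of the same arithmetic for a single sequence. The helper name `weighted_mean_pool` and the toy values are illustrative assumptions, not part of the retriever class.

```python
from itertools import accumulate

def weighted_mean_pool(hidden, mask):
    """Position-weighted mean over one sequence.

    hidden: list of token vectors (lists of floats)
    mask:   list of 0/1 ints, 1 for real tokens and 0 for padding
    """
    # weight_i = mask_i * (number of real tokens up to and including i)
    weights = [m * c for m, c in zip(mask, accumulate(mask))]
    total = sum(weights)
    dim = len(hidden[0])
    return [
        sum(w * token[d] for w, token in zip(weights, hidden)) / total
        for d in range(dim)
    ]

# Toy input: three real tokens followed by one padding token.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

# The weights are [1, 2, 3, 0], so the padding token is ignored entirely.
pooled = weighted_mean_pool(hidden, mask)
print(pooled)
```

The padded position receives weight zero, which is why the pooled vector does not depend on whatever values sit in the padding slots.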


# Now let's try ColQwen2 v0.1

In [None]:
from transformers.utils.import_utils import is_flash_attn_2_available
from colpali_engine.models import ColQwen2, ColQwen2Processor
import torch
import os
from PIL import Image
import time

def process_image_directory(image_dir, queries, base_model="vidore/colqwen2-base", adapter_model="vidore/colqwen2-v0.1"):
    """
    Process all images in a directory with the ColQwen2 model.
    
    Args:
        image_dir (str): Path to directory containing images
        queries (list): List of text queries to score against the images
        base_model (str): HuggingFace model identifier for the base model
        adapter_model (str): HuggingFace model identifier for the adapter
        
    Returns:
        dict: Results with scores for each query-image pair
    """
    # Load base model and adapter
    print(f"Loading base model from {base_model}...")
    model = ColQwen2.from_pretrained(
        base_model,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        attn_implementation="flash_attention_2" if is_flash_attn_2_available() else None,
    ).eval()
    
    # Load adapter
    print(f"Loading adapter from {adapter_model}...")
    try:
        model.load_adapter(adapter_model)
        print("Adapter loaded successfully")
    except Exception as e:
        print(f"Warning: Could not load adapter: {e}")
        print("Continuing with base model only...")
    
    processor = ColQwen2Processor.from_pretrained(base_model)
    print("Model and processor loaded successfully")
    
    # Collect all valid images from directory
    images = []
    image_paths = []
    valid_extensions = ['.jpg', '.jpeg', '.png', '.bmp', '.webp']
    
    print(f"Loading images from {image_dir}...")
    for filename in os.listdir(image_dir):
        file_path = os.path.join(image_dir, filename)
        file_ext = os.path.splitext(filename)[1].lower()
        
        if os.path.isfile(file_path) and file_ext in valid_extensions:
            try:
                img = Image.open(file_path).convert('RGB')
                images.append(img)
                image_paths.append(filename)
                print(f"Loaded image: {filename}")
            except Exception as e:
                print(f"Error loading {filename}: {e}")
    
    if not images:
        print("No valid images found in directory")
        return None
    
    print(f"Processing {len(images)} images with {len(queries)} queries...")
    
    print("Processing queries...")
    query_embeddings = []
    
    for i, query in enumerate(queries):
        try:
            batch_queries = processor.process_queries([query])
            batch_queries = {k: v.to(model.device) for k, v in batch_queries.items()}
            
            with torch.no_grad():
                embedding = model(**batch_queries)
                query_embeddings.append(embedding)
            
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
                
        except Exception as e:
            print(f"  Error processing query: {e}")
            return None
    
    # Process images one at a time
    print("Processing images...")
    image_embeddings = []
    successful_images = []
    
    for i, (img, img_path) in enumerate(zip(images, image_paths)):
        # print(f"  Processing image {i+1}/{len(images)}: {img_path}")
        try:
            # Process a single image
            batch_images = processor.process_images([img])
            batch_images = {k: v.to(model.device) for k, v in batch_images.items()}
            
            with torch.no_grad():
                embedding = model(**batch_images)
                image_embeddings.append(embedding)
                successful_images.append(img_path)
            
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
                
        except Exception as e:
            print(f"  Error processing image {img_path}: {e}")
    
    if not image_embeddings:
        print("No image embeddings were processed successfully.")
        return None
    
    # Calculate similarity scores
    print("Calculating similarity scores...")
    results = {}
    
    for i, query in enumerate(queries):
        query_scores = {}
        query_emb = query_embeddings[i]
        
        for j, img_path in enumerate(successful_images):
            img_emb = image_embeddings[j]
            
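            # Late-interaction (MaxSim) score between the query's token embeddings and the
            # page's patch embeddings, via the processor's score_multi_vector (not a single cosine similarity)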
            scores = processor.score_multi_vector(query_emb, img_emb)
            
            # Get the score (should be a single value)
            score = scores[0, 0].item()
            query_scores[img_path] = score
        
        results[query] = query_scores
    
    return results

if __name__ == "__main__":
    # Set your image directory and queries
    image_directory = "pif_ar"
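    queries = [
        # English: "What are the Public Investment Fund's total assets under management as of 2023?"
        "ما هو إجمالي الأصول المدارة لصندوق الاستثمارات العامة حتى عام 2023؟",
        # English: "What is the Public Investment Fund's total shareholder return since the start of the Vision Realization Program?"
        "ما هو إجمالي عائد المساهمين لصندوق الاستثمارات العامة منذ بداية برنامج تحقيق الرؤية؟",
    ]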
    
    start_time = time.time()
    # Process the directory
    results = process_image_directory(image_directory, queries)
    
    # Display results
    if results:
        for query, scores in results.items():
            print(f"\nQuery: {query}")
            # Sort images by score (highest first)
            sorted_scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)
            for img_name, score in sorted_scores:
                print(f"  {img_name}: {score:.4f}")
    else:
        print("No results were obtained.")

    total_execution_time = time.time() - start_time
    print(f"\nScript completed in {total_execution_time:.2f} seconds")

Loading base model from vidore/colqwen2-base...


Loading adapter from vidore/colqwen2-v0.1...
Adapter loaded successfully
Model and processor loaded successfully
Loading images from pif_ar...
Loaded image: page_76.jpg
Loaded image: page_77.jpg
Loaded image: page_78.jpg
Loaded image: page_79.jpg
Loaded image: page_80.jpg
Loaded image: page_56.jpg
Loaded image: page_57.jpg
Loaded image: page_58.jpg
Loaded image: page_59.jpg
Loaded image: page_60.jpg
Loaded image: page_1.jpg
Loaded image: page_2.jpg
Loaded image: page_41.jpg
Loaded image: page_3.jpg
Loaded image: page_42.jpg
Loaded image: page_4.jpg
Loaded image: page_43.jpg
Loaded image: page_5.jpg
Loaded image: page_44.jpg
Loaded image: page_71.jpg
Loaded image: page_45.jpg
Loaded image: page_72.jpg
Loaded image: page_73.jpg
Loaded image: page_74.jpg
Loaded image: page_75.jpg
Loaded image: page_21.jpg
Loaded image: page_66.jpg
Loaded image: page_22.jpg
Loaded image: page_67.jpg
Loaded image: page_23.jpg
Loaded image: page_68.jpg
Loaded image: page_24.jpg
Loaded image: page_69.jpg
Load
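
ColQwen2 returns one embedding per query token and one per image patch, rather than a single vector per input. To my understanding, `processor.score_multi_vector` scores a query against a page with late interaction (MaxSim): for each query token it takes the best dot product over all page vectors, then sums those maxima. The pure-Python sketch below illustrates that idea on toy vectors. The function name `maxsim_score` and the numbers are assumptions for illustration, not the library's implementation.

```python
def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: sum over query tokens of the max dot product."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

# Toy multi-vector embeddings (2 query tokens, 2 patches per page).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.5, 0.5]]
page_b = [[0.0, 1.0], [0.0, 0.25]]

score_a = maxsim_score(query, page_a)
score_b = maxsim_score(query, page_b)
print(score_a, score_b)
```

Because each query token is matched to its best patch independently, a page can score well when different regions of it answer different parts of the question.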

# Let's combine Qwen2.5-VL with retrieval

In [7]:
import os
import torch
from PIL import Image
import time
from typing import List
import base64
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

class QwenVLProcessor:
    def __init__(
        self,
        model_name: str = "Qwen/Qwen2.5-VL-7B-Instruct",
        device: str = "cuda",
        min_pixels: int = 128*16*16,
        max_pixels: int = 1024*16*16,
        cache_dir: str = None
    ):
        """
        Initialize the QwenVL processor with custom configuration.
        """
        # Configure CUDA memory allocation
        os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

        # Clear CUDA cache
        if device == "cuda":
            torch.cuda.empty_cache()

        print(f"Loading QwenVL model from {model_name}...")
        # Load model and assign to self
        self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
            model_name,
            torch_dtype=torch.bfloat16,
            device_map=device,
            attn_implementation="flash_attention_2",
            use_cache=True,
            cache_dir=cache_dir,
        )

        # Load processor and assign to self
        self.processor = AutoProcessor.from_pretrained(
            model_name,
            min_pixels=min_pixels,
            max_pixels=max_pixels,
            use_fast=True
        )

        self.device = device
        print("QwenVL model and processor loaded successfully")

    def _encode_image(self, image_path: str) -> str:
        """
        Encode a local image file to base64.
        """
        with open(image_path, "rb") as image_file:
            encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
        return f"data:image/jpeg;base64,{encoded_string}"

    def prepare_messages(
        self,
        image_paths: List[str],
        prompt: str
    ) -> List[dict]:
        """
        Prepare messages for the model using local image paths.
        """
        if isinstance(image_paths, str):
            image_paths = [image_paths]

        messages = []
        for path in image_paths:
            encoded_image = self._encode_image(path)
            messages.append({
                "role": "user",
                "content": [
                    {"type": "image", "image": encoded_image},
                    {"type": "text", "text": prompt}
                ]
            })
        return messages

    def process_images(
        self,
        image_paths: List[str],
        prompt: str,
        max_new_tokens: int = 2000,
        temperature: float = 0.1,
        top_p: float = 0.9
    ) -> List[str]:
        """
        Process local images with the given prompt.
        """
        if isinstance(image_paths, str):
            image_paths = [image_paths]
            
        results = []
        
        # Process one image at a time to avoid memory issues
        for image_path in image_paths:
            print(f"Processing image: {os.path.basename(image_path)}")
            messages = self.prepare_messages(image_path, prompt)

            with torch.inference_mode():
                text = self.processor.apply_chat_template(
                    messages,
                    tokenize=False,
                    add_generation_prompt=True
                )

                image_inputs, video_inputs = process_vision_info(messages)
                inputs = self.processor(
                    text=[text],
                    images=image_inputs,
                    videos=video_inputs,
                    padding=True,
                    return_tensors="pt"
                )

                inputs = inputs.to(self.device)

                generated_ids = self.model.generate(
                    **inputs,
                    max_new_tokens=max_new_tokens,
                    do_sample=True,
                    temperature=temperature,
                    top_p=top_p,
                    pad_token_id=self.processor.tokenizer.pad_token_id,
                    eos_token_id=self.processor.tokenizer.eos_token_id
                )

                generated_ids_trimmed = [
                    out_ids[len(in_ids):]
                    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
                ]

                output_text = self.processor.batch_decode(
                    generated_ids_trimmed,
                    skip_special_tokens=True,
                    clean_up_tokenization_spaces=False
                )
                
                results.append(output_text[0])
                
                # Clear cache after each image
                if self.device == "cuda":
                    torch.cuda.empty_cache()

        return results

def main():
    start_time = time.time()
    
    # Set your image directory
    image_directory = "pif_ar"  # Base directory containing images
    
    
    image_filenames = [
    "page_23.jpg",
    "page_22.jpg",
    ]
    
    # Generate full paths
    image_paths = [os.path.join(image_directory, filename) for filename in image_filenames]
    
    # Verify images exist
    valid_image_paths = []
    for path in image_paths:
        if os.path.isfile(path):
            valid_image_paths.append(path)
        else:
            print(f"Warning: Image not found: {path}")
    
    if not valid_image_paths:
        print("No valid images found.")
        return
    
    print(f"Found {len(valid_image_paths)} images to process with QwenVL")
    
    # Initialize QwenVL processor
    processor = QwenVLProcessor()
    
    # OCR prompt
    ocr_prompt = """You are an expert OCR model who can read and interpret hard images in details
                   and in great precision. Given these images extract every detail of text in an organized format.
                   Include all text visible in the image, preserving the structure where possible."""
    
    # Process images with QwenVL for OCR
    results = processor.process_images(valid_image_paths, prompt=ocr_prompt)
    
    # Print results
    print("\n===== OCR RESULTS =====")
    for i, (image_path, ocr_text) in enumerate(zip(valid_image_paths, results)):
        print(f"\nImage {i+1}: {os.path.basename(image_path)}")
        print("-" * 40)
        print(ocr_text)
        print("-" * 40)
    
    # Save results to file
    with open("ocr_results.txt", "w", encoding="utf-8") as f:
        for i, (image_path, ocr_text) in enumerate(zip(valid_image_paths, results)):
            f.write(f"\nImage {i+1}: {os.path.basename(image_path)}\n")
            f.write("-" * 40 + "\n")
            f.write(ocr_text + "\n")
            f.write("-" * 40 + "\n")
    
    print(f"\nResults saved to ocr_results.txt")
    
    total_execution_time = time.time() - start_time
    print(f"\nScript completed in {total_execution_time:.2f} seconds")

if __name__ == "__main__":
    main()

Found 2 images to process with QwenVL
Loading QwenVL model from Qwen/Qwen2.5-VL-7B-Instruct...


QwenVL model and processor loaded successfully
Processing image: page_23.jpg
Processing image: page_22.jpg

===== OCR RESULTS =====

Image 1: page_23.jpg
----------------------------------------
Here is the extracted text from the image:

---

**هادألا ىلع ةماع ةرخآ | ةينيطسلف**

م2025 ماعل ةماعلا تارامثتسلاا قودنص هادأ

**ذنم نيمهاسملا دئاع يلامإجإ عباتلا ةيؤرلا قيقحت مهأرب هدب ةماعلا تارامثتسلاا قودنصل (ًانوتس) %8.7**

**ماعلا قوسلا قودنصلا ميقن | ملاك23**

**ماعملا اًضيوعو تاغايضلا نم ةءاسو ةعومجم ىلع تارامثتسلاا عيزوت مت .يلبأك تاعاطقلاب ةيبلطلا**

**%9.4 تامولعملا ةيبلط**
**%17.0 راطملا**
**%23.1 ةيلحلما**
**%5.5 ةيملعلا قفوتملا**
**%6.9 تامولعملا**
**%7.3 ةيملعلا**
**%2.5 ةيلاحلا ةيجاهنملا عمتلمجا**
**%3.1 ةيكلملا**
**%4.6 ةيملعلا دوجولا**
**%18.9 ةيملعلا رفوت**
**%0.4 تامولعملا**
**%1.2 ةيملعلا ةيجرد ةيجاهنملا عمتلمجا**

---

This text provides detailed statistics about various categories related to the year 2025, including percentages for different groups and activities.
------

In [8]:
import os
import torch
from PIL import Image
import time
from typing import List
import base64
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

class QwenVLProcessor:
    def __init__(
        self,
        model_name: str = "Qwen/Qwen2.5-VL-7B-Instruct",
        device: str = "cuda",
        min_pixels: int = 128*16*16,
        max_pixels: int = 1600*40*40,
        cache_dir: str = None
    ):
        """
        Initialize the QwenVL processor with custom configuration.
        """
        # Configure CUDA memory allocation
        os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

        # Clear CUDA cache
        if device == "cuda":
            torch.cuda.empty_cache()

        print(f"Loading QwenVL model from {model_name}...")
        # Load model and assign to self
        self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
            model_name,
            torch_dtype=torch.bfloat16,
            device_map=device,
            attn_implementation="flash_attention_2",
            use_cache=True,
            cache_dir=cache_dir,
        )

        # Load processor and assign to self
        self.processor = AutoProcessor.from_pretrained(
            model_name,
            min_pixels=min_pixels,
            max_pixels=max_pixels,
            use_fast=True
        )

        self.device = device
        print("QwenVL model and processor loaded successfully")

    def _encode_image(self, image_path: str) -> str:
        """
        Encode a local image file to base64.
        """
        with open(image_path, "rb") as image_file:
            encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
        return f"data:image/jpeg;base64,{encoded_string}"

    def prepare_messages(
        self,
        image_paths: List[str],
        prompt: str
    ) -> List[dict]:
        """
        Prepare messages for the model using local image paths.
        """
        if isinstance(image_paths, str):
            image_paths = [image_paths]

        messages = []
        for path in image_paths:
            encoded_image = self._encode_image(path)
            messages.append({
                "role": "user",
                "content": [
                    {"type": "image", "image": encoded_image},
                    {"type": "text", "text": prompt}
                ]
            })
        return messages

    def process_images(
        self,
        image_paths: List[str],
        prompt: str,
        max_new_tokens: int = 2000,
        temperature: float = 0.1,
        top_p: float = 0.9
    ) -> List[str]:
        """
        Process local images with the given prompt.
        """
        if isinstance(image_paths, str):
            image_paths = [image_paths]
            
        results = []
        
        # Process one image at a time to avoid memory issues
        for image_path in image_paths:
            print(f"Processing image: {os.path.basename(image_path)}")
            messages = self.prepare_messages(image_path, prompt)

            with torch.inference_mode():
                text = self.processor.apply_chat_template(
                    messages,
                    tokenize=False,
                    add_generation_prompt=True
                )

                image_inputs, video_inputs = process_vision_info(messages)
                inputs = self.processor(
                    text=[text],
                    images=image_inputs,
                    videos=video_inputs,
                    padding=True,
                    return_tensors="pt"
                )

                inputs = inputs.to(self.device)

                generated_ids = self.model.generate(
                    **inputs,
                    max_new_tokens=max_new_tokens,
                    do_sample=True,
                    temperature=temperature,
                    top_p=top_p,
                    pad_token_id=self.processor.tokenizer.pad_token_id,
                    eos_token_id=self.processor.tokenizer.eos_token_id
                )

                generated_ids_trimmed = [
                    out_ids[len(in_ids):]
                    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
                ]

                output_text = self.processor.batch_decode(
                    generated_ids_trimmed,
                    skip_special_tokens=True,
                    clean_up_tokenization_spaces=False
                )
                
                results.append(output_text[0])
                
                # Clear cache after each image
                if self.device == "cuda":
                    torch.cuda.empty_cache()

        return results

def main():
    start_time = time.time()
    
    # Set your image directory
    image_directory = "pif_ar"  # Base directory containing images
    
    
    image_filenames = [
    "page_23.jpg",
    "page_22.jpg",
    ]
    
    # Generate full paths
    image_paths = [os.path.join(image_directory, filename) for filename in image_filenames]
    
    # Verify images exist
    valid_image_paths = []
    for path in image_paths:
        if os.path.isfile(path):
            valid_image_paths.append(path)
        else:
            print(f"Warning: Image not found: {path}")
    
    if not valid_image_paths:
        print("No valid images found.")
        return
    
    print(f"Found {len(valid_image_paths)} images to process with QwenVL")
    
    # Initialize QwenVL processor
    processor = QwenVLProcessor()
    
    # OCR prompt
    ocr_prompt = """You are an expert OCR model who can read and interpret hard images in details
                   and in great precision. Given these images extract every detail of text in an organized format.
                   Include all text visible in the image, preserving the structure where possible .. generate in arabic text"""
    
    # Process images with QwenVL for OCR
    results = processor.process_images(valid_image_paths, prompt=ocr_prompt)
    
    # Print results
    print("\n===== OCR RESULTS =====")
    for i, (image_path, ocr_text) in enumerate(zip(valid_image_paths, results)):
        print(f"\nImage {i+1}: {os.path.basename(image_path)}")
        print("-" * 40)
        print(ocr_text)
        print("-" * 40)
    
    # Save results to file
    with open("ocr_results.txt", "w", encoding="utf-8") as f:
        for i, (image_path, ocr_text) in enumerate(zip(valid_image_paths, results)):
            f.write(f"\nImage {i+1}: {os.path.basename(image_path)}\n")
            f.write("-" * 40 + "\n")
            f.write(ocr_text + "\n")
            f.write("-" * 40 + "\n")
    
    print(f"\nResults saved to ocr_results.txt")
    
    total_execution_time = time.time() - start_time
    print(f"\nScript completed in {total_execution_time:.2f} seconds")

if __name__ == "__main__":
    main()

Found 2 images to process with QwenVL
Loading QwenVL model from Qwen/Qwen2.5-VL-7B-Instruct...


QwenVL model and processor loaded successfully
Processing image: page_23.jpg
Processing image: page_22.jpg

===== OCR RESULTS =====

Image 1: page_23.jpg
----------------------------------------
الاستراتيجية | نظرة عامة على الأداء

أداء صندوق الاستثمارات العامة لعام 2023م

تم توزيع الاستثمارات على مجموعة واسعة من الصناعات، ووفقاً للمعايير العالمية للقطاعات، كما يلي:

تقنية المعلومات %9.4
العقار %17.0
الطاقة %23.1
المراقبة العامة %5.5
الاتصالات %6.9
المالية %7.3
السلع الاستهلاكية الكاملة %2.5
الصناعات %3.1
المواد الأساسية %4.6
غير مصنفة %18.9
الصحة %0.4
السلع الاستهلاكية الأساسية %1.2

إجمالي عائد المساهمين منذ بدء برنامج تحقيق الرؤية التابع لصندوق الاستثمارات العامة (سنويًا) %8.7

1- إجمالي عائد المساهمين منذ بدء برنامج تحقيق الرؤية في 30 سبتمبر 2017م حتى نهاية عام 2023م (على أساس سنوي).
2- ممثل الصناديق غير المصنفة / استثمارات لديها أصول متعددة، والنقد، وحسابات تحت الطلب، والودائع الآجلة وصناديق أسواق النقد.

تقرير الصندوق السنوي لعام 2023م | 44
---------------------------------------
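
The OCR cells above pick their pages by hand (`page_23.jpg`, `page_22.jpg`). One possible way to connect them to the retrieval step is to take the per-query score dictionary returned by `process_image_directory` (file name mapped to score) and keep only the top-k pages. The sketch below assumes that dictionary shape. The helper name `top_k_pages` is hypothetical and the scores are placeholder values, not measured results.

```python
import os

def top_k_pages(scores, image_dir, k=2):
    """Return full paths of the k highest-scoring pages, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [os.path.join(image_dir, name) for name, _ in ranked[:k]]

# Placeholder scores for illustration only (not real model output).
example_scores = {"page_22.jpg": 0.7, "page_23.jpg": 0.9, "page_1.jpg": 0.1}

selected = top_k_pages(example_scores, "pif_ar", k=2)
print(selected)
```

The resulting list could then be passed as `image_paths` to `QwenVLProcessor.process_images`, so that only the retrieved pages are sent through OCR.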