#  **P.A.R.S.E.R**

### Source : **https://docs.cloud.llamaindex.ai/llamaparse/features/multimodal**

## **What is Multimodal Parsing?**

Multimodal parsing refers to the ability of a model to process and understand multiple forms of data simultaneously. In the context of LlamaParse, this means the model can handle not just text but also images, tables, and other document elements. This approach is particularly useful for documents like PDFs where the information is presented in various formats (text, images, charts, etc.).

## **How does it work?**

When using this mode, LlamaParse's regular parsing is bypassed and instead the following process is used:

- A screenshot of every page of your document is taken.
- Each page screenshot is sent to the multimodal model with instructions to extract its content as markdown.
- The resulting markdown of each page is consolidated into the final result.

Note: this mode is more expensive than LlamaParse's regular parsing.
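
The page-by-page process described above can be pictured with a small sketch. It does not call LlamaParse (the real service does this work server-side); `parse_pages` and `fake_model` are hypothetical stand-ins for the multimodal model call, and the blank-line separator between pages is an assumption.

```python
from typing import Callable, List


def parse_pages(page_screenshots: List[bytes],
                extract_markdown: Callable[[bytes], str]) -> str:
    """Send each page screenshot to a model and merge the per-page markdown."""
    # One model call per page: this is why the mode costs more than regular parsing.
    per_page = [extract_markdown(shot).strip() for shot in page_screenshots]
    # Consolidate the pages into one document (the separator is an assumption).
    return "\n\n".join(per_page)


def fake_model(screenshot: bytes) -> str:
    # Hypothetical stand-in: a real call would send the image to a multimodal model.
    return f"# Page with {len(screenshot)} bytes\n"


result = parse_pages([b"abc", b"defgh"], fake_model)
print(result)
```

Passing the model call in as a function keeps the consolidation logic independent of any particular vendor model.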

## **Code examples**

### A - Example of multimodal with GPT-4o

In [1]:
import os
import nest_asyncio
from dotenv import load_dotenv
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

# Allow nested event loops
nest_asyncio.apply()

# Load environment variables
load_dotenv()

# Initialize the parser with multimodal settings
parser = LlamaParse(
    api_key=os.getenv("LLAMA_CLOUD_API_KEY"),
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o"
)

# Use SimpleDirectoryReader to parse the file
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(input_files=['/Users/alina.ghani/VS_project/chatbot_dan/data/doc/V11_Argumentaire_Peugeot_2024.pdf'], file_extractor=file_extractor).load_data()

# Save the parsed result to a markdown file
with open('parsed_result_gpt.md', 'w') as result_file:
    for doc in documents:
        result_file.write(doc.text)  


Started parsing the file under job_id 84bda804-5d8b-4fd4-991f-db3b2101fff4
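
As a possible variation on the saving step, the sketch below writes the page texts with an explicit UTF-8 encoding and a blank line between pages, so accented characters and page boundaries are preserved. `save_pages` is a hypothetical helper that works on plain strings (for example `[doc.text for doc in documents]`); the demo uses placeholder strings rather than real parsed output.

```python
import os
import tempfile
from typing import Iterable


def save_pages(texts: Iterable[str], path: str) -> None:
    """Write page texts to `path`, UTF-8 encoded, separated by blank lines."""
    with open(path, "w", encoding="utf-8") as result_file:
        result_file.write("\n\n".join(text.strip() for text in texts))


# Demo with placeholder strings instead of real parsed documents
demo_path = os.path.join(tempfile.mkdtemp(), "parsed_result_demo.md")
save_pages(["# L'ÉLECTRIQUE POUR TOUS", "## Page 2"], demo_path)
with open(demo_path, encoding="utf-8") as f:
    saved = f.read()
```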


### B - Example of multimodal with Claude

In [2]:
import os
import nest_asyncio
from dotenv import load_dotenv
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

# Allow nested event loops
nest_asyncio.apply()

# Load environment variables
load_dotenv()

# Initialize the parser with multimodal settings
parser = LlamaParse(
    api_key=os.getenv("LLAMA_CLOUD_API_KEY"),
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="anthropic-sonnet-3.5"
)

# Use SimpleDirectoryReader to parse the file
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(input_files=['/Users/alina.ghani/VS_project/chatbot_dan/data/doc/V11_Argumentaire_Peugeot_2024.pdf'], file_extractor=file_extractor).load_data()

# Save the parsed result to a markdown file
with open('parsed_result_claude.md', 'w') as result_file:
    for doc in documents:
        result_file.write(doc.text)  

Started parsing the file under job_id f3d794f1-6b8e-4fa8-90ad-16de8cf625ab

## Function to clean data

In [5]:
import json

def decode_unicode(data):
    # Recursively decode literal escape sequences (e.g. \u00e9) in every string
    if isinstance(data, str):
        # latin-1 with backslashreplace keeps characters that are already decoded
        # (such as 'É') intact; a plain .encode() would garble them
        return data.encode('latin-1', 'backslashreplace').decode('unicode_escape')
    elif isinstance(data, dict):
        return {k: decode_unicode(v) for k, v in data.items()}
    elif isinstance(data, list):
        return [decode_unicode(i) for i in data]
    else:
        return data

# Read JSON file
with open('/Users/alina.ghani/VS_project/chatbot_dan/data/parsed_doc/parsed_result_with_descriptions.json', 'r', encoding='utf-8') as file:
    data = json.load(file)

# Decode Unicode escape sequences
decoded_data = decode_unicode(data)

# Write cleaned data back to the same JSON file (the input file is overwritten)
with open('/Users/alina.ghani/VS_project/chatbot_dan/data/parsed_doc/parsed_result_with_descriptions.json', 'w', encoding='utf-8') as file:
    json.dump(decoded_data, file, ensure_ascii=False, indent=4)

# Note: the message mentions 'cleaned_data.json', but the data is saved to the path above
print("Unicode escape sequences have been decoded and the cleaned data is saved in 'cleaned_data.json'.")



Unicode escape sequences have been decoded and the cleaned data is saved in 'cleaned_data.json'.
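
One caveat is worth checking when decoding this way: calling `.encode()` with its default UTF-8 before `decode('unicode_escape')` garbles characters that are already non-ASCII, because the decoder reads each byte as Latin-1. The sketch below illustrates this on sample strings; `decode_escapes` is a hypothetical helper, not part of the notebook.

```python
def decode_escapes(text: str) -> str:
    """Turn literal escape sequences (backslash + u + hex digits) into characters."""
    # latin-1 + backslashreplace produces bytes that unicode_escape
    # decodes back to the same characters, accented ones included.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


escaped = "L\\u2019\\u00e9lectrique pour tous"  # contains literal backslashes
already_clean = "électrique"                     # already decoded text

decoded = decode_escapes(escaped)
unchanged = decode_escapes(already_clean)
# Naive variant: UTF-8 bytes are misread as latin-1 characters
naive = already_clean.encode().decode("unicode_escape")

print(decoded)
print(unchanged)
print(naive)
```

The first two results keep the accents, while the naive variant turns `é` into two unrelated characters.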


## With parsing instructions

In [20]:
import os
import nest_asyncio
from dotenv import load_dotenv
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

# Allow nested event loops
nest_asyncio.apply()

# Load environment variables
load_dotenv()

# Initialize the parser with multimodal settings
parser = LlamaParse(
    api_key=os.getenv("LLAMA_CLOUD_API_KEY2"),
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
    parsing_instruction = """
                            You are parsing images from a document. Please extract only the descriptions of each image and ensure that each description has a unique key formatted as `image_x`, where x is a unique number from 1 to 132. Do not reuse any key. Focus exclusively on the text next to the images for their descriptions. Be precise and include all relevant details. For example, if there is an image of a car with the model name next to it, include the model name in the description. Ensure that only image descriptions are parsed, and nothing else.
                            """,

    result_type="text"
)

# Use SimpleDirectoryReader to parse the file
file_extractor = {".pdf": parser}

try:
    documents = SimpleDirectoryReader(input_files=['/Users/alina.ghani/VS_project/chatbot_dan/data/doc/V11_Argumentaire_Peugeot_2024.pdf'], file_extractor=file_extractor).load_data()

    # Save the parsed result to a text file
    with open('images_description.txt', 'w') as result_file:
        for doc in documents:
            result_file.write(doc.text)
except Exception as e:
    print(f"Error while parsing the file: {e}")


Error while parsing the file '<bytes/buffer>': Failed to parse the file: {"detail":"You've exceeded the maximum number of pages you can parse in a day (1000). Please contact support to increase your limit."}
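
Since the daily page limit ends the run, it can help to recognise this specific error and stop instead of retrying. The following is a minimal sketch, assuming the error text keeps the wording shown in the output above (it is not a documented, stable format); `is_daily_limit_error` is a hypothetical helper.

```python
def is_daily_limit_error(message: str) -> bool:
    """Return True if an error message looks like the daily page-limit error."""
    # Substring copied from the error shown above; the wording may change.
    return "exceeded the maximum number of pages" in message


error_text = (
    "Failed to parse the file: {\"detail\":\"You've exceeded the maximum number "
    "of pages you can parse in a day (1000). Please contact support to increase your limit.\"}"
)
limit_hit = is_daily_limit_error(error_text)
other = is_daily_limit_error("Connection timed out")
```

In the `except` block, such a check could be applied to `str(e)` to decide whether to wait until the next day or to report a different failure.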


## Renumber the image_x keys: one screenshot is parsed per page, so the keys are not unique across the whole document

In [26]:
import re
import json

def replace_image_keys_and_save_to_json(md_file_path, json_output_path):
    # Read the content of the markdown file
    with open(md_file_path, 'r', encoding='utf-8') as file:
        content = file.read()

    # Initialize a global counter
    global_counter = 1

    # Function to replace image_ keys
    def replace_key(match):
        nonlocal global_counter
        replacement = f"image_{global_counter}"
        global_counter += 1
        return replacement

    # Replace all occurrences of image_ keys with incremental numbers
    modified_content = re.sub(r'image_\d*', replace_key, content)

    # Extract the keys and their descriptions from the modified content
    keys_and_descriptions = re.findall(r'(image_\d+):\s*(.*?)(?=\nimage_\d+:|\n#|\n##|\Z)', modified_content, re.DOTALL)

    # Create a dictionary from the keys and descriptions
    keys_dict = {key: description.strip() for key, description in keys_and_descriptions}

    # Write the modified content back to the markdown file
    new_md_file_path = md_file_path.replace('.md', '_modified.md')
    with open(new_md_file_path, 'w', encoding='utf-8') as file:
        file.write(modified_content)

    # Write the dictionary to a JSON file
    with open(json_output_path, 'w', encoding='utf-8') as json_file:
        json.dump(keys_dict, json_file, ensure_ascii=False, indent=4)

    return new_md_file_path, json_output_path, keys_dict


# Example usage
md_file_path = '/Users/alina.ghani/VS_project/chatbot_dan/notebook/2_images_description_unique_key.md'
json_output_path = '/Users/alina.ghani/VS_project/chatbot_dan/notebook/3_images_description.json'
replace_image_keys_and_save_to_json(md_file_path, json_output_path)


('/Users/alina.ghani/VS_project/chatbot_dan/notebook/2_images_description_unique_key_modified.md',
 '/Users/alina.ghani/VS_project/chatbot_dan/notebook/3_images_description.json',
 {'image_1': '{\n  "description": "The image shows the rear view of a bright lime green Peugeot electric car, likely the e-208 GT model. The car\'s sleek design is highlighted, with visible \'PEUGEOT\' lettering across the rear and the model designation \'e-208 GT\' visible. The image is set against a light blue sky background, creating a striking contrast with the car\'s vibrant color. The Peugeot logo, featuring a lion\'s head in a shield, is displayed at the top left of the image."\n}NO_CONTENT_HERE# L\'ÉLECTRIQUE POUR TOUS',
  'image_2': 'Two people standing side by side, one wearing a blue shirt and the other in a white coat, with a bright light in the background.',
  'image_3': 'Close-up of a Peugeot logo on a textured black surface.',
  'image_4': "Interior view of a car dashboard and steering wheel, w
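
To make the renumbering easier to follow, here is a small in-memory sketch of the same idea on a made-up three-line sample (the sample text and variable names are assumptions, not data from the parsed document). Keys restart at `image_1` on the second page and are replaced by a document-wide counter, then paired with their descriptions.

```python
import itertools
import re

# Made-up sample: the key image_1 appears on two different pages
sample = (
    "image_1: A car\n"
    "image_2: A logo\n"
    "# Page 2\n"
    "image_1: A dashboard\n"
)

# Replace every key with the next value of a document-wide counter
counter = itertools.count(1)
renumbered = re.sub(r"image_\d+", lambda m: f"image_{next(counter)}", sample)

# Pair each key with the text up to the next key, heading, or end of input
pairs = re.findall(
    r"(image_\d+):\s*(.*?)(?=\nimage_\d+:|\n#|\Z)", renumbered, re.DOTALL
)
descriptions = {key: text.strip() for key, text in pairs}
print(descriptions)
# {'image_1': 'A car', 'image_2': 'A logo', 'image_3': 'A dashboard'}
```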

## Reorder the gpt_vision keys in a JSON file

In [32]:
import json
import re

def reorder_image_keys(input_json_path, output_json_path):
    # Load the JSON data from the file
    with open(input_json_path, 'r', encoding='utf-8') as file:
        data = json.load(file)

    # Initialize the dictionary to store image descriptions
    image_descriptions = {}

    # Loop through each document in the data list
    for document in data:
        # Ensure the document is a dictionary and has 'metadata' and 'images'
        if isinstance(document, dict):
            images = document.get('metadata', {}).get('images', [])
            for image in images:
                # Extract the image name and description
                image_path = image.get('image_path', '')
                image_name = image_path.split('/')[-1].replace('.png', '')
                image_description = image.get('image_description', '')
                image_descriptions[image_name] = image_description

    # Sort the dictionary by the numeric part of the keys
    sorted_image_descriptions = dict(sorted(image_descriptions.items(), key=lambda item: int(re.findall(r'\d+', item[0])[0])))

    # Save the sorted image descriptions to the output JSON file
    with open(output_json_path, 'w', encoding='utf-8') as outfile:
        json.dump(sorted_image_descriptions, outfile, indent=4)

# Define file paths
input_json_path = '/Users/alina.ghani/VS_project/chatbot_dan/data/parsed_doc_GPT/parsed_result_with_descriptions.json'
output_json_path = '/Users/alina.ghani/VS_project/chatbot_dan/data/parsed_doc_GPT/parsed_result_with_descriptions_order.json'

# Run the function
reorder_image_keys(input_json_path, output_json_path)
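
The reason for sorting on the numeric part of the key is that plain string sorting puts `img_10` before `img_2`. The sketch below shows the difference on hypothetical image names (the real names come from each image's file name); `numeric_key` is an illustrative helper that also tolerates names without digits.

```python
import re

# Hypothetical image names and descriptions
descriptions = {"img_10": "tenth", "img_2": "second", "img_1": "first"}


def numeric_key(item):
    """Sort key: the first run of digits in the name, as an integer."""
    name, _ = item
    match = re.search(r"\d+", name)
    # Names without digits sort last instead of raising an IndexError.
    return (0, int(match.group())) if match else (1, 0)


lexicographic = sorted(descriptions)
numeric = [name for name, _ in sorted(descriptions.items(), key=numeric_key)]

print(lexicographic)  # ['img_1', 'img_10', 'img_2']
print(numeric)        # ['img_1', 'img_2', 'img_10']
```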


