# Prompt Experiments

In this notebook we experiment with prompts against Azure OpenAI models. We try out different prompting concepts using the GPT-4 family of models, specifically GPT-4-Turbo and GPT-4-Vision.


### Python Imports


In [2]:
%load_ext autoreload
%autoreload 2


import sys
sys.path.append('../code')  # forward slashes also work on Windows


import os
from dotenv import load_dotenv
load_dotenv()

from IPython.display import display, Markdown, HTML
from PIL import Image
from doc_utils import *


def show_img(img_path, width = None):
    if width is not None:
        display(HTML(f'<img src="{img_path}" width={width}>'))
    else:
        display(Image.open(img_path))


### Make sure we have the OpenAI Models information

We will need the GPT-4-Turbo and GPT-4-Vision models for this notebook.

When you run the cell below, the values should match the Azure OpenAI resource you configured in the `.env` file.

In [None]:
model_info = {
        'AZURE_OPENAI_RESOURCE': os.environ.get('AZURE_OPENAI_RESOURCE'),
        'AZURE_OPENAI_KEY': os.environ.get('AZURE_OPENAI_KEY'),
        'AZURE_OPENAI_MODEL_VISION': os.environ.get('AZURE_OPENAI_MODEL_VISION'),
        'AZURE_OPENAI_MODEL': os.environ.get('AZURE_OPENAI_MODEL'),
}

model_info
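
Because `os.environ.get` silently returns `None` for a variable that is not set, it can help to check the settings before calling the models, and to avoid echoing the key into the notebook output. The sketch below is an illustration only: `missing_settings` and `masked` are hypothetical helpers that are not part of `doc_utils`, and the sample values are made up.

```python
def missing_settings(info):
    """Return the names of settings that are unset or blank."""
    return [name for name, value in info.items() if not value or not value.strip()]


def masked(info, secret_markers=("KEY",)):
    """Return a copy of the settings that is safe to display.

    Any setting whose name contains one of `secret_markers` has its
    value replaced by asterisks, keeping only the last four characters.
    """
    safe = {}
    for name, value in info.items():
        if value and any(marker in name for marker in secret_markers):
            safe[name] = "*" * max(len(value) - 4, 0) + value[-4:]
        else:
            safe[name] = value
    return safe


# Made-up values for illustration only.
example_info = {
    "AZURE_OPENAI_RESOURCE": "my-resource",
    "AZURE_OPENAI_KEY": "abcd1234efgh5678",
    "AZURE_OPENAI_MODEL_VISION": None,
    "AZURE_OPENAI_MODEL": "",
}

print(missing_settings(example_info))  # ['AZURE_OPENAI_MODEL_VISION', 'AZURE_OPENAI_MODEL']
print(masked(example_info)["AZURE_OPENAI_KEY"])  # ************5678
```

In the notebook, the same helpers could be applied to the real `model_info` dictionary instead of `example_info`.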


### Generate Sample Data

Render each page of a sample PDF as a PNG image. These images are the inputs for the rest of this notebook.

In [4]:
import fitz  # PyMuPDF

# Create a directory to store the outputs
work_dir = "sample_data/pdf_outputs"
os.makedirs(work_dir, exist_ok=True)

# Load a sample PDF document

def read_pdf(pdf_doc):
    doc = fitz.open(pdf_doc)
    print(f"PDF File {os.path.basename(pdf_doc)} has {len(doc)} pages.")
    return doc

def nb_extract_pages_as_png_files(doc):
    png_files = []
    for page in doc:
        page_num = page.number
        img_path = f"{work_dir}/page_{page_num}.png"
        page_pix = page.get_pixmap(dpi=300)
        page_pix.save(img_path)
        print(f"Page {page_num} saved as {img_path}")
        png_files.append(img_path)
    
    return png_files


pdf_doc = "sample_data/1_London_Brochure.pdf"
doc = read_pdf(pdf_doc)
png_files = nb_extract_pages_as_png_files(doc)  


PDF File 1_London_Brochure.pdf has 2 pages.
Page 0 saved as sample_data/pdf_outputs/page_0.png
Page 1 saved as sample_data/pdf_outputs/page_1.png
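
The `dpi=300` argument determines how large each rendered image is. PDF page dimensions are measured in points (1/72 inch), so the pixel size is roughly `points * dpi / 72`. The sketch below works through that arithmetic for a US Letter page (612 x 792 points), which is an assumed example size and not the measured size of the brochure. `expected_pixels` is a hypothetical helper, and PyMuPDF's own rounding may differ by a pixel.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def expected_pixels(width_pt, height_pt, dpi):
    """Approximate pixel dimensions of a page rendered at the given DPI."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


# US Letter (assumed example size): 8.5 x 11 inches = 612 x 792 points.
print(expected_pixels(612, 792, dpi=300))  # (2550, 3300)
print(expected_pixels(612, 792, dpi=72))   # (612, 792)
```

A lower `dpi` produces a smaller image, which likely reduces the payload sent to the vision model but can make small text harder to read.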


## Visual Element Detection

The first function we will experiment with is detection. Sometimes we need to know whether a page contains an embedded image or a table, and a Vision model can answer that from a rendering of the page.

#### Table Detection

In the cell below, we try to detect tables in the PNG files generated from the PDF document. The code displays every page on which at least one table was found.

In [20]:
detect_num_of_tables_prompt = """
You are an assistant working on a document processing task that involves detecting and counting the number of data tables in an image file using a vision model. Given an image, your task is to determine the number of data tables present. 

Output:
Return a single integer representing the number of data tables detected in the page. Do **NOT** generate any other text or explanation, just the number of tables. We are **NOT** looking for the word 'table' in the text, we are looking for the number of data tables in the image.

"""
for png in png_files:
    result, description = call_gpt4v(png, gpt4v_prompt = detect_num_of_tables_prompt, temperature = 0.2, model_info=model_info)
    print(f"Status: {description}")
    print(f"Result: {result} tables detected in the PDF page.")

    if int(result) > 0:
        show_img(png, width=400)



16.03.2024_16.07.52 :: Start of GPT4V Call to process file(s) ['sample_data/pdf_outputs/page_0.png'] with model: https://oai-tst-sweden.openai.azure.com/
endpoint https://oai-tst-sweden.openai.azure.com/openai/deployments/gpt4v/extensions/chat/completions?api-version=2023-12-01-preview

16.03.2024_16.07.58 :: End of GPT4V Call to process file(s) ['sample_data/pdf_outputs/page_0.png'] with model: https://oai-tst-sweden.openai.azure.com/
Status: Image was successfully explained, with Status Code: 200
Result: 0 tables detected in the PDF page.

16.03.2024_16.07.58 :: Start of GPT4V Call to process file(s) ['sample_data/pdf_outputs/page_1.png'] with model: https://oai-tst-sweden.openai.azure.com/
endpoint https://oai-tst-sweden.openai.azure.com/openai/deployments/gpt4v/extensions/chat/completions?api-version=2023-12-01-preview

16.03.2024_16.08.04 :: End of GPT4V Call to process file(s) ['sample_data/pdf_outputs/page_1.png'] with model: https://oai-tst-s
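
The loop above calls `int(result)` directly, which raises `ValueError` if the model returns anything other than a bare number (for example `"2 tables"` or an empty string). A more defensive conversion is sketched here. `parse_count` is a hypothetical helper, not part of `doc_utils`, and it assumes the model's reply arrives as text.

```python
import re


def parse_count(response, default=0):
    """Extract the first non-negative integer from a model response.

    Returns `default` when the response is missing or contains no digits.
    """
    if response is None:
        return default
    match = re.search(r"\d+", str(response))
    return int(match.group()) if match else default


print(parse_count("3"))                    # 3
print(parse_count("There are 2 tables."))  # 2
print(parse_count("I cannot tell."))       # 0
print(parse_count(None))                   # 0
```

The helper takes the first number it finds, so it only guards against stray wording around the count. It is not a substitute for a strict prompt.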

#### Image Detection

In the cell below, we try to detect images in the PNG files generated from the PDF document. The code displays every page on which at least one image was found.

In [21]:
detect_num_of_diagrams_prompt = """
You are an assistant working on a document processing task that involves detecting and counting the number of visual assets in a document page using a vision model. Given a screenshot of a document page, your task is to determine the number of visual assets present. Please ignore any standard non-visual assets such as text, headers, footers, page numbers, tables, etc.

What is meant by visual assets: infographics, maps, flowcharts, timelines, illustrations, icons, heatmaps, scatter plots, pie charts, bar graphs, line graphs, histograms, Venn diagrams, organizational charts, mind maps, Gantt charts, tree diagrams, pictograms, schematics, blueprints, 3D models, storyboards, wireframes, dashboards, comic strips, story maps, process diagrams, network diagrams, bubble charts, area charts, radar charts, waterfall charts, funnel charts, sunburst charts, sankey diagrams, choropleth maps, isometric drawings, exploded views, photomontages, collages, mood boards, concept maps, fishbone diagrams, decision trees, Pareto charts, control charts, spider charts, images, diagrams, logos, charts or graphs.

Output:
Return a single integer representing the number of visual assets detected in the page. Do **NOT** generate any other text or explanation, just the count of visual assets.

"""

for png in png_files:
    result, description = call_gpt4v(png, gpt4v_prompt = detect_num_of_diagrams_prompt, temperature = 0.2, model_info=model_info)
    print(f"Status: {description}")
    print(f"Result: {result} images detected in the PDF page.")

    if int(result) > 0:
        show_img(png, width=400)



16.03.2024_16.08.44 :: Start of GPT4V Call to process file(s) ['sample_data/pdf_outputs/page_0.png'] with model: https://oai-tst-sweden.openai.azure.com/
endpoint https://oai-tst-sweden.openai.azure.com/openai/deployments/gpt4v/extensions/chat/completions?api-version=2023-12-01-preview

16.03.2024_16.08.50 :: End of GPT4V Call to process file(s) ['sample_data/pdf_outputs/page_0.png'] with model: https://oai-tst-sweden.openai.azure.com/
Status: Image was successfully explained, with Status Code: 200
Result: 2 images detected in the PDF page.



16.03.2024_16.08.50 :: Start of GPT4V Call to process file(s) ['sample_data/pdf_outputs/page_1.png'] with model: https://oai-tst-sweden.openai.azure.com/
endpoint https://oai-tst-sweden.openai.azure.com/openai/deployments/gpt4v/extensions/chat/completions?api-version=2023-12-01-preview

16.03.2024_16.08.55 :: End of GPT4V Call to process file(s) ['sample_data/pdf_outputs/page_1.png'] with model: https://oai-tst-sweden.openai.azure.com/
Status: Image was successfully explained, with Status Code: 200
Result: 0 images detected in the PDF page.
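
The table and image cells share the same pattern: call the model for each page, convert the reply to a number, and keep the page when the count is positive. That pattern could be factored into one helper. In the sketch below, `pages_with_hits` and `fake_counter` are hypothetical names; `fake_counter` stands in for the real model call so the example runs without an API key.

```python
def pages_with_hits(pages, count_fn):
    """Return (page, count) pairs for pages where count_fn reports at least one hit.

    `count_fn` takes a page path and returns the model's reply as text.
    Replies that are not plain integers are treated as zero.
    """
    hits = []
    for page in pages:
        reply = str(count_fn(page)).strip()
        count = int(reply) if reply.isdecimal() else 0
        if count > 0:
            hits.append((page, count))
    return hits


def fake_counter(page):
    """Stand-in for a vision-model call; returns canned replies."""
    canned = {"page_0.png": "2", "page_1.png": "0", "page_2.png": "unsure"}
    return canned[page]


print(pages_with_hits(["page_0.png", "page_1.png", "page_2.png"], fake_counter))
# [('page_0.png', 2)]
```

Based on how `call_gpt4v` is used in this notebook, the real `count_fn` could be something like `lambda png: call_gpt4v(png, gpt4v_prompt=detect_num_of_diagrams_prompt, temperature=0.2, model_info=model_info)[0]`.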


## Analyze Code

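Read in the `doc_utils.py` library, and ask the model to list the important functions in a call tree and generate Mermaid code showing how they call each other.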

In [3]:
prompt = """

{code}

In the library above, detect the important functions in the '{func_name}' tree. Start with the '{func_name}' function, and then list the essential functions called by '{func_name}'. Please ignore small functions that are 3 lines of code or fewer. Focus only on the custom functions defined in the code using the keyword 'def', and ignore the imported functions or system functions like fitz.open and glob.glob. Then please do the following:

    1. Output the list in bullet-point format
    2. Show their relationship by generating Mermaid code that represents these functions.
    3. Make sure the relationships are correct. The arrow should point from the calling function to the called function.

"""

p = prompt.format(code = read_asset_file("../code/doc_utils.py")[0], func_name = 'ingest_docs_directory')

output = ask_LLM(p, model_info=model_info)
print(output)



- `ingest_docs_directory`
  - `ingest_doc`
    - `process_pdf`
      - `extract_high_res_page_images`
      - `extract_text`
      - `harvest_code`
      - `extract_images`
      - `post_process_images`
      - `extract_tables`
      - `post_process_tables`
    - `commit_assets_to_vector_index`
      - `add_asset_to_vec_store`
        - `create_metadata`
        - `get_embeddings`
        - `get_solution_path_components`
        - `generate_uuid_from_string`
    - `save_docx_as_pdf`
    - `save_pptx_as_pdf`

```mermaid
graph TD;
    ingest_docs_directory --> ingest_doc;
    ingest_doc --> process_pdf;
    process_pdf --> extract_high_res_page_images;
    process_pdf --> extract_text;
    process_pdf --> harvest_code;
    process_pdf --> extract_images;
    process_pdf --> post_process_images;
    process_pdf --> extract_tables;
    process_pdf --> post_process_tables;
    ingest_doc --> commit_assets_to_vector_index;
    commit_assets_to_vector_index --> add_asset_to_vec_store;
    add
```

### Render the Mermaid code

To render the generated Mermaid code, copy the Mermaid script block above, open [mermaid.live](https://mermaid.live) in your browser, and paste the script into the editor.

The image should be rendered immediately.

![Mermaid Representation](../images/ingestion_tree.png)
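
Because a model can get caller and callee arrows wrong, the generated diagram is worth checking against the code itself. One option is a static pass with Python's `ast` module, sketched below. It is a rough approximation: it only sees direct calls by bare name to functions defined with `def` in the same source. The helper names (`call_edges`, `to_mermaid`) and the toy source are hypothetical and are not taken from `doc_utils.py`.

```python
import ast


def call_edges(source):
    """Return sorted (caller, callee) pairs between functions defined in `source`."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    edges = set()
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            # Only direct calls by bare name, e.g. helper(x), are detected.
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in defined
            ):
                edges.add((func.name, node.func.id))
    return sorted(edges)


def to_mermaid(edges):
    """Format edges as Mermaid, with arrows from caller to callee."""
    lines = ["graph TD;"]
    lines += [f"    {caller} --> {callee};" for caller, callee in edges]
    return "\n".join(lines)


# Toy source for illustration only.
sample = '''
def load(path):
    return open(path).read()

def clean(text):
    return text.strip()

def ingest(path):
    return clean(load(path))
'''

print(to_mermaid(call_edges(sample)))
```

To apply this to the real library, one could pass the text of `../code/doc_utils.py` as `source` and compare the resulting edges with the arrows in the model's output.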