<a href="https://www.kaggle.com/code/kajetanniewczas/rag-n-tex?scriptVersionId=234820369" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

![RAG'n'TeX logo](https://storage.googleapis.com/kagglesdsdata/datasets/7173291/11449376/long_logo-1.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20250417%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20250417T145731Z&X-Goog-Expires=259200&X-Goog-SignedHeaders=host&X-Goog-Signature=3ffe55d837ca9a360a0a211f67bee8e021e82add8eec7ea273ca4bf918221a0d059ecc4cb6fc7c4415b7b1a9e364655d6db12133ff9f87c891c195f7496ff552f61141c8e1b3908c2557c4f40920d187f30328eda06233c6fff1e92e9d7a7bb7be8f3091390ea6ba3abf82c5200f60a650409053b6417007327fb446061c2911dda76334b76a8eb4a31f16c079cb88139fe61a4f048cf47e93f2775ec711799ba45d8d9d4531748914d6b0fc286b6448a42de0e309c4f35862d93a2fc43aced079def30acbe33dcc85cd6d56a1ea32eaaa2ac9c8bdf35bec8d640467378dc1bfb12aeae244cef0b5a0540b6111d2f9adec1d3dbe39c1a448ed228d090e6fcc0a)

# 📚 From Text to Visuals: Auto-Generating LaTeX Beamer Presentations with GenAI

In this project, we explore how generative AI can automate the creation of professional-looking presentation slides—directly from extensive collections of PDF documents.

---

## 🧠 Use Case

Creating slide decks from dense documents (like whitepapers or scientific articles) is a time-consuming and cognitively heavy task. Our goal is to streamline this process using generative AI. 

We built an AI assistant that transforms document collections into LaTeX Beamer presentations—complete with structure, content, and visuals—based on a user-defined topic.

---

## 🔍 How It Works

### 💿 Database Creation 
We start by creating a vector database from a collection of PDF documents (e.g., arXiv papers). Each document is processed to extract its text and figures, and for each document we store:
* **Embeddings** of the extracted text
* **Associated metadata** (the number of images, their file paths, captions, and orientations)

### 🗣️ User Prompt

A user enters a natural language query like:

> "I need a presentation about AI agents."

### 📄 Document Retrieval

We use **Chroma** as our vector database to retrieve the most relevant PDF documents based on the query. 

### 🧠 Content Understanding

Using **Gemini 2.0 Flash** (`google-genai==1.11.0`), the assistant analyzes the retrieved content and extracts the most relevant ideas and insights.

### 🎞️ Slide Generation (LaTeX Beamer)

The model outputs fully formatted, compilable LaTeX code that follows a clear and consistent structure:

- **Introduction**: Topic overview and motivation  
- **Main Content**: 2–4 slides, each presenting a core idea  
- **Summary**: Key takeaways  

It also incorporates relevant images extracted from the documents, organizing them into LaTeX Beamer-friendly layouts (e.g., *Core Idea 2* and *Core Idea 3* slide formats).


---
---

# 🛠️ General configuration

---

## 📦 Install the required dependencies

### GenAI and ChromaDB

⚠️ Note: The pip install below may show dependency conflict warnings. These are expected and do NOT affect functionality in this notebook, so you can safely ignore them.

In [1]:
!pip install -qU "google-genai==1.11.0" "chromadb==1.0.5"

### Processing PDF files

PyMuPDF is needed to parse the provided files and extract their content.

In [2]:
!pip install pymupdf

Collecting pymupdf
  Downloading pymupdf-1.25.5-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Downloading pymupdf-1.25.5-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (20.0 MB)
Installing collected packages: pymupdf
Successfully installed pymupdf-1.25.5


### LaTeX dependencies

Installing the TeX Live packages might take a few minutes.

In [3]:
!apt-get update -qq > /dev/null
!apt-get install -y -qq texlive-latex-base texlive-fonts-recommended texlive-fonts-extra texlive-latex-extra > /dev/null

W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
Extracting templates from packages: 100%


### Set up your API key

To run the following cell, your API key must be stored in a [Kaggle secret](https://www.kaggle.com/discussions/product-feedback/114053) named `GOOGLE_API_KEY`.

If you don't already have an API key, you can grab one from [AI Studio](https://aistudio.google.com/app/apikey). You can find [detailed instructions in the docs](https://ai.google.dev/gemini-api/docs/api-key).

To make the key available through Kaggle secrets, choose `Secrets` from the `Add-ons` menu and follow the instructions to add your key or enable it for this notebook.

The Gemini API offers several embedding models. For this project, we use `models/text-embedding-004`.

In [4]:
from google import genai
from google.genai import types

print(f"🕹️ This notebook is working on Google GenAI version: {genai.__version__}")

# Configure the API access
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("GOOGLE_API_KEY")

client = genai.Client(api_key=api_key)

🕹️ This notebook is working on Google GenAI version: 1.11.0


---

## 🧬 Generate embeddings

Now let's define how we generate embeddings with the Gemini API. We need two task types: *retrieval_document* to embed the documents and *retrieval_query* to embed the user query.

In [5]:
from google.api_core import retry
from chromadb import Documents, EmbeddingFunction, Embeddings

# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

class GeminiEmbeddingFunction(EmbeddingFunction):
    def __init__(self):
        self.document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]
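
To make the role of the two embedding modes more concrete, here is a minimal sketch of how a query vector can be ranked against document vectors. It is an illustration only: the helper names `cosine_similarity` and `rank_documents` and the 3-dimensional toy vectors are assumptions, and cosine similarity is used purely for demonstration (the metric actually applied depends on how the Chroma collection is configured).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
toy_docs = [
    [0.9, 0.1, 0.0],  # document 0
    [0.0, 1.0, 0.0],  # document 1
    [0.7, 0.7, 0.0],  # document 2
]
toy_query = [1.0, 0.0, 0.0]

ranking = rank_documents(toy_query, toy_docs)
print(ranking)
```

In the notebook itself this comparison is delegated to the vector database; the sketch only shows why documents and queries must be embedded into the same vector space.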

---

## 🗃️ Configure the database

It is time to initialise our vector database, where we also provide our custom function to generate embeddings.

In [6]:
import chromadb

DB_NAME = "ragntex"

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
# chroma_client.delete_collection(DB_NAME)
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

---
---

# 📄 Working with PDFs

We are not done with documents yet! We need to process our PDFs to extract metadata for the future presentation generation.

---

## 🖼️ Extracting images

We need to extract the images from the PDFs. Since our model relies on image captions to decide which image to use on a slide, we also need to extract each caption from the original document.

In [7]:
import hashlib
import re

import fitz  # PyMuPDF

def find_image_caption(page, image_bbox, max_distance=100):
    # Get all text blocks on the page
    blocks = page.get_text("dict")["blocks"]
    image_bottom = image_bbox.y1
    image_x_center = (image_bbox.x0 + image_bbox.x1) / 2

    best_match_caption = None
    fallback_caption   = None
    closest_distance   = float('inf')

    for block in blocks:
        if block["type"] != 0:
            continue

        block_bbox        = block["bbox"]
        x0, y0, x1, y1    = block_bbox
        vertical_distance = y0 - image_bottom

        # Merge all spans' text into one string
        block_text = " ".join(
            span["text"] for line in block.get("lines", []) for span in line.get("spans", [])
        ).strip()

        if y0 >= image_bottom and abs(image_x_center - (x0 + x1) / 2) < image_bbox.width / 2:
            if vertical_distance < max_distance and vertical_distance < closest_distance:
                if re.match(r"^(Fig(ure)?\.?\s*\d+[:\-])", block_text, re.IGNORECASE):
                    best_match_caption = block_text
                    break
                else:
                    fallback_caption = block_text
                    closest_distance = vertical_distance
    
    if best_match_caption and len(best_match_caption.strip()) > 10:
        return best_match_caption.strip()
    elif fallback_caption and len(fallback_caption.strip()) > 10:
        return fallback_caption.strip()
    else:
        return None

def extract_images(pdf, doc, page, page_num):
    images = page.get_images(full=True)

    imgs = []
    for img_index, img in enumerate(images):
        # Extract the image
        xref = img[0]
        base_image  = doc.extract_image(xref)
        image_bytes = base_image["image"]
        image_hash  = hashlib.md5(image_bytes).hexdigest()
        image_name  = f"doc{pdf}_page{page_num}_img{img_index}_hash{image_hash[:8]}.png"

        # Find the bbox
        image_bbox = None
        for img_info in page.get_image_info(xrefs=True):
            if img_info["xref"] == xref:
                image_bbox = fitz.Rect(img_info["bbox"])
                break

        # Skip images whose placement on the page cannot be determined
        if image_bbox is None or image_bbox.height == 0:
            continue

        # Get the image ratio
        ratio = image_bbox.width / image_bbox.height

        # Classify the image based on the ratio
        if ratio >= 1.5:
            image_type = "horizontal"
        elif ratio <= 0.67:
            image_type = "vertical"
        else:
            image_type = "square"

        # Get caption based on bbox
        caption = find_image_caption(page, image_bbox)

        # Append an image
        imgs.append({
            "name":    image_name,
            "caption": caption,
            "ratio":   image_type,
            "hash":    image_hash
        })
    
    return imgs
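
To see which text blocks the caption heuristic treats as a proper caption, the pattern from `find_image_caption` can be tried on a few strings. The sample strings below are made up for illustration. Note that a label ending in a period (`Figure 4.`) does not match and would only be picked up through the fallback branch.

```python
import re

# Same pattern as in find_image_caption: "Fig"/"Figure", a number, then ":" or "-".
CAPTION_RE = re.compile(r"^(Fig(ure)?\.?\s*\d+[:\-])", re.IGNORECASE)

# Hypothetical text blocks found below an image.
samples = [
    "Figure 1: System overview",
    "Fig. 2- Training loss",
    "FIGURE 3: Results",
    "Figure 4. Ablation study",
    "The agent architecture",
]

for text in samples:
    label = "caption" if CAPTION_RE.match(text) else "fallback"
    print(f"{label:8} | {text}")

matched = [bool(CAPTION_RE.match(s)) for s in samples]
```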

---

## 📊 Capturing vector graphics as images

Images that are embedded vector objects need to be processed separately.

In [8]:
from collections import defaultdict
from rtree import index

def are_bounding_boxes_close(bbox1, bbox2, threshold=50):
    # Extracting the four edges of each bounding box
    left1, top1, right1, bottom1 = bbox1
    left2, top2, right2, bottom2 = bbox2
    
    # Check if any of the borders are within the threshold distance
    return (
        abs(left1 - right2) < threshold or
        abs(right1 - left2) < threshold or
        abs(top1 - bottom2) < threshold or
        abs(bottom1 - top2) < threshold
    )

def merge_bounding_boxes(bboxes):
    if not bboxes:
        return None
    # Start with the first bounding box
    combined_bbox = bboxes[0]
    for bbox in bboxes[1:]:
        combined_bbox = combined_bbox | bbox  # Combine the bounding boxes (union)
    return combined_bbox

def group_bounding_boxes(bboxes, max_drawings=2000, threshold=50):
    # R-tree index setup
    idx = index.Index()
    for i, rect in enumerate(bboxes):
        expanded = rect + (-threshold, -threshold, threshold, threshold)
        idx.insert(i, expanded)

    # Graph connectivity
    adj_list = defaultdict(list)
    for i, rect in enumerate(bboxes):
        expanded = rect + (-threshold, -threshold, threshold, threshold)
        for j in idx.intersection(expanded):
            if i != j:
                adj_list[i].append(j)

    # Perform DFS to find connected components (groups of connected bounding boxes)
    visited = [False] * len(bboxes)
    components = []

    def dfs(node, component):
        visited[node] = True
        component.append(bboxes[node])
        for neighbor in adj_list[node]:
            if not visited[neighbor]:
                dfs(neighbor, component)

    # Find all connected components using DFS
    for i in range(len(bboxes)):
        if not visited[i]:
            component = []
            dfs(i, component)
            components.append(component)

    # Return grouped and merged bboxes
    return [merge_bounding_boxes(group) for group in components]

def process_large_drawing(drawings, max_drawings=1000, threshold=50):
    bboxes = [fitz.Rect(d["rect"]) for d in drawings if d.get("rect")]

    if len(bboxes) < max_drawings:
        return group_bounding_boxes(bboxes, threshold=threshold)

    # Split the data into smaller chunks
    num_chunks = (len(bboxes) // max_drawings) + 1
    all_results = []
    
    for chunk_index in range(num_chunks):
        chunk   = bboxes[chunk_index * max_drawings : (chunk_index + 1) * max_drawings]
        results = group_bounding_boxes(chunk, threshold=threshold)
        all_results.extend(results)

    # Return the combined results
    return group_bounding_boxes(all_results, threshold=threshold)

def find_surrounding_text(page, group, threshold=50):
    text_blocks = page.get_text("dict")["blocks"]
    expanded    = group + (-threshold, -threshold, threshold, threshold)
    surrounding = []

    for block in text_blocks:
        if block["type"] != 0:
            continue

        block_rect = fitz.Rect(block["bbox"])
        if expanded.intersects(block_rect):
            surrounding.append(block_rect)

    return surrounding

def extract_vector(pdf, doc, page, page_num):
    MAX_DRAWINGS =  1000
    MIN_SIZE     =  0.05
    MAX_SIZE     =  0.30
    THRESHOLD    =     5
    ZOOM         =     4

    page_size = page.rect.width * page.rect.height
    min_size  = page_size * MIN_SIZE
    max_size  = page_size * MAX_SIZE

    all_text = page.get_text()
    drawings = page.get_drawings()

    # Group drawings into figures
    grouped  = process_large_drawing(drawings, max_drawings=MAX_DRAWINGS, threshold=THRESHOLD)

    figs = []
    for group_num, group in enumerate(grouped):
        # Try to include any text labels around
        surrounding = find_surrounding_text(page, group, threshold=THRESHOLD)
        if surrounding:
            figure_bbox = merge_bounding_boxes([group] + surrounding)
        else:
            figure_bbox = group

        # Filter by minimal plot size
        width  = figure_bbox[2] - figure_bbox[0]
        height = figure_bbox[3] - figure_bbox[1]
        area   = width * height
        if area > min_size and area < max_size:
            scale_mat    = fitz.Matrix(ZOOM, ZOOM)
            figure_pix   = page.get_pixmap(matrix=scale_mat, clip=figure_bbox)
            figure_bytes = figure_pix.tobytes("png")
            figure_hash  = hashlib.md5(figure_bytes).hexdigest()
            figure_name  = f"doc{pdf}_page{page_num}_fig{group_num}_hash{figure_hash[:8]}.png"

            # Get the figure ratio
            ratio = width / height if height != 0 else None
    
            # Classify the image based on the ratio
            if ratio >= 1.5:
                figure_type = "horizontal"
            elif ratio <= 0.67:
                figure_type = "vertical"
            else:
                figure_type = "square"
    
            # Get caption based on bbox
            caption = find_image_caption(page, figure_bbox) if figure_bbox else None
    
            # Append an image
            figs.append({
                "name":    figure_name,
                "caption": caption,
                "ratio":   figure_type,
                "hash":    figure_hash
            })

    return figs
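
The grouping step can be hard to picture, so here is a simplified, dependency-free sketch of the same idea: boxes that lie within a threshold of each other are linked, and each connected component is merged into one bounding box. This is not the notebook's implementation. It replaces the R-tree with a brute-force pairwise check, uses plain `(x0, y0, x1, y1)` tuples instead of `fitz.Rect`, and the names `boxes_are_near` and `group_boxes` are hypothetical.

```python
def boxes_are_near(a, b, threshold):
    """True if box b overlaps box a after growing a by `threshold` on every side."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return (ax0 - threshold <= bx1 and bx0 <= ax1 + threshold and
            ay0 - threshold <= by1 and by0 <= ay1 + threshold)

def group_boxes(boxes, threshold=5):
    """Merge boxes into one bounding box per connected component."""
    visited = [False] * len(boxes)
    merged = []
    for start in range(len(boxes)):
        if visited[start]:
            continue
        # An iterative DFS avoids Python's recursion limit on busy pages.
        stack, component = [start], []
        visited[start] = True
        while stack:
            i = stack.pop()
            component.append(boxes[i])
            for j in range(len(boxes)):
                if not visited[j] and boxes_are_near(boxes[i], boxes[j], threshold):
                    visited[j] = True
                    stack.append(j)
        # Union of the component: smallest lower corner, largest upper corner.
        merged.append((
            min(b[0] for b in component), min(b[1] for b in component),
            max(b[2] for b in component), max(b[3] for b in component),
        ))
    return merged

# Two neighbouring boxes and one far away (toy coordinates).
toy_boxes = [(0, 0, 10, 10), (12, 0, 20, 10), (100, 100, 110, 110)]
groups = group_boxes(toy_boxes, threshold=5)
print(groups)
```

The pairwise check is quadratic in the number of boxes, which is why a spatial index is the more practical choice for pages with thousands of drawing primitives.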


---

## 🧾 Extracting content from PDFs

Here we extract all the information from our file to be added later to the database.

In [9]:
import os

def extract_pdf_content(pdf_path: str):
    doc = fitz.open(pdf_path)
    pdf = os.path.splitext(os.path.basename(pdf_path))[0]

    text = ""
    figs = []
    for page_num, page in enumerate(doc):
        # Parse the text
        text = ' '.join([text, page.get_text().strip()])

        # Extract images
        figs += extract_images(pdf, doc, page, page_num)

        # Extract vector graphics
        figs += extract_vector(pdf, doc, page, page_num)

    # Format the metadata
    metas = {
      "num_images": len(figs),
      "pdf_path":   pdf_path
    }

    return text, figs, metas

---
---

# ⚙️ Processing the dataset

---

## 📥 Acquiring new files

We scan a dataset directory for PDF files and append new ones to the existing database of papers. We also make sure no duplicates are added.

In [10]:
import os

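# Get the file names of all the papers already in the database
# (the path of each paper is stored under the "pdf_path" metadata key)
all_entries   = db.get(include=["metadatas"])
existing_pdfs = {os.path.basename(meta.get("pdf_path", "")) for meta in all_entries["metadatas"]}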

# Dataset
dataset = "/kaggle/input/ragntex-dataset/"
files   = os.listdir(dataset)

# Look for new PDFs to append to our database
pdf_files = [
    os.path.join(dataset, f)
    for f in files
    if f.lower().endswith(".pdf") and f not in existing_pdfs
]
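
The duplicate check boils down to comparing file names against what is already stored. A minimal sketch of that logic on its own, assuming stored entries are identified by their file path; the function name `select_new_pdfs` and the sample paths are hypothetical:

```python
import os

def select_new_pdfs(filenames, known_paths, dataset_dir):
    """Return full paths of PDFs whose file name is not already known."""
    # Compare by base name so that stored full paths match directory listings.
    known_names = {os.path.basename(p) for p in known_paths if p}
    return [
        os.path.join(dataset_dir, name)
        for name in sorted(filenames)
        if name.lower().endswith(".pdf") and name not in known_names
    ]

# Hypothetical directory listing and previously stored paths.
listing = ["agents.pdf", "notes.txt", "rag.PDF", "survey.pdf"]
stored  = ["/data/agents.pdf", None]

new_pdfs = select_new_pdfs(listing, stored, "/data")
print(new_pdfs)
```

Using a set for the known names keeps the membership test cheap even when the database holds many papers.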

---

## 🧾 Processing new PDFs

The new files must undergo text and image extraction.

In [11]:
import re
import hashlib

documents = []
metadatas = []

for pdf_path in pdf_files:
    text, imgs, metas = extract_pdf_content(pdf_path)

    documents.append(text)

    # Format images
    images_info = []
    for i, img in enumerate(imgs, start=1):
        caption     = img.get("caption")
        caption_str = str(caption) if caption is not None else ""
        img_name    = img["name"]
        img_ratio   = img["ratio"]
        full_path   = f"gfx/{img_name}"

        # Strip a leading "Figure N." / "Fig. N:" / "Fig N-" label from the caption
        cleaned_caption = re.sub(
            r"^(fig(?:ure)?\.?\s*\d+[.:\-]\s*)",
            "",
            caption_str,
            flags=re.IGNORECASE
        ).strip()
        caption = cleaned_caption if cleaned_caption else "None"

        images_info.append(f'{{"path": "{full_path}", "caption": "{caption}", "orientation": "{img_ratio}"}}')

    images_passage = "\n".join(images_info)

    # Format metadata
    fixed_metadata = {
        "num_images":     metas.get("num_images"),
        "pdf_path":       metas.get("pdf_path"),
        "images_passage": images_passage
    }
    metadatas.append(fixed_metadata)
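
One caveat of assembling the image description by hand is that a caption containing a double quote produces a malformed entry. A possible alternative, sketched here under the assumption that one JSON object per line is the desired format, is to let `json.dumps` do the escaping. The helper name `format_image_entry` and the sample file name are hypothetical.

```python
import json

def format_image_entry(path, caption, orientation):
    """Serialize one image description as a single JSON line."""
    entry = {
        "path": path,
        "caption": caption if caption else "None",
        "orientation": orientation,
    }
    # ensure_ascii=False keeps accented characters readable in the prompt.
    return json.dumps(entry, ensure_ascii=False)

line = format_image_entry(
    "gfx/docpaper_page0_img0_hash0a1b2c3d.png",  # hypothetical file name
    'The "planner" module',
    "horizontal",
)
print(line)

# The line can be parsed back without losing the quotes in the caption.
parsed = json.loads(line)
```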

---

## 🧩 Filling in the database

We now have everything we need, so we add all the extracted content to the database.

In [12]:
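if documents:
    # Offset the IDs by the current collection size so that
    # new entries do not collide with existing ones
    first_id = db.count()
    db.add(
        documents=documents,
        ids=[str(first_id + i) for i in range(len(documents))],  # IDs for each document
        metadatas=metadatas
    )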

print(f"💾 Processed {len(documents)} new PDFs; now the database contains {db.count()} entries")

💾 Processed 6 new PDFs; now the database contains 6 entries


---
---

# 🎨 LaTeX presentation

This is the most exciting part. Here we identify the documents in our database that contain relevant information, teach the Gemini model to write LaTeX Beamer presentations, and see the results!

---

## 🧵 Beamer document

We take the model output, strip any Markdown wrapping from it, and pass it through the LaTeX compiler. We compile the presentation twice to ensure correct slide numbering.

In [13]:
import os

def CompilePresentation(latex_code, work_dir):

    # Remove possible Markdown wrapping around the output
    latex_code = latex_code.strip()  # a trailing newline would hide the closing fence
    if latex_code.startswith("```"):
        latex_code = latex_code.split("\n", 1)[1]  # drop the opening fence line
    if latex_code.endswith("```"):
        latex_code = latex_code.rsplit("\n", 1)[0]  # drop the closing fence line

    # Save LaTeX code to file
    tex_file = os.path.join(work_dir, 'presentation.tex')
    with open(tex_file, "w") as f:
        f.write(latex_code)
    print("="*100)
    print("📄 Files in the directory:")
    !ls "{work_dir}"

    # Compile with pdflatex
    os.chdir(work_dir)  # Change working directory
    # We compile it twice to ensure proper slide enumeration
    !pdflatex -interaction=nonstopmode presentation.tex > output.log
    !pdflatex -interaction=nonstopmode presentation.tex > output.log

    # Check for PDF output
    if not os.path.exists("presentation.pdf"):
        print("❌ PDF generation failed. Here's the log:")
        with open("output.log", "r") as log:
            print(log.read())
    else:
        print(f"💾 PDF generated successfully in: {work_dir}")
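
Model responses are often wrapped in a Markdown code fence, sometimes with trailing whitespace. As a minimal sketch of the unwrapping step in isolation, a regular expression can handle fences with or without a language tag. The helper name `strip_code_fence` is hypothetical and this is not the function used above.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Optional language tag after the opening fence, then the body, then the closing fence.
FENCE_RE = re.compile(
    r"^" + FENCE + r"[a-zA-Z]*[ \t]*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)

def strip_code_fence(text):
    """Return the body of a fenced block, or the stripped text if it is unfenced."""
    text = text.strip()
    match = FENCE_RE.match(text)
    return match.group(1) if match else text

body = "\\documentclass{beamer}\n\\begin{document}\n\\end{document}"
wrapped = FENCE + "latex\n" + body + "\n" + FENCE + "\n"

unwrapped = strip_code_fence(wrapped)
print(unwrapped)
```

Stripping whitespace first matters because a response that ends with a newline after the closing fence would otherwise slip past a plain `endswith` check.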

---

## ✨ Golden prompt

We need to write detailed instructions describing what we want from the model. We provide the structure of the LaTeX slides and the logic they should follow, along with information about the available images.

In [14]:
# This prompt is where you can specify any guidance on tone, or what topics the model should stick to, or avoid.
embed_fn.document_mode = False
query = "I need a presentation about AI agents."
query_oneline = query.replace("\n", " ")  # (Optional) For cleaner input in case of newlines
result = db.query(query_texts=[query], n_results=2)
[documents] = result["documents"]
[metadatas] = result["metadatas"]

prompt = r"""You are a presentation assistant that creates clear, concise, and engaging slide decks from the reference material provided. You extract the most relevant and important information, organize it logically, and generate LaTeX code for a presentation using the Beamer class.

Structure your slides as follows:
1. **Introduction**: Present the topic and explain why it's important or interesting.
2. **Main Content**: Break down the topic into 2–4 core ideas, one idea per slide. Explain each with clear language, bullet points, or short sentences.
3. **Summary**: Recap the key takeaways, what was learned, and what it means for the audience.

Use friendly and accessible language that anyone can understand. Avoid technical jargon unless it's essential—and when you use it, explain it simply.

Output only valid LaTeX Beamer code. Each slide must be defined using \begin{frame} ... \end{frame}.

Here is some reference content retrieved from a document. Please generate a LaTeX Beamer presentation based on the content, following this structure:

1. Introduction (what is the topic and why it matters)
2. Main Part (a few slides on the core ideas)
3. Summary (what was learned, 2–3 key takeaways)

Please output valid LaTeX code only, like this format:

\documentclass{beamer}
\usetheme{Madrid}

\title{[Presentation Title]}
\author{AI-generated}
\date{\today}

\begin{document}

\frame{\titlepage}

\begin{frame}
\frametitle{Introduction}
\begin{itemize}
\item What's the topic?
\item Why is it important?
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{Core Idea 1}
\begin{itemize}
\item Key point 1
\item Key point 2
\end{itemize}
\end{frame}

\begin{frame}
\frametitle{Core Idea 2}
\begin{columns}
\begin{column}{0.5\linewidth}
\begin{itemize}
\item Key point 1
\item Key point 2
\end{itemize}
\end{column}
\begin{column}{0.5\linewidth}
\center{\includegraphics[height=1.0\textheight, width=1.0\textwidth, keepaspectratio]{gfx/image} \\ image caption}
\end{column}
\end{columns}
\end{frame}

\begin{frame}
\frametitle{Core Idea 3}
\center{\includegraphics[height=0.5\textheight, width=0.8\textwidth, keepaspectratio]{gfx/image} \\ image caption\\}
\begin{itemize}
\item Key point 1
\item Key point 2
\end{itemize}
\end{frame}

...

\begin{frame}
\frametitle{Summary}
\begin{itemize}
\item Key takeaway 1
\item Key takeaway 2
\end{itemize}
\end{frame}

\end{document}

"""

prompt += f"""You must use **at least one image** when creating a presentation. In your output, include `\\includegraphics` commands in **at least half of the slides** where appropriate.

Prioritize the slide structure of **Core Idea 2** (two-column layout) over **Core Idea 3** (single image on top).
Only use Core Idea 3 when appropriate based on the image orientation.

You must select images **only from the list provided below**. Each item includes:
- `"path"`: the exact image path to use — **you must copy it exactly as-is**.
- `"caption"`: the image's description — if `"None"`, **do not use the image**.
- `"orientation"`: either `"horizontal"`, `"vertical"`, or `"square"`.

Strict rules:
- Never invent, change, or guess the image path. Use the `"path"` value exactly as written.
- Never change or edit anything inside the square brackets `[...]` in any `\\includegraphics` command. All image formatting options (height, width, keepaspectratio) must be left untouched.
- Skip any image with caption `"None"`.
- Use **Core Idea 2** layout for **vertical** and **square** images.
- Use **Core Idea 3** layout for **horizontal** images (when a two-column layout is not suitable).\n\n"""

for passage, metas in zip(documents, metadatas):
    passage_oneline = passage.replace("\n", " ")
    images_passage = metas["images_passage"]

    prompt += f"PASSAGE: {passage_oneline}\n"
    prompt += f"IMAGES: {images_passage}\n"

---

## 💻 Generate the source code

Time to generate!

In [15]:
from datetime import datetime

# Create a new subfolder
base_path = "/kaggle/working/"
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
work_dir  = os.path.join(base_path, timestamp)
os.makedirs(work_dir, exist_ok=False)
print(f"📁 Created folder: {work_dir}")

# Generate the presentation code
model_name = "gemini-2.0-flash"
# model_name = "gemini-2.5-flash-preview-04-17"
answer = client.models.generate_content(
    model=model_name,
    contents=prompt
)
print(f"🎲 Generated the response using: {model_name}")

📁 Created folder: /kaggle/working/2025-04-19_11-22-10
🎲 Generated the response using: gemini-2.0-flash


---

## 🖼️ Processing images

We need to copy the images that the model decided to use in the presentation to the dedicated folder.

In [16]:
def save_pdf_images(pdf_path: str, req_imgs: list, images_dir: str):
    doc = fitz.open(pdf_path)
    pdf = os.path.splitext(os.path.basename(pdf_path))[0]

    # Create a set of pages to process
    pages_to_inspect = set()
    for img in req_imgs:
        if pdf == img["doc"]:
            pages_to_inspect.add(img["page"])

    for page_num, page in enumerate(doc):
        if page_num in pages_to_inspect:

            # Extract images
            images_info = page.get_images(full=True)
            for img_index, img in enumerate(images_info):
                xref = img[0]
                base_image  = doc.extract_image(xref)
                image_bytes = base_image["image"]
                image_hash  = hashlib.md5(image_bytes).hexdigest()

                image_found = any(
                        pdf       == req["doc"]
                    and page_num  == req["page"]
                    and img_index == req["img"]
                    and image_hash.startswith(req["hash"])
                    for req in req_imgs
                )
    
                # Save the image
                if image_found:
                    image_name = f"doc{pdf}_page{page_num}_img{img_index}_hash{image_hash[:8]}.png"
                    image_path = os.path.join(images_dir, image_name)
                    with open(image_path, "wb") as f:
                        f.write(image_bytes)

    return True

def save_pdf_figures(pdf_path: str, req_figs: list, figures_dir: str):
    doc = fitz.open(pdf_path)
    pdf = os.path.splitext(os.path.basename(pdf_path))[0]
    
    # Create a set of pages to process
    pages_to_inspect = set()
    for fig in req_figs:
        if pdf == fig["doc"]:
            pages_to_inspect.add(fig["page"])

    for page_num, page in enumerate(doc):
        if page_num in pages_to_inspect:
            MAX_DRAWINGS =  1000
            MIN_SIZE     =  0.05
            MAX_SIZE     =  0.30
            THRESHOLD    =     5
            ZOOM         =     4

            page_size = page.rect.width * page.rect.height
            min_size  = page_size * MIN_SIZE
            max_size  = page_size * MAX_SIZE

            all_text = page.get_text()
            drawings = page.get_drawings()

            # Group drawings into figures
            grouped = process_large_drawing(drawings, max_drawings=MAX_DRAWINGS, threshold=THRESHOLD)

            for group_num, group in enumerate(grouped):
                # Try to include any text labels around
                surrounding = find_surrounding_text(page, group, threshold=THRESHOLD)
                if surrounding:
                    figure_bbox = merge_bounding_boxes([group] + surrounding)
                else:
                    figure_bbox = group

                # Filter by minimal plot size
                width  = figure_bbox[2] - figure_bbox[0]
                height = figure_bbox[3] - figure_bbox[1]
                area   = width * height
                if area > min_size and area < max_size:
                    scale_mat    = fitz.Matrix(ZOOM, ZOOM)
                    figure_pix   = page.get_pixmap(matrix=scale_mat, clip=figure_bbox)
                    figure_bytes = figure_pix.tobytes("png")
                    figure_hash  = hashlib.md5(figure_bytes).hexdigest()

                    figure_found = any(
                            pdf       == fig["doc"]
                        and page_num  == fig["page"]
                        and group_num == fig["fig"]
                        and figure_hash.startswith(fig["hash"])
                        for fig in req_figs
                    )
                    
                    # Save the figure
                    if figure_found:
                        figure_name = f"doc{pdf}_page{page_num}_fig{group_num}_hash{figure_hash[:8]}.png"
                        figure_path = os.path.join(figures_dir, figure_name)
                        with open(figure_path, "wb") as f:
                            f.write(figure_bytes)

    return True

In [17]:
# Create a subfolder for graphics
graphics_dir = os.path.join(work_dir, "gfx")
os.makedirs(graphics_dir, exist_ok=True)
print(f"🌄 Images will be saved to: {graphics_dir}")

# Find images, which are used in the presentation
pattern_img = re.compile(
    r"doc(?P<doc>[a-zA-Z0-9_]+)_page(?P<page>\d+)_img(?P<img>\d+)_hash(?P<hash>[a-fA-F0-9]{8})\.png"
)
matches_img = pattern_img.finditer(answer.text)

req_imgs = []
for match in matches_img:
    req_img = {
        "doc":  match.group("doc"),
        "page": int(match.group("page")),
        "img":  int(match.group("img")),
        "hash": match.group("hash")
    }
    req_imgs.append(req_img)

# Find figures, which are used in the presentation
pattern_fig = re.compile(
    r"doc(?P<doc>[a-zA-Z0-9_]+)_page(?P<page>\d+)_fig(?P<fig>\d+)_hash(?P<hash>[a-fA-F0-9]{8})\.png"
)
matches_fig = pattern_fig.finditer(answer.text)

req_figs = []
for match in matches_fig:
    req_fig = {
        "doc":  match.group("doc"),
        "page": int(match.group("page")),
        "fig":  int(match.group("fig")),
        "hash": match.group("hash")
    }
    req_figs.append(req_fig)

# Save the required graphics
for metadata in metadatas:
    save_pdf_images (metadata["pdf_path"], req_imgs, graphics_dir)
    save_pdf_figures(metadata["pdf_path"], req_figs, graphics_dir)

🌄 Images will be saved to: /kaggle/working/2025-04-19_11-22-10/gfx
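
The file-name pattern carries everything needed to locate an image again: document name, page, index, and hash prefix. The sketch below applies the same image pattern to two made-up `\includegraphics` lines; the helper name `parse_image_name` and both file names are hypothetical. It also shows a limitation worth knowing: a document name containing a dot (common for arXiv identifiers) is not matched.

```python
import re

# Same image pattern as above, split across two lines for readability.
IMAGE_NAME_RE = re.compile(
    r"doc(?P<doc>[a-zA-Z0-9_]+)_page(?P<page>\d+)_img(?P<img>\d+)"
    r"_hash(?P<hash>[a-fA-F0-9]{8})\.png"
)

def parse_image_name(text):
    """Extract the image coordinates from a line of LaTeX, or None if absent."""
    m = IMAGE_NAME_RE.search(text)
    if m is None:
        return None
    return {
        "doc":  m.group("doc"),
        "page": int(m.group("page")),
        "img":  int(m.group("img")),
        "hash": m.group("hash"),
    }

ok  = parse_image_name(r"\includegraphics{gfx/docpaper_1_page3_img0_hashdeadbeef.png}")
bad = parse_image_name(r"\includegraphics{gfx/doc2301.00001_page3_img0_hashdeadbeef.png}")
print(ok)
print(bad)
```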


---

## 🎯 Source code compilation

Now, we have everything we need to get the result!

In [18]:
CompilePresentation(answer.text, work_dir)

📄 Files in the directory:
gfx  presentation.tex
💾 PDF generated successfully in: /kaggle/working/2025-04-19_11-22-10


---
---

# 🏁 Final results

---

## ⭐️ Your presentation

In [19]:
import base64
from IPython.display import HTML, display, Markdown, FileLink
import warnings

presentation_path = os.path.join(work_dir, "presentation.pdf")

warnings.filterwarnings("ignore", category=UserWarning, module='IPython.core.display')

with open(presentation_path, "rb") as f:
    base64_pdf = base64.b64encode(f.read()).decode('utf-8')

pdf_display = f'<iframe src="data:application/pdf;base64,{base64_pdf}" width="800" height="600" type="application/pdf"></iframe>'
HTML(pdf_display)

---

## 📋 Validation

Let's run the generated code through our simple checklist and then ask the user for feedback.

In [20]:
def basic_latex_checks(latex_code: str):
    checks = {
        "Has Introduction slide": bool(re.search(r"Introduction", latex_code, re.IGNORECASE)),
        # Introduction + 2 core slides + Summary = 4 frames (the title page uses \frame{...})
        "Has at least 2 Core slides": len(re.findall(r"\\begin{frame}", latex_code)) >= 4,
        "Has Summary slide": bool(re.search(r"Summary", latex_code, re.IGNORECASE)),
        "Uses at least one image": "includegraphics" in latex_code,
    }
    return checks

In [21]:
print("✅ Evaluation checklist:")
for key, passed in basic_latex_checks(answer.text).items():
    print(f"- {key}: {'✔️' if passed else '❌'}")

✅ Evaluation checklist:
- Has Introduction slide: ✔️
- Has at least 2 Core slides: ✔️
- Has Summary slide: ✔️
- Uses at least one image: ✔️
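
The checklist only searches for keywords, so a slightly stricter variant could look at the actual slide titles. The following is a sketch under the assumption that every slide sets its title with `\frametitle{...}`, as in the prompt template; titles given as `\begin{frame}{Title}` would not be found. The function name `list_frame_titles` and the sample LaTeX are hypothetical.

```python
import re

def list_frame_titles(latex_code):
    """Return the titles of all frames that use \\frametitle{...}."""
    return re.findall(r"\\frametitle\{([^}]*)\}", latex_code)

# A shortened, made-up presentation body.
sample = r"""
\begin{frame}
\frametitle{Introduction}
\end{frame}
\begin{frame}
\frametitle{Core Idea 1}
\end{frame}
\begin{frame}
\frametitle{Summary}
\end{frame}
"""

titles = list_frame_titles(sample)
print(titles)

# Position-aware checks: the introduction comes first, the summary last.
has_intro_first  = bool(titles) and titles[0].lower() == "introduction"
has_summary_last = bool(titles) and titles[-1].lower() == "summary"
```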


---

## 📝 Human Evaluation

In [22]:
display(Markdown("""
Please rate the presentation you just generated on the following:

- ✅ Relevance to topic (1–5):
- ✅ Slide structure and clarity (1–5):
- ✅ Overall usefulness (1–5):

You can use this to compare different models, prompts, or retrieval settings.
"""))


Please rate the presentation you just generated on the following:

- ✅ Relevance to topic (1–5):
- ✅ Slide structure and clarity (1–5):
- ✅ Overall usefulness (1–5):

You can use this to compare different models, prompts, or retrieval settings.


---
---

#### This is where this story ends. Thanks for your participation! We hope you enjoyed the results.

Anna & Kajetan