# AI PPTX Creator - Improved Version

This improved version includes:
- Better error handling and validation
- Configuration management through a single `Config` class
- Code organized into reusable functions
- Basic safety checks before executing LLM-generated code
- Docstrings and type hints

## Table of Contents

1. [Configuration and Setup](#1.-Configuration-and-Setup)
2. [Testing GPT4 Connection](#2.-Testing-GPT4-Connection)
3. [Loading PDF Files](#3.-Loading-PDF-Files)
4. [Embedding Model and Vector Database](#4.-Embedding-Model-and-Vector-Database)
5. [Creating RAG Chain](#5.-Creating-RAG-Chain)
6. [Generating PPTX Code](#6.-Generating-PPTX-Code)
7. [Creating the Presentation](#7.-Creating-the-Presentation)

## 1. Configuration and Setup

Load environment variables and configure constants. Using a configuration class makes the code more maintainable and testable.

In [None]:
# Standard library imports
import os
import sys
from pathlib import Path
from typing import List, Dict, Any

# Third-party imports
from dotenv import load_dotenv
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai import OpenAI
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents import Document

In [None]:
class Config:
    """Configuration class for AI PPTX Creator."""
    
    # API Configuration
    OPENAI_API_KEY: str = None
    
    # Model Configuration
    CHAT_MODEL: str = "gpt-4-turbo"
    CODE_GEN_MODEL: str = "gpt-3.5-turbo-instruct"  # Completions model (used via langchain_openai.OpenAI); cheaper than the chat model
    CODE_GEN_TEMPERATURE: float = 0.0  # Deterministic output for code
    CODE_GEN_MAX_TOKENS: int = 2048  # Upper bound on the length of the generated script
    
    # Retriever Configuration
    RETRIEVER_K: int = 2  # Number of documents to retrieve
    RETRIEVER_LAMBDA_MULT: float = 0.25  # MMR diversity parameter
    
    # Directory Configuration
    BASE_DIR: Path = Path("..")
    PDF_DIR: Path = BASE_DIR / "pdfs"
    PPTX_DIR: Path = BASE_DIR / "pptx"
    CHROMA_DB_DIR: Path = BASE_DIR / "chroma_db"
    
    @classmethod
    def load_env(cls) -> None:
        """Load environment variables and validate configuration."""
        load_dotenv()
        cls.OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
        
        if not cls.OPENAI_API_KEY:
            raise ValueError(
                "OPENAI_API_KEY not found in environment variables. "
                "Please check your .env file."
            )
        
        # Ensure directories exist
        cls.PPTX_DIR.mkdir(exist_ok=True)
        
        print("✓ Configuration loaded successfully")
        print(f"✓ PDF directory: {cls.PDF_DIR}")
        print(f"✓ Output directory: {cls.PPTX_DIR}")

# Load configuration
Config.load_env()
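Because `Config.load_env` reads the process environment directly, its validation is awkward to unit-test. A minimal sketch, assuming a hypothetical helper `require_env` that receives the environment as a mapping, shows how the same check can be exercised with a plain dict. Unlike `load_env`, it also rejects a whitespace-only value:

```python
from typing import Mapping


def require_env(name: str, env: Mapping[str, str]) -> str:
    """Return env[name] without surrounding whitespace; raise ValueError if missing or blank.

    Hypothetical helper for illustration, not part of the notebook's Config class.
    """
    value = env.get(name, "").strip()
    if not value:
        raise ValueError(f"{name} not found in environment variables. Please check your .env file.")
    return value


# In the notebook, pass os.environ; in a unit test, pass a plain dict
print(require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-test"}))  # sk-test

try:
    require_env("OPENAI_API_KEY", {})
except ValueError as err:
    print(f"✗ {err}")
```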

## 2. Testing GPT4 Connection

Verify the API connection with error handling.

In [None]:
def test_llm_connection(test_query: str = "What is the Suez Canal?") -> str:
    """
    Test the LLM connection with a simple query.
    
    Args:
        test_query: The test question to ask
        
    Returns:
        The model's response
        
    Raises:
        Exception: If connection fails
    """
    try:
        model = ChatOpenAI(model=Config.CHAT_MODEL)
        response = model.invoke(test_query)
        print("✓ LLM connection successful")
        return response.content
    except Exception as e:
        print(f"✗ LLM connection failed: {e}")
        raise

# Test the connection (stored separately from `response`, which section 6 uses as slide content)
test_response = test_llm_connection()
print(f"\nResponse preview: {test_response[:200]}...")
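`test_llm_connection` fails on the first error, which may be a transient network or rate-limit problem. A minimal retry-with-exponential-backoff wrapper is sketched below; `call_with_retries` is a hypothetical helper, not part of LangChain. `ChatOpenAI` also accepts a `max_retries` argument, which may be the simpler option when only the API call itself needs retrying:

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn(), retrying failures with exponential backoff (base_delay, 2*base_delay, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error unchanged
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
    raise AssertionError("not reached: the loop either returns or re-raises")


# Hypothetical stand-in that fails twice, then succeeds
attempts_seen: List[int] = []


def flaky_call() -> str:
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ConnectionError("temporary failure")
    return "connected"


print(call_with_retries(flaky_call, base_delay=0.1))  # connected (after two retries)
```

In the notebook, the same wrapper could be applied as `call_with_retries(test_llm_connection)`.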

## 3. Loading PDF Files

Load PDF documents with validation.

In [None]:
def load_pdf_documents(pdf_dir: Path = Config.PDF_DIR) -> List[Document]:
    """
    Load PDF documents from the specified directory.
    
    Args:
        pdf_dir: Path to the directory containing PDF files
        
    Returns:
        List of loaded documents
        
    Raises:
        FileNotFoundError: If PDF directory doesn't exist
        ValueError: If no PDF files are found
    """
    if not pdf_dir.exists():
        raise FileNotFoundError(f"PDF directory not found: {pdf_dir}")
    
    # Check for PDF files
    pdf_files = list(pdf_dir.glob("*.pdf"))
    if not pdf_files:
        raise ValueError(f"No PDF files found in {pdf_dir}")
    
    print(f"Found {len(pdf_files)} PDF file(s): {[f.name for f in pdf_files]}")
    
    # Load documents
    loader = PyPDFDirectoryLoader(str(pdf_dir))
    pages = loader.load()
    
    print(f"✓ Loaded {len(pages)} pages from PDF documents")
    return pages

# Load PDF documents
pages = load_pdf_documents()
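`pdf_dir.glob("*.pdf")` in `load_pdf_documents` typically matches the extension case-sensitively on POSIX systems such as Linux, so a file named `REPORT.PDF` would be missed by the check. A minimal sketch of a case-insensitive alternative follows; `find_pdf_files` is a hypothetical helper, and whether `PyPDFDirectoryLoader` itself loads upper-case extensions depends on its own glob pattern:

```python
import tempfile
from pathlib import Path
from typing import List


def find_pdf_files(pdf_dir: Path) -> List[Path]:
    """Return files in pdf_dir whose extension is .pdf, ignoring case."""
    return sorted(
        p for p in pdf_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Usage with a throwaway directory (empty placeholder files, not real PDFs)
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    for name in ["a.pdf", "B.PDF", "notes.txt"]:
        (tmp_dir / name).touch()
    print([p.name for p in find_pdf_files(tmp_dir)])
```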

## 4. Embedding Model and Vector Database

Create embeddings and store them in ChromaDB for efficient retrieval.

**Embeddings** transform text into numerical vectors that capture semantic meaning, enabling similarity search.

**ChromaDB** is a vector database optimized for storing and retrieving embeddings.
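The similarity-search idea behind embeddings can be sketched in plain Python. This is a minimal illustration with made-up 3-dimensional vectors, not how ChromaDB is implemented: it ranks stored vectors by cosine similarity to a query vector. ChromaDB's default distance is L2 rather than cosine, but for unit-length vectors (OpenAI documents its embeddings as normalized to length 1) the two give the same ranking.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (1.0 = same direction, 0.0 = unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, store: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors have far more dimensions
store = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.7, 0.3, 0.1],
    "doc_c": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], store, k=2))  # doc_a first, then doc_b
```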

In [None]:
def create_vector_store(documents: List[Document]) -> Chroma:
    """
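    Create a ChromaDB vector store from documents, persisted to Config.CHROMA_DB_DIR.
    
    Re-running this against an existing persist directory may add duplicate
    entries to the stored collection.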
    
    Args:
        documents: List of documents to embed
        
    Returns:
        Chroma vector store instance
    """
    try:
        # Initialize embedding model
        embeddings = OpenAIEmbeddings()
        
        # Create vector store
        vector_store = Chroma.from_documents(
            documents=documents,
            embedding=embeddings,
            persist_directory=str(Config.CHROMA_DB_DIR)
        )
        
        print(f"✓ Vector store created with {len(documents)} documents")
        return vector_store
        
    except Exception as e:
        print(f"✗ Error creating vector store: {e}")
        raise

def create_retriever(vector_store: Chroma):
    """
    Create a retriever with Maximal Marginal Relevance (MMR).
    
    MMR balances relevance and diversity in retrieved documents.
    
    Args:
        vector_store: The vector store to create retriever from
        
    Returns:
        Configured retriever
    """
    retriever = vector_store.as_retriever(
        search_type="mmr",
        search_kwargs={
            "k": Config.RETRIEVER_K,
            "lambda_mult": Config.RETRIEVER_LAMBDA_MULT
        }
    )
    print("✓ Retriever configured with MMR search")
    return retriever

# Create vector store and retriever
chroma_db = create_vector_store(pages)
retriever = create_retriever(chroma_db)
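MMR, configured in `create_retriever` through `lambda_mult`, picks documents greedily: each step takes the candidate with the best trade-off between relevance to the query and similarity to documents already picked. A minimal sketch of that greedy loop follows, assuming cosine similarity and small hand-made vectors; LangChain first fetches a larger candidate pool from the vector store (`fetch_k`), which this sketch omits. With `RETRIEVER_LAMBDA_MULT = 0.25`, diversity is weighted more heavily than relevance:

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query: Vector, candidates: List[Vector], k: int = 2, lambda_mult: float = 0.25) -> List[int]:
    """Greedy Maximal Marginal Relevance; returns indices of the selected candidates.

    score(d) = lambda_mult * sim(query, d) - (1 - lambda_mult) * max(sim(d, s) for s in selected)
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Similarity to the closest already-selected document (0 if none selected yet)
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


candidates = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]  # exact match, near-duplicate, different topic
print(mmr([1.0, 0.0], candidates, k=2, lambda_mult=0.25))  # [0, 2]: diversity wins
print(mmr([1.0, 0.0], candidates, k=2, lambda_mult=1.0))   # [0, 1]: pure relevance
```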

## 5. Creating RAG Chain

Build a Retrieval-Augmented Generation (RAG) chain to generate structured content from the PDF.

In [None]:
def create_content_generation_chain(retriever, model):
    """
    Create a RAG chain for generating structured bullet points.
    
    Args:
        retriever: Document retriever
        model: Language model to use
        
    Returns:
        Configured chain
    """
    template = """
    You are an expert at summarizing documents into clear, structured presentations.
    
    Given the context below, generate:
    1. A clear, descriptive header
    2. Exactly 10 numbered bullet points
    3. Each bullet point should be 30-40 words
    4. Focus on the most important information
    
    Format:
    **Header: [Your Header Here]**
    
    1. **[Topic]**: [Description]
    2. **[Topic]**: [Description]
    ...
    
    Context: {context}
    
    Question: {question}
    """
    
    prompt = ChatPromptTemplate.from_template(template)
    parser = StrOutputParser()
    
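    def format_docs(docs: List[Document]) -> str:
        # Join page texts so the prompt receives plain text rather than Document reprs
        return "\n\n".join(doc.page_content for doc in docs)
    
    chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | model
        | parser
    )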
    
    return chain

# Initialize model and chain
chat_model = ChatOpenAI(model=Config.CHAT_MODEL)
content_chain = create_content_generation_chain(retriever, chat_model)

print("✓ RAG chain created successfully")
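In the chain above, the leading dict runs the retriever and `RunnablePassthrough` on the same input question, and their outputs fill the prompt's `{context}` and `{question}` placeholders. The plain-function sketch below mimics that data flow; `run_rag`, `fake_retrieve`, and `fake_model` are hypothetical stand-ins (the retriever returns strings here rather than `Document` objects, and they are joined into one context string):

```python
from typing import Callable, List


def run_rag(question: str,
            retrieve: Callable[[str], List[str]],
            template: str,
            generate: Callable[[str], str]) -> str:
    """Plain-function sketch of {"context": ..., "question": passthrough} | prompt | model | parser."""
    # The dict step feeds the same input to both branches: the retriever
    # receives the question, and the passthrough forwards it unchanged
    context = "\n\n".join(retrieve(question))
    # The prompt step fills the template's placeholders
    prompt_text = template.format(context=context, question=question)
    # The model step generates text; StrOutputParser turns the reply into a plain string
    return generate(prompt_text)


def fake_retrieve(question: str) -> List[str]:
    """Hypothetical stand-in for the vector-store retriever."""
    return ["Page 1 text", "Page 2 text"]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for the chat model: echoes the prompt in upper case."""
    return prompt.upper()


print(run_rag("Key points?", fake_retrieve, "Context: {context}\nQuestion: {question}", fake_model))
```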

In [None]:
# Generate content
query = "What are the key points and implications of the briefing?"
print(f"Query: {query}\n")

response = content_chain.invoke(query)
print(response)
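The prompt asks for a `**Header: ...**` line and exactly ten numbered bullets, but the model does not always comply. A small parser can check the output before it is passed to code generation. This is a sketch that assumes the output follows the requested format; `parse_summary` is a hypothetical helper and the sample text is made up:

```python
import re
from typing import List, Tuple


def parse_summary(text: str) -> Tuple[str, List[str]]:
    """Split '**Header: ...**' plus numbered lines into (header, bullets)."""
    header_match = re.search(r"\*\*Header:\s*(.+?)\*\*", text)
    header = header_match.group(1).strip() if header_match else ""
    # Numbered lines such as "1. **Topic**: Description"; the number itself is dropped
    bullets = [
        m.group(1).strip()
        for m in re.finditer(r"^[ \t]*\d+\.[ \t]+(.+)$", text, flags=re.MULTILINE)
    ]
    return header, bullets


# Made-up sample in the prompt's format (not real model output)
sample = """**Header: Example Header**

1. **Topic A**: First description.
2. **Topic B**: Second description.
"""
header, bullets = parse_summary(sample)
print(header)        # Example Header
print(len(bullets))  # 2
```

In the notebook, `header, bullets = parse_summary(response)` followed by a check such as `len(bullets) == 10` would catch malformed output early.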

## 6. Generating PPTX Code

Use an LLM to generate Python code for creating the PowerPoint presentation.

In [None]:
def create_code_generation_chain(presentation_title: str, output_filename: str):
    """
    Create a chain for generating python-pptx code.
    
    Args:
        presentation_title: Title for the presentation
        output_filename: Name of the output PPTX file
        
    Returns:
        Configured chain
    """
    template = """
    You are an expert Python developer specializing in the python-pptx library.
    
    Task: Generate clean, executable Python code to create a PowerPoint presentation.
    
    Requirements:
    1. Import required modules: `from pptx import Presentation` and `from pptx.util import Pt`
    2. Create presentation with:
       - Slide 1 (layout 0): Title: "{title}", Subtitle: "Generated by AI"
       - Slide 2 (layout 1): Title: "Key Insights (Part 1)", Content: First 5 bullet points
       - Slide 3 (layout 1): Title: "Key Insights (Part 2)", Content: Last 5 bullet points
    3. Set body text font size to 18pt for readability
    4. Save to: "{output_path}"
    5. Output ONLY executable Python code, NO markdown formatting
    6. Add error handling for file operations
    
    Content to include:
    {context}
    
    Output format: Plain Python code only, no ```python``` markers.
    """
    
    output_path = Config.PPTX_DIR / output_filename
    
    prompt = ChatPromptTemplate.from_template(template)
    prompt = prompt.partial(
        title=presentation_title,
        output_path=str(output_path)
    )
    
    model = OpenAI(
        model=Config.CODE_GEN_MODEL,
        temperature=Config.CODE_GEN_TEMPERATURE,
        max_tokens=Config.CODE_GEN_MAX_TOKENS
    )
    parser = StrOutputParser()
    
    chain = prompt | model | parser
    return chain

# Create code generation chain
code_chain = create_code_generation_chain(
    presentation_title="EPRS Briefing Analysis",
    output_filename="Red_Sea_Security_Threats.pptx"
)

print("✓ Code generation chain created")
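The code-generation prompt asks the model to split the ten bullets five and five across two slides. That split can also be done deterministically in Python before prompting, so the model (or a hand-written python-pptx script) only has to place ready-made groups. A minimal sketch; `split_into_slides` is a hypothetical helper and the bullet strings are placeholders:

```python
from typing import List


def split_into_slides(bullets: List[str], per_slide: int = 5) -> List[List[str]]:
    """Chunk bullets into consecutive groups of per_slide items (the last group may be shorter)."""
    if per_slide < 1:
        raise ValueError("per_slide must be at least 1")
    return [bullets[i:i + per_slide] for i in range(0, len(bullets), per_slide)]


bullets = [f"Point {n}" for n in range(1, 11)]
for part, chunk in enumerate(split_into_slides(bullets), start=1):
    print(f"Key Insights (Part {part}):", chunk)
```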

In [None]:
# Generate Python code
generated_code = code_chain.invoke({"context": response})
print("Generated code preview:")
print("=" * 80)
print(generated_code[:500] + "...\n" + "=" * 80)

## 7. Creating the Presentation

Clean the generated code, run basic safety checks on it, and execute it with error handling.
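Generated scripts often define helper functions. When `exec(code)` is called inside a function such as `execute_generated_code` without an explicit namespace, Python runs the code with separate globals and locals, so top-level imports in the generated script are not visible inside the helpers it defines. The sketch below contrasts the two cases using a made-up snippet in place of real LLM output; `run_generated` and `GENERATED` are illustrative names:

```python
GENERATED = '''
import string

def first_letters(n):
    return string.ascii_uppercase[:n]

RESULT = first_letters(3)
'''


def run_generated(code: str, separate_locals: bool) -> str:
    """Execute code and return its RESULT variable, or the NameError message."""
    namespace = {"__name__": "__main__"}
    try:
        if separate_locals:
            # Separate globals/locals dicts: like a bare exec() inside a function,
            # top-level names (including imports) are hidden from functions the code defines
            exec(code, namespace, {})
        else:
            # One shared dict: the code runs as if it were a module
            exec(code, namespace)
    except NameError as exc:
        return f"NameError: {exc}"
    return namespace["RESULT"]


print(run_generated(GENERATED, separate_locals=True))   # NameError about 'string'
print(run_generated(GENERATED, separate_locals=False))  # ABC
```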

In [None]:
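import re

def clean_python_code(code_str: str) -> str:
    """
    Remove markdown code fences from generated code.
    
    Args:
        code_str: Raw code string from LLM
        
    Returns:
        Cleaned Python code
    """
    # Prefer the body of the first fenced block (```python, ```py or bare ```);
    # an unclosed fence runs to the end of the text
    match = re.search(r"```(?:python|py)?[ \t]*\n(.*?)(?:```|\Z)", code_str, re.DOTALL | re.IGNORECASE)
    if match:
        return match.group(1).strip()
    # No well-formed fence: just drop any stray backtick fences
    return code_str.replace("```", "").strip()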

def validate_code_safety(code: str) -> tuple[bool, str]:
    """
    Perform basic safety checks on generated code.
    
    This is a substring denylist, not a sandbox: it catches obvious misuse
    but can be bypassed.
    
    Args:
        code: The code to validate
        
    Returns:
        Tuple of (is_safe, message)
    """
    dangerous_patterns = [
        "os.system",
        "subprocess",
        "shutil",
        "eval(",
        "exec(",
        "__import__",
        "open(",  # Not needed: Presentation.save() writes the file itself
    ]
    
    # Reject the code if any dangerous pattern appears anywhere in it
    for pattern in dangerous_patterns:
        if pattern in code:
            return False, f"Potentially unsafe code detected: {pattern}"
    
    # Verify required imports are present
    required_imports = ["from pptx import Presentation"]
    for req in required_imports:
        if req not in code:
            return False, f"Missing required import: {req}"
    
    return True, "Code validation passed"

def execute_generated_code(code: str, verbose: bool = True) -> bool:
    """
    Safely execute the generated Python code.
    
    Args:
        code: The Python code to execute
        verbose: Whether to print execution details
        
    Returns:
        True if execution succeeded, False otherwise
    """
    # Clean the code
    cleaned_code = clean_python_code(code)
    
    if verbose:
        print("Cleaned code:")
        print("=" * 80)
        print(cleaned_code)
        print("=" * 80)
    
    # Validate code safety
    is_safe, message = validate_code_safety(cleaned_code)
    if not is_safe:
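        print(f"✗ {message}")
        print("Code execution blocked for safety reasons.")
        return False
    
    print(f"✓ {message}")
    
    # Execute in a fresh namespace: a shared globals dict lets functions defined
    # by the generated code see its top-level imports, and __name__ == "__main__"
    # lets an `if __name__ == "__main__":` block in the generated code run
    try:
        exec(cleaned_code, {"__name__": "__main__"})
        print("✓ Presentation created successfully!")
        return True
    except Exception as e:
        print(f"✗ Error executing code: {e}")
        import traceback
        traceback.print_exc()
        return False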

# Execute the generated code
success = execute_generated_code(generated_code, verbose=False)

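output_file = Config.PPTX_DIR / "Red_Sea_Security_Threats.pptx"
if success and output_file.exists():
    print(f"\n📊 Presentation saved to: {output_file}")
elif success:
    print(f"\n✗ Code ran without errors, but {output_file} was not found")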
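The substring checks in `validate_code_safety` are easy to trip accidentally and easy to evade. One alternative, sketched here under the assumption that the generated script only needs to import from python-pptx, is to parse the code with Python's `ast` module, allowlist imports, and block a few built-in calls by name. `check_generated_code`, `ALLOWED_IMPORT_ROOTS`, and `BLOCKED_CALLS` are hypothetical names. This is still not a sandbox (for example, `getattr` tricks are not caught), so running generated code in a separate process or container remains the stronger option.

```python
import ast
from typing import Set, Tuple

ALLOWED_IMPORT_ROOTS: Set[str] = {"pptx"}  # Assumption: only python-pptx is needed
BLOCKED_CALLS: Set[str] = {"eval", "exec", "open", "compile", "__import__"}


def check_generated_code(code: str) -> Tuple[bool, str]:
    """Allowlist imports and block selected built-in calls by walking the syntax tree."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return False, f"Syntax error: {exc}"
    for node in ast.walk(tree):
        # Collect the top-level package of every imported module
        if isinstance(node, ast.Import):
            roots = {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom):
            roots = {(node.module or "").split(".")[0]}
        else:
            roots = set()
        disallowed = roots - ALLOWED_IMPORT_ROOTS
        if disallowed:
            return False, f"Import not allowed: {', '.join(sorted(disallowed))}"
        # Block direct calls such as open(...) or eval(...)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
            return False, f"Call not allowed: {node.func.id}()"
    return True, "Code validation passed"


print(check_generated_code("from pptx import Presentation\nprs = Presentation()\nprs.save('deck.pptx')"))
# (True, 'Code validation passed')
print(check_generated_code("import subprocess\nsubprocess.run(['ls'])"))
# (False, 'Import not allowed: subprocess')
```

It keeps the same `(is_safe, message)` shape as `validate_code_safety`, so it could be swapped in at the same call site. The allowlist also rejects harmless imports such as `os` for creating directories, which is a deliberate trade-off.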

## Summary

This improved notebook demonstrates:
- ✓ Configuration management with a Config class
- ✓ Error handling and validation
- ✓ Code organization with reusable functions
- ✓ Basic safety checks on generated code before execution
- ✓ Docstrings and type hints
- ✓ Modular design for easier testing and maintenance