## Generate documents

This Jupyter notebook contains Python code that automatically generates a book in LaTeX format. It relies on several technologies:

*   **OpenAI's language models (like o1):** To generate the actual content of each book section based on a defined outline and relevant background information.
*   **Pandas:** To efficiently manage and load background data, which is expected to be pre-processed and saved in a pickle file. This background data contains text and pre-calculated embeddings for similarity searches.
*   **Pickle:** To load the background data quickly from a `.pkl` file, preserving the data structure and embeddings.
*   **LaTeX:** To format the generated book content into a professional, high-quality PDF document.

**Here's a high-level overview of what the code does:**

1.  **Loads Book Outline:** Reads a JSON file (`outline_v3.json`, set in `JSON_OUTLINE_FILE`) that defines the structure of the book: the id, level, heading, goal, and required background of each section.
2.  **Loads Background Data:** Loads two pre-processed Pandas DataFrames from pickle files (`validation_kb.pkl` and `regulations_with_embeddings.pkl`) and concatenates them. Each DataFrame should contain background text and the corresponding embeddings.
3.  **Iterates Through Book Sections:** Loops through each section defined in the book outline.
4.  **Finds Relevant Background Text:** For each section, embeds the section's goal and uses cosine similarity to retrieve the most similar background passages from the combined DataFrame.
5.  **Generates Section Content with OpenAI:** Calls OpenAI's API to generate the text for each section, passing the heading, level, goal, required background, and the retrieved passages as context.
6.  **Formats Content in LaTeX:** Wraps the generated text in a LaTeX preamble and postamble to form a complete document, and converts Markdown Python code blocks into `lstlisting` environments. Titles and header text are inserted as-is, so any LaTeX special characters in them must already be escaped.
7.  **Saves LaTeX File:** Appends a validation table read from `validation_table.tex`, then saves the complete LaTeX code to a timestamped `.tex` file (`validation_book_<timestamp>.tex`).
8.  **Compiles LaTeX to PDF (Optional):** Compiles the generated `.tex` file into a PDF using `pdflatex`, which requires a local LaTeX installation.

This code provides a framework for automated book generation, and you can customize the outline, background data, prompts, and LaTeX formatting to create your own unique book.  Run the cell to start the book generation process!
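
The notebook interpolates the book title and header text directly into the LaTeX preamble. If those strings may contain characters such as `&`, `%` or `_`, a small escaping step can help. The sketch below is one possible approach; `escape_latex` is a hypothetical helper that does not appear in the notebook, and it is meant for plain text only (it would break strings that already contain LaTeX commands).

```python
# Hypothetical helper, not part of the notebook: escape LaTeX special characters.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Replace each LaTeX special character, one character at a time.

    Working character by character avoids re-escaping the backslashes
    that earlier replacements introduced.
    """
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


print(escape_latex("Profit & Loss: 50% of R&D_budget"))
```

Under these assumptions, the helper would be applied to `title`, `author` and `header_text` before they are passed to `create_latex_preamble`.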

In [1]:
import json
import numpy as np
import pickle
import re
import logging
from sklearn.metrics.pairwise import cosine_similarity
import openai
import textwrap
import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.callbacks import get_openai_callback
import subprocess
import datetime
import pandas as pd
import shutil
import glob

from typing import Any, Union, List, Dict

# --- Logging Configuration ---
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

### A. Parametrisation

In [2]:
# --- Configuration ---
JSON_OUTLINE_FILE = "outline_v3.json"  # Path to your JSON outline file
GENERAL_KB = "regulations_with_embeddings.pkl"  # Path to your pickle file with background data
VALIDATION_KB = 'validation_kb.pkl'
TEXT_COLUMN_NAME = "body_of_the_text"  # Column with text content
EMBEDDING_COLUMN_NAME = "combined_text_embedding"  # Column with pre-calculated embeddings
OPENAI_MODEL = "o1-preview"  # "gpt-4o"  # Your preferred OpenAI model
LATEX_OUTPUT_FILE = f"validation_book_{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.tex"
TABLE_INPUT_FILE = "validation_table.tex"

# Global variable to track the total cost for the whole run
TOTAL_COST = 0.0

In [None]:
# {
#   "table_of_contents": [
#     {
#       "id": "1",
#       "level": 1,
#       "heading": "Introduction to Validation",
#       "goal": "Highlight why validation is essential for financial institutions and lay the groundwork for subsequent validation chapters.",
#       "expected_length": "2 pages",
#       "required_background": "Readers should be familiar with basic financial concepts such as loans, defaults, and interest rates, and have a general understanding of risk management principles."
#     },
#     {
#       "id": "1.1",
#       "level": 2,
#       "heading": "The role of Credit Risk Models",
#       "goal": "Introduce the main types of credit risk models (PD, LGD, EAD, ELBE) and explain their role in credit risk management. Provide a concise overview of the various credit risk model types, their typical use cases, and key differences in data and methodology.",
#       "expected_length": "1 page",
#       "required_background": "A rudimentary knowledge of probability, statistics, and how banks estimate default risk is helpful."
#     },
#     {
#       "id": "1.2",
#       "level": 2,
#       "heading": "Assessment of validation tests",
#       "goal": "Create a table with main validation tests and related thresholds. Explain how this table should be used.",
#       "expected_length": "1 page",
#       "required_background": "Basic understanding of model performance metrics and statistical testing.",
#       "details": "The table should have the following headers: Statistic, Green, Amber, Red. All headers should be bold. The table rows should be as follows: AUC (Area Under the ROC Curve) with Green: >= 0.75, Amber: 0.65 - 0.75, Red: < 0.65; Kolmogorov-Smirnov (KS) Statistic with Green: >= 0.40, Amber: 0.30 - 0.40, Red: < 0.30; Gini Coefficient with Green: >= 0.50, Amber: 0.30 - 0.50, Red: < 0.30; Brier Score with Green: <= 0.10, Amber: 0.10 - 0.20, Red: > 0.20; Hosmer-Lemeshow Statistic (p-value) with Green: > 0.05, Amber: 0.01 - 0.05, Red: <= 0.01; Accuracy with Green: >= 0.80, Amber: 0.70 - 0.80, Red: < 0.70; Precision (for default class) with Green: >= 0.60, Amber: 0.40 - 0.60, Red: < 0.40; Recall (for default class) with Green: >= 0.70, Amber: 0.50 - 0.70, Red: < 0.50; F1-Score (for default class) with Green: >= 0.65, Amber: 0.45 - 0.65, Red: < 0.45; Somers' D with Green: >= 0.50, Amber: 0.30 - 0.50, Red: < 0.30."
#     }
#   ]
# }
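
The main loop reads the keys `id`, `level`, `heading`, `goal` and `required_background` from every outline entry, so a missing key only surfaces as a `KeyError` partway through a paid run. The sketch below checks the outline up front. `validate_outline` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration, and the sample outline is made up.

```python
import json

# Keys that the main loop reads from every outline entry.
REQUIRED_KEYS = {"id", "level", "heading", "goal", "required_background"}


def validate_outline(outline: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for i, entry in enumerate(outline.get("table_of_contents", [])):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
        if "level" in entry and not isinstance(entry["level"], int):
            problems.append(f"entry {i}: 'level' should be an integer")
    return problems


# Made-up sample: the second entry lacks a "goal".
sample = json.loads("""
{
  "table_of_contents": [
    {"id": "1", "level": 1, "heading": "Introduction",
     "goal": "Set the scene.", "required_background": "None."},
    {"id": "1.1", "level": 2, "heading": "Details",
     "required_background": "None."}
  ]
}
""")

problems = validate_outline(sample)
print(problems)
```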

### B. Helper functions

In [3]:
# --- Helper Functions ---
def load_json_outline(json_file: str) -> Dict[str, Any]:
    """Loads the book outline from a JSON file."""
    try:
        with open(json_file, "r", encoding="utf-8") as f:
            outline = json.load(f)
        logging.info(f"Book outline loaded from {json_file}")
        return outline
    except Exception as e:
        logging.error(f"Error loading JSON outline: {e}")
        raise


def import_latex_table(filename):
    """Imports the LaTeX table from a separate file."""
    try:
        with open(filename, "r", encoding="utf-8") as infile:
            table_content = infile.read()
        return table_content
    except FileNotFoundError:
        logging.error(f"Error: Table file '{filename}' not found.")
        return ""  # Return an empty string so the program can continue
    except Exception as e:
        logging.error(f"Error reading table file '{filename}': {e}")
        return ""
        

def load_dataframe_from_pickle(pickle_filepath: str) -> Any:
    """Loads a DataFrame from a pickle file."""
    try:
        with open(pickle_filepath, "rb") as f:
            loaded_df = pickle.load(f)
        logging.info(f"DataFrame loaded from pickle file: {pickle_filepath}")
        return loaded_df
    except Exception as e:
        logging.error(f"Error loading DataFrame pickle: {e}")
        raise


def load_pandas_dataframe(pkl_file: str) -> Any:
    """
    Loads a pandas DataFrame from a pickle file and converts embedding strings to numpy arrays.
    Use this if your embeddings are stored as strings.
    """
    df = load_dataframe_from_pickle(pkl_file)
    try:
        df[EMBEDDING_COLUMN_NAME] = df[EMBEDDING_COLUMN_NAME].apply(
            lambda x: np.array(json.loads(x)) if isinstance(x, str) else x
        )
        logging.info("Embeddings converted to numpy arrays (if necessary).")
    except Exception as e:
        logging.error(f"Error converting embeddings: {e}")
        raise
    return df


def get_embedding(text: str, model: str = "text-embedding-ada-002") -> list:
    """Generates an embedding for the given text using OpenAI."""
    text = text.replace("\n", " ")
    try:
        response = openai.embeddings.create(input=[text], model=model)
        return response.data[0].embedding
    except Exception as e:
        logging.error(f"Error generating embedding: {e}")
        raise


def find_similar_background_text(
    df: pd.DataFrame,
    background_description: str,
    text_column: str,
    embedding_column: str,
    top_n: int = 5,
    show_scores: bool = True
) -> Union[str, List[str]]:
    """
    Finds the `top_n` most similar texts in `df` to the `background_description`
    using cosine similarity. If `top_n` is 1, returns a single string;
    otherwise (including the default `top_n=5`), returns a list of the top results.
    
    :param df: DataFrame containing your text and embeddings
    :param background_description: The text you want to compare to
    :param text_column: Name of the column in df that has the text
    :param embedding_column: Name of the column in df that has the embeddings
    :param top_n: How many top results to return
    :param show_scores: Whether to print the text and similarity score for each of the top results
    :return: A single string if top_n == 1, otherwise a list of strings
    """
    try:
        # Compute the embedding for the background description
        description_embedding = np.array(get_embedding(background_description)).reshape(1, -1)
        
        # Make sure we have a numpy array of all embeddings
        background_embeddings = np.vstack(df[embedding_column].to_numpy())  # shape: (num_rows, embedding_dim)

        # Compute similarity: shape => (1, num_rows)
        similarities = cosine_similarity(description_embedding, background_embeddings).flatten()
        
        # Get indices of top_n results, sorted by descending similarity
        top_indices = np.argsort(similarities)[::-1][:top_n]

        # Build the top results
        top_results = []
        for idx in top_indices:
            text_val = df.iloc[idx][text_column]
            score_val = similarities[idx]
            top_results.append(text_val)
            
            # Optionally show each text with its similarity score
            if show_scores:
                print(f"Similarity: {score_val:.4f} | Background: {background_description} | Text: {text_val}")
        
        # Return a single string if top_n = 1, else return a list
        if top_n == 1:
            return top_results[0] if top_results else ""
        else:
            return top_results
    
    except Exception as e:
        logging.error(f"Error finding similar background text: {e}")
        raise
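
The ranking step in `find_similar_background_text` can be illustrated without an API key or the pickle files. The sketch below uses only the standard library and made-up three-dimensional vectors in place of real embeddings; `cosine` and `top_n_texts` are hypothetical names, and the notebook itself uses scikit-learn's `cosine_similarity` on NumPy arrays instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_n_texts(query_vec, rows, top_n=2):
    """rows is a list of (text, embedding) pairs; returns texts by descending similarity."""
    scored = sorted(rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
    return [text for text, _ in scored[:top_n]]


# Toy data: the vectors are invented for illustration, not real embeddings.
rows = [
    ("validation governance", [1.0, 0.0, 0.0]),
    ("data quality", [0.0, 1.0, 0.0]),
    ("validation thresholds", [0.8, 0.6, 0.0]),
]

ranked = top_n_texts([1.0, 0.1, 0.0], rows)
print(ranked)
```

The query vector points almost entirely along the first axis, so the two rows with the largest first component rank highest.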

### C. Content functions

In [4]:
def generate_content_text(title: str,
                          goal: str,
                          level: int,
                          background: str,
                          references: Union[str, List[str]],
                          model: str = OPENAI_MODEL,
                          use_langchain: bool = False) -> str:
    """
    Generates LaTeX-formatted text for one entry of the outline.
    
    Parameters:
      - title: The heading of the section/subsection.
      - goal: The goal of the section/subsection.
      - level: Heading level from the outline (1 = section, 2 = subsection, 3 = sub-subsection).
      - background: The "required_background" description from the outline.
      - references: Retrieved background passages (a string or a list of strings) to guide the writing.
      - model: The OpenAI model to use.
      - use_langchain: If True, use LangChain's ChatOpenAI; otherwise, use openai.chat.completions.create.
    """
    global TOTAL_COST
    
    header = f"""
        You are a helpful AI assistant specialized in writing technical books about regulatory compliance and model validation in finance.

        Level {level} heading | Heading: {title}

        Goal of this part of the book: {goal}

        Background information to consider when writing:
        { references }
        { background }
    """
    
    instructions = f"""
        ---
        Write the content for the {title} above, keeping in mind the goal and background information.
        Under no circumstance change the title label and NEVER create any additional Latex sections, subsections or sub-subsections.
        Format the output as LaTeX, suitable for inclusion in a LaTeX document.
        Level 1 heading stands for section, level 2 heading stands for subsection, level 3 stands for sub-subsection - add it to Latex. 
        Please use standard LaTeX commands for formatting (e.g., \\textbf{{important text}}, \\textit{{emphasized text}}).
        If you need to include lists, use LaTeX list environments like \\begin{{itemize}} ... \\end{{itemize}} or \\begin{{enumerate}} ... \\end{{enumerate}}.
    """
    
    # Dedent the indented templates first: textwrap.dedent is a no-op once any
    # line in the string starts at column 0, as the lines appended below do.
    prompt = textwrap.dedent(header) + textwrap.dedent(instructions)

    # Plain (non-f) strings from here on, so LaTeX braces are written singly.
    prompt += "Avoid mathematical formulas where possible; if one is unavoidable, use inline math mode $...$ or display math mode \\begin{equation} ... \\end{equation}.\n\n"
    prompt += "----\n"
    prompt += "When writing Python code, **format the output as valid Python code enclosed in ```python code blocks.**\n"
    prompt += "Include comments to explain the code where necessary.\n"
    prompt += "Focus on clarity, correctness, and efficiency of the Python code.\n"
    prompt += "Do not include any explanations outside of the code block.\n"

    try:
        if use_langchain:
            llm = ChatOpenAI(model_name=model, temperature=1.0)
            with get_openai_callback() as cb:
                response = llm.invoke(prompt)
                content = response.content.strip()
                logging.info("LLM call for content generation completed.")
                logging.info(f"Generation token usage: {cb.total_tokens} (Prompt: {cb.prompt_tokens}, "
                             f"Completion: {cb.completion_tokens}, Cost: ${cb.total_cost:.4f})")

                # Accumulate the cost from this call
                TOTAL_COST += cb.total_cost
                
            return content
        else:
            response = openai.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "You are a helpful AI assistant specialized in LaTeX output for technical books."},
                    {"role": "user", "content": prompt}
                ],
                # temperature=0.7,
                max_completion_tokens=1700
            )
            return response.choices[0].message.content.strip()
    except Exception as e:
        logging.error(f"Error generating text for Level {level} '{title}': {e}")
        return f"**Error generating content for this Level {level}. Please check logs.**"


def create_latex_section(section_text: str) -> str:
    """Returns the section text unchanged; the model is prompted to emit the \\section header itself."""
    return f"{section_text}"


def create_latex_subsection(subsection_text: str) -> str:
    """Returns the subsection text unchanged; the model is prompted to emit the \\subsection header itself."""
    return f"{subsection_text}"


def create_latex_preamble(title: str, author: str, header_text: str = 'Do you trust your credit risk models?') -> str:
    """
    Creates the LaTeX preamble for the document, including a custom title page.
    """
    preamble = f"""
\\documentclass[12pt,a4paper]{{article}}

\\usepackage[utf8]{{inputenc}}
\\usepackage[T1]{{fontenc}}
\\usepackage{{lmodern}}
\\usepackage[margin=1in]{{geometry}}
\\usepackage{{setspace}}
\\usepackage{{titlesec}}
\\usepackage{{etoolbox}}
\\usepackage{{fancyhdr}}
\\usepackage{{graphicx}}
\\usepackage{{amsmath}}
\\usepackage{{listings}} % For code listings
\\usepackage[table]{{xcolor}}
\\usepackage{{booktabs}}
\\usepackage{{array}}
\\usepackage{{caption}}
\\usepackage{{ragged2e}}  % For \\RaggedRight in table cells

\\lstset{{
  basicstyle=\\ttfamily\\footnotesize,
  breaklines=true,
  showstringspaces=false
}}

% Ensure each \\section begins on a new page
\\preto\\section{{\\clearpage}}

% Format for section and subsection titles
\\titleformat{{\\section}}{{\\large\\bfseries}}{{\\thesection}}{{1em}}{{}}
\\titleformat{{\\subsection}}{{\\normalsize\\bfseries}}{{\\thesubsection}}{{1em}}{{}}

\\setlength{{\\parindent}}{{10pt}}
\\setlength{{\\parskip}}{{0.5\\baselineskip}}
\\setlength{{\\headheight}}{{14.5pt}}

\\pagestyle{{fancy}}
\\fancyhf{{}}
\\fancyhead[C]{{{header_text}}}
\\fancyfoot[C]{{\\thepage}}
\\renewcommand{{\\headrulewidth}}{{0pt}}

\\begin{{document}}
\\pagenumbering{{gobble}}

% --- Custom Title Page ---
\\begin{{titlepage}}
    \\begin{{center}}
        \\vspace*{{3cm}}
        
        {{\\Huge \\textbf{{{title}}}}}\\\\[0.8em]
        
        {{\\Large \\textit{{Review and application of key validation tests}}}}\\\\[2.5cm]
        
        {{\\large \\textbf{{{author}}}}}\\\\[0.5cm]
        
        \\vfill
        {{\\large \\today}}
    \\end{{center}}
    \\thispagestyle{{empty}}
\\end{{titlepage}}

\\thispagestyle{{empty}}
\\tableofcontents
\\thispagestyle{{empty}} % No page number on ToC
\\clearpage
\\pagenumbering{{arabic}}
\\setcounter{{page}}{{1}}
"""
    return preamble

def create_latex_postamble() -> str:
    """Creates the LaTeX postamble for the document."""
    return "\n\\end{document}\n"


def convert_markdown_code_blocks_to_lstlisting(text: str) -> str:
    """
    Converts Markdown code blocks (```python ... ```) into LaTeX lstlisting environments.
    This helps prevent errors from raw backticks in the LaTeX document.
    """
    pattern = re.compile(r"```python\s*(.*?)\s*```", re.DOTALL)
    def repl(match):
        code_content = match.group(1)
        return "\\begin{lstlisting}[language=Python]\n" + code_content + "\n\\end{lstlisting}"
    return pattern.sub(repl, text)


def compile_latex_to_pdf(tex_filename):
    """
    Compiles a .tex file to PDF using pdflatex (requires LaTeX installed).
    """
    try:
        # Run pdflatex twice to ensure references are updated if needed
        subprocess.run(["pdflatex", tex_filename], check=True)
        subprocess.run(["pdflatex", tex_filename], check=True)
        print("PDF successfully generated.")
    except subprocess.CalledProcessError as e:
        print(f"Error during LaTeX compilation: {e}")
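
To see what the Markdown-to-`lstlisting` conversion does, it helps to run the same regular expression on a small string. The sketch below is a standalone copy of the pattern used in `convert_markdown_code_blocks_to_lstlisting`; the name `to_lstlisting`, the `FENCE` constant and the sample text are illustrative assumptions. Note that only fences tagged `python` are matched, so an untagged fence in the model output would pass through unchanged and likely break compilation.

```python
import re

# Build the three-backtick fence programmatically to keep this example readable.
FENCE = "`" * 3

# Same pattern as in the notebook: a python-tagged fence, lazily matched.
PATTERN = re.compile(FENCE + r"python\s*(.*?)\s*" + FENCE, re.DOTALL)


def to_lstlisting(text: str) -> str:
    """Replace each python-tagged Markdown fence with an lstlisting environment."""
    return PATTERN.sub(
        # A function replacement avoids backslash-escape processing in the template.
        lambda m: "\\begin{lstlisting}[language=Python]\n" + m.group(1) + "\n\\end{lstlisting}",
        text,
    )


sample = "Intro text.\n" + FENCE + "python\nprint('hi')\n" + FENCE + "\nClosing text."
converted = to_lstlisting(sample)
print(converted)
```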

### D. Main function

In [5]:
# --- Main Function (Modified for Flat Structure) ---
def main():
    """Main function to generate the book content."""
    global TOTAL_COST  # Make sure to update this where you incur costs

    # Load the book outline
    book_outline = load_json_outline(JSON_OUTLINE_FILE)

    # Load the background DataFrames.
    try:
        validation_df = load_pandas_dataframe(VALIDATION_KB)
        general_df = load_pandas_dataframe(GENERAL_KB)
    except Exception as e:
        # Without both DataFrames nothing below can run, so log and stop here.
        logging.error(f"Could not load the knowledge bases: {e}")
        raise

    # Combine the two knowledge bases, keeping only the columns used below
    cols = ['source', TEXT_COLUMN_NAME, 'word_count', EMBEDDING_COLUMN_NAME]
    background_df = pd.concat([validation_df[cols], general_df[cols]], axis=0, ignore_index=True)

    # Filter out too short or too long paragraphs
    idx_1 = background_df['word_count'] > 10
    idx_2 = background_df['word_count'] < 1000
    background_df = background_df[idx_1 & idx_2]

    # Create the LaTeX preamble
    latex_content = create_latex_preamble(
        title="Do you trust your risk models?",
        author="Collaboration between Human and AI"
    )
    logging.info("Generating book content section by section...")

    # contents = book_outline["table_of_contents"][:6]
    contents = book_outline["table_of_contents"]

    for section_data in contents:
        level = section_data["level"]
        heading = section_data["heading"]
        goal = section_data["goal"]
        background = section_data["required_background"]
        section_number = section_data["id"]

        if level == 0:
            continue

        logging.info(f"Processing section {section_number}: {heading}")
        logging.info("Finding similar background for section intro...")

        references = find_similar_background_text(
            background_df, goal, TEXT_COLUMN_NAME, EMBEDDING_COLUMN_NAME
        )

        logging.info("Generating text with OpenAI for section intro...")

        generated_section_text = generate_content_text(
            title=heading,
            goal=goal,
            level=level,
            background=background,
            references=references,
            model=OPENAI_MODEL,
            use_langchain=True  # Adjust as needed
        )

        # Convert any Markdown code blocks to lstlisting environments
        generated_section_text = convert_markdown_code_blocks_to_lstlisting(generated_section_text)
        # Separate consecutive sections with a blank line in the .tex source
        latex_content += create_latex_section(generated_section_text) + "\n\n"

    # Import the Validation Table
    # Newlines keep \clearpage from fusing with whatever the table file starts with
    latex_content += "\n\\clearpage\n"
    table_latex = import_latex_table(TABLE_INPUT_FILE)
    latex_content += table_latex

    latex_content += create_latex_postamble()

    # Save the LaTeX output to a file
    try:
        with open(LATEX_OUTPUT_FILE, "w", encoding="utf-8") as outfile:
            outfile.write(latex_content)
        logging.info(f"Saved LaTeX output to {LATEX_OUTPUT_FILE}")
    except Exception as e:
        logging.error(f"Error saving LaTeX file: {e}")

    logging.info("Book generation complete!")
    logging.info(f"Total cost of the run: ${TOTAL_COST:.2f}")
    logging.info(f"Now you can compile '{LATEX_OUTPUT_FILE}' with LaTeX (e.g., pdflatex).")

main()

2025-02-13 07:37:43,021 - INFO - Book outline loaded from outline_v3.json
2025-02-13 07:37:43,031 - INFO - DataFrame loaded from pickle file: validation_kb.pkl
2025-02-13 07:37:43,032 - INFO - Embeddings converted to numpy arrays (if necessary).
2025-02-13 07:37:44,301 - INFO - DataFrame loaded from pickle file: regulations_with_embeddings.pkl
2025-02-13 07:37:44,305 - INFO - Embeddings converted to numpy arrays (if necessary).
2025-02-13 07:37:44,313 - INFO - Generating book content section by section...
2025-02-13 07:37:44,314 - INFO - Processing section 1: Introduction to Validation
2025-02-13 07:37:44,314 - INFO - Finding similar background for section intro...
2025-02-13 07:37:45,193 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-02-13 07:37:46,264 - INFO - Generating text with OpenAI for section intro...


Similarity: 0.8606 | Background: Highlight why validation is essential for financial institutions and lay the groundwork for subsequent validation chapters. | Text: Section 1 recalls as an introduction the specificities of the validation in the context of the prudential framework, and in particular in terms of corporate governance and structural independence from the CRCU;
Similarity: 0.8595 | Background: Highlight why validation is essential for financial institutions and lay the groundwork for subsequent validation chapters. | Text: A sound validation function is crucial to ensuring the reliability of internal models and their ability to accurately compute capital requirements. It is the responsibility of the credit institution to ensure that its internal models are fully compliant with all regulatory requirements.
Similarity: 0.8584 | Background: Highlight why validation is essential for financial institutions and lay the groundwork for subsequent validation chapters. | Text: Credit

2025-02-13 07:38:15,092 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-13 07:38:15,103 - INFO - LLM call for content generation completed.
2025-02-13 07:38:15,103 - INFO - Generation token usage: 3585 (Prompt: 703, Completion: 2882, Cost: $0.1835)
2025-02-13 07:38:15,104 - INFO - Processing section 1.1: The role of Credit Risk Models
2025-02-13 07:38:15,105 - INFO - Finding similar background for section intro...
2025-02-13 07:38:15,686 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-02-13 07:38:16,756 - INFO - Generating text with OpenAI for section intro...


Similarity: 0.8914 | Background: Introduce the main types of credit risk models (PD, LGD, EAD, ELBE) and explain their role in credit risk management. Provide a concise overview of the various credit risk model types, their typical use cases, and key differences in data and methodology. | Text: With regard to ELBE, in 29% of cases there was a dedicated model in place, while in 44% of cases ELBE was set as equal to the specific credit risk adjustments for the exposure. Of the cases where a dedicated model was in place, 62% based their expected loss estimation on the LGD performing model. Of the cases with a standalone model, the majority used empirical evidence based on internal data in the ELBE estimation.
Similarity: 0.8895 | Background: Introduce the main types of credit risk models (PD, LGD, EAD, ELBE) and explain their role in credit risk management. Provide a concise overview of the various credit risk model types, their typical use cases, and key differences in data and methodolo

2025-02-13 07:40:13,699 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-13 07:40:13,701 - INFO - LLM call for content generation completed.
2025-02-13 07:40:13,702 - INFO - Generation token usage: 4331 (Prompt: 1009, Completion: 3322, Cost: $0.2145)
2025-02-13 07:40:13,702 - INFO - Processing section 1.2: Assessment of validation tests
2025-02-13 07:40:13,703 - INFO - Finding similar background for section intro...
2025-02-13 07:40:14,456 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-02-13 07:40:15,535 - INFO - Generating text with OpenAI for section intro...


Similarity: 0.8182 | Background: Create a table with main validation tests and related thresholds. Explain ho this table should be used. | Text: Quantitative thresholds (see paragraph (<>)77(c)) should be set up for at least the following tests:
Similarity: 0.8152 | Background: Create a table with main validation tests and related thresholds. Explain ho this table should be used. | Text: In particular for tests where no thresholds are applied, a consistent qualitative assessment of the results should be performed and documented. In the event of a negative qualitative assessment, adequate measures or actions should be triggered.
Similarity: 0.8143 | Background: Create a table with main validation tests and related thresholds. Explain ho this table should be used. | Text: The content of the validation process should include quantitative analyses, which in turn should include thresholds. If such thresholds are breached, further investigation should be initiated and, if necessary, adequate

2025-02-13 07:41:41,713 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-13 07:41:41,715 - INFO - LLM call for content generation completed.
2025-02-13 07:41:41,716 - INFO - Generation token usage: 4894 (Prompt: 1129, Completion: 3765, Cost: $0.2428)
2025-02-13 07:41:41,717 - INFO - Saved LaTeX output to validation_book_20250213_073742.tex
2025-02-13 07:41:41,718 - INFO - Book generation complete!
2025-02-13 07:41:41,718 - INFO - Total cost of the run: $0.64
2025-02-13 07:41:41,719 - INFO - Now you can compile 'validation_book_20250213_073742.tex' with LaTeX (e.g., pdflatex).
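
The reported total can be cross-checked against the per-call costs in the log above. The short sketch below parses the three "Generation token usage" lines (copied from the output) and sums their costs; the variable names are illustrative.

```python
import re

# The three usage lines from the run above, without their timestamps.
log_lines = [
    "Generation token usage: 3585 (Prompt: 703, Completion: 2882, Cost: $0.1835)",
    "Generation token usage: 4331 (Prompt: 1009, Completion: 3322, Cost: $0.2145)",
    "Generation token usage: 4894 (Prompt: 1129, Completion: 3765, Cost: $0.2428)",
]

# Pull the dollar amount out of each line and add them up.
costs = [float(re.search(r"Cost: \$([0-9.]+)", line).group(1)) for line in log_lines]
total = sum(costs)

print(f"Per-call costs: {costs}")
print(f"Total: ${total:.2f}")
```

The sum is 0.1835 + 0.2145 + 0.2428 = 0.6408, which rounds to the $0.64 shown in the final log line.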


### E. Convert to pdf

In [6]:
# pdflatex -output-directory=C:/projects/generate_documents/latex_materials generated_book_v19.tex

In [7]:
def compile_latex_to_pdf(tex_filename, temp_dir):
    """
    Compiles a .tex file to PDF using pdflatex in an existing temporary directory.
    
    This function copies all tex-related files from the current directory into
    temp_dir, performs the LaTeX compilation there, and then copies the generated
    PDF back to the original directory.
    
    Parameters:
        tex_filename (str): The name of the .tex file to compile.
        temp_dir (str): Path to an existing temporary directory.
    """
    # Save the current working directory.
    current_dir = os.getcwd()

    # Ensure the temporary directory exists.
    if not os.path.isdir(temp_dir):
        raise ValueError(f"Temporary directory {temp_dir} does not exist.")

    try:
        # Define file extensions relevant to the LaTeX project.
        extensions = ['tex', 'bib', 'sty', 'cls', 'bst', 'png', 'jpg', 'jpeg', 'pdf', 'eps']
        
        # Copy each file with the specified extensions from the current directory to temp_dir.
        for ext in extensions:
            pattern = os.path.join(current_dir, f'*.{ext}')
            for filepath in glob.glob(pattern):
                shutil.copy(filepath, temp_dir)
        
        # Change the working directory to the temporary directory.
        os.chdir(temp_dir)
        
        # Run pdflatex twice to ensure proper reference resolution.
        subprocess.run(["pdflatex", tex_filename], check=True)
        subprocess.run(["pdflatex", tex_filename], check=True)
        
        # Determine the output PDF's name.
        pdf_filename = os.path.splitext(tex_filename)[0] + '.pdf'
        
        # Copy the PDF back to the original directory if it exists.
        if os.path.exists(pdf_filename):
            shutil.copy(pdf_filename, current_dir)
            print("PDF successfully generated and copied to the current directory.")
        else:
            print("PDF not found after compilation.")
    
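    except subprocess.CalledProcessError as e:
        print(f"Error during LaTeX compilation: {e}")

    finally:
        # Restore the original working directory even if compilation failed.
        os.chdir(current_dir)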

compile_latex_to_pdf(LATEX_OUTPUT_FILE, "latex_materials")


PDF successfully generated and copied to the current directory.
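
An alternative to changing the notebook's working directory is to pass `cwd=` to `subprocess.run`, so the directory switch applies only to the `pdflatex` process. The sketch below is one way this might look; `build_pdflatex_command` and `compile_in_dir` are hypothetical names, and the function assumes the `.tex` file and its assets have already been copied into `work_dir`. The `-interaction=nonstopmode` and `-halt-on-error` options stop `pdflatex` from waiting for keyboard input when the generated LaTeX contains an error.

```python
import subprocess
from pathlib import Path


def build_pdflatex_command(tex_filename: str) -> list:
    """Return the pdflatex command line for a non-interactive run."""
    return ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex_filename]


def compile_in_dir(tex_filename: str, work_dir: str, passes: int = 2) -> Path:
    """Run pdflatex inside work_dir via cwd=, leaving the caller's directory untouched.

    Assumes tex_filename already exists inside work_dir. Two passes let the
    table of contents pick up the section entries written by the first pass.
    """
    for _ in range(passes):
        subprocess.run(build_pdflatex_command(tex_filename), cwd=work_dir, check=True)
    return Path(work_dir) / (Path(tex_filename).stem + ".pdf")


# Only the command is printed here; calling compile_in_dir requires a LaTeX installation.
print(build_pdflatex_command("validation_book.tex"))
```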
