 # PDF to Text Converter with OpenAI Embeddings



 This notebook extracts text from PDF files (including large ones) and saves it to a text file. It can optionally generate embeddings for the content with OpenAI's API to enable semantic search or analysis.



 ## Requirements

 You'll need to install these packages first:

 ```

 pip install PyPDF2 pymupdf pdf2image pytesseract pillow openai tqdm tiktoken numpy python-dotenv

 ```



 Note: The `pdf2image` package requires Poppler to be installed on your system. For OCR functionality (handling scanned PDFs), you'll also need to install Tesseract OCR:

 - Windows: https://github.com/UB-Mannheim/tesseract/wiki

 - Mac: `brew install tesseract`

 - Linux: `sudo apt install tesseract-ocr`
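
Before running the OCR cells, it can help to confirm that the Tesseract binary is reachable. The snippet below is a minimal sketch that only checks the system `PATH`; the helper name `find_tesseract` is an illustrative assumption, not part of the notebook. On Windows the installer may not add Tesseract to `PATH`, in which case the check reports it as missing even though it is installed.

```python
import shutil


def find_tesseract():
    """Return the path of the tesseract executable found on PATH, or None."""
    return shutil.which("tesseract")


tesseract_path = find_tesseract()
if tesseract_path is None:
    print("Tesseract was not found on PATH; OCR calls will fail until it is installed or configured.")
else:
    print(f"Tesseract found at {tesseract_path}")
```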

 ## Import Libraries

In [1]:
import os
import re
import time
import json
import glob
from pathlib import Path
import PyPDF2
import openai
from tqdm.notebook import tqdm  # Using notebook version for better display in Jupyter
import numpy as np
import tiktoken
from dotenv import load_dotenv
import fitz  # PyMuPDF
import io
from PIL import Image
import pytesseract
from pdf2image import convert_from_path

# Load environment variables from .env file in the current directory
load_dotenv()

# Get API key from environment variable
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")

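# Set the Tesseract executable path only if this Windows default location exists;
# on macOS/Linux pytesseract looks up `tesseract` on the PATH instead
TESSERACT_PATH = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
if os.path.exists(TESSERACT_PATH):
    pytesseract.pytesseract.tesseract_cmd = TESSERACT_PATH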

 ## Constants

In [2]:
# Constants
EMBEDDING_MODEL = "text-embedding-3-small"  # OpenAI embedding model to use
MAX_TOKENS = 8191  # Maximum tokens per chunk for embedding
CHUNK_OVERLAP = 200  # Number of tokens to overlap between chunks
DPI = 300  # DPI for OCR conversion (higher is better quality but slower)


 ## Helper Functions for PDF Extraction

In [3]:
def extract_text_from_pdf(pdf_path, use_ocr=False, ocr_threshold=10):
    """
    Extract text from a PDF file
    
    Args:
        pdf_path: Path to the PDF file
        use_ocr: Whether to use OCR for text extraction (for scanned PDFs)
        ocr_threshold: Character count threshold below which to try OCR (per page)
    
    Returns:
        Extracted text as a string
    """
    total_text = []
    
    # Try extracting text with PyMuPDF (faster and better than PyPDF2 for most PDFs)
    try:
        doc = fitz.open(pdf_path)
        total_pages = len(doc)
        
        print(f"Extracting text from {total_pages} pages...")
        for page_num in tqdm(range(total_pages)):
            page = doc.load_page(page_num)
            text = page.get_text()
            
            # Check if page has little text and might need OCR
            if use_ocr and len(text.strip()) < ocr_threshold:
                print(f"Page {page_num+1} might be scanned. Attempting OCR...")
                # Convert page to image
                pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))
                img = Image.open(io.BytesIO(pix.tobytes()))
                # Apply OCR
                text = pytesseract.image_to_string(img)
            
            total_text.append(text)
        
        doc.close()
        
    except Exception as e:
        print(f"Error with PyMuPDF: {e}")
        print("Falling back to PyPDF2...")
        # Discard any pages already collected so the fallback does not duplicate them
        total_text = []
        
        # Fallback to PyPDF2
        try:
            with open(pdf_path, 'rb') as file:
                reader = PyPDF2.PdfReader(file)
                total_pages = len(reader.pages)
                
                for page_num in tqdm(range(total_pages)):
                    page = reader.pages[page_num]
                    text = page.extract_text() or ""
                    
                    # If text extraction failed or returned minimal text, try OCR
                    if use_ocr and len(text.strip()) < ocr_threshold:
                        print(f"Page {page_num+1} might be scanned. Attempting OCR...")
                        # Convert PDF page to image using pdf2image
                        images = convert_from_path(pdf_path, dpi=DPI, first_page=page_num+1, last_page=page_num+1)
                        # Apply OCR to the image
                        text = pytesseract.image_to_string(images[0])
                    
                    total_text.append(text)
                    
        except Exception as e:
            print(f"Error with PyPDF2: {e}")
            # Discard partial results before the last-resort pass over the whole file
            total_text = []
            # If everything fails, try full OCR if enabled
            if use_ocr:
                print("Attempting full document OCR...")
                try:
                    images = convert_from_path(pdf_path, dpi=DPI)
                    for i, img in enumerate(tqdm(images)):
                        text = pytesseract.image_to_string(img)
                        total_text.append(text)
                except Exception as e:
                    print(f"OCR failed: {e}")
    
    # Join all text with double newlines between pages
    return "\n\n".join(total_text)
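
The per-page OCR decision in `extract_text_from_pdf` comes down to a character count. The sketch below isolates that check in a hypothetical helper, `needs_ocr`, so its behaviour is easy to see on sample strings. The strings are made up for illustration.

```python
def needs_ocr(page_text, ocr_threshold=10):
    """Mirror the notebook's check: flag a page whose stripped text is shorter than the threshold."""
    return len(page_text.strip()) < ocr_threshold


sample_pages = [
    "",                                            # blank page
    "  \n 12 \n",                                  # page holding only a page number
    "Copyright © 2012 Columbia University Press",  # page with a real text layer
]
flags = [needs_ocr(page) for page in sample_pages]
print(flags)
```

Note that blank pages and pages containing only a page number fall below the default threshold, so they are sent to OCR even when the PDF has a proper text layer. That may account for some of the scattered "might be scanned" messages in a run.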


In [4]:
def clean_text(text):
    """Clean the extracted text"""
    # Replace multiple newlines with a single one
    text = re.sub(r'\n+', '\n', text)
    # Replace multiple spaces with a single one
    text = re.sub(r' +', ' ', text)
    # Fix any broken words that might have been split across lines
    text = re.sub(r'(\w+)-\n(\w+)', r'\1\2', text)
    return text.strip()
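
To see what the three substitutions do in combination, here is a small self-contained run. The function body is copied from the cell above so the snippet works on its own; the sample string is invented.

```python
import re


def clean_text(text):
    """Same three steps as the notebook's clean_text."""
    text = re.sub(r'\n+', '\n', text)               # collapse runs of newlines
    text = re.sub(r' +', ' ', text)                 # collapse runs of spaces
    text = re.sub(r'(\w+)-\n(\w+)', r'\1\2', text)  # rejoin words hyphenated across a line break
    return text.strip()


raw = "Inten-\ntionality   in\n\n\nBuddhist  philosophy\n"
cleaned = clean_text(raw)
print(repr(cleaned))
```

Because every run of newlines is collapsed to one, paragraph breaks are not preserved in the cleaned text. A hyphen followed by a space before the line break is also left untouched by the third pattern.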


 ## Helper Functions for Embeddings

In [5]:
def get_token_count(text, encoding_name="cl100k_base"):
    """Count the number of tokens in a text string"""
    encoding = tiktoken.get_encoding(encoding_name)
    tokens = encoding.encode(text)
    return len(tokens)


In [6]:
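def split_into_chunks(text, max_tokens=MAX_TOKENS, overlap=CHUNK_OVERLAP):
    """Split text into chunks respecting token limits with overlap"""
    # Guard against a step of zero or less, which would loop forever
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    
    chunks = []
    i = 0
    while i < len(tokens):
        # Get chunk of tokens (respecting max_tokens)
        chunk_tokens = tokens[i:i + max_tokens]
        # Decode chunk back to text
        chunk = encoding.decode(chunk_tokens)
        chunks.append(chunk)
        # Stop once this chunk reaches the end, so the tail is not emitted a second time
        if i + max_tokens >= len(tokens):
            break
        # Move forward by max_tokens - overlap
        i += max_tokens - overlap
    
    return chunks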
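
Since the loop starts a new chunk every `max_tokens - overlap` tokens, the number of chunks for a document can be estimated from its token count alone. The helper below is a rough sketch of that arithmetic; `estimate_chunk_count` is a hypothetical name, and the defaults repeat the notebook's constants.

```python
import math


def estimate_chunk_count(total_tokens, max_tokens=8191, overlap=200):
    """Estimate how many chunks a sliding window with the given overlap produces."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    if total_tokens <= 0:
        return 0
    step = max_tokens - overlap  # tokens advanced per chunk
    return math.ceil(total_tokens / step)


# Token counts taken from the run output further down in this notebook
for total in (19298, 78329, 224528, 347335):
    print(total, "tokens ->", estimate_chunk_count(total), "chunks")
```

For these four token counts the estimate gives 3, 10, 29 and 44 chunks, which agrees with the "Split into N chunks" lines in the run output.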


In [7]:
def create_embeddings(chunks, api_key=None):
    """Create embeddings for text chunks using OpenAI API"""
    # Use provided API key or environment variable
    if api_key:
        openai.api_key = api_key
    else:
        openai.api_key = OPENAI_API_KEY
        
    if not openai.api_key:
        print("Error: No OpenAI API key provided. Set OPENAI_API_KEY in your .env file.")
        return []
        
    embeddings = []
    
    print(f"Creating embeddings for {len(chunks)} chunks...")
    for i, chunk in enumerate(tqdm(chunks)):
        try:
            # Add a small delay to respect API rate limits
            if i > 0 and i % 10 == 0:
                time.sleep(1)
                
            response = openai.embeddings.create(
                model=EMBEDDING_MODEL,
                input=chunk
            )
            embedding = response.data[0].embedding
            embeddings.append({
                "chunk": chunk,
                "embedding": embedding,
                "chunk_index": i
            })
        except Exception as e:
            print(f"Error creating embedding for chunk {i}: {e}")
    
    return embeddings
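
`create_embeddings` pauses for a second every ten requests and skips a chunk when a request raises. One possible alternative, sketched here under assumptions, is to retry a failed request with exponential backoff. The helper `call_with_retries` and the stand-in `flaky_request` are illustrative names; no real API call is made.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); after a failure wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)


# Stand-in for an API request that fails twice before succeeding
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


recorded_delays = []  # capture the waits instead of actually sleeping
result = call_with_retries(flaky_request, sleep=recorded_delays.append)
print(result, recorded_delays)
```

Inside `create_embeddings`, the request could then be wrapped as `call_with_retries(lambda: openai.embeddings.create(model=EMBEDDING_MODEL, input=chunk))`, keeping the existing `try`/`except` as the final safety net.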


 ## Semantic Search Function

In [8]:
def search_embeddings(query, embeddings, api_key=None, top_n=5):
    """Search embeddings for relevant text chunks based on a query"""
    # Get embedding for the query
    if api_key:
        openai.api_key = api_key
    else:
        # Use environment variable if no API key is provided
        openai.api_key = OPENAI_API_KEY
    
    if not openai.api_key:
        print("Error: No OpenAI API key provided. Set OPENAI_API_KEY in your .env file.")
        return []
    
    try:
        response = openai.embeddings.create(
            model=EMBEDDING_MODEL,
            input=query
        )
        query_embedding = response.data[0].embedding
        
        # Convert query embedding to numpy array
        query_embedding_array = np.array(query_embedding)
        
        # Calculate similarity scores
        similarities = []
        for i, item in enumerate(embeddings):
            embed_array = np.array(item["embedding"])
            # Cosine similarity
            similarity = np.dot(query_embedding_array, embed_array) / (
                np.linalg.norm(query_embedding_array) * np.linalg.norm(embed_array)
            )
            similarities.append((i, similarity))
        
        # Sort by similarity (highest first)
        similarities.sort(key=lambda x: x[1], reverse=True)
        
        # Return top N results
        results = []
        for idx, score in similarities[:top_n]:
            results.append({
                "chunk": embeddings[idx]["chunk"],
                "similarity": float(score),
                "chunk_index": embeddings[idx]["chunk_index"]
            })
        
        return results
    
    except Exception as e:
        print(f"Error during search: {e}")
        return []
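
The ranking step in `search_embeddings` can be tried without an API key by using small hand-written vectors. The sketch below uses plain Python instead of NumPy so that it runs anywhere; the three-dimensional "embeddings" are toy values, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy records in the same shape that create_embeddings produces
toy_embeddings = [
    {"chunk": "A", "embedding": [1.0, 0.0, 0.0], "chunk_index": 0},
    {"chunk": "B", "embedding": [0.6, 0.8, 0.0], "chunk_index": 1},
    {"chunk": "C", "embedding": [0.0, 0.0, 1.0], "chunk_index": 2},
]
query_vector = [1.0, 1.0, 0.0]

ranked = sorted(
    toy_embeddings,
    key=lambda item: cosine_similarity(query_vector, item["embedding"]),
    reverse=True,
)
top_chunks = [item["chunk_index"] for item in ranked[:2]]
print(top_chunks)
```

Chunk 1 ranks first because its vector points most nearly in the same direction as the query, and chunk 2 scores zero because it is orthogonal to the query.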


 ## Interactive Notebook Version

In [9]:
# Interactive part - you can edit these parameters
pdf_pattern = "*.pdf"  # Glob pattern to match PDF files (e.g., "*.pdf", "documents/*.pdf")
output_dir = "."  # Directory to save output files
generate_embeddings = True  # Set to True if you want to generate embeddings
use_ocr = True  # Set to True for scanned PDFs that need OCR
batch_process = True  # Set to True to process multiple PDFs matching the pattern
single_file = "your_document.pdf"  # Path to a single PDF file if not batch processing

# Use API key from environment variable by default
# You can override it here if needed
openai_api_key = OPENAI_API_KEY

# Check if API key is available when embeddings are requested
if generate_embeddings and not openai_api_key:
    print("Warning: No OpenAI API key found in environment variables.")
    print("Please add OPENAI_API_KEY to your .env file or set it here manually.")


 ## Process PDF Files

In [10]:
# Create output directory if it doesn't exist
output_dir = Path(output_dir)
output_dir.mkdir(exist_ok=True, parents=True)

# Function to process a single PDF file
def process_pdf(pdf_file):
    print(f"\nProcessing: {pdf_file}")
    
    # Get filename without extension
    filename = Path(pdf_file).stem
    
    # Extract text from PDF
    print(f"Extracting text from {pdf_file}...")
    text = extract_text_from_pdf(pdf_file, use_ocr=use_ocr)
    
    # Clean the text
    if text:
        print("Cleaning extracted text...")
        text = clean_text(text)
        
        # Save text content
        text_path = output_dir / f"{filename}.txt"
        with open(text_path, "w", encoding="utf-8") as f:
            f.write(text)
        print(f"Text saved to {text_path}")
        
        # Display a sample of the text
        print("\nSample of extracted text:")
        print(text[:500] + "..." if len(text) > 500 else text)
        print("\nTotal characters:", len(text))
        token_count = get_token_count(text)
        print(f"Total tokens: {token_count}")
        
        # Create embeddings if requested
        if generate_embeddings:
            if not openai_api_key:
                print("Error: OpenAI API key is required for creating embeddings")
                print("Please set OPENAI_API_KEY in your .env file or provide it in the parameters cell.")
            else:
                # Split text into chunks
                chunks = split_into_chunks(text)
                print(f"Split into {len(chunks)} chunks")
                
                # Generate embeddings
                embeddings = create_embeddings(chunks, openai_api_key)
                
                # Save embeddings
                embeddings_path = output_dir / f"{filename}_embeddings.json"
                with open(embeddings_path, "w", encoding="utf-8") as f:
                    json.dump(embeddings, f, ensure_ascii=False, indent=2)
                print(f"Embeddings saved to {embeddings_path}")
                
                # Save a version with just the text chunks for reference
                chunks_path = output_dir / f"{filename}_chunks.json"
                chunks_data = [{"chunk_index": i, "chunk": chunk} for i, chunk in enumerate(chunks)]
                with open(chunks_path, "w", encoding="utf-8") as f:
                    json.dump(chunks_data, f, ensure_ascii=False, indent=2)
                print(f"Text chunks saved to {chunks_path}")
        
        return True
    else:
        print(f"Failed to extract text from {pdf_file}")
        return False
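
`process_pdf` derives all of its output names from the PDF's file stem. The sketch below reproduces only that naming logic in a hypothetical helper, `output_paths`, to show which files to expect for a given input. The `documents/` and `out` directories are example values.

```python
from pathlib import Path


def output_paths(pdf_file, output_dir="."):
    """Return the three files process_pdf writes for one PDF (embedding files only when enabled)."""
    stem = Path(pdf_file).stem
    out = Path(output_dir)
    return {
        "text": out / f"{stem}.txt",
        "embeddings": out / f"{stem}_embeddings.json",
        "chunks": out / f"{stem}_chunks.json",
    }


paths = output_paths("documents/Peter-Woods-MA-Thesis-2022.pdf", "out")
for kind, path in paths.items():
    print(f"{kind}: {path}")
```

Because only the stem is used, two PDFs with the same file name in different directories would write to the same output files.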


In [11]:
# Process files based on settings
if batch_process:
    # Find all PDF files matching the pattern ("**" only recurses with recursive=True)
    pdf_files = glob.glob(pdf_pattern, recursive=True)
    
    if not pdf_files:
        print(f"No PDF files found matching pattern: {pdf_pattern}")
    else:
        print(f"Found {len(pdf_files)} PDF files matching pattern: {pdf_pattern}")
        
        # Process each PDF file
        results = []
        for pdf_file in pdf_files:
            success = process_pdf(pdf_file)
            results.append((pdf_file, success))
        
        # Summary
        print("\n===== Processing Summary =====")
        print(f"Total files: {len(results)}")
        successful = sum(1 for _, success in results if success)
        print(f"Successfully processed: {successful}")
        print(f"Failed: {len(results) - successful}")
        
        if len(results) - successful > 0:
            print("\nFailed files:")
            for pdf_file, success in results:
                if not success:
                    print(f"- {pdf_file}")
else:
    # Process a single file
    process_pdf(single_file)


Found 7 PDF files matching pattern: *.pdf

Processing: Apoha-Buddhist Nominalism and Human Cognition.pdf
Extracting text from Apoha-Buddhist Nominalism and Human Cognition.pdf...
Extracting text from 341 pages...


  0%|          | 0/341 [00:00<?, ?it/s]

Page 1 might be scanned. Attempting OCR...
Page 2 might be scanned. Attempting OCR...
Page 3 might be scanned. Attempting OCR...
Page 4 might be scanned. Attempting OCR...
Page 5 might be scanned. Attempting OCR...
Page 6 might be scanned. Attempting OCR...
Page 7 might be scanned. Attempting OCR...
Page 8 might be scanned. Attempting OCR...
Page 9 might be scanned. Attempting OCR...
Page 10 might be scanned. Attempting OCR...
Page 11 might be scanned. Attempting OCR...
Page 12 might be scanned. Attempting OCR...
Page 13 might be scanned. Attempting OCR...
Page 14 might be scanned. Attempting OCR...
Page 15 might be scanned. Attempting OCR...
Page 16 might be scanned. Attempting OCR...
Page 17 might be scanned. Attempting OCR...
Page 18 might be scanned. Attempting OCR...
Page 19 might be scanned. Attempting OCR...
Page 20 might be scanned. Attempting OCR...
Page 21 might be scanned. Attempting OCR...
Page 22 might be scanned. Attempting OCR...
Page 23 might be scanned. Attempting OCR.

  0%|          | 0/28 [00:00<?, ?it/s]

Embeddings saved to Apoha-Buddhist Nominalism and Human Cognition_embeddings.json
Text chunks saved to Apoha-Buddhist Nominalism and Human Cognition_chunks.json

Processing: Dan Arnold - Brains, Buddhas, and Believing - The Problem of Intentionality in Classical Buddhist and Cognitive-Scientific Philo.pdf
Extracting text from Dan Arnold - Brains, Buddhas, and Believing - The Problem of Intentionality in Classical Buddhist and Cognitive-Scientific Philo.pdf...
Extracting text from 316 pages...


  0%|          | 0/316 [00:00<?, ?it/s]

Page 1 might be scanned. Attempting OCR...
Page 316 might be scanned. Attempting OCR...
Cleaning extracted text...
Text saved to Dan Arnold - Brains, Buddhas, and Believing - The Problem of Intentionality in Classical Buddhist and Cognitive-Scientific Philo.txt

Sample of extracted text:
THE PROBLEM OF INTENTIONALITY IN CLASSICAL BUDDHIST AND
COGNITIVE-SCIENTIFIC PHILOSOPHY OF MIND
Dan Arnold 
Brains, Buddhas, and Believing 
THE PROBLEM OF INTENTIONALITY 
IN CLASSICAL BUDDHIST AND COGNITIVE­
SCIENTIFIC PHILOSOPHY OF MIND 
columbia University Press 
New¥ork 
COLUMBIA UNIVERSITY PRESS 
Publishers Since 1893 
New York Chichester, West Sussex 
cup.columbia.edu 
Copyright © 2012 Columbia University Press 
All rights reserved 
Library of Congress cataloging-in-Publication Data 
Arnold...

Total characters: 873513
Total tokens: 224528
Split into 29 chunks
Creating embeddings for 29 chunks...


  0%|          | 0/29 [00:00<?, ?it/s]

Embeddings saved to Dan Arnold - Brains, Buddhas, and Believing - The Problem of Intentionality in Classical Buddhist and Cognitive-Scientific Philo_embeddings.json
Text chunks saved to Dan Arnold - Brains, Buddhas, and Believing - The Problem of Intentionality in Classical Buddhist and Cognitive-Scientific Philo_chunks.json

Processing: Dreyfus_RecognizingReality.pdf
Extracting text from Dreyfus_RecognizingReality.pdf...
Extracting text from 643 pages...


  0%|          | 0/643 [00:00<?, ?it/s]

Page 9 might be scanned. Attempting OCR...
Page 13 might be scanned. Attempting OCR...
Page 21 might be scanned. Attempting OCR...
Page 35 might be scanned. Attempting OCR...
Page 63 might be scanned. Attempting OCR...
Page 65 might be scanned. Attempting OCR...
Page 67 might be scanned. Attempting OCR...
Page 147 might be scanned. Attempting OCR...
Page 225 might be scanned. Attempting OCR...
Page 303 might be scanned. Attempting OCR...
Page 305 might be scanned. Attempting OCR...
Page 349 might be scanned. Attempting OCR...
Page 351 might be scanned. Attempting OCR...
Page 583 might be scanned. Attempting OCR...
Page 595 might be scanned. Attempting OCR...
Page 623 might be scanned. Attempting OCR...
Page 631 might be scanned. Attempting OCR...
Cleaning extracted text...
Text saved to Dreyfus_RecognizingReality.txt

Sample of extracted text:
SUNY Series in Buddhist Studies 
Matthew Kapstein, editor 
· RECtJCiNIZING REALITY 
Dharmakirti s Philosophy 
and Its Tibetan Interpretations 
G

  0%|          | 0/63 [00:00<?, ?it/s]

Embeddings saved to Dreyfus_RecognizingReality_embeddings.json
Text chunks saved to Dreyfus_RecognizingReality_chunks.json

Processing: Dunne_Foundations FDP smaller file.pdf
Extracting text from Dunne_Foundations FDP smaller file.pdf...
Extracting text from 245 pages...


  0%|          | 0/245 [00:00<?, ?it/s]

Cleaning extracted text...
Text saved to Dunne_Foundations FDP smaller file.txt

Sample of extracted text:
FOUNDATIONS OF DHARMAKÊRTI’S PHILOSOPHY
Studies in Indian and Tibetan Buddhism
This series was conceived to provide a forum for publishing outstanding new contributions to scholarship on
Indian and Tibetan Buddhism and also to make accessible seminal research not widely known outside a narrow
specialist audience, including translations of appropriate
monographs and collections of articles from other languages. The series strives to shed light on the Indic Buddhist traditions by exposing them to ...

Total characters: 1279735
Total tokens: 347335
Split into 44 chunks
Creating embeddings for 44 chunks...


  0%|          | 0/44 [00:00<?, ?it/s]

Embeddings saved to Dunne_Foundations FDP smaller file_embeddings.json
Text chunks saved to Dunne_Foundations FDP smaller file_chunks.json

Processing: Dunne_J_Key_Features_of_Dharmakirtis_Apoha_Theory.pdf
Extracting text from Dunne_J_Key_Features_of_Dharmakirtis_Apoha_Theory.pdf...
Extracting text from 25 pages...


  0%|          | 0/25 [00:00<?, ?it/s]

Cleaning extracted text...
Text saved to Dunne_J_Key_Features_of_Dharmakirtis_Apoha_Theory.txt

Sample of extracted text:
T
he apoha theory contains a number of occasionally technical and 
even counterintuitive elements, and the main purpose of this chapter is to present its most fundamental features in a straightforward 
fashion. At the outset it is critical to note that, while certainly uniﬁ ed in its 
overall scope, the apoha theory underwent historical development that led 
to divergent interpretations among its formulators, and any single, uniﬁ ed 
account of the theory would be problematic. Hence, this chapte...

Total characters: 71538
Total tokens: 19298
Split into 3 chunks
Creating embeddings for 3 chunks...


  0%|          | 0/3 [00:00<?, ?it/s]

Embeddings saved to Dunne_J_Key_Features_of_Dharmakirtis_Apoha_Theory_embeddings.json
Text chunks saved to Dunne_J_Key_Features_of_Dharmakirtis_Apoha_Theory_chunks.json

Processing: Jake H. Davis, Owen Flanagan - A mirror is for reflection _ understanding Buddhist ethics-Oxford University Press (2017).pdf
Extracting text from Jake H. Davis, Owen Flanagan - A mirror is for reflection _ understanding Buddhist ethics-Oxford University Press (2017).pdf...
Extracting text from 393 pages...


  0%|          | 0/393 [00:00<?, ?it/s]

Page 1 might be scanned. Attempting OCR...
Page 3 might be scanned. Attempting OCR...
Page 7 might be scanned. Attempting OCR...
Page 11 might be scanned. Attempting OCR...
Page 19 might be scanned. Attempting OCR...
Page 21 might be scanned. Attempting OCR...
Page 27 might be scanned. Attempting OCR...
Page 29 might be scanned. Attempting OCR...
Page 43 might be scanned. Attempting OCR...
Page 45 might be scanned. Attempting OCR...
Page 99 might be scanned. Attempting OCR...
Page 101 might be scanned. Attempting OCR...
Page 159 might be scanned. Attempting OCR...
Page 213 might be scanned. Attempting OCR...
Page 265 might be scanned. Attempting OCR...
Page 267 might be scanned. Attempting OCR...
Page 325 might be scanned. Attempting OCR...
Page 381 might be scanned. Attempting OCR...
Page 388 might be scanned. Attempting OCR...
Page 389 might be scanned. Attempting OCR...
Page 390 might be scanned. Attempting OCR...
Page 391 might be scanned. Attempting OCR...
Page 392 might be scanne

  0%|          | 0/29 [00:00<?, ?it/s]

Embeddings saved to Jake H. Davis, Owen Flanagan - A mirror is for reflection _ understanding Buddhist ethics-Oxford University Press (2017)_embeddings.json
Text chunks saved to Jake H. Davis, Owen Flanagan - A mirror is for reflection _ understanding Buddhist ethics-Oxford University Press (2017)_chunks.json

Processing: Peter-Woods-MA-Thesis-2022.pdf
Extracting text from Peter-Woods-MA-Thesis-2022.pdf...
Extracting text from 121 pages...


  0%|          | 0/121 [00:00<?, ?it/s]

Cleaning extracted text...
Text saved to Peter-Woods-MA-Thesis-2022.txt

Sample of extracted text:
Echoes of Awakening: 
Reimagining Liberation in the New Treasures of Chokgyur Lingpa 
 
 
 
 
 
Peter F. Woods 
Kathmandu University, Centre for Buddhist Studies at Rangjung Yeshe Institute 
Master of Arts (MA) in Buddhist Studies 
September 30, 2021 
 
 
 
 
 
 
2 
Table of Contents 
 
 
 
 
1. Introducing the Lotus Essence Tantra 
1.1 Introduction 
1.2 Outline 
1.3 Literature Review 
1.4 Methodology 
2. A Brief History of Liberating Scriptures (and Other Materials) in Indian and Tibetan 
Buddh...

Total characters: 291176
Total tokens: 78329
Split into 10 chunks
Creating embeddings for 10 chunks...


  0%|          | 0/10 [00:00<?, ?it/s]

Embeddings saved to Peter-Woods-MA-Thesis-2022_embeddings.json
Text chunks saved to Peter-Woods-MA-Thesis-2022_chunks.json

===== Processing Summary =====
Total files: 7
Successfully processed: 7
Failed: 0


 ## Example: Using the Semantic Search Feature



 After generating embeddings, you can use this code to search within your PDF content.

 Uncomment and modify this code when you're ready to search your embeddings.

In [12]:
# Example code for semantic search - uncomment and modify when needed
# Load embeddings from file
# embedding_file = "your_document_embeddings.json"  # Replace with your actual embeddings file
# with open(embedding_file, "r", encoding="utf-8") as f:
#     embeddings = json.load(f)

# Search for relevant content - uses API key from .env by default
# query = "Enter your search query here"  # Replace with your actual search query
# results = search_embeddings(query, embeddings, top_n=3)

# Display results
# print(f"Search results for: {query}\n")
# for i, result in enumerate(results):
#     print(f"Result {i+1} (Similarity: {result['similarity']:.4f}):")
#     print("-" * 40)
#     print(result["chunk"][:300] + "...")  # Show first 300 chars
#     print()


 ## Examples: Batch Processing Patterns



 Here are examples of different glob patterns you can use for batch processing.

 Uncomment and modify these examples when needed.

In [13]:
# Example glob patterns for different scenarios - uncomment and modify when needed

# Process all PDFs in the current directory
pdf_pattern = "*.pdf"

# Process PDFs with specific naming pattern
# pdf_pattern = "report_*.pdf"

# Process PDFs in a specific directory
# pdf_pattern = "documents/*.pdf"

# Process PDFs in a specific directory and subdirectories (recursive)
# pdf_pattern = "documents/**/*.pdf"  # Note: "**" only recurses when glob.glob() is called with recursive=True

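# Process PDFs from multiple directories
# (build the list yourself and call process_pdf on each file; the batch cell builds its own list from pdf_pattern)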
# pdf_files = []
# pdf_files.extend(glob.glob("reports/*.pdf"))
# pdf_files.extend(glob.glob("archives/*.pdf"))
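
The effect of `recursive=True` on a `**` pattern is easy to check in a throwaway directory. This sketch creates a hypothetical layout (`documents/a.pdf` and `documents/sub/b.pdf`) with empty files and compares the two calls.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: documents/a.pdf and documents/sub/b.pdf (empty placeholder files)
    nested = os.path.join(root, "documents", "sub")
    os.makedirs(nested)
    for path in (os.path.join(root, "documents", "a.pdf"), os.path.join(nested, "b.pdf")):
        open(path, "w").close()

    pattern = os.path.join(root, "documents", "**", "*.pdf")
    flat = sorted(os.path.basename(p) for p in glob.glob(pattern))
    deep = sorted(os.path.basename(p) for p in glob.glob(pattern, recursive=True))

print("without recursive=True:", flat)
print("with recursive=True:   ", deep)
```

Without the flag, `**` behaves like a single `*` and matches exactly one directory level, so the file directly inside `documents/` is missed.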


 ## Creating a .env File



 Create a file named `.env` in the same directory as this notebook with the following content:



 ```

 OPENAI_API_KEY=your_api_key_here

 ```



 This file is loaded automatically when you run the notebook, so the API key is available without being hardcoded.
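
To confirm that the key was picked up without printing it in full, a small check like the following can be run after the environment is loaded. This is a sketch; `describe_api_key` is a hypothetical helper and reads only the process environment.

```python
import os


def describe_api_key(key):
    """Summarise a key without revealing it: report length and last four characters only."""
    if not key:
        return "missing"
    return f"set ({len(key)} characters, ends with ...{key[-4:]})"


print("OPENAI_API_KEY:", describe_api_key(os.getenv("OPENAI_API_KEY", "")))
```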