# Section 1: Setup & LangSmith

## 1.1: Install Dependencies
We need to install the libraries required for LangChain, OpenAI, vector DBs, PDF parsing, tokenization, and LangSmith (for tracing & debugging).

In [1]:
# Install core libraries for LangChain, OpenAI, and LangSmith integration
!pip install -U langchain langchain-openai langchain-community chromadb pypdf tiktoken langsmith python-dotenv


Collecting langchain-openai
  Downloading langchain_openai-0.3.32-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.29-py3-none-any.whl.metadata (2.9 kB)
Collecting chromadb
  Downloading chromadb-1.0.20-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.3 kB)
Collecting pypdf
  Downloading pypdf-6.0.0-py3-none-any.whl.metadata (7.1 kB)
Collecting langsmith
  Downloading langsmith-0.4.25-py3-none-any.whl.metadata (14 kB)
Collecting requests<3,>=2 (from langchain)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7,>=0.6.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pybase64>=1.4.1 (from chromadb)
  Downloading pybase64-1.4.2-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl.metadata (8.7 kB)
Collecting posthog<6.0.0,>=2.4.0 (from chromadb)
  Downloading postho

## 1.2: Load Keys from Colab Environment
We need to configure OpenAI (for embeddings + model) and LangSmith (for tracing & observability).
You’ll need your OpenAI API key and optionally a LangSmith API key (if you want to see traces in your LangSmith dashboard).

In [2]:
# Load API keys from Colab Secrets
import os
from google.colab import userdata

# These should already exist in Colab's Secrets panel (key icon in the left sidebar),
# with notebook access enabled for each one
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
os.environ["LANGSMITH_API_KEY"] = userdata.get("LANGSMITH_API_KEY")

# Enable LangSmith tracing (optional but useful for debugging)
os.environ["LANGSMITH_TRACING"] = "true"
# EU-region endpoint; accounts in the US region use https://api.smith.langchain.com
os.environ["LANGSMITH_ENDPOINT"] = "https://eu.api.smith.langchain.com"
os.environ["LANGSMITH_PROJECT"] = "langchain-exercise"
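
Since the LangSmith key is described as optional, one way to keep the notebook runnable without it is to switch tracing on only when a key is present. The helper below is a minimal sketch under that assumption; `enable_tracing_if_configured` is a hypothetical name, and it works on any mutable mapping so it can be tried with a plain dict before pointing it at `os.environ`.

```python
def enable_tracing_if_configured(env, api_key, project="langchain-exercise"):
    """Turn on LangSmith tracing only when an API key is available.

    `env` is any mutable mapping (os.environ in a notebook, a plain dict here).
    Returns True if tracing was enabled, False otherwise.
    """
    if not api_key:
        # No key: leave tracing off so the rest of the notebook still runs.
        env.pop("LANGSMITH_TRACING", None)
        return False
    env["LANGSMITH_API_KEY"] = api_key
    env["LANGSMITH_TRACING"] = "true"
    env["LANGSMITH_PROJECT"] = project
    return True


# Usage with a plain dict standing in for os.environ
demo_env = {}
print(enable_tracing_if_configured(demo_env, None))         # False
print(enable_tracing_if_configured(demo_env, "dummy-key"))  # True
```

In Colab you would pass `os.environ` and whatever value you read from the Secrets panel.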

# Section 2: Connect to External Sources (Upload PDF)

## 2.1: Upload PDF to Colab
This will open a file picker in Colab so you can upload your PDF.

In [3]:
# Upload a PDF directly into Colab's local storage
from google.colab import files

uploaded = files.upload()

# Show uploaded file names
list(uploaded.keys())


Saving 22365_3_Prompt Engineering_v7.pdf to 22365_3_Prompt Engineering_v7.pdf


['22365_3_Prompt Engineering_v7.pdf']

## 2.2: Load & Preview PDF
We’ll use PyPDFLoader from LangChain to extract the pages, then preview the start of the first page to make sure it worked.

In [4]:
# Load the uploaded PDF using LangChain's PyPDFLoader
# ---------------------------------------------------
# PyPDFLoader takes a PDF file and extracts text from each page.
# - Each page becomes a LangChain Document object.
# - Document.page_content -> the text of that page
# - Document.metadata -> info like page number, source file, etc.
#
# NOTE: PyPDFLoader only works on text-based PDFs.
# If a page is image-only (like a scanned document), the page_content will be empty.

from langchain_community.document_loaders import PyPDFLoader

# Use the uploaded file (from the previous cell)
pdf_path = list(uploaded.keys())[0]
loader = PyPDFLoader(pdf_path)

# Load the PDF -> returns a list of Document objects (one per page)
pages = loader.load()

print(f"✅ Total pages loaded: {len(pages)}\n")
print("Preview of first page content:\n")
print(pages[0].page_content[:500])  # Show first 500 chars for sanity check


✅ Total pages loaded: 68

Preview of first page content:

Prompt  
Engineering
Author: Lee Boonstra
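
Because image-only pages come back with empty text, it can help to check for them before chunking. The sketch below uses a hypothetical `Page` dataclass as a stand-in for a LangChain Document (it only mirrors the `page_content` and `metadata` attributes), and `find_empty_pages` is an illustrative helper, not part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def find_empty_pages(pages, min_chars=1):
    """Return the 'page' metadata value of pages with fewer than min_chars of text."""
    return [
        p.metadata.get("page")
        for p in pages
        if len(p.page_content.strip()) < min_chars
    ]


sample = [
    Page("Prompt Engineering", {"page": 0}),
    Page("   \n", {"page": 1}),  # whitespace only, e.g. a scanned page
    Page("Introduction ...", {"page": 2}),
]
print(find_empty_pages(sample))  # [1]
```

The same function should work on the `pages` list from the loader, since it only reads `page_content` and `metadata`.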


# Section 3: Processing (Chunking & Embeddings)


## 3.1: Chunk the PDF text
We’ll convert page-level Documents into smaller, overlapping chunks so retrieval works well.
Using RecursiveCharacterTextSplitter keeps sentences/paragraphs as intact as possible while respecting size.

In [5]:
# Chunk the loaded Documents into retrieval-friendly pieces
# ---------------------------------------------------------
# Why chunk? Smaller, overlapping chunks improve recall and reduce irrelevant context.
# We use RecursiveCharacterTextSplitter which tries to split on sensible boundaries
# (paragraphs, sentences, etc.) before falling back to hard character limits.

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1200,     # max characters per chunk (~1–2 paragraphs; tune by your use-case)
    chunk_overlap=200,   # overlap preserves context continuity across chunks
    separators=[ "\n\n", "\n", ". ", " ", "" ]  # try larger boundaries first
)

chunks = text_splitter.split_documents(pages)

print(f"✅ Chunks created: {len(chunks)}")
print("Preview of first chunk:\n")
print(chunks[0].page_content[:500])
print("\nMetadata example:", chunks[0].metadata)


✅ Chunks created: 107
Preview of first chunk:

Prompt  
Engineering
Author: Lee Boonstra

Metadata example: {'producer': 'Adobe PDF Library 17.0', 'creator': 'Adobe InDesign 20.2 (Macintosh)', 'creationdate': '2025-03-17T13:40:21-06:00', 'moddate': '2025-03-17T13:40:26-06:00', 'trapped': '/False', 'source': '22365_3_Prompt Engineering_v7.pdf', 'total_pages': 68, 'page': 0, 'page_label': '1'}
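
To see how `chunk_size` and `chunk_overlap` interact, here is a simplified sketch that slides a fixed-size character window over the text. It is not how RecursiveCharacterTextSplitter works internally (it ignores separators entirely); `sliding_window_chunks` is a hypothetical helper that only illustrates the overlap idea.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


demo = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_window_chunks(demo, chunk_size=10, chunk_overlap=3):
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk starts with the last three characters of the previous one, which is the continuity the overlap setting is meant to preserve.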


## 3.2: Create Embeddings with OpenAI
Embeddings are vector representations of text.
They let us search & retrieve semantically (e.g., “find parts of the PDF about Chain of Thought”).
We’ll use OpenAIEmbeddings from LangChain.

In [6]:
# Create embeddings for each chunk using OpenAI
# ---------------------------------------------------------
# OpenAIEmbeddings will take each text chunk and convert it into
# a high-dimensional vector (list of floats).
# These vectors capture semantic meaning, which makes similarity search possible.

from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

# Embed a sample query as a sanity check
# (the chunks themselves are embedded automatically when we add them to a vector store later)
sample_vector = embedding_model.embed_query("What is prompt engineering?")

print(f"✅ Embedding vector length: {len(sample_vector)}")
print(f"First 10 dimensions: {sample_vector[:10]}")


✅ Embedding vector length: 1536
First 10 dimensions: [0.015054242685437202, -0.009864309802651405, -0.031565792858600616, 0.014215604402124882, -0.0163878146559, 0.036240167915821075, 0.0117753054946661, 0.050648245960474014, -0.015177976340055466, -0.007375891786068678]
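
Semantic search compares vectors like the one above by direction rather than by exact values. A common measure is cosine similarity; the sketch below computes it in plain Python on made-up 3-dimensional vectors (the real embeddings here have 1536 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration
query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 1.0, 0.0]

print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
```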


# Section 4: Store in a Vector DB (Chroma)

## 4.1: Create & Populate Chroma DB
We’ll store all chunk embeddings inside Chroma.
That way, later we can do semantic search like: “Explain Chain of Thought prompting” → and Chroma finds the most relevant chunks from the PDF.

In [7]:
# Store embeddings in a Chroma vector database
# ---------------------------------------------------------
# We create a Chroma DB and fill it with our chunked documents.
# Each chunk gets embedded using OpenAI and stored with metadata.
# Later, we can query this DB for relevant chunks (retrieval).

from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=chunks,              # our chunked documents
    embedding=embedding_model,     # OpenAI embeddings
    persist_directory=None         # use in-memory DB (no files saved)
)

# Test the vector DB with a semantic search
query = "What is Chain of Thought prompting?"
results = vectorstore.similarity_search(query, k=2)

print("🔍 Query:", query)
print("\nTop 2 retrieved chunks:\n")
for i, doc in enumerate(results, 1):
    print(f"--- Chunk {i} ---")
    print(doc.page_content[:300], "...\n")


🔍 Query: What is Chain of Thought prompting?

Top 2 retrieved chunks:

--- Chunk 1 ---
Prompt Engineering
February 2025
29
Chain of Thought (CoT)
Chain of Thought (CoT) 9 prompting is a technique for improving the reasoning capabilities 
of LLMs by generating intermediate reasoning steps. This helps the LLM generate more 
accurate answers. You can combine it with few-shot prompting to ...

--- Chunk 2 ---
Chain of thought can be useful for various use-cases. Think of code generation, for breaking 
down the request into a few steps, and mapping those to specific lines of code. Or for 
creating synthetic data when you have some kind of seed like “The product is called XYZ, 
write a description guiding  ...
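
Conceptually, the `k=2` search above ranks stored chunks by how close their vectors are to the query vector and keeps the best two. The sketch below does this by brute force over a tiny in-memory list; real vector stores typically rely on an index rather than a full scan, and the store contents, vectors, and `top_k` helper here are all invented for illustration.

```python
import heapq


def top_k(query_vec, store, k=2):
    """Return the texts of the k entries whose vectors score highest against query_vec.

    store: list of (text, vector) pairs. Vectors are assumed to be unit-length,
    so the dot product used here ranks the same way cosine similarity would.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    best = heapq.nlargest(k, store, key=lambda item: dot(query_vec, item[1]))
    return [text for text, _ in best]


# Toy 2-dimensional store (made-up vectors, all unit length)
store = [
    ("chunk about CoT", [1.0, 0.0]),
    ("chunk about ReAct", [0.0, 1.0]),
    ("chunk about few-shot + CoT", [0.6, 0.8]),
]

print(top_k([1.0, 0.0], store, k=2))
# ['chunk about CoT', 'chunk about few-shot + CoT']
```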



# Section 5: Set up the Model (OpenAI via LangChain)

## 5.1: Initialize the Chat Model

We’ll use ChatOpenAI from langchain_openai.
Set sensible defaults (low temperature for factual answers; adjust later if you want more creativity).

In [8]:
# Initialize the OpenAI chat model for generation
# ------------------------------------------------
# ChatOpenAI is the LangChain wrapper around OpenAI chat models.
# - temperature: lower = more deterministic, higher = more creative
# - model: pick a cost-effective model for RAG-style Q&A

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",   # good balance of quality/cost; change if you prefer
    temperature=0.2,       # we want precise answers over creativity for RAG
    max_tokens=800,        # cap response length; tune per your needs
)

# quick sanity call (no RAG yet) to ensure the model is reachable
resp = llm.invoke("Respond with the single word: ready")
print("Model check:", resp.content)


Model check: ready
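
For intuition on why a lower temperature gives more deterministic answers, here is the textbook softmax-with-temperature formulation applied to made-up token scores. How the OpenAI API applies temperature internally is not shown in this notebook, so treat this as an illustrative sketch only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpening or flattening by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

At the lowest temperature nearly all probability mass lands on the top-scoring token; at higher temperatures the alternatives become more likely to be sampled.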


# Section 6: Prompt Template & Instructions

## 6.1: Define Prompt with Message Objects (no memory)

We’ll build the system + human messages explicitly, then assemble the template.
This prompt expects two inputs later: {context} (from Chroma) and {question} (your query).

In [9]:
# Define a chat prompt using templated message classes
# ----------------------------------------------------
# Why templated message classes instead of plain message objects?
# - SystemMessage/HumanMessage are "static" messages -> variables aren't tracked.
# - *PromptTemplate* message classes register input variables so chains can validate them.
#
# Inputs the chain expects:
#   - {context}: filled by the retriever (StuffDocumentsChain default variable name)
#   - {question}: your user query

from langchain.prompts import ChatPromptTemplate
from langchain.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate

system_tmpl = (
    "You are an expert tutor on prompt engineering. "
    "Answer using ONLY the provided context from the PDF. "
    "If the answer isn't in the context, say you don't know. "
    "Be concise and, when possible, cite the source page numbers."
)

human_tmpl = (
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_tmpl),
    HumanMessagePromptTemplate.from_template(human_tmpl),
])

# Sanity check: should list the registered variables
print("✅ Prompt input variables:", prompt.input_variables)


✅ Prompt input variables: ['context', 'question']
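
The variable list above comes from scanning the template for `{placeholder}` names. A rough equivalent using only the standard library is sketched below; LangChain's own implementation may differ, and `template_variables` is a hypothetical helper.

```python
from string import Formatter


def template_variables(template):
    """Return the sorted, de-duplicated {placeholder} names in a format string."""
    return sorted(
        {name for _, name, _, _ in Formatter().parse(template) if name}
    )


human_tmpl = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"

print(template_variables(human_tmpl))  # ['context', 'question']

# Filling the placeholders produces the text the model would receive
print(human_tmpl.format(context="(retrieved chunks)", question="What is CoT?"))
```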


# Section 7: Get the Response (RAG Pipeline)

## 7.1: Build Retrieval Chain (Vector DB → Prompt → LLM)

We’ll connect:

Retriever → pulls relevant chunks from Chroma.

Prompt → injects chunks + user’s question.

LLM → generates the final answer.

In [10]:
# Build a retrieval-based QA chain
# ------------------------------------------------------------
# RetrievalQA glues together:
#   - retriever: pulls top-k chunks from the vector DB
#   - prompt: template for system + human messages
#   - llm: the OpenAI chat model
#
# IMPORTANT about variables:
# - {context}: special variable automatically filled with retrieved chunks
#              from the retriever (StuffDocumentsChain default).
# - {question}: special variable filled with the user’s query.
#
# -> These MUST exist in your prompt (unless you rename them).
#    If your template expects anything else (like {page}), it will error.
#
# chain_type:
# - "stuff"      -> dump all chunks into {context} at once (simple, best for small k).
# - "map_reduce" -> answer per chunk, then summarize (better for large docs).
# - "refine"     -> iteratively refine an answer as chunks are processed (good for progressive detail).
#
# Note: RetrievalQA accepts input as {"query": "..."}.
# Behind the scenes, "query" is mapped to {question} for the prompt.

from langchain.chains import RetrievalQA

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",   # try "map_reduce" or "refine" later
    chain_type_kwargs={"prompt": prompt}
)

# Example query
query = "Explain the concept of Chain of Thought prompting."
result = qa_chain.invoke({"query": query})

print("🔍 Query:", query)
# result is a dict with "query" and "result" keys; use result["result"] for just the answer text
print("\n💡 Answer:\n", result)


🔍 Query: Explain the concept of Chain of Thought prompting.

💡 Answer:
 {'query': 'Explain the concept of Chain of Thought prompting.', 'result': 'Chain of Thought (CoT) prompting is a technique that enhances the reasoning capabilities of large language models (LLMs) by generating intermediate reasoning steps. This approach helps the LLM produce more accurate answers and can be combined with few-shot prompting for better results on complex tasks that require reasoning. CoT is low-effort and effective, working well with off-the-shelf LLMs without the need for fine-tuning. It provides interpretability, allowing users to learn from the LLM’s responses and identify any malfunctions. Additionally, CoT improves robustness across different LLM versions, leading to more consistent performance. It is particularly useful for tasks that can be solved by "talking through" the problem, such as code generation or creating synthetic data (source page 29).'}
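
The "stuff" strategy amounts to joining the retrieved chunks into one string and dropping it into `{context}`. The sketch below shows that idea with plain dicts as hypothetical stand-ins for Documents. Prefixing each chunk with its page label is an assumption of this sketch, not something the chain above is configured to do.

```python
def stuff_context(docs, separator="\n\n"):
    """Join chunk texts into one context string, prefixing each with its page label."""
    parts = []
    for doc in docs:
        label = doc["metadata"].get("page_label", "?")
        parts.append(f"[page {label}]\n{doc['page_content']}")
    return separator.join(parts)


def build_prompt(docs, question):
    """Fill a context/question template with the stuffed chunks."""
    return f"Context:\n{stuff_context(docs)}\n\nQuestion: {question}\n\nAnswer:"


# Hypothetical retrieved chunks
retrieved = [
    {"page_content": "Chain of Thought (CoT) prompting is ...", "metadata": {"page_label": "29"}},
    {"page_content": "Chain of thought can be useful for ...", "metadata": {"page_label": "30"}},
]

print(build_prompt(retrieved, "Explain Chain of Thought prompting."))
```

Because every chunk lands in a single prompt, this approach only works while `k` chunks of `chunk_size` characters fit in the model's context window.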


# Section 7B: Conversation History (Memory)

## 7B.1: Adjust the Prompt for Conversational History

Right now your prompt expects {context} and {question}.
With conversation, we need one extra input: {chat_history} (the previous turns).

In [11]:
# Prompt with a {chat_history} slot for conversation history
# --------------------------------------------------------
# How history reaches the prompt:
# - chat_history is injected into the human message as a plain string
# - ConversationalRetrievalChain formats the stored turns into that string
#   before filling the prompt, so no MessagesPlaceholder is needed here

from langchain.prompts import ChatPromptTemplate
from langchain.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate

system_tmpl = (
    "You are an expert tutor on prompt engineering. "
    "Use ONLY the provided context from the PDF + chat history. "
    "If the answer isn't in the context, say you don't know. "
    "Be concise and, when possible, cite page numbers."
)

human_tmpl = (
    "Chat history:\n{chat_history}\n\n"   # <-- plain string injection
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_tmpl),
    HumanMessagePromptTemplate.from_template(human_tmpl),
])

print("✅ Prompt variables:", prompt.input_variables)


✅ Prompt variables: ['chat_history', 'context', 'question']
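
Since `{chat_history}` is filled with text, the earlier turns have to be flattened into a string first. The sketch below shows one plausible way to do that; the `Human:` / `Assistant:` prefixes and the `format_chat_history` helper are assumptions for illustration, not a copy of LangChain's code.

```python
def format_chat_history(turns):
    """Flatten (role, text) turns into one string, one line per turn.

    role must be "human" or "ai"; the prefixes are illustrative choices.
    """
    prefixes = {"human": "Human: ", "ai": "Assistant: "}
    return "\n".join(prefixes[role] + text for role, text in turns)


history = [
    ("human", "List the main prompting techniques covered in the PDF."),
    ("ai", "Zero-shot, few-shot, system, role, CoT, ..."),
]

print(format_chat_history(history))
```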


## 7B.2: Set up Conversation Buffer Memory

In [12]:
# ConversationBufferMemory setup
# ------------------------------------------------------------
# - Stores a running list of HumanMessage/AIMessage objects
# - memory_key="chat_history" matches the {chat_history} variable in the prompt
# - return_messages=True keeps history as role-tagged messages in memory;
#   the chain formats them into text when it fills the prompt
# - output_key="answer" aligns with the chain’s output field

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",  # must match the {chat_history} prompt variable
    return_messages=True,       # keep role-based messages
    output_key="answer"         # capture assistant answers into memory
)

print("✅ ConversationBufferMemory ready.")


✅ ConversationBufferMemory ready.




## 7B.3: Build the Conversational Retrieval Chain

In [13]:
# ConversationalRetrievalChain with memory
# ------------------------------------------------------------
# This chain is like RetrievalQA but with chat history awareness.
# - Follow-up questions are first rewritten into standalone questions using the history
# - Retriever fetches top-k chunks from the vector DB for that standalone question
# - Memory fills the {chat_history} slot in your prompt
# - Prompt ensures the model uses both context + conversation
#
# Key params:
# - search_kwargs={"k": 3} : top 3 chunks retrieved each turn
# - combine_docs_chain_kwargs={"prompt": prompt} : injects your custom system/human template

from langchain.chains import ConversationalRetrievalChain

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

conv_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,   # our ConversationBufferMemory
    combine_docs_chain_kwargs={"prompt": prompt}
)

print("✅ ConversationalRetrievalChain ready.")


✅ ConversationalRetrievalChain ready.
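
The control flow of a conversational retrieval turn can be sketched without any model calls: rewrite the follow-up into a standalone question when there is history, retrieve with it, answer, then record the turn. Everything below is a hypothetical stand-in (the three callables are stubs), meant only to show the order of steps rather than LangChain's actual implementation.

```python
def conversational_turn(question, history, condense, retrieve, answer):
    """Run one turn. history is a list of (question, answer) tuples, updated in place."""
    # Only rewrite the question when there is earlier conversation to draw on
    standalone = condense(question, history) if history else question
    docs = retrieve(standalone)
    reply = answer(question=standalone, context="\n\n".join(docs))
    history.append((question, reply))
    return reply


# Stub components, invented for illustration
def fake_condense(question, history):
    return f"{question} (follow-up to: {history[-1][0]})"


def fake_retrieve(query):
    return [f"chunk matching '{query}'"]


def fake_answer(question, context):
    return f"answer to '{question}'"


history = []
print(conversational_turn("List the techniques.", history, fake_condense, fake_retrieve, fake_answer))
print(conversational_turn("And what about CoT?", history, fake_condense, fake_retrieve, fake_answer))
```

The second call shows why a short follow-up can still retrieve relevant chunks: the retriever sees the rewritten question, not the bare follow-up.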


## 7B.4: Ask a question + a follow-up

In [14]:
# Demo: conversational Q&A with memory
# ------------------------------------------------------------
# First we ask a full question.
# Then we ask a follow-up that ONLY makes sense if the model sees chat_history.

# Turn 1
q1 = "List the main prompting techniques covered in the PDF."
a1 = conv_chain.invoke({"question": q1})["answer"]

# Turn 2 (follow-up relies on history)
q2 = "And what about CoT?"
a2 = conv_chain.invoke({"question": q2})["answer"]

print("Q1:", q1, "\nA1:\n", a1, "\n")
print("Q2:", q2, "\nA2:\n", a2, "\n")


Q1: List the main prompting techniques covered in the PDF. 
A1:
 The main prompting techniques covered in the PDF are:

1. General prompting / zero shot
2. One-shot & few-shot prompting
3. System prompting
4. Contextual prompting
5. Role prompting
6. Step-back prompting
7. Chain of Thought (CoT)
8. Self-consistency
9. Tree of Thoughts (ToT)
10. ReAct (reason & act)
11. Automatic Prompt Engineering
12. Code prompting (including prompts for writing, explaining, translating, debugging, and reviewing code) 

These techniques are discussed in various sections of the document, particularly from pages 13 to 48. 

Q2: And what about CoT? 
A2:
 Chain of Thought (CoT) prompting is a technique for improving the reasoning capabilities of large language models (LLMs) by generating intermediate reasoning steps. This approach helps the LLM produce more accurate answers by breaking down the problem into smaller, manageable steps. CoT can be particularly effective when combined with few-shot prompting 