In [None]:
print('Setup complete.')

# LangChain Essentials - Lab

**Hands-on**: Build a "Knowledge Snippet Agent" that answers one query, grounded in retrieved documents, with citations.
**Deliverable**: One grounded question-and-answer pair with source citations.

## Instructions

In this lab, you will build a Knowledge Snippet Agent using LangChain components. The agent should:

1. Load and process documents
2. Create embeddings and a vector store
3. Build a retrieval system
4. Answer a question with proper citations

## Success Criteria
- Agent retrieves relevant document chunks
- Answer is grounded in the retrieved content
- Citations are provided for source documents
- Output includes confidence/relevance metrics

In [None]:
# TODO: Install required packages
# Install langchain, langchain-openai, langchain-community, faiss-cpu, tiktoken
# Use pip install command

In [None]:
# TODO: Import necessary modules
# Import document loaders, text splitters, embeddings, vector stores, chains
# Import OpenAI components for LLM and embeddings
# Set up your OpenAI API key (prefer the OPENAI_API_KEY environment variable; avoid hard-coding the key in the notebook)

## Step 1: Prepare Documents

Create or load documents that will serve as your knowledge base.
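
As a rough illustration of the metadata you will need for citations later, the sketch below builds two documents that each carry a `source` and a `doc_id`. The texts, file names, and the `doc_id` key are made up for the example. The fallback `Document` dataclass is a hypothetical stand-in that only mirrors the `page_content` and `metadata` fields, so the snippet also runs where `langchain-core` is not installed.

```python
from dataclasses import dataclass, field

try:
    # Real class, available when langchain-core is installed.
    from langchain_core.documents import Document
except ImportError:
    # Hypothetical stand-in with the same two fields, so the sketch runs anywhere.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

# Illustrative texts and source names; replace them with your own knowledge base.
raw_texts = [
    ("FAISS is a library for efficient similarity search over dense vectors.", "notes/faiss.txt"),
    ("A retriever returns the documents most relevant to a query.", "notes/retrievers.txt"),
]

# Attach the source (and a simple numeric ID) to every document for later citation.
documents = [
    Document(page_content=text, metadata={"source": source, "doc_id": i})
    for i, (text, source) in enumerate(raw_texts)
]

print(f"Loaded {len(documents)} documents")
print(documents[0].page_content, documents[0].metadata)
```

Keeping the source in `metadata` from the start means every chunk derived from a document can still be traced back to it.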

In [None]:
# TODO: Create sample documents or load from files
# Create 5-10 documents about a topic of your choice
# Convert them to LangChain Document objects with proper metadata
# Include source information in metadata for citation purposes
# Print the number of documents loaded and preview the first document

## Step 2: Split Documents

Split your documents into appropriate chunks for processing.
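
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width sliding window in plain Python. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally: that class first tries to break on separators such as paragraphs, lines, and words. The function name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 40, chunk_overlap: int = 10) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

text = "LangChain splits long documents into smaller chunks so each one fits in the prompt."
chunks = split_with_overlap(text, chunk_size=40, chunk_overlap=10)
for i, chunk in enumerate(chunks):
    print(i, len(chunk), repr(chunk))
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary is still seen whole by at least one chunk.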

In [None]:
# TODO: Initialize a RecursiveCharacterTextSplitter
# Set appropriate chunk_size (300-500 characters recommended)
# Set chunk_overlap (20-50 characters recommended)
# Split your documents and print statistics about the splitting process
# Show an example of the original content next to the chunks produced from it

## Step 3: Create Embeddings and Vector Store

Convert your document chunks into embeddings and store them in a vector database.
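
The idea behind this step can be sketched without any API calls. The example below swaps in a toy bag-of-words "embedding" and a plain list as the "vector store"; both are stand-ins chosen for illustration, not what OpenAI embeddings or FAISS actually do. What carries over is the shape of the workflow: embed every chunk once, embed the query, and rank chunks by vector similarity.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

chunks = [
    "FAISS stores vectors and finds nearest neighbours quickly.",
    "Text splitters cut documents into overlapping chunks.",
    "Temperature controls how random the model output is.",
]

# The "vector store": each chunk kept alongside its vector.
index = [(chunk, embed(chunk)) for chunk in chunks]

query = "How are documents cut into chunks?"
query_vector = embed(query)
scored = sorted(
    ((cosine_similarity(query_vector, vector), chunk) for chunk, vector in index),
    reverse=True,
)
best_score, best_chunk = scored[0]
print(f"{best_score:.2f}  {best_chunk}")
```

A learned embedding would also match paraphrases that share no words with the query, which this word-count version cannot do.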

In [None]:
# TODO: Initialize OpenAI embeddings
# Create a FAISS vector store from your split documents
# Print confirmation of vector store creation with number of vectors
# Test similarity search with a sample query

## Step 4: Build the Retriever

Create a retriever that will find the most relevant documents for a query.
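
Conceptually, a retriever scores every stored chunk against the query and keeps the best `k`. The sketch below shows that top-k selection with a crude word-overlap score standing in for embedding similarity. The names `retrieve` and `overlap_score`, the dictionary layout, and the sample chunks are all hypothetical.

```python
import heapq
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query: str, text: str) -> float:
    """Jaccard overlap of word sets; a crude stand-in for embedding similarity."""
    q, t = tokenize(query), tokenize(text)
    return len(q & t) / len(q | t) if q | t else 0.0

def retrieve(query: str, chunks: list[dict], k: int = 3) -> list[tuple[float, dict]]:
    """Return the k highest-scoring chunks as (score, chunk) pairs, best first."""
    scored = ((overlap_score(query, c["text"]), c) for c in chunks)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

chunks = [
    {"text": "A retriever returns the top matching chunks for a query.", "source": "retrievers.md"},
    {"text": "Embeddings map text to vectors.", "source": "embeddings.md"},
    {"text": "The query is embedded, then compared with every stored chunk.", "source": "search.md"},
    {"text": "Citations point back to the source of each chunk.", "source": "citations.md"},
]

results = retrieve("Which chunks does a retriever return for a query?", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk['source']}")
```

A small `k` keeps the prompt short but may miss a relevant chunk. A large `k` costs more tokens and can pull in off-topic text.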

In [None]:
# TODO: Create a retriever from your vector store
# Configure it to return top 3-5 most similar documents
# Test the retriever with a sample query
# Display retrieved documents with their sources (and similarity scores, if your retrieval method returns them)

## Step 5: Create the Knowledge Snippet Agent

Build a RetrievalQA chain that combines retrieval with answer generation.
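
With `chain_type="stuff"`, all retrieved chunks are placed ("stuffed") into a single prompt together with the question. The sketch below illustrates that idea in plain Python. The prompt wording, the helper name `build_stuff_prompt`, and the sample chunks are assumptions made for this example, not the template LangChain uses.

```python
def build_stuff_prompt(question: str, docs: list[dict]) -> str:
    """Concatenate ("stuff") every retrieved chunk into one prompt, numbered for citation."""
    context = "\n\n".join(
        f"[{i}] (source: {doc['source']})\n{doc['text']}"
        for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite the bracketed numbers of the passages you used. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved chunks; in the lab these come from your retriever.
retrieved = [
    {"text": "A retriever returns the top matching chunks for a query.", "source": "retrievers.md"},
    {"text": "The query is embedded, then compared with every stored chunk.", "source": "search.md"},
]

prompt = build_stuff_prompt("What does a retriever do?", retrieved)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined chunks fit within the model's context window.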

In [None]:
# TODO: Initialize ChatOpenAI model with temperature=0 for consistency
# Create a RetrievalQA chain using chain_type="stuff"
# Enable return_source_documents=True for citations
# Set verbose=True to see the chain's internal processing

## Step 6: Test Your Knowledge Snippet Agent

Ask your agent a question and analyze the grounded response with citations.
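
When `return_source_documents=True` is set, a `RetrievalQA` chain returns a dictionary with the keys `query`, `result`, and `source_documents`. The sketch below mocks such a result with made-up content, so no API call is needed, and shows one way to read it. `SimpleNamespace` stands in for LangChain `Document` objects, and `unique_sources` is a hypothetical helper.

```python
from types import SimpleNamespace

# Mocked result in the shape described above; all contents are invented.
result = {
    "query": "What does a retriever do?",
    "result": "A retriever returns the chunks most relevant to a query.",
    "source_documents": [
        SimpleNamespace(
            page_content="A retriever returns the top matching chunks for a query.",
            metadata={"source": "retrievers.md", "doc_id": 0},
        ),
        SimpleNamespace(
            page_content="The query is embedded, then compared with every stored chunk.",
            metadata={"source": "search.md", "doc_id": 2},
        ),
        SimpleNamespace(
            page_content="Retrievers can be built on top of a vector store.",
            metadata={"source": "retrievers.md", "doc_id": 0},
        ),
    ],
}

def unique_sources(source_documents) -> list[str]:
    """List each source once, in retrieval order."""
    seen = []
    for doc in source_documents:
        source = doc.metadata.get("source", "unknown")
        if source not in seen:
            seen.append(source)
    return seen

print("Answer:", result["result"])
for doc in result["source_documents"]:
    print(f"- {doc.metadata['source']}: {doc.page_content[:60]}")
print("Distinct sources:", unique_sources(result["source_documents"]))
```

Several chunks often come from the same document, so deduplicating sources avoids citing one file repeatedly.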

In [None]:
# TODO: Define a specific question related to your documents
# Run the question through your RetrievalQA chain
# Display the answer clearly
# Show all source documents that were used
# Include metadata about each source (document ID, relevance, etc.)

## Step 7: Format the Final Output

Create a properly formatted response with citations.
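
One possible layout for the deliverable is sketched below. The function name `format_qa`, the field names, and the scores are illustrative, and the sketch assumes a score where higher means more relevant. If your scores are distances, convert or relabel them before printing.

```python
def format_qa(question: str, answer: str, sources: list[dict]) -> str:
    """Render question, answer, numbered citations, and simple confidence indicators."""
    lines = [f"Question: {question}", f"Answer: {answer}", "", "Citations:"]
    for i, src in enumerate(sources, start=1):
        lines.append(f"  [{i}] {src['source']} (score: {src['score']:.2f})")
    if sources:
        mean_score = sum(s["score"] for s in sources) / len(sources)
        lines.append(f"Sources used: {len(sources)}, mean score: {mean_score:.2f}")
    else:
        lines.append("  (none)")
        lines.append("Sources used: 0")
    return "\n".join(lines)

# Made-up values for demonstration only.
report = format_qa(
    question="What does a retriever do?",
    answer="A retriever returns the chunks most relevant to a query [1].",
    sources=[
        {"source": "retrievers.md", "score": 0.8},
        {"source": "search.md", "score": 0.6},
    ],
)
print(report)
```

Reporting both the number of sources and an aggregate score gives the reader a quick sense of how well supported the answer is.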

In [None]:
# TODO: Create a function to format the Q&A output
# Include:
# - The original question
# - The agent's answer
# - Numbered citations with source information
# - Confidence indicators (number of sources, relevance scores)
# Display the formatted output for your deliverable

## Bonus Challenges (Optional)

If you finish early, try these additional tasks:

In [None]:
# TODO BONUS 1: Implement a custom prompt template
# Create a prompt that explicitly requests citations in a specific format
# Test how this changes the output quality

In [None]:
# TODO BONUS 2: Add relevance scoring
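# Use similarity_search_with_score to get a score for each document
# Note: with FAISS's default settings the score is a distance, so lower means more similar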
# Filter out documents below a certain relevance threshold
# Display confidence metrics based on retrieval scores

In [None]:
# TODO BONUS 3: Handle multiple questions
# Create a list of 3-5 questions
# Process them all and compare the quality of answers
# Identify which types of questions work best with your agent

## Deliverable Checklist

Before submitting, ensure your Knowledge Snippet Agent:

- [ ] Successfully processes and chunks documents
- [ ] Creates embeddings and stores them in a vector database
- [ ] Retrieves relevant documents for queries
- [ ] Generates grounded answers based on retrieved content
- [ ] Provides clear citations with source information
- [ ] Includes confidence/relevance indicators
- [ ] Formats output professionally for presentation

**Final Deliverable**: A working Knowledge Snippet Agent that answers one grounded query with proper citations and source attribution.