# 📚 Chapter 7: Using LangChain and LlamaIndex with Hugging Face

---

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand Large Language Models (LLMs)** - Learn what LLMs are and their key characteristics
2. **Build LangChain Applications** - Create LLM applications using prompt templates and chains
3. **Implement Conversational AI** - Maintain context and history in LLM conversations
4. **Connect LLMs to Private Data** - Use LlamaIndex for Retrieval-Augmented Generation (RAG)
5. **Build Interactive Interfaces** - Create web frontends using Gradio

---

## 📖 Table of Contents

1. [Introduction to Large Language Models (LLMs)](#1-introduction-to-large-language-models)
2. [Getting Started with LangChain](#2-getting-started-with-langchain)
3. [Building Conversational Chains](#3-building-conversational-chains)
4. [Using Different LLM Providers](#4-using-different-llm-providers)
5. [Introduction to LlamaIndex and RAG](#5-introduction-to-llamaindex-and-rag)
6. [Building a Document Q&A System](#6-building-a-document-qa-system)
7. [Creating Interactive Interfaces with Gradio](#7-creating-interactive-interfaces-with-gradio)
8. [Summary and Best Practices](#8-summary-and-best-practices)

---

## 🔧 Environment Setup

First, let's install all the required packages for this tutorial.

In [1]:
# # Install required packages
# !pip install -q langchain langchain-huggingface langchain-core langchain-community
# !pip install -q llama-index llama-index-core llama-index-embeddings-huggingface llama-index-llms-huggingface-api llama-index-llms-huggingface llama-index-readers-file
# !pip install -q transformers torch gradio huggingface_hub accelerate bitsandbytes

In [25]:
# Set up API tokens (Replace with your actual tokens)
import os

# You can get your Hugging Face token at: https://huggingface.co/settings/tokens
os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'hf_---------------------'

# Optional: For OpenAI integration (get key at: https://platform.openai.com/api-keys)
# os.environ['OPENAI_API_KEY'] = 'your_openai_api_key_here'
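Hard-coding a real token in a notebook cell makes it easy to leak when the notebook is shared. One safer pattern, sketched below, keeps whatever token is already in the environment and otherwise prompts for it without echoing it to the screen. The helper name `ensure_env_token` is hypothetical, not part of any library.

```python
import os
from getpass import getpass


def ensure_env_token(var_name: str = "HUGGINGFACEHUB_API_TOKEN") -> str:
    """Return the token stored in `var_name`, prompting (without echo) only if it is unset."""
    token = os.environ.get(var_name)
    if not token:
        token = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = token  # later cells read the token from the environment
    return token


# Usage: call this instead of assigning the token literal in the cell above.
# ensure_env_token()
```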

---

## 1. Introduction to Large Language Models (LLMs) <a id="1-introduction-to-large-language-models"></a>

### 🧠 What is an LLM?

A **Large Language Model (LLM)** is a type of AI model designed to understand and generate human-like text based on patterns learned from massive amounts of training data.

### Key Characteristics:

| Feature | Description |
|---------|-------------|
| **Size & Scale** | Trained on billions to trillions of tokens from books, articles, websites, and other text sources |
| **Pretraining** | Learns statistical relationships between words and sentences, typically by predicting the next token |
| **Fine-tuning** | Specialized for specific tasks like classification, translation, etc. |

### 📊 Scale Comparison of Popular LLMs:

| Model | Parameters | Training Tokens |
|-------|------------|----------------|
| GPT-3 (Davinci) | 175 billion | 300 billion (sampled from a ~499B-token dataset) |
| Llama 3.1 | 8B - 405B | 15 trillion |
| Mistral 7B | 7 billion | Not publicly disclosed |

### 🔤 Understanding Tokenization

LLMs don't process text character by character. Instead, they use **tokens** - chunks of text that are processed as single units.

**Example:** The word "artificial" might be split into:
- `art` + `ificial` (2 tokens)

This is called **subword tokenization** (e.g., Byte-Pair Encoding or BPE).
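To see where subword pieces come from, here is a minimal sketch of the BPE training loop: start from single characters, count adjacent symbol pairs across a word-frequency table, and repeatedly merge the most frequent pair into a new symbol. The toy corpus is hypothetical, and real tokenizers such as GPT-2's work on bytes and add details (pre-tokenization rules, special tokens) that are omitted here.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} table."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(corpus, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

Frequent fragments such as `est` become single tokens, while rare words stay split into several pieces, which is why a long word like the one in the next cell breaks into many tokens.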

In [3]:
# Example: Visualizing how tokenization works
from transformers import AutoTokenizer

# Load a tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Example sentences
texts = [
    "Artificial Intelligence is transforming the world.",
    "The quick brown fox jumps over the lazy dog.",
    "Supercalifragilisticexpialadocious"  # A very long word!
]

print("🔍 Tokenization Examples:\n")
for text in texts:
    tokens = tokenizer.tokenize(text)
    print(f"Text: '{text}'")
    print(f"Tokens ({len(tokens)}): {tokens}")
    print("-" * 60)

🔍 Tokenization Examples:

Text: 'Artificial Intelligence is transforming the world.'
Tokens (8): ['Art', 'ificial', 'ĠIntelligence', 'Ġis', 'Ġtransforming', 'Ġthe', 'Ġworld', '.']
------------------------------------------------------------
Text: 'The quick brown fox jumps over the lazy dog.'
Tokens (10): ['The', 'Ġquick', 'Ġbrown', 'Ġfox', 'Ġjumps', 'Ġover', 'Ġthe', 'Ġlazy', 'Ġdog', '.']
------------------------------------------------------------
Text: 'Supercalifragilisticexpialadocious'
Tokens (11): ['Super', 'cal', 'if', 'rag', 'il', 'ist', 'ice', 'xp', 'ial', 'ad', 'ocious']
------------------------------------------------------------
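The `Ġ` at the start of many tokens marks a leading space: GPT-2's byte-level BPE first maps every byte to a printable Unicode character, and the space byte lands on `Ġ` (U+0120), so `ĠIntelligence` stands for " Intelligence". The function below follows the `bytes_to_unicode` helper published with OpenAI's GPT-2 encoder; it is reproduced here for illustration rather than imported from `transformers`.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map each of the 256 byte values to a printable Unicode character (GPT-2 style)."""
    # Bytes that are already printable keep their own character.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Remaining bytes (control characters, space, ...) are moved above 255, in order.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, map(chr, code_points)))


table = bytes_to_unicode()
print(repr(table[ord(" ")]))   # 'Ġ'
print(repr(table[ord("\n")]))  # 'Ċ'
```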


---

## 2. Getting Started with LangChain <a id="2-getting-started-with-langchain"></a>

### 🔗 What is LangChain?

**LangChain** is a framework designed to simplify the creation of applications using LLMs. Think of it as a way to "chain" together different components:

```
┌─────────────────┐    ┌─────────────┐    ┌──────────────────┐
│ Prompt Template │ -> │     LLM     │ -> │  Output Parser   │
└─────────────────┘    └─────────────┘    └──────────────────┘
```

### Key Components:

- **Prompt Templates** - Structure instructions for LLMs
- **LLMs** - The language models (GPT, Llama, Zephyr, etc.)
- **Chains** - Connect components together
- **Memory** - Maintain conversation context
- **Agents** - Let the LLM decide which tools or actions to use based on user input

### 📝 Creating Prompt Templates

A **Prompt Template** structures the instruction given to the model. It's like a fill-in-the-blank form that gets completed with user input.
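Conceptually, a prompt template is a format string plus bookkeeping about which placeholders must be supplied. The hypothetical `SimplePromptTemplate` below sketches that idea with Python's `string.Formatter`; it is not LangChain's implementation, although `PromptTemplate.from_template(...)` similarly infers `input_variables` from the `{...}` placeholders.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical minimal prompt template (illustration only, not LangChain's class)."""

    def __init__(self, template: str):
        self.template = template
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
        # field_name is None for trailing literal text.
        self.input_variables = sorted(
            {field for _, field, _, _ in Formatter().parse(template) if field}
        )

    def format(self, **values: str) -> str:
        missing = [name for name in self.input_variables if name not in values]
        if missing:
            raise KeyError(f"Missing prompt variables: {missing}")
        return self.template.format(**values)


demo_template = SimplePromptTemplate("Question: {question}\n\nAnswer concisely:")
print(demo_template.input_variables)  # ['question']
print(demo_template.format(question="What causes the Northern Lights?"))
```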

In [4]:
from langchain_core.prompts import PromptTemplate

# Simple Q&A template
qa_template = '''
You are a helpful expert assistant.

Question: {question}

Please provide a clear and concise answer:
'''

# Create the prompt template
qa_prompt = PromptTemplate(
    template=qa_template,
    input_variables=['question']  # These variables will be filled in
)

# Preview the prompt template
print("📋 Prompt Template Created:")
print(qa_prompt)

📋 Prompt Template Created:
input_variables=['question'] input_types={} partial_variables={} template='\nYou are a helpful expert assistant.\n\nQuestion: {question}\n\nPlease provide a clear and concise answer:\n'


In [5]:
# Let's see how the template gets filled
sample_question = "What causes the Northern Lights?"
filled_prompt = qa_prompt.format(question=sample_question)

print("✨ Filled Prompt:")
print(filled_prompt)

✨ Filled Prompt:

You are a helpful expert assistant.

Question: What causes the Northern Lights?

Please provide a clear and concise answer:



### 🤖 Connecting to Hugging Face LLMs

Now let's connect our prompt template to an actual LLM hosted on Hugging Face.

In [6]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import os

# Initialize with stop sequences and max tokens limit
llm_endpoint = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-alpha",
    task="conversational",
    max_new_tokens=256,  # Limits response length
    stop_sequences=["User:", "Human:", "\n\nUser", "\n\nHuman", "[/INST]"],
    temperature=0.7,
    huggingfacehub_api_token=os.environ.get('HUGGINGFACEHUB_API_TOKEN')
)

# Wrap in ChatHuggingFace for proper chat formatting
llm = ChatHuggingFace(llm=llm_endpoint)

print("✅ LLM initialized with stop sequences!")

✅ LLM initialized with stop sequences!
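In the cell above, `max_new_tokens` caps the length of the reply, while `stop_sequences` ask the backend to end generation as soon as one of the listed strings appears, which stops the model from writing the next "User:" turn itself. The helper below is a hypothetical client-side equivalent, useful for seeing what the setting does; whether a given backend strips the stop string from the returned text is an assumption worth checking.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Return `text` cut just before the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = "Rayleigh scattering makes the sky look blue.\n\nUser: And sunsets?"
print(truncate_at_stop(raw, ["User:", "\n\nUser"]))
# Rayleigh scattering makes the sky look blue.
```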


### ⛓️ Creating Your First LLM Chain

A **chain** connects the prompt template, LLM, and output parser together using the pipe (`|`) operator.
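The `|` operator works because each component implements Python's `__or__` method and returns a new, composed object. The toy `Step` class below is a hypothetical stand-in for LangChain's `Runnable` (the real class also handles batching, streaming and async), and `fake_llm` replaces the model call so the sketch runs offline.

```python
class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # (a | b).invoke(x) is equivalent to b.invoke(a.invoke(x))
        return Step(lambda value: other.invoke(self.invoke(value)))


toy_prompt = Step(lambda inputs: f"Question: {inputs['question']}\nAnswer:")
fake_llm = Step(lambda text: {"content": "  Mostly Rayleigh scattering.  "})  # pretend model reply
toy_parser = Step(lambda message: message["content"].strip())  # plays the role of StrOutputParser

toy_chain = toy_prompt | fake_llm | toy_parser
print(toy_chain.invoke({"question": "Why is the sky blue?"}))  # Mostly Rayleigh scattering.
```

Because composition runs left to right, swapping in a different model only means replacing the middle step.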

In [7]:
from langchain_core.output_parsers import StrOutputParser

# Create your chain 
llm_chain = qa_prompt | llm | StrOutputParser()

# Test it 
response = llm_chain.invoke({"question": "Why is the sky blue?"})
print(response)



The sky looks blue because the Earth's atmosphere scatters short wavelength light (blue light) more than it does long wavelength light (red light), causing blue light to be reflected back to our eyes while red light passes through and appears as less visible. This phenomenon is known as scattering, and it's due to the presence of tiny particles in the atmosphere, such as dust, water vapor, and air molecules. When sunlight enters the Earth's atmosphere, it interacts with these particles and scatters in all directions, causing the blue light to be more dispersed than other colors, resulting in the appearance of a blue sky. This is what creates the blue color we see when we look up at the sky. The amount of scattering depends on the angle of the sun and the amount of particles in the air, which can vary throughout the day and in different weather conditions, creating different shades of blue or even orange or pink skies during sunrise and sunset. However, during pollution or smoggy cond

---

## 3. Building Conversational Chains <a id="3-building-conversational-chains"></a>

### 🗣️ The Problem with Stateless LLMs

By default, LLMs don't remember previous interactions. Each query is treated independently.

**Example Problem:**
- User: "Who invented the telephone?"
- AI: "Alexander Graham Bell"
- User: "When was he born?" 
- AI: ❌ Doesn't know who "he" refers to!

### 💾 Solution 1: Manual History Management

We can add a placeholder for past messages to the prompt template and pass the conversation history in ourselves on every call.
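The idea behind manual history management is that every request re-sends the whole conversation so far. The sketch below records the message list that would be sent on each turn; `fake_model` is a hypothetical stand-in for the LLM so the example runs offline.

```python
SYSTEM = "You are a helpful assistant."


def build_messages(history, question):
    """Messages for one request: system prompt, every earlier turn, then the new question."""
    return [("system", SYSTEM), *history, ("human", question)]


def fake_model(messages):
    # Hypothetical stand-in for an LLM call: report how much context it received.
    return f"(answer based on {len(messages)} messages)"


history = []   # grows by two entries (human + ai) per turn
requests = []  # what each turn would send to the model
for question_text in ["Who invented the telephone?", "When was he born?"]:
    messages = build_messages(history, question_text)
    requests.append(messages)
    answer = fake_model(messages)
    history += [("human", question_text), ("ai", answer)]

print(requests[1])  # the second request still contains the first question and answer
```

Because the full history is re-sent every time, long conversations eventually need trimming or summarizing to fit the model's context window.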

In [8]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage

# Use ChatPromptTemplate instead of string-based PromptTemplate
conversation_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer questions concisely and accurately."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{question}")
])

# Create chain
conversation_chain = conversation_prompt | llm | StrOutputParser()

print("✅ Conversation chain created!")

✅ Conversation chain created!


In [9]:
# Store history as message objects (not raw strings)
message_history = []

conversation_flow = [
    "Who was the first person to walk on the moon?",
    "Which mission was that part of?",
    "How long did the entire journey take?"
]

print("🚀 Space Exploration Conversation:\n")

for question in conversation_flow:
    print(f"👤 You: {question}")
    
    # Invoke with proper history format
    response = conversation_chain.invoke({
        "question": question,
        "history": message_history
    })
    
    # Clean up extra whitespace
    response = response.strip()
    
    print(f"🤖 Assistant: {response}")
    
    # Add to history as proper message objects
    message_history.append(HumanMessage(content=question))
    message_history.append(AIMessage(content=response))
    
    print("-" * 60)

🚀 Space Exploration Conversation:

👤 You: Who was the first person to walk on the moon?
🤖 Assistant: Neil Armstrong was the first person to walk on the moon. He was a part of the Apollo 11 mission that landed on the moon on July 20, 1969. He took his famous first steps on the lunar surface, declaring, "That's one small step for man, one giant leap for mankind."
------------------------------------------------------------
👤 You: Which mission was that part of?
🤖 Assistant: The Apollo 11 mission.
------------------------------------------------------------
👤 You: How long did the entire journey take?
🤖 Assistant: The entire Apollo 11 mission took 11 days, 3 hours, and 38 minutes, from launch to splashdown. The moon landing itself took approximately 2.5 hours.
------------------------------------------------------------


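### 💾 Solution 2: Using RunnableWithMessageHistory

LangChain provides a built-in class, `RunnableWithMessageHistory`, that manages conversation history for you: before each call it loads the stored messages for a given `session_id`, and afterwards it saves the new question and answer.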
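Under the hood, a history wrapper does three things per call: look up the stored messages for the session, pass them to the chain, and append the new turn. The hypothetical `SessionMemory` class below sketches that bookkeeping in plain Python (it is not the `RunnableWithMessageHistory` source), and `echo_chain` stands in for a real chain.

```python
from collections import defaultdict


class SessionMemory:
    """Hypothetical sketch of per-session history around a chain function."""

    def __init__(self, chain_fn):
        self.chain_fn = chain_fn        # callable(question, history) -> answer
        self.store = defaultdict(list)  # session_id -> list of (role, text)

    def invoke(self, question, session_id):
        history = self.store[session_id]
        answer = self.chain_fn(question, list(history))  # pass a copy of the past turns
        history.append(("human", question))
        history.append(("ai", answer))
        return answer


def echo_chain(question, history):
    # Stand-in for prompt | llm | parser: report how much history it was given.
    return f"{len(history)} earlier messages"


memory = SessionMemory(echo_chain)
print(memory.invoke("Who was the first person in space?", "space_chat"))  # 0 earlier messages
print(memory.invoke("What country was he from?", "space_chat"))           # 2 earlier messages
print(memory.invoke("What is photosynthesis?", "biology_chat"))           # 0 earlier messages
```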

In [10]:
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# Store for session histories
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

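# Create a chain with message history. The prompt needs a slot for past messages,
# so reuse conversation_prompt (it has a MessagesPlaceholder named "history").
with_history_chain = RunnableWithMessageHistory(
    conversation_prompt | llm | StrOutputParser(),
    get_session_history,
    input_messages_key="question",   # key holding the new user message
    history_messages_key="history",  # key where stored messages are injected
)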

print("✅ History-aware chain created!")

✅ History-aware chain created!


In [12]:
# To use the chain, you need to provide a session_id in the config
# The session_id groups messages together - same ID = same conversation

# Example 1: Single question
response = with_history_chain.invoke(
    {"question": "Who invented the light bulb?"},
    config={"configurable": {"session_id": "user123"}}
)
print(f"Answer: {response}")

Answer: 

Thomas Edison invented the practical light bulb in 1879. While others had attempted electric lighting before him, his version was the first to be commercially successful and widely adopted.


In [13]:
# Example 2: Multi-turn conversation (same session_id = remembers context)

session = "space_chat"  # Use a consistent session ID for related questions

questions = [
    "Who was the first person in space?",
    "What country was he from?",    # "he" refers to previous answer
    "When did this happen?"          # "this" refers to the space journey
]

print("🚀 Conversation with Memory:\n")
for q in questions:
    print(f"👤 You: {q}")
    
    response = with_history_chain.invoke(
        {"question": q},
        config={"configurable": {"session_id": session}}
    )
    
    print(f"🤖 Assistant: {response.strip()}")
    print("-" * 50)

🚀 Conversation with Memory:

👤 You: Who was the first person in space?
🤖 Assistant: The first person in space was Yuri Gagarin, a Soviet cosmonaut, who completed the first human spaceflight on April 12, 1961. He was launched into orbit aboard the Vostok 1 spacecraft as part of the Vostok 1 mission.
--------------------------------------------------
👤 You: What country was he from?
🤖 Assistant: Yuri Gagarin was a Soviet cosmonaut who completed the first human spaceflight on April 12, 1961. He was from the Soviet Union (now Russia).
--------------------------------------------------
👤 You: When did this happen?
🤖 Assistant: The first human spaceflight occurred on April 12, 1961, and the first person in space was Yuri Gagarin, a Soviet cosmonaut from the Soviet Union (now Russia).
--------------------------------------------------


In [14]:
# Example 3: Check the stored history
print("\n📜 Conversation History:")
print(store["space_chat"].messages)


📜 Conversation History:
[HumanMessage(content='Who was the first person in space?', additional_kwargs={}, response_metadata={}), AIMessage(content='The first person in space was Yuri Gagarin, a Soviet cosmonaut, who completed the first human spaceflight on April 12, 1961. He was launched into orbit aboard the Vostok 1 spacecraft as part of the Vostok 1 mission.', additional_kwargs={}, response_metadata={}, tool_calls=[], invalid_tool_calls=[]), HumanMessage(content='What country was he from?', additional_kwargs={}, response_metadata={}), AIMessage(content='Yuri Gagarin was a Soviet cosmonaut who completed the first human spaceflight on April 12, 1961. He was from the Soviet Union (now Russia).', additional_kwargs={}, response_metadata={}, tool_calls=[], invalid_tool_calls=[]), HumanMessage(content='When did this happen?', additional_kwargs={}, response_metadata={}), AIMessage(content=' \n\nThe first human spaceflight occurred on April 12, 1961, and the first person in space was Yur

In [15]:
# Example 4: Start a NEW conversation (different session_id)
response = with_history_chain.invoke(
    {"question": "What is photosynthesis?"},
    config={"configurable": {"session_id": "biology_chat"}}  # New session!
)
print(f"New topic answer: {response}")

New topic answer: 

Photosynthesis is a biological process that occurs in green plants, algae, and some bacteria, through which they convert light energy into chemical energy in the form of glucose and oxygen. During photosynthesis, chlorophyll in plant cells captures light energy and converts carbon dioxide and water into oxygen and glucose in the presence of water and chlorophyll. This process is essential for the growth and survival of most living organisms, as it provides energy for plants to sustain life and releases oxygen into the atmosphere, which is essential for all living beings to breathe. This process takes place in the chloroplasts of plant cells and is essential for the ecosystem as it produces food for herbivores and other organisms, and serves as the base of the food chain. It is a vital process for the maintenance of the earth's atmosphere, as it contributes to the oxygen-nitrogen cycle and helps to maintain the balance of carbon dioxide levels.

Photosynthesis is imp

---

## 4. Using Different LLM Providers <a id="4-using-different-llm-providers"></a>

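LangChain supports many LLM providers, and the same chain pattern works whichever model sits in the middle. Here we swap in a different Hugging Face model, Llama 3.1 8B Instruct (a gated model, so accept its license on the Hub first), for a translation task.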

In [16]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

translation_llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    task="conversational",
    max_new_tokens=128,
    stop_sequences=["\n\n", "Text:", "English:"],
    temperature=0.3,  # Lower temperature = less random, more consistent translations
    huggingfacehub_api_token=os.environ.get('HUGGINGFACEHUB_API_TOKEN')
)

# Wrap for proper chat handling
translation_chat = ChatHuggingFace(llm=translation_llm)

# Create a translation chain
translation_template = '''Translate the following text to {target_language}. 
Only provide the translation, nothing else.

Text: {text}

Translation:'''

translation_prompt = PromptTemplate(
    template=translation_template,
    input_variables=['text', 'target_language']
)

translation_chain = translation_prompt | translation_chat | StrOutputParser()

# Test translations
translations = [
    {"text": "Hello, how are you today?", "target_language": "Spanish"},
    {"text": "The weather is beautiful.", "target_language": "French"},
    {"text": "Thank you very much!", "target_language": "Japanese"}
]

print("🌍 Translation Examples:\n")
for t in translations:
    result = translation_chain.invoke(t)
    # Clean up the result
    result = result.strip().split('\n')[0]  # Take only first line
    print(f"English: {t['text']}")
    print(f"{t['target_language']}: {result}")
    print("-" * 50)

🌍 Translation Examples:

English: Hello, how are you today?
Spanish: Hola, ¿cómo estás hoy?
--------------------------------------------------
English: The weather is beautiful.
French: Le temps est beau.
--------------------------------------------------
English: Thank you very much!
Japanese: ありがとうございます。
--------------------------------------------------
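The translation chain uses `temperature=0.3`, while the Zephyr chat model earlier uses `0.7`. Temperature divides the model's next-token scores (logits) before the softmax, so values below 1 concentrate probability on the top choice and values above 1 flatten the distribution. The function below is an illustrative re-implementation with made-up logits, not the code Hugging Face runs.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.3, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```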


### 🔧 Creative Use Case: Code Explanation Chain

In [17]:
# Create a code explainer chain
code_explainer_template = '''
You are a programming tutor. Explain the following code snippet in simple terms.
Target audience: {audience}

Code:
```{language}
{code}
```

Explanation:
'''

code_prompt = PromptTemplate(
    template=code_explainer_template,
    input_variables=['code', 'language', 'audience']
)

code_chain = code_prompt | llm | StrOutputParser()

# Example code snippets to explain
python_code = '''
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
'''

print("💻 Code Explanation:\n")
explanation = code_chain.invoke({
    'code': python_code,
    'language': 'python',
    'audience': 'beginner programmers'
})
print(explanation)

💻 Code Explanation:


This code snippet is defining a function called `fibonacci` that takes an argument `n`. This function uses recursive calls to calculate the nth number of the Fibonacci sequence. The Fibonacci sequence is a mathematical sequence where each number is the sum of the previous two numbers. For example, the first two numbers are 0 and 1. The function first checks if `n` is less than or equal to 1. If it is, it returns `n` since the base case is when the number is 0 or 1. If `n` is greater than 1, the function calls itself recursively to get the (n-1)th and (n-2)th numbers and adds them together to get the `n`th number. So, for instance, if we call `fibonacci(5)`, it will call itself recursively for `n-1` (which is `4`) and get the 4th and 3rd numbers, add them up, and return the 5th number. This continues until we reach the base case (0 or 1) and return the final answer. This function will keep calling itself to calculate all the numbers


---

## 5. Introduction to LlamaIndex and RAG <a id="5-introduction-to-llamaindex-and-rag"></a>

### 📚 What is LlamaIndex?

**LlamaIndex** is a data framework that enables LLM-based applications to ingest, structure, and access private or domain-specific data.

### 🔄 What is RAG (Retrieval-Augmented Generation)?

**RAG** improves an LLM's answers by retrieving relevant passages from an external data store and adding them to the prompt before the model responds:

```
┌──────────────┐     ┌───────────────┐     ┌─────────────┐
│   Question   │ ->  │ Retrieve Docs │ ->  │ LLM Answer  │
└──────────────┘     └───────────────┘     └─────────────┘
                            ↓
                     ┌───────────────┐
                     │  Vector Store │
                     │ (Your Data!)  │
                     └───────────────┘
```
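The retrieve-then-generate flow can be sketched in a few lines of plain Python. To stay self-contained, the sketch uses a toy bag-of-words vector and cosine similarity in place of a neural embedding model and vector store, and it only builds the augmented prompt instead of calling an LLM. The sample passages are copied from the handbook documents created below.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a real system uses a neural embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, top_k=1):
    """Return the `top_k` passages most similar to the question."""
    q_vec = embed(question)
    return sorted(passages, key=lambda p: cosine(q_vec, embed(p)), reverse=True)[:top_k]


def build_rag_prompt(question, passages, top_k=1):
    """Augment the question with retrieved context before it goes to the LLM."""
    context = "\n".join(retrieve(question, passages, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"


handbook_snippets = [
    "401(k) plan with 6% company match.",
    "Employees can work remotely up to 3 days per week.",
    "Passwords must be changed every 90 days.",
]
print(build_rag_prompt("How many days per week can I work remotely?", handbook_snippets))
```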

### Key Concepts:

| Concept | Description |
|---------|-------------|
| **Document Loading** | Importing your private documents (PDFs, text, etc.) |
| **Vector Embedding** | Converting text to numerical representations |
| **Indexing** | Organizing embeddings for efficient retrieval |
| **Querying** | Finding relevant context to answer questions |

### 📄 Creating Sample Documents

Let's create some sample documents to demonstrate LlamaIndex capabilities.

In [18]:
import os

# Create a directory for sample documents
sample_docs_dir = "sample_documents"
os.makedirs(sample_docs_dir, exist_ok=True)

# Create sample company handbook documents
documents_content = {
    "company_policies.txt": """
TechCorp Employee Handbook - Company Policies

VACATION POLICY:
- All full-time employees receive 20 days of paid vacation per year.
- Vacation days are accrued at a rate of 1.67 days per month.
- Unused vacation days can be carried over (maximum of 5 days).
- Vacation requests must be submitted at least 2 weeks in advance.

REMOTE WORK POLICY:
- Employees can work remotely up to 3 days per week.
- Remote work days should be coordinated with your team.
- A stable internet connection is required for remote work.
- Core hours are 10 AM to 3 PM in your local timezone.

SICK LEAVE:
- 10 paid sick days per year.
- For absences longer than 3 days, a doctor's note is required.
    """,
    
    "benefits_guide.txt": """
TechCorp Employee Handbook - Benefits Guide

HEALTH INSURANCE:
- Comprehensive medical, dental, and vision coverage.
- Company covers 90% of premium costs for employees.
- Family plans available with 70% company contribution.
- HSA option with $1,500 annual company contribution.

RETIREMENT BENEFITS:
- 401(k) plan with 6% company match.
- Immediate vesting for company contributions.
- Access to financial planning services.

PROFESSIONAL DEVELOPMENT:
- $2,500 annual budget for conferences and courses.
- Access to online learning platforms (Coursera, LinkedIn Learning).
- Mentorship program available.
- Tuition reimbursement up to $5,000/year for approved programs.
    """,
    
    "it_guidelines.txt": """
TechCorp Employee Handbook - IT Security Guidelines

PASSWORD REQUIREMENTS:
- Minimum 12 characters with uppercase, lowercase, numbers, and symbols.
- Passwords must be changed every 90 days.
- Multi-factor authentication (MFA) is mandatory.
- Never share passwords with anyone, including IT staff.

DATA SECURITY:
- All company data must be stored on approved cloud services.
- Personal USB drives are prohibited for company data.
- Encryption is required for all laptops and mobile devices.
- Report suspicious emails to security@techcorp.com.

SOFTWARE INSTALLATION:
- Only approved software may be installed on company devices.
- Submit software requests through the IT portal.
- Open source software requires security review.
    """
}

# Write documents to files
for filename, content in documents_content.items():
    filepath = os.path.join(sample_docs_dir, filename)
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write(content)
    print(f"✅ Created: {filename}")

print(f"\n📂 Sample documents created in: {sample_docs_dir}/")

✅ Created: company_policies.txt
✅ Created: benefits_guide.txt
✅ Created: it_guidelines.txt

📂 Sample documents created in: sample_documents/


---

## 6. Building a Document Q&A System <a id="6-building-a-document-qa-system"></a>

Now let's build a system that can answer questions about our company documents!

### 📥 Step 1: Loading Documents

In [19]:
from llama_index.core import SimpleDirectoryReader

# Load all documents from the directory
loader = SimpleDirectoryReader(
    input_dir="./sample_documents",
    recursive=True,  # Include subdirectories
    required_exts=[".txt"]  # Only load text files
)

documents = loader.load_data()

print(f"📄 Loaded {len(documents)} documents:")
for doc in documents:
    print(f"   - {doc.metadata.get('file_name', 'Unknown')}")

📄 Loaded 3 documents:
   - benefits_guide.txt
   - company_policies.txt
   - it_guidelines.txt
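`SimpleDirectoryReader` handles many file types and attaches metadata such as `file_name`. For plain-text files, the filtering used above (`recursive=True`, `required_exts=[".txt"]`) behaves roughly like the hypothetical `load_text_files` helper below, which relies only on `pathlib`; the temporary directory in the demo is made up for illustration.

```python
import tempfile
from pathlib import Path


def load_text_files(input_dir, required_exts=(".txt",), recursive=True):
    """Return a sorted list of {'file_name', 'text'} dicts for files with a matching extension."""
    pattern = "**/*" if recursive else "*"
    docs = []
    for path in sorted(Path(input_dir).glob(pattern)):
        if path.is_file() and path.suffix in required_exts:
            docs.append({"file_name": path.name, "text": path.read_text(encoding="utf-8")})
    return docs


# Demo on a throwaway directory: one .txt at the top, one .md (skipped), one .txt in a subfolder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("top-level text", encoding="utf-8")
    (root / "b.md").write_text("# not loaded", encoding="utf-8")
    (root / "sub" / "c.txt").write_text("nested text", encoding="utf-8")
    loaded = load_text_files(root)

print([d["file_name"] for d in loaded])  # ['a.txt', 'c.txt']
```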


### 🧮 Step 2: Creating Vector Embeddings

In [20]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex

# Initialize embedding model
embedding_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

print("🔢 Creating vector embeddings...")

# Create index from documents
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embedding_model
)

# Persist the index
index.storage_context.persist(persist_dir="./index_storage")

print("✅ Vector index created and saved!")

2026-01-05 12:10:44,951 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5


2026-01-05 12:10:47,148 - INFO - 1 prompt is loaded, with the key: query


🔢 Creating vector embeddings...
✅ Vector index created and saved!


### 🔍 Step 3: Setting Up the Query Engine

In [22]:
import torch
from llama_index.llms.huggingface import HuggingFaceLLM

llm_llama = HuggingFaceLLM(
    model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    tokenizer_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    device_map="auto",
    max_new_tokens=256,
    model_kwargs={
        "dtype": torch.float16,
    },
    generate_kwargs={
        "temperature": 0.3,
        "do_sample": True
    }
)

print("✅ Local LLM loaded!")

# Create query engine
query_engine = index.as_query_engine(llm=llm_llama)
print("✅ Query engine ready!")

# Test
question = "What is the company match for 401k?"
response = query_engine.query(question)
print(f"❓ {question}")
print(f"💡 {response.response}")

2026-01-05 12:11:19,569 - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


✅ Local LLM loaded!
✅ Query engine ready!
❓ What is the company match for 401k?
💡 6% of company contributions


---

## 7. Creating Interactive Interfaces with Gradio <a id="7-creating-interactive-interfaces-with-gradio"></a>

### 🤖 Building a Chatbot UI

In [23]:
import gradio as gr

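# The chat engine keeps its own conversation memory, so Gradio's `history`
# argument is not needed here. Note: this single engine (and its memory)
# is shared by everyone using the app.
chat_engine = index.as_chat_engine(llm=llm_llama)

def chat_response(message, history):
    response = chat_engine.chat(message)
    return response.response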

demo = gr.ChatInterface(
    fn=chat_response,
    title="TechCorp HR Assistant",
    description="Ask me about company policies!"
)

# Uncomment to launch the app:
# demo.launch(share=False)

2026-01-05 12:11:35,368 - INFO - HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"


In [24]:
gr.close_all()

---

## 8. Summary and Best Practices <a id="8-summary-and-best-practices"></a>

### 🎯 Key Takeaways

1. **LangChain** is great for complex logic and connecting components.
2. **LlamaIndex** is specialized for indexing and retrieving from your data.
3. **RAG** allows LLMs to answer questions about information they weren't trained on.
4. **Gradio** provides a professional-looking UI with very little code.