<a href="https://colab.research.google.com/github/Sibikrish3000/chatbot_workshop/blob/main/RAG_with_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **What is RAG?**
- **Definition**: A hybrid approach combining retrieval-based and generative models.
- **Purpose**: Enhances chatbots by retrieving relevant documents from a knowledge base and generating context-aware responses.

### **Key Components**:
1. **Retrieval**:
   - Searches a database/documents (e.g., Wikipedia, company files) for chunks relevant to the user’s query.
   - Uses semantic search (e.g., cosine similarity with embeddings) to find contextually similar text.
2. **Augmentation**:
   - Injects the retrieved information into the prompt as context for the generative model.
3. **Generation**:
   - Employs a language model (e.g., DeepSeek, Qwen) to generate natural language responses based on retrieved context.

### **Workflow**:
1. **Query Input**: User submits a question.
2. **Document Retrieval**: Relevant documents are fetched from the knowledge base.
3. **Prompt Construction**: Context and query are combined into a prompt.
4. **Response Generation**: The model generates a response using the context.
5. **Output**: Cleaned and formatted response is returned to the user.

### **Benefits**:
- Accurate and context-aware responses.
- Scalable with large knowledge bases.
- Improves conversational flow and user satisfaction.

# **Advanced RAG Implementation Flowchart:**
[![3JB0VdF.md.png](https://iili.io/3JB0VdF.png)](https://freeimage.host/i/3JB0VdF)

# **Naive RAG:**
![](https://iili.io/3JRglls.png)
### **Workflow in RAG**
1. **Embedding Conversion**:  
   - The query and all documents are converted into dense vector representations via an embedding model.
2. **Similarity Calculation**:  
   - Cosine similarity is computed between the query vector and every document vector.
3. **Top-k Retrieval**:  
   - Documents with the highest cosine scores are selected as context for the generative model (e.g., GPT-3).
4. **Generation**:  
   - The retrieved documents inform the generative model to produce a relevant, accurate answer.
![](https://iili.io/3JYWszg.png)
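
The four steps above can be sketched end to end in plain Python. This is only a toy illustration: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the sample documents and question are made up for the demo, and the final generation step is left out because it needs an LLM.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_docs):
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n".join(context_docs)
    return (
        "Answer the question based only on the following context:\n"
        f"{context}\n\nQuestion: {query}\n"
    )

# Made-up sample data for the demo
toy_docs = [
    "My favorite pet is a cat.",
    "The stock market closed higher today.",
    "Paris is the capital of France.",
]
toy_question = "What is my favorite pet?"

top_docs = retrieve(toy_question, toy_docs, k=1)
print(build_prompt(toy_question, top_docs))
```

A word-count vector only matches exact words, so "pets" and "pet" would not overlap. Dense embeddings from a trained model are used in practice precisely because they capture that kind of similarity.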

In [14]:
# Documents
question = "What kinds of pets do I like?" # @param ["What kinds of pets do I like?"] {"allow-input":true}
document = "My favorite pet is a cat."    # @param ["My favorite pet is a cat."] {"allow-input":true}

## Part 1 : Indexing
![](https://iili.io/3JBjI1f.png)

In [None]:
!pip install tiktoken

### What are tokens?
>Tokens can be thought of as pieces of words. Before the API processes the request, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words. Here are some helpful rules of thumb for understanding tokens in terms of lengths:

1 token ~= 4 chars in English

1 token ~= ¾ words

100 tokens ~= 75 words

Or

1-2 sentences ~= 30 tokens

1 paragraph ~= 100 tokens

1,500 words ~= 2048 tokens
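
As a rough illustration, the rules of thumb above can be turned into a quick estimator. `estimate_tokens` is a hypothetical helper written for this note. It only approximates English text and is no substitute for a real tokenizer such as `tiktoken`.

```python
def estimate_tokens(text: str) -> dict:
    """Rough token estimates for English text, based on the rules of thumb above."""
    by_chars = len(text) / 4              # 1 token ~= 4 characters
    by_words = len(text.split()) / 0.75   # 1 token ~= 3/4 of a word
    return {"by_chars": round(by_chars), "by_words": round(by_words)}

sample = "Tokens can be thought of as pieces of words."
print(estimate_tokens(sample))
```

The two estimates usually differ a little. Use an exact count from the tokenizer when you are close to a model's context limit.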

In [None]:
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    enc_str=encoding.encode(string)
    print(f'encoded string: {enc_str}')
    dec_str=encoding.decode(enc_str)
    print(f'Decode string: {dec_str}')
    num_tokens = len(enc_str)
    return num_tokens

num_tokens_from_string(question, "cl100k_base")

encoded string: [72, 3021, 11058]
Decode string: i love coding


3

## Part 2 : Embedding
>An embedding is a representation of data (such as words, sentences, or other entities) in a continuous vector space. These vectors are typically dense, meaning they consist of real numbers rather than sparse binary values, and are designed to capture semantic relationships between the items being represented.

Key Concepts of Embeddings:

Vector Representation :

Embeddings map discrete objects (e.g., words, images, or categories) into continuous vector spaces. For example, a word like "king" might be represented as a vector [0.25, -0.1, 0.9, ...] in a high-dimensional space.

----

Semantic Meaning :

The key idea behind embeddings is that similar items will have similar vector representations. For instance, in word embeddings, semantically related words (like "king" and "queen") will have vectors that are close to each other in the vector space.

* Suppose you have the words "king", "queen", "man", and "woman". After training, their embeddings might look something like this in a simplified 2D space:
```
king   -> [0.8, 0.2]
queen  -> [0.7, 0.3]
man    -> [0.9, 0.1]
woman  -> [0.8, 0.4]
```
In this case, the vector for "king" is closer to "man" than to "woman", and "queen" is closer to "woman". Additionally, the relationship between "king" and "queen" might be similar to the relationship between "man" and "woman", which can be captured by vector arithmetic:
```
king - man + woman ≈ queen
```
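
The toy 2D vectors above can be checked with a few lines of Python. This sketch uses Euclidean distance and, as is usual when evaluating analogies, excludes the three query words from the candidate answers. With numbers this small the match is only approximate.

```python
import math

# The simplified 2D embeddings from the example above
vectors = {
    "king":  [0.8, 0.2],
    "queen": [0.7, 0.3],
    "man":   [0.9, 0.1],
    "woman": [0.8, 0.4],
}

def dist(a, b):
    """Euclidean distance between two words' vectors."""
    return math.dist(vectors[a], vectors[b])

print("king is closer to man than to woman:", dist("king", "man") < dist("king", "woman"))
print("queen is closer to woman than to man:", dist("queen", "woman") < dist("queen", "man"))

# king - man + woman, computed component by component
analogy = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]

# Nearest remaining word, excluding the three words used in the query
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
nearest = min(candidates, key=lambda w: math.dist(vectors[w], analogy))
print("king - man + woman =", [round(x, 2) for x in analogy], "-> nearest:", nearest)
```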

In [None]:
!pip install langchain_community

In [15]:
from langchain_community.embeddings import HuggingFaceEmbeddings
embd = HuggingFaceEmbeddings()
query_result = embd.embed_query(question)
document_result = embd.embed_query(document)
# print(f'query_result: {query_result}')
print(f'query_result length: {len(query_result)}')
# print(f'document_result: {document_result}')
print(f'document_result length: {len(document_result)}')


query_result length: 768
document_result length: 768


## Part 3 : Cosine similarity
Cosine similarity is a mathematical measure of how similar two vectors are, regardless of their magnitude. In Retrieval-Augmented Generation (RAG), it is used to rank the entries of a knowledge base or corpus by relevance to the query before a response is generated.

---

- **Mathematical Formula**:  
   The cosine similarity between two vectors $ A $ and $ B $ is given by:
   $$
   \text{Cosine Similarity}(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \|B\|}
   $$
   Where:
   - $ A \cdot B $ is the dot product of $ A $ and $ B $: $ A \cdot B = \sum_{i=1}^n A_i B_i $
   - $ \|A\| $ and $ \|B\| $ are the magnitudes (or Euclidean norms) of $ A $ and $ B $, respectively:

   $$
   \|A\| = \sqrt{\sum_{i=1}^n A_i^2}, \quad \|B\| = \sqrt{\sum_{i=1}^n B_i^2}
   $$
---
### **Key Properties**
1. **Range**:  
   - Ranges from **-1 to 1** (but **0 to 1** for non-negative vectors, e.g., TF-IDF document vectors).  
   - **1**: Vectors are identical in direction.  
   - **0**: Vectors are orthogonal (no similarity).  
   - **-1**: Vectors are diametrically opposed.

2. **Magnitude Invariance**:  
   - Focuses on orientation, not magnitude. Useful when comparing objects where size differences are irrelevant (e.g., text documents of varying lengths).
---
### **Example Calculation**
For vectors $ \mathbf{A} = [1, 2] $ and $ \mathbf{B} = [2, 1] $:  
1. **Dot Product**: $ (1 \times 2) + (2 \times 1) = 4 . $
2. **Magnitudes**: $ \|\mathbf{A}\| = \sqrt{1^2 + 2^2} = \sqrt{5} , \quad \|\mathbf{B}\| = \sqrt{2^2 + 1^2} = \sqrt{5} . $
3. **Cosine Similarity**: $ \frac{4}{\sqrt{5} \times \sqrt{5}} = \frac{4}{5} = 0.8 . $
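
The worked example and the key properties above can be verified with a small dependency-free sketch. `cosine_plain` is a hypothetical helper name chosen so it does not clash with the NumPy version defined in the next cell.

```python
import math

def cosine_plain(a, b):
    """Cosine similarity for two equal-length lists of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_plain([1, 2], [2, 1]))     # the worked example: 4 / 5
print(cosine_plain([1, 2], [10, 20]))   # same direction, different magnitude
print(cosine_plain([1, 0], [0, 1]))     # orthogonal vectors
print(cosine_plain([1, 0], [-1, 0]))    # opposite directions
```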


In [None]:
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

similarity = cosine_similarity(query_result, document_result)
print("Cosine Similarity:", similarity)

Cosine Similarity: 1.0


## Part 5 : Loader

In [None]:
#### INDEXING ####

# Loading blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()



In [None]:
blog_docs

## Part 6 : [**Text Splitter**](https://python.langchain.com/docs/how_to/recursive_text_splitter/)



> This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.
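
To make the idea concrete, here is a simplified sketch of recursive splitting. It is not LangChain's implementation: `recursive_split` is a hypothetical function that measures chunks in characters rather than tokens, drops the separators it splits on, and does not add any overlap between chunks.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters,
    trying the coarsest separator first and finer ones only when needed."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)       # flush what has been merged so far
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

sample_text = (
    "Para one is short.\n\n"
    "Para two is a bit longer than the first one.\n\n"
    "Third."
)
for chunk in recursive_split(sample_text, chunk_size=30):
    print(repr(chunk))
```

Short paragraphs stay whole, and only the paragraph that exceeds the limit is broken further at word boundaries.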




In [None]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)

In [None]:
len(splits)

52

## Part 7 : Vectorstore

In [None]:
!pip install chromadb

In [None]:
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=HuggingFaceEmbeddings())

In [None]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

In [None]:
docs = retriever.invoke("What is Task Decomposition?")  # invoke() replaces the deprecated get_relevant_documents()

In [None]:
len(docs)

5

In [None]:
docs[0]

Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\nThe system comprises of 4 stages:\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\nInstruction:')

In [None]:
docs[1]

Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search)

## Part 8: Generation
![](https://iili.io/3JCzBQn.png)

In [None]:
from langchain.prompts import ChatPromptTemplate

# Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])
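
Conceptually, filling this template is string substitution: the retrieved chunks are joined into one context string and placed alongside the question. A minimal sketch with plain `str.format`, using made-up chunk texts and hypothetical `toy_` names so the notebook's own variables are left untouched:

```python
toy_template = (
    "Answer the question based only on the following context:\n"
    "{context}\n\n"
    "Question: {question}\n"
)

# Made-up stand-ins for the page_content of retrieved documents
toy_chunks = ["Chunk one about planning.", "Chunk two about memory."]

filled = toy_template.format(
    context="\n\n".join(toy_chunks),          # join the chunks into a single string
    question="What is Task Decomposition?",
)
print(filled)
```

The important detail is that `context` must already be a string when the template is filled, which is why the chunks are joined first.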

In [None]:
!pip install langchain-huggingface

In [None]:
from langchain_huggingface import HuggingFaceEndpoint
# LLM
llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1",
     temperature=0.2 #@param {"type":"number","placeholder":"0.7"}
    ,
    max_length=512,
    task='text-generation',
)

                    max_length was transferred to model_kwargs.
                    Please make sure that max_length is what you intended.


In [None]:
# Chain

chain = prompt | llm #chain pipeline

# ### without pipeline
# input=prompt.format_prompt(context=docs[0].page_content, question="What is Task Decomposition?")
# response=llm.invoke(input)
# response

In [None]:
# Run
chain.invoke({"context": "\n\n".join(d.page_content for d in docs),
              "question": "What is Task Decomposition?"})



'Answer: Task Decomposition is the process of breaking down a complex task into smaller, more manageable steps or subgoals. This can be done by a Large Language Model (LLM) using simple prompts, task-specific instructions, or with human inputs.'

In [None]:
from langchain import hub
prompt_hub_rag = hub.pull("rlm/rag-prompt") #@markdown Create a LangSmith account at [**smith.langchain.com**](https://smith.langchain.com/), then generate a LangSmith API key

In [None]:
prompt_hub_rag

## **RAG chains**

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables import RunnableLambda


rag_chain = (
    {
        "context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
        "question": RunnablePassthrough()
    }
    | prompt_hub_rag
    | llm
    | StrOutputParser()
)
rag_chain.invoke("What is Task Decomposition?")



'Answer: Task Decomposition is the process of breaking down a complex task into smaller, more manageable subtasks. This can be done by the LLM with simple prompting, using task-specific instructions, or with human inputs. The goal is to transform big tasks into multiple manageable tasks to make them easier to understand and execute.'
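
The `|` operator in the chain above composes steps so that each step's output becomes the next step's input. The sketch below imitates that idea in plain Python. It is not how LangChain implements it: `Step`, `fake_search`, and `fake_llm` are hypothetical stand-ins so the example runs without a model or vector store.

```python
class Step:
    """Hypothetical wrapper that lets plain functions be chained with `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # a | b -> a new Step that runs a first, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

def fake_search(question):
    """Stand-in for a retriever: always returns one made-up chunk."""
    return ["Task decomposition breaks a big task into smaller steps."]

# Mirrors the dict in rag_chain: build the context and pass the question through
gather = Step(lambda q: {"context": "\n\n".join(fake_search(q)), "question": q})
fill_prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")
fake_llm = Step(lambda p: f"[stub answer for a prompt of {len(p)} characters]")

toy_chain = gather | fill_prompt | fake_llm
print(toy_chain.invoke("What is Task Decomposition?"))
```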

# Installing Required packages

In [None]:
#@title <b>Installing Required packages</b>
!pip install langchain faiss-cpu sentence-transformers transformers accelerate pypdf langchain-experimental langchain-groq langchain-community unstructured[all]

# RAG with APIs

### Groq API

In [2]:
#@title <b>RAG implementation with Groq API </b>
import dotenv
from langchain_groq import ChatGroq
from google.colab import userdata
groq_api_key = userdata.get('GROQ_API_KEY') # @param ["userdata.get('GROQ_API_KEY')"] {"type":"raw","allow-input":true}

# ## load the Groq API key
# dotenv.load_dotenv()
# groq_api_key=os.environ['GROQ_API_KEY']

model_name = "llama-3.2-3b-preview" # @param ["llama-3.2-3b-preview","deepseek-r1-distill-llama-70b","llama-3.3-70b-versatile","llama-3.3-70b-specdec","llama-3.2-1b-preview","llama-3.1-8b-instant","llama3-70b-8192","mixtral-8x7b-32768","llama3-8b-8192","llama-guard-3-8b"]

llm = ChatGroq(
    groq_api_key=groq_api_key,
    model_name=model_name,
    temperature = 0.7 # @param {"type":"number","placeholder":"0.7"}
)

### Mistralai API

In [None]:
#@title <b>RAG implementation with Mistral API</b>
from google.colab import userdata

mistral_api_key = userdata.get('MISTRAL_API_KEY') # @param ["userdata.get('MISTRAL_API_KEY')"] {"type":"raw","allow-input":true}

from langchain_mistralai.chat_models import ChatMistralAI

model_name = "mistral-small-latest" # @param ["mistral-small-latest","pixtral-large-latest","codestral-latest","mistral-large-latest","ministral-8b-latest","ministral-3b-latest","open-mistral-nemo"]

llm = ChatMistralAI(
    mistral_api_key=mistral_api_key,
    model=model_name,
    temperature = 0.7 # @param {"type":"number","placeholder":"0.7"}
)

In [None]:
!pip install langchain-mistralai

### Google GenerativeAI API

In [None]:
#@title <b>RAG implementation with Google GenerativeAI API</b>
from langchain_google_genai import ChatGoogleGenerativeAI
from google.colab import userdata

# 1. Get an API key from Google AI Studio: https://makersuite.google.com/
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY') # @param ["userdata.get('GOOGLE_API_KEY')"] {"type":"raw","allow-input":true}

model_name = "gemini-2.0-flash" # @param ["gemini-2.0-flash","gemini-2.0-flash-lite-preview-02-05","gemini-2.0-pro-exp-02-05","gemini-2.0-flash-thinking-exp-01-21","gemini-2.0-flash-exp","learnlm-1.5-pro-experimental","gemini-1.5-pro","gemini-1.5-flash","gemini-1.5-8b","gemini-1.5-flash-8b"]

# 2. Initialize Gemini model
llm = ChatGoogleGenerativeAI(
    model=model_name,  # use the model selected in the form above
    google_api_key=GOOGLE_API_KEY,
    temperature = 0.7 # @param {"type":"number","placeholder":"0.7"}
    ,
    convert_system_message_to_human=True
)

In [None]:
!pip install google-generativeai langchain-google-genai

# **Running RAG Locally with Ollama**


In [None]:
#@title **Downloading Ollama**
!curl -fsSL https://ollama.com/install.sh | sh

In [None]:
%%bash --bg
#@title Starting ollama server in colab background
ollama serve > /dev/null 2>&1  # Start server in background, suppress logs

In [None]:
!curl http://localhost:11434  # Should return "Ollama is running"

In [None]:
#@title Downloading Deepseek-r1:7.5b Locally
!ollama pull deepseek-r1

In [None]:
#List the local Ollama models
!ollama list

In [None]:
#@title **kill Running Ollama**
!pkill -f "ollama serve" # Don't run this cell unless you want to stop the Ollama server

In [None]:
#@title Testing the DeepSeek-R1 model locally
!ollama run --verbose deepseek-r1 'hi'

In [None]:
%pip install -U langchain-ollama

In [None]:
#@title **Initialize the Ollama model with Python Langchain**

from langchain_community.llms import Ollama
# from langchain_ollama.llms import OllamaLLM

# Initialize the Ollama model
llm = Ollama(
    model="deepseek-r1",
    temperature=0.9,
    base_url='http://localhost:11434'
)

# Run inference
response = llm.invoke("Explain quantum computing in 3 sentences.")
print(response)

### Testing LLM

In [None]:
from langchain.schema.messages import HumanMessage, SystemMessage
messages = [
 SystemMessage(
      content="""You're an assistant knowledgeable about
     healthcare. Only answer healthcare-related questions."""
),
 HumanMessage(content="What is Medicaid managed care?"),
]
llm.invoke(messages)



AIMessage(content="Medicaid managed care is a system where a state contracts with managed care organizations (MCOs) to administer healthcare services to Medicaid beneficiaries. Instead of the state directly paying providers for each service (fee-for-service), the state pays the MCO a fixed per-member, per-month (capitation) rate to cover the cost of healthcare for the enrollees in that plan.\n\nHere's a breakdown of key aspects:\n\n*   **How it works:** States contract with MCOs (which can be HMOs, provider-sponsored organizations, or other types of health plans) to provide a defined set of healthcare services to Medicaid enrollees. Enrollees typically choose a managed care plan from a selection offered in their area.\n\n*   **Capitation:** The MCO receives a fixed payment per member per month, regardless of how much or how little healthcare the enrollee uses. This incentivizes the MCO to manage costs and promote preventive care.\n\n*   **Network of Providers:** MCOs create and manage 

# Building a RAG Agent

In [3]:
#@title <b>Load documents with multiple file support</b>
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader, UnstructuredMarkdownLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
import os
# 1. Load documents with multiple file support
def load_documents(path):
    loaders = [
        DirectoryLoader(path, glob="**/*.pdf", loader_cls=PyPDFLoader),
        DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader),
        DirectoryLoader(path, glob="**/*.md", loader_cls=UnstructuredMarkdownLoader)
    ]
    documents = []
    for loader in loaders:
        documents.extend(loader.load())
    return documents

knowledge_base_path = "knowledge_base/" # @param ["knowledge_base/"] {"allow-input":true}
docs = load_documents(knowledge_base_path)


In [None]:
#@title <b>Splitting text into semantic chunks</b>
text_splitter = SemanticChunker(HuggingFaceEmbeddings())
documents = text_splitter.split_documents(docs)
# Note: this cell took about 12 min 23 s to complete on CPU

In [None]:
!pip install chromadb

In [None]:
# Generate embeddings
embeddings = HuggingFaceEmbeddings()

In [7]:
#@title **creating vector store**

# from langchain_community.vectorstores import FAISS
# vector_store = FAISS.from_documents(documents, embeddings)

from langchain_community.vectorstores import Chroma

# creating Chroma vector store
vector_store = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory="chroma_db"  # Optional: persist to disk
)

# Connect retriever (same interface)
default_retriever = vector_store.as_retriever(search_kwargs={"k": 3})  # Fetch top 3 chunks

In [8]:
#@title **Craft the prompt template**
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
prompt = """
**Rules**:
1. You are a Flipkart customer service agent.
2. Answer only based on the context.
3. If the context doesn't contain the answer, list common reasons.
4. If you don't know the answer, just say that you don't know; don't try to make up an answer.
5. Keep answers under 4 sentences.
6. You may greet the user.
7. Format the answer in Markdown. Include headings, lists, code blocks, and other Markdown elements as appropriate.

Context: {context}

Question: {question}

Answer:
"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(prompt)

In [None]:
!pip install tiktoken

In [None]:
#@title **LLM Max input token truncation**

from langchain.schema import BaseRetriever, Document
from pydantic import Field
from tiktoken import get_encoding  # used in _get_relevant_documents below
from typing import List

class TokenSafeRetriever(BaseRetriever):
    vector_store: object = Field(...)  # Proper Pydantic field declaration

    class Config:
        arbitrary_types_allowed = True  # Allow custom vector store type

    def _get_relevant_documents(self, query: str, **kwargs) -> List[Document]:
        docs = self.vector_store.similarity_search(query, k=3)
        encoder = get_encoding("cl100k_base")

        # token truncation logic
        max_tokens = 3800 # @param {"type":"number","placeholder":"3800"}
        current_tokens = 0
        filtered_docs = []

        for doc in docs:
            doc_tokens = len(encoder.encode(doc.page_content))
            if current_tokens + doc_tokens <= max_tokens:
                filtered_docs.append(doc)
                current_tokens += doc_tokens
            else:
                remaining = max_tokens - current_tokens
                truncated = " ".join(doc.page_content.split()[:int(remaining*0.7)])
                filtered_docs.append(Document(
                    page_content=truncated,
                    metadata=doc.metadata
                ))
                break

        return filtered_docs

# Initialize with proper Pydantic
custom_retriever = TokenSafeRetriever(vector_store=vector_store)
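
The budgeting logic inside `TokenSafeRetriever` can be studied in isolation. The sketch below is a simplified, dependency-free variant: `fit_to_budget` is a hypothetical helper, and it counts whitespace-separated words as a stand-in for tokens, so it does not need the 0.7 words-per-token factor used above.

```python
def fit_to_budget(texts, max_tokens, count_tokens=lambda t: len(t.split())):
    """Keep whole texts while they fit the budget; truncate the first text
    that would overflow it, then stop."""
    kept, used = [], 0
    for text in texts:
        n = count_tokens(text)
        if used + n <= max_tokens:
            kept.append(text)
            used += n
        else:
            remaining = max_tokens - used
            if remaining > 0:
                # Words stand in for tokens here, so keep `remaining` words
                kept.append(" ".join(text.split()[:remaining]))
            break
    return kept

print(fit_to_budget(["a b c", "d e f g", "h i"], max_tokens=5))
```

Because retrieved chunks arrive ranked by similarity, cutting from the end keeps the most relevant material and drops the least relevant.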

In [None]:
#@title **RAG pipeline**
from langchain.chains import LLMChain, StuffDocumentsChain
# Chain 1: Generate answers
llm_chain = LLMChain(llm=llm, prompt=QA_CHAIN_PROMPT)

# Chain 2: Combine document chunks
document_prompt = PromptTemplate(
    template="Context:\ncontent:{page_content}\nsource:{source}",
    input_variables=["page_content", "source"]
)

# Final RAG pipeline
qa = RetrievalQA(
    combine_documents_chain=StuffDocumentsChain(
        llm_chain=llm_chain,
        document_prompt=document_prompt,
        document_variable_name="context"
    ),
    return_source_documents=True,
    retriever = default_retriever # @param ["custom_retriever","default_retriever"] {"type":"raw"}
)

In [None]:
#@title **Customer Sentiment Analysis**
from transformers import pipeline
# 1. Sentiment Analysis Model
sentiment_analyzer = pipeline(
    "text-classification",
    model="nlptown/bert-base-multilingual-uncased-sentiment"
)

# 2. Escalation Check Function
def requires_human_escalation(query):
    # Check for explicit requests
    explicit_triggers = {
        "human agent", "talk to manager", "supervisor",
        "real person", "angry", "frustrated"
    }

    if any(trigger in query.lower() for trigger in explicit_triggers):
        print('explicit_triggers')
        return True

    # Analyze sentiment
    result = sentiment_analyzer(query)[0]
    if result['label'] in ['1 star', '2 stars']:  # Negative sentiment
        print(result['score'])
        return result['score'] > 0.85  # High confidence

    return False
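
The two-stage check above (keyword triggers first, then high-confidence negative sentiment) can be exercised without downloading a model by injecting the classifier. In this sketch `should_escalate` and `fake_classifier` are hypothetical names, and the fake classifier returns made-up scores in the same `label`/`score` shape the pipeline above is read with.

```python
def should_escalate(query, classify, triggers=("human agent", "supervisor"), threshold=0.85):
    """Escalate on an explicit trigger phrase, or on confident negative sentiment."""
    text = query.lower()
    if any(trigger in text for trigger in triggers):
        return True
    result = classify(query)
    is_negative = result["label"] in ("1 star", "2 stars")
    return is_negative and result["score"] > threshold

def fake_classifier(text):
    """Hypothetical stand-in for the sentiment model, with made-up scores."""
    if "terrible" in text.lower():
        return {"label": "1 star", "score": 0.95}
    return {"label": "4 stars", "score": 0.60}

for q in ["I want a human agent", "This is terrible service", "Where is my order?"]:
    print(q, "->", should_escalate(q, fake_classifier))
```

Passing the classifier as an argument keeps the decision rule testable on its own and makes it easy to swap models later.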

In [None]:
while True:
    query = input("Enter your query: ")
    if query.lower() in ['exit', 'quit',':q']:
        break
    # Hand off to a human agent when the query calls for escalation
    if requires_human_escalation(query):
        print("\nBot: I'm truly sorry you're having this experience. ")
        print("Let me connect you with a senior support agent immediately.")
        print("Please hold while I transfer your chat...")
        # integration with live agent system
        break

    # Retrieve and generate a response using "query" as the input key
    result = qa({"query": query})
    full_response = result["result"]

    # Extract only the part after "Answer:" if present
    if "Answer:" in full_response:
        answer = full_response.split("Answer:")[-1].strip()
    else:
        answer = full_response.strip()

    sources = list(set([doc.metadata["source"] for doc in result["source_documents"]]))

    print(f"\nBot: {answer}")
    print(f"\nSources: {sources}")
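
The post-processing in the loop above is small enough to factor into helpers that can be tested without running the chain. This is one possible sketch: `extract_answer` and `unique_sources` are hypothetical names, and plain dicts stand in for LangChain `Document` objects.

```python
def extract_answer(full_response: str) -> str:
    """Return the text after the last 'Answer:' marker, or the whole response."""
    if "Answer:" in full_response:
        return full_response.split("Answer:")[-1].strip()
    return full_response.strip()

def unique_sources(source_documents):
    """Deduplicate sources while keeping first-seen order
    (a set would also deduplicate, but its order is not stable)."""
    return list(dict.fromkeys(doc["metadata"]["source"] for doc in source_documents))

# Made-up response and documents for the demo
raw = "Context: ...\n\nQuestion: ...\n\nAnswer: Returns are accepted within the stated window. "
docs_stub = [
    {"metadata": {"source": "faq.md"}},
    {"metadata": {"source": "returns.pdf"}},
    {"metadata": {"source": "faq.md"}},
]
print(extract_answer(raw))
print(unique_sources(docs_stub))
```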

# **Integrating RAG Agent into Telegram Bot**

In [None]:
!pip install python-telegram-bot

In [None]:
!pip install nest_asyncio

In [None]:
import os
from telegram import Update
from telegram.ext import Application, MessageHandler, filters, ContextTypes
import logging
from google.colab import userdata
import asyncio
import nest_asyncio # To run our bot in jupyter notebook
nest_asyncio.apply()

# Set up logging
logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    level=logging.INFO
)

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    query = update.message.text

    # Check for exit commands (optional)
    if query.lower() in ['exit', 'quit', ':q']:
        await update.message.reply_text("Goodbye! 👋")
        return

    # Human escalation check
    if requires_human_escalation(query):
        response = (
            "⚠️ I'm truly sorry you're having this experience.\n\n"
            "Let me connect you with a senior support agent immediately...\n"
            "Please hold while I transfer your chat ⌛"
        )
        await update.message.reply_text(response)
        # Add your live agent integration here
        return

    # Process query with RAG
    try:
        result = qa({"query": query})
        full_response = result["result"]

        # Extract answer
        if "Answer:" in full_response:
            answer = full_response.split("Answer:")[-1].strip()
        else:
            answer = full_response.strip()

        # Get unique sources
        sources = list(set([doc.metadata["source"] for doc in result["source_documents"]]))

        # Format response
        response = f"🤖 {answer}\n\n📚 Sources:\n" + "\n".join(sources)

        await update.message.reply_text(response)

    except Exception as e:
        logging.error(f"Error processing query: {e}")
        await update.message.reply_text("⚠️ Sorry, I encountered an error processing your request. Please try again.")

def main():
    # Get the Telegram token from Colab secrets (use os.getenv outside Colab)
    token = userdata.get('TELEGRAM_BOT_TOKEN') # os.getenv("TELEGRAM_BOT_TOKEN")

    # Create Application
    application = Application.builder().token(token).build()

    # Add message handler
    application.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))

    # Start polling
    logging.info("Bot is running...")
    application.run_polling()

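if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        # main() is a regular blocking function, not a coroutine,
        # so call it directly instead of passing its result to run_until_complete()
        main()
    except KeyboardInterrupt:
        pass
    finally:
        loop.close()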

In [None]:
# This code is only for non-notebook Python scripts,
# so it reads the token from the environment rather than from Colab secrets
import logging
import os

from telegram import Update
from telegram.ext import Application, CommandHandler, MessageHandler, ContextTypes, filters

# Enable logging for debugging
logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO
)
logger = logging.getLogger(__name__)

# /start command handler
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text("Hello! I'm your simple Telegram bot. How can I help you today?")

# Echo any text message back to the user
async def echo(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # For now, simply echo the incoming message text.
    received_text = update.message.text
    await update.message.reply_text(received_text)

def main() -> None:
    # Get the Telegram token from the environment
    token = os.getenv("TELEGRAM_BOT_TOKEN") # userdata.get('TELEGRAM_BOT_TOKEN')
    application = Application.builder().token(token).build()

    # Add command handler for /start command.
    application.add_handler(CommandHandler("start", start))
    # Add message handler for text messages.
    application.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, echo))

    # Run the bot until the user presses Ctrl-C
    application.run_polling()

if __name__ == "__main__":
    main()
