# Mastering RAG Chatbots: From Implementation to Deployment

## Course Objectives

In this comprehensive tutorial, you'll learn how to:
1. Understand Retrieval-Augmented Generation (RAG) architecture
2. Implement a RAG-based chatbot from scratch
3. Create a user-friendly web interface using Gradio
4. Deploy and share your AI-powered document chat application

### Prerequisites
- Python programming skills
- Basic understanding of machine learning
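- A local Ollama installation (used below for embeddings and chat) and, optionally, an API key for an OpenAI-compatible service such as DeepSeek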
- Google Colab environment

## 1. Environment Setup

Let's install the required libraries:

In [145]:
# import torch
# from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# model_name = "deepseek-ai/deepseek-llm-7b-chat"
# tokenizer = AutoTokenizer.from_pretrained(model_name)
# model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16) #device_map="auto")
# model.generation_config = GenerationConfig.from_pretrained(model_name)
# model.generation_config.pad_token_id = model.generation_config.eos_token_id

# messages = [
#     {"role": "user", "content": "Who are you?"}
# ]
# input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
# outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

# result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
# print(result)


In [146]:
# Install the dependencies (uncomment on first run).
# `from openai import OpenAI` needs openai>=1.0, so the older openai==0.28 pin is not used here.
# !pip install requests beautifulsoup4 PyPDF2 numpy gradio ollama openai -q

In [153]:
import os
import numpy as np
import requests
# from bs4 import BeautifulSoup
import PyPDF2
#import openai
import gradio as gr
import ollama
from openai import OpenAI

# Set OpenAI API key (read it from the environment; never hard-code a secret in a notebook)
# openai.api_key = os.environ["OPENAI_API_KEY"]

## 2. Content Extraction Modules

We'll create functions to extract content from different sources:

In [154]:
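# Create the DeepSeek client. The key is read from an environment variable
# (the name DEEPSEEK_API_KEY is an assumption; set it before running this cell).
deepseek_client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")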

In [155]:
# Extract text content from a website
from bs4 import BeautifulSoup  # imported here because the top-level import is commented out

def scrape_website(url):
    """Scrape text content from a website"""
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # treat HTTP error codes as failures
        soup = BeautifulSoup(response.text, 'html.parser')
        return ' '.join(soup.stripped_strings)
    except Exception as e:
        return f"Error scraping website: {str(e)}"

# Extract text content from a PDF
def extract_pdf_content(pdf_path):
    """Extract text from a PDF file"""
    try:
        with open(pdf_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            # Guard against pages that yield no text (for example scanned images)
            return ' '.join(page.extract_text() or '' for page in reader.pages)
    except Exception as e:
        return f"Error processing PDF: {str(e)}"

## 3. Text Processing and Embedding

Implement text chunking and embedding generation:

In [156]:
def split_into_chunks(text, chunk_size=500):
    """Split text into manageable chunks"""
    words = text.split()
    return [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

# def generate_embeddings(text):
#     """Generate embeddings using OpenAI"""
#     response = openai.Embedding.create(
#         model="text-embedding-ada-002",
#         input=text
#     )
#     return response['data'][0]['embedding']

def generate_embeddings(text):
    """Generate embeddings using Ollama"""
    response = ollama.embeddings(model="llama3.2:1b", prompt=text)
    return response['embedding']

def cosine_similarity(vec1, vec2):
    """Compute cosine similarity between two vectors"""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
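
To get a feel for what the two helpers above produce, here is a small dependency-free sketch. It restates the chunking logic and computes cosine similarity with the standard library instead of NumPy (an assumption made only so the snippet runs anywhere). The names `chunk_words` and `cosine` and the toy inputs are illustrative and not part of the notebook.

```python
import math

def chunk_words(text, chunk_size=500):
    """Same word-based chunking as split_into_chunks above."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def cosine(vec1, vec2):
    """Cosine similarity in pure Python; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm = math.sqrt(sum(a * a for a in vec1)) * math.sqrt(sum(b * b for b in vec2))
    return dot / norm if norm else 0.0

# A 1200-word toy document splits into chunks of 500, 500 and 200 words.
document = " ".join(f"word{i}" for i in range(1200))
chunk_lengths = [len(chunk.split()) for chunk in chunk_words(document)]
print(chunk_lengths)

# Vectors pointing the same way score (approximately) 1, orthogonal vectors score 0.
same_direction = cosine([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine([1.0, 0.0], [0.0, 3.0])
print(round(same_direction, 6), orthogonal)
```

The last chunk is shorter than `chunk_size`, and because the chunks do not overlap, a sentence that straddles a boundary ends up split between two chunks.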

## 4. RAG Chatbot Core Implementation

In [157]:
class RAGChatbot:
    def __init__(self):
        self.chunks_with_embeddings = None

    def load_from_url(self, url):
        """Load content from a website"""
        content = scrape_website(url)
        self._process_content(content)

    def load_from_pdf(self, pdf_path):
        """Load content from a PDF"""
        content = extract_pdf_content(pdf_path)
        self._process_content(content)

    def _process_content(self, content):
        """Process content into chunks and generate embeddings"""
        chunks = split_into_chunks(content)
        self.chunks_with_embeddings = [
            {"content": chunk, "embedding": generate_embeddings(chunk)}
            for chunk in chunks
        ]

    def find_relevant_chunk(self, query):
        """Find most relevant text chunk for a query"""
        query_embedding = generate_embeddings(query)
        similarities = [
            (chunk["content"], cosine_similarity(query_embedding, chunk["embedding"]))
            for chunk in self.chunks_with_embeddings
        ]
        return max(similarities, key=lambda x: x[1])[0]

    # def ask(self, query):
    #     """Generate response based on query and context"""
    #     if not self.chunks_with_embeddings:
    #         return "Please load content first using load_from_url or load_from_pdf"

        # relevant_chunk = self.find_relevant_chunk(query)
        # response = openai.ChatCompletion.create(
        #     #model="gpt-3.5-turbo",
        #     model="gpt-4o-mini",
        #     messages=[
        #         {"role": "system", "content": "You are a helpful assistant using context to answer questions."},
        #         {"role": "user", "content": f"Context: {relevant_chunk}\n\nQuery: {query}"}
        #     ],
        #     max_tokens=200
        # )
        # return response['choices'][0]['message']['content']

    def ask(self, query):
        """Generate response based on query and context using Ollama"""
        if not self.chunks_with_embeddings:
            return "Please load content first using load_from_url or load_from_pdf"

        relevant_chunk = self.find_relevant_chunk(query)
        
        response = ollama.chat(
            # model="llama3.2:1b",  # You can switch to "llama3" or another model
            model="deepseek-llm",
            messages=[
                {"role": "system", "content": "You are a GDPR expert using context to answer questions."},
                {"role": "user", "content": f"Context: {relevant_chunk}\n\nQuery: {query}"}
            ]
        )
        
        return response['message']['content']
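
`find_relevant_chunk` hands only the single best-scoring chunk to the model. One possible variation is to rank all chunks and keep the top k. The sketch below shows that ranking step in isolation, under stated assumptions: `toy_embedding` is a hypothetical word-count stand-in for `generate_embeddings` so the example runs without Ollama, and `top_k_chunks` is an illustrative helper, not a method of the class above.

```python
import math
from collections import Counter

def toy_embedding(text, vocabulary):
    """Hypothetical stand-in for generate_embeddings: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_chunks(query, chunks, vocabulary, k=2):
    """Return the k chunks most similar to the query, best first."""
    query_vec = toy_embedding(query, vocabulary)
    scored = [(cosine(query_vec, toy_embedding(chunk, vocabulary)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

vocabulary = ["consent", "data", "fine", "processor", "erasure"]
chunks = [
    "consent must be freely given and consent can be withdrawn",
    "the processor handles data on behalf of the controller",
    "a fine may be imposed after an investigation",
]

best = top_k_chunks("when is consent valid", chunks, vocabulary, k=1)
print(best)
```

With a real embedding model the scores would reflect meaning rather than exact word overlap, but the ranking logic stays the same. The selected chunks could then be joined into the `Context:` part of the prompt.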

## 5. Gradio Web Interface

In [158]:
class RAGChatbotInterface:
    def __init__(self):
        self.chatbot = RAGChatbot()
        self.chat_history = []

    def process_file(self, file):
        """Process uploaded PDF file"""
        try:
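            # Depending on the Gradio version, the upload arrives as a path string
            # or as an object with a .name attribute; handle both.
            path = file if isinstance(file, str) else file.name
            self.chatbot.load_from_pdf(path)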
            return "PDF successfully loaded! You can now ask questions."
        except Exception as e:
            return f"Error processing PDF: {str(e)}"

    def process_url(self, url):
        """Process website URL"""
        try:
            self.chatbot.load_from_url(url)
            return "Website content successfully loaded! You can now ask questions."
        except Exception as e:
            return f"Error processing URL: {str(e)}"

    def chat(self, message, history):
        """Process chat message and update history"""
        history = history or []
        try:
            response = self.chatbot.ask(message)
        except Exception as e:
            response = f"Error generating response: {str(e)}"
        history.append((message, response))
        # The first output is bound to the question textbox, so return "" to clear it
        return "", history

    def launch_interface(self, share=True):
        """Create and launch Gradio interface"""
        with gr.Blocks(title="RAG Chatbot") as interface:
            gr.Markdown("# 📚 RAG Chatbot: Learn from Any Document")

            with gr.Tab("PDF Input"):
                # type="filepath" makes Gradio pass the uploaded file's path to the callback
                pdf_upload = gr.File(label="Upload PDF", type="filepath", file_types=[".pdf"])
                pdf_status = gr.Textbox(label="PDF Status", interactive=False)
                pdf_upload.upload(fn=self.process_file, inputs=[pdf_upload], outputs=[pdf_status])

            with gr.Tab("URL Input"):
                url_input = gr.Textbox(label="Enter Website URL", placeholder="https://example.com")
                url_status = gr.Textbox(label="URL Status", interactive=False)
                url_button = gr.Button("Load Content")
                url_button.click(fn=self.process_url, inputs=[url_input], outputs=[url_status])

            chatbot = gr.Chatbot(label="Chat with Your Document", height=400)
            msg = gr.Textbox(label="Your Question", placeholder="Ask a question about the document...")
            clear = gr.Button("Clear Chat")

            msg.submit(fn=self.chat, inputs=[msg, chatbot], outputs=[msg, chatbot])
            clear.click(lambda: None, None, chatbot, queue=False)

        interface.launch(share=share)

## 6. Launching the RAG Chatbot Interface

In [159]:
# Create and launch the interface
rag_interface = RAGChatbotInterface()
rag_interface.launch_interface(share=True)


* Running on local URL:  http://127.0.0.1:7869

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.


## Learning Outcomes

By completing this tutorial, you will have learned:

1. **Content Extraction Techniques**
   - Web scraping with BeautifulSoup
   - PDF text extraction

2. **Text Processing**
   - Text chunking strategies
   - Semantic embedding generation

3. **RAG Architecture**
   - Context retrieval
   - Similarity-based chunk selection
   - Prompt engineering

4. **Web Interface Development**
   - Creating interactive UIs with Gradio
   - Handling file and URL inputs
   - Managing chat interactions

## Challenges and Extensions

1. Implement multi-document support
2. Add more sophisticated embedding techniques
3. Improve error handling and input validation
4. Create a more advanced prompt engineering strategy
5. Implement conversation memory
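
As a starting point for challenge 5, the sketch below shows one way to carry recent exchanges into the prompt. It assumes the history is a list of `(question, answer)` pairs, as appended by the `chat` method, and it produces the same role/content message format that `ask` passes to `ollama.chat`. The function name `build_messages` and the `max_turns` parameter are hypothetical.

```python
def build_messages(system_prompt, history, context, query, max_turns=3):
    """Assemble a chat message list that includes the most recent exchanges."""
    messages = [{"role": "system", "content": system_prompt}]
    # Keep only the last max_turns (question, answer) pairs to bound the prompt size.
    recent = history[-max_turns:] if max_turns > 0 else []
    for question, answer in recent:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    # The retrieved context and the new question go last, as in ask().
    messages.append({"role": "user", "content": f"Context: {context}\n\nQuery: {query}"})
    return messages

history = [("q1", "a1"), ("q2", "a2"), ("q3", "a3"), ("q4", "a4")]
messages = build_messages("You are a helpful assistant.", history, "some chunk", "q5", max_turns=2)
print(len(messages))
```

Capping the number of turns is a simple design choice that keeps the prompt short; summarizing older turns would be an alternative worth exploring.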

## Ethical Considerations

- Always respect copyright and terms of service
- Be mindful of privacy when processing documents
- Use AI responsibly and ethically