# RAG (Retrieval-Augmented Generation) Pipeline Notebook

This notebook demonstrates a complete RAG pipeline built with LlamaIndex, a Hugging Face embedding model, and an LLM served through OpenRouter, with a Streamlit chat front end.

---

## 1. Install Required Libraries
1. **llama-index**: 
   - Creates and queries document indexes for efficient searching and retrieval.
   - `pip install llama-index`

2. **llama-index-llms-openrouter**: 
   - Lets LlamaIndex use LLMs hosted on OpenRouter, such as the DeepSeek model used below.
   - `pip install llama-index-llms-openrouter`

3. **requests**: 
   - Simplifies HTTP requests to interact with web APIs.
   - `pip install requests`

4. **pymupdf**: 
   - Allows PDF manipulation and text/image extraction.
   - `pip install pymupdf`

5. **streamlit**: 
   - Builds interactive web apps for data visualization and ML.
   - `pip install streamlit`

6. **pickle**: 
   - Serializes and deserializes Python objects (e.g., models or data).
   - No installation needed: it is part of the Python standard library (running `pip install pickle` fails).



In [None]:
pip install llama-index llama-index-llms-openrouter requests pymupdf streamlit streamlit-feedback
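
After the install finishes (you may need to restart the kernel), you can confirm that the packages are importable. This sketch uses `importlib.util.find_spec`, which returns `None` for modules it cannot locate. The helper name `missing_modules` is hypothetical, and the import names are assumptions: PyMuPDF is imported as `fitz`, and the OpenRouter integration lives under `llama_index.llms.openrouter`.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # a parent package of a dotted name is missing
            found = False
        if not found:
            missing.append(name)
    return missing


# An empty list means every module was found
print(missing_modules(["llama_index.core", "llama_index.llms.openrouter", "requests", "fitz", "streamlit"]))
```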

## Code Explanation

1. **os.makedirs("educational_docs", exist_ok=True)**: 
   - Creates a directory called `educational_docs` if it doesn’t exist.

2. **pdf_sources**: 
   - A dictionary containing the filenames and corresponding URLs of OpenStax PDFs to download.

3. **requests.get(url)**: 
   - Downloads the PDF from the provided URL.

4. **PDF Validation**: 
   - Checks that the request succeeded and that the `%PDF` signature appears in the first 1024 bytes, so HTML error pages are not saved as PDFs.

5. **File Saving**: 
   - Saves the PDF content to the `educational_docs` folder.

6. **Completion**: 
   - Prints success or failure for each PDF and confirms when all documents are saved.
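
The validation in the cell below treats a download as a PDF if the `%PDF` signature appears in its first 1024 bytes (PDF files start with a header such as `%PDF-1.7`). Factored into a function, the check can be tested without any network access. The name `looks_like_pdf` is hypothetical; the logic mirrors the notebook's condition.

```python
def looks_like_pdf(content: bytes, window: int = 1024) -> bool:
    """Return True if the %PDF signature occurs within the first `window` bytes."""
    return b"%PDF" in content[:window]


print(looks_like_pdf(b"%PDF-1.7\n%binary data follows"))  # True
print(looks_like_pdf(b"<!DOCTYPE html><html><body>Not found</body></html>"))  # False
```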


In [None]:
import os
import requests

# Ensure directory exists
os.makedirs("educational_docs", exist_ok=True)

# ✅ Real OpenStax PDFs (tested)
pdf_sources = {  # put your PDF links here ("file_name.pdf": "URL")
    #"Principles_of_Data_Science.pdf": "https://assets.openstax.org/oscms-prodcms/media/documents/Principles-of-Data-Science-WEB.pdf",
    #"Introduction_to_Python_Programming.pdf": "https://assets.openstax.org/oscms-prodcms/media/documents/Introduction_to_Python_Programming_-_WEB.pdf",
    #"Introductory_Statistics.pdf": "https://assets.openstax.org/oscms-prodcms/media/documents/IntroductoryStatistics-OP_i6tAI7e.pdf",
}

# Download each PDF and save it
for filename, url in pdf_sources.items():
    pdf_path = os.path.join("educational_docs", filename)

    print(f"Downloading {filename}...")
    try:
        response = requests.get(url, timeout=60)  # don't hang forever on a stalled server
    except requests.RequestException as exc:
        print(f"❌ Request failed for {url}: {exc}")
        continue

    # Reject failed requests and non-PDF content (e.g., HTML error pages)
    if not response.ok or b"%PDF" not in response.content[:1024]:
        print(f"❌ Failed to download valid PDF from: {url}")
        continue

    with open(pdf_path, "wb") as f:
        f.write(response.content)

    print(f"✅ Saved {filename}")

print("✅ Done: finished processing all PDF sources.")


## Code Explanation

1. **OpenRouter from llama_index.llms.openrouter**: 
   - Imports the `OpenRouter` class to interact with OpenRouter's API for language model access.

2. **OpenRouter Initialization**: 
   - Initializes an `OpenRouter` instance with the following parameters:
     - **api_key**: Your OpenRouter API key for authentication. The model used here is listed at https://openrouter.ai/deepseek/deepseek-chat:free.
     - **max_tokens**: Limits the length of each generated response to 256 tokens.
     - **context_window**: Tells LlamaIndex how many tokens (4096) it may fit into each prompt, including the retrieved text. It does not change the model itself.
     - **model**: Specifies the model used, in this case, "deepseek/deepseek-chat:free".
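
To avoid pasting the key directly into the notebook, one option is to read it from an environment variable. This is a sketch under assumptions: the variable name `OPENROUTER_API_KEY` and the helper `get_openrouter_key` are hypothetical choices, not something the notebook or OpenRouter requires.

```python
import os


def get_openrouter_key(var_name: str = "OPENROUTER_API_KEY") -> str:
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your OpenRouter API key.")
    return key


# Usage: OpenRouter(api_key=get_openrouter_key(), max_tokens=256, ...)
```

Keeping the key out of the notebook also keeps it out of shared copies and version control.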

In [None]:
from llama_index.llms.openrouter import OpenRouter


llm = OpenRouter(
    api_key="<YOUR_OPENROUTER_API_KEY_HERE>",
    max_tokens=256,
    context_window=4096,
    model="deepseek/deepseek-chat:free",
)

## Explanation of Libraries

1. **llama-index-embeddings-huggingface**: 
   - Integrates Hugging Face models with LlamaIndex for using embeddings in retrieval-augmented generation (RAG) systems.
   - `pip install llama-index-embeddings-huggingface`

2. **llama-index-embeddings-instructor**: 
   - Adds LlamaIndex support for INSTRUCTOR embedding models, which take a task instruction alongside the text. The code in this notebook does not use it.
   - `pip install llama-index-embeddings-instructor`


In [None]:
pip install llama-index-embeddings-huggingface llama-index-embeddings-instructor

## Code Explanation

1. **HuggingFaceEmbedding from llama_index.embeddings.huggingface**: 
   - Imports the `HuggingFaceEmbedding` class to use embedding models from Hugging Face for LlamaIndex.

2. **Embed Model Initialization**:
   - Initializes the embedding model with the specified model name:
     - The first option (commented out) loads the **BAAI/bge-small-en** model.
     - The second (active) loads the **Alibaba-NLP/gte-Qwen2-7B-instruct** model.

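3. **Choosing an Embedding Model**:
   - The embedding model is independent of the LLM: it only turns document chunks and queries into vectors for retrieval, and DeepSeek receives the retrieved text, never the vectors. Any sentence-embedding model that `HuggingFaceEmbedding` can load will therefore work. Some options:
     - **BAAI/bge-small-en**
     - **BAAI/bge-small-en-v1.5** (small and fast)
     - **Alibaba-NLP/gte-Qwen2-7B-instruct** (used here; a 7-billion-parameter model, so it needs a lot of memory and is slow without a GPU)
     - **sentence-transformers/all-MiniLM-L6-v2**
   - Some names are poor fits for this setup:
     - **distilbert-base-nli-stsb-mean-tokens** (`sentence-transformers/distilbert-base-nli-stsb-mean-tokens`) is marked deprecated by its authors because its embeddings are low quality.
     - **facebook/dpr-ctx_encoder-single-nq-base** is a DPR passage encoder meant to be paired with a separate question encoder, while `HuggingFaceEmbedding` uses one model for both.
     - **openai/embeddings** is not a Hugging Face model; OpenAI embeddings require the paid OpenAI API.
     - **google/electra-base-discriminator** is a pretraining checkpoint, not a model trained to produce sentence embeddings.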
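
To make "retrieval" concrete: the embedding model maps each chunk and the query to vectors, and the vector index returns the chunks whose vectors are most similar to the query's (LlamaIndex's default in-memory vector store ranks by cosine similarity). The pure-Python sketch below uses made-up 3-dimensional vectors; real models produce hundreds or thousands of dimensions, and the chunk texts and vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query vector."""
    ranked = sorted(chunk_vecs, key=lambda text: cosine_similarity(query_vec, chunk_vecs[text]), reverse=True)
    return ranked[:k]


# Hypothetical toy embeddings, not produced by a real model
chunks = {
    "A p-value measures evidence against the null hypothesis.": [0.9, 0.1, 0.0],
    "Python lists are mutable sequences.": [0.0, 0.2, 0.9],
    "A confidence interval gives a range of plausible values.": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
print(top_k(query, chunks))  # the two statistics chunks; the Python chunk ranks last
```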


In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# loads BAAI/bge-small-en
# embed_model = HuggingFaceEmbedding()

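# loads Alibaba-NLP/gte-Qwen2-7B-instruct (7B parameters: needs lots of RAM/VRAM)
# lighter alternative: HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")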
embed_model = HuggingFaceEmbedding(model_name="Alibaba-NLP/gte-Qwen2-7B-instruct")

## Code Explanation

1. **HuggingFaceEmbedding from llama_index.embeddings.huggingface**: 
   - Imports the `HuggingFaceEmbedding` class to load embedding models from Hugging Face for LlamaIndex.

2. **Settings from llama_index.core**: 
   - Imports the `Settings` class to configure the LlamaIndex settings, such as the language model (LLM) and embedding model.

3. **Environment Variable (`TOKENIZERS_PARALLELISM`)**: 
   - Disables parallelism in the Hugging Face `tokenizers` library, which otherwise prints a warning (and disables parallelism itself to avoid deadlocks) when the process forks after a tokenizer has been used.

4. **Embedding Model Setup**: 
   - Sets `embed_model` (the Hugging Face model loaded in the previous cell, which runs locally on your machine) as the embedding model for every index and query engine.

5. **LLM Setup**: 
   - Sets `llm` (the free DeepSeek model served through OpenRouter) as the default LLM, replacing LlamaIndex's default OpenAI model.

6. **Default Behavior of LlamaIndex**:
   - By default, LlamaIndex uses OpenAI models for both generation and embeddings, which requires a paid OpenAI API key. Here the LLM comes from OpenRouter's free tier and the embeddings are computed locally with Hugging Face, so no OpenAI key is needed.

7. **Why the Change**: 
   - The LLM and the embedding model are independent: the embedding model only turns text into vectors for retrieval, and DeepSeek only reads the retrieved text. Swapping in DeepSeek alone would leave LlamaIndex calling OpenAI's embedding API, so a Hugging Face embedding model (`Alibaba-NLP/gte-Qwen2-7B-instruct`) replaces that default as well.


In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
Settings.embed_model = embed_model
Settings.llm = llm

## Code Explanation

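1. **SimpleDirectoryReader**:  
   - Loads all documents (PDFs) from the `educational_docs` folder. It raises an error if the folder is missing or contains no files, so make sure at least one PDF has been downloaded or copied there first.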

2. **VectorStoreIndex.from_documents**:  
   - Creates a vector index from the loaded documents, allowing for efficient semantic search and retrieval.

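3. **⚠️ Note on Performance**:  
   - This step can take a **long time** if your machine has **limited computing power**, especially with **large or multiple PDFs**, because every text chunk must be embedded. A large embedding model such as `gte-Qwen2-7B-instruct` makes this much slower without a GPU. Be patient during indexing.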
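
If every entry in `pdf_sources` is still commented out, the folder stays empty and indexing fails. A small pre-check gives a clearer message before the slow indexing step starts. The helper name `find_pdfs` is hypothetical, and it only looks at files ending in lowercase `.pdf`.

```python
from pathlib import Path


def find_pdfs(folder: str = "educational_docs") -> list[Path]:
    """Return the PDFs in `folder`, or raise a clear error if there are none."""
    pdfs = sorted(p for p in Path(folder).glob("*.pdf") if p.is_file())
    if not pdfs:
        raise FileNotFoundError(
            f"No PDFs found in '{folder}'. Add URLs to pdf_sources or copy PDFs there first."
        )
    return pdfs


# Usage: call find_pdfs() before SimpleDirectoryReader("educational_docs").load_data()
```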


In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
#from llama_index.embeddings.huggingface import HuggingFaceEmbedding
#from llama_index.core import Settings

# Load documents
documents = SimpleDirectoryReader("educational_docs").load_data()

# Create the index
index = VectorStoreIndex.from_documents(documents)


## Code Explanation

1. **index.as_query_engine()**:  
   - Converts the index into a query engine that can understand and answer natural language questions.

2. **query_engine.query(...)**:  
   - Sends a prompt or question to the query engine and retrieves a response based on the indexed documents.

3. **print(response)**:  
   - Displays the model’s answer in the notebook output.
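
The response also records which chunks were retrieved. In LlamaIndex, `response.source_nodes` is a list of `NodeWithScore` objects, each with a `score` and a `node` that has `get_content()` and a `metadata` dict (for PDFs loaded with `SimpleDirectoryReader`, this typically includes `file_name` and `page_label`). The helper below is a hypothetical sketch for checking where an answer came from; it falls back to `?` when a metadata key is absent.

```python
def describe_sources(source_nodes, preview_chars: int = 80) -> list[str]:
    """Summarize retrieved chunks as 'file (p. page) score: preview' lines."""
    lines = []
    for item in source_nodes:
        meta = item.node.metadata
        preview = item.node.get_content()[:preview_chars].replace("\n", " ")
        score = f"{item.score:.3f}" if item.score is not None else "n/a"
        lines.append(f"{meta.get('file_name', '?')} (p. {meta.get('page_label', '?')}) {score}: {preview}")
    return lines


# Usage after running a query:
# for line in describe_sources(response.source_nodes):
#     print(line)
```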


In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("<put your prompt here>") #to test if the model works
print(response)


## Code Explanation

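1. **pickle**:  
   - A Python standard-library module used to serialize (save) and deserialize Python objects.

2. **Saving the Query Engine**:  
   - `pickle.dump(query_engine, f)` saves the `query_engine` object to a file named `rag_model.pkl`.  
   - This allows you to reload and reuse the RAG model later without rebuilding the index.
   - ⚠️ The pickle contains everything the query engine references, which likely includes the LLM client (with your API key) and the embedding model, so treat the file as a secret and only load pickles you created yourself. Pickling such objects may also fail or break when library versions change.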
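
As an alternative to pickling, LlamaIndex can persist an index to a folder with `index.storage_context.persist(persist_dir=...)` and rebuild it with `load_index_from_storage(StorageContext.from_defaults(persist_dir=...))`. The sketch below wraps those calls; the helper names and the `storage` folder name are hypothetical. When loading, `Settings.embed_model` and `Settings.llm` must be configured again first, because the loaded index uses the current `Settings` to embed queries and generate answers.

```python
from pathlib import Path


def save_index(index, persist_dir: str = "storage") -> Path:
    """Write the index's docstore, vector store and metadata to `persist_dir`."""
    index.storage_context.persist(persist_dir=persist_dir)
    return Path(persist_dir)


def load_index(persist_dir: str = "storage"):
    """Rebuild a saved index; configure Settings.embed_model and Settings.llm first."""
    # Imported lazily: only needed when loading
    from llama_index.core import StorageContext, load_index_from_storage

    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    return load_index_from_storage(storage_context)


# Usage:
# save_index(index)
# query_engine = load_index().as_query_engine()
```

Unlike the pickle, the saved folder holds only the documents, embeddings and index metadata, not your API key or model weights.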


In [None]:
import pickle
with open('rag_model.pkl', 'wb') as f:
    pickle.dump(query_engine, f)

## Streamlit Chatbot App

This app uses Streamlit to interact with a precomputed query engine. The chatbot loads the pickled query engine (`rag_model.pkl`) and appends user feedback to a CSV file.

### Features:
- **Caching**: The query engine is loaded once and cached with `st.cache_resource`, so it is not reloaded on every rerun.
- **Chat History**: Displays the conversation between the user and assistant.
- **Feedback**: Users can provide feedback on the assistant's response, stored in `feedback_log.csv`.
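
One caveat on caching: cache decorators memoize a function by its arguments, so they suit loaders such as `load_query_engine`, but not functions whose job is a side effect. A cached function that appends a row to a CSV is skipped when called again with the same arguments; `st.cache_data` behaves this way. The standard-library `functools.lru_cache` shows the same effect without Streamlit. The `log_row` functions are hypothetical stand-ins for `store_feedback`.

```python
from functools import lru_cache

rows: list[tuple[str, str]] = []


def log_row(question: str, feedback: str) -> None:
    """Side-effect function: appends a row every time it is called."""
    rows.append((question, feedback))


@lru_cache(maxsize=100)
def cached_log_row(question: str, feedback: str) -> None:
    """Same side effect, but memoized: repeat calls with equal arguments are skipped."""
    rows.append((question, feedback))


log_row("What is a p-value?", "👍 Good")
log_row("What is a p-value?", "👍 Good")
cached_log_row("What is a p-value?", "👎 Bad")
cached_log_row("What is a p-value?", "👎 Bad")
print(len(rows))  # 3: both uncached calls wrote, but only the first cached call did
```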

### Key Functions:
- `load_query_engine()`: Loads the pickled model for querying.
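- `store_feedback()`: Appends the question, the assistant's answer, and the rating to `feedback_log.csv`, writing a header row when the file is first created.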
- `init_state()`: Initializes the session state.
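
The feedback log has the columns `User Question`, `Assistant Response` and `Feedback`. A sketch for tallying the ratings afterward follows; `summarize_feedback` is a hypothetical helper that reads the CSV written by the app.

```python
import csv
from collections import Counter


def summarize_feedback(path: str = "feedback_log.csv") -> Counter:
    """Count how many times each feedback label appears in the log."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row["Feedback"] for row in csv.DictReader(f))


# Usage: print(summarize_feedback())
```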

### How It Works:
1. User inputs a prompt.
2. The chatbot responds using the precomputed query engine.
3. Feedback is collected after each response.

In [None]:
%%writefile app.py
import streamlit as st
import pickle
import os
import csv

# Cache the model load
@st.cache_resource
def load_query_engine():
    with open('rag_model.pkl', 'rb') as f:
        return pickle.load(f)

# Append one feedback row to the CSV (not cached: it must run on every call)
def store_feedback(question: str, response: str, feedback: str):
    file_exists = os.path.exists("feedback_log.csv")
    with open("feedback_log.csv", "a", newline='', encoding='utf-8') as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        if not file_exists:
            writer.writerow(["User Question", "Assistant Response", "Feedback"])
        writer.writerow([question, response, feedback])

# Initialize app state
def init_state():
    if "messages" not in st.session_state:
        st.session_state.messages = [{"role": "assistant", "content": "How can I help you?"}]
    if "feedback_submitted" not in st.session_state:
        st.session_state.feedback_submitted = False

# Load components
init_state()
query_engine = load_query_engine()

# UI Elements (static parts)
with st.sidebar:
    st.write("Query Engine Chatbot")
    st.markdown("[View the source code](https://github.com/streamlit/llm-examples/blob/main/Chatbot.py)")

st.title("💬 Query Engine Chatbot")
st.caption("🚀 A Streamlit chatbot powered by a precomputed index")

# Display chat history
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

# Chat input
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
    
    with st.spinner("Thinking..."):
        response = query_engine.query(prompt)
        msg = str(response)
        
    st.session_state.messages.append({"role": "assistant", "content": msg})
    st.chat_message("assistant").write(msg)
    st.session_state.feedback_submitted = False

# Feedback handling
if (not st.session_state.feedback_submitted and 
    len(st.session_state.messages) > 1 and 
    st.session_state.messages[-1]["role"] == "assistant"):
    
    feedback = st.radio(
        "How was the answer?",
        ["👍 Good", "👎 Bad"],
        key="feedback_radio",
        index=None
    )
    
    if feedback:
        store_feedback(
            st.session_state.messages[-2]["content"],
            st.session_state.messages[-1]["content"],
            feedback
        )
        st.session_state.feedback_submitted = True
        st.rerun()


## 🚀 Running the Streamlit App

To launch your chatbot app from `app.py`, run the following:

```python
!streamlit run app.py
```

* This starts the Streamlit server and opens the chatbot interface.
* On platforms like **Google Colab** or **Lightning AI**, the server runs on a remote machine, so the local URL (`http://localhost:8501`) is not reachable from your browser. You can fix this using `ngrok` (see below).

---

### ☁️ On Cloud-Based Platforms (e.g., Colab, Lightning AI)

To expose the Streamlit app to the web using `ngrok`:

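```python
!pip install pyngrok
from pyngrok import ngrok

# ngrok requires a (free) account: paste the authtoken from your ngrok dashboard
ngrok.set_auth_token("<YOUR_NGROK_AUTHTOKEN>")

tunnel = ngrok.connect(8501)  # 8501 is Streamlit's default port
print(tunnel.public_url)

!streamlit run app.py
```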

* `ngrok` will generate a public link (e.g., `https://xxxx.ngrok.io`) that opens your app from any browser while the cell keeps running.

---

### 🖥️ Run Locally on Your Machine

1. Open a terminal or command prompt.
2. Navigate to the directory containing `app.py`.
3. Run:

```bash
streamlit run app.py
```

4. The app will open at `http://localhost:8501` in your browser.

---

### 🛑 Stopping the App

* Press **Ctrl+C** in your terminal to stop the Streamlit server.
* In Colab or other notebook environments, interrupt (stop) the running cell to terminate the server.



In [None]:
!streamlit run app.py