<a href="https://colab.research.google.com/github/NormLorenz/ai-llm-gradio-rag/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Google Colab Jupyter Version
Build a RAG pipeline from a file!

# Documentation
https://docs.google.com/document/d/1iRPcqsYZj0Jmd6QqI6UoT2mVbs3BjGkpIpGCI5SFQCM/edit?tab=t.0

In [1]:
# Install dependencies
!uv pip install gradio openai pinecone langchain langchain-openai langchain_pinecone langchain_community langchain_classic

Using Python 3.12.12 environment at: /usr
Resolved 101 packages in 1.60s
Prepared 16 packages in 688ms
Uninstalled 3 packages in 42ms
Installed 16 packages in 104ms
 + aiohttp-retry==2.9.1
 + dataclasses-json==0.6.7
 + langchain-classic==1.0.0
 + langchain-community==0.4.1
 + langchain-openai==1.1.0
 + langchain-pinecone==0.2.13
 + langchain-text-splitters==1.0.0
 + marshmallow==3.26.1
 + mypy-extensions==1.1.0
 - numpy==2.0.2
 + numpy==2.3.5
 - packaging==25.0
 + packaging==24.2
 + pinecone==7.3.0
 + ...

# Where is the `RetrievalQA` import located in various versions of LangChain?

**Direct Answer:**  
The `RetrievalQA` class has been importable from different places across LangChain versions. In **early releases (0.0.x – 0.1.16)** it was imported from `langchain.chains` (the class itself is defined in `langchain.chains.retrieval_qa.base`). Starting with **0.1.17** it was marked **deprecated** in favor of `create_retrieval_chain`, although the same import kept working through the 0.3.x releases. In **LangChain 1.0+** the `langchain.chains` module no longer exists in the `langchain` package; the legacy chains, including `RetrievalQA` and `create_retrieval_chain`, now ship in the separate `langchain-classic` package and are imported from `langchain_classic.chains`.

---

### 📜 Version-by-Version Breakdown

| LangChain Version | Import Path | Status |
|-------------------|-------------|--------|
| **0.0.x – 0.1.16** | `from langchain.chains import RetrievalQA` | ✅ Available |
| **0.1.17 – 0.3.x** | `from langchain.chains import RetrievalQA` | ⚠️ Deprecated (use `create_retrieval_chain`) |
| **≥ 1.0.0** | `from langchain_classic.chains import RetrievalQA` | ⚠️ Legacy; requires the `langchain-classic` package |

---

### ⚠️ Why This Matters
- If you’re on **LangChain 1.0+**, any `from langchain.chains import ...` raises `ModuleNotFoundError: No module named 'langchain.chains'`, for `RetrievalQA` and `create_retrieval_chain` alike. Install `langchain-classic` and import from `langchain_classic.chains` instead.
- If you’re on **LangChain 0.1.17–0.3.x**, the class exists but you’ll see deprecation warnings.
- If you’re on **older versions (<0.1.17)**, the original import works without warnings.
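
When a notebook has to run against more than one LangChain release, one option is to probe the known import locations in order instead of hard-coding a single path. The sketch below is only an illustration: the helper name `load_first_available` and the `RETRIEVAL_QA_LOCATIONS` list are hypothetical, and the two module paths are the ones discussed here, not an exhaustive list.

```python
import importlib
from typing import Any, Iterable, Optional, Tuple

# Candidate (module, attribute) pairs, newest location first (assumed list).
RETRIEVAL_QA_LOCATIONS = [
    ("langchain_classic.chains", "RetrievalQA"),  # LangChain 1.x with langchain-classic
    ("langchain.chains", "RetrievalQA"),          # LangChain 0.x
]


def load_first_available(candidates: Iterable[Tuple[str, str]]) -> Optional[Any]:
    """Return the first attribute that can be imported, or None if none can."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError):
            # ModuleNotFoundError is a subclass of ImportError.
            continue
    return None


RetrievalQA = load_first_available(RETRIEVAL_QA_LOCATIONS)
if RetrievalQA is None:
    print("RetrievalQA not found; install langchain-classic or an older langchain")
```

The function returns `None` rather than raising, so the caller decides whether a missing class is fatal.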

---

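### ✅ Migration Example (Modern Code)
Instead of `RetrievalQA`, use `create_retrieval_chain`. On LangChain 1.x it is imported from `langchain_classic`; on 0.1.17–0.3.x the same names come from `langchain.chains`:

```python
# LangChain 1.x: the chain helpers live in the langchain-classic package.
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

retriever = ...  # your retriever
llm = ChatOpenAI()

# The prompt must contain a {context} placeholder; the stuff-documents chain
# fills it with the retrieved documents.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Use the given context to answer the question concisely.\n\n{context}"),
    ("human", "{input}"),
])

qa_chain = create_retrieval_chain(retriever, create_stuff_documents_chain(llm, prompt))

# Invoke with {"input": ...}; the answer is returned under the "answer" key.
# result = qa_chain.invoke({"input": "What is this document about?"})
```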
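
The two chain styles also return differently shaped dictionaries: `RetrievalQA` answers under `result` (with `source_documents` when requested), while `create_retrieval_chain` answers under `answer` with the retrieved documents under `context`. A small adapter, sketched here with the hypothetical name `normalize_rag_result`, lets downstream formatting code stay the same during a migration. The sample dictionaries are made-up stand-ins for real chain output.

```python
from typing import Any, Dict


def normalize_rag_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return {"answer": ..., "sources": [...]} for either chain's output shape."""
    if "answer" in result:
        # create_retrieval_chain output: {"input", "context", "answer"}
        return {"answer": result["answer"], "sources": list(result.get("context", []))}
    if "result" in result:
        # RetrievalQA output: {"query", "result", "source_documents"}
        return {"answer": result["result"], "sources": list(result.get("source_documents", []))}
    raise KeyError("unrecognized chain output: expected an 'answer' or 'result' key")


# Made-up sample outputs; real chains return Document objects as sources.
legacy = {"query": "q", "result": "42", "source_documents": ["doc-a"]}
modern = {"input": "q", "context": ["doc-a"], "answer": "42"}
assert normalize_rag_result(legacy) == normalize_rag_result(modern)
```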


In [2]:
!pip show langchain

Name: langchain
Version: 1.1.0
Summary: Building applications with LLMs through composability
Home-page: https://docs.langchain.com/
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.12/dist-packages
Requires: langchain-core, langgraph, pydantic
Required-by: 


In [6]:
# Declare imports
import gradio as gr
import os
from pinecone import Pinecone, ServerlessSpec
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
# On LangChain 1.x, `langchain.chains` no longer exists (ModuleNotFoundError);
# the legacy chains, including RetrievalQA, live in the langchain-classic package.
from langchain_classic.chains import RetrievalQA
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain_community.document_loaders import PyPDFLoader, TextLoader
import tempfile
from google.colab import userdata

In [None]:
# Set API keys
openai_key = userdata.get("OPENAI_API_KEY")
pinecone_key = userdata.get("PINECONE_API_KEY")
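
`google.colab.userdata` is only importable inside Colab, so the cell above fails when the notebook runs elsewhere. A possible workaround, sketched under the assumption that the same secret names are also exported as environment variables, is a lookup helper that tries Colab first and falls back to `os.environ`. The name `get_secret` is hypothetical.

```python
import os
from typing import Optional


def get_secret(name: str) -> Optional[str]:
    """Look up a secret in Colab's store first, then in environment variables."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        value = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or access was not granted.
        value = None
    return value or os.environ.get(name)
```

With this helper, the assignments become `openai_key = get_secret("OPENAI_API_KEY")` and `pinecone_key = get_secret("PINECONE_API_KEY")`.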

In [None]:
# Set Pinecone index name
PINECONE_INDEX_NAME = "rag-qa-index"

In [None]:
# Define RAG Pipeline class
class RAGPipeline:
    """Class to handle RAG pipeline operations"""

    def __init__(self):
        """Initialize RAG pipeline components"""
        self.embeddings = None
        self.vectorstore = None
        self.qa_chain = None
        self.pc = None
        self.index = None

    def initialize_pinecone(self, api_key):
        """Initialize Pinecone client and create/connect to index"""
        try:
            self.pc = Pinecone(api_key=api_key)

            # Check if index exists, if not create it
            existing_indexes = [index.name for index in self.pc.list_indexes()]

            if PINECONE_INDEX_NAME not in existing_indexes:
                self.pc.create_index(
                    name=PINECONE_INDEX_NAME,
                    dimension=1536,  # OpenAI embeddings dimension
                    metric='cosine',
                    spec=ServerlessSpec(
                        cloud='aws',
                        region='us-east-1'
                    )
                )

            self.index = self.pc.Index(PINECONE_INDEX_NAME)
            return "✓ Pinecone initialized successfully"
        except Exception as e:
            return f"✗ Pinecone initialization failed: {str(e)}"

    def load_existing_vectorstore(self):
        """Load existing vector store from Pinecone if data exists"""
        try:
            # Initialize Pinecone connection
            pinecone_status = self.initialize_pinecone(pinecone_key)
            if "failed" in pinecone_status:
                return False, "✗ Could not connect to Pinecone"

            # Check if index has data
            if self.index.describe_index_stats().total_vector_count == 0:
                return False, "✗ Vector database is empty. Please process a document first."

            # Initialize embeddings
            self.embeddings = OpenAIEmbeddings(openai_api_key=openai_key)

            # Load existing vector store
            self.vectorstore = PineconeVectorStore(
                index=self.index,
                embedding=self.embeddings
            )

            # Initialize QA chain
            llm = ChatOpenAI(
                model_name="gpt-4",
                temperature=0,
                openai_api_key=openai_key
            )

            self.qa_chain = RetrievalQA.from_chain_type(
                llm=llm,
                chain_type="stuff",
                retriever=self.vectorstore.as_retriever(
                    search_kwargs={"k": 3}
                ),
                return_source_documents=True
            )

            vector_count = self.index.describe_index_stats().total_vector_count
            return True, f"✓ Loaded existing vector database!\n- Vectors in index: {vector_count}\n- Ready for questions!"

        except Exception as e:
            return False, f"✗ Error loading vector database: {str(e)}"

    def process_document(self, file, chunk_size, chunk_overlap):
        """Process uploaded document and store in Pinecone"""
        if file is None:
            return "✗ No file uploaded. Please upload PDF or TXT file."

        tmp_path = None
        try:
            # Initialize APIs. from_documents() below is given only an index
            # name, so the Pinecone key must also be available in the environment.
            os.environ["OPENAI_API_KEY"] = openai_key
            os.environ["PINECONE_API_KEY"] = pinecone_key
            pinecone_status = self.initialize_pinecone(pinecone_key)

            if "failed" in pinecone_status:
                return pinecone_status

            # Choose loader based on file type (case-insensitive extension)
            suffix = os.path.splitext(file.name)[1].lower()
            if suffix == '.pdf':
                loader_cls = PyPDFLoader
            elif suffix == '.txt':
                loader_cls = TextLoader
            else:
                return "✗ Unsupported file format. Please upload PDF or TXT file."

            # Save uploaded file temporarily
            with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp_file:
                tmp_path = tmp_file.name
                if hasattr(file, 'read'):
                    tmp_file.write(file.read())
                else:
                    with open(file.name, 'rb') as src:
                        tmp_file.write(src.read())

            documents = loader_cls(tmp_path).load()

            # Split documents into chunks
            text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=int(chunk_size),
                chunk_overlap=int(chunk_overlap),
                length_function=len
            )
            chunks = text_splitter.split_documents(documents)

            # Initialize embeddings
            self.embeddings = OpenAIEmbeddings(openai_api_key=openai_key)

            # Create vector store
            self.vectorstore = PineconeVectorStore.from_documents(
                documents=chunks,
                embedding=self.embeddings,
                index_name=PINECONE_INDEX_NAME
            )

            # Initialize QA chain
            llm = ChatOpenAI(
                model_name="gpt-4",
                temperature=0,
                openai_api_key=openai_key
            )

            self.qa_chain = RetrievalQA.from_chain_type(
                llm=llm,
                chain_type="stuff",
                retriever=self.vectorstore.as_retriever(
                    search_kwargs={"k": 3}
                ),
                return_source_documents=True
            )

            return f"✓ Document processed successfully!\n- File: {file.name}\n- Chunks created: {len(chunks)}\n- Ready for questions!"

        except Exception as e:
            return f"✗ Error processing document: {str(e)}"
        finally:
            # Clean up temporary file, even if processing failed
            if tmp_path and os.path.exists(tmp_path):
                os.unlink(tmp_path)

    def answer_question(self, question):
        """Answer question using RAG pipeline"""
        if not self.qa_chain:
            return "⚠ Please upload and process a document first!"

        if not question.strip():
            return "⚠ Please enter a question!"

        try:
            result = self.qa_chain.invoke({"query": question})

            answer = result['result']
            sources = result.get('source_documents', [])

            # Format response with sources
            response = f"**Answer:**\n{answer}\n\n"

            if sources:
                response += "**Sources:**\n"
                for i, doc in enumerate(sources[:3], 1):
                    content_preview = doc.page_content[:200] + "..." if len(
                        doc.page_content) > 200 else doc.page_content
                    response += f"\n{i}. {content_preview}\n"

            return response

        except Exception as e:
            return f"✗ Error answering question: {str(e)}"
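
The `chunk_size` and `chunk_overlap` arguments of `process_document` control how much text each stored vector covers and how much neighbouring chunks share. The sketch below is a deliberately simplified fixed-window splitter, not the algorithm used by `RecursiveCharacterTextSplitter` (which also splits on separators such as paragraph breaks). The function name `sliding_chunks` is hypothetical and only illustrates how the two numbers interact.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into fixed-size windows; each window repeats the last
    `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The guard on `chunk_overlap` matters for this notebook: the sliders below allow a chunk size as low as 100 together with an overlap as high as 500, a combination that cannot produce forward-moving windows and that surfaces as an error message in the status box.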

In [None]:
# Initialize pipeline
pipeline = RAGPipeline()

In [None]:
# Create Gradio interface
with gr.Blocks(title="RAG Q&A Pipeline") as demo:
    gr.Markdown("# 📚 Retrieval Augmented Generation (RAG)")
    gr.Markdown(
        "Upload a document (PDF or TXT) and ask questions about its content using AI-powered retrieval.")

    with gr.Row():
        with gr.Column(scale=1):

            gr.Markdown("### 📄 Document Upload")
            file_input = gr.File(file_types=[".pdf", ".txt"])

            with gr.Accordion("⚙️ Advanced Settings", open=False):
                chunk_size = gr.Slider(
                    minimum=100,
                    maximum=2000,
                    value=1000,
                    step=100,
                    label="Chunk Size"
                )
                chunk_overlap = gr.Slider(
                    minimum=0,
                    maximum=500,
                    value=200,
                    step=50,
                    label="Chunk Overlap"
                )

            with gr.Row():
                process_btn = gr.Button(
                    "🚀 Process Document", variant="primary")
                load_existing_btn = gr.Button(
                    "📚 Load Existing Data", variant="secondary")

            status_output = gr.Textbox(
                label="Status",
                lines=5,
                interactive=False
            )

        with gr.Column(scale=1):
            gr.Markdown("### 💬 Ask Questions")
            question_input = gr.Textbox(
                label="Your Question",
                placeholder="Ask anything about the uploaded document...",
                lines=3
            )
            ask_btn = gr.Button("🔍 Get Answer", variant="primary")
            answer_output = gr.Markdown(label="Answer")

            gr.Markdown("### 📝 Example Questions")
            gr.Examples(
                examples=[
                    ["What is the main topic of this document?"],
                    ["Can you summarize the key points?"],
                    ["What are the main conclusions?"],
                ],
                inputs=question_input
            )

    # Event handlers
    process_btn.click(
        fn=pipeline.process_document,
        inputs=[file_input, chunk_size, chunk_overlap],
        outputs=status_output
    )

    load_existing_btn.click(
        fn=lambda: pipeline.load_existing_vectorstore()[1],
        inputs=[],
        outputs=status_output
    )

    ask_btn.click(
        fn=pipeline.answer_question,
        inputs=question_input,
        outputs=answer_output
    )

    question_input.submit(
        fn=pipeline.answer_question,
        inputs=question_input,
        outputs=answer_output
    )

In [None]:
# Launch the app
if __name__ == "__main__":
    demo.launch(share=False, inbrowser=True)
