<a href="https://colab.research.google.com/github/adsmundra/GenAI/blob/main/LLM_Hands_on_Workshop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚀 AI Hubs by Kubesimplify X Welzin | Chandigarh Edition  
**Beginner's Guide to Building RAG Pipelines with LLMs**  
*(Hands-On Workshop | 2 Hours | Bring Your Laptop)*  

https://konfhub.com/ai-hubs-meetup-chandigarh-edition


The best time to start building with AI was yesterday. The next best is NOW!

*Let's turn your LLM ideas into reality!*  

## 🌟 **Workshop Highlights**
- ✅ **No prior AI experience needed** - Perfect for first-timers!  
- ✅ **Deploy a local LLM** with Ollama (no GPU/cloud required)  
- ✅ **Build a custom Q&A bot** that understands *your* documents  
- ✅ **Take home** fully functional code and Colab notebooks  

---

## 🛠️ **What You'll Build**  
| Project               | Tools Used            | Outcome                                  |
|-----------------------|-----------------------|------------------------------------------|
| Local Text Summarizer | Ollama, LangChain     | Summarize PDFs/websites offline          |
| Document Q&A Bot      | ChromaDB, DeepSeek    | Ask questions about custom datasets      |
| Hybrid RAG Pipeline   | SentenceTransformers  | Combine keyword + vector search          |
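
The "Hybrid RAG Pipeline" row combines keyword search with vector search. One widely used way to merge two ranked result lists is reciprocal rank fusion (RRF): every document gets a score of 1/(k + rank) from each list, and those scores are summed. The sketch below assumes each retriever has already returned a ranked list of document IDs. The IDs and the default `k=60` (a commonly used value) are illustrative assumptions, not part of this workshop's code.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking (RRF)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # rank starts at 1 for the top hit in each list
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Hypothetical results from a keyword (e.g. BM25) search and a vector search
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank reasonably well in both lists (`doc_a`, `doc_c`) rise above documents that appear in only one list. Because RRF uses ranks and ignores raw scores, it avoids having to normalize keyword and vector scores onto a common scale.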

---

## 📝 **Agenda**  

### **Part 1: LLM Fundamentals (30 mins)**  
- Why do off-the-shelf LLMs struggle with custom data?  
- RAG architecture explained
- Live demo: ChatGPT vs. local Mistral-7B comparison  

### **Part 2: Hands-On Lab (90 mins)**  
1. **Setup**  
   - Install Ollama and pull a DeepSeek-R1 model

2. **Document Processing**  
   - Ingest PDFs/websites → chunk text → create embeddings  

3. **RAG Implementation**  
   - Vector DB setup with Chroma  
   - LangChain orchestration  

4. **Deployment**  
   - Build a Gradio UI for your Q&A system  
   - Share your local app through a public link (Gradio `share=True` or ngrok)  



## 🛠️ **Key Libraries Explained**

- **Ollama**  
  - *What it does:* Enables running open-source LLMs locally, such as DeepSeek, Mistral, or Llama, without cloud dependencies.
  - *Why use it:* Privacy, cost-free inference, and offline capabilities.

- **LangChain**  
  - *What it does:* Provides tools for chaining LLM workflows, including document loaders, text splitters, and retrieval-augmented generation logic.
  - *Why use it:* Simplifies building complex LLM pipelines and integrates with various models and data sources.

- **ChromaDB**  
  - *What it does:* Lightweight, open-source vector database that stores and retrieves document embeddings, either in memory or persisted to disk.
  - *Why use it:* Fast, easy to use, and perfect for prototyping RAG systems.

- **PyMuPDF**  
  - *What it does:* Extracts text and metadata from PDF documents.
  - *Why use it:* Essential for processing PDF-based datasets.
  
- **HuggingFaceEmbeddings**  
  - *What it does:* LangChain class that converts text into vector embeddings using pre-trained Sentence Transformers models from Hugging Face (e.g., `all-MiniLM-L6-v2`).
  - *Why use it:* Enables semantic search and retrieval by representing text as vectors.

- **Gradio**  
  - *What it does:* Builds interactive web interfaces for machine learning models with minimal code.
  - *Why use it:* Quickly demo and share your LLM pipeline with others, even non-technical users.
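
HuggingFaceEmbeddings and ChromaDB work together: text is turned into vectors, and stored chunks are ranked by how close their vectors are to the query's vector. The sketch below shows that ranking step using cosine similarity on tiny hand-made 3-dimensional vectors. These values are invented for illustration (a real model such as `all-MiniLM-L6-v2` produces 384-dimensional vectors). Chroma can be configured to use cosine distance, but its default metric may differ.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored texts whose vectors are most similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings", invented for illustration only
store = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to change your login credentials": [0.8, 0.2, 0.1],
    "Quarterly revenue grew 12%": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "forgot my password"
for text, score in top_k(query, store):
    print(f"{score:.3f}  {text}")
```

Neither password-related text shares the exact word "forgot" with the query, but both rank above the revenue sentence because their vectors point in a similar direction. That is what "semantic search" means here.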



### What Is Retrieval Augmented Generation, or RAG?

Retrieval augmented generation, or RAG, is an architectural approach that can make large language model (LLM) applications more accurate and relevant by grounding them in custom data. At query time, the system retrieves documents relevant to the question or task and passes them to the LLM as context, so the model can answer from information it was never trained on.

RAG works well for support chatbots and Q&A systems that must stay up to date or draw on domain-specific knowledge.



<img src="https://miro.medium.com/v2/resize:fit:1400/format:webp/1*YLrQl5CM7NjQPcfTCrf-sQ.png" alt="Diagram of a retrieval augmented generation (RAG) pipeline" width="800"/>


*Source:* https://www.databricks.com/glossary/retrieval-augmented-generation-rag
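
The retrieve, augment, generate loop can be sketched in a few lines of plain Python. In this minimal sketch, a naive word-overlap score stands in for embeddings, the prompt template is a hypothetical example, and `generate` is a placeholder for any LLM call (such as a local Ollama model). None of these names come from LangChain or Ollama.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    return sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Augment the question with retrieved context (hypothetical template)."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, documents: List[str], generate: Callable[[str], str]) -> str:
    """Retrieve -> augment -> generate."""
    context = retrieve(question, documents)
    return generate(build_prompt(question, context))


docs = [
    "The workshop runs for two hours in Chandigarh.",
    "Ollama serves models on port 11434 by default.",
    "ChromaDB stores document embeddings.",
]
# Echo the prompt instead of calling a real LLM
print(answer("Which port does Ollama use?", docs, generate=lambda prompt: prompt))
```

Swapping `tokenize`/`retrieve` for an embedding model and a vector database, and `generate` for a real LLM, gives the same shape as the pipeline built in the cells below.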

In [1]:
# Install Python packages (quietly, upgrading to the latest versions)

!pip install -q -U langchain \
                   langchain-community \
                   langchain-huggingface \
                   sentence-transformers \
                   chromadb \
                   gradio \
                   pymupdf \
                   ollama

In [2]:
# Install Ollama, plus CUDA drivers for GPU runtimes

import os

!nvidia-smi  # Reports an error on a CPU-only runtime; the rest of the cell still runs
!curl -fsSL https://ollama.com/install.sh | sh
!echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
!sudo apt-get update && sudo apt-get install -y cuda-drivers

# Point LD_LIBRARY_PATH at the system NVIDIA libraries so Ollama can find them
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13281    0 13281    0     0  69518      0 --:--:-- --:--:-- --:--:-- 69900
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
############################################################################################# 100.0%
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Hit:2 https://developer.downl

In [3]:
# Start the Ollama server in the background and pull a model locally

!nohup ollama serve &
!ollama ps
!ollama pull deepseek-r1:1.5b  # Larger options: deepseek-r1:7b, deepseek-r1:14b
!ollama list

nohup: appending output to 'nohup.out'
NAME    ID    SIZE    PROCESSOR    UNTIL 
[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l
NAME                ID              SIZE      MODIFIED               
deepseek-r1:1.5b    e0979632db5a    1.1 GB    Less than a second ago    
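
The install log above shows the Ollama API listening on 127.0.0.1:11434. LangChain's `Ollama` wrapper talks to this local server over HTTP. As a sketch, assuming Ollama's `/api/generate` endpoint with `model`, `prompt` and `stream` fields, you can build a single non-streaming request using only the standard library. The helper names `build_generate_request` and `ask_ollama` are hypothetical. `ask_ollama` only works once `ollama serve` is running and the model has been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


req = build_generate_request("deepseek-r1:1.5b", "Say hello in one word.")
print(req.get_method(), req.full_url)
# With the server running: print(ask_ollama("deepseek-r1:1.5b", "Say hello in one word."))
```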


In [4]:
# Import required modules

from typing import List  # For type hinting

from langchain_community.document_loaders import PyMuPDFLoader, TextLoader
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

import gradio as gr

import warnings
warnings.filterwarnings('ignore')

In [5]:
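# Configure the embedding model and the Ollama LLM

EMBEDDING_MODEL = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")  # Alternative: 'sentence-transformers/all-mpnet-base-v2'
LLM_MODEL = 'deepseek-r1:1.5b'  # Must match a model pulled with `ollama pull` (e.g. 'deepseek-r1:7b', 'deepseek-r1:14b')

# Alternative (not used here): run a Hugging Face model through transformers instead of Ollama.
# This needs a Hugging Face model ID rather than an Ollama tag, plus
# `from transformers import pipeline` and `from langchain_huggingface import HuggingFacePipeline`.
# def get_llm():
#     pipe = pipeline("text-generation",
#                     model=HF_MODEL_ID,  # hypothetical placeholder for a Hugging Face model ID
#                     device=0)  # GPU 0; use device=-1 for CPU
#
#     return HuggingFacePipeline(pipeline=pipe)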

In [6]:
# Retrieve relevant documents and generate an answer

def rag_chain(question, retriever):
    llm = Ollama(model=LLM_MODEL)  # Connect to the local Ollama server with the chosen model
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # Put all retrieved chunks into one prompt; alternatives: "map_reduce", "refine", "map_rerank"
        retriever=retriever,
        return_source_documents=True  # Optional: also return the retrieved source chunks
    )
    result = qa_chain.invoke({"query": question})  # .invoke() replaces the deprecated qa_chain(...) call style
    return result["result"]  # Return only the generated answer text
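
DeepSeek-R1 models usually write out their reasoning inside `<think>...</think>` tags before giving the final answer, so the string returned by `rag_chain` may include that reasoning. If you only want to show the answer, a small post-processing step can remove it. This sketch assumes the tags appear literally in the model output. `strip_think` is a hypothetical helper, not part of LangChain or Ollama.

```python
import re

# Matches a <think>...</think> block, including newlines inside it (non-greedy)
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove DeepSeek-R1 style reasoning blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nThe context says the port is 11434.\n</think>\n\nOllama uses port 11434."
print(strip_think(raw))  # Ollama uses port 11434.
```

To apply it, `rag_chain` could return `strip_think(result["result"])` instead of the raw text.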

In [7]:
from typing import List

from langchain_core.documents import Document

def read_files(files: List[str]) -> List[Document]:
    """Reads multiple uploaded files and returns their contents as LangChain Documents.
    Args:
        files (List[str]): Paths of the uploaded files (PDF or TXT), as passed in by gr.Files.
    Returns:
        List[Document]: The loaded documents (PyMuPDFLoader yields one Document per PDF page).
    Raises:
        ValueError: If any file format is unsupported.
    """
    all_documents = []
    for file in files:
        path = getattr(file, "name", file)  # Older Gradio versions pass file objects; newer ones pass path strings
        if path.lower().endswith(".pdf"):
            loader = PyMuPDFLoader(path)  # One Document per PDF page
        elif path.lower().endswith(".txt"):
            loader = TextLoader(path, encoding="utf-8")  # Plain-text files
        else:
            raise ValueError(f"Unsupported file format: {path}. Please upload PDF or TXT files.")
        all_documents.extend(loader.load())
    return all_documents
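
The `if`/`elif` chain in `read_files` gets longer with every new file format. A dispatch table keyed by file extension is one way to keep it flat. The sketch below uses stand-in loader functions (`load_pdf`, `load_txt` and `read_paths` are hypothetical placeholders, not LangChain APIs) so that it runs without any dependencies. In the notebook, the table values would wrap `PyMuPDFLoader` and `TextLoader`.

```python
from pathlib import Path
from typing import Callable, Dict, List


def load_pdf(path: str) -> List[str]:
    return [f"pdf text from {path}"]  # placeholder for PyMuPDFLoader(path).load()


def load_txt(path: str) -> List[str]:
    return [f"plain text from {path}"]  # placeholder for TextLoader(path).load()


# Map lowercase file extensions to loader functions
LOADERS: Dict[str, Callable[[str], List[str]]] = {".pdf": load_pdf, ".txt": load_txt}


def read_paths(paths: List[str]) -> List[str]:
    """Load every path with the loader registered for its extension."""
    documents: List[str] = []
    for path in paths:
        suffix = Path(path).suffix.lower()
        if suffix not in LOADERS:
            raise ValueError(f"Unsupported file format: {path}")
        documents.extend(LOADERS[suffix](path))
    return documents


print(read_paths(["notes.TXT", "report.pdf"]))
```

With this layout, supporting another format only requires adding one entry to `LOADERS`.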

In [8]:
# Function to process the uploaded files and generate a retriever

import uuid
from typing import List

def process_files(files: List[str]):
    """Processes multiple uploaded files and returns a retriever.
    Args:
        files (List[str]): Paths of the uploaded files.
    Returns:
        A retriever over the file chunks, None if no files are provided,
        or an error message (str) if a file format is unsupported.
    """
    if not files:
        return None
    try:
        documents = read_files(files)  # Read content of all uploaded files
    except ValueError as e:
        return str(e)  # Pass the error message back to the Gradio interface
    # Chunks of up to 500 characters; neighbouring chunks share up to 100 characters
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    chunks = text_splitter.split_documents(documents)

    # Fresh collection per call, so chunks from earlier uploads don't leak into new answers
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=EMBEDDING_MODEL,
        collection_name=f"docs_{uuid.uuid4().hex}",
    )
    return vectorstore.as_retriever()
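
`chunk_size=500, chunk_overlap=100` means each chunk holds up to 500 characters and neighbouring chunks share some text, so context near a chunk boundary appears in both chunks. The simplified fixed-window splitter below illustrates that idea. It is not how `RecursiveCharacterTextSplitter` works internally (that class tries to split on separators such as paragraphs, lines and words before falling back to smaller units), and `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "x" * 1200
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])  # [500, 500, 400]
```

Each window starts 400 characters after the previous one, so the last 100 characters of one chunk are the first 100 of the next.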

In [9]:
# Main function to handle Gradio interface logic

def ask_question(files, question):
    """Returns an (error_message, answer) pair for the two Gradio output boxes."""
    if not question or not question.strip():
        return "Please enter a question.", ""  # Validate before doing any embedding work
    retriever = process_files(files)
    if isinstance(retriever, str):
        return retriever, ""  # Unsupported file format: show the error, no answer
    if not retriever:
        return "Please upload at least one valid file (PDF or TXT).", ""
    try:
        result = rag_chain(question, retriever)
        return "", result  # No error; show the answer
    except Exception as e:  # e.g. Ollama server not running or model not pulled
        return f"An error occurred during processing: {e}", ""

In [None]:
# Build the Gradio interface

with gr.Blocks() as demo:
    gr.Markdown("""
        # Document Question Answering with DeepSeek-R1 and Ollama
        Upload one or more PDF or text files and ask questions. The DeepSeek-R1 model, powered by Ollama, will extract relevant information to answer your query.
        **Important:** Ensure you have the `deepseek-r1:1.5b` model downloaded via `ollama pull deepseek-r1:1.5b` and Ollama is running.
    """)
    with gr.Row():
        file_upload = gr.Files(label="Upload one or more files (PDF or TXT)")
        question_input = gr.Textbox(label="Ask a question", placeholder="Type your question here...")
    with gr.Row():
        submit_btn = gr.Button("Submit")
    with gr.Row():
        error_output = gr.Textbox(label="Error Messages", lines=3)  # Visible, so messages returned by ask_question are shown
        answer_output = gr.Textbox(label="Answer", lines=10)
    submit_btn.click(
        ask_question,
        inputs=[file_upload, question_input],
        outputs=[error_output, answer_output],
    )
    gr.Markdown("""
        **Tips:**
        * You can upload multiple files at once.
        * For large documents, the processing might take some time.
        * Check the "Error Messages" box if you encounter any issues.
    """)
    with gr.Row(): # Put clear buttons in their own row for better layout.
        clear_question_btn = gr.Button("Clear Question")
        clear_question_btn.click(lambda: "", inputs=None, outputs=question_input)
        clear_answer_btn = gr.Button("Clear Answer")
        clear_answer_btn.click(lambda: "", inputs=None, outputs=answer_output)

demo.launch(share=True,
            show_error=True,
            debug=True)


Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://e1de7c7370380eba4f.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
