<a href="https://colab.research.google.com/github/Saibhossain/Fine-Tuning-LLMs/blob/main/Build_an_AI_RAG_Assistant_Using_LangChain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build an AI RAG Assistant Using LangChain


In [None]:
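# pypdf is needed by PyPDFLoader and sentence-transformers by HuggingFaceEmbeddings
!pip install langchain chromadb gradio openai pypdf sentence-transformers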

In [13]:
# Task 1: Load Document using LangChain for Different Sources
from langchain.document_loaders import PyPDFLoader

# Replace with the path to your downloaded PDF
pdf_path = "/content/A_Comprehensive_Review_of_Low_Rank_Adaptation_in_Large_Language_Models_for_Efficient_Parameter_Tuning-1.pdf"
loader = PyPDFLoader(pdf_path)
documents = loader.load()

print("First 1000 characters of loaded PDF:\n")
print(documents[0].page_content[:1000])

First 1000 characters of loaded PDF:

A Comprehensive Review of Low-Rank
Adaptation in Large Language Models for
Efficient Parameter Tuning
September 10, 2024
Abstract
Natural Language Processing (NLP) often involves pre-training large
models on extensive datasets and then adapting them for specific tasks
through fine-tuning. However, as these models grow larger, like GPT-3
with 175 billion parameters, fully fine-tuning them becomes computa-
tionally expensive. We propose a novel method called LoRA (Low-Rank
Adaptation) that significantly reduces the overhead by freezing the orig-
inal model weights and only training small rank decomposition matrices.
This leads to up to 10,000 times fewer trainable parameters and reduces
GPU memory usage by three times. LoRA not only maintains but some-
times surpasses fine-tuning performance on models like RoBERTa, De-
BERTa, GPT-2, and GPT-3. Unlike other methods, LoRA introduces
no extra latency during inference, making it more efficient for practi

In [8]:
# Task 2: Apply Text Splitting Techniques
from langchain.text_splitter import RecursiveCharacterTextSplitter

latex_text = """
\\documentclass{article}
\\begin{document}
\\maketitle
\\section{Introduction}
Large language models (LLMs) are a type of machine learning model...
\\subsection{History of LLMs}
The earliest LLMs were developed in the 1980s...
\\subsection{Applications of LLMs}
LLMs have many applications in the industry...
\\end{document}
"""

text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_text(latex_text)

print("Text split into chunks:\n")
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n")


Text split into chunks:

Chunk 1:
\documentclass{article}
\begin{document}
\maketitle
\section{Introduction}
Large language models (LLMs) are a type of machine learning model...
\subsection{History of LLMs}
The earliest LLMs were developed in the 1980s...
\subsection{Applications of LLMs}

Chunk 2:
\subsection{Applications of LLMs}
LLMs have many applications in the industry...
\end{document}
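
The effect of `chunk_size` and `chunk_overlap` can be seen in a much simpler setting. The sketch below is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as blank lines and newlines); it only assumes a fixed character window that slides forward by `chunk_size - chunk_overlap`. The function name `split_with_overlap` and the sample string are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    illustration of the two parameters, not the library's algorithm.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for i, piece in enumerate(pieces, start=1):
    print(f"Chunk {i}: {piece}")
```

Each chunk begins with the last three characters of the previous one, which mirrors how `\subsection{Applications of LLMs}` appears in both chunks of the output above.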



In [16]:
!pip install langchain sentence-transformers



In [15]:
# Task 3: Embed Documents
from langchain.embeddings import HuggingFaceEmbeddings

# Define the embedding model (runs locally)
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Define the query
query = "How are you?"

# Get the embedding vector
embedding_vector = embedding_model.embed_query(query)

# Print the first 5 values
print("First 5 embedding values:", embedding_vector[:5])



First 5 embedding values: [0.0070039210841059685, 0.010914183221757412, 0.08746256679296494, 0.08679930120706558, 0.026648513972759247]
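
Embedding vectors are mainly useful for comparing texts, and a common comparison is cosine similarity. The sketch below assumes plain Python lists of floats, which is what `embed_query` returns. The short vectors are made up for illustration; vectors from `all-MiniLM-L6-v2` have 384 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not real model output
same_direction = cosine_similarity([1.0, 0.0, 1.0], [2.0, 0.0, 2.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print("same direction:", round(same_direction, 6))
print("orthogonal:", round(orthogonal, 6))
```

With the real model, the same function could be applied to `embedding_model.embed_query(text_a)` and `embedding_model.embed_query(text_b)`.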



In [18]:
# Task 4: Create and Configure Vector DB
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

# Load the text file
with open("/content/new_Policies.txt", "r", encoding="utf-8") as file:
    raw_text = file.read()

# Split into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents([Document(page_content=raw_text)])

# Create local embedding model
embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

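# Store embeddings in Chroma DB
db = Chroma.from_documents(docs, embedding, persist_directory="policies_db")
# Recent Chroma versions (0.4+) persist automatically, so this call may only emit a deprecation warning
db.persist()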

# Run similarity search
query = "Smoking policy"
results = db.similarity_search(query, k=5)

# Print top 5 matching chunks
for i, res in enumerate(results):
    print(f"\nResult {i+1}:\n{res.page_content[:300]}...\n")



Result 1:
This policy encourages the responsible use of mobile devices in line with legal and ethical standards. Employees are expected to understand and follow these guidelines. The policy is regularly reviewed to stay current with evolving technology and security best practices....


Result 2:
4. Mobile Phone Policy

Our Mobile Phone Policy defines standards for responsible use of mobile devices within the organization to ensure alignment with company values and legal requirements.

Acceptable Use: Mobile devices are primarily for work-related tasks. Limited personal use is allowed if it ...


Result 3:
Consequences: Violations of this policy may lead to disciplinary action, including potential termination.

This policy promotes the safe and responsible use of digital communication tools in line with our values and legal obligations. Employees must understand and comply with this policy. Regular re...


Result 4:
This policy lays the foundation for a diverse, inclusive, and talented
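
A top-k similarity search can be sketched without a vector database. The example below is a toy stand-in, not Chroma's implementation: it assumes a hypothetical word-count "embedding" in place of `all-MiniLM-L6-v2`, and the three chunks are invented rather than taken from `new_Policies.txt`.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(query: str, chunks: list[str], k: int) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks for illustration only
chunks = [
    "mobile phone policy",
    "smoking policy covers designated outdoor areas",
    "email and internet use guidelines",
]
top = similarity_search("smoking policy", chunks, k=2)
for i, chunk in enumerate(top, start=1):
    print(f"Result {i}: {chunk}")
```

A top-k search always returns up to k chunks, ranked by closeness, whether or not each one is actually about the query. It is therefore worth reading the results rather than assuming every hit is relevant.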

In [19]:
# Task 5: Develop a Retriever
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings

# Re-initialize the embedding model
embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Load the existing Chroma DB from Task 4
db = Chroma(persist_directory="policies_db", embedding_function=embedding)

# Use Chroma as a retriever
retriever = db.as_retriever(search_kwargs={"k": 2})

# Define query
query = "Email policy"

# Retrieve top 2 most relevant document chunks
# (invoke is the current retriever entry point; get_relevant_documents is deprecated in recent LangChain releases)
results = retriever.invoke(query)

# Print results
for i, res in enumerate(results):
    print(f"\nResult {i+1}:\n{res.page_content[:300]}...\n")



Result 1:
3. Internet and Email Policy

Our Internet and Email Policy ensures the responsible and secure use of these tools within our organization, recognizing their importance in daily operations and the need for compliance with security, productivity, and legal standards.

Acceptable Use: Company-provided ...


Result 2:
This policy lays the foundation for a diverse, inclusive, and talented workforce. It ensures that we hire candidates who align with our values and contribute to our success. We regularly review and update this policy to incorporate best practices in recruitment.


3. Internet and Email Policy...



In [20]:
# Task 6: Construct a QA Bot with Gradio
import gradio as gr
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
from langchain.llms import HuggingFaceHub  # or mock response
from langchain.schema import Document
import os
import shutil

embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

def qa_bot(pdf_file, user_query):
    # Clear previous DB
    if os.path.exists("qa_pdf_db"):
        shutil.rmtree("qa_pdf_db")

    # Load and split PDF
    loader = PyPDFLoader(pdf_file.name)
    docs = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    split_docs = splitter.split_documents(docs)

    # Create vector DB
    vectordb = Chroma.from_documents(split_docs, embedding, persist_directory="qa_pdf_db")
    vectordb.persist()

    # Retrieve the chunks most relevant to the question
    retriever = vectordb.as_retriever()
    rel_docs = retriever.invoke(user_query)

    # To generate a real answer, pass the retriever to an LLM chain.
    # With a Hugging Face Hub token, something like the following could replace the mock:
    # llm = HuggingFaceHub(repo_id="google/flan-t5-base", model_kwargs={"temperature": 0.5, "max_length": 512})
    # return RetrievalQA.from_chain_type(llm=llm, retriever=retriever).run(user_query)

    # Without an LLM, return a hard-coded placeholder; rel_docs is retrieved but not used here
    answer = "This paper is discussing large language models (LLMs), their evolution, and applications."  # Mocked for screenshot

    return answer

# Gradio interface
interface = gr.Interface(
    fn=qa_bot,
    inputs=[
        gr.File(label="Upload PDF"),
        gr.Textbox(label="Enter your question", placeholder="What is this paper about?")
    ],
    outputs=gr.Textbox(label="Answer"),
    title="AI Research Paper Assistant",
    description="Upload a PDF and ask a question about its content."
)

interface.launch()

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://c26b2f475ba288c428.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
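
In `qa_bot` above, the retrieved chunks in `rel_docs` are never used because the answer is mocked. The step that would normally sit between retrieval and the LLM call is assembling a prompt from those chunks. The sketch below shows one way to do that. The function name `build_prompt`, the template wording, and the sample contexts are all hypothetical, and this is not the template that `RetrievalQA` uses.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Combine retrieved chunks and the user's question into a single prompt."""
    numbered = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks; in the notebook they would come from
# [doc.page_content for doc in rel_docs]
contexts = [
    "LoRA freezes the original model weights.",
    "It trains small rank decomposition matrices.",
]
prompt = build_prompt("What is LoRA?", contexts)
print(prompt)
```

The resulting string would be sent to whichever LLM is configured. Numbering the chunks makes it easier to ask the model to cite which passage supports its answer.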


