<a href="https://colab.research.google.com/github/connectchayan/ViBe/blob/main/MICRO_LLM_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Build a retrieval-augmented generation (RAG) system using a micro LLM.

## Install necessary libraries

### Subtask:
Install libraries such as `transformers`, `datasets`, `torch`, and `langchain` that will be used to build the RAG system.


**Reasoning**:
Install the necessary libraries using pip.



In [None]:
%pip install transformers datasets torch langchain

## Load a micro LLM

### Subtask:
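Load a pre-trained micro LLM from the `transformers` library.


**Reasoning**:
Import `AutoModelForCausalLM` and `AutoTokenizer` from `transformers`, then load `distilgpt2` and its tokenizer. `distilgpt2` is a distilled version of GPT-2 with about 82M parameters and, like GPT-2, a 1,024-token context window.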



In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Specify the name of a pre-trained micro LLM model
model_name = "distilgpt2"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_name)

print(f"Tokenizer for {model_name} loaded successfully.")
print(f"Model {model_name} loaded successfully.")

## Load and process data

### Subtask:
Load the data that will be used as the knowledge base for the RAG system. Process the data into a suitable format for the LLM.


**Reasoning**:
Load a dataset from the `datasets` library, inspect its structure, select relevant columns, and process it into a suitable format for a RAG system.
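SQuAD stores one row per question, so each context paragraph appears once for every question asked about it, and a `text` column built straight from `context` repeats the same passage many times. Duplicates waste both index space and, later, the model's limited context window when several identical passages are retrieved. A minimal sketch of order-preserving de-duplication, shown on hypothetical toy rows rather than the real dataset; `unique_texts` is an illustrative helper, not something the notebook defines.

```python
def unique_texts(texts):
    """Return the distinct strings in `texts`, keeping first-occurrence order."""
    # dict keeps insertion order (Python 3.7+), so dict.fromkeys drops repeats
    # without reordering the passages.
    return list(dict.fromkeys(texts))


# Hypothetical toy rows mimicking SQuAD: one row per question, context repeated.
rows = [
    {"question": "Q1", "context": "Paragraph A"},
    {"question": "Q2", "context": "Paragraph A"},
    {"question": "Q3", "context": "Paragraph B"},
    {"question": "Q4", "context": "Paragraph A"},
]

texts = unique_texts(row["context"] for row in rows)
print(texts)  # ['Paragraph A', 'Paragraph B']
```

Applied to this notebook, the same call could wrap the list of passages before the vector store is built, e.g. `unique_texts(example['text'] for example in processed_dataset['train'])`; this is a suggested variation, not what the notebook currently does.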



In [None]:
from datasets import load_dataset

# Load a suitable dataset for RAG
# Using 'squad' as it's a question-answering dataset
dataset = load_dataset("squad")

# Inspect the dataset structure
print(dataset)

# Select the relevant column: SQuAD provides 'context', 'question', and 'answers',
# and the 'context' passages serve as the knowledge base text
def process_squad_dataset(examples):
    # Keep only the context passage, exposed under the name 'text'
    return {"text": examples["context"]}

# Apply the processing function and drop all original columns,
# including the now-redundant 'context', so only 'text' remains
processed_dataset = dataset.map(process_squad_dataset, remove_columns=dataset["train"].column_names)

# Inspect the processed dataset
print(processed_dataset)
print(processed_dataset['train'][0])

## Build the RAG system

### Subtask:
Build the RAG system using the loaded LLM and processed data.


**Reasoning**:
Import the necessary LangChain classes, embed the processed text, index the embeddings in a FAISS vector store, and combine the retriever with the micro LLM in a `RetrievalQA` chain.



In [None]:
%pip install langchain-community faiss-cpu sentence-transformers

**Reasoning**:
The build step needs packages that the base install does not include: `langchain-community` provides the `FAISS` and `HuggingFaceEmbeddings` integrations, `faiss-cpu` provides the FAISS index itself (the CPU build is the safer choice because a GPU is not guaranteed), and `sentence-transformers` backs `HuggingFaceEmbeddings`. Without them, the cell below fails with an import error, first for `langchain_community` and then for `faiss`. To keep memory usage manageable, the cell indexes only a subset of the SQuAD training split rather than all of it.

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA
from transformers import pipeline

# Create embeddings with a small, fast sentence-transformers model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Index only a subset of the processed dataset to limit time and memory usage
sample_size = 1000
texts = [example['text'] for example in processed_dataset['train'].select(range(sample_size))]

# Build a FAISS vector store for efficient similarity search
vectorstore = FAISS.from_texts(texts, embeddings)

# Expose the vector store as a retriever (returns the top 4 documents by default)
retriever = vectorstore.as_retriever()

# Wrap the micro LLM in a text-generation pipeline;
# return_full_text=False returns only the newly generated text, not the prompt
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=100,
    return_full_text=False,
)
hf_llm = HuggingFacePipeline(pipeline=pipe)

# Combine the retriever and the LLM in a RetrievalQA chain;
# "stuff" concatenates all retrieved documents into a single prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=hf_llm,
    chain_type="stuff",
    retriever=retriever
)

print("RAG system (RetrievalQA chain) created successfully.")
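`FAISS.from_texts` embeds every passage, and the retriever returns the passages whose embeddings are nearest to the query's embedding. The sketch below reproduces that retrieve step with a deliberately crude stand-in: bag-of-words count vectors and brute-force cosine similarity instead of `all-MiniLM-L6-v2` embeddings and a FAISS index. `ToyVectorStore`, `embed`, and `cosine` are hypothetical names; this illustrates the mechanism, not LangChain's or FAISS's implementation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory index: stores (text, vector) pairs, searches by brute force."""

    def __init__(self, texts):
        self.entries = [(text, embed(text)) for text in texts]

    def search(self, query, k=4):
        """Return the k texts most similar to the query, best first."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "Paris is the capital of France.",
    "Photosynthesis converts light energy into chemical energy.",
    "The lymphatic system returns fluid to the blood.",
])
print(store.search("What is the capital of France?", k=1))
# ['Paris is the capital of France.']
```

The part that carries over to the real chain is `k`: whatever the retriever returns is exactly what the `stuff` chain places in the prompt, so retrieval quality and prompt length are both decided at this step.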

## Test the RAG system

### Subtask:
Test the RAG system with some example queries to ensure it is working correctly.


**Reasoning**:
Define example queries and use the qa_chain to get responses for each query.
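The example queries have no reference answers, so "working correctly" can only be judged by eye. Because SQuAD ships gold answers, one option is to ask questions taken from the indexed rows and score the chain's output with SQuAD-style exact match. The sketch below reimplements the answer normalization used by the SQuAD v1.1 evaluation script (lowercase, strip punctuation and the articles a/an/the, collapse whitespace); `exact_match` is a hypothetical helper, and the answers shown are made-up strings, not notebook output.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """1 if the normalized prediction equals any normalized gold answer, else 0."""
    pred = normalize_answer(prediction)
    return int(any(pred == normalize_answer(gold) for gold in gold_answers))


print(exact_match("The Eiffel Tower.", ["Eiffel Tower"]))  # 1
print(exact_match("a tower in Paris", ["Eiffel Tower"]))   # 0
```

For a row of the original `dataset`, the gold answers are the list in `example['answers']['text']`. Exact match is strict for free-form generation; it is a floor on quality rather than a full picture.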



In [None]:
# Define example queries
example_queries = [
    "What is the capital of France?",
    "What is the purpose of the lymphatic system?",
    "Who was the first man on the moon?",
    "What is the function of photosynthesis?",
    "What is machine learning?"
]

# Get responses for each query using the qa_chain
for query in example_queries:
    response = qa_chain.invoke({"query": query})
    print(f"Query: {query}")
    print(f"Response: {response['result']}")
    print("-" * 20)

**Reasoning**:
The previous attempt raised an `IndexError` because the input sequence was longer than the model can handle: `distilgpt2`, like GPT-2, has 1,024 position embeddings, so the prompt tokens plus `max_new_tokens` must fit within 1,024 tokens. As a first fix, reduce `max_new_tokens` in the HuggingFace pipeline from 100 to 50 and re-run the example queries.
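The constraint behind the error fits in a single check: the prompt length plus `max_new_tokens` must not exceed the 1,024-token window. The sketch below is a hypothetical helper with made-up prompt lengths; in the notebook, the prompt length would come from the model's tokenizer, e.g. `len(tokenizer(prompt)["input_ids"])`.

```python
# Context window of distilgpt2 (inherited from GPT-2), in tokens.
N_POSITIONS = 1024


def fits_context(prompt_tokens, max_new_tokens, n_positions=N_POSITIONS):
    """Return True if the prompt plus the requested generation fits in the window."""
    return prompt_tokens + max_new_tokens <= n_positions


# Hypothetical prompt lengths, not measured from the notebook.
print(fits_context(950, 100))   # False: 1,050 tokens > 1,024
print(fits_context(950, 50))    # True: 1,000 tokens fit
print(fits_context(1100, 50))   # False: the prompt alone is too long
```

The last case is the one to watch: if the stuffed prompt alone is longer than 1,024 tokens, no value of `max_new_tokens` makes the input fit, and the prompt itself has to shrink.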



In [None]:
from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# Re-create the HuggingFace pipeline with a smaller max_new_tokens
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=50,  # reduced from 100
    return_full_text=False,  # return only the generated answer, not the prompt
)
hf_llm = HuggingFacePipeline(pipeline=pipe)

# Re-create the RetrievalQA chain with the updated LLM; the retriever is unchanged
qa_chain = RetrievalQA.from_chain_type(
    llm=hf_llm,
    chain_type="stuff",
    retriever=retriever
)

print("RAG system (RetrievalQA chain) updated with reduced max_new_tokens.")

# Re-run the example queries defined in the previous test cell
for query in example_queries:
    response = qa_chain.invoke({"query": query})
    print(f"Query: {query}")
    print(f"Response: {response['result']}")
    print("-" * 20)

**Reasoning**:
The `IndexError` persists after reducing `max_new_tokens`, which means the prompt alone (the query plus the retrieved contexts that the `stuff` chain concatenates into it) already exceeds the model's limit. By default the retriever returns the top 4 documents, and several SQuAD paragraphs together can exceed 1,024 tokens. To shrink the prompt, set `search_kwargs={"k": 1}` so the retriever returns a single document, then re-run the tests.
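Fixing `k=1` is the simplest remedy. A budget-aware alternative is to keep adding retrieved documents, best first, only while the prompt plus the generation still fits. A minimal sketch, assuming a whitespace word count as a crude stand-in for real token counts (GPT-2's BPE tokenizer usually yields more tokens than words, so a real version should count with the model's tokenizer); `pack_documents`, `count_tokens`, and the `overhead` allowance for the prompt template are hypothetical.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: number of whitespace-separated words."""
    return len(text.split())


def pack_documents(docs, question, n_positions=1024, max_new_tokens=50, overhead=60):
    """Keep retrieved docs (best first) while prompt + generation fits in the window.

    `overhead` is a hypothetical allowance for the prompt template's own tokens.
    """
    budget = n_positions - max_new_tokens - overhead - count_tokens(question)
    kept, used = [], 0
    for doc in docs:
        cost = count_tokens(doc)
        if used + cost > budget:
            break  # stop at the first doc that does not fit, preserving rank order
        kept.append(doc)
        used += cost
    return kept


docs = ["word " * 400, "word " * 400, "word " * 400]
print(len(pack_documents(docs, "What is the capital of France?")))  # 2
```

Here the budget is 1024 - 50 - 60 - 6 = 908 words, so two 400-word documents fit and the third is dropped. Stopping at the first document that does not fit keeps the highest-ranked context rather than skipping ahead to smaller, less relevant ones.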



In [None]:
# Set up a retrieval mechanism with limited search results
retriever = vectorstore.as_retriever(search_kwargs={"k": 1}) # Limit to retrieving only 1 document

# Re-create the RetrievalQA system with the updated retriever
# The LLM pipeline remains the same as updated in the previous step
qa_chain = RetrievalQA.from_chain_type(
    llm=hf_llm,
    chain_type="stuff",
    retriever=retriever
)

print("RAG system (RetrievalQA chain) updated with limited document retrieval.")

# Re-run the example queries defined in the first test cell
for query in example_queries:
    response = qa_chain.invoke({"query": query})
    print(f"Query: {query}")
    print(f"Response: {response['result']}")
    print("-" * 20)

## Summary:

### Data Analysis Key Findings

*   The necessary libraries (`transformers`, `datasets`, `torch`, `langchain`, `langchain-community`, `faiss-cpu`) were successfully installed.
*   A pre-trained micro LLM model, "distilgpt2", and its tokenizer were successfully loaded from the `transformers` library.
*   The SQuAD dataset was successfully loaded using the `datasets` library and processed to extract the 'context' information into a 'text' column.
*   A RAG system was successfully built using `RetrievalQA` from `langchain`, incorporating a `HuggingFaceEmbeddings` model, a `FAISS` vector store created from a subset of the processed data, and a `HuggingFacePipeline` for the loaded micro LLM.
*   Initial testing raised an `IndexError` because the input exceeded `distilgpt2`'s 1,024-token context window. Reducing `max_new_tokens` from 100 to 50 did not help, because the prompt itself was too long; limiting the retriever to a single document (`search_kwargs={"k": 1}`) resolved the error.
*   Once the input-length issue was addressed, the RAG system was able to process the queries, but the generated answers were poor. This highlights the limits, for generative QA, of a small model that is not instruction-tuned and of a 1,000-row SQuAD subset that may not contain passages relevant to the example queries.

### Insights or Next Steps

*   Consider using a larger, more capable language model if improved answer quality is required, or fine-tune the current micro LLM on a task-specific dataset.
*   Explore alternative datasets or pre-processing techniques that are better suited for generative question answering with a RAG system.
