<a href="https://colab.research.google.com/github/amkayhani/DSML24/blob/main/warwick_ai_chatbot_colab2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🤖 Building a RAG-based Chatbot with LLMs
This Colab notebook demonstrates how to build a chatbot using a **Large Language Model (LLM)** and **Retrieval-Augmented Generation (RAG)**. The chatbot answers questions about the MSc Applied Artificial Intelligence programme at the University of Warwick.

## 🔍 What is an LLM?
**Large Language Models (LLMs)** are AI systems trained on massive text datasets to understand and generate human-like language. Examples include OpenAI's GPT models and Meta's **LLaMA** (Large Language Model Meta AI). In this notebook, we use **LLaMA 3 8B** (the `Meta-Llama-3-8B-Instruct` checkpoint) for demonstration purposes.

## 🧠 What is RAG (Retrieval-Augmented Generation)?
**Retrieval-Augmented Generation** enhances LLM responses by incorporating external knowledge. Instead of relying only on pre-trained information, RAG retrieves relevant documents from a database and feeds them into the LLM, making the chatbot more accurate and grounded in up-to-date information.
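
The retrieve-then-generate flow described above can be sketched in a few lines of plain Python. This is a toy illustration, not the pipeline built later in the notebook: the documents are invented placeholders, the `retrieve` and `build_prompt` helpers are hypothetical names, and simple word overlap stands in for the embedding-based search a real RAG system would use.

```python
import re

# Toy knowledge base (invented placeholder text, not taken from the programme page)
DOCUMENTS = [
    "The programme runs for one year of full-time study.",
    "Students complete a dissertation in the final term.",
    "Applications are submitted through the online portal.",
]

def words(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, documents, k=1):
    """Rank documents by how many words they share with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Place the retrieved text ahead of the question before calling the LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:"

prompt = build_prompt("When do students write the dissertation?", DOCUMENTS)
print(prompt)
```

The LLM only ever sees the assembled prompt, so the quality of the answer depends heavily on whether the retrieval step surfaces the right text.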

## 📦 Step 1: Install Required Libraries
We begin by installing the libraries needed for language processing, scraping, vector storage, and the chatbot UI.

In [1]:
# Install required libraries
!pip install -q langchain chromadb gradio bs4 requests
!pip install -U langchain-community


## 🌐 Step 2: Web Scraping the Programme Page
We use `requests` and `BeautifulSoup` to download and parse the MSc Applied AI programme page on the University of Warwick website.

In [2]:
# Web Scraping the programme page
from bs4 import BeautifulSoup  # HTML parser
import requests                # HTTP client

# You can substitute this URL with another web page's URL
url = "https://warwick.ac.uk/fac/sci/wmg/study/masters-degrees/applied-artificial-intelligence"
response = requests.get(url, timeout=30)  # Fetch the HTML content of the target web page
response.raise_for_status()               # Stop early if the request failed (e.g. 404)
soup = BeautifulSoup(response.text, 'html.parser')  # Parse HTML using BeautifulSoup

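# Tags whose text is collected. Add or remove tags ('a', 'blockquote', etc.) as needed.
# Container tags such as 'div' also return their children's text, so some text may be collected more than once.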
tags = ['p', 'h1', 'h2', 'h3', 'strong', 'li', 'span', 'div', 'td', 'th', 'blockquote']
all_text_nodes = soup.find_all(tags)

# Join their text, preserving a single space between elements
text = ' '.join([tag.get_text(separator=' ', strip=True) for tag in all_text_nodes])


## ✂️ Step 3: Split the Text into Chunks
To make the data manageable, we split the scraped text into smaller chunks using LangChain's `RecursiveCharacterTextSplitter`.

In [3]:
# Split text into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=300)  # Splits long text into overlapping chunks
texts = text_splitter.create_documents([text])
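
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sliding-window splitter. It is only a sketch: `sliding_window_chunks` is a hypothetical helper, and LangChain's `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and spaces, so its chunk boundaries will differ from these fixed windows.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.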

## 🧬 Step 4: Create Embeddings and Store in ChromaDB
We generate vector embeddings of each text chunk and store them in a vector database (Chroma) for fast retrieval.

In [4]:
# Create embeddings and store in ChromaDB
from langchain_community.embeddings import HuggingFaceEmbeddings  # Wrapper around sentence-transformers models
from langchain_community.vectorstores import Chroma               # Chroma vector store integration

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")  # Load a pre-trained sentence embedding model
vectorstore = Chroma.from_documents(texts, embedding_model)  # Embed the chunks and index them in an in-memory Chroma store
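
The idea behind a vector store can be sketched with a brute-force search over a handful of vectors. This is an illustration under stated assumptions: `TinyVectorStore` is a hypothetical class, the 3-dimensional vectors are hand-made stand-ins (`all-MiniLM-L6-v2` produces 384-dimensional vectors), and cosine similarity is only one common measure; the metric and index Chroma actually uses depend on its configuration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """In-memory list of (vector, text) pairs with brute-force search."""

    def __init__(self):
        self.entries = []

    def add(self, vector, text):
        self.entries.append((vector, text))

    def search(self, query_vector, k=1):
        """Return the texts of the k entries most similar to the query."""
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vector, entry[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

# Hand-made vectors standing in for real embeddings
store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], "entry requirements")
store.add([0.1, 0.9, 0.0], "module list")
store.add([0.0, 0.2, 0.9], "tuition fees")

print(store.search([0.8, 0.2, 0.1], k=2))
```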


## 🧠 Step 5: Load and Configure the LLM with RAG
In this step, we configure the Large Language Model (LLM) and integrate it with our retriever using LangChain’s RetrievalQA chain to enable RAG-based question answering.

In [5]:
# Import LangChain wrapper for Hugging Face pipelines
from langchain_community.llms import HuggingFacePipeline

# Import tokenizer and model from Hugging Face Transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Import the RetrievalQA chain from LangChain (used for RAG-based QA)
from langchain.chains import RetrievalQA

# Import PromptTemplate to define how the model should be instructed
from langchain.prompts import PromptTemplate

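# Define the Hugging Face model to use (Meta's LLaMA 3, 8B parameter version)
# This is a gated model: accept the licence on its Hugging Face page and log in with an access token first
# You could also try other models such as LLaMA 2 or Mistral
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"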

# Load the model's tokenizer — this is needed to convert text to tokens
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model with automatic hardware and datatype configuration (e.g. GPU/CPU)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",         # Automatically uses available GPU if possible
    torch_dtype="auto"         # Automatically selects appropriate precision (e.g. float16)
)

# Create a text generation pipeline using the model and tokenizer
pipe = pipeline(
    "text-generation",         # Specifies the task: generate text based on input
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,        # Limits the length of generated responses
    temperature=0.3,           # Lower = more deterministic; higher = more creative but more prone to hallucination
    repetition_penalty=1.1     # Discourages repeated phrases
)

# Wrap the pipeline using LangChain's HuggingFacePipeline interface
llm = HuggingFacePipeline(pipeline=pipe)

# Define the prompt template to instruct the LLM
# It includes a "context" (retrieved documents) and a "question" (from the user)
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""
You are a helpful AI assistant. Use only the information in the context below to answer the question.
If you are not able to provide an accurate answer, say "Unfortunately, I can't answer this question. Please contact the course leader for further information.".

Context:
{context}

Question:
{question}

Answer:
"""
)

# Create the RetrievalQA chain — this combines:
# 1. The LLM (wrapped in HuggingFacePipeline)
# 2. The retriever from ChromaDB (to fetch relevant documents)
# 3. The prompt template (to guide the model's response)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 7}),  # Retrieve the 7 most relevant chunks
    chain_type_kwargs={"prompt": prompt_template},
    return_source_documents=False  # Do not return the original documents in the output
)
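
To make the effect of the `temperature=0.3` setting concrete, the sketch below applies temperature scaling to a made-up set of token scores. The logits are hypothetical and `softmax_with_temperature` is an illustrative helper, not a function from Transformers; temperature only matters when the pipeline samples tokens rather than decoding greedily.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.3, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all probability mass sits on the highest-scoring token; as the temperature rises the distribution flattens and less likely tokens are sampled more often.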




## 💬 Step 6: Build a Gradio Interface
Finally, we create a simple user interface using Gradio to interact with the chatbot.

In [6]:
# Create Gradio interface
import gradio as gr  # Import Gradio for building the web interface

synonym_map = {
    "course leader": ["course director", "programme leader", "programme director"],
    "modules": ["subjects", "courses", "units"],
    "lecturer": ["instructor", "tutor", "teaching staff"],
    "assessment": ["exams", "tests", "evaluation"],
    "dissertation": ["thesis", "final project", "research project"],
}

# Expand the question with the alternative terms from synonym_map (above)
def expand_question_with_synonyms(question):
    lowered = question.lower()
    expanded = [question]
    for key, synonyms in synonym_map.items():
        if key in lowered:
            for alt in synonyms:
                expanded.append(lowered.replace(key, alt))
    # dict.fromkeys removes duplicates while keeping the original question first
    return " OR ".join(dict.fromkeys(expanded))

def answer_question(user_input):
    expanded_query = expand_question_with_synonyms(user_input)
    raw_output = qa_chain.run(expanded_query)

    # Isolate the actual answer
    if "Answer:" in raw_output:
        answer = raw_output.split("Answer:")[-1].strip()
    else:
        answer = raw_output.strip()

    return answer




interface = gr.Interface(fn=answer_question,
                         inputs="text",
                         outputs="text",
                         title="Warwick MSc Applied AI Chatbot",
                         description="Ask questions about the MSc Applied Artificial Intelligence programme at the University of Warwick.")
interface.launch()
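
The two text-processing steps in this cell, query expansion and answer isolation, can be tried without loading the model. The sketch below restates them in compact form with a single synonym entry; `expand` and `isolate_answer` are illustrative names, the expansion is written so that the original question always comes first, and the raw output string is a made-up example of a model echoing its prompt.

```python
SYNONYMS = {"modules": ["subjects", "courses", "units"]}  # one entry from synonym_map

def expand(question):
    """Append variants of the question with each synonym substituted."""
    lowered = question.lower()
    variants = [question]
    for key, alternatives in SYNONYMS.items():
        if key in lowered:
            variants.extend(lowered.replace(key, alt) for alt in alternatives)
    return " OR ".join(dict.fromkeys(variants))  # de-duplicate, keep order

def isolate_answer(raw_output):
    """Keep only the text after the last 'Answer:' marker, if present."""
    return raw_output.split("Answer:")[-1].strip()

print(expand("Which modules are core?"))

# Hypothetical raw output in which the model echoes the prompt before answering
raw = "Context:\n...\n\nQuestion:\nWhich modules are core?\n\nAnswer:\nSee the module list."
print(isolate_answer(raw))
```

Note that the retriever embeds the expanded string as ordinary text, so `OR` is not treated as a boolean operator; the expansion helps only by adding related vocabulary to the query.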





## 🧠 Exercise Activities
**✍️ Exercise 1:** Customise the Prompt Template
Modify the `PromptTemplate` used in the chatbot to change the assistant's tone, for example by making it more conversational, friendly, or formal.


**✍️ Exercise 2:** Extend the Knowledge Base
Replace or extend the source content (currently scraped from one Warwick MSc page) by:

* Scraping an additional page (e.g. another MSc or the student handbook)

* Merging its content into the same vectorstore



**✍️ Exercise 3:** Evaluate Retrieval Quality
The retriever currently uses `k=7`: `retriever=vectorstore.as_retriever(search_kwargs={"k": 7})`

Try different values for k (e.g. 3, 10, 15).
How does changing k affect:

* Response accuracy?

* Irrelevant information?

Record a few queries and compare outputs.
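
One way to structure the comparison is to label which chunks are relevant for a query and compute precision and recall at each `k`. The sketch below assumes you have already obtained a ranked list from the retriever; the chunk names and relevance labels are hypothetical placeholders, and `precision_at_k` and `recall_at_k` are illustrative helpers rather than LangChain functions.

```python
def precision_at_k(ranked_chunks, relevant, k):
    """Fraction of the top-k retrieved chunks that are labelled relevant."""
    top = ranked_chunks[:k]
    if not top:
        return 0.0
    return sum(chunk in relevant for chunk in top) / len(top)

def recall_at_k(ranked_chunks, relevant, k):
    """Fraction of all relevant chunks that appear in the top k."""
    top = ranked_chunks[:k]
    return sum(chunk in relevant for chunk in top) / len(relevant)

# Hypothetical ranking returned for one query, best match first
ranked = ["modules A", "fees", "modules B", "campus", "accommodation"]
relevant = {"modules A", "modules B"}  # hand-labelled for this query

for k in (1, 3, 5):
    print(k, precision_at_k(ranked, relevant, k), recall_at_k(ranked, relevant, k))
```

In this made-up ranking, a larger `k` captures every relevant chunk but also passes more irrelevant text to the LLM, which is the trade-off the exercise asks you to observe.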

**✍️ Exercise 4:** Swap the Language and Embedding Models
* Change the embedding model and the language model to other models (e.g. Mistral).
* Compare the answers.

**✍️ Exercise 5:** Enable Session Context
Right now, the chatbot treats each question independently.

* Implement a basic memory buffer using LangChain’s ConversationBufferMemory and integrate it into the chain.
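
Before wiring in LangChain's memory class, it may help to see the underlying idea in plain Python. The sketch below is a hand-rolled stand-in, not `ConversationBufferMemory` itself: `ConversationBuffer` and `build_prompt` are hypothetical names, the `max_turns` cap is an added assumption to keep the prompt short, and the conversation turns are placeholders.

```python
class ConversationBuffer:
    """Stores recent question/answer turns as plain text."""

    def __init__(self, max_turns=3):
        self.max_turns = max_turns
        self.turns = []

    def add(self, question, answer):
        self.turns.append((question, answer))
        self.turns = self.turns[-self.max_turns:]  # keep only the most recent turns

    def as_text(self):
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)

def build_prompt(history, context, question):
    """Insert the chat history between the retrieved context and the new question."""
    return (
        f"Context:\n{context}\n\n"
        f"Chat history:\n{history}\n\n"
        f"Question:\n{question}\n\nAnswer:"
    )

buffer = ConversationBuffer(max_turns=2)
buffer.add("First question?", "First answer.")    # placeholder turns
buffer.add("Second question?", "Second answer.")
buffer.add("Third question?", "Third answer.")

print(build_prompt(buffer.as_text(), "(retrieved chunks)", "A follow-up question?"))
```

Because the history is part of the prompt, follow-up questions that rely on earlier turns ("what about that one?") give the model something to resolve them against.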


