# Comparing a RAG Chatbot with a Non-RAG Chatbot

- Come up with a story about the 4 of us in SDS.
- Ask the chatbot questions about the SDS story (tell it not to make up information).
- Show how the RAG pipeline provides contextual background information to the LLM.

Naive RAG process:

- Chunk the story into paragraphs.
- Generate dense embeddings of the text chunks.
- Store the embeddings in a makeshift, in-memory Python class that stands in for a vector store.
- Embed the same query with the same embedding model.
- Retrieve from the makeshift vector store using cosine similarity between the query vector and the stored vectors.
- Add the retrieved context to the chatbot's prompt (tell it to refer only to the context provided).
- Compare the raw LLM response with the RAG pipeline's response.
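
The retrieval step above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the three-dimensional vectors and short texts are made-up toy data (real embeddings from `all-MiniLM-L6-v2` have 384 dimensions), and the helper names `cosine_similarity` and `top_k` are hypothetical, not part of the notebook's classes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, vectors, texts, k=2):
    """Return the k (score, text) pairs with the highest cosine similarity to the query."""
    scored = [(cosine_similarity(query_vector, v), t) for v, t in zip(vectors, texts)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical toy data standing in for real chunk embeddings
toy_texts = ["Jerry is the dreamer", "Eugene loves optimization", "Kaiwen draws charts"]
toy_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.6, 0.8]]
toy_query = [0.0, 1.0, 0.2]

results = top_k(toy_query, toy_vectors, toy_texts, k=2)
for score, text in results:
    print(f"{score:.4f}  {text}")
```

The key point is that each score must stay paired with the text it was computed for; sorting `(score, text)` tuples together avoids the index mix-ups that are easy to make when scores and texts live in separate lists.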

Additional/advanced steps to enhance your RAG pipeline:

- Query rewriting
- Re-ranking using a cross-encoder
- Dynamic embedding model fine-tuning
- Hybrid search using BM25/TF-IDF
- LLM guardrails/query intent detection
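
As a rough illustration of the hybrid search idea, the sketch below scores documents with a small hand-written BM25 function and then merges the keyword ranking with a dense ranking using reciprocal rank fusion. Everything here is an assumption made for illustration: the function names, the `k1`, `b` and `k` defaults, the toy documents, and the hard-coded `dense_ranking` (which stands in for the order a cosine-similarity retriever might return). Fusion by reciprocal rank is one common choice, not the only way to combine the two signals.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (assumes a non-empty corpus)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores.append(score)
    return scores

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings (lists of document ids, best first) into one ranking."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in sorted(fused.items(), key=lambda kv: kv[1], reverse=True)]

# Hypothetical toy corpus and query
documents = [
    "Jerry proposed an emotion-predicting algorithm",
    "Eugene is obsessed with optimization",
    "Kaiwen specialises in data visualization",
]
query = "Who is obsessed with optimization?"

scores = bm25_scores(query, documents)
keyword_ranking = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
dense_ranking = [1, 2, 0]  # hypothetical order from a dense (embedding) retriever

fused_ranking = reciprocal_rank_fusion([keyword_ranking, dense_ranking])
print(documents[fused_ranking[0]])
```

Keyword scoring tends to help with exact names and rare terms, while dense retrieval helps with paraphrased questions, which is why the two are often combined.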

Some practical applications:

- RAG for your school notes (e.g. modules like HSI, where a smarter Ctrl+F, in this case vector search, could help greatly)


In [1]:
!pip install sentence-transformers
!pip install groq

import numpy as np
import re
from sentence_transformers import SentenceTransformer
from groq import Groq
import time



In [2]:
from google.colab import userdata

GROQ_API_KEY = userdata.get('GROQ_API_KEY')
CHAT_MODEL = "llama3-70b-8192"
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
RAG_SYSTEM_PROPMT = "You are a helpful, very cheerful and very bubbly assistant who only answers based on the contextual information provided, and nothing else. If you are unsure, say that you are unsure due to a lack of information."
CHAT_SYSTEM_PROMPT = "You are a helpful assistant who answers questions factually. If you are unsure, say that you are unsure due to a lack of information."


In [99]:
story = '''
In a tucked-away corner of the university, beyond the buzzing of classrooms and the hum of campus life, stood the headquarters of the **Statistics and Data Science Society (SDSS)**. It wasn’t much of a headquarters, really—just a dimly lit room cluttered with whiteboards, old textbooks, tangled wires, and laptops that emitted a soft glow day and night. Yet, to its members, it was home. A sanctuary for thinkers, dreamers, and problem-solvers alike.

Inside this hallowed space was a group of friends who had grown inseparable through their shared love of mathematics and coding: **Jerry**, **Eugene**, **Gangjoon**, and **Kaiwen**. Each brought their unique flavor to the society, contributing to a beautiful symphony of algorithms, puzzles, and data that made their bond unbreakable.

Jerry was the heart of the group. Tall and lanky, with wild hair that seemed to mirror the whirlwind of thoughts constantly running through his mind, he was always lost in the world of numbers. But Jerry wasn’t just about solving problems—he was about finding meaning behind the data, seeing patterns where others saw randomness, and weaving stories from statistics.

He often spoke in metaphors, much to the amusement of his friends. He'd glance at a dataset and say, "This is like a river, flowing unpredictably but with purpose." His mind was a curious blend of abstract philosophy and cold, hard logic, and the two coexisted in perfect harmony. Jerry’s dream was to use statistics and machine learning to answer life's big questions—Why are we here? What drives human behavior? Can we predict the future?

One day, as the SDSS members gathered for their usual post-lecture meeting, Jerry arrived with a wild glint in his eye. He had an idea—a radical one. "What if we could create an algorithm that predicts human emotions based on seemingly unrelated data?" he proposed, scribbling furiously on the whiteboard. "Like, imagine a system that reads social media posts, stock market fluctuations, weather patterns, and even the phases of the moon, and then predicts how people will feel tomorrow."

The room fell silent as the others tried to wrap their heads around the concept. But that was Jerry—always reaching for the stars.

Where Jerry was a dreamer, Eugene was the anchor. He kept the group grounded with his practicality and level-headedness. Medium height, sharp-dressed, and always with a cup of black coffee in hand, Eugene was the one who looked at Jerry’s outlandish ideas and said, "Okay, but how would that work in the real world?"

Eugene had a deep understanding of both mathematics and programming. He could break down even the most complex theories into manageable, bite-sized chunks, often simplifying them in ways that made the rest of the team wonder why they hadn’t thought of it first. But despite his rational demeanor, Eugene had his own flair—he was obsessed with optimization. Whether it was reducing the time complexity of an algorithm or finding the most efficient way to organize his desk, Eugene always sought perfection.

His love for optimization reached a new level when the group took on their first major project: a data-driven urban planning model for the city’s transport system. The goal was to reduce traffic congestion using statistical models and simulations. While Jerry and the others brainstormed grand, creative ideas, Eugene quietly worked in the background, refining their code, improving efficiency, and making sure everything ran smoothly. When the project finally came together, it was Eugene’s touch that made it a success, and the team celebrated with a dinner that he, of course, insisted on planning down to the minute.

“Data is beautiful,” he would say, “but it’s also messy. My job is to make it clean and efficient.”

Gangjoon was the wild card of the group. With his unkempt hair, black hoodie, and rebellious attitude, he lived for the thrill of breaking boundaries. If there was a rule in mathematics or coding, Gangjoon’s first instinct was to challenge it. He wasn’t satisfied with conventional methods—he always wanted to push the envelope, finding loopholes in algorithms and writing code that others deemed impossible.

A brilliant hacker, Gangjoon often participated in coding competitions, not for the recognition, but for the rush of outsmarting the system. He once built an encryption algorithm so complex that even the university’s computer science professor struggled to understand it. "I don’t play by the rules," Gangjoon would smirk, his fingers flying across the keyboard as he conjured up yet another solution that defied logic.

Despite his edgy persona, Gangjoon had a deep respect for his friends and their shared love of learning. When Jerry proposed his emotion-predicting algorithm, Gangjoon was the first to jump on board. "Let’s hack the universe," he said with a grin, already thinking about ways to pull data from obscure sources and connect it in unexpected ways.

But his most daring feat came when the group tackled a university-wide competition on predictive analytics. The task was to create a model that could accurately forecast energy consumption based on various factors. While others stuck to traditional datasets, Gangjoon had a radical idea: "Why not pull data from social media activity? People’s behavior online might correlate with how much energy they use."

It was unconventional, risky, and against the rules. But it worked. The team won the competition, thanks to Gangjoon’s audacious approach.

In contrast to Gangjoon’s rebelliousness, Kaiwen was the quiet, steady presence in the group. Petite and soft-spoken, with an aura of calm about her, she often went unnoticed in large crowds. But those who knew her understood that behind her serene exterior was a mind that worked faster than any computer.

Kaiwen had a talent for seeing the connections that others missed. She was the one who could take Jerry’s wild ideas, Eugene’s structured approach, and Gangjoon’s chaotic energy and blend them into something cohesive. Her specialty was in data visualization—turning raw numbers into something beautiful and understandable. Her graphs were more than just charts—they were art.

When the group was working on a project to analyze climate change data, it was Kaiwen who found a way to present the information in a way that spoke to people on an emotional level. "Data doesn’t have to be cold and impersonal," she said softly, sketching out a concept for an interactive map that showed the real-time impact of climate change on different regions. "We can make it relatable, make people feel the urgency."

Her map, when completed, became a sensation, attracting attention from professors and even local environmental organizations. Kaiwen’s ability to translate the complex into the simple, and the simple into the profound, was her superpower.

As the final year of university approached, the SDSS decided to embark on their most ambitious project yet—a **Predictive Human Index (PHI)**. Jerry’s grand vision of predicting emotions had evolved over time, and now the group had a tangible goal: to create a model that could analyze a person’s data footprint and predict not only their future actions but also their emotional state and overall well-being.

It was a project that required all of their strengths. Jerry provided the visionary framework, outlining the philosophical implications of predictive models on human behavior. Eugene focused on making the model scalable and functional in the real world, refining every part of the algorithm until it was a masterpiece of efficiency. Gangjoon, ever the hacker, found creative ways to pull in data from unconventional sources—everything from public forums to obscure government records. And Kaiwen, as always, turned the raw data into something visually compelling, making sure that the predictions were presented in a way that even non-experts could understand.

As they worked late into the night, fueled by endless cups of coffee and the occasional slice of pizza, they felt a sense of purpose like never before. This project was more than just an academic exercise—it was their legacy.

When they finally unveiled the PHI at the university’s data science symposium, the room fell silent. The model worked. It wasn’t perfect, of course, but it was a glimpse into the future of data science and human prediction. The audience, a mix of students, professors, and tech industry leaders, erupted into applause.

For the SDSS, this wasn’t the end—it was just the beginning. They had proven that with mathematics, coding, and a bit of imagination, the possibilities were limitless. They had learned from one another, grown together, and, most importantly, found joy in the world of numbers.

And so, as the university chapters of their lives came to a close, Jerry, Eugene, Gangjoon, and Kaiwen knew one thing for certain: no matter where the future took them, they would always be a team. Together, they had created something remarkable, and in the process, they had discovered not just the power of data, but the power of friendship.
'''




In [100]:
class VectorStore:
    def __init__(self):
        self.vectors = []
        self.texts = []

    def upsert(self, embedding, text):
        self.vectors.extend(embedding)
        self.texts.extend(text)

    def pretty_print_contexts(self, top_k_results):
        # Print the header
        print(f"{'Text':<50} {'Cosine Similarity Score':<25}")
        print("="*95)

        # Iterate over each result and print it formatted
        for result in top_k_results:
            text = result["Text"]
            similarity_score = f"{result['Cosine Similarity Score']:.4f}"  # Format score to 4 decimal places

            # Print each result with proper formatting
            print(f"{text:<50} {similarity_score:<25}")

    def retrieve(self, query_embedding, print_contexts, top_k=3):
        '''Retrieve the top k most similar vectors (and their texts) from the vector store'''
        # Stack the stored vectors into a 2D array of shape (n_vectors, embedding_dim)
        vectors = np.asarray(self.vectors)

        # np.dot() computes the dot product between each stored vector and query_embedding
        dot_products = np.dot(vectors, query_embedding)

        # L2 norm (length) of the query embedding, a scalar
        query_norm = np.linalg.norm(query_embedding)

        # L2 norm of each stored vector, a 1D array with one entry per vector
        vector_norms = np.linalg.norm(vectors, axis=1)

        # Cosine similarity = dot product divided by the product of the two norms
        cosine_similarities = dot_products / (query_norm * vector_norms)

        # np.argsort sorts in ascending order, so take the last k indices and reverse them
        top_k_similar_indices = np.argsort(cosine_similarities)[-top_k:][::-1]

        # Use the same index for the vector, its text and its score so they stay aligned
        top_k_results = []
        for index in top_k_similar_indices:
            top_k_results.append({"Vector": self.vectors[index],
                                  "Text": self.texts[index],
                                  "Cosine Similarity Score": cosine_similarities[index]})
        if print_contexts:
            self.pretty_print_contexts(top_k_results)

        # Results are already ordered from most to least similar
        return top_k_results


    def filter_similar_texts(self, retrieval_results):
        similar_contexts = ""

        for result in retrieval_results:
            similar_contexts += result["Text"]+"\n\n"

        return similar_contexts


class TextSplitter():

    def __init__(self, delimiters=None):
        if delimiters is None:
            # Use \n\n as delimiters for splitting if a delimiter is not specified by the user
            self.delimiters = "\n\n"
        else:
            self.delimiters = delimiters

    def split_text(self, text):
        '''This method takes in text and splits the text according to the delimiters specified'''
        # Here we are splitting by paragraphs
        chunks = text.split(self.delimiters)

        stripped_chunks =  []
        for chunk in chunks:
            cleaned_chunk = chunk.strip()
            # If this chunk is not simply an empty string
            if cleaned_chunk:
                stripped_chunks.append(cleaned_chunk)

        return stripped_chunks


class EmbeddingModel():

    def __init__(self, model_name):
        self.embeddings = []
        self.model = SentenceTransformer(model_name)

    def generate_embeddings(self, texts: list):
        # generate embeddings of all the texts, no for loop is necessary
        embeddings = self.model.encode(texts)
        return embeddings



class chatbot():

    def __init__(self):
        self.client = Groq(api_key = GROQ_API_KEY)

    def build_prompt(self, user_query, context):
        prompt = f"Using the context provided, answer the question.\n\nContext:{context}\n\nQuestion:{user_query}"
        return prompt

    def fetch_response(self, prompt, system_prompt):
        # Request a streamed chat completion from the Groq API
        stream = self.client.chat.completions.create(
            model=CHAT_MODEL,
            messages=[
                {
                    "role": "system",
                    "content": system_prompt
                },
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            temperature=0,  # Control the randomness of the output (lower means less random)
            max_tokens=1024,  # Limit the response length
            top_p=1,  # Nucleus sampling parameter (1 means the full token distribution is considered)
            stream=True,  # Enable streaming of the response chunks
            stop=None
        )

        # Print each streamed chunk as it arrives; delta.content can be None on the final chunk
        print("Chatbot: ", end="")
        for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="")
        print()


    def chat(self):
        print("Welcome to the SDS Story chatbot! Feel free to ask me anything about the SDS workshop members!")

        while True:
            query = input('User: ')

            if query.lower() in ['exit', 'quit']:
                print("Thank you for chatting!\n")
                break

            # No retrieval here: the raw query goes straight to the LLM
            self.fetch_response(query, CHAT_SYSTEM_PROMPT)


    def rag_pipeline(self, contextual_knowledge):
        text_chunks = text_splitter.split_text(contextual_knowledge)
        text_embeddings = embedding_model.generate_embeddings(text_chunks)
        vector_store.upsert(text_embeddings, text_chunks)

        print("Welcome to the SDS Story chatbot! Feel free to ask me anything about the SDS workshop members!")
        time.sleep(1)
        while True:
            query = input('User: ')

            if query.lower() in ['exit', 'quit']:
                print("Thank you for chatting!\n")
                break

            query_embedding = embedding_model.generate_embeddings([query])[0]

            retrieved_results = vector_store.retrieve(query_embedding, print_contexts=False)
            contexts_to_llm = vector_store.filter_similar_texts(retrieved_results)
            prompt = self.build_prompt(query, contexts_to_llm)
            self.fetch_response(prompt, RAG_SYSTEM_PROPMT)

In [101]:
vector_store = VectorStore()
text_splitter = TextSplitter()
embedding_model = EmbeddingModel(model_name=EMBEDDING_MODEL)
chatgpt = chatbot()

In [None]:
chatgpt.rag_pipeline(story)

In [None]:
chatgpt.chat()