# Rosho: The Chatbot 

Rosho helps you find answers to questions about the NCERT chapter Sound. It can do the following:
* Retrieve relevant data and answer queries that are related to the text
* Greet appropriately
* Do math calculations when given Python-compatible math expressions

![demo.png](attachment:3d58ce5a-7a43-4117-a5a6-0cdf6ad22283.png)

Let's get into how Rosho works!

Note that this notebook only shows the inner workings. The FastAPI app and the chatbot UI can be found in main.py.

First, install all the necessary libraries:

In [34]:
!pip install -r requirements.txt






In [35]:
import os
import gradio as gr
from fastapi import FastAPI
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFaceHub
from langchain_community.vectorstores import FAISS
import asyncio
from transformers import pipeline
from langchain_community.document_loaders import PyPDFLoader
import uvicorn
from langchain.text_splitter import RecursiveCharacterTextSplitter
from fastapi.staticfiles import StaticFiles
from torch.cuda import is_available as if_gpu
from math import exp, log, log10, sqrt, fabs
import torch
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

## Setting up environment
Give your HuggingFace API Token here:

In [36]:
# Add your API key here if not added already 
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = 'abcd123'

## Hyperparameters
This is where you come back to fine-tune your model. This cell contains all the models, directories, threshold values and other parameters used in the code.

The classifier threshold makes the model ask for further clarification when a query cannot be classified confidently as Greeting/Math/Sound: if no label scores at least classifier_threshold, Rosho replies with a request to clarify.

In [37]:
model_llm = "google/flan-t5-large"
# model_llm = "sarvamai/sarvam-2b-v0.5"
model_classifier = "facebook/bart-large-mnli"
model_embeddings = 'sentence-transformers/all-mpnet-base-v2'
vectorDB_dir = 'ncert.index/'
source_directory = 'PDFs/'

labels = [
    "Greeting or identity question", 
    "Question about sound or waves", 
    "Physics or chemistry question (not sound/waves related)", 
    "Mathematical calculation"
]
template_replies = [
    'Hey, This is Rosho! I can help you with any doubts from the NCERT chapter Sound. ',
    'I am only equipped to do simple math calculations. Please provide Python compatible mathematical expressions. ',
    'Sorry, as of now I am only trained on the Chapter Sound from Physics NCERT Class 10 Textbook. ',
    'Sorry, I do not understand your question / Don\'t know that topic. Can you clarify a little more?'
]

classifier_threshold = 0.5

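These parameters control how the LLM generates text. max_new_tokens caps the length of the response, and repetition_penalty values above 1.0 discourage the model from repeating tokens it has already produced. Beam search is deterministic: at every step it keeps the num_beams most probable partial sequences and finally returns the best-scoring one, while higher length_penalty values favour longer sequences. Sampling instead draws each next token at random from the model's probability distribution. temperature rescales that distribution, so values below 1.0 make the output more focused and values above 1.0 make it more varied.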

In [38]:
max_new_tokens = 512                # Default for Flan T5 is None

# Beam Search
repetition_penalty = 1.0            # Default for Flan T5 is 1.0
length_penalty = 2.0                # Default for Flan T5 is 1.0
num_beams = 3                       # Default for Flan T5 is 1

# Sampling Search
temperature = 0.7                   # Default for Flan T5 is 1.0
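To make the effect of temperature concrete, here is a minimal sketch in plain Python. It does not call the LLM; the function name `softmax_with_temperature` and the toy scores are assumptions made for illustration only.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
toy_logits = [2.0, 1.0, 0.1]
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

A lower temperature puts more probability on the top-scoring token, so sampled text becomes more predictable. A higher temperature spreads the probability out.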

We split the text into chunks so that each piece is small enough to embed and to fit into the LLM prompt as context. The chunks overlap because, without overlap, a chunk might lose the context of what comes just before and after it.

In [39]:
chunk_size = 1000
chunk_overlap = 200
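The idea of overlapping chunks can be sketched with a plain sliding window over characters. This is a simplified illustration, not how RecursiveCharacterTextSplitter works internally. The helper name `sliding_chunks` and the sample sentence are assumptions.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "Sound travels as a longitudinal wave through a medium."
for chunk in sliding_chunks(sample, chunk_size=20, chunk_overlap=5):
    print(repr(chunk))
```

Each chunk starts with the last 5 characters of the previous one, so a sentence cut at a chunk boundary is still readable in the next chunk.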

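Sometimes the exact keywords matter, not just the semantics. So we use hybrid search, which combines semantic search (FAISS) with keyword search (BM25) and retrieves the k best results. alpha acts as a slider between the two: alpha = 1 relies only on semantic search and alpha = 0 only on keyword search.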

In [40]:
alpha_hybridsearch = 0.5
k = 5

### How to choose the right embedding? 

This is probably a tough question to answer because there are a lot of factors that come into play:
* Quality of Embedding
* Memory required
* Availability: Most of the lightweight and accurate embeddings are not open source (even though they call themselves "Open")

After trying out different embeddings, we have decided to go with HuggingFaceEmbeddings() with its default model 'sentence-transformers/all-mpnet-base-v2'

In [41]:
device = 0 if torch.cuda.is_available() else -1
llm = pipeline(model = model_llm, device = device)
classifier = pipeline("zero-shot-classification", model=model_classifier, device=device)
embeddings = HuggingFaceEmbeddings(model_name = model_embeddings)



## Helper Functions

1. format_prompt: modifies the prompt by prefixing it with context from the retrieved data, and suffixes it with "Explain in detail." to improve the response quality
2. calculator: evaluates math expressions

In [42]:
def format_prompt(results, query):
    user_prompt = results[0].page_content + ' \n ' + query
    user_prompt = user_prompt + ' Explain in detail.'
    return user_prompt

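def calculator(query):
    # Normalise the query: drop surrounding whitespace and question marks
    query = str(query)
    query = query.strip()
    query = query.strip("?")
    # "^" is XOR in Python, so map it to the power operator
    query = query.replace("^", "**")
    # NOTE: eval() executes arbitrary Python code; only use it with trusted input
    return str(eval(query))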
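Since eval() runs whatever Python it is given, a stricter alternative is to parse the expression and allow only arithmetic. The following is a minimal sketch of that idea using the standard ast module. It is not the calculator used in this notebook, and the names `safe_calculator`, `_eval_node` and the whitelists are assumptions.

```python
import ast
import math
import operator

# Whitelists: only these operators and functions may be evaluated.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"exp": math.exp, "log": math.log, "log10": math.log10,
          "sqrt": math.sqrt, "fabs": math.fabs}

def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def _eval_node(node):
    """Recursively evaluate a parsed expression, rejecting anything not whitelisted."""
    if isinstance(node, ast.Constant) and _is_number(node.value):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and not node.keywords):
        return _FUNCS[node.func.id](*[_eval_node(arg) for arg in node.args])
    raise ValueError("Unsupported expression")

def safe_calculator(query):
    # Same normalisation as calculator(): trim, drop "?", map "^" to "**"
    expr = str(query).strip().strip("?").replace("^", "**")
    tree = ast.parse(expr, mode="eval")
    return str(_eval_node(tree.body))

print(safe_calculator("exp(3)/(3^2)"))
```

Anything outside the whitelists, such as attribute access or a call to `__import__`, raises ValueError instead of being executed. Very large exponents can still be slow, so a production version would also want limits on operand size.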

## Create Vectorstore and Initialise
Embedding the data into vectors involves multiple steps:
1. Loading the data from the given source directory using PyPDFLoader
2. Using RecursiveCharacterTextSplitter to split the loaded data into manageable pieces or 'chunks'. This splitter first splits at paragraph level. If a piece still exceeds the chunk size, it moves on to the next separator (line breaks), and if the piece is still too large, to the one after that (word level). See https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/
3. Creating the vectorstore. FAISS is the library we use for efficient similarity search and clustering of dense vectors.

In [43]:
def create_vectorstore(directory, embeddings):
    documents = []
    for filename in os.listdir(directory):
        if filename.endswith('.pdf'):
            loader = PyPDFLoader(os.path.join(directory, filename))
            pdf = loader.load()
            documents.extend(pdf)
    
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    documents = text_splitter.split_documents(documents)
    vector_store = FAISS.from_documents(documents, embeddings)
    return vector_store
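The fallback from coarse to fine separators can be sketched in a few lines. This is a deliberately simplified illustration: unlike RecursiveCharacterTextSplitter, it neither merges small pieces back together nor adds overlap. The function name `recursive_split` and the default separator tuple are assumptions.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split on the coarsest separator first and recurse into finer ones
    only for pieces that are still longer than chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        pieces.extend(recursive_split(part, chunk_size, finer))
    return pieces

sample = "Sound is a wave.\n\nIt needs a medium to travel through."
print(recursive_split(sample, chunk_size=20))
```

The first paragraph fits within the limit and stays whole, while the second one is too long and gets broken down at word level.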

Embedding takes quite some time, depending on the embedding model you use and the amount of data you need to embed. Here the already-embedded data is saved in the directory, so we load it directly. If loading fails, the vector store is rebuilt from the PDFs and saved for next time.

In [44]:
vectorstore = None
faiss_retriever = None
def initialize_vectorstore(k=5):
    global vectorstore, faiss_retriever
    try:
        vectorstore = FAISS.load_local(vectorDB_dir, embeddings, allow_dangerous_deserialization=True)
        print("Vector store initialized successfully")
    except Exception as e:
        print(f"{e}: Creating VectorStore from {source_directory}")
        vectorstore = create_vectorstore(source_directory, embeddings)
        vectorstore.save_local(vectorDB_dir)
    faiss_retriever = vectorstore.as_retriever(search_kwargs={"k":k})

In [45]:
initialize_vectorstore(k)

Vector store initialized successfully


## Labelling the Prompt for Smart Responses
For example, if the user prompt is 'hi', Rosho does not need to retrieve relevant data from the vector DB. To make Rosho smarter, we label the query using a zero-shot classifier (BART). The label with the highest score is returned, provided that score reaches the threshold. Otherwise no label is returned.

In [46]:
def labelling_fun(query, threshold): 
    print("-"*20+"Classification"+"-"*20+"\n")
    label_dict = classifier(query, labels, multi_label=True)
    print('labelled sequence: ', label_dict, '\n')
    best_label, max_score = None, 0
    for label, score in zip(label_dict['labels'], label_dict['scores']):
        if score >= threshold:
            if score > max_score:
                max_score = score 
                best_label = label
    print('label with highest score: ', best_label, '\n')
    return best_label
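The thresholding logic can be tried without loading the classifier. The sketch below uses the scores from the 'hi' example output in this notebook, rounded to three decimals. The helper name `pick_label` is an assumption, and it is a compact equivalent of the loop above rather than the code Rosho runs.

```python
def pick_label(result, threshold):
    """Return the top-scoring label if it reaches the threshold, else None."""
    best_label, best_score = max(
        zip(result["labels"], result["scores"]), key=lambda pair: pair[1]
    )
    return best_label if best_score >= threshold else None

# Scores copied (rounded) from the "hi" example output
hi_result = {
    "labels": [
        "Greeting or identity question",
        "Question about sound or waves",
        "Mathematical calculation",
        "Physics or chemistry question (not sound/waves related)",
    ],
    "scores": [0.808, 0.536, 0.380, 0.156],
}

print(pick_label(hi_result, threshold=0.5))
print(pick_label(hi_result, threshold=0.9))
```

With a threshold of 0.5 the greeting label wins. Raising the threshold to 0.9 returns None, which is the case where Rosho asks the user to clarify.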

## Improving the RAG with Hybrid Search

To save compute, we run BM25 only on the shortlist that FAISS returns (the top 2*k chunks) and not on all the data again.

In [47]:
def hybrid_search(query, vectorstore, faiss_retriever, k=5, alpha=0.5):
    # Initializing BM25 only on the top 2*k FAISS results
    faiss_res = vectorstore.similarity_search_with_score(query, 2*k)
    faiss_docs = [res[0] for res in faiss_res]
    bm25_retriever = BM25Retriever.from_documents(faiss_docs)
    bm25_retriever.k = k
    # Initializing a Hybrid Retriever
    hybrid_retriever = EnsembleRetriever(
        retrievers = [faiss_retriever, bm25_retriever],
        weights = [alpha, 1-alpha]
    )
    return hybrid_retriever.invoke(query)
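To see how two ranked lists and a weight pair can be merged into one ranking, here is a minimal sketch of weighted Reciprocal Rank Fusion. To our understanding this is the scheme EnsembleRetriever applies, but treat that as an assumption. The function name `weighted_rrf`, the constant `c=60` and the document ids are all illustrative.

```python
def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists: each list adds weight / (rank + c) to a document's score."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

semantic_ranking = ["d1", "d2", "d3"]   # hypothetical FAISS order
keyword_ranking = ["d3", "d1", "d4"]    # hypothetical BM25 order

alpha = 0.5
print(weighted_rrf([semantic_ranking, keyword_ranking], [alpha, 1 - alpha]))
```

Documents that rank well in both lists rise to the top. Setting alpha to 1 reproduces the semantic order for the documents FAISS returned, and setting it to 0 reproduces the keyword order.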

## Responding to a query
This involves the following steps:
1. Labelling the query to decide on the appropriate response
2. Returning the appropriate response for that label
3. If the query is to be answered by extracting relevant data from the vector DB (i.e. the label is "Question about sound or waves"):
    1. Retrieving relevant data with hybrid search, ranked by similarity. On the FAISS side, the similarity score measures the distance between the embedding of the query and the embedding of each chunk.
    2. Formatting the user prompt to contain the context extracted using RAG before sending it to the LLM
    3. Sending the modified user prompt to the LLM to get its response. We also use repetition_penalty to avoid repetition of sentences/phrases by the LLM
4. If the query is not about the textbook, one of the pre-decided responses is returned.
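Ranking by embedding distance can be illustrated with toy vectors. This sketch assumes Euclidean (L2) distance, where a smaller value means a closer match. The two-dimensional vectors and chunk names are made up, since real embeddings have hundreds of dimensions.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 2-D embeddings for a query and two chunks
query_vec = [1.0, 0.0]
chunk_vecs = {"chunk_a": [0.9, 0.1], "chunk_b": [0.0, 1.0]}

# Smallest distance first, i.e. most similar chunk first
ranked = sorted(chunk_vecs, key=lambda name: l2_distance(query_vec, chunk_vecs[name]))
print(ranked)
```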

In [48]:
def api_calling(query: str):
    try:
        if vectorstore is None:
            return {"error": "Vector store not initialized"}

        label = labelling_fun(query, classifier_threshold)

        if labels[1] == label:
            results = hybrid_search(query, vectorstore, faiss_retriever, k, alpha_hybridsearch)
            print("-"*15 + "Retrieval Augmented Prompt" + "-"*15+"\n")
            user_prompt = format_prompt(results, query)
            print(user_prompt,'\n\n')

            # Beam Search
            response = llm(user_prompt,
                repetition_penalty=repetition_penalty,
                max_new_tokens = max_new_tokens,
                num_beams = num_beams,
                length_penalty = length_penalty
            )[0]['generated_text']

            # Sampling Search
            # response = llm(user_prompt,
            #     repetition_penalty=repetition_penalty,
            #     max_new_tokens = max_new_tokens,
            #     temperature=temperature,
            #     do_sample=True
            # )[0]['generated_text']

        elif labels[0]  == label:
            response = template_replies[0]
        elif labels[3]  == label:
            try:
                response = calculator(query)
            except Exception as e:
                print(e)
                response = template_replies[1]
        elif labels[2]  == label:
            response = template_replies[2]
        else:
            response = template_replies[3]
        print("-"*24 + "Response" + "-"*24, '\n')
        return response
    except Exception as e:
        return str(e)

## Example Queries:

In [49]:
api_calling("hi")

--------------------Classification--------------------

labelled sequence:  {'sequence': 'hi', 'labels': ['Greeting or identity question', 'Question about sound or waves', 'Mathematical calculation', 'Physics or chemistry question (not sound/waves related)'], 'scores': [0.8081603646278381, 0.5356581211090088, 0.3798612356185913, 0.15616482496261597]} 

label with highest score:  Greeting or identity question 

------------------------Response------------------------ 



'Hey, This is Rosho! I can help you with any doubts from the NCERT chapter Sound. '

In [50]:
api_calling("exp(3)/(3^2)")

--------------------Classification--------------------

labelled sequence:  {'sequence': 'exp(3)/(3^2)', 'labels': ['Mathematical calculation', 'Question about sound or waves', 'Physics or chemistry question (not sound/waves related)', 'Greeting or identity question'], 'scores': [0.9413856863975525, 0.3731705844402313, 0.235739067196846, 0.13816212117671967]} 

label with highest score:  Mathematical calculation 

------------------------Response------------------------ 



'2.23172632479863'

In [51]:
api_calling("what is a longitudinal wave?")

--------------------Classification--------------------

labelled sequence:  {'sequence': 'what is a longitudinal wave?', 'labels': ['Question about sound or waves', 'Greeting or identity question', 'Mathematical calculation', 'Physics or chemistry question (not sound/waves related)'], 'scores': [0.9723053574562073, 0.008230606094002724, 0.004612513352185488, 0.00016234863142017275]} 

label with highest score:  Question about sound or waves 

---------------Retrieval Augmented Prompt---------------

slinky with the sound propagation in the
medium. These waves are called longitudinal
waves. In these waves the individual particles
of the medium move in a direction parallel to
the direction of propagation of the disturbance.
The particles do not move from one place to
another but they simply oscillate back and
forth about their position of rest. This is
exactly how a sound wave propagates, hence
sound waves are longitudinal waves.
There is also another type of wave, called
a transverse wa

'In these waves the individual particles of the medium move in a direction parallel to the direction of propagation of the disturbance. The particles do not move from one place to another but they simply oscillate back and forth about their position of rest. This is exactly how a sound wave propagates, hence sound waves are longitudinal waves.'