## How to Use RAG with Hugging Face Models

Install Dependencies

In [None]:
!pip install -q -U bitsandbytes==0.42.0
!pip install -q -U peft==0.8.2
!pip install -q -U trl==0.7.10
!pip install -q -U accelerate==0.27.1
!pip install -q -U datasets==2.17.0
!pip install -q -U transformers==4.41.0
!pip install langchain langchain-community langchain-core langchainhub sentence-transformers chromadb
!pip install tensorflow tf-keras


Get the Model You Want

In [25]:
from langchain_community.llms import HuggingFaceEndpoint
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# repository ID of the Gemma 2 2B instruction-tuned model used for testing
repo_id = "google/gemma-2-2b-it"

Define Variables

In [26]:
import os

# set your Hugging Face token as an environment variable, then read it here
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

# max_length caps the length (in tokens) of the generated text;
# temperature=0.1 makes the output more predictable and less random
llm = HuggingFaceEndpoint(
    task='text-generation',
    repo_id=repo_id,
    model="google/gemma-2-2b-it",
    max_length=1024,
    temperature=0.1,
    huggingfacehub_api_token=hf_token
)

max_length was transferred to model_kwargs.
Please make sure that max_length is what you intended.


Define Data Sources

In [28]:
import pandas as pd

# load your data
# health_data = pd.read_csv('../Health-Data-and-Scripts-for-Chatbot/data-with-sources.csv')
# work_data = pd.read_csv('../Work-Study-Data-and-Scripts/work-and-education-data.csv')
transit_data = pd.read_csv('../Transit-Data-Ques-Ans/vancouver_transit_qa_pairs.csv')

# health_data_sample = health_data
# work_data_sample = work_data
transit_data_sample = transit_data

# health_data_sample['text'] = health_data_sample['Question'].fillna('') + ' ' + health_data_sample['Answer'].fillna('')
# work_data_sample['text'] = work_data_sample['Theme'].fillna('') + ' ' + work_data_sample['Content'].fillna('')
transit_data_sample['text'] = transit_data_sample['question'].fillna('') + ' ' + transit_data_sample['answer'].fillna('')

Set Up the Embedding Model and Chroma Client, and Create Collections

In [29]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
import chromadb

# pretrained sentence-embedding model, a common general-purpose choice
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# persistent client for interacting with the Chroma vector store on disk
client = chromadb.PersistentClient(path="./chroma_db")

# create one collection per dataset (for testing, for now)
health_collection = client.get_or_create_collection(name="health_docs")
work_collection = client.get_or_create_collection(name="work_docs")
transit_collection = client.get_or_create_collection(name="transit_docs")

Function to Embed Data and Add It to a Collection

In [36]:
def add_data_to_collection(collection, data):
    for idx, row in data.iterrows():
        try:
            # get the embeddings using the embedding model for the documents
            embeddings = embedding_model.embed_documents([row['text']])[0]
            collection.add(
                ids=[str(idx)],
                embeddings=[embeddings],
                documents=[row['text']]
            )
        except Exception as e:
            print(f"Error on index {idx}: {e}")

# add data to collections
# add_data_to_collection(health_collection, health_data_sample)
# add_data_to_collection(work_collection, work_data_sample)
add_data_to_collection(transit_collection, transit_data_sample)

Insert of existing embedding ID: 0
Add of existing embedding ID: 0
Insert of existing embedding ID: 1
Add of existing embedding ID: 1
Insert of existing embedding ID: 2
Add of existing embedding ID: 2
Insert of existing embedding ID: 3
Add of existing embedding ID: 3
Insert of existing embedding ID: 4
Add of existing embedding ID: 4
Insert of existing embedding ID: 5
Add of existing embedding ID: 5
Insert of existing embedding ID: 6
Add of existing embedding ID: 6
Insert of existing embedding ID: 7
Add of existing embedding ID: 7
Insert of existing embedding ID: 8
Add of existing embedding ID: 8
Insert of existing embedding ID: 9
Add of existing embedding ID: 9
Insert of existing embedding ID: 10
Add of existing embedding ID: 10
Insert of existing embedding ID: 11
Add of existing embedding ID: 11
Insert of existing embedding ID: 12
Add of existing embedding ID: 12
Insert of existing embedding ID: 13
Add of existing embedding ID: 13
Insert of existing embedding ID: 14
Add of existing em
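The "existing embedding ID" messages above most likely appear because the cell was re-run against a persistent collection that already holds those IDs. One way to avoid them, sketched here under assumptions, is to ask the collection which IDs it already has and add only the missing rows in a single batched call. The helper `add_missing_rows`, the stand-in class `InMemoryCollection`, and `toy_embed` are hypothetical names introduced for this example; the stand-in mimics only the `get(ids=...)` and `add(...)` calls so the sketch runs without Chroma.

```python
def add_missing_rows(collection, ids, texts, embed_fn):
    """Add only rows whose IDs are not stored yet; return the IDs that were added."""
    existing = set(collection.get(ids=ids)["ids"])
    new_pairs = [(i, t) for i, t in zip(ids, texts) if i not in existing]
    if not new_pairs:
        return []
    new_ids = [i for i, _ in new_pairs]
    new_texts = [t for _, t in new_pairs]
    # embed all new texts in one call instead of one call per row
    collection.add(ids=new_ids, embeddings=embed_fn(new_texts), documents=new_texts)
    return new_ids


class InMemoryCollection:
    """Hypothetical stand-in that mimics only the get/add calls used above."""

    def __init__(self):
        self.rows = {}

    def get(self, ids):
        return {"ids": [i for i in ids if i in self.rows]}

    def add(self, ids, embeddings, documents):
        for i, e, d in zip(ids, embeddings, documents):
            self.rows[i] = (e, d)


def toy_embed(texts):
    # placeholder embedding: a single number per text (its length)
    return [[float(len(t))] for t in texts]


demo = InMemoryCollection()
first = add_missing_rows(demo, ["0", "1"], ["a", "bb"], toy_embed)
second = add_missing_rows(demo, ["0", "1", "2"], ["a", "bb", "ccc"], toy_embed)
print(first, second)
```

In this notebook the same helper could presumably be called with `transit_collection`, `[str(i) for i in transit_data_sample.index]`, `transit_data_sample['text'].tolist()`, and `embedding_model.embed_documents`. Chroma collections also expose `upsert`, which overwrites existing IDs instead of skipping them.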

Function to Retrieve the Most Relevant Document

In [40]:
def get_relevant_document(query, category):
    # map each category name to its collection
    collections = {
        "health": health_collection,
        "work": work_collection,
        "transit": transit_collection,
    }
    try:
        # fail early on an unknown category instead of hitting an unbound variable
        if category not in collections:
            raise ValueError(f"Unknown category: {category!r}")
        collection = collections[category]

        # embed the user query with the same model used for the documents
        query_embedding = embedding_model.embed_query(query)

        # query the collection for the single closest document
        results = collection.query(query_embeddings=[query_embedding], n_results=1)

        print(f"Query Results: {results}")

        # results['documents'] holds one list of matches per query
        documents = results.get('documents') or [[]]
        return documents[0][0] if documents[0] else None
    except Exception as e:
        print(f"Error querying: {e}")
        return None
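Conceptually, querying with `n_results=1` returns the stored document whose embedding lies closest to the query embedding. The sketch below shows that idea in plain Python with made-up three-dimensional vectors. It assumes squared Euclidean distance, which is Chroma's default metric as far as I know (a collection can be configured to use another one), and it does a brute-force scan rather than the approximate index a real vector store uses. All names here are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_document(query_vec, doc_vecs, docs):
    """Return the closest document and its distance via a brute-force scan."""
    distances = [squared_l2(query_vec, v) for v in doc_vecs]
    best = min(range(len(docs)), key=distances.__getitem__)
    return docs[best], distances[best]


# toy documents with hand-picked embeddings (illustrative values only)
toy_docs = ["bus trip planning", "MSP coverage", "work permits"]
toy_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.2, 0.9]]

best_doc, best_dist = nearest_document([1.0, 0.0, 0.0], toy_vecs, toy_docs)
print(best_doc, round(best_dist, 3))
```

A smaller distance means a closer match, so the `distances` field in the query results can serve as a rough relevance signal, for example to skip retrieval when the best match is still far away.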

Generate Answer

In [41]:
def generate_answer(query, category):
    # before RAG: ask the model directly, without any retrieved context
    response_before_rag = llm.invoke(f"Respond to this question: {query}")

    # get the relevant document
    relevant_document = get_relevant_document(query, category)
    if relevant_document is None:
        return f"Sorry, no relevant document found. Model's response before RAG: {response_before_rag}"

    # collapse whitespace, then truncate the document to at most 500 characters
    relevant_document = " ".join(relevant_document.split())
    MAX_DOC_LENGTH = 500
    relevant_document = relevant_document[:MAX_DOC_LENGTH]

    # rag_prompt = f"""
    # You are a helpful assistant for international students new to B.C. Here is a relevant document:

    # {relevant_document}

    # Please respond to the following question based on the document above:

    # Question: {query}

    # Answer:
    # """
    rag_prompt = f"""
    You are a helpful assistant for international students new to B.C. Here is a relevant document:

    {relevant_document}

    Please respond to the following question based on the document above. If you cannot answer it, or the student would need to ask a follow-up question, direct them to additional resources such as the Vancouver transit website or the transit mobile app for transit-related queries.

    Question: {query}

    Answer:
    """

    # print("Prompt being sent to model:")
    # print(rag_prompt)

    # now generate the answer with the retrieved document in the prompt
    response_after_rag = llm.invoke(rag_prompt)
    # print("Output from model:", response_after_rag)

    # return both responses to compare
    return {
        "Before RAG Response": response_before_rag,
        "After RAG Response": response_after_rag
    }
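The prompt-building step inside `generate_answer` can also be pulled out into a small helper, which makes it easier to inspect or test without calling the model. The sketch below is one possible version, not the notebook's own code: `build_rag_prompt` is a hypothetical name, the instruction text is shortened, the truncation backs up to the last space so a word is not cut in half, and `textwrap.dedent` removes the leading indentation that an indented f-string would otherwise send to the model.

```python
import textwrap


def build_rag_prompt(document, query, max_chars=500):
    """Collapse whitespace, truncate the document, and fill the prompt template."""
    cleaned = " ".join(document.split())
    if len(cleaned) > max_chars:
        # cut at the last space at or before the limit so no word is split
        cut = cleaned.rfind(" ", 0, max_chars + 1)
        cleaned = cleaned[:cut] if cut > 0 else cleaned[:max_chars]
    template = textwrap.dedent("""\
        You are a helpful assistant for international students new to B.C. Here is a relevant document:

        {document}

        Please respond to the following question based on the document above.

        Question: {query}

        Answer:""")
    return template.format(document=cleaned, query=query)


# illustrative input with messy whitespace, repeated to exceed the limit
long_doc = "TransLink   runs buses,\n SkyTrain and SeaBus. " * 30
demo_prompt = build_rag_prompt(long_doc, "How do I get to SFU?")
print(demo_prompt)
```

Note that the limit here counts characters, as in the cell above, not tokens; a token-based limit would need the model's tokenizer.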

Example Usage

In [42]:
user_query = "How do I commute in vancouver and how can I get to SFU?"
# user_query = "What do I need to do to apply for MSP coverage in B.C.?"
category = "transit"
# category = "health"
responses = generate_answer(user_query, category)

print("User Query:", user_query)
print("Response Before RAG:", responses["Before RAG Response"])
print("Response After RAG:", responses["After RAG Response"])



Query Results: {'ids': [['131']], 'embeddings': None, 'documents': [['How do I plan a bus trip in Vancouver? You can plan your trip using the TransLink website, Google Maps, or the TransLink mobile app. Enter your starting point and destination, and these tools will show you the best routes, including any transfers needed.']], 'uris': None, 'data': None, 'metadatas': [[None]], 'distances': [[0.7808046340942383]], 'included': [<IncludeEnum.distances: 'distances'>, <IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}




User Query: How do I commute in vancouver and how can I get to SFU?
Response Before RAG: 

**Here's a breakdown of commuting options in Vancouver and how to get to SFU:**

**1. Public Transit:**

* **TransLink:** Vancouver's public transit system is extensive and reliable. 
    * **SkyTrain:** The SkyTrain is the fastest option, connecting major areas like downtown Vancouver, Richmond, and Surrey. SFU is served by the Millennium Line.
    * **Bus:** Buses are a more affordable option, covering a wider range of routes.  
    * **SeaBus:** This ferry service connects North Vancouver to downtown Vancouver, offering a scenic route.
* **Getting to SFU:**
    * **SkyTrain:** Take the Millennium Line from Waterfront Station to SFU.
    * **Bus:** Several bus routes connect SFU to various parts of Vancouver. Check TransLink's website for specific routes and schedules.

**2. Driving:**

* **Traffic:** Vancouver traffic can be challenging, especially during peak hours.
* **Parking:** Parking on 

In [44]:
# verify how many documents each collection holds
health_docs = health_collection.get()
print("Number of documents in health collection:", len(health_docs['documents']))

work_docs = work_collection.get()
print("Number of documents in work collection:", len(work_docs['documents']))

Number of documents in health collection: 76
Number of documents in work collection: 878
