<a href="https://colab.research.google.com/github/darrenCWJ/Govtech_ABC_2024/blob/main/abc_week_4_part_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
---

<h1>Notebook: [ Week #04: Building your own RAG Bot ]</h1>

- Your objective in this notebook is to create a RAG Bot that allows users to interact with some notes from the AI Champions Bootcamp.
- A convenient way to work on this notebook is to open the earlier Jupyter Notebook in `Topic 4`. Yes, the notebook with pre-populated code cells.
- You can refer to it to see how a simple RAG Bot (or, more precisely, a RAG pipeline) is built.
- You may extend the functionalities of the bot as you wish.
- At a minimum, you should have a simple RAG Bot like the one in the earlier `Topic 4` Jupyter Notebook.


---
---

# Setup

In [1]:
!pip install openai
!pip install langchain
!pip install langchain-openai
!pip install langchain-experimental
!pip install langchain-chroma
!pip install pypdf
!pip install lolviz
!pip install chromadb
!pip install tqdm
!pip install tiktoken

# You may need to install other dependencies that you need for your project


In [2]:
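import os
import openai
from getpass import getpass

# Set up the OpenAI API key by setting the OPENAI_API_KEY environment variable
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key")

# Create the client that the helper functions below refer to as `client`
# (it reads the key from the OPENAI_API_KEY environment variable)
client = openai.OpenAI()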


Enter your OpenAI API Key··········


---

## Helper Functions

---

### Function for Generating Embedding

In [4]:
def get_embedding(input, model='text-embedding-3-small'):
    response = client.embeddings.create(
        input=input,
        model=model
    )
    return [x.embedding for x in response.data]

### Function for Text Generation

In [5]:
# This is the "Updated" helper function for calling LLM
def get_completion(prompt, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=256, n=1, json_output=False):
    # Request a JSON object only when the caller asks for it
    if json_output:
        output_json_structure = {"type": "json_object"}
    else:
        output_json_structure = None

    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create( #originally was openai.chat.completions
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=n,  # pass the caller's value through; only the first choice is returned below
        response_format=output_json_structure,
    )
    return response.choices[0].message.content

In [6]:
# This is a "modified" helper function that we will discuss in this session
# Note that this function directly takes in "messages" as the parameter.
def get_completion_by_messages(messages, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=1024, n=1):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=n  # pass the caller's value through; only the first choice is returned below
    )
    return response.choices[0].message.content

### Functions for Token Counting

In [7]:
# These functions are for calculating the tokens.
# ⚠️ These are simplified implementations that are good enough for a rough estimation.

import tiktoken

def count_tokens(text):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    return len(encoding.encode(text))

def count_tokens_from_message(messages):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    value = ' '.join([x.get('content') for x in messages])
    return len(encoding.encode(value))
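
A rough token count is mostly useful for keeping retrieved text within a context budget. The sketch below is an illustration only: `fit_to_budget` and `rough_count` are hypothetical names that do not appear in the notebook, and `rough_count` is a whitespace word count standing in for `count_tokens` so that the example runs without `tiktoken`.

```python
def fit_to_budget(chunks, max_tokens, count_fn):
    """Keep chunks in order until the next one would exceed max_tokens."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_fn(chunk)
        if used + cost > max_tokens:
            break  # stop at the first chunk that does not fit
        kept.append(chunk)
        used += cost
    return kept, used


def rough_count(text):
    # Stand-in counter (assumption): one "token" per whitespace-separated word.
    return len(text.split())


chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
kept, used = fit_to_budget(chunks, max_tokens=5, count_fn=rough_count)
print(kept, used)
```

In the notebook itself, passing `count_fn=count_tokens` would apply the same budget logic with the `tiktoken`-based estimate.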


---
---

# Create a "Chat with your Document" Bot

**\[ Overview of Steps in RAG \]**

- 1. **Document Loading**
	- In this initial step, relevant documents are ingested and prepared for further processing. This process typically occurs offline.
- 2. **Splitting & Chunking**
	- The text from the documents is split into smaller chunks or segments.
	- These chunks serve as the building blocks for subsequent stages.
- 3. **Storage**
	- The embeddings (vector representations) of these chunks are created and stored in a vector store.
	- These embeddings capture the semantic meaning of the text.
- 4. **Retrieval**
	- When an online query arrives, the system retrieves relevant chunks from the vector store based on the query.
	- This retrieval step ensures that the system identifies the most pertinent information.
- 5. **Output**
	- Finally, the retrieved chunks are used to generate a coherent response.
	- This output can be in the form of natural language text, summaries, or other relevant content.

![](https://abc-notes.data.tech.gov.sg/resources/img/topic-4-rag-overview.png)
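
The five steps above can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not the pipeline built later in this notebook: the "embedding" is a bag-of-words count vector instead of a real embedding model, the documents are made-up strings, and the final step only builds the prompt that would be sent to an LLM. All function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 1. Document Loading (made-up, in-memory documents)
documents = [
    "Temperature controls the randomness of the output. "
    "Top-p limits sampling to the most likely tokens.",
    "Hallucination means the model produces fluent text that is factually wrong.",
]

# 2. Splitting & Chunking: one chunk per sentence
chunks = [s.strip() for d in documents for s in re.split(r"(?<=\.)\s+", d) if s.strip()]

# 3. Storage: keep each chunk next to its vector
store = [(chunk, embed(chunk)) for chunk in chunks]


# 4. Retrieval: rank stored chunks by similarity to the query
def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# 5. Output: place the retrieved chunks in the prompt for the LLM
def build_prompt(query, k=1):
    context = "\n".join(retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("What does temperature control?"))
```

Word-count vectors only match exact words, which is the limitation that real embedding models are meant to address.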

---
---

## Document Loading

Here are the "notes" that you must include in your RAG pipeline as the `Documents`:
- [Key Parameters for LLMs](https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/2.-key-parameters-for-llms.html)
- [LLMs and Hallucinations](https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/3.-llms-and-hallucinations.html)
- [Prompting Techniques for Builders](https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/4.-prompting-techniques-for-builders.html)

You have three options.
1) 💪🏼 Take up the challenge to find a way to get the content directly from the webpages above.
2) 🥴 Go with the easy route, download the notes nicely prepared in a `.txt` format. Download the zipped file [here](https://abc-notes.data.tech.gov.sg/resources/data/notes.zip)
3) 😎 “Only children choose; adults take all.” Experiment with both data sources and see which one helps the Bot provide more accurate information for user queries.

---

> 💡 **Feel free to add as many code cells as you need.**

---

In [32]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_path=[
        "https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/2.-key-parameters-for-llms.html",
        "https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/3.-llms-and-hallucinations.html",
        "https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/4.-prompting-techniques-for-builders.html",
    ]
)

docs = loader.load()



In [34]:
docs

[Document(metadata={'source': 'https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/2.-key-parameters-for-llms.html', 'title': '2. Key Parameters for LLMs', 'description': 'AI Champions Bootcamp - 2. Key Parameters for LLMs', 'language': 'No language found.'}, page_content=' \n2. Key Parameters for LLMs\n\n\n\n\n\n\n\n\n\n\n\nicon: LiNotebookTabsCopyTitle: Key Parameters for LLMs\n\nTokens\nKey Parameters for LLM\nLLMs and Hallucination\nPrompting Techniques for Builders\nHands-on Walkthrough and Tasks\nKey Parameters for LLMs\n‚ú¶ For our Helper Function in the notebook, we only pass in three arguments to the create() method.\n# This is a function that send input (i.e., prompt) to LLM and receive the output from the LLM\ndef get_completion(prompt, model="gpt-4o-mini"):\n    messages = [{"role": "user", "content": prompt}]\n    response = client.chat.completions.create(\n        model=model,\n        messages=messages,\n        temperature=0, # this is the degree of r

In [36]:
!wget https://abc-notes.data.tech.gov.sg/resources/data/notes.zip
!unzip notes.zip

--2024-09-08 17:18:47--  https://abc-notes.data.tech.gov.sg/resources/data/notes.zip
Resolving abc-notes.data.tech.gov.sg (abc-notes.data.tech.gov.sg)... 13.32.151.8, 13.32.151.116, 13.32.151.79, ...
Connecting to abc-notes.data.tech.gov.sg (abc-notes.data.tech.gov.sg)|13.32.151.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13270 (13K) [application/zip]
Saving to: ‘notes.zip’


2024-09-08 17:18:48 (21.9 MB/s) - ‘notes.zip’ saved [13270/13270]

Archive:  notes.zip
   creating: notes/
  inflating: notes/2. Key Parameters for LLMs.txt  
  inflating: notes/3. LLMs and Hallucinations.txt  
  inflating: notes/4. Prompting Techniques for Builders.txt  


In [38]:
# Load notes from downloaded txt files
from langchain_community.document_loaders import TextLoader  # same package as WebBaseLoader above

notes = []
for file_path in ["/content/notes/2. Key Parameters for LLMs.txt",
                 "/content/notes/3. LLMs and Hallucinations.txt",
                 "/content/notes/4. Prompting Techniques for Builders.txt"]:
    loader = TextLoader(file_path) # Create a TextLoader for each file
    notes.extend(loader.load()) # Load the file and extend the notes list

In [39]:
notes

 Document(metadata={'source': '/content/notes/3. LLMs and Hallucinations.txt'}, page_content='\n\n<h1>Title: LLMs and Hallucinations</h1>\n\n\n# LLMs & Hallucinations\n- ✦ One important thing to take note of when using such AI powered by Large Language Models (LLMs) is that they often generate text that appears coherent and contextually relevant but is factually incorrect or misleading. \n\t- We call these **hallucination problems**. This issue arises due to the inherent nature of how LLMs are trained and their reliance on massive datasets. \n\t- While some of the models like ChatGPT go through a second phase in the training where humans try to improve the responses, there is generally no fact-checking mechanism that is built into these LLMs when you use them.\n\n- ✦ There is no easy foolproof safeguard against hallucination, although some system prompt engineering can help mitigate this. \n\t- What makes hallucination by LLM worse is that the responses are surprisingly real, even if t

## Splitting & Chunking

In [42]:
# < Your Code Here >
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=500,
    chunk_overlap=50,
    length_function=count_tokens
)


splitted_documents = text_splitter.split_documents(docs)


In [43]:
splitted_documents

[Document(metadata={'source': 'https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/2.-key-parameters-for-llms.html', 'title': '2. Key Parameters for LLMs', 'description': 'AI Champions Bootcamp - 2. Key Parameters for LLMs', 'language': 'No language found.'}, page_content='2. Key Parameters for LLMs\n\n\n\n\n\n\n\n\n\n\n\nicon: LiNotebookTabsCopyTitle: Key Parameters for LLMs\n\nTokens\nKey Parameters for LLM\nLLMs and Hallucination\nPrompting Techniques for Builders\nHands-on Walkthrough and Tasks\nKey Parameters for LLMs\n‚ú¶ For our Helper Function in the notebook, we only pass in three arguments to the create() method.\n# This is a function that send input (i.e., prompt) to LLM and receive the output from the LLM\ndef get_completion(prompt, model="gpt-4o-mini"):\n    messages = [{"role": "user", "content": prompt}]\n    response = client.chat.completions.create(\n        model=model,\n        messages=messages,\n        temperature=0, # this is the degree of rand
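
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter first breaks the text on the separators and then merges the pieces); it only illustrates, under that simplifying assumption, that consecutive chunks share `chunk_overlap` units. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size over tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


tokens = list(range(10))  # stand-in for 10 tokens
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last element of the previous one, so a sentence that straddles a chunk boundary is less likely to lose its context.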

## Storage: Embedding & Vectorstores

In [45]:
# < Your Code Here >
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

In [46]:
embeddings_model = OpenAIEmbeddings(model='text-embedding-3-small')


# Store into vector database
vector_store = Chroma.from_documents(
    collection_name="ai_champions_bootcamp_week_2",
    documents=splitted_documents,
    embedding=embeddings_model,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

In [47]:
# Show the number of documents in the vector store
vector_store._collection.count()

19

## Retrieval

In [48]:
# < Your Code Here >
vector_store.similarity_search('Zero Shot', k=3)

[Document(metadata={'description': 'AI Champions Bootcamp - 2. Key Parameters for LLMs', 'language': 'No language found.', 'source': 'https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/2.-key-parameters-for-llms.html', 'title': '2. Key Parameters for LLMs'}, page_content="\uf8ffüí° You don't have to worry about understanding the equation or memorizing it. \n\n\nIt's more for us to understand the intuition on where is the temperature being used\n\n\nSoftmax\n\n\n\nSoftmax with Temperature \n\n\n\nCalculations that are found on this page are for understanding the intuition behind the key parameters and do not represent the exact ways model providers code their algorithms\n\n‚ú¶ This applies to the calculations for temperature, top-K, and top-P\n\nTry out in notebook week 02\nThe live calculation to show the intuition of the Temperature  is included in the Notebook of this week. Try it out!\nTop-K\n‚ú¶ After the probabilities are computed, the model applies the Top-K s

In [49]:
vector_store.similarity_search_with_relevance_scores('Zero Shot', k=3)


[(Document(metadata={'description': 'AI Champions Bootcamp - 2. Key Parameters for LLMs', 'language': 'No language found.', 'source': 'https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/2.-key-parameters-for-llms.html', 'title': '2. Key Parameters for LLMs'}, page_content="\uf8ffüí° You don't have to worry about understanding the equation or memorizing it. \n\n\nIt's more for us to understand the intuition on where is the temperature being used\n\n\nSoftmax\n\n\n\nSoftmax with Temperature \n\n\n\nCalculations that are found on this page are for understanding the intuition behind the key parameters and do not represent the exact ways model providers code their algorithms\n\n‚ú¶ This applies to the calculations for temperature, top-K, and top-P\n\nTry out in notebook week 02\nThe live calculation to show the intuition of the Temperature  is included in the Notebook of this week. Try it out!\nTop-K\n‚ú¶ After the probabilities are computed, the model applies the Top-K 

## Question & Answer

In [50]:
# < Your Code Here >
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

In [51]:
qa_chain = RetrievalQA.from_chain_type(
    ChatOpenAI(model='gpt-4o-mini'),
    retriever=vector_store.as_retriever(search_kwargs={"k": 20})  # the number of chunks to retrieve is set through search_kwargs
)

qa_chain.invoke("Why LLM hallucinate?")

{'query': 'Why LLM hallucinate?',
 'result': "LLMs hallucinate because they generate text that appears coherent and contextually relevant but is factually incorrect or misleading. This issue arises from the inherent nature of how LLMs are trained, relying on massive datasets without a built-in fact-checking mechanism. When LLMs encounter questions they do not know the answer to, instead of admitting they don't know, they often produce a confident-sounding but incorrect response. This can lead to the dissemination of misinformation, making it crucial for users to fact-check the outputs."}

## Custom Prompt

In [52]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    ChatOpenAI(model='gpt-4o-mini'),
    retriever=vector_store.as_retriever(),
    return_source_documents=True, # Make inspection of document possible
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [53]:
qa_chain.invoke("Why LLM hallucinate?")

{'query': 'Why LLM hallucinate?',
 'result': "LLMs hallucinate because they generate text that seems coherent but can be factually incorrect or misleading, stemming from their reliance on vast datasets during training. They often lack a fact-checking mechanism, leading to confident yet incorrect responses when they don't know the answer. Thus, it's crucial to verify their outputs independently. Thanks for asking!",
 'source_documents': [Document(metadata={'description': 'AI Champions Bootcamp - 3. LLMs and Hallucinations', 'language': 'No language found.', 'source': 'https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/3.-llms-and-hallucinations.html', 'title': '3. LLMs and Hallucinations'}, page_content="3. LLMs and Hallucinations\n\n\n\n\n\n\n\n\n\n\n\nicon: LiNotebookTabsCopyTitle: LLMs and Hallucinations\n\nTokens\nKey Parameters for LLM\nLLMs and Hallucination\nPrompting Techniques for Builders\nHands-on Walkthrough and Tasks\nTable of Contents\n\nLLMs & Hallucin
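
The custom prompt has two placeholders, `{context}` and `{question}`. The sketch below shows, in plain Python, roughly how such a template gets filled. It assumes the retrieved chunks are simply concatenated into `{context}` with blank lines between them, which is my understanding of the default "stuff" behaviour and should be treated as an assumption. `fill_prompt` is a hypothetical helper, the chunk texts are made up, and the template is shortened.

```python
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def fill_prompt(template, retrieved_texts, question):
    # Assumption: all retrieved chunks are joined into one context string.
    context = "\n\n".join(retrieved_texts)
    return template.format(context=context, question=question)


prompt = fill_prompt(
    template,
    ["Chunk A.", "Chunk B."],  # made-up stand-ins for retrieved page_content
    "Why do LLMs hallucinate?",
)
print(prompt)
```

Printing the filled prompt like this is a quick way to check what the model is actually given before judging the quality of its answer.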