<a href="https://colab.research.google.com/github/AnnaDS/ChatBots/blob/main/ChatBot_with_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Create ChatBot with RAG from YouTube video or PDF document

### Installing required packages

In [7]:
! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain pypdf rapidocr-onnxruntime streamlit unstructured pdf2image pdfminer.six pikepdf pillow_heif langchain_experimental


Collecting langchain_community
  Downloading langchain_community-0.0.29-py3-none-any.whl (1.8 MB)
Collecting tiktoken
  Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)
Collecting langchain-openai
  Downloading langchain_openai-0.1.0-py3-none-any.whl (32 kB)
Collecting langchainhub
  Downloading langchainhub-0.1.15-py3-none-any.whl (4.6 kB)
Collecting chromadb
  Downloading chromadb-0.4.24-py3-none-any.whl (525 kB)
Collecting langchain
  Downloading langchain-0.1.13-py3-none-any.whl (810 kB)

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


#### Add API keys and other environmental variables

In [2]:
import json

# Replace 'your_file.json' with the path to your actual JSON file
file_path = '/content/drive/MyDrive/OpenAI_API/Keys.json'

# Read the JSON file
with open(file_path, 'r') as file:
    data = json.load(file)

# Extract the API keys by name
# Replace the key names with the ones used in your JSON file if different
OPENAI_API_KEY = data.get('OPENAI_API_KEY', None)  # Returns None if 'OPENAI_API_KEY' is not found
LANGSMITH_API_KEY = data.get('LANGSMITH', None)  # Returns None if 'LANGSMITH' is not found
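
`data.get(...)` returns `None` silently when a key is missing, and the problem only surfaces later when the value is assigned to `os.environ`. A minimal sketch of a stricter loader, assuming the same JSON layout as above; the helper name `load_api_keys` is hypothetical and not part of the notebook:

```python
import json


def load_api_keys(path, required=("OPENAI_API_KEY", "LANGSMITH")):
    """Read API keys from a JSON file and fail early if any are missing.

    `load_api_keys` is a hypothetical helper used for illustration.
    """
    with open(path, "r") as file:
        data = json.load(file)

    # Collect every required name that is absent or empty
    missing = [name for name in required if not data.get(name)]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")

    # Return only the keys that were asked for
    return {name: data[name] for name in required}
```

It could be called as `keys = load_api_keys(file_path)`, after which `keys["OPENAI_API_KEY"]` is guaranteed to be a non-empty value.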




To keep the API keys more secure, you can enter them manually inside the session instead:

In [3]:
from getpass import getpass
#print(f'Enter the OPEN AI API key')
#OPENAI_API_KEY = getpass()

In [4]:
#print(f'Enter the LANGCHAIN API key')
#LANGSMITH_API_KEY = getpass()
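
The two approaches can also be combined: use the value already present in the environment and prompt only when it is missing. This is a sketch under that assumption; `get_secret` and its `prompt_fn` parameter are hypothetical names, not part of the notebook.

```python
import os
from getpass import getpass


def get_secret(name, prompt_fn=getpass):
    """Return a secret from the environment, prompting only if it is absent.

    `get_secret` and `prompt_fn` are hypothetical names used for illustration.
    """
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed value, so it never appears in the cell output
        value = prompt_fn(f"Enter {name}: ")
        os.environ[name] = value
    return value
```

For example, `get_secret("OPENAI_API_KEY")` would prompt once per session and reuse the stored value afterwards.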

In [5]:
#OpenAI API key
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

#Setup LangSmith to trace development
from langsmith import Client
os.environ["LANGCHAIN_PROJECT"] = 'RAG_CHAT'
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = LANGSMITH_API_KEY
os.environ["LANGCHAIN_TRACING_V2"] = "true"

### **Creating ChatBot**

In [6]:
#Define models
#Full list of models https://platform.openai.com/docs/models/overview
GPT4 = 'gpt-4-0125-preview'
GPT3 = 'gpt-3.5-turbo-0125'

In [7]:
#Import ChatOpenAI class
from langchain_openai import ChatOpenAI

from langchain_core.runnables.history import RunnableWithMessageHistory
# Import ChatMessageHistory class that will store our chat history.
# Import chat prompt templates classes and Message placeholders classes
from langchain.memory import ChatMessageHistory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

#Define stop words for our chatbot
stop_words = ["exit", "quit", "stop"]

#Define chat history
chat_history = ChatMessageHistory()

#Adding messages to the chat history (optional)
# Add a user message to the chat history
chat_history.add_user_message("What day was ChatGPT launched?")
# Add an AI response message to the chat history
chat_history.add_ai_message("ChatGPT was launched on November 30, 2022.")
# Add a user message to the chat history
chat_history.add_user_message("Was it a successful launch?")

#Define LLM

Chat = ChatOpenAI(model = GPT3)

# Create a ChatPromptTemplate using messages
prompt = ChatPromptTemplate.from_messages(
    [
        # Define a system message as a tuple
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        # Add a placeholder for the stored chat history
        MessagesPlaceholder(variable_name="chat_history"),
        # Add the new user message
        ("human", "{input}"),
    ]
)

#Define the chain
Chat_chain = prompt | Chat

#Use RunnableWithMessageHistory as a wrapper to manage message history
Chain_with_message_history = RunnableWithMessageHistory(
    Chat_chain,
    #define access to chat history (one shared history for every session_id)
    lambda session_id : chat_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

# Perform chat runs
print("Starting the chat...")
while True:
    question = input("User: ")

    # Check if the user input matches a stop word
    if question.lower() in stop_words:
        print("Exiting the chat...")
        break

    #Generate AI response. The wrapper loads the stored history into the prompt
    #and saves the question and the answer after the call,
    #so nothing is added to chat_history by hand
    ai_response = Chain_with_message_history.invoke(
        {"input": question},
        {"configurable": {"session_id": "default"}},
    )

    #Display AI answer
    print(f"AI: {ai_response.content}")

    ##user_messages:
    ## What is the day today?
    ## What is you knowledge cut-off date?

Starting the chat...
User: What is the day today?
AI: I'm an AI assistant and I don't have real-time information. Could you please confirm today's date for me?
User: What is the day today?
AI: Today is Sunday.
User: What is you knowledge cut-off date?
AI: My responses are based on the information available up to the present time. I don't have a specific knowledge cut-off date, but I provide information based on the latest data and knowledge available. If you have any specific questions or need up-to-date information, feel free to ask!
User: exit
Exiting the chat...
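
To make the role of the message-history wrapper concrete, here is a plain-Python sketch of the load, call, save cycle that happens around each turn. It does not use LangChain; `fake_model` and `chat_turn` are hypothetical stand-ins, and the real `RunnableWithMessageHistory` may differ in detail.

```python
def fake_model(messages):
    """Stand-in for an LLM: reports how many messages it received."""
    return f"I can see {len(messages)} message(s)."


def chat_turn(history, question, model=fake_model):
    """Run one chat turn against a shared history list.

    history is a list of (role, text) tuples that is updated in place.
    """
    # 1. Load: the stored history plus the new question form the model input
    model_input = history + [("human", question)]
    # 2. Call the model once with the full conversation
    answer = model(model_input)
    # 3. Save: each message is appended exactly once
    history.append(("human", question))
    history.append(("ai", answer))
    return answer


history = []
print(chat_turn(history, "What is the day today?"))  # I can see 1 message(s).
print(chat_turn(history, "And tomorrow?"))           # I can see 3 message(s).
print(len(history))                                  # 4
```

Because every message is stored once, the model input grows by exactly two messages per turn.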


### **Creating ChatBot with RAG**

We'll build a chatbot that saves us from watching an entire YouTube presentation and lets us ask directly about the parts we are interested in. I'm very interested in the new AutoGluon time series package released by Amazon, and I want to build a chatbot to answer my questions.
The video we'll use is [Caner Turkmen, Oleksandr Shchur: AutoGluon - AutoML for Tabular, Multimodal and Time Series Data](https://www.youtube.com/watch?v=Lwu15m5mmbs&t=181s)

Install the libraries required to load data from YouTube

In [8]:
!pip install youtube-transcript-api pytube

Collecting youtube-transcript-api
  Downloading youtube_transcript_api-0.6.2-py3-none-any.whl (24 kB)
Collecting pytube
  Downloading pytube-15.0.0-py3-none-any.whl (57 kB)
Installing collected packages: pytube, youtube-transcript-api
Successfully installed pytube-15.0.0 youtube-transcript-api-0.6.2


#### Load data from YouTube

In [9]:
from langchain_community.document_loaders import YoutubeLoader

# Use the YoutubeLoader to load and parse the transcript of a YouTube video
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=Lwu15m5mmbs&t=181s", add_video_info=True)
video = loader.load()
video

[Document(page_content="hey hey everyone mic check is is everything okay like with the voice all right great and thanks Antonia for the introduction that was about half the talk so uh we'll we'll focus the rest of our time into like the nitty-gritty of the library so um first of all welcome everyone to our session on autoglo on autoglo on is our automl library that provides a variety of data modalities for you to work with uh myself John R and Alexander will be presenting the session my name is Jenner I've I work at AWS I'm a senior applied scientist I've been with the company for about four years now and in the general data science space for about a decade uh Alexander my colleague is also an applied scientist at AWS and together uh he got his PhD recently from the Technical University of Munich and together we work on the forecasting time series uh features within the General Auto go on framework so um let's start so I'm going to start with uh basically describing ml in a nutshell an

#### Split the video transcript into chunks


In [11]:
# Import text splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create an instance of RecursiveCharacterTextSplitter with custom chunk size and overlap
chunk_size = 300  # Adjust the chunk size as needed
chunk_overlap = 0  # Set the overlap between chunks

#Initiate splitter with desired parameters
splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

# Split the document into chunks using the RecursiveCharacterTextSplitter
splits = splitter.split_documents(video)

# Print the number of splits in the doc
print(f'Number of text splits in the document is: {len(splits)}')

# Print each split and a separator for readability
for split in splits:
    print(split)
    print("---")

Number of text splits in the document is: 134
page_content="hey hey everyone mic check is is everything okay like with the voice all right great and thanks Antonia for the introduction that was about half the talk so uh we'll we'll focus the rest of our time into like the nitty-gritty of the library so um first of all welcome everyone to our session on" metadata={'source': 'Lwu15m5mmbs', 'title': 'Caner Turkmen, Oleksandr Shchur:  AutoGluon - AutoML for Tabular, Multimodal and Time Series Data', 'description': 'Unknown', 'view_count': 921, 'thumbnail_url': 'https://i.ytimg.com/vi/Lwu15m5mmbs/hq720.jpg?sqp=-oaymwEmCIAKENAF8quKqQMa8AEB-AH-CYAC0AWKAgwIABABGGUgYyhOMA8=&rs=AOn4CLBQMQ6d2UBD3hwRitYgFYqgmOkZ1w', 'publish_date': '2023-06-20 00:00:00', 'length': 2602, 'author': 'PyData'}
---
page_content="autoglo on autoglo on is our automl library that provides a variety of data modalities for you to work with uh myself John R and Alexander will be presenting the session my name is Jenner I've 
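
The effect of `chunk_size` and `chunk_overlap` is easiest to see on a toy input. The sketch below is a simplified fixed-width chunker, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraphs and spaces); `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size, chunk_overlap=0):
    """Split text into fixed-width chunks that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghij"
print(chunk_text(text, chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(chunk_text(text, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With `chunk_overlap = 0`, as used here, a sentence cut at a chunk boundary does not appear whole in either chunk; a small overlap trades some duplicated text for that continuity.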

### Create embeddings

In [14]:
#Import vectorstore database and embeddings model
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Embeddings model
# Choose the embeddings model
# The OpenAI embedding models are described in https://openai.com/blog/new-embedding-models-and-api-updates
embeddings_model = OpenAIEmbeddings(model="text-embedding-ada-002")

#Define vector DB. Run this line of code only once.
#If you accidentally ran it more than once, delete the collection and create it again
vector_db = Chroma.from_documents(documents=splits, embedding=embeddings_model)

#Define retriever
retriever = vector_db.as_retriever()



In [15]:
#Code to delete db. (if needed)

# Delete the collection
#vector_db.delete_collection()
#print("Collection deleted successfully.")

#### Testing model

In [16]:
#Define question
question = 'What are the main features of AutoGluon for Time Series data?'

#Fetch 3 documents from vector store related to question
vector_db.similarity_search_with_score(question, k=3)

[(Document(page_content='much easier and get to the most accurate models with like maybe as little as three lines of code and in in the least amount of human involvement as possible Right and auto gluon does just that auto gluon is an automl Library framework that it aims primarily to democratize machine learning so as to', metadata={'author': 'PyData', 'description': 'Unknown', 'length': 2602, 'publish_date': '2023-06-20 00:00:00', 'source': 'Lwu15m5mmbs', 'thumbnail_url': 'https://i.ytimg.com/vi/Lwu15m5mmbs/hq720.jpg?sqp=-oaymwEmCIAKENAF8quKqQMa8AEB-AH-CYAC0AWKAgwIABABGGUgYyhOMA8=&rs=AOn4CLBQMQ6d2UBD3hwRitYgFYqgmOkZ1w', 'title': 'Caner Turkmen, Oleksandr Shchur:  AutoGluon - AutoML for Tabular, Multimodal and Time Series Data', 'view_count': 921}),
  0.3640797436237335),
 (Document(page_content="glowing which is time series forecasting and this is again quite different from table of data quite different from multimodal and Aragon can help you deal with this as well and so time series

Well, the chunks retrieved from the recorded talk are definitely not very good. Let's try the same thing using the paper [AutoGluon–TimeSeries: AutoML for Probabilistic Time Series Forecasting](https://arxiv.org/abs/2308.05566).

In [17]:
#First we delete the vector database so the transcript chunks
#do not mix with the chunks from the paper

# Delete the collection
vector_db.delete_collection()
print("Collection deleted successfully.")

In [18]:
#Import pdf loader.
from langchain_community.document_loaders import UnstructuredPDFLoader

#Define loader
loader_pdf = UnstructuredPDFLoader("/content/drive/My Drive/OpenAI_API/AutoGluon–TimeSeries.pdf")
#Load an article
article_pdf = loader_pdf.load()

#Print doc to check it out
#print(article_pdf)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [19]:
# Import text splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create an instance of RecursiveCharacterTextSplitter with custom chunk size and overlap
chunk_size = 750  # Adjust the chunk size as needed
chunk_overlap = 0  # Set the overlap between chunks

#Initiate splitter with desired parameters
splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

# Split the document into chunks using the RecursiveCharacterTextSplitter
splits = splitter.split_documents(article_pdf)

# Print the number of splits in the doc
print(f'Number of text splits in the document is: {len(splits)}')

# Print each split and a separator for readability
#for split in splits:
#    print(split)
#    print("---")

Number of text splits in the document is: 123


In [20]:
#Import vectorstore database and embeddings model
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Embeddings model
embeddings_model = OpenAIEmbeddings(model='text-embedding-ada-002')

#Define vector DB. Run this line of code only once.
#If you accidentally ran it more than once, delete the collection and create it again

vector_db = Chroma.from_documents(documents=splits, embedding=embeddings_model)

#Define retriever
retriever = vector_db.as_retriever()



In [21]:
#Define question
question = 'What are the main features of AutoGluon for Time Series data?'

#Fetch 3 documents from vector store related to question
vector_db.similarity_search_with_score(question, k=3)

[(Document(page_content='AutoGluon–TimeSeries enables users to generate accurate forecasts in a few lines of code. This democratizes machine learning, lowering the barrier to entry to forecasting for non-experts. At the same time, AutoGluon–TimeSeries can be used by experienced users to design highly accurate forecasting pipelines. More accurate forecasts can directly translate to real-world impact in various domains. For example, forecasting renewable energy generation is a crucial component of smart grid management (Tripathy and Prusty, 2021); accurately predicting demand leads to more efficient inventory management and increased revenue (Makridakis et al., 2022).', metadata={'source': '/content/drive/My Drive/OpenAI_API/AutoGluon–TimeSeries.pdf'}),
  0.18131937086582184),
 (Document(page_content='Abstract We introduce AutoGluon–TimeSeries—an open-source AutoML library for probabilistic time series forecasting.1 Focused on ease of use and robustness, AutoGluon–TimeSeries enables user
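
The number returned next to each document is a distance, so lower means closer: the best chunk from the paper scores about 0.18, against about 0.36 for the best chunk from the video transcript. Assuming the collection uses a squared-L2 metric (Chroma's default, as far as I know) and unit-length embeddings, neither of which the output above confirms, the distance relates to cosine similarity as d = 2 - 2·cos. A small sketch with made-up vectors:

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for illustration only
query = normalize([1.0, 2.0, 3.0])
doc = normalize([2.0, 1.0, 3.0])

distance = squared_l2(query, doc)
similarity = cosine_similarity(query, doc)

# For unit vectors the two measures carry the same information
assert math.isclose(distance, 2 - 2 * similarity)
print(f"distance={distance:.3f}, cosine similarity={similarity:.3f}")
```

Under those assumptions, a distance of 0.18 would correspond to a cosine similarity of roughly 0.91, and 0.36 to roughly 0.82.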

### **Test the responses for the question using data from Vector Database**

In [22]:
from langchain.prompts import ChatPromptTemplate
# Prompt
template = """Answer the question based on the following context:
{context}

Question: {question}
"""

#Define rag_prompt from template
rag_prompt = ChatPromptTemplate.from_template(template)

#Print the prompt to check that everything is ok
#rag_prompt

#Define LLM
RAG_llm = ChatOpenAI(model=GPT3)

#Define Chain
RAG_chain = rag_prompt | RAG_llm

#Assign docs
docs = vector_db.similarity_search(question, k=3)

#Chain to answer question based on defined docs
RAG_chain.invoke({"context":docs,"question": question})

# Create the Retrieval-Augmented Generation (RAG) chain with dynamic retrieval

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    # Define the input variables for the chain
    {"context": retriever, "question": RunnablePassthrough()}
    # Pipe the input through the RAG prompt template
    | rag_prompt
    # Pass the formatted prompt to the language model (LLM)
    | RAG_llm
    # Parse the LLM's output using the StrOutputParser
    | StrOutputParser()
)

#Invoke the chain

rag_chain.invoke("What are the main features of AutoGluon for Time Series data?")


'The main features of AutoGluon for Time Series data include:\n1. Ability to generate accurate forecasts with just a few lines of Python code\n2. Combines statistical models, machine-learning based forecasting approaches, and ensembling techniques\n3. Can generate both point and probabilistic forecasts\n4. Supports both static and time-varying covariates\n5. Designed for ease of use and robustness\n6. Demonstrates strong empirical performance on benchmark datasets'
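
In both chains above, the retrieved `Document` objects are placed into `{context}` as they are, so the prompt receives their string representation, metadata included. A common alternative is to join only the `page_content` fields first. A minimal sketch, using a small stand-in class instead of LangChain's `Document` so that it runs on its own; `Doc` and `format_docs` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each document, dropping the metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    Doc("AutoGluon-TimeSeries is an AutoML library.", {"source": "paper.pdf"}),
    Doc("It produces probabilistic forecasts.", {"source": "paper.pdf"}),
]
print(format_docs(docs))
```

In the chain, such a helper could be applied to the retriever output before the prompt, for example `{"context": retriever | format_docs, "question": RunnablePassthrough()}`.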

### **Build RAG BOT chain prompt template**

In [23]:
import os
from langchain.schema import SystemMessage, HumanMessage, AIMessage
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.vectorstores import Chroma  # Assuming this is the type of your vector_db

# Initialize the OpenAI Chat model
chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)


# Function to augment the prompt with contextual information from your vector database
def augment_prompt(query: str, vector_db):
    results = vector_db.similarity_search(query, k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augmented_prompt = f"""Using the contexts below, answer the query:

    Context:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

# Define stop words for our chatbot
stop_words = ["exit", "quit", "stop"]

# Initialize the chat history
chat_history = ChatMessageHistory()

# Define the initial system message and add it to the messages
initial_system_message = SystemMessage(content="You are a helpful assistant. Answer the questions based on the provided context.")
#messages = initial_system_message

# Define the chat prompt with augmented context outside the loop
rag_bot_prompt = ChatPromptTemplate.from_messages([
    initial_system_message,
    # Stored chat history
    MessagesPlaceholder(variable_name="chat_history"),
    # The context-rich prompt built from the user query is passed in from the loop
    ("human", "{input}"),
])

# Create the RAG LLM chain by piping the RAG prompt to the LLM
rag_bot_chain = rag_bot_prompt | chat
# Setup the chat chain with message history
rag_chain_with_message_history = RunnableWithMessageHistory(
        runnable=rag_bot_chain,  # Chain the prompt template with the ChatOpenAI instance
        get_session_history=lambda session_id: chat_history,
        input_messages_key="input",
        history_messages_key="chat_history"
    )

# Setup the interactive chat structure
while True:
    user_input = input("User: ")
    if user_input.lower() in stop_words:
        print("Exiting the chat...")
        break

    # Generate the augmented prompt with the new user query
    augmented_prompt = augment_prompt(user_input, vector_db)

    # Send the augmented prompt as the human message for this turn.
    # This prioritizes the information in the vector database over the model's general knowledge.
    # The wrapper loads the stored history and saves the new message and the reply,
    # so nothing is added to chat_history by hand.
    ai_response = rag_chain_with_message_history.invoke(
        {"input": augmented_prompt},
        {"configurable": {"session_id": "default"}},
    )

    # Display the AI response
    print(f"AI: {ai_response.content}")


User: What are the main features of AutoGluon for Time Series data?
AI: The main features of AutoGluon for Time Series data include:
1. Ease of use: Users can generate accurate forecasts with just a few lines of Python code, making forecasting more accessible to non-experts.
2. Robustness: AutoGluon combines conventional statistical models, machine-learning based forecasting approaches, and ensembling techniques to deliver high accuracy in a short training time.
3. Performance: AutoGluon demonstrates strong empirical performance on a variety of benchmark datasets, outperforming a range of forecasting methods.
4. Accessibility: By leveraging ensembles of diverse forecasting models, AutoGluon enables both non-experts and experienced users to design highly accurate forecasting pipelines efficiently.
User: What models does AutoGluon use to forecast Time Series data?
AI: AutoGluon uses a combination of models to forecast Time Series data, including:
1. Conventional statistical models
2. Mac