<a href="https://colab.research.google.com/github/Nobobi-Hasan/RAG-Chatbots-for-Technical-Documentation/blob/main/RAG_Chatbots_for_Technical_Documentation_(DataCamp).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Project Instructions**\
The car manual HTML document has been loaded for you as 'car_docs'. Using Retrieval Augmented Generation (RAG) to make OpenAI's 'gpt-4o-mini' aware of the contents of 'car_docs', answer the following user query:

"The Gasoline Particular Filter Full warning has appeared. What does this mean and what should I do about it?"

* Store the answer to the user query in the variable 'answer'.

You're working for a well-known car manufacturer that is looking at implementing LLMs into vehicles to provide guidance to drivers. You've been asked to experiment with integrating car manuals with an LLM to create a context-aware chatbot. They hope that this context-aware LLM can be hooked up to text-to-speech software to read the model's response aloud.

As a proof of concept, you'll integrate several pages from a car manual that contains car warning messages and their meanings and recommended actions. This particular manual, stored as an HTML file, `mg-zs-warning-messages.html`, is from an MG ZS automobile, a compact SUV. Armed with your newfound knowledge of LLMs and LangChain, you'll implement Retrieval Augmented Generation (RAG) to create the context-aware chatbot.

**Note: Although we'll be using the OpenAI API in this project, you do not need to specify an API key.**

## Install the necessary packages

In [None]:
# Run this cell to install the necessary packages
import subprocess
import sys
from importlib import metadata

def install_if_needed(package, version):
    '''Install the pinned version of a package unless it is already installed, to keep library versions consistent.'''
    try:
        if metadata.version(package) == version:
            return
    except metadata.PackageNotFoundError:
        pass
    # Use the current interpreter's pip so the package lands in this environment
    subprocess.check_call([sys.executable, "-m", "pip", "install", f"{package}=={version}"])

install_if_needed("langchain-core", "0.3.72")
install_if_needed("langchain-openai", "0.3.28")
install_if_needed("langchain-community", "0.3.27")
install_if_needed("unstructured", "0.18.11")
install_if_needed("langchain-chroma", "0.2.5")
install_if_needed("langchain-text-splitters", "0.3.9")

In [None]:
# Import the required packages
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import UnstructuredHTMLLoader
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

## Load Data

In [None]:
# Load the HTML file with a LangChain document loader
loader = UnstructuredHTMLLoader(file_path="data/mg-zs-warning-messages.html")
car_docs = loader.load()

In [None]:
import os

# Load the models required to complete the exercise
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", openai_api_key=os.environ["OPENAI_API_KEY"])

## Split Data

In [None]:
# Split the loaded document using RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
car_docs_split = splitter.split_documents(car_docs)
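
To make the `chunk_size` and `chunk_overlap` settings above more concrete, here is a minimal sketch of a fixed-size sliding-window chunker in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter first tries to break on separators such as paragraph and line breaks and treats `chunk_size` as an upper bound. The helper name `chunk_text` and the synthetic sample string are assumptions made for illustration only.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows; each window repeats the last
    chunk_overlap characters of the previous one (illustrative helper)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

# Synthetic 2,500-character string standing in for the manual text
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])            # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True: neighbours share 200 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.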

## Chroma DB

In [None]:
# Create a Chroma vector store from the split documents
vectorstore = Chroma.from_documents(
    documents=car_docs_split,
    embedding=embeddings,
    collection_name="mg-zs-warning-messages"
)

In [None]:
# Create a retriever over the Chroma vector store that returns the 3 most similar chunks
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 3}
)
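
Conceptually, a `"similarity"` retriever with `k=3` embeds the query and returns the three stored chunks whose embeddings are closest to it. The sketch below illustrates that idea with cosine similarity on tiny hand-made vectors. The exact distance metric Chroma uses depends on how the collection is configured, so cosine similarity is an assumption here, and the helper names `cosine_similarity` and `top_k` as well as the toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

# Toy 3-dimensional "embeddings"; real embedding vectors are much longer
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
query_vec = [1.0, 0.0, 0.0]
print(top_k(query_vec, doc_vecs, k=3))  # [0, 1, 4]
```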

## Creating Prompt Template and Chain

In [None]:
# Define a chat prompt template for question answering
prompt_template = ChatPromptTemplate.from_template(
    """
    You are an assistant knowledgeable about MG ZS warning messages.
    Use the following context to answer the user's question.
    If the answer cannot be found in the context, say you don't know.

    Context:
    {context}

    Question:
    {question}

    Answer in a clear and concise manner.
    """
)

In [None]:
# Create the chain: retrieve context, fill the prompt, call the LLM, and parse the output to a string
from langchain_core.output_parsers import StrOutputParser

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)
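
In the chain above, the retriever's output (a list of document objects) is placed into `{context}` as-is, so the prompt receives the list's string representation, metadata included. One possible refinement is to join only the text of each chunk before it reaches the prompt. The sketch below shows such a helper; `format_docs` is a hypothetical name, and `FakeDocument` is a stand-in with placeholder text so the example runs without LangChain installed.

```python
from dataclasses import dataclass

@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing page_content like a LangChain Document."""
    page_content: str

def format_docs(docs):
    """Join the text of the retrieved chunks, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

# Placeholder chunks, not real manual content
docs = [FakeDocument("First chunk text."), FakeDocument("Second chunk text.")]
context = format_docs(docs)
print(context)
```

Under these assumptions, the helper could be wired into the chain by replacing `retriever` with `retriever | format_docs` in the input dictionary, since piping a plain function after a runnable wraps it as a runnable step.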

## Test

In [None]:
answer = chain.invoke("The Gasoline Particular Filter Full warning has appeared. What does this mean and what should I do about it?")

print(answer)