# Project: Building RAG Chatbots for Technical Documentation

A well-known car manufacturer is looking at integrating large language models (LLMs) into its vehicles to help drivers. This project experiments with combining car manuals with an LLM to create a chatbot that understands the driver's situation. The hope is that this chatbot can be connected to a text-to-speech program that reads its answers aloud.

As a test, I'll use several pages from a car manual that explain car warning messages, their meanings, and what to do about them. This specific manual, saved as an HTML file named `mg-zs-warning-messages.html`, is for an MG ZS, which is a small SUV. Using what I've learned about `LLMs` and `LangChain`, I will build the chatbot using a technique called `Retrieval Augmented Generation (RAG)`.

> **Retrieval Augmented Generation (RAG):** A technique that gives large language models (LLMs) the ability to access and use up-to-date, external information to create more accurate and relevant responses.

> **LangChain:** A set of tools that helps developers build applications by connecting large language models (LLMs) with other sources of data.
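
To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-generate flow in plain Python. It is only an illustration: the snippets are invented placeholders (not text from the MG ZS manual), retrieval is done by simple word overlap rather than embeddings, and no LLM is called. The names `snippets`, `retrieve`, and `build_prompt` are hypothetical.

```python
# Toy illustration of the retrieve-then-generate flow; no LLM is called.
# The snippets are invented placeholders, not text from the MG ZS manual.
snippets = [
    "Low oil pressure: stop the car safely and check the oil level.",
    "Tyre pressure low: inflate the tyres to the recommended pressure.",
    "Washer fluid low: refill the washer fluid reservoir.",
]

def words(text):
    """Lowercase a string and split it into a set of bare words."""
    return {w.strip(".,:?!").lower() for w in text.split()}

def retrieve(question, docs, k=1):
    """Return the k snippets sharing the most words with the question."""
    q = words(question)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs):
    """Place the retrieved snippets in front of the question."""
    context = "\n".join(retrieve(question, docs))
    return f"Context:\n{context}\n\nQuestion: {question}"

rag_prompt_text = build_prompt("What should I do about low oil pressure?", snippets)
print(rag_prompt_text)
```

In a real RAG system the prompt built this way would be sent to the LLM, which answers using the retrieved context instead of relying only on what it learned during training.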





## Necessary packages

In [None]:
# Install the necessary packages
import subprocess
import sys
from importlib import metadata

def install_if_needed(package, version):
    '''Install the pinned version of a package unless it is already present.'''
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None
    if installed != version:
        # Use the current interpreter's pip so the package lands in this environment
        subprocess.check_call([sys.executable, "-m", "pip", "install", f"{package}=={version}"])

install_if_needed("langchain-core", "0.3.72")
install_if_needed("langchain-openai", "0.3.28")
install_if_needed("langchain-community", "0.3.27")
install_if_needed("unstructured", "0.18.11")
install_if_needed("langchain-chroma", "0.2.5")
install_if_needed("langchain-text-splitters", "0.3.9")

Defaulting to user installation because normal site-packages is not writeable


In [None]:
!python3 -m pip install --upgrade pip

Defaulting to user installation because normal site-packages is not writeable


In [None]:
# Import the required packages
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import UnstructuredHTMLLoader
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
import os

## Loading Data

Load the HTML file with a LangChain document loader.

In [None]:
loader = UnstructuredHTMLLoader(file_path="data/mg-zs-warning-messages.html")
car_docs = loader.load()

In [None]:
# Display the contents of the car_docs documents
for i, doc in enumerate(car_docs):
    print(f"Document {i+1}:\n{doc}\n")

Document 1:












## Load the models

In [None]:
# Load the models required to complete the exercise
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", openai_api_key=os.environ["OPENAI_API_KEY"])

## Splitting Data

The loaded document is split into smaller, overlapping chunks so that retrieval can return only the passages relevant to a question.

In [None]:
# Initialize the RecursiveCharacterTextSplitter with desired chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# Split the loaded documents into chunks
car_docs_chunks = text_splitter.split_documents(car_docs)
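
To show what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter first tries to break on separators such as paragraphs, lines, and spaces, so its chunks will differ. The function `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(toy_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one. The overlap reduces the chance that a sentence explaining a warning is cut in half with neither chunk containing the full instruction.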

In [None]:
# Display the contents of the car_docs_chunks
for i, chunk in enumerate(car_docs_chunks):
    print(f"Chunk {i+1}:\n{chunk}\n")

Chunk 1:

Chunk 2:

Chunk 3:

Chunk 4:

Chunk 5:

Chunk 6:

Chunk 7:

Chunk 8:

Chunk 9:




## Setting up Vector Store
The document chunks and their embeddings are stored in a Chroma vector database for efficient similarity search.

In [None]:
chroma_store = Chroma.from_documents(
    documents=car_docs_chunks,
    embedding=embeddings
)
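
As a rough picture of what similarity search does, the sketch below ranks a few stored vectors by cosine similarity to a query vector. The three-dimensional vectors are invented stand-ins for real embeddings, and cosine similarity is only one common choice: the distance metric and index that Chroma actually uses are configurable and may differ. The names `toy_store` and `top_k` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented 3-dimensional vectors standing in for real embeddings
toy_store = {
    "oil pressure warning": [0.9, 0.1, 0.0],
    "tyre pressure warning": [0.1, 0.9, 0.0],
    "washer fluid warning": [0.0, 0.2, 0.9],
}

def top_k(query_vector, store, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]

toy_query = [0.8, 0.3, 0.0]
print(top_k(toy_query, toy_store))
# ['oil pressure warning', 'tyre pressure warning']
```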

Create a retriever that fetches the most relevant chunks from the vector store.

In [None]:
retriever = chroma_store.as_retriever()

## Initializing LLM and Prompt

A ChatOpenAI model (gpt-4o-mini) and a ChatPromptTemplate are initialized to generate responses.

In [None]:
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0  # Temperature 0 gives more deterministic, less creative outputs
)

In [None]:
# Create a ChatPromptTemplate with .from_template(); it takes the retrieved context and the user's question
prompt_template = ChatPromptTemplate.from_template(
    "Use the following context to answer the question.\n\n"
    "Context: {context}\n\n"
    "Question: {question}\n\n"
    "Please provide a detailed response."
)

## Defining RAG Chain

A RAG chain is defined using LangChain Expression Language (LCEL), connecting the retriever (vector store), the prompt template, and the LLM.

In [None]:
# Define the RAG chain using LangChain Expression Language (LCEL)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
)
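
The data flow through the chain can be mimicked with plain Python functions. This is a sketch of the idea only, not LangChain's implementation: `ToyDoc`, `toy_retriever`, `toy_prompt`, `toy_llm`, and `toy_chain` are hypothetical stand-ins, and the pretend model just reports how much text it received. The `format_docs` helper is an optional step that is not part of the chain above; it joins the retrieved documents into a single string before they reach the prompt.

```python
from dataclasses import dataclass

@dataclass
class ToyDoc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str

def toy_retriever(question):
    """Pretend retriever: always returns the same two documents."""
    return [ToyDoc("First retrieved passage."), ToyDoc("Second retrieved passage.")]

def format_docs(docs):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)

def toy_prompt(inputs):
    """Fill a prompt from a dict with 'context' and 'question' keys."""
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"

def toy_llm(prompt_text):
    """Pretend model: reports the prompt length instead of generating text."""
    return f"(model received {len(prompt_text)} characters)"

def toy_chain(question):
    # Step 1: the same input goes to the retriever branch and is passed through unchanged
    inputs = {"context": format_docs(toy_retriever(question)), "question": question}
    # Step 2: fill the prompt, then Step 3: call the model
    return toy_llm(toy_prompt(inputs))

print(toy_chain("What does this warning mean?"))
```

In LCEL, the `|` operator plays the role of these nested calls: each step receives the output of the previous one.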

## Invoking RAG Chain
The RAG chain is invoked with a user query to retrieve relevant information from the manual and generate a context-aware response.

In [None]:
# Define the user query
user_query = "The Gasoline Particular Filter Full warning has appeared. What does this mean and what should I do about it?"

# Invoke chain with the user query
answer = rag_chain.invoke(user_query).content

# Display
print(answer)


1. **Stop Start System Fault**:
   - **Action**: The driver is advised to consult an MG Authorised Repairer as soon as possible.

2. **Clutch Switch Fault**:
   - **Description**: This message indicates a fault detected in the clutch switch.

3. **Gasoline Particular Filter Full**:
   - **Action**: The driver is advised to consult an MG Authorised Repairer as soon as possible.

4. **Ignition System Fault**:
   - **Description**: This indicates a fault in the ignition system.
   - **Action**: Immediate consultation with an MG Authorised Repairer is recommended.

5. **Low Oil Pressure**:
   - **Action**: The driver should stop the car safely, switch off the engine, check the engine oil level, and contact an MG Authorised Repairer as soon as possible.

6. **Engine Fault**:
   - **Description**: This message indicates a failure that will affect engine performance and emissions.
   - **Action**: The driver should contact an MG Authorised Repairer as soon as possible.

7. **Check Engine**:
