![A car dashboard with lots of new technical features.](images/dashboard.jpg)

You're working for a well-known car manufacturer that is looking at integrating LLMs into its vehicles to provide guidance to drivers. You've been asked to experiment with integrating car manuals with an LLM to create a context-aware chatbot. The manufacturer hopes that this context-aware LLM can be connected to text-to-speech software that reads the model's response aloud.

As a proof of concept, you'll integrate several pages from a car manual that list warning messages along with their meanings and recommended actions. This particular manual, stored as an HTML file, `mg-zs-warning-messages.html`, is from an MG ZS, a compact SUV. Armed with your newfound knowledge of LLMs and LangChain, you'll implement Retrieval Augmented Generation (RAG) to create the context-aware chatbot.

## Before you start

To complete the project, you will need to create a developer account with OpenAI and store your API key as a secure environment variable. The steps are outlined below.

### Create a developer account with OpenAI

1. Go to the [API signup page](https://platform.openai.com/signup). 

2. Create your account (you'll need to provide your email address and your phone number).

3. Go to the [API keys page](https://platform.openai.com/account/api-keys). 

4. Create a new secret key.

<img src="images/openai-new-secret-key.png" width="200">

5. **Take a copy of it**. (If you lose it, delete the key and create a new one.)

### Add a payment method

OpenAI sometimes provides free credits for the API, but this can vary depending on geography. You may need to add debit/credit card details. 

**This project should cost less than 1 US cent with GPT-3.5-Turbo (but if you rerun tasks, you will be charged each time).**

1. Go to the [Payment Methods page](https://platform.openai.com/account/billing/payment-methods).

2. Click Add payment method.

<img src="images/openai-add-payment-method.png" width="200">

3. Fill in your card details.

### Add an environment variable with your OpenAI key

1. In the workbook, click "Environment" in the top toolbar and select "Environment variables".

2. Click "Add" to add environment variables.

3. In the "Name" field, type "OPENAI_API_KEY". In the "Value" field, paste in your secret key.

<img src="images/datalab-env-var-details.png" width="500">

4. Click "Create". You'll then see the following pop-up window. Click "Connect", then wait 5-10 seconds for the kernel to restart, or restart it manually from the Run menu.

<img src="images/connect-integ.png" width="500">
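
Once the kernel has restarted, it can help to confirm that the variable is visible to Python before making any API calls. The following is a minimal sketch, assuming the variable name `OPENAI_API_KEY` from step 3; the helper names `get_api_key` and `mask` are hypothetical and not part of any library.

```python
import os

def get_api_key(name="OPENAI_API_KEY", environ=None):
    """Return the key stored in the environment, or raise a clear error."""
    # `environ` can be overridden for testing; by default the real environment is used
    environ = os.environ if environ is None else environ
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it under 'Environment variables' and reconnect.")
    return key

def mask(key):
    """Show only the start and end of the key so it never appears in full in notebook output."""
    return key[:3] + "..." + key[-4:] if len(key) > 8 else "***"
```

Calling `print(mask(get_api_key()))` in a cell shows whether the key was picked up without revealing it in full.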

### Update to Python 3.10

Due to how frequently the libraries required for this project are updated, you'll need to update your environment to Python 3.10:

1. In the workbook, click "Environment" in the top toolbar and select "Session details".

2. In the workbook language dropdown, select "Python 3.10".

3. Click "Confirm" and hit "Done" once the session is ready.
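
To confirm that the session switched interpreters, the running version can be checked from a cell. This is a small sketch; the helper `meets_minimum` is hypothetical, and treating 3.10 as a minimum (rather than an exact match) is an assumption.

```python
import sys

def meets_minimum(version_info, minimum=(3, 10)):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

# Report the interpreter version of the current kernel
status = "OK" if meets_minimum(sys.version_info) else "switch the session to Python 3.10"
print(sys.version.split()[0], status)
```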

In [122]:
# Update your environment to Python 3.10 as described above before running this cell
import subprocess
import sys
from importlib import metadata

def install_if_needed(package, version):
    '''Install the pinned version of a package unless that exact version is already present.'''
    try:
        if metadata.version(package) == version:
            return
    except metadata.PackageNotFoundError:
        pass
    # Use the kernel's own interpreter so pip installs into the active environment
    subprocess.check_call([sys.executable, "-m", "pip", "install", f"{package}=={version}"])

install_if_needed("langchain", "0.2.2")
install_if_needed("langchain-openai", "0.1.8")
install_if_needed("langchain-community", "0.2.3")
install_if_needed("unstructured", "0.14.4")
install_if_needed("chromadb", "0.5.0")

In [123]:
# Read your API key from the environment variable set up earlier
import os
openai_api_key = os.environ["OPENAI_API_KEY"]

# Import the required packages
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import UnstructuredHTMLLoader
from langchain_community.vectorstores import Chroma

In [124]:
# Load the HTML file with a LangChain document loader
loader = UnstructuredHTMLLoader(file_path="data/mg-zs-warning-messages.html")
car_docs = loader.load()

In [125]:
# Preview the car_docs document
car_docs[0].page_content[:2000]  # Display the first 2000 characters of the document content



In [126]:
# Split the document into smaller chunks for better processing
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # Set the chunk size to 1000 characters
    chunk_overlap=200  # Set the overlap between chunks to 200 characters
)

# Split the car_docs into chunks
car_docs_chunks = text_splitter.split_documents(car_docs)

# Preview the first chunk to ensure proper splitting
car_docs_chunks[0].page_content
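
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a simplified sketch of overlapping, fixed-width chunking in plain Python. It is not how `RecursiveCharacterTextSplitter` is implemented (that splitter also tries to break on separators such as paragraphs and sentences); the function `chunk_text` is a hypothetical helper for illustration only.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks

# Synthetic 2,500-character string standing in for the manual text
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The last 200 characters of each chunk reappear at the start of the next one, which reduces the chance that a warning message and its recommended action end up in separate chunks with no shared context.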



In [127]:
# Initialize the OpenAI embeddings
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Create a Chroma vector store
vector_store = Chroma(embedding_function=embeddings, collection_name="car_docs_embeddings")

# Add the document chunks to the vector store
vector_store.add_documents(car_docs_chunks)

['91e4a9f7-0b89-41ca-bcaa-25d52437f2f5',
 '360f9bda-6657-4519-b593-d2ef43240670',
 'fee76219-a434-4b3e-a37a-8932a370ff9e',
 '7e9c7c04-c102-433b-adff-52c1e31f3cd5',
 '0c1f3812-0b3f-4f63-922f-7eb805eaef6d',
 '4eb7c98f-69a1-4081-947d-4005bfd1f911',
 '122ce78e-ceed-4913-b004-264cb6829f28',
 'c0c21b31-6955-4cbf-8498-41af35265ae3',
 '7734769f-6116-454c-ab59-a21d038c5e14']

In [128]:
# Preview the car_docs_chunks stored in vector_store
# Run a similarity search with a blank query to fetch 8 of the stored chunks
retrieved_docs = vector_store.similarity_search(query=" ", k=8)

# Display the content of the last of the 8 retrieved chunks
retrieved_docs[7].page_content
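
Conceptually, a similarity search embeds the query as a vector and returns the stored chunks whose vectors are closest to it. The sketch below illustrates the idea with cosine similarity over hand-made three-dimensional vectors. The texts and vectors are toy values rather than content from the manual or real embeddings, cosine similarity is chosen here only for illustration (the metric a Chroma collection uses depends on its configuration), and `top_k` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "vector store": (text, embedding) pairs with made-up values
toy_store = [
    ("Engine coolant temperature high", [0.9, 0.1, 0.0]),
    ("Low tyre pressure", [0.1, 0.9, 0.1]),
    ("Brake fluid level low", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # stands in for the embedded user question
print(top_k(query_vec, toy_store, k=2))
```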



In [129]:
# Split the car manual HTML into chunks
splits = text_splitter.split_documents(car_docs)

In [130]:
# Initialize Chroma vectorstore with documents as splits and using OpenAIEmbeddings
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings(openai_api_key=openai_api_key))

In [131]:
# Define RAG prompt
prompt = PromptTemplate(input_variables=['question', 'context'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:")

# Initialize chat-based LLM with 0 temperature and using GPT-3.5 Turbo
model = ChatOpenAI(openai_api_key=openai_api_key, model_name="gpt-3.5-turbo", temperature=0)

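# Create a retriever from the vector store so the chain can fetch relevant chunks
retriever = vectorstore.as_retriever()

# Set up the chain: retrieve context, fill the prompt, then call the model
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
)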

# Initialize query
query = "The Gasoline Particulate Filter Full warning has appeared. What does this mean and what should I do about it?"

# Invoke the query
answer = rag_chain.invoke(query).content
print(answer)
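
It can be useful to see what the model actually receives at the prompt step. The sketch below fills the same template with plain Python string formatting. In the chain above, the retriever's output (a list of documents) is passed to the prompt as-is; joining the chunk texts first, as the hypothetical helper `format_docs` does here, is a common refinement rather than something this notebook does. The chunk strings are placeholders, not manual content.

```python
TEMPLATE = (
    "You are an assistant for question-answering tasks. Use the following pieces of "
    "retrieved context to answer the question. If you don't know the answer, just say "
    "that you don't know. Use three sentences maximum and keep the answer concise."
    "\nQuestion: {question} \nContext: {context} \nAnswer:"
)

def format_docs(chunks):
    """Join retrieved chunk texts into a single context string."""
    return "\n\n".join(chunks)

def build_prompt(question, chunks):
    """Fill the RAG template with the question and the joined context."""
    return TEMPLATE.format(question=question, context=format_docs(chunks))

# Placeholder chunks standing in for whatever the retriever returns
example_prompt = build_prompt(
    "What does this warning mean?",
    ["(retrieved chunk 1)", "(retrieved chunk 2)"],
)
print(example_prompt)
```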



In [134]:
# Initialize query
query = "The Gasoline Particulate Filter Full warning is showing, but the car doesn't appear to be malfunctioning. Is further checking required?"

# Invoke the query
answer = rag_chain.invoke(query).content
print(answer)

