To run this notebook, you need the vector database (`vector_db_`) already stored as a .zip file. You can download it from:
https://drive.google.com/drive/folders/1o2Y-BLNWNFT5zesysd2n3QZPtN0Jro3I?usp=drive_link

In [18]:
# Install the required libraries
# Note: After running this code, the kernel needs to be restarted.

!pip install \
    chromadb==0.5.5 \
    langchain-chroma==0.1.2 \
    langchain==0.2.11 \
    langchain-community==0.2.10 \
    langchain-text-splitters==0.2.2 \
    langchain-groq==0.1.6 \
    transformers==4.43.2 \
    sentence-transformers==3.0.1 \
    unstructured==0.15.0 \
    "unstructured[pdf]==0.15.0"




Run the code snippet below; it imports every required library.

In [19]:
import os
import pandas as pd
from transformers import pipeline

# LangChain modules (duplicate and deprecated import paths consolidated)
from langchain_community.document_loaders import (
    PyPDFDirectoryLoader,
    TextLoader,
    UnstructuredFileLoader,
)
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from langchain.text_splitter import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
    SentenceTransformersTokenTextSplitter,
    TokenTextSplitter,
)
from langchain_groq import ChatGroq

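# Set the API key for Groq.
# Never hard-code a secret in the notebook: read it from the environment,
# or prompt for it if it is not set.
from getpass import getpass

if not os.environ.get("GROQ_API_KEY"):
    os.environ["GROQ_API_KEY"] = getpass("Enter your Groq API key: ")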


# Download the Sentence Transformers embedding model from Hugging Face
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# embeddings = HuggingFaceEmbeddings()  # Use the same embedding model that was used to build the vector database.


There are two ways to load the vector databases and query the model: automatic and manual.

# Automatic entry of vector database


## Unzipping

In [None]:
import os
from zipfile import ZipFile

# Directory containing the zip files
zip_dir = "/content/"
extract_path = "/content/"  # Extraction path

# Iterate through all files in the directory
for file_name in os.listdir(zip_dir):
    if file_name.endswith(".zip"):  # Check if the file is a .zip file
        zip_path = os.path.join(zip_dir, file_name)
        print(f"Extracting {zip_path}...")

        # Unzip the file
        with ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(extract_path)
            print(f"Extraction completed for {file_name}!")

Extracting /content/vectordb_book2.zip...
Extraction completed for vectordb_book2.zip!
Extracting /content/vector_db_auto_parts.zip...
Extraction completed for vector_db_auto_parts.zip!
Extracting /content/vectordb_book1.zip...
Extraction completed for vectordb_book1.zip!
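
The loading step reads the databases from `/content/content/`, which suggests that the archives store their files under a leading `content/` folder. A minimal sketch, using only the standard library, for checking an archive's top-level entries before extracting it. The helper name `top_level_entries` is hypothetical and not part of the notebook:

```python
from zipfile import ZipFile


def top_level_entries(zip_path):
    """Return the sorted first path components stored in a zip archive."""
    with ZipFile(zip_path, "r") as zip_ref:
        # Member names inside a zip archive always use "/" as the separator.
        return sorted({name.split("/", 1)[0] for name in zip_ref.namelist() if name})
```

Calling `top_level_entries("/content/vectordb_book1.zip")` before extraction shows which folder the archive will create, so `vector_db_dir` can be set to match.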


## Reading the vector database

In [20]:
import os
from langchain_community.vectorstores import Chroma

# Directory containing vector databases
vector_db_dir = "/content/content/"
embedding_function = embeddings  # Ensure this is defined earlier in your code

# Initialize a list to store retrievers
retrievers = []

# Iterate through all subdirectories in the specified directory
for sub_dir in os.listdir(vector_db_dir):
    full_path = os.path.join(vector_db_dir, sub_dir)
    if os.path.isdir(full_path):  # Check if it's a directory
        print(f"Loading vector database from: {full_path}")

        # Load the vector database
        vectordb = Chroma(
            persist_directory=full_path,
            embedding_function=embedding_function
        )

        # Create a retriever and add it to the list
        retrievers.append(vectordb.as_retriever())

# Define a combined retriever function
def combined_retriever(query):
    combined_results = []
    for retriever in retrievers:
        # Retrieve documents from each retriever
        # (invoke replaces the deprecated get_relevant_documents)
        combined_results.extend(retriever.invoke(query))

    return combined_results

# Example Usage
query = "What is the price of a fuel injector for 2005 toyota rav 4 ?"
results = combined_retriever(query)
print(f"Retrieved {len(results)} documents.")

# Initialize the LLM
llm = ChatGroq(
    model="llama-3.1-70b-versatile",
    temperature=0
)

# Load a question-answering chain using the "map_reduce" chain type
qa_chain = load_qa_chain(llm, chain_type="map_reduce")


Loading vector database from: /content/content/chroma_auto_parts_2001
Loading vector database from: /content/content/auto_book_1
Retrieved 8 documents.
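
`combined_retriever` concatenates the hits from every database, so the same passage can appear more than once if two databases overlap. A minimal sketch of merging result lists while dropping exact duplicates. `Doc` is a hypothetical stand-in for a retrieved document (only a `page_content` attribute is assumed), `merge_unique` is not part of the notebook, and the texts are placeholders:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def merge_unique(result_lists):
    """Concatenate several result lists, keeping the first copy of each text."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc.page_content not in seen:
                seen.add(doc.page_content)
                merged.append(doc)
    return merged


# Placeholder texts: one passage appears in both result lists.
parts = [Doc("fuel injector listing"), Doc("oil filter listing")]
book = [Doc("fuel injector listing"), Doc("how injectors work")]
merged = merge_unique([parts, book])
print(len(merged))  # 3
```

In the notebook, the same idea could be applied to the list that `combined_retriever` builds before it is returned.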


## Chatbot prompting and chatting

In [21]:
# Create a prompt template
chatbot_prompt = """
You are an automotive parts assistant. When a user asks about their vehicle, you will refer to relevant documents and provide guidance in a concise, clear manner. Your goal is to stay under 1,000 tokens for each response, including all necessary details.

Here are the steps to follow:

1. **Identify the Vehicle and Parts**: Determine what part the user is asking about based on their question. If it’s unclear, ask clarifying questions to understand the model, make, and year of the vehicle.

2. **Provide Pricing**: Always provide the price for the part requested. If you don’t have the exact year of the vehicle the user asks about, provide the price for the earliest year available in your database and inform the user that you cannot help with the exact year requested. You should always mention the earliest year available, even if the part is not available for the requested year.

3. **Include Related Subcategories**: If the part requested falls under a category that has subcategories (e.g., "engine parts" has "fuel injectors", "oil filters", etc.), list those subcategories with their prices, if available.

4. **Mention Fluids**: If the part requested is related to fluids (e.g., oil, transmission fluid), also mention the fluids associated with the part and their availability/price.

5. **Maintain Chat Context**: Keep track of previous conversations and refer to them when necessary to provide consistent follow-up answers. For example, if the user has already asked about a part and later asks about another part from the same vehicle, refer back to previous details such as model, year, or part-related info.

6. **Structure the Response**:
   - Begin by acknowledging the vehicle type and confirming details, especially if the model year or make was mentioned.
   - Provide the price for the part.
   - List any relevant subcategories and related fluids.
   - If no exact match for the year is found, state the earliest year available in the database and explain the limitation.

**Keep responses concise, with no more than 1,000 tokens. If the response exceeds the token limit, trim unnecessary details.**

Conversation so far:
{chat_history}

User's question:
{user_input}
"""

# Define the chatbot function
def chatbot():
    print("Welcome to the Car Issue Chatbot! Type 'exit' to end the conversation.")

    # Initialize chat history
    chat_history = []

    while True:
        # Get user input
        query = input("\n **You**: ")

        # Exit condition
        if query.lower() == "exit":
            print("Chatbot: Goodbye!")
            break

        # Automatically add the user message to the chat history
        chat_history.append({"role": "user", "content": query})

        # Build the full prompt (instructions, chat history, and question).
        # Note: the chain call below receives only the raw query, not this prompt.
        prompt = chatbot_prompt.format(
            chat_history="\n".join([f"{message['role'].capitalize()}: {message['content']}" for message in chat_history]),
            user_input=query
        )

        # Retrieve documents from all loaded vector databases
        combined_results = combined_retriever(query)

        # Pass the retrieved documents and the user query to the chain
        response = qa_chain.invoke(
            {"input_documents": combined_results, "question": query},
            return_only_outputs=True,
        )

        # Print the result
        print(f"\n **SBG**: {response['output_text']}")

        # Store only the answer text in the chat history, not the whole response dict
        chat_history.append({"role": "assistant", "content": response["output_text"]})

# Run the chatbot
chatbot()

Welcome to the Car Issue Chatbot! Type 'exit' to end the conversation.

 **You**: My car's brke is not working

 **SBG**: Unfortunately, the provided text does not offer direct troubleshooting or repair information for a non-working brake. However, it does list some brake-related components for various Mercedes-Benz models, which might be relevant if you're looking to replace or repair your brake system.

It's recommended to consult a professional mechanic or the vehicle's repair manual for assistance in diagnosing and fixing the issue with your car's brake.

 **You**: 2001 BMW X5

 **SBG**: Based on the provided text, I can answer the following questions about the 2001 BMW X5:

1. What are the prices of the transmission systems for a 2001 BMW X5 1.8 Manual? 
- The prices are: 
  - Pro-Series OE: $773.48
  - Pro-Series OE Plus: $726.8
  - Brembo: $810.73
  - Centric: $798.1

2. What are the prices of the turn signals for a 2001 BMW X5 1.8 Manual? 
- The prices are: 
  - Pro-Series OE: 

KeyboardInterrupt: Interrupted by user
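
The prompt template expects a `{chat_history}` block, and the loop stores each turn as a dictionary with `role` and `content` keys. A minimal sketch of rendering that list as text while keeping only the most recent turns, so the prompt does not grow without bound. `format_history` and `max_turns` are hypothetical names, not part of the notebook:

```python
def format_history(chat_history, max_turns=6):
    """Render the last `max_turns` messages as 'Role: content' lines."""
    # Guard against max_turns <= 0: a slice of [-0:] would return the whole list.
    recent = chat_history[-max_turns:] if max_turns > 0 else []
    return "\n".join(
        f"{message['role'].capitalize()}: {message['content']}" for message in recent
    )


history = [
    {"role": "user", "content": "My car's brake is not working"},
    {"role": "assistant", "content": "Which make, model, and year?"},
    {"role": "user", "content": "2001 BMW X5"},
]
print(format_history(history, max_turns=2))
# Assistant: Which make, model, and year?
# User: 2001 BMW X5
```

This assumes every `content` value is a plain string; the window size is an arbitrary choice for illustration.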

## Using invoke method

In [None]:
# Create a prompt template
chatbot_prompt = """
You are an automotive parts assistant. When a user asks about their vehicle, you will refer to relevant documents and provide guidance in a concise, clear manner. Your goal is to stay under 1,000 tokens for each response, including all necessary details.

Here are the steps to follow:

1. **Identify the Vehicle and Parts**: Determine what part the user is asking about based on their question. If it’s unclear, ask clarifying questions to understand the model, make, and year of the vehicle.

2. **Provide Pricing**: Always provide the price for the part requested. If you don’t have the exact year of the vehicle the user asks about, provide the price for the earliest year available in your database and inform the user that you cannot help with the exact year requested. You should always mention the earliest year available, even if the part is not available for the requested year.

3. **Include Related Subcategories**: If the part requested falls under a category that has subcategories (e.g., "engine parts" has "fuel injectors", "oil filters", etc.), list those subcategories with their prices, if available.

4. **Mention Fluids**: If the part requested is related to fluids (e.g., oil, transmission fluid), also mention the fluids associated with the part and their availability/price.

5. **Maintain Chat Context**: Keep track of previous conversations and refer to them when necessary to provide consistent follow-up answers. For example, if the user has already asked about a part and later asks about another part from the same vehicle, refer back to previous details such as model, year, or part-related info.

6. **Structure the Response**:
   - Begin by acknowledging the vehicle type and confirming details, especially if the model year or make was mentioned.
   - Provide the price for the part.
   - List any relevant subcategories and related fluids.
   - If no exact match for the year is found, state the earliest year available in the database and explain the limitation.

**Keep responses concise, with no more than 1,000 tokens. If the response exceeds the token limit, trim unnecessary details.**

Conversation so far:
{chat_history}

User's question:
{user_input}
"""

# Define the chatbot function
def chatbot():
    print("Welcome to the Car Issue Chatbot! Type 'exit' to end the conversation.")

    # Initialize chat history
    chat_history = []

    while True:
        # Get user input
        query = input("\n **You**: ")

        # Exit condition
        if query.lower() == "exit":
            print("Chatbot: Goodbye!")
            break

        # Automatically add the user message to the chat history
        chat_history.append({"role": "user", "content": query})

        # Build the full prompt (instructions, chat history, and question).
        # Note: the chain call below receives only the raw query, not this prompt.
        prompt = chatbot_prompt.format(
            chat_history="\n".join([f"{message['role'].capitalize()}: {message['content']}" for message in chat_history]),
            user_input=query
        )

        # Retrieve documents from all loaded vector databases
        combined_results = combined_retriever(query)

        # Pass the retrieved documents and the user query to the chain
        response = qa_chain.invoke(
            {"input_documents": combined_results, "question": query},
            return_only_outputs=True,
        )

        # Print the result
        print(f"\n **SBG**: {response['output_text']}")

        # Store only the answer text in the chat history, not the whole response dict
        chat_history.append({"role": "assistant", "content": response["output_text"]})

# Run the chatbot
chatbot()

Welcome to the Car Issue Chatbot! Type 'exit' to end the conversation.

 **You**: My car is not starting

 **SBG**: Based on the provided text, here are some potential steps you can take to troubleshoot the issue:

1. Check the ignition system: The text suggests that the issue might be related to the ignition system. You can try checking the ignition system components, such as the spark plugs, ignition coil, or crankshaft position sensor (CKP).

2. Check for power in the run and crank positions: Use a test light to see if there is power in the run and crank positions. Consult a wiring diagram to identify the correct wires to check.

3. Check the battery: If your car was recently run and now won't start after sitting for the night, it might be related to a loss of surface charge. Try jump-starting the car or replacing the battery.

4. Check the starter and solenoid: The text mentions that replacing a solenoid and starter drive might be necessary to diagnose and repair starting system is

# Manual entry of vector database - Practice only

In [None]:
import os
from zipfile import ZipFile

# Directory containing the zip files
zip_dir = "/content/"
extract_path = "/content/"  # Extraction path

# Iterate through all files in the directory
for file_name in os.listdir(zip_dir):
    if file_name.endswith(".zip"):  # Check if the file is a .zip file
        zip_path = os.path.join(zip_dir, file_name)
        print(f"Extracting {zip_path}...")

        # Unzip the file
        with ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(extract_path)
            print(f"Extraction completed for {file_name}!")


Extracting /content/vector_db_parts_2001.zip...
Extraction completed for vector_db_parts_2001.zip!
Extracting /content/vectordb_book1.zip...
Extraction completed for vectordb_book1.zip!


In [None]:
# Retrieving the first vector database
persist_directory_1 = "/content/content/auto_book_1"
vectordb_1 = Chroma(
    persist_directory=persist_directory_1,
    embedding_function=embeddings
)

# Retrieving the second vector database
persist_directory_2 = "/content/content/chroma_auto_parts_2001"
vectordb_2 = Chroma(
    persist_directory=persist_directory_2,
    embedding_function=embeddings
)

# Retrieving the third vector database (disabled)
# persist_directory_3 = "/content/content/auto_book_2"
# vectordb_3 = Chroma(
#     persist_directory=persist_directory_3,
#     embedding_function=embeddings
# )

# Create retrievers for both vector databases
retriever_1 = vectordb_1.as_retriever()
retriever_2 = vectordb_2.as_retriever()
#retriever_3 = vectordb_3.as_retriever()

# Define a combined retriever function
def combined_retriever(query):
    # Retrieve documents from both retrievers
    # (invoke replaces the deprecated get_relevant_documents)
    results_1 = retriever_1.invoke(query)
    results_2 = retriever_2.invoke(query)
    #results_3 = retriever_3.invoke(query)

    # Combine results (you can sort or filter if needed)
    combined_results = results_1 + results_2 #+ results_3
    return combined_results

# Initialize the LLM
llm = ChatGroq(
    model="llama-3.1-70b-versatile",
    temperature=0
)

# Load a question-answering chain using the "map_reduce" chain type
qa_chain = load_qa_chain(llm, chain_type="map_reduce")  # alternatives: "stuff", "refine"




In [None]:
# Create a prompt template
chatbot_prompt = """
You are an automotive parts assistant. When a user asks about their vehicle, you will refer to relevant documents and provide guidance in a concise, clear manner. Your goal is to stay under 1,000 tokens for each response, including all necessary details.

Here are the steps to follow:

1. **Identify the Vehicle and Parts**: Determine what part the user is asking about based on their question. If it’s unclear, ask clarifying questions to understand the model, make, and year of the vehicle.

2. **Provide Pricing**: Always provide the price for the part requested. If you don’t have the exact year of the vehicle the user asks about, provide the price for the earliest year available in your database and inform the user that you cannot help with the exact year requested. You should always mention the earliest year available, even if the part is not available for the requested year.

3. **Include Related Subcategories**: If the part requested falls under a category that has subcategories (e.g., "engine parts" has "fuel injectors", "oil filters", etc.), list those subcategories with their prices, if available.

4. **Mention Fluids**: If the part requested is related to fluids (e.g., oil, transmission fluid), also mention the fluids associated with the part and their availability/price.

5. **Maintain Chat Context**: Keep track of previous conversations and refer to them when necessary to provide consistent follow-up answers. For example, if the user has already asked about a part and later asks about another part from the same vehicle, refer back to previous details such as model, year, or part-related info.

6. **Structure the Response**:
   - Begin by acknowledging the vehicle type and confirming details, especially if the model year or make was mentioned.
   - Provide the price for the part.
   - List any relevant subcategories and related fluids.
   - If no exact match for the year is found, state the earliest year available in the database and explain the limitation.

**Keep responses concise, with no more than 1,000 tokens. If the response exceeds the token limit, trim unnecessary details.**

Conversation so far:
{chat_history}

User's question:
{user_input}

"""

# Define the chatbot function
def chatbot():
    print("Welcome to the Car Issue Chatbot! Type 'exit' to end the conversation.")

    # Initialize chat history
    chat_history = []

    while True:
        # Get user input
        query = input("\n **You**: ")

        # Exit condition
        if query.lower() == "exit":
            print("Chatbot: Goodbye!")
            break

        # Automatically add the user message to the chat history
        chat_history.append({"role": "user", "content": query})

        # Create prompt for the current conversation context
        prompt = chatbot_prompt.format(
            chat_history="\n".join([f"{message['role'].capitalize()}: {message['content']}" for message in chat_history]),
            user_input=query
        )

        # Retrieve documents from both vector databases
        combined_results = combined_retriever(query)

        # Pass the retrieved documents to the chain
        response = qa_chain.run(
            input_documents=combined_results,
            question=prompt  # Passing the full prompt (instructions, chat history, and question)
        )

        # Print the result
        print(f"\n **SBG**: {response}")

        # Add assistant's response to chat history automatically
        chat_history.append({"role": "assistant", "content": response})

# Run the chatbot
chatbot()

Welcome to the Car Issue Chatbot! Type 'exit' to end the conversation.

 **You**: My tire is falt



Token indices sequence length is longer than the specified maximum sequence length for this model (1554 > 1024). Running this sequence through the model will result in indexing errors



 **SBG**: I think you meant to say "My tire is flat." I'd be happy to help you with that. Can you please tell me what kind of vehicle you have, including the make, model, and year? This will help me provide more accurate information about your tire.

However, I can provide some general information about tires. According to our documentation, tire-related topics include:

- Tire pressure monitoring systems (TPMSs)
- Tire construction
- Load ratings
- DOT tire codes
- Inspecting and installing tires

Additionally, we have information on tire services such as:

* Tire Inflation
* Checking Air Pressure
* Adjusting Tire Pressure
* Tire Wear
* Sidewall Checks
* Tire Rotation
* Removing and Tightening Lug Nuts
* Removing and Mounting Tires on Rims
* Inspecting the Tire and Wheel
* Valve Stem Service
* Tire Repair

Please provide more details about your vehicle so I can offer more specific guidance and pricing for the required services.

 **You**: 2001 Toyota Corolla brakes

 **SBG**: For you
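
The warning in the output above reports a 1554-token sequence against a 1024-token limit. One way to keep inputs small is to cap how much retrieved text is handed to the chain. A rough sketch, assuming a character budget as a crude stand-in for tokens (real token counts depend on the tokenizer). `cap_documents` and `max_chars` are hypothetical names, not part of the notebook:

```python
def cap_documents(texts, max_chars=4000):
    """Keep texts in order until adding the next one would exceed max_chars."""
    kept = []
    total = 0
    for text in texts:
        if total + len(text) > max_chars:
            break
        kept.append(text)
        total += len(text)
    return kept
```

In the notebook this could be applied to the `page_content` of the documents returned by `combined_retriever`; the 4000-character default is an arbitrary illustration, not a measured limit.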