In order to run this, you should have vector databases `(vectorDB)` already stored as .zip. You can download it using the link
https://github.com/dheer183/Capstone-AI-Service-Bot/tree/1c3706ccd96bd8a637e9d1ac0d29006672da5deb/Service/vectorDB

In [19]:
!pip install chromadb langchain-chroma langchain langchain-community langchain-text-splitters langchain-groq langchain-huggingface --force-reinstall numpy==1.26.4

Collecting chromadb
  Using cached chromadb-0.6.3-py3-none-any.whl.metadata (6.8 kB)
Collecting langchain-chroma
  Using cached langchain_chroma-0.2.2-py3-none-any.whl.metadata (1.3 kB)
Collecting langchain
  Using cached langchain-0.3.20-py3-none-any.whl.metadata (7.7 kB)
Collecting langchain-community
  Using cached langchain_community-0.3.19-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-text-splitters
  Using cached langchain_text_splitters-0.3.6-py3-none-any.whl.metadata (1.9 kB)
Collecting langchain-groq
  Using cached langchain_groq-0.2.5-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-huggingface
  Using cached langchain_huggingface-0.1.2-py3-none-any.whl.metadata (1.3 kB)
Collecting numpy==1.26.4
  Using cached numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting build>=1.0.3 (from chromadb)
  Using cached build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB)
Collecting pydantic>=1.9 (from chromadb)
  Using cached 

Run the code snippet below it has every library included.

In [1]:
import os
import pandas as pd
from transformers import pipeline

# Langchain modules
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores import Chroma
from langchain_groq import ChatGroq

# Set API Key for Groq
os.environ["GROQ_API_KEY"] = "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"


# Download Sentence Transformers Embedding From Hugging Face
embeddings = HuggingFaceEmbeddings(model_name = 'sentence-transformers/all-MiniLM-L12-v2')
# embeddings = HuggingFaceEmbeddings() # Make sure you use the same embedders as you used to embed the original dataset into vector database.

## Unziping

In [3]:
import os
from zipfile import ZipFile

# Directory containing the zip files
zip_dir = "/content/"
extract_path = "/content/"  # Extraction path

# Iterate through all files in the directory
for file_name in os.listdir(zip_dir):
    if file_name.endswith(".zip"):  # Check if the file is a .zip file
        zip_path = os.path.join(zip_dir, file_name)
        print(f"Extracting {zip_path}...")

        # Unzip the file
        with ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(extract_path)
            print(f"Extraction completed for {file_name}!")

Extracting /content/parts_data_vectorDB.zip...
Extraction completed for parts_data_vectorDB.zip!
Extracting /content/service_data_vectorDB.zip...
Extraction completed for service_data_vectorDB.zip!


## Reading the vector database

In [9]:
import os
from langchain.vectorstores import Chroma

# Directory containing vector databases
vector_db_dir = "/content/content/"
embedding_function = embeddings  # Ensure this is defined earlier in your code

# Initialize a list to store retrievers
retrievers = []

# Iterate through all subdirectories in the specified directory
for sub_dir in os.listdir(vector_db_dir):
    full_path = os.path.join(vector_db_dir, sub_dir)
    if os.path.isdir(full_path):  # Check if it's a directory
        print(f"Loading vector database from: {full_path}")

        # Load the vector database
        vectordb = Chroma(
            persist_directory=full_path,
            embedding_function=embedding_function
        )

        # Create a retriever and add it to the list
        retrievers.append(vectordb.as_retriever())

# Define a combined retriever function
def combined_retriever(query):
    combined_results = []
    for retriever in retrievers:
        # Retrieve documents from each retriever
        combined_results.extend(retriever.get_relevant_documents(query))

    return combined_results

# Example Usage
query = "What is the price of a fuel injector for 2005 toyota rav 4 ?"
results = combined_retriever(query)
print(f"Retrieved {len(results)} documents.")

# Initialize the LLM
llm = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0
)

# Load a question-answering chain using the "stuff" chain type
qa_chain = load_qa_chain(llm, chain_type="map_reduce")

Loading vector database from: /content/content/parts_data
Loading vector database from: /content/content/service_data
Retrieved 8 documents.


## Chatbot finetuning and chatting

In [10]:
# Create a prompt template
chatbot_prompt = """
Automotive Assistant Protocol
Always follow these steps in order:

Vehicle Identification

If ANY of these are missing, ask immediately:
[Make], [Model], [Year], [Engine Size]

Example: "Please confirm your vehicle's engine size (e.g., 2.0L Turbo)."

Pricing Components

Parts: Mid-range OEM-equivalent only (always state brand + price)
Example: "Bosch 02+ Oxygen Sensor - $85"

Labor: (Hours × $130). If data missing:
"Labor estimate unavailable - consult local shop"

Fixed Services:
• Oil Change: $100 (5L oil + filter + labor)
• Tire Repair: $50 (puncture fix + balance)

Mandatory Checks

Engine Size Gate: For engine-dependent parts (filters, belts, pumps):
"Need engine size to continue (e.g., 3.5L V6)."

Year Handling: If part unavailable for requested year:
"No [Part] for 2015 - using 2012 version at $X."

Response Template

Copy
Confirming [Year] [Make] [Model] [Engine Size]:

[Service] Estimate:
• Part: [Brand/Name] - $X
• Labor: [Y] hrs × $130 = $Z
**TOTAL: $(X+Z)**

[For Oil Changes]:
Includes:
- 5W-30 Full Synthetic (5L)
- WIX XP Filter
- Labor & disposal

Related Services:
• [Fluid/Subpart 1] - $
• [Fluid/Subpart 2] - $
Rules

Always state: "Mid-range [Brand] selected for quality/value balance"

Never exceed 3 related items

If complex repair: "Professional installation strongly recommended"

Token limit: Strict 1,000 characters
Conversation so far:
{chat_history}

User's question:
{user_input}
"""

# Define the chatbot function
def chatbot():
    print("Welcome to the Car Issue Chatbot! Type 'exit' to end the conversation.")

    # Initialize chat history
    chat_history = []

    while True:
        # Get user input
        query = input("\n **You**: ")

        # Exit condition
        if query.lower() == "exit":
            print("Chatbot: Goodbye!")
            break

        # Automatically add the user message to the chat history
        chat_history.append({"role": "user", "content": query})

        # Create prompt for the current conversation context
        prompt = chatbot_prompt.format(
            chat_history="\n".join([f"{message['role'].capitalize()}: {message['content']}" for message in chat_history]),
            user_input=query
        )

        # Retrieve documents from both vector databases
        combined_results = combined_retriever(query)

        # Pass the combined results to the chain
        response = qa_chain.invoke(
            {"input_documents": combined_results, "question": query}, return_only_outputs=True # Passing the user query directly
        )

        # Print the result
        print(f"\n **SBG**: {response['output_text']}")

        # Add assistant's response to chat history automatically
        chat_history.append({"role": "assistant", "content": response})

# Run the chatbot
chatbot()

Welcome to the Car Issue Chatbot! Type 'exit' to end the conversation.

 **You**: 2000 Toyota Rav 4 Brkes

 **SBG**: I don't know the specific information about the brakes for a 2000 Toyota RAV4. The provided text does not contain relevant information about the brakes for this vehicle, although it does mention "Brembo" which is a brake manufacturer. However, the context is not directly related to the question. The only related text is "Rear Brake Reline 1.0 Resurface or replace rear brake rotors to restore proper braking function," but it does not provide specific information about the brakes for a 2000 Toyota RAV4.

 **You**: 2000 Toyota Corolla 2.4L Brakes

 **SBG**: The 2000 Toyota Corolla 2.4L has the following brake options mentioned:

1. Disc Brakes: 
   - Pro-Series OE PEDIS4806 for $201.95
   - Pro-Series OE Plus PSDIS1684 for $206.22

2. Drum Brakes: 
   - Brembo BODRU9921 for $188.94
   - Centric CCDRU1247 for $191.61

Additionally, the following services are mentioned for th