<a href="https://colab.research.google.com/github/WasudeoGurjalwar/Agentic_AI_Training/blob/main/LangChain_Components_Practice_Coding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Solve Real Problems using LangChain Components

- PromptTemplate
- document_loaders
- Vector Store ( **Embeddings** )
- Semantic Search
- (simple) **RAG** system over **Toyota Camry Hybrid 2022 Manual**
- **Assignment**

In [None]:
# Install latest compatible versions
!pip install -q -U \
  langchain \
  langchain-core \
  langchain-google-genai \
  langchain-openai

# Check version
import langchain_core
print(f"LangChain Core version: {langchain_core.__version__}")

In [None]:
import langchain
print(langchain.__version__) ## check your version

1.0.3


In [None]:
import pathlib
import textwrap
import getpass

import google.generativeai as genai

# ChatGoogleGenerativeAI ships in the langchain-google-genai package installed above
from langchain_google_genai import ChatGoogleGenerativeAI

# Used to securely store your API key (Colab)
from google.colab import userdata

from IPython.display import display
from IPython.display import Markdown

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))


In [None]:
## LLM API key setup
import os
from google.colab import userdata
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')

#os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Why is LangChain called LangChain?

Here's a short, intuitive example of the meaning behind the name LangChain: **you "chain" together components** (prompt, model, parser, etc.) so that the output of one becomes the input of the next.

Below, we use the Gemini API with LangChain's pipe (`|`) operator to build a simple chain:

**Prompt → Gemini LLM → Output Parser**
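
To make the "output of one becomes the input of the next" idea concrete before calling a real model, here is a toy sketch in plain Python. The `Step` class and the `fake_llm` stage are hypothetical stand-ins written for this illustration; they are not LangChain's own classes and no API call is made.

```python
class Step:
    """A tiny stand-in for a chainable component (not LangChain's implementation)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the three stages of the chain.
prompt = Step(lambda inputs: f"Tell me a fun fact about {inputs['topic']}.")
fake_llm = Step(lambda text: {"content": f"[model reply to: {text}]"})
parser = Step(lambda message: message["content"])

chain = prompt | fake_llm | parser
result = chain.invoke({"topic": "BOSCH"})
print(result)
```

The real chain in the next cell has the same shape: a dictionary goes in, each stage transforms it, and a plain string comes out.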

# PromptTemplate

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_google_genai.chat_models import ChatGoogleGenerativeAI
#from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
import os

# Set your Gemini API key
# os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY') ## REPLACE WITH YOUR GEMINI KEY
#os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')  # Replace with your OPENAI API key

# 1. Define a prompt template
prompt1 = PromptTemplate.from_template("Tell me a fun fact about {topic}.")

# 2. Set up the Gemini LLM
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite", temperature=1.0, google_api_key=userdata.get('GOOGLE_API_KEY'))
#llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.2)

# 3. Set up an output parser to extract the string
parser = StrOutputParser()

# 4. Chain them together: prompt | llm | parser
chain = prompt1 | llm | parser

# 5. Run the chain with an input
result = chain.invoke({"topic": "BOSCH"})

print(result)

Here's a fun fact about Bosch:

**Bosch developed one of the first successful electric drills in 1932, which was a game-changer for both DIYers and professionals, and it originally had a top speed of only 3,000 RPM!**

It's pretty wild to think about how far that technology has come, and that Bosch was at the forefront of such a transformative invention.


In [None]:
to_markdown(result)

> Here's a fun fact about Bosch:
> 
> **Bosch developed one of the first successful electric drills in 1932, which was a game-changer for both DIYers and professionals, and it originally had a top speed of only 3,000 RPM!**
> 
> It's pretty wild to think about how far that technology has come, and that Bosch was at the forefront of such a transformative invention.

In [None]:
# 1. Define a prompt template
# Create prompt template for OBD code analysis
prompt2 = PromptTemplate(
    input_variables=["obd_code", "vehicle_make", "mileage"],
    template="""
    Analyze this OBD-II diagnostic code for automotive repair:

    Vehicle: {vehicle_make}
    Mileage: {mileage}
    Error Code: {obd_code}

    Provide:
    1. Problem description in about 100 words, with bullet points
    2. Urgency level (Low/Medium/High)
    3. Estimated repair cost range in INR
    """
)

# 2. Set up the Gemini LLM
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite", temperature=1.0)


# 3. Set up an output parser to extract the string
parser = StrOutputParser()

# 4. Chain them together: prompt | llm | parser
chain = prompt2 | llm | parser

# 5. Run the chain with an input
result = chain.invoke({
    "obd_code": "P0301",
    "vehicle_make": "Toyota Camry",
    "mileage": "85000"
})

to_markdown(result)

> ## OBD-II Diagnostic Code Analysis: P0301 on a Toyota Camry (85,000 miles)
> 
> **1. Problem Description (P0301 - Cylinder 1 Misfire Detected)**
> 
> The P0301 trouble code indicates that the vehicle's Powertrain Control Module (PCM) has detected a misfire specifically in **Cylinder 1**. A misfire occurs when the combustion process in a cylinder fails to ignite fuel properly, resulting in a loss of power and incomplete combustion. This can be caused by a variety of issues affecting the ignition, fuel delivery, or air intake systems for that particular cylinder.
> 
> *   **What it means:** Cylinder 1 is not firing correctly.
> *   **Symptoms:** Rough idling, engine hesitation, reduced acceleration, potential engine stalling, illuminated check engine light.
> *   **Common causes:**
>     *   Faulty spark plug in Cylinder 1.
>     *   Damaged or worn spark plug wire/coil pack for Cylinder 1.
>     *   Issue with the fuel injector for Cylinder 1 (clogged, faulty).
>     *   Vacuum leak affecting Cylinder 1.
>     *   Low compression in Cylinder 1.
>     *   Issues with the engine control module (less common).
> 
> **2. Urgency Level:** **High**
> 
> A P0301 code signifies a significant problem that can impact drivability and potentially lead to further damage if left unaddressed. Ignoring a misfire can cause:
> 
> *   **Catalytic converter damage:** Unburnt fuel can enter the exhaust and overheat the catalytic converter, leading to expensive repairs.
> *   **Engine strain:** Other cylinders may be working harder, increasing wear.
> *   **Fuel inefficiency:** The engine will consume more fuel attempting to compensate for the misfire.
> *   **Poor performance:** The car will be noticeably less responsive and may even stall.
> 
> **3. Estimated Repair Cost Range (INR)**
> 
> The cost to repair a P0301 code can vary significantly depending on the root cause and the specific parts needed. Here's a general estimate:
> 
> *   **Spark Plug Replacement (per plug):** ₹200 - ₹800
> *   **Ignition Coil/Spark Plug Wire (per unit):** ₹1,500 - ₹4,000
> *   **Fuel Injector Replacement (per injector):** ₹2,000 - ₹7,000
> *   **Vacuum Leak Repair (minor):** ₹500 - ₹2,000
> *   **Compression Test/Diagnosis:** ₹1,000 - ₹2,500
> *   **Labor Costs:** ₹1,000 - ₹3,000 (depending on complexity and mechanic's rates)
> 
> **Therefore, the estimated repair cost range for a P0301 code on your Toyota Camry is approximately ₹3,000 to ₹15,000.**
> 
> **Important Note:** This is an estimate. The actual cost will depend on the diagnosis by a qualified mechanic. It's crucial to have the vehicle inspected promptly to identify the exact cause and get an accurate repair quote.
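
The `{vehicle_make}`, `{mileage}`, and `{obd_code}` placeholders in `prompt2` are filled in before the text reaches the model. As a rough sketch of that step, the snippet below uses only Python's standard `string.Formatter` to list the placeholders and to fail early if a variable is missing. The helper names `placeholder_names` and `fill` are made up for this example, and the shortened template is an assumption; this is not how `PromptTemplate` is implemented internally.

```python
from string import Formatter

# Shortened copy of the OBD template, for illustration only.
template = (
    "Vehicle: {vehicle_make}\n"
    "Mileage: {mileage}\n"
    "Error Code: {obd_code}"
)


def placeholder_names(text):
    """Return the set of {name} placeholders found in a format string."""
    return {name for _, name, _, _ in Formatter().parse(text) if name}


def fill(text, values):
    """Fill the template, raising a clear error if a variable is missing."""
    missing = placeholder_names(text) - set(values)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return text.format(**values)


filled = fill(
    template,
    {"vehicle_make": "Toyota Camry", "mileage": "85000", "obd_code": "P0301"},
)
print(filled)
```

Checking the variables up front gives a readable error message instead of a half-filled prompt being sent to the LLM.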

In [None]:
!pip install -q -U langchain-community pypdf

# Loaders - PyPDFLoader

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_google_genai.chat_models import ChatGoogleGenerativeAI

# Initialize Gemini LLM
## DONE IN ABOVE CODE CELL

# Load vehicle service manual
## visit https://www.toyota.com/owners/warranty-owners-manuals/vehicle/camry-hv/2022/
## I downloaded and used https://drive.google.com/file/d/1L-Vom51lSD2pYOGNcpgAmDzPEyE09yEq/view?usp=sharing
loader = PyPDFLoader("/content/MY22_Camry_HV_OM_Excerpt_for_Driving_Support_Systems_D2_ML_0208.pdf")
documents = loader.load()

# Extract specific maintenance info
maintenance_query = f"""
From this vehicle manual content:
{documents[0].page_content[:2000]}

Extract how to avoid malfunction of the front camera
for a 2022 Toyota Camry.
"""

response = llm.invoke(maintenance_query)
print(response.content)

Based on the provided vehicle manual excerpt, there is **no information directly addressing how to avoid malfunctions of the front camera** for a 2022 Toyota Camry.

The excerpt only lists the following:

*   **Toyota Safety Sense 2.5+ ...................................246**
*   **LTA (Lane Tracing Assist) ...................................267**


To find information on how to avoid front camera malfunctions, you would need to refer to the full Owner's Manual, specifically sections related to the "Toyota Safety Sense 2.5+" system and potentially a section on "Pre-Collision System" or other camera-dependent features, as these systems rely on the front camera.
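
The model could not answer because `documents[0]` is only the table of contents. Before choosing which text to send, a quick keyword scan shows which pages actually mention the topic. This is a minimal sketch: `pages_mentioning` is a hypothetical helper and `sample_pages` is made-up text standing in for `[doc.page_content for doc in documents]`.

```python
def pages_mentioning(page_texts, keyword):
    """Return 1-based positions of pages whose text contains the keyword."""
    needle = keyword.lower()
    return [
        position
        for position, text in enumerate(page_texts, start=1)
        if needle in text.lower()
    ]


# Made-up page texts; in the notebook you would pass the real page contents.
sample_pages = [
    "Toyota Safety Sense 2.5+ ... 246\nLTA (Lane Tracing Assist) ... 267",
    "To avoid malfunction of the front camera\nObserve the following precautions.",
    "Do not subject the front camera to a strong impact.",
]

hits = pages_mentioning(sample_pages, "front camera")
print(hits)
```

A plain substring match like this misses paraphrases, which is one reason the notebook later moves to semantic search.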


In [None]:
# Combine all PDF pages into one string
all_text = "\n".join([doc.page_content for doc in documents])

# Build the query from the combined manual text.
# Slice to 15,000 characters so the prompt stays within the model's context limits.
maintenance_query = f"""
From this vehicle manual content:
{all_text[:15000]}

Extract how to avoid malfunction of the front camera
for a 2022 Toyota Camry.
"""

response = llm.invoke(maintenance_query)
print(response.content)

Here's how to avoid malfunction of the front camera for a 2022 Toyota Camry, based on the provided manual excerpt:

**General Precautions:**

*   **Keep the windshield clean at all times.** This is the most important measure.
    *   If the windshield is dirty, covered with an oily film, water droplets, or snow, clean it.
    *   If a glass coating agent is applied, you will still need to use windshield wipers to clear water droplets from the area in front of the camera.
    *   If the inner side of the windshield where the camera is installed is dirty, contact your Toyota dealer.
*   **Remove fog, condensation, or ice.** If the area in front of the camera is fogged up or covered with condensation or ice, use the windshield defogger.
*   **Ensure proper windshield wiper function.** If water droplets cannot be properly removed by the wipers, replace the wiper insert or wiper blade.
*   **Do not apply window tint to the windshield.**
*   **Replace a damaged or cracked windshield.** After

In [None]:
to_markdown(response.content)

> Here's how to avoid malfunction of the front camera for a 2022 Toyota Camry, based on the provided manual excerpt:
> 
> **General Precautions:**
> 
> *   **Keep the windshield clean at all times.** This is the most important measure.
>     *   If the windshield is dirty, covered with an oily film, water droplets, or snow, clean it.
>     *   If a glass coating agent is applied, you will still need to use windshield wipers to clear water droplets from the area in front of the camera.
>     *   If the inner side of the windshield where the camera is installed is dirty, contact your Toyota dealer.
> *   **Remove fog, condensation, or ice.** If the area in front of the camera is fogged up or covered with condensation or ice, use the windshield defogger.
> *   **Ensure proper windshield wiper function.** If water droplets cannot be properly removed by the wipers, replace the wiper insert or wiper blade.
> *   **Do not apply window tint to the windshield.**
> *   **Replace a damaged or cracked windshield.** After replacement, the front camera must be recalibrated by a Toyota dealer.
> *   **Do not allow liquids to contact the front camera.**
> *   **Do not allow bright lights to shine into the front camera.**
> *   **Do not dirty or damage the front camera.** When cleaning the inside of the windshield, be careful not to let glass cleaner contact the lens. Do not touch the lens. If the lens is dirty or damaged, contact your Toyota dealer.
> *   **Do not attach objects to the outer side of the windshield in front of the camera.** This includes stickers, even transparent ones. The shaded area in the illustration on page 249 indicates the prohibited zone.
> *   **Do not subject the front camera to a strong impact.**
> *   **Do not change the installation position or direction of the front camera or remove it.**
> *   **Do not disassemble the front camera.**
> *   **Do not modify components around the front camera.** This includes the inside rear-view mirror or ceiling.
> *   **Do not attach accessories to the hood, front grille, or front bumper that may obstruct the front camera.** Contact your Toyota dealer for details.
> *   **Ensure roof-mounted objects do not obstruct the camera.** If mounting a surfboard or other long object on the roof, ensure it does not block the front camera's view.
> *   **Do not modify headlights or other lights.**
> 
> **Specific to Warning Messages:**
> 
> If a warning message related to the camera is displayed, you may need to take the following actions:
> 
> *   **If the area around a camera is covered with dirt, moisture, or foreign matter:** Use wipers and the A/C to remove the obstruction.
> *   **If the temperature around the front camera is outside the operational range:**
>     *   **If hot:** Use the air conditioning to decrease the temperature. Be aware that sunshades can sometimes cause excessive heat.
>     *   **If cold:** Use the air conditioning to increase the temperature.
> *   **If the area in front of the camera is obstructed (e.g., hood is open, sticker is attached):** Close the hood, remove the sticker, etc.
> 
> If these actions do not resolve the issue, or if the warning message persists, contact your Toyota dealer.

✅ Notes from Rocky:

- I limited the context to 15,000 characters to avoid hitting token/context limits. Increase or decrease this depending on your model's context window.

- If your PDF is very large (hundreds of pages), use a retriever **(vector search)** instead of dumping the entire text into the prompt. For quick tests, this concatenation works fine.
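
A hard slice such as `all_text[:15000]` silently drops everything after the cut. One common alternative is to split the text into overlapping chunks that can later be embedded and searched. The sketch below is a plain-Python illustration with a hypothetical `chunk_text` helper and arbitrary default sizes; LangChain also ships dedicated text splitters (for example `RecursiveCharacterTextSplitter`), which are not used here.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining text is already covered by the previous chunk.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.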

# FAQ - Which other loaders does langchain_community.document_loaders provide, besides PyPDFLoader?

Document loaders available in LangChain (from `langchain_community.document_loaders`):

**File-based Loaders:**
- `TextLoader` - Plain text files
- `CSVLoader` - CSV files
- `UnstructuredCSVLoader` - CSV as single table element
- `JSONLoader` - JSON files
- `UnstructuredFileLoader` - Multiple formats (PDF, HTML, Markdown, etc.)
- `UnstructuredPDFLoader` - PDF files
- `UnstructuredHTMLLoader` - HTML files
- `UnstructuredMarkdownLoader` - Markdown files
- `UnstructuredWordDocumentLoader` - Word documents
- `UnstructuredPowerPointLoader` - PowerPoint files
- `UnstructuredExcelLoader` - Excel files

**Web-based Loaders:**
- `WebBaseLoader` - Web pages
- `UnstructuredURLLoader` - URLs
- `SeleniumURLLoader` - JavaScript-rendered pages

**Database Loaders:**
- `SQLDatabaseLoader` - SQL databases
- `MongodbLoader` - MongoDB

**Directory Loaders:**
- `DirectoryLoader` - Load multiple files from directory
- `NotebookLoader` - Jupyter notebooks

**Popular automotive use cases:**
- `CSVLoader` for vehicle data logs
- `UnstructuredPDFLoader` for service manuals  
- `DirectoryLoader` for bulk maintenance records
- `WebBaseLoader` for manufacturer bulletins
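
When a project mixes file types, a small lookup table can decide which loader to use for each file. The sketch below only returns the loader's class name as a string, so it runs without LangChain installed. The mapping and the `loader_name_for` helper are assumptions made for this example, using class names from the list above.

```python
from pathlib import Path

# Loader class names come from the list above; the mapping itself is an assumption.
LOADER_BY_SUFFIX = {
    ".pdf": "PyPDFLoader",
    ".txt": "TextLoader",
    ".csv": "CSVLoader",
    ".json": "JSONLoader",
    ".md": "UnstructuredMarkdownLoader",
    ".html": "UnstructuredHTMLLoader",
}


def loader_name_for(path):
    """Pick a loader class name based on the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no loader configured for {suffix!r}") from None


print(loader_name_for("/content/your_manual.PDF"))
print(loader_name_for("vehicle_logs.csv"))
```

The returned name can then be imported from `langchain_community.document_loaders`. Constructor arguments differ between loaders, so check each loader's documentation before instantiating it.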

## Vector Stores or Embeddings

NOTE: The package installs below take a long time.

In [None]:
# Install required packages
!pip install -q -U sentence-transformers transformers torch

# sentence-transformers produces embeddings locally (cheap & fast).
# torch is a required dependency of sentence-transformers.

In [None]:
!pip install -q -U faiss-cpu

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from sentence_transformers import SentenceTransformer
import numpy as np

# Load PDF (same as before)
loader = PyPDFLoader("MY22_Camry_HV_OM_Excerpt_for_Driving_Support_Systems_D2_ML_0208.pdf")
documents = loader.load()
page_texts = [doc.page_content for doc in documents if doc.page_content.strip()]

# Create embeddings locally with sentence-transformers (all-MiniLM-L6-v2)
embed_model = SentenceTransformer("all-MiniLM-L6-v2")
embs = embed_model.encode(page_texts, convert_to_numpy=True, show_progress_bar=True)

# LangChain's FAISS wrapper expects an Embeddings object; to keep this example simple,
# we build the FAISS index directly from the NumPy embeddings instead.
import faiss
d = embs.shape[1]
index = faiss.IndexFlatL2(d)
index.add(embs)  # add embeddings
# Save mapping from vector idx -> page text / metadata
id_to_page = page_texts

# Semantic search function: returns top_k documents
def semantic_search(query, top_k=2):
    q_emb = embed_model.encode([query], convert_to_numpy=True)[0].astype(np.float32)
    distances, indices = index.search(np.array([q_emb]), top_k)
    results = []
    for idx in indices[0]:
        results.append(id_to_page[int(idx)])
    return results

# Example:
matches = semantic_search("How to avoid malfunction of the front camera for a 2022 Toyota Camry", top_k=2)
print("Found matches:", len(matches))
for i, m in enumerate(matches,1):
    print(f"\n--- Match {i} ---\n", m[:800])


Found matches: 2

--- Match 1 ---
 2494-5. Using the driving support systems
4Driving
CAMRY_HV_U
●Do not subject the front camera to a strong impact.
●Do not change the installation position or direction of the front camera or
remove it.
●Do not disassemble the front camera.
●Do not modify any components of the vehicle around the front camera
(inside rear view mirror, etc.) or ceiling.
●Do not attach any accessories to the hood, front grille or front bumper that
may obstruct the front camera. Contact your Toyota dealer for details.
●If a surfboard or other long object is to be mounted on the roof, make sure
that it will not obstruct the front camera.
●Do not modify the headlights or other lights.

--- Match 2 ---
 248 4-5. Using the driving support systems
CAMRY_HV_U
■To avoid malfunction of the front camera
Observe the following precautions.
Otherwise, the front camera may not operate properly, possibly leading to
an accident resulting in death or serious injury.
●Keep the windshield 
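
`faiss.IndexFlatL2` performs an exact (exhaustive) search: it compares the query vector with every stored vector and ranks them by squared L2 distance. The toy sketch below reproduces that idea in plain Python on made-up 2-D vectors, so you can see what `index.search` returns conceptually. The function names are hypothetical, and real embeddings from `all-MiniLM-L6-v2` have far more dimensions.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, top_k=2):
    """Return (distance, index) pairs for the top_k closest vectors."""
    scored = sorted((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    return scored[:top_k]


# Made-up "embeddings" for three pages and one query.
toy_vectors = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
toy_query = [1.0, 0.0]

nearest = flat_l2_search(toy_vectors, toy_query, top_k=2)
for distance, idx in nearest:
    print(f"page index {idx}: squared distance {distance:.3f}")
```

A smaller distance means a closer match. That is why the first index returned by `index.search` is treated as the best page.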

In [None]:
# --------------- Final step: build the prompt from the matched pages and call Gemini once ----------------
# Concatenate the two matched pages (with a small separator)
context_text = "\n\n--- PAGE BREAK ---\n\n".join(matches)

# Keep a safe slice to avoid excessive tokens; adjust as needed
MAX_CONTEXT_CHARS = 15000
context_text = context_text[:MAX_CONTEXT_CHARS]

prompt_template = PromptTemplate(
    input_variables=["manual_context", "vehicle_make", "year", "query"],
    template="""
You are an automotive repair assistant. Use ONLY the following manual content (do not invent details):

Manual context:
{manual_context}

Question:
{query}

Vehicle: {vehicle_make} ({year})

Provide:
1. Short description of likely causes.
2. Practical steps to avoid front-camera malfunction (maintenance / do's and don'ts).
3. If applicable, where in the full manual the user should look next (cite page numbers if present in the context).
"""
)

# Instantiate Gemini LLM via LangChain (single call)
## DONE IN ABOVE CELLs

chain = prompt_template | llm

# Invoke with variables
result = chain.invoke({
    "manual_context": context_text,
    "vehicle_make": "Toyota Camry",
    "year": "2022",
    "query": "How to avoid malfunction of the front camera"
})

# Print the assistant response
# result.content typically contains the generated text (depends on langchain-google-genai response structure)
try:
    print("\n\n=== LLM RESPONSE ===\n")
    print(result.content)
except Exception:
    # Fallback if structure differs
    print(result)



=== LLM RESPONSE ===

Here's how to avoid malfunction of the front camera based on the provided manual excerpt:

**1. Short description of likely causes:**

Malfunction of the front camera can be caused by a dirty or obstructed windshield, damage to the camera or its surrounding components, improper installation or modification of parts near the camera, or exposure to liquids or bright lights.

**2. Practical steps to avoid front-camera malfunction (maintenance / do's and don'ts):**

*   **Keep the windshield clean:**
    *   Clean the windshield if it's dirty, covered with an oily film, water droplets, snow, etc.
    *   If a glass coating agent is applied, still use wipers to clear water from the area in front of the camera.
    *   If the inner side of the windshield is dirty, contact your Toyota dealer.
*   **Defog/De-ice the windshield:** Use the windshield defogger if the area in front of the camera is fogged, condensed, or icy.
*   **Ensure proper wiper function:** If water dr

In [None]:
to_markdown(result.content)

> Here's how to avoid malfunction of the front camera based on the provided manual excerpt:
> 
> **1. Short description of likely causes:**
> 
> Malfunction of the front camera can be caused by a dirty or obstructed windshield, damage to the camera or its surrounding components, improper installation or modification of parts near the camera, or exposure to liquids or bright lights.
> 
> **2. Practical steps to avoid front-camera malfunction (maintenance / do's and don'ts):**
> 
> *   **Keep the windshield clean:**
>     *   Clean the windshield if it's dirty, covered with an oily film, water droplets, snow, etc.
>     *   If a glass coating agent is applied, still use wipers to clear water from the area in front of the camera.
>     *   If the inner side of the windshield is dirty, contact your Toyota dealer.
> *   **Defog/De-ice the windshield:** Use the windshield defogger if the area in front of the camera is fogged, condensed, or icy.
> *   **Ensure proper wiper function:** If water droplets aren't removed by wipers, replace the wiper insert or blade.
> *   **Avoid windshield modifications:**
>     *   Do not attach window tint to the windshield.
>     *   Replace a damaged or cracked windshield. After replacement, the camera needs recalibration by a Toyota dealer.
> *   **Protect the camera:**
>     *   Do not allow liquids to contact the front camera.
>     *   Do not allow bright lights to shine into the front camera.
>     *   Do not dirty or damage the front camera.
>     *   When cleaning the inside of the windshield, avoid getting glass cleaner on the camera lens and do not touch the lens. If the lens is dirty or damaged, contact your Toyota dealer.
> *   **Avoid obstructions:**
>     *   Do not attach any accessories to the hood, front grille, or front bumper that may obstruct the front camera.
>     *   If mounting long objects on the roof, ensure they don't obstruct the front camera.
>     *   Do not attach objects, such as stickers or transparent stickers, to the outer side of the windshield in front of the camera.
> *   **Avoid impacts and modifications:**
>     *   Do not subject the front camera to a strong impact.
>     *   Do not change the installation position or direction of the front camera or remove it.
>     *   Do not disassemble the front camera.
>     *   Do not modify components around the front camera (inside rear view mirror, etc.) or ceiling.
>     *   Do not modify the headlights or other lights.
> 
> **3. If applicable, where in the full manual the user should look next:**
> 
> *   For information on using the windshield defogger, refer to P. 370.
> *   For details on recalibrating the front camera after windshield replacement, contact your Toyota dealer.

# Quick 10-minute coding assignment — Automotive GenAI basics

Goal: test your understanding of **PromptTemplate**, **PyPDFLoader**, and a **vector store (FAISS)**.

---

## Instructions for learners

* Time: **10 minutes total**.

* Assumptions: you have already uploaded a PDF named `your_manual.pdf`, and the following objects are available or can be recreated from the earlier notebook cells: `PyPDFLoader`, `page_texts`, `SentenceTransformer("all-MiniLM-L6-v2")` (call it `embed_model`), and a FAISS index (we called it `index`, with the mapping `id_to_page`). See the hints below if you need to recreate them.

---

## Task 1 — PromptTemplate (3 points, \~3 min)

Write a `PromptTemplate` that asks the LLM to extract a **single short maintenance tip** from provided context.

Starter code:

```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["context", "vehicle", "query"],
    template="""
    # YOUR TEMPLATE HERE (one or two sentences)
    """
)
```

**Requirement:** when invoked with `context` (some page text), `vehicle="Toyota Camry (2022)"`, and `query="front camera maintenance"`, it should request a **single concise maintenance tip** (one short paragraph).

**Hint (code reference):** use same pattern as our `prompt_template` in the pipeline earlier — keep `input_variables` and a short `template`.

---



In [None]:
## Task 1 solution







## Task 2 — PyPDFLoader (3 points, \~3 min)

Load the PDF and return the **number of non-empty pages**.

Starter code:

```python
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/your_manual.pdf")
documents = loader.load()
# count non-empty pages here
```

**Requirement:** produce an integer `num_pages` (count of pages where `doc.page_content.strip()` is not empty). Print: `Loaded X non-empty pages`.

**Hint (code reference):** same loop we used earlier to build `page_texts` from `documents`.

---



In [None]:
## Task 2 solution







## Task 3 — Vector store semantic search (4 points, \~4 min)

Build a FAISS index from pages and return the **top-1** page matching the query:
`"How to avoid malfunction of the front camera for a 2022 Toyota Camry"`.

Starter code:

```python
from sentence_transformers import SentenceTransformer
import numpy as np
import faiss

# Use or create page_texts: list of strings (pages)
embed_model = SentenceTransformer("all-MiniLM-L6-v2")
embs = embed_model.encode(page_texts, convert_to_numpy=True).astype('float32')

# build index
d = embs.shape[1]
index = faiss.IndexFlatL2(d)
index.add(embs)

# search
query = "How to avoid malfunction of the front camera for a 2022 Toyota Camry"
q_emb = embed_model.encode([query], convert_to_numpy=True)[0].astype('float32')

distances, indices = index.search(np.array([q_emb]), 1)
top_idx = int(indices[0][0])
top_page_text = page_texts[top_idx]
print("Top match page index:", top_idx + 1)
print(top_page_text[:800])
```

**Requirement:** return the index (1-based page number) and the first 800 characters of the matched page.

**Hint (code reference):** identical to the `semantic_search` helper in the FAISS pipeline you ran earlier (`embed_model`, `index`, `id_to_page` / `page_texts`).

---

In [None]:
## Task 3 solution






## Quick verification rubric (10 points)

* Task 1 (PromptTemplate): 3 points — valid template and when formatted asks for a single concise tip.
* Task 2 (PyPDFLoader): 3 points — correct `num_pages` count and printed message.
* Task 3 (FAISS search): 4 points — builds embeddings/index and prints correct top match (index + excerpt).

---


# FAQ 🚀

In the **vector database** world (for storing and searching embeddings), apart from **FAISS** and **ChromaDB**, which other **simple & popular** options are there for GenAI projects?

---

### 🔹 Lightweight local libraries (pip-installable)

1. **Annoy (Approximate Nearest Neighbors Oh Yeah)**

   * Developed by Spotify.
   * Super lightweight, great for read-heavy workloads (e.g., recommendation systems).
   * Easy to use, but items cannot be added once the index is built (best for static indexes).
   * Python package: `annoy`.

2. **hnswlib**

   * Implementation of **Hierarchical Navigable Small World graphs** (HNSW).
   * Very fast for both building and searching.
   * Approximate search that is often faster than an exhaustive FAISS index such as `IndexFlatL2`, at the cost of exact results.
   * Python package: `hnswlib`.

---

### 🔹 Cloud-hosted / SaaS (managed vector DBs)

3. **Pinecone**

   * One of the most popular fully managed vector DBs.
   * Very simple API, highly scalable, integrates with LangChain out of the box.
   * Free tier available (useful for demos).

4. **Weaviate**

   * Open-source + cloud-hosted option.
   * Supports hybrid search (combine text + vector).
   * Also integrates with LangChain.

5. **Qdrant**

   * Open-source vector DB written in Rust (fast!).
   * Can run locally (Docker) or via managed cloud.
   * Supports payloads/metadata nicely.

6. **Milvus**

   * One of the earliest large-scale open-source vector DBs.
   * Available as a managed service through Zilliz Cloud.
   * Heavier than FAISS/Chroma, but production-grade.

---

### 🔹 Database plugins (vector support inside traditional DBs)

7. **Postgres + pgvector**

   * A **Postgres extension** for storing vectors and doing similarity search.
   * Very popular since you can keep embeddings + business data in the same DB.
   * Many companies prefer this for simplicity if they already use Postgres.

8. **Elasticsearch / OpenSearch (kNN plugin)**

   * Adds vector similarity search to Elasticsearch.
   * Useful if you already run the ELK stack.
   * Supports hybrid (BM25 + vector) search.

---

✅ If you want **simple + popular + fast local use in Colab/notebooks**, the top alternatives to FAISS/Chroma are:

* **hnswlib** (fast, lightweight, pip install)
* **Annoy** (very simple, good for static datasets)

✅ If you want **production-ready / cloud-based**, go with:

* **Pinecone** or **Qdrant**

---
