In [0]:
from dotenv import load_dotenv
from databricks_langchain import ChatDatabricks
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.graph import StateGraph, MessagesState, START, END

# Load Databricks credentials (e.g. DATABRICKS_HOST and DATABRICKS_TOKEN) from a local .env file, if one exists
load_dotenv()

# Initialize LLM
llm = ChatDatabricks(endpoint="databricks-gpt-oss-120b")

def call_llm(state: MessagesState):
    """Node to call the LLM with a formatted prompt based on the latest user question."""
    
    
    # Build the prompt
    prompt_template = (
        "You are an expert AI assistant. Answer the user's question with clarity, accuracy, and conciseness.\n\n"
        "## Question:\n"
        "{question}\n\n"
        "## Guidelines:\n"
        "- Keep responses factual and to the point.\n"
        "- If relevant, provide examples or step-by-step instructions.\n"
        "- If the question is ambiguous, clarify before answering.\n\n"
        "Respond below:"
    )
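    # Insert only the text of the latest user message (not the whole message list) into the template
    formatted_prompt = prompt_template.format(question=state["messages"][-1].content)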

    # Construct the message list for the model
    all_msgs = [
        SystemMessage(content="You are an expert AI assistant."),
        HumanMessage(content=formatted_prompt)
    ]

    # Call the LLM
    response = llm.invoke(all_msgs)

    # MessagesState's add_messages reducer appends the new message to the existing history
    return {"messages": [response]}

# Build the graph
builder = StateGraph(MessagesState)
builder.add_node("call_llm", call_llm)

builder.add_edge(START, "call_llm")
builder.add_edge("call_llm", END)

graph = builder.compile()

# Example run
messages = graph.invoke({
    "messages": [HumanMessage(content="What is the return policy for Anirvan Decodes ecommerce products?")]
})


In [0]:
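# The final state holds the whole conversation; the last message is the model's reply
message = messages["messages"][-1]

# Some endpoints return content as a list of typed parts (e.g. reasoning and text blocks),
# others as a plain string. Handle both and keep only the text parts.
if isinstance(message.content, str):
    ai_message = message.content
else:
    ai_message = "".join(
        part.get("text", "")
        for part in message.content
        if isinstance(part, dict) and part.get("type") == "text"
    )

print(ai_message)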


## This is where we need RAG


## Why We Need RAG (Retrieval-Augmented Generation)

### 🧠 LLMs don’t know everything
- They’re trained on data up to a certain time.  
- They can’t magically know your company policies, recent data, or private documents.

### 🤔 LLMs can “hallucinate”
- Sometimes they just make stuff up — confidently!  
- Without grounding in real data, they may generate wrong or outdated answers.

### 🔄 Knowledge changes
- Laws, prices, product specs — they all change.  
- You don’t want to retrain or fine-tune a giant model every time something updates.

### 🏢 You need domain-specific answers
- Example: A bank chatbot answering questions about your bank’s loans — it needs **your** documents, not just generic financial knowledge.

### ⚡ Cheaper & faster than fine-tuning
- Instead of retraining an entire model, you just fetch relevant info (retrieval) and pass it along with the question.  
- That way, the model uses the right context instantly.


## What’s the problem with plain keyword search?
- Keyword search matches **exact words**, not **meaning**.
- If a user types: “**laptop battery problem**”
  - Keyword search may **miss** a doc that says: “**notebook isn’t charging**”
  - Different words, **same idea** → missed!
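A minimal sketch of exact-word matching makes the miss concrete. It assumes a naive tokenizer (lowercase, keep runs of letters); real keyword engines add stemming and ranking functions such as BM25, but they still depend on the query and document sharing words.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return the set of word tokens it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))

def keyword_overlap(query: str, doc: str) -> set[str]:
    """Return the words that the query and the document share exactly."""
    return tokenize(query) & tokenize(doc)

query = "laptop battery problem"
doc = "notebook isn't charging"

# No shared words, so a pure keyword match treats this document as irrelevant
print(keyword_overlap(query, doc))  # set()
```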




## What’s an embedding (in plain words)?
An **embedding** turns text into a list of numbers (a vector) that captures **meaning**.
- Items with **similar meaning** → vectors are **close** together.
- Items with **different meaning** → vectors are **far** apart.

Think of a giant map:
- “laptop” and “notebook” are neighbors.
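- “banana” is far away from both.

The table below is a toy illustration with three hand-labeled dimensions. Real embedding models produce vectors with hundreds or thousands of dimensions, and individual dimensions usually don't correspond to human concepts like "topic" or "tone".
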
| Sentence                 | X (Topic) | Y (Tone) | Z (Sentiment) |
| ------------------------ | --------- | -------- | ------------- |
| “How to bake a cake”     | 0.9       | 0.2      | 0.8           |
| “Cooking is fun”         | 0.8       | 0.1      | 0.9           |
| “AI models are powerful” | 0.2       | 0.8      | 0.6           |
| “I hate bad food”        | 0.7       | 0.3      | 0.1           |


![formula.png](./formula.png "formula.png")


### Example: “How to bake a cake” vs “Cooking is fun”

- Vector1 = [0.9, 0.2, 0.8]
- Vector2 = [0.8, 0.1, 0.9]

Euclidean distance between these two vectors ≈ 0.173 → very small → very similar
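Written out, the Euclidean distance between two n-dimensional vectors **a** and **b** is the square root of the summed squared differences in each dimension. Plugging in Vector1 and Vector2:

$$
d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
$$

$$
d = \sqrt{(0.9 - 0.8)^2 + (0.2 - 0.1)^2 + (0.8 - 0.9)^2} = \sqrt{0.01 + 0.01 + 0.01} = \sqrt{0.03} \approx 0.173
$$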
 


### Example: “How to bake a cake” vs “AI models are powerful”

- Vector1 = [0.9, 0.2, 0.8]
- Vector3 = [0.2, 0.8, 0.6]

Euclidean distance between these two vectors ≈ 0.943 → far → not similar
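Both distances can be reproduced in a few lines of plain Python. The vectors are the toy values from the table, not output from a real embedding model; the standard library's `math.dist` (Python 3.8+) computes the same quantity.

```python
import math

# Toy 3-D "embeddings" copied from the table: (topic, tone, sentiment)
vectors = {
    "How to bake a cake": [0.9, 0.2, 0.8],
    "Cooking is fun": [0.8, 0.1, 0.9],
    "AI models are powerful": [0.2, 0.8, 0.6],
    "I hate bad food": [0.7, 0.3, 0.1],
}

def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Square root of the summed squared differences, dimension by dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

cake = vectors["How to bake a cake"]
print(round(euclidean_distance(cake, vectors["Cooking is fun"]), 3))          # 0.173
print(round(euclidean_distance(cake, vectors["AI models are powerful"]), 3))  # 0.943
```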



✅ Short explanation:

“How to bake a cake” and “Cooking is fun” are close in topic, tone, and sentiment → match

“How to bake a cake” and “AI models are powerful” are far apart in topic and tone → don’t match
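Vector search systems often use cosine similarity instead of Euclidean distance. It compares the direction of two vectors and ignores their length, and it reads the other way around: a higher score means more similar (1.0 is the same direction), whereas a lower distance means more similar. A quick check with the same toy vectors gives the same ranking:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cake = [0.9, 0.2, 0.8]     # "How to bake a cake"
cooking = [0.8, 0.1, 0.9]  # "Cooking is fun"
ai = [0.2, 0.8, 0.6]       # "AI models are powerful"

print(round(cosine_similarity(cake, cooking), 3))  # 0.99
print(round(cosine_similarity(cake, ai), 3))       # 0.659
```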



## How embedding-based search works (step-by-step)
1. **Prepare your data**
   - Split docs into **small chunks** (e.g., paragraphs).
2. **Embed the chunks**
   - Convert each chunk into a **vector** (embedding).
   - Store vectors in a **vector database** or index (FAISS, Chroma, Milvus, etc.).
3. **Handle a user query**
   - Convert the user’s query into a **vector**.
4. **Find similar chunks**
   - Use **vector similarity** (e.g., cosine similarity) to find the **closest** vectors.
5. **Return ranked results**
   - Show the top-K most similar chunks (or feed them to an LLM for a grounded answer).
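Steps 3 to 5 amount to a brute-force top-K search, sketched below under two assumptions: chunking and embedding have already happened (the toy vectors from the table stand in for a real model's output), and a plain Python list replaces the vector database. The query vector is hypothetical. Vector databases perform the same ranking, typically with approximate-nearest-neighbour indexes so it scales to millions of vectors.

```python
import heapq
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Steps 1-2 (assumed done): chunks paired with their embeddings
index = [
    ("How to bake a cake", [0.9, 0.2, 0.8]),
    ("Cooking is fun", [0.8, 0.1, 0.9]),
    ("AI models are powerful", [0.2, 0.8, 0.6]),
    ("I hate bad food", [0.7, 0.3, 0.1]),
]

def search(query_vector: list[float], index: list, k: int = 2) -> list[tuple[str, float]]:
    """Steps 4-5: score every chunk against the query and return the k best (chunk, score) pairs."""
    scored = ((chunk, cosine_similarity(query_vector, vec)) for chunk, vec in index)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Step 3: a hypothetical embedding for a baking-related query
query_vector = [0.85, 0.15, 0.85]

for chunk, score in search(query_vector, index):
    print(f"{score:.3f}  {chunk}")
```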


## 🛠️ How to Build a RAG System (Simple Terms)

### 1. Collect & prepare your knowledge
- Gather the stuff you want the AI to know: PDFs, web pages, database rows, manuals, FAQs, etc.  
- Break big documents into small, meaningful chunks (like splitting a book into paragraphs).
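A minimal character-based chunker, assuming paragraphs are separated by blank lines. The 500-character size and 50-character overlap are arbitrary starting values chosen for illustration. Overlapping windows mean text near a cut appears in both neighbouring chunks. LangChain's `RecursiveCharacterTextSplitter` implements a more careful version of the same idea.

```python
def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Paragraphs that fit are kept whole; longer ones are cut into windows
    that overlap by `overlap` characters.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    step = max_chars - overlap
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip empty paragraphs
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
            continue
        for start in range(0, len(paragraph), step):
            chunks.append(paragraph[start:start + max_chars])
            if start + max_chars >= len(paragraph):
                break  # this window already reached the end of the paragraph
    return chunks

sample = "First paragraph.\n\nSecond paragraph.\n\n" + "x" * 1200
print([len(chunk) for chunk in chunk_text(sample)])  # [16, 17, 500, 500, 300]
```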

### 2. Index the knowledge so it’s searchable
- Turn those chunks into numbers (vectors) so a computer can “search by meaning” instead of just keywords.  
- Store them in a **vector database** (think: a smart search engine that understands concepts, not just exact words).

### 3. Connect retrieval to generation
- When someone asks a question:  
  - First, **search** the vector database for the most relevant chunks.  
  - Then, **feed** those chunks + the question into your LLM (like GPT).  
  - The LLM reads the context and gives a **grounded** answer (based on the actual data).
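The notebook's `call_llm` node already builds its prompt from a template; a RAG version of that template would also carry the retrieved chunks. Below is a sketch of only the augmentation step. It assumes retrieval has already run; the chunk strings are placeholders, and `build_grounded_prompt` is a hypothetical helper, not a LangChain API.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical helper: number the retrieved chunks and place them before the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "You are an expert AI assistant. Answer the user's question using only the context below.\n"
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"## Context:\n{context}\n\n"
        f"## Question:\n{question}\n\n"
        "Respond below:"
    )

# Placeholder chunks standing in for the top-K results of the vector search
retrieved_chunks = [
    "<text of the most relevant chunk>",
    "<text of the second most relevant chunk>",
]

prompt = build_grounded_prompt(
    "What is the return policy for Anirvan Decodes ecommerce products?",
    retrieved_chunks,
)
print(prompt)
```

In the LangGraph example, one option is to use this string in place of `formatted_prompt` inside `call_llm` and to add a retrieval node before it; how the chunks travel between the two nodes (for instance, through an extra key in a custom state) is a design choice left open here.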
