In [None]:
'''
Load environment variables from a .env file
This script is used to load environment variables from a .env file
into the environment so that they can be accessed by the application.
This is useful for managing sensitive information such as API keys,
database credentials, and other configuration settings.
'''

from dotenv import load_dotenv
import os

load_dotenv()

True
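Before creating any clients, it can help to fail fast if a required variable is missing from `.env`. The sketch below assumes the variable names read later in this notebook, plus `AZURE_OPENAI_API_KEY`, which `AzureOpenAIEmbeddings` is assumed to pick up from the environment when no key is passed explicitly. The helper name `missing_env_vars` is hypothetical.

```python
import os

# Variables this notebook relies on. AZURE_OPENAI_API_KEY is an assumption:
# it is not passed explicitly below, so the client is expected to read it from the env.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_API_VERSION",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing `env` explicitly makes the helper easy to test without touching the real environment.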

## 🧠 1. User Query

In [3]:
# User input
query = "Is it charge any international charges?"

## ✅ Important Retriever Methods to Know

The methods you’ll use most often when working with retrievers:

| Method | Description |
|--------|-------------|
| `get_relevant_documents(query)` | Main method — gets the top `k` chunks that are semantically similar to your query. |
| `as_retriever(**kwargs)` | Converts your vectorstore (like FAISS) into a retriever. You can pass search settings here like `k`. |
| `add_documents(docs)` | Add more docs to your index (useful for dynamic updates). |

---

### ➕ Retrievers also make it easy to:
- Add **filters** (e.g., only return docs with `source='terms_and_conditions'`)
- Combine multiple retrievers (like metadata filtering + semantic search)
- Switch to a different backend (e.g., Chroma, Qdrant) without changing the rest of your code
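With LangChain's FAISS store, a metadata filter can, as far as I know, be passed through the retriever's search settings, e.g. `as_retriever(search_kwargs={"k": 2, "filter": {"source": "terms_and_conditions"}})`; whether it matches anything depends on the metadata stored at ingestion time. The pure-Python sketch below only shows what such a filter does conceptually: drop documents whose metadata does not match, then rank the rest by cosine similarity. The corpus, its 2-d vectors, and the `filtered_search` helper are all hypothetical, and the real FAISS wrapper may apply the filter in a different order.

```python
import math

# Toy corpus: each document has a made-up 2-d "embedding" and some metadata.
DOCS = [
    {"text": "Call charges may apply.", "vector": [0.9, 0.1], "metadata": {"source": "terms_and_conditions"}},
    {"text": "Review the User Guide.", "vector": [0.2, 0.8], "metadata": {"source": "user_guide"}},
    {"text": "Shipping charges may apply.", "vector": [0.8, 0.3], "metadata": {"source": "terms_and_conditions"}},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query_vector, docs, k=2, metadata_filter=None):
    """Keep docs matching every key/value in the filter, then return the k most similar."""
    metadata_filter = metadata_filter or {}
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    candidates.sort(key=lambda d: cosine(query_vector, d["vector"]), reverse=True)
    return candidates[:k]


for d in filtered_search([1.0, 0.0], DOCS, k=2, metadata_filter={"source": "terms_and_conditions"}):
    print(d["text"])
```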

---

## 🔄 2. Query Conversion

> In this example we use Azure OpenAI embeddings, specifically the `text-embedding-3-small` model, which produces 1536-dimensional vectors, to convert the query into numbers.

In [None]:
# Importing the class to use Azure OpenAI embeddings from LangChain's library
from langchain_openai import AzureOpenAIEmbeddings

In [None]:
# Embedding model
# This model is used to convert text into embeddings
embd = AzureOpenAIEmbeddings(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment=os.environ["AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME"],
    openai_api_version=os.environ["AZURE_OPENAI_EMBEDDING_API_VERSION"],
    model="text-embedding-3-small",
    chunk_size=1
)

In [None]:
# Embedding the query
query_embedding = embd.embed_query(query)
print(query_embedding)

[0.011821269989013672, -0.05828331038355827, 0.012534569948911667, -0.011211611330509186, -0.04784596338868141, 0.010650726035237312, 0.0039963084273040295, 0.02022845484316349, -0.007492697332054377, -0.05686890333890915, 0.004267606418579817, 0.0074317315593361855, -0.04865071177482605, 0.01477811112999916, -0.002197817200794816, -0.015436542220413685, -0.06550166010856628, 0.032799601554870605, -0.06223389506340027, 0.059844035655260086, 0.04679735377430916, 0.03996918350458145, 0.03336048871278763, -0.011577405966818333, 0.0003819888806901872, -0.03587228059768677, -0.04904089495539665, -0.036847732961177826, 0.005560081452131271, -0.005855765659362078, 0.0066940453834831715, -0.006864749826490879, -0.016448574140667915, -0.036481939256191254, -0.01718016341328621, -0.031629059463739395, -0.058478400111198425, 0.014887849800288677, 0.02584950067102909, -0.01957002282142639, -0.015582859516143799, -0.014851270243525505, 0.029068496078252792, 0.026215296238660812, -0.0435295850038528
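Printing the full vector is not very informative. A quicker sanity check is its length and L2 norm: for `text-embedding-3-small` the length should be 1536, and since OpenAI documents its embeddings as normalized to length 1, the norm should be close to 1.0. The helper `vector_summary` is hypothetical; the demo uses a toy vector so the block runs on its own.

```python
import math


def vector_summary(vec):
    """Return (dimension, L2 norm) of an embedding vector."""
    return len(vec), math.sqrt(sum(x * x for x in vec))


# In the notebook you would call: vector_summary(query_embedding)
# Toy unit-length vector so this block is self-contained.
dim, norm = vector_summary([0.6, 0.8])
print(dim, round(norm, 6))
```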

## 📏 3. Vector Comparison

Next, the query vector is compared with the **pre-embedded document vectors**, which are stored in a **vector database** (in our case, a FAISS index). Because this comparison is the retrieval step of a RAG pipeline, I did not compute cosine similarity directly; instead I used LangChain's `.as_retriever()` method. The reasons are summarized in the comparison table at the end of this notebook.
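For reference, LangChain's FAISS wrapper ranks by Euclidean (L2) distance by default rather than by cosine similarity. For unit-length vectors the two rankings agree, because the squared distance satisfies ||q - d||² = 2 - 2·cos(q, d). The sketch below checks that identity on made-up 2-d vectors standing in for real 1536-d embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance between a and b (0.0 means identical)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical unit-length vectors (named q_vec to avoid overwriting `query`).
q_vec = [1.0, 0.0]
candidates = {"doc_a": [0.8, 0.6], "doc_b": [0.6, 0.8]}

for name, vec in candidates.items():
    cos = cosine_similarity(q_vec, vec)
    # Higher cosine <=> lower L2 distance when all vectors have norm 1.
    print(name, round(cos, 3), round(squared_l2(q_vec, vec), 3), round(2 - 2 * cos, 3))
```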

In [None]:
# Importing FAISS vector store
from langchain_community.vectorstores import FAISS

In [None]:
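# Loading the FAISS index from local storage, which we created in the Ingestion step.
# allow_dangerous_deserialization=True is needed because the saved docstore is pickled;
# only enable it for index files you created yourself.
new_index = FAISS.load_local("my_faiss_index", embd, allow_dangerous_deserialization=True)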

## 🔝 4. Top-K Retrieval

Retrieve the top-k documents that best match our query (question). Because the example document is small, I use `k = 2`; adjust `k` to suit your task and document size.
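Conceptually, top-k retrieval scores every stored vector against the query and keeps the `k` best. The brute-force sketch below does exactly that with toy vectors; a flat FAISS index does the equivalent exhaustive search, only much faster. The `top_k` helper and the vectors are hypothetical, and cosine similarity is used for readability.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vector, doc_vectors, k=2):
    """Return (index, score) pairs for the k most similar documents, best first."""
    scored = ((i, cosine_similarity(query_vector, v)) for i, v in enumerate(doc_vectors))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy document vectors; k=2 mirrors the setting used in this notebook.
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
for index, score in top_k([1.0, 0.0], doc_vectors, k=2):
    print(index, round(score, 3))
```

`heapq.nlargest` avoids sorting the whole list, which matters little here but is the natural fit when only `k` results are needed.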

In [None]:
# Creating a retriever from the loaded index and setting the number of results (k=2) to return
retriever = new_index.as_retriever(search_kwargs={"k": 2})

## 📚 5. Data Retrieval

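To fetch the top 2 relevant documents with the retriever created above, I used `get_relevant_documents()`. In recent LangChain versions this method is deprecated in favor of `retriever.invoke(query)`, which returns the same list of `Document` objects.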

In [None]:
# Retrieving the top k documents based on the query
top_k_docs = retriever.get_relevant_documents(query)


In [18]:
# Printing the top k documents
print("Top k documents:\n")
for i, doc in enumerate(top_k_docs, 1):
    print(f"Doc {i}:", doc.page_content[:300], '\n')

Top k documents:

Doc 1: and may be restricted to the original country of sale. Call 
charges and international shipping charges may apply, 
depending on the location. Subject to the full terms and 
detailed information on obtaining service available at 
apple.com/legal/warranty  and support.apple.com , if you 
submit a val 

Doc 2: Before using iPhone, review the iPhone User Guide  at  
support.apple.com/guide/iphone .
Safety and Handling
See “Safety, handling, and support” in the iPhone  
User Guide.
Exposure to Radio Frequency
On iPhone, go to Settings > General > Legal &  
Regulatory > RF Exposure. Or go to apple.com/ 
lega 



## 🆚 retriever.get_relevant_documents() vs faiss_index.similarity_search()

At a high level, **they both do the same thing**: find documents most relevant to a query.

But here’s the **real difference**:

| Feature | `retriever.get_relevant_documents()` | `faiss_index.similarity_search()` |
|--------|---------------------------------------|------------------------------------|
| **Interface** | Follows LangChain's `Retriever` interface | Direct call to the vector DB (FAISS) |
| **Use case** | Preferred in RAG pipelines and LangChain chains | Good for testing/debugging |
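| **Flexibility** | Search settings (`k`, `filter`, etc.) are fixed once via `search_kwargs`; can be combined with other retrievers (e.g. for hybrid search) | Settings such as `k` and `filter` are passed on every call |
| **Standardized** | Same interface for every LangChain retriever, not only vector stores | Defined on LangChain vector stores; extra options vary by store |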
| **Returns** | List of `Document` objects | Same |
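To make the "wrapper" idea concrete, here is a simplified pure-Python sketch of the pattern: the store exposes `similarity_search`, and the retriever just remembers its `search_kwargs` and forwards each query. All class names are hypothetical, and word-overlap scoring stands in for real embeddings; LangChain's actual `VectorStoreRetriever` also supports other search types and callbacks that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class ToyVectorStore:
    """Hypothetical store that ranks docs by shared words instead of embeddings."""

    def __init__(self, docs):
        self.docs = docs

    def similarity_search(self, query, k=4):
        words = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(words & set(d.page_content.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, search_kwargs=None):
        return ToyRetriever(self, search_kwargs or {})


class ToyRetriever:
    """Stores search settings once and exposes a single query -> documents call."""

    def __init__(self, store, search_kwargs):
        self.store = store
        self.search_kwargs = search_kwargs

    def invoke(self, query):
        return self.store.similarity_search(query, **self.search_kwargs)


store = ToyVectorStore([
    Doc("international shipping charges may apply"),
    Doc("review the user guide"),
    Doc("call charges may apply"),
])
toy_retriever = store.as_retriever(search_kwargs={"k": 2})
results = toy_retriever.invoke("are there international charges")
print([d.page_content for d in results])
```

Because callers only see `invoke(query)`, swapping `ToyVectorStore` for another store leaves the calling code unchanged, which is the point the table makes.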

---


## 🏁 Summary

| Concept | Think of it as... |
|--------|--------------------|
| `similarity_search()` | A raw query to the FAISS engine |
| `get_relevant_documents()` | A standardized method to retrieve top docs for a query — the RAG-friendly way |
| `Retriever` | A wrapper that lets you swap out vector DBs and methods easily |

---

<!-- Font Awesome CDN (Add in <head> if not already included) -->
<link
  rel="stylesheet"
  href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.0/css/all.min.css"
/>

<!-- Social Footer Section -->
<div style="
  background-color:rgb(199, 195, 195);
  padding: 40px 30px;
  border-radius: 20px;
  box-shadow: 0 4px 12px rgba(0,0,0,0.08);
  font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
  font-size: 18px;
  max-width: 900px;
  margin: 60px auto 30px;
  text-align: center;
  color: #444;
">
<!-- End of Notebook Note -->
  <h2 style="margin-bottom: 10px;">📘 End of Notebook</h2>
  <p style="color: #666; font-size: 14px;">
    Thank you for exploring! Feel free to connect via the links below.
  </p>

  <!-- Social Icons -->
<div style="
  display: flex;
  gap: 25px;
  align-items: center;
  flex-wrap: wrap;
  justify-content: center;
  margin-bottom: 25px;
">
  <!-- LinkedIn -->
  <a href="https://www.linkedin.com/in/ChiragB254" target="_blank" style="text-decoration: none; color: #0077b5;">
    <i class="fab fa-linkedin fa-lg"></i> LinkedIn
  </a>

  <!-- GitHub -->
  <a href="https://github.com/ChiragB254" target="_blank" style="text-decoration: none; color: #333;">
    <i class="fab fa-github fa-lg"></i> GitHub
  </a>

  <!-- Instagram -->
  <a href="https://www.instagram.com/data.scientist_chirag" target="_blank" style="text-decoration: none; color: #E1306C;">
    <i class="fab fa-instagram fa-lg"></i> Instagram
  </a>

  <!-- Email -->
  <a href="mailto:devchirag27@gmail.com" style="text-decoration: none; color: #D44638;">
    <i class="fas fa-envelope fa-lg"></i> Email
  </a>

  <!-- X (Twitter) -->
  <a href="https://x.com/ChiragB254" target="_blank" style="text-decoration: none; color: #000;">
    <i class="fab fa-x-twitter fa-lg"></i> X.com
  </a>
  </div>

  <p style="font-size: 13px; color: black; font-style: italic; margin-top: 8px;">
    <strong>Made with ❤️ by Chirag Bansal</strong>
  </p>
</div>
