# Explainable Retrieval in Document Search
## Overview
This code implements an Explainable Retriever, a system that retrieves documents relevant to a query and also explains why each retrieved document is relevant. It combines vector-based similarity search with natural language explanations generated by an LLM, making the retrieval process more transparent and interpretable.
## Motivation
Traditional document retrieval systems often work as black boxes, returning results without explaining why they were chosen. This lack of transparency is a problem in scenarios where understanding the reasoning behind the results is crucial. The Explainable Retriever addresses this by offering insights into the relevance of each retrieved document.
## Key Components
1. Vector store creation from input texts
2. Base retriever using FAISS for efficient similarity search
3. LLM for generating explanations
4. Custom ExplainableRetriever class that combines retrieval and explanation generation
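
The four components above can be sketched without any external service. The following is a minimal, standard-library-only illustration, not the notebook's implementation: `embed` is a toy bag-of-words stand-in for an embedding model, the sorted list stands in for FAISS, and `explain` is a placeholder for the LLM call that simply reports overlapping terms. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def explain(query, context):
    """Placeholder for the LLM explanation step: lists the shared terms."""
    shared = sorted(embed(query).keys() & embed(context).keys())
    if not shared:
        return "No overlapping terms with the query."
    return "Shares these terms with the query: " + ", ".join(shared)


def retrieve_and_explain(query, texts, k=2):
    # 1. Build the "vector store": one vector per input text
    index = [(text, embed(text)) for text in texts]
    # 2. Similarity search: rank texts by cosine similarity to the query
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    # 3 + 4. Attach an explanation to each of the top-k results
    return [
        {"content": text, "explanation": explain(query, text)}
        for text, _ in ranked[:k]
    ]


if __name__ == "__main__":
    sample_texts = [
        "The sky is blue because of the way sunlight interacts with the atmosphere.",
        "Photosynthesis is the process by which plants use sunlight to produce energy.",
    ]
    for result in retrieve_and_explain("Why is the sky blue?", sample_texts, k=1):
        print(result["content"])
        print(result["explanation"])
```

The shape of the return value (a list of dictionaries with `content` and `explanation` keys) mirrors what the `ExplainableRetriever` class in this notebook returns; only the embedding, search, and explanation steps are simplified.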
## Benefits of this Approach
1. Transparency: Users can understand why specific documents were retrieved
2. Trust: Explanations build user confidence in the system's results
3. Learning: Users can gain insights into the relationships between queries and documents
4. Debugging: Easier to identify and correct issues in the retrieval process
5. Customization: The explanation prompt can be tailored for different use cases or domains
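
To illustrate the customization point, here is a small sketch of how the explanation prompt might be swapped for a domain-specific one. `BASE_TEMPLATE` paraphrases the prompt used in this notebook; `LEGAL_TEMPLATE` and `build_prompt` are hypothetical names introduced only for this example. Plain `str.format` is used so the snippet runs without any library.

```python
# General-purpose template, using the same two placeholders as the notebook's prompt
BASE_TEMPLATE = (
    "Analyze the relationship between the following query and the retrieved context.\n"
    "Explain why this context is relevant to the query and how it might help answer the query.\n\n"
    "Query: {query}\n\n"
    "Context: {context}\n\n"
    "Explanation:"
)

# Hypothetical domain-specific variant: shorter answers, with gaps called out
LEGAL_TEMPLATE = (
    "You are assisting a legal researcher.\n"
    "In at most three sentences, explain why the context below is relevant to the query, "
    "and note anything the context does not cover.\n\n"
    "Query: {query}\n\n"
    "Context: {context}\n\n"
    "Explanation:"
)


def build_prompt(template, query, context):
    """Fill a template's {query} and {context} placeholders."""
    return template.format(query=query, context=context)


if __name__ == "__main__":
    print(build_prompt(
        LEGAL_TEMPLATE,
        query="Why is the sky blue?",
        context="The sky is blue because of the way sunlight interacts with the atmosphere.",
    ))
```

Because both templates keep the `{query}` and `{context}` placeholders, either string should be usable as the `template` argument of the `PromptTemplate` in the class defined later, assuming its default f-string style formatting.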
## Conclusion
The Explainable Retriever represents a significant step towards more interpretable and trustworthy information retrieval systems. By providing natural language explanations alongside retrieved documents, it bridges the gap between powerful vector-based search techniques and human understanding. This approach has potential applications in various fields where the reasoning behind information retrieval is as important as the retrieved information itself.

In [2]:
import os
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI

# Load Azure OpenAI settings from a local .env file
load_dotenv()
openai_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
openai_api_key = os.environ.get("AZURE_OPENAI_API_KEY")
openai_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT_ID")
# Fall back to the version this notebook was written against
openai_api_version = os.getenv("AZURE_API_VERSION", "2024-10-01-preview")

llm = AzureChatOpenAI(
    azure_deployment=openai_deployment,
    api_version=openai_api_version,
    # Base resource URL only; the client adds the deployment path and API version
    azure_endpoint=openai_endpoint,
    api_key=openai_api_key,
    temperature=0,
    logprobs=True,
)

In [4]:
from langchain_openai import AzureOpenAIEmbeddings

openai_embedding = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_ID")

# Embedding model used to build and query the vector store
embeddings = AzureOpenAIEmbeddings(
    azure_deployment=openai_embedding,
    model="text-embedding-ada-002",
    chunk_size=16,
)


In [5]:
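# FAISS lives in langchain_community and PromptTemplate in langchain_core in current LangChain releases
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate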

In [6]:
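class ExplainableRetriever:
    """Retrieves documents for a query and asks the LLM to explain each match."""

    def __init__(self, texts) -> None:
        # Uses the `embeddings` and `llm` objects created in the cells above
        self.embeddings = embeddings
        self.vectorstore = FAISS.from_texts(texts, self.embeddings)
        self.llm = llm

        # Return up to 5 nearest documents per query
        self.retriever = self.vectorstore.as_retriever(search_kwargs={"k": 5})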

        explain_prompt = PromptTemplate(
            input_variables=["query", "context"],
            template="""
            Analyze the relationship between the following query and the retrieved context.
            Explain why this context is relevant to the query and how it might help answer the query.
            
            Query: {query}
            
            Context: {context}
            
            Explanation:
            """
        )

        self.explain_chain = explain_prompt | self.llm

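    def retrieve_and_explain(self, query):
        """Return a list of {"content", "explanation"} dicts, one per retrieved document."""
        docs = self.retriever.invoke(query)

        explained_results = []
        # One LLM call per retrieved document
        for doc in docs: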
            input_data = {
                "query": query,
                "context": doc.page_content
            }

            explanation = self.explain_chain.invoke(input_data).content
            explained_results.append({
                "content": doc.page_content,
                "explanation": explanation
            })

        return explained_results

In [7]:
texts = [
    "The sky is blue because of the way sunlight interacts with the atmosphere.",
    "Photosynthesis is the process by which plants use sunlight to produce energy.",
    "Global warming is caused by the increase of greenhouse gases in Earth's atmosphere."
]

explainable_retriever = ExplainableRetriever(texts)

In [8]:
query = "Why is the sky blue?"
results = explainable_retriever.retrieve_and_explain(query)

for i, result in enumerate(results, 1):
    print(f"Result {i}:")
    print(f"Content: {result['content']}")
    print(f"Explanation: {result['explanation']}")
    print()

Result 1:
Content: The sky is blue because of the way sunlight interacts with the atmosphere.
Explanation: The query "Why is the sky blue?" is asking for an explanation of the phenomenon that causes the sky to appear blue to the human eye. The provided context, "The sky is blue because of the way sunlight interacts with the atmosphere," is directly relevant to the query as it begins to address the underlying reason for the sky's color.

To elaborate, the context is relevant because it points to the interaction between sunlight and the Earth's atmosphere as the key factor. This interaction involves a process known as Rayleigh scattering, where shorter wavelengths of light (blue and violet) are scattered in all directions by the gases and particles in the atmosphere. Since our eyes are more sensitive to blue light and some of the violet light is absorbed by the upper atmosphere, the sky appears predominantly blue to us.

Therefore, the context helps answer the query by identifying the fu