# Introduction

Welcome to the **LangChain Retrieval Methods** notebook.  
In this tutorial you will:

1. **Load** a small corpus of John Wick movie reviews.
2. **Explore** seven distinct retrieval strategies:
   - Naive (whole‐document vectors)
   - BM25 (keyword matching)
   - Contextual Compression (reranking)
   - Multi‐Query (query expansion)
   - Parent Document (hierarchical chunks)
   - Ensemble (fusion of methods)
   - Semantic (boundary‐aware chunking)

3. **Compare** each method across:
   - Retrieval **quality** (recall, qualitative response patterns)
   - **Latency** (ms per query)
   - **Cost** (API/token usage)
   - **Resource footprint** (index size & shape)

4. **Visualize** key metrics and response examples to understand trade-offs.

By the end of the notebook, you’ll know:
- When to **start simple** (Naive or BM25) versus **scale up** (Ensemble or Semantic).
- How context-window advances (4 K → 32 K → 128 K) and loader-splitter decoupling shape modern RAG architectures.
- Practical tips for **production readiness**, including index sharding, zero-downtime reindexes, and drift monitoring.

> **Prerequisites**  
> - Python 3.11 environment (see Quickstart)  
> - Access to Qdrant and Redis instances (e.g. run locally with Docker)  
> - OpenAI API credentials (embeddings and generation), Cohere credentials (reranking), and a LangSmith API key (tracing)  

Run the cells in order, or jump to the section that interests you. Let’s get started!  


## ☕ 📴 🔥 TL;DR ☕ 📴 🔥

| Action | Command |
|---|---|
| ☕ Morning Coffee Startup | `docker compose up -d` |
| 📴 That's a Wrap | `docker compose down` |
| 🔥 Burn It All Down | `docker compose down --volumes --rmi all` |

## Foundation: Docker Containers for Qdrant, Redis, Postgres, and Arize Phoenix (oh my!)

- use Docker Compose to set up the containers
- a draft [Docker Admin Guide](docs/docker-admin-guide.md) is available
- [the `docker-compose.yml` file is located here](docker-compose.yml)

| Action | Command |
|---|---|
| Start containers | `docker compose up -d` |
| Stop containers | `docker compose down` |
| Stop containers - remove volumes, images | `docker compose down --volumes --rmi all` |

- the last option is great for resets and starting from scratch

## 🔧 Environment Configuration & API Setup

In [1]:
import os
from datetime import datetime

import requests  # used by the download cell below
from dotenv import load_dotenv

# Load API keys and Qdrant settings from a local .env file into os.environ
load_dotenv()

# Build a dynamic project name (e.g. include timestamp)
project_name = f"retrieval-method-comparison-{datetime.now().strftime('%Y%m%d_%H%M%S')}"

# Fail fast on missing keys (assigning None to os.environ would raise a TypeError anyway)
for key in ("OPENAI_API_KEY", "COHERE_API_KEY", "LANGSMITH_API_KEY"):
    if not os.getenv(key):
        raise EnvironmentError(f"Missing required environment variable: {key}")

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = project_name

QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")
QDRANT_API_URL = os.getenv("QDRANT_API_URL")

In [2]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4.1-mini")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [3]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

## 📊 Dataset Loading & Preprocessing for RAG

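##### TEMPORARY: Qdrant and Redis Database Reset

- For now I reset the data before each run (🤷‍♂️ not best practice 🤷‍♀️), since re-running the indexing cells appends new points and parent documents rather than replacing the old ones.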

```bash
docker compose stop qdrant redis
docker compose rm -f qdrant redis
docker volume rm langchain-retrieval-methods_qdrant_data langchain-retrieval-methods_redis_data
docker compose up -d qdrant redis
```

In [4]:
# Store the downloaded CSVs in a `data/` directory under the current working directory
from pathlib import Path
DATA_DIR = Path.cwd() / "data"
DATA_DIR.mkdir(exist_ok=True)

# URLs and filenames
urls = [
    ("https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv", "john_wick_1.csv"),
    ("https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv", "john_wick_2.csv"),
    ("https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv", "john_wick_3.csv"),
    ("https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv", "john_wick_4.csv"),
]

# Download files if not already present
for url, fname in urls:
    file_path = DATA_DIR / fname
    if not file_path.exists():
        print(f"Downloading {fname}...")
        r = requests.get(url, timeout=30)  # avoid hanging forever on a stalled download
        r.raise_for_status()
        file_path.write_bytes(r.content)
    else:
        print(f"{fname} already exists.")

john_wick_1.csv already exists.
john_wick_2.csv already exists.
john_wick_3.csv already exists.
john_wick_4.csv already exists.


In [5]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

all_review_docs = []

for i in range(1, 5):
    loader = CSVLoader(
        file_path=(DATA_DIR / f"john_wick_{i}.csv"),
        metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
    )

    movie_docs = loader.load()
    for doc in movie_docs:

        # Add the "Movie Title" (John Wick 1, 2, ...)
        doc.metadata["Movie_Title"] = f"John Wick {i}"

        # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
        doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

        # newer movies have a more recent "last_accessed_at" (store as ISO string)
        doc.metadata["last_accessed_at"] = (datetime.now() - timedelta(days=4-i)).isoformat()

    all_review_docs.extend(movie_docs)

## 🏗️ Building RAG Infrastructure: Storage

In [6]:
from langchain_qdrant import QdrantVectorStore  # Updated import
from langchain_openai import OpenAIEmbeddings
from qdrant_client import QdrantClient, models
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_experimental.text_splitter import SemanticChunker
from langchain.retrievers import ParentDocumentRetriever
from langchain_community.storage import RedisStore
from langchain.storage import create_kv_docstore

### Level 1: Simple Vector Storage (Baseline)

- embeds each review in `all_review_docs` as a single, unchunked document and stores the vectors in the `johnwick_baseline` collection

In [7]:
baseline_vectorstore = QdrantVectorStore.from_documents(
    all_review_docs,
    embeddings,
    url=QDRANT_API_URL,
    api_key=QDRANT_API_KEY,
    prefer_grpc=True,
    collection_name="johnwick_baseline"
)



### Level 2: Hierarchical Storage (Parent-Child Architecture)

#### Parent Documents: Redis Key-Value Store

In [8]:
redis_byte_store = RedisStore(redis_url="redis://localhost:6379")
parent_document_store = create_kv_docstore(redis_byte_store)

#### Child Records: Qdrant Vector Embeddings

In [9]:
# Initialize Qdrant client
cloud_client = QdrantClient(
    url=QDRANT_API_URL,
    api_key=QDRANT_API_KEY,
    prefer_grpc=True
)

# Check if the Qdrant collection exists
if not cloud_client.collection_exists("johnwick_parent_children"):
    cloud_client.create_collection(
        collection_name="johnwick_parent_children",
        vectors_config=models.VectorParams(
            size=1536,
            distance=models.Distance.COSINE
        ),
    )

# Construct the VectorStore using cloud client
parent_children_vectorstore = QdrantVectorStore(
    embedding=embeddings,
    client=cloud_client,
    collection_name="johnwick_parent_children",
)



#### Parent Document Retriever definition

In [10]:
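# Children: ~200-character chunks that are embedded in Qdrant.
# No parent_splitter is passed, so each full review is stored as the parent in Redis.
# Note: chunk_overlap is left at the library default (200, equal to chunk_size),
# which may contribute to the large number of child chunks.
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)
parent_document_retriever = ParentDocumentRetriever(
    vectorstore=parent_children_vectorstore,
    docstore=parent_document_store,
    child_splitter=child_splitter,
)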

parent_document_retriever.add_documents(all_review_docs)
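Conceptually, the retriever configured above embeds small child chunks in Qdrant, stores each full review in Redis under a parent ID, and at query time maps the best-matching children back to their parents. Because no `parent_splitter` is passed, every whole review is a parent, which fits the later counts of 100 documents in Redis versus 4,817 child points in Qdrant. The sketch below mimics that flow in plain Python. It is illustrative only: the helper names, the word-count splitting (instead of `RecursiveCharacterTextSplitter`'s character-based splitting), and the word-overlap scoring (instead of embeddings) are all assumptions.

```python
def split_into_children(text: str, words_per_child: int = 4) -> list[str]:
    """Hypothetical splitter: fixed-size word windows (the real splitter works on characters)."""
    words = text.split()
    return [" ".join(words[i:i + words_per_child]) for i in range(0, len(words), words_per_child)]


def build_child_index(parents: dict[str, str]) -> list[tuple[str, str]]:
    """What the vector store holds: (parent_id, child_text) pairs."""
    return [(pid, child) for pid, text in parents.items() for child in split_into_children(text)]


def retrieve_parents(query: str, index: list[tuple[str, str]], parents: dict[str, str], k: int = 4) -> list[str]:
    """Rank children by word overlap, keep the top k, and return each matching parent once."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(child.lower().split())), pid) for pid, child in index]
    # Best-matching children first (stable sort); children with no overlap are dropped.
    hits = [pid for score, pid in sorted(scored, key=lambda s: s[0], reverse=True) if score > 0][:k]
    unique_ids = list(dict.fromkeys(hits))  # de-duplicate parent IDs, preserving rank order
    return [parents[pid] for pid in unique_ids]


# Hypothetical mini-corpus standing in for the John Wick reviews
parents = {
    "review_1": "great opening act. the dog scene is heartbreaking. the action choreography is superb",
    "review_2": "the sequel drags. too many shootouts. the plot is thin",
}
child_index = build_child_index(parents)
print(len(child_index), "children")
print(retrieve_parents("the plot", child_index, parents))
```

Several children of the same review can match, but the caller receives each full review only once, which is why this strategy can return fewer documents than the number of child hits.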

### Level 3: Vector Storage organized by Semantic Chunks

In [11]:
semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

semantic_documents = semantic_chunker.split_documents(all_review_docs)
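With `breakpoint_threshold_type="percentile"`, `SemanticChunker` splits text into sentences, embeds them, measures the cosine distance between neighbouring sentences, and starts a new chunk wherever that distance exceeds a percentile of all the distances (the 95th by default). The sketch below shows the breakpoint rule on hypothetical 2-dimensional vectors. It simplifies the real chunker, which embeds each sentence together with its neighbours (`buffer_size=1` by default) and uses a regex sentence splitter; the helper names are assumptions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def percentile(values: list[float], pct: float) -> float:
    """Linearly interpolated percentile (numpy.percentile's default method)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def semantic_chunks(sentences: list[str], vectors: list[list[float]], pct: float = 95) -> list[str]:
    """Split wherever the distance between neighbouring sentences exceeds the pct-th percentile."""
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for i, dist in enumerate(distances):
        if dist > threshold:  # topic shift: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "Wick returns for revenge.",
    "The gunfights are relentless.",
    "Every fight is carefully choreographed.",
    "The hotel has strange rules.",
    "The Continental lore keeps expanding.",
]
# Hypothetical 2-d "embeddings": the first three sentences point one way, the last two another.
vectors = [[1.0, 0.0], [0.95, 0.05], [0.9, 0.1], [0.0, 1.0], [0.05, 0.95]]
for chunk in semantic_chunks(sentences, vectors):
    print(chunk)
```

Because the threshold is relative to each document's own distance distribution, chunk sizes vary with content; here the 100 reviews became 179 semantic chunks.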

In [12]:
semantic_vectorstore = QdrantVectorStore.from_documents(
    semantic_documents,
    embeddings,
    url=QDRANT_API_URL,
    api_key=QDRANT_API_KEY,
    prefer_grpc=True,
    collection_name="johnwick_semantic"
)

## 🎯 Core Learning: 7 Retrieval Strategies

### Strategy Setup: Tracing & Monitoring

In [13]:
# set up the LangSmith client (tracing itself is switched on via LANGSMITH_TRACING in the first cell)

from langsmith import Client, traceable

langsmith_client = Client()

### Strategy 1: Naive Retrieval (Baseline)

In [14]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retriever = baseline_vectorstore.as_retriever(search_kwargs={"k" : 10})

naive_retrieval_chain = (
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
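The naive retriever embeds the question with `text-embedding-3-small` and asks Qdrant for the `k=10` stored review vectors with the highest cosine similarity (the collections use the `Cosine` metric, as the collection info later confirms). The brute-force sketch below shows the same ranking rule in plain Python. The 3-dimensional toy vectors and the `naive_top_k` helper are hypothetical (real vectors have 1536 dimensions), and Qdrant normally answers with an approximate HNSW index rather than a full scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def naive_top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 10) -> list[str]:
    """Return the ids of the k documents most similar to the query, best first."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical toy index: one vector per whole review
toy_index = {
    "review_a": [0.9, 0.1, 0.0],
    "review_b": [0.1, 0.9, 0.0],
    "review_c": [0.7, 0.3, 0.1],
}
print(naive_top_k([1.0, 0.0, 0.0], toy_index, k=2))
```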

### Strategy 2: BM25 Retrieval (Keyword-Based)

In [15]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(all_review_docs)

bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
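`BM25Retriever` scores reviews purely on keyword overlap, weighting rare terms more heavily (inverse document frequency) and damping long documents; no embeddings are involved. It wraps the `rank_bm25` package and returns 4 documents by default, which matches the BM25 `context_docs` value of 4 in the results. The sketch below is a textbook Okapi BM25 scorer with the common parameters `k1=1.5` and `b=0.75` and a "+1" smoothed IDF. It is an illustration and may differ in detail from `rank_bm25`'s implementation (tokenization, IDF handling).

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF: the "+1" keeps it positive even for terms found in every document
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b)
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(tokens) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


reviews = [
    "keanu reeves is great in this stylish action film",
    "the plot is thin but the action is relentless",
    "a quiet drama with no action at all",
]
scores = bm25_scores("stylish action", reviews)
best = max(range(len(reviews)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The rare term "stylish" dominates the ranking, while "action", which appears in every review, contributes very little; this is why BM25 can behave quite differently from vector search on the same question.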

### Strategy 3: Contextual Compression (AI Reranking)

In [16]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=naive_retriever
)

In [17]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
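Contextual compression here means reranking: the naive retriever supplies 10 candidates, `CohereRerank` scores each (question, review) pair with Cohere's `rerank-english-v3.0` model, and only the highest-scoring documents survive (`top_n` defaults to 3, consistent with the 3 context documents in the results). The sketch below shows that retrieve-then-rerank shape. `overlap_score` is a deliberately crude, hypothetical stand-in for Cohere's learned relevance model, and the candidate strings are made up.

```python
from typing import Callable


def rerank(query: str, candidates: list[str], score_fn: Callable[[str, str], float], top_n: int = 3) -> list[tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_n highest-scoring candidates."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep retrieval order
    return scored[:top_n]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical relevance score: fraction of query words that appear in the document."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) / len(q_terms)


# Pretend these came back from the naive retriever (normally k=10 candidates)
candidates = [
    "people loved john wick",
    "the sequel felt long",
    "i did not like the plot",
    "john wick is stylish",
    "popcorn was stale",
]
top = rerank("did people like john wick", candidates, overlap_score)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

Fewer, better-ranked documents means a shorter prompt, which is one reason the compression run used far fewer prompt tokens than the naive run.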

### Strategy 4: Multi-Query Retrieval (Query Expansion)



In [18]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever,
    llm=llm
)

In [19]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
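`MultiQueryRetriever` asks the LLM to rewrite the question into several variants (its default prompt requests three), runs each variant through `naive_retriever`, and returns the de-duplicated union of everything retrieved. That union is likely why the multi-query run reports 16 context documents instead of 10. In the sketch below, the hard-coded variants and the lookup-table retriever are hypothetical stand-ins for the LLM and Qdrant.

```python
from typing import Callable


def multi_query_retrieve(variants: list[str], retrieve: Callable[[str], list[str]]) -> list[str]:
    """Run every query variant through the base retriever and return the de-duplicated union."""
    seen: set[str] = set()
    merged: list[str] = []
    for q in variants:
        for doc in retrieve(q):
            if doc not in seen:  # keep only the first occurrence of each document
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical LLM rewrites of "Did people generally like John Wick?"
variants = [
    "What was the audience reception of John Wick?",
    "Did reviewers rate John Wick positively?",
    "What did viewers dislike about John Wick?",
]
# Hypothetical base retriever: a fixed lookup instead of a vector search
fake_results = {
    variants[0]: ["r1", "r2", "r3"],
    variants[1]: ["r2", "r4"],
    variants[2]: ["r5", "r1"],
}
docs = multi_query_retrieve(variants, lambda q: fake_results[q])
print(docs)
```

The extra LLM call for the rewrites, plus one vector search per variant, is the price of the broader recall; both show up in the multi-query latency and token counts.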

### Strategy 5: Parent Document Retrieval (Hierarchical)

In [20]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)

### Strategy 6: Ensemble Retrieval (Combined Methods)

In [21]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]

equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list,
    weights=equal_weighting
)

ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
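`EnsembleRetriever` combines the five retrievers with weighted Reciprocal Rank Fusion (RRF): every document earns `weight / (rank + c)` from each retriever that returned it (ranks start at 1, and `c` defaults to 60), and the union is sorted by total score. Documents that several retrievers agree on rise to the top, and the union helps explain why the ensemble passed 21 context documents to the LLM. The sketch below reproduces that scoring on two hypothetical ranked lists of document IDs; `weighted_rrf` is an illustrative helper, not the LangChain implementation.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], c: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each doc earns weight / (rank + c) from every list it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from two retrievers for the same question
bm25_hits = ["d3", "d1", "d7"]
vector_hits = ["d1", "d2", "d3"]
fused = weighted_rrf([bm25_hits, vector_hits], weights=[0.5, 0.5])
print([doc_id for doc_id, _ in fused])
```

With equal weights, agreement across lists matters more than any single top hit: `d1` wins because it ranks highly in both lists. Because `retriever_list` also includes `compression_retriever` and `multi_query_retriever`, each ensemble query pays for a Cohere rerank and the LLM query rewrites, which is consistent with the ensemble having the highest latency in the results.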

### Strategy 7: Semantic Retrieval (Semantic Chunking)

In [22]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)

## 📈 Performance Monitoring & Evaluation Setup

In [23]:
from langsmith import traceable

@traceable(name="naive_retrieval", run_type="chain", metadata={"method":"naive"})
def trace_naive_retrieval(question: str):
    try:
        result = naive_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="bm25_retrieval", run_type="chain", metadata={"method":"bm25"})
def trace_bm25_retrieval(question: str):
    try:
        result = bm25_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="contextual_compression", run_type="chain", metadata={"method":"compression"})
def trace_contextual_compression(question: str):
    try:
        result = contextual_compression_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="multi_query_retrieval", run_type="chain", metadata={"method":"multi_query"})
def trace_multi_query_retrieval(question: str):
    try:
        result = multi_query_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="parent_document_retrieval", run_type="chain", metadata={"method":"parent_document"})
def trace_parent_document_retrieval(question: str):
    try:
        result = parent_document_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="ensemble_retrieval", run_type="chain", metadata={"method":"ensemble"})
def trace_ensemble_retrieval(question: str):
    try:
        result = ensemble_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="semantic_retrieval", run_type="chain", metadata={"method":"semantic"})
def trace_semantic_retrieval(question: str):
    try:
        result = semantic_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

print("✅ Traceable wrappers defined")


✅ Traceable wrappers defined
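The seven wrappers above differ only in the chain, the run name, and the `method` tag, so a small factory could generate them instead. The sketch below is a suggested refactor, not the notebook's code: `make_traced_wrapper` and the stub classes are hypothetical, and the `ImportError` fallback exists only so the snippet runs where LangSmith is not installed.

```python
from typing import Any, Callable

try:
    from langsmith import traceable
except ImportError:
    # Fallback so the sketch still runs without LangSmith: a no-op decorator.
    def traceable(**_kwargs):
        return lambda fn: fn


def make_traced_wrapper(chain: Any, name: str, method: str) -> Callable[[str], dict[str, Any]]:
    """Return a traced function that invokes `chain` and summarizes its output."""

    @traceable(name=name, run_type="chain", metadata={"method": method})
    def run(question: str) -> dict[str, Any]:
        try:
            result = chain.invoke({"question": question})
            return {
                "response": result["response"].content,
                "context_docs": len(result["context"]),
            }
        except Exception as exc:  # report the failure in the output instead of raising
            return {"error": str(exc)}

    return run


# Hypothetical stand-ins for an LCEL chain and its chat-model message
class _StubMessage:
    def __init__(self, content: str) -> None:
        self.content = content


class _StubChain:
    def invoke(self, inputs: dict) -> dict:
        return {"response": _StubMessage(f"echo: {inputs['question']}"), "context": ["doc1", "doc2"]}


trace_stub = make_traced_wrapper(_StubChain(), name="stub_retrieval", method="stub")
print(trace_stub("Did people generally like John Wick?"))
```

In the notebook, one line per strategy would then replace each hand-written function, e.g. `trace_naive_retrieval = make_traced_wrapper(naive_retrieval_chain, "naive_retrieval", "naive")`.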


## ⚡ Execution & Real-Time Comparison

In [24]:
import pandas as pd

question = "Did people generally like John Wick?"

# .get() avoids a KeyError when a wrapper returns {"error": ...} instead of a response
naive_retrieval_chain_response = trace_naive_retrieval(question).get("response")
bm25_retrieval_chain_response = trace_bm25_retrieval(question).get("response")
contextual_compression_retrieval_chain_response = trace_contextual_compression(question).get("response")
multi_query_retrieval_chain_response = trace_multi_query_retrieval(question).get("response")
semantic_retrieval_chain_response = trace_semantic_retrieval(question).get("response")

print("✅ All methods executed with tracing")

✅ All methods executed with tracing


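#### Parent Document and Ensemble Retrieval Traces

- these two run in their own cell because both depend on the Redis-backed parent document store (the ensemble includes `parent_document_retriever`), which caused serialization issues after adopting Redis
- isolating them made troubleshooting easier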

In [25]:
# .get() avoids a KeyError when a wrapper returns {"error": ...} instead of a response
parent_document_retrieval_chain_response = trace_parent_document_retrieval(question).get("response")
ensemble_retrieval_chain_response = trace_ensemble_retrieval(question).get("response")

In [26]:
from langchain_core.tracers.langchain import wait_for_all_tracers

# run after all your traceable calls
wait_for_all_tracers()

## 📊 Results Analysis & Performance Visualization

In [27]:
from langsmith import Client
# Assume langsmith_client is already initialized,
# and `project_name` is set as above.

# 1. Ensure the dataset exists or create it
dataset_name = f"{project_name}_runs_ds"
try:
    dataset = langsmith_client.create_dataset(
        dataset_name=dataset_name,
        description=(
            "All root chain runs from the John Wick retrieval-method notebook, "
            "including method, response, context_docs, tokens, costs, durations, and errors."
        )
    )
    print(f"✅ Created dataset: {dataset.name!r}")
except Exception:
    dataset = langsmith_client.read_dataset(dataset_name=dataset_name)
    print(f"ℹ️  Using existing dataset: {dataset.name!r}")

# 2. Fetch all top-level chain runs for the project
runs = list(langsmith_client.list_runs(
    project_name=project_name,
    is_root=True,
    run_type="chain",
))

# 3. Ingest each run as an Example in the dataset
for run in runs:
    langsmith_client.create_example_from_run(
        run=run,
        dataset_id=dataset.id
    )

print(f"🚀 Added {len(runs)} runs to dataset {dataset.name!r}")


✅ Created dataset: 'retrieval-method-comparison-20250529_142100_runs_ds'
🚀 Added 7 runs to dataset 'retrieval-method-comparison-20250529_142100_runs_ds'


### View LangSmith Dataset

In [28]:
print(f"🔗 View your dataset: {dataset.url}")

🔗 View your dataset: https://smith.langchain.com/o/2ad170d9-2e91-430d-9d70-cf6501e2184c/datasets/d3859a9b-6781-4b72-9b94-27739fbb9efb


### Collect Run Metrics into a DataFrame

In [29]:
import pandas as pd
from IPython.display import display, Markdown
from datetime import timezone, datetime, timedelta

# Fetch all top-level chain runs for our project
runs = list(langsmith_client.list_runs(
    project_name=project_name,
    is_root=True,
    run_type="chain",
    start_time=datetime.now(timezone.utc) - timedelta(hours=1)  # last hour
))

# Build a record per run, pulling in every field “as is”
records = []
for run in runs:
    records.append({
        "run_id":          str(run.id),
        "name":             run.name,
        "method":           run.metadata.get("method"),
        "status":           run.status,
        "start_time":       run.start_time,
        "end_time":         run.end_time,
        "duration_ms":      ((run.end_time - run.start_time).total_seconds()*1000)
                              if run.start_time and run.end_time else None,
        # cost & tokens
        "prompt_tokens":    run.prompt_tokens,
        "completion_tokens":run.completion_tokens,
        "total_tokens":     run.total_tokens,
        "prompt_cost":      run.prompt_cost,
        "completion_cost":  run.completion_cost,
        "total_cost":       run.total_cost,
        # errors
        "error":            run.error,                                   
        "wrapper_error":    (run.outputs or {}).get("error"),
        # wrapper outputs
        "response":         (run.outputs or {}).get("response"),
        "context_docs":     (run.outputs or {}).get("context_docs"),
    })

df_runs = pd.DataFrame.from_records(records)

## 🎯 Core Learning: 7 Retrieval Results

In [30]:
display(df_runs)

| | run_id | name | method | status | start_time | end_time | duration_ms | prompt_tokens | completion_tokens | total_tokens | prompt_cost | completion_cost | total_cost | error | wrapper_error | response | context_docs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | c7a6e61b-90eb-467e-bc2a-0a03f7676c67 | ensemble_retrieval | ensemble | success | 2025-05-29 21:23:25.485757 | 2025-05-29 21:23:34.279574 | 8793.817 | 6995 | 177 | 7172 | 0.002798 | 0.0002832 | 0.0030812 | | | Yes, people generally liked the first John Wic... | 21 |
| 1 | 6812e52d-9f14-4098-893e-63b6b230bbb9 | parent_document_retrieval | parent_document | success | 2025-05-29 21:23:22.866991 | 2025-05-29 21:23:25.485412 | 2618.421 | 784 | 104 | 888 | 0.0003136 | 0.0001664 | 0.00048 | | | Based on the provided reviews, people generall... | 3 |
| 2 | ed00d74d-571f-41ef-8f3c-0a7e20dcb98e | semantic_retrieval | semantic | success | 2025-05-29 21:23:19.705658 | 2025-05-29 21:23:22.860377 | 3154.719 | 3195 | 142 | 3337 | 0.001278 | 0.0002272 | 0.0015052 | | | Yes, people generally liked John Wick, especia... | 10 |
| 3 | 2087dfac-ac2b-4cdd-a4fe-c7afbdc2e9f0 | multi_query_retrieval | multi_query | success | 2025-05-29 21:23:11.354325 | 2025-05-29 21:23:19.704954 | 8350.629 | 5827 | 380 | 6207 | 0.0023308 | 0.000608 | 0.0029388 | | | Based on the reviews provided, people generall... | 16 |
| 4 | cd0891ff-657a-42ad-8342-be7788c14ede | contextual_compression | compression | success | 2025-05-29 21:23:08.273032 | 2025-05-29 21:23:11.353782 | 3080.75 | 1579 | 92 | 1671 | 0.0006316 | 0.0001472 | 0.0007788 | | | Yes, people generally liked John Wick. The fir... | 3 |
| 5 | f73c8f24-8a2d-4875-8d0f-3cf457128b08 | bm25_retrieval | bm25 | success | 2025-05-29 21:23:06.669245 | 2025-05-29 21:23:08.272582 | 1603.337 | 1292 | 194 | 1486 | 0.0005168 | 0.0003104 | 0.0008272 | | | People's opinions about John Wick movies vary ... | 4 |
| 6 | 71ea77f8-aea0-40c7-b59d-dcdd2fd9b614 | naive_retrieval | naive | success | 2025-05-29 21:23:04.529172 | 2025-05-29 21:23:06.667873 | 2138.701 | 3823 | 95 | 3918 | 0.0015292 | 0.000152 | 0.0016812 | | | Yes, people generally liked John Wick. The rev... | 10 |
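To read the table as trade-offs, it helps to normalize each method against the naive baseline. The snippet below only redoes arithmetic on the seven rows shown above. They come from a single question and a single run per method, so treat the ratios as anecdotal rather than as a benchmark. The costs appear to cover only the chat-model calls (every row is consistent with $0.40 per million prompt tokens and $1.60 per million completion tokens), so embedding and Cohere rerank charges are probably not included. The dictionary is named `observed` so it does not overwrite the notebook's `runs` list.

```python
# (duration_ms, total_cost in USD) per method, copied from the df_runs table above
observed = {
    "naive":           (2138.701, 0.0016812),
    "bm25":            (1603.337, 0.0008272),
    "compression":     (3080.75,  0.0007788),
    "multi_query":     (8350.629, 0.0029388),
    "parent_document": (2618.421, 0.00048),
    "ensemble":        (8793.817, 0.0030812),
    "semantic":        (3154.719, 0.0015052),
}
baseline_ms, baseline_cost = observed["naive"]

# Fastest to slowest, each expressed relative to the naive baseline
for method, (ms, cost) in sorted(observed.items(), key=lambda item: item[1][0]):
    print(f"{method:<16}{ms / 1000:6.2f} s ({ms / baseline_ms:4.2f}x)   ${cost:.7f} ({cost / baseline_cost:4.2f}x)")
```

On this one run, BM25 is the fastest, parent-document retrieval the cheapest, and the ensemble roughly 4x slower and about 1.8x more expensive than the naive baseline.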


### Upload Retrieval Results

In [31]:
# 1) Create or load the dataset
dataset_name = f"{project_name}_runs_custom_ds"
try:
    dataset = langsmith_client.create_dataset(
        dataset_name=dataset_name,
        description=(
            "All root chain runs from the John Wick retrieval-method notebook, "
            "capturing inputs, outputs, tokens, costs, duration, and errors."
        )
    )
    print(f"✅ Created dataset {dataset.name!r}")
except Exception:
    dataset = langsmith_client.read_dataset(dataset_name=dataset_name)
    print(f"ℹ️  Using existing dataset {dataset.name!r}")

# 2) Bulk‐ingest each run as an Example
for _, row in df_runs.iterrows():
    langsmith_client.create_example(
        dataset_id=dataset.id,
        inputs={
            "run_id": row["run_id"],
            "method": row["method"],
        },
        outputs={
            # include whichever outputs you care about:
            "response": row["response"],
            "context_docs": int(row["context_docs"]),
        },
        metadata={
            # include metrics & error info as metadata
            "status": row["status"],
            "duration_ms": float(row["duration_ms"]),
            "prompt_tokens": int(row["prompt_tokens"]),
            "completion_tokens": int(row["completion_tokens"]),
            "total_tokens": int(row["total_tokens"]),
            "prompt_cost": float(row["prompt_cost"]),
            "completion_cost": float(row["completion_cost"]),
            "total_cost": float(row["total_cost"]),
            # optionally include error
            **({"error": row["error"]} if pd.notna(row.get("error")) else {}),
            **({"wrapper_error": row["wrapper_error"]} if pd.notna(row.get("wrapper_error")) else {}),
        }
    )
print(f"✅ Added {len(df_runs)} runs to dataset {dataset.name!r}")
print("🔗 Dataset URL:", dataset.url)


✅ Created dataset 'retrieval-method-comparison-20250529_142100_runs_custom_ds'
✅ Added 7 runs to dataset 'retrieval-method-comparison-20250529_142100_runs_custom_ds'
🔗 Dataset URL: https://smith.langchain.com/o/2ad170d9-2e91-430d-9d70-cf6501e2184c/datasets/0dda7929-ecbd-4e0d-8425-4c540e8f1059


## 🛠️ Exploration Tools & Cleanup Functions

### List of Qdrant collections

In [32]:
# display Qdrant collections

existing = [c.name for c in cloud_client.get_collections().collections]

print(type(existing))
print(existing)

<class 'list'>
['johnwick_parent_children', 'johnwick_semantic', 'johnwick_baseline']


### Qdrant info by Storage Type

In [33]:
# display Qdrant vector store collection metadata

stores = {
    "baseline": baseline_vectorstore,
    "parent":  parent_children_vectorstore,
    "semantic": semantic_vectorstore,
}

for name, vs in stores.items():
    client = vs.client
    col    = vs.collection_name
    print(f"=== {name} ===")
    # 1) Existence check
    print("Exists?      ", client.collection_exists(col))
    # 2) Point count
    print("Point count: ", client.count(collection_name=col))
    # 3) Full collection info
    desc   = client.get_collection(collection_name=col)
    params = desc.config.params

    # — Vector dims & metric
    vec_field = params.vectors
    if isinstance(vec_field, dict):
        # multi-vector mode: pick the first VectorParams
        vp = next(iter(vec_field.values()))
    else:
        # single-vector mode: vectors is itself a VectorParams
        vp = vec_field
    print("Dim / metric:", vp.size, "/", vp.distance)

    # — Shard count & replication factor live on params
    print("Shards / repl:", params.shard_number, "/", params.replication_factor)

    print()


=== baseline ===
Exists?       True
Point count:  count=100
Dim / metric: 1536 / Cosine
Shards / repl: 1 / 1

=== parent ===
Exists?       True
Point count:  count=4817
Dim / metric: 1536 / Cosine
Shards / repl: 1 / 1

=== semantic ===
Exists?       True
Point count:  count=179
Dim / metric: 1536 / Cosine
Shards / repl: 1 / 1
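The point counts above translate directly into index size. A rough lower bound on the memory needed for the raw vectors is points × dimensions × bytes per component. The sketch below assumes uncompressed float32 vectors (4 bytes each, no quantization) and ignores payloads (the review text and metadata stored with each point) and HNSW graph overhead, so real usage will be higher.

```python
DIM = 1536            # text-embedding-3-small output size, matching "Dim / metric" above
BYTES_PER_FLOAT = 4   # assumption: uncompressed float32 vectors, no quantization

point_counts = {"baseline": 100, "parent": 4817, "semantic": 179}  # from the output above

for name, points in point_counts.items():
    raw_bytes = points * DIM * BYTES_PER_FLOAT
    print(f"{name:<9}{points:>6} points  ~{raw_bytes / 2**20:6.2f} MiB of raw vectors")
```

The parent-child collection holds about 48 times as many vectors as the baseline (roughly 28 MiB versus under 1 MiB), the storage cost of searching over small child chunks.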



### Redis info for Parent Document Store

In [34]:


# Assume `parent_document_store` already has docs via ParentDocumentRetriever
#  (i.e. you already did retriever.add_documents(...) or similar)

# 1) List all stored keys (document IDs)
all_keys = list(parent_document_store.yield_keys())
print(f"Total documents in store: {len(all_keys)}")
# print("Document IDs:", all_keys)

# 2) Fetch all Document objects
docs = parent_document_store.mget(all_keys)

# 3) Examine metadata schema
#    Collect all metadata field names across docs
all_fields = set()
for doc in docs:
    all_fields.update(doc.metadata.keys())

print(f"Metadata fields present: {sorted(all_fields)}")

# 4) Show per-field value types and a sample value
field_types = {field: set() for field in all_fields}
for doc in docs:
    for field, val in doc.metadata.items():
        field_types[field].add(type(val).__name__)

print("Metadata field types:")
for field, types in field_types.items():
    sample = next((d.metadata[field] for d in docs if field in d.metadata), None)
    print(f" • {field}: types={sorted(types)}, sample={sample!r}")

# 5) (Optional) Print out first N docs’ text lengths to gauge “dimensions”
for i, doc in enumerate(docs[:5], 1):
    text_len = len(doc.page_content)
    print(f"Doc {i} (ID={all_keys[i-1]}): {text_len} characters")


Total documents in store: 100
Metadata fields present: ['Author', 'Movie_Title', 'Rating', 'Review_Date', 'Review_Title', 'Review_Url', 'last_accessed_at', 'row', 'source']
Metadata field types:
 • last_accessed_at: types=['str'], sample='2025-05-26T14:21:07.737461'
 • Movie_Title: types=['str'], sample='John Wick 1'
 • Review_Date: types=['str'], sample='23 March 2023'
 • Review_Url: types=['str'], sample='/review/rw8945545/?ref_=tt_urv'
 • Review_Title: types=['str'], sample=' Violent and gripping story with plenty of unstopped action , shootouts and breathtaking fights\n'
 • source: types=['str'], sample='/home/donbr/aim/langchain-retrieval-methods/data/john_wick_1.csv'
 • row: types=['int'], sample=5
 • Author: types=['str'], sample='ma-cortes'
 • Rating: types=['int'], sample=7
Doc 1 (ID=5b2c2231-42c9-49b3-b81d-f201ed19f504): 1576 characters
Doc 2 (ID=bb379887-1494-4d58-bdf1-76fb4d357cee): 562 characters
Doc 3 (ID=9d106938-d7aa-42f2-b5f6-e0bb81d84e12): 940 characters
Doc 4 (ID=5de

### Delete Qdrant collections

In [35]:
from qdrant_client import QdrantClient

# initialize client (cloud or on-prem)
cloud_client = QdrantClient(
    url=QDRANT_API_URL,
    api_key=QDRANT_API_KEY,
    prefer_grpc=True,
)

# create conditional deletion flag
delete_collection = False

if delete_collection:
    # list of collections to drop
    collections_to_reset = [
        "johnwick_baseline",
        "johnwick_parent_children",
        "johnwick_semantic",
    ]

    for col_name in collections_to_reset:
        # guard against missing collections
        if cloud_client.collection_exists(col_name):
            cloud_client.delete_collection(
                collection_name=col_name,
                timeout=60,  # seconds
            )
            print(f"Deleted collection: {col_name}")
        else:
            print(f"Collection not found (skipped): {col_name}")




### Response Object validation

In [36]:
from IPython.display import Markdown, display

# Map of titles to response objects
responses = {
    "Naive Retrieval Chain Response":              naive_retrieval_chain_response,
    "BM25 Retrieval Chain Response":               bm25_retrieval_chain_response,
    "Contextual Compression Chain Response":       contextual_compression_retrieval_chain_response,
    "Multi-Query Retrieval Chain Response":        multi_query_retrieval_chain_response,
    "Parent Document Retrieval Chain Response":    parent_document_retrieval_chain_response,
    "Ensemble Retrieval Chain Response":           ensemble_retrieval_chain_response,
    "Semantic Retrieval Chain Response":           semantic_retrieval_chain_response,
}

for header, resp in responses.items():
    display(Markdown(f"## {header}\n"))
    print("\n")
    print(resp)
    print("\n")


## Naive Retrieval Chain Response




Yes, people generally liked John Wick. The reviews highlight that the film was praised for its stylish and well-choreographed action sequences, Keanu Reeves' confident and cool performance, and its brisk pacing. Many reviewers called it one of the best action films of the year or even the decade, appreciating its simple yet effective revenge plot and the unique criminal underworld it portrayed. While a few reviews were less enthusiastic, the overall reception was very positive, especially among action fans.




## BM25 Retrieval Chain Response




People's opinions about John Wick movies vary based on the reviews provided.

- The first John Wick film (2014) received very positive feedback, described as "something special" with smooth, stylish action sequences and a compelling criminal underworld. Reviewers praised Keanu Reeves' performance and the film's kinetic, concise action, recommending it highly, especially for action fans.

- However, later installments received more mixed or negative reviews. For example, "John Wick: Chapter 4" was called the weakest in the series by one reviewer, criticized for lacking plot and meaning despite long action sequences. "John Wick 3" was described by a reviewer as boring, plotless, excessively violent, and filled with stereotypes.

Overall, while the original John Wick film was generally liked and appreciated for its style and action, some later films in the series garnered more divided or negative reactions. So, people generally liked the first film but had more mixed feelings about the 

## Contextual Compression Chain Response




Yes, people generally liked John Wick. The first film received high praise for its slick action sequences, Keanu Reeves' confident performance, and unique style. Reviews describe it as one of the best action films in recent years, highly entertaining, and a must-see for action fans. However, some later installments like John Wick 3 received more mixed reviews, with comments that "the magic is gone," but the initial reception of John Wick was very positive overall.




## Multi-Query Retrieval Chain Response




Based on the reviews provided, people generally liked the John Wick series, especially the earlier films.

- The first John Wick movie is often praised as a "remarkable, surprising film," "slick, violent fun," and one of the best action films in recent years. Reviewers highlight Keanu Reeves' performance, the stylish choreography, and the straightforward yet gripping revenge story. Ratings for the first movie range mostly from 7 to 10 out of 10 in these reviews.

- The series as a whole, including John Wick 2 and John Wick 4, has maintained a strong fan base and critical acclaim. One review mentions that the first three films all share the same IMDb rating of 7.4/10 and that "John Wick: Chapter 4" might be the best in the series, with a rating of 9/10.

- There are some dissenting opinions, especially for the later films (e.g., John Wick 3 and 4), where some reviewers found the action too over-the-top, the plot nonsensical, or the fights repetitive and boring. For example, one review

## Parent Document Retrieval Chain Response

Based on the provided reviews, people generally liked John Wick. One review praises the first movie for its well-choreographed action and emotional setup, recommending it highly. Another review states that the John Wick series has been consistently well received and even considers "John Wick: Chapter 4" as the best in the series. However, there is also a negative review of "John Wick 4" that criticizes the plot and fight scenes. Overall, the majority of reviews suggest that people generally enjoyed the John Wick movies.

## Ensemble Retrieval Chain Response

Yes, people generally liked the first John Wick film. Reviews highlight the movie's stylish, well-choreographed action sequences, Keanu Reeves' compelling performance, and its unique and fun take on the revenge thriller genre. Several reviewers describe it as a must-see for action fans, praising its smooth action, emotional undercurrent, and fresh approach compared to typical action films. While some critics found certain aspects simple or formulaic, the overall reception was very positive, especially for the original 2014 film. However, opinions on later sequels are more mixed, with some viewers feeling the franchise lost momentum in the third and fourth installments. But overall, the original John Wick was highly appreciated.

## Semantic Retrieval Chain Response

Yes, people generally liked John Wick, especially the first film. Reviews highlight it as a stylish, well-choreographed, and exciting action movie with smooth, intense action sequences and a simple yet effective plot. Many reviewers praised Keanu Reeves' performance and the film's unique take on the action genre, describing it as one of the best action films in recent years. 

However, opinions on the later films, such as John Wick 3, appear more mixed, with some reviews noting a loss of the original's magic or issues with pacing and narrative. Despite that, the franchise overall is well-received, with John Wick 4 regarded as a strong continuation that maintains the high standards of the series.

### Retrieval Chain Validation

```python
from IPython.display import Markdown, display

# Map of display titles to the retrieval chains built earlier in the notebook
chains = {
    "Naive Retrieval":              naive_retrieval_chain,
    "BM25 Retrieval":               bm25_retrieval_chain,
    "Contextual Compression":       contextual_compression_retrieval_chain,
    "Multi-Query Retrieval":        multi_query_retrieval_chain,
    "Parent Document Retrieval":    parent_document_retrieval_chain,
    "Ensemble Retrieval":           ensemble_retrieval_chain,
    "Semantic Retrieval":           semantic_retrieval_chain,
}

# Render a heading per chain, then print its repr to inspect how the
# retriever, prompt, and chat model are composed
for title, chain in chains.items():
    display(Markdown(f"## {title}\n"))
    print(chain)
    # Optional ASCII graph view (requires the `grandalf` package):
    # print(chain.get_graph().draw_ascii())
    print("\n")
```
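Every chain printed by this cell shares the same three-stage shape visible in its repr: `first` sends the question to a retriever while passing the question through, `middle` re-assigns `context`, and `last` fills the prompt template and calls the chat model. The sketch below mimics that data flow in plain Python; `fake_retriever` and `fake_llm` are hypothetical stand-ins for the real retrievers and `ChatOpenAI`, and it illustrates the flow rather than LangChain's implementation. Because the printed `last` mapping is truncated, returning `context` alongside `response` is an assumption.

```python
from operator import itemgetter
from typing import Callable


# Hypothetical stand-in for a retriever (the notebook uses Qdrant, BM25, etc.)
def fake_retriever(question: str) -> list[str]:
    corpus = ["John Wick is stylish", "Chapter 4 is long", "Keanu Reeves stars"]
    words = set(question.lower().strip("?").split())
    # Keep any document that shares at least one word with the question
    return [doc for doc in corpus if words & set(doc.lower().split())]


# Hypothetical stand-in for ChatOpenAI: reports how many context snippets it saw
def fake_llm(prompt: str) -> str:
    n_snippets = prompt.count("\n- ")
    return f"answered using {n_snippets} snippets"


def run_chain(inputs: dict, retrieve: Callable[[str], list[str]]) -> dict:
    # "first": retrieve context for the question and pass the question through
    state = {
        "context": retrieve(itemgetter("question")(inputs)),
        "question": itemgetter("question")(inputs),
    }
    # "middle": RunnableAssign re-binds context to itself (a pass-through here)
    state = {**state, "context": itemgetter("context")(state)}
    # "last": fill the prompt template and call the model
    context_text = "\n".join(f"- {doc}" for doc in state["context"])
    prompt = f"Query:\n{state['question']}\n\nContext:\n{context_text}\n"
    # Assumption: context is returned with the response (the printed repr is cut off)
    return {"response": fake_llm(prompt), "context": state["context"]}


result = run_chain({"question": "Did people like John Wick?"}, fake_retriever)
print(result)
```

Swapping `fake_retriever` for a different callable is the only change needed to model each of the seven chains, which mirrors how the notebook varies only the retriever between them.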


## Naive Retrieval


```text
first={
  context: RunnableLambda(itemgetter('question'))
           | VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f8317afb6d0>, search_kwargs={'k': 10}),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the question.\n\nIf you do not know the answer, or are unsure, say you don't know.\n\nQuery:\n{question}\n\nContext:\n{context}\n"), additional_kwargs={})])
            | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7f83521721
```
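The Naive chain's `VectorStoreRetriever` returns the `k=10` chunks whose embeddings lie closest to the query embedding. A minimal sketch of that ranking, assuming cosine similarity and using tiny hand-written vectors in place of `OpenAIEmbeddings`; Qdrant itself searches an approximate HNSW index rather than doing this brute-force scan, and the collection's distance metric is an assumption here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 10) -> list[str]:
    # Brute-force scan: rank every document by similarity, keep the best k
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]


docs = {"review_a": [1.0, 0.0], "review_b": [0.7, 0.7], "review_c": [0.0, 1.0]}
print(top_k([1.0, 0.1], docs, k=2))  # ['review_a', 'review_b']
```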

## BM25 Retrieval


```text
first={
  context: RunnableLambda(itemgetter('question'))
           | BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x7f8317810410>),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the question.\n\nIf you do not know the answer, or are unsure, say you don't know.\n\nQuery:\n{question}\n\nContext:\n{context}\n"), additional_kwargs={})])
            | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7f8352172190>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7f8351ccf
```
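BM25 scores documents by exact term overlap: rare terms get a higher IDF weight, repeated terms saturate (controlled by `k1`), and long documents are normalized (controlled by `b`). The sketch below is a simplified Okapi BM25 with the commonly used `k1=1.5` and `b=0.75`. Its IDF uses the non-negative `log(... + 1)` form, which differs from the epsilon floor `rank_bm25.BM25Okapi` applies to negative IDFs, and the lowercase whitespace tokenizer is an assumption.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+ 1" inside the log)
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = ["john wick action", "romantic comedy film", "wick wick sequel"]
print(bm25_scores("wick", docs))
```

The document that never mentions "wick" scores exactly zero, which is why BM25 misses paraphrases that the embedding-based retrievers can still find.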

## Contextual Compression


```text
first={
  context: RunnableLambda(itemgetter('question'))
           | ContextualCompressionRetriever(base_compressor=CohereRerank(client=<cohere.client_v2.ClientV2 object at 0x7f82ef9e1150>, top_n=3, model='rerank-english-v3.0', cohere_api_key=SecretStr('**********'), base_url=None, user_agent='langchain:partner'), base_retriever=VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f8317afb6d0>, search_kwargs={'k': 10})),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the que
```
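Contextual compression in this chain is reranking: the base retriever fetches `k=10` candidates and `CohereRerank` keeps the `top_n=3` it scores as most relevant. A sketch of that two-stage pattern, where the hypothetical `overlap_score` is only a crude stand-in for Cohere's `rerank-english-v3.0` cross-encoder:

```python
from typing import Callable


def compress(query: str, candidates: list[str],
             score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    # Re-score every candidate against the query and keep only the best top_n
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, doc: str) -> float:
    # Hypothetical reranker: fraction of query words that appear in the document
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


candidates = ["wick action great", "unrelated text", "john wick great action", "wick"]
print(compress("john wick action", candidates, overlap_score))
```

The design trade-off is that the reranker reads each query-document pair jointly, which is more accurate than comparing precomputed embeddings but adds one API call and its latency per query.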

## Multi-Query Retrieval


```text
first={
  context: RunnableLambda(itemgetter('question'))
           | MultiQueryRetriever(retriever=VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f8317afb6d0>, search_kwargs={'k': 10}), llm_chain=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='You are an AI language model assistant. Your task is \n    to generate 3 different versions of the given user \n    question to retrieve relevant documents from a vector  database. \n    By generating multiple perspectives on the user question, \n    your goal is to help the user overcome some of the limitations \n    of distance-based similarity search. Provide these alternative \n    questions separated by newlines. Original question: {question}')
             | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7f8352172190>, async_client=<openai.resources.chat.completion
```
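The multi-query prompt asks the LLM for three rephrasings of the question; each rephrasing is sent to the vector retriever and the result lists are merged. A sketch of the merge step, assuming an order-preserving de-duplication (a "unique union"); `toy_variants` and `toy_retrieve` are hypothetical callables standing in for the LLM call and the Qdrant retriever.

```python
from typing import Callable


def multi_query_retrieve(question: str,
                         generate_variants: Callable[[str], list[str]],
                         retrieve: Callable[[str], list[str]]) -> list[str]:
    seen: set[str] = set()
    results: list[str] = []
    for query in generate_variants(question):
        for doc in retrieve(query):
            # Unique union: keep only the first occurrence of each document
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
    return results


TOY_INDEX = {
    "did fans like it": ["r1", "r2"],
    "was it well reviewed": ["r2", "r3"],
    "audience reaction": ["r1"],
}


def toy_variants(question: str) -> list[str]:
    # Hypothetical: a real run would ask the LLM for three rephrasings of `question`
    return list(TOY_INDEX)


def toy_retrieve(query: str) -> list[str]:
    # Hypothetical: a real run would query the vector store with k=10
    return TOY_INDEX[query]


print(multi_query_retrieve("Did people like John Wick?", toy_variants, toy_retrieve))
```

Each variant costs one extra retrieval, and generating the variants costs one extra LLM call, which is the latency and token overhead this method trades for broader recall.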

## Parent Document Retrieval


```text
first={
  context: RunnableLambda(itemgetter('question'))
           | ParentDocumentRetriever(vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f8314497190>, docstore=<langchain.storage.encoder_backed.EncoderBackedStore object at 0x7f83740e9990>, search_kwargs={}, child_splitter=<langchain_text_splitters.character.RecursiveCharacterTextSplitter object at 0x7f8314496990>),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the question.\n\nIf you do not know the answer, or are unsure, say you don't know.\n\nQuery:\n{question}\n\nContext:\n{conte
```
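`ParentDocumentRetriever` searches small child chunks in the vector store but returns the larger parent documents they were cut from, which it looks up in the docstore. A sketch of that lookup, assuming a fixed-width splitter and substring matching as stand-ins for `RecursiveCharacterTextSplitter` and vector search; `split_children`, `build_index`, and `retrieve_parents` are hypothetical names.

```python
def split_children(text: str, size: int) -> list[str]:
    # Hypothetical fixed-width splitter standing in for RecursiveCharacterTextSplitter
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index(parents: dict[str, str], size: int) -> list[tuple[str, str]]:
    # Each child chunk remembers the id of the parent it was cut from
    return [(pid, child) for pid, text in parents.items() for child in split_children(text, size)]


def retrieve_parents(term: str, index: list[tuple[str, str]],
                     parents: dict[str, str], k: int = 4) -> list[str]:
    # Search the children (substring match stands in for vector similarity)
    hit_ids = [pid for pid, child in index if term in child][:k]
    # Collapse child hits to unique parent ids, keeping first-hit order
    unique_ids = list(dict.fromkeys(hit_ids))
    # Return full parent texts, as a docstore lookup would
    return [parents[pid] for pid in unique_ids]


parents = {"p1": "wick fights. wick wins.", "p2": "a quiet romance film"}
index = build_index(parents, size=12)
print(retrieve_parents("wick", index, parents))
```

Two child chunks of `p1` match "wick", yet the parent is returned once: small chunks give precise matching, while the parent restores the surrounding context for the LLM.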

## Ensemble Retrieval


```text
first={
  context: RunnableLambda(itemgetter('question'))
           | EnsembleRetriever(retrievers=[BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x7f8317810410>), VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f8317afb6d0>, search_kwargs={'k': 10}), ParentDocumentRetriever(vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f8314497190>, docstore=<langchain.storage.encoder_backed.EncoderBackedStore object at 0x7f83740e9990>, search_kwargs={}, child_splitter=<langchain_text_splitters.character.RecursiveCharacterTextSplitter object at 0x7f8314496990>), ContextualCompressionRetriever(base_compressor=CohereRerank(client=<cohere.client_v2.ClientV2 object at 0x7f82ef9e1150>, top_n=3, model='rerank-english-v3.0', cohere_api_key=SecretStr('**********'), base_url=None, user_agent='langchain:partner'), base_retriever=VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddin
```

## Semantic Retrieval


```text
first={
  context: RunnableLambda(itemgetter('question'))
           | VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f82fc2341d0>, search_kwargs={'k': 10}),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the question.\n\nIf you do not know the answer, or are unsure, say you don't know.\n\nQuery:\n{question}\n\nContext:\n{context}\n"), additional_kwargs={})])
            | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7f83521721
```
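The Semantic chain prints as an ordinary `VectorStoreRetriever` over its own Qdrant collection, so the difference presumably lies in how that collection was chunked. A sketch of percentile-based semantic chunking, assuming pre-computed sentence embeddings (toy 2-D vectors here) and a 95th-percentile breakpoint: a new chunk starts wherever the cosine distance between adjacent sentences exceeds the threshold. LangChain's `SemanticChunker` also embeds small windows of neighboring sentences before comparing them, which this sketch skips.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    # 1 - cosine similarity; treat zero vectors as maximally distant
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return 1.0 - dot / norm if norm else 1.0


def percentile(values: list[float], pct: float) -> float:
    # Linear interpolation between the two nearest ranks
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def semantic_chunks(sentences: list[str], embeddings: list[list[float]], pct: float = 95.0) -> list[str]:
    if len(sentences) < 2:
        return [" ".join(sentences)]
    # Distance between each sentence and the next one
    distances = [cosine_distance(embeddings[i], embeddings[i + 1]) for i in range(len(embeddings) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        # A large jump in meaning marks a chunk boundary
        if dist > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


sentences = ["Wick is back.", "The action is slick.", "The romance subplot is dull.", "The love story drags."]
embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
print(semantic_chunks(sentences, embeddings))
```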