# Introduction

Welcome to the **LangChain Retrieval Methods** notebook.  
In this tutorial you will:

1. **Load** a small corpus of John Wick movie reviews.
2. **Explore** seven distinct retrieval strategies:
   - Naive (whole-document vectors)
   - BM25 (keyword matching)
   - Contextual Compression (reranking)
   - Multi-Query (query expansion)
   - Parent Document (hierarchical chunks)
   - Ensemble (fusion of methods)
   - Semantic (boundary-aware chunking)

3. **Compare** each method across:
   - Retrieval **quality** (recall@k, qualitative response patterns)
   - **Latency** (ms per query)
   - **Cost** (API/token usage)
   - **Resource footprint** (index size & shape)

4. **Visualize** key metrics and response examples to understand trade-offs.
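
Recall@k is listed above as a quality metric. The following is a minimal sketch of how it could be computed, assuming you have a labeled set of relevant document IDs per question (the cells in this notebook do not build one). The IDs and labels are hypothetical and only illustrate the arithmetic.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = len(set(retrieved_ids[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical ranking and relevance labels, for illustration only.
retrieved = ["r7", "r2", "r9", "r4", "r1"]
relevant = {"r2", "r4", "r8"}

print(recall_at_k(retrieved, relevant, k=3))  # 1 of 3 relevant docs in the top 3
print(recall_at_k(retrieved, relevant, k=5))  # 2 of 3 relevant docs in the top 5
```

Because "r8" is never retrieved, recall cannot reach 1.0 here no matter how large k gets, which is the kind of gap this metric is meant to expose.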

By the end of the notebook, you'll know:
- When to **start simple** (Naive or BM25) versus **scale up** (Ensemble or Semantic).
- How context-window advances (4K → 32K → 128K) and loader-splitter decoupling shape modern RAG architectures.
- Practical tips for **production readiness**, including index sharding, zero-downtime reindexes, and drift monitoring.

> **Prerequisites**  
> - Python 3.11 environment (see Quickstart)  
> - Access to a Qdrant Cloud instance (with API key)  
> - OpenAI API credentials for embeddings and generation, a Cohere API key for reranking, and a LangSmith API key for tracing  

Run the cells in order, or jump to the section that interests you. Let's get started!  


## Setup

In [1]:
#!uv pip install -qU langchain langchain-community langchain-experimental langchain-openai langchain-qdrant langchain-cohere rank_bm25 qdrant-client langsmith grandalf

In [2]:
import os
from datetime import datetime
from pathlib import Path

import requests
from dotenv import load_dotenv

load_dotenv()

# Build a dynamic project name (e.g. include timestamp)
project_name = f"retrieval-method-comparison-{datetime.now().strftime('%Y%m%d_%H%M%S')}"

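# load_dotenv() has already copied these keys into os.environ. Fail early with a
# clear message if one is missing (assigning None to os.environ raises TypeError).
for key in ("OPENAI_API_KEY", "COHERE_API_KEY", "LANGSMITH_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing required environment variable: {key}")

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = project_name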

QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")
QDRANT_API_URL = os.getenv("QDRANT_API_URL")

In [3]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4.1-mini")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [4]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

## Data Preparation

In [5]:

# Set up a consistent data directory in the user's home directory
DATA_DIR = Path.home() / "data"
DATA_DIR.mkdir(exist_ok=True)

# URLs and filenames
urls = [
    ("https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv", "john_wick_1.csv"),
    ("https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv", "john_wick_2.csv"),
    ("https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv", "john_wick_3.csv"),
    ("https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv", "john_wick_4.csv"),
]

# Download files if not already present
for url, fname in urls:
    file_path = DATA_DIR / fname
    if not file_path.exists():
        print(f"Downloading {fname}...")
        r = requests.get(url)
        r.raise_for_status()
        file_path.write_bytes(r.content)
    else:
        print(f"{fname} already exists.")

john_wick_1.csv already exists.
john_wick_2.csv already exists.
john_wick_3.csv already exists.
john_wick_4.csv already exists.


In [6]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
    # Load from DATA_DIR, where the previous cell saved the CSV files
    loader = CSVLoader(
        file_path=str(DATA_DIR / f"john_wick_{i}.csv"),
        metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"],
    )

    movie_docs = loader.load()
    for doc in movie_docs:
        # Add the "Movie Title" (John Wick 1, 2, ...)
        doc.metadata["Movie_Title"] = f"John Wick {i}"

        # Convert "Rating" to an `int`; if no rating is provided, assume a rating of 0
        doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

        # Newer movies get a more recent "last_accessed_at"
        doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4 - i)

    documents.extend(movie_docs)

parent_docs = documents

In [7]:
# Display the lengths of parent_docs and documents (both names refer to the same list)
print(f"Length of parent_docs: {len(parent_docs)}")
print(f"Length of documents: {len(documents)}")

Length of parent_docs: 100
Length of documents: 100


## Setup Vector Stores

In [8]:
import os
from langchain_qdrant import QdrantVectorStore  # Updated import
from langchain_openai import OpenAIEmbeddings
from qdrant_client import QdrantClient, models
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_experimental.text_splitter import SemanticChunker
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

In [9]:
# 1. Main vectorstore using qdrant cloud
baseline_vectorstore = QdrantVectorStore.from_documents(
    documents,
    embeddings,
    url=QDRANT_API_URL,
    api_key=QDRANT_API_KEY,
    prefer_grpc=True,
    collection_name="johnwick_baseline"
)

In [10]:
# 2. Parent document setup with qdrant cloud client

# Initialize cloud client
cloud_client = QdrantClient(
    url=QDRANT_API_URL,
    api_key=QDRANT_API_KEY,
    prefer_grpc=True
)

# Check if the collection exists
if not cloud_client.collection_exists("johnwick_parent"):
    cloud_client.create_collection(
        collection_name="johnwick_parent",
        vectors_config=models.VectorParams(
            size=1536,
            distance=models.Distance.COSINE
        ),
    )

# Construct the VectorStore using cloud client
parent_vectorstore = QdrantVectorStore(
    embedding=embeddings,
    client=cloud_client,
    collection_name="johnwick_parent",
)

store = InMemoryStore()

child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

parent_document_retriever.add_documents(parent_docs, ids=None)
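
The idea behind the parent document setup is to search over small child chunks but hand the full parent document to the LLM. Here is a minimal, dependency-free sketch of that idea. The word-based splitter and the word-overlap scorer are simplifications standing in for `RecursiveCharacterTextSplitter` and vector similarity, and the two reviews are made-up examples.

```python
def split_words(text, words_per_chunk):
    """Split text into chunks of a fixed number of words (toy child splitter)."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def build_child_index(parents, words_per_chunk):
    """Index every child chunk together with the ID of its parent document."""
    return [
        (chunk, parent_id)
        for parent_id, text in parents.items()
        for chunk in split_words(text, words_per_chunk)
    ]


def retrieve_parents(query, child_index, parents, k=2):
    """Rank child chunks, then return each matching parent document once."""
    terms = set(query.lower().split())
    ranked = sorted(
        child_index,
        key=lambda item: len(terms & set(item[0].lower().split())),
        reverse=True,
    )
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[pid] for pid in parent_ids]


# Made-up reviews, for illustration only.
toy_parents = {
    "p1": "Keanu Reeves delivers slick action. The dog subplot gives the revenge story real weight.",
    "p2": "The plot is thin. The fight scenes drag on far too long.",
}
child_index = build_child_index(toy_parents, words_per_chunk=6)
results = retrieve_parents("dog revenge", child_index, toy_parents)
print(results)
```

Both top-ranked child chunks belong to "p1", so the parent is returned once. This is also why a parent retriever can return fewer documents than the number of child hits.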

In [11]:
# 3. Semantic chunking using qdrant cloud
semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

semantic_documents = semantic_chunker.split_documents(documents)

semantic_vectorstore = QdrantVectorStore.from_documents(
    semantic_documents,
    embeddings,
    url=QDRANT_API_URL,
    api_key=QDRANT_API_KEY,
    prefer_grpc=True,
    collection_name="johnwick_semantic"
)
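
With `breakpoint_threshold_type="percentile"`, the chunker starts a new chunk wherever the distance between neighboring sentence embeddings exceeds a percentile of all such distances. The sketch below illustrates that rule only. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and the 95th percentile is an assumed setting.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_distance(a, b):
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)


def percentile(values, pct):
    """Percentile with linear interpolation between neighboring ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def semantic_chunks(sentences, pct=95):
    """Group sentences, breaking where neighbor distance exceeds the percentile."""
    if len(sentences) < 2:
        return list(sentences)
    vectors = [embed(s) for s in sentences]
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Made-up sentences, for illustration only.
toy_sentences = [
    "The action scenes are stylish.",
    "The action scenes are brutal.",
    "Ticket prices went up again.",
    "Ticket prices are too high.",
]
chunks = semantic_chunks(toy_sentences)
print(chunks)
```

The only break falls between the second and third sentences, where the topic changes and the toy distance is largest.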

## Retrievers

### Setup

In [12]:
# setup langsmith tracing

from langsmith import Client, traceable

langsmith_client = Client()

### Naive Retriever

In [13]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retriever = baseline_vectorstore.as_retriever(search_kwargs={"k" : 10})

naive_retrieval_chain = (
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
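
Conceptually, the naive retriever embeds the question and returns the k stored vectors closest to it. A brute-force sketch of that ranking step, using cosine similarity (the metric configured for the collections in this notebook) over made-up three-dimensional vectors, might look like this. A vector database such as Qdrant typically uses an approximate index rather than scanning every vector.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k):
    """Return the IDs of the k documents most similar to the query vector."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Made-up vectors, for illustration only (real embeddings here have 1536 dimensions).
toy_vectors = {
    "review_a": [0.9, 0.1, 0.0],
    "review_b": [0.1, 0.9, 0.0],
    "review_c": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

top_two = top_k(toy_query, toy_vectors, k=2)
print(top_two)
```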

### BM25 Retriever

In [14]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)

bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
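
BM25 scores documents by keyword overlap, weighting rare terms more heavily and damping the effect of repeated terms and long documents. The sketch below implements one common BM25 formulation so you can see those pieces. The `k1` and `b` values are conventional choices, the whitespace tokenizer is a simplification, and the exact IDF variant used by `rank_bm25` may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a common BM25 formulation."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a larger inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Longer-than-average documents are penalized via b.
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Made-up reviews, for illustration only.
toy_reviews = [
    "stylish action with great fight choreography",
    "a slow romantic drama",
    "action action action and little plot",
]
toy_scores = bm25_scores("action choreography", toy_reviews)
print(toy_scores)
```

The first review wins because it matches both query terms, including the rarer "choreography". Repeating "action" three times in the third review helps less than matching a second, rarer term.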

### Contextual Compression Retriever

In [15]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=naive_retriever
)

In [16]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
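
The compression retriever follows a retrieve-then-rerank pattern: the base retriever fetches a broad candidate set, a second model rescores each candidate against the question, and only the best few are kept. The following sketch shows that control flow. `overlap_score` is a hypothetical stand-in for the reranking model, and `top_n=3` is an assumption chosen to match the 3 context documents reported for this method in the run summaries.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Rescore candidates against the query and keep the top_n best."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def overlap_score(query, doc):
    """Hypothetical stand-in for a reranking model: count of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


# Made-up candidates, as if returned by a first-stage retriever.
candidates = [
    "the soundtrack is loud",
    "people loved john wick and its action",
    "john wick is a film",
    "popcorn was stale",
    "did people like it? most people did like john wick",
]
kept = rerank("did people like john wick", candidates, overlap_score)
print(kept)
```

Trimming ten candidates to three is what reduces prompt tokens for this method, at the price of an extra reranking call per query.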

### Multi-Query Retriever



In [17]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever,
    llm=llm
)

In [18]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
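
Multi-query retrieval asks the LLM to rephrase the question several ways, runs the base retriever once per rephrasing, and returns the de-duplicated union of the results. That is why it can return more than `k` documents. A minimal sketch of the merge step follows, with `fake_variants` and `fake_retrieve` as hypothetical stand-ins for the LLM call and the base retriever.

```python
def multi_query_retrieve(question, generate_variants, retrieve):
    """Retrieve for every query variant and merge results, keeping first-seen order."""
    merged, seen = [], set()
    for query in generate_variants(question):
        for doc_id in retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


def fake_variants(question):
    # Stand-in for the LLM call that rewrites the question.
    return [question + " reviews", question + " ratings", question + " audience reaction"]


def fake_retrieve(query):
    # Stand-in for the base retriever; returns made-up document IDs.
    if query.endswith("reviews"):
        return ["d1", "d2", "d3"]
    if query.endswith("ratings"):
        return ["d2", "d4"]
    return ["d3", "d5"]


merged_ids = multi_query_retrieve("Did people like John Wick?", fake_variants, fake_retrieve)
print(merged_ids)
```

Each variant costs one more retrieval plus the LLM call that generates the variants, so latency and token usage grow with the number of rephrasings.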

### Parent Document Retriever

In [19]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)

### Ensemble Retriever

In [20]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]

equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list,
    weights=equal_weighting
)

ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
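
LangChain documents `EnsembleRetriever` as combining its retrievers' rankings with weighted Reciprocal Rank Fusion (RRF). The sketch below shows the fusion arithmetic under that assumption: each document earns `weight / (rank + c)` from every list it appears in. The constant `c=60` is the conventional value and the rankings are made up.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document IDs with weighted RRF."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up rankings from three hypothetical retrievers.
toy_rankings = [
    ["a", "b", "c"],
    ["b", "c", "d"],
    ["c", "b", "e"],
]
toy_weights = [1 / len(toy_rankings)] * len(toy_rankings)

fused = weighted_rrf(toy_rankings, toy_weights)
print(fused)
```

Documents that several retrievers agree on ("b" and "c") rise above a document that a single retriever ranked first ("a"). The fused list is the union of all inputs, so it can be longer than any single retriever's output.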

### Semantic Retriever - using semantically chunked vector store

In [21]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)

## Create Traceable Wrappers

In [22]:
from langsmith import traceable

@traceable(name="naive_retrieval", run_type="chain", metadata={"method":"naive"})
def trace_naive_retrieval(question: str):
    try:
        result = naive_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="bm25_retrieval", run_type="chain", metadata={"method":"bm25"})
def trace_bm25_retrieval(question: str):
    try:
        result = bm25_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="contextual_compression", run_type="chain", metadata={"method":"compression"})
def trace_contextual_compression(question: str):
    try:
        result = contextual_compression_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="multi_query_retrieval", run_type="chain", metadata={"method":"multi_query"})
def trace_multi_query_retrieval(question: str):
    try:
        result = multi_query_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="parent_document_retrieval", run_type="chain", metadata={"method":"parent_document"})
def trace_parent_document_retrieval(question: str):
    try:
        result = parent_document_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="ensemble_retrieval", run_type="chain", metadata={"method":"ensemble"})
def trace_ensemble_retrieval(question: str):
    try:
        result = ensemble_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

@traceable(name="semantic_retrieval", run_type="chain", metadata={"method":"semantic"})
def trace_semantic_retrieval(question: str):
    try:
        result = semantic_retrieval_chain.invoke({"question": question})
        return {
            "response": result["response"].content,
            "context_docs": len(result["context"])
        }
    except Exception as e:
        return {"error": str(e)}

print("✅ Traceable wrappers defined")


✅ Traceable wrappers defined
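
The seven wrappers above share one shape: invoke a chain, extract the response text and the context size, and turn any exception into an error dict. If you want to reduce the repetition, one option is a small factory like the sketch below. `make_safe_runner` and the fake chains are hypothetical names for illustration; in the notebook you would pass a chain's `.invoke` method and apply `@traceable` to the returned function.

```python
from types import SimpleNamespace


def make_safe_runner(invoke):
    """Wrap a chain's invoke callable so failures come back as an error dict."""
    def run(question):
        try:
            result = invoke({"question": question})
            return {
                "response": result["response"].content,
                "context_docs": len(result["context"]),
            }
        except Exception as exc:  # mirrors the broad error capture used above
            return {"error": str(exc)}
    return run


def fake_chain(payload):
    # Stand-in for a chain: returns a message-like object and a context list.
    return {"response": SimpleNamespace(content="Yes."), "context": ["doc1", "doc2"]}


def broken_chain(payload):
    raise RuntimeError("vector store unavailable")


ok = make_safe_runner(fake_chain)("Did people like John Wick?")
failed = make_safe_runner(broken_chain)("Did people like John Wick?")
print(ok)
print(failed)
```

Note that an error dict has no "response" key, so callers that index `["response"]` directly will raise `KeyError` when a wrapper fails. Using `.get("response")` avoids that.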


## Run All Traceable Retrievals

In [23]:
import pandas as pd

question = "Did people generally like John Wick?"

naive_retrieval_chain_response = trace_naive_retrieval(question)["response"]
bm25_retrieval_chain_response = trace_bm25_retrieval(question)["response"]
contextual_compression_retrieval_chain_response = trace_contextual_compression(question)["response"]
multi_query_retrieval_chain_response = trace_multi_query_retrieval(question)["response"]
parent_document_retrieval_chain_response = trace_parent_document_retrieval(question)["response"]
ensemble_retrieval_chain_response = trace_ensemble_retrieval(question)["response"]
semantic_retrieval_chain_response = trace_semantic_retrieval(question)["response"]

print("✅ All methods executed with tracing")

✅ All methods executed with tracing


In [24]:
from langchain_core.tracers.langchain import wait_for_all_tracers

# ... after all your traceable calls ...
wait_for_all_tracers()

In [25]:
# LangSmith Run Debugging information (for the first run)

root_runs_gen = langsmith_client.list_runs(
  project_name=project_name,
  is_root=True,
  run_type="chain"      # only top-level chain runs
)

# First, let's examine the structure of a single run before materializing all runs
first_run = next(root_runs_gen)
print("Available attributes in a run:", dir(first_run))
print("\nExample run metadata:", first_run.metadata)
print("\nExample run outputs:", first_run.outputs)

Available attributes in a run: ['Config', '__abstractmethods__', '__annotations__', '__class__', ...] (output truncated)

## Create a LangSmith Dataset to preserve the memories!

In [26]:
# %% [markdown]
# ## Ingest All Root Chain Runs into LangSmith Dataset

# %%
from langsmith import Client

# Assume langsmith_client is already initialized,
# and `project_name` is set as above.

# 1. Ensure the dataset exists or create it
dataset_name = f"{project_name}_runs_ds"
try:
    dataset = langsmith_client.create_dataset(
        dataset_name=dataset_name,
        description=(
            "All root chain runs from the John Wick retrieval-method notebook, "
            "including method, response, context_docs, tokens, costs, durations, and errors."
        )
    )
    print(f"✅ Created dataset: {dataset.name!r}")
except Exception:
    dataset = langsmith_client.read_dataset(dataset_name=dataset_name)
    print(f"ℹ️  Using existing dataset: {dataset.name!r}")

# 2. Fetch all top-level chain runs for the project
runs = list(langsmith_client.list_runs(
    project_name=project_name,
    is_root=True,
    run_type="chain",
))

# 3. Ingest each run as an Example in the dataset
for run in runs:
    langsmith_client.create_example_from_run(
        run=run,
        dataset_id=dataset.id
    )

print(f"🚀 Added {len(runs)} runs to dataset {dataset.name!r}")
print(f"🔗 View your dataset: {dataset.url}")


✅ Created dataset: 'retrieval-method-comparison-20250524_012506_runs_ds'
🚀 Added 7 runs to dataset 'retrieval-method-comparison-20250524_012506_runs_ds'
🔗 View your dataset: https://smith.langchain.com/o/2ad170d9-2e91-430d-9d70-cf6501e2184c/datasets/5c0a4bfd-9011-4f3d-98a8-3608d9fcac7c


## Upload Custom Dataset

In [27]:
# %% [markdown]
# ## 1️⃣ Build & Display Runs DataFrame

# %%
import pandas as pd
from IPython.display import display, Markdown
from datetime import timezone, datetime, timedelta

# Fetch all top-level chain runs for our project
runs = list(langsmith_client.list_runs(
    project_name=project_name,
    is_root=True,
    run_type="chain",
    start_time=datetime.now(timezone.utc) - timedelta(hours=1)  # last hour
))

# Build a record per run, pulling in every field "as is"
records = []
for run in runs:
    records.append({
        "run_id":          str(run.id),
        "name":             run.name,
        "method":           run.metadata.get("method"),
        "status":           run.status,
        "start_time":       run.start_time,
        "end_time":         run.end_time,
        "duration_ms":      ((run.end_time - run.start_time).total_seconds()*1000)
                              if run.start_time and run.end_time else None,
        # cost & tokens
        "prompt_tokens":    run.prompt_tokens,
        "completion_tokens":run.completion_tokens,
        "total_tokens":     run.total_tokens,
        "prompt_cost":      run.prompt_cost,
        "completion_cost":  run.completion_cost,
        "total_cost":       run.total_cost,
        # errors
        "error":            run.error,                                   
        "wrapper_error":    (run.outputs or {}).get("error"),
        # wrapper outputs
        "response":         (run.outputs or {}).get("response"),
        "context_docs":     (run.outputs or {}).get("context_docs"),
    })

df_runs = pd.DataFrame.from_records(records)

display(Markdown("## 📊 LangSmith Run Summaries"))
display(df_runs)


## 📊 LangSmith Run Summaries

| | run_id | name | method | status | start_time | end_time | duration_ms | prompt_tokens | completion_tokens | total_tokens | prompt_cost | completion_cost | total_cost | error | wrapper_error | response | context_docs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 669f8f77-9fd8-4e5a-a72b-8b59297570af | semantic_retrieval | semantic | success | 2025-05-24 08:27:58.224322 | 2025-05-24 08:28:00.744770 | 2520.448 | 3049 | 179 | 3228 | 0.0012196 | 0.0002864 | 0.001506 | | | Yes, people generally liked John Wick. The fir... | 10 |
| 1 | 1a50dc51-bfc0-4673-8f4d-bac7f6e7e7aa | ensemble_retrieval | ensemble | success | 2025-05-24 08:27:50.815286 | 2025-05-24 08:27:58.222934 | 7407.648 | 5977 | 283 | 6260 | 0.0023908 | 0.0004528 | 0.0028436 | | | Based on the provided context, people generall... | 18 |
| 2 | 3dd0525f-8275-44d1-b65f-0c4c469d1f11 | parent_document_retrieval | parent_document | success | 2025-05-24 08:27:48.117814 | 2025-05-24 08:27:50.814115 | 2696.301 | 763 | 125 | 888 | 0.0003052 | 0.0002 | 0.0005052 | | | Based on the provided context, people generall... | 3 |
| 3 | 8155bfd0-638a-442e-bb5c-aa84c495a736 | multi_query_retrieval | multi_query | success | 2025-05-24 08:27:38.911801 | 2025-05-24 08:27:48.115636 | 9203.835 | 5277 | 375 | 5652 | 0.0021108 | 0.0006 | 0.0027108 | | | Based on the provided reviews and ratings, peo... | 16 |
| 4 | 7df0a0b9-ba8f-48b5-b781-9112f217a4b3 | contextual_compression | compression | success | 2025-05-24 08:27:36.480180 | 2025-05-24 08:27:38.911420 | 2431.24 | 1530 | 84 | 1614 | 0.000612 | 0.0001344 | 0.0007464 | | | Yes, people generally liked John Wick. Reviews... | 3 |
| 5 | 3295ef13-8f04-432d-8bb4-d1fb453789e7 | bm25_retrieval | bm25 | success | 2025-05-24 08:27:34.308511 | 2025-05-24 08:27:36.479561 | 2171.05 | 1264 | 99 | 1363 | 0.0005056 | 0.0001584 | 0.000664 | | | People generally liked the first John Wick mov... | 4 |
| 6 | d8fab973-04b3-4f2f-91f2-25813f51d215 | naive_retrieval | naive | success | 2025-05-24 08:27:31.612182 | 2025-05-24 08:27:34.308149 | 2695.967 | 3660 | 98 | 3758 | 0.001464 | 0.0001568 | 0.0016208 | | | Yes, people generally liked "John Wick." The r... | 10 |
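
To read trade-offs off the run summaries above, it helps to sort by one metric at a time. The sketch below copies the duration, token, cost, and context-count columns from that output into plain tuples and ranks the methods. Keep in mind that these are single runs of a single question, so the ordering is indicative rather than statistically meaningful.

```python
# (method, duration_ms, total_tokens, total_cost, context_docs), copied from the run summaries
run_summaries = [
    ("semantic", 2520.448, 3228, 0.001506, 10),
    ("ensemble", 7407.648, 6260, 0.0028436, 18),
    ("parent_document", 2696.301, 888, 0.0005052, 3),
    ("multi_query", 9203.835, 5652, 0.0027108, 16),
    ("compression", 2431.24, 1614, 0.0007464, 3),
    ("bm25", 2171.05, 1363, 0.000664, 4),
    ("naive", 2695.967, 3758, 0.0016208, 10),
]

fastest = min(run_summaries, key=lambda row: row[1])
slowest = max(run_summaries, key=lambda row: row[1])
cheapest = min(run_summaries, key=lambda row: row[3])
priciest = max(run_summaries, key=lambda row: row[3])

# Print the methods from fastest to slowest.
for method, duration_ms, total_tokens, total_cost, context_docs in sorted(
    run_summaries, key=lambda row: row[1]
):
    print(
        f"{method:<16} {duration_ms:>8.1f} ms  {total_tokens:>5} tokens  "
        f"${total_cost:.6f}  {context_docs:>2} docs"
    )
```

In this run, the methods that call the LLM or several retrievers before answering (multi-query and ensemble) are the slowest and most expensive, while the methods that pass only 3 or 4 documents to the LLM are the cheapest.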


In [28]:
# %% [markdown]
# ## 2️⃣ Ingest Runs into a LangSmith Dataset

# %%
# 1) Create or load the dataset
dataset_name = f"{project_name}_runs_custom_ds"
try:
    dataset = langsmith_client.create_dataset(
        dataset_name=dataset_name,
        description=(
            "All root chain runs from the John Wick retrieval-method notebook, "
            "capturing inputs, outputs, tokens, costs, duration, and errors."
        )
    )
    print(f"✅ Created dataset {dataset.name!r}")
except Exception:
    dataset = langsmith_client.read_dataset(dataset_name=dataset_name)
    print(f"ℹ️  Using existing dataset {dataset.name!r}")

# 2) Bulk-ingest each run as an Example
for _, row in df_runs.iterrows():
    langsmith_client.create_example(
        dataset_id=dataset.id,
        inputs={
            "run_id": row["run_id"],
            "method": row["method"],
        },
        outputs={
            # include whichever outputs you care about:
            "response": row["response"],
            "context_docs": int(row["context_docs"]),
        },
        metadata={
            # include metrics & error info as metadata
            "status": row["status"],
            "duration_ms": float(row["duration_ms"]),
            "prompt_tokens": int(row["prompt_tokens"]),
            "completion_tokens": int(row["completion_tokens"]),
            "total_tokens": int(row["total_tokens"]),
            "prompt_cost": float(row["prompt_cost"]),
            "completion_cost": float(row["completion_cost"]),
            "total_cost": float(row["total_cost"]),
            # optionally include error
            **({"error": row["error"]} if pd.notna(row.get("error")) else {}),
            **({"wrapper_error": row["wrapper_error"]} if pd.notna(row.get("wrapper_error")) else {}),
        }
    )
print(f"✅ Added {len(df_runs)} runs to dataset {dataset.name!r}")
print("🔗 Dataset URL:", dataset.url)


✅ Created dataset 'retrieval-method-comparison-20250524_012506_runs_custom_ds'
✅ Added 7 runs to dataset 'retrieval-method-comparison-20250524_012506_runs_custom_ds'
🔗 Dataset URL: https://smith.langchain.com/o/2ad170d9-2e91-430d-9d70-cf6501e2184c/datasets/dc8f154b-0e67-4dc6-82d3-e7ed23616d3e


## Utilities

In [29]:
# display existing collections

existing = [c.name for c in cloud_client.get_collections().collections]

print(type(existing))
print(existing)

<class 'list'>
['johnwick_baseline', 'johnwick_semantic', 'johnwick_parent', 'airbnb_pdf_rec_1000_200_images', 'mcp-anthropic-desktop', 'mcp-nsclc', 'ambrose_lake_covenant']


In [30]:
# display vector store collection metadata

stores = {
    "baseline": baseline_vectorstore,
    "parent":  parent_vectorstore,
    "semantic": semantic_vectorstore,
}

for name, vs in stores.items():
    client = vs.client
    col    = vs.collection_name
    print(f"=== {name} ===")
    # 1) Existence check
    print("Exists?      ", client.collection_exists(col))
    # 2) Point count
    print("Point count: ", client.count(collection_name=col))
    # 3) Full collection info
    desc   = client.get_collection(collection_name=col)
    params = desc.config.params

    # Vector dims & metric
    vec_field = params.vectors
    if isinstance(vec_field, dict):
        # multi-vector mode: pick the first VectorParams
        vp = next(iter(vec_field.values()))
    else:
        # single-vector mode: vectors is itself a VectorParams
        vp = vec_field
    print("Dim / metric:", vp.size, "/", vp.distance)

    # Shard count & replication factor live on params
    print("Shards / repl:", params.shard_number, "/", params.replication_factor)

    print()


=== baseline ===
Exists?       True
Point count:  count=100
Dim / metric: 1536 / Cosine
Shards / repl: 1 / 1

=== parent ===
Exists?       True
Point count:  count=4817
Dim / metric: 1536 / Cosine
Shards / repl: 1 / 1

=== semantic ===
Exists?       True
Point count:  count=179
Dim / metric: 1536 / Cosine
Shards / repl: 1 / 1



In [31]:
# Assume `store` already has docs via ParentDocumentRetriever
#  (i.e. you already did retriever.add_documents(...) or similar)

# 1) List all stored keys (document IDs)
all_keys = list(store.yield_keys())
print(f"Total documents in store: {len(all_keys)}")
# print("Document IDs:", all_keys)

# 2) Fetch all Document objects
docs = store.mget(all_keys)

# 3) Examine metadata schema
#    Collect all metadata field names across docs
all_fields = set()
for doc in docs:
    all_fields.update(doc.metadata.keys())

print(f"Metadata fields present: {sorted(all_fields)}")

# 4) Show per-field value types and a sample value
field_types = {field: set() for field in all_fields}
for doc in docs:
    for field, val in doc.metadata.items():
        field_types[field].add(type(val).__name__)

print("Metadata field types:")
for field, types in field_types.items():
    sample = next((d.metadata[field] for d in docs if field in d.metadata), None)
    print(f" • {field}: types={sorted(types)}, sample={sample!r}")

# 5) (Optional) Print out first N docs' text lengths to gauge "dimensions"
for i, doc in enumerate(docs[:5], 1):
    text_len = len(doc.page_content)
    print(f"Doc {i} (ID={all_keys[i-1]}): {text_len} characters")


Total documents in store: 100
Metadata fields present: ['Author', 'Movie_Title', 'Rating', 'Review_Date', 'Review_Title', 'Review_Url', 'last_accessed_at', 'row', 'source']
Metadata field types:
 • Review_Date: types=['str'], sample='6 May 2015'
 • Rating: types=['int'], sample=8
 • Review_Url: types=['str'], sample='/review/rw3233896/?ref_=tt_urv'
 • Review_Title: types=['str'], sample=' Kinetic, concise, and stylish; John Wick kicks ass.\n'
 • Movie_Title: types=['str'], sample='John Wick 1'
 • last_accessed_at: types=['datetime'], sample=datetime.datetime(2025, 5, 21, 1, 25, 7, 682135)
 • row: types=['int'], sample=0
 • Author: types=['str'], sample='lnvicta'
 • source: types=['str'], sample='john_wick_1.csv'
Doc 1 (ID=1428cd20-5729-4101-a425-3e822aeff0e2): 599 characters
Doc 2 (ID=dbcd9e5e-f981-4f80-86ea-455ec001016e): 369 characters
Doc 3 (ID=5c2047df-757b-423b-8bfb-379fd2e23644): 256 characters
Doc 4 (ID=d0ad9d8e-301b-424e-b67c-9b03f96f326f): 426 characters
Doc 

### Delete qdrant collections

In [32]:
from qdrant_client import QdrantClient, models
from qdrant_client.http.models import Distance, VectorParams

# initialize client (cloud or on-prem)
cloud_client = QdrantClient(
    url=QDRANT_API_URL,
    api_key=QDRANT_API_KEY,
    prefer_grpc=True,
)

# create conditional deletion flag
delete_collection = False

if delete_collection:
    # list of collections to drop
    collections_to_reset = [
        "johnwick_baseline",
        "johnwick_parent",
        "johnwick_semantic",
    ]

    for col_name in collections_to_reset:
        # guard against missing collections
        if cloud_client.collection_exists(col_name):
            cloud_client.delete_collection(
                collection_name=col_name,
                timeout=60,  # seconds
            )
            print(f"Deleted collection: {col_name}")
        else:
            print(f"Collection not found (skipped): {col_name}")


### Response Object validation

In [33]:
from IPython.display import Markdown, display

# Map of titles to response objects
responses = {
    "Naive Retrieval Chain Response":              naive_retrieval_chain_response,
    "BM25 Retrieval Chain Response":               bm25_retrieval_chain_response,
    "Contextual Compression Chain Response":       contextual_compression_retrieval_chain_response,
    "Multi-Query Retrieval Chain Response":        multi_query_retrieval_chain_response,
    "Parent Document Retrieval Chain Response":    parent_document_retrieval_chain_response,
    "Ensemble Retrieval Chain Response":           ensemble_retrieval_chain_response,
    "Semantic Retrieval Chain Response":           semantic_retrieval_chain_response,
}

for header, resp in responses.items():
    display(Markdown(f"## {header}\n"))
    print("\n")
    print(resp)
    print("\n")


## Naive Retrieval Chain Response




Yes, people generally liked "John Wick." The reviews highlight the film's slick action sequences, Keanu Reeves' strong performance, stylish choreography, and unique take on the revenge thriller genre. Many reviewers praised it as one of the best action films in recent years, noting its intense and well-crafted fight scenes, engaging pace, and well-developed world. While a few found it somewhat generic, the majority regarded it as a highly entertaining and fresh action movie that set new standards for the genre.




## BM25 Retrieval Chain Response




People generally liked the first John Wick movie, as it received high praise for its stylish, kinetic action and unique criminal underworld, with ratings of 8 and 10 noted in the reviews. However, opinions seem mixed or negative for the later films, especially John Wick 3 and John Wick 4, which were criticized for lacking plot and being overly violent or weak compared to the original. Overall, the first film was well-liked, but the sequels received more divided or negative reception.




## Contextual Compression Chain Response




Yes, people generally liked John Wick. Reviews highlight Keanu Reeves' slick performance, brilliant and well-choreographed action sequences, and the film's stylish and intense depiction of a criminal underworld. It was praised as one of the best action films in recent years and recommended especially for action fans. However, some later reviews indicate that the magic felt strongest in the first film. Overall, the reception was very positive.




## Multi-Query Retrieval Chain Response




Based on the provided reviews and ratings, people generally liked the John Wick films, especially the first two installments and John Wick 4, though opinions vary somewhat across the series.

- The first John Wick movie received very positive feedback, with multiple reviews praising Keanu Reeves' performance, slick action sequences, stylish choreography, and a compelling revenge storyline centered around his dog's murder. Ratings like 9, 10, 8 are common, and reviewers called it "the coolest action film you'll see all year," "smoothest action film in a long time," and "something special." Some saw it as a must-watch for action fans.

- John Wick 2 was also well-received by many, with ratings around 8 and positive comments about its fast-paced story and high-quality action, though some felt it didn't surprise them as much as the first film.

- The third film received more mixed reviews, with some fans enjoying the choreography but others feeling the magic was fading, noting issues wit

## Parent Document Retrieval Chain Response




Based on the provided context, people generally liked the John Wick series. For example, one review describes the first John Wick movie as highly recommended, praising its action and emotional setup. Another review mentions that the series has remained remarkably consistent and well received, with the reviewer considering "John Wick: Chapter 4" the best in the series.

However, there are some negative opinions as well, such as a review of "John Wick 4" that calls it horrible and criticizes the plot and fight scenes.

Overall, while there are mixed opinions, the general sentiment from the reviews suggests that many people do like John Wick.




## Ensemble Retrieval Chain Response




Based on the provided context, people generally liked the first John Wick movie. Multiple reviews praise its action sequences, Keanu Reeves's performance, stylish direction, and unique world-building. For example:

- One reviewer gave it a 10/10, calling it "something special" with "smooth" action sequences and a cool criminal underworld.
- Another rated it 9/10 and described it as the best action film of the year and one of the best in the past decade, highlighting the brutal but fun nature and Keanu Reeves's charisma.
- Several others rated it 8 or higher, emphasizing its kinetic action, stylish choreography, and fresh take on revenge thrillers.
- Although there are some moderate or mixed opinions (ratings around 5 or 6) that found it generic or simple, the overall tone is very positive.

In contrast, some later installments in the series received more mixed or negative reactions, but the question specifically targets "John Wick" (generally understood as the first film), where the 

## Semantic Retrieval Chain Response




Yes, people generally liked John Wick. The first film received very positive reviews praising its stylish action sequences, Keanu Reeves' performance, and its unique take on the action genre. Ratings for the first John Wick movie include 9/10 and 8/10, with reviewers calling it "the coolest action film you'll see all year," "slick, violent fun," and "the best action film of the year." 

While the third installment had some mixed reviews, with one review giving it a 5/10 and mentioning that "the magic is gone," the overall franchise remains well-received. The fourth film was also praised highly, with ratings of 9/10 and comments on its consistency and quality even after multiple sequels.

In summary, the John Wick series has been generally well-liked, especially the first film, and has maintained a strong fanbase throughout its sequels.




### Retrieval Chain Visualizations

In [34]:
from IPython.display import Markdown, display

# Map of titles to chains
chains = {
    "Naive Retrieval":              naive_retrieval_chain,
    "BM25 Retrieval":               bm25_retrieval_chain,
    "Contextual Compression":       contextual_compression_retrieval_chain,
    "Multi-Query Retrieval":        multi_query_retrieval_chain,
    "Parent Document Retrieval":    parent_document_retrieval_chain,
    "Ensemble Retrieval":           ensemble_retrieval_chain,
    "Semantic Retrieval":           semantic_retrieval_chain,
}

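for title, chain in chains.items():
    display(Markdown(f"## {title}\n"))
    # Print the chain's repr to inspect how its retriever, prompt, and model are wired together
    print(chain)
    # Alternative: render the runnable graph as ASCII art (requires the grandalf package)
    # print(chain.get_graph().draw_ascii())
    print("\n")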


## Naive Retrieval


first={
  context: RunnableLambda(itemgetter('question'))
           | VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f350d1aacd0>, search_kwargs={'k': 10}),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the question.\n\nIf you do not know the answer, or are unsure, say you don't know.\n\nQuery:\n{question}\n\nContext:\n{context}\n"), additional_kwargs={})])
            | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7f354364fc

## BM25 Retrieval


first={
  context: RunnableLambda(itemgetter('question'))
           | BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x7f34c4b93890>),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the question.\n\nIf you do not know the answer, or are unsure, say you don't know.\n\nQuery:\n{question}\n\nContext:\n{context}\n"), additional_kwargs={})])
            | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7f354364fc10>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7f3543086

## Contextual Compression


first={
  context: RunnableLambda(itemgetter('question'))
           | ContextualCompressionRetriever(base_compressor=CohereRerank(client=<cohere.client_v2.ClientV2 object at 0x7f34c4b4c050>, top_n=3, model='rerank-english-v3.0', cohere_api_key=SecretStr('**********'), base_url=None, user_agent='langchain:partner'), base_retriever=VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f350d1aacd0>, search_kwargs={'k': 10})),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the que

## Multi-Query Retrieval


first={
  context: RunnableLambda(itemgetter('question'))
           | MultiQueryRetriever(retriever=VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f350d1aacd0>, search_kwargs={'k': 10}), llm_chain=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='You are an AI language model assistant. Your task is \n    to generate 3 different versions of the given user \n    question to retrieve relevant documents from a vector  database. \n    By generating multiple perspectives on the user question, \n    your goal is to help the user overcome some of the limitations \n    of distance-based similarity search. Provide these alternative \n    questions separated by newlines. Original question: {question}')
             | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7f354364fc10>, async_client=<openai.resources.chat.completion

## Parent Document Retrieval


first={
  context: RunnableLambda(itemgetter('question'))
           | ParentDocumentRetriever(vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f3504443e10>, docstore=<langchain_core.stores.InMemoryStore object at 0x7f350d231050>, search_kwargs={}, child_splitter=<langchain_text_splitters.character.RecursiveCharacterTextSplitter object at 0x7f3504443e50>),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the question.\n\nIf you do not know the answer, or are unsure, say you don't know.\n\nQuery:\n{question}\n\nContext:\n{context}\n"), additio

## Ensemble Retrieval


first={
  context: RunnableLambda(itemgetter('question'))
           | EnsembleRetriever(retrievers=[BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x7f34c4b93890>), VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f350d1aacd0>, search_kwargs={'k': 10}), ParentDocumentRetriever(vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f3504443e10>, docstore=<langchain_core.stores.InMemoryStore object at 0x7f350d231050>, search_kwargs={}, child_splitter=<langchain_text_splitters.character.RecursiveCharacterTextSplitter object at 0x7f3504443e50>), ContextualCompressionRetriever(base_compressor=CohereRerank(client=<cohere.client_v2.ClientV2 object at 0x7f34c4b4c050>, top_n=3, model='rerank-english-v3.0', cohere_api_key=SecretStr('**********'), base_url=None, user_agent='langchain:partner'), base_retriever=VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstor

## Semantic Retrieval


first={
  context: RunnableLambda(itemgetter('question'))
           | VectorStoreRetriever(tags=['QdrantVectorStore', 'OpenAIEmbeddings'], vectorstore=<langchain_qdrant.qdrant.QdrantVectorStore object at 0x7f34c4bc2f90>, search_kwargs={'k': 10}),
  question: RunnableLambda(itemgetter('question'))
} middle=[RunnableAssign(mapper={
  context: RunnableLambda(itemgetter('context'))
})] last={
  response: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are a helpful and kind assistant. Use the context provided below to answer the question.\n\nIf you do not know the answer, or are unsure, say you don't know.\n\nQuery:\n{question}\n\nContext:\n{context}\n"), additional_kwargs={})])
            | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7f354364fc
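
The chain representations printed above are long and embed memory addresses (`at 0x...`), which makes them hard to compare side by side. One possible way to tidy them is sketched below, using only the standard library. The helper name `tidy_repr` and its `max_chars` parameter are hypothetical and not part of LangChain; the sample string is copied from the BM25 output above.

```python
import re

# Matches the " at 0x..." suffix that Python adds to default object reprs.
_ADDRESS = re.compile(r" at 0x[0-9a-fA-F]+")


def tidy_repr(obj, max_chars=300):
    """Return str(obj) with memory addresses removed and the length capped.

    Hypothetical helper for display purposes only; it does not alter the chain.
    """
    text = _ADDRESS.sub("", str(obj))
    if len(text) > max_chars:
        # Mark the cut explicitly so a shortened repr is not mistaken for a complete one.
        text = text[:max_chars].rstrip() + " ..."
    return text


# Sample taken from the BM25 chain output above.
sample = "BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x7f34c4b93890>)"
print(tidy_repr(sample))
```

Assuming the `chains` dictionary from the cell above is in scope, calling `print(tidy_repr(chain))` in place of `print(chain)` inside the loop should give shorter output that stays stable across kernel restarts, since the addresses change on every run.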