Proposal: Add Retrieval Pipeline Abstractions (Microsoft.Extensions.DataRetrieval) #7507

@luisquintanilla

Description


Summary

This issue proposes adding retrieval pipeline abstractions to dotnet/extensions, enabling .NET developers to compose advanced Retrieval-Augmented Generation (RAG) retrieval pipelines using a consistent, pluggable model.

In a RAG application, retrieval is the step that finds relevant information to give to an LLM. These abstractions allow developers to add processing stages around vector search — transforming queries before searching and refining results after — without coupling to any specific vector store, LLM provider, or retrieval strategy.

These packages complement the existing Microsoft.Extensions.DataIngestion (for writing data into vector stores) by providing the symmetric read-side: pre-search query processing, vector search orchestration, and post-search result processing.

Motivation

What is a Retrieval Pipeline?

When a user asks a question in a RAG (Retrieval-Augmented Generation) application, the simplest approach is:

  1. Embed the question as a vector
  2. Find the closest document chunks in a vector store
  3. Pass those chunks to an LLM as context
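The three steps above amount to a single similarity search. A rough sketch in C#, with method names and signatures simplified for illustration (the exact Microsoft.Extensions.AI and Microsoft.Extensions.VectorData shapes may differ):

    // Naive RAG retrieval: embed → nearest-neighbor search → prompt context.
    // Assumes an embedding generator and a vector store collection are already
    // configured; SearchAsync usage is illustrative, not the exact MEVD signature.
    IEmbeddingGenerator<string, Embedding<float>> embedder = /* ... */;
    VectorStoreCollection<string, DocChunk> collection = /* ... */;

    var embeddings = await embedder.GenerateAsync(["How do I configure retention?"]);
    var queryVector = embeddings[0].Vector;

    await foreach (var match in collection.SearchAsync(queryVector, top: 5))
    {
        promptContext.AppendLine(match.Record.Content);   // feed top chunks to the LLM
    }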

This works for simple cases, but falls apart quickly:

  • Ambiguous queries — "How do I configure it?" matches poorly because embeddings don't know what "it" refers to
  • Vocabulary mismatch — The user says "set up auth" but the docs say "configure identity providers" — semantically similar but far apart in vector space
  • Noise in results — The top-5 nearest vectors often include irrelevant chunks that happen to share keywords
  • No quality signal — There's no way to know if the retrieved chunks actually answer the question before passing them to the LLM

A retrieval pipeline solves these by adding processing stages around the vector search:

[User Query] → [Query Processors] → [Vector Search] → [Result Processors] → [Final Results]
               ↑ expand, rewrite,    ↑ the actual      ↑ rerank, filter,
                 or augment the        database           validate relevance
                 query before          lookup
                 searching

Query processors (pre-search) transform the query to improve search quality. Result processors (post-search) refine what comes back. The pipeline orchestrates both around any vector store.

Why Abstractions?

Today, .NET developers implementing these patterns must:

  1. Write custom orchestration code for each retrieval strategy
  2. Tightly couple their logic to a specific vector store (Azure AI Search, Qdrant, Pinecone, etc.)
  3. Re-implement common patterns from research papers with no shared building blocks

The existing Microsoft.Extensions.DataIngestion packages solved the write-side (document → chunks → vector store). Microsoft.Extensions.DataRetrieval solves the read-side (query → search → process → results) with the same philosophy: thin abstractions that enable a rich ecosystem.

Symmetry with DataIngestion

Concern              | Write-Side                          | Read-Side
Package              | Microsoft.Extensions.DataIngestion  | Microsoft.Extensions.DataRetrieval
Pipeline             | IngestionPipeline<T>                | RetrievalPipeline
Pre-step processors  | IngestionDocumentProcessor          | RetrievalQueryProcessor
Post-step processors | IngestionChunkProcessor<T>          | RetrievalResultProcessor
Primary method       | ProcessAsync                        | ProcessAsync
Data flow            | Documents → Chunks → Vector Store   | Query → Search → Ranked Results

Developers familiar with one side immediately understand the other.

Proposed API Surface

Package: Microsoft.Extensions.DataRetrieval.Abstractions

namespace Microsoft.Extensions.DataRetrieval;

// Core data types
public sealed class RetrievalQuery
{
    public RetrievalQuery(string text);
    public string Text { get; }
    public IList<string> Variants { get; set; }
    public IDictionary<string, object?> Metadata { get; }
}

public sealed class RetrievalChunk
{
    public RetrievalChunk(string content, double score);
    public string Content { get; }
    public double Score { get; set; }
    public IDictionary<string, object?> Record { get; }
}

public sealed class RetrievalResults
{
    public IList<RetrievalChunk> Chunks { get; set; }
    public IDictionary<string, object?> Metadata { get; }
}

// Processor abstractions
public abstract class RetrievalQueryProcessor
{
    public abstract Task<RetrievalQuery> ProcessAsync(
        RetrievalQuery query, CancellationToken cancellationToken = default);
}

public abstract class RetrievalResultProcessor
{
    public abstract Task<RetrievalResults> ProcessAsync(
        RetrievalResults results, RetrievalQuery query,
        CancellationToken cancellationToken = default);
}

// Re-ranking interface
public interface IReranker
{
    Task<IReadOnlyList<RetrievalChunk>> RerankAsync(
        string query, IReadOnlyList<RetrievalChunk> chunks,
        CancellationToken cancellationToken = default);
}

// Retrieval interface for DI and testability
public interface IRetriever
{
    Task<RetrievalResults> RetrieveAsync(
        string query, int topK = 5,
        CancellationToken cancellationToken = default);
}
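To illustrate how the processor abstraction is meant to be extended, here is a hypothetical query processor that asks an IChatClient for paraphrases and stores them as Variants. LlmQueryExpander is an illustrative name, not part of the proposed surface:

    // Hypothetical processor: expands the query into paraphrased variants so the
    // search can match documents that use different vocabulary than the user.
    public sealed class LlmQueryExpander : RetrievalQueryProcessor
    {
        private readonly IChatClient _chatClient;

        public LlmQueryExpander(IChatClient chatClient) => _chatClient = chatClient;

        public override async Task<RetrievalQuery> ProcessAsync(
            RetrievalQuery query, CancellationToken cancellationToken = default)
        {
            var response = await _chatClient.GetResponseAsync(
                $"Rewrite this search query three different ways, one per line: {query.Text}",
                cancellationToken: cancellationToken);

            foreach (var variant in response.Text.Split('\n', StringSplitOptions.RemoveEmptyEntries))
            {
                query.Variants.Add(variant.Trim());
            }

            return query;   // same query object, now enriched with variants
        }
    }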

Package: Microsoft.Extensions.DataRetrieval

namespace Microsoft.Extensions.DataRetrieval;

public sealed class RetrievalPipeline : IDisposable
{
    public RetrievalPipeline(
        RetrievalPipelineOptions? options = null,
        ILoggerFactory? loggerFactory = null);

    public IList<RetrievalQueryProcessor> QueryProcessors { get; }
    public IList<RetrievalResultProcessor> ResultProcessors { get; }

    public Task<RetrievalResults> ProcessAsync<TKey, TRecord>(
        VectorStoreCollection<TKey, TRecord> collection,
        string query,
        int topK = 5,
        Func<TRecord, string>? contentSelector = null,
        CancellationToken cancellationToken = default)
        where TKey : notnull
        where TRecord : class;
}

public sealed class RetrievalPipelineOptions
{
    public string ActivitySourceName { get; set; }
}

public sealed class VectorStoreRetriever<TKey, TRecord> : IRetriever
    where TKey : notnull
    where TRecord : class
{
    public VectorStoreRetriever(
        RetrievalPipeline pipeline,
        VectorStoreCollection<TKey, TRecord> collection,
        Func<TRecord, string>? contentSelector = null);
}

// Extension method for discoverability
public static class RetrievalPipelineExtensions
{
    public static IRetriever AsRetriever<TKey, TRecord>(
        this RetrievalPipeline pipeline,
        VectorStoreCollection<TKey, TRecord> collection,
        Func<TRecord, string>? contentSelector = null)
        where TKey : notnull
        where TRecord : class;
}
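For symmetry, a minimal result processor can be sketched the same way. ScoreThresholdFilter is a hypothetical example built on the proposed types, not part of the proposal itself; note that score semantics (similarity vs. distance) vary by vector store provider:

    // Hypothetical post-search processor: drops chunks below a similarity threshold
    // so obviously irrelevant matches never reach the LLM. Assumes higher Score = better.
    public sealed class ScoreThresholdFilter : RetrievalResultProcessor
    {
        private readonly double _minScore;

        public ScoreThresholdFilter(double minScore) => _minScore = minScore;

        public override Task<RetrievalResults> ProcessAsync(
            RetrievalResults results, RetrievalQuery query,
            CancellationToken cancellationToken = default)
        {
            // Iterate backwards so RemoveAt does not shift unvisited indices.
            for (int i = results.Chunks.Count - 1; i >= 0; i--)
            {
                if (results.Chunks[i].Score < _minScore)
                {
                    results.Chunks.RemoveAt(i);
                }
            }

            return Task.FromResult(results);
        }
    }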

Design Principles

  1. Symmetry with DataIngestion. Query processors mirror chunk processors; RetrievalPipeline mirrors IngestionPipeline<T>. Developers familiar with one immediately understand the other.

  2. Composable pipelines. Zero processors = raw vector search. Add one processor = single enhancement. Stack many = advanced multi-stage retrieval. No dead weight.

  3. Vector store agnostic. ProcessAsync accepts any VectorStoreCollection<TKey, TRecord> (from Microsoft.Extensions.VectorData). Works with Azure AI Search, Qdrant, Pinecone, in-memory, or any MEVD provider.

  4. Observable. Built-in ActivitySource + ILogger support. Each processor invocation is traced with structured log entries.

Usage Example

var pipeline = new RetrievalPipeline(loggerFactory: loggerFactory);
pipeline.QueryProcessors.Add(new MultiQueryExpander(chatClient));
pipeline.ResultProcessors.Add(new LlmReranker(chatClient));

var results = await pipeline.ProcessAsync(
    collection,
    "What are the retention policies?",
    topK: 10,
    contentSelector: record => record.Content);

Relationship to Existing Packages

Package                             | Role
Microsoft.Extensions.AI             | LLM client abstractions (IChatClient, IEmbeddingGenerator)
Microsoft.Extensions.VectorData     | Vector store abstractions (VectorStoreCollection<TKey, TRecord>)
Microsoft.Extensions.DataIngestion  | Write-side pipeline (document → chunks → vector store)
Microsoft.Extensions.DataRetrieval  | Read-side pipeline (query → search → process → results)

Together these four packages provide a complete, composable RAG stack without coupling to any specific provider.

Implementation

A reference implementation exists on the feature/retrieval-abstractions branch of a fork of this repository, including:

  • Both packages with full XML documentation
  • OpenTelemetry tracing integration
  • Reciprocal Rank Fusion for multi-query deduplication
  • Tree-traversal search paradigm for hierarchical indices
  • README documentation for each package
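Reciprocal Rank Fusion merges the ranked lists produced by multiple query variants into one list, scoring each chunk by the sum of 1/(k + rank) over the lists it appears in, where k is conventionally 60. A simplified sketch against the proposed types (deduplicating by Content for illustration; the reference implementation may key on record identity instead):

    // Simplified Reciprocal Rank Fusion over the result lists of several query variants.
    static List<RetrievalChunk> FuseByRrf(
        IEnumerable<IReadOnlyList<RetrievalChunk>> rankedLists, int k = 60)
    {
        var fused = new Dictionary<string, (RetrievalChunk Chunk, double Score)>();

        foreach (var list in rankedLists)
        {
            for (int rank = 0; rank < list.Count; rank++)
            {
                var chunk = list[rank];
                double contribution = 1.0 / (k + rank + 1);   // ranks are 1-based in the RRF formula

                fused[chunk.Content] = fused.TryGetValue(chunk.Content, out var existing)
                    ? (existing.Chunk, existing.Score + contribution)
                    : (chunk, contribution);
            }
        }

        return fused.Values
            .OrderByDescending(e => e.Score)
            .Select(e => { e.Chunk.Score = e.Score; return e.Chunk; })
            .ToList();
    }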

Design Decisions (per Framework Design Guidelines audit)

Decision | Rationale
Abstract classes for processors (not interfaces) | Allows non-breaking addition of new virtual members in future versions. Follows the Framework Design Guidelines (FDG): "DO prefer classes over interfaces."
IReranker as interface (not abstract class) | Single-method, stateless contract. Types may implement both RetrievalResultProcessor AND IReranker (adapter pattern); single inheritance would prevent this if both were abstract classes.
Sealed data types (RetrievalQuery, RetrievalChunk, RetrievalResults) | Leaf DTOs not intended for inheritance. Prevents fragile-base-class problems.
Sealed RetrievalPipeline | Owns the ActivitySource lifetime; subclassing would create resource-management issues.
IList<T> for mutable collections | Pipeline processors need to add/remove/reorder entries. Exposing IList<T> (not List<T>) follows the FDG.
IDictionary<string, object?> for metadata | Established pattern (HttpContext.Items, Activity.Tags). Allows extensibility without breaking changes.
CancellationToken last, with = default | Standard .NET async method convention.
Abstractions package has zero non-polyfill dependencies | Consumers can reference the abstractions without pulling in heavy transitive dependencies.
IRetriever as interface in Abstractions | Data-source-agnostic retrieval contract. Enables DI, testability, and future implementations (web search, SQL, hybrid) without coupling to vector stores.
RetrievalPipeline does NOT implement IRetriever | The pipeline is a reusable processing engine that works with any collection per call. VectorStoreRetriever adapts pipeline + collection → IRetriever for single-endpoint DI scenarios.
ProcessAsync (pipeline) vs RetrieveAsync (IRetriever) | Establishes a clear vocabulary: pipelines process queries through stages; retrievers retrieve results. Symmetric with IngestionPipeline.ProcessAsync, so both pipelines use the same verb for their primary method.

Open Questions

  1. Should a fluent RetrievalPipelineBuilder be included in the core package, or ship separately?
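To make the question concrete, one possible shape for such a builder is sketched below. None of these names are proposed; this is purely illustrative:

    // Hypothetical fluent builder — illustrative only; not part of the proposed API.
    var pipeline = new RetrievalPipelineBuilder()
        .UseQueryProcessor(new MultiQueryExpander(chatClient))
        .UseResultProcessor(new LlmReranker(chatClient))
        .Build();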

Design Note: IRetriever and VectorStoreRetriever

RetrievalPipeline intentionally does NOT implement IRetriever. The pipeline is a processing engine — it defines what transformations occur (query expansion, reranking, validation) but requires a vector store collection per-call. This enables one pipeline to serve multiple collections:

// Same pipeline, different indices
var policyResults = await pipeline.ProcessAsync(policyCollection, query);
var faqResults = await pipeline.ProcessAsync(faqCollection, query);

VectorStoreRetriever<TKey, TRecord> bridges this gap. It captures a pipeline + collection + content selector, then exposes the simple IRetriever contract. This is the recommended DI registration pattern:

services.AddSingleton<IRetriever>(sp =>
    new VectorStoreRetriever<string, Article>(
        sp.GetRequiredService<RetrievalPipeline>(),
        sp.GetRequiredService<VectorStoreCollection<string, Article>>(),
        record => record.Content));

For discoverability, RetrievalPipeline also offers an AsRetriever() extension method:

// Direct usage — no DI needed
IRetriever retriever = pipeline.AsRetriever(collection, record => record.Content);
var results = await retriever.RetrieveAsync("What are the retention policies?");

The IRetriever abstraction is intentionally data-source-agnostic. Future implementations may include:

Implementation                      | Data Source
VectorStoreRetriever<TKey, TRecord> | Any MEVD vector store provider
WebSearchRetriever                  | Bing, Google, or other search APIs
DatabaseRetriever                   | SQL/NoSQL query-based retrieval
HybridRetriever                     | Combines multiple IRetriever instances
CachingRetriever                    | Wraps another IRetriever with response caching

This enables consumers to program against IRetriever regardless of backend, and swap implementations without changing calling code.
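As an example of that substitutability, a caching decorator needs nothing beyond the interface itself. CachingRetriever here is a hypothetical sketch of the idea from the table above, not the reference implementation:

    // Hypothetical decorator: serves repeated queries from an in-memory cache and
    // delegates misses to any inner IRetriever, regardless of backend.
    public sealed class CachingRetriever : IRetriever
    {
        private readonly IRetriever _inner;
        private readonly ConcurrentDictionary<(string Query, int TopK), RetrievalResults> _cache = new();

        public CachingRetriever(IRetriever inner) => _inner = inner;

        public async Task<RetrievalResults> RetrieveAsync(
            string query, int topK = 5, CancellationToken cancellationToken = default)
        {
            if (_cache.TryGetValue((query, topK), out var cached))
            {
                return cached;   // cache hit: skip the backend entirely
            }

            var results = await _inner.RetrieveAsync(query, topK, cancellationToken);
            _cache[(query, topK)] = results;
            return results;
        }
    }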
