# Build a LanceDB VDB Operator

This tutorial mirrors the VDB operator walkthrough but uses LanceDB instead of Milvus/OpenSearch. You will ingest NV-Ingest extraction results into a local LanceDB table and run vector search against it.

**Important:** NVIDIA makes no claim about accuracy, performance, or functionality of any vector database except Milvus. If you use a different vector database, it's your responsibility to test and maintain it.

## Prerequisites

- NV-Ingest microservices running (see `docker-compose.yaml` and the repository quickstart).
- NV-Ingest Python client installed.
- LanceDB and PyArrow installed in your environment.

If you are running this from the repo, ensure the services are up before continuing.
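A quick way to confirm the services are reachable is a plain TCP check against their ports. The `is_port_open` helper below is hypothetical and not part of the NV-Ingest client. Port 8012 comes from the embedding endpoint used in Step 2; port 7670 is assumed to be the NV-Ingest service port, so adjust both to match your `docker-compose.yaml`.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Assumed ports: 7670 (NV-Ingest service) and 8012 (embedding NIM).
for name, port in [("nv-ingest", 7670), ("embedding", 8012)]:
    status = "up" if is_port_open("localhost", port) else "DOWN"
    print(f"{name:<10} localhost:{port} -> {status}")
```

An open port only shows that something is listening; it does not prove the service behind it is healthy.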

```python
# Optional: install LanceDB dependencies if needed
# %pip install -qU lancedb pyarrow
```
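To check whether the dependencies are already importable before installing anything, you can use the standard library's `importlib.util.find_spec`. The `missing_packages` helper is an illustrative convenience, not something NV-Ingest provides.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["lancedb", "pyarrow", "nv_ingest_client"])
if missing:
    print("Missing:", ", ".join(missing), "- install them before continuing.")
else:
    print("All LanceDB tutorial dependencies are importable.")
```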

## VDB Interface Overview

NV-Ingest defines a lightweight `VDB` abstract class at `client/src/nv_ingest_client/util/vdb/adt_vdb.py`. The LanceDB operator in `client/src/nv_ingest_client/util/vdb/lancedb.py` implements this interface with methods for:

- `create_index(...)`: create the LanceDB table
- `write_to_index(...)`: build the vector index
- `run(...)`: orchestration entry point for ingestion

For retrieval, use the standalone `lancedb_retrieval` function, which provides vector search with optional reranking.
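The division of labor between these methods can be illustrated with a toy, in-memory operator. `MiniVDB` and `InMemoryVDB` are hypothetical stand-ins written for this sketch. They only mimic the method names listed above and do not reproduce the real signatures in `adt_vdb.py` or `lancedb.py`.

```python
from abc import ABC, abstractmethod


class MiniVDB(ABC):
    """Hypothetical, simplified stand-in for NV-Ingest's VDB interface."""

    @abstractmethod
    def create_index(self, records):
        """Create the storage (for LanceDB: the table) holding the records."""

    @abstractmethod
    def write_to_index(self, table):
        """Build a search index over the stored vectors."""

    def run(self, records):
        # Orchestration: create the table first, then build the index on it.
        table = self.create_index(records)
        self.write_to_index(table)
        return table


class InMemoryVDB(MiniVDB):
    def __init__(self):
        self.tables = {}
        self.indexed = False

    def create_index(self, records):
        # The "table" is just a list of rows stored under a fixed name.
        self.tables["nv-ingest"] = list(records)
        return self.tables["nv-ingest"]

    def write_to_index(self, table):
        # A real backend would build an ANN index here; this only records the step.
        self.indexed = len(table) > 0


toy_vdb = InMemoryVDB()
toy_rows = toy_vdb.run([{"text": "hello", "vector": [0.1, 0.2]}])
print(len(toy_rows), toy_vdb.indexed)
```

Keeping `run` as the single entry point lets the `Ingestor` treat every backend the same way, while each subclass decides how tables and indexes are actually built.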

## Step 1: Ingest NV-Ingest Results into LanceDB

We instantiate the `LanceDB` operator, point it at a local database path, and then use the NV-Ingest `Ingestor` to extract, embed, and upload results into LanceDB.

```python
from nv_ingest_client.client import Ingestor
from nv_ingest_client.util.vdb.lancedb import LanceDB

# Local LanceDB database directory and table name.
lancedb_uri = "./lancedb"
table_name = "nv-ingest"

# overwrite=True recreates the table on every run.
vdb = LanceDB(
    uri=lancedb_uri,
    table_name=table_name,
    overwrite=True,
)

# Extract text and tables, embed them, and upload the embeddings through the LanceDB operator.
ingestor = (
    Ingestor(message_client_hostname="localhost")
    .files("../data/multimodal_test.pdf")
    .extract(
        extract_text=True,
        extract_tables=True,
        extract_images=False,
    )
    .embed()
    .vdb_upload(vdb_op=vdb)
)

results = ingestor.ingest()
results
```

The LanceDB database is created at `./lancedb`, and the table is named `nv-ingest`. Because the operator is created with `overwrite=True`, rerunning the cell recreates the table.
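Conceptually, uploading flattens the nested results into one row per extracted element, each carrying its embedding. The `to_rows` helper below is a hypothetical sketch of that step, not the operator's actual code. It assumes each element is a dict whose `metadata` holds `content`, `embedding`, and `source_metadata.source_id`; inspect your own `results` to confirm the field names before relying on them. The `example` input is a made-up placeholder.

```python
def to_rows(results):
    """Flatten per-document lists of elements into LanceDB-style rows (assumed schema)."""
    rows = []
    for document in results:  # one list of elements per ingested file
        for element in document:
            meta = element.get("metadata", {})
            vector = meta.get("embedding")
            if not vector:  # skip elements that were not embedded
                continue
            rows.append(
                {
                    "vector": list(vector),
                    "text": meta.get("content", ""),
                    "source": meta.get("source_metadata", {}).get("source_id", ""),
                }
            )
    return rows


example = [[
    {"metadata": {"content": "example chunk", "embedding": [0.1, 0.3],
                  "source_metadata": {"source_id": "multimodal_test.pdf"}}},
    {"metadata": {"content": "element without a vector", "embedding": None}},
]]
print(to_rows(example))
```

Skipping elements without an embedding matters because a vector table cannot index rows that have no vector.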

## Step 2: Search the LanceDB Table

Use the `lancedb_retrieval` function to embed the queries and run vector search against the table.

```python
from nv_ingest_client.util.vdb.lancedb import lancedb_retrieval

queries = [
    "What is shown in the charts?",
    "Summarize the table contents.",
]

# Embed each query with the embedding service, then search the LanceDB table.
search_results = lancedb_retrieval(
    queries,
    table_path=lancedb_uri,
    table_name=table_name,
    embedding_endpoint="http://localhost:8012/v1",
    model_name="nvidia/llama-3.2-nv-embedqa-1b-v2",
    top_k=5,
)

# Show the first three hits returned for the first query.
search_results[0][:3]
```
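Under the hood, vector search returns the rows whose vectors lie closest to the embedded query. LanceDB answers this with an index, but the brute-force sketch below shows what "closest" means under L2 (Euclidean) distance, one common metric. The `top_k_l2` helper and the toy two-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_l2(query, rows, k=5):
    """Return up to k (row, distance) pairs nearest to `query`, closest first."""
    scored = [(l2(query, row["vector"]), row) for row in rows]
    # nsmallest compares only the key (distance), so the row dicts are never compared.
    best = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    return [(row, dist) for dist, row in best]


rows = [
    {"text": "chart caption", "vector": [0.0, 1.0]},
    {"text": "table summary", "vector": [1.0, 0.0]},
    {"text": "body text", "vector": [0.7, 0.7]},
]
for row, dist in top_k_l2([0.9, 0.1], rows, k=2):
    print(f"{dist:.3f}  {row['text']}")
```

Brute force scans every row, which is fine for a handful of vectors; an approximate index trades a little recall for much faster search on large tables.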