# Tutorial on the `opr.inference.index` subpackage

This short tutorial shows how to use `opr.inference.index` with a tiny, synthetic example.

What you'll learn:
- The on-disk layout the index expects: `descriptors.npy`, `meta.parquet`, and `schema.json`. `meta.parquet` must include an `idx` column and a 7-element `pose` column; an optional `pointcloud_path` column holds relative paths such as `scans/000227.pcd` or `scans/000227.bin`.
- How to load a FAISS Flat index (`FaissFlatIndex.load(...)`).
- How to run a top-k search and map results to dataset indices, poses, and pointcloud paths.

Requirements:
- `faiss` (faiss-cpu or faiss-gpu)
- A Parquet engine for pandas: `pyarrow` (recommended) or `fastparquet`.


In [1]:
# Check environment
import importlib.util  # import the submodule explicitly; a bare `import importlib` may not expose `importlib.util`

missing = []
for pkg in ("faiss", "pandas", "numpy", "pyarrow"):
    if importlib.util.find_spec(pkg) is None:
        missing.append(pkg)

if missing:
    print("Missing packages:", ", ".join(missing))
    print("Please install them, e.g.: pip install faiss-cpu pyarrow pandas numpy")
else:
    print("Environment looks good.")


Environment looks good.


In [2]:
# Build a tiny on-disk example index
from pathlib import Path
import json
import numpy as np
import pandas as pd

base = Path("./_demo_index").resolve()
base.mkdir(parents=True, exist_ok=True)

N, D = 8, 4
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(N, D)).astype(np.float32)
poses = [[float(i), float(i+1), float(i+2), 0.0, 0.0, 0.0, 1.0] for i in range(N)]
# Optional pointcloud paths: mix .pcd/.bin and NaN
pc_paths = [
    "scans/%06d.pcd" % i if i % 3 == 0 else ("scans/%06d.bin" % i if i % 3 == 1 else np.nan)
    for i in range(N)
]
meta = pd.DataFrame({
    "idx": np.arange(100, 100+N, dtype=np.int64),
    "pose": poses,
    "pointcloud_path": pc_paths,
})

np.save(base / "descriptors.npy", descriptors)
meta.to_parquet(base / "meta.parquet")
schema = {"version": "1", "dim": D, "metric": "l2", "created_at": "", "opr_version": ""}
(base / "schema.json").write_text(json.dumps(schema))

print(f"Wrote: {[p.name for p in base.iterdir()]}")


Wrote: ['meta.parquet', 'descriptors.npy', 'schema.json']
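
Before loading, it can help to sanity-check a directory against the layout described at the top of this tutorial. The sketch below is not part of `opr`: `check_index_dir` is a hypothetical helper, and the rule that `schema.json`'s `dim` must equal the second axis of `descriptors.npy` is an assumption based on the example above. It uses only NumPy and the standard library, so it checks that `meta.parquet` exists but does not inspect its columns.

```python
import json
from pathlib import Path

import numpy as np

REQUIRED_FILES = ("descriptors.npy", "meta.parquet", "schema.json")


def check_index_dir(index_dir):
    """Return a list of problems found in an index directory (empty if none).

    Hypothetical helper for illustration; not part of opr.
    """
    index_dir = Path(index_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (index_dir / name).is_file()
    ]
    if problems:
        return problems

    schema = json.loads((index_dir / "schema.json").read_text())
    # mmap_mode="r" maps the array lazily instead of reading it all into RAM.
    descriptors = np.load(index_dir / "descriptors.npy", mmap_mode="r")

    if descriptors.ndim != 2:
        problems.append(f"descriptors must be 2-D, got shape {descriptors.shape}")
    elif descriptors.shape[1] != schema.get("dim"):
        # Assumption: schema["dim"] records the descriptor dimensionality.
        problems.append(
            f"schema dim {schema.get('dim')} != descriptor dim {descriptors.shape[1]}"
        )
    return problems


# An empty list means no problems were found.
print(check_index_dir("./_demo_index"))
```

Checking the `idx`, `pose`, and `pointcloud_path` columns would additionally require reading `meta.parquet` with pandas.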


In [3]:
# Load index and run a top-k search
from opr.inference.index import FaissFlatIndex

index = FaissFlatIndex.load(base)
print(f"Index size: {index.size()}, dim: {index.dim()} metric: {index.metric()}")

# Query: take first two descriptors with a small offset
queries = descriptors[:2] + 0.01
k = 3
inds, dists = index.search(queries, k)

print(f"inds shape: {inds.shape}, dists shape: {dists.shape}")
print(f"Top-k indices for first query: {inds[0].tolist()}")
print(f"Distances for first query: {dists[0].tolist()}")

# Map to dataset idx, poses, and pointcloud paths for first query's candidates
db_idx, db_pose, db_pc = index.get_meta(inds[0])
print(f"DB idx: {db_idx.tolist()}")
print(f"DB poses (first 2): {db_pose[:2]}")
print(f"DB pointcloud paths: {db_pc.tolist()}")


Index size: 8, dim: 4 metric: l2
inds shape: (2, 3), dists shape: (2, 3)
Top-k indices for first query: [0, 4, 1]
Distances for first query: [0.0003999998443759978, 1.4175851345062256, 1.8044313192367554]
DB idx: [100, 104, 101]
DB poses (first 2): [[0. 1. 2. 0. 0. 0. 1.]
 [4. 5. 6. 0. 0. 0. 1.]]
DB pointcloud paths: ['scans/000000.pcd', 'scans/000004.bin', 'scans/000001.bin']
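
The pointcloud paths are relative, and the column is optional, so some entries can be missing (the demo metadata stores NaN for every third row). A minimal sketch of turning them into usable paths follows. `resolve_pointcloud_paths` is a hypothetical helper, not an `opr` function, and it assumes the paths are relative to the index directory and that missing entries arrive as `None` or NaN; check both assumptions against your own data.

```python
import math
from pathlib import Path


def resolve_pointcloud_paths(index_dir, rel_paths):
    """Join relative pointcloud paths onto index_dir; missing entries become None.

    Hypothetical helper for illustration; not part of opr.
    """
    index_dir = Path(index_dir)
    resolved = []
    for rel in rel_paths:
        # Missing values may show up as None or as a float NaN,
        # depending on how the metadata was read.
        if rel is None or (isinstance(rel, float) and math.isnan(rel)):
            resolved.append(None)
        else:
            # Assumption: paths are stored relative to the index directory.
            resolved.append(index_dir / rel)
    return resolved


paths = resolve_pointcloud_paths(
    "/data/index", ["scans/000000.pcd", float("nan"), None]
)
for p in paths:
    print(p)
```

Returning `None` for missing entries keeps the result aligned with the top-k candidates, so position `j` still refers to candidate `j`.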


Notes:
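- FAISS returns raw distances; smaller is better. For the L2 metric these are squared Euclidean distances, which is why the first query's best match above is 4 × 0.01² ≈ 0.0004 rather than 0.02.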
- `get_meta` maps internal row positions (the values in `inds`) to your dataset `idx` values, poses, and pointcloud paths.
- For large datasets, `descriptors.npy` can be memory-mapped automatically to reduce RAM usage.
- FAISS index files are not persisted; the index is rebuilt from `descriptors.npy` each time it is loaded.
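
To make the distance values concrete, here is a brute-force NumPy sketch of what an exhaustive (flat) L2 search computes. It is a reference illustration only, not how FAISS or `opr` implements search, and `brute_force_l2` is a name introduced here. It rebuilds the same synthetic descriptors as the demo and returns squared L2 distances.

```python
import numpy as np


def brute_force_l2(database, queries, k):
    """Exhaustive top-k search returning (indices, squared L2 distances)."""
    # Pairwise differences, shape (num_queries, num_db, dim)
    diff = queries[:, None, :] - database[None, :, :]
    # Squared L2 distance for every (query, database) pair
    sq_dists = np.einsum("qnd,qnd->qn", diff, diff)
    # Sort each row by distance and keep the k closest entries
    order = np.argsort(sq_dists, axis=1)[:, :k]
    return order, np.take_along_axis(sq_dists, order, axis=1)


# Same synthetic setup as the demo: 8 descriptors of dimension 4
rng = np.random.default_rng(0)
database = rng.normal(size=(8, 4)).astype(np.float32)
queries = database[:2] + 0.01

inds, dists = brute_force_l2(database, queries, k=3)
print(inds[:, 0])   # nearest database row for each query
print(dists[:, 0])  # each close to 4 * 0.01**2 = 0.0004
```

Each query is a database row shifted by 0.01 in all four dimensions, so its nearest neighbour is that row at a squared distance of about 0.0004. Taking the square root would give the Euclidean distance of about 0.02.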
