[Feature]: On-Disk Index #325

### Problem / Motivation

Zvec's current indexes rely mainly on in-memory structures to achieve low-latency nearest neighbor search. While this is effective for moderate-sized datasets that fit entirely in RAM, an in-memory index becomes impractical as collections grow to large scale.

Moreover, many real-world use cases involve long-tail vectors that are accessed infrequently, so keeping all data in memory is wasteful. A disk-based index would enable cost-effective scaling by moving raw vector data to disk while maintaining acceptable query latency.

### Proposed Solution

An on-disk index will be introduced into Zvec with the following key components:

1. On-Disk Vector Storage:
   Raw vector data (in FP32 or FP16 format) will be stored persistently on disk. Only compressed representations (e.g., quantized centroids, graph links, or PQ codes) and metadata will be kept in memory. During search, the relevant raw vectors are fetched from disk only when needed for final distance re-ranking.

2. Support for Mainstream Similarity Metrics:
   The on-disk index will natively support common similarity functions, including:
   - Cosine similarity
   - Inner product (dot product)
   - Euclidean (L2) distance

   Final distances will be computed on the original (uncompressed) vectors retrieved from disk during the refinement stage.

3. FP32 and FP16 Data Type Support:
   Users can store vectors on disk in either 32-bit or 16-bit floating-point format. The system will handle type conversion and alignment transparently, enabling memory and I/O efficiency (especially with FP16) without sacrificing compatibility.
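To make the intended search flow concrete, here is a minimal Python sketch of the two-stage idea described above: compressed codes stay in memory for a coarse pass, and raw FP32/FP16 vectors are read from disk only to re-rank a few candidates. It is an illustration under stated assumptions, not the proposed C++ design. Every name in it (`write_vectors`, `read_vector`, `quantize`, `search`, `n_candidates`) is hypothetical and not part of any Zvec API, and simple 8-bit scalar quantization stands in for whatever compressed representation (PQ codes, graph links) the real index would use.

```python
import math
import mmap
import os
import struct
import tempfile

# All names below are hypothetical; none of them are Zvec APIs.
FORMATS = {"fp32": ("f", 4), "fp16": ("e", 2)}  # struct code, bytes per value


def write_vectors(path, vectors, dtype="fp32"):
    """Store raw vectors contiguously on disk as little-endian FP32 or FP16."""
    code, _ = FORMATS[dtype]
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{len(vec)}{code}", *vec))


def read_vector(buf, idx, dim, dtype="fp32"):
    """Fetch one raw vector by row index; values are converted to Python floats."""
    code, width = FORMATS[dtype]
    return struct.unpack_from(f"<{dim}{code}", buf, idx * dim * width)


def quantize(vec):
    """In-memory stand-in code: int8 per component, assuming values in [-1, 1]."""
    return struct.pack(f"{len(vec)}b",
                       *(round(max(-1.0, min(1.0, x)) * 127) for x in vec))


def dequantize(code):
    return [b / 127 for b in struct.unpack(f"{len(code)}b", code)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Each metric returns a score where smaller means more similar.
# Cosine assumes non-zero vectors.
METRICS = {
    "l2": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "ip": lambda a, b: -dot(a, b),
    "cosine": lambda a, b: 1.0 - dot(a, b) / math.sqrt(dot(a, a) * dot(b, b)),
}


def search(query, codes, path, dim, k, metric="l2", dtype="fp32", n_candidates=4):
    score = METRICS[metric]
    # Stage 1: coarse ranking using only the compressed codes held in memory.
    order = sorted(range(len(codes)),
                   key=lambda i: score(query, dequantize(codes[i])))
    candidates = order[:n_candidates]
    # Stage 2: read raw vectors from disk for the candidates only, then re-rank.
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        exact = [(score(query, read_vector(buf, i, dim, dtype)), i)
                 for i in candidates]
    return [i for _, i in sorted(exact)[:k]]


# Usage: four 2-d vectors stored as FP16, queried with cosine similarity.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]]
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, vectors, dtype="fp16")
codes = [quantize(v) for v in vectors]
top2 = search([0.5, 0.9], codes, path, dim=2, k=2,
              metric="cosine", dtype="fp16", n_candidates=3)
print(top2)
```

In this sketch the FP16 file is half the size of the FP32 one, and the number of disk reads per query is bounded by `n_candidates` rather than by the collection size. A real implementation would also need to address alignment, batched or asynchronous I/O, and caching, which are left out here.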

### Alternatives Considered

_No response_

### Affected Area

C++ Core (storage, indexing)
