This is the official implementation of the paper "Generalized Range Filtered Approximate Nearest Neighbor Search: Containment and Overlap".
There are four atomic range filter conditions between the object range $[l_i, r_i]$ and the query range $[l_q, r_q]$.
The four atomic conditions are defined as:
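- ① $l_i \leq l_q \leq r_i \leq r_q$ (the object starts at or before the query start and ends inside the query)
- ② $l_i \leq l_q \leq r_q \leq r_i$ (the object contains the query)
- ③ $l_q \leq l_i \leq r_q \leq r_i$ (the object starts inside the query and ends at or after the query end)
- ④ $l_q \leq l_i \leq r_i \leq r_q$ (the query contains the object)
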
Any range-range filter can be expressed as a combination (OR) of these atomic cases.
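
The atomic cases can be checked directly from the four endpoints. The following is a minimal sketch, not code from this repository: the function names are hypothetical, and the cases are numbered 1 to 4 in the order the conditions are listed above, which is an assumption. It also illustrates that, for well-formed ranges ($l \leq r$), the OR of all four cases is the same as the two ranges intersecting.

```python
def atomic_cases(l_i, r_i, l_q, r_q):
    """Return the set of atomic conditions (1-4) satisfied by the
    object range [l_i, r_i] and the query range [l_q, r_q].

    Numbering follows the order in which the conditions are listed
    (an assumption of this sketch). Because the conditions use <=,
    more than one case can hold when endpoints coincide.
    """
    cases = set()
    if l_i <= l_q <= r_i <= r_q:
        cases.add(1)
    if l_i <= l_q <= r_q <= r_i:
        cases.add(2)  # object contains query
    if l_q <= l_i <= r_q <= r_i:
        cases.add(3)
    if l_q <= l_i <= r_i <= r_q:
        cases.add(4)  # query contains object
    return cases


def matches(l_i, r_i, l_q, r_q, allowed):
    """True if at least one of the allowed atomic cases holds (OR combination)."""
    return bool(atomic_cases(l_i, r_i, l_q, r_q) & set(allowed))


def intersects(l_i, r_i, l_q, r_q):
    """Plain interval intersection test, used here only for comparison."""
    return l_i <= r_q and l_q <= r_i
```

For example, `matches(0, 5, 2, 3, {2})` asks whether the object range [0, 5] contains the query range [2, 3], while `matches(..., {1, 2, 3, 4})` gives the same answer as `intersects(...)`.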
- GCC 11+
- CMake 3.22+
cd MSTG/
bash build.sh

Or manual build:

cd MSTG/
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j

This will generate four executables in /build/test/:

- build_intersection: Builds the RRANN index supporting composite range-range filters of the form ① ∨ ② ∨ ③ ∨ ④
- search_intersection: Performs search on the RRANN index using the same composite range-range filter ① ∨ ② ∨ ③ ∨ ④
- build_range: Builds the RFANN index for arbitrary range filters
- search_range: Performs search on the RFANN index

You can generate range files for RRANN and RFANN using the following commands:
- Base range for RFANN:
python3 utils/gen_base_rfann.py <output_path> <num_points>
Example:
python3 utils/gen_base_rfann.py rfann.range 1000000
- Base range for RRANN:
python3 utils/gen_base_rrann.py <output_path> <num_points> <categories>
Example:
python3 utils/gen_base_rrann.py rrann.range 1000000 10000
- Query range generation:
./build/gen_query_range <base_range_path> <categories>

Arguments:

- <output_path>: Path to the output .range file.
- <num_points>: Number of intervals to generate.
- <categories>: The domain size (e.g., 10000 means range values are in [0, 9999]).
- <base_range_path>: Path to an existing base .range file used as input to generate queries.

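
To make the roles of these arguments concrete, here is a rough sketch of what producing a base range file could look like. It is not the code in utils/gen_base_rrann.py, whose sampling strategy is not described here. The function names, the uniform sampling, and the integer endpoints are all assumptions. It writes one "start end" pair per line with values in [0, categories - 1].

```python
import random


def random_ranges(num_points, categories, seed=0):
    """Sample num_points integer ranges with endpoints in [0, categories - 1].

    Uniform endpoint sampling is an assumption of this sketch; the
    repository's generator may use a different distribution.
    """
    rng = random.Random(seed)
    ranges = []
    for _ in range(num_points):
        a = rng.randrange(categories)
        b = rng.randrange(categories)
        ranges.append((min(a, b), max(a, b)))  # ensure start <= end
    return ranges


def write_ranges(path, ranges):
    """Write one 'range_start range_end' pair per line."""
    with open(path, "w") as f:
        for start, end in ranges:
            f.write(f"{start} {end}\n")
```

A call such as `write_ranges("rrann.range", random_ranges(1000000, 10000))` would mirror the shape of the example command above, under the assumptions stated.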
To build the RRANN index with composite range-range filters (① ∨ ② ∨ ③ ∨ ④):
./build_intersection \
--data_path path/to/data.fbin \
--data_range_path path/to/data.range \
--index_path path/to/output.index \
--M 32 \
--ef_construction 200 \
--threads 16

To build the RFANN index for arbitrary range filters:

./build_range \
--data_path path/to/data.fbin \
--data_range_path path/to/data.range \
--index_path path/to/output.index \
--M 16 \
--ef_construction 200 \
--threads 16

| Argument | Description |
|---|---|
| --data_path | Path to input vector file (.fbin format) |
| --data_range_path | Path to scalar attribute range file (.range) |
| --index_path | Path to save the output index |
| --M | Max number of neighbors per node |
| --ef_construction | Graph building ef parameter |
| --threads | Number of construction threads |

To search with the RRANN index (supports composite range-range filters ① ∨ ② ∨ ③ ∨ ④):
./search_intersection \
--data_path path/to/data.fbin \
--query_path path/to/query.fbin \
--query_range_path path/to/query_ranges \
--groundtruth_path path/to/groundtruth \
--index_file path/to/output.index \
--result_path path/to/results/result.csv \
--base_range_path path/to/base_ranges \
--M 32

To search with the RFANN index (supports arbitrary range filters):

./search_range \
--data_path path/to/data.fbin \
--query_path path/to/query.fbin \
--query_range_path path/to/query_ranges \
--groundtruth_path path/to/groundtruth \
--index_file path/to/output.index \
--result_path path/to/results/result.csv \
--base_range_path path/to/base_ranges \
--M 16

| Argument | Description |
|---|---|
| --data_path | Path to base data vectors (.fbin) |
| --query_path | Path to query vectors (.fbin) |
| --query_range_path | Path to the query range file |
| --groundtruth_path | Path to ground truth result file |
| --index_file | Path to the index file generated during index construction |
| --result_path | Path to save the final search result |
| --base_range_path | Path to base vector range file |
| --M | Number of neighbors per node; must match index build config |

Binary format for vectors:
[int32] points_num
[int32] dimension (D)
[float] D values of vector 1
[float] D values of vector 2
...
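
The layout above can be read and written with Python's standard `struct` module. This is a minimal sketch rather than the repository's own loader: the helper names are hypothetical, and little-endian byte order and 32-bit floats are assumptions, since the layout only says `int32` and `float`.

```python
import struct


def write_fbin(path, vectors):
    """Write vectors as: int32 count, int32 dimension, then D floats per vector.

    Assumes little-endian int32 / float32 (not confirmed by the format
    description) and that all vectors share the same dimension.
    """
    n = len(vectors)
    dim = len(vectors[0]) if n else 0
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", n, dim))
        for vec in vectors:
            if len(vec) != dim:
                raise ValueError("all vectors must have the same dimension")
            f.write(struct.pack(f"<{dim}f", *vec))


def read_fbin(path):
    """Read a file written in the layout above and return a list of float lists."""
    with open(path, "rb") as f:
        n, dim = struct.unpack("<ii", f.read(8))
        vectors = []
        for _ in range(n):
            raw = f.read(4 * dim)
            if len(raw) != 4 * dim:
                raise ValueError("file is shorter than its header claims")
            vectors.append(list(struct.unpack(f"<{dim}f", raw)))
    return vectors
```

Under these assumptions, a file holding N vectors of dimension D is exactly 8 + 4·N·D bytes long, which gives a quick sanity check on input files.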
Each line contains a scalar range for one vector:
[range_start] [range_end]
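
A range file in this format can be parsed line by line. The sketch below uses hypothetical helper names and assumes integer endpoints, in line with the [0, 9999] domain example. It loads the ranges and then selects, by brute force, the base vectors whose range intersects a query range, which can be handy for sanity-checking filtered search results on small inputs.

```python
def read_ranges(path):
    """Parse one 'range_start range_end' pair per line into (start, end) tuples.

    Integer endpoints are an assumption of this sketch. Blank lines are skipped.
    """
    ranges = []
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue
            if len(parts) != 2:
                raise ValueError(f"line {line_no}: expected two values")
            start, end = int(parts[0]), int(parts[1])
            if start > end:
                raise ValueError(f"line {line_no}: range_start > range_end")
            ranges.append((start, end))
    return ranges


def ids_intersecting(ranges, l_q, r_q):
    """Return indices of base ranges [l_i, r_i] that intersect the query [l_q, r_q]."""
    return [i for i, (l_i, r_i) in enumerate(ranges)
            if l_i <= r_q and l_q <= r_i]
```

The i-th returned index refers to the i-th line of the range file, which is assumed here to correspond to the i-th vector in the matching vector file.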
