Skip to content

FanEDG/MSTG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MSTG: Labeled Multi-Segment Tree Index For Range-Range Filtering ANN search

This is the official implementation of the paper Generalized Range Filtered Approximate Nearest Neighbor Search: Containment and Overlap

Range-Range Filters

There are four atomic range filter conditions between the object range $[l_i, r_i]$ and the query range $[l_q, r_q]$, as shown in the figure below:

Range Filter Types

The four atomic conditions are defined as:

  1. $l_i \leq l_q \leq r_i \leq r_q$
  2. $l_i \leq l_q \leq r_q \leq r_i$
  3. $l_q \leq l_i \leq r_q \leq r_i$
  4. $l_q \leq l_i \leq r_i \leq r_q$

Any range-range filter can be expressed as a combination (OR) of these atomic cases.

Build Instructions

Prerequisites

  • GCC 11+
  • CMake 3.22+

Build

cd MSTG/
bash build.sh

Or manual build:

cd MSTG/
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j

This will generate four executables in /build/test/:

  • build_intersection – Builds the RRANN index supporting composite range-range filters of the form ① ∨ ② ∨ ③ ∨ ④

  • search_intersection – Performs search on the RRANN index using the same composite range-range filter ① ∨ ② ∨ ③ ∨ ④

  • build_range – Builds the RFANN index for arbitrary range filters

  • search_range – Performs search on the RFANN index

Range Generation

You can generate range files for RRANN and RFANN using the following commands:

  • Base range for RFANN:

    python3 utils/gen_base_rfann.py <output_path> <num_points>

    Example:

    python3 utils/gen_base_rfann.py rfann.range 1000000
  • Base range for RRANN:

    python3 utils/gen_base_rrann.py <output_path> <num_points> <categories>

    Example:

    python3 utils/gen_base_rrann.py rrann.range 1000000 10000
  • Query range generation:

    ./build/gen_query_range <base_range_path> <categories>

Notes

  • <output_path>: Path to the output .range file.
  • <num_points>: Number of intervals to generate.
  • <categories>: The domain size (e.g., 10000 means range values are in [0, 9999]).
  • <base_range_path>: Path to an existing base .range file used as input to generate queries.

Index Construction

To build the RRANN index with composite range-range filters (① ∨ ② ∨ ③ ∨ ④):

./build_intersection \
  --data_path path/to/data.fbin \
  --data_range_path path/to/data.range \
  --index_path path/to/output.index \
  --M 32 \
  --ef_construction 200 \
  --threads 16

To build the RFANN index for arbitrary range filters:

./build_range \
  --data_path path/to/data.fbin \
  --data_range_path path/to/data.range \
  --index_path path/to/output.index \
  --M 16 \
  --ef_construction 200 \
  --threads 16

Parameters

Argument Description
--data_path Path to input vector file (.fbin format)
--data_range_path Path to scalar attribute range file (.range)
--index_path Path to save the output index
--M Max number of neighbors per node
--ef_construction Graph building ef parameter
--threads Number of construction threads

Search

To search with the RRANN index (supports composite range-range filters ① ∨ ② ∨ ③ ∨ ④):

./search_intersection \
  --data_path path/to/data.fbin \
  --query_path path/to/query.fbin \
  --query_range_path path/to/query_ranges \
  --groundtruth_path path/to/groundtruth \
  --index_file path/to/output.index \
  --result_path path/to/results/result.csv \
  --base_range_path path/to/base_ranges \
  --M 32

To search with the RFANN index (supports arbitrary range filters):

./search_range \
  --data_path path/to/data.fbin \
  --query_path path/to/query.fbin \
  --query_range_path path/to/query_ranges \
  --groundtruth_path path/to/groundtruth \
  --index_file path/to/output.index \
  --result_path path/to/results/result.csv \
  --base_range_path path/to/base_ranges \
  --M 16

Parameters

Argument Description
--data_path Path to base data vectors (.fbin)
--query_path Path to query vectors (.fbin)
--query_range_path Path to the query range file
--groundtruth_path Path to ground truth result file
--index_file Path to the index file generated during index construction
--result_path Path to save the final search result
--base_range_path Path to base vector range file
--M Number of neighbors per node; must match index build config

Data Format

.fbin (Float Binary Format)

Binary format for vectors:

[int32]   points_num
[int32]   dimension (D)
[float]   D values of vector 1
[float]   D values of vector 2
...

.range Format

Each line contains a scalar range for one vector:

[range_start] [range_end]

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages