VecLink

Cross-embedding vector correspondence via iterative geometric embedding hashing. Given two sets of embeddings, produced by different models over partially overlapping subsets of the same data, VecLink identifies which vectors correspond to the same underlying entities, using only a small set of seed anchors.
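To make the problem setup concrete, here is a minimal sketch of how two partially overlapping item sets and a handful of seed anchors could be simulated. It is not VecLink's own sampling code: the function name `make_partial_overlap` is hypothetical, and it assumes the overlap ratio means the fraction of each (equal-size) set that is shared with the other.

```python
import random

def make_partial_overlap(n_items, overlap_ratio, n_seeds, seed=0):
    """Split item ids into two equal-size sets that share a fraction of items.

    Assumption (not confirmed by the README): overlap_ratio is the fraction
    of each set that also appears in the other set.
    """
    rng = random.Random(seed)
    ids = list(range(n_items))
    rng.shuffle(ids)

    # Two sets of size m sharing s items cover 2m - s ids, with s = ratio * m.
    m = int(n_items / (2 - overlap_ratio))
    s = round(overlap_ratio * m)

    shared = ids[:s]
    set_a = shared + ids[s:m]              # shared items + items only in A
    set_b = shared + ids[m:2 * m - s]      # shared items + items only in B
    rng.shuffle(set_b)                     # B's row order carries no hints

    # Seed anchors: a few shared items whose correspondence is known up front.
    anchors = rng.sample(shared, n_seeds)
    return set_a, set_b, shared, anchors

set_a, set_b, shared, anchors = make_partial_overlap(1000, 0.3, 15, seed=42)
print(len(set_a), len(set_b), len(shared), len(anchors))
```

The matching task is then to recover `shared` from the two embedding sets alone, given only `anchors` as known pairs.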

Installation

Requires Python 3.9–3.10.

# Install dependencies
uv sync

# PyTorch Geometric extensions (must be installed separately)
uv pip install torch-cluster torch-scatter torch-sparse torch-spline-conv \
  -f https://data.pyg.org/whl/torch-2.1.0+cu118.html

Quick Start

uv run veclink.py \
  --dataset scifact \
  --emb1 mistral \
  --emb2 openai \
  --overlap_ratio 0.3 \
  --n_seeds 15 \
  --seed 42 \
  --use_bernoulli_trials

Embeddings

Place embedding files in the embeddings/ directory as NumPy .npy files, named as:

corpus_embeddings_{model}_{dataset}.npy

For example: corpus_embeddings_mistral_scifact.npy, corpus_embeddings_openai_scifact.npy.
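As a quick sanity check before running VecLink, the naming convention above can be exercised with a short script. This is a sketch only: the helpers `embedding_path` and `load_embeddings` are hypothetical names and are not part of the VecLink code base; only the directory and file-name pattern come from this README.

```python
from pathlib import Path

import numpy as np

EMBEDDINGS_DIR = Path("embeddings")  # directory named in this README

def embedding_path(model, dataset, root=EMBEDDINGS_DIR):
    """Build the expected file path for one model/dataset pair."""
    return Path(root) / f"corpus_embeddings_{model}_{dataset}.npy"

def load_embeddings(model, dataset, root=EMBEDDINGS_DIR):
    """Load an embedding matrix, failing early with a clear message."""
    path = embedding_path(model, dataset, root)
    if not path.is_file():
        raise FileNotFoundError(f"Expected embedding file at {path}")
    return np.load(path)

if __name__ == "__main__":
    for model in ("mistral", "openai"):
        path = embedding_path(model, "scifact")
        status = "found" if path.is_file() else "missing"
        print(f"{path}: {status}")
```

The two models' matrices may have different embedding dimensions; the check above only confirms that the files are where the naming convention says they should be.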

Key Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--dataset` | `scifact` | Dataset name |
| `--emb1` / `--emb2` | `mistral` / `openai` | Embedding model names |
| `--overlap_ratio` | `0.3` | Fraction of data shared between the two sets |
| `--n_seeds` | `None` | Number of seed anchor pairs |
| `--use_bernoulli_trials` | `True` | Beta–Bernoulli posterior ensemble selection (paper default; pass `False` for the raw-vote baseline) |
| `--max_iter` | `100` | Maximum refinement iterations |
| `--seed` | `None` | Random seed for reproducibility |
| `--use_gpu` | `True` | Enable GPU acceleration |

Supported Datasets

BEIR benchmarks: scifact, scidocs, fiqa, nfcorpus, arguana.
