
# Association ≠ Similarity: Learning Corpus-Specific Associations for Multi-Hop Retrieval

Jason Dury — Eridos AI, Perth, Australia

Dense retrieval ranks passages by embedding similarity to a query, but multi-hop questions require passages that are associatively related through shared reasoning chains rather than semantically similar. Association-Augmented Retrieval (AAR) trains a lightweight MLP (4.2M parameters) to learn passage-to-passage associations from co-occurrence annotations using CLIP-style contrastive learning, then reranks dense retrieval results via bi-directional association scoring. AAR is transductive by design: it learns associations over the target corpus, mirroring how RAG systems are deployed in practice. On HotpotQA, AAR improves Recall@5 by +8.6 points without evaluation-set tuning, with a +28.5 point gain on the hardest questions. On MuSiQue it achieves +10.1 points. An inductive variant shows no significant improvement, consistent with corpus-specific co-occurrence learning. The method trains in under two minutes on a single GPU, adds 3.7ms per query, and requires no LLM-based indexing.

Paper: arXiv (forthcoming) | PAM framework: Zenodo

## Key Results

| Setting          | Dataset  | R@5   | ΔR@5  | 95% CI       |
|------------------|----------|-------|-------|--------------|
| Dense baseline   | HotpotQA | 0.831 |       |              |
| AAR transductive | HotpotQA | 0.916 | +8.6  | [+8.1, +9.0] |
| AAR inductive    | HotpotQA | 0.832 | +0.1  | [-0.3, +0.5] |
| Dense baseline   | MuSiQue  | 0.387 |       |              |
| AAR transductive | MuSiQue  | 0.488 | +10.1 |              |

## Requirements

- Python 3.10+
- PyTorch 2.0+ (CUDA recommended)
- FAISS (`faiss-gpu` or `faiss-cpu`)
- sentence-transformers (for BGE-large-en-v1.5 embeddings)
- datasets (HuggingFace)

```shell
pip install torch faiss-gpu sentence-transformers datasets numpy
```

## Quick Start

### 1. Prepare data

```shell
# Download HotpotQA and MuSiQue, embed passages, build FAISS index
python -c "from src.utils import prepare_data; prepare_data()"
```

This downloads the datasets, extracts ~66K unique passages from HotpotQA, embeds them with BGE-large-en-v1.5, and builds a FAISS index. Takes ~15 minutes (mostly embedding).

### 2. Train (transductive)

```shell
python -m src.train
```

Trains on combined train+validation association pairs (~20,742 pairs). Completes in ~2 minutes on an RTX 4080 Super. Saves to models/association_mlp.pt.
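As a rough sketch of what the training step does: each batch of co-occurring passage pairs is scored all-against-all and optimized with a symmetric InfoNCE loss, as in CLIP. The architecture and hyperparameters here are illustrative (a 1024→2048→1024 MLP happens to have ≈4.2M parameters, matching the count quoted above, but the repo's `AssociationMLP` may differ):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Illustrative stand-in for the AssociationMLP (~4.2M parameters).
mlp = torch.nn.Sequential(
    torch.nn.Linear(1024, 2048), torch.nn.ReLU(), torch.nn.Linear(2048, 1024)
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-4)

def contrastive_step(p, q, temperature=0.07):
    """Symmetric InfoNCE over a batch of associated passage pairs (p_i, q_i)."""
    logits = F.normalize(mlp(p), dim=-1) @ F.normalize(q, dim=-1).T / temperature
    labels = torch.arange(p.size(0))             # positives sit on the diagonal
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()

# Random unit vectors stand in for BGE embeddings of co-occurring passage pairs.
p = F.normalize(torch.randn(32, 1024), dim=-1)
q = F.normalize(torch.randn(32, 1024), dim=-1)
loss = contrastive_step(p, q)  # for an untrained model, near ln(32) ≈ 3.47
```

Because the encoder is frozen and only this small head is trained, the two-minute single-GPU training time quoted above is plausible even for ~20K pairs.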

### 3. Evaluate

```shell
python -m src.evaluate --model models/association_mlp.pt --alpha 0.50
python -m src.evaluate --model models/association_mlp.pt --alpha-sweep
```
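For reference, the R@5 metric in the results table is, under a common definition for multi-hop retrieval (assumed here, not confirmed against the repo's evaluation code), the fraction of a question's gold supporting passages recovered in the top 5:

```python
def recall_at_k(retrieved_ids, gold_ids, k=5):
    """Fraction of gold passages present in the top-k retrieved list."""
    return len(set(retrieved_ids[:k]) & set(gold_ids)) / len(gold_ids)

# HotpotQA questions have two gold supporting passages; IDs are illustrative.
print(recall_at_k([3, 7, 1, 9, 4, 8], gold_ids=[7, 8]))  # 0.5
```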

### 4. Train inductive variant

```shell
python -m src.train_true_inductive
```

Trains on training-split association pairs only (~8,758 pairs) and evaluates on the validation set.

## Repository Structure

```text
AAR/
├── README.md
├── LICENSE
├── paper/
│   └── aar_paper_submission.md        # Full paper (markdown)
├── src/
│   ├── model.py                       # AssociationMLP architecture
│   ├── train.py                       # Main training script (transductive)
│   ├── train_true_inductive.py        # Inductive training script
│   ├── evaluate.py                    # Evaluation / retrieval pipeline
│   └── utils.py                       # Data loading, metrics, retrieval
├── results/
│   ├── retrieval_matched_hp.csv       # Main results (Table 2)
│   ├── true_inductive_evaluation.csv  # Inductive evaluation (Table 4)
│   ├── scoring_ablation.csv           # Scoring method ablation (Table 1)
│   ├── bm25_baseline.csv              # BM25 comparison (Table 7)
│   ├── qa_sanity_check.csv            # Downstream QA (Table 8)
│   ├── answer_coverage_matched_hp.csv # Answer coverage (Table 9)
│   ├── candidate_pool_sensitivity.csv # FAISS expansion depth (Table B1)
│   ├── latency_breakdown.csv          # Latency (Table 10)
│   ├── iteration_log.csv              # Development iteration log
│   └── bootstrap_ci.csv               # Bootstrap confidence intervals
├── data/
│   └── README.md                      # Data acquisition instructions
└── models/
    └── README.md                      # Model reproduction instructions
```

## Method Overview

1. **Candidate retrieval:** FAISS top-100 by cosine similarity.

2. **Association reranking:** for each candidate passage `p`, compute the blended score

   ```
   score(q, p) = (1 - alpha) * cos(q, p) + alpha * a(q, p)
   ```

   where `a(q, p) = 0.5 * [f(q) · p + f(p) · q]` is the bi-directional association score, `f` is the trained MLP, and `alpha` is the blend weight set via `--alpha`.

3. **Top-k selection:** return the top-k candidates by blended score.
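The reranking step above can be sketched in NumPy, with a random linear map standing in for the trained MLP `f` (all names and shapes here are illustrative, not the repo's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024                                         # BGE-large embedding dimension
W = rng.standard_normal((d, d)) / np.sqrt(d)     # stand-in for the trained MLP f

def f(x):
    return x @ W

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

q = normalize(rng.standard_normal(d))            # query embedding
P = normalize(rng.standard_normal((100, d)))     # top-100 FAISS candidates

def blended_score(q, P, alpha=0.5):
    """score(q, p) = (1 - alpha) * cos(q, p) + alpha * a(q, p)."""
    cos = P @ q                                  # unit vectors: dot == cosine
    assoc = 0.5 * (P @ f(q) + f(P) @ q)          # bi-directional a(q, p)
    return (1 - alpha) * cos + alpha * assoc

top5 = np.argsort(-blended_score(q, P))[:5]      # final top-k by blended score
```

With `alpha = 0` the score reduces to the dense baseline's cosine ranking, which is why the `--alpha-sweep` evaluation interpolates between pure similarity and pure association.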

## Citation

```bibtex
@misc{dury2026aar,
  author       = {Dury, Jason},
  title        = {Association $\neq$ Similarity: Learning Corpus-Specific
                  Associations for Multi-Hop Retrieval},
  year         = {2026},
  note         = {arXiv preprint (forthcoming)}
}

@misc{dury2026pam,
  author       = {Dury, Jason},
  title        = {Predictive Associative Memory: Unified Retrieval, Imagination,
                  and Creative Recombination Through Predictive Traversal of
                  Meaning Space},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18595537}
}
```

## License

MIT

## About

AAR paper submission. Learns corpus-specific passage associations via a contrastive MLP to rerank dense retrieval for multi-hop QA: +8.6 R@5 on HotpotQA (transductive), while the inductive variant shows no significant gain. Ablations show association ≠ similarity. Includes scoring ablation, BM25 baseline, downstream QA evaluation, bootstrap CIs, and full reproducibility details.
