HyPE enhances explainability by generating hierarchical category paths step-by-step before decoding docid.
It employs hierarchical category paths as explanation, progressing from broad to specific semantic categories.
You can explore HyPE’s explainable retrieval on the demo page.
The dataset directory contains two main datasets: NQ320K and MSMARCO, along with a backbone category hierarchy.
Each dataset directory includes path-augmented datasets that have been linked with linearized hierarchical category paths derived from the backbone category hierarchy.
In each dataset's directory, the backbone_file directory contains path-augmented datasets for:
- Original training set
- Query generation (QG) set
- Indexing training set (documents)
Final path-augmented datasets can be generated within this directory for various docid types, such as atomic_docid and keyword_docid.
This structure allows for flexible dataset creation and manipulation based on different docid types and hierarchical paths.
With HyPE/src/ours/execute_shell/baseline_shell.sh, the model can be trained with the following command:
cd HyPE/src/ours/execute_shell
bash baseline_shell.sh
We can modify the baseline_shell.sh file to change the dataset, model, and other hyperparameters.
transformers == 4.35.2
sentence-transformers==2.5.1
marisa-trie==1.2.0
torch==2.0.1
pytorch_lightning==2.1.0
