GitHub - Guzpenha/slice_based_learning: Code for the SCAI'20 paper "Slice-Aware Neural Ranking"

Slice-Aware Neural Ranking

This repo contains the source code for the SCAI'20 paper 'Slice-Aware Neural Ranking'. It includes forks of two base libraries: Huggingface transformers (for fine-tuning BERT) and snorkel (for the SRAMs). The ir_slices folder contains the source code for the slicing functions specific to the conversational tasks and the retrieval domain. The model is a simple adaptation of Slice-based learning (https://arxiv.org/pdf/1909.06349.pdf) to ranking models, using BERT as the backbone.
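To give an intuition for the slice-based learning idea, here is a minimal, stdlib-only sketch of one step: per-slice representations are combined with attention weights derived from slice-membership scores, and the combined representation is scored by a prediction head. This is a simplification and not the repository's implementation. The function names, the plain softmax over membership scores, and the linear scoring head are assumptions made for illustration.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def slice_aware_representation(slice_reprs: List[List[float]],
                               membership_scores: List[float]) -> List[float]:
    """Attention-weighted sum of per-slice representations.

    Hypothetical simplification: attention weights are a softmax over
    per-slice membership scores (e.g. logits of slice indicator heads).
    """
    weights = softmax(membership_scores)
    dim = len(slice_reprs[0])
    combined = [0.0] * dim
    for w, rep in zip(weights, slice_reprs):
        for i in range(dim):
            combined[i] += w * rep[i]
    return combined


def relevance_score(representation: List[float],
                    head_weights: List[float], bias: float = 0.0) -> float:
    """Hypothetical linear prediction head producing a relevance logit."""
    return sum(r * w for r, w in zip(representation, head_weights)) + bias


if __name__ == "__main__":
    # Two slices with 3-dim representations (in practice derived from BERT).
    reps = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    combined = slice_aware_representation(reps, [0.0, 0.0])
    print(combined)  # equal membership scores -> [0.5, 0.5, 0.0]
    print(relevance_score(combined, [1.0, 1.0, 1.0]))  # 1.0
```

When one slice's membership score dominates, its representation dominates the combined vector, which is how slice-specific capacity can influence the final ranking score.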

In the paper, we focus on finding slices of data for which neural ranking models might be ineffective. To do so, we use slicing functions (SFs): functions that return a boolean indicating whether an instance belongs to a given slice. Two example SFs are illustrated in slicing_functions.PNG in the repository root.
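A minimal sketch of what SFs for a ranking instance could look like, and how they can be used to compare a model's effectiveness per slice. The instance fields (`query`, `document`, `model_correct`), the SF names, and the thresholds are hypothetical; they are not taken from ir_slices/ir_slices/slice_functions.py.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RankingInstance:
    """Hypothetical query-document pair plus whether the model got it right."""
    query: str
    document: str
    model_correct: bool  # e.g. whether the relevant document was ranked first


def sf_long_query(x: RankingInstance) -> bool:
    """Hypothetical SF: queries with more than 10 whitespace tokens."""
    return len(x.query.split()) > 10


def sf_no_word_overlap(x: RankingInstance) -> bool:
    """Hypothetical SF: no lexical overlap between query and document."""
    return not (set(x.query.lower().split()) & set(x.document.lower().split()))


SLICING_FUNCTIONS: Dict[str, Callable[[RankingInstance], bool]] = {
    "long_query": sf_long_query,
    "no_word_overlap": sf_no_word_overlap,
}


def per_slice_accuracy(instances: List[RankingInstance]) -> Dict[str, float]:
    """Model accuracy on each non-empty slice, plus overall accuracy."""
    results: Dict[str, float] = {}
    for name, sf in SLICING_FUNCTIONS.items():
        members = [x for x in instances if sf(x)]
        if members:  # skip slices with no members
            results[name] = sum(x.model_correct for x in members) / len(members)
    if instances:
        results["overall"] = sum(x.model_correct for x in instances) / len(instances)
    return results
```

A slice whose accuracy is clearly below the overall accuracy is a candidate for the kind of slice where the ranking model is ineffective.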

The SFs are implemented in ir_slices/ir_slices/slice_functions.py. To run the experiments, first install the dependencies:

```bash
# Create a virtual env
python3 -m venv env
source env/bin/activate

# Install the requirements
pip install -r requirements.txt
cd snorkel/snorkel
pip install -e .
cd ../../transformers
pip install -e .
cd ../ir_slices
pip install -e .
```
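After installation, a quick check from inside the virtual env can confirm that the three editable packages are importable. This is a hypothetical helper; the import names are assumptions, and `ir_slices` in particular is inferred from the ir_slices/ir_slices/ package layout.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for the three editable installs above.
REQUIRED_PACKAGES = ("snorkel", "transformers", "ir_slices")


def missing_packages(names: Iterable[str] = REQUIRED_PACKAGES) -> List[str]:
    """Return the top-level package names not found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All editable packages are importable.")
```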

To train and evaluate the model, run:

```bash
python ./transformers/examples/run_glue.py \
     --model_type $MODEL \
     --model_name_or_path bert-base-uncased \
     --task_name $TASK \
     --do_train \
     --do_eval \
     --do_lower_case \
     --data_dir $DATA_DIR/$TASK \
     --max_seq_length $MAX_SEQ_LENGTH \
     --per_gpu_eval_batch_size=$BATCH_SIZE  \
     --per_gpu_train_batch_size=$BATCH_SIZE  \
     --learning_rate 2e-5 \
     --num_train_epochs $NUM_EPOCHS \
     --output_dir $DATA_DIR/${TASK}_output \
     --overwrite_output_dir \
     --seed $RANDOM_SEED \
     --save_steps 10000000 \
     --save_model \
     --evaluate_on 'test'
```

Here, MODEL can be one of 'bert', 'bert-slice-aware', or 'bert-slice-aware-random-slices'. The datasets must first be downloaded and placed in $DATA_DIR/$TASK. Download them from their respective repositories: MANtIS (https://guzpenha.github.io/MANtIS/), MSDialog (https://ciir.cs.umass.edu/downloads/msdialog/), ANTIQUE (https://ciir.cs.umass.edu/downloads/Antique/), and Quora Question Pairs (https://www.kaggle.com/c/quora-question-pairs).
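To train all three MODEL variants in one go, a small Python wrapper can build the same run_glue.py command for each. This is a hypothetical helper, not part of the repository: the flags are copied from the command above, while the task, data directory, sequence length, batch size, epochs, and seed are placeholders you must supply. Writing each variant to its own output directory is an assumption added here so that runs do not overwrite each other.

```python
import subprocess
import sys
from typing import List, Optional

MODELS = ["bert", "bert-slice-aware", "bert-slice-aware-random-slices"]


def build_command(model: str, task: str, data_dir: str, max_seq_length: int,
                  batch_size: int, num_epochs: int, seed: int,
                  output_dir: Optional[str] = None) -> List[str]:
    """Return the run_glue.py invocation from this README as an argument list."""
    if output_dir is None:
        output_dir = f"{data_dir}/{task}_output"  # same default as the README
    return [
        sys.executable, "./transformers/examples/run_glue.py",
        "--model_type", model,
        "--model_name_or_path", "bert-base-uncased",
        "--task_name", task,
        "--do_train", "--do_eval", "--do_lower_case",
        "--data_dir", f"{data_dir}/{task}",
        "--max_seq_length", str(max_seq_length),
        f"--per_gpu_eval_batch_size={batch_size}",
        f"--per_gpu_train_batch_size={batch_size}",
        "--learning_rate", "2e-5",
        "--num_train_epochs", str(num_epochs),
        "--output_dir", output_dir,
        "--overwrite_output_dir",
        "--seed", str(seed),
        "--save_steps", "10000000",
        "--save_model",
        "--evaluate_on", "test",
    ]


def run_all(task: str, data_dir: str, **kwargs) -> None:
    """Train and evaluate each MODEL variant sequentially, one output dir each."""
    for model in MODELS:
        out = f"{data_dir}/{task}_{model}_output"  # hypothetical per-model dir
        cmd = build_command(model, task, data_dir, output_dir=out, **kwargs)
        subprocess.run(cmd, check=True)  # stop on the first failing run
```

For example, `run_all("<TASK>", "<DATA_DIR>", max_seq_length=..., batch_size=..., num_epochs=..., seed=...)` would launch the three runs one after another from the repository root.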
