RMC - Recommending Missed Citations Identified by Reviewers

This repository contains datasets and code for our COLING 2024 paper: "Recommending Missed Citations Identified by Reviewers: A New Task, Dataset and Baselines".

The CitationR Dataset

CitationR Statistics

CitationR is created by extracting the citations that reviewers recommend in official reviews of NeurIPS and ICLR submissions. In total, we collect 76,143 official reviews and 21,598 submissions, of which around 35% are identified as lacking citations. Moreover, to better replicate the actual situation in which researchers search for papers to cite, we build a larger and more challenging version of CitationR that includes an additional 40,810 papers published in top venues from which reviewers frequently recommend citations.
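For concreteness, the sketch below shows one way an instance of this task could be represented: a submission paired with the citations reviewers flagged as missing. The field names are illustrative assumptions, not the dataset's actual schema; inspect the downloaded files for the real format.

```python
# Hypothetical shape of one CitationR instance (all field names are
# assumptions for illustration; check the downloaded data for the
# actual schema).
example = {
    "submission": {
        "title": "A submission under review",
        "abstract": "The abstract of the submission ...",
    },
    "recommended_citations": [
        # Papers that reviewers identified as missing from the submission.
        {"title": "A recommended paper", "abstract": "Its abstract ..."},
    ],
}
```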

Download the base version from the following link: Baidu Drive (access code: q2vy)

Download the extended version from the following link: Baidu Drive (access code: gjwu)

Setup

Data Preparation

Unzip the downloaded archives and place them under data/, named base and extended respectively.
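The expected layout after unzipping is sketched below (the archive names on Baidu Drive may differ; only the target directories matter):

```
data/
├── base/        # base version of CitationR
└── extended/    # extended version with 40,810 additional candidate papers
```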

Environment Setup

conda create -n rmc python==3.7.12
conda activate rmc
pip install -r requirements.txt

Baselines

BM25

  • Evaluation
python ./src/bm25.py
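For intuition, the following is a minimal sketch of BM25 ranking over candidate papers using the rank_bm25 package; it illustrates the technique in general, not the contents of src/bm25.py, and the toy documents are made up.

```python
# Minimal BM25 ranking sketch (illustrative; not src/bm25.py).
# Requires: pip install rank-bm25
from rank_bm25 import BM25Okapi

# Toy candidate pool: each paper is represented by "title + abstract".
candidates = [
    {"title": "Paper A", "abstract": "neural citation recommendation with transformers"},
    {"title": "Paper B", "abstract": "classical information retrieval with inverted indexes"},
]
corpus = [(c["title"] + " " + c["abstract"]).lower().split() for c in candidates]
bm25 = BM25Okapi(corpus)

# Query: text from the submission that is missing a citation.
query = "citation recommendation for paper submissions".lower().split()
scores = bm25.get_scores(query)

# Rank candidates by descending BM25 score.
for i in sorted(range(len(candidates)), key=lambda i: -scores[i]):
    print(candidates[i]["title"], round(float(scores[i]), 3))
```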

Citeomatic

Follow the official instructions in the citeomatic repository.

HTransformer

Follow the official instructions in the Local-Citation-Recommendation repository.

Pre-trained LMs (BERT, SciBERT, SPECTER, CiteBERT, LinkBERT, SciNCL, ...)

  • Evaluation of LMs not fine-tuned on CitationR (see the embedding-and-ranking sketch after this list)
python ./src/evaluating/direct_plm_evaluating.py --dataset_name base --model_name scincl --way_ref concat
  • Evaluation of LMs fine-tuned on CitationR
python ./src/evaluating/evaluate.py --dataset_name base --config_name example --experiment_name example_evaluation
  • Train (see the fine-tuning sketch after this list)
python ./src/training/run_training.py --model scincl --dataset base --config_name example --experiment_suffix example_train
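As a reference point for the first bullet, here is a minimal embedding-and-ranking sketch in the spirit of the concat reference setting (title and abstract concatenated). It assumes the public SciNCL checkpoint malteos/scincl on Hugging Face and made-up paper fields; it is not the repository's evaluation script.

```python
# Illustrative embedding-based ranking with an off-the-shelf LM
# (not src/evaluating/direct_plm_evaluating.py).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("malteos/scincl")  # public SciNCL release
model = AutoModel.from_pretrained("malteos/scincl")
model.eval()

def embed(papers):
    # "concat" reference: title [SEP] abstract, represented by the [CLS] vector.
    texts = [p["title"] + tokenizer.sep_token + p["abstract"] for p in papers]
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        return model(**batch).last_hidden_state[:, 0]

submission = [{"title": "A submission under review", "abstract": "..."}]
candidates = [{"title": "Candidate paper 1", "abstract": "..."},
              {"title": "Candidate paper 2", "abstract": "..."}]

q = F.normalize(embed(submission), dim=-1)
c = F.normalize(embed(candidates), dim=-1)
scores = (q @ c.T).squeeze(0)                    # cosine similarities
print(scores.argsort(descending=True).tolist())  # candidate ranking
```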
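For the Train bullet, one common recipe for adapting such LMs to retrieval is contrastive fine-tuning with a triplet objective, sketched below. This is an assumption about the general technique, not a description of run_training.py; the actual loss, negative sampling, and hyperparameters may differ.

```python
# Rough triplet fine-tuning sketch (illustrative; not run_training.py).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("malteos/scincl")
model = AutoModel.from_pretrained("malteos/scincl")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.TripletMarginLoss(margin=1.0)

def encode(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    return model(**batch).last_hidden_state[:, 0]  # [CLS] embeddings

# One step: the submission is the anchor, a reviewer-recommended citation
# is the positive, and a random unrelated paper is the negative.
anchor   = encode(["submission title" + tokenizer.sep_token + "submission abstract"])
positive = encode(["recommended citation" + tokenizer.sep_token + "its abstract"])
negative = encode(["unrelated paper" + tokenizer.sep_token + "its abstract"])

loss = loss_fn(anchor, positive, negative)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```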

Our fine-tuned models are available: base (access code: nrbe) and extended (access code: n3pb).

Put the downloaded models under experiments/<dataset_name>/example_evaluation/model/, where <dataset_name> is base or extended.

Citation

If you find our work useful, please cite the paper as:

@inproceedings{long24coling,
  title = {Recommending Missed Citations Identified by Reviewers: A New Task, Dataset and Baselines},
  author = {Kehan Long and Shasha Li and Pancheng Wang and Chenlong Bao and Jintao Tang and Ting Wang},
  booktitle = {COLING},
  year = {2024}
}

Acknowledgement
