
ComFact

This is the source code for the paper ComFact: A Benchmark for Linking Contextual Commonsense Knowledge.

Getting Started

Start by creating a Python 3.6 virtual environment and installing the dependencies in requirements.txt.
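
For example (assuming the python3.6 executable is on your PATH; the environment name is arbitrary):

python3.6 -m venv comfact-env
source comfact-env/bin/activate
pip install -r requirements.txt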

ComFact Datasets

Our ComFact dataset can be downloaded from this link; place data/ under the repository root.

Pretrained GloVe embeddings can be downloaded from this link; place glove/ under the data/ directory and unzip glove.6B.zip inside it.
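
For example:

unzip data/glove/glove.6B.zip -d data/glove/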

Data portions:

  • Persona-Atomic data portion: persona/
  • Mutual-Atomic data portion: mutual/
  • Roc-Atomic data portion: roc/
  • Movie-Atomic data portion: movie/

Data Preprocessing

python data_preprocessing_main.py

Training and Evaluation

Prepare directories:

mkdir pred
mkdir runs

Training:

bash train_baseline.sh

Parameters (see the example after this list):

  • language model ${lm}: "deberta-large" | "deberta-base" | "roberta-large" | "roberta-base" | "bert-large" | "bert-base" | "distilbert-base" | "lstm"
  • data portion ${portion}: "persona" | "mutual" | "roc" | "movie" | "all" (training on the union of all four data portions)
  • context window ${window}: "nlg" (half window without future context) | "nlu" (full context window)
  • linking task ${task}: "fact_full" (direct setting) | "head" (head entity linking, sub-task in pipeline setting) | "fact_cut" (fact linking of relevant head entities, sub-task in pipeline setting)
  • evaluation set ${eval_set}: "val" (validation set) | "test" (testing set)
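
How these parameters are consumed depends on the script; check train_baseline.sh, which may hard-code them at the top. A minimal sketch, assuming the script reads them from the environment:

export lm=roberta-large portion=persona window=nlg task=fact_full eval_set=val
bash train_baseline.sh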

Evaluating the direct setting or the sub-tasks of the pipeline setting:

bash run_baseline.sh

Parameters are the same as in Training.

Fine-grained analysis of fact linking results (after evaluating with run_baseline.sh):

python evaluate_linking.py --model ${lm} --window ${window} --portion ${portion} --linking ${task}

Parameters are the same as in Training; ${task} should be "fact_full" | "fact_cut".
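
For example, to analyze a roberta-large fact linker trained on the persona portion in the direct setting:

python evaluate_linking.py --model roberta-large --window nlg --portion persona --linking fact_full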

Evaluating the full pipeline setting:

bash run_baseline_pipeline.sh

Parameters are the same as in Training.

Evaluating head entity linkers in fact linking:

bash run_baseline_head_linker.sh

Parameters are the same as in Training.

Cross evaluation:

bash cross_evaluation.sh

Parameters (see the example after this list):

  • source data portion ${source_portion} (provides the training set): "persona" | "mutual" | "roc" | "movie" | "all"
  • target data portion ${target_portion} (provides the validation or testing set): "persona" | "mutual" | "roc" | "movie" | "all"

Other parameters are the same as in Training.
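
As with train_baseline.sh, check cross_evaluation.sh for how its parameters are set; a sketch assuming they are read from the environment:

export source_portion=persona target_portion=roc
bash cross_evaluation.sh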

Plot a heatmap of the cross-evaluation results (fixed settings: lm roberta-large, window nlg, task fact_full):

python plot_cross_evaluation.py

Downstream Dialogue Response Generation (CEM)

Set up the NLG evaluation toolkit:

pip install git+https://github.com/Maluuba/nlg-eval.git@master
nlg-eval --setup

Download the CEM data from this link and place data/ under the CEM/ directory.

Original preprocessed CEM data: ED/dataset_preproc.p

We also include our preprocessed CEM data with ComFact-refined knowledge: ED/dataset_preproc_link.p
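
Both files appear to be Python pickles; a minimal sketch for inspecting one (the exact structure is defined by the CEM preprocessing code):

python -c "import pickle; d = pickle.load(open('CEM/data/ED/dataset_preproc_link.p', 'rb')); print(type(d))"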

Prepare directories:

mkdir CEM/saved
mkdir CEM/vectors

Copy glove.6B.zip from data/glove/ to the CEM/vectors/ directory.
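
For example:

cp data/glove/glove.6B.zip CEM/vectors/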

Knowledge Refinement (optional; produces dataset_preproc_link.p)

Train a fact linker for CEM knowledge refinement:

python preprocessing_rel_tail_link_x.py
bash train_baseline_rel_tail_link_x.sh

Extract the CEM data and preprocess it for knowledge refinement:

python cem_data_extract.py
python preprocess_cem_link.py

The extracted data will be placed in data/cem/rel_tail/nlg/test/${split}_data.json, where ${split} is "train" | "val" | "test".
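
A quick sanity check on one split (the JSON layout is defined by the preprocessing scripts above):

python -c "import json; d = json.load(open('data/cem/rel_tail/nlg/test/val_data.json')); print(type(d), len(d))"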

Refine the knowledge with the fact linker, i.e., label the relevance of the knowledge in the extracted CEM data:

bash run_baseline_cem_link_x.sh
python label_cem.py

Write the refined knowledge back into the CEM data format:

python cem_data_back.py

Dialogue Modeling

Switch to the CEM folder:

cd CEM

Train the CEM dialogue model:

python main.py --model cem --dataset ${dataset} --save_path ${save} --model_path ${save} --cuda

Parameters (see the example after this list):

  • data source ${dataset}: dataset_preproc.p (original CEM dataset) | dataset_preproc_link.p (CEM dataset with ComFact-refined knowledge)
  • ${save}: your directory for saving the model and results.
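
For example, training on the ComFact-refined data (the save directory name saved/cem_link is arbitrary):

python main.py --model cem --dataset dataset_preproc_link.p --save_path saved/cem_link --model_path saved/cem_link --cuda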

Test the CEM dialogue model:

python main.py --test --model cem --dataset ${dataset} --save_path ${save} --model_path ${save} --cuda

NLG Evaluation:

Move the generated results.txt from your result-saving directory (${save}) to the results/ directory and rename it to ${name}.txt, then run (a complete example follows the parameter list):

python src/scripts/evaluate.py --results ${name}

Parameters:

  • ${name}: name of the results file, e.g., CEM_link
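
A combined example, assuming the hypothetical save directory saved/cem_link from above:

mv saved/cem_link/results.txt results/CEM_link.txt
python src/scripts/evaluate.py --results CEM_link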

We include two sets of dialogue generation results under the results/ directory: CEM_ori.txt (from the original CEM) and CEM_link.txt (from CEM trained with ComFact-refined knowledge).
