Link to our follow-up work in SIGIR 2023: EXPLAIGNN.

CONVINSE: Conversational Question Answering on Heterogeneous Sources

Description

This repository contains the code and data for our SIGIR 2022 paper on conversational question answering on heterogeneous sources. In this work, we present a general pipeline for the task:

  1. Question Understanding (QU) -- creating an explicit representation of an incoming question and its conversational context;
  2. Evidence Retrieval and Scoring (ERS) -- harnessing this frame-like representation to uniformly capture the most relevant evidences from different sources;
  3. Heterogeneous Answering (HA) -- deriving the answer from these evidences.

We propose CONVINSE as an instantiation of this three-stage pipeline, and devise structured representations (SRs) that capture the user's information need explicitly during the QU phase. Experiments are conducted on the newly collected ConvMix dataset, which is released on the CONVINSE website (and downloaded automatically when initializing this repo).
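
To give an intuition (a hypothetical illustration only; the slot names below are illustrative, see the paper for the exact SR design), an SR for the follow-up question "who composed the soundtrack?" in a conversation about the TV series "Stranger Things" could look like this:

    # Hypothetical sketch of a structured representation (SR) produced during
    # Question Understanding; slot names are illustrative, not the exact ones
    # used in the code or paper.
    sr = {
        "context": "Stranger Things",   # conversational context
        "entity": "soundtrack",         # entity of the current question
        "relation": "composed by",      # relation asked about
        "answer_type": "person",        # expected answer type
    }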

If you use this code, please cite:

@inproceedings{christmann2022conversational,
  title={Conversational Question Answering on Heterogeneous Sources},
  author={Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
  booktitle={SIGIR},
  pages={144--154},
  year={2022}
}

Code

System requirements

All code was tested on Linux only.

  • Conda
  • PyTorch
  • GPU (recommended; we never tried training/inference without a GPU)

Installation

Clone the repo via:

    git clone https://github.com/PhilippChr/CONVINSE.git
    cd CONVINSE/
    CONVINSE_ROOT=$(pwd)
    conda create --name convinse python=3.8
    conda activate convinse
    pip install -e .

    # install PyTorch -- run exactly ONE of the following three variants:

    # (a) PyTorch without CUDA (CPU only)
    conda install pytorch torchvision torchaudio -c pytorch

    # (b) PyTorch for CUDA 10.2 (using GPU)
    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

    # (c) PyTorch for CUDA 11.3 (using GPU)
    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Install dependencies

CONVINSE makes use of CLOCQ for retrieving relevant evidences, and of FiD for answering. CLOCQ can be conveniently integrated via the publicly available API, using the client from the CLOCQ repo. If efficiency is a primary concern, it is recommended to run the CLOCQ code directly on the local machine (details are given in the CLOCQ repo).
In either case, install it via:

    git clone https://github.com/PhilippChr/CLOCQ.git
    cd CLOCQ/
    pip install -e .
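
Once installed, the CLOCQ client can be used roughly as follows (a minimal sketch; class name, host, and method follow our reading of the CLOCQ repo, so please verify against its documentation):

    from clocq.interface.CLOCQInterfaceClient import CLOCQInterfaceClient

    # Minimal sketch: query the public CLOCQ API (host/port are assumptions;
    # see the CLOCQ repo for the authoritative usage).
    clocq = CLOCQInterfaceClient(host="https://clocq.mpi-inf.mpg.de/api", port="443")
    search_space = clocq.get_search_space("who composed the soundtrack of Stranger Things?")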

FiD was built for PyTorch 1.6.0, and its native code is therefore not compatible with more recent versions of the Transformers library. We provide a wrapper that integrates FiD into the CONVINSE pipeline via subprocess. You can install it from the repo:

    cd $CONVINSE_ROOT 
    git clone https://github.com/PhilippChr/FiD.git convinse/heterogeneous_answering/fid_module/FiD
    cd convinse/heterogeneous_answering/fid_module/FiD/
    conda create --name fid python=3.6
    conda activate fid
    pip install -e .
    conda activate convinse
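
Conceptually, the wrapper runs FiD in its dedicated conda environment through a subprocess call, along these lines (a simplified sketch; the script name and flags are placeholders from the FiD repo, the actual wrapper lives in convinse/heterogeneous_answering/fid_module/):

    import subprocess

    # Simplified sketch: run FiD inside its own "fid" conda environment, so
    # that its pinned dependencies do not clash with the "convinse" environment.
    # NOTE: script name and flags are placeholders; check the actual wrapper.
    subprocess.run(
        ["conda", "run", "-n", "fid", "python", "test_reader.py",
         "--model_path", "<MODEL_PATH>"],
        cwd="convinse/heterogeneous_answering/fid_module/FiD",
        check=True,
    )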

Optional: If you want to use or compare with QuReTeC for question resolution, you can run the following:

    cd convinse/question_understanding/question_resolution/
    git clone https://github.com/nickvosk/sigir2020-query-resolution.git quretec

Initialize the repo (download data, benchmark, models):

    cd $CONVINSE_ROOT
    bash scripts/initialize.sh

Reproduce paper results

You can either reproduce all results of CONVINSE in Table 6 of the paper, or only the results for a specific source or source combination.

For reproducing all results, run:

    bash scripts/pipeline.sh --main-results

If you would like to reproduce only the results of CONVINSE for a specific source combination, run:

    bash scripts/pipeline.sh --gold-answers config/convmix/convinse.yml kb_text_table_info

The last parameter (kb_text_table_info) specifies the sources to be used.
Note that CONVINSE retrieves evidences on-the-fly by default. Given that the evidences in the information sources can change quickly (e.g. Wikipedia is updated many times a day), results can easily change. We implemented a cache to improve reproducibility, and we provide a benchmark-related subset of Wikipedia (see details below).
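
Conceptually, the cache stores retrieved evidences on disk, keyed by the question representation and the selected sources, so that repeated runs operate on identical evidences. A minimal sketch of the idea (class and file names are hypothetical; the actual implementation lives in the ERS module):

    import os
    import pickle

    # Hypothetical sketch of the evidence-cache concept used for reproducibility.
    class EvidenceCache:
        def __init__(self, path="_data/cache.pickle"):  # hypothetical path
            self.path = path
            self.cache = {}
            if os.path.exists(path):
                with open(path, "rb") as fp:
                    self.cache = pickle.load(fp)

        def get_or_retrieve(self, sr, sources, retrieve_fn):
            # Re-use cached evidences for a (question representation, sources)
            # pair; retrieve and persist them only on the first request.
            key = (sr, sources)
            if key not in self.cache:
                self.cache[key] = retrieve_fn(sr, sources)
                with open(self.path, "wb") as fp:
                    pickle.dump(self.cache, fp)
            return self.cache[key]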

The results will be logged in the out/convmix directory, and the metrics are written to _results/convmix/convinse.res.

Training the pipeline

To train a pipeline, just choose the config that represents the pipeline you would like to train, and run:

    bash scripts/pipeline.sh --train [<PATH_TO_CONFIG>] [<SOURCES_STR>]

Example:

    bash scripts/pipeline.sh --train config/convmix/convinse.yml kb_text_table_info

The HA stage of the pipeline will be trained with the set of information sources you provide as a parameter.

Testing the pipeline

If you create your own pipeline, it is recommended to test it once on an example, to verify that everything runs smoothly.
You can do that via:

    bash scripts/pipeline.sh --example [<PATH_TO_CONFIG>]

and check the output file in out/<benchmark> for potential errors.

For standard evaluation, you can simply run:

    bash scripts/pipeline.sh --gold-answers [<PATH_TO_CONFIG>] [<SOURCES_STR>]

Example:

    bash scripts/pipeline.sh --gold-answers config/convmix/convinse.yml kb_text_table_info

For evaluating with all source combinations, e.g. for comparing with CONVINSE, run:

    bash scripts/pipeline.sh --main-results [<PATH_TO_CONFIG>] [<SOURCES_STR>]

Example:

    bash scripts/pipeline.sh --main-results config/convmix/convinse.yml kb_text_table_info

If you want to evaluate using the predicted answers of previous turns, you can run:

    bash scripts/pipeline.sh --pred-answers [<PATH_TO_CONFIG>] [<SOURCES_STR>]

Example:

    bash scripts/pipeline.sh --pred-answers config/convmix/convinse.yml kb_text_table_info

By default, the CONVINSE config and all sources will be used.

The results will be logged to out/<DATA>/<CMD>-<FUNCTION>-<CONFIG_NAME>.out,
and the metrics are written to _results/<DATA>/<CONFIG_NAME>.res.

Using the pipeline

To use the pipeline, e.g. for improving individual parts of it, you can simply implement your own class that inherits from the respective pipeline module, create a corresponding config file, and add the module to the pipeline.py file. You can then use the commands outlined above to train and test the pipeline. Please see the documentation of the individual modules for further details.
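
For instance, a custom QU module could look roughly like this (a hypothetical sketch; the import path, base class, and method name should be checked against the code in convinse/question_understanding/):

    # Hypothetical sketch of a custom Question Understanding (QU) module;
    # verify the actual base-class interface in convinse/question_understanding/.
    from convinse.question_understanding.question_understanding import QuestionUnderstanding

    class MyQuestionUnderstanding(QuestionUnderstanding):
        def inference_on_turn(self, turn):
            # Derive an intent-explicit representation of the current question
            # from the conversational context (placeholder logic).
            turn["structured_representation"] = turn["question"]
            return turn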

ConvMix

Loading ConvMix

The ConvMix dataset can be downloaded (if not already done via the initialize script) via:

	bash scripts/download.sh convmix

Then, the individual ConvMix data splits can be loaded via:

	import json
	with open("_benchmarks/convmix/train_set_ALL.json", "r") as fp:
		train_data = json.load(fp)
	with open("_benchmarks/convmix/dev_set_ALL.json", "r") as fp:
		dev_data = json.load(fp)
	with open("_benchmarks/convmix/test_set_ALL.json", "r") as fp:
		test_data = json.load(fp)

You can also load domain-specific versions by replacing "ALL" with either "books", "movies", "music", "soccer", or "tvseries".
The data will have the following format:

[
	// first conversation
	{
		"conv_id": "<INT>",
		"domain": "<STRING>",
		"questions": [
			// question 1 (complete)
			{
				"turn": 0,
				"question_id": "<STRING: QUESTION-ID>",
				"question": "<STRING: QUESTION>",
				"answers": [
					{
						"id": "<STRING: Wikidata ID of answer>",
						"label": "<STRING: Item Label of answer>"
					},
				],
				"answer_text": "<STRING: textual form of answer>",
				"answer_src": "<STRING: source the worker found the answer in>",
				"entities": [
					{
						"id": "<STRING: Wikidata ID of question entity>",
						"label": "<STRING: Item Label of question entity>"
					},
				],
				"paraphrase": "<STRING: paraphrase of current question>"
			},
			// question 2 (incomplete)
			{
				"turn": 1,
				"question_id": "<STRING: QUESTION-ID>",
				"question": "<STRING: QUESTION>",
				"answers": [
					{
						"id": "<STRING: Wikidata ID of answer>",
						"label": "<STRING: Item Label of answer>"
					},
				],
				"answer_text": "<STRING: textual form of answer>",
				"answer_src": "<STRING: source the worker found the answer in>",
				"entities": [
					{
						"id": "<STRING: Wikidata ID of question entity>",
						"label": "<STRING: Item Label of question entity>"
					},
				],
				"paraphrase": "<STRING: paraphrase of current question>",
				"completed": "<STRING: completed version of current incomplete question>"
			}
		]
	},
	// second conversation
	{
		...
	},
	// ...
]
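
As a quick sanity check, you can iterate over the loaded conversations and print each question with its gold answer labels (a minimal sketch based on the schema above):

    # Minimal sketch: walk through the first two loaded ConvMix conversations
    # (assumes `train_data` was loaded as shown above).
    for conversation in train_data[:2]:
        print(f"Conversation {conversation['conv_id']} (domain: {conversation['domain']})")
        for question in conversation["questions"]:
            answer_labels = [answer["label"] for answer in question["answers"]]
            print(f"  Turn {question['turn']}: {question['question']}")
            print(f"    -> answers: {', '.join(answer_labels)}")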

Comparing on ConvMix

Please make sure that...

  • ...you use our dedicated Wikipedia dump, to have a comparable Wikipedia version (see further details below).
  • ...you use the same Wikidata dump (2022-01-31), which can be conveniently accessed using the CLOCQ API available at https://clocq.mpi-inf.mpg.de (see further details below).
  • ...you use the same evaluation method as CONVINSE (as defined in convinse/evaluation.py).

Data

Wikidata

For the CONVINSE paper, the Wikidata dump with timestamp 2022-01-31 was used. The most recent dump (which may be more recent than the one used in CONVINSE) is also accessible via the CLOCQ API. Further information on how to retrieve evidences from Wikidata can be found in the ERS documentation.

Wikipedia

Wikipedia evidences can be retrieved on-the-fly using the WikipediaRetriever package. However, we also provide a ConvMix-related subset, which can be downloaded via:

    bash scripts/download.sh wikipedia

Note that this data is also downloaded by the default initialize script.
The dump is provided as a .pickle file, and maps Wikidata item IDs (e.g. Q38111) to Wikipedia evidences.
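
Loading the dump and looking up evidences for an item then boils down to the following (a minimal sketch; the path of the downloaded pickle file is an assumption, adapt it to wherever the download script placed it):

    import pickle

    # Minimal sketch: load the ConvMix-related Wikipedia dump and look up
    # evidences for a Wikidata item. NOTE: the path is an assumption; adapt
    # it to the location used by the download script.
    with open("_data/convmix/wikipedia_dump.pickle", "rb") as fp:
        wikipedia_dump = pickle.load(fp)

    evidences = wikipedia_dump.get("Q38111", [])
    print(f"Retrieved {len(evidences)} evidences.")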

This ConvMix-related subset was created as follows: we added evidences retrieved from Wikipedia in March/April 2022 for the following Wikidata items:

  1. all answer entities in the ConvMix benchmark,
  2. all question entities in the ConvMix benchmark (as specified by crowdworkers),
  3. the top-20 disambiguations for each entity mention detected by CLOCQ, with the input strings being the intent-explicit forms generated for the ConvMix dataset by the CONVINSE pipeline, or the baseline built upon the 'Prepend all' QU method,
  4. and whenever new Wikidata entities occurred (e.g. in the dynamic setup running the pipeline with predicted answers), we added the corresponding evidences to the dump.

We aimed to maximize the number of entities (and thus of evidences) here, to allow for as fair a comparison as possible with dense retrieval methods. Crawling the whole Wikipedia dump was out of scope (and Wikimedia strongly discourages this). In total, we collected ~230,000 entities for which we tried retrieving Wikipedia evidences; note that for some of these, the result was empty.

Further information on how to retrieve evidences from Wikipedia can be found in the ERS documentation.

Feedback

We tried our best to document the code of this project, to make it accessible for easy usage, and to support testing your custom implementations of the individual pipeline components. However, our strengths are not in software engineering, and there will very likely be suboptimal parts in the documentation and code. If you feel that some part of the documentation/code could be improved, or have other feedback, please do not hesitate to let us know! You can e.g. contact us via mail: pchristm@mpi-inf.mpg.de. Any feedback (also positive ;) ) is much appreciated!

License

The CONVINSE project by Philipp Christmann, Rishiraj Saha Roy and Gerhard Weikum is licensed under the MIT license.
