diff --git a/NOTICE.txt b/NOTICE.txt
new file mode 100644
index 0000000..30065ea
--- /dev/null
+++ b/NOTICE.txt
@@ -0,0 +1,9 @@
+-------------------------------------------------------------------------------
+Copyright 2019
+Ubiquitous Knowledge Processing (UKP) Lab
+Technische Universität Darmstadt
+
+-------------------------------------------------------------------------------
+Third party legal information
+
+
diff --git a/README.md b/README.md
index 6315184..767aa02 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,15 @@
-![2010-07-07_ukp_banner](https://user-images.githubusercontent.com/29311022/27184688-27629126-51e3-11e7-9a23-276628da2430.png)
+
+
+
-![aiphes_logo - small](https://user-images.githubusercontent.com/29311022/27278631-2e19f99e-54e2-11e7-919c-f89ae0c90648.png)
-![tud_weblogo](https://user-images.githubusercontent.com/29311022/27184769-65c6583a-51e3-11e7-90e0-12a4bdf292e2.png)
+# Multi-Sentence Textual Entailment for Claim Verification
+## This repository was constructed by team Athene for the [FEVER shared task 1](http://fever.ai/2018/task.html). The system ranked third in the overall results and first on the evidence recall sub-task.
+This repository builds upon the baseline system repository developed by the FEVER shared task organizers: https://github.com/sheffieldnlp/fever-naacl-2018
-## This repository was constructed by team Athene for the FEVER shared task 1 (http://fever.ai/2018/task.html)
-### The system reached the third rank in the overall results and first rank on the evidence recall sub-task
+This is an accompanying repository for our FEVER Workshop paper at EMNLP 2018.
+For more information see the paper: [UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification](https://arxiv.org/pdf/1809.01479.pdf)
-This repository builts upon the baseline system repository developed by the FEVER shared task organizers: (https://github.com/sheffieldnlp/fever-naacl-2018)
-
-For more information see our paper: [UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification](https://arxiv.org/pdf/1809.01479.pdf)
-* BibTeX:
+Please use the following citation:
 
 @article{hanselowski2018ukp,
   title={UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification},
@@ -20,11 +19,34 @@ For more information see our paper: [UKP-Athene: Multi-Sentence Textual Entailme
 }
 
+Disclaimer:
+> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
+
+### Requirements
+* Python 3.6
+* AllenNLP
+* TensorFlow
+### Installation
+* Download and install Anaconda (https://www.anaconda.com/)
+* Create a Python environment and activate it:
+```bash
+    conda create -n fever python=3.6
+    source activate fever
+```
+* Install the required dependencies:
+```bash
+    pip install -r requirements.txt
+```
+* Download the NLTK Punkt tokenizer:
+```bash
+    python -c "import nltk; nltk.download('punkt')"
+```
+* Proceed with downloading the data set, the embeddings, the models and the evidence data
 
-## Download Data
+### Download the FEVER data set
 Download the FEVER dataset from [the website of the FEVER shared task](https://sheffieldnlp.github.io/fever/data.html) into the data directory
 
     mkdir data
@@ -34,7 +56,10 @@ Download the FEVER dataset from [the website of the FEVER share task](https://sh
     wget -O data/fever-data/train.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/train.jsonl
     wget -O data/fever-data/dev.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/shared_task_dev.jsonl
     wget -O data/fever-data/test.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/shared_task_test.jsonl
-
+
+
+### Download the word embeddings
+
 Download pretrained GloVe Vectors
 
     wget http://nlp.stanford.edu/data/wordvecs/glove.6B.zip
@@ -47,20 +72,11 @@ Download pretrained Wiki FastText Vectors
 
     mkdir -p data/fasttext
     unzip wiki.en.zip -d data/fasttext
 
-## Create Python Environment
-    conda create -n fever python=3.6
-    source activate fever
-    pip install -r requirements.txt
-
-
-Download NLTK Punkt Tokenizer
-    python -c "import nltk; nltk.download('punkt')"
-
-## Data Preparation
-The data preparation consists of three steps: downloading the articles from Wikipedia, indexing these for the Evidence Retrieval and performing the negative sampling for training .
+### Download evidence data
+The data preparation consists of three steps: (1) downloading the articles from Wikipedia, (2) indexing them for evidence retrieval and (3) performing negative sampling for training.
 
-### 1. Download Wikipedia data:
+#### 1. Download Wikipedia data:
 
 Download the pre-processed Wikipedia articles and unzip them into the data folder.
 
@@ -68,19 +84,34 @@ Download the pre-processed Wikipedia articles and unzip it into the data folder.
     unzip wiki-pages.zip -d data
 
-### 2. Indexing
+#### 2. Indexing
 
 Construct an SQLite Database (go grab a coffee while this runs)
 
     PYTHONPATH=src python src/scripts/build_db.py data/wiki-pages data/fever/fever.db
+
+#### 3. Sampling negative evidence
+[coming soon]
 
-## Run the end to end pipeline of the submitted models
+### Download the UKP-Athene models
+[coming soon]
+
+### Run the end-to-end pipeline of the submitted models
 
     PYTHONPATH=src python src/scripts/athene/pipeline.py
 
-## Run the variation of the RTE model
+### Run the variation of the RTE model
 Another variation of the ESIM model is configured through the config file in the conf folder.
 To run the models:
-    PYTHONPATH=src python src/scripts/athene/pipeline.py --config conf/
\ No newline at end of file
+    PYTHONPATH=src python src/scripts/athene/pipeline.py --config conf/
+
+### Contacts:
+If you have any questions regarding the code, please don't hesitate to contact the authors or report an issue.
+ * \@ukp.informatik.tu-darmstadt.de
+ * https://www.informatik.tu-darmstadt.de/ukp/ukp_home/
+ * https://www.tu-darmstadt.de
+
+### License:
+ * Apache License Version 2.0
\ No newline at end of file
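A note on the indexing step above: `build_db.py` writes the pre-processed Wikipedia pages into an SQLite database at `data/fever/fever.db`, which the evidence retrieval later reads. The snippet below is a minimal sketch of how such a database can be queried as a sanity check; it assumes the DrQA-style schema used by the FEVER baseline (a `documents` table keyed by page title with a `text` column) — the table name, columns, and the sample row are assumptions for illustration, and an in-memory database stands in for the real `data/fever/fever.db`.

```python
import sqlite3

# Assumed DrQA-style schema from the FEVER baseline's build_db.py:
# one row per Wikipedia page, keyed by the page title.
conn = sqlite3.connect(":memory:")  # point this at data/fever/fever.db for the real index
conn.execute("CREATE TABLE documents (id PRIMARY KEY, text)")
conn.execute(
    "INSERT INTO documents VALUES (?, ?)",
    ("Telemundo", "Telemundo is an American Spanish-language television network."),
)

def get_doc_text(connection, doc_id):
    """Fetch the pre-processed article text for a page title, or None if absent."""
    row = connection.execute(
        "SELECT text FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None

print(get_doc_text(conn, "Telemundo"))
```

To check a real build, swap `":memory:"` for `data/fever/fever.db` and drop the `CREATE TABLE`/`INSERT` lines.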