EDoHa

EDoHa, short for Evidence Detection fOr Hypothesis vAlidation, is an annotation tool built on top of the INCEpTION platform. Its goal is to support researchers in doing qualitative research, such as analysing interviews or other forms of text. To do so, it offers two views in which a user can label sentences and link them to self-defined groups. Furthermore, EDoHa can learn from a user which kinds of sentences they are interested in and to which group a labeled sentence belongs.

User Interface

EDoHa uses two different views: one in which the user can label sentences, and a second in which they can group these sentences together.

In the Document view, the user can select a document and label sentences, which are then highlighted with a blue background colour. When provided with a pre-trained evidence detection model, EDoHa can suggest sentences that might be interesting to the user. These suggestions are highlighted with a green background colour, and the user can set a confidence threshold to reduce the number of suggestions. If the user wishes to train an evidence detection model on the data they have created, they can click the Train Model button.

.. image:: figures/screenshot-edoha.png

In the Evidence Linking view, the user can take the previously labeled sentences, or pieces of evidence, and link them to self-defined groups. A group can be created by typing its title into the title bar at the top. Pieces of evidence can then be linked to the group by drag-and-drop or by accepting a suggestion from the evidence linking model.

.. image:: figures/screenshot-hypothesis-validation.png

Document import

EDoHa offers two kinds of data import: plain text and UIMA XMI. The UIMA XMI import allows the text to be pre-processed by segmenting it into sentences and tokens.

The script src/main/resources/preprocess.groovy provides this capability via dkpro-script.

The resulting XMI documents can then be loaded into EDoHa. Be careful to select the XMI import and not the plain text import.
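
As a quick sanity check before importing, such an XMI file can be inspected with a few lines of Python. The sketch below is not part of EDoHa; it assumes the dkpro-cassis library and the DKPro Core segmentation types, and the file names are placeholders.

# Hedged sketch: verify that a pre-processed XMI file contains sentence and
# token annotations before importing it into EDoHa. Assumes dkpro-cassis and
# DKPro Core types; "TypeSystem.xml" and "document.xmi" are placeholder names.
from cassis import load_cas_from_xmi, load_typesystem

with open("TypeSystem.xml", "rb") as f:
    typesystem = load_typesystem(f)
with open("document.xmi", "rb") as f:
    cas = load_cas_from_xmi(f, typesystem=typesystem)

SENTENCE = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence"
TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"

print(len(list(cas.select(SENTENCE))), "sentences")
print(len(list(cas.select(TOKEN))), "tokens")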

Pre-trained evidence detection and evidence linking models

The Python code to train the evidence detection and evidence linking models can be found under src/main/python/. TFModelTrainer.py trains the evidence detection model and saves the vocabulary for the word embeddings. We defined the convention of using the names input, output, and target to designate the model's interface; the training operation is named train.
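
To make this naming convention concrete, here is a minimal sketch of such a graph. It is not the actual TFModelTrainer.py; it assumes TensorFlow 1.x and uses placeholder sizes, but it exposes the same names: input, target, output, and train.

# Minimal sketch of the evidence detection interface (not TFModelTrainer.py).
# Assumes TensorFlow 1.x; sentence length, vocabulary size, and embedding
# dimension are placeholder values.
import tensorflow as tf

SENTENCE_LEN, VOCAB_SIZE, EMBED_DIM = 50, 20000, 100

graph = tf.Graph()
with graph.as_default():
    inputs = tf.placeholder(tf.int32, [None, SENTENCE_LEN], name="input")
    targets = tf.placeholder(tf.float32, [None, 2], name="target")

    embeddings = tf.get_variable("embeddings", [VOCAB_SIZE, EMBED_DIM])
    pooled = tf.reduce_mean(tf.nn.embedding_lookup(embeddings, inputs), axis=1)

    logits = tf.layers.dense(pooled, 2)
    output = tf.nn.softmax(logits, name="output")

    loss = tf.losses.softmax_cross_entropy(targets, logits)
    train_op = tf.train.AdamOptimizer().minimize(loss, name="train")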

TFELModelTrainer.py trains the evidence linking model on data created by the ELDataCreator.py script. That script reads the evidence detection data and creates random links between evidential sentences and topics to obtain non-linked evidence-topic pairs. Using the evidence linking model requires the names topic_input and topic_length for the topic sentence and its length, and sentence_input and sentence_length for the candidate sentence. The target, output, and training operation names are identical to those of the evidence detection model.
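
Again as an illustration only, the sketch below shows a graph exposing that interface. It is not the actual TFELModelTrainer.py; it assumes TensorFlow 1.x, and the GRU encoder and all sizes are placeholders.

# Minimal sketch of the evidence linking interface (not TFELModelTrainer.py).
# Assumes TensorFlow 1.x; the encoder and all sizes are placeholders.
import tensorflow as tf

MAX_LEN, VOCAB_SIZE, EMBED_DIM, HIDDEN = 50, 20000, 100, 64

graph = tf.Graph()
with graph.as_default():
    topic_input = tf.placeholder(tf.int32, [None, MAX_LEN], name="topic_input")
    topic_length = tf.placeholder(tf.int32, [None], name="topic_length")
    sentence_input = tf.placeholder(tf.int32, [None, MAX_LEN], name="sentence_input")
    sentence_length = tf.placeholder(tf.int32, [None], name="sentence_length")
    targets = tf.placeholder(tf.float32, [None, 2], name="target")

    embeddings = tf.get_variable("embeddings", [VOCAB_SIZE, EMBED_DIM])

    def encode(tokens, lengths, scope):
        # encode a padded token sequence into a fixed-size vector
        with tf.variable_scope(scope):
            embedded = tf.nn.embedding_lookup(embeddings, tokens)
            cell = tf.nn.rnn_cell.GRUCell(HIDDEN)
            _, state = tf.nn.dynamic_rnn(cell, embedded,
                                         sequence_length=lengths, dtype=tf.float32)
            return state

    pair = tf.concat([encode(topic_input, topic_length, "topic"),
                      encode(sentence_input, sentence_length, "sentence")], axis=1)
    logits = tf.layers.dense(pair, 2)
    output = tf.nn.softmax(logits, name="output")

    loss = tf.losses.softmax_cross_entropy(targets, logits)
    train_op = tf.train.AdamOptimizer().minimize(loss, name="train")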

Training data for pre-trained models

The training data for the pre-trained models is available under src/main/data. It contains the original IBM Debater datasets, as well as the additional random links for the evidence linking task.

To import the models into EDoHa, both need to be placed in a single ZIP archive together with the vocabulary.txt file.
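
As an illustration, such an archive can be created with a few lines of Python. The model file names below are placeholders; use whatever files your training runs actually produced.

# Hedged sketch: bundle the trained models and the vocabulary into one ZIP
# archive for import into EDoHa. The model file names are placeholders.
import zipfile

with zipfile.ZipFile("edoha-models.zip", "w") as archive:
    archive.write("evidence-detection-model.pb")  # placeholder name
    archive.write("evidence-linking-model.pb")    # placeholder name
    archive.write("vocabulary.txt")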

Setup

EDoHa can be deployed as a WAR archive and requires a database backend. We tested it with Apache Tomcat 8.0+ and with MySQL 5.7.27 and 8.0.13.

Before starting, you need to set a few environment variables so that EDoHa knows how to connect to the database and where to store the trained machine learning models.

You can do this by adding the following lines to the start script of your container.

export EDOHA_DB_DIALECT="org.hibernate.dialect.MySQL8Dialect"  # Hibernate dialect matching your MySQL version
export EDOHA_DB_DRIVER="com.mysql.jdbc.Driver"                 # JDBC driver class
export EDOHA_DB_URL="jdbc:mysql://$HOST:3306/$DB"              # JDBC connection URL for the EDoHa database
export EDOHA_DB_USERNAME="$DB_USER"                            # database user
export EDOHA_DB_PASSWORD="$DB_PASSWORD"                        # database password
export EDOHA_ED_MODEL_PATH="$PATH_TO_YOUR_MODELS_FOLDER"       # directory for the trained machine learning models

The database $DB, the user $DB_USER, and the password $DB_PASSWORD have to be set up in advance.

During the first startup, EDoHa will create an admin user with the username/password combination admin/admin. The admin can then create users and projects, upload documents into projects, and add the settings for the machine learning components on the EDoHa Settings page. The EDoHa-specific settings are the pre-trained models, the sentence and title lengths the pre-trained models expect, and training parameters such as batch size and number of epochs.

Cite

@inproceedings{Stahlhut:CombattingDisinformationInteractive-2019,
  address = {{London, UK}},
  title = {Combatting {{Disinformation}} via {{Interactive Evidence Detection}}},
  booktitle = {Proceedings of the First {{Conference}} on {{Truth}} and {{Trust Online}}},
  author = {Stahlhut, Chris},
  month = oct,
  year = {2019}
}