UKPLab/aacl2022-TexPrax

TexPrax

Lorenz Stangier*, Ji-Ung Lee*, Yuxi Wang, Marvin Müller, Nicholas Frick, Joachim Metternich, and Iryna Gurevych

* Both authors contributed equally.

This repository contains code and data from our TexPrax demo paper published at AACL 2022.

Abstract: Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly being used at work. This has led to a lot of work-relevant information that is disseminated through those channels and needs to be post-processed manually by the employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate problems, causes, and solutions that occur in work-related chats. TexPrax uses a chatbot to directly engage the employees to provide lightweight annotations on their conversation and ease their documentation work. To comply with data privacy and security regulations, we use an end-to-end message encryption and give our users full control over their data which has various advantages over conventional annotation tools. We evaluate TexPrax in a user-study with German factory employees who ask their colleagues for solutions on problems that arise during their daily work. Overall, we collect 202 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain instances with code-switching, varying abbreviations for the same entity, and dialects which NLP systems should be able to handle.

Drop us a line or report an issue if something is broken (and shouldn't be) or if you have any questions.

For license information, please see the LICENSE and README files.

The code for the TexPrax project consists of three components:

  • recorder-bot
  • texpraxconnector
  • examples

The modification of the matrix-synapse server (synapserecording) was removed from the main branch when the code was ported to Python 3.10. It is still available in the python3.7 branch.

A detailed description and installation instructions can be found below.

A demo video of the project can be found here.

Disclaimer: This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Citing the paper

@inproceedings{stangier-etal-2022-texprax,
    title = "{T}ex{P}rax: A Messaging Application for Ethical, Real-time Data Collection and Annotation",
    author = {Stangier, Lorenz  and
      Lee, Ji-Ung  and
      Wang, Yuxi  and
      M{\"u}ller, Marvin  and
      Frick, Nicholas  and
      Metternich, Joachim  and
      Gurevych, Iryna},
    booktitle = "Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2022",
    address = "Taipei, Taiwan",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.aacl-demo.2",
    pages = "9--16",
}


Data

An anonymized version of the collected data, including annotations, can be downloaded from tudatalib or via huggingface-datasets (CC BY-NC).

Recorder Bot

The chatbot that keeps track of messages, provides label suggestions, and collects feedback via reactions.

Texprax Connector

Example code to exchange data with an external dashboard via HTTP requests. Please check the branch remote-storage to see an implementation that utilizes remote storage.
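As a rough illustration of the kind of HTTP exchange described here, the sketch below builds a JSON POST request with Python's standard library. The endpoint URL, the payload fields, and the function name build_dashboard_request are assumptions made for illustration; they are not taken from the texpraxconnector code.

```python
import json
import urllib.request


def build_dashboard_request(url: str, messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a dashboard endpoint."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_dashboard_request(
        "http://localhost:5000/api/messages",  # hypothetical endpoint
        # Hypothetical payload fields; the real connector may use a different schema.
        [{"sender": "@alice:texprax-demo", "text": "Maschine 3 steht", "label": "problem"}],
    )
    # urllib.request.urlopen(request) would send it; here we only inspect it.
    print(request.get_method(), request.full_url)
```

Separating request construction from sending makes it possible to inspect the payload before any data leaves the machine.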

How to set up TexPrax

Detailed instructions on how to set up the TexPrax messaging and recording system.

Setting up Synapse

Clone the repository:

git clone https://github.com/UKPLab/TexPrax.git

Set up your Python environment:

conda create --name=texprax-demo python=3.10
conda activate texprax-demo

Install the synapse server first:

pip install matrix-synapse

Now we need to create a config file via:

python -m synapse.app.homeserver -c homeserver.yaml --generate-config --server-name=<server-name> --report-stats=<yes|no>

This creates a homeserver.yaml file. You can now start the homeserver via:

synctl start

You can check if the installation is running by going to http://localhost:8008 in your browser. For further steps, we ask you to follow the instructions in the official synapse documentation.
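The same check can be scripted. The sketch below queries the /_matrix/client/versions endpoint of the Matrix client-server API, which a running homeserver answers with a JSON list of supported spec versions. The helper names are illustrative, and the URL assumes the default local setup from this guide.

```python
import json
import urllib.request


def versions_url(homeserver_url: str) -> str:
    """Return the client-server API versions endpoint for a homeserver."""
    return homeserver_url.rstrip("/") + "/_matrix/client/versions"


def parse_versions(raw: bytes) -> list[str]:
    """Extract the list of supported spec versions from the JSON response."""
    data = json.loads(raw.decode("utf-8"))
    return data.get("versions", []) if isinstance(data, dict) else []


def homeserver_is_up(homeserver_url: str, timeout: float = 5.0) -> bool:
    """Return True if the homeserver answers the versions request."""
    try:
        with urllib.request.urlopen(versions_url(homeserver_url), timeout=timeout) as resp:
            return len(parse_versions(resp.read())) > 0
    except (OSError, ValueError):
        # Connection errors, HTTP errors, and malformed JSON all count as "not up".
        return False


if __name__ == "__main__":
    print(homeserver_is_up("http://localhost:8008"))
```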

Registering a new user

  1. Go to your homeserver.yaml location.

  2. Add a new user via

    register_new_matrix_user -c homeserver.yaml http://localhost:8008
    

    Note: Make sure that you are in the correct Python environment, e.g., by running conda activate texprax-demo.

  3. Go to Element

  4. Go to Sign In and change the homeserver from matrix.org to http://localhost:8008

  5. Sign in with your credentials

Setting up the recorder bot

Note: You can set up the bot independently of your Synapse server, for instance, using a new environment:

conda create --name=texprax-bot python=3.10
conda activate texprax-bot

OLM is required for encryption. Install it via:

git clone https://gitlab.matrix.org/matrix-org/olm.git olm
cd olm
cmake . -Bbuild
cmake --build build

Now go to the recorder-bot folder:

cd recorder-bot

and install the requirements:

pip install -r requirements.txt

Make sure that you are in the correct Python environment, e.g., by running conda activate texprax-bot. If there are issues with python-olm, try this:

  pip install python-olm --extra-index-url https://gitlab.matrix.org/api/v4/projects/27/packages/pypi/simple

Next, create a config file containing the relevant paths and settings. You can use sample.config.yaml as your base file.

We also need to add a new account for the bot (follow the steps above to create a new account).

As an example, we will use the username bot with the password bot.

Setting bot credentials (config.yaml):

matrix:
    user_id: "@bot:texprax-demo"
    user_password: "bot"
    homeserver_url: "http://localhost:8008"

The default storage location of your messages will be ./store.

You will also have to supply a message_path (line 34 in config.yaml):

message_path: "./store/messages.json"

To use the models fine-tuned on German dialog data, download them from tudatalib and put them into a models folder:

mkdir models
cd models
wget -q --show-progress https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3534/sequence_classification_model.zip
wget -q --show-progress https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3534/token_classification_model.zip
unzip sequence_classification_model.zip
unzip token_classification_model.zip

Now add them to the config.yaml:

sequence_model_path: "models/sequence_classification_model"  
token_model_path: "models/token_classification_model"  

We further set the language of the bot to German by setting:

language_file_path: "language_files/DE.txt"
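Before starting the bot, it can help to confirm that the configured paths actually exist. The following is a minimal sketch, assuming the paths are relative to the recorder-bot folder and passing the settings as a plain dictionary; the function name missing_paths is hypothetical, and the sketch does not read config.yaml itself.

```python
from pathlib import Path


def missing_paths(settings: dict[str, str], base_dir: str = ".") -> list[str]:
    """Return the names of settings whose path does not exist under base_dir."""
    base = Path(base_dir)
    return [name for name, rel in settings.items() if not (base / rel).exists()]


if __name__ == "__main__":
    # Values copied from the config snippets in this guide; run from the
    # recorder-bot folder (an assumption about where the paths are resolved).
    settings = {
        "sequence_model_path": "models/sequence_classification_model",
        "token_model_path": "models/token_classification_model",
        "language_file_path": "language_files/DE.txt",
    }
    for name in missing_paths(settings):
        print(f"missing: {name}")
```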

Finally, run the bot via:

LD_LIBRARY_PATH=<path-to-olm>/olm/build/ python autorecorderbot_start

After the bot is running, you can add it like any user to your room. The bot's id in this example will be: @bot:texprax-demo
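A Matrix user ID has the form @localpart:server-name, so the ID above presumably corresponds to the account bot on a homeserver whose server name is texprax-demo. The sketch below splits an ID into these two parts, which can be handy for checking that the user_id in config.yaml matches the server name you chose; the function name split_user_id is illustrative.

```python
def split_user_id(user_id: str) -> tuple[str, str]:
    """Split a Matrix user ID of the form @localpart:server into its two parts."""
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError(f"not a Matrix user ID: {user_id!r}")
    # Split on the first colon only: the server name may itself contain a port.
    localpart, server = user_id[1:].split(":", 1)
    if not localpart or not server:
        raise ValueError(f"not a Matrix user ID: {user_id!r}")
    return localpart, server


if __name__ == "__main__":
    localpart, server = split_user_id("@bot:texprax-demo")
    print(localpart, server)
```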

Synapserecording (old version)

The modified Synapse instance to automatically invite the bot into newly created rooms.

Important: This requires some features that are only available in an older (deprecated) version that uses Python 3.7. Please switch to the python3.7 branch for this.
