HermannKroll/SupervisedTextProcessing

A Library Perspective on Supervised Text Processing in Digital Libraries: An Investigation in the Biomedical Domain

This repository is part of our JCDL 2024 submission. A link to the publication will follow.

Please cite the following paper when working with our repository.

@inproceedings{kroll2021toolbox,
  author = {H. Kroll and P. Sackhoff and M. Thang and M. Ksouri and W.-T. Balke},
  booktitle = {2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL)},
  title = {A Library Perspective on Supervised Text Processing in Digital Libraries: An Investigation in the Biomedical Domain},
  year = {2024}
}

Documentation

The implementation is in the src folder. The following modules are provided:

  • Analysis - various analysis scripts
  • Data Generation - utilized datasets in our format
  • Prediction - processing pipelines for the traditional and BERT models
  • Training - implementation of the model training

For the evaluation the following scripts were used:

Project setup

  1. The project is implemented in Python. We used Conda to create a new Python environment:
conda create -n stpenv python=3.8
conda activate stpenv
  2. To reproduce the results of our evaluation, first install the required Python libraries.
pip install -r requirements.txt
  3. Set the Python path to our src root.
export PYTHONPATH="/home/USER/SupervisedTextProcessing/src/"
  4. We publish our self-curated dataset Pharmaceutical Technologies along with the code. To use it in the evaluation, it has to be decompressed first. The following script extracts the data into the correct location.
python src/narrarelex/init_benchmarks.py
  5. To use the relabeling module, first paste your API keys into the files HUGGINGFACE_TOKEN and OPENAI_TOKEN.
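
Before running the relabeling module, it can help to confirm that both token files exist and are not empty. The snippet below is a minimal sketch and not part of the repository: the helper names read_token and check_tokens are hypothetical, and it assumes the files HUGGINGFACE_TOKEN and OPENAI_TOKEN sit in the directory you pass in (the actual location expected by the code may differ).

```python
from pathlib import Path

# Token file names as given in the setup steps; their location is an assumption.
TOKEN_FILES = ("HUGGINGFACE_TOKEN", "OPENAI_TOKEN")


def read_token(path):
    """Return the stripped contents of a token file, or raise a helpful error."""
    token_file = Path(path)
    if not token_file.is_file():
        raise FileNotFoundError(f"Token file not found: {token_file}")
    token = token_file.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"Token file is empty: {token_file}")
    return token


def check_tokens(directory="."):
    """Map each expected token file name to True if it holds a non-empty key."""
    status = {}
    for name in TOKEN_FILES:
        try:
            read_token(Path(directory) / name)
            status[name] = True
        except (FileNotFoundError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    # Report which token files are ready without printing the keys themselves.
    for name, ok in check_tokens(".").items():
        print(f"{name}: {'ok' if ok else 'missing or empty'}")
```

Run it from the directory that holds the token files. It reports only whether each file is usable and never prints the keys.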

Benchmarks

Task 1: Relation Extraction

Please note that our Chemprot variants, called ChemprotE and ChemprotE, are created at runtime.

Task 2: Text Classification
