A Library Perspective on Supervised Text Processing in Digital Libraries: An Investigation in the Biomedical Domain
This repository is part of our JCDL 2024 submission. The link to the publication will follow.
Please cite the following paper when working with our repository.
@inproceedings{kroll2021toolbox,
author = {H. Kroll and P. Sackhoff and M. Thang and M. Ksouri and W.-T. Balke},
booktitle = {2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL)},
title = {A Library Perspective on Supervised Text Processing in Digital Libraries: An Investigation in the Biomedical Domain},
year = {2024}
}
The implementation is in the src folder. The following modules are provided:
- Analysis - various analysis scripts
- Data Generation - utilized datasets in our format
- Prediction - processing-pipelines for the traditional and BERT models
- Training - implementation of the model training
For the evaluation, the following scripts were used:
- evaluate_bert.py - pipeline for the BERT language models
- evaluate_traditional.py - pipeline for the traditional classification models
- evaluate_hs.py - script to evaluate the hyperparameter search
- evaluate_text_classification_noise.py - script to evaluate the impact of noise on the TC task
- run_config.py - contains the configuration parameters required by each evaluation script
- config.py - contains the directory organization
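As an illustration of how the configuration and evaluation scripts relate, the following hypothetical Python sketch shows a config-driven pipeline dispatch. All names here (`RUN_CONFIG`, `PIPELINES`, the placeholder functions) are assumptions for illustration and are not the actual API of run_config.py or the evaluation scripts:

```python
# Illustrative sketch only: the real entry points live in evaluate_bert.py and
# evaluate_traditional.py and read their parameters from run_config.py.

# Hypothetical configuration dict, mimicking the role of run_config.py
RUN_CONFIG = {
    "pipeline": "bert",       # "bert" or "traditional"
    "dataset": "PharmaTech",  # one of the benchmark datasets
}

def evaluate_bert(dataset: str) -> str:
    # Placeholder for the BERT-based evaluation pipeline
    return f"BERT evaluation on {dataset}"

def evaluate_traditional(dataset: str) -> str:
    # Placeholder for the traditional classification pipeline
    return f"traditional evaluation on {dataset}"

# Dispatch table: the configured pipeline name selects the implementation
PIPELINES = {"bert": evaluate_bert, "traditional": evaluate_traditional}

result = PIPELINES[RUN_CONFIG["pipeline"]](RUN_CONFIG["dataset"])
print(result)  # -> BERT evaluation on PharmaTech
```

The point of this pattern is that each evaluation script stays parameter-free on the command line; all run-specific choices live in one configuration module.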
- The project is implemented in Python. We used Conda to create a new environment:

  ```
  conda create -n stpenv python=3.8
  conda activate stpenv
  ```

- To reproduce the results of our evaluation, you first need to install the required Python libraries:

  ```
  pip install -r requirements.txt
  ```

- Set the Python path to our src root:

  ```
  export PYTHONPATH="/home/USER/SupervisedTextProcessing/src/"
  ```

- We publish our self-curated dataset Pharmaceutical Technologies along with the code. To use it in the evaluation, it has to be decompressed first; the following script extracts the data into the correct location:

  ```
  python src/narrarelex/init_benchmarks.py
  ```

- To use the relabeling module correctly, you first need to paste your API keys into the files HUGGINGFACE_TOKEN and OPENAI_TOKEN.
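As a minimal sketch of how such token files are typically consumed, the snippet below reads an API key from a plain-text file. The file names come from this README, but the reader function itself is an assumption, not the repository's actual code:

```python
# Hypothetical sketch: reading an API key from a plain-text token file such as
# HUGGINGFACE_TOKEN or OPENAI_TOKEN (the loader shown here is an assumption,
# not the repository's actual implementation).
from pathlib import Path
import tempfile

def load_token(path) -> str:
    """Return the API key stored in a token file, stripping whitespace."""
    return Path(path).read_text(encoding="utf-8").strip()

# Demonstration with a temporary file standing in for HUGGINGFACE_TOKEN:
with tempfile.TemporaryDirectory() as tmp:
    token_file = Path(tmp) / "HUGGINGFACE_TOKEN"
    token_file.write_text("hf_example_key\n", encoding="utf-8")
    print(load_token(token_file))  # -> hf_example_key
```

Storing keys in untracked plain-text files (rather than in code) keeps secrets out of version control.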
Please note that our Chemprot variants, called ChemprotE and ChemprotE, are created at runtime.
The datasets used in this work are available here:
- HallmarksOfCancer Website
- Ohsumed Website
- Long Covid Website
- PharmaTech Repository