Cochrane-sections

This repository contains the data, pretrained models, and code for our paper "Section-level Simplification of Biomedical Abstracts".

Data

The original English Cochrane abstracts and plain language summaries (PLS) were derived from the Cochrane Database of Systematic Reviews (CDSR) by Devaraj et al. (2021). We copied their data into data/cochrane. We also copied the manually and automatically aligned sentence pairs extracted by Joseph et al. (2023) from their repository into the data/multicochrane directory.

We placed our newly created Cochrane-sections dataset in the data/cochrane-sections directory.

We provide the LLM-generated labels for the test set within the subfolders of data/classifications.

Lastly, we provide our manual annotations for the test set under data/annotations.

Pretrained models

We provide the checkpoint for the neural CRF alignment model that was first trained by Jiang et al. (2020) and then fine-tuned and shared by Joseph et al. (2023) here. It leverages the BERT model that Jiang et al. trained on Wiki-manual and shared here.

We also share the checkpoints of our trained section classification models under classifiers.

Code

Alignment

First, the script load_data.py can be used to (1) load the sentence-tokenized abstracts and PLS and (2) determine, for each PLS sentence, whether it is aligned to an abstract sentence and, if so, which abstract section that sentence belongs to. The resulting triples (pls_sent_id, abs_sent_id, abs_sect_id) are saved under alignments.
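To illustrate the triple format, here is a minimal sketch of how such triples could be assembled from sentence-level alignments. It is not the code in load_data.py: the function name build_triples, its arguments, and the use of None for unaligned PLS sentences are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# (pls_sent_id, abs_sent_id, abs_sect_id); the last two are None when unaligned.
Triple = Tuple[int, Optional[int], Optional[int]]


def build_triples(
    num_pls_sents: int,
    aligned_pairs: List[Tuple[int, int]],
    abs_sent_to_section: Dict[int, int],
) -> List[Triple]:
    """Hypothetical helper: build alignment triples for a single document.

    aligned_pairs holds (pls_sent_id, abs_sent_id) pairs, and
    abs_sent_to_section maps each abstract sentence to its section index.
    """
    # Collect the abstract sentences aligned to each PLS sentence.
    by_pls: Dict[int, List[int]] = {}
    for pls_id, abs_id in aligned_pairs:
        by_pls.setdefault(pls_id, []).append(abs_id)

    triples: List[Triple] = []
    for pls_id in range(num_pls_sents):
        abs_ids = by_pls.get(pls_id)
        if not abs_ids:
            # Unaligned PLS sentence: no abstract sentence and no section.
            triples.append((pls_id, None, None))
            continue
        for abs_id in abs_ids:
            triples.append((pls_id, abs_id, abs_sent_to_section.get(abs_id)))
    return triples


example = build_triples(3, [(0, 2), (2, 5)], {2: 0, 5: 3})
print(example)
```

In this toy input, PLS sentence 1 has no aligned abstract sentence, so its triple carries None in the last two positions.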

The script alignment.py can be used to generate automatic alignments for the test set and evaluate them against the manual alignments. The generated triples are then saved under alignments. We copied the required code from Jiang et al. to the aligner directory.
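Evaluating automatic alignments against manual ones is commonly done with precision, recall, and F1 over sets of aligned sentence pairs. The sketch below shows that computation under the assumption that both alignments are available as (pls_sent_id, abs_sent_id) pairs; the function name alignment_scores is hypothetical, and the metrics reported by alignment.py may differ.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[int, int]  # (pls_sent_id, abs_sent_id)


def alignment_scores(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Dict[str, float]:
    """Hypothetical helper: pair-level precision, recall, and F1."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = alignment_scores(
    predicted=[(0, 2), (1, 3), (2, 5)],
    gold=[(0, 2), (2, 5), (3, 6)],
)
print(scores)
```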

Classification

The script prepare.py contains our code to prepare for the classification step by embedding the source sentences and target labels.

The script classifier.py contains our implementation of the section classifier and the code used for training it.

The script classification.py contains our code for predicting section header labels with a trained classifier. The generated predictions are saved under classifications.

Two-step method

The script two_step_method.py can be used to determine the label of each sentence within a PLS using our two-step method, based on the alignment and classification results.
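One plausible way to combine the two sources of information is sketched below. It assumes that a PLS sentence takes the section of its aligned abstract sentence when an alignment exists, and falls back to the classifier prediction otherwise. This combination rule, the function name two_step_labels, and the example label strings are assumptions made for illustration; see two_step_method.py for the actual procedure.

```python
from typing import List, Optional


def two_step_labels(
    alignment_labels: List[Optional[str]],
    classifier_labels: List[str],
) -> List[str]:
    """Hypothetical helper: prefer alignment-derived labels, else use the classifier.

    alignment_labels[i] is the section label inherited from the aligned abstract
    sentence, or None when PLS sentence i is unaligned.
    """
    if len(alignment_labels) != len(classifier_labels):
        raise ValueError("Both label lists must cover the same PLS sentences.")
    return [
        aligned if aligned is not None else predicted
        for aligned, predicted in zip(alignment_labels, classifier_labels)
    ]


labels = two_step_labels(
    alignment_labels=["background", None, "results"],
    classifier_labels=["methods", "methods", "conclusions"],
)
print(labels)
```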

Dataset creation & analysis

The script create_dataset.py can be used to create a split of the Cochrane-sections dataset based on annotated (test) or predicted (train/val/auto) labels. This split is then saved under data/cochrane-sections.
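As a rough illustration of how sentence-level labels can be turned into section-level text, the sketch below groups the sentences of one PLS by their label and joins each group in document order. The function name group_by_section and the label strings are hypothetical, and the output format of create_dataset.py may differ.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_section(sentences: List[str], labels: List[str]) -> Dict[str, str]:
    """Hypothetical helper: join the sentences of each section in document order."""
    if len(sentences) != len(labels):
        raise ValueError("Each sentence needs exactly one label.")
    grouped: Dict[str, List[str]] = defaultdict(list)
    for sentence, label in zip(sentences, labels):
        grouped[label].append(sentence)
    return {label: " ".join(sents) for label, sents in grouped.items()}


sections = group_by_section(
    sentences=["We asked a question.", "We searched for trials.", "We included five trials."],
    labels=["background", "methods", "methods"],
)
print(sections)
```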

Lastly, the script analysis.py can be used to visualize results by generating a table, bar plot, and confusion matrix like those shown in our paper.

