This repository has been archived by the owner on May 28, 2024. It is now read-only.

Resources related to ACL 2020 paper "Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain"


boschresearch/joint_anonymization_extraction


Joint-Anonymization-NER

This is the companion code for the experiments reported in the paper

"Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain" by Lukas Lange, Heike Adel and Jannik Strötgen published at ACL 2020.

The paper is available at https://www.aclweb.org/anthology/2020.acl-main.621. The code allows users to reproduce the results reported in the paper and to extend the model to new datasets and embedding configurations. Please cite the paper when reporting, reproducing or extending the results:

Citation

@inproceedings{lange-etal-2020-closing,
    title = "Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain",
    author = {Lange, Lukas  and
      Adel, Heike  and
      Str{\"o}tgen, Jannik},
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.621",
    pages = "6945--6952",
    abstract = "Exploiting natural language processing in the clinical domain requires de-identification, i.e., anonymization of personal information in texts. However, current research considers de-identification and downstream tasks, such as concept extraction, only in isolation and does not study the effects of de-identification on other tasks. In this paper, we close this gap by reporting concept extraction performance on automatically anonymized data and investigating joint models for de-identification and concept extraction. In particular, we propose a stacked model with restricted access to privacy sensitive information and a multitask model. We set the new state of the art on benchmark datasets in English (96.1{\%} F1 for de-identification and 88.9{\%} F1 for concept extraction) and Spanish (91.4{\%} F1 for concept extraction).",
}

Purpose of the project

This software is a research prototype, solely developed for and published as part of the publication "Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain". It will neither be maintained nor monitored in any way.

Setup

  • Install flair (tested with flairNLP 0.4.5, PyTorch 1.3.1 and Python 3.6.8).
  • Download pre-trained word embeddings (using flair or your own).
  • Prepare your corpus in BIO format.
  • Train a stacked or multitask model as described in the example notebook.
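
As a rough orientation for the steps above, the following sketch trains a plain single-task flair sequence tagger on a BIO corpus. It is not the stacked or multitask model from the paper; those are covered in the example notebook. The file names, the `ner` tag type, the GloVe embeddings and all hyperparameters are assumptions chosen for illustration, and the calls follow the flair 0.4.x API.

```python
# Column layout assumed for illustration: token in column 0, BIO tag in column 1.
COLUMNS = {0: "text", 1: "ner"}


def train_baseline(data_dir, out_dir, max_epochs=10):
    """Train a single-task BiLSTM-CRF tagger (NOT the paper's joint model)."""
    # flair is imported lazily so this module can be loaded without it installed.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import StackedEmbeddings, WordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Hypothetical file names; adapt them to your own corpus.
    corpus = ColumnCorpus(
        data_dir,
        COLUMNS,
        train_file="train.bio",
        dev_file="dev.bio",
        test_file="test.bio",
    )
    tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")

    # Pre-trained word embeddings downloaded through flair (assumed choice).
    embeddings = StackedEmbeddings([WordEmbeddings("glove")])

    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="ner",
        use_crf=True,
    )
    trainer = ModelTrainer(tagger, corpus)
    trainer.train(
        out_dir,
        learning_rate=0.1,
        mini_batch_size=32,
        max_epochs=max_epochs,
    )
    return tagger
```

Calling `train_baseline("data", "models/baseline")` would write checkpoints and logs to the given output directory. Such a baseline can serve as a sanity check of the environment before running the joint models from the notebook.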

Data

We do not ship the corpora used in the experiments from the paper. The sample files in the data directory only illustrate the expected data format (BIO). More information can be found in data/README.md.
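
To make the BIO format more concrete, here is a minimal sketch of a reader that groups tokens into sentences and recovers labeled spans. It assumes a CoNLL-style layout (one token per line, whitespace-separated columns with the tag last, blank lines between sentences); the exact layout used by this repository is described in data/README.md and may differ. The label names `NAME` and `PROBLEM` are hypothetical.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, BIO tag) pairs


def read_bio(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token ... tag' lines into sentences; blank lines separate sentences."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # First column is the token, last column is the tag (assumed layout).
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def bio_spans(sentence: Sentence) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) spans with an exclusive end index."""
    spans = []
    label, start = None, 0
    for i, (_, tag) in enumerate(sentence):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.append((label, start, i))
                label = None
            if tag != "O":
                # A B- tag, or an I- tag without a matching predecessor, opens a span.
                label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(sentence)))
    return spans


sample = [
    "Mr. O",
    "Smith B-NAME",
    "has O",
    "chest B-PROBLEM",
    "pain I-PROBLEM",
    "",
    "Discharged O",
]
sentences = read_bio(sample)
print(bio_spans(sentences[0]))
```

The span representation is convenient for checking a converted corpus before training, for example by counting spans per label.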

License

Joint-Anonymization-NER is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.

For a list of other open source components included in Joint-Anonymization-NER, see the file 3rd-party-licenses.txt.

The software including its dependencies may be covered by third party rights, including patents. You should not execute this code unless you have obtained the appropriate rights, which the authors are not purporting to give.
