You Don't Know My Name: Transfer Learning to De-Identify Protected Health Information in Electronic Health Records
Arnobio Morelix, Pauline Wang
School of Information, University of California, Berkeley
Date: August 2, 2019
This paper presents an information extraction system that uses Bidirectional Encoder Representations from Transformers (BERT) to de-identify protected health information (PHI) in electronic health records (EHR). Past work on PHI de-identification has used a combination of dictionary-based, rule-based, and machine-learning algorithms to handle the inherent complexity of PHI categories. In this paper we use BERT, a pre-trained model with context-aware word embeddings, to classify PHI categories in a named entity recognition task. Our model performs in line with, and on some measures better than, models that rely on extensive rule-based pre-processing. Because rule-based systems incur high pre-processing costs and depend on site-specific annotation styles (e.g., how a particular hospital records and reports biometrics), we believe the type of model we present here has greater potential to generalize, and we outline opportunities for further refinement.
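To make the task framing concrete, the sketch below (not the authors' code) shows how PHI de-identification is typically cast as token-level named entity recognition with BIO tags, which is the label scheme a BERT token-classification head would be trained to predict. The PHI category names and the example sentence are illustrative assumptions, not drawn from the paper's dataset.

```python
# Illustrative sketch: framing PHI de-identification as token-level NER
# with BIO tags. Category names below are hypothetical placeholders;
# real de-identification corpora define their own PHI annotation schemes.
PHI_TYPES = ["NAME", "DATE", "LOCATION", "ID", "CONTACT"]

def bio_labels(tokens, spans):
    """Convert token-index PHI spans into BIO tags.

    tokens: list of word tokens.
    spans:  list of (start_token, end_token_exclusive, phi_type) triples.
    Returns one label per token: "O" outside any PHI span, "B-<type>"
    at a span start, "I-<type>" inside a span.
    """
    labels = ["O"] * len(tokens)
    for start, end, phi_type in spans:
        labels[start] = f"B-{phi_type}"
        for i in range(start + 1, end):
            labels[i] = f"I-{phi_type}"
    return labels

# Hypothetical clinical-note fragment with two annotated PHI spans.
tokens = ["Patient", "John", "Smith", "seen", "on", "March", "3"]
spans = [(1, 3, "NAME"), (5, 7, "DATE")]
print(bio_labels(tokens, spans))
# ['O', 'B-NAME', 'I-NAME', 'O', 'O', 'B-DATE', 'I-DATE']
```

A token classifier trained on labels like these predicts one tag per (sub)word; predicted PHI spans are then masked or replaced with surrogates to de-identify the record.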