Bidirectional Encoder Representations from Transformers (BERT) transfer learning for named entity recognition and de-identification of sensitive data

arnobiomorelix/berttransferlearning

BERT Transfer Learning for De-Identification of Sensitive Data

You Don't Know My Name: Transfer Learning to De-Identify Protected Health Information in Electronic Health Records

Arnobio Morelix, Pauline Wang
School of Information, University of California, Berkeley

Date: August 2, 2019

Abstract

This paper presents an information extraction system that uses Bidirectional Encoder Representations from Transformers (BERT) to de-identify protected health information (PHI) in electronic health records (EHR). Past work on PHI de-identification has combined dictionary-based, rule-based, and machine learning approaches to deal with the inherent complexity of PHI categories. In this paper we use BERT, a pre-trained model with context-aware word embeddings, to classify PHI categories in a named entity recognition task. Our model performs in line with, and on some measures better than, models that rely on extensive rule-based pre-processing. Because data pre-processing in rule-based systems is costly, and because such systems depend on specific annotation styles (e.g., how a particular hospital might record and report on biometrics), we believe the type of model we present here has more potential to generalize. We also present further opportunities for refinement.

Keywords: BERT, transfer learning, HIPAA, named entity recognition, i2b2 2014 de-identification challenge.
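
To make the task framing concrete, here is a minimal sketch of the step that follows token classification: turning per-token PHI labels into de-identified text. It assumes the labels use a BIO tagging scheme and that the tokens and labels are already aligned. The function name `deidentify`, the example sentence, and the label names `NAME` and `DATE` are illustrative assumptions, not taken from this repository's code, and the labels are written by hand rather than produced by a BERT model.

```python
from typing import List, Optional


def deidentify(tokens: List[str], labels: List[str]) -> str:
    """Replace each tagged PHI span with a single [CATEGORY] placeholder.

    Hypothetical helper: assumes BIO labels such as "B-NAME", "I-NAME", "O".
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    out: List[str] = []
    prev_cat: Optional[str] = None  # category of the span we are inside, if any

    for token, label in zip(tokens, labels):
        if label == "O":
            # Non-PHI token: keep it unchanged and close any open span.
            out.append(token)
            prev_cat = None
            continue

        prefix, _, cat = label.partition("-")
        # Emit a placeholder at the start of a span: on a B- tag, or on an
        # I- tag that does not continue a span of the same category.
        if prefix == "B" or cat != prev_cat:
            out.append(f"[{cat}]")
        prev_cat = cat

    return " ".join(out)


# Hand-written example; in practice the labels would come from the classifier.
tokens = ["Mr.", "John", "Smith", "was", "admitted", "on", "March", "3", "."]
labels = ["O", "B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "I-DATE", "O"]
print(deidentify(tokens, labels))  # Mr. [NAME] was admitted on [DATE] .
```

Collapsing each span into one placeholder, rather than masking token by token, avoids leaking how many tokens the original name or date contained.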
