HebSafeHarbor - CLALIT Validation

(version 2)

A de-identification toolkit for clinical text in Hebrew.
An improved version of Microsoft's HebSafeHarbor project.

HebSafeHarbor was developed according to the requirements described in the about_hebsafeharbor file.

The toolkit integrates and uses open-source libraries and assets, including HebSpacy (which runs an NER model based on AlephBERT, NEMO and BMC), Presidio, Wikipedia and public lexicons.

Establishing the work environment

Make sure you have Anaconda installed on your computer.

Unpack the packedhebsafeharbor.zip environment file into the folder where you want the environment to live (usually C:\Users\..\Anaconda3\envs).
Open the standard Command Prompt (cmd.exe) on Windows.

From the folder into which you unpacked the archive, run:

```
cd packedhebsafeharbor
.\Scripts\activate.bat
```

After the last step, the prompt should show that the environment is active. For example:
```
(packedhebsafeharbor) C:\Some\Path\Where\Your\Environment\Is>
```
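You can run a quick check from the activated prompt to confirm that Python is using this environment. The sketch below is minimal. It assumes only that the package imports as hebsafeharbor, as it does in the Getting started examples, and that dateutil is needed for the custom date-shifting example. The missing_modules function is a hypothetical helper and is not part of HebSafeHarbor.

```python
import importlib.util
import sys


def missing_modules(names):
    """Hypothetical helper: return the module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # This path should point inside the packedhebsafeharbor environment.
    print("Python executable:", sys.executable)
    # hebsafeharbor is used by every example; dateutil by the custom date-shifting one.
    missing = missing_modules(["hebsafeharbor", "dateutil"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("Both modules were found.")
```

If a module is reported missing, the prompt is most likely still using a different Python installation.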

Getting started

Use default anonymization

In this case, you don't need to pass any parameters when creating the HebSafeHarbor object.
The day component of medical dates is replaced with <יום_> ("day").

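```python
from hebsafeharbor import HebSafeHarbor

# use default anonymization
hsh = HebSafeHarbor()

# "Sharon Levi was hospitalized on 02.02.2012"
text = """שרון לוי התאשפזה ב02.02.2012 """
doc = {"text": text}

output = hsh([doc])

print(output[0].anonymized_text.text)

#  > <שם_> התאשפזה ב<יום_>.02.2012
# (the name becomes <שם_> "name" and the day becomes <יום_> "day")
```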
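The following self-contained snippet illustrates what masking the day of a dd.mm.yyyy date means. It is only a hypothetical regex sketch, and DATE_RE and mask_day are illustrative names. HebSafeHarbor finds dates with its own recognizers and does not necessarily work this way. The sketch also leaves names untouched.

```python
import re

# Illustrative pattern for dd.mm.yyyy (also dd/mm/yyyy and dd-mm-yyyy).
# Digit lookarounds are used instead of \b: a Hebrew prefix letter such as "ב"
# counts as a word character, so "ב02.02.2012" has no word boundary before the 0.
DATE_RE = re.compile(r"(?<!\d)(\d{1,2})([./-])(\d{1,2})\2(\d{4})(?!\d)")


def mask_day(text, placeholder="<יום_>"):
    """Hypothetical sketch: replace only the day of each matched date."""
    return DATE_RE.sub(
        lambda m: f"{placeholder}{m.group(2)}{m.group(3)}{m.group(2)}{m.group(4)}",
        text,
    )


print(mask_day("שרון לוי התאשפזה ב02.02.2012"))
# → שרון לוי התאשפזה ב<יום_>.02.2012
```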

Use anonymization by context

In this case, initialize the HebSafeHarbor object with a context.
Currently supported contexts: ['imaging', 'general', 'family']
This lets the anonymization adapt to the subject domain of the text.

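```python
from hebsafeharbor import HebSafeHarbor

# use context anonymization (here, the imaging context)
hsh = HebSafeHarbor(context='imaging')

# "CT. Referring physician: Dok Tor, (123456) Exam: CT neck"
text = """  .CT רופא מפנה:  דוק טור, (123456)  בדיקה:   ט.מ צוואר  """
doc = {"text": text}

output = hsh([doc])

print(output[0].anonymized_text.text)

#  > .CT רופא מפנה:  <שם_>, (<מזהה_>)  בדיקה:   ט.מ צוואר
# (the physician's name becomes <שם_> "name" and the ID becomes <מזהה_> "identifier")
```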
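This README does not say how HebSafeHarbor handles a context that is not on the supported list. A small guard can catch typos before the object is created. The sketch below uses hypothetical names (SUPPORTED_CONTEXTS, checked_context), and its list simply copies the contexts named above. Later versions may support other contexts.

```python
# Copied from this README's list of currently supported contexts.
SUPPORTED_CONTEXTS = ("imaging", "general", "family")


def checked_context(context):
    """Hypothetical helper: fail fast on a context name this README does not list."""
    if context not in SUPPORTED_CONTEXTS:
        raise ValueError(
            f"Unsupported context {context!r}; expected one of: {', '.join(SUPPORTED_CONTEXTS)}"
        )
    return context


# Usage (requires the packedhebsafeharbor environment):
# from hebsafeharbor import HebSafeHarbor
# hsh = HebSafeHarbor(context=checked_context("imaging"))
```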

Use custom date shifting/anonymization

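In this case, initialize the HebSafeHarbor object with shift_date_function, a tuple that holds a date-shifting function and its additional parameters. The function receives those parameters and the date string. It returns the day, month and year of the shifted date as strings. The function signature should be: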

```python
from typing import Any, Tuple
def f(params: Any, date_string: str) -> Tuple[str, str, str]: ...
```

For example:

```python
from hebsafeharbor import HebSafeHarbor
from datetime import timedelta
from dateutil import parser

def shift_day(params, date_):
    # params[0] is the shift in days; dayfirst=True reads dd.mm.yyyy dates correctly
    date_obj = parser.parse(date_, dayfirst=True)
    new_date = date_obj + timedelta(days=params[0])
    # return the (day, month, year) components as strings
    return str(new_date.day), str(new_date.month), str(new_date.year)


# use custom date shifting: every date is shifted by 17 days
hsh = HebSafeHarbor(shift_date_function=(shift_day, [17]))

# "Sharon Levi was hospitalized on 02.02.2012"
text = """שרון לוי התאשפזה ב02.02.2012 """
doc = {"text": text}

output = hsh([doc])

print(output[0].anonymized_text.text)

#  > <שם_> התאשפזה ב19.02.2012
```
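A general-purpose parser can read an ambiguous date such as 03.02.2012 as March 2. A stricter variant built only on the standard library accepts explicit day-first formats and raises an error on anything else. This is a sketch with hypothetical names (shift_day_strict, DATE_FORMATS). It assumes the dates passed to the function are written like the ones in these examples. It returns (day, month, year) strings, just like shift_day above.

```python
from datetime import datetime, timedelta

# Day-first formats tried in order (an assumption; adjust to your data).
DATE_FORMATS = ("%d.%m.%Y", "%d/%m/%Y", "%d-%m-%Y")


def shift_day_strict(params, date_):
    """Hypothetical variant of shift_day; params[0] is the shift in days."""
    for fmt in DATE_FORMATS:
        try:
            date_obj = datetime.strptime(date_.strip(), fmt)
            break
        except ValueError:
            continue
    else:
        # No format matched: refuse to guess.
        raise ValueError(f"Unrecognized date format: {date_!r}")
    new_date = date_obj + timedelta(days=params[0])
    return str(new_date.day), str(new_date.month), str(new_date.year)


print(shift_day_strict([17], "02.02.2012"))  # → ('19', '2', '2012')
print(shift_day_strict([17], "20.02.2012"))  # → ('8', '3', '2012'), 2012 is a leap year
```

You would pass it in the same way: HebSafeHarbor(shift_date_function=(shift_day_strict, [17])).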

Versions

Current version: About version 2

Previous versions:
About version 1

Special Thanks

NLP capabilities are based on resources developed by the ONLP Lab, especially AlephBERT and NEMO.
HebSafeHarbor is an open-source project developed by 8400 The Health Network.

Repository: ChenMordehai/HebSafeHarbor_Clalit_Validation_Improvment