cactiML/clinisift

./assets/clinisift.png

clinisift is a multitool for processing clinical medical records.

The main goal is to provide easy, off-the-shelf access to common NLP processes when working with medical records:

  • Sentence Tokenization and Section Identification from unstructured clinical textual data
  • Named Entity Recognition of medication-related data and clinical entities from records
  • Intuitive visualization of extracted information

A few motivating use cases, each achievable in only a few lines of code:

  • Extract clinical problems and procedures mentioned in a record’s CLINICAL HISTORY section.
  • When exploring a new dataset, visualize records with clinical and medication entities parsed and highlighted on-the-fly.
  • Check if both a particular medication and particular surgical procedure are mentioned in a patient’s PAST MEDICAL HISTORY.

Quick Features

  • Parse - Extract clinical and medical entities through Transformers-based Named Entity Recognition, as well as other components like medical record section identification. Also supports any NER model that can be loaded as a HuggingFace pipeline.
  • Analyze - Built-in methods to quickly filter through parsed data with as little code overhead as possible.
  • Visualize - spaCy-based visualizer that integrates with Transformers NER to visualize medical record parses on-the-fly, programmatically or via command line.
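Token-classification NER models commonly emit IOB tags (B- for the beginning of an entity, I- for its continuation, O for no entity), and the iob_resolve parameter of Parser suggests these are merged into whole entities. The following is a minimal stand-alone sketch of that general technique, not clinisift's own implementation; the function name resolve_iob and the label names (Drug, Strength, Frequency) are assumptions made for illustration.

```python
def resolve_iob(tokens, tags):
    """Merge token-level IOB tags into (entity_text, entity_type) spans."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, ent_type = tag.partition("-")
        if prefix == "B" or (prefix == "I" and ent_type != current_type):
            # A B- tag, or an I- tag that does not continue the open span,
            # closes the open span (if any) and starts a new one.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], ent_type
        elif prefix == "I":
            # Continuation of the open span.
            current_tokens.append(token)
        else:
            # An O tag closes the open span (if any).
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical tokens and labels, for illustration only.
tokens = ["Patient", "takes", "aspirin", "81", "mg", "daily"]
tags = ["O", "O", "B-Drug", "B-Strength", "I-Strength", "B-Frequency"]
entities = resolve_iob(tokens, tags)
print(entities)
```

Joining tokens with a space is a simplification; a real pipeline would more likely use character offsets to recover the original surface text.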

Get Started

Installation

Install via pip:

pip install clinisift

Or, from source:

git clone git@github.com:clinisift/clinisift.git
cd clinisift && pip install -e .

Quickstart

For a comprehensive overview of clinisift’s capabilities, see the “Components” page on the wiki.

Components

clinisift is made up of Parser and Doc components. See the “Components” page on the wiki for an explanation of all the parameters.

class Parser(
    models=None,
    include_ents=[],
    exclude_ents=[],
    iob_resolve=True,
    sent_tokenizer="clinitokenizer",
    sent_per_line=False,
    extract_section_headers=False,
    section_header_expr=None,
    device=None,
)

class Doc(
    filepath_or_str,
    parser,
    is_file=True
)
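The extract_section_headers and section_header_expr parameters suggest that section identification can be driven by a header expression. As a rough illustration of the idea, the sketch below splits a record into sections with a regular expression. It is stand-alone and does not call clinisift; the pattern, the split_sections helper, and the sample record are all assumptions, and the expression format clinisift actually expects is documented in the wiki.

```python
import re

# Hypothetical header pattern: an upper-case phrase at the start of a line,
# followed by a colon (e.g. "PAST MEDICAL HISTORY:").
SECTION_HEADER_EXPR = re.compile(r"^([A-Z][A-Z /]+):", re.MULTILINE)


def split_sections(text, header_expr=SECTION_HEADER_EXPR):
    """Return a dict mapping each section header to the text that follows it."""
    sections = {}
    matches = list(header_expr.finditer(text))
    for i, match in enumerate(matches):
        # A section runs until the next header, or to the end of the record.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections


# Made-up sample record, for illustration only.
record = (
    "CLINICAL HISTORY: Chest pain for two days.\n"
    "PAST MEDICAL HISTORY: Appendectomy in 2010. Takes aspirin daily.\n"
)
sections = split_sections(record)
for header, body in sections.items():
    print(header, "->", body)
```

Real records vary widely in how headers are written, which is presumably why the expression is configurable rather than fixed.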

Examples

Below are some examples for common use-cases.

Extract all clinical entities and medications from a *.txt file

from clinisift.cliniparse import Parser
from clinisift.doc import Doc

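text_file_path = "record.txt"  # placeholder: path to a plain-text clinical record

parser = Parser()  # default config: medication NER and clinical NER
doc = Doc(text_file_path, parser)

res = doc.parse()
# { "sentences": [...],
#   "entities": [...] }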
Visualize entities extracted on-the-fly from a directory of .txt files

To launch the visualizer with the default Parser() config, run from the command line:

python -m clinisift.visualizer /my/data/dir

A Flask server will be launched:

./assets/visualizer_1.png

./assets/visualizer_2.png

The visualizer module can be integrated with any `Parser` to customize the NER pipelines used, the entities visualized, and so forth. More information is available in the wiki.
