![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/entity_resolution/NLU_hpo_resolver_pipeline.ipynb)

# Pipeline for Human Phenotype Ontology (HPO) Sentence Entity Resolver

This advanced pipeline extracts human phenotype entities from clinical text and uses `sbiobert_base_cased_mli` Sentence BERT embeddings to map them to their corresponding Human Phenotype Ontology (HPO) codes. For each HPO code, it also returns associated codes from the following vocabularies:

- MeSH (Medical Subject Headings)
- SNOMED
- UMLS (Unified Medical Language System)
- ORPHA (international reference resource for information on rare diseases and orphan drugs)
- OMIM (Online Mendelian Inheritance in Man)

In [None]:
! pip install johnsnowlabs

In [2]:
import json
import os

import sparknlp
import sparknlp_jsl
import nlu

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

# Spark session settings: driver memory, Kryo serializer buffer, and max result size
params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

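# `license_keys` is not defined in this notebook; it is assumed to be a dict
# loaded from your John Snow Labs license JSON before this cell runs.
spark = sparknlp_jsl.start(license_keys['SECRET'], params=params)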

spark

Spark NLP Version : 5.3.1
Spark NLP_JSL Version : 5.3.1


In [5]:
nlu.__file__

'/usr/local/lib/python3.10/dist-packages/nlu/__init__.py'

In [3]:
pipe = nlu.load("en.map_entity.hpo_resolver_pipe")

hpo_resolver_pipeline download started this may take some time.
Approx size to download 2.1 GB
[OK!]


In [4]:
text = ["""She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation. On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mild aortic stenosis. She has previously had a Persantine Myoview nuclear rest-stress test scan completed at ABCD Medical Center in 07/06 that was negative. She has had significant mitral valve regurgitation in the past being moderate, but on the most recent echocardiogram on 05/12/08, that was not felt to be significant. She does have a history of significant hypertension in the past. She has had dizzy spells and denies clearly any true syncope. She has had bradycardia in the past from beta-blocker therapy."""]

In [5]:
df = pipe.predict(text)

🚨 Your Spark-Healthcare is outdated, installed==5.3.1 but latest version==5.3.0
You can run nlp.install() to update Spark-Healthcare


In [6]:
df

Unnamed: 0,document,entities_ner_chunk,entities_ner_chunk_class,entities_ner_chunk_confidence,entities_ner_chunk_origin_chunk,entities_ner_chunk_origin_sentence,resolution_resolution_code,resolution_resolution_confidence,resolution_resolution_distance,resolution_resolution_k_codes,...,resolution_resolution_meta_resolution_k_ORPHA_codes,resolution_resolution_meta_resolution_k_SNOMED_codes,resolution_resolution_meta_resolution_k_UMLS_codes,resolution_resolution_origin_sentence,resolution_resolution_resolved_text,resolution_resolution_target_text,resolution_resolution_token,sentence_dl,sentence_embedding_sbert_embeddings,word_embedding_word_embeddings
0,"She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation. On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mild...",tricuspid regurgitation,HP,0.9826,ner_chunk,0,HP:0005180,0.9899,0.0,"[[HP:0005180, HP:0010446, HP:0001704, HP:0001702, HP:0030732, HP:0031444, HP:0011662, HP:0031651, HP:0031441, HP:0031443, HP:0011575, HP:0001653, HP:0010316, HP:0001659, HP:0031440, HP:0001647, HP...",...,"[ORPHA:228410, ORPHA:391641, ORPHA:1101, ORPHA:1759, ORPHA:1724, None, ORPHA:391641, None, None, None, ORPHA:1880, ORPHA:363700, ORPHA:466791, ORPHA:2181, None, ORPHA:1772, ORPHA:2255, ORPHA:53647...","[SNOMED:111287006, SNOMED:49915006, SNOMED:253383003, None, None, None, SNOMED:253455004,63042009, None, None, None, None, SNOMED:48724000, SNOMED:204357006, SNOMED:60234000, None, SNOMED:72352009...","[UMLS:C0040961, UMLS:C0040963, UMLS:C0040962, UMLS:C4025753, UMLS:C4255215, None, UMLS:C0243002, None, None, None, UMLS:C4023292, UMLS:C0026266,C3551535, UMLS:C0013481, UMLS:C0003504, None, UMLS:C...",0,tricuspid regurgitation,tricuspid regurgitation,tricuspid regurgitation,"[She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation., On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mi...","[[0.40901488065719604, -0.09854041039943695, -0.21287906169891357, 0.3738410472869873, 1.4597432613372803, -0.2503955364227295, -0.29258039593696594, 0.7664100527763367, 1.0721549987792969, 0.7082...","[[-0.21964989602565765, -0.2844458520412445, -0.10418396443128586, -0.5357521772384644, -0.06646879762411118, -0.444497287273407, -0.5000978708267212, -0.5944756269454956, 0.1369660645723343, 0.04..."
0,"She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation. On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mild...",aortic stenosis,HP,,,1,HP:0001650,0.974,0.0,"[[HP:0005180, HP:0010446, HP:0001704, HP:0001702, HP:0030732, HP:0031444, HP:0011662, HP:0031651, HP:0031441, HP:0031443, HP:0011575, HP:0001653, HP:0010316, HP:0001659, HP:0031440, HP:0001647, HP...",...,"[ORPHA:228410, ORPHA:391641, ORPHA:1101, ORPHA:1759, ORPHA:1724, None, ORPHA:391641, None, None, None, ORPHA:1880, ORPHA:363700, ORPHA:466791, ORPHA:2181, None, ORPHA:1772, ORPHA:2255, ORPHA:53647...","[SNOMED:111287006, SNOMED:49915006, SNOMED:253383003, None, None, None, SNOMED:253455004,63042009, None, None, None, None, SNOMED:48724000, SNOMED:204357006, SNOMED:60234000, None, SNOMED:72352009...","[UMLS:C0040961, UMLS:C0040963, UMLS:C0040962, UMLS:C4025753, UMLS:C4255215, None, UMLS:C0243002, None, None, None, UMLS:C4023292, UMLS:C0026266,C3551535, UMLS:C0013481, UMLS:C0003504, None, UMLS:C...",1,aortic stenosis,aortic stenosis,aortic stenosis,"[She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation., On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mi...","[[0.40901488065719604, -0.09854041039943695, -0.21287906169891357, 0.3738410472869873, 1.4597432613372803, -0.2503955364227295, -0.29258039593696594, 0.7664100527763367, 1.0721549987792969, 0.7082...","[[-0.21964989602565765, -0.2844458520412445, -0.10418396443128586, -0.5357521772384644, -0.06646879762411118, -0.444497287273407, -0.5000978708267212, -0.5944756269454956, 0.1369660645723343, 0.04..."
0,"She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation. On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mild...",mitral valve regurgitation,HP,,,3,HP:0001653,0.9867,0.0,"[[HP:0005180, HP:0010446, HP:0001704, HP:0001702, HP:0030732, HP:0031444, HP:0011662, HP:0031651, HP:0031441, HP:0031443, HP:0011575, HP:0001653, HP:0010316, HP:0001659, HP:0031440, HP:0001647, HP...",...,"[ORPHA:228410, ORPHA:391641, ORPHA:1101, ORPHA:1759, ORPHA:1724, None, ORPHA:391641, None, None, None, ORPHA:1880, ORPHA:363700, ORPHA:466791, ORPHA:2181, None, ORPHA:1772, ORPHA:2255, ORPHA:53647...","[SNOMED:111287006, SNOMED:49915006, SNOMED:253383003, None, None, None, SNOMED:253455004,63042009, None, None, None, None, SNOMED:48724000, SNOMED:204357006, SNOMED:60234000, None, SNOMED:72352009...","[UMLS:C0040961, UMLS:C0040963, UMLS:C0040962, UMLS:C4025753, UMLS:C4255215, None, UMLS:C0243002, None, None, None, UMLS:C4023292, UMLS:C0026266,C3551535, UMLS:C0013481, UMLS:C0003504, None, UMLS:C...",3,mitral valve regurgitation,mitral valve regurgitation,mitral valve regurgitation,"[She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation., On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mi...","[[0.40901488065719604, -0.09854041039943695, -0.21287906169891357, 0.3738410472869873, 1.4597432613372803, -0.2503955364227295, -0.29258039593696594, 0.7664100527763367, 1.0721549987792969, 0.7082...","[[-0.21964989602565765, -0.2844458520412445, -0.10418396443128586, -0.5357521772384644, -0.06646879762411118, -0.444497287273407, -0.5000978708267212, -0.5944756269454956, 0.1369660645723343, 0.04..."
0,"She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation. On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mild...",hypertension,HP,,,4,HP:0000822,0.9974,0.0,"[[HP:0005180, HP:0010446, HP:0001704, HP:0001702, HP:0030732, HP:0031444, HP:0011662, HP:0031651, HP:0031441, HP:0031443, HP:0011575, HP:0001653, HP:0010316, HP:0001659, HP:0031440, HP:0001647, HP...",...,"[ORPHA:228410, ORPHA:391641, ORPHA:1101, ORPHA:1759, ORPHA:1724, None, ORPHA:391641, None, None, None, ORPHA:1880, ORPHA:363700, ORPHA:466791, ORPHA:2181, None, ORPHA:1772, ORPHA:2255, ORPHA:53647...","[SNOMED:111287006, SNOMED:49915006, SNOMED:253383003, None, None, None, SNOMED:253455004,63042009, None, None, None, None, SNOMED:48724000, SNOMED:204357006, SNOMED:60234000, None, SNOMED:72352009...","[UMLS:C0040961, UMLS:C0040963, UMLS:C0040962, UMLS:C4025753, UMLS:C4255215, None, UMLS:C0243002, None, None, None, UMLS:C4023292, UMLS:C0026266,C3551535, UMLS:C0013481, UMLS:C0003504, None, UMLS:C...",4,hypertension,hypertension,hypertension,"[She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation., On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mi...","[[0.40901488065719604, -0.09854041039943695, -0.21287906169891357, 0.3738410472869873, 1.4597432613372803, -0.2503955364227295, -0.29258039593696594, 0.7664100527763367, 1.0721549987792969, 0.7082...","[[-0.21964989602565765, -0.2844458520412445, -0.10418396443128586, -0.5357521772384644, -0.06646879762411118, -0.444497287273407, -0.5000978708267212, -0.5944756269454956, 0.1369660645723343, 0.04..."
0,"She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation. On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mild...",bradycardia,HP,,,5,HP:0001662,0.9724,0.0,"[[HP:0005180, HP:0010446, HP:0001704, HP:0001702, HP:0030732, HP:0031444, HP:0011662, HP:0031651, HP:0031441, HP:0031443, HP:0011575, HP:0001653, HP:0010316, HP:0001659, HP:0031440, HP:0001647, HP...",...,"[ORPHA:228410, ORPHA:391641, ORPHA:1101, ORPHA:1759, ORPHA:1724, None, ORPHA:391641, None, None, None, ORPHA:1880, ORPHA:363700, ORPHA:466791, ORPHA:2181, None, ORPHA:1772, ORPHA:2255, ORPHA:53647...","[SNOMED:111287006, SNOMED:49915006, SNOMED:253383003, None, None, None, SNOMED:253455004,63042009, None, None, None, None, SNOMED:48724000, SNOMED:204357006, SNOMED:60234000, None, SNOMED:72352009...","[UMLS:C0040961, UMLS:C0040963, UMLS:C0040962, UMLS:C4025753, UMLS:C4255215, None, UMLS:C0243002, None, None, None, UMLS:C4023292, UMLS:C0026266,C3551535, UMLS:C0013481, UMLS:C0003504, None, UMLS:C...",5,bradycardia,bradycardia,bradycardia,"[She is followed by Dr. X in our office and has a history of severe tricuspid regurgitation., On 05/12/08, preserved left and right ventricular systolic function, aortic sclerosis with apparent mi...","[[0.40901488065719604, -0.09854041039943695, -0.21287906169891357, 0.3738410472869873, 1.4597432613372803, -0.2503955364227295, -0.29258039593696594, 0.7664100527763367, 1.0721549987792969, 0.7082...","[[-0.21964989602565765, -0.2844458520412445, -0.10418396443128586, -0.5357521772384644, -0.06646879762411118, -0.444497287273407, -0.5000978708267212, -0.5944756269454956, 0.1369660645723343, 0.04..."
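The full output is wide, so a compact view is often easier to read. The sketch below is not part of the original notebook. It assumes the column names printed above and uses a small stand-in DataFrame filled with values copied from those rows; `compact_view` and `split_codes` are hypothetical helper names, and the confidence values are assumed to arrive as strings. With the real pipeline you would pass `df` from `pipe.predict(text)` instead of `sample_df`.

```python
import pandas as pd

# Stand-in for the NLU output: three of the columns shown above, with values
# copied from the printed rows. All rows share index 0, as in the real output.
sample_df = pd.DataFrame(
    {
        "entities_ner_chunk": [
            "tricuspid regurgitation",
            "aortic stenosis",
            "mitral valve regurgitation",
            "hypertension",
            "bradycardia",
        ],
        "resolution_resolution_code": [
            "HP:0005180",
            "HP:0001650",
            "HP:0001653",
            "HP:0000822",
            "HP:0001662",
        ],
        # Assumption: confidences may come back as strings, so they are
        # stored as strings here and converted below.
        "resolution_resolution_confidence": [
            "0.9899", "0.974", "0.9867", "0.9974", "0.9724",
        ],
    },
    index=[0] * 5,
)


def compact_view(df):
    """Return one row per chunk with short column names (hypothetical helper)."""
    columns = {
        "entities_ner_chunk": "chunk",
        "resolution_resolution_code": "hpo_code",
        "resolution_resolution_confidence": "confidence",
    }
    view = df[list(columns)].rename(columns=columns).reset_index(drop=True)
    view["confidence"] = pd.to_numeric(view["confidence"], errors="coerce")
    return view


def split_codes(entry):
    """Split a prefixed entry such as 'SNOMED:253455004,63042009' into bare codes.

    Returns an empty list for None, which the output uses for missing mappings.
    """
    if entry is None:
        return []
    prefix, sep, rest = entry.partition(":")
    codes = rest if sep else prefix  # tolerate entries without a prefix
    return [code.strip() for code in codes.split(",") if code.strip()]


print(compact_view(sample_df))
print(split_codes("SNOMED:253455004,63042009"))
```

`split_codes` is meant to be applied to individual elements of the `resolution_resolution_meta_resolution_k_*_codes` lists, where a single entry can carry several comma-separated codes behind one vocabulary prefix.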
