[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/enterprise/healthcare/EntityResolution_ICDO_SNOMED.ipynb)


In [3]:
import json
import os
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession

from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
import sparknlp_jsl

In [4]:
from pyspark.sql import functions as F
import pandas as pd
pd.set_option("display.max_colwidth", 1000)

# ICD-O - SNOMED Entity Resolution - version 2.4.6

## Example for ICD-O Entity Resolution Pipeline
A common NLP problem in medical applications is identifying histology behavior in documented cancer studies.

In this example we will use Spark NLP to identify histology behavior expressions and resolve them to ICD-O codes.

Some cancer-related clinical notes (taken from cancernetwork.com and oncology.medicinematters.com case studies):  
https://www.cancernetwork.com/case-studies/large-scrotal-mass-multifocal-intra-abdominal-retroperitoneal-and-pelvic-metastases  
https://oncology.medicinematters.com/lymphoma/chronic-lymphocytic-leukemia/case-study-small-b-cell-lymphocytic-lymphoma-and-chronic-lymphoc/12133054  
https://oncology.medicinematters.com/lymphoma/epidemiology/central-nervous-system-lymphoma/12124056  
https://oncology.medicinematters.com/lymphoma/case-study-cutaneous-t-cell-lymphoma/12129416

Note 1: Desmoplastic small round cell tumor
<div style="border:2px solid #747474; background-color: #e3e3e3; margin: 5px; padding: 10px"> 
A 35-year-old African-American man was referred to our urology clinic by his primary care physician for consultation about a large left scrotal mass. The patient reported a 3-month history of left scrotal swelling that had progressively increased in size and was associated with mild left scrotal pain. He also had complaints of mild constipation, with hard stools every other day. He denied any urinary complaints. On physical examination, a hard paratesticular mass could be palpated in the left hemiscrotum extending into the left groin, separate from the left testicle, and measuring approximately 10 × 7 cm in size. A hard, lower abdominal mass in the suprapubic region could also be palpated in the midline. The patient was admitted urgently to the hospital for further evaluation with cross-sectional imaging and blood work.

Laboratory results, including results of a complete blood cell count with differential, liver function tests, coagulation panel, and basic chemistry panel, were unremarkable except for a serum creatinine level of 2.6 mg/dL. Typical markers for a testicular germ cell tumor were within normal limits: the beta–human chorionic gonadotropin level was less than 1 mIU/mL and the alpha fetoprotein level was less than 2.8 ng/mL. A CT scan of the chest, abdomen, and pelvis with intravenous contrast was obtained, and it showed large multifocal intra-abdominal, retroperitoneal, and pelvic masses (Figure 1). On cross-sectional imaging, a 7.8-cm para-aortic mass was visualized compressing the proximal portion of the left ureter, creating moderate left hydroureteronephrosis. Additionally, three separate pelvic masses were present in the retrovesical space, each measuring approximately 5 to 10 cm at their largest diameter; these displaced the bladder anteriorly and the rectum posteriorly.

The patient underwent ultrasound-guided needle biopsy of one of the pelvic masses on hospital day 3 for definitive diagnosis. Microscopic examination of the tissue by our pathologist revealed cellular islands with oval to elongated, irregular, and hyperchromatic nuclei; scant cytoplasm; and invading fibrous tissue—as well as three mitoses per high-powered field (Figure 2). Immunohistochemical staining demonstrated strong positivity for cytokeratin AE1/AE3, vimentin, and desmin. Further mutational analysis of the cells detected the presence of an EWS-WT1 fusion transcript consistent with a diagnosis of desmoplastic small round cell tumor.
</div>

Note 2: SLL and CLL
<div style="border:2px solid #747474; background-color: #e3e3e3; margin: 5px; padding: 10px"> 
A 72-year-old man with a history of diabetes mellitus, hypertension, and hypercholesterolemia self-palpated a left submandibular lump in 2012. Complete blood count (CBC) in his internist’s office showed solitary leukocytosis (white count 22) with predominant lymphocytes for which he was referred to a hematologist. Peripheral blood flow cytometry on 04/11/12 confirmed chronic lymphocytic leukemia (CLL)/small lymphocytic lymphoma (SLL): abnormal cell population comprising 63% of CD45 positive leukocytes, co-expressing CD5 and CD23 in CD19-positive B cells. CD38 was negative but other prognostic markers were not assessed at that time. The patient was observed regularly for the next 3 years and his white count trend was as follows: 22.8 (4/2012) --> 28.5 (07/2012) --> 32.2 (12/2012) --> 36.5 (02/2013) --> 42 (09/2013) --> 44.9 (01/2014) --> 75.8 (2/2015). His other counts stayed normal until early 2015 when he also developed anemia (hemoglobin [HGB] 10.9) although platelets remained normal at 215. He had been noticing enlargement of his cervical, submandibular, supraclavicular, and axillary lymphadenopathy for several months since 2014 and a positron emission tomography (PET)/computed tomography (CT) scan done in 12/2014 had shown extensive diffuse lymphadenopathy within the neck, chest, abdomen, and pelvis. Maximum standardized uptake value (SUV max) was similar to low baseline activity within the vasculature of the neck and chest. In the abdomen and pelvis, however, there was mild to moderately hypermetabolic adenopathy measuring up to SUV of 4. The largest right neck nodes measured up to 2.3 x 3 cm and left neck nodes measured up to 2.3 x 1.5 cm. His right axillary lymphadenopathy measured up to 5.5 x 2.6 cm and on the left measured up to 4.8 x 3.4 cm. Lymph nodes on the right abdomen and pelvis measured up to 6.7 cm and seemed to have some mass effect with compression on the urinary bladder without symptoms. 
He underwent a bone marrow biopsy on 02/03/15, which revealed hypercellular marrow (60%) with involvement by CLL (30%); flow cytometry showed CD38 and ZAP-70 positivity; fluorescence in situ hybridization (FISH) analysis showed 13q deletion/monosomy 13; IgVH was unmutated; karyotype was 46XY.
</div>

Note 3: CNS lymphoma
<div style="border:2px solid #747474; background-color: #e3e3e3; margin: 5px; padding: 10px"> 
A 56-year-old woman began to experience vertigo, headaches, and frequent falls. A computed tomography (CT) scan of the brain revealed the presence of a 1.6 x 1.6 x 2.1 cm mass involving the fourth ventricle (Figure 14.1). A gadolinium-enhanced magnetic resonance imaging (MRI) scan confirmed the presence of the mass, and a stereotactic biopsy was performed that demonstrated a primary central nervous system lymphoma (PCNSL) with a diffuse large B-cell histology. Complete blood count (CBC), lactate dehydrogenase (LDH), and beta-2-microglobulin were normal. Systemic staging with a positron emission tomography (PET)/CT scan and bone marrow biopsy showed no evidence of lymphomatous involvement outside the CNS. An eye exam and lumbar puncture showed no evidence of either ocular or leptomeningeal involvement.
</div>

Note 4: Cutaneous T-cell lymphoma
<div style="border:2px solid #747474; background-color: #e3e3e3; margin: 5px; padding: 10px"> 
An 83-year-old female presented with a progressing pruritic cutaneous rash that started 8 years ago. On clinical exam there were numerous coalescing, infiltrated, scaly, and partially crusted erythematous plaques distributed over her trunk and extremities and a large fungating ulcerated nodule on her right thigh covering 75% of her total body surface area (Figure 10.1). Lymphoma associated alopecia and a left axillary lymphadenopathy were also noted. For the past 3–4 months she reported fatigue, severe pruritus, night sweats, 20 pounds of weight loss, and loss of appetite. 
</div>

Let's create a dataset with all four case studies:

In [7]:
notes = []
notes.append("""A 35-year-old African-American man was referred to our urology clinic by his primary care physician for consultation about a large left scrotal mass. The patient reported a 3-month history of left scrotal swelling that had progressively increased in size and was associated with mild left scrotal pain. He also had complaints of mild constipation, with hard stools every other day. He denied any urinary complaints. On physical examination, a hard paratesticular mass could be palpated in the left hemiscrotum extending into the left groin, separate from the left testicle, and measuring approximately 10 × 7 cm in size. A hard, lower abdominal mass in the suprapubic region could also be palpated in the midline. The patient was admitted urgently to the hospital for further evaluation with cross-sectional imaging and blood work.
Laboratory results, including results of a complete blood cell count with differential, liver function tests, coagulation panel, and basic chemistry panel, were unremarkable except for a serum creatinine level of 2.6 mg/dL. Typical markers for a testicular germ cell tumor were within normal limits: the beta–human chorionic gonadotropin level was less than 1 mIU/mL and the alpha fetoprotein level was less than 2.8 ng/mL. A CT scan of the chest, abdomen, and pelvis with intravenous contrast was obtained, and it showed large multifocal intra-abdominal, retroperitoneal, and pelvic masses (Figure 1). On cross-sectional imaging, a 7.8-cm para-aortic mass was visualized compressing the proximal portion of the left ureter, creating moderate left hydroureteronephrosis. Additionally, three separate pelvic masses were present in the retrovesical space, each measuring approximately 5 to 10 cm at their largest diameter; these displaced the bladder anteriorly and the rectum posteriorly.
The patient underwent ultrasound-guided needle biopsy of one of the pelvic masses on hospital day 3 for definitive diagnosis. Microscopic examination of the tissue by our pathologist revealed cellular islands with oval to elongated, irregular, and hyperchromatic nuclei; scant cytoplasm; and invading fibrous tissue—as well as three mitoses per high-powered field (Figure 2). Immunohistochemical staining demonstrated strong positivity for cytokeratin AE1/AE3, vimentin, and desmin. Further mutational analysis of the cells detected the presence of an EWS-WT1 fusion transcript consistent with a diagnosis of desmoplastic small round cell tumor.""")
notes.append("""A 72-year-old man with a history of diabetes mellitus, hypertension, and hypercholesterolemia self-palpated a left submandibular lump in 2012. Complete blood count (CBC) in his internist’s office showed solitary leukocytosis (white count 22) with predominant lymphocytes for which he was referred to a hematologist. Peripheral blood flow cytometry on 04/11/12 confirmed chronic lymphocytic leukemia (CLL)/small lymphocytic lymphoma (SLL): abnormal cell population comprising 63% of CD45 positive leukocytes, co-expressing CD5 and CD23 in CD19-positive B cells. CD38 was negative but other prognostic markers were not assessed at that time. The patient was observed regularly for the next 3 years and his white count trend was as follows: 22.8 (4/2012) --> 28.5 (07/2012) --> 32.2 (12/2012) --> 36.5 (02/2013) --> 42 (09/2013) --> 44.9 (01/2014) --> 75.8 (2/2015). His other counts stayed normal until early 2015 when he also developed anemia (hemoglobin [HGB] 10.9) although platelets remained normal at 215. He had been noticing enlargement of his cervical, submandibular, supraclavicular, and axillary lymphadenopathy for several months since 2014 and a positron emission tomography (PET)/computed tomography (CT) scan done in 12/2014 had shown extensive diffuse lymphadenopathy within the neck, chest, abdomen, and pelvis. Maximum standardized uptake value (SUV max) was similar to low baseline activity within the vasculature of the neck and chest. In the abdomen and pelvis, however, there was mild to moderately hypermetabolic adenopathy measuring up to SUV of 4. The largest right neck nodes measured up to 2.3 x 3 cm and left neck nodes measured up to 2.3 x 1.5 cm. His right axillary lymphadenopathy measured up to 5.5 x 2.6 cm and on the left measured up to 4.8 x 3.4 cm. Lymph nodes on the right abdomen and pelvis measured up to 6.7 cm and seemed to have some mass effect with compression on the urinary bladder without symptoms. 
He underwent a bone marrow biopsy on 02/03/15, which revealed hypercellular marrow (60%) with involvement by CLL (30%); flow cytometry showed CD38 and ZAP-70 positivity; fluorescence in situ hybridization (FISH) analysis showed 13q deletion/monosomy 13; IgVH was unmutated; karyotype was 46XY.""")
notes.append("A 56-year-old woman began to experience vertigo, headaches, and frequent falls. A computed tomography (CT) scan of the brain revealed the presence of a 1.6 x 1.6 x 2.1 cm mass involving the fourth ventricle (Figure 14.1). A gadolinium-enhanced magnetic resonance imaging (MRI) scan confirmed the presence of the mass, and a stereotactic biopsy was performed that demonstrated a primary central nervous system lymphoma (PCNSL) with a diffuse large B-cell histology. Complete blood count (CBC), lactate dehydrogenase (LDH), and beta-2-microglobulin were normal. Systemic staging with a positron emission tomography (PET)/CT scan and bone marrow biopsy showed no evidence of lymphomatous involvement outside the CNS. An eye exam and lumbar puncture showed no evidence of either ocular or leptomeningeal involvement.") 
notes.append("An 83-year-old female presented with a progressing pruritic cutaneous rash that started 8 years ago. On clinical exam there were numerous coalescing, infiltrated, scaly, and partially crusted erythematous plaques distributed over her trunk and extremities and a large fungating ulcerated nodule on her right thigh covering 75% of her total body surface area (Figure 10.1). Lymphoma associated alopecia and a left axillary lymphadenopathy were also noted. For the past 3–4 months she reported fatigue, severe pruritus, night sweats, 20 pounds of weight loss, and loss of appetite.")

# Notes column names

docid_col         = "doc_id"
note_col          = "text_feed"

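# Assumes an active SparkSession named `spark` with the licensed healthcare library loaded.
# Each note is lowercased and stored next to its index in `notes`.
data = spark.createDataFrame([(i, n.lower()) for i, n in enumerate(notes)]).toDF(docid_col, note_col)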

And let's build a Spark NLP pipeline with the following stages:
- DocumentAssembler: Entry annotator for our pipelines; it creates the data structure for the Annotation Framework.
- SentenceDetector: Annotator that pragmatically splits each document into complete sentences.
- Tokenizer: Annotator that splits sentences into tokens (generally words).
- WordEmbeddings: Vectorization of word tokens, in this case using word embeddings trained on PubMed, ICD10 and other clinical resources.
- EntityResolver: Annotator that searches for the k-nearest neighbors (KNN) of each chunk, in this case trained on ICD-O Histology Behavior codes.

To find cancer-related chunks we use a pretrained NER model (`ner_bionlp`) and keep only its `Cancer` entities; to identify clinical problems we use a second pretrained NER model (`ner_clinical`) and keep only its `PROBLEM` entities.

- NerDLModel: TensorFlow-based Named Entity Recognizer; the clinical model is trained to extract PROBLEMS, TREATMENTS and TESTS.
- NerConverter: Builds chunks out of the tokens tagged by the NER model.
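
The nearest-neighbor search attributed to the EntityResolver stage can be pictured with a small plain-Python sketch. This is an illustration only, not the library's implementation: the helper name `nearest_codes`, the Euclidean metric, and the two-dimensional toy vectors are all assumptions made for the example.

```python
import math

def nearest_codes(query, code_vectors, k=2):
    """Return the k (code, distance) pairs closest to `query` by Euclidean distance."""
    scored = [(code, math.dist(query, vec)) for code, vec in code_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Toy 2-D vectors (hypothetical); real embeddings have many more dimensions.
code_vectors = {
    "8806/3": [0.9, 0.1],
    "9591/3": [0.1, 0.9],
    "8000/3": [0.5, 0.5],
}

neighbours = nearest_codes([0.8, 0.2], code_vectors, k=2)
print(neighbours)
```

The first element of `neighbours` plays the role of the resolved code, and the remaining ones correspond to the alternative candidates.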

In [10]:
docAssembler = DocumentAssembler().setInputCol(note_col).setOutputCol("document")

sentenceDetector = SentenceDetector().setInputCols("document").setOutputCol("sentence")

tokenizer = Tokenizer().setInputCols("sentence").setOutputCol("token")

#Working on adjusting WordEmbeddingsModel to work with the subset of matched tokens
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols("sentence", "token")\
    .setOutputCol("word_embeddings")

In [11]:
icdo_ner = NerDLModel.pretrained("ner_bionlp", "en", "clinical/models")\
    .setInputCols("sentence", "token", "word_embeddings")\
    .setOutputCol("icdo_ner")

icdo_chunk = NerConverter().setInputCols("sentence","token","icdo_ner").setOutputCol("icdo_chunk").setWhiteList(["Cancer"])

icdo_chunk_embeddings = ChunkEmbeddings()\
    .setInputCols("icdo_chunk", "word_embeddings")\
    .setOutputCol("icdo_chunk_embeddings")

icdo_chunk_resolver = ChunkEntityResolverModel.pretrained("chunkresolve_icdo_clinical", "en", "clinical/models")\
    .setInputCols("token","icdo_chunk_embeddings")\
    .setOutputCol("tm_icdo_code")
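
The `ChunkEmbeddings` stage above turns the word embeddings of a chunk's tokens into a single vector per chunk. A minimal sketch of that idea, assuming average pooling (the pooling strategy actually in effect is not shown in this notebook) and using a hypothetical helper name `average_pool`:

```python
def average_pool(token_vectors):
    """Average a list of equal-length token vectors into one chunk vector."""
    n = len(token_vectors)
    return [sum(dim) / n for dim in zip(*token_vectors)]

# Three toy token vectors standing in for the tokens of one chunk.
chunk_vector = average_pool([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
print(chunk_vector)
```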

In [12]:
clinical_ner = NerDLModel.pretrained("ner_clinical", "en", "clinical/models") \
  .setInputCols(["sentence", "token", "word_embeddings"]) \
  .setOutputCol("ner")

ner_converter = NerConverter() \
  .setInputCols(["sentence", "token", "ner"]) \
  .setOutputCol("ner_chunk").setWhiteList(["PROBLEM"])

ner_chunk_tokenizer = ChunkTokenizer()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_token")

ner_chunk_embeddings = ChunkEmbeddings()\
    .setInputCols("ner_chunk", "word_embeddings")\
    .setOutputCol("ner_chunk_embeddings")

In [13]:
ner_snomed_resolver = \
    ChunkEntityResolverModel.pretrained("chunkresolve_snomed_findings_clinical","en","clinical/models")\
    .setInputCols("ner_token","ner_chunk_embeddings").setOutputCol("snomed_result")\
    .setEnableWmd(True).setEnableTfidf(True).setEnableJaccard(True)\
    .setCaseSensitive(False).setDistanceWeights([1,7,7,0,0,0]).setExtramassPenalty(1).setNeighbours(30).setAllDistancesMetadata(True)
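
The resolver above enables several distance metrics (`setEnableWmd`, `setEnableTfidf`, `setEnableJaccard`). As a rough illustration of one of them, the sketch below computes a token-level Jaccard distance in plain Python. It is not the library's implementation; the function name `jaccard_distance` and the whitespace tokenization are assumptions made for the example.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over lowercase whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(union)

d = jaccard_distance("left scrotal swelling",
                     "On examination - left scrotal swelling")
print(d)
```

With `setCaseSensitive(False)`, comparing lowercased tokens as done here matches the intent of ignoring case.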

In [14]:
pipelineFull = Pipeline().setStages([
    docAssembler, 
    sentenceDetector, 
    tokenizer, 
    word_embeddings, 
    
    clinical_ner, 
    ner_converter, 
    ner_chunk_embeddings,
    ner_chunk_tokenizer,
    ner_snomed_resolver,
    
    icdo_ner,
    icdo_chunk,
    icdo_chunk_embeddings, 
    icdo_chunk_resolver
])
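
A pipeline like this only works if every stage's input columns are produced by an earlier stage. The sketch below is plain Python, not Spark: it copies the column names from the cells above and checks that ordering. The helper name `missing_inputs` is hypothetical.

```python
def missing_inputs(stages, initial_columns):
    """Return (stage, column) pairs whose input column is not yet available."""
    available = set(initial_columns)
    problems = []
    for name, inputs, output in stages:
        problems.extend((name, col) for col in inputs if col not in available)
        available.add(output)
    return problems

# (stage name, input columns, output column), in pipeline order.
stages = [
    ("docAssembler", ["text_feed"], "document"),
    ("sentenceDetector", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("word_embeddings", ["sentence", "token"], "word_embeddings"),
    ("clinical_ner", ["sentence", "token", "word_embeddings"], "ner"),
    ("ner_converter", ["sentence", "token", "ner"], "ner_chunk"),
    ("ner_chunk_embeddings", ["ner_chunk", "word_embeddings"], "ner_chunk_embeddings"),
    ("ner_chunk_tokenizer", ["ner_chunk"], "ner_token"),
    ("ner_snomed_resolver", ["ner_token", "ner_chunk_embeddings"], "snomed_result"),
    ("icdo_ner", ["sentence", "token", "word_embeddings"], "icdo_ner"),
    ("icdo_chunk", ["sentence", "token", "icdo_ner"], "icdo_chunk"),
    ("icdo_chunk_embeddings", ["icdo_chunk", "word_embeddings"], "icdo_chunk_embeddings"),
    ("icdo_chunk_resolver", ["token", "icdo_chunk_embeddings"], "tm_icdo_code"),
]

problems = missing_inputs(stages, ["doc_id", "text_feed"])
print(problems)
```

An empty list means the SNOMED and ICD-O branches can both read everything they need from the shared sentence, token and embedding columns.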

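Let's fit the pipeline so it is ready to transform data. The embeddings, NER and resolver stages are pretrained, so this step mostly assembles them into a `PipelineModel`.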

In [16]:
pipelineModelFull = pipelineFull.fit(data)

In [17]:
output = pipelineModelFull.transform(data).cache()

### EntityResolver:  
Trained on an augmented ICD-O dataset from the JSL Data Market, it resolves the matched expressions to histology codes. Besides the code in the "result" field, it provides metadata about the matching process:  

- target_text -> Text to resolve
- resolved_text -> Best match text
- confidence -> Relative confidence for the top match (distance converted to a probability)
- confidence_ratio -> Ratio between the two best matches: TopMatchConfidence / SecondMatchConfidence
- alternative_codes -> List of other plausible codes (in the KNN neighborhood)
- alternative_confidence_ratios -> Confidence ratios of the remaining candidates
- all_k_results -> All resolved codes, for metrics calculation purposes
- all_k_resolutions -> Descriptions corresponding to the codes in all_k_results
- sentence -> SentenceId
- chunk -> ChunkId
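
To make the `confidence` and `confidence_ratio` fields more concrete, here is a small sketch. The notebook only says that distances are converted to probabilities; the softmax over negative distances used below is an assumption, as are the helper name `distances_to_confidences` and the toy distances.

```python
import math

def distances_to_confidences(distances):
    """Map distances to values that sum to 1; smaller distance gives higher confidence."""
    weights = [math.exp(-d) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical distances for three candidate codes.
confidences = distances_to_confidences([0.2, 0.9, 1.5])

# confidence_ratio = TopMatchConfidence / SecondMatchConfidence
top, second = sorted(confidences, reverse=True)[:2]
confidence_ratio = top / second
print(confidences, confidence_ratio)
```

A ratio close to 1 signals that the best and second-best candidates are nearly tied, which is worth checking before trusting the top code.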

In [20]:
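def quick_metadata_analysis(df, doc_field, chunk_field, code_fields):
    """Return one row per chunk with its coordinates, entity label and resolver candidates."""
    code_res_meta = ", ".join([f"{cf}.metadata" for cf in code_fields])
    # Zip the chunk fields (positions 0-3) with each resolver's metadata (positions 4+), one row per chunk
    expression = f"explode(arrays_zip({chunk_field}.begin, {chunk_field}.end, {chunk_field}.result, {chunk_field}.metadata, "+code_res_meta+")) as a"
    # Per resolver: top-match confidence and (code, description) pairs; column prefix is the text before the first "_"
    top_n_rest = [(f"float(a['{i+4}'].confidence) as {(cf.split('_')[0])}_conf",
                    f"arrays_zip(split(a['{i+4}'].all_k_results,':::'),split(a['{i+4}'].all_k_resolutions,':::')) as {cf.split('_')[0]+'_opts'}")
                    for i, cf in enumerate(code_fields)]
    # Flatten the list of (conf, opts) tuples into a single list of expressions
    top_n_rest_args = [t for tr in top_n_rest for t in tr]
    # Order by the doc_field argument (not the global docid_col), then by chunk begin and end
    return df.selectExpr(doc_field, expression) \
        .orderBy(doc_field, F.expr("a['0']"), F.expr("a['1']"))\
        .selectExpr(f"concat_ws('::',{doc_field},a['0'],a['1']) as coords", "a['2'] as chunk","a['3'].entity as entity", *top_n_rest_args)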
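
The `split(..., ':::')` and `arrays_zip` calls in the helper pair each candidate code with its description. The same step in plain Python, on a hand-written metadata dictionary (the two candidates are copied from the ICD-O output of this notebook; the helper name `parse_candidates` is hypothetical):

```python
def parse_candidates(metadata):
    """Pair each code in all_k_results with its description in all_k_resolutions."""
    codes = metadata["all_k_results"].split(":::")
    texts = metadata["all_k_resolutions"].split(":::")
    return list(zip(codes, texts))

metadata = {
    "all_k_results": "8806/3:::8002/3",
    "all_k_resolutions": "Desmoplastic small round cell tumor:::Malignant tumor, small cell type",
}

candidates = parse_candidates(metadata)
print(candidates)
```

Using `:::` as the separator keeps descriptions that contain commas, such as the second one here, in one piece.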

In [21]:
icdo = \
quick_metadata_analysis(output, docid_col, "icdo_chunk",["tm_icdo_code"]).toPandas()

In [22]:
snomed = \
quick_metadata_analysis(output, docid_col, "ner_chunk",["snomed_result"]).toPandas()

In [24]:
icdo

index,coords,chunk,entity,tm_conf,tm_opts
0,0::448::461,paratesticular,Cancer,0.0555,"[(9540/3, Malignant peripheral nerve sheath tumor), (8815/0, Solitary fibrous tumor), (9473/3, Primitive neuroectodermal tumor), (9432/1, Pituicytoma), (9161/1, Hemangioblastoma), (8933/3, Adenosarcoma), (9502/3, Teratoid medulloepithelioma), (9150/1, Hemangiopericytoma, NOS), (8815/3, Solitary fibrous tumor, malignant), (9150/3, Hemangiopericytoma, malignant), (9150/0, Hemangiopericytoma, benign), (9063/3, Spermatocytic seminoma), (8630/3, Androblastoma, malignant), (8990/3, Mesenchymoma, malignant), (8470/2, Mucinous cystadenocarcinoma, non-invasive), (9230/3, Chondroblastoma, malignant), (9362/3, Pineoblastoma), (9170/3, Lymphangiosarcoma), (8896/3, Myxoid leiomyosarcoma), (8891/3, Epithelioid leiomyosarcoma), (8840/3, Myxosarcoma), (9361/1, Pineocytoma), (8632/3, Gynandroblastoma, malignant), (9120/3, Hemangiosarcoma), (9130/3, Hemangioendothelioma, malignant)]"
1,0::1078::1103,testicular germ cell tumor,Cancer,0.0635,"[(9085/3, Mixed germ cell tumor), (9065/3, Germ cell tumor, nonseminomatous), (9086/3, Germ cell tumors with associated hematological malignancy), (8242/3, Enterochromaffin-like cell tumor, malignant), (8550/3, Acinar cell carcinoma), (8630/3, Androblastoma, malignant), (8621/3, Granulosa cell-theca cell tumor, mal.), (8620/3, Granulosa cell tumor, malignant), (8005/0, Clear cell tumor, NOS), (8150/3, Islet cell carcinoma), (8151/3, Insulinoma, malignant), (8152/3, Glucagonoma, malignant), (9380/3, Glioma, malignant), (8155/3, Vipoma), (8085/3, Squamous cell carcinoma, HPV-positive), (8086/3, Squamous cell carcinoma, HPV-negative), (8247/3, Merkel cell carcinoma), (9731/3, Plasmacytoma, NOS), (9740/3, Mast cell sarcoma), (9741/3, Malignant mastocytosis), (8670/3, Steroid cell tumor, malignant), (8631/3, Sertoli-Leydig cell tumor, poorly differentiated), (8590/3, Ovarian stromal tumor, mal.), (9734/3, Plasmacytoma, extramedullary), (8153/3, Gastrinoma, malignant)]"
2,0::1632::1644,pelvic masses,Cancer,0.0513,"[(8312/3, Renal cell carcinoma), (8162/3, Klatskin tumor), (9080/0, Teratoma, benign), (9072/3, Polyembryoma), (8000/0, Neoplasm, benign), (8000/3, Neoplasm, malignant), (9170/3, Lymphangiosarcoma), (9350/1, Craniopharyngioma), (8155/3, Vipoma), (8243/3, Goblet cell carcinoid), (9533/0, Psammomatous meningioma), (9490/0, Ganglioneuroma), (9492/0, Gangliocytoma), (8972/3, Pulmonary blastoma), (9131/0, Capillary hemangioma), (9364/3, Peripheral neuroectodermal tumor), (8811/3, Fibromyxosarcoma), (8501/2, Comedocarcinoma, non-infiltrating), (9130/0, Hemangioendothelioma, benign), (8780/3, Blue nevus, malignant), (8815/0, Solitary fibrous tumor), (8390/3, Skin appendage carcinoma), (9352/1, Papillary craniopharyngioma), (9351/1, Adamantinomatous craniopharyngioma), (9831/3, T-cell large granular lymphocytic leukemia)]"
3,0::2429::2463,desmoplastic small round cell tumor,Cancer,0.108,"[(8806/3, Desmoplastic small round cell tumor), (8002/3, Malignant tumor, small cell type), (8853/3, Round cell liposarcoma), (8852/3, Myxoid liposarcoma), (8042/3, Oat cell carcinoma), (9185/3, Small cell osteosarcoma), (8803/3, Small cell sarcoma), (8920/3, Alveolar rhabdomyosarcoma), (8247/3, Merkel cell carcinoma), (8514/3, Duct carcinoma, desmoplastic type), (8045/3, Combined small cell carcinoma), (8650/3, Leydig cell tumor, malignant), (9560/3, Neurilemmoma, malignant), (8005/0, Clear cell tumor, NOS), (9740/3, Mast cell sarcoma), (9741/3, Malignant mastocytosis), (9380/3, Glioma, malignant), (8621/3, Granulosa cell-theca cell tumor, mal.), (8156/3, Somatostatinoma, malignant), (8044/3, Small cell carcinoma, intermediate cell), (8620/3, Granulosa cell tumor, malignant), (8005/3, Malignant tumor, clear cell type), (8003/3, Malignant tumor, giant cell type), (9560/0, Neurilemoma, NOS), (9580/0, Granular cell tumor, NOS)]"
4,1::370::397,chronic lymphocytic leukemia,Cancer,0.0594,"[(9729/3, Precursor T-cell lymphoblastic lymphoma), (9805/3, Acute biphenotypic leukemia), (9823/3, Chronic lymphocytic leukemia/small lymphocytic lymphoma), (9835/3, Precursor cell lymphoblastic leukemia, NOS), (9946/3, Juvenile myelomonocytic leukemia), (9963/3, Chronic neutrophilic leukemia), (9826/3, Burkitt cell leukemia), (9820/3, Lymphoid leukemia, NOS), (9863/3, Chronic myeloid leukemia, NOS), (9875/3, Chronic myelogenous leukemia, BCR/ABL positive), (9831/3, T-cell large granular lymphocytic leukemia), (9828/3, Acute lymphoblastic leukemia, L2 type, NOS), (9945/3, Chronic myelomonocytic leukemia, NOS), (9861/3, Acute myeloid leukemia), (9836/3, Precursor B-cell lymphoblastic leukemia), (9728/3, Precursor B-cell lymphoblastic lymphoma), (9867/3, Acute myelomonocytic leukemia), (9870/3, Acute basophilic leukemia), (9891/3, Acute monocytic leukemia), (9910/3, Acute megakaryoblastic leukemia), (9896/3, Acute myeloid leukemia, t(8;21)(q22;q22)), (9874/3, Acute myeloid leukemia ..."
5,1::399::399,(,Cancer,0.0514,"[(9591/3, Malignant lymphoma, non-Hodgkin), (9445/3, Glioblastoma, IDH-mutant), (8010/3, Carcinoma, NOS), (8011/3, Epithelioma, malignant), (8000/3, Neoplasm, malignant), (8000/0, Neoplasm, benign), (9590/3, Malignant lymphoma, NOS), (9651/3, Hodgkin lymphoma, lymphocyte-rich), (9800/3, Leukemia, NOS), (9827/3, Adult T-cell leukemia/lymphoma (HTLV-1 pos.)), (8155/3, Vipoma), (9440/3, Glioblastoma, NOS), (8700/3, Pheochromocytoma), (9500/3, Neuroblastoma, NOS), (9490/0, Ganglioneuroma), (9492/0, Gangliocytoma), (8540/3, Paget disease, mammary), (9896/3, Acute myeloid leukemia, t(8;21)(q22;q22)), (9814/3, Leukemia/lymphoma with t(12;21)(p13;q22);TEL-AML1(ETV6-RUNX1)), (9940/3, Hairy cell leukemia), (8042/3, Oat cell carcinoma), (8743/3, Superficial spreading melanoma), (9740/3, Mast cell sarcoma), (9741/3, Malignant mastocytosis), (9140/3, Kaposi sarcoma)]"
6,1::411::430,lymphocytic lymphoma,Cancer,0.0635,"[(9673/3, Mantle cell lymphoma), (9823/3, Chronic lymphocytic leukemia/small lymphocytic lymphoma), (9761/3, Waldenstrom macroglobulinemia), (9690/3, Follicular lymphoma, NOS), (9591/3, Malignant lymphoma, non-Hodgkin), (9653/3, Hodgkin lymphoma, lymphocytic deplet., NOS), (9701/3, Sezary syndrome), (9764/3, Immunoproliferative small intestinal disease), (9695/3, Follicular lymphoma, grade 1), (9691/3, Follicular lymphoma, grade 2), (9698/3, Follicular lymphoma, grade 3), (9651/3, Hodgkin lymphoma, lymphocyte-rich), (9729/3, Precursor T-cell lymphoblastic lymphoma), (9805/3, Acute biphenotypic leukemia), (9735/3, Plasmablastic lymphoma), (9755/3, Histiocytic sarcoma), (9750/3, Malignant histiocytosis), (9705/3, Angioimmunoblastic T-cell lymphoma), (9725/3, Hydroa vacciniforme-like lymphoma), (9826/3, Burkitt cell leukemia), (9820/3, Lymphoid leukemia, NOS), (9687/3, Burkitt lymphoma, NOS), (9717/3, Intestinal T-cell lymphoma), (9835/3, Precursor cell lymphoblastic leukemia, NOS), (..."
7,2::386::416,central nervous system lymphoma,Cancer,0.0672,"[(9501/3, Medulloepithelioma, NOS), (9475/3, Medulloblastoma, WNT-activated), (9477/3, Medulloblastoma, non-WNT/non-SHH), (9350/1, Craniopharyngioma), (9470/3, Medulloblastoma, NOS), (9450/3, Oligodendroglioma, NOS), (9451/3, Oligodendroglioma, anaplastic), (9476/3, Medulloblastoma, SHH-activated and TP53-mutant), (9186/3, Central osteosarcoma), (9591/3, Malignant lymphoma, non-Hodgkin), (9701/3, Sezary syndrome), (9590/3, Malignant lymphoma, NOS), (9702/3, Mature T-cell lymphoma, NOS), (9673/3, Mantle cell lymphoma), (9719/3, NK/T-cell lymphoma, nasal and nasal-type), (9726/3, Primary Cutaneous gamma-delta T-cell lymphoma), (9651/3, Hodgkin lymphoma, lymphocyte-rich), (9709/3, Cutaneous T-cell lymphoma, NOS), (9596/3, Composite Hodgkin and non-Hodgkin lymphoma), (9663/3, Hodgkin lymphoma, nodular sclerosis, NOS), (9712/3, Intravascular large B-cell lymphoma), (9738/3, Lrg B-cell lymphoma in HHV8-assoc. multicentric Castleman DZ), (8380/3, Endometrioid carcinoma), (9655/3, Hodgkin ..."
8,2::419::423,pcnsl,Cancer,0.064,"[(9679/3, Mediastinal large B-cell lymphoma), (9764/3, Immunoproliferative small intestinal disease), (9725/3, Hydroa vacciniforme-like lymphoma), (9755/3, Histiocytic sarcoma), (9750/3, Malignant histiocytosis), (9673/3, Mantle cell lymphoma), (9701/3, Sezary syndrome), (9591/3, Malignant lymphoma, non-Hodgkin), (9705/3, Angioimmunoblastic T-cell lymphoma), (9761/3, Waldenstrom macroglobulinemia), (9735/3, Plasmablastic lymphoma), (9362/3, Pineoblastoma), (9364/3, Peripheral neuroectodermal tumor), (8011/3, Epithelioma, malignant), (8000/3, Neoplasm, malignant), (9662/3, Hodgkin sarcoma [obs]), (9754/3, Langerhans cell histiocytosis, disseminated), (8162/3, Klatskin tumor), (9380/3, Glioma, malignant), (9170/3, Lymphangiosarcoma), (9396/3, Ependymoma, RELA fusion-positive), (9431/1, Angiocentric glioma), (9700/3, Mycosis fungoides), (9726/3, Primary Cutaneous gamma-delta T-cell lymphoma), (9836/3, Precursor B-cell lymphoblastic leukemia)]"
9,3::373::380,lymphoma,Cancer,0.0768,"[(9591/3, Malignant lymphoma, non-Hodgkin), (9673/3, Mantle cell lymphoma), (9651/3, Hodgkin lymphoma, lymphocyte-rich), (9701/3, Sezary syndrome), (9764/3, Immunoproliferative small intestinal disease), (9755/3, Histiocytic sarcoma), (9750/3, Malignant histiocytosis), (9717/3, Intestinal T-cell lymphoma), (9705/3, Angioimmunoblastic T-cell lymphoma), (9735/3, Plasmablastic lymphoma), (9761/3, Waldenstrom macroglobulinemia), (9708/3, Subcutaneous panniculitis-like T-cell lymphoma), (9725/3, Hydroa vacciniforme-like lymphoma), (9590/3, Malignant lymphoma, NOS), (9835/3, Precursor cell lymphoblastic leukemia, NOS), (9702/3, Mature T-cell lymphoma, NOS), (9709/3, Cutaneous T-cell lymphoma, NOS), (9687/3, Burkitt lymphoma, NOS), (9836/3, Precursor B-cell lymphoblastic leukemia), (9729/3, Precursor T-cell lymphoblastic lymphoma), (9728/3, Precursor B-cell lymphoblastic lymphoma), (9726/3, Primary Cutaneous gamma-delta T-cell lymphoma), (9719/3, NK/T-cell lymphoma, nasal and nasal-type),..."
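
The `coords` column packs the document id and the chunk's begin and end character offsets into one string. Judging by the rows above, the end offset is inclusive (for example `0::448::461` spans the 14 characters of "paratesticular"). A small sketch for unpacking it; the helper names `parse_coords` and `chunk_text` and the short sample text are made up for the example:

```python
def parse_coords(coords):
    """Split 'doc_id::begin::end' into three integers."""
    doc_id, begin, end = coords.split("::")
    return int(doc_id), int(begin), int(end)

def chunk_text(text, coords):
    """Recover the chunk from the source text; the end offset is inclusive."""
    _, begin, end = parse_coords(coords)
    return text[begin:end + 1]

sample = "a hard paratesticular mass"
recovered = chunk_text(sample, "0::7::20")
print(parse_coords("0::7::20"), recovered)
```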


In [25]:
snomed

index,coords,chunk,entity,snomed_conf,snomed_opts
0,0::123::147,a large left scrotal mass,PROBLEM,0.0750,"[(15634751000119101, Mass of left ovary), (12240181000119103, Mass in left breast), (10682191000119102, Mass of skin of left foot), (10692461000119101, Mass of skin of left thumb), (10682271000119109, Mass of skin of left hand), (10680231000119105, Mass of skin of left lower leg), (10682391000119100, Mass of subcutaneous tissue of left foot), (10682311000119109, Mass of skin of left lower limb), (10692501000119101, Mass of subcutaneous tissue of left thumb), (10682431000119105, Mass of subcutaneous tissue of left lower extremity), (10679791000119100, Mass of subcutaneous tissue of left upper limb), (312355005, On examination - left lower abdominal mass), (457311000124106, Mass in central portion of left breast), (10682471000119108, Mass of subcutaneous tissue of left lower leg), (28640001000004103, Mass of left lower quadrant of abdominal wall), (39261000119104, Left lower quadrant abdominal swelling, mass, or lump), (53929009, Mass of scrotum), (12240221000119106, Mass in right br..."
1,0::192::212,left scrotal swelling,PROBLEM,0.1953,"[(390919006, On examination - left scrotal swelling), (12242351000119109, Swelling of left arm), (442648006, Swelling of left tonsil), (762916009, Swelling of left foot), (10692081000119100, Localised swelling of left thumb), (10679271000119108, Localised swelling of left forearm), (10678471000119104, Localised swelling of left foot), (390918003, On examination - right scrotal swelling), (10678711000119102, Localised swelling of left lower leg), (438457000, Swelling of testicle), (60728008, Swelling of abdomen), (441974004, Swelling of buttock), (271687003, Swelling of scrotum), (762915008, Swelling of right foot), (12242391000119104, Swelling of right arm), (15633761000119102, Swelling of bilateral extremities), (442664003, Swelling of right tonsil), (10678431000119102, Localised swelling of right foot), (10692131000119100, Localised swelling of right thumb), (10679191000119101, Localised swelling of right forearm), (10678631000119104, Localised swelling of right lower leg), (3014..."
2,0::223::253,progressively increased in size,PROBLEM,0.3513,"[(15454001, Increased size), (88673001, Increased size of penis), (249771001, Finding of change in ring size), (248824009, Nipples unequal in size (finding)), (248825005, Areolas unequal in size (finding)), (53311008, Normal variation in size (finding)), (251809001, Increased size of penis in breadth and development of glans, scrotum enlarged and darkened), (19167000, Inequality in size of kidneys (finding)), (19776001, Decreased size), (129725004, Decrease in size since previous mammogram (finding)), (129724000, Increase in size since previous mammogram (finding)), (276333003, Reduced size of penis), (251810006, Genitalia adult in size and shape, testes average size 15cm volume), (428586009, Heart rate increased, within normal range), (61515005, Abnormal increase in number (finding)), (247011006, Pupil size and shape normal (finding)), (396503002, pT3b: Tumor more than 5 cm in greatest dimension, limited to dermis and greater than 2 mm in thickness, but not more than 6 mm in thick..."
3,0::279::300,mild left scrotal pain,PROBLEM,0.1117,"[(722829006, Acute scrotal pain), (316851000119102, Pain of left wrist), (316751000119107, Pain in left foot), (316761000119109, Pain of left forearm), (316821000119105, Pain of left thigh), (287047008, Pain in left leg), (1076751000119100, Pain in left thumb), (1076811000119109, Pain of left heel), (16002911000119108, Chronic pain of left foot), (316801000119101, Pain of left lower leg), (16675301000119100, Pain of left testicle), (285387005, Left sided abdominal pain), (16442141000119109, Periorbital pain of left eye), (301368006, Left hypochondrial pain), (274278000, Complaining of left iliac fossa pain), (20502007, Pain of scrotum), (301367001, Right hypochondrial pain), (316961000119107, Pain of right thigh), (16675251000119106, Pain of bilateral testicles), (16675201000119107, Pain of right testicle), (1076791000119105, Pain in right heel), (287048003, Pain in right leg), (16002871000119105, Chronic pain of right foot), (15749801000119104, Bilateral chronic pain of feet), (30..."
4,0::329::345,mild constipation,PROBLEM,0.0858,"[(197118003, Functional constipation), (40196000, Mild pain), (70997004, Mild anxiety), (111360009, Intractable constipation), (304213008, Mild pyrexia), (427679007, Mild intermittent asthma), (301380003, Mild present pain), (426979002, Mild persistent asthma), (58230007, Intermittent constipation pattern), (444851000124109, Very mild pain), (331987008, Mild dietary indigestion), (10676071000119109, Mild persistent asthma controlled), (31499008, Chronic constipation with overflow), (10676191000119106, Mild persistent asthma uncontrolled), (707511009, Uncomplicated mild persistent asthma), (191973007, Psychogenic constipation (disorder)), (707445000, Exacerbation of mild persistent asthma), (276646003, Idiopathic infantile hypercalcaemia - mild form), (765756007, Benign infantile seizure with mild gastroenteritis syndrome), (707981009, Acute severe exacerbation of mild persistent asthma), (275297005, Diarrhea and vomiting, symptom), (409587002, Severe diarrhea), (76948002, Severe pa..."
5,0::353::363,hard stools,PROBLEM,0.9596,"[(75295004, Hard stools), (398032003, Loose stools), (35064005, Black stools), (160588008, Drinker of hard liquor), (70396004, Clay-colored stools), (18425006, Passage of rice water stools), (160590009, Drinks beer and hard liquor), (306776007, Does swallow soft foods), (306775006, Unable to swallow soft foods), (2901004, Black, tarry stool), (269899009, Faeces colour: tarry), (249626001, Pale faeces symptom), (449201000124107, Creamy stool), (27731006, Soft stool), (271865009, Pus in stool), (449191000124109, Frothy stool), (449181000124106, Seedy stool), (306774005, Able to swallow soft foods)]"
6,0::396::413,urinary complaints,PROBLEM,0.0796,"[(249274008, Urinary symptoms), (170877009, Urinary symptom change), (15803009, Urinary bladder pain), (301395001, Urinary tract tenderness), (247382002, Urinary tract pain), (129853007, Total urinary incontinence), (129847007, Functional urinary incontinence), (450841000, Intermittent urinary incontinence), (236667007, Psychogenic urinary incontinence), (461191000124104, Daily urinary incontinence), (236665004, Postural urinary incontinence), (5972002, Urinary hesitation), (236666003, Dependency urinary incontinence), (41368006, Disorder of urinary tract), (16580691000119107, Chronic urinary bladder pain), (236717007, Upper urinary tract hematuria), (444620007, Male urinary stress incontinence), (307541003, Lower urinary tract symptoms), (42643001, Disorder of urinary bladder), (60241006, Female urinary stress incontinence), (129692003, Risk for urge urinary incontinence), (165232002, Incontinence of urine), (87557004, Urge incontinence of urine), (236663006, Orgasmic incontinence..."
7,0::441::466,a hard paratesticular mass,PROBLEM,0.1410,"[(6370001000004104, Mass of hard palate), (102031000119109, Paratesticular mass (disorder)), (163292001, On examination - abdominal mass - hard), (163293006, On examination - abdominal mass-very hard), (6480001000004101, Mass of pleura), (53929009, Mass of scrotum), (289477004, Mass of vulva), (87860000, Mass of testicle), (94147001, Mass of mediastinum), (69559004, Mass of retroperitoneal structure), (126806005, Neoplasm of hard palate), (444905003, Mass of soft tissue), (92129006, Benign tumor of hard palate), (94324007, Secondary malignant neoplasm of hard palate), (163291008, On examination - abdominal mass - soft), (163485008, On examination - breast lump hard), (274748008, Localised swelling, mass and lump, trunk), (274747003, Localised swelling, mass and lump, neck), (274750000, Localised swelling, mass and lump, upper limb), (187666008, Malignant neoplasm of junction of hard and soft palate), (188023004, Malignant neoplasm of connective and soft tissue of sacrum or coccyx),..."
8,0::621::673,"a hard, lower abdominal mass in the suprapubic region",PROBLEM,0.0922,"[(39261000119104, Left lower quadrant abdominal swelling, mass, or lump (finding)), (163307004, On examination - abdominal mass - lower border defined), (312355005, On examination - left lower abdominal mass (finding)), (28640001000004103, Mass of left lower quadrant of abdominal wall (disorder)), (274745006, Localized swelling, mass and lump, lower limb (finding)), (274719002, Intra-abdominal and pelvic swelling, mass and lump (finding)), (438512007, Abdominal rigidity of periumbilical region (finding)), (11718111000119106, Mass of soft tissue of left lower limb (finding)), (11718061000119105, Mass of soft tissue of right lower limb (finding)), (289352001, Ballottement of fetal head in suprapubic area (finding)), (10682951000119109, Mass of subcutaneous tissue of right lower leg (finding)), (274747003, Localized swelling, mass and lump, neck (finding)), (16027031000119107, Mass of soft tissue of bilateral lower limbs (finding)), (163282009, On examination - abdominal mass fills ha..."
9,0::1076::1103,a testicular germ cell tumor,PROBLEM,0.1045,"[(237059008, Germ cell tumor of ovary), (713577007, Germ cell tumor of testis), (773283006, Cervical malignant germ cell tumor), (770686005, Vaginal germ cell malignant tumor), (713646001, Malignant germ cell tumor of testis), (254869000, Malignant germ cell tumor of ovary), (107691000119101, Nonseminomatous germ cell tumor of testis), (254873002, Benign germ cell tumor of ovary), (429565004, Germ cell tumor of the brain), (10737861000119101, Malignant germ cell tumor of right ovary), (10737911000119105, Malignant germ cell tumor of left ovary), (277508009, Pineal germ cell tumour), (278055006, Malignant Leydig cell tumor of testis), (702405001, Malignant granulosa cell tumor of testis), (278057003, Sertoli cell tumor of testis), (67871000119105, Leydig cell neoplasm of testis)]"
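
The last column of the table above is a stringified list of `(code, description)` candidates, and `pd.set_option("display.max_colwidth", 1000)` cuts long lists off with `...`. The sketch below shows one way to turn such a cell back into Python tuples. `parse_candidates` is a hypothetical helper, not part of Spark NLP. It assumes the text has exactly the shape shown above: numeric codes, and entries separated by `), (`. The two sample strings are abridged excerpts of rows 5 and 1.

```python
import re

# Boundary between two candidates: the ", " that sits between ")" and "(<digit>".
# Assumption: no description contains the sequence "), (<digit>".
_BOUNDARY = re.compile(r"(?<=\)), (?=\(\d)")
# One complete candidate: "(<numeric code>, <description>)".
_ENTRY = re.compile(r"\((\d+), (.*)\)", re.DOTALL)


def parse_candidates(cell):
    """Parse a displayed candidate list into (code, description) pairs.

    Returns (pairs, truncated). When the cell ends with "...", the final
    entry may be incomplete, so it is always dropped.
    """
    text = cell.strip()
    truncated = text.endswith("...")
    if truncated:
        text = text[:-3]
    if text.startswith("["):
        text = text[1:]
    if not truncated and text.endswith("]"):
        text = text[:-1]

    chunks = _BOUNDARY.split(text) if text else []
    if truncated and chunks:
        chunks = chunks[:-1]  # conservative: the last chunk may be cut mid-entry

    pairs = []
    for chunk in chunks:
        match = _ENTRY.fullmatch(chunk)
        if match:
            # Codes stay strings: long SNOMED identifiers are labels, not quantities.
            pairs.append((match.group(1), match.group(2)))
    return pairs, truncated


# Abridged excerpt of row 5 ("hard stools"), a complete list.
complete = "[(75295004, Hard stools), (398032003, Loose stools), (2901004, Black, tarry stool)]"
# Abridged excerpt of row 1 ("left scrotal swelling"), cut off by the display width.
cut_off = "[(390919006, On examination - left scrotal swelling), (271687003, Swelling of scrotum), (3014..."

for sample in (complete, cut_off):
    pairs, truncated = parse_candidates(sample)
    print(len(pairs), "candidates parsed; truncated =", truncated)
```

Splitting on the entry boundary rather than on every comma keeps descriptions such as `Black, tarry stool` and `Nipples unequal in size (finding)` intact. To analyze the full candidate lists, it is safer to read them from the underlying DataFrame than from the truncated display text.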
