![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/05.8.Resolving_Medical_Terms_to_Terminology_Codes_Directly.ipynb)

In this notebook, you will learn how to speed up the process of getting `SentenceEntityResolverModel` outputs. As the first step, we will extract the named entities related to the resolver model's concept. Then we will create a 3-stage pipeline with the resolver model and get the resolutions of the extracted entities.

## Colab Setup

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, medical

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
nlp.settings.enforce_versions=True
nlp.install(refresh_install=True)

In [None]:
from johnsnowlabs import nlp, medical
import functools
import pandas as pd
import numpy as np
from scipy import spatial
from pyspark.sql.types import StringType
import pyspark.sql.functions as F

# Automatically load license data and start a session with all jars user has access to
spark = nlp.start()

In [5]:
spark

## Healthcare NLP for Data Scientists Course

If you are not familiar with the components in this notebook, you can check the [Healthcare NLP for Data Scientists Udemy Course](https://www.udemy.com/course/healthcare-nlp-for-data-scientists/) and the [MOOC Notebooks](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP) for each component.

## Data

In this example, we will use the MT Samples dataset to extract entities and map them to their corresponding ICD-10-CM codes.

In [6]:
# Downloading sample datasets.
! wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/data/mt_samples_10.csv

In [7]:
mt_samples_df = spark.read.csv("mt_samples_10.csv", header=True, multiLine=True)

mt_samples_df.show()

+-----+--------------------+
|index|                text|
+-----+--------------------+
|    0|Sample Type / Med...|
|    1|Sample Type / Med...|
|    2|Sample Type / Med...|
|    3|Sample Type / Med...|
|    4|Sample Type / Med...|
|    5|Sample Type / Med...|
|    6|Sample Type / Med...|
|    7|Sample Type / Med...|
|    8|Sample Type / Med...|
|    9|Sample Type / Med...|
+-----+--------------------+



Let's check what the data looks like.

In [8]:
print(mt_samples_df.limit(1).collect()[0]['text'])

Sample Type / Medical Specialty:
Hematology - Oncology
Sample Name:
Discharge Summary - Mesothelioma - 1
Description:
Mesothelioma, pleural effusion, atrial fibrillation, anemia, ascites, esophageal reflux, and history of deep venous thrombosis.
(Medical Transcription Sample Report)
PRINCIPAL DIAGNOSIS:
Mesothelioma.
SECONDARY DIAGNOSES:
Pleural effusion, atrial fibrillation, anemia, ascites, esophageal reflux, and history of deep venous thrombosis.
PROCEDURES
1. On August 24, 2007, decortication of the lung with pleural biopsy and transpleural fluoroscopy.
2. On August 20, 2007, thoracentesis.
3. On August 31, 2007, Port-A-Cath placement.
HISTORY AND PHYSICAL:
The patient is a 41-year-old Vietnamese female with a nonproductive cough that started last week. She has had right-sided chest pain radiating to her back with fever starting yesterday. She has a history of pericarditis and pericardectomy in May 2006 and developed cough with right-sided chest pain, and went to an urgent care cen

## Clinical NER Pipeline (with pretrained models)

The entities we feed to the resolver model should be related to its concept, so we need a robust NER pipeline for entity extraction. Here we just use the `ner_jsl` model, which has more than 80 different labels, and keep only the labels related to the ICD-10-CM concept. You can also use several NER models together in the same pipeline and merge their results into a single chunk column with the `ChunkMergeApproach` annotator.

To speed up the process, we will use the `.transform` method for entity extraction. This way, we can repartition the dataset according to the resources we have and get the results faster.

In [9]:
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

jsl_ner = medical.NerModel.pretrained("ner_jsl", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("clinical_ner")

jsl_ner_converter = medical.NerConverterInternal() \
    .setInputCols(["sentence", "token", "clinical_ner"]) \
    .setOutputCol("clinical_ner_chunk")\
    .setWhiteList(['Cerebrovascular_Disease',
                   'Communicable_Disease', 'Diabetes',
                   'Disease_Syndrome_Disorder',
                   'EKG_Findings', 'Heart_Disease',
                   'Hyperlipidemia', 'Hypertension',
                   'ImagingFindings', 'Injury_or_Poisoning',
                   'Kidney_Disease', 'Obesity', 'Oncological',
                   'Overweight', 'Pregnancy',
                   'Psychological_Condition', 'Symptom', 'VS_Finding'])

jsl_ner_pipeline = nlp.Pipeline(
    stages=[
      documentAssembler,
      sentenceDetector,
      tokenizer,
      word_embeddings,
      jsl_ner,
      jsl_ner_converter])

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_jsl download started this may take some time.
Approximate size to download 14.5 MB
[OK!]


Now we will fit and transform our data on the NER pipeline.

In [10]:
result = jsl_ner_pipeline.fit(mt_samples_df).transform(mt_samples_df)
result = result.withColumnRenamed("index", "doc_id")
result.show()

+------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|doc_id|                text|            document|            sentence|               token|          embeddings|        clinical_ner|  clinical_ner_chunk|
+------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|     0|Sample Type / Med...|[{document, 0, 54...|[{document, 0, 53...|[{token, 0, 5, Sa...|[{word_embeddings...|[{named_entity, 0...|[{chunk, 88, 99, ...|
|     1|Sample Type / Med...|[{document, 0, 32...|[{document, 0, 53...|[{token, 0, 5, Sa...|[{word_embeddings...|[{named_entity, 0...|[{chunk, 344, 363...|
|     2|Sample Type / Med...|[{document, 0, 42...|[{document, 0, 53...|[{token, 0, 5, Sa...|[{word_embeddings...|[{named_entity, 0...|[{chunk, 68, 73, ...|
|     3|Sample Type / Med...|[{document, 0, 20...|[{document, 0,

We need the detected entities as a list for the next step, so we will explode the dataset and convert it to a Pandas dataframe.

In [11]:
ner_result_df = result.select("doc_id", F.explode(F.arrays_zip(result.clinical_ner_chunk.result,
                                     result.clinical_ner_chunk.begin,
                                     result.clinical_ner_chunk.end,
                                     result.clinical_ner_chunk.metadata)).alias("cols"))\
        .select("doc_id", F.expr("cols['0']").alias("chunk"),
                F.expr("cols['1']").alias("begin"),
                F.expr("cols['2']").alias("end"),
                F.expr("cols['3']['entity']").alias("entity"),
                F.expr("cols['3']['ner_source']").alias("ner_source")).toPandas()
ner_result_df

Unnamed: 0,doc_id,chunk,begin,end,entity,ner_source
0,0,Mesothelioma,88,99,Oncological,clinical_ner_chunk
1,0,Mesothelioma,118,129,Oncological,clinical_ner_chunk
2,0,pleural effusion,132,147,Disease_Syndrome_Disorder,clinical_ner_chunk
3,0,atrial fibrillation,150,168,Heart_Disease,clinical_ner_chunk
4,0,anemia,171,176,Disease_Syndrome_Disorder,clinical_ner_chunk
...,...,...,...,...,...,...
297,9,ductal carcinoma of the breast,593,622,Oncological,clinical_ner_chunk
298,9,axillary adenopathy,846,864,Symptom,clinical_ner_chunk
299,9,lesion,905,910,Symptom,clinical_ner_chunk
300,9,wound,1673,1677,Symptom,clinical_ner_chunk
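To make the `F.explode(F.arrays_zip(...))` step more concrete, here is a minimal plain-Python sketch of the same reshaping, without Spark. The `doc` dictionary and the `explode_chunks` helper are hypothetical names used only for illustration; the values are taken from rows of the table above.

```python
# Plain-Python analogue of F.explode(F.arrays_zip(...)); no Spark required.
# `doc` and `explode_chunks` are illustrative names, not part of Spark NLP.
doc = {
    "doc_id": 0,
    "result": ["Mesothelioma", "pleural effusion"],
    "begin": [88, 132],
    "end": [99, 147],
    "metadata": [
        {"entity": "Oncological", "ner_source": "clinical_ner_chunk"},
        {"entity": "Disease_Syndrome_Disorder", "ner_source": "clinical_ner_chunk"},
    ],
}


def explode_chunks(doc):
    """Turn one document row holding parallel arrays into one row per chunk."""
    rows = []
    zipped = zip(doc["result"], doc["begin"], doc["end"], doc["metadata"])
    for chunk, begin, end, meta in zipped:
        rows.append({
            "doc_id": doc["doc_id"],
            "chunk": chunk,
            "begin": begin,
            "end": end,
            "entity": meta["entity"],
            "ner_source": meta["ner_source"],
        })
    return rows


rows = explode_chunks(doc)
for row in rows:
    print(row)
```

Each element of the zipped arrays becomes its own row, which is why a single document can contribute many rows to `ner_result_df`.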


## Entity Resolution

We will create a 3-stage pipeline with `DocumentAssembler`, `BertSentenceEmbeddings` and `SentenceEntityResolverModel` components. Then we will create a `LightPipeline` and feed the entity list into it.

We have several ICD-10-CM Sentence Entity Resolver models in Spark NLP for Healthcare, trained with different embeddings or on datasets of different sizes. You can [check here](https://nlp.johnsnowlabs.com/models?q=icd10cm&task=Entity+Resolution) and use one of them in your pipeline.

Here we will use the `sbiobertresolve_icd10cm_slim_billable_hcc` resolver model to get the ICD-10-CM codes of the detected entities. It returns the official resolution text within the brackets and also provides the billable and HCC information of the codes in the `all_k_aux_labels` field of the metadata. Each value in this field can be split to get further details: `billable status || hcc status || hcc score`. For example, an `all_k_aux_labels` value of `1||1||19` means that the `billable status` is 1, the `hcc status` is 1, and the `hcc score` is 19.
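As a minimal sketch of how one such label can be unpacked, the snippet below splits the `1||1||19` example into named fields. The `parse_aux_label` helper is hypothetical, and converting the three parts to integers is an assumption based on the values shown in this notebook.

```python
def parse_aux_label(label):
    """Split 'billable||hcc_status||hcc_score' into named integer fields.

    Assumes all three parts are integers, as in the examples in this notebook.
    """
    billable, hcc_status, hcc_score = label.split("||")
    return {
        "billable": int(billable),
        "hcc_status": int(hcc_status),
        "hcc_score": int(hcc_score),
    }


parsed = parse_aux_label("1||1||19")
print(parsed)
```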

In [12]:
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("chunk")\
    .setOutputCol("ner_chunk")

sbert_embedder = nlp.BertSentenceEmbeddings.pretrained('sbiobert_base_cased_mli', 'en','clinical/models')\
    .setInputCols(["ner_chunk"])\
    .setOutputCol("sentence_embeddings")\
    .setCaseSensitive(False)

icd10cm_resolver = medical.SentenceEntityResolverModel.pretrained("sbiobertresolve_icd10cm_slim_billable_hcc","en", "clinical/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("icd10cm_code")\
    .setDistanceFunction("EUCLIDEAN")

icd10cm_pipelineModel = nlp.PipelineModel(
    stages = [
        documentAssembler,
        sbert_embedder,
        icd10cm_resolver])

sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sbiobertresolve_icd10cm_slim_billable_hcc download started this may take some time.
Approximate size to download 417.9 MB
[OK!]


### LightPipelines

In [13]:
lmodel=nlp.LightPipeline(icd10cm_pipelineModel)

Now we will build a list of unique chunks so that we do not compute BERT embeddings for duplicates. Then we will feed this list to the resolver pipeline.

In [14]:
chunk_list = ner_result_df.chunk.unique().tolist()
len(chunk_list)

210

In [15]:
chunk_list[:10]

['Mesothelioma',
 'pleural effusion',
 'atrial fibrillation',
 'anemia',
 'ascites',
 'esophageal reflux',
 'deep venous thrombosis',
 'Pleural effusion',
 'cough',
 'chest pain']
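Note that the list above contains both `'pleural effusion'` and `'Pleural effusion'`, because `unique()` is case sensitive. Since the embedder is configured with `setCaseSensitive(False)`, and both spellings receive the same code in the results shown later in this notebook, one could presumably deduplicate on a lowercased key as well. The sketch below is an optional variation, not what this notebook does; `dedupe_case_insensitive` is a hypothetical helper.

```python
def dedupe_case_insensitive(chunks):
    """Keep the first spelling seen for each lowercased chunk, preserving order."""
    seen = {}
    for chunk in chunks:
        key = chunk.lower()
        if key not in seen:
            seen[key] = chunk
    return list(seen.values())


chunks = [
    "Mesothelioma",
    "pleural effusion",
    "atrial fibrillation",
    "Pleural effusion",
    "cough",
]
unique_chunks = dedupe_case_insensitive(chunks)
print(unique_chunks)
```

If you deduplicate this way, the later merge back onto the NER results would also have to join on the lowercased chunk rather than on the original spelling.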

In [16]:
%%time
results = lmodel.fullAnnotate(chunk_list)

CPU times: user 4.28 s, sys: 1.31 s, total: 5.58 s
Wall time: 1min 58s


The `%%time` output above shows the elapsed time for running `fullAnnotate` on the unique chunks.

In [17]:
%%time

chunks = [i["icd10cm_code"][0].metadata["target_text"] for i in results]
codes = [i["icd10cm_code"][0].result for i in results]
resolutions = [i["icd10cm_code"][0].metadata["resolved_text"] for i in results]
all_codes = [i["icd10cm_code"][0].metadata["all_k_results"].split(':::') for i in results]
all_resolutions = [i["icd10cm_code"][0].metadata["all_k_resolutions"].split(':::') for i in results]
all_k_aux_labels = [i["icd10cm_code"][0].metadata["all_k_aux_labels"].split(':::') for i in results]

resolver_result_df = pd.DataFrame({'chunk':chunks,
                                   'icd10cm_code':codes,
                                   'resolution':resolutions,
                                   'all_codes':all_codes,
                                   'all_resolutions':all_resolutions,
                                   'all_k_aux_labels':all_k_aux_labels})

resolver_result_df

CPU times: user 88.8 ms, sys: 25.7 ms, total: 114 ms
Wall time: 188 ms


Unnamed: 0,chunk,icd10cm_code,resolution,all_codes,all_resolutions,all_k_aux_labels
0,Mesothelioma,C45,mesothelioma [mesothelioma],"[C45, C45.0, C45.9, C45.1, C45.2, C4A, C7B.1, ...","[mesothelioma [mesothelioma], mesothelioma of ...","[0||0||0, 1||1||9, 1||1||9, 1||1||9, 1||1||9, ..."
1,pleural effusion,J94.0,chylous effusion [chylous effusion],"[J94.0, J91.0, J92, S27.63, J91, R09.1, J81, R...","[chylous effusion [chylous effusion], malignan...","[1||0||0, 1||0||0, 0||0||0, 0||0||0, 0||0||0, ..."
2,atrial fibrillation,I48.1,persistent atrial fibrillation [persistent atr...,"[I48.1, I48.2, I48.0, I48.21, I48, I48.19, I48...",[persistent atrial fibrillation [persistent at...,"[0||0||0, 0||0||0, 1||1||96, 1||1||96, 0||0||0..."
3,anemia,D53.2,scorbutic anemia [scorbutic anemia],"[D53.2, D50, D72.825, D53.0, D74, R43.0, D70, ...","[scorbutic anemia [scorbutic anemia], iron def...","[1||0||0, 0||0||0, 1||0||0, 1||0||0, 0||0||0, ..."
4,ascites,R18,ascites [ascites],"[R18, R06.6, H53.54, L94.6, R14.2, Z67.A2, Q06...","[ascites [ascites], hiccough [hiccough], prota...","[0||0||0, 1||0||0, 1||0||0, 1||0||0, 1||0||0, ..."
...,...,...,...,...,...,...
205,lumps,R06.6,hiccough [hiccough],"[R06.6, R18, R60.0, L02.23, L02.13, L02.43, R4...","[hiccough [hiccough], ascites [ascites], local...","[1||0||0, 0||0||0, 1||0||0, 0||0||0, 1||0||0, ..."
206,bumps,L94.6,ainhum [ainhum],"[L94.6, R06.6, R18, Z67.A2, Q06.0, H53.54, R43...","[ainhum [ainhum], hiccough [hiccough], ascites...","[1||0||0, 1||0||0, 0||0||0, 0||0||0, 1||1||72,..."
207,hepatomegaly,K76.7,hepatorenal syndrome [hepatorenal syndrome],"[K76.7, K76.4, K72.1, D73.2, Q44.6, K76.81, P5...","[hepatorenal syndrome [hepatorenal syndrome], ...","[1||1||27, 1||0||0, 0||0||0, 1||0||0, 1||0||0,..."
208,axillary adenopathy,J35.02,chronic adenoiditis [chronic adenoiditis],"[J35.02, K11.2, L40.2, E71.522, M89.09, K14.0,...","[chronic adenoiditis [chronic adenoiditis], si...","[1||0||0, 0||0||0, 1||0||0, 1||1||23, 1||0||0,..."


As you can see, the NER pipeline detected around 300 chunks, but we only resolved the 210 unique ones. This saved both compute resources and time.

Now let's merge the resolutions with `ner_result_df`.

In [18]:
merged_resolver_df = pd.merge(ner_result_df, resolver_result_df, on="chunk", how="left")
merged_resolver_df.drop(columns=['begin','end','entity','ner_source'], inplace=True)
merged_resolver_df

Unnamed: 0,doc_id,chunk,icd10cm_code,resolution,all_codes,all_resolutions,all_k_aux_labels
0,0,Mesothelioma,C45,mesothelioma [mesothelioma],"[C45, C45.0, C45.9, C45.1, C45.2, C4A, C7B.1, ...","[mesothelioma [mesothelioma], mesothelioma of ...","[0||0||0, 1||1||9, 1||1||9, 1||1||9, 1||1||9, ..."
1,0,Mesothelioma,C45,mesothelioma [mesothelioma],"[C45, C45.0, C45.9, C45.1, C45.2, C4A, C7B.1, ...","[mesothelioma [mesothelioma], mesothelioma of ...","[0||0||0, 1||1||9, 1||1||9, 1||1||9, 1||1||9, ..."
2,0,pleural effusion,J94.0,chylous effusion [chylous effusion],"[J94.0, J91.0, J92, S27.63, J91, R09.1, J81, R...","[chylous effusion [chylous effusion], malignan...","[1||0||0, 1||0||0, 0||0||0, 0||0||0, 0||0||0, ..."
3,0,atrial fibrillation,I48.1,persistent atrial fibrillation [persistent atr...,"[I48.1, I48.2, I48.0, I48.21, I48, I48.19, I48...",[persistent atrial fibrillation [persistent at...,"[0||0||0, 0||0||0, 1||1||96, 1||1||96, 0||0||0..."
4,0,anemia,D53.2,scorbutic anemia [scorbutic anemia],"[D53.2, D50, D72.825, D53.0, D74, R43.0, D70, ...","[scorbutic anemia [scorbutic anemia], iron def...","[1||0||0, 0||0||0, 1||0||0, 1||0||0, 0||0||0, ..."
...,...,...,...,...,...,...,...
297,9,ductal carcinoma of the breast,D05.0,lobular carcinoma in situ of breast [lobular c...,"[D05.0, D05, D05.1, C44.521, C4A.52, C50.6, C4...",[lobular carcinoma in situ of breast [lobular ...,"[0||0||0, 0||0||0, 0||0||0, 1||0||0, 1||1||12,..."
298,9,axillary adenopathy,J35.02,chronic adenoiditis [chronic adenoiditis],"[J35.02, K11.2, L40.2, E71.522, M89.09, K14.0,...","[chronic adenoiditis [chronic adenoiditis], si...","[1||0||0, 0||0||0, 1||0||0, 1||1||23, 1||0||0,..."
299,9,lesion,J63.4,siderosis [siderosis],"[J63.4, R14.2, Z73.82, T75.4, M75, H83.2X, R29...","[siderosis [siderosis], eructation [eructation...","[1||1||112, 1||0||0, 1||0||0, 0||0||0, 0||0||0..."
300,9,wound,S51.8,open wound of forearm [open wound of forearm],"[S51.8, S81.8, S01, S01.0, S61.4, R14.2, L02.4...",[open wound of forearm [open wound of forearm]...,"[0||0||0, 0||0||0, 0||0||0, 0||0||0, 0||0||0, ..."


The `%%time` outputs above show how long it took to return all the results. Additionally, you can split the information in `all_k_aux_labels` into separate `billable`, `hcc_status`, and `hcc_code` columns, where `hcc_code` holds the `hcc score` part of each label.

In [19]:
merged_resolver_df['billable'] = merged_resolver_df['all_k_aux_labels'].apply(lambda x: [i.split('||')[0] for i in x])
merged_resolver_df['hcc_status'] = merged_resolver_df['all_k_aux_labels'].apply(lambda x: [i.split('||')[1] for i in x])
merged_resolver_df['hcc_code'] = merged_resolver_df['all_k_aux_labels'].apply(lambda x: [i.split('||')[2] for i in x])
merged_resolver_df = merged_resolver_df.drop(['all_k_aux_labels'], axis=1)

merged_resolver_df.head(15)

Unnamed: 0,doc_id,chunk,icd10cm_code,resolution,all_codes,all_resolutions,billable,hcc_status,hcc_code
0,0,Mesothelioma,C45,mesothelioma [mesothelioma],"[C45, C45.0, C45.9, C45.1, C45.2, C4A, C7B.1, ...","[mesothelioma [mesothelioma], mesothelioma of ...","[0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, ...","[0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, ...","[0, 9, 9, 9, 9, 0, 8, 0, 11, 12, 0, 8, 9, 10, ..."
1,0,Mesothelioma,C45,mesothelioma [mesothelioma],"[C45, C45.0, C45.9, C45.1, C45.2, C4A, C7B.1, ...","[mesothelioma [mesothelioma], mesothelioma of ...","[0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, ...","[0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, ...","[0, 9, 9, 9, 9, 0, 8, 0, 11, 12, 0, 8, 9, 10, ..."
2,0,pleural effusion,J94.0,chylous effusion [chylous effusion],"[J94.0, J91.0, J92, S27.63, J91, R09.1, J81, R...","[chylous effusion [chylous effusion], malignan...","[1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, ..."
3,0,atrial fibrillation,I48.1,persistent atrial fibrillation [persistent atr...,"[I48.1, I48.2, I48.0, I48.21, I48, I48.19, I48...",[persistent atrial fibrillation [persistent at...,"[0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, ...","[0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, ...","[0, 0, 96, 96, 0, 96, 96, 84, 96, 0, 0, 96, 0,..."
4,0,anemia,D53.2,scorbutic anemia [scorbutic anemia],"[D53.2, D50, D72.825, D53.0, D74, R43.0, D70, ...","[scorbutic anemia [scorbutic anemia], iron def...","[1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 46, 0, 0..."
5,0,ascites,R18,ascites [ascites],"[R18, R06.6, H53.54, L94.6, R14.2, Z67.A2, Q06...","[ascites [ascites], hiccough [hiccough], prota...","[0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 72, 0, 0, 0, 0, 0, 0, 0, 0,..."
6,0,esophageal reflux,K21,gastro-esophageal reflux disease [gastro-esoph...,"[K21, K22.2, K21.0, K22.4, K20.8, K20, T28.6, ...",[gastro-esophageal reflux disease [gastro-esop...,"[0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
7,0,deep venous thrombosis,I81,portal vein thrombosis [portal vein thrombosis],"[I81, I82.72, I82.5, I82.592, I82.722, K64.5, ...",[portal vein thrombosis [portal vein thrombosi...,"[1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, ...","[0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, ...","[0, 0, 0, 108, 108, 0, 0, 108, 108, 108, 0, 0,..."
8,0,Mesothelioma,C45,mesothelioma [mesothelioma],"[C45, C45.0, C45.9, C45.1, C45.2, C4A, C7B.1, ...","[mesothelioma [mesothelioma], mesothelioma of ...","[0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, ...","[0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, ...","[0, 9, 9, 9, 9, 0, 8, 0, 11, 12, 0, 8, 9, 10, ..."
9,0,Pleural effusion,J94.0,chylous effusion [chylous effusion],"[J94.0, J91.0, J92, S27.63, J91, R09.1, J81, R...","[chylous effusion [chylous effusion], malignan...","[1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, ..."
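Once the flags are split out, they can be used to choose among the candidate codes. As a sketch, the hypothetical `first_billable` helper below returns the first candidate whose billable flag is `'1'`, using the leading values of the `Mesothelioma` row above. It assumes the candidate lists are ordered from best to worst match; whether preferring a billable code is appropriate is a decision for your own use case.

```python
def first_billable(codes, billable_flags):
    """Return the first code whose billable flag is '1', or None if there is none.

    The flags are strings because they come from str.split('||').
    """
    for code, flag in zip(codes, billable_flags):
        if flag == "1":
            return code
    return None


# Leading candidates for 'Mesothelioma', copied from the table above.
all_codes = ["C45", "C45.0", "C45.9", "C45.1", "C45.2", "C4A", "C7B.1"]
billable = ["0", "1", "1", "1", "1", "0", "1"]

print(first_billable(all_codes, billable))  # C45.0
```

Here the top-ranked code `C45` is not flagged as billable, so the helper falls through to `C45.0`.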


We can save `merged_resolver_df` as a CSV file (`result_df.csv`) to use it later.

In [20]:
merged_resolver_df.to_csv("result_df.csv", index=False)

In [21]:
merged_resolver_df = pd.read_csv("result_df.csv")  # same relative path that to_csv wrote to above
merged_resolver_df.head(10)

Unnamed: 0,doc_id,chunk,icd10cm_code,resolution,all_codes,all_resolutions,billable,hcc_status,hcc_code
0,0,Mesothelioma,C45,mesothelioma [mesothelioma],"['C45', 'C45.0', 'C45.9', 'C45.1', 'C45.2', 'C...","['mesothelioma [mesothelioma]', 'mesothelioma ...","['0', '1', '1', '1', '1', '0', '1', '0', '1', ...","['0', '1', '1', '1', '1', '0', '1', '0', '1', ...","['0', '9', '9', '9', '9', '0', '8', '0', '11',..."
1,0,Mesothelioma,C45,mesothelioma [mesothelioma],"['C45', 'C45.0', 'C45.9', 'C45.1', 'C45.2', 'C...","['mesothelioma [mesothelioma]', 'mesothelioma ...","['0', '1', '1', '1', '1', '0', '1', '0', '1', ...","['0', '1', '1', '1', '1', '0', '1', '0', '1', ...","['0', '9', '9', '9', '9', '0', '8', '0', '11',..."
2,0,pleural effusion,J94.0,chylous effusion [chylous effusion],"['J94.0', 'J91.0', 'J92', 'S27.63', 'J91', 'R0...","['chylous effusion [chylous effusion]', 'malig...","['1', '1', '0', '0', '0', '1', '0', '1', '1', ...","['0', '0', '0', '0', '0', '0', '0', '0', '0', ...","['0', '0', '0', '0', '0', '0', '0', '0', '0', ..."
3,0,atrial fibrillation,I48.1,persistent atrial fibrillation [persistent atr...,"['I48.1', 'I48.2', 'I48.0', 'I48.21', 'I48', '...",['persistent atrial fibrillation [persistent a...,"['0', '0', '1', '1', '0', '1', '1', '1', '1', ...","['0', '0', '1', '1', '0', '1', '1', '1', '1', ...","['0', '0', '96', '96', '0', '96', '96', '84', ..."
4,0,anemia,D53.2,scorbutic anemia [scorbutic anemia],"['D53.2', 'D50', 'D72.825', 'D53.0', 'D74', 'R...","['scorbutic anemia [scorbutic anemia]', 'iron ...","['1', '0', '1', '1', '0', '1', '0', '0', '0', ...","['0', '0', '0', '0', '0', '0', '0', '0', '0', ...","['0', '0', '0', '0', '0', '0', '0', '0', '0', ..."
5,0,ascites,R18,ascites [ascites],"['R18', 'R06.6', 'H53.54', 'L94.6', 'R14.2', '...","['ascites [ascites]', 'hiccough [hiccough]', '...","['0', '1', '1', '1', '1', '0', '1', '0', '1', ...","['0', '0', '0', '0', '0', '0', '1', '0', '0', ...","['0', '0', '0', '0', '0', '0', '72', '0', '0',..."
6,0,esophageal reflux,K21,gastro-esophageal reflux disease [gastro-esoph...,"['K21', 'K22.2', 'K21.0', 'K22.4', 'K20.8', 'K...",['gastro-esophageal reflux disease [gastro-eso...,"['0', '1', '0', '1', '0', '0', '0', '0', '1', ...","['0', '0', '0', '0', '0', '0', '0', '0', '0', ...","['0', '0', '0', '0', '0', '0', '0', '0', '0', ..."
7,0,deep venous thrombosis,I81,portal vein thrombosis [portal vein thrombosis],"['I81', 'I82.72', 'I82.5', 'I82.592', 'I82.722...",['portal vein thrombosis [portal vein thrombos...,"['1', '0', '0', '1', '1', '1', '0', '1', '1', ...","['0', '0', '0', '1', '1', '0', '0', '1', '1', ...","['0', '0', '0', '108', '108', '0', '0', '108',..."
8,0,Mesothelioma,C45,mesothelioma [mesothelioma],"['C45', 'C45.0', 'C45.9', 'C45.1', 'C45.2', 'C...","['mesothelioma [mesothelioma]', 'mesothelioma ...","['0', '1', '1', '1', '1', '0', '1', '0', '1', ...","['0', '1', '1', '1', '1', '0', '1', '0', '1', ...","['0', '9', '9', '9', '9', '0', '8', '0', '11',..."
9,0,Pleural effusion,J94.0,chylous effusion [chylous effusion],"['J94.0', 'J91.0', 'J92', 'S27.63', 'J91', 'R0...","['chylous effusion [chylous effusion]', 'malig...","['1', '1', '0', '0', '0', '1', '0', '1', '1', ...","['0', '0', '0', '0', '0', '0', '0', '0', '0', ...","['0', '0', '0', '0', '0', '0', '0', '0', '0', ..."
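Notice that after the round trip through CSV, the list columns come back as plain strings (the quoted items in the output above). A minimal sketch of this effect and one way to undo it, using only the standard library, is shown below; the in-memory buffer stands in for the CSV file, and the three codes are the leading candidates of the `Mesothelioma` row.

```python
import ast
import csv
import io

# Write a row whose "all_codes" cell holds a list rendered as text,
# which is how a list-valued cell ends up in a CSV file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["chunk", "all_codes"])
writer.writerow(["Mesothelioma", str(["C45", "C45.0", "C45.9"])])

# Read it back: the cell is now a string, not a list.
buffer.seek(0)
row = next(csv.DictReader(buffer))
raw = row["all_codes"]
print(type(raw).__name__, raw)

# ast.literal_eval safely parses the text back into a Python list.
codes = ast.literal_eval(raw)
print(type(codes).__name__, codes)
```

With pandas, the same parsing can be applied while loading, for example by passing `converters={"all_codes": ast.literal_eval}` to `pd.read_csv`, and likewise for the other list columns.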
