![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.1.Healthcare_Code_Mapping.ipynb)

In [None]:
import json

from google.colab import files

license_keys = files.upload()

with open(list(license_keys.keys())[0]) as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)

# Adding license key-value pairs to environment variables
import os
os.environ.update(license_keys)

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

In [None]:
import json
import os
import sparknlp_jsl
import sparknlp
from pyspark.ml import Pipeline, PipelineModel
from sparknlp.pretrained import PretrainedPipeline
from pyspark.sql import SparkSession
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.util import *
from sparknlp_jsl.annotator import *

params = {"spark.driver.memory":"16G",
"spark.kryoserializer.buffer.max":"2000M",
"spark.driver.maxResultSize":"2000M"}

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

print(sparknlp.version())
print(sparknlp_jsl.version())

3.3.0
3.3.0


# HEALTHCARE CODES MAPPING BY USING PRETRAINED PIPELINES

In [None]:
from sparknlp.pretrained import PretrainedPipeline

## 1. ICD10CM to SNOMED Code Mapping

This pretrained pipeline maps ICD10CM codes to SNOMED codes without using any text data. Pass a comma- or whitespace-delimited string of ICD10CM codes and it returns the corresponding SNOMED codes as a list. It currently supports 132K SNOMED codes, and coverage will be extended in future releases.
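When the codes live in a Python list rather than a string, they need to be joined before being passed to `annotate`. A minimal sketch, assuming whitespace is used as the delimiter; `to_pipeline_input` is a hypothetical helper name, not part of Spark NLP:

```python
def to_pipeline_input(codes):
    """Join an iterable of code strings into one whitespace-delimited string.

    Hypothetical helper: strips surrounding whitespace and skips empty entries.
    """
    cleaned = (code.strip() for code in codes)
    return " ".join(code for code in cleaned if code)


text = to_pipeline_input(["M89.50", " I288", "", "H16269 "])
print(text)  # M89.50 I288 H16269
```

The resulting string has the same shape as the input used in the `annotate` call in this section.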

In [None]:
icd10_snomed_pipeline = PretrainedPipeline("icd10cm_snomed_mapping", "en", "clinical/models")

icd10cm_snomed_mapping download started this may take some time.
Approx size to download 514.5 KB
[OK!]


In [None]:
icd10_snomed_pipeline.model.stages

[DocumentAssembler_effe917bc86b,
 REGEX_TOKENIZER_a2e7a20a20d4,
 LEMMATIZER_0ca0f7005a90,
 Finisher_07470acb09e3]

In [None]:
icd10_snomed_pipeline.annotate('M89.50 I288 H16269')

{'icd10cm': ['M89.50', 'I288', 'H16269'],
 'snomed': ['733187009', '449433008', '51264003']}

|**ICD10CM** | **Details** | 
| ---------- | -----------:|
| M89.50 |  Osteolysis, unspecified site |
| I288 | Other diseases of pulmonary vessels |
| H16269 | Vernal keratoconjunctivitis, with limbar and corneal involvement, unspecified eye |

| **SNOMED** | **Details** |
| ---------- | -----------:|
| 733187009 | Osteolysis following surgical procedure on skeletal system |
| 449433008 | Diffuse stenosis of left pulmonary artery |
| 51264003 | Limbal AND/OR corneal involvement in vernal conjunctivitis |
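
The `annotate` output is a dictionary of parallel lists, so pairing each input code with its mapped code takes one more step. A minimal sketch that works on the dictionary shown above without a Spark session; `pair_codes` is a hypothetical helper, and it assumes both lists have the same length and order:

```python
def pair_codes(result, source_key, target_key):
    """Return {source_code: target_code} from a dict of parallel lists."""
    sources = result[source_key]
    targets = result[target_key]
    if len(sources) != len(targets):
        raise ValueError("source and target lists differ in length")
    return dict(zip(sources, targets))


# Output copied from the annotate call above.
result = {
    "icd10cm": ["M89.50", "I288", "H16269"],
    "snomed": ["733187009", "449433008", "51264003"],
}

icd10_to_snomed = pair_codes(result, "icd10cm", "snomed")
print(icd10_to_snomed["I288"])  # 449433008
```

Because the result is a plain `dict`, a code that appears twice in the input collapses to a single entry.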

## 2. SNOMED to ICD10CM Code Mapping

This pretrained pipeline maps SNOMED codes to ICD10CM codes without using any text data. Pass a comma- or whitespace-delimited string of SNOMED codes and it returns the corresponding candidate ICD10CM codes as a list, where each SNOMED code may map to several ICD10CM codes. It currently supports 132K SNOMED codes and 30K ICD10CM codes, and coverage will be extended in future releases.

In [None]:
snomed_icd10_pipeline = PretrainedPipeline("snomed_icd10cm_mapping","en","clinical/models")

snomed_icd10cm_mapping download started this may take some time.
Approx size to download 1.8 MB
[OK!]


In [None]:
snomed_icd10_pipeline.model.stages

[DocumentAssembler_136f968cb1ef,
 REGEX_TOKENIZER_ecc8d3a8dbc9,
 LEMMATIZER_e9ae88d69d05,
 Finisher_790dd28aacd1]

In [None]:
snomed_icd10_pipeline.annotate('733187009 449433008 51264003')

{'icd10cm': ['M89.59, M89.50, M96.89',
  'Q25.6, I28.8',
  'H10.45, H10.1, H16.269'],
 'snomed': ['733187009', '449433008', '51264003']}

| **SNOMED** | **Details** |
| ------ | ------:|
| 733187009| Osteolysis following surgical procedure on skeletal system |
| 449433008 | Diffuse stenosis of left pulmonary artery |
| 51264003 | Limbal AND/OR corneal involvement in vernal conjunctivitis|

| **ICD10CM** | **Details** |
| ---------- | ---------:|
| M89.59 | Osteolysis, multiple sites |  
| M89.50 | Osteolysis, unspecified site |
| M96.89 | Other intraoperative and postprocedural complications and disorders of the musculoskeletal system | 
| Q25.6 | Stenosis of pulmonary artery |    
| I28.8 | Other diseases of pulmonary vessels |
| H10.45 | Other chronic allergic conjunctivitis |
| H10.1 | Acute atopic conjunctivitis | 
| H16.269 | Vernal keratoconjunctivitis, with limbar and corneal involvement, unspecified eye |
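
Because each SNOMED code here maps to a comma-separated string of candidates, the result usually needs splitting before use. The sketch below does that on the output shown above, then checks whether the ICD10CM codes from section 1 reappear among the candidates. `split_candidates` and `normalize` are hypothetical helpers; dropping the dot before comparing is an assumption made so that `I288` and `I28.8` compare equal.

```python
def split_candidates(result, source_key="snomed", target_key="icd10cm"):
    """Map each source code to a list of candidate target codes."""
    return {
        source: [c.strip() for c in candidates.split(",") if c.strip()]
        for source, candidates in zip(result[source_key], result[target_key])
    }


def normalize(code):
    """Drop the dot and upper-case so 'I28.8' and 'I288' compare equal (assumption)."""
    return code.replace(".", "").upper()


# Output copied from the annotate call above.
result = {
    "icd10cm": ["M89.59, M89.50, M96.89", "Q25.6, I28.8", "H10.45, H10.1, H16.269"],
    "snomed": ["733187009", "449433008", "51264003"],
}
candidates = split_candidates(result)
print(candidates["449433008"])  # ['Q25.6', 'I28.8']

# ICD10CM -> SNOMED pairs taken from the output in section 1.
forward = {"M89.50": "733187009", "I288": "449433008", "H16269": "51264003"}
round_trip = {
    icd: normalize(icd) in {normalize(c) for c in candidates[snomed]}
    for icd, snomed in forward.items()
}
print(round_trip)  # {'M89.50': True, 'I288': True, 'H16269': True}
```

For these three codes the round trip recovers the original ICD10CM code among the candidates, although it is not the only candidate in any of the three cases.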

## 3. ICD10CM to UMLS Code Mapping

This pretrained pipeline maps ICD10CM codes to UMLS codes without using any text data. Pass a whitespace-delimited string of ICD10CM codes and it returns the corresponding UMLS codes as a list. If a code has no mapping, the original code is returned unchanged.

In [None]:
icd10_umls_pipeline = PretrainedPipeline( "icd10cm_umls_mapping","en","clinical/models")

icd10cm_umls_mapping download started this may take some time.
Approx size to download 897.8 KB
[OK!]


In [None]:
icd10_umls_pipeline.model.stages

[DocumentAssembler_321db079dcc3,
 REGEX_TOKENIZER_cfa82a0b8d92,
 LEMMATIZER_da9a62c0c58e,
 Finisher_cd27b2ac8b2c]

In [None]:
icd10_umls_pipeline.annotate("M89.50 R82.2 R09.01")

{'icd10cm': ['M89.50', 'R82.2', 'R09.01'],
 'umls': ['C4721411', 'C0159076', 'C0004044']}

|**ICD10CM** | **Details** | 
| ---------- | -----------:|
| M89.50 |  Osteolysis, unspecified site |
| R82.2 | Biliuria |
| R09.01 | Asphyxia |

| **UMLS** | **Details** |
| ---------- | -----------:|
| C4721411 | osteolysis |
| C0159076 | Biliuria |
| C0004044 | Asphyxia |
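
Since an unmapped code is described as coming back as the original code, one way to spot misses is to compare each input with its output. A minimal sketch under that assumption; `unmapped_codes` is a hypothetical helper, and the exact form of a miss in the pipeline output is not shown in this notebook, so treat the equality check as a guess to verify against real output.

```python
def unmapped_codes(result, source_key, target_key):
    """Return source codes whose target equals the source (assumed to mean 'no mapping')."""
    return [
        source
        for source, target in zip(result[source_key], result[target_key])
        if source == target
    ]


# Output copied from the annotate call above; every code was mapped.
result = {
    "icd10cm": ["M89.50", "R82.2", "R09.01"],
    "umls": ["C4721411", "C0159076", "C0004044"],
}
print(unmapped_codes(result, "icd10cm", "umls"))  # []
```

The same check applies to the other UMLS and MeSH pipelines in this notebook by changing the two key names.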

## 4. SNOMED to UMLS Code Mapping

This pretrained pipeline maps SNOMED codes to UMLS codes without using any text data. Pass a whitespace-delimited string of SNOMED codes and it returns the corresponding UMLS codes as a list. If a code has no mapping, the original code is returned unchanged.

In [None]:
snomed_umls_pipeline = PretrainedPipeline( "snomed_umls_mapping","en","clinical/models")

snomed_umls_mapping download started this may take some time.
Approx size to download 4.6 MB
[OK!]


In [None]:
snomed_umls_pipeline.model.stages

[DocumentAssembler_b2ae33f2655e,
 REGEX_TOKENIZER_d517b2cf8024,
 LEMMATIZER_42d9736d37e4,
 Finisher_a8f2c3917be5]

In [None]:
snomed_umls_pipeline.annotate('733187009 449433008 51264003')

{'snomed': ['733187009', '449433008', '51264003'],
 'umls': ['C4546029', 'C3164619', 'C0271267']}

|**SNOMED** | **Details** | 
| ---------- | -----------:|
| 733187009 | Osteolysis following surgical procedure on skeletal system |
| 449433008 | Diffuse stenosis of left pulmonary artery |
| 51264003 | Limbal AND/OR corneal involvement in vernal conjunctivitis |

| **UMLS** | **Details** |
| ---------- | -----------:|
| C4546029 | osteolysis following surgical procedure on skeletal system |
| C3164619 | diffuse stenosis of left pulmonary artery |
| C0271267 | limbal and/or corneal involvement in vernal conjunctivitis |

## 5. RXNORM to UMLS Code Mapping

This pretrained pipeline maps RxNorm codes to UMLS codes without using any text data. Pass a whitespace-delimited string of RxNorm codes and it returns the corresponding UMLS codes as a list. If a code has no mapping, the original code is returned unchanged.

In [None]:
rxnorm_umls_pipeline = PretrainedPipeline( "rxnorm_umls_mapping","en","clinical/models")

rxnorm_umls_mapping download started this may take some time.
Approx size to download 1.8 MB
[OK!]


In [None]:
rxnorm_umls_pipeline.model.stages

[DocumentAssembler_8a4aba7aa2d6,
 REGEX_TOKENIZER_be27abc336fd,
 LEMMATIZER_93482244f96b,
 Finisher_9440fd80a5d9]

In [None]:
rxnorm_umls_pipeline.annotate("1161611 315677 343663")

{'rxnorm': ['1161611', '315677', '343663'],
 'umls': ['C3215948', 'C0984912', 'C1146501']}

|**RxNorm** | **Details** | 
| ---------- | -----------:|
| 1161611 |  metformin Pill |
| 315677 | cimetidine 100 mg |
| 343663 | insulin lispro 50 UNT/ML |

| **UMLS** | **Details** |
| ---------- | -----------:|
| C3215948 | metformin pill |
| C0984912 | cimetidine 100 mg |
| C1146501 | insulin lispro 50 unt/ml |

## 6. MESH to UMLS Code Mapping

This pretrained pipeline maps MeSH codes to UMLS codes without using any text data. Pass a whitespace-delimited string of MeSH codes and it returns the corresponding UMLS codes as a list. If a code has no mapping, the original code is returned unchanged.

In [None]:
mesh_umls_pipeline = PretrainedPipeline( "mesh_umls_mapping","en","clinical/models")

mesh_umls_mapping download started this may take some time.
Approx size to download 2.6 MB
[OK!]


In [None]:
mesh_umls_pipeline.model.stages

[DocumentAssembler_0ebc1b554d55,
 REGEX_TOKENIZER_912f3f1caa74,
 LEMMATIZER_971946054af9,
 Finisher_0119071594da]

In [None]:
mesh_umls_pipeline.annotate("C028491 D019326 C579867")

{'mesh': ['C028491', 'D019326', 'C579867'],
 'umls': ['C0970275', 'C0886627', 'C3696376']}

|**MeSH** | **Details** | 
| ---------- | -----------:|
| C028491 |  1,3-butylene glycol |
| D019326 | 17-alpha-Hydroxyprogesterone |
| C579867 | 3-Methylglutaconic Aciduria |

| **UMLS** | **Details** |
| ---------- | -----------:|
| C0970275 | 1,3-butylene glycol |
| C0886627 | 17-hydroxyprogesterone |
| C3696376 | 3-methylglutaconic aciduria |

## 7. RXNORM to MESH Code Mapping

This pretrained pipeline maps RxNorm codes to MeSH codes without using any text data. Pass a whitespace-delimited string of RxNorm codes and it returns the corresponding MeSH codes as a list. If a code has no mapping, the original code is returned unchanged.

In [None]:
rxnorm_mesh_pipeline = PretrainedPipeline( "rxnorm_mesh_mapping","en","clinical/models")

rxnorm_mesh_mapping download started this may take some time.
Approx size to download 101.2 KB
[OK!]


In [None]:
rxnorm_mesh_pipeline.model.stages

[DocumentAssembler_d554433bf767,
 REGEX_TOKENIZER_91752b58618c,
 LEMMATIZER_568c2c2ed9f2,
 Finisher_9aef0b33bc5c]

In [None]:
rxnorm_mesh_pipeline.annotate("1191 6809 47613")

{'mesh': ['D001241', 'D008687', 'D019355'],
 'rxnorm': ['1191', '6809', '47613']}

|**RxNorm** | **Details** | 
| ---------- | -----------:|
| 1191 |  aspirin |
| 6809 | metformin |
| 47613 | calcium citrate |

| **MeSH** | **Details** |
| ---------- | -----------:|
| D001241 | Aspirin |
| D008687 | Metformin |
| D019355 | Calcium Citrate |
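
The seven pipelines share the same call pattern, so a single wrapper can cover all of them. The dictionary keys do not always list the source first (the RxNorm to MeSH output above shows `mesh` before `rxnorm`), so the sketch takes both key names explicitly. `map_codes` is a hypothetical helper, and `replay_annotate` is a stand-in that replays the output shown above so the example runs without a Spark session; in the notebook you would pass `rxnorm_mesh_pipeline.annotate` instead.

```python
def map_codes(annotate, codes, source_key, target_key):
    """Call an annotate-style function and return (source, target) pairs."""
    result = annotate(" ".join(codes))
    return list(zip(result[source_key], result[target_key]))


def replay_annotate(text):
    """Stand-in for rxnorm_mesh_pipeline.annotate; replays the output shown above."""
    assert text == "1191 6809 47613"
    return {
        "mesh": ["D001241", "D008687", "D019355"],
        "rxnorm": ["1191", "6809", "47613"],
    }


pairs = map_codes(replay_annotate, ["1191", "6809", "47613"], "rxnorm", "mesh")
print(pairs)  # [('1191', 'D001241'), ('6809', 'D008687'), ('47613', 'D019355')]
```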