![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/06.1.Code_Mapping_Pipelines.ipynb)

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs==5.1.0

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, medical, visual

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
nlp.install()

In [None]:
from johnsnowlabs import nlp, medical, visual

# Automatically load license data and start a session with all jars user has access to
spark = nlp.start()

In [None]:
spark

# HEALTHCARE CODES MAPPING BY USING PRETRAINED PIPELINES


- **Chunk Mapper Pretrained Pipelines**

|index|model|
|-----:|:-----|
| 1| [icd10_icd9_mapping](https://nlp.johnsnowlabs.com/2022/09/30/icd10_icd9_mapping_en.html)   |
| 2| [icdo_snomed_mapping](https://nlp.johnsnowlabs.com/2022/06/27/icdo_snomed_mapping_en_3_0.html)   |
| 3| [icd10cm_snomed_mapping](https://nlp.johnsnowlabs.com/2022/06/27/icd10cm_snomed_mapping_en_3_0.html)   |
| 4| [rxnorm_ndc_mapping](https://nlp.johnsnowlabs.com/2022/06/27/rxnorm_ndc_mapping_en_3_0.html)   |
| 5| [rxnorm_umls_mapping](https://nlp.johnsnowlabs.com/2022/06/27/rxnorm_umls_mapping_en_3_0.html)   |
| 6| [snomed_icd10cm_mapping](https://nlp.johnsnowlabs.com/2022/06/27/snomed_icd10cm_mapping_en_3_0.html)   |
| 7| [snomed_icdo_mapping](https://nlp.johnsnowlabs.com/2022/06/27/snomed_icdo_mapping_en_3_0.html)   |
| 8| [snomed_umls_mapping](https://nlp.johnsnowlabs.com/2022/06/27/snomed_umls_mapping_en_3_0.html)   |
| 9| [icd10cm_umls_mapping](https://nlp.johnsnowlabs.com/2022/06/27/icd10cm_umls_mapping_en_3_0.html)   |
| 10| [mesh_umls_mapping](https://nlp.johnsnowlabs.com/2021/07/01/mesh_umls_mapping_en.html)   |
| 11| [rxnorm_mesh_mapping](https://nlp.johnsnowlabs.com/2021/07/01/rxnorm_mesh_mapping_en.html)   |


**You can find all these models and more in the [NLP Models Hub](https://nlp.johnsnowlabs.com/models?q=Chunk+Mapping&edition=Spark+NLP+for+Healthcare)**

## ICD10CM to SNOMED Code Mapping

This pretrained pipeline maps ICD10CM codes to SNOMED codes without using any text data. Feed it comma- or whitespace-delimited ICD10CM codes and it returns the corresponding SNOMED codes as a list. It currently supports 132K SNOMED codes, and the mapping will be extended in upcoming releases.

In [None]:
icd10_snomed_pipeline = nlp.PretrainedPipeline("icd10cm_snomed_mapping", "en", "clinical/models")

icd10cm_snomed_mapping download started this may take some time.
Approx size to download 1 MB
[OK!]


In [None]:
icd10_snomed_pipeline.model.stages

[DocumentAssembler_9ae20ea07807,
 REGEX_TOKENIZER_53bbcca15dd6,
 CHUNKER-MAPPER_5968de7123c1]

In [None]:
icd10_snomed_pipeline.annotate('M8950 E119 H16269')

{'document': ['M8950 E119 H16269'],
 'icd10cm_code': ['M8950', 'E119', 'H16269'],
 'snomed_code': ['716868003', '170771004', '51264003']}

|**ICD10CM Code** | **ICD10CM Details** | **SNOMED Code** | **SNOMED Details** |
| ---------- | -----------:| ---------- | -----------:|
| M8950 |  Osteolysis, unspecified site | 716868003 | Multicentric osteolysis nodulosis arthropathy spectrum |
| E119 | Type 2 diabetes mellitus | 170771004 | Diabetic - follow-up default |
| H16269 | Vernal keratoconjunctivitis, with limbar and corneal involvement, unspecified eye | 51264003 | Limbal AND/OR corneal involvement in vernal conjunctivitis |
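
The `annotate` result shown above is a plain dictionary of parallel lists, so source and target codes can be paired by position. The following is a minimal sketch of that pairing; `pair_codes` is a hypothetical helper name, not part of the library, and the sample dictionary is copied from the output above so the snippet runs without a Spark session.

```python
def pair_codes(result, source_key, target_key):
    """Zip two parallel lists from an annotate() result into (source, target) tuples."""
    sources = result[source_key]
    targets = result[target_key]
    # Pairing by position is only meaningful if both lists line up.
    if len(sources) != len(targets):
        raise ValueError(
            f"{source_key} has {len(sources)} items but {target_key} has {len(targets)}"
        )
    return list(zip(sources, targets))


# Result copied from the annotate() output shown above.
icd10_snomed_result = {
    "document": ["M8950 E119 H16269"],
    "icd10cm_code": ["M8950", "E119", "H16269"],
    "snomed_code": ["716868003", "170771004", "51264003"],
}

pairs = pair_codes(icd10_snomed_result, "icd10cm_code", "snomed_code")
for icd10cm, snomed in pairs:
    print(f"{icd10cm} -> {snomed}")
```

In the notebook itself, the dictionary returned by `icd10_snomed_pipeline.annotate(...)` could be passed in place of the copied sample.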


## SNOMED to ICD10CM Code Mapping

This pretrained pipeline maps SNOMED codes to ICD10CM codes without using any text data. Feed it comma- or whitespace-delimited SNOMED codes and it returns the corresponding candidate ICD10CM codes as a list (a single SNOMED code can map to multiple ICD10CM codes). It currently supports 132K SNOMED codes and 30K ICD10CM codes, and the mapping will be extended in upcoming releases.

In [None]:
snomed_icd10_pipeline = nlp.PretrainedPipeline("snomed_icd10cm_mapping","en","clinical/models")

snomed_icd10cm_mapping download started this may take some time.
Approx size to download 1.5 MB
[OK!]


In [None]:
snomed_icd10_pipeline.model.stages

[DocumentAssembler_4d05b9c5d71e,
 REGEX_TOKENIZER_a5ce08d2cedf,
 CHUNKER-MAPPER_fa42286a8f92]

In [None]:
snomed_icd10_pipeline.annotate('716868003 170771004 51264003')

{'document': ['716868003 170771004 51264003'],
 'snomed_code': ['716868003', '170771004', '51264003'],
 'icd10cm_code': ['M89.50', 'E11.9', 'H16.269']}

| **SNOMED Code** | **SNOMED Details** |**ICD10CM Code** | **ICD10CM Details** |
| ---------- | -----------:| ---------- | -----------:|
| 716868003 | Multicentric osteolysis nodulosis arthropathy spectrum | M89.50 |  Osteolysis, unspecified site |
| 170771004 | Diabetic - follow-up default | E11.9 | Type 2 diabetes mellitus |
| 51264003 | Limbal AND/OR corneal involvement in vernal conjunctivitis | H16.269 | Vernal keratoconjunctivitis, with limbar and corneal involvement, unspecified eye |
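
The two ICD10CM examples spell the same codes differently: the ICD10CM to SNOMED input is written without a dot (`M8950`), while the SNOMED to ICD10CM output includes it (`M89.50`). The notebook does not say whether either pipeline accepts both forms, so the sketch below only shows one way to convert between them before comparing or chaining results. `strip_dot` and `add_dot` are hypothetical helper names; the conversion relies on the ICD-10-CM convention of a three-character category followed by a dot.

```python
def strip_dot(code):
    """Return the compact form of an ICD-10-CM code, e.g. 'M89.50' -> 'M8950'."""
    return code.replace(".", "")


def add_dot(code):
    """Return the dotted form of an ICD-10-CM code, e.g. 'M8950' -> 'M89.50'."""
    compact = strip_dot(code)
    # Codes that consist only of the three-character category carry no dot.
    if len(compact) <= 3:
        return compact
    return f"{compact[:3]}.{compact[3:]}"


# Codes taken from the two ICD10CM examples above.
compact_codes = ["M8950", "E119", "H16269"]
dotted_codes = ["M89.50", "E11.9", "H16.269"]

converted_to_dotted = [add_dot(code) for code in compact_codes]
converted_to_compact = [strip_dot(code) for code in dotted_codes]
print(converted_to_dotted)
print(converted_to_compact)
```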



## ICD-O to SNOMED Code Mapping

This pretrained pipeline maps ICD-O codes to SNOMED codes without using any text data. Feed it comma- or whitespace-delimited ICD-O codes and it returns the corresponding SNOMED codes as a list.

In [None]:
icdo_snomed_pipeline = nlp.PretrainedPipeline("icdo_snomed_mapping", "en", "clinical/models")

icdo_snomed_mapping download started this may take some time.
Approx size to download 134.1 KB
[OK!]


In [None]:
icdo_snomed_pipeline.model.stages

[DocumentAssembler_666bb5cbda3a,
 REGEX_TOKENIZER_594e2894eaa6,
 CHUNKER-MAPPER_e9822ba7753b]

In [None]:
icdo_snomed_pipeline.annotate('8172/3 C77.5 8982/0')

{'document': ['8172/3 C77.5 8982/0'],
 'icdo_code': ['8172/3', 'C77.5', '8982/0'],
 'snomed_code': ['128646008', '5394000', '69291002']}

|**ICD-O Code** | **ICD-O Details** | **SNOMED Code** | **SNOMED Details** |
| ---------- | -----------:| ---------- | -----------:|
| 8172/3 |  Hepatocellular carcinoma, scirrhous | 128646008 | Hepatocellular carcinoma, scirrhous |
| C77.5 | Pelvic lymph nodes | 5394000 | Structure of uterine paracervical lymph node |
| 8982/0 | Myoepithelioma, NOS | 69291002 | Myoepithelial adenoma |


## SNOMED to ICD-O Code Mapping

This pretrained pipeline maps SNOMED codes to ICD-O codes without using any text data. Feed it comma- or whitespace-delimited SNOMED codes and it returns the corresponding ICD-O codes as a list.

In [None]:
snomed_icdo_pipeline = nlp.PretrainedPipeline("snomed_icdo_mapping", "en", "clinical/models")

snomed_icdo_mapping download started this may take some time.
Approx size to download 207.9 KB
[OK!]


In [None]:
snomed_icdo_pipeline.model.stages

[DocumentAssembler_82ca3d0764d1,
 REGEX_TOKENIZER_7b708353dd11,
 CHUNKER-MAPPER_2e80f745c696]

In [None]:
snomed_icdo_pipeline.annotate('128646008 5394000 69291002')

{'document': ['128646008 5394000 69291002'],
 'snomed_code': ['128646008', '5394000', '69291002'],
 'icdo_code': ['8172/3', 'C77.5', '8982/0']}

|**SNOMED Code** | **SNOMED Details** |**ICD-O Code** | **ICD-O Details** |
| ---------- | -----------:| ---------- | -----------:|
|128646008 | Hepatocellular carcinoma, scirrhous | 8172/3 |  Hepatocellular carcinoma, scirrhous |
|5394000 | Structure of uterine paracervical lymph node | C77.5 | Pelvic lymph nodes |
|69291002 | Myoepithelial adenoma | 8982/0 | Myoepithelioma, NOS |
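
Because the ICD-O to SNOMED and SNOMED to ICD-O pipelines are each other's inverse for these examples, a round-trip check is a cheap sanity test. The sketch below is one possible way to do it, using the two outputs copied from above; `to_lookup` and `round_trip_failures` are hypothetical helper names. A round trip is not guaranteed to succeed for every code, since a mapping can be one-to-many.

```python
def to_lookup(result, source_key, target_key):
    """Turn two parallel lists from an annotate() result into a source -> target dict."""
    return dict(zip(result[source_key], result[target_key]))


def round_trip_failures(forward, backward):
    """List source codes that do not map back to themselves via the reverse lookup."""
    return [
        source
        for source, target in forward.items()
        if backward.get(target) != source
    ]


# Outputs copied from the two ICD-O / SNOMED sections above.
icdo_to_snomed = {
    "icdo_code": ["8172/3", "C77.5", "8982/0"],
    "snomed_code": ["128646008", "5394000", "69291002"],
}
snomed_to_icdo = {
    "snomed_code": ["128646008", "5394000", "69291002"],
    "icdo_code": ["8172/3", "C77.5", "8982/0"],
}

forward = to_lookup(icdo_to_snomed, "icdo_code", "snomed_code")
backward = to_lookup(snomed_to_icdo, "snomed_code", "icdo_code")
failures = round_trip_failures(forward, backward)
print(failures)  # an empty list means every code survived the round trip
```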





## ICD10CM to UMLS Code Mapping

This pretrained pipeline maps ICD10CM codes to UMLS codes without using any text data. Feed it whitespace-delimited ICD10CM codes and it returns the corresponding UMLS codes as a list. Codes that have no mapping are returned as-is, without a mapped counterpart.

In [None]:
icd10_umls_pipeline = nlp.PretrainedPipeline( "icd10cm_umls_mapping","en","clinical/models")

icd10cm_umls_mapping download started this may take some time.
Approx size to download 934.2 KB
[OK!]


In [None]:
icd10_umls_pipeline.model.stages

[DocumentAssembler_25b094820972,
 REGEX_TOKENIZER_b314ec6a557a,
 CHUNKER-MAPPER_fc0d256341f7]

In [None]:
icd10_umls_pipeline.annotate("M8950 R822 R0901")

{'document': ['M8950 R822 R0901'],
 'icd10cm_code': ['M8950', 'R822', 'R0901'],
 'umls_code': ['C4721411', 'C0159076', 'C0004044']}

|**ICD10CM Code** | **ICD10CM Details** | **UMLS Code** | **UMLS Details** |
| ---------- | -----------:| ---------- | -----------:|
| M8950 |  Osteolysis, unspecified site | C4721411 | osteolysis |
| R822 | Biliuria | C0159076 | Biliuria |
| R0901 | Asphyxia | C0004044 | Asphyxia |
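
The notebook does not show what an unmapped entry looks like in the output, so rather than assume a placeholder value, the sketch below checks whether each returned value has the shape of a UMLS concept identifier (`C` followed by seven digits, as in all the outputs above). `is_cui` and `find_unmapped` are hypothetical helper names, and the sample dictionary is copied from the output above.

```python
import re

# A UMLS CUI is the letter C followed by seven digits, e.g. C4721411.
CUI_PATTERN = re.compile(r"C\d{7}")


def is_cui(value):
    """Return True if the value has the shape of a UMLS concept identifier."""
    return CUI_PATTERN.fullmatch(value) is not None


def find_unmapped(result, source_key, target_key):
    """Return source codes whose mapped value does not look like a UMLS CUI."""
    return [
        source
        for source, target in zip(result[source_key], result[target_key])
        if not is_cui(target)
    ]


# Result copied from the annotate() output shown above.
icd10_umls_result = {
    "document": ["M8950 R822 R0901"],
    "icd10cm_code": ["M8950", "R822", "R0901"],
    "umls_code": ["C4721411", "C0159076", "C0004044"],
}

unmapped = find_unmapped(icd10_umls_result, "icd10cm_code", "umls_code")
print(unmapped)
```

The same check applies to the other pipelines that target UMLS, with the matching source key.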



## SNOMED to UMLS Code Mapping

This pretrained pipeline maps SNOMED codes to UMLS codes without using any text data. Feed it whitespace-delimited SNOMED codes and it returns the corresponding UMLS codes as a list. Codes that have no mapping are returned as-is, without a mapped counterpart.

In [None]:
snomed_umls_pipeline = nlp.PretrainedPipeline("snomed_umls_mapping","en","clinical/models")

snomed_umls_mapping download started this may take some time.
Approx size to download 4.9 MB
[OK!]


In [None]:
snomed_umls_pipeline.model.stages

[DocumentAssembler_9e30bb9a34d6,
 REGEX_TOKENIZER_500ab0eff963,
 CHUNKER-MAPPER_6ecd1e714646]

In [None]:
snomed_umls_pipeline.annotate('733187009 449433008 51264003')

{'document': ['733187009 449433008 51264003'],
 'snomed_code': ['733187009', '449433008', '51264003'],
 'umls_code': ['C4546029', 'C3164619', 'C0271267']}

|**SNOMED Code** | **SNOMED Details** | **UMLS Code** | **UMLS Details** |
| ---------- | -----------:| ---------- | -----------:|
| 733187009 | osteolysis following surgical procedure on skeletal system | C4546029 | osteolysis following surgical procedure on skeletal system |
| 449433008 | Diffuse stenosis of left pulmonary artery | C3164619 | diffuse stenosis of left pulmonary artery |
| 51264003 | Limbal AND/OR corneal involvement in vernal conjunctivitis | C0271267 | limbal and/or corneal involvement in vernal conjunctivitis |




## RxNorm to UMLS Code Mapping

This pretrained pipeline maps RxNorm codes to UMLS codes without using any text data. Feed it whitespace-delimited RxNorm codes and it returns the corresponding UMLS codes as a list. Codes that have no mapping are returned as-is, without a mapped counterpart.

In [None]:
rxnorm_umls_pipeline = nlp.PretrainedPipeline( "rxnorm_umls_mapping","en","clinical/models")

rxnorm_umls_mapping download started this may take some time.
Approx size to download 1.8 MB
[OK!]


In [None]:
rxnorm_umls_pipeline.model.stages

[DocumentAssembler_4e185a5b8ed5,
 REGEX_TOKENIZER_69e9306f61a7,
 CHUNKER-MAPPER_5ac1d410c34d]

In [None]:
rxnorm_umls_pipeline.annotate("1161611 315677 343663")

{'document': ['1161611 315677 343663'],
 'rxnorm_code': ['1161611', '315677', '343663'],
 'umls_code': ['C3215948', 'C0984912', 'C1146501']}

|**RxNorm Code** | **RxNorm Details** | **UMLS Code** | **UMLS Details** |
| ---------- | -----------:| ---------- | -----------:|
| 1161611 |  metformin Pill | C3215948 | metformin pill |
| 315677 | cimetidine 100 mg | C0984912 | cimetidine 100 mg |
| 343663 | insulin lispro 50 UNT/ML | C1146501 | insulin lispro 50 unt/ml |

## MeSH to UMLS Code Mapping

This pretrained pipeline maps MeSH codes to UMLS codes without using any text data. Feed it whitespace-delimited MeSH codes and it returns the corresponding UMLS codes as a list. Codes that have no mapping are returned as-is, without a mapped counterpart.

In [None]:
mesh_umls_pipeline = nlp.PretrainedPipeline( "mesh_umls_mapping","en","clinical/models")

mesh_umls_mapping download started this may take some time.
Approx size to download 3.7 MB
[OK!]


In [None]:
mesh_umls_pipeline.model.stages

[DocumentAssembler_7610a83bd4ea,
 REGEX_TOKENIZER_3c8b5579ba5f,
 CHUNKER-MAPPER_4039b8f2b99c]

In [None]:
mesh_umls_pipeline.annotate("C028491 D019326 C579867")

{'document': ['C028491 D019326 C579867'],
 'mesh_code': ['C028491', 'D019326', 'C579867'],
 'umls_code': ['C0043904', 'C0045010', 'C3696376']}

|**MeSH Code** | **MeSH Details** | **UMLS Code** | **UMLS Details** |
|-| -| - | -|
| C028491 |  1,3-butylene glycol | C0043904 | 1,3-butylene glycol |
| D019326 | 17-alpha-Hydroxyprogesterone | C0045010 | 17-alpha-hydroxyprogesterone |
| C579867 | 3-Methylglutaconic Aciduria | C3696376 | 3-Methylglutaconic Aciduria |

## RxNorm to MeSH Code Mapping

This pretrained pipeline maps RxNorm codes to MeSH codes without using any text data. Feed it whitespace-delimited RxNorm codes and it returns the corresponding MeSH codes as a list. Codes that have no mapping are returned as-is, without a mapped counterpart.

In [None]:
rxnorm_mesh_pipeline = nlp.PretrainedPipeline( "rxnorm_mesh_mapping","en","clinical/models")

rxnorm_mesh_mapping download started this may take some time.
Approx size to download 101.6 KB
[OK!]


In [None]:
rxnorm_mesh_pipeline.model.stages

[DocumentAssembler_d554433bf767,
 REGEX_TOKENIZER_4fc66a0cc65c,
 LEMMATIZER_568c2c2ed9f2,
 Finisher_9aef0b33bc5c]

In [None]:
rxnorm_mesh_pipeline.annotate("1191 6809 47613")

{'rxnorm': ['1191', '6809', '47613'],
 'mesh': ['D001241', 'D008687', 'D019355']}

|**RxNorm** | **Details** |
| ---------- | -----------:|
| 1191 |  aspirin |
| 6809 | metformin |
| 47613 | calcium citrate |

| **MeSH** | **Details** |
| ---------- | -----------:|
| D001241 | Aspirin |
| D008687 | Metformin |
| D019355 | Calcium Citrate |
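
Unlike the other pipelines, this one returns the keys `rxnorm` and `mesh` and no `document` entry, so code that hard-codes key names such as `rxnorm_code` would fail here. The sketch below shows one way to turn any of these results into row dictionaries without naming the keys up front; `to_rows` is a hypothetical helper name, and both sample dictionaries are copied from outputs in this notebook.

```python
def to_rows(result, skip=("document",)):
    """Convert a dict of parallel lists into a list of row dicts, ignoring skipped keys."""
    keys = [key for key in result if key not in skip]
    lengths = {len(result[key]) for key in keys}
    # All remaining columns must have the same length to be zipped row by row.
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    columns = [result[key] for key in keys]
    return [dict(zip(keys, values)) for values in zip(*columns)]


# Output of the RxNorm to MeSH pipeline, copied from above.
rxnorm_mesh_result = {
    "rxnorm": ["1191", "6809", "47613"],
    "mesh": ["D001241", "D008687", "D019355"],
}

# Output of the RxNorm to UMLS pipeline, copied from its section.
rxnorm_umls_result = {
    "document": ["1161611 315677 343663"],
    "rxnorm_code": ["1161611", "315677", "343663"],
    "umls_code": ["C3215948", "C0984912", "C1146501"],
}

mesh_rows = to_rows(rxnorm_mesh_result)
umls_rows = to_rows(rxnorm_umls_result)
for row in mesh_rows + umls_rows:
    print(row)
```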

## RxNorm to NDC Code Mapping

This pretrained pipeline maps RxNorm codes to NDC codes without using any text data. Feed it whitespace-delimited RxNorm codes and it returns the corresponding NDC codes as a list.

In [None]:
rxnorm_ndc_pipeline = nlp.PretrainedPipeline( "rxnorm_ndc_mapping","en","clinical/models")

rxnorm_ndc_mapping download started this may take some time.
Approx size to download 3.9 MB
[OK!]


In [None]:
rxnorm_ndc_pipeline.model.stages

[DocumentAssembler_f92d7dac3ac1,
 REGEX_TOKENIZER_62c251e1d103,
 CHUNKER-MAPPER_2d7b0e176787,
 CHUNKER-MAPPER_2d7b0e176787]

In [None]:
rxnorm_ndc_pipeline.annotate("1191 6809 47613")

{'document': ['1191 6809 47613'],
 'rxnorm_code': ['1191', '6809', '47613'],
 'package_ndc': ['62991-1176-06', '38779-2126-04', '00178-0796-30'],
 'product_ndc': ['62991-1176', '38779-2126', '00178-0796']}

|**RxNorm** | **Details** |
| ---------- | -----------:|
| 1191 |  aspirin |
| 6809 | metformin |
| 47613 | calcium citrate |
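
In the output above, each `product_ndc` is the matching `package_ndc` without its last hyphen-separated segment. The sketch below checks that relationship and groups both NDC values under their RxNorm code. `product_from_package` and `ndc_by_rxnorm` are hypothetical names, and the sample dictionary is copied from the output above; the check reflects what these three examples show, not a guarantee about every mapping.

```python
def product_from_package(package_ndc):
    """Drop the trailing package segment, e.g. '62991-1176-06' -> '62991-1176'."""
    labeler, product, _package = package_ndc.split("-")
    return f"{labeler}-{product}"


# Result copied from the annotate() output shown above.
rxnorm_ndc_result = {
    "document": ["1191 6809 47613"],
    "rxnorm_code": ["1191", "6809", "47613"],
    "package_ndc": ["62991-1176-06", "38779-2126-04", "00178-0796-30"],
    "product_ndc": ["62991-1176", "38779-2126", "00178-0796"],
}

# Compare the derived product codes with the ones the pipeline returned.
derived = [product_from_package(ndc) for ndc in rxnorm_ndc_result["package_ndc"]]
consistent = derived == rxnorm_ndc_result["product_ndc"]

# Group both NDC values under their RxNorm code for easier lookup.
ndc_by_rxnorm = {
    rxnorm: {"package_ndc": package, "product_ndc": product}
    for rxnorm, package, product in zip(
        rxnorm_ndc_result["rxnorm_code"],
        rxnorm_ndc_result["package_ndc"],
        rxnorm_ndc_result["product_ndc"],
    )
}
print(consistent)
print(ndc_by_rxnorm["1191"])
```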

## ICD10 to ICD9 Code Mapping
This pretrained pipeline maps ICD10 codes to ICD9 codes without using any text data. Feed it comma- or whitespace-delimited ICD10 codes and it returns the corresponding ICD9 codes as a list. Codes that have no mapping are returned as-is, without a mapped counterpart.

In [None]:
icd10_icd9_pipeline = nlp.PretrainedPipeline("icd10_icd9_mapping", "en", "clinical/models")

icd10_icd9_mapping download started this may take some time.
Approx size to download 579.7 KB
[OK!]


In [None]:
icd10_icd9_pipeline.model.stages

[DocumentAssembler_78a9faf419f2,
 REGEX_TOKENIZER_0ea795317267,
 CHUNKER-MAPPER_3ac48fe0df3d]

In [None]:
icd10_icd9_pipeline.annotate('E669 R630 J988')

{'document': ['E669 R630 J988'],
 'icd10cm_code': ['E669', 'R630', 'J988'],
 'icd9_code': ['27800', '7830', '5198']}

| ICD10 | Details |
| ---------- | ----------------------------:|
| E669 | Obesity |
| R630 | Anorexia |
| J988 | Other specified respiratory disorders |



| ICD9 | Details |
| ---------- | ---------------------------:|
| 27800 | Obesity |
| 7830 | Anorexia |
| 5198 | Other diseases of respiratory system |
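
Every section above follows the same pattern: join the codes into one string, call `annotate`, and read two keys from the result. The sketch below wraps that pattern in a small function. `map_codes` is a hypothetical helper name, and `StubPipeline` is a stand-in that replays the ICD10 to ICD9 output shown above so the snippet runs without a Spark session; it is not a library class.

```python
def map_codes(pipeline, codes, source_key, target_key):
    """Join codes with spaces, annotate them, and return a source -> target dict."""
    text = " ".join(codes)
    result = pipeline.annotate(text)
    return dict(zip(result[source_key], result[target_key]))


class StubPipeline:
    """Hypothetical stand-in that replays the annotate() output shown above."""

    def annotate(self, text):
        return {
            "document": [text],
            "icd10cm_code": ["E669", "R630", "J988"],
            "icd9_code": ["27800", "7830", "5198"],
        }


icd10_to_icd9 = map_codes(
    StubPipeline(),
    ["E669", "R630", "J988"],
    source_key="icd10cm_code",
    target_key="icd9_code",
)
print(icd10_to_icd9)
```

In the notebook, `icd10_icd9_pipeline` would take the place of `StubPipeline()`, and the key names would change to match whichever pipeline is being called.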