Models hub internal (#1191)
muhammetsnts committed May 8, 2024
1 parent 442e626 commit 202c138
Showing 29 changed files with 5,340 additions and 0 deletions.
@@ -0,0 +1,171 @@
---
layout: model
title: Sentence Entity Resolver for UMLS CUI Codes
author: John Snow Labs
name: sbiobertresolve_umls_major_concepts
date: 2024-04-24
tags: [entity_resolution, umls, licensed, clinical, en]
task: Entity Resolution
language: en
edition: Healthcare NLP 5.3.1
spark_version: 3.0
supported: true
annotator: SentenceEntityResolverModel
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model maps clinical entities and concepts to four major categories of UMLS CUI codes using `sbiobert_base_cased_mli` Sentence BERT embeddings.

## Predicted Entities

This model returns CUI (concept unique identifier) codes for `Clinical Findings`, `Medical Devices`, `Anatomical Structures`, and `Injuries & Poisoning terms`.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
[Open in Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/3.Clinical_Entity_Resolvers.ipynb){:.button.button-orange.button-orange-trans.co.button-icon}
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_umls_major_concepts_en_5.3.1_3.0_1713967626766.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_umls_major_concepts_en_5.3.1_3.0_1713967626766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

The `sbiobertresolve_umls_major_concepts` resolver model must be used with `sbiobert_base_cased_mli` as the embeddings model and `ner_jsl` as the NER model. Set the following labels in `.setWhiteList()`: `Cerebrovascular_Disease, Communicable_Disease, Diabetes, Disease_Syndrome_Disorder, Heart_Disease, Hyperlipidemia, Hypertension, Injury_or_Poisoning, Kidney_Disease, Medical_Device, Obesity, Oncological, Overweight, Psychological_Condition, Symptom, VS_Finding, ImagingFindings, EKG_Findings, Vaccine_Name, RelativeDate`.

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

# Assumes an active Healthcare NLP Spark session named `spark`.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Keep only the NER labels that this resolver is meant to handle.
ner_model_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["Cerebrovascular_Disease", "Communicable_Disease", "Diabetes", "Disease_Syndrome_Disorder",
                   "Heart_Disease", "Hyperlipidemia", "Hypertension", "Injury_or_Poisoning", "Kidney_Disease", "Medical_Device", "Obesity",
                   "Oncological", "Overweight", "Psychological_Condition",
                   "Symptom", "VS_Finding", "ImagingFindings", "EKG_Findings",
                   "Vaccine_Name", "RelativeDate"])

# Convert each chunk into a document so it can be embedded on its own.
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")

sbert_embedder = BertSentenceEmbeddings\
    .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")

resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_umls_major_concepts", "en", "clinical/models")\
    .setInputCols(["sbert_embeddings"])\
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")

pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, ner_model, ner_model_converter, chunk2doc, sbert_embedder, resolver])

data = spark.createDataFrame([["""A female patient got influenza vaccine and one day after she has complains of ankle pain. She has only history of gestational diabetes mellitus diagnosed prior to presentation and subsequent type two diabetes mellitus (T2DM)"""]]).toDF("text")

results = pipeline.fit(data).transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDF on a Seq; assumes a SparkSession named `spark`
// The Spark NLP and Healthcare NLP annotator classes are assumed to be imported already.

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel
    .pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel
    .pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val ner_model = MedicalNerModel
    .pretrained("ner_jsl", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

// Keep only the NER labels that this resolver is meant to handle.
val ner_model_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")
    .setWhiteList(Array("Cerebrovascular_Disease",
        "Communicable_Disease", "Diabetes", "Disease_Syndrome_Disorder",
        "Heart_Disease", "Hyperlipidemia", "Hypertension", "Injury_or_Poisoning", "Kidney_Disease", "Medical_Device", "Obesity",
        "Oncological", "Overweight", "Psychological_Condition",
        "Symptom", "VS_Finding", "ImagingFindings", "EKG_Findings",
        "Vaccine_Name", "RelativeDate"))

val chunk2doc = new Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")

val sbert_embedder = BertSentenceEmbeddings
    .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
    .setInputCols(Array("ner_chunk_doc"))
    .setOutputCol("sbert_embeddings")

val resolver = SentenceEntityResolverModel
    .pretrained("sbiobertresolve_umls_major_concepts", "en", "clinical/models")
    .setInputCols(Array("sbert_embeddings"))
    .setOutputCol("resolution")
    .setDistanceFunction("EUCLIDEAN")

val p_model = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner_model, ner_model_converter, chunk2doc, sbert_embedder, resolver))

val data = Seq("A female patient got influenza vaccine and one day after she has complains of ankle pain. She has only history of gestational diabetes mellitus diagnosed prior to presentation and subsequent type two diabetes mellitus (T2DM).").toDF("text")

val res = p_model.fit(data).transform(data)
```
</div>
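
The resolver in the pipelines above is configured with `.setDistanceFunction("EUCLIDEAN")`, and the Results table reports both Euclidean and cosine distances for every candidate code. As a rough illustration of what those two numbers measure, the sketch below ranks a few candidates against a query vector. The three-dimensional vectors, the `CUI_A`/`CUI_B`/`CUI_C` placeholders and the `rank_candidates` helper are invented for this example; the model's real embeddings are much larger, and its internal search is not described in this card.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_candidates(query, candidates, k=2):
    """Return the k candidates closest to `query` by Euclidean distance."""
    scored = [
        (code, euclidean_distance(query, vec), cosine_distance(query, vec))
        for code, vec in candidates.items()
    ]
    scored.sort(key=lambda row: row[1])
    return scored[:k]


# Hypothetical toy embeddings, not taken from the model.
toy_candidates = {
    "CUI_A": [1.0, 0.0, 0.0],
    "CUI_B": [0.0, 1.0, 0.0],
    "CUI_C": [0.9, 0.1, 0.0],
}
toy_query = [1.0, 0.2, 0.0]

top = rank_candidates(toy_query, toy_candidates, k=2)
for code, euclid, cosine in top:
    print(f"{code}: euclidean={euclid:.4f} cosine={cosine:.4f}")
```

Sorting on the Euclidean column mirrors the ordering of `all_k_distances` in the table, where the first entry is always the smallest.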

## Results

```bash
| | ner_chunk | entity | umls_code | resolution | all_k_results | all_k_distances | all_k_cosine_distances | all_k_resolutions |
|---:|:------------------------------|:-------------|:------------|:-------------------------------------------|:----------------------------------------------------|:---------------------------------------------|:---------------------------------------------|:---------------------------------------------------------------------------------|
| 0 | influenza vaccine | Vaccine_Name | C0260381 | influenza vaccination | C0260381:::C1260452:::C4302763:::C4473357:::C3476...| 6.5367:::6.8250:::7.2029:::7.5281:::7.7098...| 0.0708:::0.0776:::0.0854:::0.0947:::0.0969...| influenza vaccination:::vaccin for influenza:::influenza vaccination given:::d...|
| 1 | one day after | RelativeDate | C0420328 | follow-up 1 day (finding) | C0420328:::C4534547:::C5441960:::C3843067:::C3842...| 7.2691:::8.1345:::8.6351:::9.3661:::9.6892...| 0.0814:::0.1016:::0.1151:::0.1348:::0.1451...| follow-up 1 day (finding):::initial day:::1/day:::within 1 day or less:::sudde...|
| 2 | ankle pain | Symptom | C4047548 | bilateral ankle joint pain | C4047548:::C4315239:::C2032293:::C2089776:::C0576...| 4.8134:::6.7158:::6.9567:::7.1444:::7.1515...| 0.0337:::0.0665:::0.0703:::0.0751:::0.0741...| bilateral ankle joint pain:::joint and leg pain:::bilateral calf pain:::ankle ...|
| 3 | gestational diabetes mellitus | Diabetes | C2183115 | diabetes mellitus during pregnancy | C2183115:::C3161145:::C3532257:::C4303558:::C3840...| 5.2200:::6.3563:::6.9305:::7.1692:::7.2144...| 0.0401:::0.0596:::0.0717:::0.0750:::0.0773...| diabetes mellitus during pregnancy:::hx gestational diabetes:::gestational dia...|
| 4 | type two diabetes mellitus | Diabetes | C4016960 | type 2 diabetes mellitus, association with | C4016960:::C4014362:::C4016735:::C3532488:::C0260...| 4.3761:::5.4035:::5.5192:::6.1712:::6.2650...| 0.0285:::0.0438:::0.0460:::0.0568:::0.0583...| type 2 diabetes mellitus, association with:::type 2 diabetes mellitus (t2d):::...|
| 5 | T2DM | Diabetes | C4014362 | type 2 diabetes mellitus (t2d) | C4014362:::C1320657:::C4016960:::C4016735:::C0260...| 7.2798:::7.7099:::8.2517:::8.6288:::8.7378...| 0.0821:::0.0929:::0.1043:::0.1171:::0.1163...| type 2 diabetes mellitus (t2d):::type diabetes:::type 2 diabetes mellitus, ass...|
```
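
The `all_k_*` columns in the table above pack the ranked candidates into single strings separated by `:::`. A minimal sketch of how those parallel strings could be unpacked into one record per candidate is shown below. The `parse_k_results` helper is a name chosen for this example, and the input strings contain only the first three entries that are fully visible in the truncated table.

```python
def parse_k_results(codes, resolutions, distances, sep=":::"):
    """Zip the parallel ':::'-delimited strings into ranked candidate dicts."""
    code_list = codes.split(sep)
    resolution_list = resolutions.split(sep)
    distance_list = [float(d) for d in distances.split(sep)]
    return [
        {"rank": i + 1, "code": c, "resolution": r, "distance": d}
        for i, (c, r, d) in enumerate(zip(code_list, resolution_list, distance_list))
    ]


# First three candidates for "influenza vaccine", copied from the table above.
candidates = parse_k_results(
    "C0260381:::C1260452:::C4302763",
    "influenza vaccination:::vaccin for influenza:::influenza vaccination given",
    "6.5367:::6.8250:::7.2029",
)

for row in candidates:
    print(row)
```

Having one record per candidate makes it easier to apply a distance threshold or to review lower-ranked codes when the top match is too specific, as with `ankle pain` resolving to `bilateral ankle joint pain`.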

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|sbiobertresolve_umls_major_concepts|
|Compatibility:|Healthcare NLP 5.3.1+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence_embeddings]|
|Output Labels:|[umls_code]|
|Language:|en|
|Size:|4.2 GB|
|Case sensitive:|false|

## References

Trained on clinical major concepts from the 2023AB release of the Unified Medical Language System® (UMLS) Knowledge Sources: https://www.nlm.nih.gov/research/umls/index.html
@@ -0,0 +1,108 @@
---
layout: model
title: Pipeline to Detect Drug Entities - Generic
author: John Snow Labs
name: ner_medication_generic_pipeline
date: 2024-04-25
tags: [licensed, en, medication, ner, pipeline]
task: [Pipeline Healthcare, Named Entity Recognition]
language: en
edition: Healthcare NLP 5.3.1
spark_version: 3.2
supported: true
annotator: PipelineModel
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pre-trained pipeline is designed to identify generic `DRUG` entities in clinical texts. It was built on top of the `ner_posology_greedy`, `ner_jsl_greedy`, `ner_drugs_large` and `drug_matcher` models to detect the entities `DRUG`, `DOSAGE`, `ROUTE` and `STRENGTH`, chunking them into a larger entity as `DRUG` when they appear together.
The main distinction from the `medication_ner_pipeline` is that it chunks these entities together, whereas the `medication_ner_pipeline` chunks them separately.

Predicted entities: `DRUG`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_medication_generic_pipeline_en_5.3.1_3.2_1714046788033.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_medication_generic_pipeline_en_5.3.1_3.2_1714046788033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python

from sparknlp.pretrained import PretrainedPipeline

ner_pipeline = PretrainedPipeline("ner_medication_generic_pipeline", "en", "clinical/models")

result = ner_pipeline.annotate("""The patient described the epigastric pain as burning and worsening after meals, often accompanied by heartburn and regurgitation, particularly when lying down.
Additionally, he reported discomfort and bloating associated with infrequent bowel movements. In response, his doctor prescribed a regimen tailored to his conditions:
Thiamine 100 mg , Folic acid 1 mg , multivitamins , Calcium carbonate plus Vitamin D 250 mg , Heparin 5000 units subcutaneously , Prilosec 20 mg , Senna two tabs .""")

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val ner_pipeline = PretrainedPipeline("ner_medication_generic_pipeline", "en", "clinical/models")

val result = ner_pipeline.annotate("""The patient described the epigastric pain as burning and worsening after meals, often accompanied by heartburn and regurgitation, particularly when lying down.
Additionally, he reported discomfort and bloating associated with infrequent bowel movements. In response, his doctor prescribed a regimen tailored to his conditions:
Thiamine 100 mg , Folic acid 1 mg , multivitamins , Calcium carbonate plus Vitamin D 250 mg , Heparin 5000 units subcutaneously , Prilosec 20 mg , Senna two tabs .""")

```
</div>

## Results

```bash

+---------------------------------+---------+
|medication_greedy_chunk |ner_label|
+---------------------------------+---------+
|Thiamine 100 mg |DRUG |
|Folic acid 1 mg |DRUG |
|multivitamins |DRUG |
|Calcium carbonate |DRUG |
|Vitamin D 250 mg |DRUG |
|Heparin 5000 units subcutaneously|DRUG |
|Prilosec 20 mg |DRUG |
|Senna two tabs |DRUG |
+---------------------------------+---------+

```
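
The table above pairs each chunk with its label, which `annotate` alone does not return. One way to build such rows, sketched here under assumptions, is to call `fullAnnotate` and read each chunk annotation's `result` and `metadata["entity"]`. The `chunks_to_rows` helper, the `FakeAnnotation` stand-in and the `"ner_chunk"` column name are all hypothetical; the pipeline's real chunk column name should be checked in the keys of the returned dictionary.

```python
from collections import namedtuple

# Stand-in for Spark NLP's Annotation object, carrying only the two
# fields this sketch reads. It exists so the example runs without Spark.
FakeAnnotation = namedtuple("FakeAnnotation", ["result", "metadata"])


def chunks_to_rows(full_annotate_output, chunk_col):
    """Flatten fullAnnotate-style output into (chunk_text, label) tuples."""
    rows = []
    for document in full_annotate_output:
        for chunk in document.get(chunk_col, []):
            rows.append((chunk.result, chunk.metadata.get("entity")))
    return rows


# Mock output shaped like a list with one dict per input text.
# "ner_chunk" is an assumed column name, not confirmed by this card.
mock_output = [
    {
        "ner_chunk": [
            FakeAnnotation("Thiamine 100 mg", {"entity": "DRUG"}),
            FakeAnnotation("Folic acid 1 mg", {"entity": "DRUG"}),
        ]
    }
]

rows = chunks_to_rows(mock_output, chunk_col="ner_chunk")
for text, label in rows:
    print(f"{text}\t{label}")
```

With the real pipeline, the same helper would presumably be called as `chunks_to_rows(ner_pipeline.fullAnnotate(text), chunk_col=...)` once the actual column name is known.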

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|ner_medication_generic_pipeline|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.3.1+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|1.7 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverter
- TextMatcherInternalModel
- ChunkMergeModel