diff --git a/docs/_posts/Cabir40/2023-04-11-umls_clinical_findings_resolver_pipeline_en.md b/docs/_posts/Cabir40/2023-04-11-umls_clinical_findings_resolver_pipeline_en.md new file mode 100644 index 0000000000..9516d50b01 --- /dev/null +++ b/docs/_posts/Cabir40/2023-04-11-umls_clinical_findings_resolver_pipeline_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Clinical Findings to UMLS Code Pipeline +author: John Snow Labs +name: umls_clinical_findings_resolver_pipeline +date: 2023-04-11 +tags: [licensed, clinical, en, umls, pipeline] +task: Chunk Mapping +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pretrained pipeline maps entities (Clinical Findings) with their corresponding UMLS CUI codes. You’ll just feed your text and it will return the corresponding UMLS codes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/umls_clinical_findings_resolver_pipeline_en_4.3.2_3.0_1681216655167.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/umls_clinical_findings_resolver_pipeline_en_4.3.2_3.0_1681216655167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("umls_clinical_findings_resolver_pipeline", "en", "clinical/models") + +text = 'HTG-induced pancreatitis associated with an acute hepatitis, and obesity' + +result = pipeline.annotate(text) +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("umls_clinical_findings_resolver_pipeline", "en", "clinical/models") + +val text = "HTG-induced pancreatitis associated with an acute hepatitis, and obesity" + +val result = pipeline.annotate(text) +``` +
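`annotate` returns a plain Python dict that maps each output column of the pipeline to a list of strings. The sketch below pairs chunks with codes by position. The key names `ner_chunk` and `umls_code`, the helper `pair_chunks_with_codes`, and the stand-in dict are assumptions made for illustration, so check `result.keys()` for the column names this pipeline actually emits.

```python
def pair_chunks_with_codes(annotation, chunk_key="ner_chunk", code_key="umls_code"):
    """Pair each extracted chunk with the code at the same list position.

    Both key names are hypothetical defaults, not confirmed column names.
    """
    chunks = annotation.get(chunk_key, [])
    codes = annotation.get(code_key, [])
    if len(chunks) != len(codes):
        raise ValueError("chunk and code lists differ in length")
    return list(zip(chunks, codes))


# Stand-in for `result`; the values are copied from the Results table of this card.
sample = {
    "ner_chunk": ["HTG-induced pancreatitis", "an acute hepatitis", "obesity"],
    "umls_code": ["C1963198", "C4750596", "C1963185"],
}

pairs = pair_chunks_with_codes(sample)
for chunk, code in pairs:
    print(f"{chunk} -> {code}")
```

The length check is a deliberate choice: if the two lists ever fall out of step, a silent `zip` would attach codes to the wrong chunks.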
+ +## Results + +```bash ++------------------------+---------+---------+ +|chunk |ner_label|umls_code| ++------------------------+---------+---------+ +|HTG-induced pancreatitis|PROBLEM |C1963198 | +|an acute hepatitis |PROBLEM |C4750596 | +|obesity |PROBLEM |C1963185 | ++------------------------+---------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|umls_clinical_findings_resolver_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|4.3 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperFilterer +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel +- ResolverMerger \ No newline at end of file diff --git a/docs/_posts/Cabir40/2023-04-11-umls_drug_substance_resolver_pipeline_en.md b/docs/_posts/Cabir40/2023-04-11-umls_drug_substance_resolver_pipeline_en.md new file mode 100644 index 0000000000..a2659efd60 --- /dev/null +++ b/docs/_posts/Cabir40/2023-04-11-umls_drug_substance_resolver_pipeline_en.md @@ -0,0 +1,89 @@ +--- +layout: model +title: Drug Substance to UMLS Code Pipeline +author: John Snow Labs +name: umls_drug_substance_resolver_pipeline +date: 2023-04-11 +tags: [licensed, clinical, en, umls, pipeline, drug, substance] +task: Chunk Mapping +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pretrained pipeline maps entities (Drug Substances) with their corresponding UMLS CUI codes. You’ll just feed your text and it will return the corresponding UMLS codes.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/umls_drug_substance_resolver_pipeline_en_4.3.2_3.0_1681217098344.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/umls_drug_substance_resolver_pipeline_en_4.3.2_3.0_1681217098344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("umls_drug_substance_resolver_pipeline", "en", "clinical/models") + +result = pipeline.annotate("The patient was given metformin, lenvatinib and Magnesium hydroxide 100mg/1ml") +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = PretrainedPipeline("umls_drug_substance_resolver_pipeline", "en", "clinical/models") + +val result = pipeline.annotate("The patient was given metformin, lenvatinib and Magnesium hydroxide 100mg/1ml") +``` + +{:.nlu-block} +```python ++-----------------------------+---------+---------+ +|chunk |ner_label|umls_code| ++-----------------------------+---------+---------+ +|metformin |DRUG |C0025598 | +|lenvatinib |DRUG |C2986924 | +|Magnesium hydroxide 100mg/1ml|DRUG |C1134402 | ++-----------------------------+---------+---------+ +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|umls_drug_substance_resolver_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|5.1 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetector +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverter +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperFilterer +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel +- ResolverMerger \ No newline at end of file diff --git a/docs/_posts/Cabir40/2023-04-12-biogpt_chat_jsl_en.md b/docs/_posts/Cabir40/2023-04-12-biogpt_chat_jsl_en.md new file mode 100644 index 0000000000..c0a65ba797 --- /dev/null +++ b/docs/_posts/Cabir40/2023-04-12-biogpt_chat_jsl_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Clinical QA BioGPT (JSL) +author: John Snow Labs +name: biogpt_chat_jsl +date: 2023-04-12 +tags: [licensed, en, clinical, text_generation, biogpt, tensorflow] +task: Text Generation +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalTextGenerator +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model is based on BioGPT, fine-tuned on medical conversations in clinical settings, and can answer clinical questions related to symptoms, drugs, tests, and diseases. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/biogpt_chat_jsl_en_4.3.2_3.0_1681319163583.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/biogpt_chat_jsl_en_4.3.2_3.0_1681319163583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +document_assembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +gpt_qa = MedicalTextGenerator.pretrained("biogpt_chat_jsl", "en", "clinical/models")\ + .setInputCols("documents")\ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([document_assembler, gpt_qa]) + +data = spark.createDataFrame([["How to treat asthma ?"]]).toDF("text") + +result = pipeline.fit(data).transform(data) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val gpt_qa = MedicalTextGenerator.pretrained("biogpt_chat_jsl", "en", "clinical/models") + .setInputCols("documents") + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(document_assembler, gpt_qa)) + +val text = "How to treat asthma ?" + +val data = Seq(text).toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) + +``` +
+ +## Results + +```bash + +['Asthma is itself an allergic disease due to cold or dust or pollen or grass etc. irrespective of the triggering factor. You can go for pulmonary function tests if not done. Treatment is mainly symptomatic which might require inhalation steroids, beta agonists, anticholinergics as MDI or rota haler as a regular treatment. To decrease the inflammation of bronchi and bronchioles, you might be given oral antihistamines with mast cell stabilizers (montelukast) and steroids (prednisolone) with nebulization and frequently steam inhalation. To decrease the bronchoconstriction caused by allergens, you might be given oral antihistamines with mast cell stabilizers (montelukast) and steroids (prednisolone) with nebulization and frequently steam inhalation. The best way to cure any allergy is a complete avoidance of allergen or triggering factor. Consult your pulmonologist for further advise.'] + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biogpt_chat_jsl| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.4 GB| +|Case sensitive:|true| \ No newline at end of file diff --git a/docs/_posts/Damla-Gurbaz/2023-04-12-ner_jsl_emb_clinical_large_en.md b/docs/_posts/Damla-Gurbaz/2023-04-12-ner_jsl_emb_clinical_large_en.md new file mode 100644 index 0000000000..ddb188833a --- /dev/null +++ b/docs/_posts/Damla-Gurbaz/2023-04-12-ner_jsl_emb_clinical_large_en.md @@ -0,0 +1,325 @@ +--- +layout: model +title: Detect Clinical Entities (clinical_large) +author: John Snow Labs +name: ner_jsl_emb_clinical_large +date: 2023-04-12 +tags: [ner, clinical_large, en, licensed] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained named entity recognition deep learning model 
for clinical terminology. The SparkNLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art model for NER: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNN. This model is the official version of the jsl_ner_wip_clinical model. + +Definitions of Predicted Entities: + +- `Injury_or_Poisoning`: Physical harm or injury caused to the body, including those caused by accidents, falls, or poisoning of a patient or someone else. +- `Direction`: All the information relating to the laterality of the internal and external organs. +- `Test`: Mentions of laboratory, pathology, and radiological tests. +- `Admission_Discharge`: Terms that indicate the admission and/or the discharge of a patient. +- `Death_Entity`: Mentions that indicate the death of a patient. +- `Relationship_Status`: State of a patient's romantic or social relationships (e.g. single, married, divorced). +- `Duration`: The duration of a medical treatment or medication use. +- `Respiration`: Number of breaths per minute. +- `Hyperlipidemia`: Terms that indicate hyperlipidemia with relevant subtypes and synonyms. +- `Birth_Entity`: Mentions that indicate giving birth. +- `Age`: All mentions of ages, past or present, related to the patient or anybody else. +- `Labour_Delivery`: Extractions include stages of labor and delivery. +- `Family_History_Header`: Identifies section headers that correspond to Family History of the patient. +- `BMI`: Numeric values and other text information related to Body Mass Index. +- `Temperature`: All mentions that refer to body temperature. +- `Alcohol`: Terms that indicate alcohol use, abuse or drinking issues of a patient or someone else. +- `Kidney_Disease`: Terms that refer to any kidney diseases (includes mentions of modifiers such as "Acute" or "Chronic"). +- `Oncological`: All the cancer, tumor or metastasis related extractions mentioned in the document, of the patient or someone else.
+- `Medical_History_Header`: Identifies section headers that correspond to Past Medical History of a patient. +- `Cerebrovascular_Disease`: All terms that refer to cerebrovascular diseases and events. +- `Oxygen_Therapy`: Breathing support triggered by patient or entirely or partially by machine (e.g. ventilator, BPAP, CPAP). +- `O2_Saturation`: Systemic arterial, venous or peripheral oxygen saturation measurements. +- `Psychological_Condition`: All mental health diagnoses, disorders, conditions or syndromes of a patient or someone else. +- `Heart_Disease`: All mentions of acquired, congenital or degenerative heart diseases. +- `Employment`: All mentions of patient or provider occupational titles and employment status. +- `Obesity`: Terms related to a patient being obese (overweight and BMI are extracted as different labels). +- `Disease_Syndrome_Disorder`: All the diseases mentioned in the document, of the patient or someone else (excluding diseases that are extracted with their specific labels, such as "Heart_Disease" etc.). +- `Pregnancy`: All terms related to Pregnancy (excluding terms that are extracted with their specific labels, such as "Labour_Delivery" etc.). +- `ImagingFindings`: All mentions of radiographic and imaging findings. +- `Procedure`: All mentions of invasive medical or surgical procedures or treatments. +- `Medical_Device`: All mentions related to medical devices and supplies. +- `Race_Ethnicity`: All terms that refer to racial and national origin of sociocultural groups. +- `Section_Header`: All the section headers present in the text (Medical History, Family History, Social History, Physical Examination and Vital signs Headers are extracted separately with their specific labels). +- `Symptom`: All the symptoms mentioned in the document, of a patient or someone else. +- `Treatment`: Includes therapeutic and minimally invasive treatment and procedures (invasive treatments or procedures are extracted as "Procedure").
+- `Substance`: All mentions of substance use related to the patient or someone else (recreational drugs, illicit drugs). +- `Route`: Drug and medication administration routes, as described by the [FDA](http://wayback.archive-it.org/7993/20171115111313/https:/www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm). +- `Drug_Ingredient`: Active ingredient/s found in drug products. +- `Blood_Pressure`: Systemic blood pressure, mean arterial pressure, systolic and/or diastolic are extracted. +- `Diet`: All mentions and information regarding a patient's dietary habits. +- `External_body_part_or_region`: All mentions related to external body parts or organs that can be examined by the naked eye. +- `LDL`: All mentions related to the lab test and results for LDL (Low Density Lipoprotein). +- `VS_Finding`: Qualitative data (e.g. Fever, Cyanosis, Tachycardia) and any other symptoms that refer to vital signs. +- `Allergen`: Allergen related extractions mentioned in the document. +- `EKG_Findings`: All mentions of EKG readings. +- `Imaging_Technique`: All mentions of special radiographic views or special imaging techniques used in radiology. +- `Triglycerides`: All mentions of terms related to the specific lab test for Triglycerides. +- `RelativeTime`: Time references that are relative to different times or events (e.g. words such as "approximately", "in the morning"). +- `Gender`: Gender-specific nouns and pronouns. +- `Pulse`: Peripheral heart rate, without advanced information like measurement location. +- `Social_History_Header`: Identifies section headers that correspond to Social History of a patient. +- `Substance_Quantity`: All mentions of substance quantity (quantitative information related to illicit/recreational drugs). +- `Diabetes`: All terms related to diabetes mellitus. +- `Modifier`: Terms that modify the symptoms, diseases or risk factors.
If a modifier is included in the ICD-10 name of a specific disease, the respective modifier is not extracted separately. +- `Internal_organ_or_component`: All mentions related to internal body parts or organs that cannot be examined by the naked eye. +- `Clinical_Dept`: Terms that indicate the medical and/or surgical departments. +- `Form`: Drug and medication forms, as described by the [FDA](http://wayback.archive-it.org/7993/20171115111313/https:/www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm). +- `Drug_BrandName`: Commercial labeling name chosen by the labeler or the drug manufacturer for a drug containing a single or multiple drug active ingredients. +- `Strength`: Potency of one unit of drug (or a combination of drugs); the available measurement units are described by the [FDA](http://wayback.archive-it.org/7993/20171115111313/https:/www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm). +- `Fetus_NewBorn`: All terms related to fetus, infant, new born (excluding terms that are extracted with their specific labels, such as "Labour_Delivery", "Pregnancy" etc.). +- `RelativeDate`: Temporal references that are relative to the date of the text or to any other specific date (e.g. "approximately two years ago", "about two days ago"). +- `Height`: All mentions related to a patient's height. +- `Test_Result`: Terms related to all the test results present in the document (clinical tests results are included). +- `Sexually_Active_or_Sexual_Orientation`: All terms that are related to sexuality, sexual orientations and sexual activity. +- `Frequency`: Frequency of administration for a dose prescribed. +- `Time`: Specific time references (hour and/or minutes). +- `Weight`: All mentions related to a patient's weight. +- `Vaccine`: Generic and brand name of vaccines or vaccination procedure.
+- `Vital_Signs_Header`: Identifies section headers that correspond to Vital Signs of a patient. +- `Communicable_Disease`: Includes all mentions of communicable diseases. +- `Dosage`: Quantity prescribed by the physician for an active ingredient; the available measurement units are described by the [FDA](http://wayback.archive-it.org/7993/20171115111313/https:/www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm). +- `Overweight`: Terms related to the patient being overweight (BMI and Obesity are extracted separately). +- `Hypertension`: All terms related to Hypertension (quantitative data such as 150/100 is extracted as Blood_Pressure). +- `HDL`: Terms related to the lab test for HDL (High Density Lipoprotein). +- `Total_Cholesterol`: Terms related to the lab test and results for cholesterol. +- `Smoking`: All mentions of smoking status of a patient. +- `Date`: Mentions of an exact date, in any format, including day number, month and/or year.
+ +## Predicted Entities + +`Injury_or_Poisoning`, `Direction`, `Test`, `Admission_Discharge`, `Death_Entity`, `Relationship_Status`, `Duration`, `Respiration`, `Hyperlipidemia`, `Birth_Entity`, `Age`, `Labour_Delivery`, `Family_History_Header`, `BMI`, `Temperature`, `Alcohol`, `Kidney_Disease`, `Oncological`, `Medical_History_Header`, `Cerebrovascular_Disease`, `Oxygen_Therapy`, `O2_Saturation`, `Psychological_Condition`, `Heart_Disease`, `Employment`, `Obesity`, `Disease_Syndrome_Disorder`, `Pregnancy`, `ImagingFindings`, `Procedure`, `Medical_Device`, `Race_Ethnicity`, `Section_Header`, `Symptom`, `Treatment`, `Substance`, `Route`, `Drug_Ingredient`, `Blood_Pressure`, `Diet`, `External_body_part_or_region`, `LDL`, `VS_Finding`, `Allergen`, `EKG_Findings`, `Imaging_Technique`, `Triglycerides`, `RelativeTime`, `Gender`, `Pulse`, `Social_History_Header`, `Substance_Quantity`, `Diabetes`, `Modifier`, `Internal_organ_or_component`, `Clinical_Dept`, `Form`, `Drug_BrandName`, `Strength`, `Fetus_NewBorn`, `RelativeDate`, `Height`, `Test_Result`, `Sexually_Active_or_Sexual_Orientation`, `Frequency`, `Time`, `Weight`, `Vaccine`, `Vaccine_Name`, `Vital_Signs_Header`, `Communicable_Disease`, `Dosage`, `Overweight`, `Hypertension`, `HDL`, `Total_Cholesterol`, `Smoking`, `Date` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/NER_JSL/){:.button.button-orange} +[Open in Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/1.Clinical_Named_Entity_Recognition_Model.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_jsl_emb_clinical_large_en_4.3.2_3.0_1681313273872.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_jsl_emb_clinical_large_en_4.3.2_3.0_1681313273872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + 
+## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") \ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +ner = MedicalNerModel.pretrained("ner_jsl_emb_clinical_large", "en", "clinical/models")\ + .setInputCols(["sentence","token","embeddings"])\ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +ner_pipeline = Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + word_embeddings, + ner, + ner_converter]) + + +data = spark.createDataFrame([["""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). Additionally, there is no side effect observed after Influenza vaccine. One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. 
His bowel movements are yellow colored and soft in nature. +"""]]).toDF("text") + +result = ner_pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val jsl_ner = MedicalNerModel.pretrained("ner_jsl_emb_clinical_large", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("ner") + +val jsl_ner_converter = new NerConverter() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + +val jsl_ner_pipeline = new Pipeline().setStages(Array( + documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + jsl_ner, + jsl_ner_converter)) + + +val data = Seq("""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). Additionally, there is no side effect observed after Influenza vaccine. One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. 
His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature.""").toDS.toDF("text") + +val result = jsl_ner_pipeline.fit(data).transform(data) +``` +
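The table under Results lists each chunk with its character offsets and label. The sketch below builds such rows from chunk annotations. `ChunkStub` and `chunk_rows` are hypothetical names: the stub only mirrors the `result`, `begin`, `end`, and `metadata` fields of a Spark NLP annotation, and the `entity` metadata key is assumed to carry the label. The sample values are copied from the first three rows of that table.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkStub:
    """Hypothetical stand-in for one chunk annotation."""
    result: str
    begin: int
    end: int
    metadata: dict = field(default_factory=dict)


def chunk_rows(chunks, label_key="entity"):
    """Flatten chunk annotations into (chunk, begin, end, label) tuples."""
    return [(c.result, c.begin, c.end, c.metadata.get(label_key, "")) for c in chunks]


text = "The patient is a 21-day-old Caucasian male"
sample = [
    ChunkStub("21-day-old", 17, 26, {"entity": "Age"}),
    ChunkStub("Caucasian", 28, 36, {"entity": "Race_Ethnicity"}),
    ChunkStub("male", 38, 41, {"entity": "Gender"}),
]

rows = chunk_rows(sample)
for chunk, begin, end, label in rows:
    # The end offset is inclusive, so the slice needs end + 1.
    assert text[begin:end + 1] == chunk
    print(f"{chunk:<12}{begin:<6}{end:<5}{label}")
```

The inline check shows that the `end` offsets in the table are inclusive, which matters when slicing the source text.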
+ +## Results + +```bash ++-----------------------------------------+-----+---+----------------------------+ +|chunk |begin|end|ner_label | ++-----------------------------------------+-----+---+----------------------------+ +|21-day-old |17 |26 |Age | +|Caucasian |28 |36 |Race_Ethnicity | +|male |38 |41 |Gender | +|for 2 days |48 |57 |Duration | +|congestion |62 |71 |Symptom | +|mom |75 |77 |Gender | +|suctioning |88 |97 |Modifier | +|yellow |99 |104|Modifier | +|discharge |106 |114|Symptom | +|nares |135 |139|External_body_part_or_region| +|she |147 |149|Gender | +|mild |168 |171|Modifier | +|problems with his breathing while feeding|173 |213|Symptom | +|perioral cyanosis |237 |253|Symptom | +|retractions |258 |268|Symptom | +|Influenza vaccine |325 |341|Vaccine_Name | +|One day ago |344 |354|RelativeDate | +|mom |357 |359|Gender | +|tactile temperature |376 |394|Symptom | +|Tylenol |417 |423|Drug_BrandName | ++-----------------------------------------+-----+---+----------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_jsl_emb_clinical_large| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, word_embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 MB| + +## Benchmarking + +```bash + label precision recall f1-score support + Internal_organ_or_component 0.88 0.90 0.89 10419 + Injury_or_Poisoning 0.90 0.78 0.83 945 + Diabetes 0.99 0.97 0.98 146 + Drug_Ingredient 0.92 0.92 0.92 1988 + Frequency 0.91 0.91 0.91 1110 + Height 0.97 0.88 0.92 81 + Disease_Syndrome_Disorder 0.83 0.90 0.86 4909 + Strength 0.96 0.90 0.93 848 + Form 0.85 0.79 0.82 261 + Symptom 0.87 0.80 0.83 11966 + Route 0.91 0.92 0.91 976 + Procedure 0.88 0.88 0.88 6395 + Gender 0.99 0.99 0.99 5686 + RelativeTime 0.82 0.68 0.75 367 + Vaccine 0.50 0.14 0.22 14 + Psychological_Condition 0.89 0.70 0.78 186 + Direction 0.89 0.92 0.90 4447 + External_body_part_or_region 
0.88 0.84 0.86 3246 + Section_Header 0.98 0.96 0.97 9564 + Age 0.90 0.92 0.91 750 + Modifier 0.85 0.76 0.81 3027 + Heart_Disease 0.97 0.82 0.89 849 + Drug_BrandName 0.92 0.93 0.92 1011 + Hyperlipidemia 0.93 0.84 0.88 31 + Test 0.89 0.83 0.86 4337 + Oncological 0.93 0.94 0.94 781 + Labour_Delivery 0.78 0.68 0.73 158 + Clinical_Dept 0.94 0.91 0.93 1714 + Treatment 0.84 0.78 0.81 347 + Oxygen_Therapy 0.81 0.80 0.81 120 + Duration 0.81 0.87 0.84 927 + Admission_Discharge 0.94 0.94 0.94 343 + RelativeDate 0.91 0.86 0.89 1403 + Hypertension 0.87 0.97 0.91 122 + Employment 0.90 0.76 0.83 369 + Dosage 0.87 0.84 0.85 461 + Medical_Device 0.88 0.92 0.90 5499 + Test_Result 0.85 0.78 0.81 1321 + Time 0.73 0.65 0.69 34 + Date 0.97 0.94 0.95 591 + Obesity 0.88 1.00 0.94 45 + Race_Ethnicity 0.99 0.99 0.99 120 + Imaging_Technique 0.75 0.42 0.54 91 + ImagingFindings 0.69 0.40 0.50 291 + Cerebrovascular_Disease 0.85 0.65 0.74 133 + Diet 0.78 0.52 0.62 114 + Fetus_NewBorn 0.75 0.53 0.62 180 + Kidney_Disease 0.95 0.94 0.94 168 + Weight 0.90 0.91 0.91 243 + Blood_Pressure 0.84 0.84 0.84 336 + Pulse 0.83 0.96 0.89 311 + Temperature 0.88 0.96 0.92 182 + O2_Saturation 0.90 0.64 0.75 95 + VS_Finding 0.72 0.74 0.73 311 + Death_Entity 0.79 0.66 0.72 50 + Total_Cholesterol 0.72 0.87 0.79 30 + Substance 0.94 0.89 0.92 103 + Relationship_Status 0.93 0.81 0.87 48 + Alcohol 0.92 0.87 0.90 84 + Vital_Signs_Header 0.93 0.99 0.96 656 + Respiration 0.94 0.95 0.95 156 + Family_History_Header 0.97 0.99 0.98 224 + Pregnancy 0.82 0.69 0.75 203 + Smoking 0.98 0.98 0.98 109 + Vaccine_Name 0.89 0.55 0.68 31 + EKG_Findings 0.64 0.27 0.38 154 + Allergen 0.60 0.75 0.67 12 + Medical_History_Header 0.95 0.96 0.95 411 + Social_History_Header 0.91 0.97 0.94 213 + Overweight 0.83 0.83 0.83 6 + Communicable_Disease 0.73 0.51 0.60 47 + Birth_Entity 0.00 0.00 0.00 6 + Triglycerides 1.00 1.00 1.00 4 + HDL 0.62 1.00 0.77 5 + LDL 1.00 1.00 1.00 5 + BMI 1.00 1.00 1.00 17 +Sexually_Active_or_Sexual_Orientation 1.00 0.57 
0.73 7 + micro-avg 0.90 0.88 0.89 92950 + macro-avg 0.86 0.81 0.83 92950 + weighted-avg 0.90 0.88 0.89 92950 +``` \ No newline at end of file diff --git a/docs/_posts/Damla-Gurbaz/2023-04-12-ner_jsl_emb_clinical_medium_en.md b/docs/_posts/Damla-Gurbaz/2023-04-12-ner_jsl_emb_clinical_medium_en.md new file mode 100644 index 0000000000..1eefd5abe3 --- /dev/null +++ b/docs/_posts/Damla-Gurbaz/2023-04-12-ner_jsl_emb_clinical_medium_en.md @@ -0,0 +1,326 @@ +--- +layout: model +title: Detect Clinical Entities (clinical_medium) +author: John Snow Labs +name: ner_jsl_emb_clinical_medium +date: 2023-04-12 +tags: [ner, licensed, clinical, en, clinical_medium] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained named entity recognition deep learning model for clinical terminology. The SparkNLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art model for NER: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNN. This model is the official version of the jsl_ner_wip_clinical model. + +Definitions of Predicted Entities: + +- `Injury_or_Poisoning`: Physical harm or injury caused to the body, including those caused by accidents, falls, or poisoning of a patient or someone else. +- `Direction`: All the information relating to the laterality of the internal and external organs. +- `Test`: Mentions of laboratory, pathology, and radiological tests. +- `Admission_Discharge`: Terms that indicate the admission and/or the discharge of a patient. +- `Death_Entity`: Mentions that indicate the death of a patient. +- `Relationship_Status`: State of a patient's romantic or social relationships (e.g. single, married, divorced). +- `Duration`: The duration of a medical treatment or medication use. +- `Respiration`: Number of breaths per minute.
+- `Hyperlipidemia`: Terms that indicate hyperlipidemia with relevant subtypes and synonyms. +- `Birth_Entity`: Mentions that indicate giving birth. +- `Age`: All mentions of ages, past or present, related to the patient or anybody else. +- `Labour_Delivery`: Extractions include stages of labor and delivery. +- `Family_History_Header`: Identifies section headers that correspond to Family History of the patient. +- `BMI`: Numeric values and other text information related to Body Mass Index. +- `Temperature`: All mentions that refer to body temperature. +- `Alcohol`: Terms that indicate alcohol use, abuse or drinking issues of a patient or someone else. +- `Kidney_Disease`: Terms that refer to any kidney diseases (includes mentions of modifiers such as "Acute" or "Chronic"). +- `Oncological`: All the cancer, tumor or metastasis related extractions mentioned in the document, of the patient or someone else. +- `Medical_History_Header`: Identifies section headers that correspond to Past Medical History of a patient. +- `Cerebrovascular_Disease`: All terms that refer to cerebrovascular diseases and events. +- `Oxygen_Therapy`: Breathing support triggered by patient or entirely or partially by machine (e.g. ventilator, BPAP, CPAP). +- `O2_Saturation`: Systemic arterial, venous or peripheral oxygen saturation measurements. +- `Psychological_Condition`: All mental health diagnoses, disorders, conditions or syndromes of a patient or someone else. +- `Heart_Disease`: All mentions of acquired, congenital or degenerative heart diseases. +- `Employment`: All mentions of patient or provider occupational titles and employment status. +- `Obesity`: Terms related to a patient being obese (overweight and BMI are extracted as different labels). +- `Disease_Syndrome_Disorder`: All the diseases mentioned in the document, of the patient or someone else (excluding diseases that are extracted with their specific labels, such as "Heart_Disease" etc.).
+- `Pregnancy`: All terms related to Pregnancy (excluding terms that are extracted with their specific labels, such as "Labour_Delivery" etc.). +- `ImagingFindings`: All mentions of radiographic and imaging findings. +- `Procedure`: All mentions of invasive medical or surgical procedures or treatments. +- `Medical_Device`: All mentions related to medical devices and supplies. +- `Race_Ethnicity`: All terms that refer to racial and national origin of sociocultural groups. +- `Section_Header`: All the section headers present in the text (Medical History, Family History, Social History, Physical Examination and Vital signs Headers are extracted separately with their specific labels). +- `Symptom`: All the symptoms mentioned in the document, of a patient or someone else. +- `Treatment`: Includes therapeutic and minimally invasive treatment and procedures (invasive treatments or procedures are extracted as "Procedure"). +- `Substance`: All mentions of substance use related to the patient or someone else (recreational drugs, illicit drugs). +- `Route`: Drug and medication administration routes, as described by the [FDA](http://wayback.archive-it.org/7993/20171115111313/https:/www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm). +- `Drug_Ingredient`: Active ingredient/s found in drug products. +- `Blood_Pressure`: Systemic blood pressure, mean arterial pressure, systolic and/or diastolic are extracted. +- `Diet`: All mentions and information regarding a patient's dietary habits. +- `External_body_part_or_region`: All mentions related to external body parts or organs that can be examined with the naked eye. +- `LDL`: All mentions related to the lab test and results for LDL (Low Density Lipoprotein). +- `VS_Finding`: Qualitative data (e.g. Fever, Cyanosis, Tachycardia) and any other symptoms that refer to vital signs. +- `Allergen`: Allergen related extractions mentioned in the document. 
+- `EKG_Findings`: All mentions of EKG readings. +- `Imaging_Technique`: All mentions of special radiographic views or special imaging techniques used in radiology. +- `Triglycerides`: All mentions of terms related to the specific lab test for triglycerides. +- `RelativeTime`: Time references that are relative to different times or events (e.g. words such as "approximately", "in the morning"). +- `Gender`: Gender-specific nouns and pronouns. +- `Pulse`: Peripheral heart rate, without advanced information like measurement location. +- `Social_History_Header`: Identifies section headers that correspond to Social History of a patient. +- `Substance_Quantity`: All mentions of substance quantity (quantitative information related to illicit/recreational drugs). +- `Diabetes`: All terms related to diabetes mellitus. +- `Modifier`: Terms that modify the symptoms, diseases or risk factors. If a modifier is included in the ICD-10 name of a specific disease, the respective modifier is not extracted separately. +- `Internal_organ_or_component`: All mentions related to internal body parts or organs that cannot be examined with the naked eye. +- `Clinical_Dept`: Terms that indicate the medical and/or surgical departments. +- `Form`: Drug and medication forms, as described by the [FDA](http://wayback.archive-it.org/7993/20171115111313/https:/www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm). +- `Drug_BrandName`: Commercial labeling name chosen by the labeler or the drug manufacturer for a drug containing a single or multiple drug active ingredients. +- `Strength`: Potency of one unit of a drug (or a combination of drugs); the available measurement units are described by the [FDA](http://wayback.archive-it.org/7993/20171115111313/https:/www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm). 
+- `Fetus_NewBorn`: All terms related to fetus, infant, newborn (excluding terms that are extracted with their specific labels, such as "Labour_Delivery", "Pregnancy" etc.). +- `RelativeDate`: Temporal references that are relative to the date of the text or to any other specific date (e.g. "approximately two years ago", "about two days ago"). +- `Height`: All mentions related to a patient's height. +- `Test_Result`: Terms related to all the test results present in the document (clinical tests results are included). +- `Sexually_Active_or_Sexual_Orientation`: All terms that are related to sexuality, sexual orientations and sexual activity. +- `Frequency`: Frequency of administration for a dose prescribed. +- `Time`: Specific time references (hour and/or minutes). +- `Weight`: All mentions related to a patient's weight. +- `Vaccine`: Generic and brand name of vaccines or vaccination procedure. +- `Vital_Signs_Header`: Identifies section headers that correspond to Vital Signs of a patient. +- `Communicable_Disease`: Includes all mentions of communicable diseases. +- `Dosage`: Quantity prescribed by the physician for an active ingredient; the available measurement units are described by the [FDA](http://wayback.archive-it.org/7993/20171115111313/https:/www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm). +- `Overweight`: Terms related to the patient being overweight (BMI and Obesity are extracted separately). +- `Hypertension`: All terms related to Hypertension (quantitative data such as 150/100 is extracted as Blood_Pressure). +- `HDL`: Terms related to the lab test for HDL (High Density Lipoprotein). +- `Total_Cholesterol`: Terms related to the lab test and results for cholesterol. +- `Smoking`: All mentions of smoking status of a patient. +- `Date`: Mentions of an exact date, in any format, including day number, month and/or year. 
+ +## Predicted Entities + +`Injury_or_Poisoning`, `Direction`, `Test`, `Admission_Discharge`, `Death_Entity`, `Relationship_Status`, `Duration`, `Respiration`, `Hyperlipidemia`, `Birth_Entity`, `Age`, `Labour_Delivery`, `Family_History_Header`, `BMI`, `Temperature`, `Alcohol`, `Kidney_Disease`, `Oncological`, `Medical_History_Header`, `Cerebrovascular_Disease`, `Oxygen_Therapy`, `O2_Saturation`, `Psychological_Condition`, `Heart_Disease`, `Employment`, `Obesity`, `Disease_Syndrome_Disorder`, `Pregnancy`, `ImagingFindings`, `Procedure`, `Medical_Device`, `Race_Ethnicity`, `Section_Header`, `Symptom`, `Treatment`, `Substance`, `Route`, `Drug_Ingredient`, `Blood_Pressure`, `Diet`, `External_body_part_or_region`, `LDL`, `VS_Finding`, `Allergen`, `EKG_Findings`, `Imaging_Technique`, `Triglycerides`, `RelativeTime`, `Gender`, `Pulse`, `Social_History_Header`, `Substance_Quantity`, `Diabetes`, `Modifier`, `Internal_organ_or_component`, `Clinical_Dept`, `Form`, `Drug_BrandName`, `Strength`, `Fetus_NewBorn`, `RelativeDate`, `Height`, `Test_Result`, `Sexually_Active_or_Sexual_Orientation`, `Frequency`, `Time`, `Weight`, `Vaccine`, `Vaccine_Name`, `Vital_Signs_Header`, `Communicable_Disease`, `Dosage`, `Overweight`, `Hypertension`, `HDL`, `Total_Cholesterol`, `Smoking`, `Date` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/NER_JSL/){:.button.button-orange} +[Open in Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/1.Clinical_Named_Entity_Recognition_Model.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_jsl_emb_clinical_medium_en_4.3.2_3.0_1681306334405.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_jsl_emb_clinical_medium_en_4.3.2_3.0_1681306334405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} 
+ +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") \ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +ner = MedicalNerModel.pretrained("ner_jsl_emb_clinical_medium", "en", "clinical/models")\ + .setInputCols(["sentence","token","embeddings"])\ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +ner_pipeline = Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + word_embeddings, + ner, + ner_converter]) + + +data = spark.createDataFrame([["""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). Additionally, there is no side effect observed after Influenza vaccine. One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. 
His bowel movements are yellow colored and soft in nature. +"""]]).toDF("text") + +result = ner_pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val jsl_ner = MedicalNerModel.pretrained("ner_jsl_emb_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("ner") + +val jsl_ner_converter = new NerConverter() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + +val jsl_ner_pipeline = new Pipeline().setStages(Array( + documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + jsl_ner, + jsl_ner_converter)) + + +val data = Seq("""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). Additionally, there is no side effect observed after Influenza vaccine. One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. 
His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature.""").toDS.toDF("text") + +val result = jsl_ner_pipeline.fit(data).transform(data) +``` +
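The transformed `result` keeps the extracted chunks as annotations in the `ner_chunk` column. As a minimal sketch of how they can be flattened into `(chunk, begin, end, ner_label)` rows like those in the Results table, the helper below works on plain dicts. It assumes each annotation has already been converted to a dict (for example with PySpark's `Row.asDict(recursive=True)`) and that the label sits under the `entity` metadata key. `chunks_to_rows` is a hypothetical helper written for illustration, not part of Spark NLP.

```python
def chunks_to_rows(annotations):
    """Flatten chunk annotations into (chunk, begin, end, ner_label) tuples.

    `annotations` is assumed to be a list of dicts with the keys
    `result`, `begin`, `end` and `metadata` (an illustrative input shape).
    """
    rows = []
    for ann in annotations:
        # Assumption: the entity label is stored under the "entity" metadata key.
        label = ann.get("metadata", {}).get("entity", "")
        rows.append((ann["result"], ann["begin"], ann["end"], label))
    return rows


# Two hand-written annotations mirroring the first rows of the Results table.
sample = [
    {"result": "21-day-old", "begin": 17, "end": 26, "metadata": {"entity": "Age"}},
    {"result": "Caucasian", "begin": 28, "end": 36, "metadata": {"entity": "Race_Ethnicity"}},
]

for row in chunks_to_rows(sample):
    print(row)
```

Working on plain dicts keeps the helper independent of a running Spark session, so it can be checked on a handful of collected rows before being applied to a full DataFrame.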
+ +## Results + +```bash ++-----------------------------------------+-----+---+----------------------------+ +|chunk |begin|end|ner_label | ++-----------------------------------------+-----+---+----------------------------+ +|21-day-old |17 |26 |Age | +|Caucasian |28 |36 |Race_Ethnicity | +|male |38 |41 |Gender | +|2 days |52 |57 |Duration | +|congestion |62 |71 |Symptom | +|mom |75 |77 |Gender | +|suctioning yellow discharge |88 |114|Symptom | +|nares |135 |139|External_body_part_or_region| +|she |147 |149|Gender | +|mild |168 |171|Modifier | +|problems with his breathing while feeding|173 |213|Symptom | +|perioral cyanosis |237 |253|Symptom | +|retractions |258 |268|Symptom | +|Influenza vaccine |325 |341|Vaccine_Name | +|One day ago |344 |354|RelativeDate | +|mom |357 |359|Gender | +|tactile temperature |376 |394|Symptom | +|Tylenol |417 |423|Drug_BrandName | +|Baby |426 |429|Age | +|decreased p.o |449 |461|Symptom | ++-----------------------------------------+-----+---+----------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_jsl_emb_clinical_medium| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|15.2 MB| + +## Benchmarking + +```bash + label precision recall f1-score support + Drug_Ingredient 0.91 0.94 0.93 1905 + Disease_Syndrome_Disorder 0.85 0.89 0.87 4949 + Drug_BrandName 0.94 0.92 0.93 963 + Strength 0.95 0.93 0.94 759 + Route 0.90 0.94 0.92 943 + Internal_organ_or_component 0.89 0.90 0.89 10310 + Dosage 0.95 0.78 0.86 478 + Frequency 0.90 0.87 0.88 1016 + Treatment 0.90 0.70 0.78 332 + Procedure 0.85 0.91 0.88 6433 + Gender 0.98 0.99 0.99 5586 + RelativeTime 0.79 0.70 0.74 306 + Direction 0.91 0.90 0.91 4344 + Modifier 0.84 0.82 0.83 2863 + Symptom 0.84 0.83 0.84 11599 + Date 0.94 0.98 0.96 546 + External_body_part_or_region 0.89 0.85 0.87 3270 + Section_Header 
0.98 0.97 0.98 9320 + Age 0.85 0.92 0.88 757 + Substance 0.91 0.85 0.88 113 + VS_Finding 0.86 0.60 0.70 304 + Medical_Device 0.87 0.93 0.90 5475 + Oxygen_Therapy 0.89 0.85 0.87 117 + Test 0.87 0.87 0.87 4491 + Diabetes 0.95 0.97 0.96 149 + Duration 0.86 0.86 0.86 1009 + ImagingFindings 0.83 0.50 0.63 353 + Hyperlipidemia 0.80 0.87 0.84 47 + Hypertension 0.97 0.94 0.96 152 + RelativeDate 0.90 0.89 0.89 1338 + Clinical_Dept 0.92 0.94 0.93 1771 + Kidney_Disease 0.90 0.96 0.93 228 + Heart_Disease 0.93 0.85 0.89 967 + Diet 0.67 0.62 0.65 106 + Weight 0.93 0.93 0.93 254 + Test_Result 0.79 0.81 0.80 1470 + Form 0.85 0.85 0.85 254 + Time 0.80 0.72 0.76 76 + Psychological_Condition 0.81 0.76 0.79 187 + Injury_or_Poisoning 0.84 0.80 0.82 889 + Admission_Discharge 0.91 0.97 0.94 301 + Labour_Delivery 0.74 0.73 0.73 110 + Employment 0.90 0.73 0.81 389 + Vaccine 0.83 0.33 0.48 15 + Obesity 0.92 0.91 0.92 54 + Oncological 0.92 0.93 0.93 784 + Smoking 0.96 0.94 0.95 106 + Imaging_Technique 0.70 0.53 0.60 98 + Blood_Pressure 0.86 0.87 0.86 314 + Pulse 0.87 0.92 0.89 278 + Respiration 0.96 0.94 0.95 180 + O2_Saturation 0.86 0.77 0.81 96 + Medical_History_Header 0.93 0.99 0.96 396 + Total_Cholesterol 0.88 0.37 0.52 19 + Cerebrovascular_Disease 0.75 0.78 0.76 108 + Pregnancy 0.90 0.71 0.79 201 + Death_Entity 0.85 0.76 0.80 46 + EKG_Findings 0.78 0.43 0.56 186 + Race_Ethnicity 0.97 0.99 0.98 118 + Family_History_Header 0.97 0.99 0.98 273 + Alcohol 0.84 0.95 0.89 84 + Fetus_NewBorn 0.75 0.54 0.63 235 + Vital_Signs_Header 0.96 0.94 0.95 710 + Relationship_Status 0.93 0.95 0.94 41 + Height 0.98 0.85 0.91 68 + Temperature 0.91 0.96 0.94 141 + Triglycerides 0.40 0.40 0.40 10 + LDL 0.94 0.68 0.79 22 + Social_History_Header 0.98 0.97 0.98 259 + Communicable_Disease 0.79 0.62 0.70 50 + Overweight 0.86 0.86 0.86 7 + Allergen 0.00 0.00 0.00 23 + Substance_Quantity 0.33 1.00 0.50 2 + Vaccine_Name 0.68 1.00 0.81 15 + Birth_Entity 0.00 0.00 0.00 3 + BMI 1.00 0.67 0.80 15 
+Sexually_Active_or_Sexual_Orientation 1.00 0.80 0.89 5 + HDL 1.00 1.00 1.00 2 + micro-avg 0.89 0.89 0.89 92193 + macro-avg 0.85 0.81 0.82 92193 + weighted-avg 0.89 0.89 0.89 92193 +``` \ No newline at end of file diff --git a/docs/_posts/HashamUlHaq/2023-03-25-summarizer_clinical_jsl.md b/docs/_posts/HashamUlHaq/2023-03-25-summarizer_clinical_jsl.md new file mode 100644 index 0000000000..9be6fc6988 --- /dev/null +++ b/docs/_posts/HashamUlHaq/2023-03-25-summarizer_clinical_jsl.md @@ -0,0 +1,116 @@ +--- +layout: model +title: Summarize clinical notes +author: John Snow Labs +name: summarizer_clinical_jsl +date: 2023-03-25 +tags: [en, licensed, clinical, summarization, tensorflow] +task: Summarization +language: en +edition: Healthcare NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalSummarizer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Summarize clinical notes, encounters, critical care notes, discharge notes, reports, etc. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/summarizer_clinical_jsl_en_4.3.1_3.0_1679772340755.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/summarizer_clinical_jsl_en_4.3.1_3.0_1679772340755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document = DocumentAssembler().setInputCol('text').setOutputCol('document') + +summarizer = MedicalSummarizer.pretrained("summarizer_clinical_jsl", "en", "clinical/models").setInputCols(['document'])\ + .setOutputCol('summary')\ + .setMaxTextLength(512)\ + .setMaxNewTokens(512) + +pipeline = sparknlp.base.Pipeline(stages=[ + document, + summarizer +]) + +text = """Patient with hypertension, syncope, and spinal stenosis - for recheck. +(Medical Transcription Sample Report) +SUBJECTIVE: +The patient is a 78-year-old female who returns for recheck. She has hypertension. She denies difficulty with chest pain, palpations, orthopnea, nocturnal dyspnea, or edema. +PAST MEDICAL HISTORY / SURGERY / HOSPITALIZATIONS: +Reviewed and unchanged from the dictation on 12/03/2003. +MEDICATIONS: +Atenolol 50 mg daily, Premarin 0.625 mg daily, calcium with vitamin D two to three pills daily, multivitamin daily, aspirin as needed, and TriViFlor 25 mg two pills daily. She also has Elocon cream 0.1% and Synalar cream 0.01% that she uses as needed for rash.""" + +data = spark.createDataFrame([[text]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val summarizer = MedicalSummarizer.pretrained("summarizer_clinical_jsl", "en", "clinical/models") + .setInputCols(Array("document")) + .setOutputCol("summary") + .setMaxTextLength(512) + .setMaxNewTokens(512) + +val pipeline = new Pipeline().setStages(Array(document_assembler, summarizer)) + +val text = """Patient with hypertension, syncope, and spinal stenosis - for recheck. +(Medical Transcription Sample Report) +SUBJECTIVE: +The patient is a 78-year-old female who returns for recheck. She has hypertension. She denies difficulty with chest pain, palpations, orthopnea, nocturnal dyspnea, or edema. 
+PAST MEDICAL HISTORY / SURGERY / HOSPITALIZATIONS: +Reviewed and unchanged from the dictation on 12/03/2003. +MEDICATIONS: +Atenolol 50 mg daily, Premarin 0.625 mg daily, calcium with vitamin D two to three pills daily, multivitamin daily, aspirin as needed, and TriViFlor 25 mg two pills daily. She also has Elocon cream 0.1% and Synalar cream 0.01% that she uses as needed for rash.""" + +val data = Seq(text).toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) + +``` +
+ +## Results + +```bash +A 78-year-old female with hypertension, syncope, and spinal stenosis returns for recheck. She denies chest pain, palpations, orthopnea, nocturnal dyspnea, or edema. She is on multiple medications and has Elocon cream and Synalar cream for rash. +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|summarizer_clinical_jsl| +|Compatibility:|Healthcare NLP 4.3.1+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|920.1 MB| + +## References + +Trained on an in-house curated dataset. \ No newline at end of file diff --git a/docs/_posts/Meryem1425/2023-04-12-ner_oncology_emb_clinical_large_en.md b/docs/_posts/Meryem1425/2023-04-12-ner_oncology_emb_clinical_large_en.md new file mode 100644 index 0000000000..41bf6f0fdc --- /dev/null +++ b/docs/_posts/Meryem1425/2023-04-12-ner_oncology_emb_clinical_large_en.md @@ -0,0 +1,270 @@ +--- +layout: model +title: Detect Oncology-Specific Entities (clinical_large) +author: John Snow Labs +name: ner_oncology_emb_clinical_large +date: 2023-04-12 +tags: [licensed, clinical, en, oncology, biomarker, treatment, ner, clinical_large] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model extracts more than 40 oncology-related entities, including therapies, tests and staging. + +Definitions of Predicted Entities: + +`Adenopathy:` Mentions of pathological findings of the lymph nodes. +`Age:` All mentions of ages, past or present, related to the patient or to anybody else. +`Biomarker:` Biological molecules that indicate the presence or absence of cancer, or the type of cancer. Oncogenes are excluded from this category. +`Biomarker_Result:` Terms or values that are identified as the result of a biomarker. 
+`Cancer_Dx:` Mentions of cancer diagnoses (such as “breast cancer”) or pathological types that are usually used as synonyms for “cancer” (e.g. “carcinoma”). When anatomical references are present, they are included in the Cancer_Dx extraction. +`Cancer_Score:` Clinical or imaging scores that are specific for cancer settings (e.g. “BI-RADS” or “Allred score”). +`Cancer_Surgery:` Terms that indicate surgery as a form of cancer treatment. +`Chemotherapy:` Mentions of chemotherapy drugs, or unspecific words such as “chemotherapy”. +`Cycle_Count:` The total number of cycles being administered of an oncological therapy (e.g. “5 cycles”). +`Cycle_Day:` References to the day of the cycle of oncological therapy (e.g. “day 5”). +`Cycle_Number:` The number of the cycle of an oncological therapy that is being applied (e.g. “third cycle”). +`Date:` Mentions of exact dates, in any format, including day number, month and/or year. +`Death_Entity:` Words that indicate the death of the patient or someone else (including family members), such as “died” or “passed away”. +`Direction:` Directional and laterality terms, such as “left”, “right”, “bilateral”, “upper” and “lower”. +`Dosage:` The quantity prescribed by the physician for an active ingredient. +`Duration:` Words indicating the duration of a treatment (e.g. “for 2 weeks”). +`Frequency:` Words indicating the frequency of treatment administration (e.g. “daily” or “bid”). +`Gender:` Gender-specific nouns and pronouns (including words such as “him” or “she”, and family members such as “father”). +`Grade:` All pathological grading of tumors (e.g. “grade 1”) or degrees of cellular differentiation (e.g. “well-differentiated”). +`Histological_Type:` Histological variants or cancer subtypes, such as “papillary”, “clear cell” or “medullary”. +`Hormonal_Therapy:` Mentions of hormonal drugs used to treat cancer, or unspecific words such as “hormonal therapy”. +`Imaging_Test:` Imaging tests mentioned in texts, such as “chest CT scan”. 
+`Immunotherapy:` Mentions of immunotherapy drugs, or unspecific words such as “immunotherapy”. +`Invasion:` Mentions that refer to tumor invasion, such as “invasion” or “involvement”. Metastases or lymph node involvement are excluded from this category. +`Line_Of_Therapy:` Explicit references to the line of therapy of an oncological therapy (e.g. “first-line treatment”). +`Metastasis:` Terms that indicate a metastatic disease. Anatomical references are not included in these extractions. +`Oncogene:` Mentions of genes that are implicated in the etiology of cancer. +`Pathology_Result:` The findings of a biopsy from the pathology report that is not covered by another entity (e.g. “malignant ductal cells”). +`Pathology_Test:` Mentions of biopsies or tests that use tissue samples. +`Performance_Status:` Mentions of performance status scores, such as ECOG and Karnofsky. The name of the score is extracted together with the result (e.g. “ECOG performance status of 4”). +`Race_Ethnicity:` The race and ethnicity categories include racial and national origin or sociocultural groups. +`Radiotherapy:` Terms that indicate the use of Radiotherapy. +`Response_To_Treatment:` Terms related to clinical progress of the patient related to cancer treatment, including “recurrence”, “bad response” or “improvement”. +`Relative_Date:` Temporal references that are relative to the date of the text or to any other specific date (e.g. “yesterday” or “three years later”). +`Route:` Words indicating the type of administration route (such as “PO” or “transdermal”). +`Site_Bone:` Anatomical terms that refer to the human skeleton. +`Site_Brain:` Anatomical terms that refer to the central nervous system (including the brain stem and the cerebellum). +`Site_Breast:` Anatomical terms that refer to the breasts. +`Site_Liver:` Anatomical terms that refer to the liver. +`Site_Lung:` Anatomical terms that refer to the lungs. 
+`Site_Lymph_Node:` Anatomical terms that refer to lymph nodes, excluding adenopathies. +`Site_Other_Body_Part:` Relevant anatomical terms that are not included in the rest of the anatomical entities. +`Smoking_Status:` All mentions of smoking related to the patient or to someone else. +`Staging:` Mentions of cancer stage such as “stage 2b” or “T2N1M0”. It also includes words such as “in situ”, “early-stage” or “advanced”. +`Targeted_Therapy:` Mentions of targeted therapy drugs, or unspecific words such as “targeted therapy”. +`Tumor_Finding:` All nonspecific terms that may be related to tumors, either malignant or benign (for example: “mass”, “tumor”, “lesion”, or “neoplasm”). +`Tumor_Size:` Size of the tumor, including numerical value and unit of measurement (e.g. “3 cm”). +`Unspecific_Therapy:` Terms that indicate a known cancer therapy but that is not specific to any other therapy entity (e.g. “chemoradiotherapy” or “adjuvant therapy”). + +## Predicted Entities + +`Histological_Type`, `Direction`, `Staging`, `Cancer_Score`, `Imaging_Test`, `Cycle_Number`, `Tumor_Finding`, `Site_Lymph_Node`, `Invasion`, `Response_To_Treatment`, `Smoking_Status`, `Tumor_Size`, `Cycle_Count`, `Adenopathy`, `Age`, `Biomarker_Result`, `Unspecific_Therapy`, `Site_Breast`, `Chemotherapy`, `Targeted_Therapy`, `Radiotherapy`, `Performance_Status`, `Pathology_Test`, `Site_Other_Body_Part`, `Cancer_Surgery`, `Line_Of_Therapy`, `Pathology_Result`, `Hormonal_Therapy`, `Site_Bone`, `Biomarker`, `Immunotherapy`, `Cycle_Day`, `Frequency`, `Route`, `Duration`, `Death_Entity`, `Metastasis`, `Site_Liver`, `Cancer_Dx`, `Grade`, `Date`, `Site_Lung`, `Site_Brain`, `Relative_Date`, `Race_Ethnicity`, `Gender`, `Oncogene`, `Dosage`, `Radiation_Dose` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/NER_ONCOLOGY_CLINICAL/){:.button.button-orange} +[Open in 
Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/27.Oncology_Model.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_oncology_emb_clinical_large_en_4.3.2_3.0_1681316109615.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_oncology_emb_clinical_large_en_4.3.2_3.0_1681316109615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols(["sentence"]) \ + .setOutputCol("token") + +embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large","en","clinical/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("word_embeddings") + +ner = MedicalNerModel.pretrained("ner_oncology_emb_clinical_large", "en", "clinical/models")\ + .setInputCols(["sentence", "token", "word_embeddings"]) \ + .setOutputCol("ner") + +ner_converter = NerConverterInternal() \ + .setInputCols(["sentence", "token", "ner"]) \ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner, + ner_converter]) + +data = spark.createDataFrame([["""The had previously undergone a left mastectomy and an axillary lymph node dissection for a left breast cancer twenty years ago. +The tumor was positive for ER and PR. Postoperatively, radiotherapy was administered to the residual breast. +The cancer recurred as a right lung metastasis 13 years later. 
The patient underwent a regimen consisting of adriamycin (60 mg/m2) and cyclophosphamide (600 mg/m2) over six courses, as first line therapy."""]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large","en","clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("word_embeddings") + +val ner = MedicalNerModel.pretrained("ner_oncology_emb_clinical_large", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "word_embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner, + ner_converter)) + +val data = Seq("""The had previously undergone a left mastectomy and an axillary lymph node dissection for a left breast cancer twenty years ago. +The tumor was positive for ER and PR. Postoperatively, radiotherapy was administered to the residual breast. +The cancer recurred as a right lung metastasis 13 years later. The patient underwent a regimen consisting of adriamycin (60 mg/m2) and cyclophosphamide (600 mg/m2) over six courses, as first line therapy.""").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
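Once chunks and labels have been pulled out of `ner_chunk`, a common next step is to collect them per entity type, for instance to list every chemotherapy drug together with the dosages found in the note. The sketch below groups `(chunk, ner_label)` pairs by label. The pairs are hand-copied from the Results table for illustration, and `group_chunks_by_label` is a hypothetical helper, not a Spark NLP function.

```python
from collections import defaultdict


def group_chunks_by_label(pairs):
    """Group (chunk, ner_label) pairs into a dict of label -> list of chunks."""
    grouped = defaultdict(list)
    for chunk, label in pairs:
        # Chunks keep the order in which they appear in the text.
        grouped[label].append(chunk)
    return dict(grouped)


# Pairs copied by hand from the last rows of the Results table.
pairs = [
    ("adriamycin", "Chemotherapy"),
    ("60 mg/m2", "Dosage"),
    ("cyclophosphamide", "Chemotherapy"),
    ("600 mg/m2", "Dosage"),
    ("six courses", "Cycle_Count"),
]

grouped = group_chunks_by_label(pairs)
print(grouped["Chemotherapy"])  # ['adriamycin', 'cyclophosphamide']
print(grouped["Dosage"])        # ['60 mg/m2', '600 mg/m2']
```

Grouping by label alone does not link a dosage to its drug. Pairing them reliably would need positional information or a relation extraction step, which is outside this sketch.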
+ +## Results + +```bash ++------------------------------+-----+---+---------------------+ +|chunk |begin|end|ner_label | ++------------------------------+-----+---+---------------------+ +|left |31 |34 |Direction | +|mastectomy |36 |45 |Cancer_Surgery | +|axillary lymph node dissection|54 |83 |Cancer_Surgery | +|left |91 |94 |Direction | +|breast cancer |96 |108|Cancer_Dx | +|twenty years ago |110 |125|Relative_Date | +|tumor |132 |136|Tumor_Finding | +|positive |142 |149|Biomarker_Result | +|ER |155 |156|Biomarker | +|PR |162 |163|Biomarker | +|radiotherapy |183 |194|Radiotherapy | +|breast |229 |234|Site_Breast | +|cancer |241 |246|Cancer_Dx | +|recurred |248 |255|Response_To_Treatment| +|right |262 |266|Direction | +|lung |268 |271|Site_Lung | +|metastasis |273 |282|Metastasis | +|13 years later |284 |297|Relative_Date | +|adriamycin |346 |355|Chemotherapy | +|60 mg/m2 |358 |365|Dosage | +|cyclophosphamide |372 |387|Chemotherapy | +|600 mg/m2 |390 |398|Dosage | +|six courses |406 |416|Cycle_Count | +|first line |422 |431|Line_Of_Therapy | ++------------------------------+-----+---+---------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_oncology_emb_clinical_large| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, word_embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|15.3 MB| + +## Benchmarking + +```bash + label tp fp fn total precision recall f1 + Histological_Type 141.0 40.0 70.0 211.0 0.779 0.6682 0.7194 + Direction 672.0 150.0 159.0 831.0 0.8175 0.8087 0.8131 + Staging 102.0 27.0 36.0 138.0 0.7907 0.7391 0.764 + Cancer_Score 10.0 2.0 11.0 21.0 0.8333 0.4762 0.6061 + Imaging_Test 754.0 147.0 146.0 900.0 0.8368 0.8378 0.8373 + Cycle_Number 48.0 43.0 12.0 60.0 0.5275 0.8 0.6358 + Tumor_Finding 970.0 89.0 109.0 1079.0 0.916 0.899 0.9074 + Site_Lymph_Node 210.0 68.0 61.0 271.0 0.7554 0.7749 0.765 + Invasion 146.0 39.0 21.0 
167.0 0.7892 0.8743 0.8295 +Response_To_Treat... 280.0 146.0 90.0 370.0 0.6573 0.7568 0.7035 + Smoking_Status 42.0 11.0 6.0 48.0 0.7925 0.875 0.8317 + Cycle_Count 104.0 23.0 40.0 144.0 0.8189 0.7222 0.7675 + Tumor_Size 197.0 37.0 41.0 238.0 0.8419 0.8277 0.8347 + Adenopathy 30.0 13.0 13.0 43.0 0.6977 0.6977 0.6977 + Age 205.0 15.0 23.0 228.0 0.9318 0.8991 0.9152 + Biomarker_Result 564.0 160.0 121.0 685.0 0.779 0.8234 0.8006 + Unspecific_Therapy 108.0 30.0 66.0 174.0 0.7826 0.6207 0.6923 + Site_Breast 92.0 18.0 18.0 110.0 0.8364 0.8364 0.8364 + Chemotherapy 687.0 59.0 55.0 742.0 0.9209 0.9259 0.9234 + Targeted_Therapy 178.0 29.0 28.0 206.0 0.8599 0.8641 0.862 + Radiotherapy 143.0 22.0 18.0 161.0 0.8667 0.8882 0.8773 + Performance_Status 17.0 15.0 15.0 32.0 0.5313 0.5313 0.5313 + Pathology_Test 387.0 197.0 99.0 486.0 0.6627 0.7963 0.7234 +Site_Other_Body_Part 678.0 287.0 460.0 1138.0 0.7026 0.5958 0.6448 + Cancer_Surgery 398.0 82.0 95.0 493.0 0.8292 0.8073 0.8181 + Line_Of_Therapy 38.0 9.0 10.0 48.0 0.8085 0.7917 0.8 + Pathology_Result 180.0 206.0 161.0 341.0 0.4663 0.5279 0.4952 + Hormonal_Therapy 98.0 12.0 25.0 123.0 0.8909 0.7967 0.8412 + Site_Bone 172.0 43.0 51.0 223.0 0.8 0.7713 0.7854 + Biomarker 693.0 144.0 138.0 831.0 0.828 0.8339 0.8309 + Immunotherapy 66.0 17.0 16.0 82.0 0.7952 0.8049 0.8 + Cycle_Day 85.0 44.0 43.0 128.0 0.6589 0.6641 0.6615 + Frequency 199.0 37.0 36.0 235.0 0.8432 0.8468 0.845 + Route 91.0 10.0 25.0 116.0 0.901 0.7845 0.8387 + Duration 179.0 57.0 117.0 296.0 0.7585 0.6047 0.6729 + Death_Entity 40.0 10.0 4.0 44.0 0.8 0.9091 0.8511 + Metastasis 337.0 27.0 25.0 362.0 0.9258 0.9309 0.9284 + Site_Liver 149.0 56.0 25.0 174.0 0.7268 0.8563 0.7863 + Cancer_Dx 723.0 114.0 107.0 830.0 0.8638 0.8711 0.8674 + Grade 47.0 21.0 19.0 66.0 0.6912 0.7121 0.7015 + Date 403.0 15.0 14.0 417.0 0.9641 0.9664 0.9653 + Site_Lung 338.0 134.0 64.0 402.0 0.7161 0.8408 0.7735 + Site_Brain 165.0 53.0 41.0 206.0 0.7569 0.801 0.7783 + Relative_Date 376.0 271.0 84.0 460.0 
0.5811 0.8174 0.6793 + Race_Ethnicity 42.0 0.0 13.0 55.0 1.0 0.7636 0.866 + Gender 1255.0 17.0 7.0 1262.0 0.9866 0.9945 0.9905 + Dosage 417.0 53.0 68.0 485.0 0.8872 0.8598 0.8733 + Oncogene 178.0 83.0 57.0 235.0 0.682 0.7574 0.7177 + Radiation_Dose 41.0 4.0 11.0 52.0 0.9111 0.7885 0.8454 + macro - - - - - - 0.7863 + micro - - - - - - 0.8145 +``` \ No newline at end of file diff --git a/docs/_posts/Meryem1425/2023-04-12-ner_oncology_emb_clinical_medium_en.md b/docs/_posts/Meryem1425/2023-04-12-ner_oncology_emb_clinical_medium_en.md new file mode 100644 index 0000000000..fcbf2fe71e --- /dev/null +++ b/docs/_posts/Meryem1425/2023-04-12-ner_oncology_emb_clinical_medium_en.md @@ -0,0 +1,271 @@ +--- +layout: model +title: Detect Oncology-Specific Entities (clinical_medium) +author: John Snow Labs +name: ner_oncology_emb_clinical_medium +date: 2023-04-12 +tags: [licensed, en, clinical, clinical_medium, ner, oncology, biomarker, treatment] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model extracts more than 40 oncology-related entities, including therapies, tests and staging. + +Definitions of Predicted Entities: + +`Adenopathy:` Mentions of pathological findings of the lymph nodes. +`Age:` All mentions of ages, past or present, related to the patient or to anybody else. +`Biomarker:` Biological molecules that indicate the presence or absence of cancer, or the type of cancer. Oncogenes are excluded from this category. +`Biomarker_Result:` Terms or values that are identified as the result of a biomarker. +`Cancer_Dx:` Mentions of cancer diagnoses (such as “breast cancer”) or pathological types that are usually used as synonyms for “cancer” (e.g. “carcinoma”). When anatomical references are present, they are included in the Cancer_Dx extraction. 
+`Cancer_Score:` Clinical or imaging scores that are specific for cancer settings (e.g. “BI-RADS” or “Allred score”). +`Cancer_Surgery:` Terms that indicate surgery as a form of cancer treatment. +`Chemotherapy:` Mentions of chemotherapy drugs, or unspecific words such as “chemotherapy”. +`Cycle_Count:` The total number of cycles of an oncological therapy being administered (e.g. “5 cycles”). +`Cycle_Day:` References to the day of the cycle of oncological therapy (e.g. “day 5”). +`Cycle_Number:` The number of the cycle of an oncological therapy that is being applied (e.g. “third cycle”). +`Date:` Mentions of exact dates, in any format, including day number, month and/or year. +`Death_Entity:` Words that indicate the death of the patient or someone else (including family members), such as “died” or “passed away”. +`Direction:` Directional and laterality terms, such as “left”, “right”, “bilateral”, “upper” and “lower”. +`Dosage:` The quantity prescribed by the physician for an active ingredient. +`Duration:` Words indicating the duration of a treatment (e.g. “for 2 weeks”). +`Frequency:` Words indicating the frequency of treatment administration (e.g. “daily” or “bid”). +`Gender:` Gender-specific nouns and pronouns (including words such as “him” or “she”, and family members such as “father”). +`Grade:` All pathological grading of tumors (e.g. “grade 1”) or degrees of cellular differentiation (e.g. “well-differentiated”). +`Histological_Type:` Histological variants or cancer subtypes, such as “papillary”, “clear cell” or “medullary”. +`Hormonal_Therapy:` Mentions of hormonal drugs used to treat cancer, or unspecific words such as “hormonal therapy”. +`Imaging_Test:` Imaging tests mentioned in texts, such as “chest CT scan”. +`Immunotherapy:` Mentions of immunotherapy drugs, or unspecific words such as “immunotherapy”. +`Invasion:` Mentions that refer to tumor invasion, such as “invasion” or “involvement”. 
Metastases or lymph node involvement are excluded from this category. +`Line_Of_Therapy:` Explicit references to the line of therapy of an oncological therapy (e.g. “first-line treatment”). +`Metastasis:` Terms that indicate a metastatic disease. Anatomical references are not included in these extractions. +`Oncogene:` Mentions of genes that are implicated in the etiology of cancer. +`Pathology_Result:` The findings of a biopsy from the pathology report that are not covered by another entity (e.g. “malignant ductal cells”). +`Pathology_Test:` Mentions of biopsies or tests that use tissue samples. +`Performance_Status:` Mentions of performance status scores, such as ECOG and Karnofsky. The name of the score is extracted together with the result (e.g. “ECOG performance status of 4”). +`Race_Ethnicity:` The race and ethnicity categories include racial and national origin or sociocultural groups. +`Radiotherapy:` Terms that indicate the use of radiotherapy. +`Response_To_Treatment:` Terms related to clinical progress of the patient related to cancer treatment, including “recurrence”, “bad response” or “improvement”. +`Relative_Date:` Temporal references that are relative to the date of the text or to any other specific date (e.g. “yesterday” or “three years later”). +`Route:` Words indicating the type of administration route (such as “PO” or “transdermal”). +`Site_Bone:` Anatomical terms that refer to the human skeleton. +`Site_Brain:` Anatomical terms that refer to the central nervous system (including the brain stem and the cerebellum). +`Site_Breast:` Anatomical terms that refer to the breasts. +`Site_Liver:` Anatomical terms that refer to the liver. +`Site_Lung:` Anatomical terms that refer to the lungs. +`Site_Lymph_Node:` Anatomical terms that refer to lymph nodes, excluding adenopathies. +`Site_Other_Body_Part:` Relevant anatomical terms that are not included in the rest of the anatomical entities. 
+`Smoking_Status:` All mentions of smoking related to the patient or to someone else. +`Staging:` Mentions of cancer stage such as “stage 2b” or “T2N1M0”. It also includes words such as “in situ”, “early-stage” or “advanced”. +`Targeted_Therapy:` Mentions of targeted therapy drugs, or unspecific words such as “targeted therapy”. +`Tumor_Finding:` All nonspecific terms that may be related to tumors, either malignant or benign (for example: “mass”, “tumor”, “lesion”, or “neoplasm”). +`Tumor_Size:` Size of the tumor, including numerical value and unit of measurement (e.g. “3 cm”). +`Unspecific_Therapy:` Terms that indicate a known cancer therapy but that is not specific to any other therapy entity (e.g. “chemoradiotherapy” or “adjuvant therapy”). + +## Predicted Entities + +`Histological_Type`, `Direction`, `Staging`, `Cancer_Score`, `Imaging_Test`, `Cycle_Number`, `Tumor_Finding`, `Site_Lymph_Node`, `Invasion`, `Response_To_Treatment`, `Smoking_Status`, `Tumor_Size`, `Cycle_Count`, `Adenopathy`, `Age`, `Biomarker_Result`, `Unspecific_Therapy`, `Site_Breast`, `Chemotherapy`, `Targeted_Therapy`, `Radiotherapy`, `Performance_Status`, `Pathology_Test`, `Site_Other_Body_Part`, `Cancer_Surgery`, `Line_Of_Therapy`, `Pathology_Result`, `Hormonal_Therapy`, `Site_Bone`, `Biomarker`, `Immunotherapy`, `Cycle_Day`, `Frequency`, `Route`, `Duration`, `Death_Entity`, `Metastasis`, `Site_Liver`, `Cancer_Dx`, `Grade`, `Date`, `Site_Lung`, `Site_Brain`, `Relative_Date`, `Race_Ethnicity`, `Gender`, `Oncogene`, `Dosage`, `Radiation_Dose` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/NER_ONCOLOGY_CLINICAL/){:.button.button-orange} +[Open in Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/27.Oncology_Model.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} 
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_oncology_emb_clinical_medium_en_4.3.2_3.0_1681316892301.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_oncology_emb_clinical_medium_en_4.3.2_3.0_1681316892301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols(["sentence"]) \ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel().pretrained("embeddings_clinical_medium", "en", "clinical/models")\ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("embeddings") + +ner = MedicalNerModel.pretrained("ner_oncology_emb_clinical_medium", "en", "clinical/models")\ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("ner") + +ner_converter = NerConverterInternal() \ + .setInputCols(["sentence", "token", "ner"]) \ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner, + ner_converter]) + +data = spark.createDataFrame([["""The had previously undergone a left mastectomy and an axillary lymph node dissection for a left breast cancer twenty years ago. +The tumor was positive for ER and PR. Postoperatively, radiotherapy was administered to the residual breast. +The cancer recurred as a right lung metastasis 13 years later. 
The patient underwent a regimen consisting of adriamycin (60 mg/m2) and cyclophosphamide (600 mg/m2) over six courses, as first line therapy."""]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val ner = MedicalNerModel.pretrained("ner_oncology_emb_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner, + ner_converter)) + +val data = Seq("""The had previously undergone a left mastectomy and an axillary lymph node dissection for a left breast cancer twenty years ago. +The tumor was positive for ER and PR. Postoperatively, radiotherapy was administered to the residual breast. +The cancer recurred as a right lung metastasis 13 years later. The patient underwent a regimen consisting of adriamycin (60 mg/m2) and cyclophosphamide (600 mg/m2) over six courses, as first line therapy.""").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
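The pipeline above writes its entities to the `ner_chunk` column. As a minimal sketch of how those annotations might be flattened into `(chunk, begin, end, ner_label)` rows, the plain-Python helper below works on any objects that expose `result`, `begin`, `end` and `metadata`. The `Chunk` tuple is a hypothetical stand-in for Spark NLP's annotation objects, `chunks_to_rows` is an illustrative name, and reading the label from the `entity` metadata key is an assumption about the converter's output.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotation object, used only so the sketch
# runs without Spark; it mirrors the attributes the helper reads.
Chunk = namedtuple("Chunk", ["begin", "end", "result", "metadata"])


def chunks_to_rows(chunks):
    """Return (chunk, begin, end, ner_label) tuples ordered by start offset."""
    rows = [
        # Assumption: the entity label is stored under the "entity" metadata key.
        (c.result, c.begin, c.end, c.metadata.get("entity", ""))
        for c in chunks
    ]
    return sorted(rows, key=lambda row: row[1])


# Two sample chunks, deliberately out of order.
sample_chunks = [
    Chunk(36, 45, "mastectomy", {"entity": "Cancer_Surgery"}),
    Chunk(31, 34, "left", {"entity": "Direction"}),
]

rows = chunks_to_rows(sample_chunks)
for row in rows:
    print(row)
```

With a fitted pipeline, the same helper could presumably be fed the `ner_chunk` entry returned by a `LightPipeline`'s `fullAnnotate` call; that wiring is not shown here because it needs a licensed Spark session.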
+ +## Results + +```bash ++-------------------+-----+---+---------------------+ +|chunk |begin|end|ner_label | ++-------------------+-----+---+---------------------+ +|left |31 |34 |Direction | +|mastectomy |36 |45 |Cancer_Surgery | +|axillary lymph node|54 |72 |Site_Lymph_Node | +|dissection |74 |83 |Cancer_Surgery | +|left |91 |94 |Direction | +|breast cancer |96 |108|Cancer_Dx | +|twenty years ago |110 |125|Relative_Date | +|tumor |132 |136|Tumor_Finding | +|positive |142 |149|Biomarker_Result | +|ER |155 |156|Biomarker | +|PR |162 |163|Response_To_Treatment| +|radiotherapy |183 |194|Radiotherapy | +|breast |229 |234|Site_Breast | +|cancer |241 |246|Cancer_Dx | +|recurred |248 |255|Response_To_Treatment| +|right |262 |266|Direction | +|lung |268 |271|Site_Lung | +|metastasis |273 |282|Metastasis | +|13 years later |284 |297|Relative_Date | +|adriamycin |346 |355|Chemotherapy | +|60 mg/m2 |358 |365|Chemotherapy | +|cyclophosphamide |372 |387|Chemotherapy | +|600 mg/m2 |390 |398|Dosage | +|six courses |406 |416|Cycle_Count | +|first line |422 |431|Line_Of_Therapy | ++-------------------+-----+---+---------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_oncology_emb_clinical_medium| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|15.4 MB| + +## Benchmarking + +```bash + label tp fp fn total precision recall f1 + Histological_Type 138.0 27.0 73.0 211.0 0.8364 0.654 0.734 + Direction 679.0 163.0 152.0 831.0 0.8064 0.8171 0.8117 + Staging 112.0 24.0 26.0 138.0 0.8235 0.8116 0.8175 + Cancer_Score 9.0 2.0 12.0 21.0 0.8182 0.4286 0.5625 + Imaging_Test 759.0 132.0 141.0 900.0 0.8519 0.8433 0.8476 + Cycle_Number 43.0 29.0 17.0 60.0 0.5972 0.7167 0.6515 + Tumor_Finding 971.0 98.0 108.0 1079.0 0.9083 0.8999 0.9041 + Site_Lymph_Node 210.0 80.0 61.0 271.0 0.7241 0.7749 0.7487 + Invasion 
146.0 33.0 21.0 167.0 0.8156 0.8743 0.8439 +Response_To_Treat... 224.0 98.0 146.0 370.0 0.6957 0.6054 0.6474 + Smoking_Status 39.0 14.0 9.0 48.0 0.7358 0.8125 0.7723 + Cycle_Count 113.0 34.0 31.0 144.0 0.7687 0.7847 0.7766 + Tumor_Size 203.0 44.0 35.0 238.0 0.8219 0.8529 0.8371 + Adenopathy 32.0 12.0 11.0 43.0 0.7273 0.7442 0.7356 + Age 203.0 20.0 25.0 228.0 0.9103 0.8904 0.9002 + Biomarker_Result 537.0 117.0 148.0 685.0 0.8211 0.7839 0.8021 + Unspecific_Therapy 107.0 32.0 67.0 174.0 0.7698 0.6149 0.6837 + Site_Breast 95.0 17.0 15.0 110.0 0.8482 0.8636 0.8559 + Chemotherapy 684.0 72.0 58.0 742.0 0.9048 0.9218 0.9132 + Targeted_Therapy 170.0 31.0 36.0 206.0 0.8458 0.8252 0.8354 + Radiotherapy 141.0 43.0 20.0 161.0 0.7663 0.8758 0.8174 + Performance_Status 20.0 12.0 12.0 32.0 0.625 0.625 0.625 + Pathology_Test 359.0 159.0 127.0 486.0 0.6931 0.7387 0.7151 +Site_Other_Body_Part 744.0 338.0 394.0 1138.0 0.6876 0.6538 0.6703 + Cancer_Surgery 380.0 83.0 113.0 493.0 0.8207 0.7708 0.795 + Line_Of_Therapy 38.0 7.0 10.0 48.0 0.8444 0.7917 0.8172 + Pathology_Result 124.0 144.0 217.0 341.0 0.4627 0.3636 0.4072 + Hormonal_Therapy 96.0 13.0 27.0 123.0 0.8807 0.7805 0.8276 + Site_Bone 167.0 50.0 56.0 223.0 0.7696 0.7489 0.7591 + Immunotherapy 61.0 13.0 21.0 82.0 0.8243 0.7439 0.7821 + Biomarker 681.0 88.0 150.0 831.0 0.8856 0.8195 0.8513 + Cycle_Day 85.0 43.0 43.0 128.0 0.6641 0.6641 0.6641 + Frequency 200.0 40.0 35.0 235.0 0.8333 0.8511 0.8421 + Route 98.0 13.0 18.0 116.0 0.8829 0.8448 0.8634 + Duration 195.0 57.0 101.0 296.0 0.7738 0.6588 0.7117 + Death_Entity 40.0 9.0 4.0 44.0 0.8163 0.9091 0.8602 + Metastasis 335.0 34.0 27.0 362.0 0.9079 0.9254 0.9166 + Site_Liver 146.0 64.0 28.0 174.0 0.6952 0.8391 0.7604 + Cancer_Dx 722.0 96.0 108.0 830.0 0.8826 0.8699 0.8762 + Grade 55.0 19.0 11.0 66.0 0.7432 0.8333 0.7857 + Date 403.0 16.0 14.0 417.0 0.9618 0.9664 0.9641 + Site_Lung 341.0 151.0 61.0 402.0 0.6931 0.8483 0.7629 + Site_Brain 184.0 82.0 22.0 206.0 0.6917 0.8932 0.7797 + 
Relative_Date 365.0 249.0 95.0 460.0 0.5945 0.7935 0.6797 + Race_Ethnicity 47.0 2.0 8.0 55.0 0.9592 0.8545 0.9038 + Gender 1260.0 15.0 2.0 1262.0 0.9882 0.9984 0.9933 + Dosage 425.0 76.0 60.0 485.0 0.8483 0.8763 0.8621 + Oncogene 178.0 89.0 57.0 235.0 0.6667 0.7574 0.7092 + Radiation_Dose 41.0 6.0 11.0 52.0 0.8723 0.7885 0.8283 + macro - - - - - - 0.7859 + micro - - - - - - 0.8130 +``` \ No newline at end of file diff --git a/docs/_posts/SKocer/2023-04-10-medication_resolver_pipeline_en.md b/docs/_posts/SKocer/2023-04-10-medication_resolver_pipeline_en.md new file mode 100644 index 0000000000..3a126dcb05 --- /dev/null +++ b/docs/_posts/SKocer/2023-04-10-medication_resolver_pipeline_en.md @@ -0,0 +1,105 @@ +--- +layout: model +title: Pipeline to Resolve Medication Codes +author: John Snow Labs +name: medication_resolver_pipeline +date: 2023-04-10 +tags: [resolver, snomed, umls, rxnorm, ndc, ade, en, licensed, pipeline] +task: Entity Resolution +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +A pretrained resolver pipeline to extract medications and resolve their adverse reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT codes, and action/treatments in clinical text. + +Action/treatments are available for branded medication, and SNOMED codes are available for non-branded medication. + +This pipeline can be used as Lightpipeline (with `annotate/fullAnnotate`). You can use `medication_resolver_transform_pipeline` for Spark transform. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/medication_resolver_pipeline_en_4.3.2_3.0_1681151954032.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/medication_resolver_pipeline_en_4.3.2_3.0_1681151954032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.pretrained import PretrainedPipeline + +med_resolver_pipeline = PretrainedPipeline("medication_resolver_pipeline", "en", "clinical/models") + +text = """The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera. The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet.""" + +result = med_resolver_pipeline.fullAnnotate(text) +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val med_resolver_pipeline = new PretrainedPipeline("medication_resolver_pipeline", "en", "clinical/models") + +val result = med_resolver_pipeline.fullAnnotate("""The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera. The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet.""") +``` +
+ +## Results + +```bash +| | chunks | entities | ADE | RxNorm | Action | Treatment | UMLS | SNOMED_CT | NDC_Product | NDC_Package | +|---:|:-----------------------------|:-----------|:----------------------------|---------:|:---------------------------|:-------------------------------------------|:---------|:------------|:--------------|:--------------| +| 0 | Amlodopine Vallarta 10-320mg | DRUG | Gynaecomastia | 722131 | NONE | NONE | C1949334 | 425838008 | 00093-7693 | 00093-7693-56 | +| 1 | Eviplera | DRUG | Anxiety | 217010 | Inhibitory Bone Resorption | Osteoporosis | C0720318 | NONE | NONE | NONE | +| 2 | Lescol 40 MG | DRUG | NONE | 103919 | Hypocholesterolemic | Heterozygous Familial Hypercholesterolemia | C0353573 | NONE | 00078-0234 | 00078-0234-05 | +| 3 | Everolimus 1.5 mg tablet | DRUG | Acute myocardial infarction | 2056895 | NONE | NONE | C4723581 | NONE | 00054-0604 | 00054-0604-21 | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medication_resolver_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.2 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- TextMatcherModel +- ChunkMergeModel +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperFilterer +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel +- ResolverMerger +- ResolverMerger +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperModel +- Finisher \ No newline at end of file diff --git a/docs/_posts/SKocer/2023-04-11-medication_resolver_transform_pipeline_en.md b/docs/_posts/SKocer/2023-04-11-medication_resolver_transform_pipeline_en.md new file mode 100644 index 0000000000..b5781440fc --- /dev/null +++ 
b/docs/_posts/SKocer/2023-04-11-medication_resolver_transform_pipeline_en.md @@ -0,0 +1,111 @@ +--- +layout: model +title: Pipeline to Resolve Medication Codes(Transform) +author: John Snow Labs +name: medication_resolver_transform_pipeline +date: 2023-04-11 +tags: [resolver, rxnorm, ndc, snomed, umls, ade, pipeline, en, licensed] +task: Entity Resolution +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +A pretrained resolver pipeline to extract medications and resolve their adverse reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT codes, and action/treatments in clinical text. + +Action/treatments are available for branded medication, and SNOMED codes are available for non-branded medication. + +This pipeline can be used with Spark transform. You can use `medication_resolver_pipeline` as Lightpipeline (with `annotate/fullAnnotate`). + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/medication_resolver_transform_pipeline_en_4.3.2_3.0_1681190723377.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/medication_resolver_transform_pipeline_en_4.3.2_3.0_1681190723377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.pretrained import PretrainedPipeline + +medication_resolver_pipeline = PretrainedPipeline("medication_resolver_transform_pipeline", "en", "clinical/models") + +text = """The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera. The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet.""" + +data = spark.createDataFrame([[text]]).toDF("text") + +result = medication_resolver_pipeline.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val medication_resolver_pipeline = new PretrainedPipeline("medication_resolver_transform_pipeline", "en", "clinical/models") + +val data = Seq("""The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera. The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet.""").toDS.toDF("text") + +val result = medication_resolver_pipeline.transform(data) +``` +
+ +## Results + +```bash +| chunk | ner_label | ADE | RxNorm | Action | Treatment | UMLS | SNOMED_CT | NDC_Product | NDC_Package | +|:-----------------------------|:------------|:----------------------------|---------:|:---------------------------|:-------------------------------------------|:---------|:------------|:--------------|:--------------| +| Amlodopine Vallarta 10-320mg | DRUG | Gynaecomastia | 722131 | NONE | NONE | C1949334 | 425838008 | 00093-7693 | 00093-7693-56 | +| Eviplera | DRUG | Anxiety | 217010 | Inhibitory Bone Resorption | Osteoporosis | C0720318 | NONE | NONE | NONE | +| Lescol 40 MG | DRUG | NONE | 103919 | Hypocholesterolemic | Heterozygous Familial Hypercholesterolemia | C0353573 | NONE | 00078-0234 | 00078-0234-05 | +| Everolimus 1.5 mg tablet | DRUG | Acute myocardial infarction | 2056895 | NONE | NONE | C4723581 | NONE | 00054-0604 | 00054-0604-21 | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medication_resolver_transform_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.2 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- TextMatcherModel +- ChunkMergeModel +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperFilterer +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel +- ResolverMerger +- Doc2Chunk +- ResolverMerger +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperModel +- Doc2Chunk +- ChunkMapperModel +- ChunkMapperModel +- ChunkMapperModel +- Finisher \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-04-12-ner_deid_large_emb_clinical_large_en.md b/docs/_posts/gokhanturer/2023-04-12-ner_deid_large_emb_clinical_large_en.md new file mode 100644 index 0000000000..794478d0e3 --- /dev/null +++ 
b/docs/_posts/gokhanturer/2023-04-12-ner_deid_large_emb_clinical_large_en.md @@ -0,0 +1,166 @@ +--- +layout: model +title: Detect PHI (Deidentification)(clinical_large) +author: John Snow Labs +name: ner_deid_large_emb_clinical_large +date: 2023-04-12 +tags: [ner, licensed, clinical, phi, deidentification, english, en] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Deidentification NER (Large) is a Named Entity Recognition model that annotates text to find protected health information that may need to be deidentified. The entities it annotates are Age, Contact, Date, Id, Location, Name, and Profession. This model is trained with the 'embeddings_clinical_large' word embeddings model, so be sure to use the same embeddings in the pipeline. + +We stuck to the official annotation guidelines (AG) for the 2014 i2b2 Deid challenge while annotating new datasets for this model. Details on the nuances and explanations of the AG can be found at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978170/ + +## Predicted Entities + +`AGE`, `CONTACT`, `DATE`, `ID`, `LOCATION`, `NAME`, `PROFESSION` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/NER_DEMOGRAPHICS/){:.button.button-orange} + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_deid_large_emb_clinical_large_en_4.3.2_3.0_1681321107196.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_deid_large_emb_clinical_large_en_4.3.2_3.0_1681321107196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +documentAssembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +deid_ner = MedicalNerModel.pretrained("ner_deid_large_emb_clinical_large", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("deid_ner") + +deid_ner_converter = NerConverter() \ + .setInputCols(["sentence", "token", "deid_ner"]) \ + .setOutputCol("deid_ner_chunk") + +deid_ner_pipeline = Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + word_embeddings, + deid_ner, + deid_ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +deid_ner_model = deid_ner_pipeline.fit(empty_data) + +results = deid_ner_model.transform(spark.createDataFrame([["""HISTORY OF PRESENT ILLNESS: Mr. Smith is a 60-year-old white male veteran with multiple comorbidities, who has a history of bladder cancer diagnosed approximately two years ago by the VA Hospital. He underwent a resection there. He was to be admitted to the Day Hospital for cystectomy. He was seen in Urology Clinic and Radiology Clinic on 02/04/2003. HOSPITAL COURSE: Mr. Smith presented to the Day Hospital in anticipation for Urology surgery. On evaluation, EKG, echocardiogram was abnormal, a Cardiology consult was obtained. A cardiac adenosine stress MRI was then proceeded, same was positive for inducible ischemia, mild-to-moderate inferolateral subendocardial infarction with peri-infarct ischemia. 
In addition, inducible ischemia seen in the inferior lateral septum. Mr. Smith underwent a left heart catheterization, which revealed two vessel coronary artery disease. The RCA, proximal was 95% stenosed and the distal 80% stenosed. The mid LAD was 85% stenosed and the distal LAD was 85% stenosed. There was four Multi-Link Vision bare metal stents placed to decrease all four lesions to 0%. Following intervention, Mr. Smith was admitted to 7 Ardmore Tower under Cardiology Service under the direction of Dr. Hart. Mr. Smith had a noncomplicated post-intervention hospital course. He was stable for discharge home on 02/07/2003 with instructions to take Plavix daily for one month and Urology is aware of the same."""]]).toDF("text")) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val deid_ner_model = MedicalNerModel.pretrained("ner_deid_large_emb_clinical_large", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("deid_ner") + +val deid_ner_converter = new NerConverter() + .setInputCols(Array("sentence", "token", "deid_ner")) + .setOutputCol("deid_ner_chunk") + +val deid_pipeline = new Pipeline().setStages(Array(document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + deid_ner_model, + deid_ner_converter)) + +val data = Seq("""HISTORY OF PRESENT ILLNESS: Mr. 
Smith is a 60-year-old white male veteran with multiple comorbidities, who has a history of bladder cancer diagnosed approximately two years ago by the VA Hospital. He underwent a resection there. He was to be admitted to the Day Hospital for cystectomy. He was seen in Urology Clinic and Radiology Clinic on 02/04/2003. HOSPITAL COURSE: Mr. Smith presented to the Day Hospital in anticipation for Urology surgery. On evaluation, EKG, echocardiogram was abnormal, a Cardiology consult was obtained. A cardiac adenosine stress MRI was then proceeded, same was positive for inducible ischemia, mild-to-moderate inferolateral subendocardial infarction with peri-infarct ischemia. In addition, inducible ischemia seen in the inferior lateral septum. Mr. Smith underwent a left heart catheterization, which revealed two vessel coronary artery disease. The RCA, proximal was 95% stenosed and the distal 80% stenosed. The mid LAD was 85% stenosed and the distal LAD was 85% stenosed. There was four Multi-Link Vision bare metal stents placed to decrease all four lesions to 0%. Following intervention, Mr. Smith was admitted to 7 Ardmore Tower under Cardiology Service under the direction of Dr. Hart. Mr. Smith had a noncomplicated post-intervention hospital course. He was stable for discharge home on 02/07/2003 with instructions to take Plavix daily for one month and Urology is aware of the same.""").toDS.toDF("text") + +val result = deid_pipeline.fit(data).transform(data) +``` +
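The chunk offsets produced by the converter can be used to mask the detected PHI in the original text. The following is a minimal sketch in plain Python, assuming inclusive `end` offsets (as in the Results table, where `Smith` spans 32 to 36) and non-overlapping spans. The helper `mask_spans` and the short sample note are hypothetical and are not part of Spark NLP, which ships its own de-identification annotators.

```python
def mask_spans(text, spans):
    """Replace each (begin, end, label) span with <LABEL>; `end` is inclusive."""
    masked = text
    # Process spans from right to left so that earlier offsets remain valid
    # after each replacement changes the string length.
    for begin, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        masked = masked[:begin] + "<" + label + ">" + masked[end + 1:]
    return masked


# Hypothetical sample note with hand-written spans in the Results table format.
note = "Mr. Smith was seen on 02/04/2003."
spans = [(4, 8, "NAME"), (22, 31, "DATE")]

masked_note = mask_spans(note, spans)
print(masked_note)  # Mr. <NAME> was seen on <DATE>.
```

Working from the last span backwards avoids recomputing offsets, which keeps the sketch short; overlapping spans would need to be merged first.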
+ +## Results + +```bash +| | chunks | begin | end | entities | +|---:|:----------------|--------:|------:|:-----------| +| 0 | Smith | 32 | 36 | NAME | +| 1 | VA Hospital | 184 | 194 | LOCATION | +| 2 | Day Hospital | 258 | 269 | LOCATION | +| 3 | 02/04/2003 | 341 | 350 | DATE | +| 4 | Smith | 374 | 378 | NAME | +| 5 | Day Hospital | 397 | 408 | LOCATION | +| 6 | Smith | 782 | 786 | NAME | +| 7 | Smith | 1131 | 1135 | NAME | +| 8 | 7 Ardmore Tower | 1153 | 1167 | LOCATION | +| 9 | Hart | 1221 | 1224 | NAME | +| 10 | Smith | 1231 | 1235 | NAME | +| 11 | 02/07/2003 | 1329 | 1338 | DATE | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_deid_large_emb_clinical_large| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|2.8 MB| + +## Benchmarking + +```bash + precision recall f1-score support + CONTACT 0.92 0.98 0.95 126 + DATE 0.99 0.99 0.99 2631 + NAME 0.98 0.98 0.98 2594 + AGE 0.99 0.94 0.97 284 + LOCATION 0.97 0.94 0.95 1511 + ID 0.91 0.96 0.94 213 + PROFESSION 0.81 0.84 0.83 160 + micro avg 0.97 0.97 0.97 7519 + macro avg 0.94 0.95 0.94 7519 +weighted avg 0.97 0.97 0.97 7519 +``` diff --git a/docs/_posts/gokhanturer/2023-04-12-ner_deid_large_emb_clinical_medium_en.md b/docs/_posts/gokhanturer/2023-04-12-ner_deid_large_emb_clinical_medium_en.md new file mode 100644 index 0000000000..1684663ad2 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-04-12-ner_deid_large_emb_clinical_medium_en.md @@ -0,0 +1,164 @@ +--- +layout: model +title: Detect PHI (Deidentification)(clinical_medium) +author: John Snow Labs +name: ner_deid_large_emb_clinical_medium +date: 2023-04-12 +tags: [ner, clinical, english, licensed, phi, deidentification, en] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: MedicalNerModel +article_header: + type: cover 
+use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Deidentification NER (Large) is a Named Entity Recognition model that annotates text to find protected health information that may need to be deidentified. The entities it annotates are Age, Contact, Date, Id, Location, Name, and Profession. This model is trained with the `embeddings_clinical_medium` word embeddings model, so be sure to use the same embeddings in the pipeline. + +We followed the official annotation guideline (AG) of the 2014 i2b2 Deid challenge while annotating new datasets for this model. Details on the nuances of the AG, along with explanations, can be found at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978170/ + +## Predicted Entities + + + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/NER_DEMOGRAPHICS/){:.button.button-orange} + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_deid_large_emb_clinical_medium_en_4.3.2_3.0_1681322146240.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_deid_large_emb_clinical_medium_en_4.3.2_3.0_1681322146240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +documentAssembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +deid_ner = MedicalNerModel.pretrained("ner_deid_large_emb_clinical_medium", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("deid_ner") + +deid_ner_converter = NerConverter() \ + .setInputCols(["sentence", "token", "deid_ner"]) \ + .setOutputCol("deid_ner_chunk") + +deid_ner_pipeline = Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + word_embeddings, + deid_ner, + deid_ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +deid_ner_model = deid_ner_pipeline.fit(empty_data) + +results = deid_ner_model.transform(spark.createDataFrame([["""HISTORY OF PRESENT ILLNESS: Mr. Smith is a 60-year-old white male veteran with multiple comorbidities, who has a history of bladder cancer diagnosed approximately two years ago by the VA Hospital. He underwent a resection there. He was to be admitted to the Day Hospital for cystectomy. He was seen in Urology Clinic and Radiology Clinic on 02/04/2003. HOSPITAL COURSE: Mr. Smith presented to the Day Hospital in anticipation for Urology surgery. On evaluation, EKG, echocardiogram was abnormal, a Cardiology consult was obtained. A cardiac adenosine stress MRI was then proceeded, same was positive for inducible ischemia, mild-to-moderate inferolateral subendocardial infarction with peri-infarct ischemia. 
In addition, inducible ischemia seen in the inferior lateral septum. Mr. Smith underwent a left heart catheterization, which revealed two vessel coronary artery disease. The RCA, proximal was 95% stenosed and the distal 80% stenosed. The mid LAD was 85% stenosed and the distal LAD was 85% stenosed. There was four Multi-Link Vision bare metal stents placed to decrease all four lesions to 0%. Following intervention, Mr. Smith was admitted to 7 Ardmore Tower under Cardiology Service under the direction of Dr. Hart. Mr. Smith had a noncomplicated post-intervention hospital course. He was stable for discharge home on 02/07/2003 with instructions to take Plavix daily for one month and Urology is aware of the same."""]]).toDF("text")) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val deid_ner_model = MedicalNerModel.pretrained("ner_deid_large_emb_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("deid_ner") + +val deid_ner_converter = new NerConverter() + .setInputCols(Array("sentence", "token", "deid_ner")) + .setOutputCol("deid_ner_chunk") + +val deid_pipeline = new Pipeline().setStages(Array(document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + deid_ner_model, + deid_ner_converter)) + +val data = Seq("""HISTORY OF PRESENT ILLNESS: Mr. 
Smith is a 60-year-old white male veteran with multiple comorbidities, who has a history of bladder cancer diagnosed approximately two years ago by the VA Hospital. He underwent a resection there. He was to be admitted to the Day Hospital for cystectomy. He was seen in Urology Clinic and Radiology Clinic on 02/04/2003. HOSPITAL COURSE: Mr. Smith presented to the Day Hospital in anticipation for Urology surgery. On evaluation, EKG, echocardiogram was abnormal, a Cardiology consult was obtained. A cardiac adenosine stress MRI was then proceeded, same was positive for inducible ischemia, mild-to-moderate inferolateral subendocardial infarction with peri-infarct ischemia. In addition, inducible ischemia seen in the inferior lateral septum. Mr. Smith underwent a left heart catheterization, which revealed two vessel coronary artery disease. The RCA, proximal was 95% stenosed and the distal 80% stenosed. The mid LAD was 85% stenosed and the distal LAD was 85% stenosed. There was four Multi-Link Vision bare metal stents placed to decrease all four lesions to 0%. Following intervention, Mr. Smith was admitted to 7 Ardmore Tower under Cardiology Service under the direction of Dr. Hart. Mr. Smith had a noncomplicated post-intervention hospital course. He was stable for discharge home on 02/07/2003 with instructions to take Plavix daily for one month and Urology is aware of the same.""").toDS.toDF("text") + +val result = deid_pipeline.fit(data).transform(data) +``` +
+ +## Results + +```bash +| | chunks | begin | end | entities | +|---:|:----------------|--------:|------:|:-----------| +| 0 | Day Hospital | 258 | 269 | NAME | +| 1 | Radiology | 321 | 329 | LOCATION | +| 2 | 02/04/2003 | 341 | 350 | DATE | +| 3 | COURSE | 362 | 367 | NAME | +| 4 | Hospital | 401 | 408 | NAME | +| 5 | Urology surgery | 430 | 444 | NAME | +| 6 | On | 447 | 448 | NAME | +| 7 | Following | 1103 | 1111 | NAME | +| 8 | 02/07/2003 | 1329 | 1338 | DATE | +| 9 | Plavix daily | 1366 | 1377 | NAME | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_deid_large_emb_clinical_medium| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|2.8 MB| + +## Benchmarking + +```bash + precision recall f1-score support + CONTACT 0.92 0.96 0.94 126 + DATE 0.99 0.99 0.99 2631 + NAME 0.97 0.98 0.97 2594 + AGE 0.96 0.94 0.95 284 + LOCATION 0.95 0.92 0.94 1511 + ID 0.93 0.95 0.94 213 + PROFESSION 0.84 0.73 0.78 160 + micro avg 0.97 0.96 0.97 7519 + macro avg 0.94 0.92 0.93 7519 +weighted avg 0.97 0.96 0.96 7519 +``` diff --git a/docs/_posts/gokhanturer/2023-04-12-ner_posology_emb_clinical_large_en.md b/docs/_posts/gokhanturer/2023-04-12-ner_posology_emb_clinical_large_en.md new file mode 100644 index 0000000000..2519a31f0a --- /dev/null +++ b/docs/_posts/gokhanturer/2023-04-12-ner_posology_emb_clinical_large_en.md @@ -0,0 +1,164 @@ +--- +layout: model +title: Detect Posology concepts (clinical_large) +author: John Snow Labs +name: ner_posology_emb_clinical_large +date: 2023-04-12 +tags: [ner, clinical, licensed, en, posology] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model detects Drug, Dosage, and 
administration instructions in text using a pretrained NER model. + +## Predicted Entities + + + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/NER_POSOLOGY/){:.button.button-orange} + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_posology_emb_clinical_large_en_4.3.2_3.0_1681303545819.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_posology_emb_clinical_large_en_4.3.2_3.0_1681303545819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +documentAssembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +posology_ner = MedicalNerModel.pretrained("ner_posology_emb_clinical_large", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("posology_ner") + +posology_ner_converter = NerConverter() \ + .setInputCols(["sentence", "token", "posology_ner"]) \ + .setOutputCol("posology_ner_chunk") + +posology_ner_pipeline = Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + word_embeddings, + posology_ner, + posology_ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +posology_ner_model = posology_ner_pipeline.fit(empty_data) + +results = posology_ner_model.transform(spark.createDataFrame([["The patient has been advised Aspirin 81 milligrams QDay. insulin 50 units in a.m. HCTZ 50 mg QDay. 
Nitroglycerin 1/150 sublingually."]]).toDF("text")) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val posology_ner_model = MedicalNerModel.pretrained("ner_posology_emb_clinical_large", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("posology_ner") + +val posology_ner_converter = new NerConverter() + .setInputCols(Array("sentence", "token", "posology_ner")) + .setOutputCol("posology_ner_chunk") + +val posology_pipeline = new Pipeline().setStages(Array(document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + posology_ner_model, + posology_ner_converter)) + +val data = Seq("""The patient has been advised Aspirin 81 milligrams QDay. insulin 50 units in a.m. HCTZ 50 mg QDay. Nitroglycerin 1/150 sublingually.""").toDS.toDF("text") + +val result = posology_pipeline.fit(data).transform(data) +``` +
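When reading the Benchmarking table further down this page, it may help to recall how its columns relate: each `f1-score` is the harmonic mean of precision and recall, and the `macro avg` row is the unweighted mean over the labels. The following is a minimal sketch in plain Python using values copied from that table. The helper `f1` is illustrative only, and because the table is rounded to two decimals, the recomputed scores agree with the reported ones only to within rounding.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# label -> (precision, recall, reported f1-score), copied from the Benchmarking table.
rows = {
    "DRUG": (0.88, 0.92, 0.90),
    "STRENGTH": (0.89, 0.92, 0.91),
    "FREQUENCY": (0.92, 0.90, 0.91),
    "DURATION": (0.76, 0.83, 0.79),
    "DOSAGE": (0.62, 0.65, 0.64),
    "ROUTE": (0.88, 0.88, 0.88),
    "FORM": (0.89, 0.72, 0.79),
}

for label, (p, r, reported) in rows.items():
    print(f"{label:<10} recomputed={f1(p, r):.3f} reported={reported:.2f}")

# Macro average: every label counts equally, regardless of its support.
macro_precision = sum(p for p, _, _ in rows.values()) / len(rows)
macro_recall = sum(r for _, r, _ in rows.values()) / len(rows)
print(f"macro precision={macro_precision:.2f} macro recall={macro_recall:.2f}")
```

The macro average treats the small `DOSAGE` class the same as the large `DRUG` class, which is why it sits below the micro and weighted averages in the table.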
+ +## Results + +```bash +| | chunks | begin | end | entities | +|---:|:--------------|--------:|------:|:-----------| +| 0 | Aspirin | 268 | 274 | DRUG | +| 1 | 81 milligrams | 276 | 288 | STRENGTH | +| 2 | QDay | 290 | 293 | FREQUENCY | +| 3 | insulin | 296 | 302 | DRUG | +| 4 | 50 units | 304 | 311 | STRENGTH | +| 5 | HCTZ | 321 | 324 | DRUG | +| 6 | 50 mg | 326 | 330 | STRENGTH | +| 7 | QDay | 332 | 335 | FREQUENCY | +| 8 | Nitroglycerin | 338 | 350 | DRUG | +| 9 | 1/150 | 352 | 356 | STRENGTH | +| 10 | sublingually | 358 | 369 | FREQUENCY | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_posology_emb_clinical_large| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|2.8 MB| + +## Benchmarking + +```bash + precision recall f1-score support + DRUG 0.88 0.92 0.90 2252 + STRENGTH 0.89 0.92 0.91 2290 + FREQUENCY 0.92 0.90 0.91 1782 + DURATION 0.76 0.83 0.79 463 + DOSAGE 0.62 0.65 0.64 476 + ROUTE 0.88 0.88 0.88 394 + FORM 0.89 0.72 0.79 773 + micro avg 0.87 0.87 0.87 8430 + macro avg 0.83 0.83 0.83 8430 +weighted avg 0.87 0.87 0.87 8430 +``` diff --git a/docs/_posts/gokhanturer/2023-04-12-ner_posology_emb_clinical_medium_en.md b/docs/_posts/gokhanturer/2023-04-12-ner_posology_emb_clinical_medium_en.md new file mode 100644 index 0000000000..84d72ede0a --- /dev/null +++ b/docs/_posts/gokhanturer/2023-04-12-ner_posology_emb_clinical_medium_en.md @@ -0,0 +1,163 @@ +--- +layout: model +title: Detect Posology concepts (clinical_medium) +author: John Snow Labs +name: ner_posology_emb_clinical_medium +date: 2023-04-12 +tags: [ner, licensed, english, clinical, posology, en] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.0 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## 
Description + +This model detects Drug, Dosage, and administration instructions in text using a pretrained NER model. + +## Predicted Entities + + + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/NER_POSOLOGY/){:.button.button-orange} + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_posology_emb_clinical_medium_en_4.3.2_3.0_1681315841950.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_posology_emb_clinical_medium_en_4.3.2_3.0_1681315841950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +documentAssembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +posology_ner = MedicalNerModel.pretrained('ner_posology_emb_clinical_medium' , "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("posology_ner") + +posology_ner_converter = NerConverter() \ + .setInputCols(["sentence", "token", "posology_ner"]) \ + .setOutputCol("posology_ner_chunk") + +posology_ner_pipeline = Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + word_embeddings, + posology_ner, + posology_ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +posology_ner_model = posology_ner_pipeline.fit(empty_data) + +results = posology_ner_model.transform(spark.createDataFrame([["The patient has been advised Aspirin 81 milligrams QDay. insulin 50 units in a.m. HCTZ 50 mg QDay. 
Nitroglycerin 1/150 sublingually."]]).toDF("text")) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val posology_ner_model = MedicalNerModel.pretrained("ner_posology_emb_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("posology_ner") + +val posology_ner_converter = new NerConverter() + .setInputCols(Array("sentence", "token", "posology_ner")) + .setOutputCol("posology_ner_chunk") + +val posology_pipeline = new Pipeline().setStages(Array(document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + posology_ner_model, + posology_ner_converter)) + +val data = Seq("""The patient has been advised Aspirin 81 milligrams QDay. insulin 50 units in a.m. HCTZ 50 mg QDay. Nitroglycerin 1/150 sublingually.""").toDS.toDF("text") + +val result = posology_pipeline.fit(data).transform(data) +``` +
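In the Results table that follows, `begin` and `end` are character offsets into the input string, and `end` is inclusive. The short check below is plain Python rather than Spark NLP code, and its variable names are illustrative. It slices the sample sentence with a few rows copied from that table to show how the offsets map back to the chunk text.

```python
# The sample sentence passed to the pipeline in the example above.
text = ("The patient has been advised Aspirin 81 milligrams QDay. "
        "insulin 50 units in a.m. HCTZ 50 mg QDay. "
        "Nitroglycerin 1/150 sublingually.")

# (chunk, begin, end, entity) rows copied from the Results table.
rows = [
    ("Aspirin", 29, 35, "DRUG"),
    ("81 milligrams", 37, 49, "STRENGTH"),
    ("HCTZ", 82, 85, "DRUG"),
    ("Nitroglycerin", 99, 111, "DRUG"),
    ("sublingually", 119, 130, "ROUTE"),
]

# `end` is inclusive, so the Python slice has to stop at end + 1.
recovered = [text[begin:end + 1] for _, begin, end, _ in rows]
print(recovered)
```

Note that the offsets only line up when the input string is exactly the one shown here; a leading space or a different surrounding document shifts every value.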
+ +## Results + +```bash +| | chunks | begin | end | entities | +|---:|:--------------|--------:|------:|:-----------| +| 0 | Aspirin | 29 | 35 | DRUG | +| 1 | 81 milligrams | 37 | 49 | STRENGTH | +| 2 | QDay | 51 | 54 | FREQUENCY | +| 3 | insulin | 57 | 63 | DRUG | +| 4 | 50 units | 65 | 72 | STRENGTH | +| 5 | HCTZ | 82 | 85 | DRUG | +| 6 | 50 mg | 87 | 91 | STRENGTH | +| 7 | QDay | 93 | 96 | FREQUENCY | +| 8 | Nitroglycerin | 99 | 111 | DRUG | +| 9 | 1/150 | 113 | 117 | STRENGTH | +| 10 | sublingually | 119 | 130 | ROUTE | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_posology_emb_clinical_medium| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|2.8 MB| + +## Benchmarking + +```bash + precision recall f1-score support + DRUG 0.91 0.91 0.91 2252 + STRENGTH 0.88 0.93 0.91 2290 + FREQUENCY 0.90 0.94 0.92 1782 + DURATION 0.78 0.84 0.81 463 + DOSAGE 0.66 0.63 0.65 476 + ROUTE 0.89 0.89 0.89 394 + FORM 0.86 0.76 0.81 773 + micro avg 0.87 0.89 0.88 8430 + macro avg 0.84 0.84 0.84 8430 +weighted avg 0.87 0.89 0.88 8430 +``` diff --git a/docs/_posts/hsaglamlar/2023-04-12-ner_sdoh_emb_clinical_large_wip_en.md b/docs/_posts/hsaglamlar/2023-04-12-ner_sdoh_emb_clinical_large_wip_en.md new file mode 100644 index 0000000000..65d37ecd17 --- /dev/null +++ b/docs/_posts/hsaglamlar/2023-04-12-ner_sdoh_emb_clinical_large_wip_en.md @@ -0,0 +1,245 @@ +--- +layout: model +title: Social Determinants of Health (clinical_large) +author: John Snow Labs +name: ner_sdoh_emb_clinical_large_wip +date: 2023-04-12 +tags: [en, clinical_large, social_determinants, public_health, ner, sdoh, licensed] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.2 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## 
Description + +This model extracts terminology related to Social Determinants of Health from various kinds of biomedical documents. + +## Predicted Entities + +`Access_To_Care`, `Age`, `Alcohol`, `Chidhood_Event`, `Communicable_Disease`, `Community_Safety`, `Diet`, `Disability`, `Eating_Disorder`, `Education`, `Employment`, `Environmental_Condition`, `Exercise`, `Family_Member`, `Financial_Status`, `Food_Insecurity`, `Gender`, `Geographic_Entity`, `Healthcare_Institution`, `Housing`, `Hyperlipidemia`, `Hypertension`, `Income`, `Insurance_Status`, `Language`, `Legal_Issues`, `Marital_Status`, `Mental_Health`, `Obesity`, `Other_Disease`, `Other_SDoH_Keywords`, `Population_Group`, `Quality_Of_Life`, `Race_Ethnicity`, `Sexual_Activity`, `Sexual_Orientation`, `Smoking`, `Social_Exclusion`, `Social_Support`, `Spiritual_Beliefs`, `Substance_Duration`, `Substance_Frequency`, `Substance_Quantity`, `Substance_Use`, `Transportation`, `Violence_Or_Abuse` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/SOCIAL_DETERMINANT_NER/){:.button.button-orange} +[Open in Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/SOCIAL_DETERMINANT_NER.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_sdoh_emb_clinical_large_wip_en_4.3.2_3.2_1681303888925.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_sdoh_emb_clinical_large_wip_en_4.3.2_3.2_1681303888925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +ner_model = MedicalNerModel.pretrained("ner_sdoh_emb_clinical_large_wip", "en", "clinical/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = NerConverterInternal()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + clinical_embeddings, + ner_model, + ner_converter + ]) + +sample_texts = [["Smith is a 55 years old, divorced Mexcian American woman with financial problems. She speaks spanish. She lives in an apartment. She has been struggling with diabetes for the past 10 years and has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and does not have access to health insurance or paid sick leave. She has a son student at college. Pt with likely long-standing depression. She is aware she needs rehab. Pt reprots having her catholic faith as a means of support as well. She has long history of etoh abuse, beginning in her teens. She reports she has been a daily drinker for 30 years, most recently drinking beer daily. She smokes a pack of cigarettes a day. 
She had DUI back in April and was due to be in court this week."]] + +data = spark.createDataFrame(sample_texts).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val ner_model = MedicalNerModel.pretrained("ner_sdoh_emb_clinical_large_wip", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + sentence_detector, + tokenizer, + clinical_embeddings, + ner_model, + ner_converter +)) + +val data = Seq("Smith is a 55 years old, divorced Mexcian American woman with financial problems. She speaks spanish. She lives in an apartment. She has been struggling with diabetes for the past 10 years and has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and does not have access to health insurance or paid sick leave. She has a son student at college. Pt with likely long-standing depression. She is aware she needs rehab. Pt reprots having her catholic faith as a means of support as well. She has long history of etoh abuse, beginning in her teens. She reports she has been a daily drinker for 30 years, most recently drinking beer daily. She smokes a pack of cigarettes a day. 
She had DUI back in April and was due to be in court this week.").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
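Once the pipeline has run, the extracted chunks are often easier to review when grouped by label. The sketch below is plain Python over a handful of rows copied from the Results table that follows. It does not call Spark NLP, and `group_by_label` is a hypothetical helper introduced only for this illustration.

```python
from collections import defaultdict


def group_by_label(rows):
    """Collect chunk texts under their entity label, keeping document order."""
    grouped = defaultdict(list)
    for chunk, _begin, _end, label in rows:
        grouped[label].append(chunk)
    return dict(grouped)


# (chunk, begin, end, ner_label) rows copied from the Results table.
rows = [
    ("55 years old", 11, 22, "Age"),
    ("divorced", 25, 32, "Marital_Status"),
    ("Mexcian", 34, 40, "Race_Ethnicity"),
    ("American", 42, 49, "Race_Ethnicity"),
    ("financial problems", 62, 79, "Financial_Status"),
    ("smokes", 719, 724, "Smoking"),
    ("cigarettes", 736, 745, "Smoking"),
]

by_label = group_by_label(rows)
for label, chunks in by_label.items():
    print(f"{label}: {chunks}")
```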
+ +## Results + +```bash ++------------------+-----+---+-------------------+ +|chunk |begin|end|ner_label | ++------------------+-----+---+-------------------+ +|55 years old |11 |22 |Age | +|divorced |25 |32 |Marital_Status | +|Mexcian |34 |40 |Race_Ethnicity | +|American |42 |49 |Race_Ethnicity | +|woman |51 |55 |Gender | +|financial problems|62 |79 |Financial_Status | +|She |82 |84 |Gender | +|spanish |93 |99 |Language | +|She |102 |104|Gender | +|apartment |118 |126|Housing | +|She |129 |131|Gender | +|diabetes |158 |165|Other_Disease | +|hospitalizations |233 |248|Other_SDoH_Keywords| +|cleaning assistant|307 |324|Employment | +|health insurance |354 |369|Insurance_Status | +|She |391 |393|Gender | +|son |401 |403|Family_Member | +|student |405 |411|Education | +|college |416 |422|Education | +|depression |454 |463|Mental_Health | +|She |466 |468|Gender | +|she |479 |481|Gender | +|rehab |489 |493|Access_To_Care | +|her |514 |516|Gender | +|catholic faith |518 |531|Spiritual_Beliefs | +|support |547 |553|Social_Support | +|She |565 |567|Gender | +|etoh abuse |589 |598|Alcohol | +|her |614 |616|Gender | +|teens |618 |622|Age | +|She |625 |627|Gender | +|she |637 |639|Gender | +|daily |652 |656|Substance_Quantity | +|drinker |658 |664|Alcohol | +|30 years |670 |677|Substance_Duration | +|drinking beer |694 |706|Alcohol | +|daily |708 |712|Substance_Frequency| +|She |715 |717|Gender | +|smokes |719 |724|Smoking | +|a pack |726 |731|Substance_Quantity | +|cigarettes |736 |745|Smoking | +|a day |747 |751|Substance_Frequency| +|She |754 |756|Gender | +|DUI |762 |764|Legal_Issues | ++------------------+-----+---+-------------------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_sdoh_emb_clinical_large_wip| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|3.0 MB| 
+|Dependencies:|embeddings_clinical_large| + +## References + +Internal SHOP Project + +## Benchmarking + +```bash + label precision recall f1-score support + Employment 0.94 0.96 0.95 2075 + Social_Support 0.91 0.90 0.90 658 + Other_SDoH_Keywords 0.82 0.87 0.85 259 + Healthcare_Institution 0.99 0.95 0.97 781 + Alcohol 0.96 0.97 0.96 258 + Gender 0.99 0.99 0.99 4957 + Other_Disease 0.89 0.94 0.91 583 + Access_To_Care 0.86 0.88 0.87 520 + Mental_Health 0.89 0.81 0.85 494 + Age 0.92 0.96 0.94 433 + Marital_Status 1.00 1.00 1.00 92 + Substance_Quantity 0.88 0.86 0.87 58 + Substance_Use 0.91 0.97 0.94 192 + Family_Member 0.97 0.99 0.98 2094 + Financial_Status 0.86 0.65 0.74 124 + Race_Ethnicity 0.93 0.93 0.93 27 + Insurance_Status 0.93 0.87 0.90 85 + Spiritual_Beliefs 0.86 0.81 0.83 52 + Housing 0.88 0.85 0.87 400 + Geographic_Entity 0.86 0.88 0.87 113 + Disability 0.93 0.93 0.93 44 + Quality_Of_Life 0.89 0.75 0.81 67 + Income 0.89 0.77 0.83 31 + Education 0.85 0.88 0.86 58 + Transportation 0.86 0.89 0.88 57 + Legal_Issues 0.72 0.91 0.80 47 + Smoking 0.98 0.97 0.98 66 + Substance_Frequency 0.93 0.75 0.83 57 + Hypertension 1.00 1.00 1.00 21 + Violence_Or_Abuse 0.83 0.62 0.71 63 + Exercise 0.96 0.88 0.92 57 + Diet 0.95 0.87 0.91 70 + Sexual_Orientation 0.68 1.00 0.81 13 + Language 0.89 0.73 0.80 22 + Social_Exclusion 0.96 0.90 0.93 29 + Substance_Duration 0.75 0.85 0.80 39 + Communicable_Disease 1.00 0.84 0.91 31 + Chidhood_Event 0.88 0.61 0.72 23 + Community_Safety 0.95 0.93 0.94 44 + Population_Group 0.89 0.62 0.73 13 + Hyperlipidemia 0.78 1.00 0.88 7 + Food_Insecurity 1.00 0.93 0.96 29 + Eating_Disorder 0.67 0.92 0.77 13 + Sexual_Activity 0.84 0.90 0.87 29 +Environmental_Condition 1.00 1.00 1.00 20 + Obesity 1.00 1.00 1.00 12 + micro-avg 0.95 0.95 0.95 15217 + macro-avg 0.90 0.88 0.88 15217 + weighted-avg 0.95 0.95 0.95 15217 +``` \ No newline at end of file diff --git a/docs/_posts/hsaglamlar/2023-04-12-ner_sdoh_emb_clinical_medium_wip_en.md 
b/docs/_posts/hsaglamlar/2023-04-12-ner_sdoh_emb_clinical_medium_wip_en.md new file mode 100644 index 0000000000..17e06f00e7 --- /dev/null +++ b/docs/_posts/hsaglamlar/2023-04-12-ner_sdoh_emb_clinical_medium_wip_en.md @@ -0,0 +1,245 @@ +--- +layout: model +title: Social Determinants of Health (clinical_medium) +author: John Snow Labs +name: ner_sdoh_emb_clinical_medium_wip +date: 2023-04-12 +tags: [en, clinical_medium, social_determinants, ner, public_health, sdoh, licensed] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.3.2 +spark_version: 3.2 +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model extracts terminology related to Social Determinants of Health from various kinds of biomedical documents. + +## Predicted Entities + +`Access_To_Care`, `Age`, `Alcohol`, `Chidhood_Event`, `Communicable_Disease`, `Community_Safety`, `Diet`, `Disability`, `Eating_Disorder`, `Education`, `Employment`, `Environmental_Condition`, `Exercise`, `Family_Member`, `Financial_Status`, `Food_Insecurity`, `Gender`, `Geographic_Entity`, `Healthcare_Institution`, `Housing`, `Hyperlipidemia`, `Hypertension`, `Income`, `Insurance_Status`, `Language`, `Legal_Issues`, `Marital_Status`, `Mental_Health`, `Obesity`, `Other_Disease`, `Other_SDoH_Keywords`, `Population_Group`, `Quality_Of_Life`, `Race_Ethnicity`, `Sexual_Activity`, `Sexual_Orientation`, `Smoking`, `Social_Exclusion`, `Social_Support`, `Spiritual_Beliefs`, `Substance_Duration`, `Substance_Frequency`, `Substance_Quantity`, `Substance_Use`, `Transportation`, `Violence_Or_Abuse` + +{:.btn-box} +[Live Demo](https://demo.johnsnowlabs.com/healthcare/SOCIAL_DETERMINANT_NER/){:.button.button-orange} +[Open in 
Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/SOCIAL_DETERMINANT_NER.ipynb){:.button.button-orange.button-orange-trans.co.button-icon} +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_sdoh_emb_clinical_medium_wip_en_4.3.2_3.2_1681303578006.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_sdoh_emb_clinical_medium_wip_en_4.3.2_3.2_1681303578006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +ner_model = MedicalNerModel.pretrained("ner_sdoh_emb_clinical_medium_wip", "en", "clinical/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = NerConverterInternal()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + clinical_embeddings, + ner_model, + ner_converter + ]) + +sample_texts = [["Smith is a 55 years old, divorced Mexcian American woman with financial problems. She speaks spanish. She lives in an apartment. She has been struggling with diabetes for the past 10 years and has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and does not have access to health insurance or paid sick leave. She has a son student at college. Pt with likely long-standing depression. She is aware she needs rehab. Pt reprots having her catholic faith as a means of support as well. She has long history of etoh abuse, beginning in her teens. She reports she has been a daily drinker for 30 years, most recently drinking beer daily. She smokes a pack of cigarettes a day. 
She had DUI back in April and was due to be in court this week."]] + +data = spark.createDataFrame(sample_texts).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val ner_model = MedicalNerModel.pretrained("ner_sdoh_emb_clinical_medium_wip", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + sentence_detector, + tokenizer, + clinical_embeddings, + ner_model, + ner_converter +)) + +val data = Seq("Smith is a 55 years old, divorced Mexcian American woman with financial problems. She speaks spanish. She lives in an apartment. She has been struggling with diabetes for the past 10 years and has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and does not have access to health insurance or paid sick leave. She has a son student at college. Pt with likely long-standing depression. She is aware she needs rehab. Pt reprots having her catholic faith as a means of support as well. She has long history of etoh abuse, beginning in her teens. She reports she has been a daily drinker for 30 years, most recently drinking beer daily. She smokes a pack of cigarettes a day. 
She had DUI back in April and was due to be in court this week.").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
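The `begin` and `end` columns in the Results table below are character offsets into the input text, and the first rows suggest that `end` is inclusive. A minimal sketch in plain Python that checks this against the first sentence of the sample text; the helper name `chunk_at` is hypothetical and not part of Spark NLP:

```python
# First sentence of the sample text, copied verbatim (misspellings included,
# because the offsets in the Results table were computed on this exact string).
text = "Smith is a 55 years old, divorced Mexcian American woman with financial problems."

# (chunk, begin, end, ner_label) rows copied from the Results table.
rows = [
    ("55 years old", 11, 22, "Age"),
    ("divorced", 25, 32, "Marital_Status"),
    ("Mexcian", 34, 40, "Gender"),
    ("American", 42, 49, "Race_Ethnicity"),
    ("woman", 51, 55, "Gender"),
    ("financial problems", 62, 79, "Financial_Status"),
]


def chunk_at(source, begin, end):
    """Hypothetical helper: slice a chunk using an inclusive end offset."""
    return source[begin:end + 1]


# Collect any row whose offsets do not reproduce the reported chunk.
mismatches = [row for row in rows if chunk_at(text, row[1], row[2]) != row[0]]
print(mismatches)
```

An empty list means every offset pair reproduces its chunk, so `end + 1` is the right upper bound when slicing in Python.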
+ +## Results + +```bash ++------------------+-----+---+-------------------+ +|chunk |begin|end|ner_label | ++------------------+-----+---+-------------------+ +|55 years old |11 |22 |Age | +|divorced |25 |32 |Marital_Status | +|Mexcian |34 |40 |Gender | +|American |42 |49 |Race_Ethnicity | +|woman |51 |55 |Gender | +|financial problems|62 |79 |Financial_Status | +|She |82 |84 |Gender | +|spanish |93 |99 |Language | +|She |102 |104|Gender | +|apartment |118 |126|Housing | +|She |129 |131|Gender | +|diabetes |158 |165|Other_Disease | +|hospitalizations |233 |248|Other_SDoH_Keywords| +|cleaning assistant|307 |324|Employment | +|health insurance |354 |369|Insurance_Status | +|She |391 |393|Gender | +|son |401 |403|Family_Member | +|student |405 |411|Education | +|college |416 |422|Education | +|depression |454 |463|Mental_Health | +|She |466 |468|Gender | +|she |479 |481|Gender | +|rehab |489 |493|Access_To_Care | +|her |514 |516|Gender | +|catholic faith |518 |531|Spiritual_Beliefs | +|support |547 |553|Social_Support | +|She |565 |567|Gender | +|etoh abuse |589 |598|Alcohol | +|her |614 |616|Gender | +|teens |618 |622|Age | +|She |625 |627|Gender | +|she |637 |639|Gender | +|daily |652 |656|Substance_Frequency| +|drinker |658 |664|Alcohol | +|30 years |670 |677|Substance_Duration | +|drinking |694 |701|Alcohol | +|beer |703 |706|Alcohol | +|daily |708 |712|Substance_Frequency| +|She |715 |717|Gender | +|smokes |719 |724|Smoking | +|a pack |726 |731|Substance_Quantity | +|cigarettes |736 |745|Smoking | +|a day |747 |751|Substance_Frequency| +|She |754 |756|Gender | +|DUI |762 |764|Legal_Issues | ++------------------+-----+---+-------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_sdoh_emb_clinical_medium_wip| +|Compatibility:|Healthcare NLP 4.3.2+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|3.0 MB| 
+|Dependencies:|embeddings_clinical_medium| + +## References + +Internal SHOP Project + +## Benchmarking + +```bash + label precision recall f1-score support + Geographic_Entity 0.89 0.88 0.88 106 + Gender 0.99 0.99 0.99 4957 + Healthcare_Institution 0.98 0.96 0.97 776 + Employment 0.95 0.95 0.95 2120 + Access_To_Care 0.90 0.81 0.85 459 + Income 0.79 0.79 0.79 29 + Social_Support 0.90 0.92 0.91 629 + Family_Member 0.97 0.99 0.98 2101 + Age 0.94 0.93 0.94 436 + Mental_Health 0.89 0.86 0.87 479 + Alcohol 0.96 0.96 0.96 254 + Substance_Use 0.88 0.95 0.91 208 + Hypertension 0.96 1.00 0.98 24 + Other_Disease 0.90 0.94 0.92 583 + Disability 0.93 0.97 0.95 40 + Insurance_Status 0.87 0.85 0.86 85 + Transportation 0.82 0.96 0.89 53 + Sexual_Orientation 0.78 0.95 0.86 19 + Marital_Status 0.98 0.96 0.97 90 + Race_Ethnicity 0.92 0.96 0.94 25 + Spiritual_Beliefs 0.80 0.80 0.80 51 + Housing 0.89 0.85 0.87 366 + Education 0.87 0.86 0.86 70 + Other_SDoH_Keywords 0.78 0.88 0.83 237 + Language 0.87 0.77 0.82 26 + Substance_Frequency 0.92 0.83 0.87 65 + Legal_Issues 0.77 0.85 0.81 55 + Social_Exclusion 0.97 0.97 0.97 30 + Financial_Status 0.88 0.66 0.75 123 + Violence_Or_Abuse 0.82 0.65 0.73 57 + Substance_Quantity 0.88 0.93 0.90 56 + Smoking 0.99 0.99 0.99 71 + Population_Group 0.91 0.71 0.80 14 + Hyperlipidemia 0.78 1.00 0.88 7 + Community_Safety 0.98 1.00 0.99 47 + Exercise 0.91 0.88 0.90 60 + Food_Insecurity 1.00 1.00 1.00 29 + Eating_Disorder 0.67 0.92 0.77 13 + Quality_Of_Life 0.79 0.82 0.81 61 + Sexual_Activity 0.89 0.83 0.86 29 + Chidhood_Event 0.90 0.72 0.80 25 + Diet 0.97 0.92 0.94 62 + Substance_Duration 0.66 0.95 0.78 39 +Environmental_Condition 1.00 1.00 1.00 20 + Obesity 1.00 1.00 1.00 14 + Communicable_Disease 1.00 0.94 0.97 32 + micro-avg 0.95 0.95 0.95 15132 + macro-avg 0.89 0.90 0.89 15132 + weighted-avg 0.95 0.95 0.95 15132 +``` \ No newline at end of file diff --git a/docs/_posts/mauro-nievoff/2023-04-12-ner_vop_emb_clinical_large_wip_en.md 
b/docs/_posts/mauro-nievoff/2023-04-12-ner_vop_emb_clinical_large_wip_en.md new file mode 100644 index 0000000000..d623cf9e09 --- /dev/null +++ b/docs/_posts/mauro-nievoff/2023-04-12-ner_vop_emb_clinical_large_wip_en.md @@ -0,0 +1,237 @@ +--- +layout: model +title: Voice of the Patients (embeddings_clinical_large) +author: John Snow Labs +name: ner_vop_emb_clinical_large_wip +date: 2023-04-12 +tags: [licensed, clinical, en, ner, vop, patient] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.4.0 +spark_version: [3.2, 3.0] +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model extracts healthcare-related terms from documents that record the patient’s own sentences. + +Note: The ‘wip’ suffix indicates that model development is a work in progress; the model will be finalised and its performance improved in upcoming releases. + +## Predicted Entities + +`Allergen`, `SubstanceQuantity`, `RaceEthnicity`, `Measurements`, `InjuryOrPoisoning`, `Treatment`, `TestResult`, `Modifier`, `Route`, `MedicalDevice`, `Vaccine`, `RelationshipStatus`, `Frequency`, `HealthStatus`, `Procedure`, `Duration`, `DateTime`, `Disease`, `Test`, `Substance`, `Symptom`, `Laterality`, `Dosage`, `ClinicalDept`, `PsychologicalCondition`, `VitalTest`, `Age`, `Drug`, `BodyPart`, `AdmissionDischarge`, `Form`, `Employment`, `Gender` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_vop_emb_clinical_large_wip_en_4.4.0_3.2_1681315187438.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_vop_emb_clinical_large_wip_en_4.4.0_3.2_1681315187438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols(["sentence"]) \ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models")\ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("embeddings") + +ner = MedicalNerModel.pretrained("ner_vop_emb_clinical_large_wip", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("ner") + +ner_converter = NerConverterInternal() \ + .setInputCols(["sentence", "token", "ner"]) \ + .setOutputCol("ner_chunk") +pipeline = Pipeline(stages=[document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner, + ner_converter]) + +data = spark.createDataFrame([["Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem. Finally I got an appointment with a homeopathy doctor finally he find that i was suffering from hyperthyroid and my TSH was 0.15 T3 and T4 is normal . Also i have b12 deficiency and vitamin D deficiency so I'm taking weekly supplement of vitamin D and 1000 mcg b12 daily. I'm taking homeopathy medicine for 40 days and took 2nd test after 30 days. My TSH is 0.5 now. 
I feel a little bit relief from weakness and depression but I'm facing with 2 new problem from last week that is breathtaking problem and very rapid heartrate. I just want to know if i should start allopathy medicine or homeopathy is okay? Bcs i heard that thyroid take time to start recover. So please let me know if both of medicines take same time. Because some of my friends advising me to start allopathy and never take a chance as i can develop some serious problems.Sorry for my poor english😐Thank you."]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val ner = MedicalNerModel.pretrained("ner_vop_emb_clinical_large_wip", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + + +val pipeline = new Pipeline().setStages(Array(document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner, + ner_converter)) + +val data = Seq("Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. 
I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem. Finally I got an appointment with a homeopathy doctor finally he find that i was suffering from hyperthyroid and my TSH was 0.15 T3 and T4 is normal . Also i have b12 deficiency and vitamin D deficiency so I'm taking weekly supplement of vitamin D and 1000 mcg b12 daily. I'm taking homeopathy medicine for 40 days and took 2nd test after 30 days. My TSH is 0.5 now. I feel a little bit relief from weakness and depression but I'm facing with 2 new problem from last week that is breathtaking problem and very rapid heartrate. I just want to know if i should start allopathy medicine or homeopathy is okay? Bcs i heard that thyroid take time to start recover. So please let me know if both of medicines take same time. Because some of my friends advising me to start allopathy and never take a chance as i can develop some serious problems.Sorry for my poor english😐Thank you.").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +## Results + +```bash +| chunk | ner_label | +|:---------------------|:-----------------------| +| 20 year old | Age | +| girl | Gender | +| hyperthyroid | Disease | +| 1 month ago | DateTime | +| weak | Symptom | +| light | Symptom | +| panic attacks | PsychologicalCondition | +| depression | PsychologicalCondition | +| left | Laterality | +| chest | BodyPart | +| pain | Symptom | +| increased | TestResult | +| heart rate | VitalTest | +| rapidly | Modifier | +| weight loss | Symptom | +| 4 months | Duration | +| hospital | ClinicalDept | +| discharged | AdmissionDischarge | +| hospital | ClinicalDept | +| blood tests | Test | +| brain | BodyPart | +| mri | Test | +| ultrasound scan | Test | +| endoscopy | Procedure | +| doctors | Employment | +| homeopathy doctor | Employment | +| he | Gender | +| hyperthyroid | Disease | +| TSH | Test | +| 0.15 | TestResult | +| T3 | Test | +| T4 | Test | +| normal | TestResult | +| b12 deficiency | Disease | +| vitamin D deficiency | Disease | +| weekly | Frequency | +| supplement | Drug | +| vitamin D | Drug | +| 1000 mcg | Dosage | +| b12 | Drug | +| daily | Frequency | +| homeopathy medicine | Treatment | +| 40 days | Duration | +| after 30 days | DateTime | +| TSH | Test | +| 0.5 | TestResult | +| now | DateTime | +| weakness | Symptom | +| depression | PsychologicalCondition | +| last week | DateTime | +| heartrate | VitalTest | +| homeopathy | Treatment | +| thyroid | BodyPart | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_vop_emb_clinical_large_wip| +|Compatibility:|Healthcare NLP 4.4.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|3.9 MB| +|Dependencies:|embeddings_clinical_large| + +## References + +In-house annotated health-related text in colloquial language. + +## Sample text from the training dataset + +Hello,I'm 20 year old girl. 
I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem. Finally I got an appointment with a homeopathy doctor finally he find that i was suffering from hyperthyroid and my TSH was 0.15 T3 and T4 is normal . Also i have b12 deficiency and vitamin D deficiency so I'm taking weekly supplement of vitamin D and 1000 mcg b12 daily. I'm taking homeopathy medicine for 40 days and took 2nd test after 30 days. My TSH is 0.5 now. I feel a little bit relief from weakness and depression but I'm facing with 2 new problem from last week that is breathtaking problem and very rapid heartrate. I just want to know if i should start allopathy medicine or homeopathy is okay? Bcs i heard that thyroid take time to start recover. So please let me know if both of medicines take same time. Because some of my friends advising me to start allopathy and never take a chance as i can develop some serious problems.Sorry for my poor english😐Thank you. 
+ +## Benchmarking + +```bash + label tp fp fn total precision recall f1 + Allergen 0 0 8 8 0.00 0.00 0.00 + SubstanceQuantity 7 10 20 27 0.41 0.26 0.32 + RaceEthnicity 2 0 6 8 1.00 0.25 0.40 + Measurements 36 20 38 74 0.64 0.49 0.55 + InjuryOrPoisoning 65 28 52 117 0.70 0.56 0.62 + Treatment 86 27 56 142 0.76 0.61 0.67 + TestResult 379 150 169 548 0.72 0.69 0.70 + Modifier 644 229 269 913 0.74 0.71 0.72 + Route 23 5 13 36 0.82 0.64 0.72 + MedicalDevice 171 54 73 244 0.76 0.70 0.73 + Vaccine 21 4 11 32 0.84 0.66 0.74 + RelationshipStatus 18 2 10 28 0.90 0.64 0.75 + Frequency 478 161 165 643 0.75 0.74 0.75 + HealthStatus 75 19 23 98 0.80 0.77 0.78 + Procedure 275 68 91 366 0.80 0.75 0.78 + Duration 884 275 231 1115 0.76 0.79 0.78 + DateTime 1796 397 408 2204 0.82 0.81 0.82 + Disease 1258 323 245 1503 0.80 0.84 0.82 + Test 752 155 157 909 0.83 0.83 0.83 + Substance 152 37 26 178 0.80 0.85 0.83 + Symptom 3078 547 621 3699 0.85 0.83 0.84 + Laterality 425 62 93 518 0.87 0.82 0.85 + Dosage 266 37 56 322 0.88 0.83 0.85 + ClinicalDept 201 28 35 236 0.88 0.85 0.86 +PsychologicalCondition 282 41 32 314 0.87 0.90 0.89 + VitalTest 146 25 11 157 0.85 0.93 0.89 + Age 295 38 28 323 0.89 0.91 0.90 + Drug 1040 136 95 1135 0.88 0.92 0.90 + BodyPart 2528 245 217 2745 0.91 0.92 0.92 + AdmissionDischarge 24 1 3 27 0.96 0.89 0.92 + Form 233 24 18 251 0.91 0.93 0.92 + Employment 988 51 54 1042 0.95 0.95 0.95 + Gender 1173 26 21 1194 0.98 0.98 0.98 + macro_avg 17801 3225 3355 21156 0.80 0.73 0.76 + micro_avg 17801 3225 3355 21156 0.85 0.84 0.84 +``` \ No newline at end of file diff --git a/docs/_posts/mauro-nievoff/2023-04-12-ner_vop_emb_clinical_medium_wip_en.md b/docs/_posts/mauro-nievoff/2023-04-12-ner_vop_emb_clinical_medium_wip_en.md new file mode 100644 index 0000000000..35e3cfabf8 --- /dev/null +++ b/docs/_posts/mauro-nievoff/2023-04-12-ner_vop_emb_clinical_medium_wip_en.md @@ -0,0 +1,240 @@ +--- +layout: model +title: Voice of the Patients (embeddings_clinical_medium) +author: 
John Snow Labs +name: ner_vop_emb_clinical_medium_wip +date: 2023-04-12 +tags: [licensed, clinical, en, ner, vop, patient] +task: Named Entity Recognition +language: en +edition: Healthcare NLP 4.4.0 +spark_version: [3.0, 3.2] +supported: true +annotator: MedicalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model extracts healthcare-related terms from documents that record the patient’s own sentences. + +Note: The ‘wip’ suffix indicates that model development is a work in progress; the model will be finalised and its performance improved in upcoming releases. + +## Predicted Entities + +`Allergen`, `SubstanceQuantity`, `RaceEthnicity`, `Measurements`, `InjuryOrPoisoning`, `Treatment`, `Modifier`, `TestResult`, `MedicalDevice`, `Vaccine`, `Frequency`, `HealthStatus`, `Route`, `RelationshipStatus`, `Procedure`, `Duration`, `DateTime`, `AdmissionDischarge`, `Disease`, `Test`, `Substance`, `Laterality`, `Symptom`, `ClinicalDept`, `Dosage`, `Age`, `Drug`, `VitalTest`, `PsychologicalCondition`, `Form`, `BodyPart`, `Employment`, `Gender` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/ner_vop_emb_clinical_medium_wip_en_4.4.0_3.2_1681315530573.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/ner_vop_emb_clinical_medium_wip_en_4.4.0_3.2_1681315530573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols(["sentence"]) \ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models")\ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("embeddings") + +ner = MedicalNerModel.pretrained("ner_vop_emb_clinical_medium_wip", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("ner") + +ner_converter = NerConverterInternal() \ + .setInputCols(["sentence", "token", "ner"]) \ + .setOutputCol("ner_chunk") +pipeline = Pipeline(stages=[document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner, + ner_converter]) + +data = spark.createDataFrame([["Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem. Finally I got an appointment with a homeopathy doctor finally he find that i was suffering from hyperthyroid and my TSH was 0.15 T3 and T4 is normal . Also i have b12 deficiency and vitamin D deficiency so I'm taking weekly supplement of vitamin D and 1000 mcg b12 daily. I'm taking homeopathy medicine for 40 days and took 2nd test after 30 days. My TSH is 0.5 now. 
I feel a little bit relief from weakness and depression but I'm facing with 2 new problem from last week that is breathtaking problem and very rapid heartrate. I just want to know if i should start allopathy medicine or homeopathy is okay? Bcs i heard that thyroid take time to start recover. So please let me know if both of medicines take same time. Because some of my friends advising me to start allopathy and never take a chance as i can develop some serious problems.Sorry for my poor english😐Thank you."]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols("document") + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols("sentence") + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_medium", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("embeddings") + +val ner = MedicalNerModel.pretrained("ner_vop_emb_clinical_medium_wip", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + + +val pipeline = new Pipeline().setStages(Array(document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner, + ner_converter)) + +val data = Seq("Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. 
I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem. Finally I got an appointment with a homeopathy doctor finally he find that i was suffering from hyperthyroid and my TSH was 0.15 T3 and T4 is normal . Also i have b12 deficiency and vitamin D deficiency so I'm taking weekly supplement of vitamin D and 1000 mcg b12 daily. I'm taking homeopathy medicine for 40 days and took 2nd test after 30 days. My TSH is 0.5 now. I feel a little bit relief from weakness and depression but I'm facing with 2 new problem from last week that is breathtaking problem and very rapid heartrate. I just want to know if i should start allopathy medicine or homeopathy is okay? Bcs i heard that thyroid take time to start recover. So please let me know if both of medicines take same time. Because some of my friends advising me to start allopathy and never take a chance as i can develop some serious problems.Sorry for my poor english😐Thank you.").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +## Results + +```bash +| chunk | ner_label | +|:---------------------|:-----------------------| +| 20 year old | Age | +| girl | Gender | +| hyperthyroid | Disease | +| 1 month ago | DateTime | +| weak | Symptom | +| light | Symptom | +| panic attacks | PsychologicalCondition | +| depression | PsychologicalCondition | +| left | Laterality | +| chest | BodyPart | +| pain | Symptom | +| increased | TestResult | +| heart rate | VitalTest | +| rapidly | Modifier | +| weight loss | Symptom | +| 4 months | Duration | +| hospital | ClinicalDept | +| discharged | AdmissionDischarge | +| hospital | ClinicalDept | +| blood tests | Test | +| brain | BodyPart | +| mri | Test | +| ultrasound scan | Test | +| endoscopy | Procedure | +| doctors | Employment | +| homeopathy doctor | Employment | +| he | Gender | +| hyperthyroid | Disease | +| TSH | Test | +| 0.15 | TestResult | +| T3 | Test | +| T4 | Test | +| normal | TestResult | +| b12 deficiency | Disease | +| vitamin D deficiency | Disease | +| weekly | Frequency | +| supplement | Drug | +| vitamin D | Drug | +| 1000 mcg | Dosage | +| b12 | Drug | +| daily | Frequency | +| homeopathy medicine | Treatment | +| 40 days | Duration | +| after 30 days | DateTime | +| TSH | Test | +| 0.5 | TestResult | +| now | DateTime | +| weakness | Symptom | +| depression | PsychologicalCondition | +| last week | DateTime | +| rapid | TestResult | +| heartrate | VitalTest | +| allopathy medicine | Treatment | +| homeopathy | Treatment | +| thyroid | BodyPart | +| allopathy | Treatment | +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_vop_emb_clinical_medium_wip| +|Compatibility:|Healthcare NLP 4.4.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|3.9 MB| +|Dependencies:|embeddings_clinical_medium| + +## References + +In-house annotated health-related text in colloquial language. 
+ +## Sample text from the training dataset + +Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem. Finally I got an appointment with a homeopathy doctor finally he find that i was suffering from hyperthyroid and my TSH was 0.15 T3 and T4 is normal . Also i have b12 deficiency and vitamin D deficiency so I'm taking weekly supplement of vitamin D and 1000 mcg b12 daily. I'm taking homeopathy medicine for 40 days and took 2nd test after 30 days. My TSH is 0.5 now. I feel a little bit relief from weakness and depression but I'm facing with 2 new problem from last week that is breathtaking problem and very rapid heartrate. I just want to know if i should start allopathy medicine or homeopathy is okay? Bcs i heard that thyroid take time to start recover. So please let me know if both of medicines take same time. Because some of my friends advising me to start allopathy and never take a chance as i can develop some serious problems.Sorry for my poor english😐Thank you. 
+ +## Benchmarking + +```bash + label tp fp fn total precision recall f1 + Allergen 0 1 8 8 0.00 0.00 0.00 + SubstanceQuantity 9 14 18 27 0.39 0.33 0.36 + RaceEthnicity 2 0 6 8 1.00 0.25 0.40 + Measurements 41 25 33 74 0.62 0.55 0.59 + InjuryOrPoisoning 66 37 51 117 0.64 0.56 0.60 + Treatment 96 39 46 142 0.71 0.68 0.69 + Modifier 642 268 271 913 0.71 0.70 0.70 + TestResult 394 185 154 548 0.68 0.72 0.70 + MedicalDevice 177 76 67 244 0.70 0.73 0.71 + Vaccine 20 4 12 32 0.83 0.63 0.71 + Frequency 456 144 187 643 0.76 0.71 0.73 + HealthStatus 60 4 38 98 0.94 0.61 0.74 + Route 24 4 12 36 0.86 0.67 0.75 + RelationshipStatus 19 3 9 28 0.86 0.68 0.76 + Procedure 286 91 80 366 0.76 0.78 0.77 + Duration 846 227 269 1115 0.79 0.76 0.77 + DateTime 1813 455 391 2204 0.80 0.82 0.81 + AdmissionDischarge 19 1 8 27 0.95 0.70 0.81 + Disease 1247 318 256 1503 0.80 0.83 0.81 + Test 734 150 175 909 0.83 0.81 0.82 + Substance 156 48 22 178 0.76 0.88 0.82 + Laterality 440 91 78 518 0.83 0.85 0.84 + Symptom 3069 566 630 3699 0.84 0.83 0.84 + ClinicalDept 205 35 31 236 0.85 0.87 0.86 + Dosage 273 42 49 322 0.87 0.85 0.86 + Age 294 60 29 323 0.83 0.91 0.87 + Drug 1035 188 100 1135 0.85 0.91 0.88 + VitalTest 144 23 13 157 0.86 0.92 0.89 +PsychologicalCondition 284 32 30 314 0.90 0.90 0.90 + Form 234 32 17 251 0.88 0.93 0.91 + BodyPart 2532 256 213 2745 0.91 0.92 0.92 + Employment 980 65 62 1042 0.94 0.94 0.94 + Gender 1174 27 20 1194 0.98 0.98 0.98 + macro_avg 17771 3511 3385 21156 0.79 0.73 0.75 + micro_avg 17771 3511 3385 21156 0.84 0.84 0.84 +``` \ No newline at end of file
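
The `precision`, `recall` and `f1` columns in the benchmarking table above appear to follow from the `tp`, `fp` and `fn` counts, with `total` equal to `tp + fn`. A minimal sketch in plain Python, assuming the standard definitions and rounding to two decimals; the function name `prf` is hypothetical and is not how the table was necessarily produced:

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) rounded to two decimals.

    Each metric falls back to 0.0 when its denominator is zero,
    which matches how the Allergen row is reported.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return round(precision, 2), round(recall, 2), round(f1, 2)


# (tp, fp, fn) counts copied from three rows of the table above.
gender = prf(1174, 27, 20)     # Gender row
symptom = prf(3069, 566, 630)  # Symptom row
allergen = prf(0, 1, 8)        # Allergen row, no true positives

print(gender, symptom, allergen)
```

The same check can be applied to any other row to confirm that a reported score matches its counts.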