# Zero Shot NER


![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/healthcare/medical_named_entity_recognition/zero_shot_ner.ipynb)

This notebook is based on the John Snow Labs Enterprise-NLP [ZeroShotNerModel](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#zeroshotnermodel), whose architecture builds on `RoBertaForQuestionAnswering`.
Zero-shot models excel at generalization: the model can accurately predict entities in very different datasets without being fine-tuned or trained from scratch for each domain.
A model trained to solve one specific problem can achieve better accuracy on that task than a zero-shot model,
but it probably won't be useful on a different task.
That is where zero-shot models prove useful: they can achieve good results across various domains.
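
To make the question-answering idea concrete, here is a minimal conceptual sketch of how entity labels can be derived from questions. It is not the `ZeroShotNerModel` implementation: `zero_shot_entities`, `toy_answer_fn`, and the `threshold` parameter are hypothetical names, and the rule-based "model" exists only so the sketch runs without any download.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a question-answering model: given (question, text),
# it returns (answer_span, confidence), or None when it finds no answer.
AnswerFn = Callable[[str, str], Optional[Tuple[str, float]]]


def zero_shot_entities(
    text: str,
    entity_definitions: Dict[str, List[str]],
    answer_fn: AnswerFn,
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Ask every question of every label; keep the best score per (label, span)."""
    best: Dict[Tuple[str, str], float] = {}
    for label, questions in entity_definitions.items():
        for question in questions:
            result = answer_fn(question, text)
            if result is None:
                continue
            span, score = result
            if score >= threshold and score > best.get((label, span), 0.0):
                best[(label, span)] = score
    return [(label, span, score) for (label, span), score in best.items()]


def toy_answer_fn(question: str, text: str) -> Optional[Tuple[str, float]]:
    """Toy rule-based 'model', used only to make the sketch runnable."""
    if "drug" in question.lower() and "Majezik" in text:
        return ("Majezik", 0.9)
    if "problem" in question.lower() and "headache" in text:
        return ("headache", 0.8)
    return None


definitions = {"DRUG": ["Which drug?"], "PROBLEM": ["What is the problem?"]}
found = zero_shot_entities(
    "The doctor prescribed Majezik for my headache.", definitions, toy_answer_fn
)
print(found)
```

Because the labels live in the questions rather than in trained weights, supporting a new entity type in this sketch only means adding a new key with a few questions.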


In [None]:
%%capture
! pip install nlu
! pip install pyspark==3.1.1

## Authorize your Environment

In [None]:
%%capture
import nlu
nlu.auth(
    HEALTHCARE_LICENSE_OR_JSON_PATH="Your Secrets",
    AWS_ACCESS_KEY_ID="Your Secrets",
    AWS_SECRET_ACCESS_KEY="Your Secrets",
    HEALTHCARE_SECRET="Your Secrets",
)
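
If you would rather not paste secrets into the notebook, one option is to read them from environment variables. The sketch below is an assumption, not part of NLU: `auth_kwargs_from_env` is a hypothetical helper, and the environment variable names simply reuse the keyword names from the `nlu.auth` call above.

```python
import os
from typing import Dict, Mapping

# Keyword names match the nlu.auth(...) call above. Using identically named
# environment variables is an assumption made for this sketch.
REQUIRED_KEYS = (
    "HEALTHCARE_LICENSE_OR_JSON_PATH",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "HEALTHCARE_SECRET",
)


def auth_kwargs_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the credentials from `env`, failing early if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing credentials: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}
```

With the variables set, the call would become `nlu.auth(**auth_kwargs_from_env())`.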


## Load the Model

In [None]:
import nlu 
enterprise_zero_shot_ner = nlu.load('en.zero_shot.ner_roberta')
enterprise_zero_shot_ner

zero_shot_ner_roberta download started this may take some time.
Approximate size to download 438.9 MB
[OK!]


{'zero_shot_ner': ZeroShotNerModel_55009a5b6e01,
 'tokenizer': Tokenizer_2e6564cc76ef,
 'document_assembler': DocumentAssembler_5a9e1d983500,
 'chunk_converter_licensed@entities': NerConverterInternal_f0c9676dce30}

## Configure entity definitions

In [None]:
enterprise_zero_shot_ner['zero_shot_ner'].setEntityDefinitions(
    {
        "PROBLEM": [
            "What is the disease?",
            "What is his symptom?",
            "What is her disease?",
            "What is his disease?",
            "What is the problem?",
            "What does a patient suffer from?",
            "What was the reason that the patient is admitted to the clinic?",
        ],
        "DRUG": [
            "Which drug?",
            "Which is the drug?",
            "What is the drug?",
            "Which drug does he use?",
            "Which drug does she use?",
            "Which drug do I use?",
            "Which drug is prescribed for a symptom?",
        ],
        "ADMISSION_DATE": ["When was the patient admitted to a clinic?"],
        "PATIENT_AGE": [
            "How old is the patient?",
            "What is the age of the patient?",
        ],
    }
)


ZeroShotNerModel_55009a5b6e01
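
Since the entity definitions are plain strings, small slips are easy to miss. As a rough sketch, a pre-check like the one below can flag them before the dictionary is passed to `setEntityDefinitions`. The helper `check_entity_definitions` is hypothetical, and its rules (non-empty lists, a trailing question mark, no duplicates) are conventions of this sketch, not requirements of `ZeroShotNerModel`.

```python
from typing import Dict, List


def check_entity_definitions(definitions: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for label, questions in definitions.items():
        if not questions:
            problems.append(f"{label}: no questions defined")
        for question in questions:
            if not question.strip().endswith("?"):
                problems.append(f"{label}: not phrased as a question: {question!r}")
        if len(set(questions)) != len(questions):
            problems.append(f"{label}: duplicate questions")
    return problems


issues = check_entity_definitions(
    {"DRUG": ["Which drug?", "Which drug"], "PATIENT_AGE": []}
)
print(issues)
```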

## Predict on some sample text with entities

In [None]:

df = enterprise_zero_shot_ner.predict(
    [
        "The doctor pescribed Majezik for my severe headache.",
        "The patient was admitted to the hospital for his colon cancer.",
        "27 years old patient was admitted to clinic on Sep 1st by Dr. X for a right-sided pleural effusion for thoracentesis.",
    ]
)

df

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


| | document | entities_zero_shot | entities_zero_shot_class | entities_zero_shot_confidence | entities_zero_shot_origin_chunk | entities_zero_shot_origin_sentence |
|---|---|---|---|---|---|---|
| 0 | The doctor pescribed Majezik for my severe hea... | Majezik | DRUG | 0.6467171 | 0 | 0 |
| 0 | The doctor pescribed Majezik for my severe hea... | severe headache | PROBLEM | 0.5526352 | 1 | 0 |
| 1 | The patient was admitted to the hospital for h... | colon cancer | PROBLEM | 0.88985014 | 0 | 0 |
| 2 | 27 years old patient was admitted to clinic on... | 27 years old | PATIENT_AGE | 0.6943088 | 0 | 0 |
| 2 | 27 years old patient was admitted to clinic on... | Sep 1st | ADMISSION_DATE | 0.95646083 | 1 | 0 |
| 2 | 27 years old patient was admitted to clinic on... | a right-sided pleural effusion for thoracentesis | PROBLEM | 0.50026625 | 2 | 0 |
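
The prediction output above has one row per detected entity, with a confidence score for each. A possible post-processing step, sketched below, is to drop low-confidence rows and group the rest by document. The helper `entities_by_document` and the `0.6` cutoff are illustrative assumptions; the column names come from the output above, and the sample rows use shortened document strings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def entities_by_document(
    records: Iterable[Mapping[str, object]], min_confidence: float = 0.6
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (entity, class) pairs by document, skipping low-confidence rows."""
    grouped = defaultdict(list)
    for row in records:
        # float() also covers the case where confidences arrive as strings.
        if float(row["entities_zero_shot_confidence"]) >= min_confidence:
            grouped[row["document"]].append(
                (row["entities_zero_shot"], row["entities_zero_shot_class"])
            )
    return dict(grouped)


sample = [
    {"document": "doc A", "entities_zero_shot": "Majezik",
     "entities_zero_shot_class": "DRUG", "entities_zero_shot_confidence": 0.6467171},
    {"document": "doc A", "entities_zero_shot": "severe headache",
     "entities_zero_shot_class": "PROBLEM", "entities_zero_shot_confidence": 0.5526352},
    {"document": "doc B", "entities_zero_shot": "colon cancer",
     "entities_zero_shot_class": "PROBLEM", "entities_zero_shot_confidence": 0.88985014},
]
grouped_entities = entities_by_document(sample)
print(grouped_entities)
```

On the real result, the same helper could be called as `entities_by_document(df.to_dict("records"))`, assuming `df` is the pandas DataFrame returned by `predict`.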
