![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP/FewShotClassifierModel.ipynb)

# **FewShotClassifierModel**

This notebook shows how to use `FewShotClassifierModel`. This annotator targets few-shot classification tasks, which involve training models to make accurate predictions from limited labeled data.




**📖 Learning Objectives:**

1. Understand how `FewShotClassifierModel` works.

2. Become comfortable using the parameters of the annotator.


**🔗 Helpful Links:**

- Documentation : [FewShotClassifierModel](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#fewshotclassifier)

- Python Docs : [FewShotClassifierModel](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/classification/few_shot_classifier/index.html#sparknlp_jsl.annotator.classification.few_shot_classifier.FewShotClassifierModel)

- Scala Docs : [FewShotClassifierModel](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/nlp/annotators/classification/FewShotClassifierModel.html)

- For extended examples of usage, see [Spark NLP Workshop repository](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/30.3.Text_Classification_with_FewShotClassifier.ipynb#scrollTo=REDlKd7enG5r).


## **📜 Background**

`FewShotClassifierModel` provides a valuable capability for handling scenarios where labeled data is scarce or expensive to obtain. By effectively utilizing limited labeled examples, the few-shot classification approach enables the creation of models that can generalize and classify new instances accurately, even with minimal training data.

`FewShotClassifierModel` takes sentence embeddings as input and generates category annotations: labels with confidence scores that range from 0 to 1. The supported input annotation type is `SENTENCE_EMBEDDINGS`, and the output annotation type is `CATEGORY`.
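To make the output shape concrete, here is a minimal pure-Python sketch of reading a label and its confidence from one category annotation. The `CategoryAnnotation` class and the `label_and_confidence` helper are hypothetical stand-ins written for this illustration, and the metadata key `"confidence"` is an assumption rather than something confirmed in this notebook. Spark NLP stores annotation metadata values as strings, which is why the sketch converts the score with `float`.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryAnnotation:
    """Hypothetical stand-in for one CATEGORY annotation (not the library class)."""
    result: str                                   # predicted label, e.g. "Complaint"
    metadata: dict = field(default_factory=dict)  # string-to-string map


def label_and_confidence(annotation, key="confidence"):
    """Return (label, confidence) where confidence is a float in [0, 1] or None.

    The metadata key name is an assumption; inspect `prediction.metadata`
    in your own results to confirm it.
    """
    raw = annotation.metadata.get(key)
    if raw is None:
        return annotation.result, None
    confidence = float(raw)
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return annotation.result, confidence


# Placeholder values for illustration only, not real model output.
sample = CategoryAnnotation(result="Complaint", metadata={"confidence": "0.9"})
label, confidence = label_and_confidence(sample)
print(label, confidence)
```

In a real pipeline, selecting `prediction.metadata` alongside `prediction.result` is the way to check which keys the model actually emits.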

## **🎬 Colab Setup**

In [1]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [2]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

Please Upload your John Snow Labs License using the button below


Saving 5.3.3.spark_nlp_for_healthcare.json to 5.3.3.spark_nlp_for_healthcare.json


In [3]:
from johnsnowlabs import nlp, medical

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
nlp.install()

👌 Detected license file /content/5.3.3.spark_nlp_for_healthcare.json
🚨 Outdated Medical Secrets in license file. Version=5.3.3 but should be Version=5.3.2
🚨 Outdated OCR Secrets in license file. Version=5.1.2 but should be Version=5.3.2
👷 Trying to install compatible secrets. Use nlp.settings.enforce_versions=False if you want to install outdated secrets.
📋 Stored John Snow Labs License in /root/.johnsnowlabs/licenses/license_number_0_for_Spark-Healthcare_Spark-OCR.json
👷 Setting up  John Snow Labs home in /root/.johnsnowlabs, this might take a few minutes.
Downloading 🐍+🚀 Python Library spark_nlp-5.3.2-py2.py3-none-any.whl
Downloading 🐍+💊 Python Library spark_nlp_jsl-5.3.2-py3-none-any.whl
Downloading 🫘+🚀 Java Library spark-nlp-assembly-5.3.2.jar
Downloading 🫘+💊 Java Library spark-nlp-jsl-5.3.2.jar
🙆 JSL Home setup in /root/.johnsnowlabs
👌 Detected license file /content/5.3.3.spark_nlp_for_healthcare.json
👷 Trying to install compatible secrets. Use nlp.settings.enforce_versions=False 

In [4]:
# Automatically load license data and start a session with all JARs the user has access to
spark = nlp.start()

👌 Detected license file /content/5.3.3.spark_nlp_for_healthcare.json
👷 Trying to install compatible secrets. Use nlp.settings.enforce_versions=False if you want to install outdated secrets.
👌 Launched cpu optimized session with with: 🚀Spark-NLP==5.3.2, 💊Spark-Healthcare==5.3.2, running on ⚡ PySpark==3.4.0


## **🖨️ Input/Output Annotation Types**

- Input: `SENTENCE_EMBEDDINGS`

- Output: `CATEGORY`

## **🔎 Parameters**

- `setFeatureScaling`: Feature scaling method. Possible values are 'zscore', 'minmax', or empty (no scaling).
- `setMultiClass`: If `multiClass` is set, the model returns all labels with their corresponding scores. By default, `multiClass` is false.
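The effect of `setMultiClass` can be pictured with a small pure-Python sketch. This is an illustration of the behavior described above, not the annotator's implementation: `select_labels` is a hypothetical helper, and the scores are made-up numbers rather than model output.

```python
def select_labels(scores, multi_class=False):
    """Pick which (label, score) pairs to report.

    scores: dict mapping label -> score in [0, 1].
    multi_class=False: only the highest-scoring label.
    multi_class=True: every label with its score, highest first.
    """
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked if multi_class else ranked[:1]


# Made-up scores for illustration only.
example_scores = {"No_Complaint": 0.17, "Complaint": 0.83}

top_only = select_labels(example_scores)
all_labels = select_labels(example_scores, multi_class=True)

print(top_only)
print(all_labels)
```

With the real annotator, the equivalent switch would be `.setMultiClass(True)` on the model; how the extra labels and scores are laid out in the output is best confirmed by inspecting the `prediction` column.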



## **💻 Pipeline**

We will use the pretrained model [Few Shot Patient Complaint Classification](https://nlp.johnsnowlabs.com/2023/08/30/few_shot_classifier_patient_complaint_sbiobert_cased_mli_en.html) in the pipeline to illustrate few-shot classification.


This text classifier model was trained on healthcare-related text and Google reviews of various healthcare facilities, written by patients about the performance of the facility and its personnel.

The dataset has been labeled with two different classes:

`Complaint`: The text expresses dissatisfaction or frustration with some aspect of the healthcare provided to the patient. Most often, negative or critical language is used to describe the experience.

`No_Complaint`: The review expresses positive or neutral sentiment about the service. There is no criticism or expression of dissatisfaction.

In [5]:
# Annotator that transforms a text column from dataframe into an Annotation ready for NLP
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

bert_sent = nlp.BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

few_shot_classifier = medical.FewShotClassifierModel.pretrained("few_shot_classifier_patient_complaint_sbiobert_cased_mli", "en", "clinical/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("prediction")

clf_Pipeline = nlp.Pipeline(stages=[
    document_assembler,
    bert_sent,
    few_shot_classifier
])

sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
few_shot_classifier_patient_complaint_sbiobert_cased_mli download started this may take some time.
[OK!]


Let us test the model on the sample texts below, which are patient comments about healthcare facilities and their personnel and which may or may not contain complaints.

We will convert the texts to a PySpark DataFrame and then get complaint predictions by using `.transform`.

In [6]:
data = spark.createDataFrame([["""The Medical Center is a large state of the art hospital facility with great doctors, nurses, technicians and receptionists.  Service is top notch, knowledgeable and friendly.  This hospital site has plenty of parking"""],
 ["""My gf dad wasn’t feeling well so we decided to take him to this place cus it’s his insurance and we waited for a while and mind that my girl dad couldn’t breath good while the staff seem not to care and when they got to us they said they we’re gonna a take some blood samples and they made us wait again and to see the staff workers talking to each other and laughing taking there time and not seeming to care about there patience, while we were in the lobby there was another guy who told us they also made him wait while he can hardly breath and they left him there to wait my girl dad is coughing and not doing better and when the lady came in my girl dad didn’t have his shirt because he was hot and the lady came in said put on his shirt on and then left still waiting to get help rn"""],
 ["The doctor seemed rushed during the appointment, and I didn't get a chance to discuss all my concerns. I feel unheard."],
 ["I don't know if it's just me, but the treatment plan doesn't seem to be working, and I'm still in a lot of pain."],
 ["I can't say enough good things about the hospital staff. They were all incredibly kind, compassionate, and went above and beyond to ensure my comfort during my stay. The nurses were always available and responsive to my needs, and the doctors clearly explained everything in a way that was easy to understand."],
 ["The equipment and resources available to the medical staff were truly impressive, and I felt confident that I was receiving the best possible care."],
 ["I'm not sure if it's the medication or something else, but I've been experiencing strange side effects like vivid dreams and dizziness."]
                              ]).toDF("text")

In [7]:
result = clf_Pipeline.fit(data).transform(data)

In [8]:
result.select('prediction.result','text').show(truncate = 150)

+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|        result|                                                                                                                                                  text|
+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|[No_Complaint]|The Medical Center is a large state of the art hospital facility with great doctors, nurses, technicians and receptionists.  Service is top notch, ...|
|   [Complaint]|My gf dad wasn’t feeling well so we decided to take him to this place cus it’s his insurance and we waited for a while and mind that my girl dad co...|
|   [Complaint]|                                The doctor seemed rushed during the appointment, and I didn't get a chance to discuss all my concerns. I feel un

### `setFeatureScaling`

This parameter defines the feature scaling method. Possible values are 'zscore', 'minmax', or empty (no scaling).
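For readers unfamiliar with the two named methods, the sketch below shows the textbook definitions of z-score and min-max scaling on a single feature, in plain Python. It is an assumption that the annotator follows these standard formulas; where it takes its statistics from (for example, values stored at training time) is not stated here. The helper names `zscore_scale` and `minmax_scale` are hypothetical.

```python
from statistics import mean, pstdev


def zscore_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def minmax_scale(values):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy feature values for illustration only.
feature = [2.0, 4.0, 6.0, 8.0]

zscored = zscore_scale(feature)
minmaxed = minmax_scale(feature)

print(zscored)
print(minmaxed)
```

After z-score scaling the values have mean 0 and unit standard deviation, while min-max scaling confines them to the range 0 to 1. Either can shift predictions when embedding dimensions differ widely in scale, which is why the cells below rerun the same pipeline under each setting.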



In [9]:
# zscore

few_shot_classifier = medical.FewShotClassifierModel.pretrained("few_shot_classifier_patient_complaint_sbiobert_cased_mli", "en", "clinical/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("prediction")\
    .setFeatureScaling("zscore")

clf_Pipeline = nlp.Pipeline(stages=[
    document_assembler,
    bert_sent,
    few_shot_classifier
])

result = clf_Pipeline.fit(data).transform(data)

few_shot_classifier_patient_complaint_sbiobert_cased_mli download started this may take some time.
[OK!]


In [10]:
result.select('prediction.result','text').show(truncate = 150)

+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|        result|                                                                                                                                                  text|
+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|[No_Complaint]|The Medical Center is a large state of the art hospital facility with great doctors, nurses, technicians and receptionists.  Service is top notch, ...|
|   [Complaint]|My gf dad wasn’t feeling well so we decided to take him to this place cus it’s his insurance and we waited for a while and mind that my girl dad co...|
|   [Complaint]|                                The doctor seemed rushed during the appointment, and I didn't get a chance to discuss all my concerns. I feel un

In [11]:
# minmax

few_shot_classifier = medical.FewShotClassifierModel.pretrained("few_shot_classifier_patient_complaint_sbiobert_cased_mli", "en", "clinical/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("prediction")\
    .setFeatureScaling("minmax")

clf_Pipeline = nlp.Pipeline(stages=[
    document_assembler,
    bert_sent,
    few_shot_classifier
])

result = clf_Pipeline.fit(data).transform(data)

few_shot_classifier_patient_complaint_sbiobert_cased_mli download started this may take some time.
[OK!]


In [12]:
result.select('prediction.result','text').show(truncate = 150)

+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|        result|                                                                                                                                                  text|
+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|[No_Complaint]|The Medical Center is a large state of the art hospital facility with great doctors, nurses, technicians and receptionists.  Service is top notch, ...|
|   [Complaint]|My gf dad wasn’t feeling well so we decided to take him to this place cus it’s his insurance and we waited for a while and mind that my girl dad co...|
|   [Complaint]|                                The doctor seemed rushed during the appointment, and I didn't get a chance to discuss all my concerns. I feel un