![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/SOCIAL_DETERMINANT_CLASSIFICATION.ipynb)

# **Social Determinants of Health Classification**

📌 To run this notebook yourself, you need to upload your license keys. Run the cell below to do that. Alternatively, open the file explorer on the left side of the screen and upload your license file to the folder that opens, named `spark_jsl.json` so that the cell below finds it.
Otherwise, you can look at the example outputs at the bottom of the notebook.

# **Colab Setup**

In [None]:
import json, os
from google.colab import files

if 'spark_jsl.json' not in os.listdir():
  license_keys = files.upload()
  os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)
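The cells that follow read `SECRET`, `PUBLIC_VERSION`, and `JSL_VERSION` from the license file, so checking for missing keys first can save a failed install. The snippet below is a minimal sketch of such a check: the helper name `missing_license_keys` and the sample dictionary are hypothetical, and a real license file may contain additional keys.

```python
# Keys that the install and session-start cells of this notebook rely on.
REQUIRED_KEYS = ("SECRET", "PUBLIC_VERSION", "JSL_VERSION")

def missing_license_keys(keys, required=REQUIRED_KEYS):
    """Return the required key names that are absent or empty in `keys`."""
    return [name for name in required if not keys.get(name)]

# Placeholder values for illustration only; pass `license_keys` in the notebook.
example_keys = {"SECRET": "xxxx", "PUBLIC_VERSION": "5.1.4"}
print(missing_license_keys(example_keys))
```

In the notebook itself, calling `missing_license_keys(license_keys)` right after loading the JSON should return an empty list before you continue.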

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [3]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel
from pyspark.sql.types import StringType, IntegerType

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 5.1.4
Spark NLP_JSL Version : 5.1.4


# 🔎 MODELS

### Sequence Classifier :
| Model Name | Description | Predicted Entities |
|:---:|:---:|:---:|
| bert_sequence_classifier_sdoh_community_present_status | This model classifies whether clinical documents mention social support, such as a family member or friend. | True, False |
| bert_sequence_classifier_sdoh_community_absent_status | This model classifies whether clinical documents mention the loss of social support, such as a family member or friend. | True, False |
| bert_sequence_classifier_sdoh_frailty_vulnerability | This model classifies frailty and vulnerability status in clinical documents. | Frailty_Vulnerability, No_Or_Unknown |
| bert_sequence_classifier_sdoh_mental_health | This model classifies mental health status in clinical documents. | Mental_Disorder, No_Or_Not_Mentioned |
| bert_sequence_classifier_sdoh_violence_abuse | This model classifies violence and abuse status in clinical documents. | Domestic_Violence_Abuse, Personal_Violence_Abuse, No_Violence_Abuse, Unknown |


### Generic Classifier :
| Model Name | Description | Predicted Entities |
|:---:|:---:|:---:|
| genericclassifier_sdoh_alcohol_usage_sbiobert_cased_mli | This Generic Classifier model detects alcohol use in clinical notes and was trained using the GenericClassifierApproach annotator. | Present, Past, Never, None |
| genericclassifier_sdoh_tobacco_usage_sbiobert_cased_mli | This Generic Classifier model detects tobacco use in clinical notes and was trained using the GenericClassifierApproach annotator. | Present, Past, Never, None |
| genericclassifier_sdoh_substance_usage_binary_sbiobert_cased_mli | This Generic Classifier model detects substance use in clinical notes and was trained using the GenericClassifierApproach annotator. | Present, None |
| genericclassifier_sdoh_alcohol_usage_binary_sbiobert_cased_mli | This Generic Classifier model detects alcohol use in clinical notes and was trained using the GenericClassifierApproach annotator. | Present, Never, None |
| genericclassifier_sdoh_economics_binary_sbiobert_cased_mli | This model classifies socioeconomic status in clinical documents and was trained using the GenericClassifierApproach annotator. | True, False |
| genericclassifier_sdoh_transportation_insecurity_sbiobert_cased_mli | The transportation insecurity classifier employs BERT embeddings within a generic classifier framework. | Transportation_Insecurity, No_Transportation_Insecurity_Or_Unknown |
| genericclassifier_sdoh_food_insecurity_mpnet | The food insecurity classifier employs MPNET embeddings within a generic classifier framework. | No_Food_Insecurity_Or_Unknown, Food_Insecurity |




**🔎 You can find all these models and more in the [NLP Models Hub](https://nlp.johnsnowlabs.com/models?task=Named+Entity+Recognition&edition=Spark+NLP+for+Healthcare)**
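Because each model emits its own label set, it can help to keep the expected labels next to the model names when post-processing predictions. The following is a small sketch, not part of Spark NLP: the dictionary `SDOH_LABELS` and the helper `is_expected_label` are hypothetical names, and only a subset of the models listed above is included for brevity.

```python
# Label sets copied from the model tables above (subset only).
SDOH_LABELS = {
    "bert_sequence_classifier_sdoh_community_present_status": {"True", "False"},
    "bert_sequence_classifier_sdoh_mental_health": {
        "Mental_Disorder", "No_Or_Not_Mentioned",
    },
    "bert_sequence_classifier_sdoh_violence_abuse": {
        "Domestic_Violence_Abuse", "Personal_Violence_Abuse",
        "No_Violence_Abuse", "Unknown",
    },
    "genericclassifier_sdoh_tobacco_usage_sbiobert_cased_mli": {
        "Present", "Past", "Never", "None",
    },
}

def is_expected_label(model_name, label):
    """True if `label` is one of the labels documented for `model_name`."""
    return label in SDOH_LABELS.get(model_name, set())

print(is_expected_label("bert_sequence_classifier_sdoh_mental_health", "Mental_Disorder"))
```

A check like this is a cheap guard against pairing the output column of one model with the label vocabulary of another.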

# 📌 Sequence Classifier

### **`bert_sequence_classifier_sdoh_community_present_status`**

In [None]:
text_list = [
" Patient with history of lupus, lupus nephritis with ESRD on peritoneal dialysis on transplant list, hx of PE/Antiphopholipid antibody on coumadin, mitral regurg, presents with 4-6 month history of cough, worse in the morning, one week of trace blood, now producing bright red blood over last couple days. Patient states that the amount of blood she has been coughing has been increasing and is now almost hourly, aprroximately 1 teaspoon bright red blood. Patient states that the cough produced primarily yellow sputum until it turned to blood. Patient denies any other symptoms such as dizziness or lightheadedness.  Married with three children,Worked as an accountant until health declined in early 2002. No tobacco, ethanol or drug use. Centrilobular nodules and ground glass opacities throughout both lungs, with a basilar predominance, with associated mild bronchiectasis, compatible with chronic collagen vascular disease, progressed since 2002. There is no advanced fibrosis. Superimposed infection cannot be excluded by imaging alone. Ground glass opacities could also represent hemorrhage. 3. Chronic left lower segmental pulmonary arterial PE, unchanged since 2191. No new acute PE detected to the subsegmental levels.",
" This is an 87 year old man status post motor vehicle accident in Month (only) 956 who was recently discharged from Hospital1 18 status post left radical nephrectomy for renal oncocytoma, who returned to Hospital1 18 on 3-22 for outpatient followup CT scan of the head. Patient was found to have a left subdural hematoma and was transported to the emergency department for workup. Currently patient does not complain of fever, chills, nausea, vomiting, chest pain, shortness of breath. No known drug allergies.The patient is a retired priest. Denies history of tobacco or alcohol use. Patient currently lives at Hospital3 2558.",
]

In [None]:
documentAssembler = DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_community_present_status", "en", "clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("class_")


pipeline = Pipeline(stages=[
                        documentAssembler,
                        tokenizer,
                        sequenceClassifier])


df = spark.createDataFrame(text_list, StringType()).toDF("text")
results = pipeline.fit(df).transform(df)

bert_sequence_classifier_sdoh_community_present_status download started this may take some time.
[OK!]


In [None]:
res = results.select(F.explode(F.arrays_zip(results.document.result,
                                             results.class_.result,
                                             results.class_.metadata)).alias("col"))\
               .select(F.expr("col['1']").alias("prediction"),
                       F.expr("col['2']").alias("confidence"),
                       F.expr("col['0']").alias("sentence"))

if res.count()>1:
  udf_func = F.udf(lambda x,y:  x["Some("+str(y)+")"])
  res.withColumn('confidence', udf_func(res.confidence, res.prediction)).show(truncate=150)

+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|prediction|confidence|                                                                                                                                              sentence|
+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|      True| 0.9982032| Patient with history of lupus, lupus nephritis with ESRD on peritoneal dialysis on transplant list, hx of PE/Antiphopholipid antibody on coumadin,...|
|     False|0.78874034| This is an 87 year old man status post motor vehicle accident in Month (only) 956 who was recently discharged from Hospital1 18 status post left r...|
+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
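The UDF in the cell above reads the confidence from the annotation metadata by building the key `"Some(<label>)"` from the predicted label. The plain-Python sketch below illustrates that lookup outside Spark. The helper name `confidence_for` and the sample metadata dictionary are hypothetical; the `True` score is taken from the output above, while the `False` score is made up for illustration, and real metadata may contain further keys.

```python
def confidence_for(metadata, label):
    """Return the score stored under the 'Some(<label>)' key as a float."""
    return float(metadata["Some(" + str(label) + ")"])

# Hypothetical metadata map, shaped like the one the UDF indexes into.
example_metadata = {"Some(True)": "0.9982032", "Some(False)": "0.0017968"}

print(confidence_for(example_metadata, "True"))
```

The lookup raises a `KeyError` if the predicted label has no matching key, which makes a mismatch between label and metadata visible instead of silently returning nothing.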

### **`bert_sequence_classifier_sdoh_community_absent_status`**

In [None]:
text_list = [
"She has two adult sons. She is a widow. She was employed with housework. She quit smoking 20 to 30 years ago, but smoked two packs per day for 20 to 30 years. She drinks one glass of wine occasionally. She avoids salt in her diet. ",
"65 year old male presented with several days of vice like chest pain. He states that he felt like his chest was being crushed from back to the front. Lives with spouse and two sons moved to US 1 month ago.",
]

In [None]:
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_community_absent_status", "en", "clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("class_")


pipeline = Pipeline(stages=[
                      documentAssembler,
                      tokenizer,
                      sequenceClassifier])


df = spark.createDataFrame(text_list, StringType()).toDF("text")
results = pipeline.fit(df).transform(df)

bert_sequence_classifier_sdoh_community_absent_status download started this may take some time.
[OK!]


In [None]:
res = results.select(F.explode(F.arrays_zip(results.document.result,
                                             results.class_.result,
                                             results.class_.metadata)).alias("col"))\
               .select(F.expr("col['1']").alias("prediction"),
                       F.expr("col['2']").alias("confidence"),
                       F.expr("col['0']").alias("sentence"))

if res.count()>1:
  udf_func = F.udf(lambda x,y:  x["Some("+str(y)+")"])
  res.withColumn('confidence', udf_func(res.confidence, res.prediction)).show(truncate=150)

+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|prediction|confidence|                                                                                                                                              sentence|
+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|      True| 0.9894813|She has two adult sons. She is a widow. She was employed with housework. She quit smoking 20 to 30 years ago, but smoked two packs per day for 20 t...|
|     False|0.72528476|65 year old male presented with several days of vice like chest pain. He states that he felt like his chest was being crushed from back to the fron...|
+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+

### **`bert_sequence_classifier_sdoh_frailty_vulnerability`**

In [4]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_frailty_vulnerability", "en", "clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("prediction")

pipeline = Pipeline(
        stages=[
            document_assembler,
            tokenizer,
            sequenceClassifier
                ])

sample_texts = [
               ["Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy."],
               ["Post-chemotherapy, the patient was under regular surveillance for osteosarcoma. Recent imaging showed no signs of local recurrence or distant metastasis. Whereas the recovery was challenging, current evaluation confirms patient is in remission."],
               ["The patient was diagnosed with stage II colon cancer and will be undergoing a treatment regimen that includes both chemotherapy and radiation therapy."],
               ["Thyroid nodules detected during routine examination; fine-needle aspiration was conducted. Cytology results indicated no malignancy, consistent with a benign thyroid adenoma. However, patient is advised for a follow-up ultrasound in 12 months to monitor nodule size."],
               ["The patient's persistent lymphadenopathy led to further tests, which confirmed a diagnosis of AIDS."],
               ["Female patient presented with pelvic discomfort. Ovarian cysts were found during ultrasound; however, CA-125 levels are within normal range, and repeat imaging has shown consistent cyst size. No features of ovarian cancer were present, and a follow-up is scheduled in six months."]
               ]

sample_data = spark.createDataFrame(sample_texts).toDF("text")

result = pipeline.fit(sample_data).transform(sample_data)

result.select("text", "prediction.result").show(truncate=100)

bert_sequence_classifier_sdoh_frailty_vulnerability download started this may take some time.
[OK!]
+----------------------------------------------------------------------------------------------------+-----------------------+
|                                                                                                text|                 result|
+----------------------------------------------------------------------------------------------------+-----------------------+
|Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatm...|[Frailty_Vulnerability]|
|Post-chemotherapy, the patient was under regular surveillance for osteosarcoma. Recent imaging sh...|        [No_Or_Unknown]|
|The patient was diagnosed with stage II colon cancer and will be undergoing a treatment regimen t...|[Frailty_Vulnerability]|
|Thyroid nodules detected during routine examination; fine-needle aspiration was conducted. Cytolo...|        [No_Or_Unknown]|
| The patie

### **`bert_sequence_classifier_sdoh_mental_health`**

In [5]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_mental_health", "en", "clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("prediction")

pipeline = Pipeline(stages=[
                            document_assembler,
                            tokenizer,
                            sequenceClassifier
                            ])

sample_texts= [ "John, a 45-year-old man, was diagnosed with bipolar disorder, a mental disorder characterized by alternating periods of elevated mood (mania) and depression. His treatment plan involved a combination of mood stabilizing medication and regular therapy sessions. With proper management and support, John learned to better understand and cope with his condition, leading to improved stability and overall well-being.",
                "Lisa, a 28-year-old woman, was diagnosed with generalized anxiety disorder (GAD), a mental disorder characterized by excessive worry and persistent anxiety.",
                "Mark, a 35-year-old man, sought medical help for symptoms of attention-deficit/hyperactivity disorder (ADHD), a neurodevelopmental disorder characterized by inattention, hyperactivity, and impulsivity. After a comprehensive evaluation, Mark was diagnosed with ADHD, and his healthcare provider recommended a multimodal treatment approach. ",
                "Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy.",
                "She reported occasional respiratory symptoms, such as wheezing and shortness of breath, but had no signs of a mental disorder. Her healthcare provider assessed her lung function, reviewed her medication regimen, and provided personalized asthma education. ",
                "During the appointment, her healthcare provider assessed her joint function, reviewed her medication regimen, and discussed the importance of adherence. They also discussed the benefits of regular exercise, maintaining a healthy weight, and using assistive devices when needed to support Anna's joint health. ",
            ]

sample_data = spark.createDataFrame(sample_texts, StringType()).toDF("text")

result = pipeline.fit(sample_data).transform(sample_data)

result.select("text", "prediction.result").show(truncate=100)

bert_sequence_classifier_sdoh_mental_health download started this may take some time.
[OK!]
+----------------------------------------------------------------------------------------------------+---------------------+
|                                                                                                text|               result|
+----------------------------------------------------------------------------------------------------+---------------------+
|John, a 45-year-old man, was diagnosed with bipolar disorder, a mental disorder characterized by ...|    [Mental_Disorder]|
|Lisa, a 28-year-old woman, was diagnosed with generalized anxiety disorder (GAD), a mental disord...|    [Mental_Disorder]|
|Mark, a 35-year-old man, sought medical help for symptoms of attention-deficit/hyperactivity diso...|    [Mental_Disorder]|
|Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatm...|[No_Or_Not_Mentioned]|
|She reported occasional respirat

### **`bert_sequence_classifier_sdoh_violence_abuse`**

In [6]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_sdoh_violence_abuse", "en", "clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("prediction")

pipeline = Pipeline(
        stages=[
            document_assembler,
            tokenizer,
            sequenceClassifier
            ])

sample_texts = [
                ["Repeated visits for fractures, with vague explanations suggesting potential family-related trauma."],
                ["Patient presents with multiple bruises in various stages of healing, suggestive of repeated physical abuse."],
                ["There are no reported instances or documented episodes indicating the patient poses a risk of violence."] ,
                ["Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy."]
                ]

sample_data = spark.createDataFrame(sample_texts).toDF("text")

result = pipeline.fit(sample_data).transform(sample_data)

result.select("text", "prediction.result").show(truncate=100)

bert_sequence_classifier_sdoh_violence_abuse download started this may take some time.
[OK!]
+----------------------------------------------------------------------------------------------------+-------------------------+
|                                                                                                text|                   result|
+----------------------------------------------------------------------------------------------------+-------------------------+
|  Repeated visits for fractures, with vague explanations suggesting potential family-related trauma.|[Domestic_Violence_Abuse]|
|Patient presents with multiple bruises in various stages of healing, suggestive of repeated physi...|[Personal_Violence_Abuse]|
|There are no reported instances or documented episodes indicating the patient poses a risk of vio...|      [No_Violence_Abuse]|
|Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatm...|                [Unknown]|
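+----------------------------------------------------------------------------------------------------+-------------------------+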

# 📌 Generic Classifier

### **`genericclassifier_sdoh_tobacco_usage_sbiobert_cased_mli`**

In [None]:
text_list = ["Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 25 yo daughter. He uses alcohol and cigarettes",
             "The patient quit smoking approximately two years ago with an approximately a 40 pack year history, mostly cigar use. He also reports 'heavy alcohol use', quit 15 months ago.",
             "The patient denies any history of smoking or alcohol abuse. She lives with her one daughter.",
             "She was previously employed as a hairdresser, though says she hasnt worked in 4 years. Not reported by patient, but there is apparently a history of alochol abuse."
             ]

df = spark.createDataFrame(text_list, StringType()).toDF("text")

In [None]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", 'en','clinical/models')\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

features_asm = FeaturesAssembler()\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("features")

generic_classifier = GenericClassifierModel.pretrained("genericclassifier_sdoh_tobacco_usage_sbiobert_cased_mli", 'en', 'clinical/models')\
    .setInputCols(["features"])\
    .setOutputCol("class_")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_embeddings,
    features_asm,
    generic_classifier
])

results = pipeline.fit(df).transform(df)

sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
genericclassifier_sdoh_tobacco_usage_sbiobert_cased_mli download started this may take some time.
[OK!]


In [None]:
res = results.select(F.explode(F.arrays_zip(results.document.result,
                                             results.class_.result,
                                             results.class_.metadata)).alias("col"))\
               .select(F.expr("col['1']").alias("prediction"),
                       F.expr("col['2']['confidence']").alias("confidence"),
                       F.expr("col['0']").alias("sentence"))

res.show(truncate=150)

+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|prediction|confidence|                                                                                                                                              sentence|
+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|   Present|0.65745443|        Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 25 yo daughter. He uses alcohol and cigarettes|
|      Past|0.98161787|The patient quit smoking approximately two years ago with an approximately a 40 pack year history, mostly cigar use. He also reports 'heavy alcohol...|
|     Never| 0.9825732|                                                          The patient denies any history of smoking or
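The confidence column can be used to decide which predictions deserve a manual look. As a rough sketch in plain Python, using the three (prediction, confidence) pairs visible in the output above: the threshold value and the helper name `needs_review` are assumptions chosen for illustration, not recommendations from the model authors.

```python
# (prediction, confidence) pairs copied from the output table above.
rows = [("Present", 0.65745443), ("Past", 0.98161787), ("Never", 0.9825732)]

REVIEW_THRESHOLD = 0.70  # hypothetical cut-off; tune it for your own data

def needs_review(predictions, threshold=REVIEW_THRESHOLD):
    """Return the labels whose confidence falls below `threshold`."""
    return [label for label, confidence in predictions if confidence < threshold]

print(needs_review(rows))
```

On a Spark DataFrame the same idea could be expressed with a filter on the `confidence` column after casting it to a numeric type.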

### **`genericclassifier_sdoh_alcohol_usage_sbiobert_cased_mli`**

In [None]:
text_list = ["Retired schoolteacher, now substitutes. Lives with wife in location 1439. Has a 27 yo son and a 25 yo daughter. He uses alcohol and cigarettes",
             "The patient quit smoking approximately two years ago with an approximately a 40 pack year history, mostly cigar use. He also reports 'heavy alcohol use', quit 15 months ago.",
             "Employee in neuro departmentin at the Center Hospital 18. Widower since 2001. Current smoker since 20 years. No EtOH or illicits.",
             "Patient smoked 4 ppd x 37 years, quitting 22 years ago. He is widowed, lives alone, has three children."
             ]

df = spark.createDataFrame(text_list, StringType()).toDF("text")

In [None]:
generic_classifier = GenericClassifierModel.pretrained("genericclassifier_sdoh_alcohol_usage_sbiobert_cased_mli", 'en', 'clinical/models')\
    .setInputCols(["features"])\
    .setOutputCol("class_")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_embeddings,
    features_asm,
    generic_classifier
])

results = pipeline.fit(df).transform(df)


In [None]:
res = results.select(F.explode(F.arrays_zip(results.document.result,
                                             results.class_.result,
                                             results.class_.metadata)).alias("col"))\
               .select(F.expr("col['1']").alias("prediction"),
                       F.expr("col['2']['confidence']").alias("confidence"),
                       F.expr("col['0']").alias("sentence"))

res.show(truncate=150)


### **`genericclassifier_sdoh_transportation_insecurity_sbiobert_cased_mli`**

In [None]:
text_list =[
"Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy. She is alone and can not drive a car or can not use public bus.",
"Emily, a 30-year-old woman, had been struggling to manage her chronic condition due to transportation problems. She often missed medical appointments and couldn't obtain her prescribed medications regularly. Recognizing the impact of transportation barriers on her health, her healthcare provider worked to find solutions. They helped her arrange rides through community programs and provided her with information about telehealth options. By addressing her transportation challenges, Emily's healthcare provider ensured that she could better manage her health and receive the necessary care.",
"""Patient John is a 60-year-old man who presents to a primary care clinic for a routine check-up. He reports feeling generally healthy, with no significant medical concerns. However, he reveals that he is a smoker and drinks alcohol on a regular basis. The patient also mentions that he has a history of working long hours and has limited time for physical activity and social interactions.



Based on this information, it appears that Patient John's overall health may be affected by several social determinants of health, including tobacco and alcohol use, lack of physical activity, and social isolation. To address these issues, the healthcare provider may recommend a comprehensive physical exam and develop a treatment plan that includes lifestyle modifications, such as smoking cessation and reduction of alcohol intake. Additionally, the patient may benefit from referrals to local organizations that provide resources for physical activity and social engagement. The healthcare provider may also recommend strategies to reduce work-related stress and promote work-life balance. By addressing these social determinants of health, healthcare providers can help promote Patient John's overall health and prevent future health problems.

Additionally, The distance to the hospital from the John's home is considerable, and there are no available train services.""",
"Ms. Klnum is a 69y/o lady with COPD, ulcerative colitis, diverticulosis s/p partial colectomy, and 2 PE's in the past on chronic Warfarin with IVC filter who presented to the ED for chest pain and was found to have PEs and pneumonia. Cough productive of yellow sputum and has been ongoing for weeks, no acute change.  She is divorced. Her 2 daughters died of drug use. She is close with her sister who is her HCP, as well as a brother. -Tobacco: Smoked ~2ppd from age 13 until 5 years ago. -EtOH: former heavy use, reports drinking two 6 packs per day for 2 yrs; quit 27 yrs ago. -Illicits: None. Family History: Daughter - colitis. Had 6 siblings. One sister died, 35, ovarian CA. Brother, died at 48, stroke.",
"Mark, a 35-year-old man, sought medical help for symptoms of attention-deficit/hyperactivity disorder (ADHD), a neurodevelopmental disorder characterized by inattention, hyperactivity, and impulsivity. After a comprehensive evaluation, Mark was diagnosed with ADHD, and his healthcare provider recommended a multimodal treatment approach. ",
"Michael, a 25-year-old man, sought medical advice for concerns related to stress and difficulty managing everyday pressures. After a thorough evaluation, it was determined that Michael did not meet the criteria for any specific mental disorder. However, his healthcare provider acknowledged the impact of stress on his well-being and recommended implementing stress management techniques. They discussed strategies such as regular exercise, relaxation techniques, and engaging in activities that bring him joy and relaxation. "
"John, a 45-year-old man, was diagnosed with bipolar disorder, a mental disorder characterized by alternating periods of elevated mood (mania) and depression. His treatment plan involved a combination of mood stabilizing medication and regular therapy sessions. With proper management and support, John learned to better understand and cope with his condition, leading to improved stability and overall well-being.",
]

df = spark.createDataFrame(text_list, StringType()).toDF("text")
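
Long lists of string literals are easy to break in two ways: a missing comma between literals silently merges neighbouring notes into one string, and a trailing comma after the closing bracket turns the list into a one-element tuple. The sketch below shows both effects in plain Python and a small sanity check that could run before `spark.createDataFrame`. The helper name `check_text_list` and the sample strings are illustrative assumptions, not part of the notebook or of Spark NLP.

```python
def check_text_list(texts):
    """Return texts unchanged if it is a list of str (hypothetical helper)."""
    if not isinstance(texts, list):
        raise TypeError(f"expected a list of strings, got {type(texts).__name__}")
    for i, item in enumerate(texts):
        if not isinstance(item, str):
            raise TypeError(f"item {i} is {type(item).__name__}, not str")
    return texts

# A trailing comma after the closing bracket wraps the list in a tuple.
wrapped = ["note one", "note two"],

# A missing comma between two literals concatenates them into one string.
merged = ["note one" "note two"]

print(type(wrapped).__name__, len(merged))
```

The check catches the tuple case by raising `TypeError`. A merged pair of notes is still a valid list of strings, so comparing `len(text_list)` with the expected number of notes is the only way to spot it.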

In [None]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", 'en','clinical/models')\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

features_asm = FeaturesAssembler()\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("features")

generic_classifier = GenericClassifierModel.pretrained("genericclassifier_sdoh_transportation_insecurity_sbiobert_cased_mli", 'en', 'clinical/models')\
    .setInputCols(["features"])\
    .setOutputCol("class_")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_embeddings,
    features_asm,
    generic_classifier
])

In [None]:
results = pipeline.fit(df).transform(df)

In [None]:
# Zip the parallel annotation arrays by position, then emit one row per zipped element
res = results.select(F.explode(F.arrays_zip(results.document.result,
                                             results.class_.result,
                                             results.class_.metadata)).alias("col"))\
               .select(F.expr("col['1']").alias("prediction"),
                       F.expr("col['2']['confidence']").alias("confidence"),
                       F.expr("col['0']").alias("sentence"))

res.show(truncate=150)

+-------------------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|               prediction|confidence|                                                                                                                                              sentence|
+-------------------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
|Transportation_Insecurity| 0.9798329|[Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and...|
+-------------------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------+
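
The `F.arrays_zip` plus `F.explode` combination used above can be read as "pair the annotation arrays by position, then emit one row per pair". The plain-Python sketch below mimics that behaviour for a single input row. The sample sentence, the confidence value and the helper name `zip_and_explode` are assumptions made up for illustration; they are not model output.

```python
from itertools import zip_longest

# Hypothetical annotation arrays for one input row, standing in for
# document.result, class_.result and class_.metadata.
document_result = ["The patient has no car and lives far from the clinic."]
class_result = ["Transportation_Insecurity"]
class_metadata = [{"confidence": "0.97"}]

def zip_and_explode(*arrays):
    """Pair elements by position and return one tuple per position."""
    # zip_longest pads shorter arrays with None, similar to how arrays_zip
    # pads shorter arrays with null.
    return list(zip_longest(*arrays))

rows = [
    {
        "prediction": pred,
        "confidence": (meta or {}).get("confidence"),
        "sentence": doc,
    }
    for doc, pred, meta in zip_and_explode(document_result, class_result, class_metadata)
]
print(rows)
```

Because the classifier runs on the whole document here, each array holds a single element and every input text yields exactly one output row.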



### **`genericclassifier_sdoh_food_insecurity_mpnet`**

In [None]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sent_embd = MPNetEmbeddings.pretrained("mpnet_embedding_all_mpnet_base_v2_by_sentence_transformers", 'en')\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

features_asm = FeaturesAssembler()\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("features")

gen_clf = GenericClassifierModel.pretrained('genericclassifier_sdoh_food_insecurity_mpnet', 'en', 'clinical/models')\
    .setInputCols(["features"])\
    .setOutputCol("prediction")

pipeline = Pipeline(stages=[document_assembler,
                            sent_embd,
                            features_asm,
                            gen_clf])



mpnet_embedding_all_mpnet_base_v2_by_sentence_transformers download started this may take some time.
Approximate size to download 390.6 MB
[OK!]
genericclassifier_sdoh_food_insecurity_mpnet download started this may take some time.
[OK!]
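
Each stage in this pipeline reads the column written by the stage before it: `text` → `document` → `sentence_embeddings` → `features` → `prediction`. A mismatch in these names normally surfaces only when the pipeline is fitted. The sketch below checks the wiring in plain Python first. The stage names and columns are copied from the cell above, while `find_wiring_errors` is a hypothetical helper and not part of Spark NLP.

```python
# Each tuple is (stage name, input columns, output column).
stages = [
    ("document_assembler", ["text"], "document"),
    ("sent_embd", ["document"], "sentence_embeddings"),
    ("features_asm", ["sentence_embeddings"], "features"),
    ("gen_clf", ["features"], "prediction"),
]

def find_wiring_errors(stages, initial_columns):
    """Return a message for every input column that no earlier stage produces."""
    available = set(initial_columns)
    errors = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                errors.append(f"{name}: missing input column '{col}'")
        available.add(output)
    return errors

wiring_errors = find_wiring_errors(stages, ["text"])
print(wiring_errors)  # prints []
```

An empty list means every stage finds its inputs. Dropping the first stage, for example, would report that `sent_embd` is missing `document`.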


In [None]:
text_list = ["Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy. ",
             "She reported occasional respiratory symptoms, such as wheezing and shortness of breath, but had no signs of a mental disorder. Her healthcare provider assessed her lung function, reviewed her medication regimen, and provided personalized asthma education. ",
             "she has food stability",
             "he doesn't experience Economic uncertainty",
             "The individual is managing their mental health challenges.",
             "the patient has  food problems.",
             "she has food instability",

             "The patient a 35-year-old woman, visited her healthcare provider with concerns about her health. She bravely shared that she was facing food difficultie, which was affecting her ability to afford necessary medical care and prescriptions. The caring healthcare provider listened attentively and discussed various options. They helped Sarah explore low-cost alternatives for her medications and connected her with local resources that could assist with healthcare expenses. By addressing the food aspect, Sarah's healthcare provider ensured that she could receive the care she needed without further straining her finances. Leaving the appointment, Sarah felt relieved and grateful for the support in managing her health amidst her food challenges.",
             """Case Study: Comprehensive Health Assessment

Patient Information:
Age: 40 years
Gender: Female
Occupation: Part-time administrative assistant
Marital Status: Married, with two school-aged children

Presenting Complaint:
The patient presented to the primary care clinic with concerns about her overall health and well-being. She reported feeling overwhelmed and stressed due to food difficulties.

Medical History:
The patient has a history of well-managed asthma, which is controlled with an inhaler as needed. She reported occasional headaches, likely related to stress and tension.

Social Determinants of Health (SDOH):
One significant social determinant affecting the patient is food instability. The patient and her husband have experienced a recent reduction in household income due to her husband's job loss. This has resulted in difficulties in meeting basic needs, including housing, utilities, and groceries. The patient expressed concerns about her ability to afford necessary medical care and medications for both herself and her children.

Family Support:
The patient's husband is actively seeking employment, but the food strain has created additional stress for the family. They have limited support from extended family members, who are also facing their own food challenges.

Mental Health:
The patient reported feeling anxious and experiencing occasional bouts of sadness related to the food stressors. She expressed a desire to explore coping strategies and potentially seek counseling to help manage her emotional well-being.

Physical Examination Findings:
On physical examination, the patient appeared well-nourished but displayed signs of mild fatigue. Vital signs were within normal limits, and cardiovascular, respiratory, gastrointestinal, and musculoskeletal examinations revealed no abnormalities. Neurological examination findings were normal, with intact cranial nerves and normal motor and sensory functions.

Diagnostic Impression:
The patient's medical history and physical examination findings did not reveal any acute or chronic medical conditions. However, it was evident that her overall health was being significantly impacted by the current food difficulties.

Treatment Recommendations:
1. food Assistance: Provide information on local resources and assistance programs available to help individuals and families facing food hardships. This may include information on food assistance programs, housing support, and access to discounted healthcare services.

2. Coping Strategies and Counseling: Discuss and recommend strategies for coping with stress and anxiety related to food difficulties. Provide information on local counseling services or support groups that can help the patient manage her emotional well-being.

3. Asthma Management: Review the patient's asthma action plan and ensure she has an adequate supply of inhalers. Discuss any concerns or questions she may have regarding her asthma management.

4. Follow-Up: Schedule regular follow-up appointments to monitor the patient's progress, provide ongoing support, and address any emerging challenges related to her health and food circumstances."""]



In [None]:
df = spark.createDataFrame(text_list, StringType()).toDF("text")

result = pipeline.fit(df).transform(df)

result.select("text", "prediction.result").show(truncate=100)

+----------------------------------------------------------------------------------------------------+-------------------------------+
|                                                                                                text|                         result|
+----------------------------------------------------------------------------------------------------+-------------------------------+
|Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatm...|[No_Food_Insecurity_Or_Unknown]|
|She reported occasional respiratory symptoms, such as wheezing and shortness of breath, but had n...|[No_Food_Insecurity_Or_Unknown]|
|                                                                              she has food stability|[No_Food_Insecurity_Or_Unknown]|
|                                                          he doesn't experience Economic uncertainty|[No_Food_Insecurity_Or_Unknown]|
|                                          The individu