![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/SOCIAL_DETERMINANT_NER.ipynb)

# **Social Determinants of Health-NER**




📌 To run this notebook yourself, you need to upload your license keys. Run the cells below to do that. Alternatively, open the file explorer on the left side of the screen and upload `license_keys.json` to the folder that opens.
Otherwise, you can look at the example outputs at the bottom of the notebook.

# **Colab Setup**

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please upload your John Snow Labs license using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, medical, visual

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
nlp.install()

In [None]:
from johnsnowlabs import nlp, medical, visual
import pandas as pd

# Automatically load the license data and start a session with all JARs the user has access to
spark = nlp.start()

In [None]:
from pyspark.sql import DataFrame
import pyspark.sql.functions as F
import pyspark.sql.types as T
import pyspark.sql as SQL
from pyspark import keyword_only
from pyspark.sql.types import StringType

# 🔎 MODELS

|Model Name|Description|Predicted Entities|
|-|-|-|
|ner_sdoh|Extracts terminology related to Social Determinants of Health|Other_SDoH_Keywords, Education, Population_Group, Quality_Of_Life, Housing, Substance_Frequency, Smoking, Eating_Disorder, Obesity, Healthcare_Institution, Financial_Status, Age, Chidhood_Event, Exercise, Communicable_Disease, Hypertension, Other_Disease, Violence_Or_Abuse, Spiritual_Beliefs, Employment, Social_Exclusion, Access_To_Care, Marital_Status, Diet, Social_Support, Disability, Mental_Health, Alcohol, Insurance_Status, Substance_Quantity, Hyperlipidemia, Family_Member, Legal_Issues, Race_Ethnicity, Gender, Geographic_Entity, Sexual_Orientation, Transportation, Sexual_Activity, Language, Substance_Use|
|ner_sdoh_mentions|Detects mentions of Social Determinants of Health|sdoh_community, sdoh_economics, sdoh_education, sdoh_environment, behavior_tobacco, behavior_alcohol, behavior_drug|
|ner_sdoh_slim_wip|Extracts terminology related to Social Determinants of Health|Housing, Smoking, Substance_Frequency, Childhood_Development, Age, Other_Disease, Employment, Marital_Status, Diet, Disability, Mental_Health, Alcohol, Substance_Quantity, Family_Member, Race_Ethnicity, Gender, Geographic_Entity, Sexual_Orientation, Substance_Use|
|ner_sdoh_income_social_status|Extracts entities associated with income and social status|Education, Employment, Financial_Status, Income, Marital_Status, Population_Group|
|ner_sdoh_demographics|Extracts entities associated with different demographic factors|Age, Family_Member, Gender, Geographic_Entity, Language, Race_Ethnicity, Spiritual_Beliefs|
|ner_sdoh_social_environment|Extracts entities associated with different aspects of the social environment|Chidhood_Event, Legal_Issues, Social_Exclusion, Social_Support, Violence_Or_Abuse|
|ner_sdoh_access_to_healthcare|Extracts entities related to access to healthcare|Access_To_Care, Healthcare_Institution, Insurance_Status|
|ner_sdoh_health_behaviours_problems|Extracts entities associated with health behaviors and problems|Communicable_Disease, Diet, Disability, Eating_Disorder, Exercise, Hyperlipidemia, Hypertension, Mental_Health, Obesity, Other_Disease, Quality_Of_Life, Sexual_Activity|
|ner_sdoh_substance_usage|Extracts entities associated with substance usage|Alcohol, Smoking, Substance_Duration, Substance_Frequency, Substance_Quantity, Substance_Use|
|ner_sdoh_community_condition|Identifies and extracts entities associated with different community conditions|Community_Safety, Environmental_Condition, Food_Insecurity, Housing, Transportation|


<br>




**🔎 You can find all these models and more on the [NLP Models Hub](https://nlp.johnsnowlabs.com/models?task=Named+Entity+Recognition&edition=Spark+NLP+for+Healthcare)**
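When only a few entity types matter, one of the topic-specific models in the table above may be enough. The sketch below copies the label sets of the seven topic-specific models from that table into a plain dictionary and looks up which model predicts a given label. `SDOH_MODEL_LABELS` and `models_for_label` are hypothetical names introduced for illustration; they are not part of the johnsnowlabs library.

```python
# Label sets copied from the models table for the seven topic-specific models.
# "Chidhood_Event" is spelled the way the model emits it.
SDOH_MODEL_LABELS = {
    "ner_sdoh_income_social_status": {
        "Education", "Employment", "Financial_Status", "Income",
        "Marital_Status", "Population_Group",
    },
    "ner_sdoh_demographics": {
        "Age", "Family_Member", "Gender", "Geographic_Entity",
        "Language", "Race_Ethnicity", "Spiritual_Beliefs",
    },
    "ner_sdoh_social_environment": {
        "Chidhood_Event", "Legal_Issues", "Social_Exclusion",
        "Social_Support", "Violence_Or_Abuse",
    },
    "ner_sdoh_access_to_healthcare": {
        "Access_To_Care", "Healthcare_Institution", "Insurance_Status",
    },
    "ner_sdoh_health_behaviours_problems": {
        "Communicable_Disease", "Diet", "Disability", "Eating_Disorder",
        "Exercise", "Hyperlipidemia", "Hypertension", "Mental_Health",
        "Obesity", "Other_Disease", "Quality_Of_Life", "Sexual_Activity",
    },
    "ner_sdoh_substance_usage": {
        "Alcohol", "Smoking", "Substance_Duration", "Substance_Frequency",
        "Substance_Quantity", "Substance_Use",
    },
    "ner_sdoh_community_condition": {
        "Community_Safety", "Environmental_Condition", "Food_Insecurity",
        "Housing", "Transportation",
    },
}


def models_for_label(label):
    """Return the topic-specific models whose label set contains `label`."""
    return sorted(
        name for name, labels in SDOH_MODEL_LABELS.items() if label in labels
    )


print(models_for_label("Housing"))
print(models_for_label("Insurance_Status"))
```

The returned model name can then be passed to `medical.NerModel.pretrained(...)` in the same way as in the sections that follow.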

## **`ner_sdoh`**

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

clinical_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = medical.NerModel.pretrained("ner_sdoh", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = medical.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])



sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_sdoh download started this may take some time.
[OK!]


In [None]:
sample_texts = ["Smith is a 55 years old, divorced Mexcian American woman with financial problems. She speaks spanish. She lives in an apartment. She has been struggling with diabetes for the past 10 years and has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and does not have access to health insurance or paid sick leave. She has a son student at college. Pt with likely long-standing depression. She is aware she needs rehab. Pt reprots having her catholic faith as a means of support as well.  She has long history of etoh abuse, beginning in her teens. She reports she has been drinker for 30 years, most recently drinking beer daily. She smokes a pack of cigarettes a day. She had DUI back in April and was due to be in court this week.",
                "The patient is a 42-year-old female who presented to the healthcare institution with complaints of hypertension and hyperlipidemia. The patient reported experiencing childhood trauma related to domestic violence and has struggled with mental health issues as a result. The patient also disclosed a history of substance use, smoking, and alcohol consumption, which have contributed to her health issues. The patient is currently unemployed and facing financial difficulties, which have impacted her access to care and quality of life. Additionally, the  patient expressed experiencing social exclusion due to her sexual orientation and has limited social support. The patient has an unstable housing situation, and she expressed concerns about being able to maintain adequate housing in the future. The patient's healthcare provider recommended an exercise regimen and dietary changes to help manage her health issues, but the patient expressed difficulty in accessing healthy food options due to limited transportation and financial resources.",
                "The patient reported a history of substance use, specifically alcohol and marijuana, which began during their college years. They also disclosed a history of childhood trauma related to emotional abuse by a family member. The patient is currently experiencing financial difficulties and is unemployed, which has caused significant pressure and impacted their access to healthcare services. Additionally, the patient has been diagnosed with hypertension and hyperlipidemia, and struggles with maintaining a healthy diet due to limited access to healthy food options and a lack of social support. The patient has no current legal issues and identifies as bisexual. They report limited transportation options and reside in a geographic area with limited access to healthcare institutions. The patient speaks Spanish and experiences language barriers, making it challenging to communicate with healthcare providers.",
                "During a routine check-up, a patient disclosed that they had experienced childhood trauma, including  physical abuse and emotional abuse by a family member. They also reported having financial difficulties and limited access to healthcare due to their low income status. Additionally, the patient disclosed that they were a member of a minority population group and had faced discrimination and social exclusion as a result. They expressed concerns about their mental health, specifically feeling depressed and anxious, and reported using alcohol as a coping mechanism. The patient expressed interest in seeking support and resources to improve their mental health and overall well-being",
                "The patient reported experiencing symptoms of anxiety and depression, which have been affecting their quality of life. The patient disclosed that they had recently lost their teacher job and were facing financial difficulties.The patient reported a history of childhood trauma related to violence and abuse in their household, which has contributed to their current mental health struggles. The patient's family history is significant for a first-degree relative with a history of alcohol abuse. The patient also reported a history of smoking, but had recently quit and was interested in receiving resources for smoking cessation. The patient's medical history is notable for hypertension, which is currently well-controlled with medication. The patient denied any recent substance use or sexual activity, and reported being monogamous in their relationship with their fionce. The patient is an immigrant and speaks English as a second language. They reported difficulty accessing healthcare due to lack of transportation and insurance status."]


In [None]:
df = spark.createDataFrame(sample_texts, StringType()).toDF("text")

In [None]:
result = ner_pipeline.fit(df).transform(df)

In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+------------------+-------------------+
|chunk             |ner_label          |
+------------------+-------------------+
|55 years old      |Age                |
|divorced          |Marital_Status     |
|Mexcian American  |Race_Ethnicity     |
|woman             |Gender             |
|financial problems|Financial_Status   |
|She               |Gender             |
|spanish           |Language           |
|She               |Gender             |
|apartment         |Housing            |
|She               |Gender             |
|diabetes          |Other_Disease      |
|hospitalizations  |Other_SDoH_Keywords|
|cleaning assistant|Employment         |
|health insurance  |Insurance_Status   |
|She               |Gender             |
|son               |Family_Member      |
|student           |Education          |
|college           |Education          |
|depression        |Mental_Health      |
|She               |Gender             |
|she               |Gender             |
|rehab          
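The query above pairs each chunk text with its metadata using `F.arrays_zip` and then emits one row per pair with `F.explode`. The same reshaping can be sketched in plain Python to make the two steps explicit. The `rows` structure below is a simplified stand-in for the pipeline output (an assumption for illustration): each chunk annotation is reduced to the two fields the query reads, and the sample values are borrowed from the output table.

```python
# Simplified stand-in for pipeline output rows: one dict per input document,
# each holding a list of chunk annotations with `result` and `metadata`.
rows = [
    {
        "ner_chunk": [
            {"result": "55 years old", "metadata": {"entity": "Age"}},
            {"result": "divorced", "metadata": {"entity": "Marital_Status"}},
        ]
    },
    {"ner_chunk": [{"result": "apartment", "metadata": {"entity": "Housing"}}]},
]


def flatten_chunks(rows):
    """Yield one (chunk, ner_label) pair per chunk annotation."""
    for row in rows:
        results = [a["result"] for a in row["ner_chunk"]]     # like ner_chunk.result
        metadata = [a["metadata"] for a in row["ner_chunk"]]  # like ner_chunk.metadata
        # zip pairs the two arrays element-wise (the arrays_zip step); the loop
        # emits one output item per pair (the explode step).
        for chunk, meta in zip(results, metadata):
            yield chunk, meta["entity"]


pairs = list(flatten_chunks(rows))
print(pairs)
```

Each tuple corresponds to one row of the `chunk` / `ner_label` table printed by `show()`.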

In [None]:
visualizer = nlp.viz.NerVisualizer()

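# Collect once up front; calling result.collect() inside the loop triggers a new Spark job on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document'
    )
    print("\n"*2)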

## **`ner_sdoh_mentions`**

In [None]:
ner_model = medical.NerModel.pretrained("ner_sdoh_mentions", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])



ner_sdoh_mentions download started this may take some time.
[OK!]


In [None]:
text_list = [
    """The patient is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years.""",
    """Cooperative gentleman with a long standing history (20 years) diverticulitis. He has been having flares of diverticulitis. The pain is nonradiating, has no provoking factors but is alleviated with narcotics. Social History: He is history teacher. He is divorced and lives at home with his girlfriend. He does not currently and never has used tobacco or illicit drugs. Until 3 weeks ago, he was having 1-19 drinks per day. Currently he uses no alcohol at all. Family History: noncontributory, no history of colon cancers or IBD."""]

In [None]:
df = spark.createDataFrame(text_list, StringType()).toDF("text")

In [None]:
result = ner_pipeline.fit(df).transform(df)

In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+----------------+----------------+
|chunk           |ner_label       |
+----------------+----------------+
|married         |sdoh_community  |
|children        |sdoh_community  |
|works           |sdoh_economics  |
|alcohol         |behavior_alcohol|
|intravenous drug|behavior_drug   |
|smoking         |behavior_tobacco|
|narcotics       |behavior_drug   |
|teacher         |sdoh_economics  |
|divorced        |sdoh_community  |
|home            |sdoh_environment|
|girlfriend      |sdoh_community  |
|tobacco         |behavior_tobacco|
|illicit drugs   |behavior_drug   |
|drinks          |behavior_alcohol|
|alcohol         |behavior_alcohol|
+----------------+----------------+
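A quick way to summarize such a result is to count how often each label occurs. The sketch below does this with `collections.Counter` over the (chunk, label) pairs copied by hand from the output table above; in a live session the pairs would come from the DataFrame instead.

```python
from collections import Counter

# (chunk, ner_label) pairs copied from the ner_sdoh_mentions output table.
pairs = [
    ("married", "sdoh_community"),
    ("children", "sdoh_community"),
    ("works", "sdoh_economics"),
    ("alcohol", "behavior_alcohol"),
    ("intravenous drug", "behavior_drug"),
    ("smoking", "behavior_tobacco"),
    ("narcotics", "behavior_drug"),
    ("teacher", "sdoh_economics"),
    ("divorced", "sdoh_community"),
    ("home", "sdoh_environment"),
    ("girlfriend", "sdoh_community"),
    ("tobacco", "behavior_tobacco"),
    ("illicit drugs", "behavior_drug"),
    ("drinks", "behavior_alcohol"),
    ("alcohol", "behavior_alcohol"),
]

# Count occurrences of each label, ignoring the chunk text.
label_counts = Counter(label for _, label in pairs)

for label, count in label_counts.most_common():
    print(f"{label}: {count}")
```

Note that `sdoh_education` is among the model's labels but does not occur in these two sample texts, so it is absent from the counts.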



In [None]:
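# Collect once up front; calling result.collect() inside the loop triggers a new Spark job on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document')
    print("\n"*2)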

## **`ner_sdoh_slim_wip`**

In [None]:
text_list = [
        """ Mother states that he does smoke, there is a family hx of alcohol on both maternal and paternal sides of the family, maternal grandfather who died of alcohol related complications and paternal grandmother with alcoholism. Pts own drinking began at age 16, living in LA, had a DUI at 17yo after totaling a new car that his mother bought for him, he was married. """,
        """Husband presented as anxious , while friend took notes about pt s condition and names of providers, etc.  Husb reports both he and pt had been drinking on Saturday night, and he left her sitting up in a chair.  In the morning he found her bleeding from the mouth, and it became apparent she had overdosed, and left a suicide note.  Husband and friend report pt has hx of suicide attempts, most recently in of this year.  She also has hx of EtOH abuse, has been to detox and treatment programs several times over recent years, and lived in residential sober houses, until recently.  Husband reports pt was a pedestrian struck by motor vehicle at 12yo , sustained head injury. He reports pt had been diagnosed bipolar disorder. """
]

In [None]:
df = spark.createDataFrame(text_list, StringType()).toDF("text")

In [None]:
ner_model = medical.NerModel.pretrained("ner_sdoh_slim_wip", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])


ner_sdoh_slim_wip download started this may take some time.
[OK!]
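Every section of this notebook builds the same six-stage pipeline and changes only the name of the pretrained NER model. That pattern can be wrapped in a small helper, sketched below. `build_sdoh_pipeline` and `NER_INPUT_COLS` are hypothetical names introduced here; the annotator calls are the ones already used in the cells above, and running the function requires a licensed session.

```python
from typing import List

# Column wiring shared by every NER model in this notebook.
NER_INPUT_COLS: List[str] = ["sentence", "token", "embeddings"]


def build_sdoh_pipeline(ner_model_name: str):
    """Return the notebook's six-stage pipeline built around one pretrained NER model.

    Hypothetical helper for illustration; not part of the johnsnowlabs library.
    """
    # Imported inside the function so that defining the helper does not
    # require the library to be installed yet.
    from johnsnowlabs import nlp, medical

    document_assembler = nlp.DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en") \
        .setInputCols(["document"]) \
        .setOutputCol("sentence")

    tokenizer = nlp.Tokenizer() \
        .setInputCols(["sentence"]) \
        .setOutputCol("token")

    clinical_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings")

    # The only stage that differs between the sections of the notebook.
    ner_model = medical.NerModel.pretrained(ner_model_name, "en", "clinical/models") \
        .setInputCols(NER_INPUT_COLS) \
        .setOutputCol("ner")

    ner_converter = medical.NerConverterInternal() \
        .setInputCols(["sentence", "token", "ner"]) \
        .setOutputCol("ner_chunk")

    return nlp.Pipeline(stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        clinical_embeddings,
        ner_model,
        ner_converter,
    ])
```

In a licensed session, `ner_pipeline = build_sdoh_pipeline("ner_sdoh_slim_wip")` would stand in for the cell above. The trade-off is that the helper loads the shared stages again on every call, whereas the notebook reuses the stage objects created in the first section.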


In [None]:
result = ner_pipeline.fit(df).transform(df)

In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+-----------+-----------------+
|chunk      |ner_label        |
+-----------+-----------------+
|Mother     |Family_Member    |
|he         |Gender           |
|smoke      |Smoking          |
|alcohol    |Alcohol          |
|maternal   |Family_Member    |
|paternal   |Family_Member    |
|maternal   |Family_Member    |
|grandfather|Family_Member    |
|alcohol    |Alcohol          |
|paternal   |Family_Member    |
|grandmother|Family_Member    |
|alcoholism |Alcohol          |
|drinking   |Alcohol          |
|age 16     |Age              |
|LA         |Geographic_Entity|
|17yo       |Age              |
|his        |Gender           |
|mother     |Family_Member    |
|him        |Gender           |
|he         |Gender           |
|married    |Marital_Status   |
|Husband    |Family_Member    |
|anxious    |Mental_Health    |
|Husb       |Family_Member    |
|he         |Gender           |
|drinking   |Alcohol          |
|he         |Gender           |
|her        |Gender           |
|he     

In [None]:
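# Collect once up front; calling result.collect() inside the loop triggers a new Spark job on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document')
    print("\n"*2)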

## **`ner_sdoh_income_social_status`**

In [None]:
text_list = ["Mr. Chen is a 35-year-old immigrant who presents to the emergency department with complaints of abdominal pain and nausea. He reports a history of gastritis and diverticulitis, which have been managed with medication in the past. However, he recently lost his job as a plumber and has been experiencing financial difficulties, which have made it difficult for him to afford his medication and maintain his health.During his visit, Mr. Chen disclosed that he has been divorced for several years and has been struggling to support himself while pursuing a college degree. He reports that the stress of his financial situation and educational demands has taken a toll on his mental health, and he has been experiencing anxiety and depression.The healthcare team conducted a comprehensive assessment of Mr. Chen's social determinants of health and identified several potential barriers to his healthcare access and management of his chronic conditions. They found that his financial difficulties and lack of stable employment have made it difficult for him to afford and access healthcare services.",
                "The patient reported experiencing significant financial difficulties, which have been linked to increased risk for mental health issues such as anxiety and depression. Additionally, the patient disclosed that they were divorced and working as a plumber. The patient expressed concern about being able to provide for their children as a single parent with limited income. As an immigrant and college student, the patient faces additional challenges in terms of finding stable employment and accessing resources for financial assistance.",
                "The patient is a 50-year-old female who identifies as African American and primarily speaks Spanish. She comes from a low-income family and resides in a densely populated urban area. She reports feeling socially isolated due to language barriers and struggles to find employment despite having a college degree. Additionally, the patient reports experiencing childhood trauma related to her parent's divorce and subsequent financial difficulties. She notes that her spiritual beliefs and family support have been instrumental in her coping with these stressors. The patient is currently uninsured and expresses concerns about accessing affordable healthcare. She denies any current substance use or smoking history. The patient is alert and oriented with no acute distress on examination.",
                "Mrs. Smith, a 45-year-old immigrant woman, presented to the healthcare institution with symptoms of hypertension and hyperlipidemia. She reported experiencing childhood trauma and emotional abuse from her primary caregiver. Her financial status is precarious, and she struggles to make ends meet as a divorced single mother of two young children. She works as a plumber and does not have health insurance. Mrs. Smith lives in a crowded apartment complex in a high-crime neighborhood, which has led to social exclusion and a lack of social support. She reports smoking and drinking alcohol frequently as a way of coping with stress, and also struggles with obesity and a poor diet due to limited access to healthy foods in her neighborhood. Mrs. Smith's healthcare provider discussed the importance of regular exercise and a healthy diet, as well as the potential benefits of therapy to address her childhood trauma and emotional distress. The provider also referred Mrs. Smith to resources for financial assistance, transportation, and language services, as well as programs for substance abuse and smoking cessation.",
                "A 45-year-old divorced male with financial difficulties presented to the healthcare institution complaining of hypertension and hyperlipidemia. He reported experiencing childhood trauma, including emotional abuse from his primary caregiver. The patient is an immigrant and speaks English as a second language. He works as a plumber and has a history of smoking and alcohol use. He reported experiencing social exclusion due to his race and ethnic background. The patient's quality of life has been impacted by his chronic conditions and financial stress."]


In [None]:
df = spark.createDataFrame(text_list, StringType()).toDF("text")

In [None]:
ner_model = medical.NerModel.pretrained("ner_sdoh_income_social_status", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])



result = ner_pipeline.fit(df).transform(df)

ner_sdoh_income_social_status download started this may take some time.
[OK!]


In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+----------------------+----------------+
|chunk                 |ner_label       |
+----------------------+----------------+
|immigrant             |Population_Group|
|plumber               |Employment      |
|financial difficulties|Financial_Status|
|divorced              |Marital_Status  |
|college degree        |Education       |
|financial situation   |Financial_Status|
|financial difficulties|Financial_Status|
|stable employment     |Employment      |
|financial difficulties|Financial_Status|
|divorced              |Marital_Status  |
|working as a plumber  |Employment      |
|single                |Marital_Status  |
|limited income        |Income          |
|immigrant             |Population_Group|
|college               |Education       |
|student               |Education       |
|stable employment     |Employment      |
|financial assistance  |Financial_Status|
|low-income            |Income          |
|employment            |Employment      |
|college degree        |Education 
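The output above repeats some chunks, for example `financial difficulties`, because the same phrase occurs in several sentences. A minimal sketch of one way to list the distinct chunks per label follows, using the complete rows of the table above as hand-copied input. `unique_chunks_by_label` is a hypothetical helper, and treating chunks as equal when they match case-insensitively is an assumption of this sketch.

```python
# (chunk, ner_label) pairs copied from the complete rows of the output table.
pairs = [
    ("immigrant", "Population_Group"),
    ("plumber", "Employment"),
    ("financial difficulties", "Financial_Status"),
    ("divorced", "Marital_Status"),
    ("college degree", "Education"),
    ("financial situation", "Financial_Status"),
    ("financial difficulties", "Financial_Status"),
    ("stable employment", "Employment"),
    ("financial difficulties", "Financial_Status"),
    ("divorced", "Marital_Status"),
    ("working as a plumber", "Employment"),
    ("single", "Marital_Status"),
    ("limited income", "Income"),
    ("immigrant", "Population_Group"),
    ("college", "Education"),
    ("student", "Education"),
    ("stable employment", "Employment"),
    ("financial assistance", "Financial_Status"),
    ("low-income", "Income"),
    ("employment", "Employment"),
]


def unique_chunks_by_label(pairs):
    """Group chunks by label, keeping the first occurrence of each chunk."""
    grouped = {}
    seen = set()
    for chunk, label in pairs:
        key = (chunk.lower(), label)  # case-insensitive comparison
        if key in seen:
            continue
        seen.add(key)
        grouped.setdefault(label, []).append(chunk)
    return grouped


grouped = unique_chunks_by_label(pairs)
for label, chunks in grouped.items():
    print(f"{label}: {chunks}")
```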

In [None]:
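# Collect once up front; calling result.collect() inside the loop triggers a new Spark job on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document')
    print("\n"*2)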

## **`ner_sdoh_demographics`**

In [None]:
text_list = ["During a medical evaluation, a healthcare provider asked a 40 year old Hispanic woman about her spiritual beliefs and speaking-language. The patient indicated that she is Catholic and finds comfort in her faith during times of stress and illness, and she primarily speaks English but also speaks Spanish at home with her family. Understanding these aspects of the patient's background and culture can help the provider deliver culturally sensitive and patient-centered care.",
                "A 61 year old Caucasian man was admitted to a hospital in Korea with respiratory distress. He was accompanied by his adult children, who expressed concern about their father's condition. The patient, a devout Catholic, spoke English as his primary Language.",
                "A pathology report for a patient with breast cancer indicated that she was a 45 year old African American woman with a family history of breast cancer. When the healthcare provider discussed the patient's medical history with her, she mentioned that her mother and aunt had both been diagnosed with breast cancer in their 50s. The provider also inquired about the patient's spiritual beliefs and speaking-language. The patient shared that she was raised Catholic but no longer practices the faith, and that English is her primary language. The patient expressed concern about the cost of treatment, as she lived in a low-income neighborhood and struggled to afford healthcare.",
                "A radiology report for a patient with suspected lung cancer indicated that he was a 60 year old male who had worked in a factory for over 30 years. The patient is Caucasian and has a history of smoking. The healthcare provider also inquired about the patient's family history of cancer, and the patient indicated that his father had died from lung cancer. The provider recommended that the patient speak with his family members about their medical histories. Additionally, the provider asked about the patient's language preference, and the patient indicated that he primarily speaks English but also speaks Spanish with his wife, who is of Hispanic descent.",
                "A 38 year old woman from a Hispanic background presented to the emergency department with chest pain and shortness of breath. She spoke English as her primary Language and identified as Catholic. The patient reported no significant past medical history or medication use."]


In [None]:
df = spark.createDataFrame(text_list, StringType()).toDF("text")

In [None]:
ner_model = medical.NerModel.pretrained("ner_sdoh_demographics", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])


result = ner_pipeline.fit(df).transform(df)

ner_sdoh_demographics download started this may take some time.
[OK!]


In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+-----------------+-----------------+
|chunk            |ner_label        |
+-----------------+-----------------+
|40 year old      |Age              |
|Hispanic         |Race_Ethnicity   |
|woman            |Gender           |
|her              |Gender           |
|spiritual beliefs|Spiritual_Beliefs|
|speaking-language|Language         |
|she              |Gender           |
|Catholic         |Spiritual_Beliefs|
|her              |Gender           |
|faith            |Spiritual_Beliefs|
|she              |Gender           |
|English          |Language         |
|Spanish          |Language         |
|her              |Gender           |
|61 year old      |Age              |
|Caucasian        |Race_Ethnicity   |
|man              |Gender           |
|Korea            |Geographic_Entity|
|He               |Gender           |
|his              |Gender           |
|children         |Family_Member    |
|father's         |Family_Member    |
|Catholic         |Spiritual_Beliefs|
|English    
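In the output above, many `Gender` chunks are pronouns such as `she` and `her`, which can crowd out the more informative mentions. One possible post-processing step, sketched below in plain Python, drops `Gender` chunks that are pronouns and keeps everything else. The `PRONOUNS` set and the `drop_gender_pronouns` helper are assumptions made for illustration; the input pairs are the rows of the first document in the table above.

```python
# Hypothetical pronoun list; extend it to suit the data at hand.
PRONOUNS = {"he", "she", "him", "her", "his", "hers"}

# (chunk, ner_label) pairs for the first document in the output table.
pairs = [
    ("40 year old", "Age"),
    ("Hispanic", "Race_Ethnicity"),
    ("woman", "Gender"),
    ("her", "Gender"),
    ("spiritual beliefs", "Spiritual_Beliefs"),
    ("speaking-language", "Language"),
    ("she", "Gender"),
    ("Catholic", "Spiritual_Beliefs"),
    ("her", "Gender"),
    ("faith", "Spiritual_Beliefs"),
    ("she", "Gender"),
    ("English", "Language"),
    ("Spanish", "Language"),
    ("her", "Gender"),
]


def drop_gender_pronouns(pairs):
    """Remove Gender chunks that are bare pronouns; keep all other pairs."""
    return [
        (chunk, label)
        for chunk, label in pairs
        if not (label == "Gender" and chunk.lower() in PRONOUNS)
    ]


filtered = drop_gender_pronouns(pairs)
for chunk, label in filtered:
    print(f"{chunk}: {label}")
```

Explicit gender mentions such as `woman` are kept because they are not in the pronoun set.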

In [None]:
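# Collect once up front; calling result.collect() inside the loop triggers a new Spark job on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document')
    print("\n"*2)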

## **`ner_sdoh_social_environment`**

In [None]:
text_list = ["Medical history: Jane was born in a low - income household and experienced significant trauma during her childhood, including physical and emotional abuse. Events such as social exclusion, violence, and abuse can have significant and lasting impacts on a person's physical and mental health. Additionally, patients who have experienced childhood trauma may struggle to access or maintain social support systems, which can exacerbate the negative effects of these events.",
             "During the patient's intake interview, it was noted that she had experienced childhood trauma, which can have long-term effects on a person's mental and physical health. The patient also disclosed that she is currently in an abusive relationship and that her partner is her primary caregiver. Additionally, the patient reported feeling ostracized from her community and the provider referred her to resources for social inclusion. Finally, the provider screened the patient for a history of incarceration and domestic violence, both of which can impact a patient's health, and made appropriate referrals.",
             "Mrs. Smith is a 60-year-old woman who presents with symptoms of anxiety and depression. During her intake interview, she disclosed a history of childhood trauma, including experiences of social exclusion from her primary caregiver. Mrs. Smith reported feeling unsupported by her family and community during this difficult time, leading to long-term psychological distress. Additionally, she shared that she had recently experienced physical violence at the hands of her partner and was concerned for her safety.The healthcare team worked to address these social determinants of health by connecting Mrs. Smith with resources for domestic violence support. They also provided education on the importance of social support networks and connected Mrs. Smith with local community organizations that offer support groups for survivors of domestic violence. By addressing these social factors and providing targeted support, the healthcare team was able to help Mrs. Smith improve her mental health and safety.",
             "Individuals who have been incarcerated, as the patient has, face unique health challenges related to social exclusion and a lack of social support systems. These can include higher rates of infectious disease, mental health disorders, and chronic medical conditions. It is essential that healthcare providers offer comprehensive care to address the physical and psychological impacts of incarceration, including any history of violence or abuse.",
             "Mr. Johnson is a 35-year-old man.During his evaluation, he disclosed a history of childhood trauma, including experiences of emotional abuse from his primary caregiver. Mr. Johnson reported feeling unsupported.He also shared that he had been imprisoned for a non-violent offense and was struggling with the social exclusion and stigma that often come with a criminal record.The healthcare team worked to address these social determinants of health by connecting Mr. Johnson with resources for trauma-informed therapy and counseling for re-entry into society after incarceration. They provided education on the importance of social support networks and connected Mr. Johnson with local organizations that offer support groups."]


In [None]:
df = spark.createDataFrame(text_list, StringType()).toDF("text")

In [None]:
ner_model = medical.NerModel.pretrained("ner_sdoh_social_environment", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])


result = ner_pipeline.fit(df).transform(df)

ner_sdoh_social_environment download started this may take some time.
[OK!]


In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+---------------------------+-----------------+
|chunk                      |ner_label        |
+---------------------------+-----------------+
|trauma during her childhood|Chidhood_Event   |
|emotional abuse            |Violence_Or_Abuse|
|social exclusion           |Social_Exclusion |
|violence                   |Violence_Or_Abuse|
|abuse                      |Violence_Or_Abuse|
|childhood trauma           |Chidhood_Event   |
|social support             |Social_Support   |
|childhood trauma           |Chidhood_Event   |
|abusive                    |Violence_Or_Abuse|
|caregiver                  |Social_Support   |
|ostracized                 |Social_Exclusion |
|incarceration              |Legal_Issues     |
|domestic violence          |Violence_Or_Abuse|
|childhood trauma           |Chidhood_Event   |
|social exclusion           |Social_Exclusion |
|primary caregiver          |Social_Support   |
|unsupported                |Social_Support   |
|physical violence          |Violence_Or
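
The `arrays_zip` and `explode` calls above pair each chunk string with its metadata map and emit one row per pair. As a rough, Spark-free illustration of that pairing, the sketch below uses plain Python lists. `chunks`, `metadata`, and `pair_chunks_with_labels` are hypothetical names standing in for one row's `ner_chunk.result` and `ner_chunk.metadata`; the values are copied from rows of the table above.

```python
# Hypothetical stand-ins for one row's ner_chunk.result and ner_chunk.metadata.
chunks = ["emotional abuse", "social exclusion", "incarceration"]
metadata = [
    {"entity": "Violence_Or_Abuse"},
    {"entity": "Social_Exclusion"},
    {"entity": "Legal_Issues"},
]


def pair_chunks_with_labels(chunks, metadata):
    """Return one (chunk, label) tuple per detected chunk."""
    # zip() plays the role of arrays_zip; the list of tuples plays the role of explode.
    return [(chunk, meta["entity"]) for chunk, meta in zip(chunks, metadata)]


pairs = pair_chunks_with_labels(chunks, metadata)
for chunk, label in pairs:
    print(f"{chunk:<20} {label}")
```

This only mirrors the shape of the transformation; on a real DataFrame the Spark expression above remains the way to do it.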

In [None]:
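# Collect the rows once; calling result.collect() inside the loop re-runs the whole pipeline on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document')
    print("\n"*2)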

## **`ner_sdoh_access_to_healthcare`**

In [None]:
sample_texts = ["During the assessment, it was discovered that the individual was in a rehab program, which may impact their ability to access certain healthcare services. Additionally, the individual reported that they were working on communicating with their insurance company, indicating that they may be facing financial or other barriers to accessing care. Furthermore, the healthcare team learned that the Health Department and Fire Department were working to intervene on a matter related to the individual's health or living situation. Lastly, Elder Protective Services reported that the patient's wife had been reported to Protective Services in the past, raising concerns about potential abuse or neglect.",
                "Jane requested SW  insurance questions and lodging for his wife. Spoke with her by phone and informed her that she will have to assist with Mass Health reinstatement. However, he also speaks of his anxiety as he works on communicating with Jane s insurance company and Social Security to try to learn about his benefits and coverage for rehabs and possibly help Jane apply for SSDI. He states he is encountering roadblocks as the insurance company and  Social Security are not willing to give information to the partner without Jane s consent or without the partner having He states he would rather use his short and long-term disability plans through work than apply for Social Security. Discussed the possibility of assigning POA to someone, i . e. partner, so that he could more easily advocate for Jane around insurance issues and other financial matters. She reports that the Health Dept and Fire Dept are working to intervene in this matter. At this time, SW has not received first-party information sufficient to make a referral to Springwell Elder Protective Services, but SW did request Ms .  to make the report since they witnessed Jane s apartment firsthand and expressed initial concerns. Plan/Follow up: Continuing issues to be addressed: Please page SW when Jane is to be extubated in order for SW to assess the mental status and gather further information re: Jane s living situation, contacts w / other community and/or social service agencies and the possible need for psychiatric care.",
                "During the assessment, it was discovered that the individual resided at a nursing home, and their parent admitted that their child was in a large daycare. The individual reported that they are currently on social security disability, which may impact their ability to access certain healthcare services. Additionally, the individual reported that they are divorced, live alone, and are a retired laboratory technician from Cornell Diagnostic Laboratory. These details provide important context for the individual's health and living situation and will be taken into consideration when developing a care plan to meet their healthcare needs. The healthcare team will work collaboratively with the individual and their family to ensure that they receive the appropriate care and support to optimize their health and well-being",
                "After undergoing surgery at OSH, the resident suggested that a family meeting be held the next day to discuss the results of the MRI and to readdress the patient's goals of care based on those findings. This meeting would be an opportunity for the family to receive important updates and to ask any questions they may have regarding the patient's condition. Additionally, the family requested assistance with re-instating the patient's MA health coverage to help with the expenses associated with the surgery. This would alleviate some of the financial burden and allow the patient to focus on their recovery. Overall, it is important to keep the family informed and involved in the patient's care, while also addressing any practical concerns that may arise.",
                "The patient's chronic health conditions may be linked to various social determinants of health, such as poverty and limited access to resources. These factors may have impeded the patient's ability to receive regular check-ups and specialist consultations due to a lack of insurance. Sadly, this is not an uncommon situation. As reported by the United Nations High Commissioner for Refugees (UNHCR), there are currently over 80 million forcibly displaced people worldwide, including more than 26 million refugees. These individuals often face significant barriers in accessing healthcare, including a lack of resources and limited access to affordable insurance or healthcare facilities. It is essential to address these social determinants of health to ensure that everyone has the opportunity to live a healthy life, regardless of their circumstances."]


In [None]:
df = spark.createDataFrame(sample_texts, StringType()).toDF("text")

In [None]:
ner_model = medical.NerModel.pretrained("ner_sdoh_access_to_healthcare", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])


result = ner_pipeline.fit(df).transform(df)

ner_sdoh_access_to_healthcare download started this may take some time.
[OK!]


In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+---------------------------------------------+----------------------+
|chunk                                        |ner_label             |
+---------------------------------------------+----------------------+
|rehab program                                |Access_To_Care        |
|access certain healthcare services           |Access_To_Care        |
|insurance company                            |Insurance_Status      |
|accessing care                               |Access_To_Care        |
|Elder Protective Services                    |Healthcare_Institution|
|Protective Services                          |Access_To_Care        |
|insurance                                    |Insurance_Status      |
|Mass Health                                  |Insurance_Status      |
|insurance company                            |Insurance_Status      |
|Social Security                              |Insurance_Status      |
|rehabs                                       |Access_To_Care        |
|insur
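
Once chunks and labels are available as pairs, a quick tally per label shows which categories dominate a set of notes. The sketch below is a minimal, Spark-free example; `rows` is a hypothetical hand-copied list holding only the complete rows visible in the output above, not the full model result.

```python
from collections import Counter

# (chunk, ner_label) pairs copied from the complete rows of the output above.
rows = [
    ("rehab program", "Access_To_Care"),
    ("access certain healthcare services", "Access_To_Care"),
    ("insurance company", "Insurance_Status"),
    ("accessing care", "Access_To_Care"),
    ("Elder Protective Services", "Healthcare_Institution"),
    ("Protective Services", "Access_To_Care"),
    ("insurance", "Insurance_Status"),
    ("Mass Health", "Insurance_Status"),
    ("insurance company", "Insurance_Status"),
    ("Social Security", "Insurance_Status"),
    ("rehabs", "Access_To_Care"),
]

# Count how many chunks were assigned to each label.
label_counts = Counter(label for _, label in rows)
for label, count in label_counts.most_common():
    print(f"{label:<24} {count}")
```

On a Spark DataFrame the same tally would normally be a `groupBy` on the label column followed by `count()`.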

In [None]:
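# Collect the rows once; calling result.collect() inside the loop re-runs the whole pipeline on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document')
    print("\n"*2)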

## **`ner_sdoh_health_behaviours_problems`**

In [None]:
sample_texts = ["Maintaining a healthy lifestyle is important for preventing a variety of health conditions. Proper diet and exercise can help prevent obesity, hyperlipidemia, and hypertension. However, many people struggle with maintaining healthy habits and may experience mental health issues or eating disorders as a result. It's important to seek help if you're experiencing any of these issues, as they can have a significant impact on your quality of life. Sexual activity is also an important aspect of health, and it's important to practice safe sex to prevent the spread of communicable diseases. Some people may also have disabilities that affect their ability to exercise or maintain a healthy diet and may require additional support to maintain their health.",
               "The patient is a 37 - year - old female who presents with a history of chronic pain, hyperlipidemia, history of eating disorder, and fibromyalgia. She has been receiving treatment for her chronic pain for the past 5 years and reports that her symptoms have improved but have not gone away completely. She takes medication to manage her pain and has tried different physical therapy and exercise programs, but she continues to experience pain daily. As a result, she has a history of poor physical health, including a history of obesity and a sedentary lifestyle. She's been struggling with chronic pain for quite some time, and she's tried various methods to alleviate it, but it's still present. The patient's medical history and social determinants of health, including her long-standing issues with a diet lacking essential nutrients and her limited access to healthcare, appear to be contributing factors to her ongoing chronic pain and fibromyalgia symptoms.",
               "Hyperlipidemia is a condition characterized by an abnormal amount of lipids, such as cholesterol and triglycerides, in the blood. It is a major risk factor for heart disease and stroke, and can also contribute to the development of peripheral artery disease and other health problems. Inactivity and a sedentary lifestyle can contribute to the development of hyperlipidemia by increasing the risk of obesity and metabolic disorders. A lack of physical activity can also lead to a decrease in cardiovascular fitness, which is an important factor in maintaining healthy lipid levels. Sexual inactivity can also be an issue, but it isn't directly related to the Hyperlipidemia condition. To help manage hyperlipidemia, Mark should focus on making lifestyle changes that can help lower his lipid levels and reduce his risk of heart disease. This might include: Eating a healthy diet that is low in saturated fat and cholesterol. Losing weight, if he is overweight or obese.",
               "Ahmed has a history of diabetes, hypertension, and high cholesterol. Additionally, Ahmed reported that he is not sexually active. Diet and Exercise History: Ahmed reported that his diet consists mostly of traditional Pakistani foods, which include a lot of grains, vegetables, and meat. He reported that he is not physically active due to his job, which requires him to sit for long hours. Treatment and Follow-up Care: Given Ahmed's history of diabetes, hypertension, and high cholesterol, as well as his social determinants, it is crucial to provide him with appropriate follow-up care to ensure that his conditions are well-managed. Additionally, Ahmed should be encouraged to develop a healthy diet plan that incorporates traditional Pakistani foods and addresses his diabetes and high cholesterol. Regular check-ups with a mental health professional should also be scheduled to address the stress and anxiety he is experiencing due to his financial issues.",
               "Ryan Smith, a 24-year-old male, reported to the hospital with complaints of chest pain and shortness of breath. With a history of asthma, anxiety, and PTSD, he also suffers from allergies to dust, pollen, and shellfish and takes medication for his conditions. Despite enjoying going to the gym and jogging in his free time, Ryan's symptoms have been affecting his daily activities and work. In the assessment and plan, Ryan was prescribed medication for his high blood pressure, referred to a cardiologist, and scheduled for a follow-up appointment in six months. He reported being sexually active with one partner and requested information on contraception options. Additionally, Ryan disclosed a history of orthorexia nervosa, and an obsession with healthy eating, and was referred to a therapist and dietitian for treatment and follow-up care. His medical history and conditions will be considered during his evaluation and treatment in the hospital."]


In [None]:
df = spark.createDataFrame(sample_texts, StringType()).toDF("text")

In [None]:
ner_model = medical.NerModel.pretrained("ner_sdoh_health_behaviours_problems", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])


result = ner_pipeline.fit(df).transform(df)

ner_sdoh_health_behaviours_problems download started this may take some time.
[OK!]


In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+--------------------------------+--------------------+
|chunk                           |ner_label           |
+--------------------------------+--------------------+
|healthy lifestyle               |Quality_Of_Life     |
|diet                            |Diet                |
|exercise                        |Exercise            |
|obesity                         |Obesity             |
|hyperlipidemia                  |Hyperlipidemia      |
|hypertension                    |Hypertension        |
|mental health issues            |Mental_Health       |
|eating disorders                |Eating_Disorder     |
|quality of life                 |Quality_Of_Life     |
|Sexual activity                 |Sexual_Activity     |
|sex                             |Sexual_Activity     |
|communicable diseases           |Communicable_Disease|
|disabilities                    |Disability          |
|exercise                        |Exercise            |
|healthy diet                    |Diet          
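
When only a few categories matter for a downstream task, the extracted pairs can be filtered after the fact. The sketch below is a minimal, Spark-free illustration; `rows`, `labels_of_interest`, and `filter_by_label` are hypothetical names, and the pairs are copied from the complete rows of the output above. It is a post-hoc filter on already extracted chunks, not a feature of the model.

```python
# (chunk, ner_label) pairs copied from the complete rows of the output above.
rows = [
    ("healthy lifestyle", "Quality_Of_Life"),
    ("diet", "Diet"),
    ("exercise", "Exercise"),
    ("obesity", "Obesity"),
    ("hyperlipidemia", "Hyperlipidemia"),
    ("hypertension", "Hypertension"),
    ("mental health issues", "Mental_Health"),
    ("eating disorders", "Eating_Disorder"),
    ("quality of life", "Quality_Of_Life"),
    ("Sexual activity", "Sexual_Activity"),
    ("sex", "Sexual_Activity"),
    ("communicable diseases", "Communicable_Disease"),
    ("disabilities", "Disability"),
    ("exercise", "Exercise"),
]

# Hypothetical selection: keep only the cardiometabolic conditions.
labels_of_interest = {"Obesity", "Hyperlipidemia", "Hypertension"}


def filter_by_label(rows, labels):
    """Keep the (chunk, label) pairs whose label is in the given set."""
    return [(chunk, label) for chunk, label in rows if label in labels]


conditions = filter_by_label(rows, labels_of_interest)
print(conditions)
```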

In [None]:
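# Collect the rows once; calling result.collect() inside the loop re-runs the whole pipeline on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document')
    print("\n"*2)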

## **`ner_sdoh_substance_usage`**

In [None]:
sample_texts = ["The patient has a history of consuming alcohol on a regular basis. They reported drinking alcoholic beverages occasionally, with an average intake of 5 to 6 drinks per month. This alcohol consuming pattern has persisted for the past 28 years. It is important to assess the potential health risks associated with this level of alcohol consumption, such as liver damage, increased risk of certain cancers, and cardiovascular problems. The patient should be advised on responsible drinking habits and educated about the potential benefits of reducing alcohol intake.",
             "During the consultation, the patient revealed a history of smoking tobacco products. They reported smoking approximately 10 to 15 cigarettes per day for the past 15 years. This significant smoking habit puts the patient at an elevated risk of developing respiratory problems, cardiovascular diseases, and various types of cancer. It is imperative to emphasize the importance of smoking cessation and offer appropriate support, such as counseling, replacement therapy of the smoking nicotine, or prescription medications, to assist the patient in quitting smoking.",
             "The patient disclosed a habit of excessive caffeine consumption. They reported consuming too much coffee and energy beverage daily, amounting to around 500 to 600 milligrams of caffeine per day. It is important to address the potential health consequences of high caffeine intake, such as increased heart rate, insomnia, and digestive issues. The patient should be advised to gradually reduce their caffeine intake and encouraged to explore healthier alternatives, such as herbal teas or decaffeinated beverages.",
             "The patient admitted to occasional marijuana use. They reported using marijuana and other cannabis products approximately once every two to three months. It is important to discuss the potential dangers associated with marijuana use, including addiction, mental health issues, and adverse effects on cognitive function. The patient should be provided with accurate information about the risks involved and offered resources for substance abuse counseling and treatment if needed.",
             "Patient: I consume alcoholic beverages occasionally, particularly wine and cocktails. Doctor: How frequently do you drink alcohol? Patient: Around 5 to 6 times per month. Doctor: It's important to be aware of the potential health risks associated with alcohol consumption, such as liver damage, addiction, and increased vulnerability to accidents. I recommend practicing responsible drinking habits and considering alternative non-alcoholic options."]


In [None]:
df = spark.createDataFrame(sample_texts, StringType()).toDF("text")

In [None]:
ner_model = medical.NerModel.pretrained("ner_sdoh_substance_usage", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])


result = ner_pipeline.fit(df).transform(df)

ner_sdoh_substance_usage download started this may take some time.
[OK!]


In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+------------------------+-------------------+
|chunk                   |ner_label          |
+------------------------+-------------------+
|alcohol                 |Alcohol            |
|a regular basis         |Substance_Frequency|
|drinking                |Alcohol            |
|alcoholic beverages     |Alcohol            |
|occasionally            |Substance_Frequency|
|average intake          |Substance_Quantity |
|5 to 6 drinks           |Substance_Quantity |
|per month               |Substance_Frequency|
|alcohol                 |Alcohol            |
|past 28 years           |Substance_Duration |
|alcohol consumption     |Alcohol            |
|drinking                |Alcohol            |
|alcohol intake          |Alcohol            |
|smoking tobacco products|Smoking            |
|smoking                 |Smoking            |
|10 to 15                |Substance_Quantity |
|cigarettes              |Smoking            |
|per day                 |Substance_Frequency|
|past 15 year
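
The output above repeats several chunks (`alcohol` and `drinking` each appear more than once), so a per-label summary of distinct mentions can be easier to read. The sketch below is a minimal, Spark-free example; `rows` and `group_unique_chunks` are hypothetical names, and the pairs are the rows of the output above that come from the first sample text.

```python
# (chunk, ner_label) pairs copied from the first 13 rows of the output above.
rows = [
    ("alcohol", "Alcohol"),
    ("a regular basis", "Substance_Frequency"),
    ("drinking", "Alcohol"),
    ("alcoholic beverages", "Alcohol"),
    ("occasionally", "Substance_Frequency"),
    ("average intake", "Substance_Quantity"),
    ("5 to 6 drinks", "Substance_Quantity"),
    ("per month", "Substance_Frequency"),
    ("alcohol", "Alcohol"),
    ("past 28 years", "Substance_Duration"),
    ("alcohol consumption", "Alcohol"),
    ("drinking", "Alcohol"),
    ("alcohol intake", "Alcohol"),
]


def group_unique_chunks(rows):
    """Group chunks by label, keeping each distinct mention once in first-seen order."""
    grouped = {}
    for chunk, label in rows:
        mentions = grouped.setdefault(label, [])
        # Compare case-insensitively so "Alcohol" and "alcohol" count as one mention.
        if chunk.lower() not in (m.lower() for m in mentions):
            mentions.append(chunk)
    return grouped


summary = group_unique_chunks(rows)
for label, mentions in summary.items():
    print(f"{label}: {', '.join(mentions)}")
```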

In [None]:

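# Collect the rows once; calling result.collect() inside the loop re-runs the whole pipeline on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document')
    print("\n"*2)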

## **`ner_sdoh_community_condition`**

In [None]:
sample_texts = ["Access to safe and adequate shelter is critical for good health. Living in small apartments can lead to overcrowding, which increases the risk of infectious diseases such as tuberculosis. Green spaces and outdoor recreational activities play a crucial role in promoting physical activity and mental health. People who have access to these spaces tend to be more active and have lower rates of depression and anxiety. Additionally, access to healthy food options is essential for maintaining good nutrition and reducing the risk of chronic diseases. Food insecurity is a significant societal health issue that can contribute to poor diet and malnutrition. Reliable transportation is also necessary for accessing healthcare services, grocery stores.",
                "The patient has MS and has been wheel chair bound for many years. The patient met with the hospital social worker last week and reported feeling unsafe at home due to an altercation with another resident in the patient s building over a former girlfriend. The patient reported getting into an altercation with another resident in an apartment complex after the resident came after the patient and grabbed the patient s throat. The patient reports currently living alone and no longer having a relationship .Sw spoke with the patient about developing a safety plan for when the patient returns home as the patient reports that the resident did threaten patient life and the patient does not feel safe.",
                "During the initial assessment, it was discovered that the patient has a new colostomy, but is ambivalent about discussing any support she may need at home to manage it. This raises concerns about potential risky safety issues in the home, which may need to be addressed. Furthermore, the patient faces challenges in procuring and obtaining a sufficient quantity of nutritious food, which might affect her rehabilitation and general health. Furthermore, the patient mentioned that she lives at house of her parents on the weekends and visits during the week, but due to her condition, she would not be able to board the T or the bus and was asked to leave the premises. These transportation limitations may pose a challenge for accessing medical care and other essential services, which will need to be addressed in the patient's care plan.",
                "In addition to concerns regarding transportation, there are also potential neighborhood safety issues to consider for the patient's home. Given her fall down the steps, it is important to assess the safety of her living environment and make any necessary modifications to prevent future accidents. This may include installing handrails or removing tripping hazards. It is crucial to ensure that the patient can continue to live independently in her own home while also maintaining her safety and well-being. It is recommended that a professional home safety assessment be conducted to address any potential risks and make appropriate modifications.",
                "Rachel reported experiencing transportation and food insecurity, as well as a lack of insurance coverage, which may limit her access to medical care and other essential services. However, while in care, Rachel reports feeling safe and supported by the healthcare team. Despite her financial concerns, she reports no transportation issues, which may be a positive factor in her overall well-being. The healthcare team will continue to work with Rachel to address her healthcare needs and provide support and resources as needed to improve her overall health and well-being."]


In [None]:
df = spark.createDataFrame(sample_texts, StringType()).toDF("text")

In [None]:
ner_model = medical.NerModel.pretrained("ner_sdoh_community_condition", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])


result = ner_pipeline.fit(df).transform(df)

ner_sdoh_community_condition download started this may take some time.
[OK!]


In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']['entity']").alias("ner_label")).show(30, truncate=False)

+-------------------------------+-----------------------+
|chunk                          |ner_label              |
+-------------------------------+-----------------------+
|adequate shelter               |Housing                |
|small apartments               |Housing                |
|Green spaces                   |Environmental_Condition|
|outdoor recreational activities|Environmental_Condition|
|healthy food                   |Food_Insecurity        |
|Food insecurity                |Food_Insecurity        |
|poor diet                      |Food_Insecurity        |
|malnutrition                   |Food_Insecurity        |
|transportation                 |Transportation         |
|unsafe at home                 |Community_Safety       |
|apartment                      |Housing                |
|living alone                   |Housing                |
|feel safe                      |Community_Safety       |
|safety issues                  |Community_Safety       |
|nutritious fo

In [None]:
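# Collect the rows once; calling result.collect() inside the loop re-runs the whole pipeline on every iteration.
rows = result.collect()

for row in rows:
    visualizer.display(
        result = row,
        label_col = 'ner_chunk',
        document_col = 'document')
    print("\n"*2)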
