# 🩺📊✨ Text Analytics for Health

Azure AI Language Text Analytics for Health is a cloud-based API service that uses machine learning to extract and label relevant medical information from unstructured text such as doctors' notes, discharge summaries, clinical documents, and electronic health records. The service is designed to help healthcare providers improve health outcomes by surfacing insights from that text.

See the [official documentation](https://learn.microsoft.com/en-us/azure/ai-services/language-service/text-analytics-for-health/overview?tabs=ner) for details.

Azure AI Language Text Analytics for Health provides the following capabilities:

- **Named Entity Recognition**: Identify and categorize medical entities such as:
    - *Medications*: Names of drugs, dosages, and forms (e.g., tablets, injections).
    - *Conditions*: Medical conditions, diseases, symptoms, and diagnoses.
    - *Procedures*: Medical procedures, surgeries, and treatments.
    - *Anatomical Terms*: Parts of the body, organs, and tissues.
    - *Lab Tests*: Names of laboratory tests and their results.
    - *Medical Devices*: Equipment and devices used in medical treatments.
    - *Healthcare Providers*: Names and roles of healthcare professionals (e.g., doctors, nurses).
    - *Patient Information*: Demographic details such as age, gender, and ethnicity.
    - *Clinical Events*: Events related to patient care, such as hospital admissions and discharges.
- **Relation Extraction**: Determine relationships between entities, such as dosage and medication.
- **Entity Linking**: Link entities to standardized medical vocabularies like UMLS.
- **Assertion Detection**: Identify whether entities are present, absent, or conditional.
- **Social Determinants of Health (SDOH) Extraction**: Extract mentions of social factors affecting health, such as living conditions and ethnicity.
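
As a rough illustration of how assertion detection might be consumed downstream, the sketch below keeps only entities that are not negated, conditional, or attributed to someone other than the patient. The `Entity` and `Assertion` classes are hypothetical stand-ins for the SDK's result objects, and the string values checked (`"negative"`, `"negativePossible"`, `"other"`) are assumptions that should be verified against the SDK reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assertion:
    """Hypothetical stand-in for the assertion attached to an entity."""
    certainty: Optional[str] = None       # assumed values such as "positive", "negative"
    conditionality: Optional[str] = None  # assumed values such as "hypothetical"
    association: Optional[str] = None     # assumed values such as "subject", "other"


@dataclass
class Entity:
    """Hypothetical stand-in for a healthcare entity."""
    text: str
    category: str
    assertion: Optional[Assertion] = None


def is_affirmed(entity: Entity) -> bool:
    """Return True when nothing marks the entity as negated, conditional, or about someone else."""
    a = entity.assertion
    if a is None:
        return True  # no assertion means no qualifier was detected
    if a.certainty in ("negative", "negativePossible"):
        return False
    if a.conditionality is not None:
        return False
    if a.association == "other":
        return False
    return True


entities = [
    Entity("progressive angina", "SymptomOrSign"),
    Entity("wall motion abnormalities", "SymptomOrSign", Assertion(certainty="negative")),
    Entity("myocardial infarction", "Diagnosis", Assertion(association="other")),
]
affirmed = [e.text for e in entities if is_affirmed(e)]
print(affirmed)  # ['progressive angina']
```

Treating a missing assertion as "affirmed" is a design choice for this sketch; a stricter pipeline might require an explicit positive certainty instead.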

## How to run this notebook

1. Make sure Python is installed. 
2. Create a virtual environment and activate it. 
3. Install the dependencies specified in `requirements.txt`.

In [1]:
!pip install -r requirements.txt



4. Deploy the necessary services by clicking the Deploy button below.


<a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FNicoGrassetto%2Fai-language-notebooks%2Fmain%2Ftext-analytics-for-health%2Fdeplo.json" target="_blank">
    <img src="deploytoazure.svg" alt="Button" style="width:200px;height:auto;">
</a>

## Analyze Healthcare Entities and Identify Relationships in Document Batches

Text Analytics for Health processes and extracts insights from unstructured medical data. The service detects and surfaces medical concepts, assigns assertions to concepts, infers semantic relations between concepts, and links them to common medical ontologies.


Prerequisites:
Make sure you have a Language resource deployed on Azure. Once it is deployed, retrieve its credentials (the Azure Language endpoint and key) from the Azure portal.
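
Before creating the client, it can help to fail early when a credential is missing rather than passing `None` to the SDK. The following is a minimal sketch under that assumption; `require_env` and `load_language_settings` are hypothetical helper names, not part of any Azure library. The variable names match the ones this notebook reads.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def load_language_settings() -> dict:
    """Collect the endpoint and key that this notebook expects."""
    return {
        "endpoint": require_env("AZURE_LANGUAGE_ENDPOINT"),
        "key": require_env("AZURE_LANGUAGE_KEY"),
    }
```

Calling `load_language_settings()` right after `load_dotenv()` would surface a missing `.env` entry with a readable error before any request is made.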

In [9]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import (
    TextAnalyticsClient,
    AnalyzeHealthcareEntitiesAction,
    RecognizePiiEntitiesAction,
)
from dotenv import load_dotenv
load_dotenv()

endpoint = os.getenv("AZURE_LANGUAGE_ENDPOINT")
key = os.getenv("AZURE_LANGUAGE_KEY")

In [10]:
text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)

In [None]:
documents = [
        "RECORD #333582770390100 | MH | 85986313 | | 054351 | 2/14/2001 12:00:00 AM | \
        CORONARY ARTERY DISEASE | Signed | DIS | Admission Date: 5/22/2001 \
        Report Status: Signed Discharge Date: 4/24/2001 ADMISSION DIAGNOSIS: \
        CORONARY ARTERY DISEASE. HISTORY OF PRESENT ILLNESS: \
        The patient is a 54-year-old gentleman with a history of progressive angina over the past several months. \
        The patient had a cardiac catheterization in July of this year revealing total occlusion of the RCA and \
        50% left main disease , with a strong family history of coronary artery disease with a brother dying at \
        the age of 52 from a myocardial infarction and another brother who is status post coronary artery bypass grafting. \
        The patient had a stress echocardiogram done on July , 2001 , which showed no wall motion abnormalities ,\
        but this was a difficult study due to body habitus. The patient went for six minutes with minimal ST depressions \
        in the anterior lateral leads , thought due to fatigue and wrist pain , his anginal equivalent. Due to the patient's \
        increased symptoms and family history and history left main disease with total occasional of his RCA was referred \
        for revascularization with open heart surgery."
    ]

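The call below starts a long-running operation that runs two actions on the batch: healthcare entity analysis, and PII recognition restricted to protected health information (`domain_filter="phi"`).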

In [12]:
poller = text_analytics_client.begin_analyze_actions(
        documents,
        display_name="Sample Text Analysis",
        actions=[
            AnalyzeHealthcareEntitiesAction(),
            RecognizePiiEntitiesAction(domain_filter="phi"),
        ],
    )

In [13]:
document_results = poller.result()
for doc, action_results in zip(documents, document_results):
    print(f"\nDocument text: {doc}")
    for result in action_results:
        if result.kind == "Healthcare":
            print("...Results of Analyze Healthcare Entities Action:")
            for entity in result.entities:
                print(f"Entity: {entity.text}")
                print(f"...Normalized Text: {entity.normalized_text}")
                print(f"...Category: {entity.category}")
                print(f"...Subcategory: {entity.subcategory}")
                print(f"...Offset: {entity.offset}")
                print(f"...Confidence score: {entity.confidence_score}")
                if entity.data_sources is not None:
                    print("...Data Sources:")
                    for data_source in entity.data_sources:
                        print(f"......Entity ID: {data_source.entity_id}")
                        print(f"......Name: {data_source.name}")
                if entity.assertion is not None:
                    print("...Assertion:")
                    print(f"......Conditionality: {entity.assertion.conditionality}")
                    print(f"......Certainty: {entity.assertion.certainty}")
                    print(f"......Association: {entity.assertion.association}")
            for relation in result.entity_relations:
                print(f"Relation of type: {relation.relation_type} has the following roles")
                for role in relation.roles:
                    print(f"...Role '{role.name}' with entity '{role.entity.text}'")

        elif result.kind == "PiiEntityRecognition":
            print("Results of Recognize PII Entities action:")
            for pii_entity in result.entities:
                print(f"......Entity: {pii_entity.text}")
                print(f".........Category: {pii_entity.category}")
                print(f".........Confidence Score: {pii_entity.confidence_score}")

        elif result.is_error is True:
            print(f"...Is an error with code '{result.error.code}' and message '{result.error.message}'")

        print("------------------------------------------")


Document text: RECORD #333582770390100 | MH | 85986313 | | 054351 | 2/14/2001 12:00:00 AM |         CORONARY ARTERY DISEASE | Signed | DIS | Admission Date: 5/22/2001         Report Status: Signed Discharge Date: 4/24/2001 ADMISSION DIAGNOSIS:         CORONARY ARTERY DISEASE. HISTORY OF PRESENT ILLNESS:         The patient is a 54-year-old gentleman with a history of progressive angina over the past several months.         The patient had a cardiac catheterization in July of this year revealing total occlusion of the RCA and         50% left main disease , with a strong family history of coronary artery disease with a brother dying at         the age of 52 from a myocardial infarction and another brother who is status post coronary artery bypass grafting.         The patient had a stress echocardiogram done on July , 2001 , which showed no wall motion abnormalities ,        but this was a difficult study due to body habitus. The patient went for six minutes with minimal ST depressions
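
The branching on `result.kind` in the cell above can also be factored into a small helper that separates successful results from errors. This is only a sketch: the `SimpleNamespace` objects are hypothetical stand-ins shaped like the attributes the cell reads (`kind`, `is_error`, `entities`, `error`), and `group_by_kind` is an illustrative name rather than an SDK function.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_kind(action_results):
    """Split one document's action results into successes (keyed by kind) and errors."""
    by_kind = defaultdict(list)
    errors = []
    for result in action_results:
        if getattr(result, "is_error", False):
            errors.append(result)
        else:
            by_kind[result.kind].append(result)
    return dict(by_kind), errors


# Hypothetical stand-ins for the results of one document.
sample_results = [
    SimpleNamespace(kind="Healthcare", is_error=False, entities=["angina"]),
    SimpleNamespace(kind="PiiEntityRecognition", is_error=False, entities=["5/22/2001"]),
    SimpleNamespace(kind="DocumentError", is_error=True,
                    error=SimpleNamespace(code="InvalidDocument", message="example")),
]

grouped, failed = group_by_kind(sample_results)
print(sorted(grouped))  # ['Healthcare', 'PiiEntityRecognition']
print(len(failed))      # 1
```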

## Start a Long-Running Operation for Comprehensive Text Analysis

### Analyze Healthcare and PII Entities with Azure Text Analytics
This section demonstrates how to analyze healthcare entities and recognize Personally Identifiable Information (PII) entities using Azure Text Analytics. We'll use `AnalyzeHealthcareEntitiesAction` and `RecognizePiiEntitiesAction` to process a batch of documents and extract meaningful insights.

In [27]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import (
    TextAnalyticsClient,
    AnalyzeHealthcareEntitiesAction,
    RecognizePiiEntitiesAction,
)
from dotenv import load_dotenv
load_dotenv()

True

### Set Up Azure Text Analytics Client
Here, we set up the Azure Text Analytics client using the endpoint and key stored in environment variables. Make sure to set the `AZURE_LANGUAGE_ENDPOINT` and `AZURE_LANGUAGE_KEY` environment variables before running this cell.

In [28]:
endpoint = os.getenv("AZURE_LANGUAGE_ENDPOINT")
key = os.getenv("AZURE_LANGUAGE_KEY")

text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)

### Define Input Documents
We define a batch of documents that will be analyzed for healthcare and PII entities. These documents contain information about medications and patient details.

In [29]:
documents = [
    """
    Patient needs to take 100 mg of ibuprofen, and 3 mg of potassium. Also needs to take
    10 mg of Zocor.
    """,
    """
    Patient needs to take 50 mg of ibuprofen, and 2 mg of Coumadin.
    """
]

### Submit Documents for Analysis
In this cell, we submit the documents for analysis using the `begin_analyze_actions` method. We specify two actions:
1. `AnalyzeHealthcareEntitiesAction` to extract healthcare-related entities.
2. `RecognizePiiEntitiesAction` to identify PII entities with a domain filter set to `"phi"` (Protected Health Information).

In [30]:
poller = text_analytics_client.begin_analyze_actions(
    documents,
    display_name="Sample Text Analysis",
    actions=[
        AnalyzeHealthcareEntitiesAction(),
        RecognizePiiEntitiesAction(domain_filter="phi"),
    ],
)

### Process and Display Results
This cell processes the results of the analysis and displays the extracted healthcare entities, PII entities, and any errors encountered during the analysis.

In [None]:
document_results = poller.result()
for doc, action_results in zip(documents, document_results):
    print(f"\nDocument text: {doc}")
    for result in action_results:
        if result.kind == "Healthcare":
            print("...Results of Analyze Healthcare Entities Action:")
            for entity in result.entities:
                print(f"Entity: {entity.text}")
                print(f"...Normalized Text: {entity.normalized_text}")
                print(f"...Category: {entity.category}")
                print(f"...Subcategory: {entity.subcategory}")
                print(f"...Offset: {entity.offset}")
                print(f"...Confidence score: {entity.confidence_score}")
                if entity.data_sources is not None:
                    print("...Data Sources:")
                    for data_source in entity.data_sources:
                        print(f"......Entity ID: {data_source.entity_id}")
                        print(f"......Name: {data_source.name}")
                if entity.assertion is not None:
                    print("...Assertion:")
                    print(f"......Conditionality: {entity.assertion.conditionality}")
                    print(f"......Certainty: {entity.assertion.certainty}")
                    print(f"......Association: {entity.assertion.association}")
            for relation in result.entity_relations:
                print(f"Relation of type: {relation.relation_type} has the following roles")
                for role in relation.roles:
                    print(f"...Role '{role.name}' with entity '{role.entity.text}'")

        elif result.kind == "PiiEntityRecognition":
            print("Results of Recognize PII Entities action:")
            for pii_entity in result.entities:
                print(f"......Entity: {pii_entity.text}")
                print(f".........Category: {pii_entity.category}")
                print(f".........Confidence Score: {pii_entity.confidence_score}")

        elif result.is_error is True:
            print(f"...Is an error with code '{result.error.code}' and message '{result.error.message}'")

        print("------------------------------------------")

## Detecting Healthcare Entities in Documents

This section demonstrates how to use Azure Text Analytics to detect healthcare entities in a batch of documents. We'll analyze prescription data to catalog medication inventory in a pharmacy.

Prerequisites: a deployed Azure Language resource.
Once it is deployed, retrieve the endpoint and key needed to run the cells below.

In [14]:
import os
import typing
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient, HealthcareEntityRelation
from dotenv import load_dotenv
load_dotenv()

True

### Set Up Azure Text Analytics Client
Here, we set up the Azure Text Analytics client using the endpoint and key stored in environment variables. Make sure to set the `AZURE_LANGUAGE_ENDPOINT` and `AZURE_LANGUAGE_KEY` environment variables before running this cell.

In [15]:
endpoint = os.getenv("AZURE_LANGUAGE_ENDPOINT")
key = os.getenv("AZURE_LANGUAGE_KEY")

text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)

### Define Input Documents
We define a batch of prescription documents that will be analyzed for healthcare entities.


In [16]:
documents = [
    """
    Patient needs to take 100 mg of ibuprofen, and 3 mg of potassium. Also needs to take
    10 mg of Zocor.
    """,
    """
    Patient needs to take 50 mg of ibuprofen, and 2 mg of Coumadin.
    """
]

### Analyze Healthcare Entities
This cell sends the documents to Azure Text Analytics for healthcare entity analysis. The results are retrieved and filtered to exclude any errors.

In [17]:
poller = text_analytics_client.begin_analyze_healthcare_entities(documents)
result = poller.result()

docs = [doc for doc in result if not doc.is_error]
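
Filtering with `not doc.is_error` silently drops failed documents. A possible alternative, sketched below, keeps the failures together with their input text so they can be reported. It assumes results come back in the same order as the input documents (the earlier cells rely on the same assumption when they `zip` documents with results). `partition_results` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for SDK results.

```python
from types import SimpleNamespace


def partition_results(documents, results):
    """Pair each result with its input text and split successes from failures."""
    succeeded, failed = [], []
    for text, result in zip(documents, results):
        if result.is_error:
            failed.append((text, result.error.code, result.error.message))
        else:
            succeeded.append((text, result))
    return succeeded, failed


# Hypothetical inputs and stand-in results: one success, one failure.
docs_in = ["Patient needs to take 50 mg of ibuprofen.", ""]
fake_results = [
    SimpleNamespace(is_error=False, entities=[]),
    SimpleNamespace(is_error=True,
                    error=SimpleNamespace(code="InvalidDocument", message="example message")),
]

ok, bad = partition_results(docs_in, fake_results)
print(len(ok), len(bad))  # 1 1
for text, code, message in bad:
    print(f"Failed document {text!r}: {code} ({message})")
```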

### Visualize Healthcare Entities
We iterate through the results to display the detected healthcare entities, their categories, confidence scores, and relationships.

In [19]:
for doc in docs:
    for entity in doc.entities:
        print(f"Entity: {entity.text}")
        print(f"...Normalized Text: {entity.normalized_text}")
        print(f"...Category: {entity.category}")
        print(f"...Subcategory: {entity.subcategory}")
        print(f"...Offset: {entity.offset}")
        print(f"...Confidence score: {entity.confidence_score}")
        if entity.data_sources is not None:
            print("...Data Sources:")
            for data_source in entity.data_sources:
                print(f"......Entity ID: {data_source.entity_id}")
                print(f"......Name: {data_source.name}")
        if entity.assertion is not None:
            print("...Assertion:")
            print(f"......Conditionality: {entity.assertion.conditionality}")
            print(f"......Certainty: {entity.assertion.certainty}")
            print(f"......Association: {entity.assertion.association}")
    for relation in doc.entity_relations:
        print(f"Relation of type: {relation.relation_type} has the following roles")
        for role in relation.roles:
            print(f"...Role '{role.name}' with entity '{role.entity.text}'")
    print("------------------------------------------")

Entity: 100 mg
...Normalized Text: None
...Category: Dosage
...Subcategory: None
...Offset: 27
...Confidence score: 0.99
Entity: ibuprofen
...Normalized Text: ibuprofen
...Category: MedicationName
...Subcategory: None
...Offset: 37
...Confidence score: 1.0
...Data Sources:
......Entity ID: C0020740
......Name: UMLS
......Entity ID: 0000019879
......Name: AOD
......Entity ID: M01AE01
......Name: ATC
......Entity ID: 0046165
......Name: CCPSS
......Entity ID: 0000006519
......Name: CHV
......Entity ID: 2270-2077
......Name: CSP
......Entity ID: DB01050
......Name: DRUGBANK
......Entity ID: 1611
......Name: GS
......Entity ID: sh97005926
......Name: LCH_NW
......Entity ID: LP16165-0
......Name: LNC
......Entity ID: 40458
......Name: MEDCIN
......Entity ID: d00015
......Name: MMSL
......Entity ID: D007052
......Name: MSH
......Entity ID: WK2XYI10QM
......Name: MTHSPL
......Entity ID: C561
......Name: NCI
......Entity ID: 002377
......Name: NDDF
......Entity ID: CDR0000040475
......Name: PD
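
The data sources printed above are the entity links into each vocabulary. To pull out a single identifier, such as the UMLS concept ID, a lookup along these lines could be used. This is a sketch with stand-in objects: `find_source_id` is a hypothetical helper, and the sample values are copied from the output above.

```python
from types import SimpleNamespace
from typing import Optional


def find_source_id(entity, source_name: str) -> Optional[str]:
    """Return the entity ID from the named vocabulary, or None if the entity is not linked to it."""
    for source in entity.data_sources or []:
        if source.name == source_name:
            return source.entity_id
    return None


# Stand-in built from a few of the values printed above for "ibuprofen".
ibuprofen = SimpleNamespace(
    text="ibuprofen",
    data_sources=[
        SimpleNamespace(entity_id="C0020740", name="UMLS"),
        SimpleNamespace(entity_id="M01AE01", name="ATC"),
        SimpleNamespace(entity_id="DB01050", name="DRUGBANK"),
    ],
)
# The "100 mg" dosage entity printed no data sources, modeled here as None.
dosage = SimpleNamespace(text="100 mg", data_sources=None)

print(find_source_id(ibuprofen, "UMLS"))      # C0020740
print(find_source_id(ibuprofen, "DRUGBANK"))  # DB01050
print(find_source_id(dosage, "UMLS"))         # None
```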

### Extract Medication Dosage Relations
This cell extracts all relations of type `DosageOfMedication` from the healthcare entity results.

In [20]:
dosage_of_medication_relations = [
    entity_relation
    for doc in docs
    for entity_relation in doc.entity_relations if entity_relation.relation_type == HealthcareEntityRelation.DOSAGE_OF_MEDICATION
]
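
Each relation carries a list of roles, so it is sometimes convenient to flatten a relation into a plain dictionary keyed by role name. The sketch below shows one way to do that; `roles_to_dict` and `make_relation` are hypothetical helpers, and the relations are stand-ins that mimic the `roles`, `name`, and `entity.text` attributes used in this notebook.

```python
from types import SimpleNamespace


def roles_to_dict(relation):
    """Map each role name in a relation to the text of the entity filling it."""
    return {role.name: role.entity.text for role in relation.roles}


def make_relation(relation_type, **roles):
    """Build a hypothetical stand-in relation from role-name/text pairs."""
    return SimpleNamespace(
        relation_type=relation_type,
        roles=[
            SimpleNamespace(name=name, entity=SimpleNamespace(text=text))
            for name, text in roles.items()
        ],
    )


relations = [
    make_relation("DosageOfMedication", Dosage="100 mg", Medication="ibuprofen"),
    make_relation("DosageOfMedication", Dosage="10 mg", Medication="Zocor"),
]
pairs = [roles_to_dict(r) for r in relations]
print(pairs[0])  # {'Dosage': '100 mg', 'Medication': 'ibuprofen'}
```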

### Calculate Total Dosage for Each Medication
We use a dictionary to calculate the total dosage for each medication by extracting dosage values using regular expressions.

In [21]:
import re
from collections import defaultdict

medication_to_dosage: typing.Dict[str, int] = defaultdict(int)

for relation in dosage_of_medication_relations:
    # A DosageOfMedication relation is expected to carry one Dosage role and one Medication role
    dosage_role = next((role for role in relation.roles if role.name == "Dosage"), None)
    medication_role = next((role for role in relation.roles if role.name == "Medication"), None)
    if dosage_role is None or medication_role is None:
        continue  # Skip relations that lack either role

    # Take the first run of digits in the dosage text, e.g. "100" from "100 mg"
    numbers = re.findall(r"\d+", dosage_role.entity.text)
    if not numbers:
        continue  # Skip dosages with no numeric value
    medication_to_dosage[medication_role.entity.text] += int(numbers[0])
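
The cell above takes the first run of digits from the dosage text, so a dosage such as "2.5 mg" would be counted as 2 and the unit is ignored. A slightly more careful parser might look like the sketch below. `parse_dosage` and `DOSAGE_PATTERN` are hypothetical names, and the sample pairs are illustrative rather than service output.

```python
import re
from collections import defaultdict
from typing import Optional, Tuple

# A number with an optional decimal part, followed by an optional alphabetic unit.
DOSAGE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)?")


def parse_dosage(text: str) -> Optional[Tuple[float, str]]:
    """Return (amount, unit) from text such as '100 mg', or None if no number is present."""
    match = DOSAGE_PATTERN.search(text)
    if match is None:
        return None
    amount = float(match.group(1))
    unit = (match.group(2) or "").lower()
    return amount, unit


totals = defaultdict(float)
samples = [("ibuprofen", "100 mg"), ("ibuprofen", "50 mg"),
           ("Coumadin", "2 mg"), ("aspirin", "as needed")]
for medication, dosage_text in samples:
    parsed = parse_dosage(dosage_text)
    if parsed is None:
        continue  # no numeric dosage to add
    amount, unit = parsed
    totals[(medication, unit)] += amount  # keep units separate when summing

print(dict(totals))  # {('ibuprofen', 'mg'): 150.0, ('Coumadin', 'mg'): 2.0}
```

Keying the totals by `(medication, unit)` avoids adding milligrams to, say, millilitres, at the cost of a slightly more verbose result.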

### Display Total Dosage
Finally, we display the total dosage for each medication based on the extracted relations.

In [22]:
for medication, dosage in medication_to_dosage.items():
    print(f"We have fulfilled '{dosage}' total mg of '{medication}'")

We have fulfilled '150' total mg of 'ibuprofen'
We have fulfilled '3' total mg of 'potassium'
We have fulfilled '10' total mg of 'Zocor'
We have fulfilled '2' total mg of 'Coumadin'


## Choosing the model version

### Setting Model Version for Pre-Built Text Analytics Models
This section demonstrates how to set the `model_version` for pre-built Text Analytics models. By default, `model_version` is set to `"latest"`, which uses the latest generally available version of the model. We'll explore how to use this parameter with the `Recognize Entities` action.

In [23]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient, RecognizeEntitiesAction
from dotenv import load_dotenv
load_dotenv()

True

### Set Up Azure Text Analytics Client
Here, we set up the Azure Text Analytics client using the endpoint and key stored in environment variables. Make sure to set the `AZURE_LANGUAGE_ENDPOINT` and `AZURE_LANGUAGE_KEY` environment variables before running this cell.

In [24]:
endpoint = os.getenv("AZURE_LANGUAGE_ENDPOINT")
key = os.getenv("AZURE_LANGUAGE_KEY")

text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

### Define Input Documents
We define a batch of documents that will be analyzed for entity recognition. These documents contain information about a company's event and service quality.

In [25]:
documents = [
    "I work for Foo Company, and we hired Contoso for our annual founding ceremony. The food \
    was amazing and we all can't say enough good words about the quality and the level of service."
]

### Recognize Entities with `model_version="latest"`
In this cell, we use the `recognize_entities` method with the `model_version` parameter set to `"latest"`. This ensures that the latest version of the model is used for entity recognition.

In [26]:
result = text_analytics_client.recognize_entities(documents, model_version="latest")
result = [review for review in result if not review.is_error]

print("...Results of Recognize Entities:")
for review in result:
    for entity in review.entities:
        print(f"......Entity '{entity.text}' has category '{entity.category}'")

...Results of Recognize Entities:
......Entity 'Foo Company' has category 'Organization'
......Entity 'Contoso' has category 'Person'
......Entity 'annual' has category 'DateTime'
......Entity 'founding ceremony' has category 'Event'
......Entity 'food' has category 'Product'


### Recognize Entities with `begin_analyze_actions` and `model_version="latest"`
Here, we use the `begin_analyze_actions` method with the `RecognizeEntitiesAction` action. The `model_version` parameter is set to `"latest"` to use the latest model version for entity recognition.

In [None]:
poller = text_analytics_client.begin_analyze_actions(
    documents,
    actions=[
        RecognizeEntitiesAction(model_version="latest")
    ]
)

print("...Results of Recognize Entities Action:")
document_results = poller.result()
for action_results in document_results:
    action_result = action_results[0]
    if action_result.kind == "EntityRecognition":
        for entity in action_result.entities:
            print(f"......Entity '{entity.text}' has category '{entity.category}'")
    elif action_result.is_error is True:
        print("......Is an error with code '{}' and message '{}'".format(
            action_result.error.code, action_result.error.message
        ))

### When to use `recognize_entities` and when to use `begin_analyze_actions`

- **`recognize_entities`**: This method is used for a single, synchronous call to recognize entities in a batch of documents. It is straightforward and returns results immediately after processing. Use this when you only need to perform one type of analysis (e.g., entity recognition) and require quick results.

- **`begin_analyze_actions`**: This method is used for asynchronous, multi-action analysis. It allows you to perform multiple types of analyses (e.g., entity recognition, key phrase extraction, etc.) on the same batch of documents in a single request. The results are retrieved after the processing is complete. Use this when you need to perform multiple analyses or when working with larger datasets that require asynchronous processing.

In [None]:
# from azure.identity import DefaultAzureCredential
# from azure.ai.textanalytics import TextAnalyticsClient
# import os
# # Use DefaultAzureCredential for managed identity authentication
# credential = DefaultAzureCredential()
# endpoint = os.environ.get('LANGUAGE_ENDPOINT')
# # Create the Text Analytics client
# client = TextAnalyticsClient(endpoint=endpoint, credential=credential)


In [None]:
# # Authenticate the client using your key and endpoint 
# def authenticate_client():
#     ta_credential = AzureKeyCredential(key)
#     text_analytics_client = TextAnalyticsClient(
#             endpoint=endpoint, 
#             credential=ta_credential)
#     return text_analytics_client

# client = authenticate_client()

# # Example function for extracting information from healthcare-related text 
# def health_example(client):
#     documents = [
#         """
#         Patient needs to take 50 mg of ibuprofen.
#         """
#     ]

#     poller = client.begin_analyze_healthcare_entities(documents)
#     result = poller.result()

#     docs = [doc for doc in result if not doc.is_error]

#     for idx, doc in enumerate(docs):
#         for entity in doc.entities:
#             print("Entity: {}".format(entity.text))
#             print("...Normalized Text: {}".format(entity.normalized_text))
#             print("...Category: {}".format(entity.category))
#             print("...Subcategory: {}".format(entity.subcategory))
#             print("...Offset: {}".format(entity.offset))
#             print("...Confidence score: {}".format(entity.confidence_score))
#         for relation in doc.entity_relations:
#             print("Relation of type: {} has the following roles".format(relation.relation_type))
#             for role in relation.roles:
#                 print("...Role '{}' with entity '{}'".format(role.name, role.entity.text))
#         print("------------------------------------------")
# health_example(client)
