# 🩺📊✨ Text Analytics for Health

Azure AI Language Text Analytics for Health is a cloud-based API service that uses machine learning to extract and label relevant medical information from unstructured texts like doctor's notes, discharge summaries, clinical documents, and electronic health records. This service is designed to help healthcare providers improve health outcomes by analyzing text data for insights.

See the [official documentation](https://learn.microsoft.com/en-us/azure/ai-services/language-service/text-analytics-for-health/overview?tabs=ner) for details.

Azure AI Language Text Analytics for Health provides the following features:

- **Named Entity Recognition**: Identify and categorize medical entities such as:
    - *Medications*: Names of drugs, dosages, and forms (e.g., tablets, injections).
    - *Conditions*: Medical conditions, diseases, symptoms, and diagnoses.
    - *Procedures*: Medical procedures, surgeries, and treatments.
    - *Anatomical Terms*: Parts of the body, organs, and tissues.
    - *Lab Tests*: Names of laboratory tests and their results.
    - *Medical Devices*: Equipment and devices used in medical treatments.
    - *Healthcare Providers*: Names and roles of healthcare professionals (e.g., doctors, nurses).
    - *Patient Information*: Demographic details such as age, gender, and ethnicity.
    - *Clinical Events*: Events related to patient care, such as hospital admissions and discharges.
- **Relation Extraction**: Determine relationships between entities, such as dosage and medication.
- **Entity Linking**: Link entities to standardized medical vocabularies like UMLS.
- **Assertion Detection**: Identify whether entities are present, absent, or conditional.
- **Social Determinants of Health (SDOH) Extraction**: Extract mentions of social factors affecting health, such as living conditions and ethnicity.
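
As a rough illustration of how recognized entities can be organized once they come back from the service, the sketch below groups entities by category. `StubEntity` and `group_by_category` are hypothetical names introduced here, and the sample values are hand-written rather than real service output. The only assumption about the real result objects is that each entity exposes `text` and `category`, as the code cells in this notebook do.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StubEntity:
    """Hypothetical stand-in exposing the two attributes used below."""
    text: str
    category: str


def group_by_category(entities):
    """Map each entity category to the list of entity texts found in it."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity.category].append(entity.text)
    return dict(grouped)


# Hand-written sample values, not real service output.
sample = [
    StubEntity("100 mg", "Dosage"),
    StubEntity("ibuprofen", "MedicationName"),
    StubEntity("Zocor", "MedicationName"),
]
grouped = group_by_category(sample)
print(grouped)
```

Because the helper relies only on attribute access, the same function should also work on the entities of a real result document, under the assumption stated above.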

## How to run this notebook

1. Make sure Python is installed. 
2. Create a virtual environment and activate it. 
3. Install the dependencies specified in requirements.txt.

In [1]:
!pip install -r requirements.txt



4. Deploy the necessary services by clicking the Deploy to Azure button below.


<a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FNicoGrassetto%2Fai-language-notebooks%2Fmain%2Ftext-analytics-for-health%2Fdeplo.json" target="_blank">
    <img src="deploytoazure.svg" alt="Deploy to Azure" style="width:200px;height:auto;">
</a>
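
The code cells in this notebook read the endpoint and key from the `AZURE_LANGUAGE_ENDPOINT` and `AZURE_LANGUAGE_KEY` environment variables. A minimal sketch for checking that both are set before running anything else is shown below. `REQUIRED_SETTINGS` and `missing_settings` are hypothetical helper names, not part of the Azure SDK.

```python
import os

# The two variables read by the code cells in this notebook.
REQUIRED_SETTINGS = ("AZURE_LANGUAGE_ENDPOINT", "AZURE_LANGUAGE_KEY")


def missing_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Set these environment variables first: " + ", ".join(missing))
else:
    print("Endpoint and key are configured.")
```

Running this first turns a later `KeyError` from `os.environ[...]` into a clearer message.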

## Cancel a Healthcare Entities Analysis in Progress

In [None]:
import os
from azure.core.exceptions import HttpResponseError
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]

text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)

documents = [
    "RECORD #333582770390100 | MH | 85986313 | | 054351 | 2/14/2001 12:00:00 AM | \
    CORONARY ARTERY DISEASE | Signed | DIS | Admission Date: 5/22/2001 \
    Report Status: Signed Discharge Date: 4/24/2001 ADMISSION DIAGNOSIS: \
    CORONARY ARTERY DISEASE. HISTORY OF PRESENT ILLNESS: \
    The patient is a 54-year-old gentleman with a history of progressive angina over the past several months. \
    The patient had a cardiac catheterization in July of this year revealing total occlusion of the RCA and \
    50% left main disease , with a strong family history of coronary artery disease with a brother dying at \
    the age of 52 from a myocardial infarction and another brother who is status post coronary artery bypass grafting. \
    The patient had a stress echocardiogram done on July , 2001 , which showed no wall motion abnormalities ,\
    but this was a difficult study due to body habitus. The patient went for six minutes with minimal ST depressions \
    in the anterior lateral leads , thought due to fatigue and wrist pain , his anginal equivalent. Due to the patient's \
    increased symptoms and family history and history left main disease with total occasional of his RCA was referred \
    for revascularization with open heart surgery."
]

# Start the long-running analysis, then cancel it right away to demonstrate cancellation.
poller = text_analytics_client.begin_analyze_healthcare_entities(documents)

try:
    poller.cancel()
except HttpResponseError as e:
    # If the operation has already reached a terminal state it cannot be cancelled.
    print(e)
else:
    print("Healthcare entities analysis was successfully cancelled.")
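
The document in the cell above is built with backslash line continuations, so the indentation at the start of each continued line ends up inside the string as a run of spaces. A minimal sketch for collapsing that whitespace before submitting text is shown below. `collapse_whitespace` is a hypothetical helper, and the sketch assumes that the exact spacing of the source text does not need to be preserved. Offsets reported for entities would then refer to the cleaned text rather than the original.

```python
def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space and trim the ends."""
    return " ".join(text.split())


# A shortened excerpt written the same way as the document above.
raw = "CORONARY ARTERY DISEASE | Signed | DIS | Admission Date: 5/22/2001 \
        Report Status: Signed"
cleaned = collapse_whitespace(raw)
print(cleaned)
```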

## Start a Long-Running Operation for Comprehensive Text Analysis

In [None]:
"""
FILE: sample_analyze_healthcare_action.py

DESCRIPTION:
    This sample demonstrates how to submit a collection of text documents for analysis with
    AnalyzeHealthcareEntitiesAction and RecognizePiiEntitiesAction, which recognize healthcare entities
    along with any PII entities.
    The response will contain results from each of the individual actions specified in the request.

USAGE:
    python sample_analyze_healthcare_action.py

    Set the environment variables with your own values before running the sample:
    1) AZURE_LANGUAGE_ENDPOINT - the endpoint to your Language resource.
    2) AZURE_LANGUAGE_KEY - your Language subscription key
"""


def sample_analyze_healthcare_action() -> None:
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import (
        TextAnalyticsClient,
        AnalyzeHealthcareEntitiesAction,
        RecognizePiiEntitiesAction,
    )

    endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
    key = os.environ["AZURE_LANGUAGE_KEY"]

    text_analytics_client = TextAnalyticsClient(
        endpoint=endpoint,
        credential=AzureKeyCredential(key),
    )

    documents = [
        """
        Patient needs to take 100 mg of ibuprofen, and 3 mg of potassium. Also needs to take
        10 mg of Zocor.
        """,
        """
        Patient needs to take 50 mg of ibuprofen, and 2 mg of Coumadin.
        """
    ]

    poller = text_analytics_client.begin_analyze_actions(
        documents,
        display_name="Sample Text Analysis",
        actions=[
            AnalyzeHealthcareEntitiesAction(),
            RecognizePiiEntitiesAction(domain_filter="phi"),
        ],
    )

    document_results = poller.result()
    for doc, action_results in zip(documents, document_results):
        print(f"\nDocument text: {doc}")
        for result in action_results:
            if result.kind == "Healthcare":
                print("...Results of Analyze Healthcare Entities Action:")
                for entity in result.entities:
                    print(f"Entity: {entity.text}")
                    print(f"...Normalized Text: {entity.normalized_text}")
                    print(f"...Category: {entity.category}")
                    print(f"...Subcategory: {entity.subcategory}")
                    print(f"...Offset: {entity.offset}")
                    print(f"...Confidence score: {entity.confidence_score}")
                    if entity.data_sources is not None:
                        print("...Data Sources:")
                        for data_source in entity.data_sources:
                            print(f"......Entity ID: {data_source.entity_id}")
                            print(f"......Name: {data_source.name}")
                    if entity.assertion is not None:
                        print("...Assertion:")
                        print(f"......Conditionality: {entity.assertion.conditionality}")
                        print(f"......Certainty: {entity.assertion.certainty}")
                        print(f"......Association: {entity.assertion.association}")
                for relation in result.entity_relations:
                    print(f"Relation of type: {relation.relation_type} has the following roles")
                    for role in relation.roles:
                        print(f"...Role '{role.name}' with entity '{role.entity.text}'")

            elif result.kind == "PiiEntityRecognition":
                print("Results of Recognize PII Entities action:")
                for pii_entity in result.entities:
                    print(f"......Entity: {pii_entity.text}")
                    print(f".........Category: {pii_entity.category}")
                    print(f".........Confidence Score: {pii_entity.confidence_score}")

            elif result.is_error is True:
                print(f"...Is an error with code '{result.error.code}' and message '{result.error.message}'")

            print("------------------------------------------")


if __name__ == "__main__":
    sample_analyze_healthcare_action()
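
Each entity printed above carries a `confidence_score`. One common follow-up step is to keep only the entities above some cutoff. The sketch below shows the idea on stand-in objects: `StubEntity` and `confident_entities` are hypothetical names, the scores are made up for illustration, and the default threshold of 0.8 is an arbitrary assumption rather than a recommended value.

```python
from dataclasses import dataclass


@dataclass
class StubEntity:
    """Hypothetical stand-in exposing the attributes used below."""
    text: str
    category: str
    confidence_score: float


def confident_entities(entities, threshold=0.8):
    """Keep entities whose confidence score is at or above the threshold."""
    return [e for e in entities if e.confidence_score >= threshold]


# Made-up scores for illustration, not real service output.
sample = [
    StubEntity("ibuprofen", "MedicationName", 0.99),
    StubEntity("100 mg", "Dosage", 0.98),
    StubEntity("potassium", "MedicationName", 0.55),
]
kept = [e.text for e in confident_entities(sample)]
print(kept)
```

A suitable threshold depends on the use case, so it is worth checking it against a sample of your own documents.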

## Detecting Healthcare Entities in Documents

In [None]:
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------

"""
FILE: sample_analyze_healthcare_entities.py

DESCRIPTION:
    This sample demonstrates how to detect healthcare entities in a batch of documents.

    In this sample we will be a newly-hired engineer working in a pharmacy. We are going to
    comb through all of the prescriptions our pharmacy has fulfilled so we can catalog how
    much inventory we have.

USAGE:
    python sample_analyze_healthcare_entities.py

    Set the environment variables with your own values before running the sample:
    1) AZURE_LANGUAGE_ENDPOINT - the endpoint to your Language resource.
    2) AZURE_LANGUAGE_KEY - your Language subscription key
"""


def sample_analyze_healthcare_entities() -> None:

    print(
        "In this sample we will be combing through the prescriptions our pharmacy has fulfilled "
        "so we can catalog how much inventory we have"
    )
    print(
        "We start out with a list of prescription documents."
    )

    # [START analyze_healthcare_entities]
    import os
    import typing
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import TextAnalyticsClient, HealthcareEntityRelation

    endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
    key = os.environ["AZURE_LANGUAGE_KEY"]

    text_analytics_client = TextAnalyticsClient(
        endpoint=endpoint,
        credential=AzureKeyCredential(key),
    )

    documents = [
        """
        Patient needs to take 100 mg of ibuprofen, and 3 mg of potassium. Also needs to take
        10 mg of Zocor.
        """,
        """
        Patient needs to take 50 mg of ibuprofen, and 2 mg of Coumadin.
        """
    ]

    poller = text_analytics_client.begin_analyze_healthcare_entities(documents)
    result = poller.result()

    docs = [doc for doc in result if not doc.is_error]

    print("Let's first visualize the outputted healthcare result:")
    for doc in docs:
        for entity in doc.entities:
            print(f"Entity: {entity.text}")
            print(f"...Normalized Text: {entity.normalized_text}")
            print(f"...Category: {entity.category}")
            print(f"...Subcategory: {entity.subcategory}")
            print(f"...Offset: {entity.offset}")
            print(f"...Confidence score: {entity.confidence_score}")
            if entity.data_sources is not None:
                print("...Data Sources:")
                for data_source in entity.data_sources:
                    print(f"......Entity ID: {data_source.entity_id}")
                    print(f"......Name: {data_source.name}")
            if entity.assertion is not None:
                print("...Assertion:")
                print(f"......Conditionality: {entity.assertion.conditionality}")
                print(f"......Certainty: {entity.assertion.certainty}")
                print(f"......Association: {entity.assertion.association}")
        for relation in doc.entity_relations:
            print(f"Relation of type: {relation.relation_type} has the following roles")
            for role in relation.roles:
                print(f"...Role '{role.name}' with entity '{role.entity.text}'")
        print("------------------------------------------")

    print("Now, let's get all of the medication dosage relations from the documents")
    dosage_of_medication_relations = [
        entity_relation
        for doc in docs
        for entity_relation in doc.entity_relations
        if entity_relation.relation_type == HealthcareEntityRelation.DOSAGE_OF_MEDICATION
    ]
    # [END analyze_healthcare_entities]

    print(
        "Now, I will create a dictionary of medication to total dosage. "
        "I will use a regex to extract the dosage amount. For simplicity's sake, I will assume "
        "all dosages are represented with numbers and use mg as the unit."
    )
    import re
    from collections import defaultdict

    medication_to_dosage: typing.Dict[str, int] = defaultdict(int)

    for relation in dosage_of_medication_relations:
        # The DosageOfMedication relation should only contain the dosage and medication roles

        try:
            dosage_role = next(role for role in relation.roles if role.name == "Dosage")
            medication_role = next(role for role in relation.roles if role.name == "Medication")
            # Take the first run of digits in the dosage text as the amount in mg.
            dosage_value = int(re.findall(r"\d+", dosage_role.entity.text)[0])
        except (StopIteration, IndexError):
            # Skip relations with a missing role or a dosage that contains no digits.
            continue
        medication_to_dosage[medication_role.entity.text] += dosage_value

    for medication, dosage in medication_to_dosage.items():
        print("We have fulfilled '{}' total mg of '{}'".format(
            dosage, medication
        ))


if __name__ == "__main__":
    sample_analyze_healthcare_entities()
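
The dosage aggregation step can be tried without calling the service. The sketch below applies the same regex-and-`defaultdict` idea to hand-written (dosage text, medication) pairs taken from the two sample documents. `total_dosage_mg` is a hypothetical helper, and like the sample it assumes every dosage is expressed in mg.

```python
import re
from collections import defaultdict


def total_dosage_mg(pairs):
    """Sum the first integer found in each dosage string, per medication."""
    totals = defaultdict(int)
    for dosage_text, medication in pairs:
        match = re.search(r"\d+", dosage_text)
        if match is None:
            continue  # No digits in the dosage text, e.g. "as needed".
        totals[medication] += int(match.group())
    return dict(totals)


# Hand-written pairs mirroring the two sample documents, not service output.
pairs = [
    ("100 mg", "ibuprofen"),
    ("3 mg", "potassium"),
    ("10 mg", "Zocor"),
    ("50 mg", "ibuprofen"),
    ("2 mg", "Coumadin"),
    ("as needed", "ibuprofen"),
]
totals = total_dosage_mg(pairs)
print(totals)
```

Using `re.search` and checking for `None` avoids indexing into an empty list when a dosage string has no digits.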

## Choosing the model version

In [None]:
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------

"""
FILE: sample_model_version.py

DESCRIPTION:
    This sample demonstrates how to set the model_version for pre-built Text Analytics models.
    Recognize entities is used in this sample, but the concept applies generally to all pre-built Text Analytics models.

    By default, model_version is set to "latest". This indicates that the latest generally available version
    of the model will be used. Model versions are date based, e.g. "2021-06-01".
    See the documentation for a list of all model versions:
    https://aka.ms/text-analytics-model-versioning

USAGE:
    python sample_model_version.py

    Set the environment variables with your own values before running the sample:
    1) AZURE_LANGUAGE_ENDPOINT - the endpoint to your Language resource.
    2) AZURE_LANGUAGE_KEY - your Language subscription key
"""


def sample_model_version() -> None:
    print("--------------Choosing model_version sample--------------")
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import TextAnalyticsClient, RecognizeEntitiesAction

    endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
    key = os.environ["AZURE_LANGUAGE_KEY"]

    text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    documents = [
        "I work for Foo Company, and we hired Contoso for our annual founding ceremony. The food \
        was amazing and we all can't say enough good words about the quality and the level of service."
    ]

    print("\nSetting model_version='latest' with recognize_entities")
    result = text_analytics_client.recognize_entities(documents, model_version="latest")
    result = [review for review in result if not review.is_error]

    print("...Results of Recognize Entities:")
    for review in result:
        for entity in review.entities:
            print(f"......Entity '{entity.text}' has category '{entity.category}'")

    print("\nSetting model_version='latest' with recognize entities action in begin_analyze_actions")
    poller = text_analytics_client.begin_analyze_actions(
        documents,
        actions=[
            RecognizeEntitiesAction(model_version="latest")
        ]
    )

    print("...Results of Recognize Entities Action:")
    document_results = poller.result()
    for action_results in document_results:
        action_result = action_results[0]
        if action_result.kind == "EntityRecognition":
            for entity in action_result.entities:
                print(f"......Entity '{entity.text}' has category '{entity.category}'")
        elif action_result.is_error is True:
            print("......Is an error with code '{}' and message '{}'".format(
                action_result.error.code, action_result.error.message
            ))


if __name__ == '__main__':
    sample_model_version()

In [None]:
# from azure.identity import DefaultAzureCredential
# from azure.ai.textanalytics import TextAnalyticsClient
# import os
# # Use DefaultAzureCredential for managed identity authentication
# credential = DefaultAzureCredential()
# endpoint = os.environ.get('LANGUAGE_ENDPOINT')
# # Create the Text Analytics client
# client = TextAnalyticsClient(endpoint=endpoint, credential=credential)


In [None]:
# # Authenticate the client using your key and endpoint 
# def authenticate_client():
#     ta_credential = AzureKeyCredential(key)
#     text_analytics_client = TextAnalyticsClient(
#             endpoint=endpoint, 
#             credential=ta_credential)
#     return text_analytics_client

# client = authenticate_client()

# # Example function for extracting information from healthcare-related text 
# def health_example(client):
#     documents = [
#         """
#         Patient needs to take 50 mg of ibuprofen.
#         """
#     ]

#     poller = client.begin_analyze_healthcare_entities(documents)
#     result = poller.result()

#     docs = [doc for doc in result if not doc.is_error]

#     for idx, doc in enumerate(docs):
#         for entity in doc.entities:
#             print("Entity: {}".format(entity.text))
#             print("...Normalized Text: {}".format(entity.normalized_text))
#             print("...Category: {}".format(entity.category))
#             print("...Subcategory: {}".format(entity.subcategory))
#             print("...Offset: {}".format(entity.offset))
#             print("...Confidence score: {}".format(entity.confidence_score))
#         for relation in doc.entity_relations:
#             print("Relation of type: {} has the following roles".format(relation.relation_type))
#             for role in relation.roles:
#                 print("...Role '{}' with entity '{}'".format(role.name, role.entity.text))
#         print("------------------------------------------")
# health_example(client)
