## Introduction

Amazon Comprehend provides text analysis for a given piece of content. In this workshop, we demonstrate some of its features and capabilities within the insurance domain.

Comprehend built-in features:
* Entity detection – extracts a list of entities, such as people, places, and locations, identified in a document.
* Key phrase detection – extracts key phrases that appear in a document.
* PII detection – analyzes a document and detects personal data that could be used to identify an individual, such as an address, bank account number, or phone number.
* Language detection – determines the dominant language in a document.
* Sentiment analysis – determines the emotional sentiment of a document.
* Syntax processing – parses each word in your document and determines the part of speech for the word. For example, in the sentence "It is raining today in Seattle," "it" is identified as a pronoun, "raining" is identified as a verb, and "Seattle" is identified as a proper noun.
* Custom classifier – you can also build your own classifier by providing documents and their matching classifications.
* Topic modeling – organizes a large corpus of documents into topics or clusters that are similar based on the frequency of words within them.

Comprehend supports both synchronous and batch API calls, so it can be used for large-scale processing as well as real-time results.
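As a rough sketch of the batch path, the helper below splits a list of documents into groups and sends each group through `batch_detect_entities`. It assumes a limit of 25 documents per request, and it takes the client as a parameter so it can be tried with a stub instead of a live AWS session. The names `chunked` and `detect_entities_in_batches` are illustrative and are not part of boto3.

```python
from typing import Any, Dict, Iterator, List, Sequence

# Assumption: BatchDetectEntities accepts at most 25 documents per request.
MAX_BATCH_SIZE = 25


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def detect_entities_in_batches(client: Any, texts: Sequence[str],
                               language_code: str = "en") -> List[List[Dict[str, Any]]]:
    """Return one entity list per input text, in input order.

    `client` is expected to behave like boto3.client('comprehend').
    Documents that the service reports in ErrorList keep an empty list.
    """
    results: List[List[Dict[str, Any]]] = [[] for _ in texts]
    offset = 0
    for batch in chunked(texts, MAX_BATCH_SIZE):
        response = client.batch_detect_entities(
            TextList=list(batch), LanguageCode=language_code
        )
        for item in response.get("ResultList", []):
            # Index is relative to the batch, so shift it by the batch offset.
            results[offset + item["Index"]] = item["Entities"]
        offset += len(batch)
    return results
```

With a real client, the call would look like `detect_entities_in_batches(boto3.client('comprehend'), claim_texts)`, where `claim_texts` is a hypothetical list of claim descriptions.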


## Extracting Insurance claims entities

In this example, we show how to integrate Amazon Comprehend into model building as part of feature engineering or feature extraction.
Comprehend offers built-in entity extraction, and we can use the results for further analysis.



In [9]:
import boto3
import json

example1 = "Hi, My name is Sam Ford and i'm filing a claim for car accident. my policy number is 4561-YQT"

client = boto3.client('comprehend')

response = client.detect_entities(
    Text=example1,
    LanguageCode='en'
)
print("Example 1 response " , json.dumps(response['Entities'], indent=4))


Example 1 response  [
    {
        "Score": 0.9992227554321289,
        "Type": "PERSON",
        "Text": "Sam Ford",
        "BeginOffset": 15,
        "EndOffset": 23
    },
    {
        "Score": 0.9835370182991028,
        "Type": "OTHER",
        "Text": "4561-YQT",
        "BeginOffset": 85,
        "EndOffset": 93
    }
]


You can see that Comprehend detected the person's name as Sam Ford and the policy number as an entity of type "OTHER".
* We can build a custom Comprehend model and train it to recognize specific numbers as policy numbers, so that the model returns "Policy" instead of "OTHER" for a given policy number.
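To use the detected entities as model features, one option is to group them by type and drop low-confidence detections. The sketch below works on the `Entities` list from the Example 1 response; the helper name `entities_to_features` and the 0.8 threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def entities_to_features(entities: List[Dict[str, Any]],
                         min_score: float = 0.8) -> Dict[str, List[str]]:
    """Group entity texts by type, dropping detections below `min_score`."""
    features: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        if entity["Score"] >= min_score:
            features[entity["Type"]].append(entity["Text"])
    return dict(features)


# Entities copied from the Example 1 response.
example1_entities = [
    {"Score": 0.9992227554321289, "Type": "PERSON", "Text": "Sam Ford",
     "BeginOffset": 15, "EndOffset": 23},
    {"Score": 0.9835370182991028, "Type": "OTHER", "Text": "4561-YQT",
     "BeginOffset": 85, "EndOffset": 93},
]

print(entities_to_features(example1_entities))
# {'PERSON': ['Sam Ford'], 'OTHER': ['4561-YQT']}
```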

Let's look at another example.

In [10]:
example2 = "A turbine generator of model KG43 supplying power to a hospital failed when blades broke and penetrated the engine. Total Loss: $292,513"
response = client.detect_entities(
    Text=example2,
    LanguageCode='en'
)
print("Example 2 response " , json.dumps(response['Entities'], indent=4))


Example 2 response  [
    {
        "Score": 0.9821133017539978,
        "Type": "COMMERCIAL_ITEM",
        "Text": "KG43",
        "BeginOffset": 29,
        "EndOffset": 33
    },
    {
        "Score": 0.9957295656204224,
        "Type": "QUANTITY",
        "Text": "$292,513",
        "BeginOffset": 128,
        "EndOffset": 136
    }
]


In this case, you can see that Comprehend detected the machine model involved in the failure and automatically classified it as "COMMERCIAL_ITEM". Comprehend also detected the loss amount of $292,513 and classified it as "QUANTITY".
* As mentioned above, we can build a custom model to label the type of amount, e.g. "Loss Amount" or "Claim Amount".
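For feature engineering, the text of a "QUANTITY" entity usually needs to become a number. The following is a minimal sketch that assumes US-style dollar formatting, as in "$292,513"; the helper name `parse_amount` is hypothetical, and other currencies or formats would need their own handling.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(quantity_text: str) -> Optional[Decimal]:
    """Convert a QUANTITY text such as '$292,513' to a Decimal, or None."""
    # Assumption: '$' prefix and ',' as thousands separator.
    cleaned = quantity_text.strip().lstrip("$").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Not every QUANTITY entity is a monetary amount.
        return None


print(parse_amount("$292,513"))       # 292513
print(parse_amount("three engines"))  # None
```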


## Detecting PII Data
Let's see how Comprehend can help detect personally identifiable information (PII), such as bank account and routing numbers.

In [11]:
example3 = "Please submit the claim funds to bank account: 100115449863 routing number: 012334851"
response = client.detect_pii_entities(
    Text=example3,
    LanguageCode='en'
)
print("Example 3 response " , json.dumps(response['Entities'], indent=4))


Example 3 response  [
    {
        "Score": 0.9999892711639404,
        "Type": "BANK_ACCOUNT_NUMBER",
        "BeginOffset": 43,
        "EndOffset": 55
    },
    {
        "Score": 0.9998562335968018,
        "Type": "BANK_ROUTING",
        "BeginOffset": 72,
        "EndOffset": 81
    }
]
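The PII response carries offsets and types but no `Text` field, so any downstream use has to slice the original string. As an illustrative sketch, the helper below masks each detected span with its type; `redact_pii` and the 0.5 threshold are assumptions, and the sample sentence and its offsets are constructed locally rather than taken from a service response.

```python
from typing import Any, Dict, List


def redact_pii(text: str, pii_entities: List[Dict[str, Any]],
               min_score: float = 0.5) -> str:
    """Replace each detected PII span with its type in square brackets."""
    redacted = text
    # Work from the end of the string so earlier offsets stay valid.
    ordered = sorted(pii_entities, key=lambda e: e["BeginOffset"], reverse=True)
    for entity in ordered:
        if entity["Score"] < min_score:
            continue
        start, end = entity["BeginOffset"], entity["EndOffset"]
        redacted = redacted[:start] + "[" + entity["Type"] + "]" + redacted[end:]
    return redacted


# Hand-built sample: offsets are computed here, not returned by Comprehend.
sample = "Send funds to account 100115449863"
start = sample.index("100115449863")
sample_entities = [
    {"Score": 0.99, "Type": "BANK_ACCOUNT_NUMBER",
     "BeginOffset": start, "EndOffset": start + len("100115449863")},
]

print(redact_pii(sample, sample_entities))
# Send funds to account [BANK_ACCOUNT_NUMBER]
```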


## Detecting Medical information
We will now look at Amazon Comprehend Medical, a dedicated service for medical information. It provides pre-trained text analysis for medical terms, including medications, medical forms, drugs, diagnoses, and more.

In this example, we use Comprehend Medical to analyze an insurance claim that includes a medical diagnosis.


In [17]:
client = boto3.client('comprehendmedical')

example4 = "Dear Mr. Adams,  after conducting several tests, I beleive my patient suffer from a Chronic Hypertension. As per this diagnosis, please approve claims P11023 and P42011. Thanks, Dr. Gegings" 
response = client.detect_entities(
    Text=example4
)

print("Example 4 response " , json.dumps(response['Entities'], indent=4))

# entities = []
# for entity in response['Entities']:
#    if entity['Score']>0.8 and entity['Type']=="ORGANIZATION":
#       entities.append(entity['Text'])
# print(entities)

Example 4 response  [
    {
        "Id": 2,
        "BeginOffset": 9,
        "EndOffset": 14,
        "Score": 0.9998182654380798,
        "Text": "Adams",
        "Category": "PROTECTED_HEALTH_INFORMATION",
        "Type": "NAME",
        "Traits": []
    },
    {
        "Id": 0,
        "BeginOffset": 84,
        "EndOffset": 91,
        "Score": 0.9535625576972961,
        "Text": "Chronic",
        "Category": "MEDICAL_CONDITION",
        "Type": "ACUITY",
        "Traits": []
    },
    {
        "Id": 1,
        "BeginOffset": 92,
        "EndOffset": 104,
        "Score": 0.9996131062507629,
        "Text": "Hypertension",
        "Category": "MEDICAL_CONDITION",
        "Type": "DX_NAME",
        "Traits": [
            {
                "Name": "DIAGNOSIS",
                "Score": 0.9769366383552551
            }
        ],
        "Attributes": [
            {
                "Type": "ACUITY",
                "Score": 0.9535625576972961,
                "RelationshipScore

You can see that Comprehend Medical detects the medical condition as Hypertension and even detects its acuity (Chronic). Moreover, the claim IDs are classified as protected health information, along with the name (Adams).
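To pull just the diagnoses out of a Comprehend Medical response, one approach is to keep entities in the "MEDICAL_CONDITION" category that carry a "DIAGNOSIS" trait. The sketch below uses a trimmed copy of the Example 4 entities; the helper name `diagnosed_conditions` and the 0.8 threshold are assumptions.

```python
from typing import Any, Dict, List


def diagnosed_conditions(entities: List[Dict[str, Any]],
                         min_score: float = 0.8) -> List[str]:
    """Return texts of MEDICAL_CONDITION entities that have a DIAGNOSIS trait."""
    conditions = []
    for entity in entities:
        if entity.get("Category") != "MEDICAL_CONDITION":
            continue
        if entity.get("Score", 0.0) < min_score:
            continue
        trait_names = {trait["Name"] for trait in entity.get("Traits", [])}
        if "DIAGNOSIS" in trait_names:
            conditions.append(entity["Text"])
    return conditions


# Trimmed copy of the first three entities in the Example 4 response.
example4_entities = [
    {"Text": "Adams", "Category": "PROTECTED_HEALTH_INFORMATION",
     "Type": "NAME", "Score": 0.9998182654380798, "Traits": []},
    {"Text": "Chronic", "Category": "MEDICAL_CONDITION",
     "Type": "ACUITY", "Score": 0.9535625576972961, "Traits": []},
    {"Text": "Hypertension", "Category": "MEDICAL_CONDITION",
     "Type": "DX_NAME", "Score": 0.9996131062507629,
     "Traits": [{"Name": "DIAGNOSIS", "Score": 0.9769366383552551}]},
]

print(diagnosed_conditions(example4_entities))
# ['Hypertension']
```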