# Detecting the Dominant Language in a Text Document Using Amazon Comprehend

In [1]:
import boto3
import json
from pprint import pprint as pp

In [2]:
comprehend = boto3.client(service_name='comprehend')

In [3]:
string1 = 'Machine Learning is fascinating.'
string2 = 'शामिल होने के लिए धन्यवाद'

In [4]:
comprehend.detect_dominant_language(Text = string1)

{'Languages': [{'LanguageCode': 'en', 'Score': 0.9855319857597351}],
 'ResponseMetadata': {'RequestId': 'a040f564-e363-4cba-b22b-22d3e01083f9',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'a040f564-e363-4cba-b22b-22d3e01083f9',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '64',
   'date': 'Tue, 16 Mar 2021 08:13:22 GMT'},
  'RetryAttempts': 0}}
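
The response above is a plain dictionary, so the detected language can be pulled out with ordinary Python. The helper below is a minimal sketch rather than part of the Comprehend API: the name `top_language` and the `min_score` cut-off are assumptions made for illustration, and the sample response is trimmed from the output above.

```python
def top_language(response, min_score=0.5):
    """Return (language_code, score) for the highest-scoring language, or None."""
    languages = response.get("Languages", [])
    if not languages:
        return None
    best = max(languages, key=lambda lang: lang["Score"])
    # min_score is an illustrative threshold, not a Comprehend default.
    if best["Score"] < min_score:
        return None
    return best["LanguageCode"], best["Score"]


# Response trimmed from the output above (ResponseMetadata omitted).
sample_response = {"Languages": [{"LanguageCode": "en", "Score": 0.9855319857597351}]}
print(top_language(sample_response))
```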

In [5]:
print('Calling DetectDominantLanguage')

print("*"*100)
print('string1 result:')
print("*"*100)

response = comprehend.detect_dominant_language(Text = string1)

pp(response)

print("*"*100)
print('string2 result:')
print("*"*100)

response = comprehend.detect_dominant_language(Text = string2)

pp(response)

print('End of DetectDominantLanguage\n')

Calling DetectDominantLanguage
****************************************************************************************************
string1 result:
****************************************************************************************************
{'Languages': [{'LanguageCode': 'en', 'Score': 0.9855319857597351}],
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '64',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Tue, 16 Mar 2021 08:13:23 GMT',
                                      'x-amzn-requestid': '27f24c7b-bfe6-4de2-a226-47f2e5837dca'},
                      'HTTPStatusCode': 200,
                      'RequestId': '27f24c7b-bfe6-4de2-a226-47f2e5837dca',
                      'RetryAttempts': 0}}
****************************************************************************************************
string2 result:
******************************************************************************

# Extracting Information from a Document

### `detect_entities`

In [6]:
comprehend = boto3.client(service_name='comprehend')

In [7]:
my_string = "I study Machine Learning with AWS, and I am from India, \
                  and we are excited about the HackNIT which is scheduled in March 2021"

In [8]:
response = comprehend.detect_entities(Text = my_string, LanguageCode='en')

pp(response)

{'Entities': [{'BeginOffset': 30,
               'EndOffset': 33,
               'Score': 0.9771283864974976,
               'Text': 'AWS',
               'Type': 'ORGANIZATION'},
              {'BeginOffset': 49,
               'EndOffset': 54,
               'Score': 0.9970868229866028,
               'Text': 'India',
               'Type': 'LOCATION'},
              {'BeginOffset': 103,
               'EndOffset': 110,
               'Score': 0.3540899455547333,
               'Text': 'HackNIT',
               'Type': 'EVENT'},
              {'BeginOffset': 133,
               'EndOffset': 143,
               'Score': 0.9991873502731323,
               'Text': 'March 2021',
               'Type': 'DATE'}],
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '396',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Tue, 16 Mar 2021 08:13:23 GMT',
                                      'x-amzn-requestid
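
Each entity carries a `Score`, and in the output above `HackNIT` scores far lower than the other three. One way to act on that is to keep only the confident entities. The sketch below assumes a hypothetical helper `confident_entities` and an arbitrary threshold of 0.9; the sample data is copied from the output above with offsets and metadata omitted.

```python
def confident_entities(response, min_score=0.9):
    """Keep only (text, type) pairs whose Score is at least min_score."""
    return [
        (entity["Text"], entity["Type"])
        for entity in response.get("Entities", [])
        if entity["Score"] >= min_score
    ]


# Entities copied from the output above (offsets and metadata omitted).
sample_response = {
    "Entities": [
        {"Text": "AWS", "Type": "ORGANIZATION", "Score": 0.9771283864974976},
        {"Text": "India", "Type": "LOCATION", "Score": 0.9970868229866028},
        {"Text": "HackNIT", "Type": "EVENT", "Score": 0.3540899455547333},
        {"Text": "March 2021", "Type": "DATE", "Score": 0.9991873502731323},
    ]
}
print(confident_entities(sample_response))
```

With this threshold, `HackNIT` is dropped and the other three entities are kept. The right threshold depends on the application.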

### `detect_sentiment`

In [9]:
comprehend = boto3.client(service_name='comprehend')

In [10]:
my_string = "You know what, you are such a pathetic presenter, I have no clue who requested you to come here"

In [11]:
response = comprehend.detect_sentiment(Text = my_string, LanguageCode='en')

In [12]:
pp(response)

{'ResponseMetadata': {'HTTPHeaders': {'content-length': '166',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Tue, 16 Mar 2021 08:13:25 GMT',
                                      'x-amzn-requestid': '3489b567-4c99-4336-bb02-a5606a347a79'},
                      'HTTPStatusCode': 200,
                      'RequestId': '3489b567-4c99-4336-bb02-a5606a347a79',
                      'RetryAttempts': 0},
 'Sentiment': 'NEGATIVE',
 'SentimentScore': {'Mixed': 0.00035584188299253583,
                    'Negative': 0.9849704504013062,
                    'Neutral': 0.013673778623342514,
                    'Positive': 0.0009999849135056138}}


In [13]:
my_string = "Hey, I was kidding, I have not seen any presenter like you, since my birth"
response = comprehend.detect_sentiment(Text = my_string, LanguageCode='en')

In [14]:
pp(response)

{'ResponseMetadata': {'HTTPHeaders': {'content-length': '162',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Tue, 16 Mar 2021 08:13:25 GMT',
                                      'x-amzn-requestid': 'ac00359a-6766-497a-a29c-a5df2d2145f7'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'ac00359a-6766-497a-a29c-a5df2d2145f7',
                      'RetryAttempts': 0},
 'Sentiment': 'NEUTRAL',
 'SentimentScore': {'Mixed': 0.0013401777250692248,
                    'Negative': 0.3932669460773468,
                    'Neutral': 0.5333490967750549,
                    'Positive': 0.07204371690750122}}


In [15]:
my_string = "Hey, I was kidding, I have not seen any presenter like you, since my birth, you are awesome"
response = comprehend.detect_sentiment(Text = my_string, LanguageCode='en')

In [16]:
pp(response)

{'ResponseMetadata': {'HTTPHeaders': {'content-length': '166',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Tue, 16 Mar 2021 08:13:25 GMT',
                                      'x-amzn-requestid': '43320edd-861a-4f61-a072-faea68fe8a8a'},
                      'HTTPStatusCode': 200,
                      'RequestId': '43320edd-861a-4f61-a072-faea68fe8a8a',
                      'RetryAttempts': 0},
 'Sentiment': 'POSITIVE',
 'SentimentScore': {'Mixed': 0.0020419417414814234,
                    'Negative': 0.011707475408911705,
                    'Neutral': 0.030218344181776047,
                    'Positive': 0.9560322165489197}}
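
The three responses show that the winning label can come with very different confidence: `NEGATIVE` and `POSITIVE` score above 0.95, while `NEUTRAL` wins with only about 0.53. The sketch below reads the score that belongs to the winning label and flags weak results. The function name `summarize_sentiment` and the 0.8 cut-off are assumptions for illustration only.

```python
def summarize_sentiment(response, min_confidence=0.8):
    """Return (label, score, is_confident) for a detect_sentiment response."""
    label = response["Sentiment"]
    # Labels are upper case ("NEUTRAL"); SentimentScore keys are capitalized ("Neutral").
    score = response["SentimentScore"][label.capitalize()]
    return label, score, score >= min_confidence


# Second response from above (ResponseMetadata omitted).
neutral_response = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Mixed": 0.0013401777250692248,
        "Negative": 0.3932669460773468,
        "Neutral": 0.5333490967750549,
        "Positive": 0.07204371690750122,
    },
}
print(summarize_sentiment(neutral_response))
```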


# Setting Up a Lambda Function and Analyzing Imported Text Using Comprehend

## Integrating Comprehend and AWS Lambda for responsive NLP

#### What Is AWS Lambda?

- AWS Lambda is a compute service that runs code without requiring you to provision or manage servers.
- AWS Lambda executes code only when needed and scales automatically.
- AWS Lambda runs your code on high-availability compute infrastructure and handles the administration of that infrastructure, including:
    - server and operating system maintenance
    - capacity provisioning and automatic scaling
    - code monitoring and logging
- Overall, AWS Lambda is designed for short, simple, modular pieces of code that you can tie together into a larger processing infrastructure.

#### Let's examine the structure of a Lambda function

- When you create a function (for example, `s3_trigger`), AWS creates a folder with the same name, containing a Python file named `lambda_function.py`. This file contains a stub for the `lambda_handler` function, which is the entry point of the Lambda function.

- The entry point takes two arguments:
    
    - The `event` argument provides the value of the payload, which is sent to the function from the calling process. 

    - The `context` argument is of the type LambdaContext and contains runtime information. 
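
To make the two arguments concrete, here is a minimal sketch of a handler. In a real function this would live in `lambda_function.py`. The return value and the local test call at the bottom are illustrative assumptions; the stand-in context built with `SimpleNamespace` only mimics the two attributes used here.

```python
import json
from types import SimpleNamespace


def lambda_handler(event, context):
    """Entry point: Lambda calls this with the event payload and a context object."""
    # `event` is typically a dict built from the JSON payload sent by the caller.
    print("Received event:", json.dumps(event))
    # `context` exposes runtime information such as the function name and request ID.
    print("Function name:", context.function_name)
    print("Request ID:", context.aws_request_id)
    return {"statusCode": 200, "body": json.dumps({"received_keys": sorted(event)})}


# Local smoke test with a stand-in context object (not a real LambdaContext).
fake_context = SimpleNamespace(function_name="s3_trigger", aws_request_id="local-test")
result = lambda_handler({"message": "hello"}, fake_context)
```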

#### What are we going to do?

We will combine:
- AWS Lambda
- Amazon S3
- Amazon Comprehend

to automatically perform document analysis when a text document is uploaded to S3.
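
As a rough preview of where this is heading, the sketch below shows one possible shape for such a handler. It is not the finished function. The helper names `analyze_text` and `handle_s3_event` are hypothetical, and the code assumes the uploaded object is UTF-8 text small enough for a single Comprehend request. Comprehend supports entity and sentiment detection for only a subset of the languages it can detect, so a production version would need to handle unsupported languages.

```python
import urllib.parse


def analyze_text(comprehend_client, text):
    """Detect the dominant language, then run entity and sentiment detection."""
    languages = comprehend_client.detect_dominant_language(Text=text)["Languages"]
    language_code = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]
    entities = comprehend_client.detect_entities(Text=text, LanguageCode=language_code)
    sentiment = comprehend_client.detect_sentiment(Text=text, LanguageCode=language_code)
    return {
        "language": language_code,
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
        "sentiment": sentiment["Sentiment"],
    }


def handle_s3_event(event, s3_client, comprehend_client):
    """Analyze every object referenced in an S3 event notification."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        results.append({"key": key, **analyze_text(comprehend_client, body.decode("utf-8"))})
    return results


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be exercised without AWS access

    return handle_s3_event(event, boto3.client("s3"), boto3.client("comprehend"))
```

Passing the clients in as arguments keeps the S3 and Comprehend logic testable with stand-in objects before anything is deployed.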

`Step 1`:
    Create an S3 bucket (`aws-ml-hacknit-demo`)

`Step 2`:
    Create a Lambda function

`Step 3`:
    Add a role so that Lambda can access Comprehend
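
For Step 3, the role needs permissions for the Comprehend calls used earlier and, if the function reads the uploaded file, for fetching objects from the bucket. The policy below is a hypothetical narrowly scoped example, written as a Python dictionary so it can be printed as JSON and pasted into the IAM console. It is an assumption about what the role could contain, not the exact policy this walkthrough attaches. Permissions for writing logs are typically granted separately, for example through the basic Lambda execution role.

```python
import json

BUCKET_NAME = "aws-ml-hacknit-demo"  # bucket name from Step 1

# Hypothetical policy for the Lambda execution role (an illustrative assumption).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Allow only the three Comprehend operations used in this notebook.
            "Effect": "Allow",
            "Action": [
                "comprehend:DetectDominantLanguage",
                "comprehend:DetectEntities",
                "comprehend:DetectSentiment",
            ],
            "Resource": "*",
        },
        {
            # Allow reading objects from the demo bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```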