# Amazon Comprehend API Tutorial
## Step 1 - Setting Up
This tutorial shows how to use the Amazon Comprehend API. Comprehend is designed to generate insights for business products, but you can try its features through either the console or the API. It is a good fit if you are interested in applications of machine learning and natural language processing, even if you are not a business owner.

First, make sure you have an AWS account and a user with access to the Comprehend service. You will also need the AWS CLI (Command Line Interface). In this blog, we run everything from AWS SageMaker.

To begin using AWS SageMaker, create a notebook instance, open it in Jupyter, and create a new notebook with the conda_python3 kernel. Because we are using AWS SageMaker, the AWS CLI should come pre-installed. To check this, run the command below:

In [2]:
!aws s3 help

S3()                                                                      S3()

NAME
       s3 -

DESCRIPTION
       This  section  explains  prominent concepts and notations in the set of
       high-level S3 commands provided.

       If you are looking for the low level S3 commands for  the  CLI,  please
       see the s3api command reference page.

   Path Argument Type
       Whenever using a command, at least one path argument must be specified.
       There are two types of path arguments: LocalPath and S3Uri.

       LocalPath: represents the path of a local file or directory.  It can be
       written as an absolute path or relative path.

       S3Uri: represents the location of a S3 object, prefix, or bucket.  This
       must be written in the form s3://mybucket/mykey where mybucket  is  the
       specified  S3 bucket, mykey is the spec
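
The same check can be done from a Python cell, which is handy if you want the notebook to stop early when the CLI is missing. This is only a minimal sketch: the helper name `cli_version` is my own, and it relies on nothing beyond the standard library and the `aws --version` flag.

```python
import shutil
import subprocess


def cli_version(executable="aws"):
    """Return the CLI's version line, or None if the executable is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # `aws --version` prints a single version line and exits.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some older CLI builds wrote the version to stderr, so check both streams.
    return (result.stdout or result.stderr).strip()


print(cli_version() or "AWS CLI not found on PATH")
```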

## Step 2 - Checking AWS S3 Buckets
The next step is to interact with S3 buckets using the AWS CLI. An introduction to S3 can be found [here](https://aws.amazon.com/s3/). Run the following command to list your S3 buckets:

In [3]:
!aws s3 ls

2023-03-28 23:19:03 myaswabcdaksldjdoidasjdk
2023-03-26 22:04:01 myawsbucket-groupproject
2023-02-22 18:14:52 myawsbucketfordaylight


Next, upload your .txt test files to the S3 bucket you want to use; in this case, I used myawsbucket-groupproject. A simple way to upload files is to open the S3 console, select the bucket, and choose "Upload". From there, you can select the files and transfer them to the bucket.
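
If you would rather upload from the notebook than from the console, the step above could be sketched with boto3 as follows. The helper name `upload_text_files` is hypothetical; the `TestText/` prefix is taken from the S3 path used later in this tutorial. The client is passed in as an argument so the function can be tried with a stand-in object before touching a real bucket.

```python
from pathlib import Path


def upload_text_files(s3_client, local_dir, bucket, prefix="TestText/"):
    """Upload every .txt file in local_dir to s3://bucket/prefix and return the URIs."""
    uploaded = []
    for path in sorted(Path(local_dir).glob("*.txt")):
        key = f"{prefix}{path.name}"
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client method.
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(f"s3://{bucket}/{key}")
    return uploaded


# Usage inside the notebook (needs AWS credentials, so it is left commented out):
# import boto3
# upload_text_files(boto3.client("s3"), ".", "myawsbucket-groupproject")
```

The returned URIs are the values you would later pass as the input location of an asynchronous job.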

## Step 3 - Using AWS Comprehend to Get Insights from the Text
AWS Comprehend supports both synchronous and asynchronous analysis of text. This tutorial mainly uses synchronous analysis for demonstration, but asynchronous analysis of named entities is also illustrated to show the interaction between AWS Comprehend and S3 buckets. AWS Comprehend provides several NLP services for analyzing text; three are covered here: Named Entity Recognition, Sentiment Analysis, and Targeted Sentiment Analysis.

### Named Entity Recognition
This NLP service detects named entities (references to real-world objects, such as a location) within one or more sentences. A list of all possible entity types from AWS Comprehend is included below. <img src = 'https://drive.google.com/uc?export=view&id=15OUwjtNiVwpjJ_zWMl3vu2DCknJ7xTox' width = 550>

#### Synchronous Method
To obtain named entities from text synchronously, run the command below. The required parameters are the region, the language code of the text, and the text itself. As an example, I analyzed "The weather in Atlanta is nice today". Comprehend returned two named entities in a JSON response: "Atlanta" as a location and "today" as a date. Each entity also carries a "BeginOffset" and an "EndOffset", which mark its position in the text, and a "Score", the service's confidence in the recognition.

In [6]:
!aws comprehend detect-entities \
    --region us-east-1 \
    --language-code "en" \
    --text "The weather in Atlanta is nice today"

{
    "Entities": [
        {
            "Score": 0.9985149502754211,
            "Type": "LOCATION",
            "Text": "Atlanta",
            "BeginOffset": 15,
            "EndOffset": 22
        },
        {
            "Score": 0.9952268600463867,
            "Type": "DATE",
            "Text": "today",
            "BeginOffset": 31,
            "EndOffset": 36
        }
    ]
}
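
To see how the offsets relate to the input, the response above can be loaded in Python and each entity sliced back out of the original sentence. This is a small sketch with a helper name of my own (`summarize_entities`); the response dictionary is copied from the output above. For this example, `EndOffset` behaves as an exclusive index, so a plain Python slice recovers the entity text.

```python
text = "The weather in Atlanta is nice today"

# Response copied from the detect-entities output above.
response = {
    "Entities": [
        {"Score": 0.9985149502754211, "Type": "LOCATION", "Text": "Atlanta",
         "BeginOffset": 15, "EndOffset": 22},
        {"Score": 0.9952268600463867, "Type": "DATE", "Text": "today",
         "BeginOffset": 31, "EndOffset": 36},
    ]
}


def summarize_entities(text, response):
    """Return (type, text span recovered from the offsets, rounded score) per entity."""
    rows = []
    for entity in response["Entities"]:
        span = text[entity["BeginOffset"]:entity["EndOffset"]]
        rows.append((entity["Type"], span, round(entity["Score"], 3)))
    return rows


for row in summarize_entities(text, response):
    print(row)
```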


#### Asynchronous Method
To obtain named entities from text asynchronously, run the command below. The required parameters are the S3 URI of the input text, the output location, the ARN of an IAM role with access to the data and AWS Comprehend, a user-defined name for the job, the region, and the language code of the text.

If the request to start the job is successful, you will receive the response shown after the command:

In [14]:
!aws comprehend start-entities-detection-job \
--input-data-config S3Uri=s3://myawsbucket-groupproject/TestText/Entity_Recog_test.txt,InputFormat=ONE_DOC_PER_LINE \
--output-data-config S3Uri=s3://myawsbucket-groupproject/ \
--data-access-role-arn arn:aws:iam::139228718159:role/LabRole \
--job-name entity_detection_test \
--region us-east-1 \
--language-code 'en'

{
    "JobId": "1419b6498a3de8563a8ef6a31cdcc327",
    "JobArn": "arn:aws:comprehend:us-east-1:139228718159:entities-detection-job/1419b6498a3de8563a8ef6a31cdcc327",
    "JobStatus": "SUBMITTED"
}
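
The same request can be assembled in Python, which makes the parameter list above easier to reuse. The sketch below only builds the keyword arguments; `build_entities_job_request` is a hypothetical helper, and the commented-out lines show how the result might be passed to boto3's `start_entities_detection_job`, assuming credentials are configured.

```python
def build_entities_job_request(input_uri, output_uri, role_arn, job_name,
                               language_code="en"):
    """Assemble keyword arguments for an asynchronous entity detection job."""
    if not input_uri.startswith("s3://") or not output_uri.startswith("s3://"):
        raise ValueError("input and output locations must be s3:// URIs")
    return {
        "InputDataConfig": {"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        "OutputDataConfig": {"S3Uri": output_uri},
        "DataAccessRoleArn": role_arn,
        "JobName": job_name,
        "LanguageCode": language_code,
    }


request = build_entities_job_request(
    "s3://myawsbucket-groupproject/TestText/Entity_Recog_test.txt",
    "s3://myawsbucket-groupproject/",
    "arn:aws:iam::139228718159:role/LabRole",
    "entity_detection_test",
)
print(request)

# Submitting the job (needs AWS credentials, so it is left commented out):
# import boto3
# comprehend = boto3.client("comprehend", region_name="us-east-1")
# job = comprehend.start_entities_detection_job(**request)
# print(job["JobId"], job["JobStatus"])
```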


You can check the status of the job with the following command, specifying the region and filtering on the job name you defined in the previous step.

In [21]:
!aws comprehend list-entities-detection-jobs \
--filter "JobName = entity_detection_test" \
--region us-east-1


{
    "EntitiesDetectionJobPropertiesList": [
        {
            "JobId": "1419b6498a3de8563a8ef6a31cdcc327",
            "JobArn": "arn:aws:comprehend:us-east-1:139228718159:entities-detection-job/1419b6498a3de8563a8ef6a31cdcc327",
            "JobName": "entity_detection_test",
            "JobStatus": "IN_PROGRESS",
            "SubmitTime": 1681856340.227,
            "InputDataConfig": {
                "S3Uri": "s3://myawsbucket-groupproject/TestText/Entity_Recog_test.txt",
                "InputFormat": "ONE_DOC_PER_LINE"
            },
            "OutputDataConfig": {
                "S3Uri": "s3://myawsbucket-groupproject/139228718159-NER-1419b6498a3de8563a8ef6a31cdcc327/output/output.tar.gz"
            },
            "LanguageCode": "en",
            "DataAccessRoleArn": "arn:aws:iam::139228718159:role/LabRole"
        }
    ]
}
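
Rather than re-running the status command by hand, you could poll until the job finishes. The following is a sketch, not a definitive implementation: `wait_for_entities_job` is a name I made up, the polling interval and limit are arbitrary, and it assumes a boto3 Comprehend client whose `describe_entities_detection_job` call returns the job properties under `EntitiesDetectionJobProperties`.

```python
import time

# Statuses after which the job will not change any more.
TERMINAL_STATUSES = ("COMPLETED", "FAILED", "STOPPED")


def wait_for_entities_job(comprehend_client, job_id, poll_seconds=30,
                          max_polls=60, sleep=time.sleep):
    """Poll the job until it reaches a terminal status and return its properties."""
    for _ in range(max_polls):
        response = comprehend_client.describe_entities_detection_job(JobId=job_id)
        props = response["EntitiesDetectionJobProperties"]
        if props["JobStatus"] in TERMINAL_STATUSES:
            return props
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# comprehend = boto3.client("comprehend", region_name="us-east-1")
# props = wait_for_entities_job(comprehend, "1419b6498a3de8563a8ef6a31cdcc327")
# print(props["JobStatus"], props["OutputDataConfig"]["S3Uri"])
```

The `sleep` parameter exists only so the loop can be exercised without real waiting.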


Once the JobStatus becomes "COMPLETED", you can download the output to your SageMaker instance with a command of the following form:
!aws s3 cp "OUTPUT_DATA_DIRECTORY" "NAME_OF_THE_FILE"
Replace "OUTPUT_DATA_DIRECTORY" with the S3 URI shown under OutputDataConfig in the previous output, and "NAME_OF_THE_FILE" with the local file name you want for the downloaded output.

In [None]:
!aws s3 cp s3://myawsbucket-groupproject/139228718159-NER-1419b6498a3de8563a8ef6a31cdcc327/output/output.tar.gz entity_output.tar.gz
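
Once the archive is on the notebook instance, it can be unpacked and read with the standard library. This sketch assumes that the archive holds one or more text files with one JSON object per line, which is how I understand Comprehend writes results for ONE_DOC_PER_LINE input; check the contents of your own archive before relying on it. `read_job_output` is a hypothetical helper name.

```python
import json
import tarfile


def read_job_output(archive_path):
    """Return one parsed JSON record per non-empty line found in the archive."""
    records = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            handle = archive.extractfile(member)
            # Assumption: each file in the archive is UTF-8 text in JSON Lines form.
            for line in handle.read().decode("utf-8").splitlines():
                if line.strip():
                    records.append(json.loads(line))
    return records


# Usage with the file downloaded above (left commented out because it needs the file):
# for record in read_job_output("entity_output.tar.gz"):
#     print(record)
```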

### Sentiment Analysis
This service returns the sentiment of the input text, scored across four labels: positive, negative, neutral, and mixed. Each label is assigned a confidence score, which is Amazon Comprehend's estimate of how likely that sentiment is to be the dominant one.

To get the sentiment of a text, run the following command. detect-sentiment takes three arguments: region, language-code, and the input text. The output gives the overall sentiment and the confidence score of each label. For instance, the service labels the sentence "We regret to inform that we are unable to invite you to our program this year." as negative with 92% confidence.

In [23]:
!aws comprehend detect-sentiment \
    --region us-east-1 \
    --language-code "en" \
    --text "We regret to inform that we are unable to invite you to our program this year."


{
    "Sentiment": "NEGATIVE",
    "SentimentScore": {
        "Positive": 0.0031476884614676237,
        "Negative": 0.9206318855285645,
        "Neutral": 0.07557003200054169,
        "Mixed": 0.0006503906333819032
    }
}
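
To connect the overall label with the individual scores, the response above can be inspected in Python. This short sketch (the helper name `dominant_sentiment` is my own) picks the label with the highest confidence and shows that it matches the "Sentiment" field; the dictionary is copied from the output above.

```python
# Response copied from the detect-sentiment output above.
response = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {
        "Positive": 0.0031476884614676237,
        "Negative": 0.9206318855285645,
        "Neutral": 0.07557003200054169,
        "Mixed": 0.0006503906333819032,
    },
}


def dominant_sentiment(response):
    """Return the highest-scoring label (upper-cased) and its confidence."""
    scores = response["SentimentScore"]
    label = max(scores, key=scores.get)
    return label.upper(), scores[label]


label, confidence = dominant_sentiment(response)
print(f"{label} ({confidence:.0%})")  # prints "NEGATIVE (92%)"
```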


### Targeted Sentiment Analysis
Targeted sentiment analysis provides a more detailed understanding of the sentiments associated with specific entities, such as brands or products, mentioned in your input documents. This approach differs from standard sentiment analysis in that it analyzes the sentiment at the entity level rather than the document level.

With targeted sentiment analysis, you can gain insights into the sentiment of specific products or services and identify those that are receiving positive or negative feedback. For example, if you were analyzing a set of restaurant reviews, targeted sentiment analysis could tell you the sentiment associated with particular menu items, like the "tacos," and the behavior of the "staff."

For each document, the output of targeted sentiment analysis lists the entities and attributes mentioned, the type of each entity, and the sentiment and sentiment score for each entity mention. Mentions that refer to the same entity are grouped together into a co-reference group, which gives a more complete picture of the sentiment associated with that entity.
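
The grouping described above can be walked in a few lines of Python. The response below is a hypothetical one that I wrote for the restaurant example: the field names follow the detect-targeted-sentiment output, but the entity types, scores, and sentiments are invented for illustration. `sentiment_by_entity` is likewise a made-up helper name.

```python
from collections import Counter

review = "The tacos were great and they arrived hot, but the staff was rude."

# Hypothetical response shaped like detect-targeted-sentiment output; values are invented.
sample = {
    "Entities": [
        {
            "DescriptiveMentionIndex": [0],
            "Mentions": [
                {"Text": "tacos", "Type": "OTHER", "Score": 0.99, "GroupScore": 1.0,
                 "MentionSentiment": {"Sentiment": "POSITIVE"},
                 "BeginOffset": 4, "EndOffset": 9},
                {"Text": "they", "Type": "OTHER", "Score": 0.98, "GroupScore": 0.97,
                 "MentionSentiment": {"Sentiment": "NEUTRAL"},
                 "BeginOffset": 25, "EndOffset": 29},
            ],
        },
        {
            "DescriptiveMentionIndex": [0],
            "Mentions": [
                {"Text": "staff", "Type": "PERSON", "Score": 0.99, "GroupScore": 1.0,
                 "MentionSentiment": {"Sentiment": "NEGATIVE"},
                 "BeginOffset": 51, "EndOffset": 56},
            ],
        },
    ]
}


def sentiment_by_entity(response):
    """Map each co-reference group's descriptive name to its mention sentiment counts."""
    summary = {}
    for entity in response["Entities"]:
        mentions = entity["Mentions"]
        # DescriptiveMentionIndex points at the mention that best names the group.
        name = mentions[entity["DescriptiveMentionIndex"][0]]["Text"]
        counts = Counter(m["MentionSentiment"]["Sentiment"] for m in mentions)
        summary[name] = dict(counts)
    return summary


print(sentiment_by_entity(sample))
```

Each key in the result is one co-reference group, so "tacos" and the later "they" are counted together.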

To run targeted sentiment analysis, use the following command, which takes the same arguments as sentiment analysis: region, language-code, and text. The output returns the identified entities and the sentiment score of each mention.

In [24]:
!aws comprehend detect-targeted-sentiment \
    --region us-east-1 \
    --language-code "en" \
    --text "We regret to inform that we are unable to invite you to our program this year."

{
    "Entities": [
        {
            "DescriptiveMentionIndex": [
                2
            ],
            "Mentions": [
                {
                    "Score": 0.999966025352478,
                    "GroupScore": 1.0,
                    "Text": "We",
                    "Type": "ORGANIZATION",
                    "MentionSentiment": {
                        "Sentiment": "NEUTRAL",
                        "SentimentScore": {
                            "Positive": 0.0,
                            "Negative": 0.0,
                            "Neutral": 1.0,
                            "Mixed": 0.0
                        }
                    },
                    "BeginOffset": 0,
                    "EndOffset": 2
                },
                {
                    "Score": 0.9999939799308777,
                    "GroupScore": 0.9992619752883911,
                    "Text": "we",
                    "Type": "ORGANIZATION",
         