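This is a basic tutorial on how to use Amazon Comprehend, with an introduction to its main functions. Each example calls the AWS CLI from a Jupyter notebook cell (the leading `!` runs the line as a shell command) and shows the JSON response.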

Detecting Dominant Language

In [5]:
!aws comprehend detect-dominant-language \
    --region us-east-1 \
    --text "It is raining today in Seattle."


{
    "Languages": [
        {
            "LanguageCode": "en",
            "Score": 0.9925304651260376
        }
    ]
}


In [32]:
!aws comprehend detect-dominant-language \
    --region us-east-1 \
    --text "Das ist nicht gut."


{
    "Languages": [
        {
            "LanguageCode": "de",
            "Score": 0.9983222484588623
        }
    ]
}
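Because these cells run in a Python notebook kernel, the CLI's JSON output can also be handled in Python. The following is a minimal sketch, assuming the response text has been captured as a string (here it is pasted in from the German example above). `dominant_language` is a hypothetical helper, not part of any AWS library.

```python
import json

def dominant_language(response_json: str) -> tuple[str, float]:
    """Return (LanguageCode, Score) of the highest-scoring language."""
    response = json.loads(response_json)
    # "Languages" can list several candidates; keep the one with the top Score
    best = max(response["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"], best["Score"]

# Response pasted from the German example above
german = '''
{
    "Languages": [
        {"LanguageCode": "de", "Score": 0.9983222484588623}
    ]
}
'''
code, score = dominant_language(german)
print(code)  # de
```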


Detecting Named Entities

In [7]:
!aws comprehend detect-entities \
    --region us-east-1 \
    --language-code "en" \
    --text "It is raining today in Seattle."

{
    "Entities": [
        {
            "Score": 0.9967291355133057,
            "Type": "DATE",
            "Text": "today",
            "BeginOffset": 14,
            "EndOffset": 19
        },
        {
            "Score": 0.9988620281219482,
            "Type": "LOCATION",
            "Text": "Seattle",
            "BeginOffset": 23,
            "EndOffset": 30
        }
    ]
}
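The `BeginOffset` and `EndOffset` fields locate each entity in the input text. A short sketch, assuming offsets count characters from 0 with an exclusive end (which matches the output above for this ASCII sentence), shows that they can be used directly as a Python slice. `entity_spans` is a hypothetical helper.

```python
import json

text = "It is raining today in Seattle."

# Entities from the detect-entities output above (Score omitted for brevity)
response = json.loads('''
{
    "Entities": [
        {"Type": "DATE", "Text": "today", "BeginOffset": 14, "EndOffset": 19},
        {"Type": "LOCATION", "Text": "Seattle", "BeginOffset": 23, "EndOffset": 30}
    ]
}
''')

def entity_spans(text, entities):
    """Pair each entity Type with the substring its offsets point at."""
    # BeginOffset is inclusive and EndOffset exclusive, like a Python slice
    return [(e["Type"], text[e["BeginOffset"]:e["EndOffset"]]) for e in entities]

print(entity_spans(text, response["Entities"]))
# [('DATE', 'today'), ('LOCATION', 'Seattle')]
```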


Detecting Key Phrases

In [9]:
!aws comprehend detect-key-phrases \
    --region us-east-1 \
    --language-code "en" \
    --text "It is raining today in Seattle."


{
    "KeyPhrases": [
        {
            "Score": 0.9993902444839478,
            "Text": "today",
            "BeginOffset": 14,
            "EndOffset": 19
        },
        {
            "Score": 0.9989783763885498,
            "Text": "Seattle",
            "BeginOffset": 23,
            "EndOffset": 30
        }
    ]
}
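Every key phrase comes with a confidence `Score`. One common way to use it is to drop low-confidence results. The sketch below does that with a hypothetical `confident_phrases` helper and an arbitrary example threshold of 0.99, which would need tuning on real data.

```python
import json

# Key phrases from the detect-key-phrases output above
response = json.loads('''
{
    "KeyPhrases": [
        {"Score": 0.9993902444839478, "Text": "today", "BeginOffset": 14, "EndOffset": 19},
        {"Score": 0.9989783763885498, "Text": "Seattle", "BeginOffset": 23, "EndOffset": 30}
    ]
}
''')

def confident_phrases(key_phrases, min_score=0.99):
    """Keep phrases whose Score meets the threshold, highest score first."""
    kept = [p for p in key_phrases if p["Score"] >= min_score]
    return [p["Text"] for p in sorted(kept, key=lambda p: p["Score"], reverse=True)]

print(confident_phrases(response["KeyPhrases"]))         # ['today', 'Seattle']
print(confident_phrases(response["KeyPhrases"], 0.999))  # ['today']
```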


Determining Sentiment

In [10]:
!aws comprehend detect-sentiment \
    --region us-east-1 \
    --language-code "en" \
    --text "It is raining today in Seattle."


{
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Positive": 0.02944827452301979,
        "Negative": 0.42635542154312134,
        "Neutral": 0.5440797805786133,
        "Mixed": 0.00011647139763226733
    }
}
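The `Sentiment` field is reported alongside a score for each of the four classes. A quick check on the output above, sketched in Python, confirms that here the reported label is the class with the highest score and that the four scores add up to almost exactly 1. Note that `Negative` (about 0.43) is not far behind `Neutral` (about 0.54), so the scores carry more information than the label alone.

```python
import json

# Response from the detect-sentiment example above
response = json.loads('''
{
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Positive": 0.02944827452301979,
        "Negative": 0.42635542154312134,
        "Neutral": 0.5440797805786133,
        "Mixed": 0.00011647139763226733
    }
}
''')

scores = response["SentimentScore"]
# Score keys are title case ("Neutral"); the Sentiment label is upper case
top_label = max(scores, key=scores.get)
print(top_label.upper(), response["Sentiment"])  # NEUTRAL NEUTRAL
print(round(sum(scores.values()), 6))            # 1.0
```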


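Analyzing Targeted Sentiment

Standard sentiment analysis determines the dominant sentiment of each input document as a whole, so it returns only one label and one set of scores per document. Targeted sentiment analysis instead determines the sentiment expressed toward specific entities within each document. In the example below, the review is positive about how the burger was cooked but negative about its temperature.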

In [11]:
!aws comprehend detect-targeted-sentiment \
    --region us-east-1 \
    --language-code "en" \
    --text "The burger was cooked perfectly but it was cold. The service was OK."


{
    "Entities": [
        {
            "DescriptiveMentionIndex": [
                0
            ],
            "Mentions": [
                {
                    "Score": 0.9999960064888,
                    "GroupScore": 1.0,
                    "Text": "burger",
                    "Type": "OTHER",
                    "MentionSentiment": {
                        "Sentiment": "POSITIVE",
                        "SentimentScore": {
                            "Positive": 0.9999750256538391,
                            "Negative": 9.999999974752427e-07,
                            "Neutral": 0.0,
                            "Mixed": 2.499999936844688e-05
                        }
                    },
                    "BeginOffset": 4,
                    "EndOffset": 10
                },
                {
                    "Score": 0.9981750249862671,
                    "GroupScore": 0.9996610283851624,
                    "Text": "it",
                    ...
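Targeted sentiment output nests one `MentionSentiment` per mention inside each entity group, and `DescriptiveMentionIndex` points at the mention(s) that best name the group. The sketch below walks that structure; because the notebook output above is truncated, only the `burger` mention is used, and `mention_sentiments` is a hypothetical helper. In a full response, the `it` mention grouped with `burger` would appear under the same entity with its own sentiment.

```python
import json

# First entity from the detect-targeted-sentiment output above,
# trimmed to the fields used here (the full output is truncated)
response = json.loads('''
{
    "Entities": [
        {
            "DescriptiveMentionIndex": [0],
            "Mentions": [
                {
                    "Text": "burger",
                    "Type": "OTHER",
                    "MentionSentiment": {"Sentiment": "POSITIVE"},
                    "BeginOffset": 4,
                    "EndOffset": 10
                }
            ]
        }
    ]
}
''')

def mention_sentiments(response):
    """Yield (entity name, mention text, sentiment) for every mention."""
    for entity in response["Entities"]:
        mentions = entity["Mentions"]
        # Use the first descriptive mention as the name of the entity group
        name = mentions[entity["DescriptiveMentionIndex"][0]]["Text"]
        for mention in mentions:
            yield name, mention["Text"], mention["MentionSentiment"]["Sentiment"]

for row in mention_sentiments(response):
    print(row)  # ('burger', 'burger', 'POSITIVE')
```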

Detecting Syntax

In [12]:
!aws comprehend detect-syntax \
    --region us-east-1 \
    --language-code "en" \
    --text "It is raining today in Seattle."

{
    "SyntaxTokens": [
        {
            "TokenId": 1,
            "Text": "It",
            "BeginOffset": 0,
            "EndOffset": 2,
            "PartOfSpeech": {
                "Tag": "PRON",
                "Score": 0.9999788403511047
            }
        },
        {
            "TokenId": 2,
            "Text": "is",
            "BeginOffset": 3,
            "EndOffset": 5,
            "PartOfSpeech": {
                "Tag": "AUX",
                "Score": 0.9020146131515503
            }
        },
        {
            "TokenId": 3,
            "Text": "raining",
            "BeginOffset": 6,
            "EndOffset": 13,
            "PartOfSpeech": {
                "Tag": "VERB",
                "Score": 0.9988774657249451
            }
        },
        {
            "TokenId": 4,
            "Text": "today",
            "BeginOffset": 14,
            "EndOffset": 19,
            "PartOfSpeech": {
                "Tag": "NOUN"
                ...
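Each syntax token carries its text, offsets, and a part-of-speech tag. The sketch below turns the tokens shown above into the compact `word/TAG` notation; only the first four tokens are used because the output is truncated.

```python
import json

# First four tokens from the detect-syntax output above (offsets and Scores omitted)
response = json.loads('''
{
    "SyntaxTokens": [
        {"TokenId": 1, "Text": "It", "PartOfSpeech": {"Tag": "PRON"}},
        {"TokenId": 2, "Text": "is", "PartOfSpeech": {"Tag": "AUX"}},
        {"TokenId": 3, "Text": "raining", "PartOfSpeech": {"Tag": "VERB"}},
        {"TokenId": 4, "Text": "today", "PartOfSpeech": {"Tag": "NOUN"}}
    ]
}
''')

# Join each token's text with its part-of-speech tag
tagged = " ".join(
    f'{t["Text"]}/{t["PartOfSpeech"]["Tag"]}' for t in response["SyntaxTokens"]
)
print(tagged)  # It/PRON is/AUX raining/VERB today/NOUN
```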

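Asynchronous Analysis: Topic Modeling

Unlike the commands above, topic modeling runs as an asynchronous job: the documents are read from S3, the job is started, its status is checked until it completes, and the results are downloaded from S3.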

In [13]:
!aws s3 ls

2023-03-28 23:19:03 myaswabcdaksldjdoidasjdk
2023-03-26 22:04:01 myawsbucket-groupproject
2023-02-22 18:14:52 myawsbucketfordaylight


In [25]:
!aws comprehend start-topics-detection-job \
    --number-of-topics 10 \
    --job-name "mytest" \
    --region us-east-1 \
    --cli-input-json file://jsontest.json

{
    "JobId": "fa2c6758c4321cb6a8f4111fee44d8d6",
    "JobArn": "arn:aws:comprehend:us-east-1:139228718159:topics-detection-job/fa2c6758c4321cb6a8f4111fee44d8d6",
    "JobStatus": "SUBMITTED"
}
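The `start-topics-detection-job` cell reads the remaining parameters from `jsontest.json`, whose contents are not shown in this notebook. Judging from the job properties returned by `list-topics-detection-jobs`, the file most likely supplied the input and output S3 locations and the IAM role. The sketch below writes such a file; the values are an assumption inferred from that output, and the keys follow the `StartTopicsDetectionJob` request parameter names. The job name and number of topics are passed on the command line, so they are left out.

```python
import json

# ASSUMPTION: values inferred from the list-topics-detection-jobs output;
# the original jsontest.json is not shown in this notebook.
job_config = {
    "InputDataConfig": {
        "S3Uri": "s3://myawsbucket-groupproject/TestText/",
        # Each file under the input prefix is treated as one document
        "InputFormat": "ONE_DOC_PER_FILE",
    },
    "OutputDataConfig": {
        # Assumed output prefix: Comprehend adds <account>-TOPICS-<job id>/output/
        # under it, which matches the output location reported for the job
        "S3Uri": "s3://myawsbucket-groupproject/TestText/",
    },
    # IAM role that lets Comprehend read the input and write the output
    "DataAccessRoleArn": "arn:aws:iam::139228718159:role/LabRole",
}

with open("jsontest.json", "w") as f:
    json.dump(job_config, f, indent=4)
```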


In [30]:
!aws comprehend list-topics-detection-jobs --region us-east-1



{
    "TopicsDetectionJobPropertiesList": [
        {
            "JobId": "fa2c6758c4321cb6a8f4111fee44d8d6",
            "JobArn": "arn:aws:comprehend:us-east-1:139228718159:topics-detection-job/fa2c6758c4321cb6a8f4111fee44d8d6",
            "JobName": "mytest",
            "JobStatus": "IN_PROGRESS",
            "SubmitTime": 1680579091.268,
            "InputDataConfig": {
                "S3Uri": "s3://myawsbucket-groupproject/TestText/",
                "InputFormat": "ONE_DOC_PER_FILE"
            },
            "OutputDataConfig": {
                "S3Uri": "s3://myawsbucket-groupproject/TestText/139228718159-TOPICS-fa2c6758c4321cb6a8f4111fee44d8d6/output/output.tar.gz"
            },
            "NumberOfTopics": 10,
            "DataAccessRoleArn": "arn:aws:iam::139228718159:role/LabRole"
        }
    ]
}
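The `SubmitTime` values printed here are Unix timestamps in seconds. A small sketch that lists each job with a readable UTC submission time; `summarize_jobs` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone

def summarize_jobs(response):
    """Return (JobName, JobStatus, UTC submit time) for each listed job."""
    rows = []
    for job in response["TopicsDetectionJobPropertiesList"]:
        # SubmitTime is printed by the CLI here as seconds since the Unix epoch
        submitted = datetime.fromtimestamp(job["SubmitTime"], tz=timezone.utc)
        rows.append((job["JobName"], job["JobStatus"],
                     submitted.isoformat(timespec="seconds")))
    return rows

# Fields taken from the list-topics-detection-jobs output above
response = json.loads('''
{
    "TopicsDetectionJobPropertiesList": [
        {"JobName": "mytest", "JobStatus": "IN_PROGRESS", "SubmitTime": 1680579091.268}
    ]
}
''')
print(summarize_jobs(response))
# [('mytest', 'IN_PROGRESS', '2023-04-04T03:31:31+00:00')]
```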


In [33]:
!aws comprehend describe-topics-detection-job \
    --region us-east-1 \
    --job-id fa2c6758c4321cb6a8f4111fee44d8d6

{
    "TopicsDetectionJobProperties": {
        "JobId": "fa2c6758c4321cb6a8f4111fee44d8d6",
        "JobArn": "arn:aws:comprehend:us-east-1:139228718159:topics-detection-job/fa2c6758c4321cb6a8f4111fee44d8d6",
        "JobName": "mytest",
        "JobStatus": "COMPLETED",
        "SubmitTime": 1680579091.268,
        "EndTime": 1680579457.086,
        "InputDataConfig": {
            "S3Uri": "s3://myawsbucket-groupproject/TestText/",
            "InputFormat": "ONE_DOC_PER_FILE"
        },
        "OutputDataConfig": {
            "S3Uri": "s3://myawsbucket-groupproject/TestText/139228718159-TOPICS-fa2c6758c4321cb6a8f4111fee44d8d6/output/output.tar.gz"
        },
        "NumberOfTopics": 10,
        "DataAccessRoleArn": "arn:aws:iam::139228718159:role/LabRole"
    }
}
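Instead of re-running `describe-topics-detection-job` by hand until `JobStatus` is `COMPLETED`, the check can be automated. In the output above the job took about 366 seconds (1680579457.086 - 1680579091.268). The sketch below polls a describe function until the job reaches a terminal status. It takes that function as a parameter so it can be used with boto3's `describe_topics_detection_job`, as in the commented usage, or with a stub for testing. `wait_for_topics_job`, the 30-second interval, and the one-hour timeout are illustrative choices.

```python
import time

TERMINAL_STATUSES = ("COMPLETED", "FAILED", "STOPPED")

def wait_for_topics_job(describe, job_id, poll_seconds=30, timeout_seconds=3600):
    """Poll a topics detection job until it reaches a terminal status.

    `describe` is any callable with the shape of boto3's
    comprehend.describe_topics_detection_job(JobId=...).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        props = describe(JobId=job_id)["TopicsDetectionJobProperties"]
        status = props["JobStatus"]
        if status in TERMINAL_STATUSES:
            return props
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} is still {status}")
        time.sleep(poll_seconds)

# Usage with boto3 (requires AWS credentials):
# import boto3
# comprehend = boto3.client("comprehend", region_name="us-east-1")
# props = wait_for_topics_job(comprehend.describe_topics_detection_job,
#                             "fa2c6758c4321cb6a8f4111fee44d8d6")
# print(props["JobStatus"], props["OutputDataConfig"]["S3Uri"])
```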


In [34]:
!aws s3 cp s3://myawsbucket-groupproject/TestText/139228718159-TOPICS-fa2c6758c4321cb6a8f4111fee44d8d6/output/output.tar.gz output.tar.gz


download: s3://myawsbucket-groupproject/TestText/139228718159-TOPICS-fa2c6758c4321cb6a8f4111fee44d8d6/output/output.tar.gz to ./output.tar.gz
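The downloaded `output.tar.gz` contains the topic modeling results as CSV files. A sketch that reads the top terms per topic straight from the archive, assuming it contains a `topic-terms.csv` with `topic`, `term`, and `weight` columns as described in the Amazon Comprehend topic modeling documentation. `top_terms` is a hypothetical helper.

```python
import csv
import io
import tarfile
from collections import defaultdict

def top_terms(archive_path, per_topic=5):
    """Read topic-terms.csv from a topic modeling output.tar.gz.

    Returns {topic number: [term, ...]} with the highest-weight terms first.
    """
    weighted = defaultdict(list)
    with tarfile.open(archive_path, "r:gz") as tar:
        member = next((m for m in tar.getmembers()
                       if m.name.endswith("topic-terms.csv")), None)
        if member is None:
            raise FileNotFoundError("topic-terms.csv not found in archive")
        with tar.extractfile(member) as raw:
            for row in csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8")):
                # Topic ids may be zero-padded strings such as "000"
                weighted[int(row["topic"])].append((float(row["weight"]), row["term"]))
    return {topic: [term for _, term in sorted(pairs, reverse=True)[:per_topic]]
            for topic, pairs in sorted(weighted.items())}

# After the download above:
# print(top_terms("output.tar.gz"))
```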
