The Analysis Main State Machine extracts machine learning metadata from the media file. Depending on the media type of the file and the types of AI/ML detection specified in the request, it activates different detection branches. For instance, if the media file is a document, the analysis workflow extracts tabular data from the document using Amazon Textract. If the uploaded file is a video, the analysis workflow activates both the video analysis workflow, which extracts visual data using Amazon Rekognition, and the audio analysis workflow, which extracts speech-to-text data using Amazon Transcribe and runs Natural Language Processing (NLP) with Amazon Comprehend to extract key phrases, entities, locations, quantities, and so forth.
__
The state machine execution input is passed through from the Main State Machine. The only mandatory field is `input.uuid`, which is used to look up the ingestion data from the Ingest DynamoDB Table. The `input.aiOptions.*` fields specify the types of AI/ML detection to run; see all possible options in the following table. If the `$.input.aiOptions` field is not specified, the state machine Lambda function defaults to the AI/ML settings specified when the CloudFormation stack was created.
```json
{
  "input": {
    "uuid": "UUID",
    "aiOptions": {
      "minConfidence": 80,
      /* Rekognition settings */
      "celeb": true,
      "face": true,
      "facematch": true,
      "faceCollectionId": "REKOGNITION_COLLECTION_ID",
      "label": true,
      "moderation": true,
      "person": true,
      "text": true,
      "textROI": [true, true, true, false, false, false, false, false, false],
      "segment": true,
      "customlabel": true,
      "customLabelModels": [
        "REKOGNITION_CUSTOM_LABEL_01",
        "REKOGNITION_CUSTOM_LABEL_02"
      ],
      /* Transcribe settings */
      "transcribe": true,
      "languageCode": "en-US",
      "customLanguageModel": "TRANSCRIBE_CUSTOM_LANGUAGE_MODEL",
      "customVocabulary": "TRANSCRIBE_CUSTOM_VOCABULARY",
      /* Comprehend settings */
      "keyphrase": true,
      "entity": true,
      "sentiment": true,
      "customentity": true,
      "customEntityRecognizer": "COMPREHEND_CUSTOM_ENTITY_RECOGNIZER",
      /* Textract settings */
      "textract": true,
      /* frame based analysis */
      "framebased": false,
      "frameCaptureMode": 1003
    }
  }
}
```
Field | Description | Required? |
---|---|---|
input.uuid | UUID of the media file used to look up the ingest data from Ingest DynamoDB Table | Mandatory |
input.aiOptions.* | AI/ML options to run the analysis workflow. If not specified, the solution uses the default AI/ML options specified when the Amazon CloudFormation stack was created | Optional |
input.aiOptions.minConfidence | Minimum confidence level to return detection results | Optional |
input.aiOptions.celeb | Enable/disable Amazon Rekognition Celebrity Recognition detection | Optional |
input.aiOptions.face | Enable/disable Amazon Rekognition Face detection | Optional |
input.aiOptions.facematch | Enable/disable Amazon Rekognition Search Face in collection. faceCollectionId field must also be specified. | Optional |
input.aiOptions.faceCollectionId | Specify the face collection to use to run Amazon Rekognition Search Face API. If facematch is set to false, this field is ignored. | Optional |
input.aiOptions.label | Enable/disable Amazon Rekognition Label detection | Optional |
input.aiOptions.moderation | Enable/disable Amazon Rekognition Content Moderation Label detection | Optional |
input.aiOptions.person | Enable/disable Amazon Rekognition People Pathing detection. | Optional |
input.aiOptions.text | Enable/disable Amazon Rekognition Text detection. | Optional |
input.aiOptions.textROI | Limit text detection to specific regions of the video/image frame. The field is an array representing a 3x3 grid: Top Left (TL), Top Center (TC), Top Right (TR), Middle Left (ML), Center (C), Middle Right (MR), Bottom Left (BL), Bottom Center (BC), and Bottom Right (BR). Setting a grid cell to true indicates that the area is of interest. | Optional |
input.aiOptions.segment | Enable/disable Amazon Rekognition Video Segment detection. | Optional |
input.aiOptions.customlabel | Enable/disable Amazon Rekognition Custom Labels feature. customLabelModels field must also be specified. | Optional |
input.aiOptions.customLabelModels | Specify the Custom Label model(s) to be used to run the analysis. This field is an array of the models. You can specify TWO models at most. If customlabel field is not specified, this field is ignored. | Optional |
input.aiOptions.transcribe | Enable/disable Amazon Transcribe | Optional |
input.aiOptions.languageCode | Specify the language code to run Amazon Transcribe. To enable Auto Language Detection, do not specify this field. | Optional |
input.aiOptions.customLanguageModel | Specify Amazon Transcribe Custom Language Models (CLM) to use | Optional |
input.aiOptions.customVocabulary | Specify Amazon Transcribe Custom Vocabularies to use | Optional |
input.aiOptions.keyphrase | Enable/disable Amazon Comprehend Key Phrases detection | Optional |
input.aiOptions.entity | Enable/disable Amazon Comprehend Entities detection | Optional |
input.aiOptions.sentiment | Enable/disable Amazon Comprehend Sentiment Analysis | Optional |
input.aiOptions.customEntityRecognizer | Specify Amazon Comprehend Custom Entity Recognizer to use to improve domain specific entity detection | Optional |
input.aiOptions.framebased | Opt in to frame-based analysis, which converts the video into image frames and uses Amazon Rekognition Image APIs instead of Video APIs, with two exceptions that continue to use Video APIs: Amazon Rekognition Video Segment and Amazon Rekognition People Pathing. | Optional |
input.aiOptions.frameCaptureMode | When frame-based analysis is enabled, this field specifies the frame capture rate, such as 1 frame every 2 seconds or 1 frame every second. The full list can be found in source/layers/core-lib/lib/frameCaptureMode.js. | Optional |
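
The textROI field is easier to reason about with named grid cells. The following is a minimal Python sketch, not code from the solution: the helper names `build_text_roi` and `describe_text_roi` are hypothetical, and it assumes the array is ordered exactly as the cells are listed in the table (TL first, BR last).

```python
# Hypothetical helpers for illustration only; they are not part of the solution.
# Assumption: the textROI array follows the order listed in the table above.
GRID_ORDER = ["TL", "TC", "TR", "ML", "C", "MR", "BL", "BC", "BR"]


def build_text_roi(regions):
    """Return a 9-element boolean list with True for each requested grid cell."""
    wanted = set(regions)
    unknown = wanted - set(GRID_ORDER)
    if unknown:
        raise ValueError(f"unknown grid cell(s): {sorted(unknown)}")
    return [cell in wanted for cell in GRID_ORDER]


def describe_text_roi(text_roi):
    """Return the names of the grid cells that are set to True."""
    if len(text_roi) != len(GRID_ORDER):
        raise ValueError("textROI must contain exactly 9 entries")
    return [cell for cell, enabled in zip(GRID_ORDER, text_roi) if enabled]


# Limit text detection to the top row of the frame.
top_row = build_text_roi(["TL", "TC", "TR"])
print(top_row)
print(describe_text_roi(top_row))
```

Under that ordering assumption, the top-row selection above corresponds to the textROI value shown in the sample input.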
__
A state where a Lambda function checks the `$.input.aiOptions` field of the incoming analysis request and prepares the optimal AI/ML analysis options to run, based on the media type of the file and the availability of specific detections.
For example, when the user specifies an Amazon Rekognition Face Collection (XYZ) to run face matching analysis but XYZ does not contain any faces, the Lambda function automatically opts out of face matching detection to minimize cost and reduce processing time.
Likewise, when the user specifies an Amazon Transcribe Custom Language Model (ABC) but the ABC model is not in the COMPLETED state, the Lambda function automatically drops the Custom Language Model (customLanguageModel) setting to avoid a failure caused by the model being unavailable.
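
The two opt-out rules described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Lambda function's actual code: `prepare_ai_options` is a hypothetical name, and the `face_counts` and `clm_status` dictionaries stand in for lookups the real function would make against Amazon Rekognition and Amazon Transcribe.

```python
def prepare_ai_options(requested, defaults, face_counts, clm_status):
    """Return the AI/ML options to run after applying the opt-out rules.

    requested   -- aiOptions from the request, or None if not specified
    defaults    -- options chosen when the stack was created
    face_counts -- hypothetical map of face collection ID to number of faces
    clm_status  -- hypothetical map of custom language model name to status
    """
    # Fall back to the stack defaults when the request carries no aiOptions.
    options = dict(defaults if requested is None else requested)

    # Opt out of face matching when the collection is missing or empty.
    collection_id = options.get("faceCollectionId")
    if options.get("facematch") and face_counts.get(collection_id, 0) == 0:
        options["facematch"] = False

    # Drop the custom language model unless it is ready to use.
    model = options.get("customLanguageModel")
    if model is not None and clm_status.get(model) != "COMPLETED":
        del options["customLanguageModel"]

    return options


prepared = prepare_ai_options(
    {"facematch": True, "faceCollectionId": "XYZ",
     "transcribe": True, "customLanguageModel": "ABC"},
    defaults={},
    face_counts={"XYZ": 0},
    clm_status={"ABC": "IN_PROGRESS"},
)
print(prepared)
```

The sketch copies the options before changing them, so the original request stays intact for logging or auditing.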
__
A Choice state to check whether video analysis is enabled by checking $.input.video.enabled flag.
__
Start video analysis and wait is a sub-state machine that runs Computer Vision (CV) analysis using Amazon Rekognition.
__
An End state to indicate video analysis is not enabled.
__
A Choice state to check whether audio analysis is enabled by checking $.input.audio.enabled flag.
__
Start audio analysis and wait is a sub-state machine that runs Speech-to-Text (STT) and Natural Language Processing (NLP) analysis using Amazon Transcribe and Amazon Comprehend. The audio analysis is activated when the media type is audio or video.
__
An End state to indicate audio analysis is not enabled.
__
A Choice state to check whether image analysis is enabled by checking $.input.image.enabled flag.
__
Start image analysis and wait is a sub-state machine that runs Computer Vision (CV) analysis using Amazon Rekognition Image APIs.
__
An End state to indicate image analysis is not enabled.
__
A Choice state to check whether document analysis is enabled by checking $.input.document.enabled flag.
__
Start document analysis and wait is a sub-state machine that runs OCR analysis using Amazon Textract.
__
An End state to indicate document analysis is not enabled.
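
The four Choice states above all follow the same pattern: each reads one `enabled` flag and either starts its sub-state machine or ends the branch. A minimal Python sketch of that routing, assuming the state data carries the flags at `$.input.<type>.enabled` as described; the function name `enabled_branches` is hypothetical and the real workflow performs these checks in Step Functions Choice states rather than in code:

```python
# The four analysis branches described above, in the order they are listed.
BRANCHES = ("video", "audio", "image", "document")


def enabled_branches(state_data):
    """Return the branches whose $.input.<type>.enabled flag is true."""
    state_input = state_data.get("input") or {}
    selected = []
    for name in BRANCHES:
        branch = state_input.get(name) or {}
        # A missing or non-true flag means the branch goes to its End state.
        if branch.get("enabled") is True:
            selected.append(name)
    return selected


# A video file activates both the video and the audio branches.
print(enabled_branches({
    "input": {
        "video": {"enabled": True},
        "audio": {"enabled": True},
        "image": {"enabled": False},
    }
}))
```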
__
A state where a lambda function parses and merges results from the nested states above.
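
As a rough illustration of the merge step, the sketch below combines a list of branch outputs into one dictionary. The output shape is an assumption made for this example (each branch returns a `data` object keyed by media type); the actual structure produced by the nested states is not described here, and `merge_branch_outputs` is a hypothetical name.

```python
def merge_branch_outputs(outputs):
    """Merge per-branch results into a single dictionary keyed by media type.

    Assumption: each branch output looks like {"data": {"<type>": {...}}}.
    Branches that were not enabled may contribute an empty output or None.
    """
    merged = {}
    for output in outputs:
        data = (output or {}).get("data", {})
        for media_type, result in data.items():
            # Combine results when two branches report on the same media type.
            merged.setdefault(media_type, {}).update(result)
    return merged


print(merge_branch_outputs([
    {"data": {"video": {"rekognition": "video-results"}}},
    {"data": {"audio": {"transcribe": "audio-results"}}},
    {},  # a branch that ended without running any analysis
]))
```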
__
A state where a Lambda function updates the analysis field of the Ingest DynamoDB table to indicate which types of analysis have been run. The Lambda function also creates records in the aiml DynamoDB table with information including the start time and end time of each analysis detection, pointers to where the analysis metadata JSON results are stored in the Amazon S3 proxy bucket, the job name of the detection, and the ARN of the state machine execution.
__
The analysis-main Lambda function provides the implementation that supports the different states of the Analysis Main state machine. The following AWS X-Ray trace diagram shows the AWS services this Lambda function communicates with.
__
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "s3:ListBucket",
      "Resource": "PROXY_BUCKET",
      "Effect": "Allow"
    },
    {
      "Action": "s3:GetObject",
      "Resource": "PROXY_BUCKET",
      "Effect": "Allow"
    },
    {
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:Scan",
        "dynamodb:Query",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": [
        "INGEST_TABLE",
        "AIML_TABLE",
        "SERVICE_TOKEN_TABLE"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpHead",
        "es:ESHttpPost",
        "es:ESHttpPut",
        "es:ESHttpDelete"
      ],
      "Resource": "OPENSEARCH_DOMAIN",
      "Effect": "Allow"
    },
    {
      "Action": "iot:Publish",
      "Resource": "IOT_STATUS_TOPIC",
      "Effect": "Allow"
    },
    {
      "Action": "sns:Publish",
      "Resource": "SNS_STATUS_TOPIC",
      "Effect": "Allow"
    },
    {
      "Action": "rekognition:ListFaces",
      "Resource": "arn:aws:rekognition:REGION:ACCOUNT:collection/*",
      "Effect": "Allow"
    },
    {
      "Action": "rekognition:DescribeProjectVersions",
      "Resource": "arn:aws:rekognition:REGION:ACCOUNT:project/*/*",
      "Effect": "Allow"
    },
    {
      "Action": "comprehend:DescribeEntityRecognizer",
      "Resource": "arn:aws:comprehend:REGION:ACCOUNT:entity-recognizer/*",
      "Effect": "Allow"
    },
    {
      "Action": [
        "transcribe:GetVocabulary",
        "transcribe:DescribeLanguageModel"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
```
__
- Analysis Video State Machine
- Analysis Audio State Machine
- Analysis Image State Machine
- Analysis Document State Machine