v1.0.0 release (#13)

* Initial commit * Creating initial file from template * Creating initial file from template * Updating initial README.md from template * Creating initial file from template * Creating initial file from template * Creating initial file from template * initial commit of release * add webapp readme to gitignore * add images and readme update * fix broken readme links * Update README with architecture overview diagram. * Formating changes for Architecture overview section. * Typo * Update README.md * Update intro, Media Analysis -> Media Insights * disable faceSearch * Review AWS services names and minor fixes/improvements (#2) * add CORS rule to allow the webapp to upload files to the Dataplane bucket * add dropzone dependencies * draw bounding boxes and scatter plot * simplify * Use 0.0.0.0/0 allow all policy for API Gateway and Elasticsearch * fix type-o in reference to ES domain ARN * ignore dist files * move HTTP actions to a policy for ES domain subresources (mie-es/*) * adjust canvas erase interval to improve bounding boxes appearance * add workflow configuration form * use region-specific s3 boto client * add TODO for dropzone initialization * set upload url to the URL in the get_presigned_url() response * fix bug in declaring dropzone presigned url * update drawing * remove unnecessary timeout threshold * Set deployment defaults to automatically build MIE web application. 
* fix image workflow error * fix thumbnails for image assets * if search is empty string then get asset list from dataplane instead of Elasticsearch * show alert if dataplane connection fails * display image instead of video for image media types * fix lint warnings * fix bug in table fields prop * fix bug in custom slot table prop * Add Rekognition workflow back to stack * fix formatting in top level cfn stack * update bootstrap-vue to fix issue with new v-slot syntax * Update one click launch locations and add us-west-2 * Fix link formatting * Beta v0.1.1 release (#22) * add CORS rule to allow the webapp to upload files to the Dataplane bucket * add dropzone dependencies * draw bounding boxes and scatter plot * simplify * Use 0.0.0.0/0 allow all policy for API Gateway and Elasticsearch * fix type-o in reference to ES domain ARN * ignore dist files * move HTTP actions to a policy for ES domain subresources (mie-es/*) * adjust canvas erase interval to improve bounding boxes appearance * add workflow configuration form * use region-specific s3 boto client * add TODO for dropzone initialization * set upload url to the URL in the get_presigned_url() response * fix bug in declaring dropzone presigned url * update drawing * remove unnecessary timeout threshold * Set deployment defaults to automatically build MIE web application. * fix image workflow error * fix thumbnails for image assets * if search is empty string then get asset list from dataplane instead of Elasticsearch * show alert if dataplane connection fails * display image instead of video for image media types * fix lint warnings * fix bug in table fields prop * fix bug in custom slot table prop * Add Rekognition workflow back to stack * fix formatting in top level cfn stack * update bootstrap-vue to fix issue with new v-slot syntax * Update one click launch locations and add us-west-2 * Fix link formatting * Add more guidance for ApiIplist We've seen issues with setting the ApiIpList to a specific host. 
* semi-working end to end cognito implementation, need webapp to autodeploy properly and elasticsearch support * env file change and a sample auth for tests * fix unmerged changes * Major rewrite to replace dataplane REST API references with MIE Lambda helper methods * finish updating webapp views to support cognito and update website autodeploy * first pass at tests w/ cognito, bug fixes in .env file, add /analysis to a secure route, move tests directories, remove api gw resource policies * remove unused operator * show error if Polly is enabled but Translate is disabled * fixes to tests * deployment bug fixes for cognito, more updates to tests * remove testing file * Cognito / security enhancements (#42) * Big security update * Cognito Support for API’s (3 days) (*DONE*) * Build user schemas / groups * Cloudformation for cognito * Integrate Workflow API * Integrate Dataplane API * Update any operators / services calling an API to perform an auth call or bypass API and invoke lambda directly * Maybe could use identity pools for elasticsearch/kibana access? 
* Cognito support in GUI (4 days) (*DONE*) * Integrate cognito libraries into app * Update website build script to parse cognito values into env variables * Write Login component * Write Logout component * Update secure views to perform authorization check * Update API calls to pass auth token * Chalice * policies (2 days) (*DONE*) * Add functionality to replace IAM role in API handler lambdas with a parameter passed in from workflow stack * Create new IAM policies specific to each API * Update workflow stack * Dataplane helper (which is needed for every operator) needs ability to invoke dataplane API handler lambda, but not lambda * (1 day) (*DONE*) * Pattern is completed and implemented for most operators * Just need to finish applying this to all operators * Stepfunction role needs invoke * lambda to be able to execute all operators (1 day) (*DONE*) * No good answer for this due to Lambda / IAM * Only option I see currently is referencing every lambda arn in the operator library stack in the resource field for the stepfunction role. 
This would require us to manually add a lambda arn to the policy if an operator is created outside of cloudformation * We could also potentially leave this open (*Decided on this, not a huge security risk, but appsec can guide us here*) * Updated testing to support cognito auth on API’s (3 days) (*Mostly done, issue cut for remaining tests to be updated*) * Testing scripts need to be updated to pass a token with requests * fix lint warnings * authenticate after loading venv in case boto3 is not installed on system * Pass cognito access token to API calls * authenticate after loading venv in case boto3 is not installed on system * version bump amplify vue plugin to resolve sporadic login component issue * custom mie boto config * update 1 click deploy links * bug fix in key phrases * explain how to use cognito tokens when invoking the workflow API * automate the command to get the token * Summary: This commit adds a generic data lookup operator. Detail: JSON datasets can be precomputed and uploaded along with their associated media files to MIE. This commit adds an operator that saves user-specified precomputed data into the back-end dataplane. Several other bug fixes and documentation improvements are also in this commit. Finally, the sample-video.mp4 file in tests/tests-parameterized-rekognition/test-media/ was updated because the old version did not contains any audible speech. This caused the transcribe test to fail. So, we've updated sample-video.mp4 with an excerpt of the video publicly available [here](https://techmkt-videoarchive.s3-us-west-2.amazonaws.com/boulder_eats-ft.mp4). 
* draw bounding boxes for images * fix lint warnings * fix lint warnings * improve support for images * allow users to enable/disable image operators via workflow config form * rename the rekognition workflow to clarify what it actually does * init commit of getting started guide * added template parameters to main readme; draft for the implementation guide * remove sample app guide, now in implementation guide * Color overall architecture diagram (#85) Color overall architecture diagram and control flows for control plane operations * Updated elasticsearch access (#76) * cleanup / refactor fetch assets method in collection view * more collection view refactoring * elasticsearch query in collection view updated to use amplify api + auth * small cleanup in collection view * small ui improvment in collection view, update objects component to use amplify api for ES, remove servercheck in analysis * additional simplification in collection view and ui fixes for ES loading * update remaining components to use amplify api for es calls, small usability fixes to ui in components * updated elasticsearch consumer for allowing cognito * remove analytics tab to kibana * resolve linting warnings/errors * bugfixes after removing ip access list parameter * add post for ES search function and remove duplicate UI tips * fix bugs in es calls for text operators * update one clicks links * Remove information about api access list No longer needed after moving to identity based access policies * bug fix for thumbnail if statement * change default for rekognition workflow * Development (#94) * explain how to use cognito tokens when invoking the workflow API * automate the command to get the token * Summary: This commit adds a generic data lookup operator. Detail: JSON datasets can be precomputed and uploaded along with their associated media files to MIE. This commit adds an operator that saves user-specified precomputed data into the back-end dataplane. 
Several other bug fixes and documentation improvements are also in this commit. Finally, the sample-video.mp4 file in tests/tests-parameterized-rekognition/test-media/ was updated because the old version did not contains any audible speech. This caused the transcribe test to fail. So, we've updated sample-video.mp4 with an excerpt of the video publicly available [here](https://techmkt-videoarchive.s3-us-west-2.amazonaws.com/boulder_eats-ft.mp4). * draw bounding boxes for images * fix lint warnings * fix lint warnings * improve support for images * allow users to enable/disable image operators via workflow config form * rename the rekognition workflow to clarify what it actually does * init commit of getting started guide * added template parameters to main readme; draft for the implementation guide * remove sample app guide, now in implementation guide * Color overall architecture diagram (#85) Color overall architecture diagram and control flows for control plane operations * Updated elasticsearch access (#76) * cleanup / refactor fetch assets method in collection view * more collection view refactoring * elasticsearch query in collection view updated to use amplify api + auth * small cleanup in collection view * small ui improvment in collection view, update objects component to use amplify api for ES, remove servercheck in analysis * additional simplification in collection view and ui fixes for ES loading * update remaining components to use amplify api for es calls, small usability fixes to ui in components * updated elasticsearch consumer for allowing cognito * remove analytics tab to kibana * resolve linting warnings/errors * bugfixes after removing ip access list parameter * add post for ES search function and remove duplicate UI tips * fix bugs in es calls for text operators * update one clicks links * Remove information about api access list No longer needed after moving to identity based access policies * bug fix for thumbnail if statement * change 
default for rekognition workflow * fix face detection error * Change WebAppCloudfrontUrl stack output name to MediaInsightsWebAppUrl and make the value a clickable link (#104) * Webapp updates for beta1.4 (#110) This merge includes several changes that improve the first user experience. These changes include: * link Help menu to Implementation Guide * Rename the cognito app client for the webapp so it's easier to understand which app client should be used for boto3 and which should be used for Amplify. * clear canvas if user clicked the label button a second consecutive time * advise user to "Try lowering confidence threshold" when elasticsearch returns no data * prevent bounding boxes from overlapping * Persist the workflow execution history on the upload page. * add a hyperlink to workflow status for accessing step function execution details * add line break between workflow config and execution history * indicate when a thumbnail image is not available * allow users to control them thumbnail seek position in workflow config * alphabetize the transcribeLanguages list * Push all assets in parallel to the collection table so the table updates in O(1) instead of O(n) time. * Show both date and time in Created column in Collection view * Add operator for thumbnail creation and remove thumbnail creation from the mediaconvert (transcribe) operator. * fix paging bug * Fix reko pagination (#118) * Enable dataplane pagination when saving paginated data from Rekognition. * Increase lambda timeout and memory in order to help manage requests by Rekognition operators to save large result sets. * Remove unusued GUI artifacts for Polly and AutoML. * rename references to logo detection in the generic data operator * Remove unusued GUI artifacts for Polly and AutoML. * remove automl * Update README with v0.1.4 hosted deployment. Add sentence about 4 minute video limit. 
* Add api docs (#121) * add API documentation * Add page token to stepfunc (#125) This commit includes changes necessary to support 2 hour long videos. The key to achieving this was to allow step functions to pass a pagination token to the Lambda functions which get results from Rekognition jobs. Now, those Reko operators will persist 10 pages at a time, then stop and pass the pagination token to the step function so it can repeatedly restart the operator Lambda until there are no more pages left to read. This enables the Reko operators to save much larger datasets to the data-plane. Prior to this commit, Reko operators will timeout when trying to save large quantities of paged Reko result, which was often the case with label_detection and face_detection. Other changes: * Increase lambda timeout and memory in order to help manage requests by Rekognition operators to save large result sets. * Remove unusued GUI artifacts for Polly and AutoML. * add API documentation to README * Split input text in the translate operator so it does not exceed the 5000 characters max allowed by AWS Translate service limit. * Set timeouts and memory allocations for Lambda functions based on test results from a 2 hour movie (Amélie). * split bulk elasticsearch inserts in order to avoid exceeding max payload size * If data is empty, skip ES insert. Data is often empty for operators like content moderation when processing non-explicit videos. GUI changes: * Raise max file upload size to 2GB in the GUI. * allow analysis button to open in new tab * fade delete alert after 5 seconds * Change workflow configuration form so users only have to set the language for Transcribe and Translate once. Used to be that users would have to set that language preference twice, but now, since both Transcribe and Translate use the same source language, users can just specify this option once. 
CloudFormation template changes: * Make the Cloudfront URL a clickable link in the outputs from both the webapp CF template and the base stack template. * Update the email template for the Cognito invite message so it includes a link to the stack. * Update DEVELOPER_QUICK_START.md (#132) * Reference the correct template file in DEVELOPER_QUICK_START.md * Print root template at end of build script for easier deployment * fix mistakes made while resolving merge conflict * work around bug in amplify-authenticator that breaks autofill in the 1password browser plugin * update one-click deploy links * increase lambda timeout threshold for MediaConvert * update video duration limit * allow non-email usernames * Cache the mediaconvert endpoint in order to avoid getting throttled on the DescribeEndpoints API. * allow input text to be empty * Add support for new languages in AWS Translate and Transcribe * Add support for new languages in AWS Translate and Transcribe * V0.1.6 bug fixes (#140) * allow non-email usernames * Cache the mediaconvert endpoint in order to avoid getting throttled on the DescribeEndpoints API. * allow input text to be empty * Add support for new languages in AWS Translate and Transcribe * Add support for new languages in AWS Translate and Transcribe * fix python 3.6 build errors and add support for python 3.8 * Fix markdown anchor for glossary * add support to delete an asset from elasticsearch (#142) * fix template validation error that happens when DeployAnalyticsPipeline=false but DeployDemoSite=true * Mitigate XSS threats (#147) * add subresource integrity (SRI) checksums so browsers can verify that the files they fetch are delivered without unexpected manipulation. * move runtime configs from .env to /public/runtimeConfig.json * webapp code cleanup * webapp code cleanup * Updated tests (#149) This PR focuses on scoping IAM policies with least privalege. 
Along the way we have also improved the organization of build scripts and unit tests so they're easier to use. Summary: * Least privalege concerns were achieved by updating Cloud Formation templates to resolve issues reported by cfn_nag and viperlight * We used to have many run_test.sh scripts to run unit tests. These have been consolidated into one script, tests/run_tests.sh, which you can run like this: `echo "$REGION \n $MIE_STACK_NAME \n $MIE_USERNAME \n $MIE_PASSWORD" | ./tests/run_tests.sh` Details: * a pass at refactoring iam roles/policies * refactor tests to use media in dataplane bucket, big test overhaul, small IAM changes for dataplane * do not assume the user has put the region at the end of the bucket name * Remove sam_translate from dataplaneapi and workflowapi. Organize the code and output so it's easier to follow. Access MIE Helper package from source/lib/ instead of /lib. * Apply bash syntax optimizations * Access MIE Helper package from source/lib/ instead of /lib. * update lib path to mie helper * remove redundant doc * add stream encryption to fix cfn_nag warning * remove sam-translate.py files * remove old /webapp and /lib * remove old /webapp and /lib * rename license file per AWS guidelines * rename notice file per AWS guidelines * output misc debug info * move tests/ into source/ Co-authored-by: Ian Downard <54998167+ianwow@users.noreply.github.com> * Add mediainfo and transcode operators (#150) Resolved Issues: #32 #138 #152 #151 #128 #153 #154 #156 #157 Summary of changes: 1. added proxy encode to mediaconvert job that generates thumbnails 2. added MediaInfo libraries to MIE lambda layer. Also published these layers in the Technical 3. Marketing public S3 buckets. 4. added MediaInfo operator to MIE Complete Workflow and show mediainfo data in webui 5. major organization improvements in the build script 6. fixed minor webpack warnings 7. Added support for videos without spoken words 8. Added support for videos without any audio tracks 9. 
Added security measures to prevent users from uploading invalid media files Details: * Add mediainfo operator * Add MediaInfo library to MIE lambda layer * avoid webpack warnings about package size * fix compile-time jquery warning * remove unused requirements file * minor code cleanup * add log statement so we're consistent with other components * show mediainfo data in analysis page * explain how to enable hot-reload in dev mode * Explain how to validate data in elasticsearch. * Explain how to read/write metadata from one operator to another via workflow output objects. * skip comprehend operators when transcript is empty * skip comprehend operators when transcript is empty * skip transcribe if video is silent * use proxy encoded video for Rekognition operators * recognize more image file types when determining what to use for thumbnail * use a consistent print statement for logging the incoming lambda event object * Now that we're supporting media formats besides mp4 and jpg, use a generic image or video media type. We can't assume "video/mp4" or "image/jpg" anymore. * Remind developers that workflow attributes must be non-empty strings. * Add transcode to mediaconvert job. Use that for the proxy encode input to downstream operators. * Move transcribe operation from mediaconvert operator to thumbnail operator. The thumbnail operator now superseeds the old mediaconvert operator. We've disable old mediaconvert operator. After testing, we can remove the old mediaconvert operator. * Avoid drawing boxes outside the dimensions of the video player. * Thumbnail operator needs a check-status function now that it includes transcode. This commit adds that check-status function to the build script. * minor edit, just to reorder packages to improve readability * Move thumbnail operator to prelim stage so all mediaconvert outputs are ready before analysis operators begin. 
* avoid showing undefined mediainfo attributes * use free tier for elasticsearch domain * change header title to AWS Content Analysis * validate file types before upload * build layer for python 3.8 runtime * explain how to validate that the layer includes certain libraries * add PointInTimeRecoveryEnabled and HTTP (non-ssl) Deny rule to dataplane bucket * add versioning to S3 bucket * validate file type before upload and enable Mediainfo for image workflow * consolidate the code for checking image types * use webpack's default devServer https option * support all caps filenames * remove input media from upload/ after copying it to private/assets/[asset_id]/input/ * if input file is not a valid media file then remove it from S3 * Get mediaconvert endpoint from cache if available * Specify thumbnail as the first mediaconvert job so the thumbnail images become available as soon as possible. This lessens the likelihood of seeing broken thumbnail images in the webui. * Add Mediainfo to Image workflow and allow Mediainfo to delete files from S3. * minor edit to remove unnecessary whitespace * minor edit to fix a 'key not found' exception that occurred when testing an empty workflow execution request (e.g. POST {} payload to /api/workflow/execution) * Add Mediainfo to image workflow * minor remove errand comma * add CloudFormation string functions so we can use (lower case) stack name for mie website bucket * fix bug in error messages for invalid file types * fix yaml syntax errors * fix invalid table query when invoking a GET on $WORKFLOW_API_ENDPOINT/workflow/execution/status/Error * fix "key not found" error that occurs running workflows that include transcribe but not mediainfo * 1) Update workflow configs and 2) upload media prior to every workflow execution because dataplane now deletes the uploaded media after copying it to private/assets/. 
* upload media prior to workflow execution because dataplane now deletes the uploaded media after copying it to private/assets/. * 1) Update workflow configs and 2) upload media prior to every workflow execution because dataplane now deletes the uploaded media after copying it to private/assets/. * cleanup comments * Use app.current_request.raw_body.decode instead of app.current_request.json_body in order to work around a bug in Chalice whereby it returns None for json_body. Reference: https://stackoverflow.com/questions/52789943/cannot-access-the-request-json-body-when-using-chalice * append a unique id to image files uploaded to s3 so there are no conflicts between multiple threads running this concurrency test * Handle the HTTP 409 and 500 errors that happen when tests don't clean up properly. * add cost information * minor edits * minor edits * minor edits * minor edits * fix bug detection silent videos * bump up the python version * bump up the python version * Rek detect text in video support (#158) * rek text detection functionality * bug fixes for player markers and readdition of accidentally deleted code for text detection * fix string operation to determine file type * get input video from ProxyEncode (#168) * get input video from ProxyEncode * add new region support for Rekognition (#163) * allow users to upload videos with formats supported by mediaconvert (#169) * get input video from ProxyEncode * add new region support for Rekognition * allow users to upload videos with formats supported by mediaconvert (#164) * allow users to upload videos with formats supported by mediaconvert * Allow users to upload webm files. * fix bug with determining key to proxy encode mp4 * fix bug with determining key to proxy encode mp4 (#170) * get input video from ProxyEncode * add new region support for Rekognition * allow users to upload videos with formats supported by mediaconvert * Allow users to upload webm files. 
* fix bug with determining key to proxy encode mp4 * Disable versioning on dataplane bucket (#171) * Disable versioning on dataplane bucket because so that bucket can be removed more easily * minor edit * v0.1.6 release (#172) Three new operators: * Text in Video: words are searchable and shown under the ML Vision tab in the GUI * MediaInfo: codec info and other file metadata is searchable and shown in the GUI under the video player * Transcode: MIE leverages MediaConvert to support many more video and image formats including Flash, Quicktime, MXF, and MKV. See https://docs.aws.amazon.com/mediaconvert/latest/ug/reference-codecs-containers.html for a full list of supported video formats. Cost * Reduced cost by deploying the free tier for Elasticsearch * Pricing information for MIE resources is now included in README.md (https://github.com/awslabs/aws-media-insights-engine/blob/master/README.md) Documentation * The Developer guide is now included in IMPLEMENTATION_GUIDE.md (https://github.com/awslabs/aws-media-insights-engine/blob/master/IMPLEMENTATION_GUIDE.md) Security * Subresource integrity (SRI) checks ensure the validity of GUI assets * GUI prevents users form uploading unsupported file types (such as .exe and .zip) * If users upload invalid media files then those files will be removed by the Mediainfo operator. Bug fixes * Bounding boxes no longer appear outside the video player * Videos without sound or dialog no longer produce a workflow error * support for rerunning analysis on an existing asset (#175) * support for rerunning analysis on an existing asset * bug fix in webapp code * fix formatting issues after merge and update status to be polled by wf id * Add gitter chat info (#182) * Bumps [jquery](https://github.com/jquery/jquery) from 1.12.4 to 3.4.1. - [Release notes](https://github.com/jquery/jquery/releases) - [Commits](jquery/jquery@1.12.4...3.4.1) * add gitter channel info * Bumps [jquery](https://github.com/jquery/jquery) from 1.12.4 to 3.4.1. 
(#181) - [Release notes](https://github.com/jquery/jquery/releases) - [Commits](jquery/jquery@1.12.4...3.4.1) * Fix mediainfo (#180) * remove VersioningConfiguration on S3 bucket since that makes it much harder for AWS account owners to delete the bucket. * MediaInfo version 19.09 works but 20.03 does not. Use to 19.09 instead of latest. * update one-click deploy links for release version 0.1.7 * v0.1.7 release (#185) Highlights of this release: New Features: • The analysis view in the GUI includes a new link to “Perform Additional Analysis”. This link takes you to the upload page where you can run a different workflow configuration without uploading the video again. The resulting analysis data will be saved using the same asset id. Documentation: • Users are encouraged to join the MIE public chat forum on Gitter. This forum was created to foster communication between MIE users external to AWS. Bug fixes • MediaInfo released a new version (20.03) last week which broke the existing MediaInfo operator in MIE. As a temporary workaround this MIE release is configured to use the previous version of MediaInfo (v19.09). * testing buildspec * version bump python version in buildspec * remove unneeded quotes from build command * Change distribution bucket instructions (#189) Previously, the instruction was to created a distribution bucket named $DIST_OUTPUT_BUCKET-$REGION, but now in `deployment/build-s3-dist.sh` it's expected to be just $DIST_OUTPUT_BUCKET. * Init of build pipeline (#193) * working build pipeline * fix testing spec filename * persist build user * Add logo (#194) The clapperboard, representing *multimedia*, is centered inside a crosshair, representing *under extreme scrutiny*. This symbol is available from [nounproject](https://thenounproject.com/icon/1815092/). The font is Engineering Plot, https://www.dafont.com/engineering-plot.font which conveys the scaffolding nature of MIE. 
* Update README.md (#197) Improve instructions in the README: * fix references to old MediaInsightsEngine repository name * use docker port forwarding to enable developers to see the result of npm run serve on their local machine * Update media-insights-stack.yaml (#198) fix PolicyName typo * Prevent duplication of this.entities (#201) If user switches from Entities to KeyPhrases tab and back, this.entities doubles in size. To prevent this, we can employ the same method of clearing memory that is used in ComprehendKeyPhrases * Avoid linking to step functions for queued workflows because that link will break since the step function doesn't exist yet. (#210) * Added Cognito Identity Pool ID to the output of CF (#211) Add IDENTITY_POOL_ID to stack outputs in order to make it easier for users to find the values they will need for the `webapp/public/runtimeConfig.json` file when trying to run the webapp locally on their laptop. * change logo. The MIE team agreed to use the 3d black and white logo w… (#200) * change logo. The MIE team agreed to use the 3d black and white logo without a slogan. * move logo files to doc/images * Update gui readme (#202) * add instructions for creating new accounts for the GUI and remove out-of-date instructions for running the webapp. * Add quantitative cost info * fix type-o * add cursor usage info * document limitations * update 3rd party licenses to include every package listed in package.json * remove local dist and package files after build * remove license file form MIE lambda helper. This was left over from when the lambda helper used to be in its own repo * Remove reference to old MediaInsightsEngineLambdaHelper repo. It used to be managed in a different repo but now it's part of this repo. 
* Video segment detection / v0.1.8 one click links (#215) * working segment detection v1 * working segment detection w/ api changes * added end scene pause functionality and pagination to scene detection tables * fix webapp deploy bug * reformat readme for simpler installation * updated readme with instructions for installtion * remove values from runtimeConfig and set sriplugin to true * Added ListBucket to Dataplane bucket policy for better debugging and minor documentation correction (#235) Updated the Dataplane API Handler's Role policy to include a ListBucket action on the Dataplane S3 bucket. This is done so that the developer gets a NoSuchKey error when accessing an invalid S3 key instead of getting AccessDenied. The incorrect message makes it hard to debug especially when all required permissions for execution of the Lambda exist. Updated the path under Implementation guide to reflect the correct path when exporting the MIE_ACCESS_TOKEN. Currently: $MIE_DEVELOPMENT_HOME/tests/getAccessToken.py. Proposed change: $MIE_DEVELOPMENT_HOME/source/tests/getAccessToken.py Added .vscode/ to gitignore as a QOL improvement for VSCode users. * init commit media insights app * add missing consumer dir, add one click links, point template to public bucket * temp commit to merge current webapp with isolated frontend repo * merge bugfixes + rename workflow to video workflow * update allowed version * fix merge issue with technical cues / shots * reflect the new build script name in comments * add in image workflow * fix template validation error * use virtual-hosted style s3 paths * fix thumbnail path for images assets * initial commit * initial commit * initial commit * Add build from scratch instructions. 
Add implementation guide * add images for implementation guide * minor update * minor update * minor update * minor update * minor update * minor update * minor update * minor update * minor update * minor update * minor update * Fix text detection in images * Fix text detection in images * Fix download button for celebrities and moderation * Add the Operator name for "entities" and "key_phrases" to the fields indexed by Elasticsearch so that users can filter search queries using those operator names like they can for all the other operators. * minor update * minor update * fix create-stack command * enable text detection by default * enable text detection when user selects "select all" * Enable text in images by default * update one-click deploy links * Enable text in images by default * Enable text in images by default * Enable text in images by default * make template easier to copy and paste * In installation instructions, avoid checking out a branch. Just use master when building. * fix type-o in virtual-hosted-style s3 path * fix type-o in virtual-hosted-style s3 path * Move source code compilation instructions to implementation guide. * Move source code compilation instructions to implementation guide. * Move source code compilation instructions to implementation guide. * Move source code compilation instructions to implementation guide. * Move source code compilation instructions to implementation guide. * Move source code compilation instructions to implementation guide. * Move source code compilation instructions to implementation guide. 
* fix type-o in build instructions Co-authored-by: James Siri <22601145+jamesiri@users.noreply.github.com> Co-authored-by: Brandon Dold <brandold@amazon.com> Co-authored-by: Burkleaux <burkleaa@amazon.com> Co-authored-by: aburkleaux-amazon <33331299+aburkleaux-amazon@users.noreply.github.com> Co-authored-by: Gustavo Veloso <gjmveloso@users.noreply.github.com> Co-authored-by: Tom Gilman <tom@gilman5.com> Co-authored-by: Brandon Dold <46355297+brandold@users.noreply.github.com> Co-authored-by: joan <morjoan@amazon.com> Co-authored-by: Tulio Casagrande <tuliocasagrande@gmail.com> Co-authored-by: brand161 <brandondold@gmail.com> Co-authored-by: Anton <62160100+antonostrovsky@users.noreply.github.com>
awslabs · Oct 7, 2020 · 74a2a51
1 parent b08f50b
commit 74a2a51
Showing 67 changed files with 10,004 additions and 9 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,6 @@
+*Issue #, if available:*
+
+*Description of changes:*
+
+
+By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
diff --git a/IMPLEMENTATION_GUIDE.md b/IMPLEMENTATION_GUIDE.md
diff --git a/README.md b/README.md
@@ -1,17 +1,63 @@
-## My Project
+# MEDIA INSIGHTS APPLICATION
 
-TODO: Fill this README out!
+This application, designed to be a reference application for the [Media Insights Engine](https://github.com/awslabs/aws-media-insights-engine) (MIE), catalogs videos and images with data generated by AWS AI services for computer vision and speech detection. A graphical user interface (GUI) enables users to search through the catalog to find videos or images containing certain content and to analyze what the cataloged data looks like for selected files.
 
-Be sure to:
+![](doc/images/analysis_view.png)
 
-* Change the title in this README
-* Edit your repository description on GitHub
+# INSTALLATION
 
-## Security
+The following CloudFormation templates will deploy the Media Insights front-end application with a prebuilt version of the most recent MIE release.
 
-See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
+Region| Launch
+------|-----
+US East (N. Virginia) | [![Launch in us-east-1](doc/images/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=mie&templateURL=https://rodeolabz-us-east-1.s3.amazonaws.com/content-analysis-solution/v1.0.0/cf/aws-content-analysis-deploy-mie.template)
+US West (Oregon) | [![Launch in us-west-2](doc/images/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/new?stackName=mie&templateURL=https://rodeolabz-us-west-2.s3.amazonaws.com/content-analysis-solution/v1.0.0/cf/aws-content-analysis-deploy-mie.template)
 
-## License
+For more installation options, see the [Implementation Guide](IMPLEMENTATION_GUIDE.md).
 
-This project is licensed under the Apache-2.0 License.
+# Analysis Workflow
 
+After uploading a video or image in the GUI, the application runs a workflow in MIE that extracts insights using a variety of media analysis services on AWS and stores them in a search engine for easy exploration. The following flow diagram illustrates this workflow:
+
+<img src="doc/images/mie_workflow.png" width=600>
+
+This application includes the following features:
+
+* Proxy encode of videos and separation of video and audio tracks using **AWS Elemental MediaConvert**. 
+* Object and activity detection in images and video using **Amazon Rekognition**. 
+* Celebrity detection in images and video using **Amazon Rekognition**.
+* Face search from a collection of known faces in images and video using **Amazon Rekognition**.
+* Facial analysis using **Amazon Rekognition**. Detect faces in images and videos and determine attributes such as happiness, age range, eyes open, glasses, and facial hair. In video, you can also measure how these attributes change over time, such as constructing a timeline of the emotions expressed by an actor.
+* Unsafe content detection using **Amazon Rekognition**. Identify potentially unsafe or inappropriate content across both image and video assets.
+* Detect text in videos and images using **Amazon Rekognition**.
+* Video segment detection using **Amazon Rekognition**. Identify black frames, color bars, end credits, and scene changes.
+* Identify the start, end, and duration of each unique shot in your videos using **Amazon Rekognition**.
+* Convert speech to text from audio and video assets using **Amazon Transcribe**.
+* Convert text from one language to another using **Amazon Translate**.
+* Identify entities in text using **Amazon Comprehend**. 
+* Identify key phrases in text using **Amazon Comprehend**.
+
+Users can enable or disable operators in the upload view shown below:
+
+![](doc/images/upload_view.png)
+
+# Search Capabilities
+
+The search field in the Collection view searches the full media content database in Elasticsearch. Everything you see in the analysis page is searchable. Even data that is excluded by the threshold you set in the Confidence slider is searchable. Search queries must use valid Lucene syntax.
+
+Here are some sample searches:
+
+* Since Content Moderation returns a "Violence" label when it detects violence in a video, you can search for any video containing violence simply with: `Violence`
+* Search for videos containing violence with an 80% confidence threshold: `Violence AND Confidence:>80`
+* The previous queries may match videos whose transcript contains the word "Violence". You can restrict your search to only Content Moderation results, like this: `Operator:content_moderation AND (Name:Violence AND Confidence:>80)`
+* To search for Violence results in Content Moderation and guns or weapons identified by Label Detection, try this: `(Operator:content_moderation AND Name:Violence AND Confidence:>80) OR (Operator:label_detection AND (Name:Gun OR Name:Weapon))`  
+* You can search for phrases in Comprehend results like this: `PhraseText:"some deep water" AND Confidence:>80`
+* To see the full set of attributes that you can search for, click the Analytics menu item and search for "*" in the Discover tab of Kibana.
+
+# Developers
+
+Join our Gitter chat at [https://gitter.im/awslabs/aws-media-insights-engine](https://gitter.im/awslabs/aws-media-insights-engine). This public chat forum was created to foster communication between MIE developers worldwide.
+
+[![Gitter chat](https://badges.gitter.im/gitterHQ/gitter.png)](https://gitter.im/awslabs/aws-media-insights-engine)
+
+For instructions on how to build and deploy MIE (the framework) and the Media Insights front-end application from source code, read the [Implementation Guide](IMPLEMENTATION_GUIDE.md).
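Note on the Search Capabilities section of README.md above: the sample searches are plain Lucene query strings, so they can be assembled programmatically before being pasted into the Collection view. The following is a minimal sketch, not part of this commit. The helper names (`term`, `min_confidence`, `all_of`, `any_of`) are hypothetical, and the sketch assumes values contain no Lucene special characters other than spaces, which it handles by quoting.

```python
def term(field, value):
    """Return a Lucene field:value clause, quoting values that contain spaces."""
    text = str(value)
    if " " in text:
        text = '"' + text + '"'
    return f"{field}:{text}"


def min_confidence(threshold):
    """Return a range clause matching Confidence values above the threshold."""
    return f"Confidence:>{threshold}"


def all_of(*clauses):
    """Join clauses with AND and wrap them in parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """Join clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


# Rebuild the README example that combines Content Moderation and Label Detection.
moderation = all_of(
    term("Operator", "content_moderation"),
    term("Name", "Violence"),
    min_confidence(80),
)
labels = all_of(
    term("Operator", "label_detection"),
    any_of(term("Name", "Gun"), term("Name", "Weapon")),
)
query = " OR ".join([moderation, labels])
print(query)

# Phrase search against Comprehend key phrase results.
phrase_query = " AND ".join([term("PhraseText", "some deep water"), min_confidence(80)])
print(phrase_query)
```

Restricting each group by `Operator` keeps a transcript that merely mentions a word from matching, as the README points out.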
diff --git a/babel.config.js b/babel.config.js
@@ -0,0 +1,5 @@
+module.exports = {
+  presets: [
+    '@vue/app'
+  ]
+}
diff --git a/cloudformation/aws-content-analysis-auth.yaml b/cloudformation/aws-content-analysis-auth.yaml
@@ -0,0 +1,292 @@
+AWSTemplateFormatVersion: "2010-09-09"
+Description: AWS Content Analysis - Deploys the AWS Content Analysis Application Cognito Infrastructure
+
+Parameters:
+  AdminEmail:
+    Description: Email address of the Content Analysis Administrator
+    Type: String
+  WorkflowAPIRestID:
+    Description: REST API ID of the MIE Workflow API
+    Type: String
+  DataplaneAPIRestID:
+    Description: REST API ID of the MIE Dataplane API
+    Type: String
+  ElasticDomainArn:
+    Description: ARN of the Content Analysis ES Domain
+    Type: String
+  DataplaneBucket: 
+    Description: Name of the MIE dataplane bucket
+    Type: String
+
+Resources:
+  ContentAnalysisUserPool:
+    Type: AWS::Cognito::UserPool
+    Properties:
+      AdminCreateUserConfig:
+        AllowAdminCreateUserOnly: True
+        InviteMessageTemplate:
+          EmailMessage: !Join ["", [
+            "Your username is {username} and temporary password is {####}<br>Stack Name: ",
+            Ref: "AWS::StackName",
+            "<br>Stack Overview:<br>",
+            "https://",
+            Ref: "AWS::Region",
+            ".console.aws.amazon.com/cloudformation/home?region=",
+            Ref: "AWS::Region",
+            "#/stacks/stackinfo?stackId=",
+            Ref: "AWS::StackId"
+          ]]
+          EmailSubject: "Welcome to AWS Content Analysis"
+      EmailConfiguration:
+        EmailSendingAccount: 'COGNITO_DEFAULT'
+      AutoVerifiedAttributes: ['email']
+
+  ContentAnalysisWebAppClient:
+    Type: AWS::Cognito::UserPoolClient
+    Properties:
+      UserPoolId: !Ref ContentAnalysisUserPool
+
+    # Service - cognito / security infrastructure
+
+    # Super hacky lambda for formatting cognito role mapping since cognito is severely lacking in CF support
+    # https://forums.aws.amazon.com/message.jspa?messageID=790437#790437
+    # https://stackoverflow.com/questions/53131052/aws-cloudformation-can-not-create-stack-when-awscognitoidentitypoolroleattac
+
+  CognitoRoleMappingTransformer:
+      Type: AWS::Lambda::Function
+      Properties:
+        Code:
+          ZipFile: |
+            import json
+            import cfnresponse
+
+
+            def handler(event, context):
+                print("Event: %s" % json.dumps(event))
+                resourceProperties = event["ResourceProperties"]
+                responseData = {
+                    "RoleMapping": {
+                        resourceProperties["IdentityProvider"]: {
+                            "Type": resourceProperties["Type"]
+                        }
+                    }
+                }
+                # .get() skips a missing property instead of raising KeyError before a response is sent.
+                if resourceProperties.get("AmbiguousRoleResolution"):
+                    responseData["RoleMapping"][resourceProperties["IdentityProvider"]]["AmbiguousRoleResolution"] = resourceProperties["AmbiguousRoleResolution"]
+
+                print(responseData)
+                cfnresponse.send(event, context, cfnresponse.SUCCESS, responseData)
+        Handler: !Join
+          - ''
+          - - index
+            - .handler
+        Role: !GetAtt CognitoRoleMapperLambdaExecutionRole.Arn
+        Runtime: python3.7
+        Timeout: 30
+
+  CognitoRoleMapperLambdaExecutionRole:
+      Type: 'AWS::IAM::Role'
+      Properties:
+        AssumeRolePolicyDocument:
+          Version: 2012-10-17
+          Statement:
+            - Effect: Allow
+              Principal:
+                Service:
+                  - lambda.amazonaws.com
+              Action:
+                - 'sts:AssumeRole'
+        Path: /
+        Policies:
+          - PolicyName: root
+            PolicyDocument:
+              Version: 2012-10-17
+              Statement:
+                - Effect: Allow
+                  Action:
+                    - 'logs:CreateLogGroup'
+                    - 'logs:CreateLogStream'
+                    - 'logs:PutLogEvents'
+                  Resource: 'arn:aws:logs:*:*:*'
+
+# TODO: Do we even need this?
+#  ContentAnalysisCognitoDomain:
+#    Type: AWS::Cognito::UserPoolDomain
+#    Properties:
+#      Domain: !Ref # TODO: Figure out what to do here
+#      UserPoolId: !Ref ContentAnalysisUserPool
+
+  ContentAnalysisIdentityPool:
+    Type: AWS::Cognito::IdentityPool
+    Properties:
+      AllowUnauthenticatedIdentities: False
+      CognitoIdentityProviders:
+        - ClientId: !Ref ContentAnalysisWebAppClient
+          ProviderName: !GetAtt ContentAnalysisUserPool.ProviderName
+
+  # More hacky cfn for getting the role mapping
+  TransformedRoleMapping:
+    Type: Custom::TransformedRoleMapping
+    Properties:
+      ServiceToken: !GetAtt CognitoRoleMappingTransformer.Arn
+      Type: Token
+      AmbiguousRoleResolution: Deny
+      IdentityProvider:
+        'Fn::Join':
+          - ':'
+          - - 'Fn::GetAtt':
+                - ContentAnalysisUserPool
+                - ProviderName
+            - Ref: ContentAnalysisWebAppClient
+
+  CognitoStandardAuthDefaultRole:
+    Type: "AWS::IAM::Role"
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: "Allow"
+            Principal:
+              Federated: "cognito-identity.amazonaws.com"
+            Action:
+              - "sts:AssumeRoleWithWebIdentity"
+            Condition:
+              StringEquals:
+                "cognito-identity.amazonaws.com:aud": !Ref ContentAnalysisIdentityPool
+              "ForAnyValue:StringEquals":
+                "cognito-identity.amazonaws.com:amr": authenticated
+      Policies:
+        - PolicyName: !Sub "${AWS::StackName}-AuthNoGroup"
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              - Action: "*"
+                Resource: "*"
+                Effect: "Deny"
+
+  CognitoStandardUnauthDefaultRole:
+    Type: "AWS::IAM::Role"
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: "Allow"
+            Principal:
+              Federated: "cognito-identity.amazonaws.com"
+            Action:
+              - "sts:AssumeRoleWithWebIdentity"
+            Condition:
+              StringEquals:
+                "cognito-identity.amazonaws.com:aud": !Ref ContentAnalysisIdentityPool
+              "ForAnyValue:StringEquals":
+                "cognito-identity.amazonaws.com:amr": unauthenticated
+
+  ContentAnalysisIdentityPoolRoleMapping:
+    Type: AWS::Cognito::IdentityPoolRoleAttachment
+    Properties:
+      IdentityPoolId: !Ref ContentAnalysisIdentityPool
+      RoleMappings: !GetAtt TransformedRoleMapping.RoleMapping
+      Roles:
+        authenticated: !GetAtt CognitoStandardAuthDefaultRole.Arn
+        unauthenticated: !GetAtt CognitoStandardUnauthDefaultRole.Arn
+
+  ContentAnalysisAdminGroup:
+    Type: AWS::Cognito::UserPoolGroup
+    Properties:
+      Description: 'User group for AWS Content Analysis Admins'
+      RoleArn: !GetAtt ContentAnalysisAdminRole.Arn
+      UserPoolId: !Ref ContentAnalysisUserPool
+      GroupName: !Sub "${AWS::StackName}-Admins"
+
+  ContentAnalysisAdminAccount:
+    Type: AWS::Cognito::UserPoolUser
+    Properties:
+      DesiredDeliveryMediums:
+        - EMAIL
+      UserAttributes: [{"Name": "email", "Value": !Ref AdminEmail}]
+      Username: !Ref AdminEmail
+      UserPoolId: !Ref ContentAnalysisUserPool
+
+  # TODO: Need to add S3 put access to dataplane bucket on public/upload/*
+  ContentAnalysisAdminRole:
+    Type: "AWS::IAM::Role"
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: "Allow"
+            Principal:
+              Federated: "cognito-identity.amazonaws.com"
+            Action:
+              - "sts:AssumeRoleWithWebIdentity"
+            Condition:
+              StringEquals:
+                "cognito-identity.amazonaws.com:aud": !Ref ContentAnalysisIdentityPool
+              "ForAnyValue:StringEquals":
+                "cognito-identity.amazonaws.com:amr": authenticated
+      Policies:
+        - PolicyName:  !Sub "${AWS::StackName}-AdminPolicy"
+          PolicyDocument: !Sub
+            - |-
+              {
+                "Version": "2012-10-17",
+                "Statement": [
+                  {
+                    "Action": [
+                      "execute-api:Invoke"
+                    ],
+                    "Effect": "Allow",
+                    "Resource": ["arn:aws:execute-api:${region}:${account}:${wfapi}/*", "arn:aws:execute-api:${region}:${account}:${dataapi}/*"]
+                  },
+                  {
+                    "Action": [
+                      "s3:PutObject"
+                    ],
+                    "Effect": "Allow",
+                    "Resource": [
+                      "arn:aws:s3:::${dataplanebucket}/public/*"
+                    ]
+                  },
+                  {
+                    "Action": [
+                      "s3:ListBucket"
+                    ],
+                    "Effect": "Allow",
+                    "Resource": "arn:aws:s3:::${dataplanebucket}"
+                  },
+                  {
+                    "Action": [
+                      "es:*"
+                    ],
+                    "Effect": "Allow",
+                    "Resource": "${esdomain}/*"
+                  }
+                ]
+              }
+            - {
+              region: !Ref "AWS::Region",
+              account: !Ref "AWS::AccountId",
+              wfapi: !Ref WorkflowAPIRestID,
+              dataapi: !Ref DataplaneAPIRestID,
+              esdomain: !Ref ElasticDomainArn,
+              dataplanebucket: !Ref DataplaneBucket
+            }
+
+  AddAdminUserToAdminGroup:
+    DependsOn: ContentAnalysisAdminAccount
+    Type: AWS::Cognito::UserPoolUserToGroupAttachment
+    Properties:
+      GroupName: !Ref ContentAnalysisAdminGroup
+      Username: !Ref AdminEmail
+      UserPoolId: !Ref ContentAnalysisUserPool
+
+Outputs:
+  AdminRoleArn:
+    Value: !GetAtt ContentAnalysisAdminRole.Arn
+  UserPoolId:
+    Value: !Ref ContentAnalysisUserPool
+  IdentityPoolId:
+    Value: !Ref ContentAnalysisIdentityPool
+  UserPoolClientId:
+    Value: !Ref ContentAnalysisWebAppClient
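Note on cloudformation/aws-content-analysis-auth.yaml above: the `CognitoRoleMappingTransformer` function appears to exist because the key of the `RoleMappings` map must be the provider string `<ProviderName>:<ClientId>`, which is only known at deploy time. The sketch below reproduces the handler's data transformation as a pure function so it can be run locally without `cfnresponse`. It is an illustration, not code from this commit: `build_role_mapping`, the provider name, and the client ID are hypothetical stand-ins.

```python
import json


def build_role_mapping(properties):
    """Build the same response data shape as the inline custom resource handler."""
    provider = properties["IdentityProvider"]
    mapping = {provider: {"Type": properties["Type"]}}
    # Copy AmbiguousRoleResolution only when the custom resource supplies it.
    resolution = properties.get("AmbiguousRoleResolution")
    if resolution:
        mapping[provider]["AmbiguousRoleResolution"] = resolution
    return {"RoleMapping": mapping}


# Hypothetical values standing in for ContentAnalysisUserPool.ProviderName
# and the ContentAnalysisWebAppClient ID, joined with ':' as in the template.
provider_name = "cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE"
client_id = "exampleclientid"

sample_properties = {
    "Type": "Token",
    "AmbiguousRoleResolution": "Deny",
    "IdentityProvider": provider_name + ":" + client_id,
}

response_data = build_role_mapping(sample_properties)
print(json.dumps(response_data, indent=2))
```

The `RoleMapping` key of this result is what `ContentAnalysisIdentityPoolRoleMapping` reads through `!GetAtt TransformedRoleMapping.RoleMapping`.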