This repository contains the code to deploy the infrastructure needed to run crowd-2d-skeleton labeling jobs. The architecture can also serve as a reference for how Amazon SageMaker Ground Truth custom UIs can be deployed and used.
The sections below describe what each component of the architecture is used for.
The Amazon S3 bucket will be used for:
- Storing the custom worker task template (also known as the custom UI template)
- Storing the crowd-2d-skeleton.js component code
- Storing the images for the labeling job
- Storing the manifest files for the labeling job
A bucket policy will be added to the bucket so that the created Origin Access Identity can access only the crowd-2d-skeleton.js component code. All other objects in the bucket will remain private.
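To make the restriction concrete, here is a minimal sketch of what such a bucket policy could look like, built as a plain Python dictionary. It is not taken from the CDK stack in this repository: the bucket name, the Origin Access Identity ID, and the assumption that the component sits at the root of the bucket are all hypothetical.

```python
import json


def build_component_read_policy(bucket_name: str, oai_id: str, component_key: str) -> dict:
    """Build a bucket policy that lets one OAI read exactly one object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontReadOfComponentOnly",
                "Effect": "Allow",
                "Principal": {
                    "AWS": (
                        "arn:aws:iam::cloudfront:user/"
                        f"CloudFront Origin Access Identity {oai_id}"
                    )
                },
                "Action": "s3:GetObject",
                # The resource names a single object, so images, manifests,
                # and the UI template are not readable through this statement.
                "Resource": f"arn:aws:s3:::{bucket_name}/{component_key}",
            }
        ],
    }


policy = build_component_read_policy(
    bucket_name="example-labeling-bucket",  # hypothetical bucket name
    oai_id="E2EXAMPLE12345",  # hypothetical Origin Access Identity ID
    component_key="crowd-2d-skeleton.js",  # assumed object key
)
print(json.dumps(policy, indent=2))
```

Because the statement grants only s3:GetObject on one object key, everything else in the bucket stays private unless another statement allows it.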
The Amazon CloudFront Distribution will be used to host the crowd-2d-skeleton.js JavaScript code, which the custom labeling UI will load. The distribution will be assigned an Origin Access Identity, which allows it to read the crowd-2d-skeleton.js file residing in the Amazon S3 bucket.
The pre-annotation lambda will process line items from the input manifest file before the manifest data is injected into the custom UI template. For more information on pre-annotation lambda functions see: Processing with AWS Lambda
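As an illustration of the pre-annotation step, the sketch below shows the general shape of a Ground Truth pre-annotation handler: it receives one manifest line under dataObject and returns a taskInput object plus the isHumanAnnotationRequired flag. It is not the code in pre_annotation_lambda; the taskInput keys image_s3_uri and initial_annotations are hypothetical names chosen for this example.

```python
def lambda_handler(event, context):
    """Turn one input-manifest line into the task input for the UI template."""
    data_object = event["dataObject"]
    # Image manifests usually carry the S3 URI under "source-ref".
    image_uri = data_object.get("source-ref")

    return {
        "taskInput": {
            "image_s3_uri": image_uri,  # hypothetical key name
            # Hypothetical: pass through any pre-existing annotations, if present.
            "initial_annotations": data_object.get("initial_annotations", []),
        },
        # Ground Truth skips the human task for this item when this is "false".
        "isHumanAnnotationRequired": "true",
    }


sample_event = {
    "version": "2018-10-16",
    "labelingJobArn": "arn:aws:sagemaker:us-east-1:111122223333:labeling-job/example",
    "dataObject": {"source-ref": "s3://example-labeling-bucket/images/frame_0001.jpg"},
}
result = lambda_handler(sample_event, None)
print(result["taskInput"]["image_s3_uri"])
```

Inside the custom UI template, values returned under taskInput are available through the task.input object, for example task.input.image_s3_uri with the key used here.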
The post-annotation lambda will process the labeling results after all labelers have finished labeling or the labeling job has expired. This lambda is responsible for formatting the data for the labeling job output results. For more information on post-annotation lambda functions see: Processing with AWS Lambda
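The formatting step can be sketched as follows. This example assumes the annotation payload has already been downloaded from the S3 URI that Ground Truth passes to the function, and it simply collects every worker's answer under the label attribute name. It is not the code in post_annotation_lambda; the keypoints structure and the annotationsFromAllWorkers key are hypothetical.

```python
import json


def consolidate_annotations(payload: list, label_attribute_name: str) -> list:
    """Build one consolidated record per dataset object in the payload."""
    consolidated = []
    for dataset_object in payload:
        worker_results = []
        for annotation in dataset_object["annotations"]:
            # Each worker's answer arrives as a JSON-encoded string.
            content = json.loads(annotation["annotationData"]["content"])
            worker_results.append(
                {"workerId": annotation["workerId"], "annotationData": content}
            )
        consolidated.append(
            {
                "datasetObjectId": dataset_object["datasetObjectId"],
                "consolidatedAnnotation": {
                    "content": {
                        # Hypothetical output shape: keep all answers side by side.
                        label_attribute_name: {"annotationsFromAllWorkers": worker_results}
                    }
                },
            }
        )
    return consolidated


sample_payload = [
    {
        "datasetObjectId": "0",
        "dataObject": {"s3Uri": "s3://example-labeling-bucket/images/frame_0001.jpg"},
        "annotations": [
            {
                "workerId": "private.us-east-1.example-worker",
                "annotationData": {
                    "content": json.dumps({"keypoints": [{"x": 10, "y": 20}]})
                },
            }
        ],
    }
]
output = consolidate_annotations(sample_payload, "skeleton-label")
print(json.dumps(output, indent=2))
```

A real handler would first read the payload from S3 and could merge or filter the workers' answers instead of keeping them all.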
This role is created to give the Amazon SageMaker Ground Truth labeling job the ability to invoke the lambda functions and to read the S3 objects (i.e. images, manifest files, and custom UI template) in the Amazon S3 Bucket.
The key components of the repository are listed below:
.
├── cdk/
│ ├── ground_truth_templates <-- custom UI template
│ ├── libs
│ ├── post_annotation_lambda <-- Post-annotation code
│ ├── pre_annotation_lambda <-- Pre-annotation code
│ └── crowd_2d_skeletong_example_stack.py <-- CDK details of the stack
├── docs <-- images for the documentation
├── scripts <-- Post deployment scripts & example job launching scripts
└── app.py <-- CDK entry point
- Python 3.9+
- AWS Account
- AWS Cloud Development Kit (AWS CDK)
- AWS SageMaker Ground Truth Private Workforce
This project is set up like a standard Python project. The initialization
process also creates a virtualenv within this project, stored under the .venv
directory. To create the virtualenv it assumes that there is a python3
(or python for Windows) executable in your path with access to the venv
package. If for any reason the automatic creation of the virtualenv fails,
you can create the virtualenv manually.
To manually create a virtualenv on macOS and Linux:
$ python -m venv .venv
After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.
$ source .venv/bin/activate
If you are on a Windows platform, you would activate the virtualenv like this:
% .venv\Scripts\activate.bat
Once the virtualenv is activated, you can install the required dependencies.
$ pip install -r requirements.txt
This repository assumes that you have used the AWS Cloud Development Kit (CDK) before. If this is your first CDK project, you may want to familiarize yourself with it by reading the CDK documentation, which can be found here: AWS Cloud Development Kit.
To synthesize the CloudFormation template for the stack, run:
$ cdk synth
To deploy the CDK stack run:
$ cdk deploy
Not all deployment steps can be done in CDK. In our case, we need to update the HTML UI template to point to the newly hosted JavaScript that was deployed in the previous step. The post-deployment script handles these steps. Activate your Python environment and run:
$ python scripts/post_deployment_script.py
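To illustrate the kind of update this step performs, the sketch below rewrites the src attribute of the script tag that loads crowd-2d-skeleton.js so that it points at a new URL. This is an assumption about the general idea, not a description of what scripts/post_deployment_script.py actually does; the function name, the template snippet, and both URLs are hypothetical.

```python
import re


def point_template_at_script(template_html: str, script_url: str) -> str:
    """Rewrite the src of the <script> tag that loads crowd-2d-skeleton.js."""
    pattern = re.compile(r'(<script[^>]*\bsrc=")[^"]*crowd-2d-skeleton\.js(")')
    # A function replacement avoids backslash-escape surprises in script_url.
    updated, replacements = pattern.subn(
        lambda match: match.group(1) + script_url + match.group(2), template_html
    )
    if replacements == 0:
        raise ValueError("no <script> tag referencing crowd-2d-skeleton.js was found")
    return updated


# Hypothetical stand-in for the custom UI template.
template = (
    '<script src="https://placeholder.example/crowd-2d-skeleton.js"></script>\n'
    "<crowd-form></crowd-form>"
)
updated_template = point_template_at_script(
    template,
    "https://d111111abcdef8.cloudfront.net/crowd-2d-skeleton.js",  # hypothetical URL
)
print(updated_template)
```

Raising an error when nothing matches makes a silently unchanged template harder to miss.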
Once the previous steps have completed, you can create labeling jobs using the
created infrastructure. For an example of how to do this programmatically,
see scripts/create_example_labeling_job.py.
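For orientation, the sketch below assembles the request that the SageMaker create_labeling_job API expects for a custom-UI job, wiring together the components described earlier (manifest, UI template, the two lambda functions, the role, and a private work team). It is not taken from scripts/create_example_labeling_job.py; every ARN, S3 URI, name, and task setting shown is a hypothetical placeholder.

```python
def build_labeling_job_request(settings: dict) -> dict:
    """Assemble keyword arguments for the SageMaker create_labeling_job call."""
    return {
        "LabelingJobName": settings["job_name"],
        "LabelAttributeName": settings["label_attribute_name"],
        "InputConfig": {
            "DataSource": {"S3DataSource": {"ManifestS3Uri": settings["manifest_s3_uri"]}}
        },
        "OutputConfig": {"S3OutputPath": settings["output_s3_path"]},
        "RoleArn": settings["role_arn"],
        "HumanTaskConfig": {
            "WorkteamArn": settings["workteam_arn"],
            "UiConfig": {"UiTemplateS3Uri": settings["ui_template_s3_uri"]},
            "PreHumanTaskLambdaArn": settings["pre_lambda_arn"],
            "TaskTitle": "Label 2D skeleton keypoints",  # hypothetical title
            "TaskDescription": "Place each skeleton keypoint on the image.",
            "NumberOfHumanWorkersPerDataObject": 1,
            "TaskTimeLimitInSeconds": 3600,
            "AnnotationConsolidationConfig": {
                "AnnotationConsolidationLambdaArn": settings["post_lambda_arn"]
            },
        },
    }


# All values below are hypothetical placeholders.
example_settings = {
    "job_name": "skeleton-example-job",
    "label_attribute_name": "skeleton-label",
    "manifest_s3_uri": "s3://example-labeling-bucket/manifests/input.manifest",
    "output_s3_path": "s3://example-labeling-bucket/output/",
    "role_arn": "arn:aws:iam::111122223333:role/example-ground-truth-role",
    "workteam_arn": "arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/example",
    "ui_template_s3_uri": "s3://example-labeling-bucket/templates/template.html",
    "pre_lambda_arn": "arn:aws:lambda:us-east-1:111122223333:function:example-pre",
    "post_lambda_arn": "arn:aws:lambda:us-east-1:111122223333:function:example-post",
}
request = build_labeling_job_request(example_settings)

# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("sagemaker").create_labeling_job(**request)
```

Keeping the request construction separate from the API call makes it possible to inspect or test the job configuration without touching AWS.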
- cdk ls: list all stacks in the app
- cdk synth: emits the synthesized CloudFormation template
- cdk deploy: deploy this stack to your default AWS account/region
- cdk diff: compare deployed stack with current state
- cdk docs: open CDK documentation
- cdk destroy: destroy the stack
See CONTRIBUTING for more information.
Install the pre-commit hooks:
pip install pre-commit
pre-commit install
This library is licensed under the MIT-0 License. See the LICENSE file.