This repository provides a pattern for simple generative AI deployments on AWS. It aims to keep deployment easy, the architecture understandable, and the resources used as low-maintenance and cost-effective as possible.
This is achieved by using foundation models provided by SageMaker JumpStart and deploying them on a SageMaker Asynchronous Endpoint, which can scale to zero and requires no additional operations effort. The use of CDK Pipelines ensures repeatability and traceability of deployment and configuration activities.
While we chose Stable Diffusion for text-to-image generation as the implemented example here, it should be easy to adjust the code to fit other models and use cases, too. Keep in mind that the Lambda function working with the successful output of the SageMaker inference needs to be adjusted if your use case differs. Currently, it is programmed to work with JSON output containing image pixel RGB arrays. If you would rather use real-time than asynchronous inference, the endpoint stack can be adjusted accordingly.
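To illustrate the shape such post-processing takes, the following standard-library sketch turns an RGB pixel array from a JSON payload into PNG bytes. The payload shape and the generated_image key are assumptions for illustration, not taken from the actual Lambda code in this repository.

```python
import json
import struct
import zlib

def rgb_array_to_png(pixels):
    """Convert a nested [row][col][r, g, b] list (values 0-255) to PNG bytes."""
    height, width = len(pixels), len(pixels[0])
    # Raw image data: each scanline is prefixed with filter type 0 (None).
    raw = b"".join(
        b"\x00" + bytes(channel for pixel in row for channel in pixel)
        for row in pixels
    )

    def chunk(ctype, data):
        # A PNG chunk: 4-byte length, type, data, CRC over type + data.
        return (struct.pack(">I", len(data)) + ctype + data
                + struct.pack(">I", zlib.crc32(ctype + data)))

    # IHDR: width, height, bit depth 8, color type 2 (truecolor RGB),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw))
            + chunk(b"IEND", b""))

# A 2x2 example image from a JSON payload shaped like an inference result.
payload = json.loads(
    '{"generated_image": [[[255, 0, 0], [0, 255, 0]],'
    ' [[0, 0, 255], [255, 255, 255]]]}'
)
png_bytes = rgb_array_to_png(payload["generated_image"])
```

A real Lambda handler would additionally read the inference result from S3 and write the resulting PNG back to a bucket.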
If you just want to test this solution or actually need a scale-to-zero Stable Diffusion 2.1 endpoint deployed in eu-central-1, you can skip configuration - that is the default.
Otherwise, you can modify the TOML formatted configuration file config/config.toml to your needs. All adjustable parameters are described there in the comments.
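For orientation, a configuration excerpt might look like the sketch below. The section and parameter names here are hypothetical; the authoritative list of adjustable parameters and their descriptions is in the comments of config/config.toml itself.

```toml
# Hypothetical excerpt -- see config/config.toml for the real parameter
# names and their documented meanings.
[deployment]
region = "eu-central-1"                  # target AWS region
repository_name = "StableDiffusionService"
repository_branch = "main"

[endpoint]
model_id = "model-txt2img-..."           # illustrative JumpStart model id
instance_type = "ml.g4dn.xlarge"         # illustrative inference instance type
max_instance_count = 1                   # autoscaling upper bound
```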
To successfully deploy this project, you need to have the following software installed on your workstation:
- Python version 3.8 or newer
- Python Poetry
- git
- git-remote-codecommit
- Node.js, current LTS version recommended
These instructions have been tested on Linux and Mac. In case you are using Windows, you may either use a bash terminal in the Windows Subsystem for Linux or adjust the commands for setting and using the environment variables.
- Set your environment to the AWS account you want to deploy into.
- Bootstrap your account for CDK in the region you intend to deploy to in case this has not been done yet.
- Clone this repository and change your working directory into it.
- Create a Python virtual environment, activate it, and install the dependencies into it with
poetry shell
and
poetry install
- Some environment variables need to be set according to your settings in config/config.toml. For this example, the default values for Stable Diffusion are used.
export REPOSITORY_NAME=StableDiffusionService
export REPOSITORY_BRANCH=main
export AWS_REGION=eu-central-1
export INITIAL_DEPLOY=yes
The INITIAL_DEPLOY environment variable ensures that the CDK only works with the stacks and resources required for deploying the solution repository and a minimal pipeline. This eliminates the need to have Docker installed and running on the developer's machine and decreases the runtime of CDK commands there. We leave the heavy lifting to the execution within the AWS CodePipeline CodeBuild steps later.
- Deploy the CloudFormation stack creating the CodeCommit repository by executing
cdk deploy ${REPOSITORY_NAME}RepositoryStack
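The INITIAL_DEPLOY gating described above could be implemented along the lines of the following sketch of a CDK app entry point. The helper and stack names here are hypothetical, not the actual ones used in this repository.

```python
import os

def initial_deploy_only(env=None):
    """Return True when only the minimal bootstrap stacks should be defined.

    Hypothetical helper: the real app.py may structure this differently.
    """
    env = os.environ if env is None else env
    return env.get("INITIAL_DEPLOY", "").lower() == "yes"

# In the CDK app, the full application stacks (and with them the Docker
# asset bundling) would only be instantiated when this returns False, so
# the initial `cdk deploy` on the workstation stays lightweight and the
# pipeline adds the rest during self-mutation.
if initial_deploy_only({"INITIAL_DEPLOY": "yes"}):
    synthesized = ["RepositoryStack", "minimal pipeline"]
else:
    synthesized = ["RepositoryStack", "minimal pipeline", "full application stacks"]
```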
- Change the remote origin of this project to the new CodeCommit repository via
git remote set-url origin codecommit::${AWS_REGION}://${REPOSITORY_NAME}
- In case you modified config/config.toml,
git add
and
git commit
those changes.
- Push the code to your new repository with
git push origin ${REPOSITORY_BRANCH}
- Deploy the Service Stack by executing
cdk deploy ${REPOSITORY_NAME}ServiceStack
The configured CDK Pipeline will start deploying all defined resources and self-update to contain the full application stack (as the pipeline execution does not have the INITIAL_DEPLOY environment variable set). From this point on, you initiate changes in the architecture by simply committing to the repository and letting the pipeline take care of the rest.
Wait for the pipeline to finish.
You can use util/generate_image.py to test the image generation. The file util/test_request.json in the same folder works with the Stable Diffusion model configured in the default config.toml.
Example
./util/generate_image.py --request-input-file util/test_request.json
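Under the hood, an asynchronous invocation boils down to a call like the one sketched below with boto3. This is a generic sketch, not the actual contents of generate_image.py; the endpoint name and S3 URI are placeholders.

```python
def build_async_invocation(endpoint_name, input_s3_uri):
    """Keyword arguments for boto3's SageMakerRuntime
    invoke_endpoint_async call (sketch)."""
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,  # the request JSON must already be in S3
        "ContentType": "application/json",
    }

kwargs = build_async_invocation(
    "my-stable-diffusion-endpoint",                   # placeholder name
    "s3://my-bucket/async-inputs/test_request.json",  # placeholder URI
)
# With real AWS credentials you would then run:
#   client = boto3.client("sagemaker-runtime")
#   response = client.invoke_endpoint_async(**kwargs)
# and poll the S3 object at response["OutputLocation"] for the result.
```

Unlike real-time inference, the request body is not sent inline: it is staged in S3 first, and the response only points to where the output will appear.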
With the SageMaker Asynchronous Endpoint scaled to zero, it will take a few minutes for an instance to start up and serve the request. You can check the current state of the endpoint in the SageMaker console under the Inference submenu, Endpoints item. The generate_image.py command will show you the endpoint name, which you will find in the list. On its detail page, under Endpoint runtime settings, you will first see the Desired instance count increasing, then the Current instance count once an instance has started.
In the Monitor section, the View logs link leads to the CloudWatch console, where all log streams of the endpoint are available with the full inference execution details.
After this step is finished, you can check the logs of the two Lambda functions deployed with the solution. The ExtractImageFunction executes and logs the conversion of the RGB pixel array of the inference result to PNG files, while the SaveMessageFunction stores the execution info to S3.
Finally, you will find the generated images in the S3 bucket path that the generate_image.py output tells you.