awslabs/aws-greengrass-labs-certificate-rotator

AWS Greengrass Labs Certificate Rotator

Device certificate and private key rotation is a security best practice defined in the IoT Lens for the AWS Well-Architected Framework. At the time of writing, however, AWS IoT does not offer a device certificate rotation service or feature. Application builders must implement both the device software and the cloud backend to achieve device certificate rotation.

To guide you in this implementation, AWS offers a device certificate rotation blog, an IoT Jumpstart workshop and the Connected Device Framework (CDF) Certificate Vendor module. These are documented rotation procedures and/or partial implementations.

In general, it's challenging to offer a full end-to-end device certificate and private key rotation reference implementation because the device software is heavily dependent on the device hardware. In particular, certificate and private key storage and APIs are strongly influenced by the hardware and the Hardware Abstraction Layer (HAL). However, AWS IoT Greengrass presents an opportunity to build a reference implementation because it standardizes the certificate and private key storage via the Greengrass Core software installation configuration. The locations of the certificate and private key are defined by the certificateFilePath and privateKeyPath configuration parameters. These may be defined as files on disk or as PKCS#11 objects in a Hardware Security Module (HSM).
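As a minimal sketch of how device software might tell the two storage options apart, the function below classifies a certificateFilePath or privateKeyPath value. It assumes that PKCS#11 objects are expressed as URIs beginning with the pkcs11: scheme; the CredentialLocation type and classify_credential name are hypothetical and are not taken from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialLocation:
    kind: str   # "file" or "pkcs11"
    value: str  # the configuration value, unchanged


def classify_credential(value: str) -> CredentialLocation:
    """Classify a certificateFilePath or privateKeyPath configuration value.

    Assumption: HSM-backed objects are written as "pkcs11:..." URIs and
    anything else is treated as a path to a file on disk.
    """
    if value.lower().startswith("pkcs11:"):
        return CredentialLocation("pkcs11", value)
    return CredentialLocation("file", value)


print(classify_credential("/greengrass/v2/device.pem.crt").kind)
print(classify_credential("pkcs11:object=iotkey;type=private").kind)
```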

Consequently this repository delivers a full end-to-end device certificate and private key rotation reference implementation for Greengrass core devices. A Greengrass component named aws.greengrass.labs.CertificateRotator delivers the device part of the contract and an AWS Cloud Development Kit (CDK) stack delivers the companion cloud backend. The component supports credentials stored as files or stored in an HSM. The cloud backend supports certificates issued by either AWS IoT Core or by AWS Private Certificate Authority (CA). The cloud backend can also be adopted by non-Greengrass devices, with the device software developed to match the functionality of the Greengrass component.

Repository Contents

Item Description
/artifacts Greengrass V2 component artifacts that run on the Greengrass edge runtime.
/backend CDK TypeScript application for the cloud backend.
/cicd CDK TypeScript application for the CodePipeline CI/CD pipeline.
/images Images for README files.
/libs Python libraries shared by Python scripts.
/robot Robot Framework integration tests.
/tests Pytest unit tests.
deploy_component_version.py Deploys a component version to a Thing group of Greengrass core devices.
gdk_build.py Custom build script for the Greengrass Development Kit (GDK) - Command Line Interface.
gdk-config.json Configuration for the Greengrass Development Kit (GDK) - Command Line Interface.
recipe.yaml Greengrass V2 component recipe template.

System Design

Architecture

An overview of the system architecture is presented below. The solution consists of the Greengrass Certificate Rotator component and a cloud backend composed principally of three Lambda functions, three AWS IoT Core rules, an AWS IoT custom job template and a Simple Notification Service (SNS) topic. Certificates are issued by either AWS IoT Core or AWS Private CA.

The sequence of MQTT message topics for a successful certificate and private key rotation is also shown.

[Figure: certificate-rotation-architecture]

As can be seen, the Certificate Rotator component and backend make use of both AWS IoT Jobs and custom MQTT topics. The cloud application initiates a certificate and private key rotation by creating an AWS IoT custom job using the custom job template. This creates a certificate rotation job that targets one or more Things. Trigger conditions and business logic for the job creation are left to the application developer. In other words, this solution provides the means of rotating a device certificate and private key without dictating when or why it should be done.

With the AWS IoT job bookending the process, custom topics are used to create and commit a new device certificate. The SNS topic is used to notify subscribers of any certificate rotation failures.

Message Sequence

With reference to the architecture diagram:

Topic Description
$aws/things/thingName/jobs/notify-next The Certificate Rotator component is notified of a new certificate rotation job.
$aws/things/thingName/jobs/jobId/update The component updates the job execution status to IN_PROGRESS.
$aws/things/thingName/jobs/jobId/update/accepted The cloud notifies the component that the job execution update is accepted. The Certificate Rotator component generates a new private key and creates a Certificate Signing Request (CSR) from it.
awslabs/things/thingName/certificate/create The component sends the CSR to the cloud backend, along with the job ID so that strict chain of custody can be maintained. The create-certificate Lambda validates the creation request and creates the new certificate, using either AWS IoT or AWS Private CA. It attaches IoT policies to the new certificate and attaches the new certificate to the Thing.
awslabs/things/thingName/certificate/create/accepted The cloud backend returns the new certificate to the Certificate Rotator component. The component backs up the old certificate, installs the new certificate on disk or in the HSM, and restarts the Greengrass service so that it will attempt to connect using the new certificate and private key.
$aws/things/thingName/jobs/$next/get Upon restarting, the component attempts to get the latest job context.
$aws/things/thingName/jobs/jobId/get/accepted If the new certificate and private key are good, the job execution status is returned by the cloud and the status is IN_PROGRESS. (Should the component fail to receive this message, rollback would begin.)
awslabs/things/thingName/certificate/commit The component asks the cloud backend to attempt to commit to the new certificate. The request includes the job ID so that strict chain of custody can be maintained. The commit-certificate Lambda validates the request and verifies that the principal used to make the MQTT connection is the new certificate.
awslabs/things/thingName/certificate/commit/accepted The cloud backend notifies the component that the new certificate is good and can be committed. The component deletes the old certificate and private key.
$aws/things/thingName/jobs/jobId/update The component updates the job execution status to SUCCEEDED. The job-execution-terminal Lambda deletes the old certificate from the AWS IoT Core registry.
$aws/things/thingName/jobs/jobId/update/accepted The cloud notifies the component that the job is completed.
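The topics in the table above can be generated for a concrete Thing and job, which is handy when writing IoT policies or tracing a rotation in the MQTT test client. The following is only an illustrative sketch: the rotation_topics function is hypothetical and simply substitutes thingName and jobId into the topic patterns listed above, in the order of a successful rotation.

```python
def rotation_topics(thing_name: str, job_id: str) -> list[str]:
    """Return the MQTT topics of a successful rotation, in sequence order."""
    jobs = f"$aws/things/{thing_name}/jobs"
    cert = f"awslabs/things/{thing_name}/certificate"
    return [
        f"{jobs}/notify-next",               # new rotation job notified
        f"{jobs}/{job_id}/update",           # status -> IN_PROGRESS
        f"{jobs}/{job_id}/update/accepted",
        f"{cert}/create",                    # CSR sent to the backend
        f"{cert}/create/accepted",           # new certificate returned
        f"{jobs}/$next/get",                 # after the Greengrass restart
        f"{jobs}/{job_id}/get/accepted",
        f"{cert}/commit",                    # ask the backend to commit
        f"{cert}/commit/accepted",
        f"{jobs}/{job_id}/update",           # status -> SUCCEEDED
        f"{jobs}/{job_id}/update/accepted",
    ]


for topic in rotation_topics("MyCoreDevice", "certrot1"):
    print(topic)
```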

Should any errors occur, the job execution will terminate with state FAILED or TIMED_OUT causing the job-execution-terminal Lambda to issue a notification on the SNS topic. If an error occurs after the new certificate and private key are installed on Greengrass, a rollback will occur.

Component State Machine

The Certificate Rotator component implements a state machine as illustrated below. The blue numbers indicate the progression normally followed during a successful rotation.

[Figure: certificate-rotation-component-fsm]
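For readers without access to the diagram, the happy path of the message sequence can be modelled as a small transition table. This is a sketch under stated assumptions, not the component's actual state machine: the state names and event labels below are hypothetical, derived only from the message sequence table, and rollback and timeout handling are deliberately not modelled.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names; the component's real names may differ.
    IDLE = auto()
    STARTING_JOB = auto()
    CREATING_CERTIFICATE = auto()
    GETTING_JOB = auto()
    COMMITTING_CERTIFICATE = auto()
    COMPLETING_JOB = auto()


# (current state, received message) -> next state, successful rotation only.
TRANSITIONS = {
    (State.IDLE, "notify-next"): State.STARTING_JOB,
    (State.STARTING_JOB, "update/accepted"): State.CREATING_CERTIFICATE,
    (State.CREATING_CERTIFICATE, "create/accepted"): State.GETTING_JOB,
    (State.GETTING_JOB, "get/accepted"): State.COMMITTING_CERTIFICATE,
    (State.COMMITTING_CERTIFICATE, "commit/accepted"): State.COMPLETING_JOB,
    (State.COMPLETING_JOB, "update/accepted"): State.IDLE,
}


def next_state(state: State, event: str) -> State:
    """Advance on a recognised message; ignore anything unexpected."""
    return TRANSITIONS.get((state, event), state)
```

Keying the table on both the state and the message lets the same message (update/accepted) mean different things at the start and at the end of a rotation.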

Design Goals

This component and cloud backend take inspiration from the device certificate rotation blog, the IoT Jumpstart workshop and the Connected Device Framework (CDF) Certificate Vendor module. All three approaches have their merits, but this repository attempts to deliver an enhanced solution with the following design goals:

  1. Rotate both the private key and the device certificate.
  2. Use AWS IoT Jobs to encapsulate the rotation process and manage fleets at scale. This allows fleet operators to:
  3. Do not mandate AWS IoT Device Defender device certificate expiring check as the only rotation trigger. You should be free to choose an appropriate trigger to suit your own business needs.
  4. In addition to AWS IoT certificates, support AWS Private Certificate Authority (CA) as the CA that issues device certificates. This gives you the option to use your own CA and therefore control device certificate expiry dates. The AWS IoT Device Defender device certificate expiring check only makes sense as a trigger if you can control the expiry date.
  5. Have the cloud backend explicitly check what principal was used to authenticate the connection when the core device first connects with the new certificate. This is to guard against the device erroneously using its old certificate and mistakenly thinking it used the new one (a potentially very serious error).
  6. Have the cloud backend demand that the MQTT client ID match the Thing name, as per best practice. This guards against mismatch between things as job targets and MQTT client names.
  7. Likewise demand that each device has just one device certificate attached to it (in the AWS IoT Core registry) and each device has a unique device certificate.
  8. Maintain strict chain of custody, ensuring that certificate create and commit requests originate from a Thing that is part of an in-progress rotation job. This guards against new certificates being issued to bad actors.
  9. Be resilient to unexpected loss of connection or loss of messages at inconvenient times.
  10. Be resilient to power loss, service restart or device reboot during rotation.
  11. Support both Windows and Linux core devices.
  12. Support devices that use Hardware Security Modules (HSMs).
  13. Support the certificate signing algorithms supported by AWS IoT.
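Design goals 5, 6 and 8 amount to a set of checks the backend applies to a commit request. The sketch below illustrates that logic only; it is not the repository's Lambda code. The function name and parameters are hypothetical, and how the backend obtains the client ID, connection principal and job status is assumed to happen elsewhere.

```python
def commit_request_errors(
    client_id: str,
    thing_name: str,
    connection_principal: str,
    new_certificate_id: str,
    job_status: str,
) -> list[str]:
    """Return the reasons a commit request should be rejected (empty if none)."""
    errors = []
    # Goal 6: the MQTT client ID must match the Thing name.
    if client_id != thing_name:
        errors.append("MQTT client ID does not match the Thing name")
    # Goal 8: the request must belong to an in-progress rotation job.
    if job_status != "IN_PROGRESS":
        errors.append("no in-progress rotation job for this Thing")
    # Goal 5: the connection must have been authenticated with the new certificate.
    if connection_principal != new_certificate_id:
        errors.append("connection was not authenticated with the new certificate")
    return errors
```

Returning every failed check, rather than stopping at the first, makes the rejection easier to diagnose from logs.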

YouTube Video

This video guides you in how to deploy and use this solution.

Watch the video

Requirements and Prerequisites

Greengrass Core Device

Platform

This component supports all platforms and architectures supported by Greengrass itself.

Python Requirements

This component requires python3, python3-venv and pip3 to be installed on the core device.

Hardware Security Module

If the Greengrass core device uses a Hardware Security Module (HSM), it must support the following PKCS#11 API operations in addition to the PKCS#11 API operations required by Greengrass:

  • C_CopyObject
  • C_DestroyObject
  • C_GenerateKeyPair

Edge Runtime

The Greengrass edge runtime needs to be installed on a suitable machine, virtual machine or EC2 instance. It must be installed as a system service.

Interpolation of component recipe variables must be enabled using the interpolateComponentConfiguration setting. This requires Nucleus 2.6.0 or later. This setting should be applied before deploying the aws.greengrass.labs.CertificateRotator component.

If you use AWS Private CA to issue certificates, the greengrassDataPlaneEndpoint setting should be set to iotdata.

Since Greengrass does not expose IoT Core connection status to components, this component uses QoS 0 to ensure timely delivery (or failure) of messages. Consequently it is recommended to leave the keepQos0WhenOffline setting at its default of disabled so that QoS 0 messages are not spooled.
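The Nucleus settings above can be assembled into a configuration merge document. The following is a minimal sketch, assuming the settings are applied through a Nucleus configuration update in a deployment; the nucleus_merge_config function is hypothetical. The keepQos0WhenOffline setting is intentionally omitted so that it stays at its default.

```python
import json


def nucleus_merge_config(use_private_ca: bool = False) -> str:
    """Build a JSON merge document for the aws.greengrass.Nucleus component."""
    config = {
        # Required so that recipe variables are interpolated (Nucleus 2.6.0+).
        "interpolateComponentConfiguration": True,
    }
    if use_private_ca:
        # Only needed when AWS Private CA issues the certificates.
        config["greengrassDataPlaneEndpoint"] = "iotdata"
    return json.dumps(config)


print(nucleus_merge_config(use_private_ca=True))
```

The resulting string is the kind of value a deployment expects as the merge field of a component configuration update.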

Greengrass Cloud Services

Core Device Role

Assuming the bucket name in gdk-config.json is left unchanged, this component downloads artifacts from an S3 bucket named greengrass-certificate-rotator-REGION-ACCOUNT. Therefore your Greengrass core device role must allow the s3:GetObject permission for this bucket. For more information: https://docs.aws.amazon.com/greengrass/v2/developerguide/device-service-role.html#device-service-role-access-s3-bucket

Policy template to add to your device role (substituting correct values for ACCOUNT and REGION):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::greengrass-certificate-rotator-REGION-ACCOUNT/*"
    }
  ]
}
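To avoid hand-editing the placeholders, the template can be filled in programmatically. This is a small illustrative sketch; the device_role_policy function is hypothetical, and the region and account values shown are examples only.

```python
import json


def device_role_policy(region: str, account: str) -> dict:
    """Fill REGION and ACCOUNT into the device role policy template."""
    bucket = f"greengrass-certificate-rotator-{region}-{account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


print(json.dumps(device_role_policy("us-east-1", "012345678901"), indent=2))
```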

IoT Policy

The AWS IoT Policy for the Greengrass core device must grant the ability to publish, subscribe and receive from the AWS IoT job topics and the rotation custom topics. The full list of topics used by the Certificate Rotator component is available in recipe.yaml.

Developer Machine

AWS CLI

It may be necessary to upgrade your AWS CLI if you wish to use any greengrassv2 commands, as these are relatively recent additions to the CLI.

AWS CDK

The cloud backend is a TypeScript CDK application. Follow the Getting started with the AWS CDK guide (for TypeScript) to install CDK and bootstrap your environment.

Python

Most of the scripts in this repository are Python scripts. They are Python 3 scripts and hence python3 and pip3 are required.

Package dependencies can be resolved as follows:

pip3 install -r requirements.txt

Consider using a virtual environment.

Boto3 is included in the package dependencies and therefore your machine requires appropriate credentials.

GDK CLI

This component makes use of the Greengrass Development Kit (GDK) - Command Line Interface (CLI). This can be installed as follows:

pip3 install git+https://github.com/aws-greengrass/aws-greengrass-gdk-cli.git@v1.6.2

Getting Started

Please ensure that all Requirements and Prerequisites have been met before deploying the component or the cloud backend.

Cloud Backend

To build and deploy the cloud backend CDK application:

  1. Change into the backend subdirectory.
  2. Run npm install to install the required Node modules.
  3. Run npm run build to build the cloud backend.
  4. Run cdk deploy to synthesize and deploy the cloud backend.

Example execution:

cd backend
npm install
npm run build
cdk deploy

This example deploys the cloud backend with AWS IoT selected as the service for issuing certificates. To issue certificates using AWS Private CA, please refer to the cloud backend README.

Component

Build and Publish

To build and publish the Certificate Rotator Greengrass component:

  1. Set the AWS region in gdk-config.json.
  2. Run gdk component build to build the component.
  3. Run gdk component publish to create a component version in Greengrass cloud service, and upload artifacts to S3.

Example execution:

gdk component build
gdk component publish

Deploy

The component can be deployed using the console or using the AWS CLI in the normal way. Alternatively it can be deployed using the supplied deploy_component_version.py script. Example execution of the convenience script:

python3 deploy_component_version.py 1.0.0 MyCoreDevicesThingGroup

This deploys the new component version to a Greengrass core device thing group named MyCoreDevicesThingGroup.

Notifications

The cloud backend creates an SNS topic named AWSLabsCertificateRotatorNotification. Subscribe to this topic to receive notifications of any rotation job executions that conclude as FAILED or TIMED_OUT.
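Subscribing can be done from the console, or scripted. The sketch below assumes a boto3 SNS client is passed in (for example boto3.client("sns")) and that the topic lives in the given region and account; the helper names are hypothetical.

```python
def notification_topic_arn(region: str, account: str) -> str:
    """Build the ARN of the notification topic created by the cloud backend."""
    return f"arn:aws:sns:{region}:{account}:AWSLabsCertificateRotatorNotification"


def subscribe_email(sns_client, region: str, account: str, email: str) -> str:
    """Subscribe an email address to rotation failure notifications.

    sns_client is expected to be a boto3 SNS client. The recipient must
    confirm the subscription before notifications are delivered.
    """
    response = sns_client.subscribe(
        TopicArn=notification_topic_arn(region, account),
        Protocol="email",
        Endpoint=email,
    )
    return response["SubscriptionArn"]
```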

Certificate Rotation Jobs

The cloud backend creates an AWS IoT custom job template named AWSLabsCertificateRotator. This template can be used to create certificate rotation jobs.

The template has job rollout and job executions timeout configurations that should be sensible defaults for most fleets. It does not set any scheduling configuration because this is very domain dependent. It also does not set any job executions retry configuration nor any abort configuration. All of these configurations can be set or modified when a job is created from the template. Alternatively you can create a new custom template to suit your needs, so long as it has the same job document.

Job templates can be used to create certificate rotation jobs in several ways.

Console

The AWS IoT Console offers a flexible interface for creating jobs from the job template. It allows the user to conveniently modify the job configuration.

[Figure: job-template-console]

Fleet Hub

Fleet Hub offers a simple mechanism for creating jobs from job templates. However it does not presently allow the user to modify the job configuration.

[Figure: job-template-fleet-hub]

CLI

The AWS CLI offers the greatest flexibility and options. The following example illustrates a simple creation of a job from the template, without modifying the job configuration.

aws iot create-job --job-id certrot1 --targets arn:aws:iot:us-east-1:012345678901:thinggroup/GreengrassEC2DeviceFarm --target-selection SNAPSHOT --job-template-arn arn:aws:iot:us-east-1:012345678901:jobtemplate/AWSLabsCertificateRotator
{
    "jobArn": "arn:aws:iot:us-east-1::job/certrot1",
    "jobId": "certrot1"
}
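The same job can be created from Python, which is convenient when the rotation trigger is itself implemented in code. This is a minimal sketch that mirrors the CLI example above; it assumes a boto3 IoT client is passed in (for example boto3.client("iot")), and the create_rotation_job wrapper is hypothetical.

```python
def create_rotation_job(iot_client, job_id: str, target_arns: list[str],
                        template_arn: str) -> dict:
    """Create a certificate rotation job from the custom job template.

    Equivalent to the "aws iot create-job" example above: the job
    configuration comes from the template unchanged.
    """
    return iot_client.create_job(
        jobId=job_id,
        targets=target_arns,
        targetSelection="SNAPSHOT",
        jobTemplateArn=template_arn,
    )


# Example call, using the same illustrative ARNs as the CLI example:
# create_rotation_job(
#     boto3.client("iot"),
#     "certrot1",
#     ["arn:aws:iot:us-east-1:012345678901:thinggroup/GreengrassEC2DeviceFarm"],
#     "arn:aws:iot:us-east-1:012345678901:jobtemplate/AWSLabsCertificateRotator",
# )
```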

Configuration

Component Configuration

The aws.greengrass.labs.CertificateRotator component supports just two configuration parameters: keyAlgorithm and signingAlgorithm.

The key algorithm is the algorithm the component will use for any new private key it generates when a rotation is performed. The supported algorithms match the values that can be used with the CreateCertificateFromCsr operation.

The signing algorithm is what the component will use to sign the Certificate Signing Request (CSR) during a rotation. The supported signing algorithms are the certificate signing algorithms supported by AWS IoT, with the exception of DSA_WITH_SHA256. Although AWS IoT supports certificates signed with this algorithm, the CreateCertificateFromCsr operation cannot use it.

Key: keyAlgorithm
Allowed values: RSA-2048, RSA-3072, ECDSA-P256 [1], ECDSA-P384 [1], ECDSA-P521 [1][2]
Default: RSA-2048

Key: signingAlgorithm
Allowed values: SHA256WITHRSA, SHA384WITHRSA, SHA512WITHRSA, SHA256WITHRSAANDMGF1, SHA384WITHRSAANDMGF1, SHA512WITHRSAANDMGF1, ECDSA-WITH-SHA256, ECDSA-WITH-SHA384, ECDSA-WITH-SHA512
Default: SHA256WITHRSA

[1] ECDSA keys currently cannot be used with Windows devices: awslabs/aws-c-io#260
[2] ECDSA-P521 keys currently cannot be used with HSMs and PKCS #11: awslabs/aws-c-io#591

Default values are defined in recipe.yaml.

The component validates configuration updates. If there is an attempt to merge invalid settings, the deployment will fail. The encryption family (RSA or EC) of the signing algorithm has to match that of the key algorithm.

If an HSM is used, the component can only use the subset of algorithms supported by the HSM.
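The family-matching rule can be expressed as a simple lookup. The sketch below illustrates the rule only and is not the component's own validation code; the dictionaries and the is_valid_configuration function are hypothetical, built from the allowed values listed above. HSM-specific restrictions are not modelled.

```python
# Allowed values mapped to their family (RSA or EC).
KEY_FAMILIES = {
    "RSA-2048": "RSA",
    "RSA-3072": "RSA",
    "ECDSA-P256": "EC",
    "ECDSA-P384": "EC",
    "ECDSA-P521": "EC",
}

SIGNING_FAMILIES = {
    "SHA256WITHRSA": "RSA",
    "SHA384WITHRSA": "RSA",
    "SHA512WITHRSA": "RSA",
    "SHA256WITHRSAANDMGF1": "RSA",
    "SHA384WITHRSAANDMGF1": "RSA",
    "SHA512WITHRSAANDMGF1": "RSA",
    "ECDSA-WITH-SHA256": "EC",
    "ECDSA-WITH-SHA384": "EC",
    "ECDSA-WITH-SHA512": "EC",
}


def is_valid_configuration(key_algorithm: str, signing_algorithm: str) -> bool:
    """True if both values are allowed and belong to the same family."""
    key_family = KEY_FAMILIES.get(key_algorithm)
    signing_family = SIGNING_FAMILIES.get(signing_algorithm)
    return key_family is not None and key_family == signing_family
```

For example, the defaults (RSA-2048 with SHA256WITHRSA) pass, whereas RSA-2048 with ECDSA-WITH-SHA256 does not.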

Cloud Backend Configuration

For details on configuring the cloud backend, please refer to the cloud backend README.

Troubleshooting

Tips for troubleshooting rotation jobs that conclude as FAILED or TIMED_OUT.

Troubleshooting Tools

Core Device: Component Log File

Linux:

/greengrass/v2/logs/aws.greengrass.labs.CertificateRotator.log

Windows:

C:\greengrass\v2\logs\aws.greengrass.labs.CertificateRotator.log

Cloud Backend: Lambda CloudWatch Log Groups

/aws/lambda/AWSLabsCertificateRotatorCommitCertificate
/aws/lambda/AWSLabsCertificateRotatorCreateCertificate
/aws/lambda/AWSLabsCertificateRotatorJobExecutionTerminal

Common Errors

Interpolation Not Enabled

If the interpolateComponentConfiguration setting is not enabled, the component will not be able to subscribe to the necessary MQTT topics and will fail on startup.

The component log will report:

awsiot.greengrasscoreipc.model.UnauthorizedError

This will manifest as a certificate rotation job execution staying in the QUEUED state.

Job Execution Events Disabled

The cloud backend CDK application includes a Lambda that subscribes to jobs events, specifically job execution events. Job execution events are disabled by default in AWS IoT Core. The cloud backend CDK application enables these events when it's first deployed.

If job execution events are inadvertently disabled, the AWSLabsCertificateRotatorJobExecutionTerminal Lambda will not run during certificate rotations. Therefore, the old certificate will not be deleted at the conclusion of a rotation job. The first rotation job after disabling the events will conclude as SUCCEEDED. However, the problem will manifest as subsequent jobs concluding as FAILED within a few seconds of the job being created.

The component log will report that it received awslabs/things/thingName/certificate/create/rejected with an error message:

Pre-conditions not met

Check whether the Thing has more than one certificate attached. If so, detach the older certificate. Then re-enable job events as follows:

aws iot update-event-configurations --event-configurations "{\"JOB_EXECUTION\":{\"Enabled\": true}}"
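The check and the fix can also be scripted. The following is a sketch assuming a boto3 IoT client is passed in (for example boto3.client("iot")); the two helper functions are hypothetical. Which certificate is the older one, and detaching it, is left to the operator.

```python
def attached_certificates(iot_client, thing_name: str) -> list[str]:
    """Return the ARNs of the certificates attached to a Thing."""
    response = iot_client.list_thing_principals(thingName=thing_name)
    # Certificate principals have ARNs of the form ...:cert/<certificateId>.
    return [p for p in response["principals"] if ":cert/" in p]


def enable_job_execution_events(iot_client) -> None:
    """Re-enable job execution events, as in the CLI command above."""
    iot_client.update_event_configurations(
        eventConfigurations={"JOB_EXECUTION": {"Enabled": True}}
    )


# Example usage:
# iot = boto3.client("iot")
# if len(attached_certificates(iot, "MyCoreDevice")) > 1:
#     print("More than one certificate attached; detach the older one.")
# enable_job_execution_events(iot)
```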

Development and Testing

This solution is an extensible reference implementation. This section documents the code quality measures used during development, should you wish to modify or extend the solution.

Static Analysis

Static analysis is performed using Pylint. Example execution:

pylint artifacts backend/lambda libs tests *.py

Unit Tests

Unit tests are performed using pytest.

Example execution:

pytest --cov=.

With branch coverage reported:

pytest --cov=. --cov-branch

Producing an HTML coverage report in the htmlcov directory:

pytest --cov=. --cov-branch --cov-report=html

Producing a coverage report for just the on-device artifacts:

pytest --cov=artifacts --cov-branch

Security Scanning

Security scanning is performed using Bandit. Example execution:

bandit -r -v artifacts backend/lambda libs *.py

Automated Integration Tests

This repository includes an automated integration test suite built on top of Robot Framework. This can be run on demand from the command-line but it is also included as part of the CI/CD pipeline.

CI/CD Pipeline

This repository offers a CodePipeline CI/CD pipeline as a CDK application. This can be optionally deployed to the same account as the Greengrass core devices and the cloud backend. This pipeline is intended for use in DEV, TEST or NON-PROD environments to support testing and development of this component and associated backend.

This CI/CD pipeline automates the build and deployment of both the Certificate Rotator Greengrass component and the cloud backend CDK application. Additionally, it runs the automated integration test suite after deployment.

CDK Unit Tests

Both the cloud backend and the CI/CD pipeline CDK applications are supplied with unit tests. Example execution:

npm run tests

CDK Nag

Both the cloud backend and the CI/CD pipeline CDK applications use the AWS Solutions rules pack of CDK Nag to validate their stacks at the time of synthesis.

Test Fleet

The Greengrass EC2 Device Farm is a convenient way to bring up a disparate fleet of Greengrass devices. This component has been tested against the operating systems and machine architectures supported by the farm. Additionally, it has been tested against a Raspberry Pi 4 with Raspbian OS 10 (Buster) 32-bit (armv7l), with SoftHSMv2 for PKCS#11/HSM coverage.