Terraform Enterprise Workspace Reaper
The beauty of Infrastructure as Code with Terraform is that you can specify every piece of infrastructure you need built out. The other benefit is that you can destroy all of that infrastructure once you are done with it. Other solutions revolve around writing code to destroy individual server instances, and potentially other parts of the deployed infrastructure, but they don't always catch everything. That is where the Workspace Reaper comes into play: it destroys everything that is configured under a Terraform workspace. All you have to do is define a variable, "WORKSPACE_TTL", on that workspace.
Conceptually, a workspace can be considered a set of infrastructure, tracked by a Terraform state file, which also has many attributes associated with it. Some of these include secure variables, VCS configurations, and even role-based access control. So not only do we get a secure environment for our variables, but also a secure location that manages the locking of the state.
Terraform Enterprise (TFE) provides direct API integration for Terraform, which gives a much richer experience when interacting with Terraform in an automated way. It also lets us take advantage of the built-in logic that TFE employs, and keep our own logic for destroying workspaces rather simple.
Terraform Enterprise Advantages
This application takes advantage of a few different features that TFE provides by default.
- Checks to see if TFE has the workspace locked.
- Utilizes the logic of different states the workspace can be in.
- Decides what to do based on whether a workspace was recently applied or destroyed.
- Utilizes the tracking TFE does in terms of plans and applies.
- Scans through the workspace variables to find workspaces that have the variable set, and uses its value to determine what to do next.
- Utilizes role-based access control with tokens to scope which workspaces are exposed to this reaper bot at all.
This capability now allows teams to test out their infrastructure without worrying about leaving dangling infrastructure around, and it is as simple as setting a single variable on the workspace.
To extend this even further, one could have a Sentinel policy which checks that the variable is present and is set within a predefined range. This could then ensure that no development infrastructure is left around for an indeterminate amount of time.
This application is utilized to auto-destroy workspaces based on a TTL value being set.
The application is fully based on Lambda functions, and is automatically deployed via Terraform.
Configure your TFE Workspace
- Clone this Repo into your VCS
- Set the working directory to be
- Configure your TFE Variables
TFE_URL- URL of the Server Instance, e.g. https://app.terraform.io
TFE_ORG- The organization your workspaces are configured under
TFE_TOKEN- Either a User Token, or a Team Token
ui- true or false: Enables a web UI that reports how many total destructions have occurred, with details about the workspaces which were destroyed. This defaults to false. The WebURL will be exposed as an output if it is set.
check_time- How often (in minutes) the reaper bot should run to check on workspaces. The default is set to
For workspaces you wish to destroy, you must set an Environment Variable of WORKSPACE_TTL with an integer value counted in minutes. This tells the reaper bot how long you intend to keep the workspace around.
A new apply to the workspace resets the counter, in effect extending the "lease" of the workspace.
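For workspaces managed through automation, the variable can also be created through the TFE workspace variables API instead of the UI. The following is a minimal sketch of the request body only, assuming the `vars` payload shape accepted by the `/api/v2/workspaces/<workspace-id>/vars` endpoint. The helper name `ttl_variable_payload` is hypothetical and is not part of this repository.

```python
import json


def ttl_variable_payload(ttl_minutes: int) -> dict:
    """Build the JSON body for creating a WORKSPACE_TTL environment variable.

    Hypothetical helper for illustration; the reaper itself only reads
    this variable, it does not create it.
    """
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(ttl_minutes, bool) or not isinstance(ttl_minutes, int):
        raise ValueError("WORKSPACE_TTL must be a whole number of minutes")
    if ttl_minutes <= 0:
        raise ValueError("WORKSPACE_TTL must be greater than zero")
    return {
        "data": {
            "type": "vars",
            "attributes": {
                "key": "WORKSPACE_TTL",
                "value": str(ttl_minutes),  # variable values are sent as strings
                "category": "env",  # environment variable, not a Terraform variable
                "hcl": False,
                "sensitive": False,
            },
        }
    }


if __name__ == "__main__":
    # Print the body for a two-hour lease.
    print(json.dumps(ttl_variable_payload(120), indent=2))
```

The body would be sent with a POST request carrying the same token used for TFE_TOKEN, provided that token has write access to the workspace.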
- AWS - Lambda
- AWS - SQS
- AWS - DynamoDB
- AWS - CloudWatch
Two functions are deployed:
Both functions are in the same Python file (reapWorkspaces.py)
This process loops through the workspace variables in the organization you have specified. It is set up to run every 5 minutes, starting from the time the Lambda function is deployed, unless the optional variable is changed.
It looks for a specific Key name of WORKSPACE_TTL, with an integer value specifying the number of minutes to keep the workspace around.
For any workspace in which the variable is found, the process then evaluates whether the last run was an apply or a destroy. If it was an apply, it compares the time elapsed since that run to the TTL. If the elapsed time exceeds the TTL, a message is submitted to the SQS Queue for further processing.
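The decision described above can be sketched as follows. This is an illustration under stated assumptions, not the code in reapWorkspaces.py: the function names are hypothetical, and the variables are assumed to arrive as a list of dictionaries with an `attributes` mapping containing `key` and `value`, as in TFE API responses.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def find_ttl(variables: list) -> Optional[int]:
    """Return the WORKSPACE_TTL value in minutes, or None if absent or invalid."""
    for var in variables:
        attrs = var.get("attributes", {})
        if attrs.get("key") == "WORKSPACE_TTL":
            try:
                return int(attrs.get("value"))
            except (TypeError, ValueError):
                return None  # non-integer value: treat as not set
    return None


def should_reap(variables: list, last_run_was_destroy: bool,
                last_run_at: datetime, now: Optional[datetime] = None) -> bool:
    """Decide whether a workspace should be queued for destruction."""
    ttl_minutes = find_ttl(variables)
    if ttl_minutes is None:
        return False  # workspace has not opted in
    if last_run_was_destroy:
        return False  # already destroyed, nothing left to reap
    now = now or datetime.now(timezone.utc)
    # Expired once the time since the last apply exceeds the TTL.
    return now - last_run_at > timedelta(minutes=ttl_minutes)
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to exercise without waiting for real time to pass.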
This process is triggered when FindWorkspacesToReap submits a message to the queue. It uses that message to process the workspace submitted for destruction, and continues to loop through and process messages until the workspace is finally destroyed.
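One way to picture this loop is as a small decision function that runs once per message. The sketch below is an assumption about how such a step could look, not the repository's implementation: the status names are a subset of the run states reported by the TFE runs API, and the action names (`done`, `confirm`, `give_up`, `requeue`) are hypothetical.

```python
# Run states after which no further progress will happen on its own.
FAILED_STATES = {"errored", "discarded", "canceled", "force_canceled"}


def next_action(run_status: str) -> str:
    """Map the status of a destroy run to what the processor should do next."""
    if run_status == "applied":
        return "done"  # destroy finished; record it and stop
    if run_status == "planned":
        return "confirm"  # destroy plan is waiting for confirmation
    if run_status in FAILED_STATES:
        return "give_up"  # needs a human, or a fresh destroy run
    # Anything else (pending, planning, applying, ...) is still in flight,
    # so put the message back on the queue and check again later.
    return "requeue"
```

Treating every unrecognized status as `requeue` keeps the loop going through intermediate states the sketch does not list, at the cost of possibly re-checking a run more often than necessary.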
Simple Queue Service (SQS)
A single queue is deployed:
This queue is set up to accept messages, and every message is given a delay which keeps it from being processed immediately. This limits the number of calls made to Terraform Enterprise and lets their timing vary, based on different factors of planning and applying workspaces.
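A per-message delay of this kind can be expressed with the `DelaySeconds` parameter of the SQS `SendMessage` call. The following is a minimal sketch that only builds the call arguments; the message body layout (`workspace_id`) and the helper name are assumptions for illustration, not the format this application uses.

```python
import json

MAX_SQS_DELAY_SECONDS = 900  # SQS caps the per-message delay at 15 minutes


def build_delayed_message(queue_url: str, workspace_id: str,
                          delay_seconds: int) -> dict:
    """Build keyword arguments for an SQS send_message call with a delay."""
    # Clamp the delay into the range SQS accepts.
    delay = max(0, min(int(delay_seconds), MAX_SQS_DELAY_SECONDS))
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps({"workspace_id": workspace_id}),
        "DelaySeconds": delay,
    }
```

With boto3 available, the result could be passed as `boto3.client("sqs").send_message(**build_delayed_message(url, ws_id, 60))`. Keeping the argument construction separate from the network call makes the delay logic testable without AWS credentials.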
A single table is created:
This table is used to store details about the workspaces which were destroyed. There is also an item which tracks the number of resources which have been destroyed.
A single CloudWatch event:
This event fires every 5 minutes to kick off the Lambda function FindWorkspacesToReap.
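CloudWatch schedules of this kind are written as rate expressions such as `rate(5 minutes)`. One detail worth noting when the interval is configurable is that the unit must be singular when the value is 1. A small sketch, with a hypothetical helper name, of turning a minute count like check_time into such an expression:

```python
def schedule_expression(check_time_minutes: int) -> str:
    """Return a CloudWatch Events rate expression for the given interval."""
    if check_time_minutes < 1:
        raise ValueError("the interval must be at least 1 minute")
    # Rate expressions require "minute" for 1 and "minutes" otherwise.
    unit = "minute" if check_time_minutes == 1 else "minutes"
    return f"rate({check_time_minutes} {unit})"
```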