This repo hosts the infrastructure-as-code (IaC) for my personal cloud deployment, which is split across AWS (in progress) and a private homelab (coming soon).
Do not skip any of these steps!
Update `config.toml` based on your domain and hosting needs.

`foundation.public_domain` and `foundation.private_domain` are the values most
likely to change for a fresh deployment. `public_domain` hosts the
control-plane URLs (Authentik, WebFinger, etc.); `private_domain` is reserved
for services that don't need to be reachable from the public internet. Look at
the Contents section for a full breakdown of what services you're
configuring.
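Purely as orientation, the two domain keys might look like this in TOML (the `[foundation]` table name is inferred from the dotted key paths above; consult the real `config.toml` for the actual schema):

```toml
# Hypothetical sketch; see the repo's config.toml for the real schema.
[foundation]
public_domain = "example.com"   # control-plane URLs (Authentik, WebFinger, ...)
private_domain = "example.net"  # services kept off the public internet
```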
Set up your default profile in ~/.aws/config and ~/.aws/credentials
We use the boto3 and aws_cdk Python packages to do all of our AWS API
interaction. Those packages should both use the standard AWS config and
environment variable conventions for configuration and secrets injection, so if
you're familiar with those, you can do something fancier than using the
default profile.
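If you haven't used AWS profiles before, the standard file layout looks like this (two separate files shown in one snippet; region and key values are placeholders):

```ini
# ~/.aws/config
[default]
region = us-east-1

# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```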
Run `bin/bootstrap` to complete all one-time setup steps.
This script will:
- ensure that `dotslash` is installed on your system (which we use extensively to fetch all our dependency tools), and
- interactively set up the persistent AWS resources that will be referenced by the deploy process.
You can perform these bootstrapping steps manually if you want more control, or need to rerun only a subset of them for some reason. See the Manual Bootstrapping section below for more details.
Run `bin/test` to run all tests in the repository.
These tests will, among other things, ensure that your config is set up correctly and that there are no obvious errors in the CDK IaC before you deploy.
Run this command to deploy the AWS infrastructure:
```shell
bin/cdk deploy --all --trace \
  --require-approval never \
  --concurrency="$(nproc --all)" \
  --asset-build-concurrency="$(nproc --all)"
```

You can modify those concurrency parameters or replace `--all` with the names of
specific stacks, as needed. `--require-approval` can be omitted if you aren't
running with any concurrency.
The Tailscale, Headscale, Headplane, and Vaultwarden OIDC applications, users,
and group memberships are provisioned automatically by Authentik blueprints in
`assets/authentik/blueprints/`, synced into each Authentik task from S3 by an
init container on every deploy. The only remaining manual step is the Tailscale
SaaS-side registration.
- Tailscale (SaaS-side)
  - Create a new account with OIDC provider
    - Email address: tailscale@example.com
    - WebFinger URL (automatically populated to https://example.com/.well-known/webfinger)
    - Which identity provider: Authentik
    - Get OIDC issuer
    - Client ID / Client Secret: use the values stored in the `authentik/oidc/tailscale` Secrets Manager entry
    - Prompts: consent
    - Sign up with OIDC
    - Authentik redirecting to Tailscale
    - Continue
- Headplane
  - Log in at `https://headplane.<public_domain>/admin` using the Headscale admin API key, which is written to Secrets Manager during the HeadscaleStack deploy:

    ```shell
    aws secretsmanager get-secret-value \
      --secret-id headscale/admin-api-key \
      --query SecretString --output text | jq -r .secret
    ```
- Vaultwarden
  - Log in at `https://vaultwarden.<public_domain>` using your SSO identity.
- Domain registrars
  - After the hosted zones are created, copy the 4 NS records from Route 53 into the registrar DNS config.
The `headscale.exit_node.preauthkey_user` config value names the Headscale
user that owns the Tailscale-side preauthkey for the exit node. The
preauthkey itself lives in Secrets Manager at `headscale/exit-node/preauthkey`
and is managed by an `ExitNodePreauthkey` Custom Resource Lambda that
regenerates the key when the stored value isn't a current preauthkey for
that user.
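The Lambda's decision reduces to a validity check. A minimal sketch, assuming a hypothetical key shape (the field names here are illustrative, not Headscale's actual REST schema or the repo's implementation):

```python
def stored_key_is_valid(stored_key: str, user_keys: list[dict]) -> bool:
    """True if the stored secret is a usable preauthkey for the configured user.

    `user_keys` stands in for the preauthkeys Headscale reports for
    `preauthkey_user`; the "expired"/"revoked" fields are assumptions.
    """
    return any(
        key["key"] == stored_key and not key["expired"] and not key["revoked"]
        for key in user_keys
    )


def needs_rotation(stored_key: str, user_keys: list[dict]) -> bool:
    # Rotate when the stored value is orphaned: the user was renamed or
    # deleted, or Headscale state was restored from a backup.
    return not stored_key_is_valid(stored_key, user_keys)
```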
After changing preauthkey_user (or otherwise invalidating the user —
deleting them in Headplane, restoring Headscale state from a backup,
etc.) the Lambda will detect the orphan on the next deploy and rotate
the secret automatically. But the running ECS task caches the old
preauthkey from its TS_AUTHKEY env injection; if the task isn't
otherwise replaced by the deploy, force a new deployment so the fresh
secret value is read at container start:
```shell
aws ecs update-service \
  --cluster <ExitNodeCluster name> \
  --service <ExitNodeService name> \
  --force-new-deployment
```

If the EC2 host's /var/lib/tailscale state directory predates the user
rename, terminate the instance to wipe it — the ASG will replace it,
and the fresh container will register cleanly via the new preauthkey.
To clean up the orphaned old user, delete it manually in Headplane (Headscale's REST API has no idempotent user delete).
If you don't want to use the automated bootstrapping script, you can perform each of its steps yourself. To do so, you can either run the associated helper script for each step, or take matters into your own hands.
- Install `dotslash` on your system.
  - Manual install
  - Helper script:

    ```shell
    bash scripts/bootstrap/dotslash.sh
    ```
- Create a public hosted zone for your domain in AWS.
  - AWS console
    - Route 53 > Hosted Zones > Create hosted zone
    - Fill in your domain, select "Public hosted zone", then "Create hosted zone"
  - Helper script:

    ```shell
    bin/aws-create-hosted-zone DOMAIN
    ```
- Create persistent secrets used by AWS services.
  - AWS console
    - TODO
  - Helper script:

    ```shell
    bin/aws-write-secret ecr-pullthroughcache/ghcr --template='{"username":"GITHUB_USERNAME"}' --key=accessToken  # GitHub PAT with read:packages scope
    bin/aws-write-secret ecr-pullthroughcache/dockerhub --template='{"username":"DOCKERHUB_USERNAME"}' --key=accessToken  # Docker Hub PAT
    bin/aws-write-secret authentik/secret-key --template='{}' --key=secret --length=50 --exclude-punctuation
    bin/aws-write-secret authentik/bootstrap --template='{"email":"EMAIL"}' --key=password
    bin/aws-write-secret data/database --template='{"username":"USERNAME"}' --key=password  # RDS master; read by the DataStack init Lambda only (no service uses it)
    bin/aws-write-secret authentik/database --template='{"username":"authentik"}' --key=password --length=32 --exclude-punctuation
    bin/aws-write-secret headscale/database --template='{"username":"headscale"}' --key=password --length=32 --exclude-punctuation
    bin/aws-write-secret authentik/smtp --template='{"username":"USERNAME"}' --key=password
    bin/aws-write-secret authentik/oidc/tailscale --template='{"client_id":"CLIENT_ID"}' --key=client_secret  # values from the tailscale provider registered in Authentik
    bin/aws-write-secret authentik/oidc/headscale -  # JSON blob {"client_id":"...","client_secret":"..."}; blueprint seeds both into Authentik on first apply
    bin/aws-write-secret authentik/oidc/headplane -  # same shape as headscale
    bin/aws-write-secret authentik/oidc/vaultwarden -  # same shape as headscale
    bin/aws-write-secret headscale/noise-private-key --template='{}' --key=secret --bytes=32
    bin/aws-write-secret headplane/cookie-secret --template='{}' --key=secret --bytes=32
    echo -n pending | bin/aws-write-secret headscale/admin-api-key --template='{}' --key=secret -  # sentinel placeholder; replaced by HeadscaleStack
    bin/aws-write-secret vaultwarden/database --template='{"username":"vaultwarden"}' --key=password --length=32 --exclude-punctuation
    bin/aws-write-secret vaultwarden/admin-token --template='{}' --key=secret --length=64 --exclude-punctuation
    bin/aws-write-secret vaultwarden/smtp --template='{"username":"USERNAME"}' --key=password
    bin/aws-write-secret mail/postmaster-password --template='{}' --key=secret --length=32 --exclude-punctuation
    echo -n pending | bin/aws-write-secret mail/dkim-private-key --template='{}' --key=secret -  # sentinel placeholder; the MailStack DKIM Custom Resource replaces it on first deploy
    ```

  - `mail/ses-relay`: create an IAM user with AmazonSESFullAccess, mint an access key, and derive the SES SMTP password (see `derive_ses_smtp_password` in `scripts/bootstrap/aws_resources.py`). Stored as `{"username":"AKIA...","password":"<derived>"}`.
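The SMTP password derivation follows AWS's published SigV4-based scheme: an HMAC-SHA256 chain over fixed scope components, a version byte, then base64. A standalone sketch of that published algorithm, independent of the repo's `derive_ses_smtp_password`:

```python
import base64
import hashlib
import hmac


def derive_ses_smtp_password(secret_access_key: str, region: str) -> str:
    """Derive the SES SMTP password from an IAM secret access key."""

    def sign(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    # Fixed HMAC chain defined by the SES SMTP credential scheme.
    signature = sign(("AWS4" + secret_access_key).encode("utf-8"), "11111111")
    for component in (region, "ses", "aws4_request", "SendRawEmail"):
        signature = sign(signature, component)

    # Prepend the version byte (0x04) and base64-encode the result.
    return base64.b64encode(b"\x04" + signature).decode("ascii")
```

The derived value is what goes into the `password` field of the `mail/ses-relay` secret.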
- SES domain setup (also handled automatically by `bin/bootstrap`):
  - `aws ses verify-domain-identity --domain DOMAIN` (publish the returned token at `_amazonses.DOMAIN` as a TXT record)
  - `aws ses verify-domain-dkim --domain DOMAIN` (publish each of the three returned tokens as `<token>._domainkey.DOMAIN` CNAME records pointing to `<token>.dkim.amazonses.com`)
  - SES production access (manual support ticket): "Service Quota Increase → SES → Production Access". Without it, SES only delivers to verified addresses.
All single-value secrets (passwords, tokens, keys) are stored as JSON objects
with a "secret" key rather than as bare strings:
```json
{"secret": "the-actual-value"}
```

Multi-value secrets (credentials with username + password, OIDC clients, etc.)
continue to use their natural field names ("username", "password",
"client_id", "client_secret", etc.).
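Consumers can then unwrap either shape with a single field lookup. An illustrative helper (not part of the repo):

```python
import json


def secret_field(secret_string: str, field: str = "secret") -> str:
    """Pull one field out of a Secrets Manager SecretString stored as JSON.

    Single-value secrets use the {"secret": ...} wrapper, so the default
    field name covers them; multi-value secrets pass their natural field.
    """
    return json.loads(secret_string)[field]
```

For example, `secret_field('{"secret": "tok"}')` returns the wrapped token, while `secret_field(creds, "password")` reads one half of a credential pair.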
AWS Secrets Manager assigns every secret a random 6-character suffix in its
ARN (e.g., `my-secret-AbCdEf`). CDK's ECS secret grant uses the wildcard
pattern `my-secret-??????` to match it.
When ECS resolves a container secret, it calls `GetSecretValue` with either:

- A JSON field ref (`name:field::`): Secrets Manager resolves the name to the full ARN first, then the IAM check is against the full ARN; `my-secret-??????` matches `my-secret-AbCdEf`. ✓
- A bare name or partial ARN: the IAM check is against the partial ARN (`my-secret`); `my-secret-??????` requires 6 extra characters and does not match `my-secret` (zero extra characters). ✗
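The "`?` matches exactly one character" rule can be sanity-checked locally; `fnmatch` below is just a stand-in for IAM's glob semantics on this specific point:

```python
from fnmatch import fnmatchcase

# The wildcard CDK writes into the task role's IAM policy:
grant_pattern = "my-secret-??????"

# Path 1: ECS resolved the name to the full ARN first, so the random
# suffix is present and each '?' consumes exactly one character.
print(fnmatchcase("my-secret-AbCdEf", grant_pattern))  # True

# Path 2: the IAM check ran against the partial ARN as given; six '?'s
# cannot match zero characters, so the grant does not apply.
print(fnmatchcase("my-secret", grant_pattern))  # False
```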
Wrapping every single-value secret in `{"secret": "..."}` and referencing it
with `ecs.Secret.from_secrets_manager(secret, "secret")` ensures the first
(working) resolution path is always used, without needing to hard-code the
random ARN suffix anywhere in the codebase.

The same applies to init-container shell scripts: passing the secret name
(not the partial ARN) to `--secret-id` triggers name→full-ARN resolution before
the IAM check, so CDK's grants work correctly there too.
Install the git pre-commit hook (symlinks `bin/pre-commit` into `.git/hooks/`):
```shell
bin/pre-commit --install
```

Run validators against the whole repo (or a file/dir/glob subset):

```shell
bin/validate           # all tracked files
bin/validate --fix     # apply fixers then check
bin/validate infra/    # restrict to a subtree
bin/validate --dirty   # only staged files
```

Per-directory validator scoping is driven by `.validator.toml` files. The
repo-root file enables `python-black` and `python-pyright` on all `.py` files;
nested `.validator.toml` files can narrow or extend that config.
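The concrete schema is defined by the repo's validator tooling; purely as a hypothetical illustration of the scoping idea (the key names here are invented, not the real format):

```toml
# Hypothetical .validator.toml shape; the real schema lives in the repo.
[python-black]
include = ["**/*.py"]

[python-pyright]
include = ["**/*.py"]
```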
Run tests against the whole repo:
```shell
bin/test
```

Common CDK commands:

```shell
bin/cdk ls     # list all stacks in the app
bin/cdk synth  # emits the synthesized CloudFormation template
bin/cdk deploy # deploy this stack to your default AWS account/region
bin/cdk diff   # compare deployed stack with current state
bin/cdk docs   # open CDK documentation
```

CDK organizes infrastructure into "stacks". Each stack has its own name and can be deployed individually. Each stack is composed of one or more "constructs", which are either individual infrastructure resources or collections of them. The relationships between a parent stack and its child constructs, and between parent constructs and their children, create a resource dependency tree. Constructs may declare relationships to other constructs across branches of that tree, or even into the trees of other stacks, forming a rich DAG, but they can never declare a cyclical dependency.
-
- Declares shared resources used by all other stacks.
- Resources:
- Hosted Zone and VPC
- ECS cluster
-
- OIDC identity provider
- Resources:
- Storage: S3 media bucket, RDS Postgres database
- Services: Authentik Server and Worker Fargate service containers
- Network: publicly-accessible Application Load Balancer
-
- Shared Postgres for stateful services (Authentik, Headscale, ...).
- Resources:
- RDS PostgreSQL instance in a private-isolated subnet
- A CDK custom resource (Provider + Lambda using `pg8000`) that creates each logical database named by the stack's `databases=[...]` list if it doesn't exist yet
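Postgres has no `CREATE DATABASE IF NOT EXISTS`, so the Lambda presumably checks `pg_database` first. A hedged sketch of that core loop (the connection handling and the repo's actual code are assumptions):

```python
def ensure_databases(cursor, databases):
    """Create each logical database that doesn't exist yet (idempotent).

    `cursor` is a DB-API cursor (e.g. pg8000) on the instance's default
    database; CREATE DATABASE can't run inside a transaction, so a real
    handler would use an autocommit connection.
    """
    created = []
    for name in databases:
        cursor.execute("SELECT 1 FROM pg_database WHERE datname = %s", (name,))
        if cursor.fetchone() is None:
            # Database names are identifiers, not parameters: quote manually.
            cursor.execute('CREATE DATABASE "{}"'.format(name.replace('"', '""')))
            created.append(name)
    return created
```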
-
- Self-hosted Tailscale control server + Headplane admin UI.
- Resources:
- Services: `headscale` and `headplane` Fargate services
- Network: one public ALB serving both `headscale.<public_domain>` and `headplane.<public_domain>` via host-header routing; Cloud Map private namespace `headscale.local` for Headplane→Headscale
- Storage: shared Postgres via DataStack
- Init: noise private key materialized from `headscale/noise-private-key` onto a tmpfs volume by an init container before Headscale starts
- Custom resource: populates `headscale/admin-api-key` by running a one-shot Fargate task (`headscale apikeys create`) the first time the stack is deployed
- MagicDNS base domain: `{headscale.private_subdomain}.{foundation.private_domain}` (e.g. `ts.example.net`)
-
- Agentic assistant platform
- Resources:
- Storage: EFS volume, backup plan
- Services: EC2 instance running openclaw node daemon
- Network: VPC
- Notes:
- I consider this service to be high-risk to run, so I've isolated it in several ways. It has its own VPC, is running on a machine that can only be accessed via SSM connection sessions, and currently has no privileges to communicate with anything internal.
- I may want to modify this setup to reuse the foundation VPC and host in Fargate. Will need to get more trust in the system first.
-
- Self-hosted Bitwarden-compatible password vault
- Resources:
- Storage: EFS `/data`, shared Postgres via DataStack
- Services: single Fargate task (`vaultwarden/server`) pulled from the Docker Hub pull-through cache
- Network: public ALB at `vaultwarden.<public_domain>`
- Planned public AWS stacks:
  - Matrix Synapse
- Planned private AWS stacks:
  - searXNG
- Planned homelab hosting:
  - ownCloud / NextCloud
  - Gitea or Forgejo
- TODO:
- Automated credential rotation. Two viable paths:
  - Secrets Manager hosted rotation
    (`secret.add_rotation_schedule(..., hosted_rotation=secretsmanager.HostedRotation.postgres_single_user(vpc=foundation.vpc), automatically_after=Duration.days(30))`).
    An AWS-managed Lambda rotates the password in Postgres and writes the new value back to the secret. Downside: services that read secrets at task boot (all ECS tasks here) will keep the old password until redeploy; wire an EventBridge rule on the rotation event to `ecs:UpdateService --force-new-deployment` so tasks get recycled. `multi_user` rotation avoids the restart but requires app-side retry-on-auth-fail.
  - IAM DB auth. Enable `iam_database_authentication` on the RDS instance and grant task roles `rds-db:connect` on the specific DB user ARN. Services request a 15-minute token from RDS at connect time. No rotation needed because there's no long-lived password. Downside: Authentik/Headscale don't natively fetch RDS auth tokens; each task would need a sidecar (`pgbouncer` plus token refresh). More moving parts.
  - Recommended order: per-service users first, then hosted rotation with force-new-deployment on rotation events. Defer IAM DB auth until there's a real need; the sidecar cost outweighs the rotation cost at single-digit-service scale.
- Vaultwarden
  - Enable SSO
- Get rid of the Imports types and just use kwargs
- Be specific about which properties we want from foundation and data exports
- EFS and RDS backups