Case Study | Architecture Decisions
End-to-end NHS Trust analytics platform simulating a consultancy engagement that ingests operational healthcare data, models it through a medallion architecture, and surfaces access and inequality analytics.
NHS Trusts generate vast amounts of operational data across EHR systems, A&E departments, diagnostic services, appointment bookings etc, but they rarely have the engineering infrastructure to turn it into actionable inequality analytics.
Waiting times, A&E breaches, and diagnostic delays affect deprived communities disproportionately. National targets (18-week RTT, 6-week DM01, 4-hour A&E) are tracked in aggregate, but breakdowns by IMD decile, ethnicity, age band, and gender are largely invisible.
Access-IQ demonstrates how a data engineering consultancy would build the analytics infrastructure for a fictional NHS Trust ("Northshire Trust") to surface these patterns. The platform ingests synthetic but structurally realistic healthcare data from three source systems, models it through a Bronze/Silver/Gold medallion architecture, and delivers inequality-aware dashboards to Trust analysts.
Two AWS accounts model a real vendor-client boundary - the Trust controls its own data, the platform account processes it under contract. All infrastructure is CDK-managed with an ephemeral deploy/destroy pattern to avoid idle costs.
The platform spans two AWS accounts connected via VPC peering. The Trust account hosts the simulated hospital environment (RDS Postgres, SFTP via Transfer Family, S3 exports). The Platform account runs the analytics infrastructure: ECS Fargate for ingestion, S3 data lake with KMS encryption, Redshift Serverless for warehousing and dbt modelling, Prefect for orchestration, and CloudWatch for observability.
Gold-layer Parquet files are exported to S3 and read by a Streamlit dashboard on Community Cloud via DuckDB - keeping the dashboard stateless and free-tier. See Data Flow Diagram for the Bronze to Silver to Gold pipeline.
View as Mermaid (interactive)
graph LR
subgraph Trust["Trust Account (Northshire Hospital Sim)"]
RDS[(RDS Postgres<br/>EHR + Urgent Care)]
SFTP[Transfer Family<br/>SFTP Appointments]
TS3[Trust S3<br/>Diagnostics + Providers]
end
subgraph Platform["Platform Account (access-iq)"]
subgraph Compute["ECS Fargate"]
ING[3x Ingestion Tasks]
PFS[Prefect Server]
PFW[Prefect Worker]
end
S3L[(S3 Data Lake<br/>Bronze / Silver / Gold)]
KMS[KMS CMK<br/>Encryption]
RS[(Redshift Serverless<br/>Spectrum + dbt)]
CW[CloudWatch<br/>Logs + Alarms + Dashboard]
SNS[SNS<br/>Alert Topics]
end
ST[Streamlit<br/>Community Cloud]
RDS -->|VPC Peering| ING
SFTP -->|VPC Peering| ING
TS3 -->|Cross-account S3| ING
ING -->|Bronze Parquet| S3L
S3L -->|Spectrum| RS
RS -->|dbt Silver+Gold| RS
PFS -->|orchestrates| PFW
PFW -->|triggers| ING
PFW -->|dbt run| RS
RS -->|Gold export| S3L
S3L -->|Parquet read| ST
CW -->|alarm| SNS
KMS -.->|encrypts| S3L
| Component | Tool | Version | Why |
|---|---|---|---|
| Ingestion | ECS Fargate + Python | 3.12 + PyArrow | Serverless containers avoid idle EC2 costs; Fargate scales each source independently without cluster management |
| Storage | S3 + KMS CMK | - | Decouples storage from compute; KMS CMK enables per-env key rotation and cross-account grants that SSE-S3 cannot |
| Warehouse | Redshift Serverless | 8 RPU | Spectrum queries Bronze in-place (no ETL duplication), RPU auto-scales to zero when idle, and native dbt adapter avoids Athena per-query costs |
| Modelling | dbt-core + dbt-redshift | 1.10 | Declarative SQL transformations with built-in lineage, test contracts, and incremental materialisation that Redshift stored procedures lack |
| Data Quality | dbt-expectations + GE 1.x | - | dbt-expectations validates inline during transforms; Great Expectations covers row-level profiling that dbt tests alone cannot express |
| Orchestration | Prefect 3 (self-hosted) | ECS Fargate | Free self-hosted server on Fargate avoids Prefect Cloud costs; ECS work pool launches tasks as ephemeral containers with $0 idle |
| Dashboard | Streamlit | Community Cloud | Reads static Gold Parquet via DuckDB with no warehouse connection; Community Cloud hosts free-tier with zero infrastructure |
| Infrastructure | AWS CDK | Python | Imperative Python over declarative YAML keeps config DRY with shared EnvConfig; L2 constructs handle IAM/encryption defaults that CloudFormation requires manually |
| Pseudonymisation | HMAC-SHA-256 | Lambda UDF | Deterministic hash preserves join integrity across tables while NHS Mod-11 validation catches malformed identifiers before pseudonymisation |
Prerequisites: AWS CLI v2 with two named profiles configured (one for the Platform account, one for the Trust account), uv, Node.js (for CDK), Docker.
# Clone both repos as siblings
git clone https://github.com/chiplusplus/access-iq-platform.git
git clone https://github.com/chiplusplus/northshire-hospital-sim.git
# Set up the Trust simulator (create virtual environment and install dependencies)
cd northshire-hospital-sim
python -m venv .northshire-hospital-sim
.northshire-hospital-sim/bin/pip install -r requirements.txt
# Configure Trust CDK context (required for cross-account VPC peering)
cp infra/cdk.context.example.json infra/cdk.context.json
# Edit infra/cdk.context.json — set platformAccountId to your Platform AWS account ID
# See northshire-hospital-sim README for full variable reference
# Set up the platform
cd ../access-iq-platform
cp .env.example .env # Fill in AWS profile, bucket name, secret ARNs
# Review infra/config/dev.json for account IDs, VPC CIDRs, Redshift capacity
make setup # Create venv, install deps, pre-commit hooks
# Log in to both AWS accounts (skip if using IAM keys)
export PLATFORM_PROFILE=<your-platform-profile>
export TRUST_PROFILE=<your-trust-profile>
aws sso login --profile $PLATFORM_PROFILE
aws sso login --profile $TRUST_PROFILE
# Deploy everything
make up # Deploy infra, seed data, start pipeline (~65 min)make up invokes scripts/session.sh which orchestrates an 8-step deployment:
- Trust bootstrap - deploys Trust CDK stack, seeds RDS with ~100K synthetic patients + ~586K encounters via
northshire-hospital-sim, publishes to RDS/S3/SFTP - Platform deploy - deploys all Platform stacks (lake, secrets, catalog, ECR, network, warehouse, compute, observability, budget) with VPC peering
- Redshift pre-warm - creates Spectrum external schema, waits for workgroup availability
- Secrets + Docker + dbt + Prefect - seeds Secrets Manager, builds and pushes ingestion image, registers Spectrum tables, starts Prefect server and worker
After completion: Streamlit URL printed, CloudWatch dashboard URL printed, Prefect UI at localhost:4200 (via SSM tunnel).
Only dev is actively deployed. The CDK stacks and config files support a separate prod environment (infra/config/prod.json with RETAIN policies, Redshift snapshots, 90-day log retention), but running a second account with stateful resources is not justified for a portfolio project. See Environment Matrix for the full prod design.
Each environment is driven by a config file at infra/config/{dev,prod}.json where you set your own account IDs, region, VPC CIDRs, and Redshift capacity. CDK reads the config matching the CDK_ENV variable:
CDK_ENV=dev make infra-bootstrap # one-time CDK bootstrap per account
CDK_ENV=dev make up # deploy dev (default)In dev, all resources use DESTROY removal policy — make down leaves the accounts at $0 idle cost with nothing persisted between sessions.
make up # Deploy all stacks, seed Trust data, start Prefect
make status # Show stack status, Redshift state, latest manifest
make pipeline # Optional (pipeline is scheduled to trigger on the hour every hour): Run full pipeline: ingest -> dbt -> DQ -> Gold export
make dashboard # Run Streamlit dashboard locally (reads Gold Parquet from S3 via DuckDB)
make down # Destroy all stacks across both accountsmake down returns to $0 idle cost. In dev, every resource is destroyed (DESTROY removal policy) except the KMS CMK, which uses RETAIN in both environments because its pending-deletion window would break redeploy cycles.
Budget safety net: The BudgetStack deploys an AWS Budget with a monthly ceiling ($10 dev / $20 prod). If actual spend reaches 80% of the ceiling, an SNS alarm triggers a Lambda that automatically destroys the ephemeral stacks (compute, warehouse, network, observability, iam). Stateful stacks (lake, secrets, catalog, ECR) are not touched. This is a last-resort guard against forgotten make down sessions — if it fires, redeploy with make up when ready.
Additional targets: make dbt CMD="run --select silver", make rs-tunnel, make dq-gate, make reconnect, make profile, make ready.
| Prefix | Written by | Description |
|---|---|---|
bronze/ |
Ingestion | Raw Parquet, partitioned source/entity/ingest_date/run_id. Registered as Spectrum external tables via dbt-external-tables for Redshift queries |
silver/ |
dbt | Conformed tables: patients, encounters, appointments, urgent_care, diagnostics, providers |
gold/ |
dbt | Marts: fct_wait_times, fct_inequality, fct_urgent_care, fct_utilisation + 6 dimensions |
_manifests/ |
Ingestion | JSON manifest per run (idempotency + audit) |
_dq/ |
GE | Great Expectations validation results |
The ObservabilityStack deploys production-grade monitoring across the full pipeline:
Dashboards: A Pipeline Health dashboard (access-iq-{env}-pipeline-health) shows last-success timestamps, GE gate metrics, ingestion status, and ECS resource utilisation.
Metric Filters: 15+ CloudWatch metric filters on the pipeline log group track structured events across the lifecycle: ingestion start/complete/fail, dbt Silver/Gold completion, GE gate pass/fail, Gold export completion, and validation errors.
Alarms:
| Alarm | What it catches |
|---|---|
| Ingestion failure | Manifest status: failed in logs |
| GE gate failure | Data quality validation failed on Silver tables |
| Validation error | GE infrastructure error during checkpoint execution |
| Pipeline staleness | No successful pipeline run in 48h (configurable) |
| Gold export staleness | No Gold export in 50h (configurable) |
| Budget threshold | 80% of monthly ceiling reached; triggers auto-teardown |
| ECS OOM detection | EventBridge rule catches container crashes/OOM kills |
Staleness alarms use BREACHING on missing data, so they fire even when the pipeline simply hasn't run (e.g. forgotten make down). Evaluation windows are configurable in infra/config/{env}.json.
Permanent Dashboard Exports: Gold Parquet files are exported to a KMS-encrypted S3 bucket (access-iq-dashboard-exports) that lives outside the ephemeral stacks. This bucket, its KMS key, and a read-only IAM user persist across make down / budget teardowns, keeping the Streamlit Community Cloud dashboard live between sessions. See the Runbook for one-time setup instructions.
The project documents key technical decisions as Architecture Decision Records. Highlights:
- ADR-001: Medallion Architecture - Bronze/Silver/Gold with Spectrum on Bronze
- ADR-004: Ephemeral Infrastructure - deploy/destroy pattern, ~94% cost reduction
- ADR-008: Static Gold Export - DuckDB reads Parquet on Streamlit Community Cloud
- ADR-009: Self-hosted Prefect - Cloud push-pool incompatible with free tier
- All 9 ADRs
The Gold layer and pipeline infrastructure create natural extension points. These next areas to be explored for this project are:
- Demand forecasting - the fct_wait_times time series lends itself to Prophet/ARIMA modelling to predict breach risk by specialty, giving Trust managers lead time to reallocate capacity
- Referral-letter triage - referral free-text fields could be classified to prioritise urgent pathways, reducing manual clinical triage overhead
- Breach-rate anomaly detection - statistical process control on fct_urgent_care 4h/12h breach rates could surface emerging pressure points before they hit aggregate targets
- Real-time ingestion - replacing batch ECS tasks with Kinesis Data Streams would enable sub-minute latency for time-sensitive sources like A&E arrivals
License: MIT | Author: Chia A | Case Study

