AtlasStack is a modular ingestion and validation stack for UK energy and infrastructure datasets.
It converts unstable external APIs into deterministic, schema-controlled, test-validated inputs for analytics, forecasting, and ML systems.
Public energy datasets (NESO, ESO, weather APIs, interconnector feeds) are predominantly:
- poorly versioned
- weakly typed
- prone to silent schema drift
- inconsistent in cadence
- rarely testable
AtlasStack treats ingestion as engineering.
It enforces structure, typing, cadence, and validation before data is allowed into analytics or ML layers.
If the foundation is unreliable, every forecast built on top of it is suspect.
```mermaid
flowchart LR
    EXT[External APIs] --> BR[Bronze<br/>Raw JSONL<br/>dt partitions]
    BR --> SL[Silver<br/>Typed Parquet<br/>schema enforced]
    SL --> STG[dbt staging models]
    STG --> DIM[dim_date]
    STG --> FCT[fct_observations]
    FCT --> MART[mart_daily_summary]
```
**Bronze**

- Raw API responses
- Append-only
- Partitioned by `dt=YYYY-MM-DD`
- Never mutated / no transformation logic

**Silver**

- Strict typing
- Normalized timestamps
- Explicit schema enforcement
- Ready for dbt consumption
- Deterministic partitioning

**Warehouse**

- Local analytical engine (DuckDB)
- dbt transformations
- Fact and dimension modelling
- Explicit data tests
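The bronze contract above (append-only, raw, `dt`-partitioned) can be sketched in a few lines. This is an illustrative helper, not AtlasStack's actual extractor API — the function name, layout under `data/bronze/`, and file naming are assumptions:

```python
import json
from datetime import date
from pathlib import Path


def write_bronze(records: list[dict], source: str, day: date,
                 root: Path = Path("data/bronze")) -> Path:
    """Append raw API records to a dt-partitioned JSONL file.

    Hypothetical sketch of the bronze contract: records are written
    verbatim in append-only mode, with no transformation logic.
    """
    partition = root / source / f"dt={day.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / "records.jsonl"
    with path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Because the file is only ever appended to, re-running a day never rewrites history; corrections belong in silver and beyond.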
```mermaid
flowchart TB
    %% Sources
    subgraph Sources
        NESO[NESO Demand API]
        WEATHER[Open-Meteo Weather API]
    end

    %% Orchestration
    subgraph Orchestration
        PREFECT[Prefect Flow]
    end

    %% Storage
    subgraph Storage
        BRONZE[Bronze JSONL<br/>data/bronze]
        SILVER[Silver Parquet<br/>data/silver]
    end

    %% Transform
    subgraph Transform
        DUCK[(DuckDB Warehouse)]
        DBT[dbt Models]
        MARTS[Marts]
    end

    %% Quality
    subgraph Validation
        PYTEST[Unit Tests]
        DBTTEST[dbt Data Tests]
        CI[GitHub Actions]
    end

    %% Future
    subgraph Optional_Cloud_Extension
        S3[S3 Object Storage]
        SNOW[Snowflake Warehouse]
    end

    NESO --> PREFECT
    WEATHER --> PREFECT
    PREFECT --> BRONZE
    BRONZE --> SILVER
    SILVER --> DUCK
    DUCK --> DBT
    DBT --> MARTS
    PYTEST --> CI
    DBTTEST --> CI
    BRONZE -. storage swap .-> S3
    SILVER -. storage swap .-> S3
    DUCK -. warehouse swap .-> SNOW
```
Run the pipeline for the last N days:

```bash
atlasstack run --days 3
```

This will:
- Extract NESO demand data to the bronze layer
- Extract Open-Meteo weather data to the bronze layer
- Build silver layers
- Execute dbt build
- Produce validated data marts
Check CLI options:

```bash
atlasstack --help
atlasstack run --help
```

AtlasStack is governed by the following engineering constraints:
- **Determinism over convenience.** The same date range always produces identical outputs.
- **Immutability.** Bronze data is append-only and never mutated. Corrections happen in downstream layers.
- **Explicit schema contracts.** All external data is normalised and typed before consumption. Cadence and null thresholds are enforced.
- **Loud failure.** CI fails on schema drift, cadence breaks, or coverage degradation.
- **Layered testing.** Unit tests cover extractors, data tests cover marts, and CI validates the full stack.
- **Infrastructure focus.** Analytics are secondary to foundational reliability.
```bash
pip install -e ".[dev]"
ruff check .
pytest
python scripts/ci_bootstrap.py
cd dbt/atlasstack_dbt
dbt build --no-partial-parse --profiles-dir .
```

A successful pipeline run produces:
- Partitioned bronze JSONL files
- Partitioned silver Parquet files
- Passing dbt tests
- A valid fct_observations table with:
  - Half-hour cadence
  - Enforced weather coverage thresholds
  - Unique settlement timestamps
If dbt and pytest are green, the ingestion layer is behaving as expected for the tested range.
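The half-hour cadence guarantee amounts to asserting that no 30-minute settlement slot is missing between the first and last observation. A plain-Python analogue of that check (illustrative only — the actual enforcement lives in dbt data tests) might look like:

```python
from datetime import datetime, timedelta


def cadence_gaps(timestamps: list[datetime],
                 step: timedelta = timedelta(minutes=30)) -> list[datetime]:
    """Return expected settlement timestamps missing from the series.

    Sketch of what a half-hour cadence test asserts: every 30-minute
    slot between the first and last observation must be present.
    """
    if not timestamps:
        return []
    observed = set(timestamps)
    missing, t, end = [], min(observed), max(observed)
    while t <= end:
        if t not in observed:
            missing.append(t)
        t += step
    return missing
```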
AtlasStack abstracts its storage targets:

- Bronze: `data/bronze/`
- Silver: `data/silver/`
- Warehouse: DuckDB

Planned swaps:

- Bronze/Silver → S3
- Warehouse → Snowflake
The cloud infrastructure is scaffolded but not required to run the project.
The entire stack runs locally without cloud billing dependencies.
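The "storage swap" edges in the architecture diagram imply an interface boundary between the pipeline and its backing store. One way to express that boundary — a sketch, not AtlasStack's actual abstraction — is a small protocol that both a local filesystem store and a future S3 store (e.g. via boto3) could satisfy:

```python
from pathlib import Path
from typing import Protocol


class ObjectStore(Protocol):
    """Minimal storage interface. A hypothetical S3Store implementing
    the same two methods would make the cloud swap a one-line change."""

    def write(self, key: str, data: bytes) -> None: ...
    def read(self, key: str) -> bytes: ...


class LocalStore:
    """Filesystem-backed store used by the local, billing-free workflow."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()
```

Pipeline code depends only on `ObjectStore`, so swapping `LocalStore` for an S3 implementation would not touch extraction or transform logic.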
**Short-Term**
- Prefect deployment to managed orchestration (schedules and retries)
- Run metadata: structured run reports, row counts, and freshness markers
**Mid-Term**
- Move bronze/silver to S3 (partitioned object storage)
- Switch warehouse target from DuckDB to Snowflake (raw, staging, and marts)
- CI runs dbt against a temporary Snowflake schema (PR validation)
**Long-Term**
- Minimal Terraform: S3 bucket and Snowflake roles/permissions
- Dataset contracts and schema drift alerts (contract breaks fail CI)