OpenLakes Core provides an on-prem, open-source lakehouse stack built from battle-tested projects like Apache Iceberg, Spark, Trino, Airflow, Superset, MinIO, Grafana, and more. The deployer script (deploy-openlakes.sh) wires together storage tiers, credentials, and ingress to deliver a full multi-tenant analytics environment on Kubernetes.
- Storage + Lakehouse: MinIO for object storage, Apache Iceberg tables, optional tiered (hot/cold/cache) hostPath mounts.
- Compute + Orchestration: Spark, Trino, Airflow, Meltano, and JupyterHub notebooks with automatic synchronization.
- Observability & Control Plane: Prometheus, Grafana, Loki, integrated auth, and lifecycle automation.
## Prerequisites

- Kubernetes cluster (single node or multi-node) with a `kubectl` context set.
- Helm 3, `yq`, GNU bash 4+, and access to persistent storage paths for the MinIO tiers.
- Optional: Longhorn or another CSI storage class for block volumes.
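The prerequisite checks above can be sketched as a quick pre-flight script. This is an illustration only: the tool list mirrors the bullets, but `deploy-openlakes.sh` performs its own validation and may check more (or different) things.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: confirm the tools the deployer expects are on PATH.
# The exact checks deploy-openlakes.sh performs may differ.
missing=0
for tool in kubectl helm yq; do
  if command -v "$tool" >/dev/null 2>&1; then
    printf 'ok: %s found at %s\n' "$tool" "$(command -v "$tool")"
  else
    printf 'missing: %s\n' "$tool"
    missing=$((missing + 1))
  fi
done

# GNU bash 4+ is required for the deployer itself.
major=$(bash -c 'echo "${BASH_VERSINFO[0]}"' 2>/dev/null || echo 0)
if [ "$major" -ge 4 ]; then
  echo "ok: bash $major.x"
else
  echo "warn: GNU bash 4+ required"
fi

echo "pre-flight complete ($missing tool(s) missing)"
```

Running this before the deployer saves a failed half-install when a tool is absent from PATH.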
## Clone and configure

```bash
git clone https://github.com/OpenLakes/core.git
cd core
cp core-config.example.yaml core-config.yaml
# edit core-config.yaml to match your domains, passwords, and storage paths
```
⚠️ Security: All passwords and hostnames in `core-config.example.yaml` are placeholders. Change every credential and hostPath before using this outside a throwaway lab cluster.
## Deploy

```bash
./deploy-openlakes.sh
```
The script validates prerequisites, applies the Helm layers, and waits for critical services before exiting.
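After the script exits you can spot-check the result yourself. The namespace name below is an assumption for illustration; substitute whatever your installation actually uses.

```shell
# Spot-check the deployment (the namespace "openlakes" is an assumption).
NS=openlakes

# Helm releases the layered install created.
helm list -n "$NS" 2>/dev/null || echo "helm unavailable or no releases yet"

# Pod health: everything should be Running and Ready.
kubectl get pods -n "$NS" 2>/dev/null || echo "kubectl unavailable"

# Recent events help diagnose pods stuck in Pending or CrashLoopBackOff.
kubectl get events -n "$NS" --sort-by=.lastTimestamp 2>/dev/null \
  || echo "kubectl unavailable"
```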
- `core-config.example.yaml` documents every tunable option. Treat it as a template and keep your real `core-config.yaml` out of version control (see `.gitignore`).
- For advanced tweaks (custom storage classes, additional components), edit the relevant sections under `storage`, `components`, and `observability`.
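As an illustration of how those sections might fit together, a `core-config.yaml` could look roughly like the sketch below. Every key and value here is a hypothetical example, not the real schema; `core-config.example.yaml` is the authoritative reference.

```yaml
# Hypothetical sketch only — consult core-config.example.yaml for real keys.
storage:
  minio:
    hotPath: /mnt/hot        # assumed key names for tiered hostPath mounts
    coldPath: /mnt/cold
  storageClass: longhorn     # optional CSI class for block volumes
components:
  spark: true
  trino: true
  jupyterhub: false
observability:
  grafana:
    adminPassword: change-me # placeholder — replace before deploying
```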
Copyright (c) OpenLakes.
Licensed under the Apache License 2.0.