End-to-end Ansible playbook for installing an enterprise OpenShift cluster on bare metal (UPI), with:
- Static IPs + VLAN tagging on every node (NetworkManager keyfile baked into RHCOS ISO)
- FIPS 140-2 mode enabled at install time
- OpenShift Data Foundation (ODF) with CephFS (file), RBD (block), and NooBaa (S3) storage classes
- Local Storage Operator managing raw bare-metal disks behind ODF
- Cluster Logging + Loki + Network Observability stack (logs, traces, flows in one Loki backend)
- User Workload Monitoring + Grafana Operator for cross-cutting metrics dashboards
- Web Terminal Operator for in-browser
ocaccess - Internal image registry configured with CephFS-backed PVC (HA, 2 replicas)
- Infra-node placement for router / registry / monitoring / logging (subscription-correct)
- Label Studio deployed on ODF as the data-labeling workload
- End-to-end verification report — pass/fail table covering registry, node roles, workload placement, FIPS, operators, storage
This playbook is the executable companion to the RUNBOOK.md in this repo (also published in Turtini Runbooks). Every phase here maps 1:1 to a runbook section, with the same gate checks.
# 0. Install collections
ansible-galaxy collection install -r requirements.yml
# 1. Edit your inventory (start by copying the example)
cp -r inventories/cluster-example inventories/my-cluster
$EDITOR inventories/my-cluster/group_vars/all.yml
$EDITOR inventories/my-cluster/hosts.yml
# 2. Drop your pull secret + SSH key paths into all.yml, then run end-to-end
ansible-playbook -i inventories/my-cluster site.yml
# Or run a single phase
ansible-playbook -i inventories/my-cluster playbooks/99-validate.yml| # | Playbook | Purpose | Gate |
|---|---|---|---|
| 0 | 00-preflight.yml |
DNS, NTP, LB, pull-secret, mirror registry checks | All preflight assertions pass |
| 1 | 10-install-config.yml |
Render install-config.yaml (FIPS, no-platform BM, OVNKubernetes) |
install-config.yaml rendered + backed up |
| 2 | 20-ignition-and-iso.yml |
openshift-install create ignition + per-host RHCOS ISO with NM keyfile |
Per-host ISOs in /var/www/html/, ignition served on :8080 |
| 3 | 30-bootstrap.yml |
Bootstrap node, control-plane bring-up | wait-for bootstrap-complete returns 0 |
| 4 | 40-approve-csrs.yml |
Approve worker CSRs, wait for nodes Ready | All nodes Ready, MCPs Updated=True |
| 5 | 50-local-storage.yml |
Local Storage Operator + LocalVolumeSet | localblock SC exists, N PVs visible |
| 6 | 60-odf.yml |
ODF Operator + StorageCluster (RBD + CephFS + NooBaa) | StorageCluster Ready, two SCs present |
| 7 | 70-loki.yml |
Loki Operator + LokiStack with NooBaa S3 backend | LokiStack Ready |
| 8 | 80-logging.yml |
Cluster Logging Operator + ClusterLogForwarder → Loki | Logs flowing into Loki |
| 9 | 90-netobserv.yml |
Network Observability Operator + FlowCollector → Loki | FlowCollector Ready, Console plugin enabled |
| 10 | 91-monitoring.yml |
UWM enable + Grafana Operator + datasources + dashboards | Grafana CR Ready, datasources resolve |
| 11 | 92-label-studio.yml |
Label Studio CE on ODF + ServiceMonitor scraped by UWM | Route serves 200, scrape target up |
| 12 | 93-image-registry.yml |
Internal registry → CephFS PVC + 2 replicas + default route | Image push/pull round-trip succeeds |
| 13 | 94-infra-nodes.yml |
Label/taint infra nodes + move platform workloads onto them | All platform pods on infra |
| 14 | 95-web-terminal.yml |
Web Terminal Operator (no CR needed) | CSV Succeeded, console icon present |
| 15 | 99-validate.yml |
Full verification report (re-runnable any time) | 0 FAIL entries |
Run-anytime: 99-validate.yml is idempotent and standalone. Wire it into cron, CI, or a pre-demo smoke gate.
- FIPS is install-time only.
fips: trueininstall-config.yamlis immutable. Skipping it now means a full reinstall to add it later. - No DHCP. Every node gets a static IP via NM keyfile baked into its RHCOS ISO. The keyfile also carries the VLAN child profile.
- VLAN tagging lives in the same NM keyfile (parent NIC + VLAN id + IPv4 method=manual).
- ODF "file storage" = the
cephFilesystemresource in theStorageClusterCR. Without that flag you only get RBD (block). - Local Storage Operator is a hard prereq for ODF on raw bare-metal disks.
- Image registry comes up
Removedon bare-metal UPI until you give it storage. Phase 12 fixes that. - NooBaa is the linchpin of the observability stack. Loki uses NooBaa S3 for its backend. If ODF degrades, Loki/Logging/NetObserv degrade with it.
ocp-bm-odf-playbook/
├── README.md
├── ansible.cfg
├── requirements.yml
├── site.yml # imports the phase playbooks in order
├── inventories/
│ └── cluster-example/
│ ├── hosts.yml
│ └── group_vars/{all,masters,workers,infra}.yml
├── playbooks/ # one file per phase, see table above
└── roles/
├── preflight/
├── install_config/
├── ignition_iso/
├── bootstrap/
├── csr_approve/
├── local_storage_operator/
├── odf/
├── image_registry/
├── loki_operator/
├── cluster_logging/
├── network_observability/
├── monitoring/
├── web_terminal/
├── label_studio/
├── infra_nodes/
└── validate/
ansible-playbook -i inventories/my-cluster playbooks/99-validate.yml
============= Cluster Verification Report =============
[PASS] Image registry managementState: Managed
[PASS] Image registry storage backend: image-registry-storage
[PASS] Image registry pods: 2 Running
[PASS] Image registry default route: default-route-openshift-image-registry.apps.ocp01.lab.example.com
[PASS] Image push/pull round-trip: round-trip succeeded
[PASS] Control plane count: 3 nodes (expected 3)
[PASS] Worker count: 5 nodes (expected 5)
[PASS] Infra count: 3 nodes (expected 3)
[PASS] All nodes Ready: all 11 Ready
[PASS] Infra nodes tainted NoSchedule: all infra tainted
[PASS] ODF storage labels: 3 nodes labeled
[PASS] Placement: Router (ingress): on infra
[PASS] Placement: Image Registry: on infra
[PASS] Placement: Prometheus k8s: on infra
[PASS] Placement: Loki gateway: on infra
[PASS] All cluster operators Available
[PASS] FIPS enabled on m01: FIPS mode is enabled
[PASS] ODF StorageCluster: Ready
[PASS] LokiStack: Ready
[PASS] FlowCollector: Ready
=======================================================
Result: 20 PASS, 0 FAIL
WARN entries don't fail the run; FAIL entries do.
The runbook walks through each phase in narrative form, with manual fallbacks, gate checks, common failure modes, and a "what broke" diagnostic tree. This repo provides the executable path; the runbook is the educational and diagnostic path. The two stay in sync.
- RUNBOOK.md — full narrative, lives in this repo
- Turtini Runbooks — same content, forkable per cluster, with marketplace inheritance for downstream orgs
Public reference implementation. Tested against OpenShift 4.16. Issues + PRs welcome.
Apache License 2.0 — see LICENSE.