OpenShift Bare-Metal Cluster — Ansible Companion Playbook

End-to-end Ansible playbook for installing an enterprise OpenShift cluster on bare metal (UPI), with:

  • Static IPs + VLAN tagging on every node (NetworkManager keyfile baked into RHCOS ISO)
  • FIPS 140-2 mode enabled at install time
  • OpenShift Data Foundation (ODF) with CephFS (file), RBD (block), and NooBaa (S3) storage classes
  • Local Storage Operator managing raw bare-metal disks behind ODF
  • Cluster Logging + Loki + Network Observability stack (logs and network flows in one Loki backend)
  • User Workload Monitoring + Grafana Operator for cross-cutting metrics dashboards
  • Web Terminal Operator for in-browser oc access
  • Internal image registry configured with CephFS-backed PVC (HA, 2 replicas)
  • Infra-node placement for router / registry / monitoring / logging (subscription-correct)
  • Label Studio deployed on ODF as the data-labeling workload
  • End-to-end verification report — pass/fail table covering registry, node roles, workload placement, FIPS, operators, storage

This playbook is the executable companion to the RUNBOOK.md in this repo (also published in Turtini Runbooks). Every phase here maps 1:1 to a runbook section, with the same gate checks.


Quick start

# 0. Install collections
ansible-galaxy collection install -r requirements.yml

# 1. Edit your inventory (start by copying the example)
cp -r inventories/cluster-example inventories/my-cluster
$EDITOR inventories/my-cluster/group_vars/all.yml
$EDITOR inventories/my-cluster/hosts.yml

# 2. Drop your pull secret + SSH key paths into all.yml, then run end-to-end
ansible-playbook -i inventories/my-cluster site.yml

# Or run a single phase
ansible-playbook -i inventories/my-cluster playbooks/99-validate.yml

Phase map

| #  | Playbook | Purpose | Gate |
|----|----------|---------|------|
| 0  | 00-preflight.yml | DNS, NTP, LB, pull-secret, mirror registry checks | All preflight assertions pass |
| 1  | 10-install-config.yml | Render install-config.yaml (FIPS, no-platform BM, OVNKubernetes) | install-config.yaml rendered + backed up |
| 2  | 20-ignition-and-iso.yml | openshift-install create ignition-configs + per-host RHCOS ISO with NM keyfile | Per-host ISOs in /var/www/html/, ignition served on :8080 |
| 3  | 30-bootstrap.yml | Bootstrap node, control-plane bring-up | wait-for bootstrap-complete returns 0 |
| 4  | 40-approve-csrs.yml | Approve worker CSRs, wait for nodes Ready | All nodes Ready, MCPs Updated=True |
| 5  | 50-local-storage.yml | Local Storage Operator + LocalVolumeSet | localblock SC exists, N PVs visible |
| 6  | 60-odf.yml | ODF Operator + StorageCluster (RBD + CephFS + NooBaa) | StorageCluster Ready, two SCs present |
| 7  | 70-loki.yml | Loki Operator + LokiStack with NooBaa S3 backend | LokiStack Ready |
| 8  | 80-logging.yml | Cluster Logging Operator + ClusterLogForwarder → Loki | Logs flowing into Loki |
| 9  | 90-netobserv.yml | Network Observability Operator + FlowCollector → Loki | FlowCollector Ready, Console plugin enabled |
| 10 | 91-monitoring.yml | UWM enable + Grafana Operator + datasources + dashboards | Grafana CR Ready, datasources resolve |
| 11 | 92-label-studio.yml | Label Studio CE on ODF + ServiceMonitor scraped by UWM | Route serves 200, scrape target up |
| 12 | 93-image-registry.yml | Internal registry → CephFS PVC + 2 replicas + default route | Image push/pull round-trip succeeds |
| 13 | 94-infra-nodes.yml | Label/taint infra nodes + move platform workloads onto them | All platform pods on infra |
| 14 | 95-web-terminal.yml | Web Terminal Operator (no CR needed) | CSV Succeeded, console icon present |
| 15 | 99-validate.yml | Full verification report (re-runnable any time) | 0 FAIL entries |

Run-anytime: 99-validate.yml is idempotent and standalone. Wire it into cron, CI, or a pre-demo smoke gate.
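
As a rough illustration of the smoke-gate idea, the wrapper below runs the validation playbook and reduces the outcome to a single exit status. It is a minimal sketch, not part of this repo: the function name run_smoke_gate and the VALIDATE_CMD override are hypothetical, and it assumes ansible-playbook exits non-zero when the report contains a FAIL entry.

```shell
#!/bin/sh
# Hypothetical wrapper around the validation playbook (not shipped in this repo).
# VALIDATE_CMD is an assumed override hook so the wrapper can be dry-run
# without a cluster; it defaults to the real ansible-playbook binary.
run_smoke_gate() {
  inventory="$1"   # e.g. inventories/my-cluster
  if "${VALIDATE_CMD:-ansible-playbook}" -i "$inventory" playbooks/99-validate.yml; then
    echo "smoke gate: PASS"
  else
    echo "smoke gate: FAIL" >&2
    return 1
  fi
}
```

From cron or a CI job, this could be invoked from the repo root as run_smoke_gate inventories/my-cluster, with the job failing whenever the function returns 1.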


Hard constraints

  • FIPS is install-time only. fips: true in install-config.yaml is immutable. Skipping it now means a full reinstall to add it later.
  • No DHCP. Every node gets a static IP via an NM keyfile baked into its RHCOS ISO.
  • VLAN tagging lives in the same NM keyfile (parent NIC + VLAN id + IPv4 method=manual).
  • ODF "file storage" = the cephFilesystem resource in the StorageCluster CR. Without it you only get RBD (block).
  • Local Storage Operator is a hard prereq for ODF on raw bare-metal disks.
  • The image registry comes up with managementState: Removed on bare-metal UPI until you give it storage. Phase 12 fixes that.
  • NooBaa is the linchpin of the observability stack. Loki uses NooBaa S3 for its backend. If ODF degrades, Loki/Logging/NetObserv degrade with it.
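
To make the static IP and VLAN constraints concrete, here is a sketch of what a single NetworkManager keyfile combining the parent NIC, VLAN id, and manual IPv4 settings might look like. The function name render_vlan_keyfile, the interface name, and all addresses are illustrative assumptions; the keyfile the playbook actually renders may carry more settings.

```shell
#!/bin/sh
# Hypothetical helper: print a NetworkManager keyfile for a VLAN child
# profile with a static IPv4 address. All values passed in are placeholders.
#   $1 parent NIC      (e.g. eno1)
#   $2 VLAN id         (e.g. 100)
#   $3 address/prefix  (e.g. 192.0.2.10/24)
#   $4 gateway         (e.g. 192.0.2.1)
#   $5 DNS server      (e.g. 192.0.2.53)
render_vlan_keyfile() {
  cat <<EOF
[connection]
id=$1.$2
type=vlan
interface-name=$1.$2

[vlan]
id=$2
parent=$1

[ipv4]
method=manual
address1=$3,$4
dns=$5;
EOF
}
```

For example, render_vlan_keyfile eno1 100 192.0.2.10/24 192.0.2.1 192.0.2.53 > eno1.100.nmconnection writes one profile per host. NetworkManager only loads keyfiles that are owned by root with mode 0600, so the permissions matter wherever the file finally lands.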

Repo layout

ocp-bm-odf-playbook/
├── README.md
├── ansible.cfg
├── requirements.yml
├── site.yml                           # imports the phase playbooks in order
├── inventories/
│   └── cluster-example/
│       ├── hosts.yml
│       └── group_vars/{all,masters,workers,infra}.yml
├── playbooks/                         # one file per phase, see table above
└── roles/
    ├── preflight/
    ├── install_config/
    ├── ignition_iso/
    ├── bootstrap/
    ├── csr_approve/
    ├── local_storage_operator/
    ├── odf/
    ├── image_registry/
    ├── loki_operator/
    ├── cluster_logging/
    ├── network_observability/
    ├── monitoring/
    ├── web_terminal/
    ├── label_studio/
    ├── infra_nodes/
    └── validate/

Verification report

ansible-playbook -i inventories/my-cluster playbooks/99-validate.yml

============= Cluster Verification Report =============
[PASS] Image registry managementState: Managed
[PASS] Image registry storage backend: image-registry-storage
[PASS] Image registry pods: 2 Running
[PASS] Image registry default route: default-route-openshift-image-registry.apps.ocp01.lab.example.com
[PASS] Image push/pull round-trip: round-trip succeeded
[PASS] Control plane count: 3 nodes (expected 3)
[PASS] Worker count: 5 nodes (expected 5)
[PASS] Infra count: 3 nodes (expected 3)
[PASS] All nodes Ready: all 11 Ready
[PASS] Infra nodes tainted NoSchedule: all infra tainted
[PASS] ODF storage labels: 3 nodes labeled
[PASS] Placement: Router (ingress): on infra
[PASS] Placement: Image Registry: on infra
[PASS] Placement: Prometheus k8s: on infra
[PASS] Placement: Loki gateway: on infra
[PASS] All cluster operators Available
[PASS] FIPS enabled on m01: FIPS mode is enabled
[PASS] ODF StorageCluster: Ready
[PASS] LokiStack: Ready
[PASS] FlowCollector: Ready
=======================================================
Result: 20 PASS, 0 FAIL

WARN entries don't fail the run; FAIL entries do.
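
If the report is captured to a file, the same PASS/WARN/FAIL rule can be re-applied offline. The sketch below assumes the report is plain text with one bracketed entry per line, as in the sample above; summarize_report is a hypothetical helper name, not something this repo provides.

```shell
#!/bin/sh
# Hypothetical helper: read a saved verification report on stdin, count
# entries by status, and exit non-zero only when a FAIL entry is present.
summarize_report() {
  awk '
    /^\[PASS\]/ { pass++ }
    /^\[WARN\]/ { warn++ }
    /^\[FAIL\]/ { fail++ }
    END {
      printf "%d PASS, %d WARN, %d FAIL\n", pass + 0, warn + 0, fail + 0
      exit (fail > 0)   # WARN entries alone leave the exit status at 0
    }
  '
}
```

Usage could be as simple as summarize_report < report.txt, where report.txt is whatever file the output was saved to.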


Companion runbook

The runbook walks through each phase in narrative form, with manual fallbacks, gate checks, common failure modes, and a "what broke" diagnostic tree. This repo provides the executable path; the runbook is the educational and diagnostic path. The two stay in sync.

  • RUNBOOK.md — full narrative, lives in this repo
  • Turtini Runbooks — same content, forkable per cluster, with marketplace inheritance for downstream orgs

Status

Public reference implementation. Tested against OpenShift 4.16. Issues + PRs welcome.

License

Apache License 2.0 — see LICENSE.
