Home Cluster

Description

This is a fully autonomous, automated, high-availability "micro data center" for the home. All critical components are redundant at both the hardware and the software level.

The 1st version, on 12V, was unstable when all 6 computers were turned on. It's a long story, but in short, one of the electronics "axioms" applies: if you want to eliminate power issues, supply your electronics with the proper voltage and enough power. I therefore built a 2nd version on 24V, which was stable, but I decided to improve it and build a 3rd version. This version also runs on 24V, but without the low-voltage UPS for the smart home (replaced by an external UPS), the Rack Alarm System (replaced by an external smoke alarm system), and the PJON routers (I don't use PJON for external devices, so all internal devices were switched to the I2C protocol). As a result, energy consumption dropped a little and cluster cooling improved, because more free space became available.

Besides automation and redundancy, all devices (computers, switches, external Wi-Fi routers, etc.) are fully controlled remotely.

Hardware-level redundancy

  • the cluster hardware consists of 3 x arm SBCs and 3 x x64 mini PCs, which is quite enough for building an HA Kubernetes cluster (see details: Cluster case and hardware);
  • 2 x USB 3.0 KVM Switcher 2 Port PCs Sharing 4 Devices are used for redundant connection of USB devices;
  • multiple internet providers can be used, including mobile and satellite providers (the rack has all the necessary outside connectors, which can easily be connected to internal cluster components), for full redundancy of the internet connection.

Software-level redundancy

  • HA Kubernetes cluster: 3 x arm SBCs as Master nodes and 3 x x64 mini PCs as Workers;
  • each Worker node has an additional 1TB disk, which is used for HA data storage based on OpenEBS and MinIO;
  • full remote access to Masters and Workers via IP-KVM, for example to easily enter the BIOS or to install an OS manually from a remote location.
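
A short illustration of why three Master nodes are a sensible minimum. This is only a sketch, and it assumes the control plane stores its state in etcd (the usual Kubernetes default; this README does not say which datastore is used). A majority-based store stays writable only while more than half of its members are alive, so the arithmetic looks like this:

```python
def quorum(members: int) -> int:
    """Smallest majority of a voting group of the given size."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    # Assumption: one voting member per Master node.
    for n in (1, 2, 3, 5):
        print(f"{n} members: quorum={quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under that assumption, 3 Masters survive the loss of one node, while 2 Masters would tolerate none, which is why an even count adds no extra safety.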

Automation and monitoring

Electronics is just my hobby and my primary job is SRE, so I clearly understand that repeatable things should be automated and critical components should be monitored.

Automation

At first sight, what can be repeatable in a home cluster where the hardware can stay unchanged for years? With one server, doing things manually can be OK. But when you have 6 servers that from time to time need upgrades at both the hardware and the software level, manual deployment and configuration become a pain, so I try to keep everything automated.

  • OS deployment on all nodes (including the arm SBCs) is done via PXE and the process is fully automated. IP-KVM is used only when I need to correct the automated deployment's configuration files for a new OS version;
  • to automate configuration of hardware nodes, LXD containers and applications like HashiCorp Vault, I use Pulumi and Ansible;
  • Kubernetes deployment, including building Docker containers, is done via Bazel GitOps Rules.

Some other software which I use:

  • Kube-vip - network load balancer for bare-metal clusters;
  • Traefik - as Kubernetes ingress controller;
  • Docker-registry - for storing my containers;
  • OpenVPN - access to the private network from anywhere (everything is closed to the public internet);
  • Gitea - for storing my projects' code and docs in git, and also for CI/CD and a ticket system (sometimes it is useful to create tickets for myself :) );
  • Home Assistant - main component for IoT things;
  • Frigate - NVR With Realtime Object Detection for IP Cameras.

Monitoring

At the hardware level, almost every line has monitoring of voltage, current and power consumption. There are also 10 temperature sensors. All these components send data to the cluster via the I2C protocol, where it is stored in the DB for visualization via Prometheus / Grafana, with alerting about abnormal situations via Prometheus Alertmanager -> Telegram and Slack.
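
To make the data path more concrete, here is a minimal sketch of how per-line readings could be turned into something Prometheus can scrape. It is not the code used in this cluster: the metric names, the `line` label and the sample values are all hypothetical, and the readings are hard-coded instead of being read over I2C. It only shows the Prometheus text exposition format and the power calculation (P = V x I).

```python
from dataclasses import dataclass


@dataclass
class LineReading:
    line: str      # hypothetical name of a power line
    volts: float   # measured voltage, V
    amps: float    # measured current, A

    @property
    def watts(self) -> float:
        # Power consumption derived from voltage and current.
        return self.volts * self.amps


# Hypothetical metric names mapped to the attribute they expose.
METRICS = (
    ("rack_line_voltage_volts", "volts"),
    ("rack_line_current_amperes", "amps"),
    ("rack_line_power_watts", "watts"),
)


def to_prometheus(readings) -> str:
    """Render readings in the Prometheus text exposition format."""
    out = []
    for metric, attr in METRICS:
        out.append(f"# TYPE {metric} gauge")
        for r in readings:
            value = getattr(r, attr)
            out.append(f'{metric}{{line="{r.line}"}} {value:.3f}')
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [LineReading("sbc_1", 24.0, 0.5), LineReading("fans", 12.0, 1.25)]
    print(to_prometheus(sample), end="")
```

Text in this format can be served over HTTP for Prometheus to scrape, and alert rules on top of such gauges would then go through Alertmanager to Telegram and Slack.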

For software-level monitoring and alerting I use the same software stack: Prometheus, Grafana, Prometheus Alertmanager -> Telegram and Slack.

How the cluster overview dashboard looks:

Cluster Dashboard

Cluster rack design

If you are interested, check version #1 and version #2.

For the cluster rack I chose the TUFFIOM 9U Network Cabinet Enclosure and added connectors to it to avoid pulling wires from outside through holes, i.e. to isolate it from the outside. This rack comes with 2 x 110V fans, which I replaced with 2 x 120mm 12V fans. I also added 2 x 120mm 12V fans to each side of the rack.

I also use a 12U rack for the UPS (APC BE600M1) and the PoE switch (NETGEAR GS305EP), which I need for my cameras. To make sure the 9U rack on top does not overload the 12U rack's native legs, I added 4 x adjustable legs.

The mounting holes are sealed with 3M Fire Barrier.

I tried to make it as safe as possible:

  • everything is fully isolated from the outside (via nonflammable connectors)
  • I think more than 70% of the components and wires inside are nonflammable
  • I used more fuses here than I have ever seen in any device (each line has at least one fuse, in some cases two)

But for "better sleep" I decided to put 3 automatic fire suppressors inside the 9U cluster rack: 2 x StoveTop FireStop Rangehood (near the largest concentration of wires) and 1 x JOSEOZSTA on the top (between the fans). Inside the 12U rack I put 1 x Automatic Fire Extinguisher. All external wires are placed in fiberglass tubing, which is totally nonflammable.

I also added 2 external smoke detectors: 1 x X-Sense SC06-W Smoke and Carbon Monoxide Detector, which is paired with other smoke detectors (if one is triggered, all are activated), and 1 x First Alert Z-Wave Smoke Detector & Carbon Monoxide Alarm, which I use for notifications on my phone.

On the back side of the rack I placed the Power Supply with Monitoring, with 5 x 50mm 12V fans, and the Rack Cooling module.

Heatmap

Because there is so much electronics inside the 9U rack, it has to be cooled very well, so I placed 20 fans inside this rack (11 for rack cooling and 9 for cluster cooling). The fans are controlled by the cooling modules and turn on only when temperatures are higher than normal.
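
The "turn on only when hot" behaviour can be sketched roughly as below. This README does not describe the actual logic inside the cooling modules, so treat this as an assumption: it uses a simple hysteresis rule (separate on and off thresholds, so a fan does not flap when the temperature hovers around one value), and both threshold values are made up for illustration.

```python
class FanController:
    """Hysteresis on/off control for one fan group (illustrative only)."""

    def __init__(self, on_above_c: float = 45.0, off_below_c: float = 40.0):
        # Hypothetical thresholds; the real modules may use other values.
        if off_below_c >= on_above_c:
            raise ValueError("off threshold must be below on threshold")
        self.on_above_c = on_above_c
        self.off_below_c = off_below_c
        self.running = False

    def update(self, temperature_c: float) -> bool:
        """Feed one temperature sample and return whether the fan should run."""
        if temperature_c >= self.on_above_c:
            self.running = True
        elif temperature_c <= self.off_below_c:
            self.running = False
        # Between the two thresholds the previous state is kept.
        return self.running


if __name__ == "__main__":
    fan = FanController()
    for t in (38.0, 44.0, 46.0, 42.0, 39.0):
        print(f"{t:.1f} C -> fan {'on' if fan.update(t) else 'off'}")
```

The gap between the two thresholds is the design choice here: a wider gap means fewer fan starts and stops at the cost of a slightly larger temperature swing.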