NicolaiMatthew/homelab

🏠 Homelab Infrastructure

A production-grade self-hosted homelab built on Proxmox VE, featuring containerized services, automated media pipelines, comprehensive monitoring, and an offsite backup strategy.

Proxmox Docker Linux Prometheus Grafana Debian


📝 Writing

📋 Overview

This homelab runs on a Lenovo ThinkCentre M920q and serves as a practical environment for developing and demonstrating skills in Linux administration, containerization, infrastructure monitoring, and DevOps practices.

Every decision in this build was made deliberately: choosing unprivileged LXC containers for security isolation, implementing hardware-accelerated transcoding via Intel QuickSync passthrough, and building a custom SMART monitoring pipeline that caught a real conflict between a systemd timer and a cron-based textfile collector.


🖥️ Hardware

| Component | Spec |
|---|---|
| Host | Lenovo ThinkCentre M920q (SFF) |
| CPU | Intel Core i5/i7 8th/9th Gen (QuickSync capable) |
| Hypervisor | Proxmox VE 9.x |
| Primary Storage | Local SSD (Proxmox OS + LXC rootfs) |
| Media Storage | Terramaster D4-320 DAS - 2x4TB HDD → 7.3TB LVM volume |

🗺️ Architecture

┌────────────────────────────────────────────────────────────┐
│                     Proxmox VE Host                        │
│                                                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │  Networking  │  │   Media      │  │   Monitoring     │  │
│  │              │  │   Stack      │  │   Stack          │  │
│  │  Tailscale   │  │  Docker LXC  │  │  Prometheus      │  │
│  │  Exit Node   │  │  (LXC 112)   │  │  Grafana         │  │
│  │  + Subnet    │  │              │  │  Alertmanager    │  │
│  │  Router      │  │  Gluetun ──► │  │  PVE Exporter    │  │
│  └──────────────┘  │  qBittorrent │  │  Node Exporter   │  │
│                    │  Prowlarr    │  └──────────────────┘  │
│  ┌──────────────┐  │  Sonarr      │                        │
│  │  AI & Tools  │  │  Radarr      │  ┌──────────────────┐  │
│  │              │  │  Bazarr      │  │   Dashboard      │  │
│  │  OpenWebUI   │  │  Jellyfin    │  │                  │  │
│  │  n8n         │  │  Jellyseerr  │  │  Homepage        │  │
│  │  Code Server │  │  Scraparr    │  │  (LXC 113)       │  │
│  └──────────────┘  │  Profilarr   │  └──────────────────┘  │
│                    └──────────────┘                        │
│                           │                                │
│              ┌────────────▼────────────┐                   │
│              │   DAS - 7.3TB LVM       │                   │
│              │   /mnt/das              │                   │
│              │   Bind mounted →        │                   │
│              │   /mnt/data in LXC      │                   │
│              └─────────────────────────┘                   │
└────────────────────────────────────────────────────────────┘

🛠️ Technical Highlights

Container Strategy

All services run in unprivileged LXC containers on Proxmox. This was a deliberate security decision: unprivileged containers remap UIDs, so even a container escape lands in an unprivileged user context on the host. The tradeoff is bind mount complexity: with the default mapping, UID 1000 inside the container maps to UID 101000 on the host, which requires careful chown management for shared storage.
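The mapping arithmetic can be sketched as follows. This is a minimal illustration that assumes the default Proxmox mapping for unprivileged containers (an offset of 100000 and a range of 65536 IDs); the function name `host_id` is hypothetical, and containers with custom lxc.idmap entries will map differently.

```python
# Sketch only: assumes the default unprivileged mapping (offset 100000,
# 65536 IDs). Custom lxc.idmap entries in the container config change this.
ID_OFFSET = 100_000
ID_RANGE = 65_536


def host_id(container_id: int, offset: int = ID_OFFSET, id_range: int = ID_RANGE) -> int:
    """Return the host UID/GID that a container UID/GID maps to."""
    if not 0 <= container_id < id_range:
        raise ValueError(f"container ID {container_id} is outside the mapped range")
    return offset + container_id


if __name__ == "__main__":
    for cid in (0, 1000):
        print(f"container {cid} -> host {host_id(cid)}")
```

Under these assumptions, files on the host that a container user with UID 1000 needs to write must be owned by host UID 101000.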

VPN Kill Switch

The entire torrent pipeline routes through a Gluetun WireGuard container. qBittorrent uses network_mode: service:gluetun in Docker Compose, so it shares Gluetun's network namespace and has no network path of its own: all of its traffic must go through the VPN tunnel. If the Gluetun container stops, qBittorrent loses network access entirely. No extra firewall rules are required, because the restriction is enforced at the network namespace level.

Hardware Transcoding

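Intel QuickSync is passed through to the Docker LXC via lxc.mount.entry directives in the Proxmox container config. Jellyfin uses QSV for hardware decoding of H.264, HEVC, and VP9 and for hardware encoding of H.264 and HEVC, offloading most video transcoding work from the CPU. The 8th/9th Gen iGPU has no AV1 hardware support, so AV1 sources are decoded in software.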

Custom SMART Monitoring

Built a bash-based SMART metrics exporter that writes to Prometheus node_exporter's textfile collector. A conflict surfaced during development: the built-in prometheus-node-exporter-smartmon.service, triggered by a systemd timer every 15 minutes, overwrote the textfile with zeros and caused false alerts. Fixed by using a custom_smart_ metric prefix and disabling the conflicting timer. Alert rules fire only after 20 minutes of sustained failure, which filters out transient scrape issues.

Storage Architecture

Two 4TB HDDs are combined into a single 7.3TB LVM logical volume (8 TB raw is roughly 7.3 TiB). The volume is mounted on the Proxmox host at /mnt/das and bind mounted into the Docker LXC at /mnt/data. Sonarr/Radarr download and media folders are on the same filesystem, enabling hardlinks instead of file copies: imports are instantaneous and use no additional disk space.
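The hardlink behaviour can be demonstrated in isolation. The sketch below is not how Sonarr or Radarr implement imports; it only shows, in a temporary directory with hypothetical file names, that a hardlinked "import" shares one inode with the download and so stores the data only once.

```python
import os
import tempfile


def import_with_hardlink(src: str, dst: str) -> None:
    """Hardlink src to dst. Both paths must be on the same filesystem."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    os.link(src, dst)


def same_file(a: str, b: str) -> bool:
    """True when both paths point at the same inode on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def demo():
    """Create a fake download, 'import' it, and report (same inode, link count)."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "downloads", "episode.mkv")
        dst = os.path.join(root, "media", "tv", "episode.mkv")
        os.makedirs(os.path.dirname(src))
        with open(src, "wb") as fh:
            fh.write(b"\0" * 1024)
        import_with_hardlink(src, dst)
        return same_file(src, dst), os.stat(src).st_nlink


if __name__ == "__main__":
    print(demo())
```

If the two directories were on different filesystems, os.link would raise an OSError and a real copy would be needed, which is why the download and media folders share one volume.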


📦 Full Service Inventory

| Service | Purpose | Location |
|---|---|---|
| Tailscale | VPN exit node + subnet router | LXC 100 |
| File Share | Network file sharing | LXC 101 |
| n8n | Workflow automation | LXC 102 |
| Upsnap | Wake-on-LAN | LXC 103 |
| Crafty Controller | Minecraft server management | LXC 105 |
| Prometheus | Metrics collection + alerting | LXC 106 |
| PVE Exporter | Proxmox-specific metrics | LXC 107 |
| Grafana | Metrics visualization | LXC 108 |
| Alertmanager | Alert routing + notifications | LXC 109 |
| OpenWebUI | Self-hosted AI interface | LXC 110 |
| Code Server | Browser-based VS Code | LXC 111 |
| Gluetun | WireGuard VPN tunnel (Mullvad) | Docker LXC 112 |
| qBittorrent | Torrent client (VPN-only network namespace) | Docker LXC 112 |
| Prowlarr | Indexer aggregation | Docker LXC 112 |
| FlareSolverr | Cloudflare challenge bypass | Docker LXC 112 |
| Sonarr | TV show automation | Docker LXC 112 |
| Radarr | Movie automation | Docker LXC 112 |
| Bazarr | Subtitle automation | Docker LXC 112 |
| Jellyfin | Media server (QuickSync HW transcode) | Docker LXC 112 |
| Jellyseerr | Media request portal | Docker LXC 112 |
| Scraparr | ARR stack Prometheus exporter | Docker LXC 112 |
| Profilarr | Quality profile management | Docker LXC 112 |
| Navidrome | Self-hosted music streaming | Docker LXC 112 |
| Audiobookshelf | Audiobook + podcast server | Docker LXC 112 |
| Stirling PDF | Self-hosted PDF processing | Docker LXC 112 |
| IT Tools | Developer utility toolkit | Docker LXC 112 |
| Homepage | Unified service dashboard | LXC 113 |

📊 Monitoring Stack

Prometheus scrapes metrics from the following sources:

Proxmox Host
├── node_exporter (port 9100)
│   ├── CPU, RAM, disk, network
│   ├── Filesystem metrics (/mnt/das)
│   └── Custom textfile collector
│       ├── custom_smart_health
│       ├── custom_smart_temperature
│       ├── custom_smart_reallocated_sectors
│       ├── custom_smart_power_on_hours
│       └── r2_bucket_size_bytes (offsite backup monitoring)
│
PVE Exporter (LXC 107)
└── Per-LXC CPU, RAM, disk usage
    └── VM/LXC up/down state

Scraparr (Docker LXC)
└── Sonarr, Radarr, Prowlarr, Bazarr, Jellyseerr metrics

Alert Rules

| Alert | Condition | Severity |
|---|---|---|
| VirtualGuestDown | LXC/VM offline >1m | Critical |
| ProxmoxDiskFilling | Disk >85% | Warning |
| ProxmoxGuestHighMemory | RAM >90% | Warning |
| DriveHealthFailed | SMART health=0 for 20m | Critical |
| DriveReallocatedSectors | Any reallocated sectors | Warning |
| DriveTemperatureHigh | Temp >55°C for 10m | Warning |
| DASStorageFillingUp | DAS >80% | Warning |
| DASStorageCritical | DAS >95% | Critical |
| R2BackupSizeWarning | R2 bucket >8GB | Warning |
| R2BackupSizeCritical | R2 bucket >9GB | Critical |
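The "for 20m" and "for 10m" conditions mean an alert fires only once its condition has held continuously for that long. The sketch below is a simplified model of that hold-duration behaviour, not Prometheus code or the actual rule definitions; the class name `ForDurationAlert` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForDurationAlert:
    """Simplified model of an alert with a hold duration (sketch only)."""

    hold_seconds: float
    active_since: Optional[float] = None

    def evaluate(self, condition: bool, now: float) -> str:
        """Return 'inactive', 'pending', or 'firing' for this evaluation."""
        if not condition:
            # Any healthy evaluation resets the timer.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.hold_seconds:
            return "firing"
        return "pending"


if __name__ == "__main__":
    alert = ForDurationAlert(hold_seconds=20 * 60)
    for t, failed in [(0, True), (600, True), (900, False), (1000, True), (2200, True)]:
        print(t, alert.evaluate(failed, now=t))
```

A single healthy reading in the middle resets the timer, which is what keeps a brief scrape glitch from paging anyone.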

💾 Backup Strategy

Three-tier backup approach:

Tier 1 - Daily (Local)
├── /opt/mediastack configs → DAS tarball
└── /etc/pve/lxc/*.conf → DAS

Tier 2 - Weekly (Local Snapshots)
└── All LXCs/VMs → Proxmox VZDump → DAS

Tier 3 - Weekly (Offsite)
└── rclone sync → Cloudflare R2 (free tier)
    ├── LXC configs
    ├── Docker Compose + service configs
    ├── Prometheus alert rules
    ├── Homepage config
    └── Custom scripts

Restore time estimate: ~1 hour from complete hardware failure using DAS snapshots + R2 config backup + this documentation.


🔑 Key Engineering Decisions & Lessons Learned

Unprivileged LXCs over VMs: Lower overhead than full VMs, better security than privileged containers. The UID remapping complexity (a 100000 offset, so container UID 1000 becomes host UID 101000) is worth the security isolation.

Docker Compose over individual containers: Managing 15+ containers as a single stack with shared networking makes the VPN kill switch trivial to implement and lets the whole stack be updated together.

Hardlinks over copies: Keeping download and media directories on the same filesystem lets Sonarr/Radarr use hardlinks. A 30GB season pack is "moved" instantaneously with zero extra disk usage.

Custom metric prefix to avoid collector conflicts: The built-in smartmon collector in prometheus-node-exporter conflicted with custom textfile metrics, causing false critical alerts every 15 minutes. Using a custom_smart_ prefix completely eliminated the conflict without disabling any system services.

Atomic file writes for Prometheus textfile: Writing metrics to a temp file and then mv-ing it into place atomically prevents node_exporter from reading a partially written file and generating false alerts during the write window.
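The write-then-rename pattern can be sketched as follows. The repository's exporter is a bash script, so this Python version is only an illustration of the same idea; the function name `write_textfile_atomically` and the file names are hypothetical. The temp file is created in the target directory so the rename stays on one filesystem, and it deliberately does not end in .prom so the textfile collector ignores it.

```python
import os
import tempfile


def write_textfile_atomically(path: str, metrics: dict) -> None:
    """Write 'name value' lines to a temp file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp_", suffix=".prom.tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            for name, value in sorted(metrics.items()):
                fh.write(f"{name} {value}\n")
            fh.flush()
            os.fsync(fh.fileno())
        # mkstemp creates the file as 0600; make it readable by other users.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "smart.prom")
        write_textfile_atomically(target, {"custom_smart_health": 1})
        print(open(target).read(), end="")
```

A reader of the target path sees either the previous complete file or the new complete file, never a half-written one.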

PATH in cron: Cron runs with a minimal environment. This surfaced when smartctl couldn't be found during cron execution despite working fine interactively. Fixed by explicitly exporting PATH at the top of all cron-executed scripts.
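The failure mode is an ordinary PATH lookup miss, which can be reproduced without cron. The sketch below assumes a minimal cron PATH of /usr/bin:/bin (check the actual value on your system) and uses a hypothetical stand-in executable, fake_smartctl, placed in a temporary directory that plays the role of /usr/sbin.

```python
import os
import shutil
import stat
import tempfile
from typing import Optional

# Assumption: a minimal PATH like the one cron commonly provides.
CRON_DEFAULT_PATH = "/usr/bin:/bin"


def resolve_tool(name: str, search_path: str) -> Optional[str]:
    """Return the full path of `name` if it is found on `search_path`."""
    return shutil.which(name, path=search_path)


def demo():
    """Look up a fake tool with the minimal PATH, then with its directory added."""
    with tempfile.TemporaryDirectory() as sbin:
        tool = os.path.join(sbin, "fake_smartctl")
        with open(tool, "w") as fh:
            fh.write("#!/bin/sh\nexit 0\n")
        os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)
        missing = resolve_tool("fake_smartctl", CRON_DEFAULT_PATH)
        found = resolve_tool("fake_smartctl", f"{sbin}:{CRON_DEFAULT_PATH}")
        return missing, found == tool


if __name__ == "__main__":
    print(demo())
```

Exporting a fuller PATH at the top of each cron-executed script has the same effect as the second lookup here: the tool's directory becomes part of the search.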


📁 Repository Structure

homelab/
├── Infrastructure/
│   ├── proxmox-setup.md      → LXC config, GPU passthrough, UID mapping
│   └── lvm-storage.md        → DAS setup, LVM provisioning, bind mounts
├── Media/
│   └── media-server.md       → Full ARR stack, VPN kill switch, hardlinks
├── Monitoring/
│   ├── prometheus-stack.md   → Scrape configs, exporters, alert rules
│   ├── smart-monitoring.md   → Custom SMART exporter, timer conflict fix
│   └── grafana.md            → Dashboard queries, R2 monitoring panels
├── Networking/
│   └── tailscale.md          → Exit node setup, subnet routing
├── Dashboard/
│   └── homepage-dashboard.md → Homepage YAML config, widget setup
└── Templates/
    └── service-template.md   → Standard template for new service docs

🗓️ Changelog

| Date | Change |
|---|---|
| 2026-03 | Initial build: Proxmox, Docker stack, ARR media pipeline |
| 2026-03 | DAS storage: Terramaster D4-320, LVM, LXC bind mount |
| 2026-03 | Monitoring stack: Prometheus, Grafana, Alertmanager, PVE Exporter |
| 2026-03 | Custom SMART monitoring: textfile collector, systemd timer conflict resolution |
| 2026-03 | Offsite backup: rclone + Cloudflare R2, R2 size monitoring in Grafana |
| 2026-03 | Documentation workflow: Obsidian vault synced to GitHub |
| 2026-03 | Expanded stack: Navidrome, Audiobookshelf, Stirling PDF, IT Tools |
