fluxnet

Description, topology, and high-level configuration of my local fluxnet environment, to help simplify management and provide examples that others can apply in their own environments.

This page (and this repo) is very much an early Work-in-Progress.

Quick Notes:

My virtualization environment as it is described below is currently contained/constrained to a single Lenovo Tiny M900 (not described below). The primary pool's host OSs were intentionally wiped for the first rebuild in several years to prepare for a series of video tutorials/training on XCP-ng and Xen Orchestra likely to be posted on YouTube (and linked here).
One of my goals beyond laying the virtualization foundation is to build in some level of container orchestration. For a while, I had been waffling between Docker Swarm and some flavor of Kubernetes. There is already a somewhat decent level of Docker/Compose usage, so my first thought went to Docker Swarm. Of course, after spending just a small amount of time searching Google for which is better for my use case, I found far too many links proclaiming that Docker and/or Swarm is dead, and this and that. Too much of it seemed like somewhere between incompetent analysis and knee-jerk conspiracy theory. Fanatics gonna fanatic. It's like Genesis vs. SNES or PS2 vs. Xbox all over again, just in the container space. I don't have the experience to break down the truth here, but this excellent post was revelatory. I think it is safe to say that since Mirantis acquired Docker's enterprise business in 2019 there are many more competitors in the container space, but also that Docker Swarm is not going anywhere anytime soon. tl;dr Docker Swarm will be my choice for container orchestration, at least in the short term.
My time has been partly consumed by studying for technical certifications, so progress on rebuilding everything has been a bit slow lately. I don't know how popular LFCS and DCA actually are, but I figured starting with the familiar is better than forcing myself back into the Cisco world for the first time since maybe 2013. Then maybe LPIC-1 and/or RHCSA. Eh, time will tell.

My Methodology:

Let's start by describing the environment and why it was designed this way. In a production environment, it would be ideal to strive for redundancy and fault tolerance at every layer, but that is not always feasible. Redundancy can include multiple physical hosts, dual storage controllers and RAID arrays, firewall HA with CARP, LACP internal network links and redundant switches, redundant carrier links, and so on. The most obvious thing preventing redundancy at every layer is cost. Extra physical hosts, network switches, storage controllers, and so on can double the cost of an environment, if not more. Another factor is the added complexity, which has to be handled by skilled admins and engineers across design, management, and maintenance, and that again adds to cost. There is a lot of similarity between the production and the lab environments. Both use a single firewall, a single core/access switch, and a single storage switch, among other things. Redundancy at these layers /could/ be built in, but I chose single devices because they reduce complexity, simplify management, and can be replaced easily with minimal downtime. If 99.9999% uptime is not required, there is no point in pulling your hair out over it. Literally all of IT is built up like a house of cards, and the balancing act of knowing where to invest time, effort, and money comes down to experience and a bit of luck.
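
To put the uptime figure above into perspective, here is a small sketch that converts an availability percentage into a yearly downtime budget. The function name and the list of targets are my own illustration, and the year is assumed to be 365 days. Six nines works out to roughly 31.5 seconds of downtime per year, which is why chasing it rarely makes sense for a small environment.

```python
# Yearly downtime budget for a given availability target.
# Assumption: a 365-day year (leap years ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the allowed downtime per year, in seconds."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.9999):
        seconds = downtime_seconds_per_year(target)
        print(f"{target}% uptime -> {seconds:,.1f} s/year ({seconds / 3600:.2f} h)")
```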

Could this be accomplished with two hosts instead of three? Certainly. The reason I like three identical hosts instead of just two is that you have one more level of peace of mind. I might be wrong, but I think the core physical element most likely to experience downtime in environments like these is a physical server, or in this case, a virtualization host. We have all seen servers lock up for one reason or another, whether it's due to failed updates or something else. With just two hosts, literally everything rests on keeping the remaining host up while troubleshooting the failed one. Maybe it's as simple as a reboot, but often it is not. With three hosts, there is a lot more flexibility in the remaining resources and in how troubleshooting is approached, and it is unlikely that a second host will go down and leave everything resting on a single host, as in the two-host scenario.
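
The two-host versus three-host argument can be sketched as simple arithmetic. The function below is a hypothetical helper, not something from XCP-ng or Xen Orchestra, and it only counts RAM; the 32GB figure is the per-host RAM of the lab hosts listed further down.

```python
def surviving_capacity(total_hosts: int, failed_hosts: int, ram_gb_per_host: int):
    """Return (surviving hosts, surviving RAM in GB, fraction of the pool left)."""
    if total_hosts <= 0:
        raise ValueError("the pool needs at least one host")
    if not 0 <= failed_hosts <= total_hosts:
        raise ValueError("failed_hosts must be between 0 and total_hosts")
    surviving = total_hosts - failed_hosts
    return surviving, surviving * ram_gb_per_host, surviving / total_hosts


if __name__ == "__main__":
    # Compare a two-host and a three-host pool after losing one host.
    for pool_size in (2, 3):
        hosts, ram_gb, fraction = surviving_capacity(pool_size, 1, 32)
        print(f"{pool_size} hosts, 1 down: {hosts} host(s) left, "
              f"{ram_gb}GB RAM, {fraction:.0%} of the pool")
```

With one host down, a two-host pool is left with a single host and half of its RAM, while a three-host pool still has two hosts and two thirds of its RAM to work with.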

Also, when virtualization hosts are treated as ephemeral appliances, as I do with XCP-ng, it becomes near trivial to simply reinstall the OS and rejoin the existing pool if troubleshooting seems like it is not worth the time, or if absolutely necessary. Three hosts allow the decision to reinstall to be much less stressful than just two hosts.

Three hosts also means hypervisor-level High Availability is possible, albeit with that aforementioned complexity. HA is not necessary in these environments, but having three hosts is a huge hurdle already crossed in case it ever does become necessary.

Redundant storage controllers in the production environment meant that the virtualization environment stayed operational in the extremely unlikely event that a single controller went down. In nearly five years of running this TrueNAS appliance, there was only one time that a controller was down for longer than a few short moments, due to a failed update, and the remaining controller was just fine the whole time, waiting to apply the update until the first controller was good again. It was one of the few times I needed to call on support. Of course, this is a perfect example of why redundancy is so important, not to mention a solid level of support for anything in production.

Basic hardware specs for both environments, as well as my approach to subnetting, are listed below. As mentioned above, this all is a Work-In-Progress.

The Proof is in the Pudding:

Core Components:

XCP-ng (8.3 LTS)

Xen Orchestra (v5 is preferred for now, but v6 is gorgeous)

Example Production Environment:

Hosts:


Host(s)	3x Dell PowerEdge R450
CPU	Intel Xeon Silver 4316 / 20c40t
RAM	256GB
Storage	Mirrored NVMe boot disk
Network 1GbE	4x 1GbE
Network 10GbE	2x 10GbE

Storage:


Model	TrueNAS X10
Contr.	2x redundant X10 controllers, 32GB RAM each, 12 vCPU each
HDD	9x TrueNAS SAS 7200RPM in RAID1+0 with hot spare (43.2TB usable)
Read Accelerator	400GB SAS SSD
Write Accelerator	16GB SAS SSD
Network 1GbE	2x 1GbE (per controller)
Network 10GbE	2x 10GbE (per controller)

Network:


Access/Core	Unifi USW-Pro-24
Storage	Unifi USW-Pro-Aggregation
Link Media	Copper preferred. CAT6 for 1GbE, DAC for 10GbE, fiber for carrier links

Lab Environment:

Hosts:


Host(s)	3x Lenovo M720q
CPU	Intel Core i7-8700t / 6c12t
RAM	32GB
Storage	120GB+ NVMe
Network 1GbE	1x 1GbE
Network 10GbE	2x 10GbE (10Gtek [Intel 82599ES Controller])

Storage:


Model	QNAP TS-873
CPU	AMD RX-421ND / 4c4t
RAM	16GB
HDD	8x 16TB in RAID6 (87.26TB usable)
SSD	2x 2TB in RAID0 (1.76TB usable)
Network 1GbE	4x 1GbE
Network 10GbE	2x 10GbE

Network:


Access/Core	Unifi USW 24 PoE
Storage	Unifi USW Aggregation
Link Media	Copper preferred. CAT6 for 1GbE, DAC for 10GbE

Topology:

fluxnet-xcp-host-001


vlan100_management	10.1.100.221	1GbE
vlan250_storage	10.1.250.221	10GbE

fluxnet-xcp-host-002


vlan100_management	10.1.100.222	1GbE
vlan250_storage	10.1.250.222	10GbE

fluxnet-xcp-host-003


vlan100_management	10.1.100.223	1GbE
vlan250_storage	10.1.250.223	10GbE

Subnets:

My approach to subnetting here will absolutely not be applicable to many, if not most, environments. This started when I took over production infrastructure that included several locations, each using its own flat /24 subnet. These looked like 192.168.A.B, where A was the site and B was the host/device. Unsurprisingly, the two subnets with the most devices would quite often run out of available IP addresses. What do the hosts at each location look like? Are they user PCs/laptops, cell phones, VoIP phones, servers, network equipment, etc.? With subnet math, there are often many ways to address this. Since I was the only one managing the infrastructure, including just about every aspect of it, I chose to walk what I think is the simplest, most efficient, and most logical path.

The first thing was to plan the VLANs/subnets. Despite 802.1Q allowing VLAN IDs up to 4094, I wanted to stay between 1-254 so that the VLAN ID can double as an octet of the subnet (which should be apparent below), and I chose to use 100-200 initially because they are easier to distinguish visually when displayed in a list. Initially, I had planned ten VLANs (100, 110, 120, etc.) for things like primary data, VoIP, management, servers, and a few others. In practice, I think I only ended up using a handful of them. A key thing here is VLAN consistency at each location. Each VLAN/subnet was also its own /24. Even with just the handful that were used consistently, that segmented the network enough that there may not ever be another shortage of IP addresses.

It really is as simple as this: 10.A.B.C

A = Site number
B = VLAN
C = Host
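
As a rough illustration of the 10.A.B.C scheme, here is a minimal Python sketch using the standard library's ipaddress module. The function names are hypothetical, and limiting the site number to 1-254 is my assumption for the example (only the VLAN range comes from the text above).

```python
import ipaddress


def subnet_for(site: int, vlan: int) -> ipaddress.IPv4Network:
    """Build the /24 for a site and VLAN: 10.<site>.<vlan>.0/24."""
    for label, value in (("site", site), ("vlan", vlan)):
        # Assumption: both values are kept within 1-254 so they fit in one octet.
        if not 1 <= value <= 254:
            raise ValueError(f"{label} must be between 1 and 254, got {value}")
    return ipaddress.ip_network(f"10.{site}.{vlan}.0/24")


def address_for(site: int, vlan: int, host: int) -> ipaddress.IPv4Address:
    """Build a host address: 10.<site>.<vlan>.<host>."""
    if not 1 <= host <= 254:
        raise ValueError(f"host must be between 1 and 254, got {host}")
    return subnet_for(site, vlan).network_address + host


if __name__ == "__main__":
    print(subnet_for(1, 100))        # management subnet for site 1
    print(address_for(1, 250, 221))  # storage address of fluxnet-xcp-host-001
```

Because the site and VLAN are readable straight from the address, an IP such as 10.1.250.221 can be decoded at a glance as site 1, VLAN 250, host 221.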

With my home network, the site number was no longer important. I still stayed with a very similar schema, because.... futureproofing? I did not need the same number of VLANs, but the core ideas are there.

VLAN	Logical Name	Description	CIDR
25	vlan25_fluxnet	Primary Data	10.1.25.0/24
50	vlan50_idiot	IoT Devices	10.1.50.0/24
100	vlan100_management	Management	10.1.100.0/24
120	vlan120_voice	VoIP (Future)	10.1.120.0/24
150	vlan150_secnet	Secure Devices	10.1.150.0/24
200	vlan200_kubernetes	Kubernetes (Future)	10.1.200.0/24
250	vlan250_storage	Storage/SAN - 10GbE links	10.1.250.0/24
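
A table like this is easy to let drift over time, so here is a sketch of a consistency check. The VLAN data is copied from the table above; the check itself (function name, rules, and the site number of 1) is my own assumption about what "following the scheme" means.

```python
import ipaddress

# VLAN ID -> (logical name, CIDR), copied from the table above.
VLANS = {
    25: ("vlan25_fluxnet", "10.1.25.0/24"),
    50: ("vlan50_idiot", "10.1.50.0/24"),
    100: ("vlan100_management", "10.1.100.0/24"),
    120: ("vlan120_voice", "10.1.120.0/24"),
    150: ("vlan150_secnet", "10.1.150.0/24"),
    200: ("vlan200_kubernetes", "10.1.200.0/24"),
    250: ("vlan250_storage", "10.1.250.0/24"),
}


def check_vlan_table(vlans: dict, site: int = 1) -> list:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for vlan_id, (name, cidr) in vlans.items():
        network = ipaddress.ip_network(cidr)
        octets = [int(part) for part in str(network.network_address).split(".")]
        if network.prefixlen != 24:
            problems.append(f"VLAN {vlan_id}: {cidr} is not a /24")
        if octets[:3] != [10, site, vlan_id]:
            problems.append(f"VLAN {vlan_id}: {cidr} does not match 10.{site}.{vlan_id}.0")
        if not name.startswith(f"vlan{vlan_id}_"):
            problems.append(f"VLAN {vlan_id}: name {name!r} lacks the vlan{vlan_id}_ prefix")
    return problems


if __name__ == "__main__":
    print(check_vlan_table(VLANS) or "VLAN table is consistent")
```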

Storage Repositories:

Name	SR Type	Protocol	Medium	IP Address
ISO	ISO	NFS	HDD (RAID 6)	10.1.250.250
vmstore_001	VM	iSCSI	HDD (RAID 6)	10.1.250.250
vmstore_nfs_001	VM	NFS	HDD (RAID 6)	10.1.250.250
vmstore_ssd_nfs_001	VM	NFS	SSD (6Gb/s SATA, RAID 0)	10.1.250.250

Backup Repositories:

Name	Protocol	Medium	IP Address
vm_backups	NFS	HDD (RAID 6)	10.1.250.250

Xen Orchestra:


vlan100_management	10.1.100.220	1GbE
vlan250_storage	10.1.250.220	10GbE

Backups:

Virtual Machines:

Xen Orchestra provides the ability to do full and incremental (delta) backups (to a backup remote), rolling snapshots (to same SR, next to VM disk), replication to another SR (or same), as well as backups of XO config and XCP-ng pool metadata.

My bare minimum recommendation would be a nightly backup of XO config and XCP-ng pool metadata, nightly incremental backups with retention of 7+ and full backup forced every 7 days, and weekly full backups with retention of 4+. Off-site backup replication is also recommended.
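
To make the nightly incremental recommendation more concrete, here is a simplified model of a schedule with a full backup forced every 7 days. This is only a sketch of the idea and not how Xen Orchestra implements it; the function names and the day-index convention are my own.

```python
def backup_kind(night: int, full_every: int = 7) -> str:
    """Night 0 is the first run of the job, which has to be a full backup."""
    return "full" if night % full_every == 0 else "delta"


def restore_chain_length(night: int, full_every: int = 7) -> int:
    """Number of backups needed to restore a given night: one full plus its deltas."""
    return night % full_every + 1


if __name__ == "__main__":
    for night in range(14):
        print(f"night {night:2d}: {backup_kind(night):5s} "
              f"(restore needs {restore_chain_length(night)} backup(s))")
```

In this model, forcing a full every 7 days caps the restore chain at one full backup plus six deltas, which limits how much damage a single corrupted delta can do.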

XO config and XCP-ng pool metadata - This is backed up nightly.
Incremental (Delta) - (I got sidetracked when writing this out... will get back to it ASAP.)
