diff --git a/ic-os/README.adoc b/ic-os/README.adoc index 40336968a3d..b4fa054451e 100644 --- a/ic-os/README.adoc +++ b/ic-os/README.adoc @@ -1,51 +1,100 @@ = IC-OS -IC-OS is an all-encompassing term for all the operating systems within the IC: setupOS, hostOS, guestOS, and boundary-guestOS. +== Introduction -* setupOS: responsible for booting a new replica node and installing hostOS and guestOS. -* hostOS: the operating system that runs on the host machine. The main responsibility of hostOS is to launch and run the guestOS in a virtual machine. In regards to its capabilities, it is dumb by design. -* guestOS: the operating system that runs inside of a virtual machine on the hostOS. Inside guestOS, the core IC protocol is run. -* boundary-guestOS: the operating system that runs on boundary nodes +IC-OS is an umbrella term for all the operating systems within the IC, including SetupOS, HostOS, GuestOS, and Boundary-guestOS. -== Operating System +* SetupOS: Responsible for booting a new replica node and installing HostOS and GuestOS. +* HostOS: The operating system that runs on the host machine. Its main responsibility is to launch and run the GuestOS in a virtual machine. In terms of its capabilities, it is intentionally limited by design. +* GuestOS: The operating system that runs inside a virtual machine on the HostOS. The core IC protocol is executed within the GuestOS. +* Boundary-guestOS: The operating system that runs on boundary nodes. -Each IC-OS operating system is currently based on the Ubuntu 20.04 Server LTS Docker image: +== Building IC-OS images - FROM ubuntu:20.04 +All the IC-OS images can be built through Bazel. -Missing components such as the kernel, bootloader and system service manager are installed during the build process. +=== Environment setup + +Building IC-OS images locally requires environment configuration. The required packages are found in ic/gitlab-ci/container/Dockerfile. 
+ +In addition to these packages, https://bazel.build/install[Bazel] must be installed. + +As an alternative, the following script can be used to build the images in a container with the correct environment already configured: + + ./gitlab-ci/container/container-run.sh + +=== Build targets + +Each image has its own build targets, which are variations of the image: + +* SetupOS: `prod`, `dev` +* HostOS: `prod`, `dev` +* GuestOS: `prod`, `dev`, `dev-malicious` +* BoundaryGuestOS: `prod`, `prod-sev`, `dev`, `dev-sev` +** Note that the `dev` and `dev-sev` images use the local service worker, while the `prod` and `prod-sev` images pull the service worker from `npm`. + +The difference between production and development images is that the console can be accessed on `dev` images, but not on `prod` images. + +Note: The username and password for all IC-OS `dev` images are set to `root` + +=== Building images + +Use the following command to build images: + + $ bazel build //ic-os/{setupos,hostos,guestos,boundary-guestos}/envs//... + +All build outputs are stored under `/ic/bazel-bin/ic-os/{setupos,hostos,guestos,boundary-guestos}/envs/` + +Example: + + $ bazel build //ic-os/guestos/envs/dev/... + # This will output a GuestOS image in /ic/bazel-bin/ic-os/guestos/envs/dev == Under the hood: Building an image -GuestOS, boundary-guestOS, hostOS, and setupOS each have a docker-base file containing all their external dependencies, and once a week, the CI pipeline builds a new base image for each OS. -The docker base image creates a common version of dependencies, which helps provide determinism to our builds. +IC-OS images are first created as docker images and then transformed into "bare-metal" or "virtual-metal" images that can be used outside containerization. -Each OS also has a main dockerfile that builds off the base image, and builds a docker image containing the actual system logic. 
+Rather than installing and relying on a full-blown upstream ISO image, the system is assembled based on a minimal Docker image with the required components added. This approach allows for a minimal, controlled, and well understood system - which is key for a secure platform. -Then, this docker image is transformed into a bootable "bare-metal" image (or "virtual-metal" VM image) that can be used outside of containerization (either in a VM or as a physical host operating system). This results in a very minimal system with basically no services running at all. +The build process is as follows: -Note that all pre-configuration of the system is performed using docker utilities, and the system is actually also operational as a docker container. -This means that some development and testing could be done on the docker image itself, but an actual VM image is required for proper testing. +=== Docker + +The docker build process is split into two dockerfiles. This split is necessary to ensure a reproducible build. + +*Dockerfile.base* -== Developing the Ubuntu system + ic/ic-os/{setupos,hostos,guestos,boundary-guestos}/rootfs/Dockerfile.base -The Ubuntu configuration and system logic is contained in the rootfs/ subdirectory of each OS. -See instructions link:README-rootfs.adoc#[here] on how to make changes to the OS. + ** The Dockerfile.base takes care of installing all upstream Ubuntu packages. + ** Because the versions of these packages can change at any given time (as updates are published regularly), in order to maintain build determinism, once a week, the CI pipeline builds a new base image for each OS. The result is published on the DFINITY public https://hub.docker.com/u/dfinity[Docker Hub]. -== Directory organization +*Dockerfile* + + ic/ic-os/{setupos,hostos,guestos,boundary-guestos}/rootfs/Dockerfile + + ** The +Dockerfile+ builds off the published base image and takes care of configuring and assembling the main disk-image. 
+ ** Any instruction in this file needs to be reproducible in itself. + +=== Image Transformation + +The docker image is then transformed into a bootable "bare-metal" or "virtual-metal" VM image for use outside containerization (either in a VM or as a physical host operating system). The resulting image is minimal, with only a few systemd services running. + +Note that all pre-configuration of the system is performed using docker utilities, and the system is actually also operational as a docker container. +This means that some development and testing could be done on the docker image itself, but an actual VM image is still required for proper testing. -Each rootfs/ subdirectory contains everything related to building a bootable Ubuntu system. -It uses various template directories (e.g. /opt) that are simply copied verbatim to the target system -- you can just drop files there to include them in the image. +== IC-OS Directory Organization -The bootloader/ directory contains everything related to building EFI firmware and the grub bootloader image. It is configured to support the A/B partition split on those OSes that are upgradable (hostOS, guestOS, and potentially boundary-guestOS) +* *bootloader/*: This directory contains everything related to building EFI firmware and the GRUB bootloader image. It is configured to support the A/B partition split on upgradable IC-OS images (HostOS, GuestOS, and potentially Boundary-guestOS). -All build scripts are contained in the scripts/ directory. -Note: guestOS has many scripts in its own scripts/ subdirectory that still need to be unified with the outer scripts/ directory. +* *scripts/*: This directory contains build scripts. +** Note that GuestOS has its own scripts subdirectory that still needs to be unified with the outer scripts directory. -== Environment setup -To build IC-OS images outside of using /gitlab-ci/container/container-run.sh, you will need to configure your environment. 
To see what packages you must install, see ic/gitlab-ci/container/Dockerfile. +* *rootfs/*: Each rootfs subdirectory contains everything required to build a bootable Ubuntu system. Various template directories (e.g., /opt) are used, which are simply copied verbatim to the target system. You can add files to these directories to include them in the image. +** For instructions on how to make changes to the OS, refer to the link:docs/Rootfs.adoc#[rootfs documentation] -== Storing the SEV Certificates on the host (e.g. for test/farm machines) +== SEV testing +=== Storing the SEV Certificates on the host (e.g. for test/farm machines) Note: we are storing the PEM files instead of the DER files. @@ -54,9 +103,9 @@ Note: we are storing the PEM files instead of the DER files. % sev-host-set-cert-chain -r ark.pem -s ask.pem -v vcek.pem ``` -== Running SEV-SNP VM with virsh +=== Running SEV-SNP VM with virsh -=== Preparing dev machine +==== Preparing dev machine Here are the steps to run a boundary-guestOS image as a SEV-SNP image diff --git a/ic-os/boundary-guestos/README.adoc b/ic-os/boundary-guestos/README.adoc index c31efe1a4e4..b3955b7d4b7 100644 --- a/ic-os/boundary-guestos/README.adoc +++ b/ic-os/boundary-guestos/README.adoc @@ -2,21 +2,9 @@ This contains the instructions to build the system images for a Boundary Node. More detailed information can be found link:doc/README.adoc[here]. -== Build the BN image -The entire build process relies on `bazel`. In order to build the Boundary Node image with `bazel`, use the provided build environment: +== Build a Boundary Node image -[source,shell] - gitlab-ci/container/container-run.sh - -Then, you can build any of the boundary-guestos targets (e.g., `prod`, `prod-sev`, `dev`, `dev-sev`): - -[source,shell] - bazel build //ic-os/boundary-guestos/envs/prod - -Bazel locally builds and includes the binaries, such that you can test your changes. 
-The `dev` and `dev-sev` images use the local service worker, while the `prod` and `prod-dev` images pull the service worker from `npm`. - -All the outputs from the build are stored under `bazel-bin/ic-os/boundary-guestos/envs/`. +To build a boundary node image, refer to the link:../README.adoc[IC-OS README] == Run a Boundary Node locally @@ -121,7 +109,6 @@ ic-os/boundary-guestos/scripts/build-bootstrap-config-image.sh \ _Note:_ If you need to make changes, just destroy the VM, rebuild the images you need and create the VM again. The XML configuration file can be reused. - == Developing with the system The entirety of the actual Ubuntu operating system is contained in the @@ -137,9 +124,3 @@ include them into the image. The directory `../bootloader/` contains everything related to building EFI firmware and the grub bootloader image. All build steps are contained in the link:../defs.bzl[../defs.bzl] and the target specific directories (e.g., link:prod/BUILD.bazel[prod/BUILD.bazel]). - -== Under the hood - -The Ubuntu system is built by converting the official Ubuntu docker image -into a bootable "bare-metal" image (or "virtual-metal" VM image). This -results in a very minimal system with basically no services running at all. diff --git a/ic-os/docs/Network-Configuration.adoc b/ic-os/docs/Network-Configuration.adoc new file mode 100644 index 00000000000..aac7afcb573 --- /dev/null +++ b/ic-os/docs/Network-Configuration.adoc @@ -0,0 +1,101 @@ += Network Configuration + +== Basic network information + +Network configuration details for each IC-OS: + +* SetupOS +** Basic network connectivity is checked via pinging nns.ic0.app and the default gateway. Virtually no network traffic goes through SetupOS. +* HostOS +** The br6 bridge network interface is set up and passed to the GuestOS VM through qemu (refer to hostos/rootfs/opt/ic/share/guestos.xml.template). +* GuestOS +** An internet connection is received via the br6 bridge interface from qemu. 
+ +== Deterministic MAC Address + +Each IC-OS node must have a unique but deterministic MAC address. To solve this, a schema has been devised. + +=== Schema + +* *The first 8-bits:* +** IPv4 interfaces: 4a +** IPv6 interfaces: 6a + +* *The second 8-bits:* +** We reserve the following hexadecimal numbers for each IC-OS: +*** SetupOS: 0f +*** HostOS: 00 +*** GuestOS: 01 +*** Boundary-GuestOS: 02 + +** Note: any additional virtual machine on the same physical machine gets the next higher hexadecimal number. + +* *The remaining 32-bits:* +** Deterministically generated + +=== Example MAC addresses + +* SetupOS: `{4a,6a}:0f:` +* HostOS: `{4a,6a}:00:` +* GuestOS: `{4a,6a}:01:` +* Boundary-GuestOS: `{4a,6a}:02:` +* Next Virtual Machine: `{4a,6a}:03:` + +Note that the MAC address is expected to be lower-case and to contain colons between the octets. + +=== Deterministically Generated Part + +The deterministically generated part is derived from the following inputs: + +1. IPMI MAC address (the MAC address of the BMC) +a. Obtained via `$ ipmitool lan print | grep 'MAC Address'` +2. Deployment name +a. Ex: `mainnet` + +The concatenation of the IPMI MAC address and deployment name is hashed: + + $ sha256sum "" + # Example: + $ sha256sum "3c:ec:ef:6b:37:99mainnet" + +The first 32-bits of the sha256 checksum are then used as the deterministically generated part of the MAC address. + + # Checksum + f409d72aa8c98ea40a82ea5a0a437798a67d36e587b2cc49f9dabf2de1cedeeb + + # Deterministically Generated Part + f409d72a + +==== Deployment name + +The deployment name is added to the MAC address generation to further increase its uniqueness. The deployment name *mainnet* is reserved for production. Testnets must use other names to avoid any chance of a MAC address collision in the same data center. 
+ +The deployment name is retrieved from the +deployment.json+ configuration file, generated as part of the SetupOS: + + { + "deployment": { + "name": "mainnet" + } + } + +== IPv6 Address + +The IP address can be derived from the MAC address and vice versa: As every virtual machine ends in the same MAC address, the IPv6 address of each node on the same physical machine can be derived, including the hypervisor itself. +In other words, the prefix of the EUI-64 formatted IPv6 SLAAC address is swapped to get to the IPv6 address of the next node. + +When the corresponding IPv6 address is assigned, the IEEE’s 64-bit Extended Unique Identifier (EUI-64) format is followed. In this convention, the interface’s unique 48-bit MAC address is reformatted to match the EUI-64 specifications. + +The network part (i.e. +ipv6_prefix+) of the IPv6 address is retrieved from the +config.json+ configuration file. The host part is the EUI-64 formatted address. + +== Active backup + +[NOTE] +This feature is currently under development. See ticket https://dfinity.atlassian.net/browse/NODE-869#[NODE-869]. + +In order to simplify the physical cabling of the machine, Linux's active-backup bonding technique is utilized. This operating mode also improves redundancy when more than one 10-gigabit ethernet network interface is connected to the switch. A node operator can decide to either just use one or all of the 10GbE network interfaces in the bond. The handling of the uplink and connectivity is taken care of by the Linux operating system. + +Details can be found in: + + ic/ic-os/setupos/rootfs/opt/ic/bin/generate-network-config.sh + +Note that this mode does not increase the bandwidth/throughput. Only one link will be active at the same time. 
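The MAC and IPv6 derivation described in Network-Configuration.adoc above can be sketched in Python. This is a minimal illustration and not the IC-OS implementation: it assumes the SHA-256 input is the raw concatenation of the IPMI MAC address and the deployment name with no trailing newline, and that the host part follows the standard modified EUI-64 rule (universal/local bit flipped, `ff:fe` inserted in the middle). The function names, the `OS_OCTET` table, and the `2001:db8::/64` documentation prefix are hypothetical.

```python
import hashlib
import ipaddress

# Second octet reserved per IC-OS, taken from the schema in the document.
OS_OCTET = {"SetupOS": 0x0F, "HostOS": 0x00, "GuestOS": 0x01, "Boundary-GuestOS": 0x02}
# First octet per IP version, taken from the schema in the document.
VERSION_OCTET = {4: 0x4A, 6: 0x6A}


def deterministic_mac(ipmi_mac: str, deployment: str, os_name: str, ip_version: int = 6) -> str:
    """Build a lower-case, colon-separated MAC address following the schema."""
    # Assumption: the hash covers the raw concatenation, without a trailing newline.
    digest = hashlib.sha256((ipmi_mac + deployment).encode()).hexdigest()
    generated = digest[:8]  # first 32 bits of the checksum = 8 hex characters
    octets = [VERSION_OCTET[ip_version], OS_OCTET[os_name]]
    octets += [int(generated[i:i + 2], 16) for i in range(0, 8, 2)]
    return ":".join(f"{octet:02x}" for octet in octets)


def slaac_address(ipv6_prefix: str, mac: str) -> str:
    """Combine a /64 prefix with the modified EUI-64 form of a 48-bit MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # assumption: standard modified EUI-64 flips the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    host = int.from_bytes(bytes(eui64), "big")
    network = ipaddress.IPv6Network(ipv6_prefix)
    return str(network.network_address + host)
```

For instance, `slaac_address("2001:db8::/64", "6a:01:f4:09:d7:2a")` yields `2001:db8::6801:f4ff:fe09:d72a` under these assumptions. Because only the second octet differs between the operating systems on one physical machine, swapping that octet (`00` for HostOS, `01` for GuestOS) gives the address of the neighbouring node.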
diff --git a/ic-os/docs/README.adoc b/ic-os/docs/README.adoc new file mode 100644 index 00000000000..2c6fc330787 --- /dev/null +++ b/ic-os/docs/README.adoc @@ -0,0 +1,8 @@ += IC-OS docs + +Refer to detailed documentation on: + +* link:Services{outfilesuffix}[Services] +* link:Network-Configuration{outfilesuffix}[Network-Configuration] +* link:Rootfs{outfilesuffix}[Rootfs] +* link:SELinux{outfilesuffix}[SELinux security policy] diff --git a/ic-os/README-rootfs.adoc b/ic-os/docs/Rootfs.adoc similarity index 79% rename from ic-os/README-rootfs.adoc rename to ic-os/docs/Rootfs.adoc index 2513b590fed..845b981230f 100644 --- a/ic-os/README-rootfs.adoc +++ b/ic-os/docs/Rootfs.adoc @@ -4,7 +4,7 @@ The Ubuntu-based IC OS is built by: * creating a root filesystem image using docker -- this is based on the official Ubuntu docker image and simply adds the OS kernel plus our - required services to it + required services to it. * converting this root filesystem into filesystem images for +/+ and +/boot+ via +mke2fs+ @@ -136,19 +136,6 @@ For all of the above, the system expects a file +ic-bootstrap.tar+ - either already present at +/mnt+ or supplied on a removable storage medium (e.g. a USB stick or an optical medium). -==== Network configuration - -The network configuration is performed using a file +network.conf+ in the -bootstrap tarball. It must contain lines of "key=value" statements, -with the following keys supported: - -* ipv6_address: address used for the IC replica service -* ipv6_gateway: gateway used for the primary interface -* name_servers: space-separated list of DNS servers - -This configuration file is simply copied to the +config+ partition and evaluated -on each boot to set up network. 
- ==== Journalbeat configuration The Journalbeat configuration is performed using a file +journalbeat.conf+ in @@ -157,26 +144,3 @@ with the following keys supported: * journalbeat_hosts: space-separated list of logging hosts * journalbeat_tags: space-separated list of tags - -== SELinux - -The system will (eventually) run SELinux in enforcing mode for security. This -requires that all system objects including all files on filesystems are -labelled appropriately. The "usual" way of setting up such a system is -to run it in "permissive" mode first on top of an (SELinux-less) base -install, however this would not work for our cases as we never want the -system to be in anything else than "enforcing" mode (similarly as for -embedded systems in general). - -Instead, SELinux is installed using docker into the target system, but -without applying any file labels (which would not be possible in docker -anyways). The labelling is then applied when extracting the docker image -into a regular filesystem image, with labels applied as per -+/etc/selinux/default/contexts/files/file_contexts+ in the file system -tree. - -Since the system has never run, some files that would have "usually" been -created do not exist yet and are not labelled -- to account for this, -a small number of additional permissions not foreseen in the reference -policy are required -- this is contained in module +fixes.te+ and set -up as part of the +prep.sh+ script called in docker. diff --git a/ic-os/docs/SELinux.adoc b/ic-os/docs/SELinux.adoc new file mode 100644 index 00000000000..d32fb038b86 --- /dev/null +++ b/ic-os/docs/SELinux.adoc @@ -0,0 +1,33 @@ +== SELinux + +SELinux is currently configured to run in enforcing mode for the sandbox and in permissive mode for the rest of the replica (Note: Technically, SELinux is running in enforcing mode, but only the sandbox has a written-out policy. Most other domains are marked as "permissive"). 
+ +This means that the SELinux policy is enforced only for the sandbox, and just used to monitor and log access requests on the rest of the replica. +This approach allows us to secure the sandbox while observing how SELinux would behave under enforcing mode on the rest of the replica without actually denying access. + +To develop a robust SELinux policy, we need to understand all the actions a service may require and include the necessary permissions in the policy. +Over time, we will continue refining the SELinux policy until no services violate it. +Once achieved, we will run the entire replica in enforcing mode. + +== Technical details + +The system will (eventually) run SELinux in enforcing mode for security. This +requires that all system objects including all files on filesystems are +labelled appropriately. The "usual" way of setting up such a system is +to run it in "permissive" mode first on top of an (SELinux-less) base +install, however this would not work for our cases as we never want the +system to be in anything else than "enforcing" mode (similarly as for +embedded systems in general). + +Instead, SELinux is installed using docker into the target system, but +without applying any file labels (which would not be possible in docker +anyways). The labelling is then applied when extracting the docker image +into a regular filesystem image, with labels applied as per ++/etc/selinux/default/contexts/files/file_contexts+ in the file system +tree. + +Since the system has never run, some files that would have "usually" been +created do not exist yet and are not labelled -- to account for this, +a small number of additional permissions not foreseen in the reference +policy are required -- this is contained in module +fixes.te+ and set +up as part of the +prep.sh+ script called in docker. 
diff --git a/ic-os/docs/Services.adoc b/ic-os/docs/Services.adoc new file mode 100644 index 00000000000..7016be3f52f --- /dev/null +++ b/ic-os/docs/Services.adoc @@ -0,0 +1,74 @@ += Services + +== Packages + +We use Focal (20.04) package repositories for our Ubuntu packages. +To see the full list of packages included in each IC-OS, refer to the rootfs/packages.common file in each respective OS. + +== Services + +In addition to the regular, built-in Ubuntu services, a unique set of systemd services is added or managed for each IC-OS. Some services are enabled in rootfs/Dockerfile, and custom services are defined at rootfs/etc/systemd/services. + +The specific systemd services for each IC-OS are as follows: + +[NOTE] +These lists may be out-of-date. For the source of truth, see each OS's `rootfs/Dockerfile` and `rootfs/etc/systemd`. + +=== SetupOS + +|==== +|Name |Type |State |Upstream|Description +|config |service|Enabled |No |Normalize config.ini configuration file +|generate-network-config |service|Enabled |No |Configure physical network interfaces, bonds and bridges +|setupos |service|Enabled |No |Initiate the SetupOS installation +|systemd-networkd-wait-online |service|Enabled |Yes |Wait for Network to be Configured +|systemd-networkd |service|Enabled |Yes |Network Service +|systemd-resolved |service|Enabled |Yes |Network Name Resolution +|systemd-timesyncd |service|Disabled|Yes |NTP Client +|==== + +=== HostOS + +|==== +|Name |Type |State |Upstream|Description +|chrony |service|Enabled|Yes |chrony, an NTP client/server +|deploy-updated-ssh-account-keys|service|Enabled|No |Manage SSH public keys +|generate-guestos-config |service|Enabled|No |Configure virtual machine XML configuration from template +|generate-network-config |service|Enabled|No |Configure physical network interfaces, bonds and bridges +|guestos |service|Enabled|No |Start and stop virtual machine +|journalbeat |service|Enabled|No |Logging daemon +|libvirtd |service|Enabled|Yes |Virtualization 
daemon +|monitor-guestos |service|Enabled|No |Monitor virtual machine service +|monitor-guestos |timer |Enabled|No |Monitor virtual machine interval +|nftables |service|Enabled|Yes |nftables firewall +|node_exporter |service|Enabled|No |Prometheus node_exporter daemon +|relabel-machine-id |service|Enabled|No |Relabel unique machine ID +|save-machine-id |service|Enabled|No |Save unique machine ID +|setup-hostname |service|Enabled|No |Configure hostname +|setup-libvirt |service|Enabled|No |Configure Libvirt +|setup-node_exporter-keys |service|Enabled|No |Configure node_exporter daemon +|setup-ssh-account-keys |service|Enabled|No |Configure SSH public keys +|setup-ssh-keys |service|Enabled|No |Generate SSH host keys +|systemd-journal-gatewayd |service|Enabled|No |Journal Gateway Service +|systemd-networkd-wait-online |service|Enabled|Yes |Wait for Network to be Configured +|systemd-networkd |service|Enabled|Yes |Network Service +|systemd-resolved |service|Enabled|Yes |Network Name Resolution +|vsock-agent |service|Enabled|No |VSOCK agent daemon +|==== + +== 3rd Party Software + +Below is a list of 3rd party software installed from their official sources. We strictly install software using vendor-packaged archives, preferably tarballs, to maintain the highest level of control over the installation process. + +|==== +|Name |Description | IC-OS |URL + +|Journalbeat |Lightweight shipper for forwarding and centralizing log data from systemd journals. | HostOS, GuestOS |https://artifacts.elastic.co/downloads/beats/journalbeat/ + +|node_exporter |Service that collects and publishes system metrics. 
| HostOS, GuestOS |https://github.com/prometheus/node_exporter/releases + +|QEMU |Quick Emulator, a hypervisor.| HostOS |https://download.qemu.org/ + +|SEV |Hardware-based memory encryption.| SetupOS |https://github.com/dfinity/AMDSEV/releases + +|==== diff --git a/ic-os/guestos/README.adoc b/ic-os/guestos/README.adoc index 6ab098f58d6..532a5364a45 100644 --- a/ic-os/guestos/README.adoc +++ b/ic-os/guestos/README.adoc @@ -1,63 +1,62 @@ = Guest OS -A GuestOS image is comprised of the base Ubuntu plus the replica and orchestrator binaries. -GuestOS runs inside a QEMU virtual machine. The IC protocol runs inside of the guestOS virtual machine. +GuestOS refers to the operating system running inside a QEMU virtual machine on the HostOS. A GuestOS image consists of the base Ubuntu system, along with the replica and orchestrator binaries. The IC protocol runs inside the GuestOS virtual machine. For more details on the goals, structure, and disk layout of GuestOS, https://docs.google.com/presentation/d/1xECozJhVCqzFC3mMMvROD7rlB-xWDHHLKvZuVnuLgJc/edit?usp=sharing[see here] -== How to build and run guestOS -=== Building guestOS +== How to build and run GuestOS +=== Building GuestOS -To build with Bazel, you need https://bazel.build/install[Bazel] installed. -Alternatively, the following script allows building IC-OS in a container with the correct environment already configured: +To build a GuestOS image, refer to the link:../README.adoc[IC-OS README] - ./gitlab-ci/container/container-run.sh +=== Running GuestOS -Build a guestOS image: +The GuestOS image (`disk.img`) can be booted directly in qemu using the following command: -* You can build any of the guestos targets (`prod`, `dev`, `dev-malicious`): + qemu-system-x86_64 \ + -nographic -m 4G \ + -bios /usr/share/OVMF/OVMF_CODE.fd \ + -drive file=disk.img,format=raw,if=virtio - bazel build //ic-os/guestos/{prod,dev,dev-malicious}/... +* Note: Press `Ctrl-A` followed by `x` to exit the QEMU console. 
-This will output disk-img.tar{.gz,.zst} in /ic/bazel-bin/ic-os/guestos/{prod,dev,dev-malicious}, which is your tarred guestOS image. +Alternatively, Bazel can be used to perform a testnet deployment. For documentation on this process, see ic/testnet/tools/README.md. -=== Running guestOS +==== Launch a GuestOS VM on farm -The guestOS image (disk.img) can booted directly in qemu: +Instead of running GuestOS locally in qemu, you can launch a GuestOS virtual machine on Farm: - qemu-system-x86_64 \ - -nographic -m 4G \ - -bios /usr/share/OVMF/OVMF_CODE.fd \ - -drive file=disk.img,format=raw,if=virtio + bazel run --config=systest //ic-os/guestos:launch-single-vm -* Note: `ctrl-a x` to quit the qemu console. +The program will spin up a new GuestOS VM on Farm, and the machine can then be accessed via SSH. -You can also use bazel to do a testnet deployment. For documentation on this process, see ic/testnet/tools/README.md +For more details about the program, refer to the `rs/ic_os/launch-single-vm` directory. == Upgrade GuestOS -The GuestOS disk layout contains two sets of system partitions, called partition sets "A" and "B". + +The GuestOS disk layout contains two sets of system partitions, called partition sets "A" and "B". The A/B partitions enable a dual-boot system that can be updated and maintained without any downtime. image:doc/media/guestOS_disk-layout.png[] -Above is the guestOS disk layout—partition set "A" in green, partition set "B" in blue. +The image above shows the GuestOS disk layout with partition set "A" in green and partition set "B" in blue. -At any point in time, one partition set is "active" and the other is "passive". -We can write to the passive partition set, and when we are ready, we can swap the active and passive partition sets, thereby upgrading the GuestOS. +At any given time, one partition set is "active" while the other is "passive". +To upgrade the GuestOS, first, the new GuestOS is written to the passive partition set. 
Then, the active and passive partition sets are "swapped," so that when GuestOS reboots, it will use the new GuestOS on the new partition, thereby upgrading the GuestOS. -=== Building upgrade image +=== Building GuestOS upgrade image -The same bazel command to build a guestOS image will also produce a guestOS upgrade image: +The same Bazel command used to build a GuestOS image will also produce a GuestOS upgrade image: bazel build //ic-os/guestos/{prod,dev,dev-malicious}/... -This will output update-img.tar{.gz,.zst} in /ic/bazel-bin/ic-os/guestos/{prod,dev,dev-malicious}, which is your tarred guestOS-update image. +This command will output update-img.tar{.gz,.zst} in /ic/bazel-bin/ic-os/guestos/{prod,dev,dev-malicious}, which is the tarred GuestOS update image. -=== Installing upgrade image +=== Installing GuestOS upgrade image rootfs/opt/ic/bin/manageboot.sh upgrade-install update-img.tar rootfs/opt/ic/bin/manageboot.sh upgrade-commit -After that, the newly installed system will be booted. Note that on the next boot, it will revert to the original system unless you then confirm that the new system is actually fully operational: +After these commands have been run, the newly installed system will be booted. Note that on the next boot, the system will revert back to the original GuestOS unless confirmation is given that the new system is fully operational: rootfs/opt/ic/bin/manageboot.sh confirm @@ -70,18 +69,3 @@ See instructions link:rootfs/README.adoc#[here] on how to make changes to the OS For further reading, see the docs in the link:doc/README.adoc#[doc/ subdirectory] - -== Alternate build paths - -GuestOS images can also be built with Docker or by using the build scripts directly. - -=== Building with Podman - -The container-based build process is described https://github.com/dfinity/ic#building-the-code[here]. -Be sure *not* to use Nix to build the IC binary artifacts. 
- - ./gitlab-ci/container/build-ic.sh - -This command will output a tarred guestOS image in the artifacts/icos folder: "disk-img.tar.gz" - -Note that building with Podman creates a production image as opposed to a development image. When you run production images inside of QEMU, you will not be able to access the console and other debugging tools. diff --git a/ic-os/guestos/rootfs/README.adoc b/ic-os/guestos/rootfs/README.adoc index 3beff388025..a13f6b8dcff 100644 --- a/ic-os/guestos/rootfs/README.adoc +++ b/ic-os/guestos/rootfs/README.adoc @@ -1 +1 @@ -For information on Ubuntu base OS development, see link:../../README-rootfs.adoc#[here] \ No newline at end of file +For information on Ubuntu base OS development, see link:../../docs/Rootfs.adoc#[here] diff --git a/ic-os/hostos/README.adoc b/ic-os/hostos/README.adoc index a1a80c43695..ebb4696bc16 100644 --- a/ic-os/hostos/README.adoc +++ b/ic-os/hostos/README.adoc @@ -4,142 +4,63 @@ The term HostOS is used for the operating system running on the physical machine. The purpose of this system is to enable virtualization for any node running on top. -Instead of installing and relying on a full blown upstream ISO image, we assemble the system based on a minimal Docker image and add the required components ourselves. This approach allows for a minimal, controlled and well understood system - which is key for a secure platform. +== Building HostOS -== Support +To build a HostOS image, refer to the link:../README.adoc[IC-OS README] -The following vendors and models are currently supported: +== Partitioning -|==== -|Manufacturer|Model |Mainboard|Processor |Memory |Storage -|Dell |PowerEdge R6525 |0DMD2T |2x AMD EPYC 7302|16x 32 GB (512 GB total) DDR4 ECC|10x 3.5 TB NVMe -|Supermicro |AS-1023US-TR4-0-BC27G|H11DSU-iN|2x AMD EPYC 7302|16x 32 GB (512 GB total) DDR4 ECC|1x 3.5 TB NVMe, 4x 8 TB SCSI -|==== - -=== Build Process - -To build with Bazel, you need https://bazel.build/install[Bazel] installed. 
-Alternatively, the following script allows building in a container with the correct environment already configured: - - ./gitlab-ci/container/container-run.sh - -Build a hostOS image: - -* You can build any of the hostos targets (`prod`, `dev`): - - bazel build //ic-os/hostos/envs/{prod,dev}/... - -This will output disk-img.tar{.gz,.zst} in /ic/bazel-bin/ic-os/hostos/envs/{prod,dev}, which is your tarred hostOS image. - -=== Docker - -We currently split the Docker build process into two Dockerfiles. This split is necessary to ensure a reproducible build. - - ic/ic-os/hostos/rootfs/Dockerfile.base - ic/ic-os/hostos/rootfs/Dockerfile - -The +Dockerfile.base+ takes care of installing all upstream Ubuntu packages. The version of these packages can change at any given time, as updates are published regularly. We publish the result on our public https://hub.docker.com/u/dfinity[Docker Hub]. +The partitioning layout consists of multiple logical volumes and two primary partitions. +Both HostOS and GuestOS have separate config and A/B partitions. The A/B partitions enable a dual-boot system that can be updated and maintained without any downtime. -The +Dockerfile+ takes care of configuring and assembling the main disk-image. Any instruction in this file needs to be reproducible in itself. - -=== Partitioning - -The partitioning layout consists of multiple logical volumes and three primary partitions. Both the Host- and GuestOS (hereinafter referred to as ReplicaOS) have separate config and A/B partitions. Please find a rough schema below. +Please find a rough schema below. 
|==== 2+^|Primary Partitions 17+^|LVM -9+^|HostOS 10+^| ReplicaOS +9+^|HostOS 10+^| GuestOS |EFI|Grub|Config|Boot A|Root A|Var A|Boot B|Root B|Var B|EFI|Grub|Config|Boot A|Root A|Var A|Boot B|Root B|Var B|Empty |==== +* *EFI*: EFI System Partition (ESP) for storing the bootloader and other UEFI-related data +* *Grub*: Partition for storing the GRUB bootloader configuration +* *Config*: Partition containing configuration files for each OS +* *Boot A/B*: Boot partitions for A and B configurations +* *Root A/B*: Root partitions for A and B configurations +* *Var A/B*: Partitions for storing variable data (e.g., logs) for A and B configurations +* *Empty*: Unallocated space for the GuestOS + The exact partitioning layout can be found in: - ic/ic-os/hostos/build/partitions.csv +`ic/ic-os/hostos/partitions.csv` The LVM configuration is defined in: - ic/ic-os/hostos/build/volumes.csv +`ic/ic-os/hostos/volumes.csv` -==== Sizing +=== Sizing -Most of the disk space is allocated to the logical volume of the ReplicaOS. Only about 65 GB are reserved for the HostOS. Please find the individual partition sizes below: +The majority of the disk space is allocated to the logical volume of the GuestOS, with only about 65 GB reserved for the HostOS. The table below displays individual partition sizes: |==== -10+^|HostOS 10+^| ReplicaOS +10+^|HostOS 10+^| GuestOS |EFI|Grub|Config|Boot A|Root A|Var A|Boot B|Root B|Var B|Unallocated Reserve|EFI|Grub|Config|Boot A|Root A|Var A|Boot B|Root B|Var B|Empty |100 MB|100 MB|100 MB|1 GB|10 GB|10 GB|1 GB|10 GB|10 GB|20 GB|100 MB|100 MB|100 MB|1 GB|10 GB|10 GB|1 GB|10 GB|10 GB|100%FREE |==== -==== Root Partition +=== Root Partition -The root partition is formatted as an +ext4+ file system and is _read-only_ mounted. +The root partition is formatted as an ext4 file system and is mounted as read-only. 
The corresponding fstab entry for the root partition is: # /dev/rootfs / ext4 ro,errors=remount-ro 0 1 For details, please refer to the +fstab+ file in: - ic/ic-os/hostos/rootfs/etc/fstab - -==== Config Partition - -The config partition holds the following configuration files: - - config.ini # data center specific network settings - deployment.json # deployment specific configurations - nns_public_key.pem # NNS public key - -===== config.ini - -The +config.ini+ configuration file contains all network related settings. These have to be supplied by the node provider/operator prior running the deployment. - -The configuration file expects the following, lower-case key=value pairs: - - ipv6_prefix=2a00:fb01:400:100 - ipv6_subnet=/64 - ipv6_gateway=2a00:fb01:400:100::1 - -[NOTE] -Please note that the values above are only an example. - -===== deployment.json - -The +deployment.json+ configuration file holds all deployment related settings, such as deployment name, log destination, dns servers, etc. - - { - "deployment": { - "name": "mainnet" - }, - "logging": { - "hosts": "host01.example.com:443 host02.example.com:443" - }, - "nns": { - "url": "http://host01.example.com:8080,http://host02.example.com:8080" - }, - "dns": { - "name_servers": "2606:4700:4700::1111 2606:4700:4700::1001 2001:4860:4860::8888 2001:4860:4860::8844" - }, - "resources": { - "memory": "490" - } - } - -[NOTE] -Please note that the values above are only an example. - -===== nns_public_key.pem - -The +nns_public_key.pem+ file holds the public key of the NNS. 
For mainnet it is: - - -----BEGIN PUBLIC KEY----- - MIGCMB0GDSsGAQQBgtx8BQMBAgEGDCsGAQQBgtx8BQMCAQNhAIFMDm7HH6tYOwi9 - gTc8JVw8NxsuhIY8mKTx4It0I10U+12cDNVG2WhfkToMCyzFNBWDv0tDkuRn25bW - W5u0y3FxEvhHLg1aTRRQX/10hLASkQkcX4e5iINGP5gJGguqrg== - -----END PUBLIC KEY----- +`ic/ic-os/hostos/rootfs/etc/fstab` -=== System Users +== System Users -In addition to the regular, built-in Ubuntu user accounts, we add the following users: +In addition to the regular, built-in Ubuntu user accounts, the following users are added: |==== |Username |Home Directory |Default Shell |Description @@ -150,267 +71,52 @@ In addition to the regular, built-in Ubuntu user accounts, we add the following |node_exporter|/home/node_exporter|/usr/sbin/nologin|node_exporter service account |==== -=== System Configuration - -Besides the build instructions in the Docker files (+Dockerfile.base+ and +Dockerfile+), all hard-coded system configurations can be found in the +rootfs/etc+ directory. The full path is: - - ic/ic-os/hostos/rootfs/etc/ - -=== Network Configuration - -In order to simplify the physical cabling of the machine, we utilize Linux's active-backup bonding technique. This operating mode also improves redundancy if more than one 10 gigabit ethernet network interface is hooked up to the switch. A node operator can decide to either just use one or all of the 10GbE network interfaces in the bond. The Linux operating system will take care of handling the uplink and connectivity. - -Details can be found in: - - ic/ic-os/hostos/rootfs/opt/ic/bin/generate-network-config.sh - -[NOTE] -Please note that this mode does not increase the bandwidth/throughput. Only one link will be active at the same time. - -==== Deterministic MAC Address - -To have unique but deterministic MAC addresses for our nodes, we came up with the following schema: - -- The first 8-bits of the MAC address start with 4a for the IPv4 interface and with 6a for the IPv6 interface. 
-- The second 8-bits are a consecutive hexadecimal number, starting at 00 and ending at ff. For the HostOS we reserved 00, for the first virtual machine (the ReplicaOS) 01. Any additional virtual machine on the same physical machine gets the next higher hexadecimal number: - - # HostOS - 6a:00: - - # ReplicaOS - 6a:01: - - # BoundaryOS - 6a:02: - - # Next Virtual Machine - 6a:03: - - # SetupOS - 6a:0f: - -[NOTE] -Please note that the MAC address is expected to be lower-case and contains colons between the octets. - -- The remaining 32-bits are deterministically generated based on the management MAC address (BMC, IPMI, iDRAC…) of the physical machine: - - ipmitool lan print | grep 'MAC Address' - -===== Deterministically Generated Part - -Additionally, an arbitrary deployment name is added to the MAC address generation to further increase its uniqueness. The deployment name _mainnet_ is reserved for production. Testnets must use other names to avoid any chance of a MAC address collisions in the same data center. +== Hostname -The deployment name is retrieved from the +deployment.json+ configuration file, generated as part of the SetupOS: +Since every HostOS and GuestOS is created equal, assigning a human-centric hostname isn't feasible (think pets vs. cattle). Instead, the management MAC address is used as part of the hostname. - { - "deployment": { - "name": "mainnet" - } - } +There are two different hostname schemas used, depending on the stage of the setup process: *Transient Setup Hostname* and *Persistent Setup Hostname*. -Based on these two inputs we calculate the sha256 checksum. Please note that there isn’t any white space in-between the two values: +*1. Transient Setup Hostname* - # Example - sha256sum 3c:ec:ef:6b:37:99mainnet +This schema is used during the initial setup, before a replica has joined the IC. 
The format is: - # Checksum - f409d72aa8c98ea40a82ea5a0a437798a67d36e587b2cc49f9dabf2de1cedeeb +`-` -The first 32-bit of the sha256 checksum are used as the deterministically generated part of the MAC address. - - # Deterministically Generated Part - f409d72a - - # HostOS - 6a:00:f4:09:d7:2a - - # ReplicaOS - 6a:01:f4:09:d7:2a - - # BoundaryOS - 6a:02:f4:09:d7:2a - - # Next Virtual Machine - 6a:03:f4:09:d7:2a - - # SetupOS - 6a:0f:f4:09:d7:2a - -As every virtual machine ends in the same MAC address, we can derive the IPv6 address of each node on the same physical machine, including the hypervisor itself. -In other words, swapping the prefix of the EUI-64 formatted IPv6 SLAAC address gets you to the IPv6 address of the next node. - -==== IPv6 Address - -When assigning the corresponding IPv6 address, we follow the IEEE’s 64-bit Extended Unique Identifier (EUI-64) format. In this convention, the interface’s unique 48-bit MAC address is reformatted to match the EUI-64 specifications. - -The network part (i.e. +ipv6_prefix+) of the IPv6 address is retrieved from the +config.json+ configuration file. The host part is the EUI-64 formatted address. - -=== Hostname - -Since every Host- and ReplicaOS is created equal, assigning a human-centric hostname isn’t feasible (pets vs. cattle). Instead, we use the management MAC address as part of the hostname. - -==== Transient Setup Hostname - -In the initial setup, before replica was able to join the IC, we use the following hostname schema: - - system type - management mac address - -For example: +Examples: host-3cecef6b3799 replica-3cecef6b3799 boundary-3cecef6b3799 -==== Persistent Setup Hostname +*2. Persistent Setup Hostname* -Once a node has successfully joined the IC, we add the first 5 characters of the node-id to the end of the hostname. The +orchestrator+ is used to fetch the node’s node-id. The schema is: +[NOTE] +Currently, the Persistent Setup Hostname feature is not used, but it has been developed. 
- system type - management mac address - node id[1] +After a node has successfully joined the IC, the first 5 characters of the node-id are added to the end of the hostname. The orchestrator is used to fetch the node's node-id. The format is: -For Example: +`--` + +Examples: host-3cecef6b3799-4wd4u replica--3cecef6b3799-4wd4u boundary-3cecef6b3799-4wd4u -[1] only the first 5 characters - -=== Applications - -==== Ubuntu Repositories - -The following default Ubuntu repositories are active during the Docker image build process: - -|==== -|Distribution|Component |URL -|Focal |focal main restricted |http://archive.ubuntu.com/ubuntu/ -|Focal |focal-updates main restricted |http://archive.ubuntu.com/ubuntu/ -|Focal |focal universe |http://archive.ubuntu.com/ubuntu/ -|Focal |focal-updates universe |http://archive.ubuntu.com/ubuntu/ -|Focal |focal multiverse |http://archive.ubuntu.com/ubuntu/ -|Focal |focal-updates multiverse |http://archive.ubuntu.com/ubuntu/ -|Focal |focal-backports main restricted universe multiverse|http://archive.ubuntu.com/ubuntu/ -|Focal |focal-security main restricted |http://security.ubuntu.com/ubuntu/ -|Focal |focal-security universe |http://security.ubuntu.com/ubuntu/ -|Focal |focal-security multiverse |http://security.ubuntu.com/ubuntu/ -|==== - -==== Upstream Ubuntu Packages - -|==== -|Name |Description -|attr |utilities for manipulating filesystem extended attributes -|ca-certificates |Common CA certificates -|checkpolicy |SELinux policy compiler -|chrony |Versatile implementation of the Network Time Protocol -|curl |command line tool for transferring data with URL syntax -|dosfstools |utilities for making and checking MS-DOS FAT filesystems -|ethtool |display or change Ethernet device settings -|faketime |Report faked system time to programs (command-line tool) -|fdisk |collection of partitioning utilities -|initramfs-tools |generic modular initramfs generator (automation) -|ipmitool |utility for IPMI control with kernel driver or LAN 
interface (daemon) -|iproute2 |networking and traffic control tools -|isc-dhcp-client |DHCP client for automatically obtaining an IP address -|jq |lightweight and flexible command-line JSON processor -|less |pager program similar to more -|libarchive-zip-perl |Perl module for manipulation of ZIP archives -|libvirt-daemon-system |Libvirt daemon configuration files -|libvirt-dev |development files for the libvirt library -|linux-image-generic-hwe-20.04|Generic Linux kernel image -|locales |GNU C Library: National Language (locale) data [support] -|lvm2 |Linux Logical Volume Manager -|mtools |Tools for manipulating MSDOS files -|net-tools |NET-3 networking toolkit -|nftables |Program to control packet filtering rules by Netfilter project -|opensc |Smart card utilities with support for PKCS#15 compatible cards -|openssh-server |secure shell (SSH) server, for secure access from remote machines -|ovmf |UEFI firmware for 64-bit x86 virtual machines -|parted |disk partition manipulator -|pcsc-tools |Some tools to use with smart cards and PC/SC -|pcscd |Middleware to access a smart card using PC/SC (daemon side) -|policycoreutils |SELinux core policy utilities -|python-is-python3 |symlinks /usr/bin/python to python3 -|python3-libvirt |libvirt Python 3 bindings -|python3-requests |elegant and simple HTTP library for Python3, built for human beings -|rsync |fast, versatile, remote (and local) file-copying tool -|selinux-policy-default |Strict and Targeted variants of the SELinux policy -|selinux-policy-dev |Headers from the SELinux reference policy for building modules -|selinux-utils |SELinux utility programs -|semodule-utils |SELinux core policy utilities (modules utilities) -|sudo |Provide limited super user privileges to specific users -|systemd |system and service manager -|systemd-journal-remote |tools for sending and receiving remote journal logs -|systemd-sysv |system and service manager - SysV links -|udev |/dev/ and hotplug management daemon -|usbutils |Linux USB 
utilities -|xxd |tool to make (or reverse) a hex dump -|zstd |fast lossless compression algorithm -- CLI tool -|==== - -==== 3rd Party Software - -List of 3rd party software installed from the official source. We strictly install vendor packaged archives, preferably tarballs to have the highest control over the installation. - -|==== -|Name |Description |URL -|Journalbeat |A lightweight shipper for forwarding and centralizing log data from systemd journals.|https://artifacts.elastic.co/downloads/beats/journalbeat/ -|node_exporter|Service to collect and publish system metrics |https://github.com/prometheus/node_exporter/releases -|QEMU |Quick Emulator is a hypervisor. |https://download.qemu.org/ -|==== - -=== Services - -In addition to the regular, built-in Ubuntu services, we add or manage the following systemd unit files: - -|==== -|Name |Type |State |Upstream|Description -|chrony |service|Enabled|Yes |chrony, an NTP client/server -|deploy-updated-ssh-account-keys|service|Enabled|No |Manage SSH public keys -|generate-guestos-config |service|Enabled|No |Configure virtual machine XML configuration from template -|generate-network-config |service|Enabled|No |Configure physical network interfaces, bonds and bridges -|guestos |service|Enabled|No |Start and stop virtual machine -|journalbeat |service|Enabled|No |Logging daemon -|libvirtd |service|Enabled|Yes |Virtualization daemon -|monitor-guestos |service|Enabled|No |Monitor virtual machine service -|monitor-guestos |timer |Enabled|No |Monitor virtual machine interval -|nftables |service|Enabled|Yes |nftables firewall -|node_exporter |service|Enabled|No |Prometheus node_exporter daemon -|relabel-machine-id |service|Enabled|No |Relabel unique machine ID -|save-machine-id |service|Enabled|No |Save unique machine ID -|setup-hostname |service|Enabled|No |Configure hostname -|setup-libvirt |service|Enabled|No |Configure Libvirt -|setup-node_exporter-keys |service|Enabled|No |Configure node_exporter daemon 
-|setup-ssh-account-keys |service|Enabled|No |Configure SSH public keys -|setup-ssh-keys |service|Enabled|No |Generate SSH host keys -|systemd-journal-gatewayd |service|Enabled|No |Journal Gateway Service -|systemd-networkd-wait-online |service|Enabled|Yes |Wait for Network to be Configured -|systemd-networkd |service|Enabled|Yes |Network Service -|systemd-resolved |service|Enabled|Yes |Network Name Resolution -|vsock-agent |service|Enabled|No |VSOCK agent daemon -|==== - -=== QEMU / Libvirt +== QEMU / Libvirt -For libvirt, we use the official upstream Ubuntu package +libvirt-daemon-system+. QEMU is being installed and compiled from source. +=== Virtual Machines -|==== -|Name |Source |URL -libvirt-daemon-system|DEB package; APT repository|http://archive.ubuntu.com/ubuntu/ -Focal |Tarball; Source |https://www.qemu.org/download/ -|==== - -==== Virtual Machines - -All Virtual machines are configured using the libvirt XML format. The configuration template is located in: +All virtual machines are configured using the libvirt XML format. The configuration template is located at: - /opt/ic/share/.xml.template +`/opt/ic/share/.xml.template` -This template is being used to generate the actual XML configuration. The systemd service +generate-guestos-config.service+ executes this step. It is necessary in order to inject the deterministically generated MAC address. +This template is used to generate the actual XML configuration. The systemd service `generate-guestos-config.service` executes this step, which is necessary to inject the deterministically generated MAC address. -===== CPU Topology +==== CPU Topology -The following CPU topology is defined in the libvirt XML template. +The following CPU topology is defined in the libvirt XML template: 64 @@ -419,9 +125,9 @@ The following CPU topology is defined in the libvirt XML template. -It makes sure the physical CPU topology is reflected in the virtual machine and the mapping is done accordingly. 
+This configuration ensures that the physical CPU topology is reflected in the virtual machine, and the mapping is done accordingly. -=== Firewall +== Firewall The hard-coded firewall ruleset is rather restrictive. A new disk-image has to be proposed and blessed in order to update the rules. @@ -429,82 +135,38 @@ Please find the raw NFTables ruleset in: ic/ic-os/hostos/rootfs/etc/nftables.conf -==== Filter +=== Filter -===== Input +==== Input -Default INPUT policy is +drop+. +The following TCP/UDP input ports are open: |==== -|Version|Protocol|Port / Type |Source |Description -|IPv4 |ICMP |destination-unreachable|any | -|IPv4 |ICMP |source-quench |any | -|IPv4 |ICMP |time-exceeded |any | -|IPv4 |ICMP |parameter-problem |any | -|IPv4 |ICMP |echo-request |any | -|IPv4 |ICMP |echo-reply |any | -|IPv4 |TCP |22 |RFC 1918 |openssh -|IPv4 |UDP |67 |RFC 1918 |DHCP -|IPv6 |ICMP |destination-unreachable|any | -|IPv6 |ICMP |packet-too-big |any | -|IPv6 |ICMP |time-exceeded |any | -|IPv6 |ICMP |parameter-problem |any | -|IPv6 |ICMP |echo-request |any | -|IPv6 |ICMP |echo-reply |any | -|IPv6 |ICMP |nd-router-advert |any | -|IPv6 |ICMP |nd-neighbor-solicit |any | -|IPv6 |ICMP |nd-neighbor-advert |any | -|IPv6 |TCP |22 |delegated IPv6 subnets from IC registry|openssh -|IPv6 |TCP |9100 |delegated IPv6 subnets from IC registry|node_exporter -|IPv6 |TCP |19531 |delegated IPv6 subnets from IC registry|systemd-journal-gatewayd +|Version|Protocol|Port |Source |Description +|IPv4 |TCP |22 |RFC 1918 |openssh +|IPv4 |UDP |67 |RFC 1918 |DHCP +|IPv6 |TCP |22 |delegated IPv6 subnets from IC registry|openssh +|IPv6 |TCP |9100 |delegated IPv6 subnets from IC registry|node_exporter +|IPv6 |TCP |19531 |delegated IPv6 subnets from IC registry|systemd-journal-gatewayd |==== -===== Forward - -Default FORWARD policy is +drop+. 
+==== Output +The following TCP/UDP output ports are open: |==== -|Version|Protocol|Port / Type |Source |Description +|Version|Protocol|Port |Destination|Description +|IPv6 |TCP |53 |any |DNS +|IPv6 |UDP |53 |any |DNS +|IPv6 |UDP |123 |any |NTP +|IPv6 |TCP |80 |any |HTTP to download update disk images +|IPv6 |TCP |443 |any |HTTPS to download update disk images |==== -===== Output - -Default OUTPUT policy is +drop+. - -|==== -|Version|Protocol|Port / Type |Destination|Description -|IPv4 |ICMP |destination-unreachable|any | -|IPv4 |ICMP |source-quench |any | -|IPv4 |ICMP |time-exceeded |any | -|IPv4 |ICMP |parameter-problem |any | -|IPv4 |ICMP |echo-request |any | -|IPv4 |ICMP |echo-reply |any | -|IPv6 |ICMP |destination-unreachable|any | -|IPv6 |ICMP |packet-too-big |any | -|IPv6 |ICMP |time-exceeded |any | -|IPv6 |ICMP |parameter-problem |any | -|IPv6 |ICMP |echo-request |any | -|IPv6 |ICMP |echo-reply |any | -|IPv6 |ICMP |nd-router-solicit |any | -|IPv6 |ICMP |nd-neighbor-solicit |any | -|IPv6 |ICMP |nd-neighbor-advert |any | -|IPv6 |TCP |53 |any |DNS -|IPv6 |UDP |53 |any |DNS -|IPv6 |UDP |123 |any |NTP -|IPv6 |TCP |80 |any |HTTP to download update disk images -|IPv6 |TCP |443 |any |HTTPS to download update disk images -|==== - -=== SELinux - -SELinux is currently in permissive mode. Eventually, every service is confined into its own policy and SELinux running in enforcing mode. - -=== VMSockets Interface - -Whilst the whole point of virtualization is to securely isolate operating systems and system resources, we need a way to interact with the underlying hypervisor (HostOS) from the virtual machine (ReplicaOS). This is necessary as the HostOS won’t be running replica and therefore isn’t its own node in the NNS or any APP subnet. +== VMSockets Interface -To retain the highest isolation between the two operating systems, we limit ourselves to strictly defined function calls. All VSOCK commands are triggered from the GuestOS. 
+The primary goal of virtualization is to securely isolate operating systems and system resources. However, there is a need for the virtual machine (GuestOS) to communicate with the underlying hypervisor (HostOS) to perform certain functions. This is necessary since the HostOS is not running the replica and doesn't communicate with the NNS or any application subnet. -To see the full list of VSOCK commands and a detailed description of the vsock program, link:../../rs/ic_os/vsock/README.md[read the vsock readme] +To maintain the highest level of isolation between the two operating systems, the GuestOS is restricted to strictly defined commands. All VSOCK (VM Socket) commands are initiated from the GuestOS. +For a complete list of VSOCK commands and a detailed description of the vsock program, please link:../../rs/ic_os/vsock/README.md[refer to the vsock README]. diff --git a/ic-os/hostos/rootfs/README.adoc b/ic-os/hostos/rootfs/README.adoc index 3beff388025..a13f6b8dcff 100644 --- a/ic-os/hostos/rootfs/README.adoc +++ b/ic-os/hostos/rootfs/README.adoc @@ -1 +1 @@ -For information on Ubuntu base OS development, see link:../../README-rootfs.adoc#[here] \ No newline at end of file +For information on Ubuntu base OS development, see link:../../docs/Rootfs.adoc#[here] diff --git a/ic-os/scripts/README.adoc b/ic-os/scripts/README.adoc index 039df31d997..59976581989 100644 --- a/ic-os/scripts/README.adoc +++ b/ic-os/scripts/README.adoc @@ -1,5 +1,3 @@ = Scripts -This folder contains build and utility scripts for HostOS, intended to be -unified with `ic-os/guestos/scripts` and `ic-os/generic-guestos/scripts` to be -a common build system for any docker based images. +This folder contains build and utility scripts for HostOS, which are intended to be consolidated with `ic-os/guestos/scripts` and `ic-os/generic-guestos/scripts`. This unified system will serve as a common build process for any Docker-based images. 
diff --git a/ic-os/setupos/README.adoc b/ic-os/setupos/README.adoc index c237a30f73e..9e2688249db 100644 --- a/ic-os/setupos/README.adoc +++ b/ic-os/setupos/README.adoc @@ -2,275 +2,90 @@ == Introduction -The term SetupOS is used for the operating system installing the IC-OS stack (hypervisor and virtual machine / hostOS and guestOS). This installer enables the node providers/operators to independently install their nodes. -Instead of installing and relying on a full blown upstream ISO image, we assemble the system based on a minimal Docker image and add the required components ourselves. This approach allows for a minimal, controlled and well understood system - which is key for a secure platform. +The term SetupOS is used for the operating system installing the IC-OS stack (HostOS and GuestOS / hypervisor and virtual machine). This installer enables the node providers/operators to independently install their nodes. -To learn more about the onboarding and installation process, https://wiki.internetcomputer.org/wiki/Node_Provider_Onboarding#[read the Node Provider Onboarding Wiki]. +To learn more about the onboarding and installation process, as well as the hardware and networking requirements, https://wiki.internetcomputer.org/wiki/Node_Provider_Onboarding#[read the Node Provider Onboarding Wiki]. -== Support +== Building SetupOS -The following vendors and models are currently supported: +To build a SetupOS image, refer to the link:../README.adoc[IC-OS README] -|==== -|Manufacturer|Model |Mainboard|Processor |Memory |Storage -|Dell |PowerEdge R6525 |0DMD2T |2x AMD EPYC 7302|16x 32 GB (512 GB total) DDR4 ECC|10x 3.5 TB NVMe -|Supermicro |AS-1023US-TR4-0-BC27G|H11DSU-iN|2x AMD EPYC 7302|16x 32 GB (512 GB total) DDR4 ECC|1x 3.5 TB NVMe, 4x 8 TB SCSI -|==== +== Under the hood: Installation -=== Build Process +The SetupOS installation is initiated by the systemd service unit file `setupos.service`. 
This service is of type idle, which means the installation is triggered only after every other unit has either completed or started. -To build with Bazel, you need https://bazel.build/install[Bazel] installed. -Alternatively, the following script allows building in a container with the correct environment already configured: +The installation process consists of multiple Shell and Python scripts, which can be found in the following directory: - ./gitlab-ci/container/container-run.sh - -Build a setupOS image: - -* You can build any of the setupos targets (`prod`, `dev`): - - bazel build //ic-os/setupos/envs/{prod,dev}/... - -This will output disk-img.tar{.gz,.zst} in /ic/bazel-bin/ic-os/setupos/envs/{prod,dev}, which is your tarred setupOS image. - -=== Docker - -We currently split the Docker build process into two Dockerfiles. This split is necessary to ensure a reproducible build. - - ic/ic-os/setupos/rootfs/Dockerfile.base - ic/ic-os/setupos/rootfs/Dockerfile + ic-os/setupos/rootfs/opt/ic/bin -The +Dockerfile.base+ takes care of installing all upstream Ubuntu packages. The version of these packages can change at any given time, as updates are published regularly. We publish the result on our public https://hub.docker.com/u/dfinity[Docker Hub]. +The sequence of the scripts is defined in the main installation script, `setupos.sh`. The order of execution is as follows: -The +Dockerfile+ takes care of configuring and assembling the main disk-image. Any instruction in this file needs to be reproducible in itself. 
+ hardware.sh # Verifies the system's hardware components + network.sh # Tests network connectivity and reachability of the NNS + disk.sh # Purges existing LVM configurations and partitions + hostos.sh # Installs and configures the HostOS operating system + guestos.sh # Installs and configures the GuestOS operating system + devices.sh # Handles the HSM -=== System Users +== Configuration -In addition to the regular, built-in Ubuntu user accounts, we add the following users: +This section explains all the files relevant for altering the IC-OS installation. All of these files are copied directly to the HostOS config partition. -|==== -|Username |Home Directory |Default Shell |Description -| | | | -|==== +=== Config partition -=== System Configuration +The configuration for SetupOS is stored on its own config partition, which is formatted as a FAT file system and is 100 MB in size. -Besides the build instructions in the Docker files (+Dockerfile.base+ and +Dockerfile+), all hard-coded system configurations can be found in the +rootfs/etc+ directory. The full path is: +After burning the SetupOS disk image onto a USB drive, the partition will be available. It can be mounted on any operating system that supports FAT file systems. - ic/ic-os/setupos/rootfs/etc/ +The `config` partition contains the following configuration files: -=== Network Configuration + config.ini # Data center-specific network settings + ssh_authorized_keys # SSH public keys for obtaining HostOS console access + node_operator_private_key.pem # (OPTIONAL) Node operator private key used in the pseudo-HSM onboarding -In order to simplify the physical cabling of the machine, we utilize Linux's active-backup bonding technique. This operating mode also improves redundancy if more than one 10 gigabit ethernet network interface is hooked up to the switch. A node operator can decide to either just use one or all of the 10GbE network interfaces in the bond. 
The Linux operating system will take care of handling the uplink and connectivity. +==== config.ini -Details can be found in: +The `config.ini` file contains all network-related settings, which must be provided by the node operator before running the deployment. - ic/ic-os/setupos/rootfs/opt/ic/bin/generate-network-config.sh +The configuration file expects the following key-value pairs in lower-case format: -The network configuration in the SetupOS is only required to test the connectivity, i.e. pinging the default gateway and querying multiple NNS nodes. At least 20% of all NNS nodes need to be reachable in order to proceed with the installation. + ipv6_prefix=2a00:fb01:400:100 + ipv6_subnet=/64 + ipv6_gateway=2a00:fb01:400:100::1 [NOTE] -Please note that this mode does not increase the bandwidth/throughput. Only one link will be active at the same time. +The values above are examples only. -==== Deterministic MAC Address +==== ssh_authorized_keys -To have unique but deterministic MAC addresses for our nodes, we came up with the following schema: - -- The first 8-bits of the MAC address start with 4a for the IPv4 interface and with 6a for the IPv6 interface. -- The second 8-bits are a consecutive hexadecimal number, starting at 00 and ending at ff. For the HostOS we reserved 00, for the first virtual machine (the ReplicaOS) 01. Any additional virtual machine on the same physical machine gets the next higher hexadecimal number: - - # HostOS - 6a:00: - - # ReplicaOS - 6a:01: - - # BoundaryOS - 6a:02: - - # Next Virtual Machine - 6a:03: - - # SetupOS - 6a:0f: +Node Operators can add their public key to the admin file in `ssh_authorized_keys/` in order to gain SSH access to the HostOS. [NOTE] -Please note that the MAC address is expected to be lower-case and contains colons between the octets. 
- -- The remaining 32-bits are deterministically generated based on the management MAC address (BMC, IPMI, iDRAC…) of the physical machine: - - ipmitool lan print | grep 'MAC Address' - -===== Deterministically Generated Part - -Additionally, an arbitrary deployment name is added to the MAC address generation to further increase its uniqueness. The deployment name _mainnet_ is reserved for production. Testnets must use other names to avoid any chance of a MAC address collisions in the same data center. - -The deployment name is retrieved from the +deployment.json+ configuration file, generated as part of the SetupOS: - - { - "deployment": { - "name": "mainnet" - } - } - -Based on these two inputs we calculate the sha256 checksum. Please note that there isn’t any white space in-between the two values: - - # Example - sha256sum 3c:ec:ef:6b:37:99mainnet - - # Checksum - f409d72aa8c98ea40a82ea5a0a437798a67d36e587b2cc49f9dabf2de1cedeeb - -The first 32-bit of the sha256 checksum are used as the deterministically generated part of the MAC address. - - # Deterministically Generated Part - f409d72a - - # HostOS - 6a:00:f4:09:d7:2a - - # ReplicaOS - 6a:01:f4:09:d7:2a - - # BoundaryOS - 6a:02:f4:09:d7:2a - - # Next Virtual Machine - 6a:03:f4:09:d7:2a - - # SetupOS - 6a:0f:f4:09:d7:2a +HostOS SSH access does not grant Node Operators access to the GuestOS or any of its underlying data. -As every virtual machine ends in the same MAC address, we can derive the IPv6 address of each node on the same physical machine, including the hypervisor itself. -In other words, swapping the prefix of the EUI-64 formatted IPv6 SLAAC address gets you to the IPv6 address of the next node. +==== node_operator_private_key.pem -==== IPv6 Address +This file does not exist by default in the config partition and is only necessary for the pseudo-HSM onboarding process. 
If a node operator wants to use the pseudo-HSM onboarding, they must create this file on the config partition, containing their Node Operator private key. If they don't create this file on the config partition, they must use the traditional HSM onboarding process. -When assigning the corresponding IPv6 address, we follow the IEEE’s 64-bit Extended Unique Identifier (EUI-64) format. In this convention, the interface’s unique 48-bit MAC address is reformatted to match the EUI-64 specifications. +=== Other configuration files -The network part (i.e. +ipv6_prefix+) of the IPv6 address is retrieved from the +config.json+ configuration file. The host part is the EUI-64 formatted address. +There are other configuration files that do not exist in the config partition. These files are not intended to be modified by Node Operators and are kept separate to avoid cluttering the config partition. They should be modified only for testing and development purposes. -=== Applications +These files include: -==== Ubuntu Repositories + deployment.json # Deployment-specific configurations + nns_public_key.pem # NNS public key -The following default Ubuntu repositories are active during the Docker image build process: +==== deployment.json -|==== -|Distribution|Component |URL -|Focal |focal main restricted |http://archive.ubuntu.com/ubuntu/ -|Focal |focal-updates main restricted |http://archive.ubuntu.com/ubuntu/ -|Focal |focal universe |http://archive.ubuntu.com/ubuntu/ -|Focal |focal-updates universe |http://archive.ubuntu.com/ubuntu/ -|Focal |focal multiverse |http://archive.ubuntu.com/ubuntu/ -|Focal |focal-updates multiverse |http://archive.ubuntu.com/ubuntu/ -|Focal |focal-backports main restricted universe multiverse|http://archive.ubuntu.com/ubuntu/ -|Focal |focal-security main restricted |http://security.ubuntu.com/ubuntu/ -|Focal |focal-security universe |http://security.ubuntu.com/ubuntu/ -|Focal |focal-security multiverse |http://security.ubuntu.com/ubuntu/ -|==== +The 
default settings can be found in the `data/deployment.json.template` file. -==== Upstream Ubuntu Packages - -|==== -|Name |Description -|attr |utilities for manipulating filesystem extended attributes -|ca-certificates |Common CA certificates -|checkpolicy |SELinux policy compiler -|curl |command line tool for transferring data with URL syntax -|efibootmgr |Interact with the EFI Boot Manager -|ethtool |display or change Ethernet device settings -|faketime |Report faked system time to programs (command-line tool) -|gdisk |GPT fdisk text-mode partitioning tool -|initramfs-tools |generic modular initramfs generator (automation) -|ipmitool |utility for IPMI control with kernel driver or LAN interface (daemon) -|iproute2 |networking and traffic control tools -|iputils-ping |Tools to test the reachability of network hosts -|isc-dhcp-client |DHCP client for automatically obtaining an IP address -|jq |lightweight and flexible command-line JSON processor -|less |pager program similar to more -|linux-image-generic-hwe-20.04|Generic Linux kernel image -|locales |GNU C Library: National Language (locale) data [support] -|lshw |information about hardware configuration -|lvm2 |Linux Logical Volume Manager -|net-tools |NET-3 networking toolkit -|parted |disk partition manipulator -|policycoreutils |SELinux core policy utilities -|python-is-python3 |symlinks /usr/bin/python to python3 -|selinux-policy-default |Strict and Targeted variants of the SELinux policy -|selinux-policy-dev |Headers from the SELinux reference policy for building modules -|selinux-utils |SELinux utility programs -|semodule-utils |SELinux core policy utilities (modules utilities) -|sudo |Provide limited super user privileges to specific users -|systemd |system and service manager -|systemd-journal-remote |tools for sending and receiving remote journal logs -|systemd-sysv |system and service manager - SysV links -|udev |/dev/ and hotplug management daemon -|usbutils |Linux USB utilities -|xfsprogs |Utilities 
for managing the XFS filesystem -|==== - -=== Services - -In addition to the regular, built-in Ubuntu services, we add or manage the following systemd unit files: - -|==== -|Name |Type |State |Upstream|Description -|config |service|Enabled |No |Normalize config.ini configuration file -|generate-network-config |service|Enabled |No |Configure physical network interfaces, bonds and bridges -|setupos |service|Enabled |No |Initiate the SetupOS installation -|systemd-networkd-wait-online |service|Enabled |Yes |Wait for Network to be Configured -|systemd-networkd |service|Enabled |Yes |Network Service -|systemd-resolved |service|Enabled |Yes |Network Name Resolution -|systemd-timesyncd |service|Disabled|Yes |NTP Client -|==== - -=== SELinux - -SELinux is currently in permissive mode. Eventually, every service is confined into its own policy and SELinux running in enforcing mode. - -=== Firewall - -Since the SetupOS is not listening on any ports, we do not activate and manage a firewall ruleset. - -== Configuration - -The configuration of the SetupOS lives on its own partition, the +config+ partition. It is formatted as FAT file system and 100MB in size. -All files relevant for altering the IC-OS installation can be found on this partition. - -The partition is available after burning the SetupOS disk-image on an USB drive. It can be mounted on any operating system supporting FAT file systems. - -== config.ini - -The +config+ partition holds the following configuration file: - - config.ini # data center specific network settings - -===== config.ini - -The +config.ini+ configuration file contains all network related settings. These have to be supplied by the node provider/operator prior running the deployment. - -The configuration file expects the following, lower-case key=value pairs: - - ipv6_prefix=2a00:fb01:400:100 - ipv6_subnet=/64 - ipv6_gateway=2a00:fb01:400:100::1 - -[NOTE] -Please note that the values above are only an example. 
- -== Installation - -The SetupOS installation is initiated by the systemd service unit file +setupos.service+. The type of the service is +idle+, which triggers the installation only after every other unit has completed or started. - -The actual installation consists of multiple Shell and Python scripts, which can be found in: - - ic-os/setupos/rootfs/opt/ic/bin +==== nns_public_key.pem -The sequence of the scripts is defined in the main installation script +setupos.sh+. The order is: +The `nns_public_key.pem` file contains the public key of the NNS. For mainnet, it is: - hardware.sh # Verifying the system's hardware components - network.sh # Testing the network connectivity and reachability of the NNS - disk.sh # Purging existing LVM configurations and partitions - hostos.sh # Installing and configuring the HostOS operating system - guestos.sh # Installing and configuring the ReplicaOS operating system - devices.sh # Handling of the HSM + -----BEGIN PUBLIC KEY----- + MIGCMB0GDSsGAQQBgtx8BQMBAgEGDCsGAQQBgtx8BQMCAQNhAIFMDm7HH6tYOwi9 + gTc8JVw8NxsuhIY8mKTx4It0I10U+12cDNVG2WhfkToMCyzFNBWDv0tDkuRn25bW + W5u0y3FxEvhHLg1aTRRQX/10hLASkQkcX4e5iINGP5gJGguqrg== + -----END PUBLIC KEY----- diff --git a/ic-os/setupos/rootfs/README.adoc b/ic-os/setupos/rootfs/README.adoc index 3beff388025..a13f6b8dcff 100644 --- a/ic-os/setupos/rootfs/README.adoc +++ b/ic-os/setupos/rootfs/README.adoc @@ -1 +1 @@ -For information on Ubuntu base OS development, see link:../../README-rootfs.adoc#[here] \ No newline at end of file +For information on Ubuntu base OS development, see link:../../docs/Rootfs.adoc#[here] diff --git a/rs/ic_os/launch-single-vm/src/main.rs b/rs/ic_os/launch-single-vm/src/main.rs index 0840f36536c..f3ab7209335 100644 --- a/rs/ic_os/launch-single-vm/src/main.rs +++ b/rs/ic_os/launch-single-vm/src/main.rs @@ -155,7 +155,7 @@ fn main() { .next() .unwrap(); - // Constrcut SSH Key Directory + // Construct SSH Key Directory let keys_dir = 
tempdir.as_ref().join("ssh_authorized_keys"); std::fs::create_dir(&keys_dir).unwrap(); if let Some(key) = ssh_key_path {