oci-image: init scripts to build and upload image #119856

Merged
merged 12 commits into from Sep 23, 2023

Conversation

ilian
Member

@ilian ilian commented Apr 19, 2021

Motivation for this change

Add image configuration for Oracle Cloud Infrastructure and scripts to
build and upload the image as a Custom Image.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@ilian ilian force-pushed the oci-image branch 2 times, most recently from fbae55a to 0b413d2 Compare April 20, 2021 11:14
@@ -0,0 +1,98 @@
#! /usr/bin/env bash
Member

For what it is worth, after working with the AWS uploader for some time, I have needed to reimplement it with Terraform to keep track of the details. I wonder if using Terraform for this instead of a script would work too?

Member Author

I think Terraform would work for this too. The oci terraform provider seems to have the functionality needed to import a custom image: https://registry.terraform.io/providers/hashicorp/oci/latest/docs/resources/core_image

Would you mind sharing what details needed to be tracked in your use-case for AWS? I can't currently write/test Terraform code as my free trial has expired which unfortunately prevents me from importing custom images.

--operating-system NixOS \
--source-image-type QCOW2 \
--launch-mode PARAVIRTUALIZED \
--display-name NixOS \
Member

This is not very much metadata. Can we add more, or am I missing it? For example:

  • exact version of the channel used to build it
  • any relevant filesystem information
  • who built it and when
  • perhaps a version number for the metadata itself, so we could add more types of images another time and not break automated integrations

Member Author

Yes, I agree. I think this can be achieved similarly to how EC2 images are built, by creating a JSON manifest when building the image and passing this data to OCI:

${pkgs.jq}/bin/jq -n \
  --arg label ${lib.escapeShellArg config.system.nixos.label} \
  --arg system ${lib.escapeShellArg pkgs.stdenv.hostPlatform.system} \
  --arg logical_bytes "$(${pkgs.qemu}/bin/qemu-img info --output json "$diskImage" | ${pkgs.jq}/bin/jq '."virtual-size"')" \
  --arg file "$diskImage" \
  '$ARGS.named' \
  > $out/nix-support/image-info.json

I think we should put some basic metadata, such as the channel version, in the display name, and leave out the metadata version number there to avoid clutter. All verbose metadata can then be attached to the Custom Image itself as Resource Tags, which can be provided in JSON format with the oci CLI tool.
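
A minimal sketch of that split between a short display name and verbose tags, assuming jq 1.6 or newer is available. The function names, the display-name format, the tag keys and the metadata_version field are hypothetical and not taken from this PR. How the resulting JSON is handed to the oci CLI is deliberately left out, since that was not settled in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Short, human-readable display name (the format is an assumption).
image_display_name() {
  local label=$1 system=$2
  printf 'NixOS-%s-%s\n' "$label" "$system"
}

# Verbose metadata as a flat JSON object of strings, the shape a set of
# resource tags would need. All keys are illustrative.
image_tags_json() {
  local label=$1 system=$2 built_by=$3 built_at=$4
  jq -n -c \
    --arg metadata_version 1 \
    --arg label "$label" \
    --arg system "$system" \
    --arg built_by "$built_by" \
    --arg built_at "$built_at" \
    '$ARGS.named'
}
```

For example, `image_tags_json 23.05 aarch64-linux "$USER" "$(date -u +%FT%TZ)"` prints a single JSON object that automated integrations could parse, with metadata_version there so the set of keys can change later.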

Comment on lines +18 to +28
systemd.services.fetch-ssh-keys = {
  description = "Fetch authorized_keys for root user";

  wantedBy = [ "sshd.service" ];
  before = [ "sshd.service" ];

  after = [ "network-online.target" ];
  wants = [ "network-online.target" ];

  path = [ pkgs.coreutils pkgs.curl ];
Member

Should this service be enabled only on the very first profile? I wonder if this should be part of the -user config.

Member Author

The idea here is that the service is executed at least once before the initial ssh connection. This service does indeed need to only run once, since the ssh keys can't be modified in OCI after creating the instance AFAIK.
It should indeed be enabled only for the first profile. The user is allowed to change the system configuration to not include the fetch-ssh-keys service, since all ssh keys should be present in /root/.ssh/authorized_keys by then, hence this service is not included in oci-config-user.nix.
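
The "run at least once, then leave existing keys alone" behaviour can be sketched roughly as follows. This is not the script from the PR: the install_keys helper is hypothetical, and the metadata URL is an assumption based on OCI's instance metadata service.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed location of the instance's SSH keys in the OCI metadata service.
KEYS_URL="http://169.254.169.254/opc/v1/instance/metadata/ssh_authorized_keys"

# Write the keys read from stdin to <ssh_dir>/authorized_keys, unless a
# non-empty file is already there (keys cannot change after instance creation).
install_keys() {
  local ssh_dir=$1
  local target="$ssh_dir/authorized_keys"
  if [[ -s "$target" ]]; then
    echo "authorized_keys already present, skipping" >&2
    cat > /dev/null   # drain stdin so the upstream writer does not see SIGPIPE
    return 0
  fi
  mkdir -p -m 0700 "$ssh_dir"
  (umask 077 && cat > "$target")
}

# On an instance (not exercised here):
#   curl --fail --silent --retry 5 "$KEYS_URL" | install_keys /root/.ssh
```

Because an empty download leaves an empty file, the `-s` check lets a later run try again instead of treating the failed fetch as done.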

samueldr pushed a commit to samueldr/nixpkgs that referenced this pull request May 29, 2021
https://github.com/NixOS/nixpkgs/pull/119856/files

Add image configuration for Oracle Cloud Infrastructure and scripts to
build and upload the image as a Custom Image.
@samueldr
Member

samueldr commented May 29, 2021

Hi!

Expect to see some activity on your PR, since that announcement about their A1 instances.

On my end I worked out how to adapt this for those A1 instances

(Look carefully at the commits, the tip adds a "bad" commit to work around an issue with RK3399... That one is not needed.)

I don't know if there's anything else that should be done here though. I'm definitely not an OCI expert. I basically started learning about their cloud offering less than 48 hours ago.


Some additional notes

You will have to ensure that the image capabilities only include UEFI as the firmware type.

image

Otherwise the instances will invalidly use "BIOS" and will not boot. AFAICT this needs to be set on the image.

This image will not build on A1 VMs!!! This is because it requires kvm, and nested virtualization is not available.

@ilian
Member Author

ilian commented May 29, 2021

Hi!

Expect to see some activity on your PR, since that announcement about their A1 instances.

That's great news! Thank you for making me aware of their new ARM instances.

On my end I worked out how to adapt this for those A1 instances

* https://github.com/samueldr/nixpkgs/commits/feature/ocl-image-aarch64

Thanks! I'll add your commits to this PR.

You will have to ensure that the image capabilities only include UEFI as the firmware type.

Can this manual configuration step be avoided after modifying the --launch-mode PARAVIRTUALIZED to --launch-mode NATIVE? IIRC, this should enable UEFI by default on x86, but I'm not sure what the behavior is for ARM-based instances. The reason I used PARAVIRTUALIZED for x86 is that BIOS is a requirement for custom images as mentioned in https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/importingcustomimagelinux.htm
It seems that the current Oracle Linux image for AArch64 (Oracle-Linux-7.9-aarch64-2021.05.17-0) has its launch mode set to PARAVIRTUALIZED instead of NATIVE and its firmware set to UEFI though. If we want to automate setting UEFI as the preferred firmware, we can do so using Terraform (see launch_options) or a python script that uses oci-sdk. I can't seem to find an option of oci-cli that allows me to customize the LaunchOptions of an image, even though the manpage of oci compute image import --help references LaunchOptions. I have opened an issue upstream: oracle/oci-cli#416
I think the default image of Oracle Linux on x86 uses UEFI-based images so we might want to try building UEFI-based images for both x86 and AArch64 to keep things simple.

Unfortunately, my Free Trial has expired so I am not able to test out any new additions to the build script since the custom-image-count limit has been reverted to 0 after my trial ended.
I will try to install NixOS on a system with the Oracle Linux image in-place for users that have a free account like mine using https://github.com/elitak/nixos-infect or something similar.

Feel free to add/modify any commits as I have enabled edits by maintainers for this PR.

@samueldr
Member

Just a reminder:

I'm definitely not an OCI expert. I basically started learning about their cloud offering less than 48 hours ago.

You will have to ensure that the image capabilities only include UEFI as the firmware type.

Can this manual configuration step be avoided after modifying the --launch-mode PARAVIRTUALIZED to --launch-mode NATIVE? IIRC, this should enable UEFI by default on x86, but I'm not sure what the behavior is for ARM-based instances. The reason I used PARAVIRTUALIZED for x86 is that BIOS is a requirement for custom images as mentioned in https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/importingcustomimagelinux.htm
It seems that the current Oracle Linux image for AArch64 (Oracle-Linux-7.9-aarch64-2021.05.17-0) has its launch mode set to PARAVIRTUALIZED instead of NATIVE and its firmware set to UEFI though. If we want to automate setting UEFI as the preferred firmware, we can do so using Terraform (see launch_options) or a python script that uses oci-sdk. I can't seem to find an option of oci-cli that allows me to customize the LaunchOptions of an image, even though the manpage of oci compute image import --help references LaunchOptions. I have opened an issue upstream: oracle/oci-cli#416

As far as I understand OCI, the "Launch Mode" is distinct from "Firmware". And as the documentation indeed states, "BIOS" seemingly needs to be set for x86_64-linux (though I can't confirm).

Given the following link:

I wonder if it's as simple as adding --firmware UEFI_64. But that's a total guess.

I think the default image of Oracle Linux on x86 uses UEFI-based images so we might want to try building UEFI-based images for both x86 and AArch64 to keep things simple.

It would be nice to see which shapes actually run on UEFI. I am betting that the documentation stating it needs to be legacy boot is plainly wrong.

Unfortunately, my Free Trial has expired so I am not able to test out any new additions to the build script since the custom-image-count limit has been reverted to 0 after my trial ended.
I will try to install NixOS on a system with the Oracle Linux image in-place for users that have a free account like mine using https://github.com/elitak/nixos-infect or something similar.

Oh what??? The custom images are not part of the free offer?? I wasn't finding any information about the pricing for those, like at all... This sucks :|

So I guess going with "infect" like systems will be better in the end. Unless the NixOS foundation somehow gets in touch with Oracle to get some kind of useful deal.

@samueldr
Member

samueldr commented May 29, 2021

Reading the (admittedly hard to navigate) pricing list, coupled with the free tier page, I'm led to believe there is up to 10GB available free for custom images.

Expect an edit to this message, I'm chatting with a live agent about this.

Update: Custom Images, or sometimes called BYOI (bring your own image) is NOT part of the free tier. The account needs to be a paid account for this to work.

Note that during the trial you can upload custom images. It is unclear what happens after the fact, but based on the agent's responses I believe existing images count towards the 10GB available for block storage, and you're simply unable to upload new images.

@Djelibeybi

Djelibeybi commented May 30, 2021

Hey folks, I just replied to @samueldr (I assume, based on the avatar) over on the Oracle Cloud forum. I'm a member of the Oracle Cloud Developer Adoption team as well as one of Oracle's GitHub admins. I'm more than happy to be your official tour guide to OCI and make sure you have the right tools and contacts to get NixOS published to our Marketplace.

The first thing I'll mention is that upgrading your free trial account to pay-as-you-go is (ironically) one of the requirements to activate the Always Free resources, which will then increase your custom image count back up to 25. You will not be charged for any resources that qualify for Always Free, and those that do will have an "Always Free" tag visible in the Console.

I was also an Oracle Linux product manager for over a decade prior to joining the OCI developer adoption team, so I know how the Oracle Linux images are built (hint: we use Packer. A lot) and published. I can also work as the interface between the NixOS Foundation and our Linux engineers, particularly if you're interested in merging some of the GCC optimizations for the Ampere A1/Neoverse-N1 platform that have still not been merged upstream.

@grahamc
Member

grahamc commented May 30, 2021 via email

@Djelibeybi

Djelibeybi commented May 30, 2021

It's both useful and possible to publish NixOS images to the Oracle Cloud Marketplace. This will make it available to all Oracle Cloud users via the Create Instance dialog on the Console, as well as via the API or SDK.

Edited to add that publishing to the Marketplace allows the image to take advantage of our metadata service to allow for deeper cloud-init integration and other tooling.

@mweinelt
Member

mweinelt commented Jun 3, 2021

Using kexec via https://github.com/cleverca22/nix-tests/blob/master/kexec/session.md worked for me to get into a live system. Unfortunately it rebooted into the EFI shell after a while. I terminated the previous instance and created a new one and disabled the cloud agent in the instance configuration. I assume it works like a watchdog, so remember to disable it if you try that.

@zhaofengli
Member

For aarch64, can we use systemd-boot as the bootloader? On boot, GRUB gives me the following error:

file `/boot/grub/arm64-efi/efi_uga.mod' not found.

It boots, but it's not possible to select the system generation through the serial console. systemd-boot seems to work fine from my testing.

@samueldr
Member

samueldr commented Jul 8, 2021

@zhaofengli in theory yes. In practice last I tried our systemd-boot tooling failed on AArch64.

Though what should be done, instead or in addition, is fix the GRUB issue. It goes a bit more deeply than that specific warning/error. Serial vs. graphical GRUB are not working correctly with the current configuration, and I haven't looked at the actual underlying issue. I really don't recall what the problem was, but I recall it wasn't an issue of that module being found or not.

@zhaofengli
Member

@zhaofengli in theory yes. In practice last I tried our systemd-boot tooling failed on AArch64.

I tried and it worked fine, though I ran bootctl install instead of NIXOS_INSTALL_BOOTLOADER=1. Can't reboot right now but I'd guess the normal way should also work.

Though what should be done, instead or in addition, is fix the GRUB issue. It goes a bit more deeply than that specific warning/error. Serial vs. graphical GRUB are not working correctly with the current configuration, and I haven't looked at the actual underlying issue.

I think it's fixable by simply removing insmod efi_uga from nixos/modules/system/boot/loader/grub/install-grub.pl for aarch64 and other platforms on which it isn't built. UGA was removed in UEFI in favor of GOP, but it's still in use on EFI 1.x platforms (e.g., some Apple devices).
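
The actual change would be in Perl, in install-grub.pl. As a rough illustration of the general idea only (emit insmod lines solely for modules that were built for the target), here is a bash sketch. The emit_insmod_lines helper is hypothetical and is not how install-grub.pl is structured.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print an "insmod" line for each requested module that exists as
# <name>.mod in the given GRUB module directory; skip the others.
emit_insmod_lines() {
  local module_dir=$1
  shift
  local mod
  for mod in "$@"; do
    if [[ -f "$module_dir/$mod.mod" ]]; then
      printf 'insmod %s\n' "$mod"
    fi
  done
}
```

Called as `emit_insmod_lines /boot/grub/arm64-efi efi_gop efi_uga`, this would print only `insmod efi_gop` on a platform where efi_uga.mod is not built, which avoids the "not found" message without hard-coding a list of architectures.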

@samueldr
Member

samueldr commented Jul 8, 2021

Though I'm 99% sure this is not the reason you don't see the graphical menu with GRUB. So yes, it probably should be removed, but I expect it will change nothing.

  cfg = config.oci;
in
{
  imports = [ ../profiles/qemu-guest.nix ];
Member

what does OCI use from this?

Member Author

The profile contains the necessary kernel modules for several QEMU/VirtIO devices. Running lspci on my instance running on OCI confirms that there are several emulated VirtIO devices present:

00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
00:01.0 Display controller: Red Hat, Inc. Virtio GPU (rev 01)
00:02.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:10.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
...
18:00.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI (rev 01)

@K900
Contributor

K900 commented Jan 13, 2022

Maybe we could rewrite the scripts to Python instead of bash? Python has more API access and better controls than the CLI.

@PedroHLC
Member

PedroHLC commented Jul 16, 2022

In case someone wonders, this PR still works quite well (14 months after the latest contribution).

  • I've rebased the branch with latest release-22.05, without conflicts;
  • Configured my ~/.oci using OpenSSL commands provided in Oracle's documentation;
  • Ran ./create-image.sh aarch64-linux, it created the file without problems;
  • Ran ./upload-image.sh, it automatically uploaded the file, created a temporary bucket, transformed the file into a custom image, as expected;
  • Then it asked me to change the shape compatibility in a link, I've also used the chance to change the boot capability to only-UEFI (tried with BIOS in an earlier attempt, and it didn't boot);
  • Meanwhile, the script took its time but deleted the file and bucket without destroying the custom image;
  • Created an instance from it: new virtual network and new subnet, disabled all OS management plugins, and added my keys by uploading their .pub files;
  • Once I had the public IP, I logged through SSH using the root user and one of the private keys.

Are you guys still interested in merging it?

EDIT: Meanwhile, I have to keep the oci-common.nix and oci-options.nix cloned in my modules...

@dit7ya
Member

dit7ya commented Oct 28, 2022

Will love to see this merged!

@puffnfresh
Contributor

I've been using the oci-common.nix module for a few months now. Working really well!

@chenlijun99
Contributor

I also would love to see this merged.

Hi @puffnfresh. By any chance, are you using Oracle's ARM VPS? If yes, didn't you cross-compile the image for "aarch64-linux" on a "x86_64-linux" machine? I'm asking because I tried to copy the modules of this PR and then to cross-build an image, but I get the error:

error: a 'aarch64-linux' with features {} is required to build '/nix/store/8ifcgasp770hf6ffc98kz3wrn23bcspc-nixos-enter.drv', but I am a 'x86_64-linux' with features {benchmark, big-parallel, kvm, nixos-test}

If you did something similar, I would really appreciate some suggestions. Thanks!

@puffnfresh
Contributor

@chenlijun99 I probably did, can't remember. I do have access to a few arm64 systems so I might have just compiled the image from there.

It looks like you're not actually cross-compiling. You probably want to set system to x86_64-linux and then set nixpkgs.crossSystem.

ilian and others added 3 commits September 21, 2023 22:15
Add image configuration for Oracle Cloud Infrastructure and scripts to
build and upload the image as a Custom Image.
@thiagokokada
Contributor

Rebased this PR and did a bunch of fixes:

  • Use set -euo pipefail in the scripts, so if they fail for any reason the script will stop instead of continuing
  • Remove deprecated options
  • Always use EFI by default, because this is how most images are configured in OCI nowadays. There is still the option to use BIOS if you want
  • Set networking.useNetworkd = true, because otherwise the fetch-ssh-keys.service would fail for me, resulting in a VM without the proper SSH keys
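
To illustrate the first point, here is a small sketch of what strict mode changes. The run_snippet helper and the SNIPPET variable are hypothetical and only exist for this demonstration; they are not part of the PR's scripts.

```shell
#!/usr/bin/env bash

# Run a snippet in a fresh bash, with or without strict mode, and print
# the exit status of that shell.
run_snippet() {
  local mode=$1 snippet=$2 status=0
  if [[ "$mode" == strict ]]; then
    bash -c "set -euo pipefail; $snippet" >/dev/null 2>&1 || status=$?
  else
    bash -c "$snippet" >/dev/null 2>&1 || status=$?
  fi
  echo "$status"
}

# A failing step inside a pipeline, followed by more work.
SNIPPET='false | cat; echo "still running"'
```

Without strict mode the failure of `false` is hidden by `cat` and the script carries on, so `run_snippet lax "$SNIPPET"` reports 0. With `set -euo pipefail` the pipeline's failure is propagated and the shell stops before the echo, so `run_snippet strict "$SNIPPET"` reports a non-zero status.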

I think this should fix most of the basic issues in this PR. Tested and it is working fine in my personal account.

There are a few other outstanding issues, but I think we could merge this PR as-is and work on them in the future, since this PR already has some value as-is (e.g. for people wanting to play with NixOS in Oracle Cloud, especially because of their generous Always Free tier).

@K900
Contributor

K900 commented Sep 21, 2023

Code looks fine, unfortunately I can't test it because of trade embargo.

samueldr and others added 9 commits September 21, 2023 22:57
Follows what amazon images does.
A couple notes:
---------------

Adding invalid `console=` parameters is not an issue. Any invalid
console is unused. The kernel will use the "rightmost" (last) valid
`console=` parameter as the default output. Thus the SBBR-mandated AMA0
on A1, and ttyS0 on x86_64 as documented by Oracle.

`nvme_core.shutdown_timeout=10` was added as it was written this way in
the A1 images. Unclear whether `nvme.shutdown_timeout=10` is wrong. At
worst this is a no-op.
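
A simplified model of the console selection described in this commit message, written as a bash sketch. The default_console function is hypothetical and the command lines used with it are illustrative, not copied from the module. It only mirrors the "last valid console= wins" rule stated above and is not a description of the kernel's implementation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the device named by the last console= parameter whose device is
# among the consoles that exist on this machine.
# usage: default_console "<kernel command line>" <available device>...
default_console() {
  local cmdline=$1
  shift
  local -a params
  read -r -a params <<<"$cmdline"
  local param dev avail chosen=""
  for param in "${params[@]}"; do
    [[ "$param" == console=* ]] || continue
    dev=${param#console=}
    dev=${dev%%,*}   # drop options such as ",115200n8"
    for avail in "$@"; do
      if [[ "$dev" == "$avail" ]]; then
        chosen=$dev
      fi
    done
  done
  printf '%s\n' "$chosen"
}
```

With a command line that lists both serial consoles, a machine that only has ttyS0 ends up on ttyS0 and one that only has ttyAMA0 ends up on ttyAMA0, which is why a single parameter list can serve both architectures.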
@thiagokokada
Contributor

I did 2 tests:

  • Booting the generated x86_64-linux image in the VM.Standard.E2.1.Micro instance
  • Booting the generated aarch64-linux image in the VM.Standard.A1.Flex instance

Both took a while to be reachable through SSH (a few minutes vs. a few seconds for the official images from OCI), but they work well enough to give most people a starting point.
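
Since the first boot can take minutes before SSH answers, a small retry helper is handy when scripting against a fresh instance. This is a generic sketch; the wait_for name and its arguments are hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a command until it succeeds, at most <attempts> times, sleeping
# <delay> seconds between tries. Returns 0 on the first success and 1 if
# every attempt failed.
wait_for() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For instance, `wait_for 60 5 ssh -o BatchMode=yes -o ConnectTimeout=5 root@"$ip" true` would poll for up to roughly five minutes, where `$ip` stands for the instance's public IP.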

While most people are probably more interested in the aarch64-linux images, I have had more success using the Kexec tarballs from nixos-anywhere (contrary to what the README says, you can use those images like a Live environment without nixos-anywhere). I also tested nixos-anywhere itself and it works fine, and it is probably my recommended way if you want to do so.

However, for the VM.Standard.E2.1.Micro the Kexec tarball is simply too big to work, so this image still makes sense for those cases. Keep in mind that the 1GB of RAM restriction really hurts most NixOS setups, and you will probably need to set up some swap space to make anything build in this environment.

Member

@PedroHLC PedroHLC left a comment

Code still LGTM

@thiagokokada
Contributor

Going to merge this. Future fixes can go after merging it.

@thiagokokada thiagokokada merged commit a3a7520 into NixOS:master Sep 23, 2023
20 checks passed
@cjemorton

Does anyone have a simple step by step tutorial on how to use this?

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/looking-for-instructions-on-how-to-build-nixos-for-oci-cloud-images/40217/1
