boot_device sugar is not supported on s390x #453

Open
bgilbert opened this issue Apr 5, 2023 · 12 comments

Comments

@bgilbert
Contributor

bgilbert commented Apr 5, 2023

If boot_device.mirror is specified, Butane emits Ignition directives to repartition the entire boot disk(s), create RAID volumes, and create filesystems inside them. During first boot, the OS copies the entire OS contents into memory, does the repartitioning, and copies the contents back to disk. The partition tables created by Butane are hardcoded to match what the OS expects, which is slightly different for each architecture, and so Butane needs to know the CPU architecture via the boot_device.layout field (which defaults to x86_64).

If boot_device.luks is specified, Butane doesn't need to emit directives for repartitioning the entire disk, but it does need to locate the existing root partition so it can create a LUKS volume in it and a new filesystem inside that. (At runtime, the OS still does the copy to RAM and copy back to disk.) To do this, it references the partition by partition label (/dev/disk/by-partlabel/root) so that it doesn't need to know the number of the root partition. Since we're not repartitioning the disk, the layout directive is technically not required when only using boot_device.luks.

If both mirror and luks are specified, Butane does a combination of both.

That all works fine for x86_64, aarch64, and ppc64le, since they all use GPT partition tables. But s390x uses different partition table formats depending on the type of disk. On FBA DASD disks, it uses MBR partition tables, which Ignition doesn't know how to create, and which don't have partition labels. On ECKD DASD disks, it uses the DASD native partitioning format, which Ignition doesn't know how to create, which doesn't have partition labels, and which only supports 3 partitions per disk.

So we could technically have three different layout values, e.g.:

  • s390x-virt - works like the other arches
  • s390x-fba - doesn't support mirror, supports luks by hardcoding a partition number (which requires a field for specifying the boot disk, e.g. /dev/sda)
  • s390x-eckd - same constraints as fba but with different hardcoded constants

but that would be confusing.
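To make the trade-off easier to see, the three candidate values can be written down as a small capability table. This is only a sketch in Python: the `LayoutInfo` type, its field names, and the `layouts_needing_boot_disk` helper are invented for illustration, and the layout names are the tentative ones from the list above, not anything Butane implements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutInfo:
    """Hypothetical summary of one candidate layout value."""
    partition_table: str        # partition table format on the boot disk
    has_partition_labels: bool  # can the root partition be found by label?
    supports_mirror: bool       # could boot_device.mirror be supported?
    needs_boot_disk: bool       # must the user name the boot disk?


# Values transcribed from the list above; none of this exists in Butane.
CANDIDATE_LAYOUTS = {
    "s390x-virt": LayoutInfo("GPT", True, True, False),
    "s390x-fba": LayoutInfo("MBR", False, False, True),
    "s390x-eckd": LayoutInfo("DASD native", False, False, True),
}


def layouts_needing_boot_disk():
    """Return the layout names that would need an explicit boot disk field."""
    return sorted(name for name, info in CANDIDATE_LAYOUTS.items()
                  if info.needs_boot_disk)


if __name__ == "__main__":
    print(layouts_needing_boot_disk())
```

Two of the three values would need an extra field that the third must not have, which is a large part of why this would be confusing.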

Ignition and the OS copy-to-RAM/copy-to-disk code should work fine on s390x, it's just that the Butane sugar doesn't know how to configure them. For now, all users on s390x should bypass the boot_device sugar and manually configure LUKS and/or mirroring using the low-level directives, similar to how an encrypted/mirrored data volume would be configured. Do not use boot_device with the default x86_64 layout on s390x, even in VMs where it appears to work, since the x86_64 layout is not guaranteed to remain compatible with the needs of s390x.

On VMs using GPT partition tables this might look like:

variant: fcos
version: 1.5.0
storage:
  luks:
    - name: root
      label: luks-root
      device: /dev/disk/by-partlabel/root
      wipe_volume: true
      clevis:
        tang:
          - url: http://example.com/
            thumbprint: ...
  filesystems:
    - device: /dev/mapper/root
      format: xfs
      label: root
      wipe_filesystem: true

/dev/disk/by-partlabel/root will only work in VMs using GPT partition tables. Other values will need to be used on DASD disks.

@madhu-pillai
Contributor

Hi @bgilbert
Here is a proposal to add boot_device sugar for s390x.

So far, testing has been done on vda, zfcp, and ECKD DASD devices. We have not tested LUKS disk encryption on FBA DASD (zfba-dasd).

Devices such as vda, zfcp, and ECKD DASD can be configured with a partition number.
For example, with "layout: s390x-zfcp" Butane generates the config below; "layout: s390x-eckd" and "layout: s390x-virt" work the same way.

Layout s390x-zfcp

Content of worker-storage.bu:

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-zfcp
  luks:
    tang: 
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
    threshold: 1


# butane worker-storage.bu -o test-worker-storage.yaml

# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
            threshold: 1
          device: /dev/sda4
          label: luks-root
          name: root
          wipeVolume: true

For ECKD DASD disks

Layout s390x-eckd

Content of worker-storage.bu:

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-eckd
  luks:
    tang: 
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
    threshold: 1


# butane worker-storage.bu -o test-worker-storage.yaml

# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
            threshold: 1
          device: /dev/dasda4
          label: luks-root
          name: root
          wipeVolume: true

For virtual disks

Layout s390x-virt

Content of worker-storage.bu:

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-virt
  luks:
    tang: 
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
    threshold: 1


# butane worker-storage.bu -o test-worker-storage.yaml

# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
            threshold: 1
          device: /dev/vda4
          label: luks-root
          name: root
          wipeVolume: true

@bgilbert
Contributor Author

In the zfcp layout, it looks like you're hardcoding /dev/sda. Is that a safe assumption to make, or should we add a Butane field for specifying the /dev/sda part?

boot_device:
  layout: s390x-zfcp
  luks:
    # only permitted for layouts that use it.  should probably be here and not directly
    # under boot_device, to avoid a semantic conflict with the device list in the mirror
    # section.
    device: /dev/sda
    tang: 
      - url: http://tang1.example.com:7500
        thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX

Similarly for DASD and /dev/dasda. And also, shouldn't we be using partition 2?

For virt setups, we can continue to use /dev/disk/by-partlabel, right? If so, we probably should, to avoid unnecessary hardcoding of partition details.

In the DASD and zfcp cases, we should make sure to fail if a mirror configuration is specified.
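A minimal sketch of that failure check, in Python rather than Butane's own code, assuming the layout names proposed in this thread. The function name `check_mirror_allowed` and the plain-dict representation of the `boot_device` section are hypothetical.

```python
# Layouts assumed here to lack mirror support; the names are proposals
# from this thread, not values Butane currently accepts.
LAYOUTS_WITHOUT_MIRROR = frozenset({"s390x-zfcp", "s390x-eckd"})


def check_mirror_allowed(boot_device: dict) -> None:
    """Raise ValueError if mirroring is requested on a layout that cannot do it."""
    # The layout field defaults to x86_64 when omitted.
    layout = boot_device.get("layout", "x86_64")
    if layout in LAYOUTS_WITHOUT_MIRROR and "mirror" in boot_device:
        raise ValueError(
            f"boot_device.mirror is not supported with layout {layout}"
        )
```

Failing outright, rather than silently ignoring the mirror section, means the user finds out at config translation time instead of at first boot.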

@madhu-pillai
Contributor

I was trying to show that Butane generates the device as /dev/sda.

That is right: boot_device.luks.device would be /dev/sda for zfcp and /dev/dasda for ECKD, as shown below. For virtual devices, the default boot_device sugar works.

I will also add a conditional check so that, for s390x-eckd and s390x-zfcp, the configuration is generated only with boot_device.luks and fails if boot_device.mirror is specified.

zfcp

butane worker-storage.bu -o worker-storage.yaml
Content of worker-storage.bu

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-zfcp
  luks:
    device: /dev/sda
    tang: 
        - url: http://tang1.example.com:7500
          thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
    

eckd-dasd

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-eckd
  luks:
    device: /dev/dasda
    tang: 
        - url: http://tang1.example.com:7500
          thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX

@bgilbert
Contributor Author

So, to be explicit: is it 100% certain that the user will always want /dev/sda and /dev/dasda respectively? Or is it possible that they'll want e.g. /dev/sdb or /dev/dasdb? If it's 100% certain, then we don't need the new device field. But if there's any chance that they'll want a different device, we should add the field.

And to be clear, we'll still need a separate s390x-virt layout even if we're using /dev/disk/by-partlabel. It's not safe to have users default to the x86_64 layout, since there might be arch-specific differences in the future.

@madhu-pillai
Contributor

Here is an example of what the Butane template would look like when a user configures the boot_device sugar.

variant: openshift
version: x.xx.x
metadata:
  name: <name>
  labels:
    machineconfiguration.openshift.io/role: <node-name (string)>
boot_device:
  layout: <arch (string)>
  luks:
    device: <string>    # /dev/sda or /dev/dasda; optional, defaults to /dev/sda
    tang: 
        - url: <string>
          thumbprint: <string>

/dev/sda is an example of what the user would specify for zfcp, and /dev/dasda for ECKD. So the user must specify layout: s390x-zfcp, while device: <string> (/dev/sd[a-z]) is optional (thanks for the idea); if it is omitted, should Butane use the default /dev/sda?

Similarly for DASD.

A validation check would ensure that s390x-zfcp accepts only SCSI-style names such as /dev/sd[a-z], and similarly that s390x-eckd accepts only DASD names.

For zKVM, we could still provide layout: s390x-virt, using the same semantics as described for SCSI and DASD, with the device defaulting to /dev/vda unless device: is set explicitly?
Having a specific layout such as s390x-virt would reduce confusion.

@bgilbert
Contributor Author

Please directly answer the question I asked in #453 (comment): is it true or false that the user will sometimes want to install on a disk other than the first one?

If we add a device field, it should be forbidden for layouts that don't support it, and mandatory for layouts that do. Otherwise the user will likely forget to set it.

I think you're right that we should require a /dev/sd prefix for zfcp and /dev/dasd for ECKD. It'll prevent mistakes, and we can always relax that restriction later.

For the s390x-virt layout, as I've been saying, we shouldn't allow the device field and shouldn't hardcode a /dev/vd* device. Instead, we should use /dev/disk/by-partlabel/root as the other arches do. That approach avoids the need to specify any device at all, and in the KVM case there's no reason not to do that.
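The rules in this comment could be sketched roughly as follows. This is illustrative Python, not Butane's implementation: the `DEVICE_PREFIX` table and `validate_luks_device` are hypothetical names, and the s390x layout names are the ones proposed in this thread.

```python
from typing import List, Optional

# Hypothetical table: required device prefix per layout, or None when the
# device field is forbidden because the root partition is found by label.
DEVICE_PREFIX = {
    "x86_64": None,
    "aarch64": None,
    "ppc64le": None,
    "s390x-virt": None,
    "s390x-zfcp": "/dev/sd",
    "s390x-eckd": "/dev/dasd",
}


def validate_luks_device(layout: str, device: Optional[str]) -> List[str]:
    """Return a list of error messages; an empty list means the input is valid."""
    if layout not in DEVICE_PREFIX:
        return [f"unknown layout {layout!r}"]
    prefix = DEVICE_PREFIX[layout]
    if prefix is None:
        # Forbidden for layouts that don't support it.
        if device is not None:
            return [f"device is not allowed with layout {layout}"]
        return []
    # Mandatory for layouts that do, so the user cannot forget to set it.
    if device is None:
        return [f"device is required with layout {layout}"]
    if not device.startswith(prefix):
        return [f"device must start with {prefix} for layout {layout}"]
    return []
```

For example, `validate_luks_device("s390x-eckd", "/dev/sda")` reports a prefix error, while `validate_luks_device("s390x-virt", None)` returns no errors.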

@madhu-pillai
Contributor

"Is it 100% certain that the user will always want /dev/sda and /dev/dasda respectively?" -> False

(I cannot assert that the statement is true, because it depends on the user's requirements.)

I have a question related to this. When we use the boot_device sugar, Butane automatically generates Ignition config with a valid device; for x86, say, it is /dev/disk/by-partlabel/root.
However, for s390x the /dev/sd* device needs to be provided manually in that case, right? Because Butane does not know how many disks are present in the VM node.

"If we add a device field, it should be forbidden for layouts that don't support it, and mandatory for layouts that do. Otherwise the user will likely forget to set it." Will do that.

I will add an s390x-virt layout, something like below.

variant: openshift
......
......
boot_device:
   layout: s390x-virt
   luks:
     tang:
        url:

so Butane should generate the following output:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: QcPr_NHFJammnRCA3fFMVdNBwjs
                url: http://12.23.21.58:7500
          device: /dev/disk/by-partlabel/root
          label: luks-root
          name: root
          options:
            - --cipher
            - aes-cbc-essiv:sha256
          wipeVolume: true

@bgilbert
Contributor Author

When we use the boot_device sugar, Butane automatically generates Ignition config with a valid device; for x86, say, it is /dev/disk/by-partlabel/root.
However, for s390x the /dev/sd* device needs to be provided manually in that case, right? Because Butane does not know how many disks are present in the VM node.

It doesn't have anything to do with the number of disks. On other arches, we can find the existing partition with the label root and reformat it, since CoreOS reserves that label for the root partition. But MBR and DASD partition tables don't support partition labels, or any equivalent functionality, so that trick doesn't work.
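To make the distinction concrete, here is a minimal Python sketch of how the LUKS target could be chosen. The function name `luks_target` and its parameters are hypothetical, and the root partition number is left as an input because it depends on the disk layout.

```python
from typing import Optional

# CoreOS reserves the partition label "root" for the root partition.
ROOT_BY_LABEL = "/dev/disk/by-partlabel/root"


def luks_target(has_partition_labels: bool,
                disk: Optional[str] = None,
                root_partition: Optional[int] = None) -> str:
    """Return the device path on which the LUKS volume would be created."""
    if has_partition_labels:
        # GPT: no need to know the disk or the partition number.
        return ROOT_BY_LABEL
    if disk is None or root_partition is None:
        raise ValueError(
            "without partition labels, both the boot disk and the root "
            "partition number must be known"
        )
    # Simple concatenation, which matches /dev/sdX and /dev/dasdX naming.
    return f"{disk}{root_partition}"
```

With labels available, `luks_target(True)` needs no further input. Without them, the caller has to supply both pieces, which is where the need for a device field comes from.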

@madhu-pillai
Contributor

Hi @bgilbert
From the above discussion, here are the main rules I captured for s390x. Please let me know if there are any errors or additional requirements.

  1. Add layouts specific to s390x: s390x-zfcp, s390x-eckd, and s390x-virt.

  2. Add boot_device.luks.device specifically for s390x-zfcp and s390x-eckd; it is forbidden for other layouts, including s390x-virt.

  3. For s390x-eckd and s390x-zfcp, the configuration is generated only with boot_device.luks, and Butane fails if boot_device.mirror is specified.

  4. The expected Butane sugar from the user's perspective is shown below.

s390x-eckd

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-eckd
  luks:
    device: /dev/dasd[a-z]
    tang: 
        - url: http://tang1.example.com:7500
          thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX

s390x-zfcp

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-zfcp
  luks:
    device: /dev/sd[a-z]
    tang: 
        - url: http://tang1.example.com:7500
          thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX

s390x-virt

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-virt
  luks:
    tang: 
        - url: http://tang1.example.com:7500
          thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX

@bgilbert
Contributor Author

Looks good! Could you also post the expected output for each of those configs?

Note that FCOS should also support these layouts, so the implementation should happen in the fcos experimental spec.

@madhu-pillai
Contributor

I will also implement this on FCOS, in the experimental spec.

Here is the expected output for each config: s390x-eckd, s390x-zfcp, and s390x-virt.

s390x-eckd

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-eckd
  luks:
    device: /dev/dasda   # given as an example; with /dev/dasdb the output uses the corresponding disk
    tang: 
        - url: http://tang1.example.com:7500
          thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX

Converting to ignition

# butane s390x-eckd.bu -o s390x-eckd_out.yaml

# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
          device: /dev/dasda2   # corresponding disk as per the example; `/dev/dasdb2` if the user sets `/dev/dasdb` in device
          label: luks-root
          name: root
          wipeVolume: true

s390x-zfcp

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-zfcp
  luks:
    device: /dev/sda   # given as an example; with /dev/sdb the output uses the corresponding disk
    tang: 
        - url: http://tang1.example.com:7500
          thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX

Converting to ignition

# butane s390x-zfcp.bu -o s390x_zfcp_out.yaml

# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
          device: /dev/sda2   # corresponding disk as per the example; `/dev/sdb2` if the user sets `/dev/sdb` in device
          label: luks-root
          name: root
          wipeVolume: true

s390x-virt

The device field is forbidden for s390x-virt, as it is for the other architectures.

variant: openshift
version: 4.13.0
metadata:
  name: worker-storage
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  layout: s390x-virt
  luks:
    tang: 
        - url: http://tang1.example.com:7500
          thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX

Converting to ignition

# butane s390x-virt.bu -o s390x_virt_out.yaml

# Generated by Butane; do not edit
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-storage
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        - device: /dev/mapper/root
          format: xfs
          label: root
          wipeFilesystem: true
      luks:
        - clevis:
            tang:
              - thumbprint: jwGN5tRFK-kF6pIX89ssF3khxxX
                url: http://tang1.example.com:7500
          device: /dev/disk/by-partlabel/root
          label: luks-root
          name: root
          wipeVolume: true

@bgilbert
Contributor Author

Looks good. Note that you don't need to implement on FCOS also. If you implement it in the FCOS experimental spec, the OpenShift spec will automatically inherit.

This was referenced Aug 10, 2023