Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix serial datasource definition for nocloud #4949

Merged
merged 2 commits into from Feb 27, 2024

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Feb 23, 2024

Proposed Commit Message

fix(nocloud): NoCloud serial datasource definition is broken

Fix it.
Deprecate the "nocloud-net" name.

Additional Context

#4945
#4948

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. Can we get tests?

@@ -366,7 +366,8 @@ def override_ds_detect(self) -> bool:
does not run, _something_ needs to detect the kernel command line
definition.
"""
if self.dsname.lower() == parse_cmdline().lower():
dsname = self.dsname.lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This change isn't needed anymore (but also doesn't hurt anything)

@holmanb
Copy link
Member Author

holmanb commented Feb 23, 2024

Changes LGTM. Can we get tests?

Good call.

FYI - The qemu tutorial is currently broken by this regression and could be used to manually test with a modified image (mount it locally, chroot into it, install a test cloud-init deb).

I assume that using qemu via pycloudlib would be best so we can set the bios serial, but I haven't tried it out yet, and I don't think that we have any Jenkins runners set up to use it yet, right?

@TheRealFalcon Is that what you were thinking of or did you have something else in mind?

@TheRealFalcon
Copy link
Member

I was really only thinking about unit tests.

I was also thinking we would need QEMU, but it looks like LXD has recently added support for something like this:

$ lxc launch ubuntu:jammy jammy-test --vm -c raw.qemu="-smbios type=1,serial=HITHERE"
Creating jammy-test
Starting jammy-test
$ lxc exec jammy-test -- cat /sys/class/dmi/id/product_serial              
HITHERE

@holmanb
Copy link
Member Author

holmanb commented Feb 24, 2024

I was really only thinking about unit tests.

Gotcha

I was also thinking we would need QEMU, but it looks like LXD has recently added support for something like this:

Oh sweet.

@@ -362,18 +363,50 @@ def _merge_new_seed(cur, seeded):
return ret


def _parse_dmi() -> str:
serial = dmi.read_dmi_data("product_serial")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/product_serial/system-serial-number/ I think, otherwise this always fails

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvandegrift It looks like applying this actually broke the test case, what were you seeing that led to this suggestion?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issued wasn't fixed, at least with ds=nocloud (not 100% sure I tested nocloud-net at that point). From a quick grep, it looked like read_dmi_data wanted keys from dmi.DMIDECODE_TO_KERNEL - see

kmap = DMIDECODE_TO_KERNEL.get(key)

Could be I was wrong - though when I made that change, your patch started working for the ds=nocloud case. I didn't run the tests, just built a deb and tried it with a qmeu + fake imds setup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@holmanb , I think system-serial-number is "more right" here. We use the map @rvandegrift mentioned to read the DMI value from /sys. If that fails, we fallback to reading using dmidecode. It looks like specifying product_serial as a key will trigger the fallback code, but we should be using /sys if it's available.

broke the test case

Which test case? Mine will be broken unless you apply blackboxsw's suggested fix 😬

return True

serial = _parse_dmi().lower()
if serial in (self.dsname, "nocloud-net"):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think s/self.dsname/self.dsname.lower()/ is needed to fix the ds=nocloud case

@rvandegrift
Copy link

@holmanb with the above changes, this is working!

@holmanb
Copy link
Member Author

holmanb commented Feb 26, 2024

@holmanb with the above changes, this is working!

Thanks for the review! I just pushed the requested changes.

@TheRealFalcon
Copy link
Member

@holmanb

I think something like this could work as a basic integration test. It should be hard to modify for http roots. Also, I did attempt to do the file pushing in the setup code, but LXD VMs need to be started before you can do that.

SMBIOS_USERDATA = """\
#cloud-config
runcmd:
  - touch /var/tmp/smbios_test_file
"""
SMBIOS_SEED_DIR = "/smbios_seed"


def setup_nocloud_serial(instance: LXDInstance):
    subp(
        [
            "lxc",
            "config",
            "set",
            instance.name,
            "raw.qemu=-smbios "
            f"type=1,serial=ds=nocloud;s=file://{SMBIOS_SEED_DIR};h=myhost",
        ]
    )


@pytest.mark.lxd_setup.with_args(setup_nocloud_serial)
@pytest.mark.lxd_use_exec
@pytest.mark.skipif(
    PLATFORM != "lxd_vm",
    reason="Requires NoCloud with raw QEMU serial setup",
)
def test_smbios_seed(client: IntegrationInstance):
    client.execute(f"mkdir {SMBIOS_SEED_DIR}")
    client.write_to_file(f"{SMBIOS_SEED_DIR}/user-data", SMBIOS_USERDATA)
    client.write_to_file(f"{SMBIOS_SEED_DIR }/smbios_seed/meta-data", "")
    client.execute("cloud-init clean --logs")
    client.restart()
    assert client.execute("test -f /var/tmp/smbios_test_file").ok

def test_smbios_seed(client: IntegrationInstance):
client.execute(f"mkdir {SMBIOS_SEED_DIR}")
client.write_to_file(f"{SMBIOS_SEED_DIR}/user-data", SMBIOS_USERDATA)
client.write_to_file(f"{SMBIOS_SEED_DIR }/smbios_seed/meta-data", "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This integration test fails due to the path typo duplicating SMBIOS_SEED_DIR/smbois_seed

Suggested change
client.write_to_file(f"{SMBIOS_SEED_DIR }/smbios_seed/meta-data", "")
client.write_to_file(f"{SMBIOS_SEED_DIR}/meta-data", "")

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor test fix then approve

@@ -362,18 +363,50 @@ def _merge_new_seed(cur, seeded):
return ret


def _parse_dmi() -> str:
serial = dmi.read_dmi_data("product_serial")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@holmanb , I think system-serial-number is "more right" here. We use the map @rvandegrift mentioned to read the DMI value from /sys. If that fails, we fallback to reading using dmidecode. It looks like specifying product_serial as a key will trigger the fallback code, but we should be using /sys if it's available.

broke the test case

Which test case? Mine will be broken unless you apply blackboxsw's suggested fix 😬

@holmanb holmanb force-pushed the holmanb/fix-serial-nocloud branch 3 times, most recently from 5f29713 to 6cc8de1 Compare February 26, 2024 22:56
@TheRealFalcon
Copy link
Member

TheRealFalcon commented Feb 27, 2024

There's an extra .swp file that got added, but otherwise LGTM! I'm still running through the test in various scenarios.

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are spending an integration test on NoCloudNet let's also double-dip and get a check on the deprecation logic too.

Also we need to pass the value we pull out of system-serial-number /sys/class/dmi/id/product_serial through our new parse_cmdline_or_dmi otherwise ds_detect is trying to find a full ds=nocloud-net;s=http://blah in (nocloud, nocloud-net) which never matches.

--- a/cloudinit/sources/DataSourceNoCloud.py
+++ b/cloudinit/sources/DataSourceNoCloud.py
@@ -389,7 +389,9 @@ class DataSourceNoCloudNet(DataSourceNoCloud):
             log_deprecated()
             return True
 
-        serial = (dmi.read_dmi_data("product_serial") or "").lower()
+        serial = sources.parse_cmdline_or_dmi(
+            dmi.read_dmi_data("system-serial-number") or ""
+        ).lower()
         if serial in (self.dsname.lower(), "nocloud-net"):
             LOG.debug(
                 "Machine is configured by dmi serial number to run on "
diff --git a/tests/integration_tests/datasources/.test_nocloud.py.swp b/tests/integration_tests/datasources/.test_nocloud.py.swp
deleted file mode 100644
index 97bda3f6b..000000000
Binary files a/tests/integration_tests/datasources/.test_nocloud.py.swp and /dev/null differ
diff --git a/tests/integration_tests/datasources/test_nocloud.py b/tests/integration_tests/datasources/test_nocloud.py
index 2e5ea458f..9ab7ecd5b 100644
--- a/tests/integration_tests/datasources/test_nocloud.py
+++ b/tests/integration_tests/datasources/test_nocloud.py
@@ -121,7 +121,7 @@ def setup_nocloud_network_serial(instance: LXDInstance):
             "set",
             instance.name,
             "raw.qemu=-smbios "
-            "type=1,serial=ds=nocloud;s=http://0.0.0.0/;h=myhost",
+            "type=1,serial=ds=nocloud-net;s=http://0.0.0.0/;h=myhost",
         ]
     )
 
@@ -185,4 +185,7 @@ class TestSmbios:
         client.write_to_file(f"{SMBIOS_SEED_DIR}/vendor-data", "")
         assert client.execute("cloud-init clean --logs").ok
         client.restart()
+        assert "'nocloud-net' datasource name is deprecated" in client.execute(
+            "cloud-init status --format json"
+        )
         assert client.execute("test -f /var/tmp/smbios_test_file").ok

assert client.execute("systemctl enable local-server.service").ok
client.write_to_file(
"/etc/cloud/cloud.cfg.d/91_do_not_use_lxd.cfg",
"datasource_list: [ NoCloud, None ]\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can even drop this to [ NoCloud ]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do, we won't exercise the ds_detect logic

"set",
instance.name,
"raw.qemu=-smbios "
"type=1,serial=ds=nocloud;s=http://0.0.0.0/;h=myhost",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also suggest the deprecated nocloud-net so we can also validate the deprecation warning?

Suggested change
"type=1,serial=ds=nocloud;s=http://0.0.0.0/;h=myhost",
"type=1,serial=ds=nocloud-net;s=http://0.0.0.0/;h=myhost",

log_deprecated()
return True

serial = (dmi.read_dmi_data("product_serial") or "").lower()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really should be systemd-serial-number to align with the data we are pulling in DataSourceNoCloud._get_data. Otherwise we fall-through to dmidecode on Ubuntu which emits the following error in logs

2024-02-26 23:26:42,721 - dmi.py[DEBUG]: failed dmidecode cmd: ['/usr/sbin/dmidecode', '--string', 'product_serial']
Unexpected error while running command.
Command: ['/usr/sbin/dmidecode', '--string', 'product_serial']
Exit code: 2
Reason: -
Stdout:
Stderr: Invalid string keyword: product_serial
        Valid string keywords are:
          bios-vendor
          bios-version
          bios-release-date
          bios-revision
          firmware-revision
          system-manufacturer
          system-product-name
          system-version
          system-serial-number

This causes us to fail to detect nocloud datasource and fallback to datasourcenone.

@holmanb holmanb force-pushed the holmanb/fix-serial-nocloud branch 3 times, most recently from 9c472cf to e2fc0b8 Compare February 27, 2024 00:56
- Verify that NoCloud smbios seed gets read for file://
- Verify that NoCloud smbios seed gets read from http:// and https://
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! tests pass locally here too.

@blackboxsw blackboxsw dismissed TheRealFalcon’s stale review February 27, 2024 01:22

James' comments are now resolved and final pass is now functional.

@holmanb holmanb merged commit 66b5ce9 into canonical:main Feb 27, 2024
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants