cc_set_hostname: ignore /var/lib/cloud/data/set-hostname if it's empty#1967

Merged: TheRealFalcon merged 2 commits into canonical:main from esposem:buffering on Jan 18, 2023
Conversation

esposem commented Jan 17, 2023

Contributor

Proposed Commit Message

If the file exists but is empty, do nothing.
Otherwise cloud-init will crash because it does not handle the empty file.

RHBZ: 2140893

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
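
As a rough illustration of the behavior described in the commit message (this is not the actual cloud-init patch), the guard can be sketched as below. The function name `load_previous_state` is hypothetical, and the sketch assumes the artifact file holds a JSON object whenever it is non-empty.

```python
import json
import os


def load_previous_state(path):
    """Return the JSON state stored at `path`, or {} if there is none.

    Hypothetical helper, not part of cloud-init: a missing file and an
    empty (or whitespace-only) file are both treated as "no previous state".
    """
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    if not raw.strip():
        # json.loads("") would raise JSONDecodeError ("Expecting value"),
        # which is the failure shown in the service log of this report.
        return {}
    return json.loads(raw)
```

With a guard like this, an empty `set-hostname` artifact is handled the same way as a first boot: the caller sees no previous hostname and simply sets it again.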

Test Steps

Start an instance running RHEL-9.1 on an AWS t4g.large system. After rebooting the system via sysrq 'b', it is not accessible via ssh and the cloud-init service fails to start.
$ cat cloud-init.service.log
× cloud-init.service - Initial cloud-init job (metadata service crawler)
     Loaded: loaded (/usr/lib/systemd/system/cloud-init.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Tue 2022-11-08 02:33:59 UTC; 19min ago
    Process: 703 ExecStart=/usr/bin/cloud-init init (code=exited, status=1/FAILURE)
   Main PID: 703 (code=exited, status=1/FAILURE)
        CPU: 424ms

Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:     return _default_decoder.decode(s)
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:   File "/usr/lib64/python3.9/json/decoder.py", line 337, in decode
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:   File "/usr/lib64/python3.9/json/decoder.py", line 355, in raw_decode
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:     raise JSONDecodeError("Expecting value", s, err.value) from None
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]: ------------------------------------------------------------
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: cloud-init.service: Main process exited, code=exited, status=1/FAILURE
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: cloud-init.service: Failed with result 'exit-code'.
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: Failed to start Initial cloud-init job (metadata service crawler).
$ cat journal.log|grep sshd
Nov 08 02:33:57 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: Created slice Slice /system/sshd-keygen.
Nov 08 02:33:57 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: Reached target sshd-keygen.target.
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal sshd[949]: sshd: no hostkeys available -- exiting.
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: sshd.service: Main process exited, code=exited, status=1/FAILURE
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: sshd.service: Failed with result 'exit-code'.
Nov 08 02:34:41 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: sshd.service: Scheduled restart job, restart counter is at 1.
RHEL Version:
RHEL-9.1(5.14.0-162.6.1.el9_1.x86_64)

How reproducible:
50%

Steps to Reproduce:
1. Create an aws t4g.large instance using RHEL-9.1.0_HVM-20221101
2. Trigger a system reboot ('echo b > /proc/sysrq-trigger & echo b > /proc/sysrq-trigger')
3. Repeat steps 1-2 if the issue does not reproduce.
4. Optionally, reproduce it automatically:
$ os-tests --user ec2-user --keyfile /home/virtqe_s1.pem --platform_profile /home/aws.yaml -p test_reboot_simultaneous


Actual results:
The system cannot be accessed via ssh after boot.

Expected results:
The system boots and can be accessed normally.

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

If the file exists but is empty, do nothing.
Otherwise cloud-init will crash because it does not handle the empty file.

RHBZ: 2140893

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
aciba90 self-assigned this Jan 17, 2023

aciba90 left a comment

Contributor

Thanks, @esposem, for the fix.

It looks reasonable to be defensive in the case that the artifact file is empty.

Do you happen to know why that file is empty in the first place? AFAICT it should only be written by cc_set_hostname, with valid non-empty JSON content.

Could we please cover this case with a unit test? Something along the lines of:

diff --git a/tests/unittests/config/test_cc_set_hostname.py b/tests/unittests/config/test_cc_set_hostname.py
index 3d1d86eef..57f531200 100644
--- a/tests/unittests/config/test_cc_set_hostname.py
+++ b/tests/unittests/config/test_cc_set_hostname.py
@@ -2,6 +2,7 @@

 import logging
 import os
+from pathlib import Path
 import shutil
 import tempfile
 from io import BytesIO
@@ -242,5 +243,21 @@ class TestHostname(t_help.FilesystemMockingTestCase):
             str(ctx_mgr.exception),
         )

+    def test_ignore_empty_previous_artifact_file(self):
+        cfg = {
+            "hostname": "blah",
+            "fqdn": "blah.blah.blah.yahoo.com",
+        }
+        distro = self._fetch_distro("debian")
+        paths = helpers.Paths({"cloud_dir": self.tmp})
+        ds = None
+        cc = cloud.Cloud(ds, paths, {}, distro, None)
+        self.patchUtils(self.tmp)
+        prev_fn = Path(cc.get_cpath("data")) / "set-hostname"
+        prev_fn.touch()
+        cc_set_hostname.handle("cc_set_hostname", cfg, cc, LOG, [])
+        contents = util.load_file("/etc/hostname")
+        self.assertEqual("blah", contents.strip())
+

 # vi: ts=4 expandtab

esposem commented Jan 18, 2023

Contributor Author

Hi @aciba90, thanks for the review!

No, honestly we don't know why the file is empty, but our QA managed to reproduce the bug.

I'll add your test to my PR, thanks!

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

aciba90 left a comment

Contributor

LGTM, thank you!

TheRealFalcon merged commit 9c7502a into canonical:main on Jan 18, 2023