[2.3] Ephemeral boot environment does not renew DHCP leases #3057

Closed
ubuntu-server-builder opened this issue May 11, 2023 · 12 comments
Labels
launchpad Migrated from Launchpad

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #1732522

Launchpad details
affected_projects = ['maas', 'cloud-initramfs-tools (Ubuntu)', 'systemd (Ubuntu)']
assignee = None
assignee_name = None
date_closed = 2021-07-01T17:10:32.805900+00:00
date_created = 2017-11-15T19:40:26.537748+00:00
date_fix_committed = None
date_fix_released = None
id = 1732522
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1732522
milestone = None
owner = mpontillo
owner_name = Mike Pontillo
private = False
status = invalid
submitter = andreserl
submitter_name = Andres Rodriguez
tags = []
duplicates = []

Launchpad user Andres Rodriguez(andreserl) wrote on 2017-11-15T19:40:26.537748+00:00

I started commissioning+hardware testing on a machine, and while the machine was testing (for 2hrs+) I noticed that the IP address had disappeared. The machine has the MAC of 00:25:90:4c:e7:9e and an IP of 192.168.9.211 from the dynamic range.

Checking the MAAS server, I noticed that the IP/MAC was in the ARP table:

andreserl@maas:/var/lib/maas/dhcp$ arp -a | grep 211
192-168-9-211.maas (192.168.9.211) at 00:25:90:4c:e7:9e [ether] on bond-lan

The leases file contains the following: http://pastebin.ubuntu.com/25969442/
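
For readers who want to repeat this check, here is a minimal sketch of pulling the entries for one MAC out of a leases file. It assumes the file uses the ISC dhcpd `dhcpd.leases` text format with timestamps in UTC; the sample data, the `leases_for_mac` name and the dates in it are illustrative and are not the contents of the pastebin above.

```python
import re
from datetime import datetime, timezone

# One "lease <ip> { ... }" block per match. Assumes no "}" inside the block body.
LEASE_BLOCK = re.compile(r"^lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL | re.MULTILINE)
HARDWARE = re.compile(r"hardware ethernet\s+([0-9a-fA-F:]+);")
ENDS = re.compile(r"ends\s+\d+\s+(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2});")
# Anchored at line start so "next binding state" is not picked up.
STATE = re.compile(r"^\s*binding state\s+(\w+);", re.MULTILINE)

# Illustrative sample only (hypothetical dates), not the real leases file.
SAMPLE = """\
lease 192.168.9.211 {
  starts 3 2017/11/15 19:30:00;
  ends 3 2017/11/15 19:40:00;
  binding state active;
  next binding state free;
  hardware ethernet 00:25:90:4c:e7:9e;
}
lease 192.168.9.212 {
  starts 3 2017/11/15 19:31:00;
  ends 3 2017/11/15 19:41:00;
  binding state active;
  hardware ethernet 00:11:22:33:44:55;
}
"""


def leases_for_mac(text, mac):
    """Return lease entries for `mac`, in file order (later entries are newer)."""
    results = []
    for ip, body in LEASE_BLOCK.findall(text):
        hw = HARDWARE.search(body)
        if hw is None or hw.group(1).lower() != mac.lower():
            continue
        ends = ENDS.search(body)
        state = STATE.search(body)
        results.append({
            "ip": ip,
            "ends": (
                datetime.strptime(ends.group(1), "%Y/%m/%d %H:%M:%S")
                .replace(tzinfo=timezone.utc)
                if ends else None
            ),
            "state": state.group(1) if state else None,
        })
    return results


if __name__ == "__main__":
    for entry in leases_for_mac(SAMPLE, "00:25:90:4C:E7:9E"):
        print(entry["ip"], entry["state"], entry["ends"])
```

Comparing the `ends` timestamp of the last matching entry with the current time shows whether the server still considers the lease valid.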

Then I checked a couple areas of MAAS:

  • Device discovery, the machine wasn't there.
  • Subnet details page, the machine wasn't there (e.g. as observed)
@ubuntu-server-builder ubuntu-server-builder added the launchpad Migrated from Launchpad label May 11, 2023
@ubuntu-server-builder
Collaborator Author

Launchpad user Andres Rodriguez(andreserl) wrote on 2017-11-15T19:43:27.677840+00:00

Launchpad attachments: no-ip.png

@ubuntu-server-builder
Collaborator Author

Launchpad user Andres Rodriguez(andreserl) wrote on 2017-11-17T18:51:15.430454+00:00

So this is what I noticed: after about 10 minutes of running hardware testing, the IP disappeared from the PXE interface. See screenshots attached below.

@ubuntu-server-builder
Collaborator Author

Launchpad user Andres Rodriguez(andreserl) wrote on 2017-11-17T18:51:38.559820+00:00

Argh, the IP disappeared, but the machine still holds that IP on the PXE interface.

@ubuntu-server-builder
Collaborator Author

Launchpad user Andres Rodriguez(andreserl) wrote on 2017-11-17T18:53:31.801042+00:00

Ok, I'm being unclear again. This is the behavior:

  1. MAAS PXE boots to commissioning+testing, gets the IP address on the PXE interface.
  2. During commissioning, the second interface, which is connected to the same VLAN, gets another IP as part of the network discovery process.
  3. The machine then transitions to testing. After about 10 minutes of testing, I noticed that the machine no longer shows the IP for the PXE interface.
  4. After a few more minutes, the same happens for the second interface.
  5. That said, the machine still holds the IP on the PXE interface and I can connect to it just fine.

@ubuntu-server-builder
Collaborator Author

Launchpad user Andres Rodriguez(andreserl) wrote on 2017-11-17T18:53:41.928797+00:00

Ok, I'm being unclear again. This is the behavior:

  1. MAAS PXE boots to commissioning+testing, gets the IP address on the PXE interface.
  2. During commissioning, the second interface, which is connected to the same VLAN, gets another IP as part of the network discovery process.
  3. The machine then transitions to testing. After about 10 minutes of testing, I noticed that the machine no longer shows the IP for the PXE interface.
  4. After a few more minutes, the same happens for the second interface.
  5. That said, the machine still holds the IP on the PXE interface and I can connect to it just fine.

Launchpad attachments: running-testing-ip-address.png

@ubuntu-server-builder
Collaborator Author

Launchpad user Andres Rodriguez(andreserl) wrote on 2017-11-17T18:53:52.322896+00:00

Launchpad attachments: Running-test-ip-disappears.png

@ubuntu-server-builder
Collaborator Author

Launchpad user Mike Pontillo(mpontillo) wrote on 2017-11-17T22:48:21.576773+00:00

After triaging this issue, I think we need to discuss the proper fix with the cloud-init team.

When cloud-init configures the interfaces[1], I see that "bringup=True" is set. I'm guessing this means cloud-init should have run the equivalent of "ifup" on each interface after it has been configured. However, looking at "ifquery --state", this is not happening; only the loopback interface is configured.

ifquery --state

lo=lo

I also notice that no DHCP client is running. I guess this means that whatever IP address is currently assigned to eno1 (172.16.100.153) came from a different run of a DHCP client; possibly from the PXE process (but I don't see it passed up through /proc/cmdline).
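
The "no DHCP client is running" check can be scripted. The sketch below scans a Linux `/proc` tree for processes whose command name matches a small set of common DHCP clients. The set of names and the `running_dhcp_clients` helper are assumptions for illustration; nothing in this report says which client the ephemeral environment is expected to run.

```python
from pathlib import Path

# Common DHCP client process names (an assumption; extend as needed).
DHCP_CLIENT_NAMES = frozenset({"dhclient", "dhcpcd", "udhcpc"})


def running_dhcp_clients(proc_root="/proc", names=DHCP_CLIENT_NAMES):
    """Return sorted (pid, name) pairs for processes that look like DHCP clients."""
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a per-process directory
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited, or its entry is not readable
        if comm in names:
            found.append((int(entry.name), comm))
    return sorted(found)


if __name__ == "__main__":
    clients = running_dhcp_clients()
    if clients:
        for pid, name in clients:
            print(f"{name} running with pid {pid}")
    else:
        print("no DHCP client found; leases will not be renewed")
```

An empty result on a host that still holds a DHCP-assigned address would match the situation described here.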

Since the DHCP client is not (no longer?) running, that means the host is holding onto an IP address from the DHCP pool, has not released it, and is no longer renewing it. Therefore it eventually expires and is no longer shown in MAAS.
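
To make the timing concrete, here is a small sketch of the lease timers. RFC 2131 gives default renewal (T1) and rebinding (T2) times of 0.5 and 0.875 of the lease duration. The 600-second lease used in the example is an assumption, chosen because it would be consistent with the roughly 10 minutes reported earlier in this thread; the function names are hypothetical.

```python
def lease_timeline(lease_seconds):
    """Default DHCP timers (RFC 2131): T1 = 0.5 * lease, T2 = 0.875 * lease."""
    return {
        "t1_renew": 0.5 * lease_seconds,
        "t2_rebind": 0.875 * lease_seconds,
        "expires": float(lease_seconds),
    }


def server_side_state(lease_seconds, elapsed_since_ack, client_running):
    """Simplified model of how the DHCP server sees the lease.

    A running client is assumed to renew successfully at T1, so its lease
    never ages out. Without a client, the lease expires once the full
    duration has elapsed, even though the host keeps using the address.
    """
    if client_running:
        return "active"
    return "active" if elapsed_since_ack < lease_seconds else "expired"


if __name__ == "__main__":
    print(lease_timeline(600))
    print(server_side_state(600, elapsed_since_ack=700, client_running=False))
```

With no client running, the server-side lease expires after the full duration while the address stays configured on the interface, which is the mismatch described above.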

If I bring the interface up manually, all becomes well, and I see the IP address as expected in MAAS (though it handed out a different IP address, .160, so I am now holding both the expired IP and a new legitimate IP address from DHCP):

ifup eno1

Internet Systems Consortium DHCP Client 4.3.3
Copyright 2004-2015 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/eno1/ec:a8:6b:fd:aa:24
Sending on LPF/eno1/ec:a8:6b:fd:aa:24
Sending on Socket/fallback
DHCPDISCOVER on eno1 to 255.255.255.255 port 67 interval 3 (xid=0xc83a8b40)
DHCPDISCOVER on eno1 to 255.255.255.255 port 67 interval 6 (xid=0xc83a8b40)
DHCPREQUEST of 172.16.100.160 on eno1 to 255.255.255.255 port 67 (xid=0x408b3ac8)
DHCPOFFER of 172.16.100.160 from 172.16.100.2
DHCPACK of 172.16.100.160 from 172.16.100.2
Restarting ntp (via systemctl): ntp.service.
bound to 172.16.100.160 -- renewal in 269 seconds.

ifquery --state

eno1=eno1
lo=lo

# ip addr show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether ec:a8:6b:fd:aa:24 brd ff:ff:ff:ff:ff:ff
    inet 172.16.100.153/24 brd 172.16.100.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet 172.16.100.160/24 brd 172.16.100.255 scope global secondary eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::eea8:6bff:fefd:aa24/64 scope link
       valid_lft forever preferred_lft forever
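
The double-address state can be detected from the text output of `ip addr show`. The following is a rough sketch that lists the IPv4 addresses on an interface and flags the ones marked `secondary`; the `ipv4_addresses` helper is hypothetical, and the sample is the output quoted in this comment.

```python
import re

# Matches "inet A.B.C.D/NN ..." lines; "inet6" lines do not match.
INET_LINE = re.compile(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)/(\d+)\b(.*)$")

SAMPLE = """\
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether ec:a8:6b:fd:aa:24 brd ff:ff:ff:ff:ff:ff
    inet 172.16.100.153/24 brd 172.16.100.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet 172.16.100.160/24 brd 172.16.100.255 scope global secondary eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::eea8:6bff:fefd:aa24/64 scope link
       valid_lft forever preferred_lft forever
"""


def ipv4_addresses(ip_addr_output):
    """Extract IPv4 addresses from `ip addr show` text output."""
    addresses = []
    for line in ip_addr_output.splitlines():
        match = INET_LINE.match(line)
        if match is None:
            continue
        address, prefixlen, rest = match.groups()
        addresses.append({
            "address": address,
            "prefixlen": int(prefixlen),
            "secondary": "secondary" in rest.split(),
        })
    return addresses


if __name__ == "__main__":
    for addr in ipv4_addresses(SAMPLE):
        kind = "secondary" if addr["secondary"] else "primary"
        print(f'{addr["address"]}/{addr["prefixlen"]} ({kind})')
```

Both addresses are listed with `valid_lft forever`, so presumably nothing on the host will remove the stale .153 address on its own.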

@ubuntu-server-builder
Collaborator Author

Launchpad user Mike Pontillo(mpontillo) wrote on 2017-11-17T22:55:57.850918+00:00

Forgot to include the relevant portion of the cloud-init logs in my comment above.

@ubuntu-server-builder
Collaborator Author

Launchpad user Mike Pontillo(mpontillo) wrote on 2017-11-17T23:39:53.645720+00:00

I'm landing some debug logging in MAAS to help identify if this issue is occurring; regiond.log will contain lines like "Lease update: ..." when MAAS receives notifications about lease changes.

Note that this will NOT be a fix for this issue. But I don't think there's anything more I can do for this in MAAS itself.

@ubuntu-server-builder
Collaborator Author

Launchpad user Scott Moser(smoser) wrote on 2017-11-20T19:45:20.020598+00:00

This issue is discussed in a document at
https://docs.google.com/document/d/14xH2Q3VH_7ArXzRPhqogfACeOI0rmEinm_Q98imNWlc/

It's all about the "transition" of networking information from the initramfs environment, which is configured by the kernel command line, over to the "real root".

The Ubuntu foundations team is expecting to have DHCP transition correctly in 18.04.
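
As background for the "configured by the kernel command line" part, here is a sketch of reading the `ip=` parameter from a command line such as `/proc/cmdline`. It assumes the colon-separated field order documented for the Linux kernel's `ip=` option (the nfsroot documentation) and does not handle IPv6 literals. The example command line is hypothetical and is not taken from the machines in this bug.

```python
# Field order of the long form:
# ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
#    <dns0-ip>:<dns1-ip>:<ntp0-ip>
IP_FIELDS = (
    "client_ip", "server_ip", "gateway_ip", "netmask", "hostname",
    "device", "autoconf", "dns0_ip", "dns1_ip", "ntp0_ip",
)


def parse_ip_params(cmdline):
    """Return one dict per ip= token, keeping only the non-empty fields."""
    configs = []
    for token in cmdline.split():
        if not token.startswith("ip="):
            continue
        value = token[len("ip="):]
        if ":" not in value:
            # Short form such as ip=dhcp: the value is the autoconf method.
            configs.append({"autoconf": value})
            continue
        parts = value.split(":")
        configs.append(
            {name: part for name, part in zip(IP_FIELDS, parts) if part}
        )
    return configs


if __name__ == "__main__":
    # Hypothetical command line, for illustration only.
    example = "ro ip=::::node1:BOOTIF BOOTIF=01-ec-a8-6b-fd-aa-24"
    print(parse_ip_params(example))
```

Whatever the initramfs configures from these fields is the state that has to be handed over to the real root, including any DHCP lease that was obtained along the way.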

@ubuntu-server-builder
Collaborator Author

Launchpad user Dan Watkins(oddbloke) wrote on 2019-02-25T19:56:33.562664+00:00

Is this still an issue that needs cloud-init work?

@ubuntu-server-builder
Collaborator Author

Launchpad user Dan Streetman(ddstreet) wrote on 2021-06-30T21:13:06.752337+00:00

Please reopen if this is still an issue.

@ubuntu-server-builder ubuntu-server-builder closed this as not planned May 11, 2023