dnsmasq-dhcp - ignored #8158

Closed
ccycv opened this issue Oct 30, 2023 · 19 comments · Fixed by #8741

Comments

@ccycv

ccycv commented Oct 30, 2023

ISSUE TYPE
  • Bug Report
COMPONENT NAME
VR - DHCP
CLOUDSTACK VERSION
ACS 4.18.1 with router upgraded to 4.18.1
CONFIGURATION
Shared network configuration with multiple CIDRs in different zones. DHCP service provided by dnsmasq.
OS / ENVIRONMENT
CloudStack + VMware
SUMMARY

The DHCP service on the CloudStack guest router is ignoring DHCP requests for specific IP classes post deployment. Manually adding missing classes to the configuration resolves the issue temporarily.

STEPS TO REPRODUCE
- The issue occurs when the system is configured with multiple CIDRs and DHCP options for some classes are missing.
- Comparing the `dhcp-option` entries from a router with a working configuration to the one with issues shows a lack of certain DHCP options for the non-working IP classes.
- After adding the missing classes manually to `cloud.conf`, the issue seems to be resolved, suggesting a bug in DHCP configuration generation or application.
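
The comparison described in the steps above can be sketched in a few lines of Python. This is only an illustrative helper, not part of CloudStack: the function name `options_by_tag` is hypothetical, and it only looks at `dhcp-option=tag:...` lines with a numeric option code, so untagged lines such as `dhcp-option=eth0,26,1500` are skipped.

```python
import re
from collections import defaultdict

# Matches lines such as: dhcp-option=tag:interface-eth0-0,15,my.host
OPTION_RE = re.compile(r"^dhcp-option=tag:([^,]+),(\d+),")


def options_by_tag(conf_text):
    """Return {tag: sorted list of numeric DHCP option codes} for a cloud.conf."""
    summary = defaultdict(set)
    for line in conf_text.splitlines():
        match = OPTION_RE.match(line.strip())
        if match:
            summary[match.group(1)].add(int(match.group(2)))
    return {tag: sorted(codes) for tag, codes in summary.items()}


# Sample taken from the cloud.conf posted in this report (addresses are masked).
SAMPLE_CONF = """\
listen-address=127.0.0.1,23.29.xxx.130
dhcp-range=set:interface-eth0-0,23.29.xxx.130,static
dhcp-option=tag:interface-eth0-0,15,my.host
dhcp-option=tag:interface-eth0-0,6,23.29.xxx.130,8.8.8.8
dhcp-option=tag:interface-eth0-0,3,23.29.xxx.129
dhcp-option=eth0,26,1500
dhcp-option=tag:interface-eth0-0,1,255.255.255.224
"""

if __name__ == "__main__":
    print(options_by_tag(SAMPLE_CONF))
```

Running the same function on the `cloud.conf` of a working router and of an affected one, then comparing the two dictionaries, should make a missing tag or option stand out.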

root@r-6143-VMAMS:/etc/dnsmasq.d# cat cloud.conf
listen-address=127.0.0.1,23.29.xxx.130
dhcp-range=set:interface-eth0-0,23.29.xxx.130,static
dhcp-option=tag:interface-eth0-0,15,my.host
dhcp-option=tag:interface-eth0-0,6,23.29.xxx.130,8.8.8.8
dhcp-option=tag:interface-eth0-0,3,23.29.xxx.129
dhcp-option=eth0,26,1500
dhcp-option=tag:interface-eth0-0,1,255.255.255.224
root@r-6143-VMAMS:/etc/dnsmasq.d#


root@r-6143-VMAMS:~# grep -v '^#\|^$' /etc/dnsmasq.conf
domain-needed
bogus-priv
resolv-file=/etc/dnsmasq-resolv.conf
local=/my.host/
except-interface=eth1
except-interface=eth2
except-interface=lo
no-dhcp-interface=eth1
no-dhcp-interface=eth2
expand-hosts
domain=my.host
domain=my.host
domain=my.host
dhcp-range=217.79.xxx.129,static
dhcp-hostsfile=/etc/dhcphosts.txt
dhcp-ignore=tag:!known
dhcp-option=15,"my.host "
dhcp-option=vendor:MSFT,2,1i
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/opt/tftpboot
dhcp-lease-max=2100
domain=shape.host
log-facility=/var/log/dnsmasq.log
conf-dir=/etc/dnsmasq.d
dhcp-optsfile=/etc/dhcpopts.txt
localise-queries
dhcp-option=option:router,217.79.xxx.129
dhcp-option=6,217.79.xxx.130,8.8.8.8
dhcp-client-update

EXPECTED RESULTS
The expected result is that all the IP classes should be served by the DHCP without needing manual intervention. All IP ranges should have appropriate `dhcp-option` configurations applied automatically.
ACTUAL RESULTS
Some IP classes are being ignored by the DHCP service, leading to instances not receiving an IP upon boot. This issue is observed across different CIDRs in the same zone and requires manual addition of missing `dhcp-option` configurations to resolve.
@ccycv
Author

ccycv commented Oct 31, 2023

Additional info:

Tue 31 Oct 2023 08:32:15 AM UTC Setting up dnsmasq
2023-10-31 08:32:15,789 INFO Wrote edited file /etc/dnsmasq.d/cloud.conf
2023-10-31 08:32:15,789 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:32:15,790 INFO Attempting to delete entries from dnsmasq.leases file for VMs which are not on dhcphosts file
2023-10-31 08:32:15,790 ERROR Caught error while trying to delete entries from dnsmasq.leases file: [Errno 2] No such file or directory: '/etc/dhcphosts.txt'
2023-10-31 08:32:15,790 INFO Executing: systemctl restart dnsmasq
2023-10-31 08:32:15,825 INFO Service dnsmasq restart
2023-10-31 08:32:15,828 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:32:15,828 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:32:15,828 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:32:15,833 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:32:15,840 INFO Service dnsmasq reload
2023-10-31 08:32:19,962 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:32:19,962 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:32:19,963 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:32:19,966 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:32:19,973 INFO Service dnsmasq reload
2023-10-31 08:32:25,427 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:32:25,427 INFO Wrote edited file /var/lib/misc/dnsmasq.leases
2023-10-31 08:32:25,427 INFO Attempting to delete entries from dnsmasq.leases file for VMs which are not on dhcphosts file
2023-10-31 08:32:25,428 INFO Deleted 0 entries from dnsmasq.leases file
2023-10-31 08:32:25,428 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:32:25,432 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:32:25,438 INFO Service dnsmasq reload
2023-10-31 08:32:26,653 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:32:26,653 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:32:26,654 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:32:26,659 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:32:26,670 INFO Service dnsmasq reload
2023-10-31 08:38:48,777 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:38:48,777 INFO Wrote edited file /var/lib/misc/dnsmasq.leases
2023-10-31 08:38:48,778 INFO Attempting to delete entries from dnsmasq.leases file for VMs which are not on dhcphosts file
2023-10-31 08:38:48,778 INFO Deleted 0 entries from dnsmasq.leases file
2023-10-31 08:38:48,778 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:38:48,784 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:38:48,796 INFO Service dnsmasq reload
2023-10-31 08:41:04,420 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:41:04,420 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:41:04,420 INFO Attempting to delete entries from dnsmasq.leases file for VMs which are not on dhcphosts file
2023-10-31 08:41:04,427 INFO Deleted 1 entries from dnsmasq.leases file
2023-10-31 08:41:04,428 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:41:04,434 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:41:04,447 INFO Service dnsmasq reload
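
Most of the log above is routine reload noise; the one ERROR entry (the missing `/etc/dhcphosts.txt`) is easy to overlook. A minimal sketch for pulling entries of a given level out of such a log follows. It assumes the `YYYY-MM-DD HH:MM:SS,mmm LEVEL message` layout seen in the lines above, and the function name `entries_at_level` is made up for this example.

```python
import re

# Timestamp, level, message, as in:
# 2023-10-31 08:32:15,790 ERROR Caught error while trying to ...
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (.*)$")


def entries_at_level(log_text, level="ERROR"):
    """Return (timestamp, message) pairs for log lines at the given level."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) == level:
            hits.append((match.group(1), match.group(3)))
    return hits


# Three lines copied from the log in this comment.
SAMPLE_LOG = """\
Tue 31 Oct 2023 08:32:15 AM UTC Setting up dnsmasq
2023-10-31 08:32:15,789 INFO Wrote edited file /etc/dnsmasq.d/cloud.conf
2023-10-31 08:32:15,790 ERROR Caught error while trying to delete entries from dnsmasq.leases file: [Errno 2] No such file or directory: '/etc/dhcphosts.txt'
"""

if __name__ == "__main__":
    for timestamp, message in entries_at_level(SAMPLE_LOG):
        print(timestamp, message)
```

Lines that do not follow the assumed layout, such as the first "Setting up dnsmasq" line, are simply ignored.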

@MartinEmrich

MartinEmrich commented Oct 31, 2023

I have also started having issues with one of two shared networks here: my VR no longer assigns IP addresses. Restarting the network with cleanup (replacing the VR) did not help. Maybe it's the same issue (if not, I will open a new one).

In my case, the DHCP requests reach the VR but are actively declined, for whatever reason. My logs contain lines like `not using configured address xx.yy.zz.aa because it was previously declined`.

@ccycv can you check whether the requests are really ignored, or rather declined? Try running e.g. `dhclient eth0 -v` on a client and check whether there are DHCPDECLINE messages.

I can also see the DHCP requests on the VR (with `tcpdump -i eth0 ether src <client-vm-mac-address>`).
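
To tell "ignored" apart from "declined" on the VR side, one option is to scan the dnsmasq log for the message quoted above. The sketch below is an assumption-laden illustration: the function name `declined_addresses` is hypothetical, the sample log lines are invented for the example, and only the message text itself comes from this comment.

```python
import re

# The message text is the one quoted in this comment; everything before it
# on the log line (timestamp, process name) is ignored.
DECLINED_RE = re.compile(
    r"not using configured address (\S+) because it was previously declined"
)


def declined_addresses(log_lines):
    """Return the addresses named in 'previously declined' log lines, in order."""
    addresses = []
    for line in log_lines:
        match = DECLINED_RE.search(line)
        if match:
            addresses.append(match.group(1))
    return addresses


# Invented sample lines; a real log would be read from the dnsmasq log file.
SAMPLE_LINES = [
    "dnsmasq-dhcp: DHCPDISCOVER(eth0) 02:00:00:00:00:01",
    "dnsmasq-dhcp: not using configured address 192.0.2.10 "
    "because it was previously declined",
]

if __name__ == "__main__":
    print(declined_addresses(SAMPLE_LINES))
```

An empty result would suggest the requests are being ignored rather than declined, which points back at the configuration instead of an address conflict.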

@ccycv
Author

ccycv commented Oct 31, 2023 via email

@MartinEmrich

My /etc/dnsmasq.d/cloud.conf of the working and non-working router are identical (apart from the network address of course)

@DaanHoogland
Contributor

@ccycv @MartinEmrich is this still an issue for you?
Do you have any tangible data, such as configuration file contents, that goes with your issue?

@ccycv
Author

ccycv commented Jan 12, 2024

@DaanHoogland Something was changed in the new version; I have this issue with 3 separate CloudStack environments after the upgrade to 4.18.1. All the routers with a shared network that have more than one Guest IP range are affected. So when I check the /etc/dnsmasq.d/cloud.conf, the config includes only one Guest IP range. For now, I configured manually on each, but after router rebuild/cleanup, I must do it again.

So, what data do you need?

@MartinEmrich

Hmm IIRC in the end it was a second VM which had the same IP address for whatever reason. So I guess the VR already assigned the IP to that machine, and thus declined the second request from this machine (different MAC not in the table).

After deleting that rogue VM, the issue did not reappear for me. So most probably not the same as @ccycv's issue at all.

@DaanHoogland
Contributor

> Hmm IIRC in the end it was a second VM which had the same IP address for whatever reason. So I guess the VR already assigned the IP to that machine, and thus declined the second request from this machine (different MAC not in the table).
>
> After deleting that rogue VM, the issue did not reappear for me. So most probably not the same as @ccycv's issue at all.

OK, please create a new issue when you have a clear picture, @MartinEmrich.

@DaanHoogland
Contributor

> @DaanHoogland Something was changed in the new version; I have this issue with 3 separate CloudStack environments after the upgrade to 4.18.1. All the routers with a shared network that have more than one Guest IP range are affected. So when I check the /etc/dnsmasq.d/cloud.conf, the config includes only one Guest IP range. For now, I configured manually on each, but after router rebuild/cleanup, I must do it again.
>
> So, what data do you need?

A detailed description of the environment that causes it. What I understand so far:

  • multiple zones
  • a shared network

I am not quite sure about the part "multiple CIDR in different zones". How did you configure this, @ccycv?

@ccycv
Author

ccycv commented Jan 12, 2024 via email

@ccycv
Author

ccycv commented Jan 12, 2024

It is quite easy to reproduce: create a shared network (whether the deployment uses basic or advanced networking) and add multiple ranges to it, let's say 5. Then check the dnsmasq config in cloud.conf, or deploy VMs.

I also have a CloudStack deployment with a single basic zone, and it has the same issue.

@DaanHoogland
Contributor

Ah, I got confused about the multiple zones. Thanks, I'll have a go on reproducing it.

@weizhouapache
Member

weizhouapache commented Jan 12, 2024

> It is quite easy to reproduce: create a shared network (whether the deployment uses basic or advanced networking) and add multiple ranges to it, let's say 5. Then check the dnsmasq config in cloud.conf, or deploy VMs.
>
> I also have a CloudStack deployment with a single basic zone, and it has the same issue.

If it worked before, it might be a regression.
As I remember, it worked when I tested #5530 (see the description of the second commit in the PR).
There have been very few changes to the VR scripts recently, so it may be caused by some management server changes. This needs investigation.

DaanHoogland self-assigned this Jan 13, 2024
@ccycv
Author

ccycv commented Feb 6, 2024

Update:

I recently tested a new router and started with just one IP class. When I introduced an additional IP class, I noticed the cloud.conf file in the /etc/dnsmasq.d directory had been updated with the new configuration for the new range. However, it replaced the existing configuration instead of adding to it. It seems the process overwrites the cloud.conf file without retaining the previous settings.
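
The difference between replacing and accumulating per-range configuration can be illustrated with a small sketch. This is not CloudStack's VR code; `render_range`, `build_overwriting` and `build_accumulating` are hypothetical names, the addresses are documentation placeholders, and the per-range tag index (`interface-eth0-0`, `interface-eth0-1`, ...) is an assumption extrapolated from the single tag visible in the posted `cloud.conf`.

```python
def render_range(device, idx, router_ip, gateway, netmask):
    """Render dnsmasq lines for one guest IP range (hypothetical helper)."""
    tag = f"interface-{device}-{idx}"
    return [
        f"dhcp-range=set:{tag},{router_ip},static",
        f"dhcp-option=tag:{tag},3,{gateway}",
        f"dhcp-option=tag:{tag},1,{netmask}",
    ]


def build_overwriting(device, ranges):
    """Observed symptom: only the last range survives in the output."""
    lines = []
    for router_ip, gateway, netmask in ranges:
        # Each iteration replaces whatever was rendered before.
        lines = render_range(device, 0, router_ip, gateway, netmask)
    return lines


def build_accumulating(device, ranges):
    """Expected behaviour: one block of lines per range, each with its own tag."""
    lines = []
    for idx, (router_ip, gateway, netmask) in enumerate(ranges):
        lines.extend(render_range(device, idx, router_ip, gateway, netmask))
    return lines


# Placeholder ranges: (router IP, gateway, netmask).
RANGES = [
    ("192.0.2.2", "192.0.2.1", "255.255.255.240"),
    ("192.0.2.34", "192.0.2.33", "255.255.255.224"),
]

if __name__ == "__main__":
    print("\n".join(build_overwriting("eth0", RANGES)))
    print("---")
    print("\n".join(build_accumulating("eth0", RANGES)))
```

With two ranges, the overwriting variant yields three lines for the second range only, while the accumulating variant yields six lines covering both.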

@weizhouapache
Member

> Update:
>
> I recently tested a new router and started with just one IP class. When I introduced an additional IP class, I noticed the cloud.conf file in the /etc/dnsmasq.d directory had been updated with the new configuration for the new range. However, it replaced the existing configuration instead of adding to it. It seems the process overwrites the cloud.conf file without retaining the previous settings.

@ccycv
Can you give more details, for example the IP of the VM and the content of cloud.conf?

@ccycv
Author

ccycv commented Feb 6, 2024

@weizhouapache It is easy to replicate. Create a shared-type network in a CloudStack deployment with advanced networking, then add an additional Guest IP range, and you will see how cloud.conf is populated. I didn't have this issue in the previous version.

This was the configuration with the initial Guest IP range:

listen-address=127.0.0.1,181.xx.xxx.178
dhcp-range=set:interface-eth0-0,181.xx.xxx.178,static
dhcp-option=tag:interface-eth0-0,15,shape.host
dhcp-option=tag:interface-eth0-0,6,181.xx.xxx.178,8.8.8.8,8.8.4.4
dhcp-option=tag:interface-eth0-0,3,181.xx.xxx.177
dhcp-option=eth0,26,1500
dhcp-option=tag:interface-eth0-0,1,255.255.255.240

After I added an additional Guest IP range, it was replaced with the last added one.

listen-address=127.0.0.1,181.xx.xxx.245
dhcp-range=set:interface-eth0-0,181.xx.xxx.245,static
dhcp-option=tag:interface-eth0-0,15,shape.host
dhcp-option=tag:interface-eth0-0,6,181.xx.xxx.245,8.8.8.8,8.8.4.4
dhcp-option=tag:interface-eth0-0,3,181.xx.xxx.225
dhcp-option=eth0,26,1500
dhcp-option=tag:interface-eth0-0,1,255.255.255.224

Before, when there was no issue, there was a config for each class in this cloud.conf; now, no matter how many Guest IP ranges you add, you will have only one.
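
A quick way to confirm the symptom on a router is to check that every expected range has its own `dhcp-range` line. The sketch below assumes each guest IP range should appear as `dhcp-range=...,<router-ip>,static`, as in the two configurations above; the function name `missing_ranges` and the list of expected addresses are supplied by the caller and are not something CloudStack provides.

```python
import re

# Matches both tagged and untagged static ranges, for example:
#   dhcp-range=set:interface-eth0-0,181.xx.xxx.245,static
#   dhcp-range=217.79.xxx.129,static
RANGE_RE = re.compile(r"^dhcp-range=(?:set:[^,]+,)?([^,]+),static$")


def missing_ranges(conf_text, expected_router_ips):
    """Return the expected router IPs that have no dhcp-range line."""
    configured = set()
    for line in conf_text.splitlines():
        match = RANGE_RE.match(line.strip())
        if match:
            configured.add(match.group(1))
    return [ip for ip in expected_router_ips if ip not in configured]


# The cloud.conf shown above after the second range was added (masked addresses).
CONF_AFTER = """\
listen-address=127.0.0.1,181.xx.xxx.245
dhcp-range=set:interface-eth0-0,181.xx.xxx.245,static
dhcp-option=tag:interface-eth0-0,15,shape.host
dhcp-option=tag:interface-eth0-0,6,181.xx.xxx.245,8.8.8.8,8.8.4.4
dhcp-option=tag:interface-eth0-0,3,181.xx.xxx.225
dhcp-option=eth0,26,1500
dhcp-option=tag:interface-eth0-0,1,255.255.255.224
"""

if __name__ == "__main__":
    print(missing_ranges(CONF_AFTER, ["181.xx.xxx.178", "181.xx.xxx.245"]))
```

For the configuration above, the first range (`181.xx.xxx.178`) is reported as missing, which matches the overwrite behaviour described in this comment.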

@weizhouapache
Member

weizhouapache commented Feb 6, 2024

> @weizhouapache It is easy to replicate. Create a shared-type network in a CloudStack deployment with advanced networking, then add an additional Guest IP range, and you will see how cloud.conf is populated. I didn't have this issue in the previous version.
>
> This was the configuration with the initial Guest IP range:
>
> listen-address=127.0.0.1,181.xx.xxx.178
> dhcp-range=set:interface-eth0-0,181.xx.xxx.178,static
> dhcp-option=tag:interface-eth0-0,15,shape.host
> dhcp-option=tag:interface-eth0-0,6,181.xx.xxx.178,8.8.8.8,8.8.4.4
> dhcp-option=tag:interface-eth0-0,3,181.xx.xxx.177
> dhcp-option=eth0,26,1500
> dhcp-option=tag:interface-eth0-0,1,255.255.255.240
>
> After I added an additional Guest IP range, it was replaced with the last added one.
>
> listen-address=127.0.0.1,181.xx.xxx.245
> dhcp-range=set:interface-eth0-0,181.xx.xxx.245,static
> dhcp-option=tag:interface-eth0-0,15,shape.host
> dhcp-option=tag:interface-eth0-0,6,181.xx.xxx.245,8.8.8.8,8.8.4.4
> dhcp-option=tag:interface-eth0-0,3,181.xx.xxx.225
> dhcp-option=eth0,26,1500
> dhcp-option=tag:interface-eth0-0,1,255.255.255.224
>
> Before, when there was no issue, there was a config for each class in this cloud.conf; now, no matter how many Guest IP ranges you add, you will have only one.

@ccycv
Thanks for sharing. I was able to reproduce the issue.

It worked fine when I tested #5530 (9f5ac89). Maybe some changes after that caused it.
cc @DaanHoogland
cc @DaanHoogland

JoaoJandre added this to the 4.18.3 milestone Mar 21, 2024
rohityadavcloud modified the milestones: 4.18.3, 4.19.1.0 Apr 30, 2024
@rohityadavcloud
Member

Has this been fixed now @weizhouapache ?

@weizhouapache
Member

> Has this been fixed now @weizhouapache ?

@rohityadavcloud
Yes, it has been fixed by #8741.
Closing.
