Network and SSH problems using bento/ubuntu-20.04-arm64 and bento/ubuntu-22.04-arm64 with Vagrant #1473

Closed
Evantage-WS opened this issue Feb 13, 2023 · 31 comments

Comments

@Evantage-WS

Version

bento/ubuntu-20.04-arm64 v202301.20.0

Environment

Vagrant 2.3.4

Scenario

When installing multiple VMs, logging in to a VM with the Vagrant SSH key randomly fails at the VM creation stage, and vagrant up eventually times out.

As stated, it happens randomly. When a login into a VM does succeed, the internal network (172.16.0.0/24) often keeps working, but the internet connection works for about a minute and is then unreachable for 5 minutes or more, each time at a different stage of my Vagrantfile. I use multiple shell provisioners.

This all happens with the 202301.20.0 version. When using the 202112.19.0 version it all works fine, every time.
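
As a stopgap until fixed boxes are published, the box can be pinned to the release reported as working above. This is a minimal sketch, assuming a single-VM Vagrantfile; box_version and box_check_update are standard Vagrant settings, but whether pinning is acceptable depends on your setup:

```ruby
# Vagrantfile (sketch): pin the box to the release that was reported as working.
Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-20.04-arm64"
  # Version taken from the report above; adjust once a fixed release exists.
  config.vm.box_version = "202112.19.0"
  # Skip the check for newer box releases on every `vagrant up`.
  config.vm.box_check_update = false
end
```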

Steps to Reproduce

You need to set up multiple VMs, use a couple of shell provisioners, and download data. In my case these are Helm charts and apt packages.

Expected Result

With 202301.20.0 the same result as using 202112.19.0

Actual Result

I tried it 11 times with 202301.20.0 and it failed every time; I tested it 4 times with 202112.19.0 and it succeeded all 4 times. I also tried bento/ubuntu-22.04-arm64, but the same login problems arise. In my 2 tests with bento/ubuntu-22.04-arm64 I was unable to get as far as the shell provisioners.

@Evantage-WS Evantage-WS added the Status: Untriaged An issue that has yet to be triaged. label Feb 13, 2023
@sford

sford commented Feb 17, 2023

This feels similar to #1421. There is a bug where /etc/machine-id isn't unique across VMs, which can impact DHCP and cause network problems. Fix #1471 has been merged, but the boxes haven't been updated on Vagrant Cloud yet.
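
One way to check whether this applies is to print the machine ID from each VM and compare the values. This is a rough sketch, assuming two VMs with the hypothetical names vm1 and vm2 and that at least one login succeeds. Identical IDs across VMs would point at this bug:

```ruby
# Vagrantfile (sketch): print /etc/machine-id from every VM for comparison.
Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-20.04-arm64"

  # Hypothetical VM names, for illustration only.
  %w[vm1 vm2].each do |name|
    config.vm.define name do |node|
      node.vm.hostname = name
      # The printed IDs should differ between VMs cloned from the same box.
      node.vm.provision "shell", inline: "cat /etc/machine-id"
    end
  end
end
```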

@Evantage-WS
Author

@sford, thanks for your info, I will test this next week. Enjoy the weekend

@Evantage-WS
Author

Hello @sford, it is indeed the same. Does anybody know when the images are going to be updated? Thanks!

@Stromweld
Collaborator

I plan to update the images at the beginning of March, and I’m hoping to keep a quarterly cadence. With the new HCL rewrite and dedicated builders for testing, I’m currently working on getting the builds stable and working. Most x86_64 builds are working for VirtualBox, Parallels, and VMware, but aarch64, Hyper-V, and QEMU machines still need a bit of tweaking to get builds working.

@Stromweld
Collaborator

With that, if you can submit PRs for any fixes you find while testing, that’ll be greatly appreciated.

@Evantage-WS
Author

Hi, thanks for the reply. When I find something I will certainly look at it.

@Stromweld
Collaborator

Stromweld commented Mar 5, 2023

Can someone test one of these images to verify it fixes the issues you guys are seeing? If so then I can work on getting them officially published. https://github.com/chef/bento/actions/runs/4328485365#artifacts

@Evantage-WS
Author

Sure, but I need some help with testing the aarch64 Ubuntu 22.10 image. How do I specify this image in the Vagrantfile?

@Stromweld
Collaborator

If you download the image you want to test, you can use the vagrant box add command to give it a name like test/ubuntu-version and point it to the downloaded file. Then in your Vagrantfile you’d set the box to the name you gave it.
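
As a sketch of those two steps (the file name and the box name here are placeholders, not the actual artifact names from the build job):

```ruby
# Step 1 (run once on the host, shown here as a comment):
#   vagrant box add --name test/ubuntu-2210 ./downloaded-artifact.box
#
# Step 2: reference the locally added box by the name chosen in step 1.
Vagrant.configure("2") do |config|
  config.vm.box = "test/ubuntu-2210"
end
```

Running vagrant box list afterwards should show the new name next to the provider it was built for.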

@Evantage-WS
Author

Evantage-WS commented Mar 6, 2023

Hi,

I have done it and it is picking up the new box, but I get an error when changing some SSH parameters. FYI, this works in 20.04:

BTW, this is a config with only 1 vm in it.

    cluster1-single-node-cluster: -------------------------------------------------------------------------------------------------------------
    cluster1-single-node-cluster: ====  base.sh: START: stop and disable firewall
    cluster1-single-node-cluster: Synchronizing state of ufw.service with SysV service script with /lib/systemd/systemd-sysv-install.
    cluster1-single-node-cluster: Executing: /lib/systemd/systemd-sysv-install disable ufw
    cluster1-single-node-cluster: Removed "/etc/systemd/system/multi-user.target.wants/ufw.service".
    cluster1-single-node-cluster: ====  base.sh: END  : stop and disable firewall
    cluster1-single-node-cluster: ====  base.sh: START: copy SSH public key to root
    cluster1-single-node-cluster: ====  base.sh: END  : copy SSH public key to root
    cluster1-single-node-cluster: ====  base.sh: START: change SSH PasswordAuthentication
    cluster1-single-node-cluster: Failed to reload sshd.service: Unit sshd.service not found.
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

Maybe something changed in Ubuntu regarding SSH, or is something corrupt? https://codetryout.com/failed-to-restart-sshd-service-unit-sshd-service-not-found/
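
On Ubuntu the OpenSSH server unit is normally named ssh.service, and sshd.service may exist only as an alias, so a provisioning step that reloads sshd can fail on some images. Below is a hedged sketch of a shell provisioner that tolerates both names. It is an assumption that this corresponds to what base.sh does, since that script is not shown here:

```ruby
# Vagrantfile (sketch): reload the SSH daemon under whichever unit name exists.
Vagrant.configure("2") do |config|
  config.vm.provision "shell", inline: <<-SHELL
    # Use sshd.service if systemd knows it, otherwise fall back to ssh.service.
    if systemctl cat sshd.service >/dev/null 2>&1; then
      unit=sshd.service
    else
      unit=ssh.service
    fi
    # Reload (or restart) the unit if it is running; do nothing otherwise.
    systemctl try-reload-or-restart "$unit"
  SHELL
end
```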

@Evantage-WS
Author

Evantage-WS commented Mar 6, 2023

Hi, testing with 22.10 and 4 VMs gives problems:

This is with 20.04, working:

==> management-loadbalancer1: Registering VM image from the base box 'bento/ubuntu-20.04-arm64'...
==> management-loadbalancer1: Creating new virtual machine as a linked clone of the box image...
==> management-loadbalancer1: Unregistering the box VM image...
==> management-loadbalancer1: Setting the default configuration for VM...
==> management-loadbalancer1: Checking if box 'bento/ubuntu-20.04-arm64' version '202301.20.0' is up to date...
==> management-loadbalancer1: Setting the name of the VM: management-loadbalancer1
==> management-loadbalancer1: Preparing network interfaces based on configuration...
    management-loadbalancer1: Adapter 0: shared
    management-loadbalancer1: Adapter 1: hostonly
==> management-loadbalancer1: Clearing any previously set network interfaces...
==> management-loadbalancer1: Running 'pre-boot' VM customizations...
==> management-loadbalancer1: Booting VM...
==> management-loadbalancer1: Waiting for machine to boot. This may take a few minutes...
    management-loadbalancer1: SSH address: :22
    management-loadbalancer1: SSH username: vagrant
    management-loadbalancer1: SSH auth method: private key
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: 
    management-loadbalancer1: Vagrant insecure key detected. Vagrant will automatically replace
    management-loadbalancer1: this with a newly generated keypair for better security.
    management-loadbalancer1: 
    management-loadbalancer1: Inserting generated public key within guest...
    management-loadbalancer1: Removing insecure key from the guest if it's present...
    management-loadbalancer1: Key inserted! Disconnecting and reconnecting using new SSH key...
==> management-loadbalancer1: Machine booted and ready!
==> management-loadbalancer1: Checking for Parallels Tools installed on the VM...

This is with 22.10, I only changed the box in the config, failing:

==> management-loadbalancer1: Registering VM image from the base box 'test/ubuntu-2210'...
==> management-loadbalancer1: Creating new virtual machine as a linked clone of the box image...
==> management-loadbalancer1: Unregistering the box VM image...
==> management-loadbalancer1: Setting the default configuration for VM...
==> management-loadbalancer1: Setting the name of the VM: management-loadbalancer1
==> management-loadbalancer1: Preparing network interfaces based on configuration...
    management-loadbalancer1: Adapter 0: shared
    management-loadbalancer1: Adapter 1: hostonly
==> management-loadbalancer1: Clearing any previously set network interfaces...
==> management-loadbalancer1: Running 'pre-boot' VM customizations...
==> management-loadbalancer1: Booting VM...
==> management-loadbalancer1: Waiting for machine to boot. This may take a few minutes...
    management-loadbalancer1: SSH address: :22
    management-loadbalancer1: SSH username: vagrant
    management-loadbalancer1: SSH auth method: private key
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...

It times out.

@Stromweld
Collaborator

Thanks. What do you get when testing 22.04? I know that starting with 22.04 and RHEL 9 based machines they finally removed the old SSH key algorithms based on SHA-1 and keys of less than 2048 bits. Since that was the old default, it requires users to generate new SSH keys for auth. I’m not sure if there are any other SSH changes that would be affecting the service.

@Stromweld
Collaborator

Is your output for the working 20.04 box based on the new version from the PR build that is yet to be released, or on the old version that is currently in Vagrant Cloud?

@Evantage-WS
Author

It is the one in Vagrant Cloud

@Evantage-WS
Author

> Thanks. What do you get with testing 22.04? I know starting with 22.04 and rhel 9 based machines they finally removed the old ssh key algorithms based on sha1 and less than 2048 bits. Since that was the old default it requires users to generate new ssh keys for auth. I’m not sure if there are any other ssh changes that would be affecting the service.

I will test it with 22.04 today, give me a few hours

@Evantage-WS
Author

With 22.04 same problem as with 22.10:

==> management-loadbalancer1: Registering VM image from the base box 'test/ubuntu-2204'...
==> management-loadbalancer1: Creating new virtual machine as a linked clone of the box image...
==> management-loadbalancer1: Unregistering the box VM image...
==> management-loadbalancer1: Setting the default configuration for VM...
==> management-loadbalancer1: Setting the name of the VM: management-loadbalancer1
==> management-loadbalancer1: Fixed port collision for 22 => 2222. Now on port 2200.
==> management-loadbalancer1: Preparing network interfaces based on configuration...
    management-loadbalancer1: Adapter 0: shared
    management-loadbalancer1: Adapter 1: hostonly
==> management-loadbalancer1: Clearing any previously set network interfaces...
==> management-loadbalancer1: Running 'pre-boot' VM customizations...
==> management-loadbalancer1: Booting VM...
==> management-loadbalancer1: Waiting for machine to boot. This may take a few minutes...
    management-loadbalancer1: SSH address: :22
    management-loadbalancer1: SSH username: vagrant
    management-loadbalancer1: SSH auth method: private key
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...

@Stromweld
Collaborator

Can you try the 20.04 release in the linked build job? Want to rule out OS ssh changes or Bento build script changes.

@Evantage-WS
Author

Hi, the 20.04 in the linked build job is working correctly

@Stromweld
Collaborator

OK, thanks. Since all versions are running the same scripts/setup, it probably has to do with the dropping of old SSH algorithms. Can you see whether updating your keys or adding the weak ciphers to sshd_config fixes the issue for you? This article explains it pretty well: https://askubuntu.com/questions/1409105/ubuntu-22-04-ssh-the-rsa-key-isnt-working-since-upgrading-from-20-04

Thanks again for all the help testing.
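
For reference, the kind of sshd_config change mentioned above can be sketched as a provisioner that writes a drop-in file. This assumes the image's sshd_config includes /etc/ssh/sshd_config.d/*.conf (the Ubuntu 22.04 default), that the service is named ssh.service, and that Vagrant can already log in to run the provisioner. The drop-in file name is hypothetical. Re-enabling ssh-rsa weakens security, so generating a new key is usually preferable:

```ruby
# Vagrantfile (sketch): re-enable RSA/SHA-1 signatures via an sshd drop-in.
Vagrant.configure("2") do |config|
  config.vm.provision "shell", inline: <<-SHELL
    # Hypothetical drop-in file name, chosen for illustration.
    conf=/etc/ssh/sshd_config.d/99-legacy-rsa.conf
    echo 'PubkeyAcceptedAlgorithms +ssh-rsa' > "$conf"
    echo 'HostKeyAlgorithms +ssh-rsa' >> "$conf"
    # Validate the configuration before asking the daemon to pick it up.
    sshd -t && systemctl try-reload-or-restart ssh.service
  SHELL
end
```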

@sford

sford commented Mar 6, 2023

Hi @Stromweld, this is probably the wrong place to document since I am not using arm64, but with respect to testing recent bug fixes... I successfully tested ubuntu-22.04-x86_64.virtualbox.box from that github actions link. I didn't run into any problems with DHCP, networking, or duplicate IPs 🎉

Some more details... I am using vagrant-libvirt so I installed ubuntu-22.04-x86_64.virtualbox.box and mutated it to libvirt provider.

Here are logs showing bento/ubuntu-22.04 and all VMs having unique IP addresses:

==> vm3:  -- Base box:          bento/ubuntu-22.04
==> vm2:  -- Base box:          bento/ubuntu-22.04
==> vm1:  -- Base box:          bento/ubuntu-22.04
==> vm6:  -- Base box:          bento/ubuntu-22.04
==> vm5:  -- Base box:          bento/ubuntu-22.04
==> vm7:  -- Base box:          bento/ubuntu-22.04
==> vm4:  -- Base box:          bento/ubuntu-22.04

vm1: SSH address: 192.168.121.121:22
vm2: SSH address: 192.168.121.11:22
vm6: SSH address: 192.168.121.126:22
vm5: SSH address: 192.168.121.190:22
vm7: SSH address: 192.168.121.210:22
vm4: SSH address: 192.168.121.140:22
vm3: SSH address: 192.168.121.69:22

I also ran our test suite on these VMs. We configure them with Puppet (sorry chef!) and run our tests. Everything passed and we didn't run into any problems.

Thanks @Stromweld !

@Evantage-WS
Author

> ok thanks, since all versions are running the same scripts/setup that means it probably has to do with dropping of old ssh algorithms. Can you see if updating your keys or adding to the sshd_config for weak ciphers fixes your issue for you. This article explains it pretty well. https://askubuntu.com/questions/1409105/ubuntu-22-04-ssh-the-rsa-key-isnt-working-since-upgrading-from-20-04.
>
> Thanks again for all the help testing.

Hi, it is logging in as the vagrant user, so I guess it is using a self-generated keypair. For my own keypair I have done exactly as written in your link, but that is for my own user.

Log:

==> management-loadbalancer1: Registering VM image from the base box 'test/ubuntu-2210'...
==> management-loadbalancer1: Creating new virtual machine as a linked clone of the box image...
==> management-loadbalancer1: Unregistering the box VM image...
==> management-loadbalancer1: Setting the default configuration for VM...
==> management-loadbalancer1: Setting the name of the VM: management-loadbalancer1
==> management-loadbalancer1: Fixed port collision for 22 => 2222. Now on port 2200.
==> management-loadbalancer1: Preparing network interfaces based on configuration...
    management-loadbalancer1: Adapter 0: shared
    management-loadbalancer1: Adapter 1: hostonly
==> management-loadbalancer1: Clearing any previously set network interfaces...
==> management-loadbalancer1: Running 'pre-boot' VM customizations...
==> management-loadbalancer1: Booting VM...
==> management-loadbalancer1: Waiting for machine to boot. This may take a few minutes...
    management-loadbalancer1: SSH address: :22
    management-loadbalancer1: SSH username: vagrant
    management-loadbalancer1: SSH auth method: private key
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...

@Stromweld
Collaborator

@sford Thanks for letting me know. Glad to hear that's fixed.

@Evantage-WS Thanks for all the testing. I'll try to dig into it some more and see if I can figure out the issue/fix.

@Evantage-WS
Author

Great, thanks

@Evantage-WS
Author

Hi @Stromweld, I do see a new release, is the problem I reported fixed so I can test it?

@Stromweld
Collaborator

There hasn't been a new release of the Vagrant Ubuntu boxes for Arm architecture yet. I'm hoping to get those built by the end of this week.

@Stromweld Stromweld self-assigned this Mar 20, 2023
@Evantage-WS
Author

Ok, thanks.

xinkecf35 added a commit to xinkecf35/vagrant-k8s-experiments that referenced this issue Apr 12, 2023
Turns out the issue with duplicate DHCP address is an issue with the box itself in chef/bento#1473
@Stromweld
Collaborator

I don't have a working ubuntu-20.04-arm64 box yet, but there is the 22.04-arm box with the fixes mentioned above. Does this fit the need, and does this box also fix the DHCP issue? https://app.vagrantup.com/bento/boxes/ubuntu-22.04-arm64

@Evantage-WS
Author

Evantage-WS commented May 8, 2023

Hi @Stromweld, just a quick test, unfortunately the same problem:

==> management-loadbalancer1: Box 'bento/ubuntu-22.04-arm64' could not be found. Attempting to find and install...
    management-loadbalancer1: Box Provider: parallels
    management-loadbalancer1: Box Version: 202304.27.0
==> management-loadbalancer1: Loading metadata for box 'bento/ubuntu-22.04-arm64'
    management-loadbalancer1: URL: https://vagrantcloud.com/bento/ubuntu-22.04-arm64
==> management-loadbalancer1: Adding box 'bento/ubuntu-22.04-arm64' (v202304.27.0) for provider: parallels
    management-loadbalancer1: Downloading: https://vagrantcloud.com/bento/boxes/ubuntu-22.04-arm64/versions/202304.27.0/providers/parallels.box
==> management-loadbalancer1: Successfully added box 'bento/ubuntu-22.04-arm64' (v202304.27.0) for 'parallels'!
==> management-loadbalancer1: Registering VM image from the base box 'bento/ubuntu-22.04-arm64'...
==> management-loadbalancer1: Creating new virtual machine as a linked clone of the box image...
==> management-loadbalancer1: Unregistering the box VM image...
==> management-loadbalancer1: Setting the default configuration for VM...
==> management-loadbalancer1: Checking if box 'bento/ubuntu-22.04-arm64' version '202304.27.0' is up to date...
==> management-loadbalancer1: Setting the name of the VM: management-loadbalancer1
==> management-loadbalancer1: Preparing network interfaces based on configuration...
    management-loadbalancer1: Adapter 0: shared
    management-loadbalancer1: Adapter 1: hostonly
==> management-loadbalancer1: Clearing any previously set network interfaces...
==> management-loadbalancer1: Running 'pre-boot' VM customizations...
==> management-loadbalancer1: Booting VM...
==> management-loadbalancer1: Waiting for machine to boot. This may take a few minutes...
    management-loadbalancer1: SSH address: :22
    management-loadbalancer1: SSH username: vagrant
    management-loadbalancer1: SSH auth method: private key
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...
    management-loadbalancer1: Warning: Connection refused. Retrying...

@Stromweld
Collaborator

I believe this is actually a Vagrant issue and should be fixed in the latest version of Vagrant with the SSH key algorithm fix: https://github.com/hashicorp/vagrant/blob/v2.3.7/CHANGELOG.md

I also just released new builds of 20.04-arm64 and 22.04-arm64. Can you confirm this is working for you now?

@Evantage-WS
Author

Hi @Stromweld,

Thanks, I have upgraded Vagrant and updated my sources to use the newly created box, but when using it I get an EFI_RNG_PROTOCOL unavailable error. With the 20.04 image it works fine, so I think it is something in the newly created image.

@Stromweld
Collaborator

Both boxes are working for me. Please check that your vagrant files are the same.
Here's my test-kitchen vagrant file:

Vagrant.configure("2") do |c|
  c.berkshelf.enabled = false if Vagrant.has_plugin?("vagrant-berkshelf")
  c.vm.box = "bento/ubuntu-22.04-arm64"
  c.vm.hostname = "default-ubuntu-2204.vagrantup.com"
  c.vm.synced_folder ".", "/vagrant", disabled: true
  c.vm.synced_folder "$HOME/.kitchen/cache", "/tmp/omnibus/cache", create: true
  c.vm.provider :parallels do |p|
  end
end

Closing this issue. Please feel free to open a new issue or reopen this issue if problems are found with boxes or code.
