
[BUG] Wrong node IP written when joining a node; status stuck in Upgrading #648

Closed
xuzheng0017 opened this issue Dec 5, 2023 · 14 comments
Assignee: JacieChao
Labels: bug, to test
Milestone: v0.9.2

@xuzheng0017

Describe the bug
A wrong node IP was entered when joining a node, and the cluster status has been stuck in Upgrading ever since.


Screenshots
(screenshot attached)

Environments (please complete the following information):

  • OS: CentOS 7.9
  • AutoK3s Version: 0.9.1

Additional context

time="2023-12-05T11:59:34+08:00" level=info msg="the 4/5 time tring to ssh to 74.48.115.18:22 with user root"
https://mirrors.sonic.net/epel/7/x86_64/repodata/d526a7fd5dbf31d263829b2d144a41ca6126a8ead6d8a75fe0da87b1f250efb1-primary.sqlite.bz2: [Errno 14] HTTPS Error 404 - Not Found
Trying other mirror.
To address this issue please refer to the below wiki article
https://wiki.centos.org/yum-errors
If above article doesn't help to resolve this issue please use https://bugs.centos.org/.
http://mirror.tornadovps.com/pub/epel/7/x86_64/repodata/d526a7fd5dbf31d263829b2d144a41ca6126a8ead6d8a75fe0da87b1f250efb1-primary.sqlite.bz2: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
time="2023-12-05T12:00:04+08:00" level=info msg="the 5/5 time tring to ssh to 74.48.115.18:22 with user root"
Package yum-utils-1.1.31-54.el7_8.noarch already installed and latest version
Nothing to do
Loaded plugins: fastestmirror
Command line error: no such option: --refresh
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirror.web-ster.com
* epel: lolhost.mm.fcix.net
* extras: mirrors.oit.uci.edu
* updates: mirror.sfo12.us.leaseweb.net
Package yum-utils-1.1.31-54.el7_8.noarch already installed and latest version
Nothing to do
Loaded plugins: fastestmirror
Package yum-utils-1.1.31-54.el7_8.noarch already installed and latest version
Nothing to do
Command line error: no such option: --refresh
Loaded plugins: fastestmirror
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: ix-denver.mm.fcix.net
* epel: mirrors.ocf.berkeley.edu
* extras: mirrors.oit.uci.edu
* updates: mirror.sfo12.us.leaseweb.net
Command line error: no such option: --refresh
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirror.web-ster.com
* epel: mirrors.ocf.berkeley.edu
* extras: mirrors.oit.uci.edu
* updates: ix-denver.mm.fcix.net
Package yum-utils-1.1.31-54.el7_8.noarch already installed and latest version
Nothing to do
Loaded plugins: fastestmirror
xuzheng0017 added the bug label Dec 5, 2023
xuzheng0017 changed the title from "[BUG]" to "[BUG] Wrong node IP written when joining a node; status stuck in Upgrading" Dec 5, 2023
@JacieChao
Collaborator

Thanks for your feedback.

Are all cluster nodes running CentOS 7.9, or only the newly added worker node?
It seems the new worker node could not fetch packages from the RPM mirror.
Could you please provide the join-node parameters and the full log from joining the new node?

@xuzheng0017
Author

vps-regtech.log
This is the full log for this cluster.
All cluster nodes run CentOS 7.9. When I added a new batch of nodes, one IP address was entered incorrectly. The cluster is currently stuck in the Upgrading state.

@JacieChao
Collaborator

Is the node-join action stuck at the last line of the log you provided?
It looks like AutoK3s can't reach node 74.48.115.18 through the SSH tunnel. Is this node IP the incorrect one?
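A quick way to confirm reachability from the AutoK3s host (a sketch; the IP is taken from the log above, and the user and key must match whatever was configured for the join):

# Check that the SSH port is reachable at all
nc -zv -w 5 74.48.115.18 22
# Then try an actual SSH login with the same user AutoK3s uses
ssh -o ConnectTimeout=5 root@74.48.115.18 'echo ok'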

@xuzheng0017
Author

Q1: Yes.
Q2: 74.48.115.18 is the wrong one; I mistyped it when entering the IP.

cnrancher deleted a comment from JacieChao Dec 5, 2023
@JacieChao
Collaborator

Sometimes the native provider can't catch the join error correctly. When that happens, the cluster's status stays Upgrading forever.
I'll look into whether there's a workaround.

@xuzheng0017
Author

Okay, I'll rebuild the cluster. Thank you for your answer.
Best wishes to you.

@JacieChao
Collaborator

@xuzheng0017 There's no need to rebuild the cluster. The K3s cluster itself won't be impacted by the AutoK3s cluster status.

@xuzheng0017
Author

Okay, but I want to join more nodes, and the cluster page doesn't offer any option to do so while it's stuck in this state.

JacieChao self-assigned this Dec 6, 2023
JacieChao added this to the v0.9.2 milestone Dec 6, 2023
@JacieChao
Collaborator

The workaround below may help you:

  • Use kubectl get nodes to check whether the batch of nodes has joined the cluster successfully.
  • If not, use the autok3s join CLI to join a node, which refreshes the cluster status:
autok3s join -p native --name jacie-test --ip <master-ip> --ssh-user <your-ssh-user> --ssh-key-path <your-ssh-key-path> --worker-ips <one-worker-ip>

Once the join process is complete, the cluster status will be refreshed to Running and the UI will work properly.
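To confirm the status was refreshed, a check like this should work (a sketch; jacie-test matches the example cluster name above, and kubectl assumes your kubeconfig points at this cluster):

# List AutoK3s-managed clusters and check the status column
autok3s list
# Cross-check the nodes actually registered in K3s
kubectl get nodes -o wide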

@JacieChao
Collaborator

The bug is related to the error being caught incorrectly in a defer function. We will fix this in the next version.

@xuzheng0017
Author

I have encountered another problem:
When I deleted a node in kube-explorer and then returned to the cluster page, the number of nodes did not decrease.
I added the node again with the command:

81d5d17a77de:/home/shell # autok3s join -p native --name vps-cargogo --ip xx.xx.xx.xx --ssh-user root --ssh-key-path /root/.autok3s/vps-cargogo/id_rsa --worker-ips xx.xx.xx.xx
time="2023-12-06T14:53:03+08:00" level=info msg="[native] begin to join nodes for vps-cargogo..."
time="2023-12-06T14:53:03+08:00" level=info msg="[native] executing join k3s node logic"
time="2023-12-06T14:53:03+08:00" level=info msg="[native] successfully executed join k3s node logic"
time="2023-12-06T14:53:03+08:00" level=info msg="[native] successfully executed join logic"

@xuzheng0017
Author

Is rejoining the node via commands on the node the only option?

@JacieChao
Collaborator

JacieChao commented Dec 6, 2023

Yes. AutoK3s can't track that operation: the node was removed manually, so the removal was never synchronized to the AutoK3s database. You can't rejoin the node through AutoK3s because, from AutoK3s's side, the node is still in the cluster.
For now, the workaround is to add the node back manually with the K3s CLI, as sketched below.
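For reference, a manual worker join with the standard K3s installer might look like this (a sketch; <master-ip> and <node-token> are placeholders, and the token path is the K3s server default):

# On the server node: read the join token K3s generated at install time
sudo cat /var/lib/rancher/k3s/server/node-token
# On the worker being re-added: install the K3s agent pointing at the server
curl -sfL https://get.k3s.io | K3S_URL=https://<master-ip>:6443 K3S_TOKEN=<node-token> sh -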

@JacieChao
Copy link
Collaborator

Tested with v0.9.2-rc1. AutoK3s now returns the correct cluster status if joining nodes fails.
Closing as complete.
