ui: should we hide or move the decomissioned nodes list? #24636

Open
wzrdtales opened this Issue Apr 10, 2018 · 14 comments

Comments

5 participants
@wzrdtales
wzrdtales commented Apr 10, 2018

QUESTION

Hey there. I can only find topics about when decommissioned nodes are hidden from the UI, but no information anywhere about when dead nodes are hidden. In my case one node died completely and was replaced with a completely new one, but the old node stays in the UI forever and just does not disappear, even though I marked it as decommissioned afterwards.

Is there any way to finalize the removal?

@benesch

Collaborator

benesch commented Apr 10, 2018

Hi @wzrdtales! We never remove dead nodes from the UI unless you decommission them. Are you having trouble decommissioning dead nodes? I can't quite tell from your report.

@wzrdtales

wzrdtales commented Apr 10, 2018

@benesch The nodes were dead before I decommissioned them. So now there is unnecessary trash in the UI that never comes back to life, because the new node, which came up under the same name as the old one, registered itself as a completely new node. So yes, basically I have a problem getting rid of dead nodes. Decommissioning them is not the problem, but they still don't disappear from the UI. They are no longer listed as dead, but they are still trash flying around in the UI, now in the form of decommissioned nodes.

Now it looks like this, and that is probably not going to get better.

image

@benesch

Collaborator

benesch commented Apr 11, 2018

@wzrdtales

wzrdtales commented Apr 11, 2018

I think having an archive would be more meaningful. What I expected UX-wise was that old decommissioned nodes would simply become invisible after a certain amount of time (a day or so) unless you look them up in the archive. Having this on the dashboard is not the best decision UX-wise IMHO, since it leads you to assume something is wrong when it is not. By the way, how wide is the integer storing the node numbers? Just curious.

@benesch

Collaborator

benesch commented Apr 12, 2018

Yeah, this view used to be better buried under the "View node list" link, but it appears the view was repurposed for the new 2.0 admin UI homepage, putting these decommissioned nodes front and center. /cc @couchand @vilterp

The limiting factor on the node ID is probably the int32 used in the NodeDescriptor proto:

optional int32 node_id = 1 [(gogoproto.nullable) = false,

@wzrdtales

wzrdtales commented Apr 12, 2018

OK, then there is at least room for about 2 billion nodes (or dead ones) :p At least I guess that int32 is not unsigned here.
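As a rough sanity check on the "2 billion" figure, here is a minimal Python sketch. It assumes only that the node ID is a signed 32-bit integer, as the `int32` proto field suggests; the helper name `fits_int32` is made up for illustration and is not part of CockroachDB.

```python
import struct

# Bounds of a signed 32-bit integer.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(node_id: int) -> bool:
    """Return True if node_id can be encoded as a signed 32-bit integer.

    Hypothetical helper: struct.pack with the "i" format raises
    struct.error when the value is out of range.
    """
    try:
        struct.pack("<i", node_id)
        return True
    except struct.error:
        return False


print(INT32_MAX)                  # largest ID a signed int32 can hold
print(fits_int32(INT32_MAX))      # still representable
print(fits_int32(INT32_MAX + 1))  # one past the limit is not
```

So a signed field leaves a little over 2.1 billion positive IDs, which matches the estimate above.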

Displaying them for a certain amount of time wouldn't be that wrong, but hiding them afterwards would make sense IMHO.
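The "show for a while, then hide" idea could look roughly like the sketch below. Everything here is hypothetical: `NodeRow`, `split_visible_and_archived`, and the one-day retention default are illustrative names and values, not anything from the admin UI code.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class NodeRow(NamedTuple):
    """Hypothetical row in the decommissioned-nodes list."""
    node_id: int
    decommissioned_at: datetime


def split_visible_and_archived(
    rows: List[NodeRow],
    now: datetime,
    retention: timedelta = timedelta(days=1),
) -> Tuple[List[NodeRow], List[NodeRow]]:
    """Keep recently decommissioned nodes visible; move older ones to an archive."""
    visible = [r for r in rows if now - r.decommissioned_at <= retention]
    archived = [r for r in rows if now - r.decommissioned_at > retention]
    return visible, archived


now = datetime(2018, 4, 12, 12, 0)
rows = [
    NodeRow(3, datetime(2018, 4, 10, 12, 0)),  # decommissioned two days ago
    NodeRow(4, datetime(2018, 4, 12, 6, 0)),   # decommissioned six hours ago
]
visible, archived = split_visible_and_archived(rows, now)
print([r.node_id for r in visible], [r.node_id for r in archived])
```

Nothing is deleted in this sketch; older entries just move to a separate list that can still be looked up.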

@couchand

Collaborator

couchand commented Apr 12, 2018

We've actually talked about this quite a bit, mostly offline, but you can see some of the history in issues going way back to the first implementation (see the comments starting at #17553 (comment)). I think @benesch's point is strong: there's a gap in the node ids which is useful to explain, and there is also a cost to the cluster associated with decommissioned nodes beyond just using the bits of the id field.

I can see the value in having that information be available if you seek it out, but not front and center. I'm not entirely sold on the argument that it makes it seem like something is wrong -- I think the meaning of "decommission" is pretty clear in this case, but it's not information you need daily, so it adds noise to the overview page.

As an aside, the best practice when a node dies unexpectedly is to bring it back up, not to replace it with a new one. All this requires is the data from the store directory. The decommissioning process is intended mostly for planned cluster changes, and the UX has been designed around that usage pattern.

As just one more aside, I was under the impression that removing decommissioned nodes from the liveness table eventually was still in the works (removing the nodes from the liveness table would mean the cluster actually forgets about them and the cost I alluded to above is recovered). At the end of #17553 that was punted to #15609, which I see was recently closed without any action being taken. It also looks like #20639 was opened to address this exact question, which was also closed in favor of #15609, so perhaps it was a mistake to close #15609 at all? @tschottdorf you commented on a few of these, what do you think? It sounds like your perspective is that we should just get rid of the "Decommissioned" list entirely, how does that square with @benesch's argument?

@couchand couchand changed the title from "How to hide dead nodes?" to "ui: should we hide or move the decomissioned nodes list?" Apr 12, 2018

@couchand couchand added this to Release: Features in Web UI Apr 12, 2018

@couchand

Collaborator

couchand commented Apr 12, 2018

@wzrdtales

wzrdtales commented Apr 12, 2018

@couchand These nodes died completely, meaning no data is available anymore, so how are you supposed to bring them back up? In fact, they came back up with the same name but registered as completely new nodes. If I had had the option to re-register them as the same nodes as before, just with all of their data lost, I would have done so.

@couchand

Collaborator

couchand commented Apr 12, 2018

There's no concept in CockroachDB of restoring a node that has lost all data. If the data is gone, you simply can't bring it back up. The node's identity is completely independent of the interface it's listening on: a new node on the old host and port will still be a new node, and an old node on a new host and port will still be the old node.
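The identity rule can be illustrated with a toy model. This is a sketch only: `Store`, `ToyCluster`, and `start_node` are hypothetical names, and the assumption that the node ID travels with the store's data is inferred from the comments in this thread rather than taken from CockroachDB's code.

```python
import itertools
from typing import Dict, Optional


class Store:
    """Hypothetical stand-in for a node's on-disk data directory."""

    def __init__(self) -> None:
        self.node_id: Optional[int] = None  # an empty store has no identity yet


class ToyCluster:
    """Toy model: identity comes from the store, never from the address."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.address_of: Dict[int, str] = {}

    def start_node(self, store: Store, address: str) -> int:
        # A store without an ID is treated as a brand-new node.
        if store.node_id is None:
            store.node_id = next(self._next_id)
        # The address is only bookkeeping; it can change freely.
        self.address_of[store.node_id] = address
        return store.node_id


cluster = ToyCluster()
old_store = Store()
first = cluster.start_node(old_store, "host-a:26257")
moved = cluster.start_node(old_store, "host-b:26257")  # same data, new address
fresh = cluster.start_node(Store(), "host-a:26257")    # old address, no data
print(first, moved, fresh)
```

In this model the node that moved hosts keeps its ID, while the node that reuses the old address with an empty store gets a new one, which is the situation described at the top of the thread.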

@wzrdtales

wzrdtales commented Apr 12, 2018

OK, then I will end up with a lot of decommissioned nodes in my current scenario. Any node, with all its data, could be gone at any random point in time, but there will always be enough nodes left to restore the new one.

@knz

Member

knz commented Dec 20, 2018

@knz knz added the S-3-ux-surprise label Dec 20, 2018

@Timbo000002

Timbo000002 commented Dec 20, 2018

Use case in support of an option to clear decommissioned nodes (not dead nodes) from the UI/DB:
We are trying to use CockroachDB in the cloud, where hosts will be replaced/refreshed every 30 days using Jenkins.
Running a cluster of 4 nodes in each of 3 regions (Oregon, NOVA, Ohio) means at least 12 nodes per month accumulating under Decommissioned in prod.
We don't care about those nodes; they are gone forever.
Dev/test, where we build rotation processes, is getting messy with all the decomms in the UI.
Recommendation: Build an option to clear decommissioned nodes with a configurable time frame for how far back, and let the customer decide between info and clutter. Send us a query that can purge decomms from the DB and we will add it to our decomm code.
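To put a number on the accumulation, a back-of-the-envelope sketch follows. It assumes 4 nodes in each of 3 regions and a full replacement of every host once per 30-day rotation; the function name is made up for illustration.

```python
NODES_PER_REGION = 4
REGIONS = 3
ROTATION_DAYS = 30


def decommissioned_after(days: int) -> int:
    """Entries in the Decommissioned list after `days`, counting completed rotations only."""
    rotations = days // ROTATION_DAYS
    return rotations * NODES_PER_REGION * REGIONS


print(decommissioned_after(30))   # one rotation
print(decommissioned_after(365))  # twelve completed rotations in a year
```

Under these assumptions the list grows by 12 entries per rotation and never shrinks unless entries can be cleared.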

@couchand

Collaborator

couchand commented Jan 7, 2019

Unless something has changed that I'm unaware of, it's not accurate to say that decommissioned nodes are gone forever: there is still an overhead that will continue to accumulate as nodes are continually decommissioned. As long as this exists, it needs to be reportable on the front end.

Thus @Timbo000002, I don't believe your suggestion applies to the web UI, but rather to the decommissioning process of the cluster itself. This issue is limited to the suggestion to change the way that we display nodes which have been decommissioned but the cluster still remembers. Your suggestion is valid, so I've opened another issue to track it: #33542.
