ui: should we hide or move the decommissioned nodes list? #24636

wzrdtales · 2018-04-10T17:54:59Z

QUESTION

Hey there. I can only find topics on when decommissioned nodes are hidden from the UI, but no information anywhere on when dead nodes will be hidden. In my case one node died completely and was replaced with a completely new one, but the old node stays in the UI forever and just does not disappear, even though I marked it as decommissioned afterwards.

Is there any way to finalize the removal?

benesch · 2018-04-10T21:07:11Z

Hi @wzrdtales! We never remove dead nodes from the UI unless you decommission them. Are you having trouble decommissioning dead nodes? I can't quite tell from your report.

wzrdtales · 2018-04-10T23:29:44Z

@benesch The nodes were dead before I decommissioned them. So now there is unnecessary trash in the UI that never comes back to life, since the new node, which registered under the same name as the old one, registered itself as a completely new node. So yes, basically I have a problem with getting rid of dead nodes. Decommissioning them is not the problem, but they still don't disappear from the UI: they're no longer listed as dead, but they're still trash floating around in the UI, now in the form of decommissioned nodes.

Now it looks like this, and that is probably not going to get better: <https://user-images.githubusercontent.com/1786821/38588739-c3b79726-3d27-11e8-8b49-56f0abc68ac4.png>

benesch · 2018-04-11T03:36:32Z

Yep, that’s expected. We never forget about decommissioned nodes. You won’t see them anywhere but that one list you screenshotted, though; we’ve hidden them from e.g. graph views in which those nodes have no data. At least, if you see them anywhere else, it's a bug. Decommissioning is not entirely free, so showing those decommissioned nodes in the UI reminds you of the baggage your cluster will have to carry around forever. It also explains to future administrators why your nodes are numbered n1, n2, and n8. That said, we might want to consider making the decommissioned nodes section collapsed by default. Would that adequately address your concern?

wzrdtales · 2018-04-11T08:58:20Z

Having an archive would, I think, be more meaningful. What I expected UX-wise was that old decommissioned nodes would just become invisible after a certain amount of time (a day or so) unless you look them up in the archive. Having this on the dashboard is not the best decision UX-wise IMHO, since it leads you to assume something is wrong when it is not. By the way, how wide is the integer storing the node numbers? Just curious.

benesch · 2018-04-12T05:04:01Z

Yeah, this view used to be better buried under the "View node list" link, but it appears the view was repurposed for the new 2.0 admin UI homepage, putting these decommissioned nodes front and center. /cc @couchand @vilterp

The limiting factor on the node ID is probably the int32 used in the NodeDescriptor proto:

cockroach/pkg/roachpb/metadata.proto

Line 145 in bfa9931

optional int32 node_id = 1 [(gogoproto.nullable) = false,

wzrdtales · 2018-04-12T08:20:24Z

OK, then there is at least room for 2 billion nodes (or dead ones) :p At least I guess that int32 is not unsigned here.
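
For reference, the arithmetic behind the "2 billion" estimate can be sketched as follows. This only computes the range of a signed 32-bit integer; whether node IDs start at 1 and whether other limits apply first are not stated in this thread, so treat those as open assumptions.

```python
# Range of a 32-bit integer, as a rough upper bound on node IDs.
# Assumption: the int32 field is the limiting factor, as suggested above.
INT32_BITS = 32

# A signed 32-bit integer reserves one bit for the sign.
INT32_MAX = 2 ** (INT32_BITS - 1) - 1

# For comparison only: the maximum if the field were unsigned.
UINT32_MAX = 2 ** INT32_BITS - 1

print(f"signed int32 max:    {INT32_MAX:,}")   # 2,147,483,647
print(f"unsigned 32-bit max: {UINT32_MAX:,}")  # 4,294,967,295
```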

Displaying them for a certain amount of time wouldn't be that wrong, but hiding them afterwards would make sense IMHO.

couchand · 2018-04-12T14:41:50Z

We've actually talked about this quite a bit, mostly offline, but you can see some of the history in issues going way back to the first implementation (see the comments starting at #17553 (comment)). I think @benesch's point is strong: there's a gap in the node ids which is useful to explain, and there is also a cost to the cluster associated with decommissioned nodes beyond just using the bits of the id field.

I can see the value in having that information be available if you seek it out, but not front and center. I'm not entirely sold on the argument that it makes it seem like something is wrong -- I think the meaning of "decommission" is pretty clear in this case, but it's not information you need daily, so it adds noise to the overview page.

As an aside, the best practice when a node dies unexpectedly is to bring it back up, not replace it with a new one. All this requires is the data from the store directory. The decommissioning process is intended mostly for planned cluster changes, and the UX has been designed around that usage pattern.

As just one more aside, I was under the impression that removing decommissioned nodes from the liveness table eventually was still in the works (removing the nodes from the liveness table would mean the cluster actually forgets about them and the cost I alluded to above is recovered). At the end of #17553 that was punted to #15609, which I see was recently closed without any action being taken. It also looks like #20639 was opened to address this exact question, which was also closed in favor of #15609, so perhaps it was a mistake to close #15609 at all? @tschottdorf you commented on a few of these, what do you think? It sounds like your perspective is that we should just get rid of the "Decommissioned" list entirely, how does that square with @benesch's argument?

couchand · 2018-04-12T16:40:40Z

cc @piyush-singh

wzrdtales · 2018-04-12T17:46:38Z

@couchand These nodes died completely, meaning no data is available anymore, so how are you supposed to bring them back up? In fact, they came back up with the same name but registered as completely new nodes. If I had had an option to re-register them as the same nodes as before, just with all of their data lost, I would have done that.

couchand · 2018-04-12T18:39:01Z

There's no concept in CockroachDB of restoring a node that has lost all data. If the data is gone, you simply can't bring it back up. The node's identity is completely independent of the interface it's listening on: a new node on the old host and port will still be a new node, and an old node on a new host and port will still be the old node.
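
A toy model may make this concrete. It is only a sketch of the behavior described above, not CockroachDB code: `Store` and `ToyCluster` are hypothetical names, and the assumption is simply that a node's ID lives with its store data while the address is just where the node happens to listen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Store:
    # Hypothetical stand-in for a store directory.
    # node_id is None when the store holds no data (new or wiped).
    node_id: Optional[int] = None


class ToyCluster:
    def __init__(self):
        self.next_id = 1
        self.addresses = {}  # node_id -> last known address

    def start_node(self, store, address):
        # An empty store gets a fresh ID; an existing store keeps its ID.
        if store.node_id is None:
            store.node_id = self.next_id
            self.next_id += 1
        self.addresses[store.node_id] = address
        return store.node_id


cluster = ToyCluster()
old_store = Store()
first = cluster.start_node(old_store, "host-a:26257")
moved = cluster.start_node(old_store, "host-b:26257")  # old node, new address
fresh = cluster.start_node(Store(), "host-a:26257")    # data lost, old address
print(first, moved, fresh)  # 1 1 2
```

In this model the node that moved hosts keeps ID 1, while the node that lost its data but reuses the old address becomes ID 2.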

wzrdtales · 2018-04-12T21:22:10Z

OK, then I will have a lot of decommissioned nodes in the current scenario. Any node, along with all of its data, could be gone at any random point in time, though there will always be enough nodes left to restore the new one.

knz · 2018-12-20T09:17:55Z

A user was surprised again about this here: https://forum.cockroachlabs.com/t/remove-decommissioned-nodes-that-dont-exist-anymore/1523/12

Timbo000002 · 2018-12-20T14:01:34Z

Use case to support an option to clear decommissioned (not dead) nodes from the UI/DB:
We are trying to use CockroachDB in the cloud; hosts will be replaced/refreshed every 30 days using Jenkins.
Running a cluster of 4 nodes in each of 3 regions (Oregon, NOVA, Ohio) means at least 12 nodes per month accumulating under Decommissioned in prod.
We don't care about those nodes; they are gone forever.
Dev/test, where we build rotation processes, is getting messy with all the decomms in the UI.
Recommendation: Build an option to clear decommissioned nodes older than a chosen time frame, and let the customer decide between info and clutter. Send us a query that can purge decomms from the DB and we will add it to our decomm code.
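
The time-frame idea could look roughly like the sketch below. It only illustrates the filtering logic: `DecommissionedNode` and `split_by_retention` are hypothetical names, the timestamps are made up, and nothing here purges anything from the cluster itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DecommissionedNode:
    # Hypothetical record; field names are assumptions for illustration.
    node_id: int
    decommissioned_at: datetime


def split_by_retention(nodes, now, retention):
    """Split nodes into (recent, archived) around now - retention."""
    cutoff = now - retention
    recent = [n for n in nodes if n.decommissioned_at >= cutoff]
    archived = [n for n in nodes if n.decommissioned_at < cutoff]
    return recent, archived


now = datetime(2018, 12, 20)
nodes = [
    DecommissionedNode(2, datetime(2018, 10, 1)),
    DecommissionedNode(5, datetime(2018, 12, 10)),
]
recent, archived = split_by_retention(nodes, now, timedelta(days=30))
print([n.node_id for n in recent])    # [5]
print([n.node_id for n in archived])  # [2]
```

The retention window is the knob that would let each operator choose between information and clutter.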

couchand · 2019-01-07T17:18:41Z

Unless something has changed that I'm unaware of, it's not accurate to say that decommissioned nodes are gone forever: there is still an overhead that will continue to accumulate as nodes are continually decommissioned. As long as this exists, it needs to be reportable on the front end.

Thus @Timbo000002, I don't believe your suggestion applies to the web UI, but rather to the decommissioning process of the cluster itself. This issue is limited to the suggestion to change the way that we display nodes which have been decommissioned but the cluster still remembers. Your suggestion is valid, so I've opened another issue to track it: #33542.

tim-o · 2019-07-15T19:16:44Z

Zendesk ticket #3096 has been linked to this issue.

robert-s-lee · 2019-09-18T02:43:37Z

"Repave" is becoming a common technique for applying OS patches. Machines are immutable, so patching requires rolling out new nodes. Not cleaning up decommissioned nodes will be problematic.

Annebirzin · 2019-11-25T21:16:59Z

Designs for updates to the decommissioned node list can be found here: https://zpl.io/boG0kEo

These illustrate the following user stories:

As an application developer I need to:

- See when the decommissioning of a node is in progress in the Admin UI
- Have the ability to clear decommissioned nodes from the overview page
- Have the ability to view a history of all decommissioned nodes

A possible edge case user story:

- Get alerted when a node cannot be decommissioned because there are only 3 nodes (meaning there are no nodes available to transfer to)

Note: we are also looking to update the 'Node Status' numbers at the top of the overview page for more accuracy: Currently, when a node is decommissioning, the UI displays that node as 'Suspect' which isn't accurate.

Instead, we will update the 'Live Nodes' count. For example, if you have 9 live nodes and decommission 1, the live nodes count updates to 8 and the suspect nodes count stays at 0.
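
The counting rule in that example can be sketched as follows. This is an illustration only, not the Admin UI's code: the `Node` type, the liveness labels, and the separate "decommissioning" bucket are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical shape; the real UI data model may differ.
    node_id: int
    liveness: str                  # "live", "suspect", or "dead"
    decommissioning: bool = False


def status_counts(nodes):
    """Count nodes per status, keeping decommissioning nodes out of 'suspect'."""
    counts = {"live": 0, "suspect": 0, "dead": 0, "decommissioning": 0}
    for node in nodes:
        if node.decommissioning:
            counts["decommissioning"] += 1
        else:
            counts[node.liveness] += 1
    return counts


# 9 live nodes, then one of them starts decommissioning.
nodes = [Node(i, "live") for i in range(1, 10)]
nodes[0].decommissioning = True
print(status_counts(nodes))
# {'live': 8, 'suspect': 0, 'dead': 0, 'decommissioning': 1}
```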

- url path for new page is following: '#/reports/nodes/history', please let me know if it has to be changed to something more meaningful. I decided to make it more generic in case we want to extend it with other 'historical' content. Also, it is nested route under 'nodes' as it belongs to this entity.
- this page is supposed to display the history of all decommissioned nodes in one place;
- the logic of retrieving and displaying nodes are almost the same as in Nodes Overview container with adjustments according to provided designs;
- Redux selectors, and helper functions are refactored to external utils module so it reduces code duplication;

Release note (admin ui change): Decommissioned node history page is added as a dedicated page to reduce amount of information on main Cluster Overview page (issue: cockroachdb#24636)

Release note (admin ui change): Link on Decommissioned Node List from Cluster Overview page (cockroachdb#24636)

Release note (admin ui change): Link on Decommissioned Node List from Debug page (cockroachdb#24636)

couchand added the community-questions label Apr 12, 2018

couchand changed the title ~~How to hide dead nodes?~~ ui: should we hide or move the decommissioned nodes list? Apr 12, 2018

couchand added this to Release: Features in Web UI Apr 12, 2018

knz added O-community Originated from the community C-question A question rather than an issue. No code/spec/doc change needed. and removed O-community-questions labels Apr 24, 2018

couchand added the A-webui-general Issues on the DB Console that span multiple areas or don't have another clear category. label Apr 24, 2018

knz added the S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. label Dec 20, 2018

couchand mentioned this issue Jan 7, 2019

server: option to obliterate decommissioned nodes from cluster memory #33542

Closed

piyush-singh added this to Backlog in Cluster UI Oct 29, 2019

koorosh moved this from Backlog to Milestone: In Progress in Cluster UI Nov 27, 2019

koorosh self-assigned this Dec 5, 2019

koorosh moved this from Milestone: In Progress to Milestone: In Review in Cluster UI Dec 23, 2019

koorosh added a commit to koorosh/cockroach that referenced this issue Dec 26, 2019

ui: Add 'Decommissioned node history' link on Nodes Overview page

88a71ab

Release note (admin ui change): Link on Decommissioned Node List from Cluster Overview page (cockroachdb#24636)

koorosh added a commit to koorosh/cockroach that referenced this issue Dec 26, 2019

ui: Add 'Decommissioned node history' link on Debug page

a183da8

Release note (admin ui change): Link on Decommissioned Node List from Debug page (cockroachdb#24636)

Annebirzin mentioned this issue Jan 10, 2020

ui: decommissioned nodes follow up work #43881

Closed

1 task

piyush-singh closed this as completed Feb 20, 2020

Cluster UI automation moved this from Milestone: In Review to Milestone: Done Feb 20, 2020
