
ui: should we hide or move the decomissioned nodes list? #24636

Closed
wzrdtales opened this issue Apr 10, 2018 · 17 comments
Assignees
Labels
A-webui-general Issues on the DB Console that span multiple areas or don't have another clear category. C-question A question rather than an issue. No code/spec/doc change needed. O-community Originated from the community S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.

Comments

@wzrdtales

QUESTION

Hey there. I can only find topics about when decommissioned nodes are hidden from the UI, but I never found any information about when dead nodes are hidden. In this case one node died completely and was replaced with an entirely new one, but it stays in the UI forever and just does not disappear, although I marked it as decommissioned afterwards.

Is there any way to finalize the removal?

@benesch
Contributor

benesch commented Apr 10, 2018

Hi @wzrdtales! We never remove dead nodes from the UI unless you decommission them. Are you having trouble decommissioning dead nodes? I can't quite tell from your report.

@wzrdtales
Author

wzrdtales commented Apr 10, 2018

@benesch The nodes were dead before I decommissioned them. So now there is unnecessary clutter in the UI that never comes back to life, since the new node registered under the same name as the old one but registered itself as a completely new node. So yeah, basically I have a problem with getting rid of dead nodes. Decommissioning them is not the problem, but they still don't disappear from the UI; they're not dead anymore, but they're still clutter floating around in the UI, now in the form of decommissioned nodes.

Now it looks like this, and that is probably not going to get better.

[screenshot of the cluster overview showing the decommissioned nodes]

@benesch
Contributor

benesch commented Apr 11, 2018 via email

@wzrdtales
Author

wzrdtales commented Apr 11, 2018

Having an archive would, I think, be more meaningful. What I expected UX-wise is that old decommissioned nodes just become invisible after a certain amount of time (a day or so) unless you look them up in the archive. Having this on the dashboard is not the best UX decision IMHO, since it creates the assumption that something is wrong while it is not. By the way, how wide is the integer storing the node numbers? Just curious.

@benesch
Contributor

benesch commented Apr 12, 2018

Yeah, this view used to be better buried under the "View node list" link, but it appears the view was repurposed for the new 2.0 admin UI homepage, putting these decommissioned nodes front and center. /cc @couchand @vilterp

The limiting factor on the node ID is probably the int32 used in the NodeDescriptor proto:

optional int32 node_id = 1 [(gogoproto.nullable) = false,
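A signed 32-bit field tops out just above two billion, which is easy to sanity-check (a quick illustrative calculation, not CockroachDB code):

```python
# Quick sanity check: range of a signed 32-bit node ID (proto int32 is signed).
INT32_MAX = 2**31 - 1
print(INT32_MAX)  # 2147483647, i.e. a bit over 2.1 billion possible node IDs
```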

@wzrdtales
Author

ok, then there is at least room for 2 billion nodes (or dead ones) :p, at least I guess that int32 is not unsigned here.

Displaying them for a certain amount of time wouldn't be wrong, but hiding them afterwards would make sense IMHO.

@couchand
Contributor

We've actually talked about this quite a bit, mostly offline, but you can see some of the history in issues going way back to the first implementation (see the comments starting at #17553 (comment)). I think @benesch's point is strong: there's a gap in the node ids which is useful to explain, and there is also a cost to the cluster associated with decommissioned nodes beyond just using the bits of the id field.

I can see the value in having that information be available if you seek it out, but not front and center. I'm not entirely sold on the argument that it makes it seem like something is wrong -- I think the meaning of "decommission" is pretty clear in this case, but it's not information you need daily, so it adds noise to the overview page.

As an aside, the best practice when a node dies unexpectedly is to bring it back up, not to replace it with a new one. All this requires is the data from the store directory. The decommissioning process is intended mostly for planned cluster changes, and the UX has been designed around that usage pattern.
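To illustrate the two paths, the commands might look roughly like this (a sketch only; the store path, join addresses, and node ID are placeholders, and flags may vary by version):

```shell
# Preferred after an unexpected death: restart the node reusing its original
# store directory. The node's identity lives in the store, not in its address.
cockroach start \
  --store=/mnt/data/cockroach \
  --join=cockroach-1:26257,cockroach-2:26257 \
  --insecure

# Only for planned removal: decommission the node (node ID 4 is a placeholder),
# which drains its replicas and marks it decommissioned in the UI.
cockroach node decommission 4 --host=cockroach-1:26257 --insecure
```

These commands require a live cluster to run against; they are shown only to contrast restart-with-store versus decommission.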

As just one more aside, I was under the impression that removing decommissioned nodes from the liveness table eventually was still in the works (removing the nodes from the liveness table would mean the cluster actually forgets about them and the cost I alluded to above is recovered). At the end of #17553 that was punted to #15609, which I see was recently closed without any action being taken. It also looks like #20639 was opened to address this exact question, which was also closed in favor of #15609, so perhaps it was a mistake to close #15609 at all? @tschottdorf you commented on a few of these, what do you think? It sounds like your perspective is that we should just get rid of the "Decommissioned" list entirely, how does that square with @benesch's argument?

@couchand couchand changed the title from "How to hide dead nodes?" to "ui: should we hide or move the decomissioned nodes list?" Apr 12, 2018
@couchand couchand added this to Release: Features in Web UI Apr 12, 2018
@couchand
Contributor

cc @piyush-singh

@wzrdtales
Author

@couchand So these nodes died completely: no data is available anymore, so how are you supposed to bring them back up? In fact, they came back up with the same name, but registered as completely new nodes. If I had had an option to re-register them as the same nodes as before, just having lost all of their data, I would have done that.

@couchand
Contributor

couchand commented Apr 12, 2018

There's no concept in CockroachDB of restoring a node that has lost all data. If the data is gone, you simply can't bring it back up. The node's identity is completely independent of the interface it's listening on: a new node on the old host and port will still be a new node, and an old node on a new host and port will still be the old node.

@wzrdtales
Author

ok, then I will have a lot of decommissioned nodes in the current scenario. Any node, along with all of its data, could be gone at any random point in time, though. But there are always enough replicas left to restore a new one.

@knz knz added O-community Originated from the community C-question A question rather than an issue. No code/spec/doc change needed. and removed O-community-questions labels Apr 24, 2018
@couchand couchand added the A-webui-general Issues on the DB Console that span multiple areas or don't have another clear category. label Apr 24, 2018
@knz
Contributor

knz commented Dec 20, 2018

@knz knz added the S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. label Dec 20, 2018
@Timbo000002

Use case to support a remove option that clears decommissioned nodes from the UI/DB (not dead nodes):
We are trying to use CockroachDB in the cloud; hosts will be replaced/refreshed every 30 days using Jenkins.
Running a cluster of 4 nodes in each of 3 regions (Oregon, NOVA, Ohio) means at least 12 nodes per month accumulating under Decommissioned in prod.
We don't care about those nodes; they are gone forever.
Dev/test, where we build the rotation processes, is getting messy with all the decomms in the UI.
Recommendation: build an option to clear them, with a time frame for how far back, and let the customer decide between info and clutter. Provide a query that can purge decomms from the DB and it will be added to our decomm code.

@couchand
Contributor

couchand commented Jan 7, 2019

Unless something has changed that I'm unaware of, it's not accurate to say that decommissioned nodes are gone forever: there is still an overhead that will continue to accumulate as nodes are continually decommissioned. As long as this exists, it needs to be reportable on the front end.

Thus @Timbo000002, I don't believe your suggestion applies to the web UI, but rather to the decommissioning process of the cluster itself. This issue is limited to the suggestion to change the way that we display nodes which have been decommissioned but the cluster still remembers. Your suggestion is valid, so I've opened another issue to track it: #33542.

@tim-o
Contributor

tim-o commented Jul 15, 2019

Zendesk ticket #3096 has been linked to this issue.

@robert-s-lee
Contributor

"Repave" is becoming a common technique for applying OS patches: machines are immutable, so patching requires rolling out new nodes. Not cleaning up decommissioned nodes will be problematic.

@piyush-singh piyush-singh added this to Backlog in Cluster UI Oct 29, 2019
@Annebirzin

Designs for updates to the decommissioned node list can be found here: https://zpl.io/boG0kEo

These illustrate the following user stories:

As an application developer I need to:

  • See when the decommissioning of a node is in progress in the Admin UI
  • Have the ability to clear decommissioned nodes from the overview page
  • Have the ability to view a history of all decommissioned nodes

A possible edge case user story:

  • Get alerted when a node cannot be decommissioned because there are only 3 nodes (meaning there are no nodes available to transfer replicas to)

Note: we are also looking to update the 'Node Status' numbers at the top of the overview page for more accuracy: Currently, when a node is decommissioning, the UI displays that node as 'Suspect' which isn't accurate.

Instead, we will update the 'Live Nodes' count. For example, if you have 9 live nodes and decommission 1, the live nodes count updates to 8 and the suspect nodes count stays at 0.
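That counting rule can be sketched as follows (illustrative Python with hypothetical status labels, not actual UI code): a node that is actively decommissioning leaves the live count but is never added to the suspect count.

```python
def node_status_counts(nodes):
    """Count node statuses for the overview header.

    `nodes` is a list of status strings; the labels "live", "suspect",
    and "decommissioning" are hypothetical stand-ins for the UI's states.
    A decommissioning node is neither live nor suspect.
    """
    live = sum(1 for s in nodes if s == "live")
    suspect = sum(1 for s in nodes if s == "suspect")
    return {"live": live, "suspect": suspect}

# 9 nodes, one of which is being decommissioned:
nodes = ["live"] * 8 + ["decommissioning"]
print(node_status_counts(nodes))  # {'live': 8, 'suspect': 0}
```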

@koorosh koorosh moved this from Backlog to Milestone: In Progress in Cluster UI Nov 27, 2019
@koorosh koorosh self-assigned this Dec 5, 2019
@koorosh koorosh moved this from Milestone: In Progress to Milestone: In Review in Cluster UI Dec 23, 2019
koorosh added a commit to koorosh/cockroach that referenced this issue Dec 26, 2019
- the URL path for the new page is '#/reports/nodes/history'; please let me know if it should be changed to something more meaningful. I decided to make it generic in case we want to extend it with other 'historical' content. It is also a nested route under 'nodes', as it belongs to that entity.
- this page displays the history of all decommissioned nodes in one place;
- the logic for retrieving and displaying nodes is almost the same as in the Nodes Overview container, with adjustments according to the provided designs;
- Redux selectors and helper functions are refactored into an external utils module to reduce code duplication.

Release note (admin ui change): A decommissioned node history page is added as a dedicated page to reduce the amount of information on the main Cluster Overview page (issue: cockroachdb#24636)
koorosh added a commit to koorosh/cockroach that referenced this issue Dec 26, 2019
Release note (admin ui change): Link on Decommissioned Node List from Cluster Overview page (cockroachdb#24636)
koorosh added a commit to koorosh/cockroach that referenced this issue Dec 26, 2019
Release note (admin ui change): Link on Decommissioned Node List from Debug page (cockroachdb#24636)
Cluster UI automation moved this from Milestone: In Review to Milestone: Done Feb 20, 2020
Projects
Cluster UI (Milestone: Done)
Web UI (Release: Features)
Development
No branches or pull requests
10 participants