CLI cockroach node status does not show correct number of ranges for each node #99702

Open
daniel-crlabs opened this issue Mar 27, 2023 · 4 comments
Labels
A-kv-observability C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-observability

Comments

@daniel-crlabs
Contributor

daniel-crlabs commented Mar 27, 2023

Describe the problem

When we run the command cockroach node status --all, the CLI shows the total number of ranges for the entire cluster rather than the range count for each node, as one might expect.

To Reproduce

The CLI output shows the total number of ranges, which matches the UI when the entire cluster is selected. However, when you select a specific host in the UI, the UI displays the number of ranges for that host only, whereas the CLI still shows the range count for all nodes.

  1. The UI shows the number of all ranges when CLUSTER is selected from the dropdown, i.e. 51 (this is correct, nothing wrong here).

[Screenshot 1]

  2. When we select a specific host, the UI updates and shows only the number of ranges for that host, i.e. 16 (this is correct, nothing wrong here).

[Screenshot 2]

  3. Unexpected behavior: this is where the behavior seems confusing. When we run the command cockroach node status --all, the CLI shows the total number of ranges for the entire cluster (ranges = 51), not the range count for a given node (ranges = 16), as one might expect. Each node reports the cluster-wide total (resembling screenshot 1 above); shouldn't it report the number of ranges on that particular node (resembling screenshot 2 above)?
[root@cockroachdb-0 cockroach]# cockroach node status --all --certs-dir /cockroach/cockroach-certs --format records
-[ RECORD 1 ]
id                     | 1
address                | cockroachdb-0.cockroachdb.cockroach-sts-secure.svc.cluster.local:26257
sql_address            | cockroachdb-0.cockroachdb.cockroach-sts-secure.svc.cluster.local:26257
build                  | v22.2.5
started_at             | 2023-03-21 20:01:29.303301 +0000 UTC
updated_at             | 2023-03-22 14:25:30.662009 +0000 UTC
locality               | region=us-east,zone=us-east-1
is_available           | true
is_live                | true
replicas_leaders       | 16
replicas_leaseholders  | 16
ranges                 | 51
ranges_unavailable     | 0
ranges_underreplicated | 0
live_bytes             | 135692100
key_bytes              | 564240
value_bytes            | 136558228
range_key_bytes        | 0
range_value_bytes      | 0
intent_bytes           | 0
system_bytes           | 29023
gossiped_replicas      | 51
is_decommissioning     | false
membership             | active
is_draining            | false
-[ RECORD 2 ]
id                     | 2
address                | cockroachdb-1.cockroachdb.cockroach-sts-secure.svc.cluster.local:26257
sql_address            | cockroachdb-1.cockroachdb.cockroach-sts-secure.svc.cluster.local:26257
build                  | v22.2.5
started_at             | 2023-03-22 13:02:01.309179 +0000 UTC
updated_at             | 2023-03-22 14:25:33.303563 +0000 UTC
locality               | region=us-east,zone=us-east-1
is_available           | true
is_live                | true
replicas_leaders       | 20
replicas_leaseholders  | 20
ranges                 | 51
ranges_unavailable     | 0
ranges_underreplicated | 0
live_bytes             | 135634805
key_bytes              | 564216
value_bytes            | 136500889
range_key_bytes        | 0
range_value_bytes      | 0
intent_bytes           | 0
system_bytes           | 29023
gossiped_replicas      | 51
is_decommissioning     | false
membership             | active
is_draining            | false
-[ RECORD 3 ]
id                     | 3
address                | cockroachdb-2.cockroachdb.cockroach-sts-secure.svc.cluster.local:26257
sql_address            | cockroachdb-2.cockroachdb.cockroach-sts-secure.svc.cluster.local:26257
build                  | v22.2.5
started_at             | 2023-03-22 13:02:02.69602 +0000 UTC
updated_at             | 2023-03-22 14:25:31.744918 +0000 UTC
locality               | region=us-east,zone=us-east-1
is_available           | true
is_live                | true
replicas_leaders       | 15
replicas_leaseholders  | 15
ranges                 | 51
ranges_unavailable     | 0
ranges_underreplicated | 0
live_bytes             | 135577510
key_bytes              | 564144
value_bytes            | 136443462
range_key_bytes        | 0
range_value_bytes      | 0
intent_bytes           | 0
system_bytes           | 29023
gossiped_replicas      | 51
is_decommissioning     | false
membership             | active
is_draining            | false

Expected behavior
CLI output of cockroach node status --all should display the correct number of ranges for each given node.

Jira issue: CRDB-26029

gz#16399

@daniel-crlabs daniel-crlabs added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-observability labels Mar 27, 2023
@aliher1911
Contributor

aliher1911 commented Mar 27, 2023

I looked into what is actually shown on the charts and in the CLI.

  • ranges in CLI is the number of replicas on a particular node (this corresponds to the Replicas per Node chart on the same replication dashboard)
  • ranges in WebUI is the number of leaseholders (or, for a range with no leaseholder, the replicas that have this node as the first replica in their list), so it is the number of ranges that this node serves.

In the case raised here, we have 3 nodes and each holds replicas for all ranges in the system, while each node's leaseholders account for roughly 1/3 of the ranges. If we ran an experiment with 6 nodes, the CLI would report only a subset of the ranges in ranges, and the value would differ from node to node. The same goes for ranges in the UI, where it would be roughly 1/6 of the ranges.
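As a rough illustration of the 3-node versus 6-node argument above, here is a minimal Python sketch. It is not CockroachDB's allocator: the round-robin placement, the "first replica holds the lease" rule, and the function name simulate are all assumptions made for the example. Only the 51 ranges and the 3 nodes come from this issue; a replication factor of 3 is assumed.

```python
from collections import Counter


def simulate(num_nodes: int, num_ranges: int = 51, replication_factor: int = 3):
    """Return per-node (replica, leaseholder) counts for a toy placement.

    Assumption: replicas of range r are placed on `replication_factor`
    consecutive nodes starting at r % num_nodes, and the first of those
    nodes holds the lease. Real placement is far more involved.
    """
    if replication_factor > num_nodes:
        raise ValueError("need at least as many nodes as replicas per range")
    replicas = Counter()
    leaseholders = Counter()
    for r in range(num_ranges):
        nodes = [(r + i) % num_nodes for i in range(replication_factor)]
        for n in nodes:
            replicas[n] += 1
        leaseholders[nodes[0]] += 1
    return replicas, leaseholders


for num_nodes in (3, 6):
    replicas, leaseholders = simulate(num_nodes)
    print(f"{num_nodes} nodes")
    for n in range(num_nodes):
        print(f"  node {n}: replicas={replicas[n]} leaseholders={leaseholders[n]}")
```

With 3 nodes, every node ends up with a replica of all 51 ranges, so a per-node replica count is indistinguishable from the cluster-wide range count. With 6 nodes, each node holds only a subset, and the per-node values differ.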

I'm not sure what the solution would be here besides docs, as changing the naming could confuse existing customers who are used to the current naming.

@daniel-crlabs
Contributor Author

Thank you for looking into this.

ranges in WebUI is number of leaseholders or (in case of no leaseholder on range, replicas that has this node as first replica in its list) so it is number of ranges that this node serves.

This is definitely confusing, especially since the UI has a specific graph for each of these (ranges, replicas and leaseholders per node) as you can see below:

Screenshot 2023-03-27 at 3 34 05 PM

Screenshot 2023-03-27 at 3 34 14 PM

Screenshot 2023-03-27 at 3 34 24 PM

The point of this issue, however, relates specifically to the CLI:

ranges in CLI is number of replicas on particular node (this corresponds to Replicas per Node chart on same replication dashboard)

This is exactly the point of this bug report: this is not what the CLI is showing for ranges. The CLI is NOT showing the number of replicas on a particular node; it is showing the number of replicas for all nodes combined. In the example below, 52 is the total number of ranges for the cluster, so if this were correct, it should show 16 (the number of ranges on a particular node).

Are you saying the CLI ranges = WebUI replicas per node ? If so, it seems the CLI needs to be fixed, so instead of saying ranges, it should say replicas per node.

[root@cockroachdb-0 cockroach]# cockroach node status --all --certs-dir /cockroach/cockroach-certs --format records | egrep "id|replicas_leaders|replicas_leaseholders|ranges"
id                     | 1
replicas_leaders       | 16
replicas_leaseholders  | 16
ranges                 | 52
ranges_unavailable     | 0
ranges_underreplicated | 0


id                     | 2
replicas_leaders       | 16
replicas_leaseholders  | 16
ranges                 | 52
ranges_unavailable     | 0
ranges_underreplicated | 0


id                     | 3
replicas_leaders       | 20
replicas_leaseholders  | 20
ranges                 | 52
ranges_unavailable     | 0
ranges_underreplicated | 0
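One way to cross-check the fields in the output above is to parse the records and sum replicas_leaseholders across nodes. The Python sketch below is only an illustration: parse_records is a hypothetical helper, not part of the cockroach CLI, and the sample input is the filtered output shown above.

```python
def parse_records(text: str) -> list[dict[str, str]]:
    """Parse `key | value` lines into one dict per node.

    Assumption: each node's block starts with an `id` line, as in the
    `--format records` output above. Lines without `|` are skipped.
    """
    records: list[dict[str, str]] = []
    for line in text.splitlines():
        if "|" not in line:
            continue
        key, value = (part.strip() for part in line.split("|", 1))
        if key == "id":
            records.append({})
        if records:
            records[-1][key] = value
    return records


SAMPLE = """\
id                     | 1
replicas_leaders       | 16
replicas_leaseholders  | 16
ranges                 | 52

id                     | 2
replicas_leaders       | 16
replicas_leaseholders  | 16
ranges                 | 52

id                     | 3
replicas_leaders       | 20
replicas_leaseholders  | 20
ranges                 | 52
"""

nodes = parse_records(SAMPLE)
total_leaseholders = sum(int(n["replicas_leaseholders"]) for n in nodes)
for n in nodes:
    print(f"node {n['id']}: ranges={n['ranges']} leaseholders={n['replicas_leaseholders']}")
print(f"sum of leaseholders across nodes: {total_leaseholders}")
```

Here the leaseholder counts sum to 52, the same value every node reports for ranges. In a 3-node cluster where every range has a replica on every node, that is consistent with either reading of ranges (per-node replicas or a cluster-wide total), so this check alone cannot tell them apart.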

@aliher1911
Contributor

Are you saying the CLI ranges = WebUI replicas per node ? If so, it seems the CLI needs to be fixed, so instead of saying ranges, it should say replicas per node.

I think that would be reasonable. Maybe just replicas would do, since we already have replicas_leaseholders, which is a subset of the counter in question.

@daniel-crlabs
Contributor Author

That sounds good, just trying to make it more consistent, whatever we decide to call it :-)
