_cat/recovery stage done but not 100% #7803

Closed
lefce opened this issue Sep 19, 2014 · 3 comments

lefce commented Sep 19, 2014

I have two clusters of three nodes each, running Elasticsearch version 1.3.2, build dee175d.

Network issues disconnected one node in each cluster; the nodes periodically rejoined and disconnected again before completing shard recovery, and finally rejoined and completed recovery successfully.

Cluster state is green and all shards are in state STARTED on both clusters.

But if I call the _cat/recovery API, I get a lot of shard entries with stage done but below 100 percent.

Maybe there is a small bug in the files_percent and bytes_percent calculation?

Example from one index; in about 70% of the entries, files_percent and bytes_percent are below 100%:

index             shard time   type       stage source_host          target_host          repository snapshot files files_percent bytes      bytes_percent
myindex.1       0     341    replica    done  esnode2 esnode1 n/a        n/a      13    38.5%         3366643    0.0%          
myindex.1       0     573    replica    done  esnode1 esnode2 n/a        n/a      13    7.7%          3366643    0.0%          
myindex.1       0     4992   replica    done  esnode1 esnode3 n/a        n/a      13    7.7%          3366643    0.0%          
myindex.1       1     433    replica    done  esnode2 esnode1 n/a        n/a      16    56.3%         626683     13.3%         
myindex.1       1     641    replica    done  esnode1 esnode2 n/a        n/a      16    6.3%          628222     0.1%          
myindex.1       1     2332   replica    done  esnode1 esnode3 n/a        n/a      16    12.5%         628222     0.1%          
myindex.1       2     736    replica    done  esnode2 esnode1 n/a        n/a      38    68.4%         3119317    50.4%         
myindex.1       2     666    replica    done  esnode1 esnode2 n/a        n/a      41    12.2%         3131721    0.1%          
myindex.1       2     1498   replica    done  esnode1 esnode3 n/a        n/a      41    12.2%         3131721    0.1%          
myindex.1       3     4611   replica    done  esnode2 esnode1 n/a        n/a      23    91.3%         9763636    99.7%         
myindex.1       3     724    replica    done  esnode1 esnode2 n/a        n/a      23    4.3%          9763636    0.0%          
myindex.1       3     30222  replica    done  esnode1 esnode3 n/a        n/a      23    4.3%          9763636    0.0%          
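
For reference, output like the table above comes from the cat recovery and cat shards endpoints; a minimal sketch of the calls (assuming a node listening on the default localhost:9200, adjust host and port for your setup):

curl localhost:9200/_cat/recovery?v
curl localhost:9200/_cat/shards?v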
lefce changed the title from "_cat/recovery state done but not 100%" to "_cat/recovery stage done but not 100%" on Sep 19, 2014

wkoot commented Apr 2, 2015

Experiencing a similar mismatch in v1.4.5; it seems like _cat/recovery is displaying old info instead of the latest:

[root@server6 es_backup]# curl localhost:9200/_cat/recovery?v
index               shard time    type    stage source_host             target_host             repository snapshot files files_percent bytes       bytes_percent
logstash-2015.04.02 0     483354  replica done  server9 server8 n/a        n/a      264   100.0%        12464831362 100.0%
logstash-2015.04.02 0     234888  replica done  server6 server9 n/a        n/a      344   96.2%         12132788120 56.1%
logstash-2015.03.30 0     11508   replica done  server7 server9 n/a        n/a      297   0.0%          25026154544 0.0%
logstash-2015.03.30 0     1899    replica done  server9 server7 n/a        n/a      297   0.3%          25026154544 0.0%
logstash-2015.03.31 0     2536    replica done  server6 server8 n/a        n/a      261   0.0%          24031903745 0.0%
logstash-2015.03.31 0     1065129 replica done  server8 server6 n/a        n/a      261   100.0%        24031903745 100.0%
logstash-2015.04.01 0     757     replica done  server6 server8 n/a        n/a      357   0.0%          24812247622 0.0%
logstash-2015.04.01 0     1461104 replica done  server8 server6 n/a        n/a      357   100.0%        24812247622 100.0%
kibana-int          0     249     replica done  server7 server9 n/a        n/a      10    10.0%         24745       0.9%
kibana-int          0     558     replica done  server9 server7 n/a        n/a      10    0.0%          24745       0.0%
logstash-2015.03.28 0     2247    gateway done  server6 server6 n/a        n/a      390   100.0%        24113948752 100.0%
logstash-2015.03.28 0     767     replica done  server6 server7 n/a        n/a      389   0.3%          24113948752 0.0%
logstash-2015.03.27 0     12602   replica done  server7 server9 n/a        n/a      320   0.0%          23894985390 0.0%
logstash-2015.03.27 0     3307    replica done  server9 server7 n/a        n/a      320   0.0%          23894985390 0.0%
logstash-2015.03.29 0     2839    replica done  server9 server8 n/a        n/a      377   0.0%          22019045112 0.0%
logstash-2015.03.29 0     8148    replica done  server8 server9 n/a        n/a      377   0.0%          22019045112 0.0%

Even though all primary and replica shards are started ok:

[root@server6 es_backup]# curl localhost:9200/_cat/shards?v
index               shard prirep state       docs  store host    node
logstash-2015.04.02 0     r      STARTED 44426778 13.2gb server8 Eric Williams
logstash-2015.04.02 0     p      STARTED 44424546 13.5gb server9 Kingo Sunen
logstash-2015.03.30 0     r      STARTED 76295131 23.3gb server9 Kingo Sunen
logstash-2015.03.30 0     p      STARTED 76295131 23.3gb server7 Black Panther
logstash-2015.03.31 0     p      STARTED 75755714 22.3gb server8 Eric Williams
logstash-2015.03.31 0     r      STARTED 75755714 22.3gb server6 Kehl of Tauran
logstash-2015.04.01 0     p      STARTED 77378857 23.1gb server8 Eric Williams
logstash-2015.04.01 0     r      STARTED 77378857 23.1gb server6 Kehl of Tauran
kibana-int          0     r      STARTED        3 24.2kb server9 Kingo Sunen
kibana-int          0     p      STARTED        3 24.2kb server7 Black Panther
logstash-2015.03.28 0     p      STARTED 74580676 22.4gb server6 Kehl of Tauran
logstash-2015.03.28 0     r      STARTED 74580676 22.4gb server7 Black Panther
logstash-2015.03.27 0     r      STARTED 74898627 22.2gb server9 Kingo Sunen
logstash-2015.03.27 0     p      STARTED 74898627 22.2gb server7 Black Panther
logstash-2015.03.29 0     p      STARTED 68759201 20.5gb server8 Eric Williams
logstash-2015.03.29 0     r      STARTED 68759201 20.5gb server9 Kingo Sunen

@clintongormley

@bleskes any ideas about this?


bleskes commented Apr 7, 2015

With 1.3.2 the file % shows the recovered bytes as a fraction of the total shard size. That means it will be below 100% if files are re-used. It's confusing, and we changed it in #9811 so that 100% means all bytes that needed to be recovered (instead of the total bytes).

I think we can close this, as the class is now refactored and cleaned up. If you still feel there are issues with it (or I missed something), please feel free to reopen.
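
To make the arithmetic concrete, here is a small sketch of the two semantics using awk and made-up numbers (a hypothetical 100 MB shard where 70 MB of existing files are re-used locally and only 30 MB are actually copied); the figures are illustrative, not taken from the reports above:

# Hypothetical shard: 100 MB total, 70 MB re-used from local files, 30 MB copied over the wire
total_bytes=104857600
recovered_bytes=31457280
bytes_to_recover=31457280

# 1.3.2 semantics: recovered bytes / total shard size -> stays below 100% even when the stage is done
awk -v r="$recovered_bytes" -v t="$total_bytes" 'BEGIN { printf "old bytes_percent: %.1f%%\n", 100 * r / t }'   # 30.0%

# Post-#9811 semantics: recovered bytes / bytes that needed recovery -> reaches 100% once recovery is done
awk -v r="$recovered_bytes" -v n="$bytes_to_recover" 'BEGIN { printf "new bytes_percent: %.1f%%\n", 100 * r / n }'   # 100.0%

The same reasoning applies to files_percent, counting files instead of bytes.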

bleskes closed this as completed Apr 7, 2015