volume.fix.replication "failed to place volume" #1253

Closed
antitbone opened this issue Apr 1, 2020 · 6 comments

Comments

@antitbone

Hello

SeaweedFS is a beautiful project that I discovered recently,
so I decided to run some resiliency tests.

Version 1.70
The test topology is as follows:
1 DC, 4 racks, 4 volume servers per rack
The default replication is 011
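
For readers unfamiliar with the notation: a SeaweedFS replication string has three digits, giving the number of extra copies in other data centers, in other racks of the same data center, and on other servers in the same rack. The sketch below decodes such a string and counts the total copies it asks for. `ReplicaPlacement` and `parse_replica_placement` are hypothetical names for illustration, not SeaweedFS code.

```python
from typing import NamedTuple


class ReplicaPlacement(NamedTuple):
    """Decoded form of a three-digit replication string such as "011"."""
    other_dcs: int     # extra copies in other data centers
    other_racks: int   # extra copies in other racks of the same data center
    same_rack: int     # extra copies on other servers in the same rack

    @property
    def copies(self) -> int:
        # The original copy plus every extra copy.
        return 1 + self.other_dcs + self.other_racks + self.same_rack


def parse_replica_placement(text: str) -> ReplicaPlacement:
    """Split a replication string into its three per-level counts."""
    if len(text) != 3 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected three digits, got {text!r}")
    return ReplicaPlacement(*(int(ch) for ch in text))


for value in ("011", "012", "022"):
    print(value, parse_replica_placement(value).copies)
```

Under this reading, 011 asks for 3 copies, 012 for 4, and 022 for 5.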

After some tests:

  • copy of several TB via the weed filer.copy command (copying continues)
  • rm -rf /working_dir on a volume node
  • poweroff the node
  • volume.balance -force
  • poweron node
  • volume.balance -force
  • ...
  • add new nodes
  • volume.balance -force

volume.fix.replication can no longer replicate all volumes:

failed to place volume 208 replica as 011, existing:[vau r2 test-seaweed-r2-h3:8080 vau r3 test-seaweed-r3-h2:8080]
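
To make the message easier to read, here is a minimal sketch that splits such a line into volume id, requested replication, and existing locations, then compares the number of existing copies with the number the replication string asks for. It assumes each location is printed as "dc rack node", which matches the line above. The function names are hypothetical, and this is not how volume.fix.replication itself chooses a target.

```python
import re
from collections import Counter

FAILURE = re.compile(
    r"failed to place volume (\d+) replica as (\d{3}), existing:\[([^\]]*)"
)

ERROR = ("failed to place volume 208 replica as 011, existing:"
         "[vau r2 test-seaweed-r2-h3:8080 vau r3 test-seaweed-r3-h2:8080]")


def parse_failure(line):
    """Return (volume_id, replication, [(dc, rack, node), ...])."""
    match = FAILURE.search(line)
    if match is None:
        raise ValueError("not a placement failure line")
    tokens = match.group(3).split()
    # Assumption: locations come in groups of three tokens; an incomplete
    # trailing group (for example from a truncated line) is dropped.
    usable = len(tokens) - len(tokens) % 3
    locations = [tuple(tokens[i:i + 3]) for i in range(0, usable, 3)]
    return int(match.group(1)), match.group(2), locations


def missing_copies(replication, locations):
    """Copies requested by the replication string minus copies present."""
    expected = 1 + sum(int(digit) for digit in replication)
    return expected - len(locations)


volume_id, replication, locations = parse_failure(ERROR)
per_rack = Counter(rack for _dc, rack, _node in locations)
print(volume_id, replication, dict(per_rack),
      missing_copies(replication, locations))
```

For the line above this reports one copy each in r2 and r3 and one copy missing, so the third copy would be expected on another server in one of those two racks.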

If I change the replication factor, volume.fix.replication works once:

> volume.configure.replication -replication 012 -volumeId 208
> volume.fix.replication -force
replicating volume 208 012 from test-seaweed-r2-h3:8080 to dataNode test-seaweed-r3-h4:8080 ...

And then it fails:

failed to place volume 174 replica as 012, existing:[vau r3 test-seaweed-r3-h1:8080 vau r1 test-seaweed-r1-h1:8080 vau r4 test-seaweed-r4-h3:8080]

A second attempt:

> volume.configure.replication -replication 022 -volumeId 208
> volume.fix.replication -force
replicating volume 208 022 from test-seaweed-r3-h2:8080 to dataNode test-seaweed-r4-h3:8080 ...

And then it fails again:

> volume.fix.replication -force
failed to place volume 208 replica as 022, existing:[vau r2 test-seaweed-r2-h3:8080 vau r4 test-seaweed-r4-h3:8080 vau r3 test-seaweed-r3-h2:8080 v

After setting the replication factor back to 011, the error no longer appears:

> volume.configure.replication -replication 011 -volumeId 208

In the master log:

[root@test-seaweed-r1-m ~]# journalctl -u seaweed-master   |egrep -e '(Volume 208|Id:208|208 )'
avril 01 09:36:27 test-seaweed-r1-m weed[40447]: I0401 09:36:27 40447 volume_growth.go:224] Created Volume 208 on topo:vau:r3:test-seaweed-r3-h3:8080
avril 01 09:36:27 test-seaweed-r1-m weed[40447]: I0401 09:36:27 40447 volume_growth.go:224] Created Volume 208 on topo:vau:r3:test-seaweed-r3-h2:8080
avril 01 09:36:27 test-seaweed-r1-m weed[40447]: I0401 09:36:27 40447 volume_layout.go:241] Volume 208 becomes writable
avril 01 09:36:27 test-seaweed-r1-m weed[40447]: I0401 09:36:27 40447 volume_growth.go:224] Created Volume 208 on topo:vau:r1:test-seaweed-r1-h2:8080
avril 01 09:37:40 test-seaweed-r1-m weed[40447]: I0401 09:37:40 40447 volume_layout.go:229] Volume 208 becomes unwritable
avril 01 12:23:17 test-seaweed-r1-m weed[40447]: copying volume 208 from test-seaweed-r1-h2:8080 to test-seaweed-r4-h4:8080
avril 01 12:23:17 test-seaweed-r1-m weed[40447]: moving volume ldi-prod_208 test-seaweed-r1-h2:8080 => test-seaweed-r4-h4:8080
avril 01 12:23:28 test-seaweed-r1-m weed[40447]: tailing volume 208 from test-seaweed-r1-h2:8080 to test-seaweed-r4-h4:8080
avril 01 12:23:40 test-seaweed-r1-m weed[40447]: deleting volume 208 from test-seaweed-r1-h2:8080
avril 01 12:23:41 test-seaweed-r1-m weed[40447]: moved volume 208 from test-seaweed-r1-h2:8080 to test-seaweed-r4-h4:8080
avril 01 12:23:41 test-seaweed-r1-m weed[40447]: I0401 12:23:41 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:011, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 12:42:10 test-seaweed-r1-m weed[40447]: I0401 12:42:10 40447 topology_vacuum.go:66] 0 Start vacuuming 208 on test-seaweed-r4-h4:8080
avril 01 12:42:10 test-seaweed-r1-m weed[40447]: I0401 12:42:10 40447 topology_vacuum.go:78] Complete vacuuming 208 on test-seaweed-r4-h4:8080
avril 01 12:42:10 test-seaweed-r1-m weed[40447]: I0401 12:42:10 40447 topology_vacuum.go:98] Start Committing vacuum 208 on test-seaweed-r4-h4:8080
avril 01 12:42:10 test-seaweed-r1-m weed[40447]: I0401 12:42:10 40447 topology_vacuum.go:112] Complete Committing vacuum 208 on test-seaweed-r4-h4:8080
avril 01 13:05:33 test-seaweed-r1-m weed[40447]: copying volume 208 from test-seaweed-r4-h4:8080 to test-seaweed-r3-h3:8080
avril 01 13:05:33 test-seaweed-r1-m weed[40447]: moving volume ldi-prod_208 test-seaweed-r4-h4:8080 => test-seaweed-r3-h3:8080
avril 01 13:05:33 test-seaweed-r1-m weed[40447]: I0401 13:05:33 40447 master_server.go:242] error: copy volume 208 from test-seaweed-r4-h4:8080 to test-seaweed-r3-h3:8080: rpc error: code = Unknown desc = volume 208 already exists
avril 01 13:50:59 test-seaweed-r1-m weed[40447]: I0401 13:50:59 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:011, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 13:56:05 test-seaweed-r1-m weed[40447]: I0401 13:56:05 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:011, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 14:05:53 test-seaweed-r1-m weed[40447]: I0401 14:05:53 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:011, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 14:12:13 test-seaweed-r1-m weed[40447]: I0401 14:12:13 40447 topology_vacuum.go:66] 0 Start vacuuming 208 on test-seaweed-r3-h2:8080
avril 01 14:12:13 test-seaweed-r1-m weed[40447]: I0401 14:12:13 40447 topology_vacuum.go:78] Complete vacuuming 208 on test-seaweed-r3-h2:8080
avril 01 14:12:13 test-seaweed-r1-m weed[40447]: I0401 14:12:13 40447 topology_vacuum.go:98] Start Committing vacuum 208 on test-seaweed-r3-h2:8080
avril 01 14:12:13 test-seaweed-r1-m weed[40447]: I0401 14:12:13 40447 topology_vacuum.go:112] Complete Committing vacuum 208 on test-seaweed-r3-h2:8080
avril 01 15:04:38 test-seaweed-r1-m weed[40447]: I0401 15:04:38 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:011, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 15:04:38 test-seaweed-r1-m weed[40447]: I0401 15:04:38 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:011, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 15:07:05 test-seaweed-r1-m weed[40447]: I0401 15:07:05 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:012, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 15:07:05 test-seaweed-r1-m weed[40447]: I0401 15:07:05 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:011, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 15:07:05 test-seaweed-r1-m weed[40447]: I0401 15:07:05 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:012, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 15:11:30 test-seaweed-r1-m weed[40447]: I0401 15:11:30 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:022, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 15:11:30 test-seaweed-r1-m weed[40447]: I0401 15:11:30 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:022, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
avril 01 15:11:30 test-seaweed-r1-m weed[40447]: I0401 15:11:30 40447 volume_layout.go:241] Volume 208 becomes writable
avril 01 15:11:30 test-seaweed-r1-m weed[40447]: I0401 15:11:30 40447 topology.go:182] removing volume info:Id:208, Size:0, ReplicaPlacement:022, Collection:ldi-prod, Version:3, FileCount:0, DeleteCount:0, DeletedByteCount:0, ReadOnly:false
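
As a reading aid for a log like this, the sketch below replays only the "Created Volume" and "moved volume" lines for one volume id and returns the set of servers those lines imply. It deliberately ignores failed copies, vacuum entries, and "removing volume info" lines, so it is not a reconstruction of the master's topology. The function name and the shortened sample are illustrative assumptions.

```python
import re

CREATED = re.compile(r"Created Volume (\d+) on topo:[^:]+:[^:]+:(\S+)")
MOVED = re.compile(r"moved volume (\d+) from (\S+) to (\S+)")

# Message parts of a few lines from the log above (prefixes removed).
SAMPLE = [
    "Created Volume 208 on topo:vau:r3:test-seaweed-r3-h3:8080",
    "Created Volume 208 on topo:vau:r3:test-seaweed-r3-h2:8080",
    "Created Volume 208 on topo:vau:r1:test-seaweed-r1-h2:8080",
    "moved volume 208 from test-seaweed-r1-h2:8080 to test-seaweed-r4-h4:8080",
]


def replay(lines, volume_id):
    """Return the servers implied by creation and completed-move lines."""
    servers = set()
    for line in lines:
        created = CREATED.search(line)
        if created and int(created.group(1)) == volume_id:
            servers.add(created.group(2))
            continue
        moved = MOVED.search(line)
        if moved and int(moved.group(1)) == volume_id:
            servers.discard(moved.group(2))  # source no longer holds it
            servers.add(moved.group(3))      # destination now holds it
    return servers


print(sorted(replay(SAMPLE, 208)))
```

Comparing this set with the "existing" list in a placement failure can show where the shell's view and the log diverge.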

Antoine

@jameshartig
Contributor

We ran into the same issue but with -replication 210. The volume was originally replicated 200 and we were trying to increase it to 210 but it was failing with the same error.

@chrislusf
Collaborator

I added test cases for complicated moves. Hopefully this resolves the issue. Let me know whether you can confirm.

If it still does not work, please also post the output of volume.list, so that I can create a test case for it.

@chrislusf chrislusf reopened this Apr 2, 2020
@antitbone
Author

I confirm that this issue seems to be resolved.

before.txt
after.txt

@antitbone
Author

antitbone commented Apr 2, 2020

After running volume.fix.replication, volume.balance displays errors.

The volume.list output is in after.txt from my previous comment.

> volume.balance -force

moving volume ldi-prod_189 test-seaweed-r2-h1:8080 => test-seaweed-r3-h1:8080
2020-04-02 15:07:46.678703 I | copying volume 189 from test-seaweed-r2-h1:8080 to test-seaweed-r3-h1:8080
error: copy volume 189 from test-seaweed-r2-h1:8080 to test-seaweed-r3-h1:8080: rpc error: code = Unknown desc = stat idx file .idx failed, stat .idx: no such file or directory

moving volume ldi-prod_193 test-seaweed-r2-h1:8080 => test-seaweed-r2-h3:8080
2020-04-02 15:08:57.801715 I | copying volume 193 from test-seaweed-r2-h1:8080 to test-seaweed-r2-h3:8080
error: copy volume 193 from test-seaweed-r2-h1:8080 to test-seaweed-r2-h3:8080: rpc error: code = Unknown desc = volume 193 already exists

moving volume ldi-prod_258 test-seaweed-r1-h4:8080 => test-seaweed-r2-h3:8080
2020-04-02 15:09:39.143338 I | copying volume 258 from test-seaweed-r1-h4:8080 to test-seaweed-r2-h3:8080
error: copy volume 258 from test-seaweed-r1-h4:8080 to test-seaweed-r2-h3:8080: rpc error: code = Unknown desc = volume 258 already exists

moving volume ldi-prod_142 test-seaweed-r3-h4:8080 => test-seaweed-r4-h1:8080
2020-04-02 15:11:22.999930 I | copying volume 142 from test-seaweed-r3-h4:8080 to test-seaweed-r4-h1:8080
error: copy volume 142 from test-seaweed-r3-h4:8080 to test-seaweed-r4-h1:8080: rpc error: code = Unknown desc = volume 142 already exists

moving volume ldi-prod_138 test-seaweed-r1-h3:8080 => test-seaweed-r4-h1:8080
2020-04-02 15:13:06.427066 I | copying volume 138 from test-seaweed-r1-h3:8080 to test-seaweed-r4-h1:8080
error: copy volume 138 from test-seaweed-r1-h3:8080 to test-seaweed-r4-h1:8080: rpc error: code = Unknown desc = volume 138 already exists
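
With several failures in one run, it can help to group them by cause. The sketch below does that for "error: copy volume ..." lines, replacing the volume id inside the description with N so that identical causes collapse into one key. The helper name is hypothetical, and the sample contains only lines copied from the output above.

```python
import re
from collections import defaultdict

COPY_ERROR = re.compile(
    r"error: copy volume (\d+) from \S+ to \S+: "
    r"rpc error: code = \w+ desc = (.*)"
)

SAMPLE = [
    "error: copy volume 189 from test-seaweed-r2-h1:8080 to "
    "test-seaweed-r3-h1:8080: rpc error: code = Unknown desc = "
    "stat idx file .idx failed, stat .idx: no such file or directory",
    "error: copy volume 193 from test-seaweed-r2-h1:8080 to "
    "test-seaweed-r2-h3:8080: rpc error: code = Unknown desc = "
    "volume 193 already exists",
    "error: copy volume 258 from test-seaweed-r1-h4:8080 to "
    "test-seaweed-r2-h3:8080: rpc error: code = Unknown desc = "
    "volume 258 already exists",
]


def group_copy_errors(lines):
    """Map each normalized error description to the affected volume ids."""
    groups = defaultdict(list)
    for line in lines:
        match = COPY_ERROR.search(line)
        if match is None:
            continue  # not a copy error line
        cause = re.sub(r"volume \d+", "volume N", match.group(2).strip())
        groups[cause].append(int(match.group(1)))
    return dict(groups)


for cause, volumes in group_copy_errors(SAMPLE).items():
    print(volumes, cause)
```

On the full output above, this would separate the single missing .idx failure (volume 189) from the four "already exists" failures (volumes 193, 258, 142, 138).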

@antitbone antitbone reopened this Apr 2, 2020
@chrislusf
Collaborator

chrislusf commented Apr 2, 2020

Added a fix for the copying process. Your environment may need some cleanup first.

@chrislusf chrislusf reopened this Apr 2, 2020
@antitbone
Author

cea52a4 fixes the previous errors.

My test environment probably needs to be completely reset.
Until then, it may still generate some new errors.
