[Bug]- weed couldn't estimate the free space of destination disks for transferring ec shards even after adjusting "Max" option to 0 #2642

Closed
hamidreza-hosseini opened this issue Feb 7, 2022 · 9 comments


hamidreza-hosseini commented Feb 7, 2022

Hi @chrislusf,
Following our conversation about this issue, I reconfigured my cluster and set the "Max" option to "0", so weed estimates the best value from the disk size.
For example, on a disk with 7.3 TB of space, weed set the max parameter to 73.
That works well, but on some disks with the same amount of space it sets max to 218 or even more: weed-volume-005.local:8082 also has 7.3 TB, yet weed set max to 218, which caused an error when transferring shards.
This happens because of the number of volumes already sitting on those specific disks.
It causes problems when encoding volumes to ec shards, and even blocks writes to some volumes of a collection: the collection's volumes are not full enough for weed to create a new one, yet the disk itself is full, so weed cannot accept more objects and files.
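
If I understand the behavior right, with "Max" set to 0 the estimate works roughly like the sketch below. This is only my own illustration in Go; the function name and the exact formula are assumptions, not the actual SeaweedFS code:

    // Hypothetical sketch: with Max=0, the slot count seems to be derived
    // from disk capacity (and the volumes already on the disk), not from
    // the bytes actually free. All names here are made up for illustration.
    package main

    import "fmt"

    func estimateMaxVolumes(existingVolumes int, freeBytes, volumeSizeLimit uint64) int {
        return existingVolumes + int(freeBytes/volumeSizeLimit)
    }

    func main() {
        const limit = 100 << 30 // assuming a 100 GB volume size limit

        // Empty 7.3 TB disk: about 73 slots, matching what weed reported.
        fmt.Println(estimateMaxVolumes(0, 7300<<30, limit))

        // A same-sized disk that already holds 218 small, partially filled
        // volumes: each existing volume counts as a slot, so the estimate
        // inflates to 218+ even though the disk itself is nearly full.
        fmt.Println(estimateMaxVolumes(218, 0, limit))
    }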

[screenshot attached in the original issue]

I tested ec.encode again and got the error below. weed-volume-005.local:8082 is the server with 218 volumes, and it causes the failure:

> ec.encode -collection 'Channel_20' -volumeId "811" -parallelCopy false 
markVolumeReadonly 811 on weed-volume-002.local:8085 ...
markVolumeReadonly 811 on weed-volume-001.local:8080 ...
generateEcShards Channel_20 811 on weed-volume-002.local:8085 ...
parallelCopyEcShardsFromSource 811 weed-volume-002.local:8085
allocate 811.[4] weed-volume-002.local:8085 => weed-volume-028.local:8082
copy 811.[4] weed-volume-002.local:8085 => weed-volume-028.local:8082
allocate 811.[11] weed-volume-002.local:8085 => weed-volume-004.local:8083
copy 811.[11] weed-volume-002.local:8085 => weed-volume-004.local:8083
allocate 811.[10] weed-volume-002.local:8085 => weed-volume-013.local:8082
copy 811.[10] weed-volume-002.local:8085 => weed-volume-013.local:8082
allocate 811.[9] weed-volume-002.local:8085 => weed-volume-001.local:8083
copy 811.[9] weed-volume-002.local:8085 => weed-volume-001.local:8083
allocate 811.[0] weed-volume-002.local:8085 => weed-volume-009.local:8081
copy 811.[0] weed-volume-002.local:8085 => weed-volume-009.local:8081
allocate 811.[3] weed-volume-002.local:8085 => weed-volume-006.local:8081
copy 811.[3] weed-volume-002.local:8085 => weed-volume-006.local:8081
allocate 811.[5] weed-volume-002.local:8085 => weed-volume-005.local:8082
copy 811.[5] weed-volume-002.local:8085 => weed-volume-005.local:8082
allocate 811.[13] weed-volume-002.local:8085 => weed-volume-009.local:8080
allocate 811.[8] weed-volume-002.local:8085 => weed-volume-014.local:8082
allocate 811.[1] weed-volume-002.local:8085 => weed-volume-016.local:8086
allocate 811.[2] weed-volume-002.local:8085 => weed-volume-003.local:8086
allocate 811.[6] weed-volume-002.local:8085 => weed-volume-006.local:8085
allocate 811.[7] weed-volume-002.local:8085 => weed-volume-007.local:8080
copy 811.[13] weed-volume-002.local:8085 => weed-volume-009.local:8080
allocate 811.[12] weed-volume-002.local:8085 => weed-volume-004.local:8080
copy 811.[12] weed-volume-002.local:8085 => weed-volume-004.local:8080
copy 811.[8] weed-volume-002.local:8085 => weed-volume-014.local:8082
copy 811.[1] weed-volume-002.local:8085 => weed-volume-016.local:8086
copy 811.[2] weed-volume-002.local:8085 => weed-volume-003.local:8086
copy 811.[6] weed-volume-002.local:8085 => weed-volume-006.local:8085
copy 811.[7] weed-volume-002.local:8085 => weed-volume-007.local:8080
mount 811.[4] on weed-volume-028.local:8082
I0207 17:36:36 11508 command_ec_common.go:95] weed-volume-002.local:8085 ec volume 811 deletes shards [4]
unmount 811.[5] from weed-volume-005.local:8082
delete 811.[5] from weed-volume-005.local:8082
remove aborted shards 811.[5] on weed-volume-005.local:8082: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[6] from weed-volume-006.local:8085
delete 811.[6] from weed-volume-006.local:8085
remove aborted shards 811.[6] on weed-volume-006.local:8085: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[7] from weed-volume-007.local:8080
delete 811.[7] from weed-volume-007.local:8080
remove aborted shards 811.[7] on weed-volume-007.local:8080: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[8] from weed-volume-014.local:8082
delete 811.[8] from weed-volume-014.local:8082
remove aborted shards 811.[8] on weed-volume-014.local:8082: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[9] from weed-volume-001.local:8083
delete 811.[9] from weed-volume-001.local:8083
remove aborted shards 811.[9] on weed-volume-001.local:8083: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[10] from weed-volume-013.local:8082
delete 811.[10] from weed-volume-013.local:8082
remove aborted shards 811.[10] on weed-volume-013.local:8082: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[11] from weed-volume-004.local:8083
delete 811.[11] from weed-volume-004.local:8083
remove aborted shards 811.[11] on weed-volume-004.local:8083: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[12] from weed-volume-004.local:8080
delete 811.[12] from weed-volume-004.local:8080
remove aborted shards 811.[12] on weed-volume-004.local:8080: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[13] from weed-volume-009.local:8080
delete 811.[13] from weed-volume-009.local:8080
remove aborted shards 811.[13] on weed-volume-009.local:8080: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[0] from weed-volume-009.local:8081
delete 811.[0] from weed-volume-009.local:8081
remove aborted shards 811.[0] on weed-volume-009.local:8081: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[1] from weed-volume-016.local:8086
delete 811.[1] from weed-volume-016.local:8086
remove aborted shards 811.[1] on weed-volume-016.local:8086: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[2] from weed-volume-003.local:8086
delete 811.[2] from weed-volume-003.local:8086
remove aborted shards 811.[2] on weed-volume-003.local:8086: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[3] from weed-volume-006.local:8081
delete 811.[3] from weed-volume-006.local:8081
remove aborted shards 811.[3] on weed-volume-006.local:8081: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
unmount 811.[4] from weed-volume-028.local:8082
delete 811.[4] from weed-volume-028.local:8082
error: spread ec shards for volume 811 from weed-volume-002.local:8085: copy 811.[5] weed-volume-002.local:8085 => weed-volume-005.local:8082 : rpc error: code = Unknown desc = no space left

Why is weed trying to transfer shards to a server whose max is 218 and which is already filled with 218 volumes?

Wouldn't it be better for weed to estimate the destination's capacity from the actual free space on disk, not just from the count of volume slots? Something like the sketch below.
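
To make the suggestion concrete, here is a minimal sketch in Go of checking a destination's real free space before copying a shard. This is my proposal, not existing SeaweedFS code, and it is Linux-specific (syscall.Statfs):

    // Hypothetical pre-copy check based on the bytes actually free on the
    // destination disk, instead of volume slot counts. Linux-only.
    package main

    import (
        "fmt"
        "syscall"
    )

    // freeBytes returns the space available to unprivileged users on the
    // filesystem containing path.
    func freeBytes(path string) (uint64, error) {
        var st syscall.Statfs_t
        if err := syscall.Statfs(path, &st); err != nil {
            return 0, err
        }
        return st.Bavail * uint64(st.Bsize), nil
    }

    // canReceiveShard reports whether dir has room for a shard of
    // shardSize bytes, keeping a safety margin.
    func canReceiveShard(dir string, shardSize uint64) (bool, error) {
        free, err := freeBytes(dir)
        if err != nil {
            return false, err
        }
        const margin = 1 << 30 // keep at least 1 GB free
        return free > shardSize+margin, nil
    }

    func main() {
        ok, err := canReceiveShard("/data", 2<<30) // e.g. a ~2 GB shard
        fmt.Println(ok, err)
    }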

chrislusf commented Feb 8, 2022

The volumes are balanced according to the volume "slots", not based on the actual free space.
The free space information is already used when estimating with "max=0".

For your case, you can move some of the 218 volumes to other servers to get back to a normal state.
You can use volume.move in weed shell to move specific volumes to a specific volume server.
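
For example (flag names are from memory, so confirm the exact syntax with "help volume.move" in weed shell; 123 is a placeholder volume id):

    > volume.move -source weed-volume-005.local:8082 -target weed-volume-028.local:8082 -volumeId 123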

hamidreza-hosseini commented:

> The volumes are balanced according to the volume "slots", not based on the actual free space. The free space information is already used when estimating with "max=0".
>
> For your case, you can move some of the 218 volumes to other servers, to get to an expected normal state. You can use volume.move in weed shell to move specific volumes to a specific volume server.

@chrislusf
OK, but why is weed trying to transfer shards onto an already full disk?
The max is 218, and there are already 218 volumes on that disk...

chrislusf commented:

Maybe share the "volume.list" result (again) and I will try to reproduce it locally.


hamidreza-hosseini commented Feb 8, 2022

> maybe share "volume.list" result (again) and I will try to reproduce it locally.

Here you are, @chrislusf:
volume.list.md

chrislusf commented:

The volume.list output is malformed.


hamidreza-hosseini commented Feb 8, 2022

> the volume.list output is malformed.

New output, @chrislusf:

org.volume.list.md

hamidreza-hosseini commented:

> the volume.list output is malformed.

Is the new file I've uploaded in the correct format, @chrislusf?

chrislusf commented:

Works now. Added a fix.


hamidreza-hosseini commented Feb 19, 2022

Hi @chrislusf,
I've upgraded my weed cluster to the new version with your fix (2.89).
It works, but when spreading ec shards to the servers it does not consider defaultReplication=010 (I've configured every disk as a server and every server as a rack, which is why I set replication to 010).
After making ec shards, weed sends 4 shards to a single server, which means that if I lose one server in an outage I have to rebuild the ec shards, with no margin for losing another server in the meantime.
It should spread the shards based on "defaultReplication=010"; see the sketch below for the spread I would expect.
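
For comparison, the spread I would expect looks roughly like this sketch: round-robin across racks, so that with enough racks no rack holds more than one shard. This is only my illustration, not the SeaweedFS placement code:

    // Hypothetical round-robin spread of the 14 ec shards (10 data +
    // 4 parity) across racks. With enough racks each rack holds one
    // shard, so losing a single rack loses a single shard.
    package main

    import "fmt"

    func spreadShards(shardCount int, racks []string) map[string][]int {
        placement := make(map[string][]int)
        for shard := 0; shard < shardCount; shard++ {
            rack := racks[shard%len(racks)]
            placement[rack] = append(placement[rack], shard)
        }
        return placement
    }

    func main() {
        // In my topology every server is its own rack.
        racks := []string{"weed-volume-025", "weed-volume-027", "weed-volume-028", "weed-volume-029"}
        // With only 4 racks, each rack still gets up to 4 of the 14
        // shards, but with 14 or more racks none would get more than one.
        fmt.Println(spreadShards(14, racks))
    }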
weed shell log:

> ec.encode -collection=Channel_10 -volumeId=58 -parallelCopy=false
markVolumeReadonly 58 on weed-volume-011.local:8082 ...
markVolumeReadonly 58 on weed-volume-009.local:8080 ...
generateEcShards Channel_10 58 on weed-volume-011.local:8082 ...
parallelCopyEcShardsFromSource 58 weed-volume-011.local:8082
allocate 58.[12] weed-volume-011.local:8082 => weed-volume-028.local:8082
copy 58.[12] weed-volume-011.local:8082 => weed-volume-028.local:8082
mount 58.[12] on weed-volume-028.local:8082
I0219 17:32:49  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [12]
allocate 58.[13] weed-volume-011.local:8082 => weed-volume-028.local:8086
copy 58.[13] weed-volume-011.local:8082 => weed-volume-028.local:8086
mount 58.[13] on weed-volume-028.local:8086
I0219 17:34:37  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [13]
allocate 58.[0] weed-volume-011.local:8082 => weed-volume-029.local:8085
copy 58.[0] weed-volume-011.local:8082 => weed-volume-029.local:8085
mount 58.[0] on weed-volume-029.local:8085
I0219 17:36:25  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [0]
allocate 58.[1] weed-volume-011.local:8082 => weed-volume-028.local:8084
copy 58.[1] weed-volume-011.local:8082 => weed-volume-028.local:8084
mount 58.[1] on weed-volume-028.local:8084
I0219 17:38:13  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [1]
allocate 58.[2] weed-volume-011.local:8082 => weed-volume-027.local:8081
copy 58.[2] weed-volume-011.local:8082 => weed-volume-027.local:8081
mount 58.[2] on weed-volume-027.local:8081
I0219 17:40:01  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [2]
allocate 58.[3] weed-volume-011.local:8082 => weed-volume-029.local:8080
copy 58.[3] weed-volume-011.local:8082 => weed-volume-029.local:8080
mount 58.[3] on weed-volume-029.local:8080
I0219 17:41:50  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [3]
allocate 58.[4] weed-volume-011.local:8082 => weed-volume-027.local:8086
copy 58.[4] weed-volume-011.local:8082 => weed-volume-027.local:8086
mount 58.[4] on weed-volume-027.local:8086
I0219 17:43:37  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [4]
allocate 58.[5] weed-volume-011.local:8082 => weed-volume-029.local:8084
copy 58.[5] weed-volume-011.local:8082 => weed-volume-029.local:8084
mount 58.[5] on weed-volume-029.local:8084
I0219 17:45:25  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [5]
allocate 58.[6] weed-volume-011.local:8082 => weed-volume-027.local:8085
copy 58.[6] weed-volume-011.local:8082 => weed-volume-027.local:8085
mount 58.[6] on weed-volume-027.local:8085
I0219 17:47:13  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [6]
allocate 58.[7] weed-volume-011.local:8082 => weed-volume-025.local:8082
copy 58.[7] weed-volume-011.local:8082 => weed-volume-025.local:8082
mount 58.[7] on weed-volume-025.local:8082
I0219 17:49:01  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [7]
allocate 58.[8] weed-volume-011.local:8082 => weed-volume-029.local:8081
copy 58.[8] weed-volume-011.local:8082 => weed-volume-029.local:8081
mount 58.[8] on weed-volume-029.local:8081
I0219 17:50:49  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [8]
allocate 58.[9] weed-volume-011.local:8082 => weed-volume-025.local:8080
copy 58.[9] weed-volume-011.local:8082 => weed-volume-025.local:8080
mount 58.[9] on weed-volume-025.local:8080
I0219 17:52:36  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [9]
allocate 58.[10] weed-volume-011.local:8082 => weed-volume-028.local:8083
copy 58.[10] weed-volume-011.local:8082 => weed-volume-028.local:8083
mount 58.[10] on weed-volume-028.local:8083
I0219 17:54:23  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [10]
allocate 58.[11] weed-volume-011.local:8082 => weed-volume-027.local:8082
copy 58.[11] weed-volume-011.local:8082 => weed-volume-027.local:8082
mount 58.[11] on weed-volume-027.local:8082
I0219 17:56:12  6577 command_ec_common.go:95] weed-volume-011.local:8082 ec volume 58 deletes shards [11]
unmount 58.[12 13 0 1 2 3 4 5 6 7 8 9 10 11] from weed-volume-011.local:8082
delete 58.[12 13 0 1 2 3 4 5 6 7 8 9 10 11] from weed-volume-011.local:8082
delete volume 58 from weed-volume-011.local:8082
delete volume 58 from weed-volume-009.local:8080
> 

Here is a second attempt, encoding another volume.
As you can see in the log, weed transfers the shards to only a few specific nodes, up to 4 shards per node,
so losing one node leaves no chance of surviving the loss of another:

> ec.encode -collection=UserCloud_10 -volumeId=94
markVolumeReadonly 94 on weed-volume-004.local:8081 ...
markVolumeReadonly 94 on weed-volume-007.local:8080 ...      
generateEcShards UserCloud_10 94 on weed-volume-004.local:8081 ...
parallelCopyEcShardsFromSource 94 weed-volume-004.local:8081
allocate 94.[12] weed-volume-004.local:8081 => weed-volume-028.local:8082
allocate 94.[0] weed-volume-004.local:8081 => weed-volume-028.local:8086
allocate 94.[13] weed-volume-004.local:8081 => weed-volume-029.local:8085
allocate 94.[11] weed-volume-004.local:8081 => weed-volume-025.local:8083
allocate 94.[5] weed-volume-004.local:8081 => weed-volume-029.local:8084
allocate 94.[8] weed-volume-004.local:8081 => weed-volume-025.local:8080
allocate 94.[1] weed-volume-004.local:8081 => weed-volume-028.local:8084
allocate 94.[6] weed-volume-004.local:8081 => weed-volume-027.local:8085
allocate 94.[2] weed-volume-004.local:8081 => weed-volume-027.local:8081
allocate 94.[7] weed-volume-004.local:8081 => weed-volume-025.local:8082
allocate 94.[3] weed-volume-004.local:8081 => weed-volume-027.local:8086
allocate 94.[4] weed-volume-004.local:8081 => weed-volume-029.local:8080
allocate 94.[10] weed-volume-004.local:8081 => weed-volume-027.local:8082
allocate 94.[9] weed-volume-004.local:8081 => weed-volume-029.local:8081
copy 94.[12] weed-volume-004.local:8081 => weed-volume-028.local:8082
copy 94.[0] weed-volume-004.local:8081 => weed-volume-028.local:8086
copy 94.[13] weed-volume-004.local:8081 => weed-volume-029.local:8085
copy 94.[5] weed-volume-004.local:8081 => weed-volume-029.local:8084
copy 94.[8] weed-volume-004.local:8081 => weed-volume-025.local:8080
copy 94.[1] weed-volume-004.local:8081 => weed-volume-028.local:8084
copy 94.[6] weed-volume-004.local:8081 => weed-volume-027.local:8085
copy 94.[2] weed-volume-004.local:8081 => weed-volume-027.local:8081
copy 94.[7] weed-volume-004.local:8081 => weed-volume-025.local:8082
copy 94.[3] weed-volume-004.local:8081 => weed-volume-027.local:8086
copy 94.[4] weed-volume-004.local:8081 => weed-volume-029.local:8080
copy 94.[10] weed-volume-004.local:8081 => weed-volume-027.local:8082
copy 94.[9] weed-volume-004.local:8081 => weed-volume-029.local:8081
copy 94.[11] weed-volume-004.local:8081 => weed-volume-025.local:8083
mount 94.[2] on weed-volume-027.local:8081
I0219 20:21:18 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [2]
mount 94.[5] on weed-volume-029.local:8084
I0219 20:21:37 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [5]
mount 94.[12] on weed-volume-028.local:8082
I0219 20:21:48 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [12]
mount 94.[6] on weed-volume-027.local:8085
I0219 20:21:49 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [6]
mount 94.[3] on weed-volume-027.local:8086
I0219 20:21:55 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [3]
mount 94.[4] on weed-volume-029.local:8080
I0219 20:21:56 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [4]
mount 94.[1] on weed-volume-028.local:8084
I0219 20:21:57 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [1]
mount 94.[13] on weed-volume-029.local:8085
I0219 20:21:59 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [13]
mount 94.[0] on weed-volume-028.local:8086
I0219 20:22:28 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [0]
mount 94.[10] on weed-volume-027.local:8082
I0219 20:22:45 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [10]
mount 94.[8] on weed-volume-025.local:8080
I0219 20:23:04 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [8]
mount 94.[11] on weed-volume-025.local:8083
I0219 20:23:08 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [11]
mount 94.[9] on weed-volume-029.local:8081
I0219 20:24:43 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [9]
mount 94.[7] on weed-volume-025.local:8082
I0219 20:25:04 26756 command_ec_common.go:95] weed-volume-004.local:8081 ec volume 94 deletes shards [7]
unmount 94.[2 5 12 6 3 4 1 13 0 10 8 11 9 7] from weed-volume-004.local:8081
delete 94.[2 5 12 6 3 4 1 13 0 10 8 11 9 7] from weed-volume-004.local:8081
delete volume 94 from weed-volume-004.local:8081
delete volume 94 from weed-volume-007.local:8080
>

Many thanks.
