
Volume file becomes read-only and writes fail after the master leader finishes committing a vacuum. #1233

Closed
binwu5 opened this issue Mar 17, 2020 · 2 comments


@binwu5

binwu5 commented Mar 17, 2020

Describe the bug
-volumeSizeLimitMB=50000; it was changed from 30000 to this value three months ago. Before vacuuming, the volume's data size was about 36G, of which about 2G was trash (deleted data), and writes were working normally.

After the master completes committing the vacuum:

  1. Why does the volume file become read-only?
  2. Why does the filer still write files to the read-only volume?

This occurs in both linux_amd64_large_disk-v1.61 and linux_amd64_large_disk-v1.63.
On the running cluster, I tried restarting the master leader node. Vacuuming then stopped and the filer no longer wrote files to the read-only volume, but the volume file stayed read-only.
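The logs in this report show the master marking volume 13 writable at the end of the commit, while each volume server logged an integrity-check failure during its commit and afterwards rejected writes with "volume 13 is read only". A toy model of that mismatch, using hypothetical types (replica, volumeLayout, vacuum) rather than SeaweedFS's own topology code, suggests why the filer could keep being sent to volume 13: the master's writable flag and the volume servers' read-only flags are tracked separately. The checkReplicas branch only illustrates one way to keep the two views in agreement; it is not the fix that shipped in 1.64.

```go
package main

import "fmt"

// replica is a hypothetical model of one volume server's copy of a volume.
type replica struct {
	addr     string
	readOnly bool // read-only flag held by the volume server itself
}

// volumeLayout is a hypothetical model of the master's view of one volume.
type volumeLayout struct {
	id       int
	writable bool // whether the master assigns new file IDs on this volume
	replicas []*replica
}

// vacuum follows the order in the master log: mark unwritable, compact all
// replicas, commit them one by one, then mark writable again. reloadReadOnly
// lists replicas that come back read-only after their commit. If
// checkReplicas is false, the master ignores the replicas' flags.
func (vl *volumeLayout) vacuum(reloadReadOnly map[string]bool, checkReplicas bool) {
	vl.writable = false
	// Compaction is not modeled; only the commit changes replica state here.
	for _, r := range vl.replicas {
		if reloadReadOnly[r.addr] {
			r.readOnly = true
		}
	}
	vl.writable = true
	if checkReplicas {
		for _, r := range vl.replicas {
			if r.readOnly {
				vl.writable = false
			}
		}
	}
}

// consistent reports whether the master's view agrees with every replica.
func (vl *volumeLayout) consistent() bool {
	for _, r := range vl.replicas {
		if vl.writable && r.readOnly {
			return false
		}
	}
	return true
}

func newVolume13() *volumeLayout {
	return &volumeLayout{id: 13, replicas: []*replica{
		{addr: "ip1:8180"}, {addr: "ip2:8180"}, {addr: "ip3:8180"},
	}}
}

func main() {
	allFail := map[string]bool{"ip1:8180": true, "ip2:8180": true, "ip3:8180": true}

	naive := newVolume13()
	naive.vacuum(allFail, false)
	fmt.Println(naive.writable, naive.consistent()) // true false

	checked := newVolume13()
	checked.vacuum(allFail, true)
	fmt.Println(checked.writable, checked.consistent()) // false true
}
```

In the naive run the master advertises a writable volume that every replica refuses, which matches the filer's failed POSTs.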

In another cluster running the same version with -volumeSizeLimitMB=1000, the volume's data size is almost 1000M, of which 300M is trash. Read and write rates in this cluster are low. After vacuuming, everything is fine.
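The sizes above can be compared with the limits using a little arithmetic. This sketch assumes -volumeSizeLimitMB is counted in MiB (1024*1024 bytes) and reads "36G" and "2G" as GiB; overLimit and garbageRatio are hypothetical helpers. It shows that a 36 GiB volume is over the old 30000 MB limit but under the current 50000 MB one, and that trash is roughly 6% of the large volume versus 30% of the small one. Whether either number plays a part in this bug is not established by the report.

```go
package main

import "fmt"

// mib is one mebibyte; -volumeSizeLimitMB is assumed to be counted in MiB.
const mib = 1024 * 1024

// overLimit reports whether dataBytes has reached a volume size limit given in MB.
func overLimit(dataBytes, limitMB int64) bool {
	return dataBytes >= limitMB*mib
}

// garbageRatio is the fraction of a volume occupied by deleted data.
func garbageRatio(garbageBytes, dataBytes int64) float64 {
	return float64(garbageBytes) / float64(dataBytes)
}

func main() {
	const gib = 1024 * mib
	var (
		bigData      int64 = 36 * gib   // "about 36G", read as GiB
		bigGarbage   int64 = 2 * gib    // "about 2G trash"
		smallData    int64 = 1000 * mib // "almost 1000M"
		smallGarbage int64 = 300 * mib  // "300M trash"
	)
	fmt.Println(overLimit(bigData, 30000), overLimit(bigData, 50000)) // true false
	fmt.Printf("%.3f %.3f\n",
		garbageRatio(bigGarbage, bigData),
		garbageRatio(smallGarbage, smallData)) // 0.056 0.300
}
```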

System Setup
nohup ./weed master -metrics.address=metricsIp:9091 -volumeSizeLimitMB=50000 -mdir=./masterData -ip=masterIp1 -port=8180 -peers=masterIp1:8180,masterIp2:8180,masterIp3:8180 -defaultReplication=002 -cpuprofile=./logs/master-cpuprofile.log -memprofile=./logs/master-memprofile.log >> ./logs/master.log &

nohup ./weed volume -dir=./volumeData -ip=ip1 -port=8180 -max=18 -mserver=masterIp1:8180,masterIp2:8180,masterIp3:8180 -dataCenter=dataCenter1 -rack=dataCenter1-rack1 -cpuprofile=./logs/volume-cpuprofile.log -memprofile=./logs/volume-memprofile.log >> ./logs/volume.log &

nohup ./weed -v=4 filer -collection=filer -ip=filerIp -port=8180 -master=masterIp1:8180,masterIp2:8180,masterIp3:8180 -dataCenter=dataCenter1 -defaultReplicaPlacement=002 -maxMB=32 >> ./logs/filer.log &

3 masters, 4 volume servers, and 2 filers; each node runs on its own machine.
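The -defaultReplication=002 setting is why the vacuum logs show volume 13 on three servers (ip1, ip2, ip3). In SeaweedFS's "xyz" replication notation, x counts copies in other data centers, y copies on other racks in the same data center, and z copies on other servers in the same rack, so 002 means the original plus two copies in the same rack. The parser below (parsePlacement, replicaPlacement) is a hypothetical sketch of that notation, not SeaweedFS's own code.

```go
package main

import (
	"fmt"
	"strconv"
)

// replicaPlacement holds the three digits of a replication string "xyz".
type replicaPlacement struct {
	diffDataCenter int // x: copies in other data centers
	diffRack       int // y: copies on other racks in the same data center
	sameRack       int // z: copies on other servers in the same rack
}

// parsePlacement splits a three-digit replication string into its counts.
func parsePlacement(s string) (replicaPlacement, error) {
	if len(s) != 3 {
		return replicaPlacement{}, fmt.Errorf("replication %q must have 3 digits", s)
	}
	var d [3]int
	for i := 0; i < 3; i++ {
		n, err := strconv.Atoi(s[i : i+1])
		if err != nil {
			return replicaPlacement{}, fmt.Errorf("replication %q: %v", s, err)
		}
		d[i] = n
	}
	return replicaPlacement{d[0], d[1], d[2]}, nil
}

// copies is the total number of copies: the original plus every replica.
func (rp replicaPlacement) copies() int {
	return 1 + rp.diffDataCenter + rp.diffRack + rp.sameRack
}

func main() {
	rp, err := parsePlacement("002")
	if err != nil {
		panic(err)
	}
	fmt.Println(rp.copies()) // 3
}
```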

Vacuum logs

  • Master leader log:
    I0316 23:13:54 30854 volume_layout.go:229] Volume 13 becomes unwritable
    I0316 23:13:54 30854 topology_vacuum.go:66] 2 Start vacuuming 13 on ip2:8180
    I0316 23:13:54 30854 topology_vacuum.go:66] 0 Start vacuuming 13 on ip3:8180
    I0316 23:13:54 30854 topology_vacuum.go:66] 1 Start vacuuming 13 on ip1:8180
    I0316 23:16:36 30854 topology_vacuum.go:78] Complete vacuuming 13 on ip1:8180
    I0316 23:17:23 30854 topology_vacuum.go:78] Complete vacuuming 13 on ip3:8180
    I0316 23:18:04 30854 topology_vacuum.go:78] Complete vacuuming 13 on ip2:8180
    I0316 23:18:04 30854 topology_vacuum.go:97] Start Committing vacuum 13 on ip3:8180
    I0316 23:18:18 30854 topology_vacuum.go:108] Complete Committing vacuum 13 on ip3:8180
    I0316 23:18:18 30854 topology_vacuum.go:97] Start Committing vacuum 13 on ip1:8180
    I0316 23:18:33 30854 topology_vacuum.go:108] Complete Committing vacuum 13 on ip1:8180
    I0316 23:18:33 30854 topology_vacuum.go:97] Start Committing vacuum 13 on ip2:8180
    I0316 23:18:55 30854 topology_vacuum.go:108] Complete Committing vacuum 13 on ip2:8180
    I0316 23:18:55 30854 volume_layout.go:241] Volume 13 becomes writable
  • ip1 volume log
    I0316 23:18:18 14822 volume_vacuum.go:83] Committing volume 13 vacuuming...
    I0316 23:18:25 14822 volume_loading.go:94] volumeDataIntegrityChecking failed verifyNeedleIntegrity /opt/app/seaweedfs/volumeData/filer_13.idx failed: EOF
    I0316 23:18:25 14822 needle_map_sorted_file.go:24] Start to Generate /opt/app/seaweedfs/volumeData/filer_13.sdx from /opt/app/seaweedfs/volumeData/filer_13.idx
    I0316 23:18:31 14822 needle_map_sorted_file.go:26] Finished Generating /opt/app/seaweedfs/volumeData/filer_13.sdx from /opt/app/seaweedfs/volumeData/filer_13.idx
  • ip2 volume log
    I0316 23:18:33 13143 volume_vacuum.go:83] Committing volume 13 vacuuming...
    I0316 23:18:47 13143 volume_loading.go:94] volumeDataIntegrityChecking failed verifyNeedleIntegrity /opt/app/seaweedfs/volumeData/filer_13.idx failed: EOF
    I0316 23:18:47 13143 needle_map_sorted_file.go:24] Start to Generate /opt/app/seaweedfs/volumeData/filer_13.sdx from /opt/app/seaweedfs/volumeData/filer_13.idx
    I0316 23:18:53 13143 needle_map_sorted_file.go:26] Finished Generating /opt/app/seaweedfs/volumeData/filer_13.sdx from /opt/app/seaweedfs/volumeData/filer_13.idx
  • ip3 volume log
    I0316 23:18:04 36486 volume_vacuum.go:83] Committing volume 13 vacuuming...
    I0316 23:18:11 36486 volume_loading.go:94] volumeDataIntegrityChecking failed verifyNeedleIntegrity /opt/app/seaweedfs/volumeData/filer_13.idx failed: EOF
    I0316 23:18:11 36486 needle_map_sorted_file.go:24] Start to Generate /opt/app/seaweedfs/volumeData/filer_13.sdx from /opt/app/seaweedfs/volumeData/filer_13.idx
    I0316 23:18:17 36486 needle_map_sorted_file.go:26] Finished Generating /opt/app/seaweedfs/volumeData/filer_13.sdx from /opt/app/seaweedfs/volumeData/filer_13.idx

Logs of the failed writes

  • ip2 volume log
    I0316 23:19:31 13143 store_replicate.go:39] failed to write to local disk: volume 13 is read only
    I0316 23:19:39 13143 store_replicate.go:39] failed to write to local disk: volume 13 is read only
    I0316 23:19:39 13143 store_replicate.go:39] failed to write to local disk: volume 13 is read only
    I0316 23:19:57 13143 store_replicate.go:39] failed to write to local disk: volume 13 is read only
    I0316 23:20:00 13143 store_replicate.go:39] failed to write to local disk: volume 13 is read only
    I0316 23:20:17 13143 store_replicate.go:39] failed to write to local disk: volume 13 is read only
    I0316 23:21:29 13143 store_replicate.go:39] failed to write to local disk: volume 13 is read only
    I0316 23:21:29 13143 store_replicate.go:39] failed to write to local disk: volume 13 is read only
    I0316 23:21:40 13143 store_replicate.go:39] failed to write to local disk: volume 13 is read only
  • filer log
    I0316 23:19:31 23042 filer_server_handlers_write_autochunk.go:39] AutoChunking level set to 32 (MB)
    I0316 23:19:31 23042 filer_server_handlers_write_autochunk.go:47] Content-Length of 95948 is less than the chunk size of 33554432 so autoChunking will be skipped.
    I0316 23:19:31 23042 filer_server_handlers_write.go:122] write /a/b/32.jpg to http://ip2:8180/13,011ca1fac012f5a4
    I0316 23:19:31 23042 filer_server_handlers_write.go:260] post result {"name":"32.jpg","size":95723,"error":"failed to write to local disk: volume 13 is read only","eTag":"a313d1ac"}
    I0316 23:19:31 23042 filer_server_handlers_write.go:270] failing to post to volume server /a/b/32.jpg failed to write to local disk: volume 13 is read only
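The filer log shows that this 95948-byte request was below the 32 MB auto-chunk size (33554432 bytes with -maxMB=32), so the filer sent it as a single POST to volume 13 on ip2 and the whole upload failed with the read-only error. A minimal sketch of that decision, assuming uploads smaller than the chunk size skip chunking as the log wording suggests (the behavior at exactly the chunk size is an assumption); chunkSizeBytes, shouldAutoChunk, and chunkCount are hypothetical names.

```go
package main

import "fmt"

// chunkSizeBytes converts the filer's -maxMB setting to bytes,
// matching "32 (MB)" -> 33554432 in the filer log.
func chunkSizeBytes(maxMB int64) int64 {
	return maxMB * 1024 * 1024
}

// shouldAutoChunk reports whether an upload would be split into chunks.
// Assumption: only uploads smaller than the chunk size skip chunking.
func shouldAutoChunk(contentLength, maxMB int64) bool {
	return contentLength >= chunkSizeBytes(maxMB)
}

// chunkCount is the number of chunks for an upload (ceiling division).
func chunkCount(contentLength, maxMB int64) int64 {
	cs := chunkSizeBytes(maxMB)
	return (contentLength + cs - 1) / cs
}

func main() {
	fmt.Println(chunkSizeBytes(32))            // 33554432
	fmt.Println(shouldAutoChunk(95948, 32))    // false: sent as a single POST
	fmt.Println(chunkCount(100*1024*1024, 32)) // 4
}
```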
chrislusf added a commit that referenced this issue Mar 17, 2020
@chrislusf
Collaborator

Thanks for all the details!

I added a fix to mark volume 13 as read-only correctly.

@chrislusf
Collaborator

Added to the 1.64 release.
