Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic compaction not working #2396

Closed
nivekuil opened this issue Oct 24, 2021 · 5 comments
Closed

Automatic compaction not working #2396

nivekuil opened this issue Oct 24, 2021 · 5 comments

Comments

@nivekuil
Copy link
Contributor

nivekuil commented Oct 24, 2021

Describe the bug
The master's automatic compaction seems to be unable to compact volumes, so every 15 minutes it will keep trying to compact the same volumes. For example:

I1024 03:54:13     1 volume_create_linux.go:20] Preallocated 16777216000 bytes disk space for /data/nivekuil_80.cpd
I1024 03:54:28     1 volume_vacuum.go:157] Cleaning up volume 80 vacuuming...
I1024 03:54:28     1 volume_grpc_vacuum.go:68] cleanup volume 80
I1024 04:00:36     1 volume_grpc_vacuum.go:36] compact volume 80
I1024 04:09:13     1 volume_create_linux.go:20] Preallocated 16777216000 bytes disk space for /data/nivekuil_80.cpd
I1024 04:09:28     1 volume_vacuum.go:157] Cleaning up volume 80 vacuuming...
I1024 04:09:28     1 volume_grpc_vacuum.go:68] cleanup volume 80
I1024 04:15:24     1 volume_grpc_vacuum.go:36] compact volume 80
I1024 04:24:13     1 volume_create_linux.go:20] Preallocated 16777216000 bytes disk space for /data/nivekuil_80.cpd
I1024 04:24:28     1 volume_vacuum.go:157] Cleaning up volume 80 vacuuming...
I1024 04:24:28     1 volume_grpc_vacuum.go:68] cleanup volume 80
I1024 04:28:37     1 volume_grpc_vacuum.go:36] compact volume 80
I1024 04:39:14     1 volume_create_linux.go:20] Preallocated 16777216000 bytes disk space for /data/nivekuil_80.cpd
I1024 04:39:29     1 volume_vacuum.go:157] Cleaning up volume 80 vacuuming...
I1024 04:39:29     1 volume_grpc_vacuum.go:68] cleanup volume 80
I1024 04:45:36     1 volume_grpc_vacuum.go:36] compact volume 80
I1024 04:54:13     1 volume_create_linux.go:20] Preallocated 16777216000 bytes disk space for /data/nivekuil_80.cpd
I1024 04:54:28     1 volume_vacuum.go:157] Cleaning up volume 80 vacuuming...
I1024 04:54:28     1 volume_grpc_vacuum.go:68] cleanup volume 80

I am able to run weed compact manually and mv the cpd/cpx files to the proper place. In the volume UI I can see that the volume trash size remains the same (above the compaction threshold). Restarting the components does not make a difference.

System Setup
One master, one volume, one filer + 3 mounts
master: -volumeSizeLimitMB 16000
volume: -index leveldb

  • output of weed version
    2.74

Expected behavior
Automatic compaction should succeed

Additional context
Possibly triggered by running volume.fsck at the same time?

Two questions:

  1. with a lot of compaction activity going on, it will saturate the host IOPS. Since this is an offline operation, is it possible to limit/QoS this?
  2. the compacted volumes are preallocated to the full size, but are read only. Is this just wasted space on the host?
@chrislusf
Copy link
Collaborator

Need to see the error on the weed shell side.

  1. see weed volume -h
  -compactionMBps int
    	limit background compaction or copying speed in mega bytes per second
  1. they should not be read ony.

@nivekuil
Copy link
Contributor Author

nivekuil commented Oct 24, 2021

Need to see the error on the weed shell side.

volume.vacuum has no output in weed shell. On volume it's the same logs:

I1024 06:04:34     1 volume_create_linux.go:20] Preallocated 16777216000 bytes disk space for /data/nivekuil_80.cpd
I1024 06:04:49     1 volume_vacuum.go:157] Cleaning up volume 80 vacuuming...
I1024 06:04:49     1 volume_grpc_vacuum.go:68] cleanup volume 80

the .cpd is getting created then deleted (renamed?) but the size/trash of the volume in the UI is still the same, even after restart

weed compact -method=1 -dir /data -volumeId 80 -collection nivekuil logs:

I1024 06:08:02    21 volume_info.go:21] maybeLoadVolumeInfo checks /data/nivekuil_80.vif
I1024 06:08:02    21 volume_info.go:37] maybeLoadVolumeInfo reads /data/nivekuil_80.vif
I1024 06:08:02    21 volume_info.go:46] maybeLoadVolumeInfo Unmarshal volume info /data/nivekuil_80.vif
I1024 06:08:02    21 volume_loading.go:86] readSuperBlock volume 80 version 3
I1024 06:08:02    21 volume_loading.go:116] open to write file /data/nivekuil_80.idx
I1024 06:08:02    21 volume_loading.go:133] loading index /data/nivekuil_80.idx to memory
I1024 06:08:02    21 walk.go:19] readerOffset 0 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 16384 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 32768 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 49152 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 65536 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 81920 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 98304 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 114688 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 131072 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 147456 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 163840 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 180224 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 196608 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 212992 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 229376 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 245760 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 262144 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 278528 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 294912 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 311296 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 327680 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 344064 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 360448 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 376832 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 393216 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 409600 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 425984 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 442368 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 458752 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 475136 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 491520 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 507904 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 524288 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 540672 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 557056 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 573440 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 589824 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 606208 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 622592 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 638976 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 655360 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 671744 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 688128 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 704512 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 720896 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 737280 count 0 err: EOF
I1024 06:08:02    21 needle_map_memory.go:53] max file key: 661387 for file: /data/nivekuil_80.idx
I1024 06:08:02    21 volume_vacuum.go:70] Compact2 volume 80 ...
I1024 06:08:02    21 volume_vacuum.go:79] creating copies for volume 80 ...
I1024 06:08:02    21 walk.go:19] readerOffset 0 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 16384 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 32768 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 49152 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 65536 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 81920 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 98304 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 114688 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 131072 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 147456 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 163840 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 180224 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 196608 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 212992 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 229376 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 245760 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 262144 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 278528 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 294912 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 311296 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 327680 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 344064 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 360448 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 376832 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 393216 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 409600 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 425984 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 442368 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 458752 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 475136 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 491520 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 507904 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 524288 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 540672 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 557056 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 573440 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 589824 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 606208 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 622592 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 638976 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 655360 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 671744 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 688128 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 704512 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 720896 count 16384 err: <nil>
I1024 06:08:02    21 walk.go:39] readerOffset 737280 count 0 err: EOF
I1024 06:08:02    21 volume_vacuum.go:443] saving key 71fcb volume offset 1 => 2097272 data_size 2097234
I1024 06:08:02    21 volume_vacuum.go:443] saving key 71fd3 volume offset 262159 => 4194536 data_size 2097234
I1024 06:08:02    21 volume_vacuum.go:443] saving key 71ffa volume offset 524317 => 6291800 data_size 2097234
I1024 06:08:02    21 volume_vacuum.go:443] saving key 72005 volume offset 786475 => 8389064 data_size 2097234
I1024 06:08:02    21 volume_vacuum.go:443] saving key 7201d volume offset 1048633 => 10486328 data_size 2097234
<rest of output is just more "saving key ...">

after weed compact, I rename the cpd to dat and cpx to idx, then after a restart the volume will turn up in the UI with the truncated size and read only.

@chrislusf
Copy link
Collaborator

the master may have some logs

			if err != nil {
				glog.Errorf("Error when vacuuming %d on %s: %v", vid, url, err)
				ch <- false
			} else {
				glog.V(0).Infof("Complete vacuuming %d on %s", vid, url)
				ch <- true
			}

@nivekuil
Copy link
Contributor Author

nivekuil commented Oct 24, 2021

It looks like there is a timeout somewhere of 300 seconds. Still looking for the source but it would make sense that this can cause a cascading failure, where failed vacuums retry forever and cause future vacuums to run slower.

I1024 06:35:42     1 topology_vacuum.go:71] 0 Start vacuuming 80 on 127.0.0.1:8081
E1024 06:40:42     1 topology_vacuum.go:80] Error when vacuuming 80 on 127.0.0.1:8081: rpc error: code = Unknown desc = stream timeout
I1024 06:40:42     1 topology_vacuum.go:133] Start cleaning up 80 on 127.0.0.1:8081
I1024 06:40:43     1 topology_vacuum.go:143] Complete cleaning up vacuum 80 on 127.0.0.1:8081

I wonder if the master can do some kind of cleanup when the vacuum fails

@chrislusf
Copy link
Collaborator

Added some code to stream report vacuum progress and also avoid timeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants