Unlink panic on NFS #61

Closed
sh0rez opened this issue Jun 9, 2019 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@sh0rez

sh0rez commented Jun 9, 2019

Hi! I am currently testing out VictoriaMetrics and so far it works really well. Awesome speed and resource usage. However, in production I usually store large amounts of data on a dedicated NFS storage server. When running VictoriaMetrics against this setup, it quickly panics:

2019-06-09T11:51:12.639+0000    panic   lib/storage/partition.go:765    FATAL: unrecoverable error when merging small parts in the partition "/victoria-metrics-data/data/small/2019_06": cannot execute transaction "/victoria-metrics-data/data/small/2019_06/txn/15A68575B8D388A0": cannot remove "/victoria-metrics-data/data/small/2019_06/89_89_20190609115059.446_20190609115059.446_15A68575B8D3889F": unlinkat /victoria-metrics-data/data/small/2019_06/89_89_20190609115059.446_20190609115059.446_15A68575B8D3889F: directory not empty
panic: FATAL: unrecoverable error when merging small parts in the partition "/victoria-metrics-data/data/small/2019_06": cannot execute transaction "/victoria-metrics-data/data/small/2019_06/txn/15A68575B8D388A0": cannot remove "/victoria-metrics-data/data/small/2019_06/89_89_20190609115059.446_20190609115059.446_15A68575B8D3889F": unlinkat /victoria-metrics-data/data/small/2019_06/89_89_20190609115059.446_20190609115059.446_15A68575B8D3889F: directory not empty

goroutine 5 [running]:
github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logMessage(0xa9dc6b, 0x5, 0xc0000ea3c0, 0x1cb, 0x3)
        /VictoriaMetrics/lib/logger/logger.go:124 +0x53d
github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logLevel(0xa9dc6b, 0x5, 0xac1f67, 0x4b, 0xc0011d5f88, 0x2, 0x2)
        /VictoriaMetrics/lib/logger/logger.go:78 +0x113
github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.Panicf(...)
        /VictoriaMetrics/lib/logger/logger.go:62
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partition).smallPartsMerger(0xc000ffe000)
        /VictoriaMetrics/lib/storage/partition.go:765 +0x126
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partition).startMergeWorkers.func1(0xc000ffe000)
        /VictoriaMetrics/lib/storage/partition.go:743 +0x2b
created by github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partition).startMergeWorkers
        /VictoriaMetrics/lib/storage/partition.go:742 +0x6c

The NFS host is a Synology DiskStation; the underlying filesystem should be btrfs. The NFS mount is done directly through a Docker volume:

      docker_volume:
        name: "victoria"
        driver: local
        driver_options:
          type: nfs
          o: "addr=synology.home,rw,soft,nolock"
          device: ":/volume1/int/monitor/tsdb"

Any ideas how to address this?

@tenmozes added the bug label Jun 9, 2019
@valyala
Collaborator

valyala commented Jun 10, 2019

NFS leaves .nfsXXX files behind when a file is deleted while it is still open. These leftover files prevent the directory from being deleted. See these docs. The only workaround is to wait until all the files are closed and then try deleting the directory again.
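A minimal sketch of the wait-and-retry workaround described above. This is not the code from the referenced commit; `retryRemove`, `errStillBusy`, the attempt count, and the delay are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// errStillBusy is a hypothetical sentinel returned when every attempt fails.
var errStillBusy = errors.New("directory still busy after all attempts")

// retryRemove calls remove until it succeeds or the attempts run out,
// sleeping for delay between consecutive tries.
func retryRemove(remove func() error, attempts int, delay time.Duration) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = remove(); lastErr == nil {
			return nil
		}
		// Give open file handles a chance to close before the next try.
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("%w: %v", errStillBusy, lastErr)
}

func main() {
	// A throwaway directory stands in for a merged part directory.
	dir, err := os.MkdirTemp("", "part-")
	if err != nil {
		fmt.Println("cannot create temp dir:", err)
		return
	}
	err = retryRemove(func() error { return os.RemoveAll(dir) }, 5, 100*time.Millisecond)
	fmt.Println("removal error:", err)
}
```

Passing the removal step as a function keeps the retry loop independent of the filesystem, so the same loop could wrap whatever call currently fails on the `.nfsXXX` files.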

valyala added a commit that referenced this issue Jun 10, 2019
@valyala
Collaborator

valyala commented Jun 10, 2019

@sh0rez , could you verify the issue is fixed in v1.19.0?

@sh0rez
Author

sh0rez commented Jun 11, 2019

Hi! It does not panic anymore; however, it is constantly logging errors:

2019-06-11T05:50:11.944+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/180_90_20190611054549.446_20190611054559.446_15A70EAF78DEA3B2": unlinkat /victoria-metrics-data/data/small/2019_06/180_90_20190611054549.446_20190611054559.446_15A70EAF78DEA3B2/.nfs000000000000055f00000009: device or resource busy
2019-06-11T05:50:11.952+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/360_90_20190611054509.446_20190611054539.446_15A70EAF78DEA3B0": unlinkat /victoria-metrics-data/data/small/2019_06/360_90_20190611054509.446_20190611054539.446_15A70EAF78DEA3B0/.nfs00000000000005490000000c: device or resource busy
2019-06-11T05:50:11.961+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/720_90_20190611054349.446_20190611054459.446_15A68A9DE06C8CA7": unlinkat /victoria-metrics-data/data/small/2019_06/720_90_20190611054349.446_20190611054459.446_15A68A9DE06C8CA7/.nfs000000000000053a0000000f: device or resource busy
2019-06-11T05:50:11.971+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/1440_90_20190611054109.446_20190611054339.446_15A68A9DE06C8C9F": unlinkat /victoria-metrics-data/data/small/2019_06/1440_90_20190611054109.446_20190611054339.446_15A68A9DE06C8C9F/.nfs000000000000054f00000012: device or resource busy
2019-06-11T05:50:20.925+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/90_90_20190611054609.446_20190611054609.446_15A70EEC815CFF9C": unlinkat /victoria-metrics-data/data/small/2019_06/90_90_20190611054609.446_20190611054609.446_15A70EEC815CFF9C/.nfs000000000000058d00000015: device or resource busy
2019-06-11T05:50:40.958+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/90_90_20190611055029.446_20190611055029.446_15A70EEC815CFF9E": unlinkat /victoria-metrics-data/data/small/2019_06/90_90_20190611055029.446_20190611055029.446_15A70EEC815CFF9E/.nfs000000000000059900000018: device or resource busy
2019-06-11T05:50:40.967+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/180_90_20190611054609.446_20190611055019.446_15A70EEC815CFF9D": unlinkat /victoria-metrics-data/data/small/2019_06/180_90_20190611054609.446_20190611055019.446_15A70EEC815CFF9D/.nfs00000000000005930000001b: device or resource busy
2019-06-11T05:51:00.958+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/90_90_20190611055049.446_20190611055049.446_15A70EEC815CFFA0": unlinkat /victoria-metrics-data/data/small/2019_06/90_90_20190611055049.446_20190611055049.446_15A70EEC815CFFA0/.nfs00000000000005a50000001e: device or resource busy
2019-06-11T05:51:20.997+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/90_90_20190611055109.446_20190611055109.446_15A70EEC815CFFA2": unlinkat /victoria-metrics-data/data/small/2019_06/90_90_20190611055109.446_20190611055109.446_15A70EEC815CFFA2/.nfs00000000000005b100000021: device or resource busy
2019-06-11T05:51:21.005+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/180_90_20190611055049.446_20190611055059.446_15A70EEC815CFFA1": unlinkat /victoria-metrics-data/data/small/2019_06/180_90_20190611055049.446_20190611055059.446_15A70EEC815CFFA1/.nfs00000000000005ab00000024: device or resource busy
2019-06-11T05:51:21.015+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/360_90_20190611054609.446_20190611055039.446_15A70EEC815CFF9F": unlinkat /victoria-metrics-data/data/small/2019_06/360_90_20190611054609.446_20190611055039.446_15A70EEC815CFF9F/.nfs000000000000059f00000027: device or resource busy
2019-06-11T05:51:40.991+0000    error   lib/fs/fs.go:236        cannot remove "/victoria-metrics-data/data/small/2019_06/90_90_20190611055129.446_20190611055129.446_15A70EEC815CFFA4": unlinkat /victoria-metrics-data/data/small/2019_06/90_90_20190611055129.446_20190611055129.446_15A70EEC815CFFA4/.nfs00000000000005bd0000002a: device or resource busy

Nevertheless, it seems to operate properly. Maybe handle that error by checking whether it persists, and error out only if it does?
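To illustrate the idea, here is a rough sketch of "escalate only if the failure persists". It counts consecutive failed removals per path and reports when a limit is reached. `failureTracker`, its methods, and the limit of 3 are hypothetical and not part of VictoriaMetrics.

```go
package main

import "fmt"

// failureTracker counts consecutive removal failures per path.
type failureTracker struct {
	limit    int
	failures map[string]int
}

// newFailureTracker returns a tracker that escalates after limit failures.
func newFailureTracker(limit int) *failureTracker {
	return &failureTracker{limit: limit, failures: make(map[string]int)}
}

// fail records one more failed removal of path and reports whether the
// failure has persisted long enough to be treated as a real error.
func (t *failureTracker) fail(path string) bool {
	t.failures[path]++
	return t.failures[path] >= t.limit
}

// succeed forgets path once its removal finally works.
func (t *failureTracker) succeed(path string) {
	delete(t.failures, path)
}

func main() {
	tracker := newFailureTracker(3)
	for attempt := 1; attempt <= 3; attempt++ {
		escalate := tracker.fail("some-part-dir")
		fmt.Printf("attempt %d: escalate=%v\n", attempt, escalate)
	}
}
```

Transient "device or resource busy" results would then stay quiet, and only a path that keeps failing would be logged as an error.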

@valyala
Collaborator

valyala commented Jun 11, 2019

It looks like VictoriaMetrics still cannot remove these directories and they pile up in the /victoria-metrics-data/data/small/2019_06/ directory.

Could you build VictoriaMetrics from the latest commit and verify on a fresh -storageDataPath on NFS that:

  1. It stops logging these errors
  2. The number of subdirectories in the /victoria-metrics-data/data/small/2019_06/ directory stays low (up to 100) and doesn't grow with time
  3. It works properly and restarts without issues or data loss
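For the second check, a small helper along these lines could be run periodically against the partition directory. This is only a sketch: `countSubdirs` and `demo` are hypothetical names, and `demo` uses a throwaway temporary directory rather than the real data path so it runs anywhere.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// countSubdirs returns the number of immediate subdirectories of dir.
func countSubdirs(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, e := range entries {
		if e.IsDir() {
			n++
		}
	}
	return n, nil
}

// demo builds a temporary directory holding three subdirectories and one
// regular file, then counts the subdirectories.
func demo() (int, error) {
	root, err := os.MkdirTemp("", "partition-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)
	for _, name := range []string{"part1", "part2", "part3"} {
		if err := os.Mkdir(filepath.Join(root, name), 0o755); err != nil {
			return 0, err
		}
	}
	if err := os.WriteFile(filepath.Join(root, "file.txt"), nil, 0o644); err != nil {
		return 0, err
	}
	return countSubdirs(root)
}

func main() {
	n, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("subdirectories:", n)
}
```

To check the real setup, call `countSubdirs` with the partition path from the logs instead of the temporary directory and compare the result over time.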

@valyala
Collaborator

valyala commented Dec 2, 2019

Related issue - #162


3 participants