
weed fuse mount hangs. #2952

Closed
ThomasADavis opened this issue Apr 21, 2022 · 18 comments
@ThomasADavis

Describe the bug
Fuse mount/filer hangs after high load from a restic restore.

[root@mouse-r13 ~]# sh -x /home/tdavis/home-backup.sh 
+ RESTIC_PASSWORD=badbadbadjuju
+ export RESTIC_PASSWORD
+ RESTIC_REPOSITORY=s3:http://mouse-r11:8333/homes
+ export RESTIC_REPOSITORY
+ restic backup /home
repository 7192c674 opened successfully, password is correct
using parent snapshot 2e42bd8f

Files:         231 new,    29 changed, 1201279 unmodified
Dirs:           22 new,    43 changed, 203382 unmodified
Added to the repo: 130.694 MiB

processed 1201539 files, 233.310 GiB in 54:39
snapshot 02dacef4 saved
[root@mouse-r13 ~]# 

This is the snapshot to restore from: 1.2 million files, 233 GiB.

mkdir /weed/
weed mount -dir /weed -filer mouse-r11:8889  

This runs for a while and creates all the directories; the restore starts, then hangs.

mkdir home
cd home
restic restore latest --target .

The repo is the S3-based repo from the backup above.

System Setup

3 nodes, each: AMD 8/32 CPU, 64 GB of RAM, bonded 10 GbE interfaces, 2 TB NVMe M.2 storage.

The S3 storage is 5 nodes: fanless Zotac CI329s with 2×1 GbE, 8 GB of RAM, a 4-core Intel Celeron @ 1.10 GHz, and an 8 TB Micron 5100 Pro SATA SSD (it's for disaster recovery).

A restic restore from those 5 nodes to the local NVMe drive easily sustains 1.5 Gb/s; see:

[root@mouse-r13 restore]# time restic restore  latest --target .
repository 7192c674 opened successfully, password is correct
restoring <Snapshot 02dacef4 of [/home] at 2022-04-21 12:07:27.565531555 -0700 PDT by root@mouse-r13> to .

real	17m17.017s
user	32m43.975s
sys	9m46.551s
[root@mouse-r13 restore]# 

The same restore to the seaweedfs fuse mount hangs after several minutes.

If I restart the filer (systemctl restart seaweedfs-filer or kill -9), it comes back up, the restore resumes, and then it hangs again.

[root@mouse-r11 system]# weed version
version 30GB 2.99 8e98d7326b8cbd715033ec5a0e602732a4034850 linux amd64
[root@mouse-r11 system]# 
[root@mouse-r11 system]# uname -a
Linux mouse-r11 5.16.11-200.fc35.x86_64 #1 SMP PREEMPT Wed Feb 23 17:08:49 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
[root@mouse-r11 system]# 

OS is Fedora 35 Server, selinux disabled.

systemd service files:

seaweedfs-volume.service.txt
seaweedfs-s3.service.txt
seaweedfs-master.service.txt
seaweedfs-filer.service.txt

  • List the command line to start "weed master", "weed volume", "weed filer", "weed s3", "weed mount".
  • OS version
  • output of weed version
  • if using filer, show the content of filer.toml

filer.toml is empty.

[root@mouse-r11 system]# weed shell
master: localhost:9333 filers: [mouse-r11:8889]
> fs.configure
{
  "locations": [
    {
      "locationPrefix": "/backup/",
      "replication": "001",
      "diskType": "ssd"
    },
    {
      "locationPrefix": "/buckets/",
      "replication": "001",
      "diskType": "ssd"
    },
    {
      "locationPrefix": "/home/",
      "replication": "001",
      "diskType": "nvme"
    }
  ]
}
> 

Expected behavior
No hang; the restore finishes.

Additional context

weed mount reports this when I crash and restart the filer:

2022/04/20 23:47:05 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:06 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:07 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:08 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:09 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:12 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:12 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:12 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:12 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:15 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:15 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/20 23:47:20 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
E0421 08:53:54 70094 weedfs_file_sync.go:168] fh flush create /home/home/boverhof/collectors/modbus/pollmb/logs/b59-ups1.log: CreateEntry: rpc error: code = Unavailable desc = error reading from server: EOF
I0421 08:53:54 70094 wfs_filer_client.go:29] WithFilerClient 0 mouse-r11:18889: fh flush create /home/home/boverhof/collectors/modbus/pollmb/logs/b59-ups1.log: CreateEntry: rpc error: code = Unavailable desc = error reading from server: EOF
E0421 08:53:54 70094 weedfs_file_sync.go:182] /home/home/boverhof/collectors/modbus/pollmb/logs/b59-ups1.log fh 0 flush: fh flush create /home/home/boverhof/collectors/modbus/pollmb/logs/b59-ups1.log: CreateEntry: rpc error: code = Unavailable desc = error reading from server: EOF
I0421 08:53:54 70094 wfs_filer_client.go:29] WithFilerClient 0 mouse-r11:18889: rpc error: code = Unavailable desc = error reading from server: EOF
E0421 08:53:54 70094 meta_cache_subscribe.go:63] follow metadata updates: subscribing filer meta change: rpc error: code = Unavailable desc = error reading from server: EOF
I0421 08:53:54 70094 weedfs_write.go:36] assign volume failure count:1 path:"/home/home/tdavis/go/pkg/dep/sources/https---go.googlesource.com-crypto/.git/hooks/prepare-commit-msg.sample": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.84.2:18889: connect: connection refused"
I0421 08:53:54 70094 retry.go:25] retry assignVolume: err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.84.2:18889: connect: connection refused"
I0421 08:53:55 70094 weedfs_write.go:36] assign volume failure count:1 path:"/home/home/tdavis/go/pkg/dep/sources/https---go.googlesource.com-crypto/.git/hooks/prepare-commit-msg.sample": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.84.2:18889: connect: connection refused"
I0421 08:53:55 70094 retry.go:25] retry assignVolume: err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.84.2:18889: connect: connection refused"
I0421 08:53:57 70094 weedfs_write.go:36] assign volume failure count:1 path:"/home/home/tdavis/go/pkg/dep/sources/https---go.googlesource.com-crypto/.git/hooks/prepare-commit-msg.sample": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.84.2:18889: connect: connection refused"
I0421 08:53:57 70094 retry.go:25] retry assignVolume: err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.84.2:18889: connect: connection refused"
I0421 08:53:59 70094 weedfs_write.go:36] assign volume failure count:1 path:"/home/home/tdavis/go/pkg/dep/sources/https---go.googlesource.com-crypto/.git/hooks/prepare-commit-msg.sample": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.84.2:18889: connect: connection refused"
I0421 08:53:59 70094 retry.go:25] retry assignVolume: err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.84.2:18889: connect: connection refused"
I0421 08:54:02 70094 retry.go:19] retry assignVolume successfully
2022/04/21 08:54:03 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/21 08:54:03 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/21 08:54:03 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/21 08:54:03 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
2022/04/21 08:54:03 writer: Write/Writev failed, err: 2=no such file or directory. opcode: INTERRUPT
@ThomasADavis
Author

I'm able to replicate this with rsync using the same data set. It gets farther along but eventually just stops.

@chrislusf
Collaborator

chrislusf commented Apr 22, 2022

The filer is unlikely to hang, since file content (as opposed to metadata) does not go through it.

You can run "weed mount -debug" to see the activity. It seems slow, but I could not get it to hang.

If it hangs, please use "kill -3" to get a thread dump for the weed mount.
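
For reference, capturing that dump can look like this (a minimal sketch; the `pgrep` pattern is an assumption about how the mount process was started):

```shell
# Signal 3 is SIGQUIT; the Go runtime reacts to it by printing every
# goroutine's stack to stderr, which is what a "thread dump" means
# for a Go binary such as weed.
kill -l 3

# Send it to the (possibly hung) mount process; the dump appears on
# the process's stderr, e.g. in the systemd journal.
pid=$(pgrep -f 'weed mount' || true)
if [ -n "$pid" ]; then
  kill -QUIT "$pid"
else
  echo "weed mount is not running"
fi
```

Note that `kill -QUIT` and `kill -3` are the same signal; by default a Go program exits after printing the dump, so collect the output before restarting anything.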

@ThomasADavis
Author

ThomasADavis commented Apr 26, 2022

I wouldn't rule out the filer yet. So far, I've had to restart it along with the FUSE mount to fix the hang; the filer will hang S3 connections (but not all of them).

Still working on it. I have to move it to a different system, and it takes forever to reach the hang stage (2-3 hours).

@chrislusf
Collaborator

Any updates?

@ThomasADavis
Author

Yeah, I tried to reproduce this on a different cluster and couldn't.

I just had it hang on the original cluster.

This is the fs.configure output on the cluster that hangs:

[root@mouse-r11 ~]# weed shell
I0510 09:48:46 57829 masterclient.go:141] redirected to leader mouse-r13:9333
master: localhost:9333 filers: [mouse-r11:8889]
> fs.configure
{
  "locations": [
    {
      "locationPrefix": "/backup/",
      "replication": "001",
      "diskType": "ssd"
    },
    {
      "locationPrefix": "/buckets/",
      "replication": "001",
      "diskType": "ssd"
    },
    {
      "locationPrefix": "/buckets/dev-k3s/",
      "ttl": "14"
    },
    {
      "locationPrefix": "/buckets/omni-core-k3s/",
      "ttl": "14"
    },
    {
      "locationPrefix": "/buckets/omni-k3s/",
      "ttl": "14"
    },
    {
      "locationPrefix": "/buckets/ood-k3s/",
      "ttl": "14"
    },
    {
      "locationPrefix": "/buckets/shasta-k3s/",
      "ttl": "14"
    },
    {
      "locationPrefix": "/buckets/storage-k3s/",
      "ttl": "14"
    },
    {
      "locationPrefix": "/buckets/syslog-k3s/",
      "ttl": "14"
    },
    {
      "locationPrefix": "/buckets/vmetric-k3s/",
      "ttl": "14"
    },
    {
      "locationPrefix": "/home/",
      "replication": "001",
      "diskType": "nvme"
    }
  ]
}
> 

Note that 3 nodes have a different diskType, and the home directory data is supposed to live on them.

The following is the kill -3 output of weed mount:

weedmount.log

Restarting the mount does NOT fix the problem - it still hangs.

kill -3 output of the filer:

weed-filer.log

@ThomasADavis
Author

So I am seeing the filer hang after several days of S3 bucket usage with this setup. I'm going to reconfigure and update to the latest version to see what happens.

@chrislusf
Collaborator

You need at least one hdd volume. It seems none of your disk types are hdd.
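
If it helps, path-specific disk types are edited from weed shell with fs.configure; a hedged sketch of pointing the prefixes from this issue at hdd (verify the exact flags with `fs.configure -h` on your version):

```
> fs.configure -locationPrefix=/backup/ -disk=hdd -apply
> fs.configure -locationPrefix=/buckets/ -disk=hdd -apply
```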

@ThomasADavis
Author

ThomasADavis commented Jun 2, 2022

OK, I reset that... and am now getting:

[root@mouse-r11 ~]# weed mount -dir /mnt -filer mouse-r11:8889
mount point owner uid=0 gid=0 mode=drwxr-xr-x
current uid=0 gid=0
I0601 23:44:10 33730 leveldb_store.go:47] filer store dir: /tmp/abee8ba8/meta
I0601 23:44:10 33730 file_util.go:23] Folder /tmp/abee8ba8/meta Permission: -rwxr-xr-x
This is SeaweedFS version 30GB 3.08 8a49240d64f5e53f119e58b923c3a6b25e85b37b linux amd64
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1c52dac]

goroutine 4110072 [running]:
github.com/chrislusf/seaweedfs/weed/mount.(*FileHandle).addChunks(0xc013db2d80, {0xc0136f5d58, 0x1, 0x48?})
        /github/workspace/weed/mount/filehandle.go:72 +0xec
github.com/chrislusf/seaweedfs/weed/mount.(*ChunkedDirtyPages).saveChunkedFileIntevalToStorage(0xc00ea71aa0, {0x282dde0, 0xc009dd3d60}, 0x0, 0x200000, 0x23b6ab8)
        /github/workspace/weed/mount/dirty_pages_chunked.go:87 +0x22e
github.com/chrislusf/seaweedfs/weed/mount/page_writer.(*MemChunk).SaveContent(0xc010fc3dd0, 0xc003ca4ac0)
        /github/workspace/weed/mount/page_writer/page_chunk_mem.go:71 +0x8a
github.com/chrislusf/seaweedfs/weed/mount/page_writer.(*UploadPipeline).moveToSealed.func1()
        /github/workspace/weed/mount/page_writer/upload_pipeline.go:162 +0x66
github.com/chrislusf/seaweedfs/weed/util.(*LimitedConcurrentExecutor).Execute.func1()
        /github/workspace/weed/util/limiter.go:45 +0x71
created by github.com/chrislusf/seaweedfs/weed/util.(*LimitedConcurrentExecutor).Execute
        /github/workspace/weed/util/limiter.go:40 +0xae
[root@mouse-r11 ~]# weed version
version 30GB 3.08 8a49240d64f5e53f119e58b923c3a6b25e85b37b linux amd64
[root@mouse-r11 ~]# 

I restarted the mount, and the filer didn't hang this time.

@chrislusf
Collaborator

How do you reproduce it?

@ThomasADavis
Author

I changed the 'ssd' type to 'hdd', backed up the data I wanted to keep, totally destroyed the weed cluster, and then ran two rsyncs of data to put it back into the filesystem.

The two datasets are 243G and 487G in size. I ran both rsyncs at the same time, on the same host, using just one weed mount process, with the setup above ('ssd' tag changed to 'hdd') on a totally clean system. I wiped the volumes, the filer, and the master DBs.

This is not a fast process; it got about halfway through the 243G set before crashing, which took several hours.

@chrislusf
Collaborator

Are the files small, large, or mixed?

@ThomasADavis
Author

Totally mixed. The 487G data set is based on restic backups and etcd snapshots; the 243G data set is a /home directory structure.

@ThomasADavis
Author

ThomasADavis commented Jun 2, 2022

Oh, here's the fs.configure output for this system as well:

[root@mouse-r11 ~]# weed shell
master: localhost:9333 filers: [mouse-r11:8889]
> fs.configure
{
  "locations": [
    {
      "locationPrefix": "/buckets/",
      "diskType": "hdd"
    },
    {
      "locationPrefix": "/home/",
      "diskType": "nvme"
    }
  ]
}
> 

3 nodes are tagged nvme and 5 nodes are tagged hdd. One rsync (the 487G set) was writing to the /buckets/ path; the other was writing to the /home path.

The replication setting is '001'. The nodes have different rack settings (hdd nodes in one rack, nvme nodes in another); all use the same data center.
Screenshot_2022-06-02_11-56-47

chrislusf added a commit that referenced this issue Jun 6, 2022
@chrislusf
Collaborator

Added fix d65bb2c to address the "invalid memory address" panic.

@ThomasADavis
Author

OK, I updated the cluster to v3.09. We get farther, but there are still issues.

this is the same dataset, we get through it once, then I do it again, and on the second try we get:

mount point owner uid=0 gid=0 mode=drwxr-xr-x
current uid=0 gid=0
I0606 23:08:20 24347 leveldb_store.go:47] filer store dir: /tmp/2ddb8e02/meta
I0606 23:08:20 24347 file_util.go:23] Folder /tmp/2ddb8e02/meta Permission: -rwxr-xr-x
This is SeaweedFS version 30GB 3.09 4a046e4de7a40730895f8149120ce8d6e95f961d linux amd64
I0607 20:36:28 24347 wfs_filer_client.go:29] WithFilerClient 0 mouse-r11:18889: UpdateEntry dir /home/home/bdlalli: rpc error: code = Unknown desc = not found /home/home/bdlalli: filer: no entry is found in filer store
E0607 20:36:28 24347 wfs_save.go:42] saveEntry /home/home/bdlalli: UpdateEntry dir /home/home/bdlalli: rpc error: code = Unknown desc = not found /home/home/bdlalli: filer: no entry is found in filer store
I0607 21:05:03 24347 wfs_filer_client.go:29] WithFilerClient 0 mouse-r11:18889: UpdateEntry dir /home/home/ebuckner: rpc error: code = Unknown desc = not found /home/home/ebuckner: filer: no entry is found in filer store
E0607 21:05:03 24347 wfs_save.go:42] saveEntry /home/home/ebuckner: UpdateEntry dir /home/home/ebuckner: rpc error: code = Unknown desc = not found /home/home/ebuckner: filer: no entry is found in filer store
I0607 21:12:57 24347 wfs_filer_client.go:29] WithFilerClient 0 mouse-r11:18889: UpdateEntry dir /home/home/melrom: rpc error: code = Unknown desc = not found /home/home/melrom: filer: no entry is found in filer store
E0607 21:12:57 24347 wfs_save.go:42] saveEntry /home/home/melrom: UpdateEntry dir /home/home/melrom: rpc error: code = Unknown desc = not found /home/home/melrom: filer: no entry is found in filer store
I0607 21:52:37 24347 wfs_filer_client.go:29] WithFilerClient 0 mouse-r11:18889: UpdateEntry dir /home/home/siqideng: rpc error: code = Unknown desc = not found /home/home/siqideng: filer: no entry is found in filer store
E0607 21:52:37 24347 wfs_save.go:42] saveEntry /home/home/siqideng: UpdateEntry dir /home/home/siqideng: rpc error: code = Unknown desc = not found /home/home/siqideng: filer: no entry is found in filer store
I0607 22:10:24 24347 wfs_filer_client.go:29] WithFilerClient 0 mouse-r11:18889: UpdateEntry dir /home/home/tdavis: rpc error: code = Unknown desc = not found /home/home/tdavis: filer: no entry is found in filer store
E0607 22:10:24 24347 wfs_save.go:42] saveEntry /home/home/tdavis: UpdateEntry dir /home/home/tdavis: rpc error: code = Unknown desc = not found /home/home/tdavis: filer: no entry is found in filer store
I0607 22:26:16 24347 wfs_filer_client.go:29] WithFilerClient 0 mouse-r11:18889: UpdateEntry dir /home/home/yllam: rpc error: code = Unknown desc = not found /home/home/yllam: filer: no entry is found in filer store
E0607 22:26:16 24347 wfs_save.go:42] saveEntry /home/home/yllam: UpdateEntry dir /home/home/yllam: rpc error: code = Unknown desc = not found /home/home/yllam: filer: no entry is found in filer store

and

*** Skipping any contents from this failed directory ***
rsync: [generator] recv_generator: mkdir "/mnt/home/home/tdavis/skydive" failed: Not a directory (20)
*** Skipping any contents from this failed directory ***
rsync: [generator] recv_generator: mkdir "/mnt/home/home/tdavis/snmp" failed: Not a directory (20)
*** Skipping any contents from this failed directory ***
rsync: [generator] recv_generator: mkdir "/mnt/home/home/tdavis/sonic" failed: Not a directory (20)
*** Skipping any contents from this failed directory ***
rsync: [generator] recv_generator: mkdir "/mnt/home/home/tdavis/sshd" failed: Not a directory (20)
*** Skipping any contents from this failed directory ***
rsync: [generator] recv_generator: mkdir "/mnt/home/home/tdavis/stash" failed: Not a directory (20)
*** Skipping any contents from this failed directory ***

and

> volume.check.disk -force
volume 23 ark-3:8088 has 11 entries, ark-1:8088 missed 0 entries
volume 23 ark-1:8088 has 11 entries, ark-3:8088 missed 0 entries
volume 33 mouse-r13:8088 has 181494 entries, mouse-r11:8088 missed 37068 entries
volume 33 mouse-r11:8088 has 144426 entries, mouse-r13:8088 missed 0 entries
volume 33 mouse-r13:8088 has 181494 entries, mouse-r11:8088 missed 0 entries
volume 33 mouse-r11:8088 has 181494 entries, mouse-r13:8088 missed 0 entries
volume 34 mouse-r11:8088 has 127608 entries, mouse-r12:8088 missed 0 entries
volume 34 mouse-r12:8088 has 255901 entries, mouse-r11:8088 missed 128293 entries
sync volume 34 on mouse-r12:8088 and mouse-r11:8088: doVolumeCheckDisk source:id:"mouse-r12:8088" diskInfos:{key:"nvme" value:{type:"nvme" volume_count:8 max_volume_count:256 free_volume_count:248 active_volume_count:7 volume_infos:{id:141 size:17281786912 file_count:144883 replica_placement:1 version:3 compact_revision:3 modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:144 size:17706210256 file_count:145335 delete_count:1 deleted_byte_count:253 replica_placement:1 version:3 compact_revision:3 modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:31 size:17612901208 file_count:145481 delete_count:2 deleted_byte_count:456 replica_placement:1 version:3 compact_revision:4 modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:36 size:17495902744 file_count:144607 delete_count:1 deleted_byte_count:253 replica_placement:1 version:3 compact_revision:4 modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:140 size:17633624352 file_count:144480 replica_placement:1 version:3 compact_revision:3 modified_at_second:1654665874 disk_type:"nvme"} volume_infos:{id:32 size:17349633808 file_count:145013 delete_count:1 deleted_byte_count:102 replica_placement:1 version:3 compact_revision:4 modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:35 size:17522328432 file_count:145466 delete_count:3 deleted_byte_count:354 replica_placement:1 version:3 compact_revision:4 modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:34 size:31474374512 file_count:258812 delete_count:2989 deleted_byte_count:851157886 replica_placement:1 version:3 compact_revision:1 modified_at_second:1654662468 disk_type:"nvme"}}} grpc_port:18088 target:id:"mouse-r11:8088" diskInfos:{key:"nvme" value:{type:"nvme" volume_count:8 max_volume_count:256 free_volume_count:248 active_volume_count:2 volume_infos:{id:36 size:17495902744 file_count:144607 delete_count:1 deleted_byte_count:253 replica_placement:1 version:3 compact_revision:4 
modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:34 size:15387452328 file_count:127608 replica_placement:1 version:3 compact_revision:4 modified_at_second:1654662468 disk_type:"nvme"} volume_infos:{id:139 size:17328694648 file_count:144996 replica_placement:1 version:3 compact_revision:3 modified_at_second:1654665874 disk_type:"nvme"} volume_infos:{id:142 size:17214030568 file_count:144494 delete_count:3 deleted_byte_count:620 replica_placement:1 version:3 compact_revision:3 modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:143 size:17889759280 file_count:145177 replica_placement:1 version:3 compact_revision:3 modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:33 size:17236602472 file_count:144426 replica_placement:1 version:3 compact_revision:3 modified_at_second:1654664873 disk_type:"nvme"} volume_infos:{id:35 size:17522328432 file_count:145466 delete_count:3 deleted_byte_count:354 replica_placement:1 version:3 compact_revision:4 modified_at_second:1654665977 disk_type:"nvme"} volume_infos:{id:31 size:17612901208 file_count:145481 delete_count:2 deleted_byte_count:456 replica_placement:1 version:3 compact_revision:4 modified_at_second:1654665977 disk_type:"nvme"}}} grpc_port:18088 volume 34: failed to start repair volume 34, percentage of missing keys is greater than the threshold: 0.50 > 0.30
> 

@ThomasADavis
Author

ThomasADavis commented Jun 9, 2022

At some point, these errors showed up in the filer output:
Screenshot_2022-06-08_21-49-10

I think it ran out of file handles.

@stemcc

stemcc commented Aug 17, 2022

I think it ran out of file handles.

Once you increased the file handles on the host, did your issue resolve itself @ThomasADavis?

@ThomasADavis
Author

Every once in a while I'd see the weed process exceed 1000 open file handles; raising that limit in the systemd service file I'm using stopped the issue.
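
For anyone hitting the same ceiling, the limit goes in the unit file via LimitNOFILE (a sketch; the unit name and value here are assumptions based on this thread):

```ini
# /etc/systemd/system/seaweedfs-filer.service (excerpt)
[Service]
# The systemd default soft limit is commonly 1024, right around the
# ~1000 open handles observed above; raise it generously.
LimitNOFILE=65536
```

Then reload and restart: `systemctl daemon-reload && systemctl restart seaweedfs-filer`.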

I haven't been able to recreate this problem with newer versions of seaweedfs.
