weed fuse mount hangs. #2952
Comments
I'm able to replicate this with rsync using the same data set. It gets farther along, but eventually just stops.
The filer is unlikely to hang, since the file content (not the metadata) does not go through it. You can run "weed mount -debug" to see the activity. It seems slow, but I could not get it to hang. If it hangs, please use "kill -3" to get a thread dump.
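The debug run and thread dump suggested above can be captured roughly like this (a sketch; the mount point, filer address, and process lookup are assumptions — `kill -3` sends SIGQUIT, which makes a Go process print a goroutine dump to stderr):

```
# run the mount in the foreground with debug logging (paths/addresses are examples)
weed mount -debug -dir=/mnt/weed -filer=localhost:8888

# in another terminal, when it hangs, dump all goroutines:
kill -3 "$(pgrep -f 'weed mount')"
# the dump goes to the mount process's stderr (check journalctl if it runs under systemd)
```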
I'd not rule out the filer yet. So far, I've had to restart it along with the fuse mount to fix it - the filer will hang S3 connections (but not all of them). Still working on it; I have to move it to a different system, and it takes forever to get it to the hang stage (like 2-3 hours).
Any updates?
Yeah, I tried to duplicate it on a different cluster and couldn't. I just had it hang on the original cluster. This is the fs.configure on the one that hangs:
Note that 3 nodes have a different diskType, and the home directory is supposed to be located there. The following is the kill -3 output of weed mount. Restarting the mount does NOT fix the problem - it still hangs. kill -3 output of the filer:
So I am seeing the filer hang after several days of usage via S3 buckets with this. Going to reconfigure and update to the latest version to see what happens.
You need to have at least one hdd volume. It seems none of your disk types are hdd.
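Changing a path's disk type back to hdd can be done from weed shell; this is a sketch, and the `/home` location prefix is an assumption based on the home-directory setup described above:

```
$ weed shell
> fs.configure -locationPrefix=/home -diskType=hdd -apply
> exit
```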
Ok, reset that... now I'm getting:
Restarted the mount, and the filer didn't hang this time.
How do you reproduce it?
I changed the 'ssd' type to 'hdd', backed up the data I wanted to keep, totally destroyed the weed cluster, and then ran two rsyncs of data to put it back into the filesystem. The two datasets are 243G and 487G in size. I ran both rsyncs at the same time, on the same host, using just one weed mount process, with the above setup with the 'ssd' tag changed to 'hdd' - but a totally clean system: I wiped the volumes, the filer, and the master DBs. This is not a fast process - it got about halfway through the 243G set before it crashed, which took several hours.
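The reproduction steps above amount to something like the following sketch (all paths and the filer address are hypothetical; the actual cluster teardown/startup commands depend on the systemd units described later in the report):

```
# wipe master, filer, and volume state for a clean run (paths are hypothetical)
rm -rf /data/weed/master /data/weed/filer /data/weed/volume*

# mount once, then run both copies concurrently against the single mount
weed mount -dir=/mnt/weed -filer=localhost:8888 &
rsync -a /backup/restic-etcd/ /mnt/weed/restic-etcd/ &   # ~487G dataset
rsync -a /backup/home/        /mnt/weed/home/        &   # ~243G dataset
wait
```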
Are the files small, large, or mixed?
Totally mixed. The 487G dataset is based on restic backups and etcd snapshots; the 243G dataset is a /home directory structure.
Added fix d65bb2c to address the "invalid memory address" error.
Ok, updated the cluster to v3.09 and we get farther, but there are still issues. This is the same dataset; we get through it once, then on the second try we get:
and
and
Once you increased the file handles on the host, did your issue resolve itself @ThomasADavis? |
Every once in a while the weed process would exceed 1000 open file handles; upping that limit in the systemd service file I'm using stopped the issue. I haven't been able to recreate this problem with newer versions of SeaweedFS.
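Raising the open-file-handle limit in a systemd service file looks like this (a sketch; the unit name and the limit value of 65536 are assumptions, not the reporter's actual settings):

```ini
# e.g. /etc/systemd/system/seaweedfs-mount.service (hypothetical unit)
[Service]
LimitNOFILE=65536
```

After editing the unit, run `systemctl daemon-reload` and restart the service for the new limit to take effect.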
Describe the bug
Fuse mount/filer hangs after high load from a restic restore.
This is the snapshot to restore from: 1.2 million files, 233 GB.
The restore runs for a while, creates all the directories, starts restoring files, then hangs.
The repo is the S3-based repo from the backup above.
System Setup
3 nodes, each with an AMD 8-core/32-thread CPU, 64 GB of RAM, bonded 10 Gb interfaces, and 2 TB NVMe M.2 storage.
The S3 storage is 5 nodes: fanless Zotac CI329s with 2x1G Ethernet, 8 GB of RAM, a 4-core Intel Celeron @ 1.10 GHz, and an 8 TB Micron 5100 Pro SATA SSD (it's for disaster recovery).
A restic restore from those 5 nodes to the local NVMe drive can easily do 1.5 Gb/sec - see:
The same restore to the seaweedfs fuse mount hangs after several minutes.
If I restart the filer (systemctl restart seaweedfs-filer, or kill -9), the filer comes back, the restore resumes, and then it hangs again.
The OS is Fedora 35 Server with SELinux disabled.
systemd service files:
seaweedfs-volume.service.txt
seaweedfs-s3.service.txt
seaweedfs-master.service.txt
seaweedfs-filer.service.txt
weed version
filer.toml
filer.toml is empty.
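With an empty filer.toml, the filer falls back to its default embedded metadata store. For reference, a minimal explicit configuration would look something like this (a sketch based on the scaffold that `weed scaffold -config=filer` generates; the directory value is an assumption):

```toml
[leveldb2]
# default embedded metadata store
enabled = true
dir = "./filerldb2"
```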
Expected behavior
No hangs; the restore finishes.
Additional context
weed mount reports this when I crash and restart the filer: