timeouts from volume-server while doing GC #1108
Comments
Is the timeout only about getting the status, or for all requests? The log shows a broken pipe when the server tries to write a large JSON map back to the client.
Both. Running curl locally for status/vid,fid:
Logs from that timespan:
Please try to use the compactionMBps option to limit the compaction write speed.
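A minimal sketch of how that flag might be passed on the volume server command line; the directory, port, and master address below are placeholders, not values taken from this issue:

```
# Hypothetical volume server invocation; only -compactionMBps is the point here.
weed volume \
  -dir=/data/seaweedfs \
  -mserver=master-host:9333 \
  -port=8080 \
  -compactionMBps=16   # cap compaction/vacuum writes at roughly 16 MB/s
```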
Thanks, I will try this parameter and reopen the issue on the next big GC if the problem persists.
After testing with compactionMBps=16 I still get timeouts from time to time, and a broken pipe in the volume log even when the request is run locally with curl.
GC triggered by:
Locally on the volume server, curl is run against /status and a random /vid,fid (roughly as in the sketch after this comment):
Logs from the last period (1142-ish): https://gist.githubusercontent.com/roflmao/195e3caea2e996baaf80737effae4606/raw/3d9ecbca65c3f7b30b75f8bba21b788108ee2fb9/weed-volume.log
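A sketch of what such a local probe could look like; the port, the fid, and the 10-second cutoff are placeholders, not the exact values used here:

```
# Time /status and a single needle read against the local volume server.
curl -o /dev/null -sS -w 'status %{http_code} in %{time_total}s\n' \
  --max-time 10 http://127.0.0.1:8080/status

# 3,01637037d6 is a made-up fid in the usual <volume>,<needle> format.
curl -o /dev/null -sS -w 'status %{http_code} in %{time_total}s\n' \
  --max-time 10 http://127.0.0.1:8080/3,01637037d6
```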
How busy is the hard drive? compactionMBps=16 means a 16 MB/s write speed. Consider lowering it.
There are no writes other than compaction, and around 200k read requests per day (average size 300 kB per document), which averages about 2.3 reads per second (200k divided by 86400). The utilization of the disks (RAID6) peaks at around 70% in iostat while compaction and regular reads are running.
In 1.45, there is a related change. Also, if the problem still exists, could you set up and share a metrics dashboard? https://github.com/chrislusf/seaweedfs/wiki/System-Metrics
I have upgraded to 1.45 and will come back with more data |
@roflmao Any updates?
I get no timeouts for this request:
For /status I still get a lot of timeouts:
Looks like the broken pipe correlates with these timestamps:
The issue still persists with 1.50. It doesn't look directly connected to GC, but the broken pipe on /status happens more often while forcing GC.
Any updates with the latest version?
@chrislusf Will test 1.58 and come back to you |
Still the same on 1.58 while forcing GC:
Same issue as #1222.
@chrislusf This has gotten worse with 1.61 RC. What more can I provide to give you debug data?
@chrislusf Can we reopen this? It is still an issue on 1.67.
But there is no new information. It seems you are busy compacting thousands of volumes and stressing the machine, so some timeouts seem normal. What about timeouts for normal HTTP requests?
Stressing the computer is relative to the maxCompaction setting, and we're not really stressing the system now. Could it be related to locking of some sort? There is quite a lot of information gathered when doing /status on a volume server with ~8000 volumes.
The lock would be specific to some code path, so I want to know the behavior for other normal HTTP requests.
@chrislusf We have been running this for 5 hours without any output, so regular fetching of fids seems OK (roughly as in the sketch below):
The status endpoint is what we use for monitoring SeaweedFS, and it also produces a whole lot of logging for all these broken pipes. We are willing to fetch more debug data if needed.
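For context, a loop along these lines (fids.txt, the port, and the interval are hypothetical) is roughly what that looks like; it only prints when a request fails or times out, so five hours with no output means every fetch succeeded:

```
# Fetch each fid from a local list against the local volume server;
# only failures or timeouts produce output.
while read -r fid; do
  curl -o /dev/null -sS -f --max-time 10 "http://127.0.0.1:8080/${fid}" \
    || echo "$(date -Is) problem fetching ${fid}"
  sleep 0.5
done < fids.txt
```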
Thanks for confirming! This narrows it down to the section of code that collects volume server status. There is a lock for reading the volume file counts, sizes, deletion counts, etc., and it blocks while the volume commits the compaction, switching from the old version to the new one. I will try to address this in later versions by reducing the lock scope. However, this only benefits the http://<volume_server>/status page. What info are you interested in on this page? Can you get the info from other places? The volume info is also available via gRPC.
Is this lock only in play for the HTTP endpoint? Could we use gRPC instead? We use /status for monitoring the liveness of the volume server.
The gRPC call goes directly to the master to collect stats for all volumes and does not take those locks. You can check https://github.com/chrislusf/seaweedfs/blob/master/weed/shell/command_volume_list.go for examples.
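As a sketch of that gRPC route, something like the following could replace scraping /status for volume info; the master address is a placeholder, and this assumes weed shell accepts commands piped on stdin:

```
# Collect per-volume stats from the master over gRPC via weed shell,
# which uses the same code path as command_volume_list.go.
echo "volume.list" | weed shell -master=master-host:9333
```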
Describe the bug
Getting timeouts of over 10 seconds from the volume server while doing GC.
System Setup
weed version: version 30GB 1.44 linux amd64
Expected behavior
Performance should not be heavily reduced while doing GC.
Screenshots
https://www.jottacloud.com/s/006d6b606b06d544986b1cdd288d7a3524d
Logs
Logs from the volume server show a lot of output and a broken pipe during the timeout periods:
https://gist.github.com/roflmao/5f27a6944c8ab25deb521e65bde675b2
Additional context
GC by:
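The exact command is not preserved here; forcing GC is typically done against the master's vacuum endpoint, roughly like this (host and threshold are placeholders):

```
# Ask the master to vacuum (compact) volumes whose garbage ratio exceeds the threshold.
curl "http://master-host:9333/vol/vacuum?garbageThreshold=0.3"
```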