Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upsiad scanning / rewriting entire siahostdata.dat; requests to siad hang #2594
Comments
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tbenz9
Dec 31, 2017
Collaborator
Duplicate of #2483
These are helpful logs, thanks for uploading them.
|
Duplicate of #2483 |
tbenz9
closed this
Dec 31, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tbenz9
Dec 31, 2017
Collaborator
I've changed my mind, I think it's a similar error to #2483 but possibly a different bug. Reopening and labelling bug.
|
I've changed my mind, I think it's a similar error to #2483 but possibly a different bug. Reopening and labelling bug. |
tbenz9
reopened this
Dec 31, 2017
tbenz9
added
the
Bug
label
Dec 31, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
hackneyb
Dec 31, 2017
thanks .. it looks like the symptom is the same - siad not responding to some things, and API calls happen to be made and fail and are logged. But there is probably 50 different ways to make siad stop responding.
hackneyb
commented
Dec 31, 2017
|
thanks .. it looks like the symptom is the same - siad not responding to some things, and API calls happen to be made and fail and are logged. But there is probably 50 different ways to make siad stop responding. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
DavidVorick
Dec 31, 2017
Member
Is this in the middle of an upgrade? Did you make some change which may cause it to look at all the data?
Seems like it's either moving data or upgrading data, and then as a byproduct locking the whole system up.
If you just upgraded from an earlier version, this could happen I think.
|
Is this in the middle of an upgrade? Did you make some change which may cause it to look at all the data? Seems like it's either moving data or upgrading data, and then as a byproduct locking the whole system up. If you just upgraded from an earlier version, this could happen I think. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
hackneyb
Dec 31, 2017
No, I've been running on 1.3.1 for about 5-6 days, this seemed to coincide with getting first 2 contracts, but no data is being uploaded - usage was 0
then suddenly a lot of chirping from the drive made me look what's going on, so unexpected
hackneyb
commented
Dec 31, 2017
|
No, I've been running on 1.3.1 for about 5-6 days, this seemed to coincide with getting first 2 contracts, but no data is being uploaded - usage was 0 |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
hackneyb
Dec 31, 2017
first time I Ctrl-C out of siad console .. it hung, I rebooted, came back with everything OK, except for this:
2017/12/30 16:03:38.224283 writeaheadlog.go:285: WARN: WAL file detected, performing recovery after unclean shutdown.
Then OK for a while, then IO restarted on its own
hackneyb
commented
Dec 31, 2017
|
first time I Ctrl-C out of siad console .. it hung, I rebooted, came back with everything OK, except for this: |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
hackneyb
Dec 31, 2017
IO finished by itself and all hung requests to siad went through. I do believe the entire operation was triggered by the first upload of data ever, to the host instance. siad then responded properly to termination and is healthy after restart.
Just in case the impact was worsened on by relative performance of hardware, I've enabled drive write caching. This required a reboot .. nothing further to report for now.
Logs
Storage Used value updated from 0 to a positive value.
Storage Folders:
Used Capacity % Used Path
4.19 MB 4.3999 TB 0.00 D:\SIAStorage01
host.log filled with about 250 lines similar to these:
2017/12/31 16:06:15.018778 errors.go:151: connection error: error with 104.175.232.198:59633: incoming RPCReviseContract failed: revision iteration failed: iteration signal failed to send: write tcp 192.<privIP>:9982->104.175.232.198:59633: i/o timeout
2017/12/31 16:06:18.514740 errors.go:151: connection error: error with 24.47.202.181:54857: incoming RPCSettings failed: failed WriteSignedObject during RPCSettings: write tcp 192.<privIP>:9982->24.47.202.181:54857: i/o timeout
2017/12/31 16:06:18.535741 errors.go:151: connection error: error with 94.224.211.150:51387: incoming RPCSettings failed: failed WriteSignedObject during RPCSettings: write tcp
hackneyb
commented
Dec 31, 2017
|
IO finished by itself and all hung requests to siad went through. I do believe the entire operation was triggered by the first upload of data ever, to the host instance. siad then responded properly to termination and is healthy after restart. Just in case the impact was worsened on by relative performance of hardware, I've enabled drive write caching. This required a reboot .. nothing further to report for now. Logs Storage Used value updated from 0 to a positive value.
host.log filled with about 250 lines similar to these:
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
DavidVorick
Jan 2, 2018
Member
Hmm. Glad it's all working again. The 'performing recovery' thing is normal if you do unclean shutdown. Recovery should be complete, you won't be missing any data or anything.
Let me know if you see the issue again, but that's pretty strange.
Actually, how much data are you hosting on the network? Is it possible that you have a very large contract (greater than 100 GB) that you are hosting?
|
Hmm. Glad it's all working again. The 'performing recovery' thing is normal if you do unclean shutdown. Recovery should be complete, you won't be missing any data or anything. Let me know if you see the issue again, but that's pretty strange. Actually, how much data are you hosting on the network? Is it possible that you have a very large contract (greater than 100 GB) that you are hosting? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
hackneyb
Jan 3, 2018
Thanks .. I do believe the NTFS file may first be created in a way which causes some sort of zero-out or thick-provisioning action, but only when data is first written to the file. I will have a chance to reproduce this in the future. I could have dumped siad.exe when in state and then see what threads existed but really wanted to see if it completes by itself. This is not the only host I plan on running so will update if I ever encounter this again.
I cannot easily tell what contracts I have but my locked collateral is about 2/3 of my per TB per month price. Does that mean the contracts amount to 2/3 of a TB?
hackneyb
commented
Jan 3, 2018
•
|
Thanks .. I do believe the NTFS file may first be created in a way which causes some sort of zero-out or thick-provisioning action, but only when data is first written to the file. I will have a chance to reproduce this in the future. I could have dumped siad.exe when in state and then see what threads existed but really wanted to see if it completes by itself. This is not the only host I plan on running so will update if I ever encounter this again. I cannot easily tell what contracts I have but my locked collateral is about 2/3 of my per TB per month price. Does that mean the contracts amount to 2/3 of a TB? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tbenz9
closed this
Jan 5, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
hackneyb
Jan 13, 2018
My issue came back. This was after I moved the entire storage to another disk (same drive letter, copied all offline so siad/sia-ui didn't know it was moved).
Symptoms similar as before:
siac hostdb -v output shows unresponsive siad while in state:
Recent Scans
11111111111111111111111101101
host.log - 376 lines like this, all same time, different IPs:
2018/01/12 23:04:14.447751 errors.go:151: connection error: error with 93.83.187.18:38492: incoming RPCReviseContract failed: revision iteration failed: iteration signal failed to send: write tcp 192.168.x.y:9982->93.83.187.18:38492: i/o timeout
2018/01/12 23:04:14.464750 errors.go:151: connection error: error with 173.77.154.207:64012: incoming RPCSettings failed: failed WriteSignedObject during RPCSettings: write tcp 192.168.x.y:9982->173.77.154.207:64012: i/o timeout
[.. snip..]
2018/01/12 23:04:14.716296 errors.go:151: connection error: error with 104.15.124.37:55994: incoming RPCSettings failed: failed WriteSignedObject during RPCSettings: write tcp 192.168.x.y:9982->104.15.124.37:55994: i/o timeout
2018/01/12 23:04:14.716296 errors.go:151: connection error: error with 108.248.249.252:58305: incoming RPCSettings failed: failed WriteSignedObject during RPCSettings: write tcp 192.168.x.y:9982->108.248.249.252:58305: i/o timeout
siad.exe console:
timeout on consensus errors (can't quote, didn't get to capture)
contractmanager.log - probably red herring errors, this logged later:
2018/01/12 05:39:26.016837 writeaheadlogsync.go:50: SEVERE: ERROR: unable to atomically copy the contract manager settings: rename C:\Users\Administrator\AppData\Roaming\Sia-UI\sia\host\contractmanager\contractmanager.json_temp C:\Users\Administrator\AppData\Roaming\Sia-UI\sia\host\contractmanager\contractmanager.json: Access is denied.
2018/01/12 23:28:06.544894 writeaheadlogsync.go:50: SEVERE: ERROR: unable to atomically copy the contract manager settings: rename C:\Users\Administrator\AppData\Roaming\Sia-UI\sia\host\contractmanager\contractmanager.json_temp C:\Users\Administrator\AppData\Roaming\Sia-UI\sia\host\contractmanager\contractmanager.json: Access is denied.
2018/01/12 23:28:06.589784 writeaheadlogsync.go:110: SEVERE: ERROR: could not rename temporary write-ahead-log in contract manager: rename C:\Users\Administrator\AppData\Roaming\Sia-UI\sia\host\contractmanager\contractmanager.wal_temp C:\Users\Administrator\AppData\Roaming\Sia-UI\sia\host\contractmanager\contractmanager.wal: Access is denied.
Overall I started running kernel tools in Windows and that froze the machine. On reboot things normal initially, then zeroing out the storage file came back. Currently in 3rd or 4th hour of writing 30MB chunks of data to the file, offset 2TB to 3.2TB in a 4TB file. Commands using siad hanging.
Wallet Hosting screen reduced active contract count by 14 after reboot. Most active siad thread stack:
ntoskrnl.exe!KeSynchronizeExecution+0x2246
ntoskrnl.exe!KeWaitForMultipleObjects+0x135e
ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9
ntoskrnl.exe!KeWaitForMutexObject+0x373
Ntfs.sys+0x14522
Ntfs.sys+0x16834
Ntfs.sys+0x19c5b
fltmgr.sys!FltIsCallbackDataDirty+0x2ee
fltmgr.sys+0x10c2
ntoskrnl.exe!IoSynchronousPageWrite+0x138
ntoskrnl.exe!CcZeroDataOnDisk+0x25d
ntoskrnl.exe!CcZeroDataOnDisk+0xcc
ntoskrnl.exe!CcZeroDataOnDisk+0x38
ntoskrnl.exe!CcZeroData+0x16a
Ntfs.sys+0x8c261
Ntfs.sys+0x16e99
Ntfs.sys+0x19c5b
fltmgr.sys+0x1101
ntoskrnl.exe!FsRtlInitializeOplock+0x32f
ntoskrnl.exe!NtWriteFile+0x694
ntoskrnl.exe!setjmpex+0x34a3
ntdll.dll!ZwWriteFile+0xa
KERNELBASE.dll!GetEnvironmentVariableW+0x699
siad.exe+0x5d86e
siad.exe+0x15519
hackneyb
commented
Jan 13, 2018
•
|
My issue came back. This was after I moved the entire storage to another disk (same drive letter, copied all offline so siad/sia-ui didn't know it was moved). Symptoms similar as before: siac hostdb -v output shows unresponsive siad while in state:
host.log - 376 lines like this, all same time, different IPs:
siad.exe console: contractmanager.log - probably red herring errors, this logged later:
Overall I started running kernel tools in Windows and that froze the machine. On reboot things normal initially, then zeroing out the storage file came back. Currently in 3rd or 4th hour of writing 30MB chunks of data to the file, offset 2TB to 3.2TB in a 4TB file. Commands using siad hanging. Wallet Hosting screen reduced active contract count by 14 after reboot. Most active siad thread stack:
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
gchilders
Feb 10, 2018
I'm having this issue now. I created a 7 TB NTFS file. When I tried to move data to it siad locked up for over 24 hours until the computer rebooted for updates. On the next write, it came back again.
Following up, I created a 50 GB file and moved data to it. Once it completed, I resized the file back to 7TB. After a couple days, however, the problem returned.
The 7TB file is on an NTFS partition on a SATA connected drive. During the writes, siad is unresponsive. Thus the time (days) spent writing count as downtime. If the computer reboots in the process, it starts over from the beginning. Largely due to this, my relatively new host is at 86% uptime and dropping.
gchilders
commented
Feb 10, 2018
•
|
I'm having this issue now. I created a 7 TB NTFS file. When I tried to move data to it siad locked up for over 24 hours until the computer rebooted for updates. On the next write, it came back again. Following up, I created a 50 GB file and moved data to it. Once it completed, I resized the file back to 7TB. After a couple days, however, the problem returned. The 7TB file is on an NTFS partition on a SATA connected drive. During the writes, siad is unresponsive. Thus the time (days) spent writing count as downtime. If the computer reboots in the process, it starts over from the beginning. Largely due to this, my relatively new host is at 86% uptime and dropping. |
hackneyb commentedDec 31, 2017
Bug Report
Severe IO on siahostdata.dat file in hosted folder. siahostdata.dat is larger than 4TB
IO has written 1.2TB in 5 hours into this file.
During this IO operation other requests to siad hang and host appears offline, although it is not offline.
What hangs:
siac host -v
external queries to host
Drive is 8TB external Seagate on USB 3.0 mounted as drive D:
siad reads and writes from the file at about 25MB/s simultaneously, with peaks of 30-40MB/s.
Overall drive performance - writes can occur at 50-80MB/s.
Stack Trace or error message
These errors appear in siad console after 1 hour of this activity:
goroutine 674092 [running]:
runtime/debug.Stack(0x446d5e, 0xc04e7c5eb8, 0xc04e7c5e80)
/usr/local/go/src/runtime/debug/stack.go:24 +0xae
runtime/debug.PrintStack()
/usr/local/go/src/runtime/debug/stack.go:16 +0x29
github.com/NebulousLabs/Sia/build.Severe(0xc04e7c5f18, 0x2, 0x2)
/home/luke/go/src/github.com/NebulousLabs/Sia/build/critical.go:32 +0x8c
github.com/NebulousLabs/Sia/api.cleanCloseHandler.func1.2(0xc052a21c20, 0xc054329a00, 0xc0588a4600)
/home/luke/go/src/github.com/NebulousLabs/Sia/api/routes.go:159 +0x18d
created by github.com/NebulousLabs/Sia/api.cleanCloseHandler.func1
/home/luke/go/src/github.com/NebulousLabs/Sia/api/routes.go:154 +0x1e3
Severe error: api call is taking more than 60 minutes to return: /consensus
Expected Behavior
siad should not write (or scan?) the entire siahostdata.dat file when there are no ongoing upload/download or file create operations; siad should not stop responding to requests
Environment
Any useful logs
Disk IO shows repeated writes to the file in this fashion (this has started at offset 0 and presumably will continue for another day or two if uninterrupted, until it reaches file size of +4TB):
8:06:40.7567275 PM siad.exe 3668 IRP_MJ_WRITE D:\SIAStorage01\siahostdata.dat SUCCESS Offset: 1,239,435,706,368, Length: 31,457,280, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
8:06:41.5951523 PM siad.exe 3668 IRP_MJ_WRITE D:\SIAStorage01\siahostdata.dat SUCCESS Offset: 1,239,469,260,800, Length: 31,457,280, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
8:06:44.7835456 PM siad.exe 3668 IRP_MJ_WRITE D:\SIAStorage01\siahostdata.dat SUCCESS Offset: 1,239,502,815,232, Length: 31,457,280, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
8:06:45.5828451 PM siad.exe 3668 IRP_MJ_WRITE D:\SIAStorage01\siahostdata.dat SUCCESS Offset: 1,239,536,369,664, Length: 31,457,280, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
8:06:46.3650888 PM siad.exe 3668 IRP_MJ_WRITE D:\SIAStorage01\siahostdata.dat SUCCESS Offset: 1,239,569,924,096, Length: 31,457,280, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
8:06:47.2798532 PM siad.exe 3668 IRP_MJ_WRITE D:\SIAStorage01\siahostdata.dat SUCCESS Offset: 1,239,603,478,528, Length: 31,457,280, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
8:06:50.6805290 PM siad.exe 3668 IRP_MJ_WRITE D:\SIAStorage01\siahostdata.dat SUCCESS Offset: 1,239,637,032,960, Length: 31,457,280, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
8:06:51.5031529 PM siad.exe 3668 IRP_MJ_WRITE D:\SIAStorage01\siahostdata.dat SUCCESS Offset: 1,239,670,587,392, Length: 31,457,280, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal