[BUG] Very long lookup times can block the harvester #7788

Open
tonybeesknees opened this issue Jul 26, 2021 · 18 comments
Assignees
Labels
bug Something isn't working

Comments

@tonybeesknees

Proofs stop on any harvester that has network traffic, such as sending plots to a NAS or even to another farmer.
Sending plots or files internally is fine.
It is only when sending plots or files out over the network that proofs stop.

The last attempted proofs just stop on the farmer when there is network traffic to or from that farmer.

I have 3 farmers set up, fully synced, and all have the same problem.
All are on the latest version.

So until you have all plots built and in place on a farmer, it earns you nothing.

I reported this weeks ago, but you marked it as not a bug without even testing it or doing anything.

Chia will go nowhere with that attitude.

@tonybeesknees added the bug label Jul 26, 2021
@keliew

keliew commented Jul 26, 2021

logs would help.

@berlef80

I have the same issue, but it does not require other things to be going on. The proofs work fine but after a while just stop.
image
You can see the time on the bottom is 6:27pm and the proofs are stuck at 5:17pm. This has happened since 1.2.1.
Here is the debug file. Looks like it has a lot going on but not sure what!
debug.log

@tonybeesknees
Author

There is nothing in the logs.
It is as if Chia's connection to the internet is set to low priority,
or it becomes a background process or something.
Even downloading a big file off the net stops proofs as well.
Any network traffic in or out of the farm rig stops proofs until the traffic has stopped; then proofs start working again.

@tonybeesknees
Author

Mining Verus Coin and staking,
even mining Ether with a GPU, do not stop with network traffic in and out of the computer.
But Chia proofs and syncing, everything except building a plot, stop working.

@tonybeesknees
Author

It is so easy to reproduce:
Open Chia and let it sync up.
Then send a couple of plots to another computer or NAS drive.
Proofs stop.
If you are sending 10 plots, which takes a long time, then for that whole time you are not farming.

@alex8900vbs

It is frustrating: at least once a day, the harvester stops proofs with error 1006.

@keliew

keliew commented Jul 28, 2021

> Here is the debug file. Looks like it has a lot going on but not sure what!
> debug.log

You have a bunch of bad plots. Maybe try to isolate the problem first.

It could also be a bad cable or bad hardware causing those badbits, etc. Or maybe an unofficial pooling protocol is affecting it.

@berlef80

berlef80 commented Jul 28, 2021 via email

@Nusstoertchen

Unfortunately I have the same problem. The proofs stop several times a day.

2021-07-28T18:33:31.831 farmer farmer_server : INFO Connection closed: 127.0.0.1, node id: xxxxxxxxxxxxxxxxxxxxxxx
2021-07-28T18:33:31.833 farmer chia.farmer.farmer : INFO peer disconnected None
2021-07-28T18:33:31.846 daemon main : INFO Websocket exception. Closing websocket with chia_harvester code = 1006 (connection closed abnormally [internal]), no reason
Traceback (most recent call last):
File "asyncio\windows_events.py", line 457, in finish_recv
OSError: [WinError 64] The specified network name is no longer available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "websockets\protocol.py", line 827, in transfer_data
File "websockets\protocol.py", line 895, in read_message
File "websockets\protocol.py", line 971, in read_data_frame
File "websockets\protocol.py", line 1047, in read_frame
File "websockets\framing.py", line 105, in read
File "asyncio\streams.py", line 723, in readexactly
File "asyncio\streams.py", line 517, in _wait_for_data
File "asyncio\proactor_events.py", line 280, in _loop_reading
File "asyncio\windows_events.py", line 812, in _poll
File "asyncio\windows_events.py", line 461, in finish_recv
ConnectionResetError: [WinError 64] The specified network name is no longer available

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "D:\a\chia-blockchain\chia-blockchain\venv\Lib\site-packages\chia\daemon\server.py", line 172, in safe_handle
File "websockets\protocol.py", line 439, in aiter
File "websockets\protocol.py", line 509, in recv
File "websockets\protocol.py", line 803, in ensure_open
websockets.exceptions.ConnectionClosedError: code = 1006 (connection closed abnormally [internal]), no reason

@emlowe
Contributor

emlowe commented Jul 28, 2021

Closing websocket with chia_harvester code = 1006 - likely means the harvester is crashing. The primary (but not sole) source of harvester crashes is bad plots.
Checking your plots with chia plots check is a first step.

@berlef80

berlef80 commented Jul 28, 2021 via email

@berlef80

OK, I have done a bit more investigating and I think it has something to do with start_harvester. When the proofs stop I close the GUI, but the start_harvester process won't end. I can't stop the process in Task Manager. The only way to get rid of it is to shut down the computer. Once I have gotten rid of it I can start the GUI again and it works fine for 1-6 hours and then freezes. Any other ideas of what is causing the proofs to freeze? It is extremely frustrating, as I am only farming about 50% of the time because of this.

@keliew

keliew commented Jul 30, 2021

Turn on DEBUG log, and try to catch that part where it starts to freeze/crash.

@alex8900vbs

I resolved the 1006 issue with a plots check: 17 of 1500 plots were bad. I deleted them, and so far, after 24 hours, no problems.

@berlef80

berlef80 commented Jul 30, 2021 via email

@tonybeesknees
Author

I have found the problem.
There is no time limit on plot lookups.
So if you have plots on an external drive like a NAS box and send files or plots over your network, the plot lookups can run for 40000 seconds or more.
That stops all proofs and syncing.

Plot lookups should have a time limit of, say, 45 seconds.

The same happens if you shut the NAS down, say to put bigger drives in.
Chia stops because it cannot do the plot lookups.
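
One way to check whether lookups really are running this long is to scan the harvester's log for the reported lookup time. The sketch below is only an illustration: it assumes each harvester line ends with something like `Time: 0.51000 s. Total 100 plots`, a format that is not shown anywhere in this thread, and `slow_lookups` is a hypothetical helper name. The 45-second threshold is just the limit suggested above.

```python
import re

# Assumed shape of the timing field in a harvester log line.
# This pattern is an assumption, not something quoted in this thread.
TIME_RE = re.compile(r"Time: (\d+(?:\.\d+)?) s")


def slow_lookups(lines, threshold_s=45.0):
    """Return (seconds, line) pairs for lookups slower than threshold_s."""
    hits = []
    for line in lines:
        match = TIME_RE.search(line)
        if match and float(match.group(1)) > threshold_s:
            hits.append((float(match.group(1)), line.rstrip("\n")))
    return hits
```

To use it, open your debug.log and pass the file object in, for example `with open(path) as f: print(slow_lookups(f))`, where `path` is wherever your install writes its log.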

@berlef80

> I have found the problem.
> There is no time limit on plot lookups.
> So if you have plots on an external drive like a NAS box and send files or plots over your network, the plot lookups can run for 40000 seconds or more.
> That stops all proofs and syncing.
>
> Plot lookups should have a time limit of, say, 45 seconds.
>
> The same happens if you shut the NAS down, say to put bigger drives in.
> Chia stops because it cannot do the plot lookups.

So how do I go about changing that? Is there any way to do that?

I am using HDDs on a USB hub.

@emlowe self-assigned this Aug 23, 2021
@emlowe changed the title from "[BUG] proofs stop working" to "[BUG] Very long lookup times can block the harvester" Aug 23, 2021
@emlowe
Contributor

emlowe commented Aug 23, 2021

The harvester is single-threaded, so any high delay in the plot lookup process will certainly cause problems. And there is no timeout at this time.
I was mistaken here - the harvester does do multi-threaded lookups - however, I believe it waits for all the lookups to finish before moving on, and there is no timeout on each lookup. Therefore any single lookup thread can stall the process.
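
To illustrate the difference between waiting for every lookup and giving up at a deadline, here is a rough sketch. It is not the harvester's actual code: `lookup_plot`, `lookup_all` and `LOOKUP_TIMEOUT_S` are hypothetical names, and the lookup itself is simulated with a sleep. Note that this uses one shared deadline for the whole batch, which only behaves like a per-lookup timeout because every lookup starts at the same moment.

```python
import concurrent.futures as cf
import time

# Hypothetical limit, taken from the 45-second suggestion earlier in the thread.
LOOKUP_TIMEOUT_S = 45.0


def lookup_plot(path: str, delay_s: float) -> str:
    """Stand-in for a real plot lookup; sleeps to simulate slow storage."""
    time.sleep(delay_s)
    return f"qualities for {path}"


def lookup_all(plots: dict[str, float], timeout_s: float = LOOKUP_TIMEOUT_S):
    """Run every lookup in its own thread and stop waiting at the deadline.

    Returns (results, timed_out): finished lookups keyed by path, and the
    paths that were still running when the deadline passed.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(plots)))
    futures = {pool.submit(lookup_plot, p, d): p for p, d in plots.items()}
    done, not_done = cf.wait(futures, timeout=timeout_s)
    results = {futures[f]: f.result() for f in done}
    timed_out = sorted(futures[f] for f in not_done)
    # Do not block on stragglers. Threads cannot be killed, so a stalled
    # lookup keeps running in the background; we only stop waiting for it.
    pool.shutdown(wait=False, cancel_futures=True)  # cancel_futures needs Python 3.9+
    return results, timed_out
```

For example, `lookup_all({"fast.plot": 0.0, "slow.plot": 1.0}, timeout_s=0.3)` returns the result for fast.plot and lists slow.plot as timed out, instead of holding everything up for the slow one. The catch is that the stalled thread is still alive, so a lookup stuck on an unreachable drive would keep occupying a worker until the storage call returns.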


6 participants