trading: timed out waiting for "init" response #1561
Comments
Thank you for dog-fooding so hard. It is absolutely critical and you should get a medal when this is done. There is a lot to discuss in this issue, but one problem is that the server's broadcast timeout of 1 minute is very short.
Here's the timeline for match
Here's where I have questions. Why couldn't server find it? Doesn't the full node on server have the ability to do this? I thought so. Are we relying on it getting mined before even accepting it and acknowledging the action so it does not revoke the match? This would be a big fundamental deviation from the utxo coin handling. It also seems to have not been mined even with eth blocks going by.
Again, it was finally mined in 6659792. Although there was no need to revoke at 1 minute, we really need to get to the bottom of server's locating the ETH contracts.
Ah I think what we're really seeing here is serious lock contention because of timeouts checking the eth contracts, just like the timeouts you've seen client-side (#1552). Note this was almost a minute late and it really does indicate the contract was found:
But I think it found it a good deal earlier, but it was hanging at the Lines 1473 to 1477 in fb76b59
Combined with the fact that the |
Continuing to look at this, I now think server is also having a hard time locating pending txns in under a minute (your broadcast timeout). I suggest @JoeGruffins increase the broadcast timeout back to closer to 20 minutes, or at least 5 minutes, and see how it goes. When you consider that the tx was not mined for several blocks, but it was found before it was mined, it suggests that the tx broadcast was taking forever and it simply did not propagate promptly (to the server node or miners). It was close, but not under a minute.
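To make the timing argument concrete, here is a minimal Go sketch of polling for a transaction under a deadline. It is not the dcrdex implementation: waitForTx, slowPropagation and txFoundMs are hypothetical names, and the lookup is a stand-in that only succeeds after a simulated propagation delay. It shows why a tx that propagates slightly slower than the broadcast timeout is treated as missing, while the same tx is found with a longer timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// findTx is a hypothetical lookup; it returns true once the node can see the tx.
type findTx func() bool

// waitForTx polls find every interval until it succeeds or timeout elapses.
func waitForTx(find findTx, interval, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if find() {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.New("timed out waiting for tx")
		case <-ticker.C:
			// Try the lookup again on the next loop iteration.
		}
	}
}

// slowPropagation simulates a tx that only becomes visible after delay.
func slowPropagation(delay time.Duration) findTx {
	start := time.Now()
	return func() bool { return time.Since(start) >= delay }
}

// txFoundMs reports whether a tx that takes propagationMs to become
// visible is found before a timeout of timeoutMs (both in milliseconds).
func txFoundMs(propagationMs, timeoutMs int) bool {
	find := slowPropagation(time.Duration(propagationMs) * time.Millisecond)
	err := waitForTx(find, 5*time.Millisecond, time.Duration(timeoutMs)*time.Millisecond)
	return err == nil
}

func main() {
	// Timeout shorter than propagation: the lookup gives up.
	fmt.Println("short timeout found tx:", txFoundMs(200, 30))
	// Timeout longer than propagation: the same tx is found.
	fmt.Println("long timeout found tx:", txFoundMs(20, 2000))
}
```

The millisecond values are scaled down stand-ins for the one-minute and multi-minute timeouts discussed here; only the ratio between propagation time and timeout matters for the outcome.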
This comment was marked as resolved.
This comment was marked as resolved.
Actually that last one was fine for that client. They called the match inactive and did not init at all as taker. The maker this time failed for timeout. Will leave the logs here for this second failure I've seen on testnet. The broadcast timeout was not increased, but I will do that now... Leaving logs anyhow. Other client: Server: |
Yeah, increase btimeout please. I just merged #1541 because there was an error getting silenced that might be relevant here: bd5e52e#diff-6af7c56e40de4eb4dba203d67b1b07ccfc43ed88e99811b9697e9d754d01e21cR882
Oh BTW, I started tweaking Swapper to reduce lock contention in #1563. Might be worth a test for this issue (with a slightly more reasonable btimeout though) if you can reproduce it fairly regularly @JoeGruffins.
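The lock contention suspected in this thread can be illustrated with a small Go sketch. It does not reproduce the real Swapper or the changes in #1563: the swapper type, both processInit variants and lockFreeDuringLookup are hypothetical, and the lookup callback stands in for a slow contract search. The point is only that a mutex held across a slow lookup is unavailable to every other handler for the whole lookup, whereas narrowing the critical section leaves it free. It uses sync.Mutex.TryLock, which requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sync"
)

// swapper is a hypothetical stand-in for a type that guards match state
// with a single mutex.
type swapper struct {
	matchMtx sync.Mutex
	matches  map[string]bool
}

// processInitHoldingLock runs the slow lookup while holding matchMtx.
func (s *swapper) processInitHoldingLock(id string, lookup func()) {
	s.matchMtx.Lock()
	defer s.matchMtx.Unlock()
	lookup()
	s.matches[id] = true
}

// processInitNarrowLock runs the slow lookup first and takes matchMtx
// only for the state update.
func (s *swapper) processInitNarrowLock(id string, lookup func()) {
	lookup()
	s.matchMtx.Lock()
	s.matches[id] = true
	s.matchMtx.Unlock()
}

// lockFreeDuringLookup reports whether matchMtx could be acquired by
// another caller while the lookup is in progress.
func lockFreeDuringLookup(process func(s *swapper, id string, lookup func())) bool {
	s := &swapper{matches: make(map[string]bool)}
	free := false
	process(s, "m1", func() {
		// Stand-in for the slow contract search: probe the mutex.
		if s.matchMtx.TryLock() {
			free = true
			s.matchMtx.Unlock()
		}
	})
	return free
}

func main() {
	fmt.Println("lock free while holding variant searches:",
		lockFreeDuringLookup((*swapper).processInitHoldingLock))
	fmt.Println("lock free while narrow variant searches:",
		lockFreeDuringLookup((*swapper).processInitNarrowLock))
}
```

The narrow variant trades simplicity for concurrency: the state it updates may have changed during the lookup, so real code would need to re-check the match after taking the lock.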
Does broadcast timer above two minutes matter here? We are gated by |
For eth, don't we not error as long as the contract data looks good? I need to look at server more I guess, but I don't think finding the swap is even necessary to get things going. It should be super forgiving. edit: I see we do need to find the tx that started the swap. If the server can't find that, might as well throw in the towel (joking). Hoping it's something else. The light clients do have trouble randomly though... We should see the coin not found warning in that case, and I don't think we've seen that.
I'm comfortable upping the |
Whoops sorry I dropped the ball on this conversation.
I believe the 2 min In this case, AFAICT we discovered the issue was the txn not propagating. Not to the miners or the server (and I'm sure ethscan would not have seen it either). This is a clear indication of light client connectivity. It will resolve itself as it regains peers and rebroadcasts, steps that just take time and which we cannot influence.
Right, server does need to find the tx before bothering the taker. If the tx is never found on the network or on chain (never confirmed), the maker has just wasted at least 8 hrs of everyone's time... and they wouldn't even have to actually broadcast a txn to do it, just pretend. More than that, our current design is to have server send the participant the "txData" that it retrieves from its node, not from the init tx creator.
Yah, that's definitely something to consider.
I really don't think this is it. Going to find the issue tomorrow.
There may be multiple issues yet, but the timings described here in the log files make it clear that miners and server couldn't see the transaction for almost a minute. Several blocks were mined without the transaction, therefore, miners did not have it.
Have been testing a bit on testnet/simnet lately and not seeing this anymore. Maybe fixed. Will close if I can't hit it in the next few days.
Have been seeing this a lot while testing. Not sure if eth related. Today I hit for the first time on testnet so decided to just go ahead and make an issue.
It ultimately leads to a confusing state for the client. For example:
If you refresh, it does show refunding for a few seconds... I guess until something happens, a tick?
The client did get an error from the server that matches were revoked. But what are we doing now? Waiting for refund? The other client seems sure that they are refunding:
But besides the UI uncertainty here, isn't it odd the init failed? Just one of four of the same trade was ok? This is on the same machine too, so probably not a communication issue.
This is on #1552 with the confCheckTimeout set to 10 seconds, but I've seen the same init response timeout error on master as well on simnet. Can also be seen in #1538, so maybe solved by #1541? If so, there is still the weird UI issue for the errored client.
From logs it looks like the three failed matches were Taker, and the success was Maker. It looks like all three of the failed matches sent their inits at 02:28:26 and server got them all. Then server searched for a minute and failed them for timeout. Then it found them all the next second!? Seems suspicious. Is it possible s.matchMtx.Lock() is being held somewhere, preventing processInit from continuing?
Then the client doesn't know how to interpret the message:
[ERR] CORE[wss://127.0.0.1:17273/ws]: No handler found for response: {"type":2,"id":18,"payload":{"result":null,"error":{"code":25,"message":"match already revoked due to inaction"}}}
which I think is an old issue.
Errored client log:
dexc.log
Other client log:
dexc.log
Server log:
dcrdex.log
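One plausible way a "No handler found for response" line like the one above can arise is that the client drops its response handler when the request times out, and the server's reply arrives afterwards. The Go sketch below illustrates that pattern only; it is an assumption about the cause, not taken from the dcrdex client, and router, register, expire and deliver are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// response is a simplified stand-in for a server reply keyed by request ID.
type response struct {
	ID  uint64
	Err string
}

// router maps request IDs to the handlers waiting for their responses.
type router struct {
	mtx      sync.Mutex
	handlers map[uint64]func(response)
}

func newRouter() *router {
	return &router{handlers: make(map[uint64]func(response))}
}

// register stores the handler for a request that was just sent.
func (r *router) register(id uint64, h func(response)) {
	r.mtx.Lock()
	r.handlers[id] = h
	r.mtx.Unlock()
}

// expire removes the handler, as a client might do when a request times out.
func (r *router) expire(id uint64) {
	r.mtx.Lock()
	delete(r.handlers, id)
	r.mtx.Unlock()
}

// deliver routes a response to its handler, or errors if none is registered.
func (r *router) deliver(resp response) error {
	r.mtx.Lock()
	h, ok := r.handlers[resp.ID]
	delete(r.handlers, resp.ID)
	r.mtx.Unlock()
	if !ok {
		return fmt.Errorf("no handler found for response: id %d", resp.ID)
	}
	h(resp)
	return nil
}

// lateResponseDropped reports whether a response arriving after the
// request expired is rejected without reaching the handler.
func lateResponseDropped() bool {
	r := newRouter()
	handled := false
	r.register(18, func(response) { handled = true })
	r.expire(18)
	err := r.deliver(response{ID: 18, Err: "match already revoked due to inaction"})
	return err != nil && !handled
}

// timelyResponseHandled reports whether a response arriving before
// expiry reaches the handler.
func timelyResponseHandled() bool {
	r := newRouter()
	handled := false
	r.register(19, func(response) { handled = true })
	err := r.deliver(response{ID: 19})
	return err == nil && handled
}

func main() {
	fmt.Println("late response dropped:", lateResponseDropped())
	fmt.Println("timely response handled:", timelyResponseHandled())
}
```

If this is what is happening, the revocation error never reaches the code that sent the init, which would be consistent with the confusing client state described in this issue.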