Investigate causes of and solutions to transaction broadcast failures in v0.6.3 #1195
@ManfredKarrer wrote in #1193 (comment) (which I'm bringing over to this issue, because the other issue is just about keeping track of incidences of the problem; this issue is about fixing it):

> One additional problem can be that the user closes the app before the broadcast is completed. That can already happen now, but if we add a retry it could extend the time during which the user must not close the app by several minutes, so we should add some information in the UI to deal with that. I would first like to investigate whether the Tor issues are really external or mixed with issues introduced in the 0.6 release. I will try to set up old 0.5 or even 0.4.9 nodes and see whether the Tor network is slow and less reliable there as well.
Hey @ManfredKarrer, responding to the above:
It’s Tuesday morning now. Per #1193, I’ve had 5 of these issues show up in the last 48 hours, and the oldest of my trades is one that was offered on Jan 12th / taken on Jan 13th. @keo-bisq has had 5 of these issues, too, but strangely, his oldest one is from all the way back on Dec 31st, and his newest one is from Jan 12th. I do not have a theory as to why I've started seeing this issue in trades that happened only after v0.6.3 was released and Keo has seen these issues in trades that happened before it. The situation doesn't make sense for 2 reasons:
In any case, the issues seem to be coming in at an increasing rate. As I type this, we've had 100 trades since Jan 12th (a little over 3 full days), and 5 of those trades have ended up in arbitration with this problem. Surely more are suffering from it already, but have just not had a dispute ticket opened yet. So it's at least 5/100 trades, and probably something more like 8/100.

If we keep going at the current rate of ~30 trades per day, that'll mean another ~180 trades between now and Sunday (inclusive), which at the current error rate will mean something like another 15 cases on top of the existing 10. This means we should expect to do something like ~25 reimbursements as a result of this problem, assuming we ship a fix by the end of the weekend. Note that this is on top of the 29 reimbursements I'm currently in the middle of processing from the earlier timeout issues (see bisq-network/support#1 and its child issues).

In any case, I'm in full firefighting mode here, between working through the earlier timeout reimbursements and now working through these new failed tx broadcast issues. I don't think it makes sense for me to try to fix this problem myself and ship a new release on my own. That would just introduce more risk, and it would cause my arbitration / reimbursement efforts to queue up and get further delayed. At this point I think what makes the most sense is to do the following every time one of these issues comes in:
I will start working through steps 1–4 for my 5 issues as soon as possible, but I'm going to focus on finishing staging the timeout reimbursements first. @keo-bisq, @ManfredKarrer, if you have any comments or questions about the plan above, please let me know.
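The projection above is straightforward arithmetic. A quick sketch, using the figures quoted in the comment (the ~8/100 effective failure rate is the comment's own estimate including unreported cases, not a measured value):

```python
# Back-of-the-envelope projection of failed-broadcast cases,
# using the figures quoted in the comment above (illustrative).
observed_trades = 100        # trades since Jan 12th
observed_failures = 5        # cases already in arbitration
assumed_rate = 8 / 100       # estimate including not-yet-reported cases

trades_per_day = 30
days_until_fix = 6           # now through Sunday, inclusive
future_trades = trades_per_day * days_until_fix            # 180

projected_new_cases = round(future_trades * assumed_rate)  # ~14-15
existing_cases = 10          # ~5 from each of two reporters
expected_reimbursements = existing_cases + projected_new_cases

print(future_trades, projected_new_cases, expected_reimbursements)
```

This lands close to the ~15 additional cases and ~25 total reimbursements quoted above.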
Regarding the old reports from @keo-bisq: if the maker and taker fee txs happened before the 0.6.3 release, the log should show a different message, and it seems the txs got broadcast to our btc nodes but not further out. That can be because of too-low fees. Analyzing those log files should show more; it would be a great help if anyone could start analyzing them. We decreased the tx fee estimation, which might be an additional factor. We don't know for sure whether the tx got broadcast but the min-fee policies in full nodes ignored it because of a too-low fee. What I can do today:

@cbeams What do you think? Should we do that?
Our fee is at the moment 435 sat/byte (http://37.139.14.34:8080/getFees)
The data on https://dedi.jochen-hoenicke.de/queue/#24h are very different from those on earn.com.
Just checked mempoolminfee on my btc nodes and it is about 50 sat/byte, so that cannot be the issue, as we use about 400. I will ask the other node operators to post their values, but I assume they will not be much higher.
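Note that `getmempoolinfo` reports `mempoolminfee` in BTC/kB, so comparing it against a sat/byte feerate takes a unit conversion. A minimal sketch of the comparison being made above (the 0.0005 BTC/kB value is illustrative, chosen to match the ~50 sat/byte floor reported here):

```python
# Convert a node's mempoolminfee (BTC/kB, as returned by getmempoolinfo)
# to sat/byte and check whether a given feerate clears that floor.
SAT_PER_BTC = 100_000_000

def clears_mempool_floor(our_sat_per_byte, mempoolminfee_btc_per_kb):
    # 1 kB = 1000 bytes, so divide the sat/kB figure by 1000.
    floor_sat_per_byte = mempoolminfee_btc_per_kb * SAT_PER_BTC / 1000
    return our_sat_per_byte >= floor_sat_per_byte

# ~50 sat/byte floor vs. our ~400 sat/byte estimate:
print(clears_mempool_floor(400, 0.00050000))  # → True
# Even the 100 sat/byte fallback discussed below clears this floor:
print(clears_mempool_floor(100, 0.00050000))  # → True
```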
@bisq-network/btcnode-operators, could you please each report, e.g.:
And @ManfredKarrer, regarding your idea about restarting btc nodes to reset: I would be very surprised if our current transaction fees were causing our transactions to get rejected in any case. I see transactions coming into the mempool all the time that are well below our fee rates. Furthermore, if this were the cause, we should expect to see a much higher tx broadcast failure rate than we are actually seeing. Right now it's 1 out of every 15 or 20 trades that suffers from a failed tx broadcast. If our estimates were getting rejected from even entering the mempool because they're too low, we'd probably be seeing a majority of trades with failed tx broadcasts.
@cbeams Yes, agree. Just looked into one log and saw:
@cbeams What do you think about bumping the tx fee? Now we use maxblocks 10; we could go up to maxblocks 5.
"size": 51234, "size": 51259, "size": 51319, "size": 51557
First node:

mike@bitcoin:~$ btc getmempoolinfo
cat bitcoin.conf:
mike@bitcoin:~$ btc getnetworkinfo

Second node:

mike@bitcoin:~$ btc getmempoolinfo
cat bitcoin.conf:
mike@bitcoin:~$ btc getnetworkinfo

@ManfredKarrer
@cbeams The case where the tx fee was 100 sat/byte is probably caused by not connecting to the fee service. The default fee in the code is 100 sat/byte (used if the connection to the fee service has not succeeded yet). I have often seen that it takes quite a while to establish that connection.
Well, this fact alone could explain the failed tx broadcasts. Typical Bisq maker/taker transactions are between 225-270 bytes in size. At a 100 sat/byte feerate, that means these users are broadcasting transactions with ~[…]. The hypothesis, then, is as follows:
What we should do about it: I see a few options:
Option (1) is something we could do and ship a fix for ASAP. I'm going to trawl through the logs we have to see if I can find evidence of the hypothesized series of events playing out.
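For scale, the absolute fee implied by the 100 sat/byte fallback on a typical 225-270 byte Bisq trade transaction is easy to compute (the sizes and fallback rate are the ones quoted above):

```python
FALLBACK_SAT_PER_BYTE = 100    # hard-coded default used when the fee service is unreachable
TYPICAL_TX_BYTES = (225, 270)  # typical maker/taker tx size range from the comment

# Total fee in satoshis at the fallback rate, for each end of the range.
fees = [FALLBACK_SAT_PER_BYTE * size for size in TYPICAL_TX_BYTES]
print(fees)  # → [22500, 27000]
```

So a fallback-fee transaction pays roughly 22,500-27,000 satoshis in total, well below the fee that the ~435 sat/byte estimate would produce for the same size.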
Major flaw in the thinking above: the value of […]
This means that, with […], I would say this invalidates the hypothesis above completely, but I did just see at least one report of a node with a […]. Perhaps we deal with spikes across our federation of nodes where our […]
"size": 54836, "size": 55063, "size": 54520, "size": 55204
I assume that the 100 sat/byte tx was an exception. The logs show the tx fee, and at least I have not seen any so far (including from the last weeks) with 100 sat/byte. There is also code for checking that the min. number of bitcoin peers is >= 4 for the make-offer and take-offer views (at the time the user clicks take offer in the offer book view, not at the take-offer confirm button; it should be added there as well, as the user could lose connections in the meantime).
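The peer-count guard described above would simply need to run at both points. A minimal sketch, with hypothetical names (this is not Bisq's actual code):

```python
MIN_BTC_PEERS = 4  # threshold mentioned in the comment

def can_proceed(num_connected_peers):
    # Check once when the take-offer view opens AND again at the
    # confirm button, since connections may drop in between.
    return num_connected_peers >= MIN_BTC_PEERS

print(can_proceed(5))  # → True
print(can_proceed(3))  # → False
```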
Maybe someone can check Tor mailing lists, etc., to see if there is anything relevant to our issues.
It could be that lowering […] @bisq-network/btcnode-operators' […]. Could we also cache the transactions meant for broadcast, saving them for re-broadcast/debugging purposes? I'll try tinkering to satisfy my curiosity... I'd appreciate a file/class tip in that direction!

Edit: the fee estimation spits out […]. It'd be great to have a bisq branch for testnet. Is there such a thing? Tips... much appreciated :P
BitcoinJ re-broadcasts the tx at startup, as far as I have seen. Not 100% sure whether that is also the case if the tx has been committed to the wallet (as we do on timeout). The broadcast happens in the Broadcaster class (including the timeout handling). The reason was that we did not hear back from the required number of nodes (usually 2) in time, and that caused a timeout in the trade protocol (the taker fee was broadcast, but because we waited too long to hear back, the trade protocol timed out and the take-offer attempt failed, leading to a paid trade fee tx but no trade).

Re external funding:

Re Testnet:
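The timeout behavior described above (a broadcast counts as successful only if the required number of peers, usually 2, acknowledge it before the deadline) can be sketched as follows. This is a generic illustration of the pattern, not Bisq's actual Broadcaster code:

```python
def broadcast_outcome(peer_ack_times_s, required_acks=2, timeout_s=5.0):
    """Classify a broadcast attempt: SUCCESS if enough peers acknowledged
    the tx before the deadline, TIMEOUT otherwise. Note the tx may still
    propagate after the deadline, which is exactly the failure mode
    described above: trade fee paid, but the trade protocol already
    timed out."""
    acks_in_time = sum(1 for t in peer_ack_times_s if t <= timeout_s)
    return "SUCCESS" if acks_in_time >= required_acks else "TIMEOUT"

print(broadcast_outcome([1.2, 3.4, 9.0]))  # → SUCCESS (2 acks within 5s)
print(broadcast_outcome([6.0, 7.5]))       # → TIMEOUT (acks arrive too late)
```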
@MaximFL, please send me an email at chris@beams.io with the transaction id(s) that you believe should be reimbursed and a screenshot of them from your […]
FYI, here is what looks like a textbook fee transaction that fell back to the 100 sat/byte default. From https://tradeblock.com/bitcoin/tx/0573815a72c9bd69947084d80ddc73d42cca3461f460c5dd9480d061c248df03:
Most recent log regarding trade EJOLOw:
@ManfredKarrer wrote in #1193 (comment):
And shortly after posting the above, @ManfredKarrer created issue #1244 to implement a fix for this. I'll leave this investigation issue open for the time being, until we're sure that this fix is indeed going to take care of the problem. In the meantime, keep an eye on #1244 and upgrade to v0.6.4 right when it ships.
Closing as complete with the changes in #1244. That is, we believe we have fixed the cause of these issues in Bisq v0.6.4. Everyone should update to that version as soon as possible, and certainly before engaging in further trades. If this issue crops back up, we can re-open it to continue the investigation. Thanks to all who helped us track this down by posting their logs, etc.!
I got this message in Bisq but didn't reply before the dispute was closed: […] Instead of getting the deposit back, I got an error message and was told to open a GitHub issue (#1261); that issue includes my log file.
@ManfredKarrer wrote in #1168 (comment):
We're now seeing a significant number of issues coming into arbitration (~10 at time of writing) where maker and deposit transactions have not been broadcast. @keo-bisq and I are tracking these cases in #1193.
This sort of message is what we see in the logs:
And that log message is part of the change that was implemented in #1168, specifically in commits 1b7db08 and dc310d9.