Refactor send-message business logic in Connections #28

Closed
6 tasks
freimair opened this issue Apr 6, 2020 · 4 comments
Labels
a:proposal bisq.wiki/Project_management#Proposal needs:triage bisq.wiki/Project_management#Triage was:rejected bisq.wiki/Project_management#Approval

Comments

@freimair
Member

freimair commented Apr 6, 2020

This is a Bisq Network project. Please familiarize yourself with the project management process.

Description

The business logic of sending messages needs some love. During bisq-network/bisq#4047, it became apparent that the logic might fail to send messages entirely. Main takeaways from the project:

  • long term: ground work that improves reliability.
  • short term: maybe get rid of spurious message loss (during trade or mediation)

Rationale

Why is it important?

  • Messages are submitted to the connection asynchronously; thus, chances are that messages do not get sent because threads are abandoned, killed, or time out, or because a connection is closed before all messages in the queue are sent
  • definitely explains why we see nasty walls of exceptions on app shutdown quite frequently
  • concrete example: removeOfferMessages on shutdown may or may not be sent, depending on the timeouts and therefore on the performance of the host, network load, message load of the bisq app, seed node load, ...
  • it can happen for more crucial messages (messages are messages are messages, there are no priorities built into them)
  • might explain why we see messages getting lost

IMO:

  • I consider this a high priority task
  • given my >1.5 years of experience with the p2p part of Bisq
  • the p2p message handling needs cleanup and refactoring: the technology is outdated; changing anything is a minefield; synchronization is everywhere, which immediately causes deadlocks on the slightest change; attack countermeasures are scattered throughout the code, making it almost impossible to understand how (or whether) they work, let alone control and tweak them; and copy-and-pasted spaghetti code provides plenty of places for bugs to hide
  • however, I cannot provide a concrete issue # that will be fixed by working on this

Why should it be done now? What will happen if we don't do it or delay doing it?

  • consider it as basic maintenance
  • thus, no, it does not have to be done now
  • delaying it will work as well

however,

  • we might just see a more robust network
  • fewer lost messages
  • eliminate unforeseen deadlock situations
  • confine timing issues to the Connection, where they can be (at least) handled somehow
  • track messages and see if they are actually sent

Criteria for delivery

  • have a test suite for the message-sending business logic
  • more robust code

Measures of success

  • cleaner code
  • maybe catch a few bugs we are not aware of yet

Risks

  • as always, changing the P2P part of Bisq carries the highest risk
  • this one only touches the message sending business logic, so the risk is somewhat confined. Yet, if nobody can send messages, the network is going to die as well.

Tasks

  • create test suite for message sending business logic (i.e. on Connection level)
  • implement a proper message queue for messages to be sent
  • implement a proper connection shutdown process, move away from dropping anything instantly
  • gradually remove "external" message scheduling mechanics

Estimates

hard to say, as the project will only show its true face once we are knee-deep into it.

| Task | Amount [USD] |
|---|---|
| create test suite | 1800,00 |
| message queue | 900,00 |
| remove "external" scheduling | 1200,00 |
| testing | 700,00 |
| other | 500,00 |
| total | 5100,00 |

Notes

@freimair freimair added a:proposal bisq.wiki/Project_management#Proposal needs:triage bisq.wiki/Project_management#Triage labels Apr 6, 2020
@chimp1984

> short term: maybe get rid of spurious message loss (during trade or mediation)

I doubt that it has a big impact on that. At shutdown of headless nodes there is/was no graceful shutdown, but they don't send crucial messages (trade, dispute). GUI clients do a graceful shutdown, and the only case where it might be critical is when a crucial message was sent and the user immediately shuts down the app (or kills it hard). Even with a graceful shutdown there should be enough time to deliver the messages. The messages queued up are usually few (batching did not work as expected, and most of the time there is no batching).

@freimair
Member Author

freimair commented Apr 30, 2020

> Even with a graceful shutdown there should be enough time to deliver the messages.

Actually, before bisq-network/bisq#4047, even with a "graceful shutdown", Tor was terminated before all messages had been flushed out: no RemoveOfferMsg, no CloseConnectionMsg. In addition, there is no central queue, which could have severe consequences. Here is the scenario:

  1. A critical message might be "queued" in a UserThread.runAfter(>0, connection.sendMessage(.))
  2. Thus, the connection does not know about the message yet
  3. The business logic triggering the UserThread.runAfter assumes it has been sent (and, depending on the message, may also memorize that information)

Now, if the client gets shut down before the message is sent, the business logic has no way of knowing that the message hasn't been sent (so no resend). Thus, we have a "lost" message.
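The scenario can be reproduced in isolation with a small sketch. It uses a plain `ScheduledExecutorService` as a stand-in for `UserThread.runAfter` and a list as a stand-in for the connection; `LostMessageDemo` and its fields are made up for illustration and are not Bisq code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical reproduction of a message "queued" outside of the connection. */
final class LostMessageDemo {
    /** Stand-in for what the connection really wrote out. */
    final List<String> actuallySent = new CopyOnWriteArrayList<>();
    /** Stand-in for the bookkeeping done by the business logic. */
    boolean assumedSent = false;

    /** Runs the scenario and returns the number of delayed tasks that never ran. */
    int run() {
        // Stand-in for the user thread that executes delayed callbacks.
        ScheduledExecutorService userThread = Executors.newSingleThreadScheduledExecutor();

        // Step 1: the message only exists inside a delayed callback.
        userThread.schedule(() -> { actuallySent.add("critical message"); }, 10, TimeUnit.SECONDS);

        // Step 2: the connection (actuallySent) knows nothing about it yet.

        // Step 3: the business logic already records the message as sent.
        assumedSent = true;

        // The client shuts down before the delay has elapsed; the callback is dropped.
        return userThread.shutdownNow().size();
    }
}
```

After `run()`, `assumedSent` is true while `actuallySent` is still empty, which is exactly the mismatch described above: nothing on the business-logic side would ever trigger a resend.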

> The messages queued up are usually few

Give it enough time and trials and it will happen.

> At shutdown of headless nodes there is/was no graceful shutdown

The issue we have been/are facing here is that the data store files got corrupted frequently. A graceful shutdown did help some. However, #25 and #29 will complement this very issue.

@chimp1984

@ripcurlx @cbeams Can we close that project?

@cbeams cbeams added the was:rejected bisq.wiki/Project_management#Approval label Feb 8, 2021
@cbeams
Member

cbeams commented Feb 8, 2021

Closing as rejected.

@cbeams cbeams closed this as completed Feb 8, 2021
Master Projects Board automation moved this from Backlog to Done Feb 8, 2021