
Reduce bandwidth-requirement on Bisq app startup #25

Closed
3 of 10 tasks
freimair opened this issue Mar 5, 2020 · 14 comments
Assignees
Labels
has:approval bisq.wiki/Project_management#Approval has:budget bisq.wiki/Project_management#Budgeting to:Improve Reliability was:delivered bisq.wiki/Project_management#Closing_as_delivered

Comments

@freimair
Member

freimair commented Mar 5, 2020

During startup, Bisq is required to send >4MB of data to seednodes in order to get its initial data. This is an issue because

  • these requests tend to time out on slow internet connections and leave users stranded.
  • the protocol as-is does not scale, and thus startup will eventually become a much bigger issue

The primary goal of this project is to reduce the amount of data to be sent on startup.

Why/why now?

I frequently get 100% CPU, and I just now found that Bisq writes 2GB of data to disk in only 2 hours. I suspect the cause to be the database writer. Whenever a new object comes in, Bisq serializes everything (all 100,000+ objects) and writes them to disk as a single blob (not amending the file already there). This causes massive I/O and high CPU load while serializing, which does not scale. Having smaller buckets might take the edge off in the short term.
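
To make the scaling problem concrete, here is a minimal sketch contrasting the current rewrite-everything behaviour with a bucketed approach where only a small "live" bucket is rewritten. Class and method names are hypothetical, and plain Java serialization is used for brevity; this is not Bisq's actual persistence code.

    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;

    // Hypothetical illustration only; not Bisq's actual persistence classes.
    class DataStoreWriterSketch {

        // Current behaviour (simplified): every incoming object triggers a full
        // re-serialization of all 100,000+ entries into one blob on disk.
        static void writeMonolithicBlob(ArrayList<? extends Serializable> allObjects, Path file) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(
                    file, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING))) {
                out.writeObject(allObjects); // O(n) CPU and I/O for every new object
            }
        }

        // Bucketed alternative (simplified): historical buckets are frozen per release,
        // so only the small bucket of post-release data is rewritten when data arrives.
        static void writeLiveBucket(ArrayList<? extends Serializable> objectsSinceLastRelease, Path liveBucket) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(
                    liveBucket, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING))) {
                out.writeObject(objectsSinceLastRelease); // only the delta since the last release
            }
        }
    }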

Details: Problem statement, Numbers, Proposal, ...


Problem statement

On startup, a Bisq application first requests up-to-date network data from two seednodes. Once data comes in, the Bisq app jumps from the loading screen to the trading UI. However, if no data arrives, Bisq stays at the loading screen forever.

There are two main reasons why this happens:

  • the internet uplink is too slow and hits a seednode's connection timeout during the request
  • the initial data request is huge: at the time of writing it exceeds 4MB and is bound to grow further

Numbers

The numbers below translate directly into request size, since each object is represented by a 20-byte key in the initial outgoing data request, basically saying "I already have that, please do not send it".
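
As a back-of-the-envelope check, the object counts convert to request size roughly as follows (assuming 20 bytes per key and two seed nodes queried, as described later in this issue):

    // Rough conversion of object count to initial-request size.
    public class RequestSizeEstimate {
        public static void main(String[] args) {
            long objectCount = 103_997;   // "live" row in the table below (2020-03-06)
            long bytesPerKey = 20;        // each known object is announced by a 20-byte key
            long seedNodes = 2;           // Bisq asks two seed nodes for redundancy

            double perRequestMb = objectCount * bytesPerKey / 1e6;   // ~2.1 MB
            double startupMb = perRequestMb * seedNodes;             // ~4.2 MB
            System.out.printf("per request: %.1f MB, per startup: %.1f MB%n", perRequestMb, startupMb);
        }
    }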

[Three screenshots from 2020-03-06; see the data table below for the underlying numbers.]

Data table
| Version | Release date | SignedWitness | AccountAgeWitness | TradeStatistics2 | total | others | total diff |
|---------|--------------|---------------|-------------------|------------------|--------|--------|------------|
| v0.9.1  | 2018-12-13   | 1057          | 21132             | 19490            | 41802  | 123    |            |
| v0.9.2  | 2019-01-08   | 1057          | 22056             | 21384            | 44620  | 123    | 2818       |
| v0.9.5  | 2019-03-06   | 1057          | 24593             | 25212            | 50985  | 123    | 6365       |
| v1.0.1  | 2019-04-16   | 1057          | 26550             | 27249            | 54979  | 123    | 3994       |
| v1.1.1  | 2019-05-06   | 1057          | 27360             | 28585            | 57125  | 123    | 2146       |
| v1.1.2  | 2019-06-04   | 1057          | 29437             | 30558            | 61196  | 144    | 4071       |
| v1.1.3  | 2019-07-16   | 1057          | 32172             | 34344            | 67753  | 180    | 6557       |
| v1.1.5  | 2019-08-08   | 1057          | 33664             | 36248            | 71149  | 180    | 3396       |
| v1.1.7  | 2019-09-23   | 1057          | 36493             | 40273            | 77938  | 115    | 6789       |
| v1.2.2  | 2019-11-01   | 1057          | 38665             | 42657            | 82494  | 115    | 4556       |
| v1.2.3  | 2019-11-07   | 1171          | 39415             | 43009            | 83710  | 115    | 1216       |
| v1.2.4  | 2019-12-05   | 1441          | 41114             | 45475            | 88145  | 115    | 4435       |
| v1.2.5  | 2020-01-09   | 1693          | 43102             | 48049            | 92959  | 115    | 4814       |
| v1.2.7  | 2020-02-13   | 1920          | 45204             | 51222            | 98461  | 115    | 5502       |
| live    | 2020-03-06   | 2123          | 46989             | 54322            | 103997 | 563    | 5536       |

Proposed Solution

By adding the info "I am Bisq v1.2.1" to the list of known objects, the seed node knows what objects the client already has - namely, the objects shipped with the data stores of v1.2.1 (a minimal sketch follows the list below).

  • bin up the data
  • create a "special" key for addressing bins
  • tie those bins to the application version
  • create a new bin on every release holding only new data
  • if the bins someday grow too big, we can split them even further, since the special keys are only identifiers
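
A minimal sketch of that idea, with hypothetical type and method names (not the actual Bisq protocol classes): the version-derived "special" keys slot into the existing list of known keys, so no new protocol fields are needed, and a seed node that does not recognize them simply sends everything.

    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of a version-binned initial data request.
    class BinnedRequestSketch {

        // One "special" key identifies a whole shipped bin, e.g. "TradeStatistics2_v1.2.1".
        static byte[] specialKeyFor(String storeName, String appVersion) {
            return (storeName + "_" + appVersion).getBytes(StandardCharsets.UTF_8);
        }

        // Known keys = one special key per shipped store + only the keys received since that release.
        static List<byte[]> buildKnownKeys(String appVersion, List<String> storeNames, List<byte[]> keysSinceRelease) {
            List<byte[]> known = new ArrayList<>();
            for (String store : storeNames)
                known.add(specialKeyFor(store, appVersion)); // O(1) per store instead of O(n) per object
            known.addAll(keysSinceRelease);                  // small delta since the shipped bins
            return known;
        }
    }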

Advantages

  • reduce the number of historical keys to be sent to O(1) (right now, it is O(n), @stejbac's solution is probably O(ln(n))(?))
  • no new fields needed in the protocol
  • robust? if a requestee does not know the "special" key, it just sends all the data
  • much simpler and therefore, easier to maintain compared to @stejbac's approach
  • maybe someday we are forced to not ship all historical appendOnly data to keep the memory footprint reasonable
    • there is approx. +300kb per release now (source is @ripcurlx)
    • if we reach our goal of 10x everything, we might get +3MB per release
    • beyond that, 10x again and we are at +30MB per release, which starts to become an issue
    • the Bitcoin blockchain has light clients as well
  • if we succeed in binning the data, we can move on to lazy-loading bins, thus, limiting RAM usage
    • approx. 6000 new objects since v1.2.7 with a release coming up (check with @ripcurlx?)
    • in other words, plus 6k * 20 bytes = 120kB for the next release (*2 on app startup)
    • given successful growth campaigns, this will worsen the issue fast
      • if we reach our goal of 10x everything, we might face +1.2MB * 2 for the initial request per release
    • lower RAM requirements might pave the way to moving Bisq to resource-restricted devices like smart phones or single-board computers

Disadvantages

  • introduce a dependency on "time", i.e. the Bisq version
  • given growth efforts succeed, we might have to release more often (or move to another method of binning the data)

Optional follow-up projects

Risks

  • as always with p2p network stuff, it might break everything (not if we do our jobs properly)
  • if we fail to address the issue, the network might suffer (because the current solution does not scale)

Alternative approaches

Tasks

  • feasibility study
  • proof-of-concept implementation with benchmarks
  • make implementation production ready
  • thorough testing
  • upgrade seed nodes before releasing the feature
  • release the feature

Criteria for Delivery

  • benchmark data
  • test reports for application startup on all major OSs
  • upgraded seednodes
  • release

Estimates

| Task                  | Amount [USD] |
|-----------------------|--------------|
| feasibility study     | 600.00       |
| proof-of-concept impl | 2400.00      |
| production-ready      | 2100.00      |
| testing               | 700.00       |
| other                 | 500.00       |
| total                 | 6300.00      |
@freimair
Member Author

freimair commented Mar 6, 2020

I just compiled some more data into the table above.

@cbeams
Member

cbeams commented Apr 1, 2020

Regarding the following items from the description above:

  • bin up the data
  • create a "special" key for addressing bins
  • tie those bins to the application version
  • create a new bin on every release holding only new data

Do you intend here to check these binary blobs into the main Bisq repository, or something else? I would really like to avoid adding more binary data to the repository (as we're already doing with all the stuff in p2p/src/main/resources).

@cbeams
Member

cbeams commented Apr 1, 2020

If checking the blobs in is the intended solution, @freimair, I'd like us to look into doing this properly with Git LFS instead, and at the same time migrating the p2p/src/main/resources files there, too. GitHub has a free tier we might want to start with. I ran some basic numbers and I think we could get away with it, but it depends on how many people are cloning the repository every month (because the pricing is metered on bandwidth used). We could also potentially run our own LFS server, but it would probably be best to go with the free GitHub service until we see we're outgrowing it.

See:

/cc @wiz as running our own LFS server would be ops territory. Just FYI at this point.

@cbeams
Member

cbeams commented Apr 1, 2020

Also, from the introduction:

During startup, Bisq is required to send >4MB of data to seednodes in order to get its initial data.

I'm unfamiliar with this problem, and reading this doesn't help me understand what's really going on. Why would a Bisq node need to send so much data to its seednode(s)? I could understand that it might need to receive quite a bit of data, but I'm clearly missing something. I read through the rest of the description a couple times and I don't think this is ever made clear. ELI5, please :)

@freimair
Member Author

freimair commented Apr 1, 2020

Why would a Bisq node need to send so much data to its seednode(s)?

  • sorry, I took that as common knowledge, because that is how Bisq has always worked

  • let the ELI5 commence:

    1. on startup, the Bisq app asks the seed node for a "distributed-database update"
    2. In order not to burden the seednode with sending all the data (> 12MB), Bisq tells the seednode which objects it already has (i.e. it sends data to the seednode).
    3. The seed node then only sends the data the Bisq app does not already have.
  • The trouble comes with success: we now have more than 100k objects in the "distributed database", which makes Bisq send 100k "keys" to the seednode (100k * 20-byte hashes = 2MB, which is substantial and rising).

  • And because that is not enough: for redundancy purposes, the Bisq app asks two seednodes for data

  • given a "bad" internet connection, Bisq simply fails to start

    • i.e. if net upstream is < 35kB/s = 280kb/s (= 4MB / 120-second connection timeout)
    • does not seem like a lot, but there are bug reports (labeled critical bug) out there and I encountered it myself while not at home
    • Tor is not at fault: the median Tor speed is 28Mb/s; however, if you catch a bad circuit, it can drop well below the required rate.
    • we need more bandwidth as time goes on (because the database grows -> the number of objects grows -> the request size grows)
    • if we succeed with bisq, the required bandwidth will outgrow infrastructure development
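
For illustration, the exchange sketched in the ELI5 above boils down to something like the following. Names and types are hypothetical, not Bisq's actual request/response classes; the real keys are 20-byte hashes, and Strings are used here only to keep the sketch short.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch of the startup data exchange.
    class InitialDataSyncSketch {

        // Client side: announce every key we already have (currently >100k keys * 20 bytes each).
        static Set<String> buildExcludedKeys(Map<String, Object> localStore) {
            return new HashSet<>(localStore.keySet());
        }

        // Seed node side: send back only the objects the requester did not announce.
        static List<Object> buildResponse(Map<String, Object> seedNodeStore, Set<String> excludedKeys) {
            List<Object> missing = new ArrayList<>();
            for (Map.Entry<String, Object> entry : seedNodeStore.entrySet()) {
                if (!excludedKeys.contains(entry.getKey())) {
                    missing.add(entry.getValue());
                }
            }
            return missing;
        }
    }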

Do you intend here to check these binary blobs into the main Bisq repository, or something else? I would really like to avoid adding more binary data to the repository (as we're already doing with all the stuff in p2p/src/main/resources).

Yes, I intend to check these binary blobs into the main Bisq repository. This is exactly about the stuff in p2p/src/main/resources, which is a snapshot of the "distributed database" we ship with each release.

  • At the moment, there is only one blob that gets bigger and bigger. Plus, it replaces the old one, so the repo grows by size(t) = size(t-1) + size(newData) per release (actually, it is several files for different message types, but overall it is one blob of data)
  • after this project is done, a new blob will be added for every release with size(t) = size(newData); the "old" blobs are left untouched and used as they are (historical data does not change)
  • doing it that way is a very minimal change to the current release processes and we can focus on fixing the real issue quickly
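
To put rough numbers on the two growth modes (assuming the ~65MB current size of the shipped stores and the ~300kB of new data per release mentioned elsewhere in this thread):

    // Rough comparison of repository growth per release: committing the full,
    // ever-growing blob again vs. committing only a small new bin per release.
    public class RepoGrowthEstimate {
        public static void main(String[] args) {
            double currentStoresMb = 65;       // approx. size of the shipped data stores today
            double newDataPerReleaseMb = 0.3;  // approx. new network data per release
            int releases = 10;

            double addedCurrentScheme = 0;     // size(t) = size(t-1) + size(newData) lands in history each release
            double addedProposedScheme = 0;    // size(t) = size(newData) lands in history each release
            for (int i = 1; i <= releases; i++) {
                addedCurrentScheme += currentStoresMb + i * newDataPerReleaseMb;
                addedProposedScheme += newDataPerReleaseMb;
            }
            System.out.printf("added to git history over %d releases: ~%.0f MB now vs. ~%.0f MB proposed%n",
                    releases, addedCurrentScheme, addedProposedScheme);
        }
    }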

I'd like us to look into doing this properly with Git LFS instead

  • I totally agree that we have to move away from committing binary data to the repo, but
    • using [insert your favorite storage technology here] does not collide with this project
    • can be done later
    • should be done sooner than later
    • will look into Git LFS as a follow-up project

All in all, this project aims to make small steps towards a more reliable service. Rethinking the storage synchronization and locations is a whole other can of worms.

Btw, I just checked: we have 110k objects now; at the time of project creation it was 104k -> approx. +5% in 25 days.

cbeams added a commit to cbeams/bisq that referenced this issue Apr 2, 2020
The large binary objects in p2p/src/main/resources/ are updated on every
Bisq release with the latest network data to avoid the need for new Bisq
clients to download all of this information from the network, which
would easily overload seed nodes and generally bog down the client.

This approach works well enough for its purposes, but comes with the
significant downside of storing all of this binary data in Git history
forever. The current version of these binary objects total about 65M,
and they grow with every release. In aggregate, this has caused the
total size of the repository to grow to 360M, making it cumbersome to
clone over a low-bandwidth connection, and slowing down various local Git
operations.

To avoid further exacerbating this problem, this commit sets these files
up to be tracked via Git LFS. There's nothing we can do about the 360M
of files that already exist in history, but we can ensure it doesn't
grow in this unchecked way going forward. For an understanding of how
Git LFS works, see the reference material at [1], and see also the
sample project and README at [2].

We are using GitHub's built-in LFS service here, and it's important to
understand that there are storage and bandwidth limits in place. We have
1G total storage and 1G per month of bandwidth on the free tier. If we
exceed this, we will need to purchase a "data pack" from GitHub, which
will get us to 50G storage and bandwidth. These are reasonably priced and
not the end of the world if it becomes necessary. In an attempt to avoid
this, however, the Travis CI build configuration has been updated to
cache Git LFS files, such that they are not re-downloaded on every CI
build, as this would very quickly cause us to exceed the free tier
bandwidth limit (see [3] and [4]). With that out of the way, the variable
determining whether we exceed the monthly limit is how many clones we
have every month. Only the latest version of LFS-tracked files is
downloaded on a given clone, and at the current 65M our monthly
bandwidth allowance covers only about 15 clones. This is not very many,
and will almost certainly mean that we exceed the limit. Our options at
that point are
to buy a data pack or to run our own LFS server. We would almost
certainly do the former to start.

Tracking these files via LFS means that developers will need to have Git
LFS installed in order to properly synchronize the files. If a developer
does not have LFS installed, cloning will complete successfully and the
build would complete successfully, but the app would fail when trying to
actually load the p2p data store files. For this reason, the build has
been updated to proactively check that the p2p data store files have
been properly synchronized via LFS, and if not, the build fails with a
helpful error message. The docs/build.md instructions have also been
updated accordingly.

It is important that we make this change now, not only to avoid growing
the repository in the way described above as we have been doing now for
many releases, but also because we are now considering adding yet more
binary objects to the repository, as proposed at
bisq-network/projects#25.

[1]: https://git-lfs.github.com
[2]: https://github.com/cbeams/lfs-test
[3]: https://docs-staging.travis-ci.com/user/customizing-the-build/#git-lfs
[4]: travis-ci/travis-ci#8787 (comment)
@cbeams cbeams added needs:approval bisq.wiki/Project_management#Approval and removed needs:triage bisq.wiki/Project_management#Triage labels Apr 2, 2020
@cbeams
Member

cbeams commented Apr 2, 2020

The proposal looks well-formed, so I've removed the needs:triage label and added needs:approval per the process.

I am simply not well-informed enough about the details and alternatives to give a meaningful thumbs-up on approving this, but mine is just one voice. Like any other proposal, we should be looking for a broader consensus of interested and informed parties. If you are one of these people (@stejbac?), please provide feedback. The approach here looks pragmatic enough, but it would be good to see other informed opinions.

From a budgeting perspective, it appears to me this is 100% dev team budget, so @ripcurlx, I'll leave it to you to weigh in on.

@cbeams
Member

cbeams commented Apr 2, 2020

And regarding my comments about Git LFS above, see bisq-network/bisq#4114, which will be treated separately from this project.

@wiz
Member

wiz commented Apr 3, 2020

This is a critical issue that now reproduces often on slow network connections.

@ripcurlx

ripcurlx commented Apr 3, 2020

From a budgeting perspective, it appears to me this is 100% dev team budget, so @ripcurlx, I'll leave it to you to weigh in on.

For me this is a critical issue at the moment for some of our users, and as mentioned, the group of people affected by it is growing every day. So from my side it would be a 👍 to start working on this project.

@cbeams
Member

cbeams commented Apr 6, 2020

@ripcurlx, I'll add the has:budget label, then.

It would be great to see more engagement on approval, but even though we've gotten only a few comments here, it sounds like there's consensus we should go ahead. I'll add the has:approval label accordingly.

@cbeams cbeams added has:approval bisq.wiki/Project_management#Approval has:budget bisq.wiki/Project_management#Budgeting and removed needs:approval bisq.wiki/Project_management#Approval labels Apr 6, 2020
@cbeams
Member

cbeams commented Apr 6, 2020

@freimair, please move this to In Progress as and when appropriate.

@cbeams cbeams removed the a:proposal bisq.wiki/Project_management#Proposal label Apr 6, 2020
@freimair freimair moved this from Backlog to In progress in Master Projects Board Apr 6, 2020
cbeams added a commit to cbeams/bisq that referenced this issue Apr 29, 2020
The large binary objects in p2p/src/main/resources/ are updated on every
Bisq release with the latest network data to avoid the need for new Bisq
clients to download all of this information from the network, which
would easily overload seed nodes and generally bog down the client.

This approach works well enough for its purposes, but comes with the
significant downside of storing all of this binary data in Git history
forever. The current version of these binary objects total about 65M,
and they grow with every release. In aggregate, this has caused the
total size of the repository to grow to 360M, making it cumbersome to
clone over a low-bandwidth connection, and slowing down various local Git
operations.

To avoid further exacerbating this problem, this commit sets these files
up to be tracked via Git LFS. There's nothing we can do about the 360M
of files that already exist in history, but we can ensure it doesn't
grow in this unchecked way going forward. For an understanding of how
Git LFS works, see the reference material at [1], and see also the
sample project and README at [2].

The following command was used to track the files:

    $ git lfs track "p2p/src/main/resources/*BTC_MAINNET"
    Tracking "p2p/src/main/resources/AccountAgeWitnessStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/BlindVoteStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/DaoStateStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/ProposalStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/SignedWitnessStore_BTC_MAINNET"
    Tracking "p2p/src/main/resources/TradeStatistics2Store_BTC_MAINNET"

We are using GitHub's built-in LFS service here, and it's important to
understand that there are storage and bandwidth limits there. We have
1G total storage and 1G per month of bandwidth on the free tier. We will
certainly exceed this, and so must purchase at least one "data pack"
from GitHub, possibly two. One gets us to 50G storage and bandwidth.

In an attempt to avoid unnecessary LFS bandwidth usage, this commit also
updates the Travis CI build configuration to cache Git LFS files, such
that they are not re-downloaded on every CI build (see [3] and [4]
below). With that out of the way, the variable determining whether we
exceed the monthly limit is how many clones we have every month, and
there are many, though it's not clear how many are Travis CI and how
many are users / developers.

Tracking these files via LFS means that developers will need to have Git
LFS installed in order to properly synchronize the files. If a developer
does not have LFS installed, cloning will complete successfully and the
build would complete successfully, but the app would fail when trying to
actually load the p2p data store files. For this reason, the build has
been updated to proactively check that the p2p data store files have
been properly synchronized via LFS, and if not, the build fails with a
helpful error message. The docs/build.md instructions have also been
updated accordingly.

It is important that we make this change now, not only to avoid growing
the repository in the way described above as we have been doing now for
many releases, but also because we are now considering adding yet more
binary objects to the repository, as proposed at
bisq-network/projects#25.

[1]: https://git-lfs.github.com
[2]: https://github.com/cbeams/lfs-test
[3]: https://docs-staging.travis-ci.com/user/customizing-the-build/#git-lfs
[4]: travis-ci/travis-ci#8787 (comment)
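
For context on the build check mentioned in the commit message: when Git LFS is not installed, LFS-tracked files remain small text pointer files whose first line starts with "version https://git-lfs.github.com/spec/v1", which is what such a check can look for. A minimal, hypothetical sketch (not the actual check added to the Bisq build):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    // Hypothetical sketch of a "did Git LFS actually fetch the data stores?" check.
    public class LfsSyncCheck {
        public static void main(String[] args) throws IOException {
            List<Path> stores = List.of(
                    Path.of("p2p/src/main/resources/AccountAgeWitnessStore_BTC_MAINNET"),
                    Path.of("p2p/src/main/resources/TradeStatistics2Store_BTC_MAINNET"));

            for (Path store : stores) {
                byte[] head = new byte[64];
                int read;
                try (InputStream in = Files.newInputStream(store)) {
                    read = in.read(head);
                }
                String prefix = new String(head, 0, Math.max(read, 0));
                if (prefix.startsWith("version https://git-lfs")) {
                    System.err.println(store + " is an unresolved LFS pointer;"
                            + " install Git LFS and run 'git lfs pull'.");
                    System.exit(1);
                }
            }
            System.out.println("p2p data store files appear to be synchronized via LFS.");
        }
    }
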
@freimair
Member Author

the implementation is currently being prepared to be tested in the production network. @sqrrm will upgrade his explorer-seednode to run the new code (that would be v1.3.5 + the changes of this project) so that a few devs can use it productively and see if anything bad shows up. The plan is to do so for one release cycle. If nothing bad shows up, we will proceed with the rather complex upgrade process.

@ripcurlx

the implementation is currently being prepared to be tested in the production network. @sqrrm will upgrade his explorer-seednode to run the new code (that would be v1.3.5 + the changes of this project) so that a few devs can use it productively and see if anything bad shows up. The plan is to do so for one release cycle. If nothing bad shows up, we will proceed with the rather complex upgrade process.

Is there any update on this?

@freimair
Member Author

The project has been completed by bisq-network/bisq#4586.
