Throttling for (for sync and other) #310

Open
SuperTango opened this issue Dec 26, 2016 · 14 comments

@SuperTango

Can we get throttling control in the b2 command line tool (especially for sync)?

Thanks.

@ppolewicz
Collaborator

Hi @SuperTango, thanks for your interest in B2 CLI. You can already influence the speed to some extent with the --threads N parameter. If that is not sufficient, could you please describe your use case so that we can better understand it?

@SuperTango
Author

Thanks @ppolewicz. My use case is pretty simple: I only want to use a percentage of the available bandwidth. For example, the outbound pipe from the datacenter hosting my Linux machine has a maximum throughput of about 200 kB/s, but for backups I want to ensure we use at most 75 kB/s.

@ppolewicz
Collaborator

It is possible to implement such a limiter, but doing it well in our environment is not easy, as we support many threads. I could not find a good open-source module that would do the heavy lifting, and I have spent some time searching for one.
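A minimal sketch of what a limiter shared by many threads could look like, assuming a classic token bucket (a technique chosen here for illustration, not something b2 CLI or b2sdk ships). The class `TokenBucket` and its `consume()` method are hypothetical. All worker threads share one bucket and call `consume(len(chunk))` before sending each chunk, so their combined rate stays near `rate` bytes per second no matter how many threads are running.

```python
import threading
import time


class TokenBucket:
    """Hypothetical thread-safe token bucket shared by all transfer threads.

    rate:     tokens (bytes) added per second
    capacity: maximum number of stored tokens, i.e. the largest allowed burst
    """

    def __init__(self, rate, capacity):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full so the first chunk is not delayed
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        # Must be called with self.lock held.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, amount):
        """Block the calling thread until `amount` bytes may be sent."""
        if amount > self.capacity:
            raise ValueError("chunk is larger than the bucket capacity")
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= amount:
                    self.tokens -= amount
                    return
                # Time until enough tokens will have accumulated.
                wait = (amount - self.tokens) / self.rate
            # Sleep outside the lock so other threads can refill and consume meanwhile.
            time.sleep(wait)
```

Waiting threads are not served in FIFO order here, so one thread may wait longer than another; a production version might need a fairer queue. Chunk sizes must also stay at or below `capacity`.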

Have you tried using trickle?

@svonohr

svonohr commented Dec 26, 2016

This is also a feature I was looking for a while ago. I tried trickle back then and wasn't able to limit the bandwidth. I don't know what the problem was, so maybe there is a workaround. I've just moved the backup job to the middle of the night, so it's not a priority for me.

@ppolewicz
Collaborator

I think you can also use iptables to limit bandwidth per destination. However, this will not let you set different limits for two sync processes running concurrently.

I have researched this further and got interested in writing something like it, because I found lots of questions about this and no answers other than "use urlgrabber" (which is a libcurl wrapper). But first I need to deal with another challenge in b2 CLI, so I'll leave this unassigned.

I don't think it is worth implementing this just for b2 CLI, but it can be made abstract enough to be useful elsewhere.

If someone is going to work on this, please post here so that we can coordinate.

@SuperTango
Author

I think this is a pretty core feature for any backup (especially a sync) solution. Not flooding the network when performing a backup of potentially Terabytes of data is a requirement for me, not a "nice to have".

I haven't looked at the B2 command line tool codebase, but a long time ago I implemented a simple yet effective throttling solution for another product. It wasn't particularly difficult, but we were writing to sockets directly (not using a third-party library). With many threads, having each thread use 1/N of the bandwidth (N = number of threads) is good enough for this use case.
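The static 1/N approach could be sketched as follows, assuming each worker thread reads its data through a file-like wrapper before handing it to the network layer. `ThrottledReader` and `per_thread_rate` are hypothetical names, not part of b2 CLI. Each thread paces only itself, so no locking between threads is needed.

```python
import io
import time


def per_thread_rate(total_rate, num_threads):
    """Static 1/N split: every thread gets the same fixed slice of the budget."""
    return total_rate / num_threads


class ThrottledReader:
    """Wraps a binary file object and paces read() to about `rate` bytes/second.

    Pacing is based on the cumulative average since construction, so after a
    period in which the consumer was slow, reads may briefly catch up.
    """

    def __init__(self, raw, rate):
        self.raw = raw
        self.rate = float(rate)
        self.start = time.monotonic()
        self.bytes_read = 0

    def read(self, size=-1):
        data = self.raw.read(size)
        self.bytes_read += len(data)
        # Earliest moment at which this many bytes may have been handed out.
        allowed_at = self.start + self.bytes_read / self.rate
        delay = allowed_at - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        return data


if __name__ == "__main__":
    # 75 kB/s shared by 4 threads gives each thread 18750 bytes/s.
    rate = per_thread_rate(75_000, 4)
    reader = ThrottledReader(io.BytesIO(b"\0" * 4_000), rate)
    t0 = time.monotonic()
    while reader.read(1_000):
        pass
    print(f"{rate:.0f} B/s per thread, took {time.monotonic() - t0:.2f} s")
```

The simplicity comes at a cost: when fewer than N transfers are active, the unused slices are simply wasted.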

@ppolewicz
Collaborator

Sync should be smart: if there is both an upload limit and a download limit, it should maximize the usage of both resources to minimize the session time, right? If only the limits are added, the bottleneck will likely be uploading at first and downloading afterwards.
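The session-time argument can be made concrete with a back-of-the-envelope model, assuming uploads and downloads are each capped by an independent limit and ignoring latency and per-request overhead. The function names are illustrative only.

```python
def sequential_time(up_bytes, up_rate, down_bytes, down_rate):
    """Seconds needed if all uploads finish before any download starts."""
    return up_bytes / up_rate + down_bytes / down_rate


def overlapped_time(up_bytes, up_rate, down_bytes, down_rate):
    """Seconds needed if uploads and downloads run concurrently, each at its own cap."""
    return max(up_bytes / up_rate, down_bytes / down_rate)


if __name__ == "__main__":
    # 750 kB up at 75 kB/s and 2 MB down at 200 kB/s: 10 s of work in each direction.
    args = (750_000, 75_000, 2_000_000, 200_000)
    print(sequential_time(*args))  # 20.0
    print(overlapped_time(*args))  # 10.0
```

In this model the time saved by overlapping equals the shorter of the two phases, so the gain is largest when both directions carry a similar amount of work.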

Another issue is that the number of parallel uploads/downloads changes over time as new tasks are scheduled and executed. A simple 1/N split would be quite inefficient compared to a smart one.
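A rough illustration of the difference, assuming every active transfer could actually use whatever rate it is given: with a static 1/N cap, idle threads' slices are lost, whereas recomputing the share from the number of currently running transfers keeps the whole budget in use. `static_split_throughput` and `DynamicShare` are hypothetical helpers, not b2 CLI code.

```python
import threading


def static_split_throughput(total_rate, num_threads, active_threads):
    """Aggregate rate when every thread is permanently capped at total_rate / num_threads.

    Slices belonging to idle threads are simply wasted.
    """
    return min(active_threads, num_threads) * total_rate / num_threads


class DynamicShare:
    """Tracks how many transfers are running and splits the budget among them.

    Workers call register() when a transfer starts and unregister() when it ends,
    and re-read share() periodically to pick up the new per-transfer rate.
    """

    def __init__(self, total_rate):
        self.total_rate = float(total_rate)
        self._active = 0
        self._lock = threading.Lock()

    def register(self):
        with self._lock:
            self._active += 1

    def unregister(self):
        with self._lock:
            if self._active == 0:
                raise RuntimeError("unregister() without matching register()")
            self._active -= 1

    def share(self):
        """Current per-transfer rate; the whole budget if nothing else is running."""
        with self._lock:
            return self.total_rate / max(self._active, 1)
```

For example, with a 75 kB/s budget, 8 threads and only 2 transfers running, the static split yields 2 × 75000 / 8 = 18750 B/s in total, while the dynamic share gives each of the 2 transfers 37500 B/s.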

If you would be willing to contribute some code to b2 CLI, it would be very welcome! We encourage outside contributors to perform changes on our codebase. Many such changes have been merged already. In order to make it easier to contribute, core developers of this project:

  • provide guidance (through the issue reporting system)
  • provide tool assisted code review (through the Pull Request system)
  • maintain a set of integration tests (run with a production cloud)
  • maintain a set of (well over a hundred) unit tests
  • automatically run unit tests on 14 Python versions and platforms (including OS X, Jython and PyPy)
  • format the code automatically using yapf
  • use static code analysis to find subtle/potential issues with maintainability
  • maintain other Continuous Integration tools (coverage tracker)

@rwky

rwky commented Jun 3, 2018

Trickle works for me. For example:

trickle -s -u 200 b2 sync --threads 1 /src b2://dst

@devhen

devhen commented Jan 18, 2020

Does b2 download_file_by_id use threads as well? I'm using it to get specific versions of files and it saturates my bandwidth and sometimes causes issues. I will try @rwky's trickle example. Are there any plans to implement --threads N on b2 download_file_by_id? Thanks!

@ppolewicz
Collaborator

In the current version it uses threads to parallelize downloads (this is required by the B2 integration checklist); however, the number of threads cannot be changed from the CLI yet.
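One common way to parallelize a single download is to split the file into contiguous byte ranges and fetch each range with an HTTP `Range` request on its own thread. The sketch below only computes the ranges; `split_ranges` is a hypothetical helper, and b2sdk's actual splitting strategy may differ.

```python
def split_ranges(size, parts):
    """Split a file of `size` bytes into at most `parts` contiguous byte ranges.

    Returns inclusive (first, last) offsets, matching the HTTP header
    "Range: bytes=first-last". Earlier ranges get one extra byte when
    `size` does not divide evenly.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # fewer bytes than parts: do not emit empty ranges
        ranges.append((start, start + length - 1))
        start += length
    return ranges


if __name__ == "__main__":
    for first, last in split_ranges(10, 3):
        print(f"Range: bytes={first}-{last}")
```

Each range could then be submitted to a thread pool (for example `concurrent.futures.ThreadPoolExecutor`), which is also the natural place where a thread-count setting would take effect.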

The uploading/downloading machinery in b2sdk is being reworked as we speak and one of the many improvements will be the ability to change the number of upload and download threads, or maybe even provide native bandwidth limiters, as a bit more global settings, so that you can tweak it for download, upload, sync, copy and metadata operations (sync internally listing the contents of the bucket also consumes bandwidth). Bandwidth limiting is not planned in the initial scope of the rework, but the new structure of the code goes a long way towards enabling it.

@Addvilz

Addvilz commented Jun 12, 2021

We really need to have this in b2 CLI directly.

I see some people are suggesting trickle here; however, note that trickle does NOT work with Python 3.x, only Python 2.x. You cannot use trickle to limit the bandwidth of Python 3 scripts; it fails silently.

Edit: for those looking for some kind of solution, if you can throttle the NIC of the host doing the uploads, you can do so for the duration of the upload; however, this only works when nothing else is running on the host. And it is not really a solution to this issue per se.

@programster

> Trickle works for me, an example is trickle -s -u 200 b2 sync --threads 1 /src b2://dst

That doesn't appear to have any impact for me. Could it be related to trickle having no effect on Python 3 scripts? I am using Ubuntu 20.04 with B2 version 3.2.1 and trickle version 1.07.

My command (in case I am doing it wrong):

trickle -v -s -u 1 -t 1 b2 sync \
  --delete \
  --threads 1 \
  $FOLDER_TO_BACKUP \
  b2://${B2_BUCKET_NAME}/test

@ppolewicz
Collaborator

It looks like the Ubuntu version of trickle from apt doesn't work very well. That bug report says you should just compile it from source, and then it will work. Maybe do that, as opposed to implementing rate limiting in every single program you will ever use in a constrained environment.

Actually, the best permanent fix would be to ask Ubuntu to fix their trickle package so that it works with Python 3.

@blewa

blewa commented Oct 6, 2022

I've had a bit of luck with the Ubuntu packaged version of Trickle by setting the number of b2 threads to 1.
