Throttling data transfers #24

rzvoncek · 2023-07-25T14:17:16Z

Hello. In thelastpickle/cassandra-medusa we use cloud storage a lot. Initially, we built our thing atop libcloud. As time passed, it turned out that most cloud storage services are S3-compatible (even GCS). So we are basically left with the S3 and Azure APIs that we need to support.

We stumbled across this S3 client and we like it enough to use it to cover our cloud storage interaction. But we're missing a few features.

Mostly, we would need the capability to throttle data transfers. Medusa is doing backups of a database and we cannot have it exhaust the network capacity.

We have some thoughts on how to tackle this, but none of them are good enough. That's why I'd like to reach out and see if you have any thoughts on how this could be achieved.

dizballanze · 2023-07-25T14:30:12Z

Hello. It's possible to use an async generator as a source of data. I think this should be enough to implement any throttling logic:

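    import asyncio
    from http import HTTPStatus

    async def gen(source):
        # Re-yield each chunk, pausing after it to slow the upload down
        async for chunk in source:
            yield chunk
            await asyncio.sleep(1)  # change to a proper throttling logic here

    # `client` is an existing S3 client, `source` an async iterable of bytes
    async with client.put("bucket/file", gen(source)) as resp:
        assert resp.status == HTTPStatus.OK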
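
To make the "proper throttling logic" placeholder a bit more concrete, here is a minimal sketch of a generator that caps the average transfer rate. The names `throttled`, `chunks` and `max_bytes_per_sec` are made up for illustration and are not part of the client; the only assumption about the client is what the snippet above already shows, namely that `put` accepts an async generator.

```python
import asyncio
import time


async def throttled(source, max_bytes_per_sec):
    """Re-yield chunks from `source`, sleeping so the average rate stays under the cap."""
    start = time.monotonic()
    sent = 0
    async for chunk in source:
        yield chunk
        sent += len(chunk)
        # Earliest moment (relative to start) at which `sent` bytes fit under the cap
        target = sent / max_bytes_per_sec
        elapsed = time.monotonic() - start
        if target > elapsed:
            await asyncio.sleep(target - elapsed)


async def chunks(data, size):
    """Hypothetical helper: split a bytes object into fixed-size chunks."""
    for i in range(0, len(data), size):
        yield data[i:i + size]
```

Under that assumption, something like `throttled(source, 10 * 1024 * 1024)` could be passed in place of `gen(source)`. The cap applies to the average since the start of the transfer, so a large first chunk still goes out at full speed; smaller chunks give smoother pacing.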

mosquito · 2023-07-25T14:31:14Z

Hi @rzvoncek, the best option I can think of is to use the IP_TOS socket option. These are Linux-specific parameters, and the kernel will ensure that the traffic is handled as expected.

IP_TOS socket option, from the ip(7) man page:

IP_TOS (since Linux 1.0)
              Set or receive the Type-Of-Service (TOS) field that is
              sent with every IP packet originating from this socket.
              It is used to prioritize packets on the network.  TOS is a
              byte.  There are some standard TOS flags defined:
              IPTOS_LOWDELAY to minimize delays for interactive traffic,
              IPTOS_THROUGHPUT to optimize throughput, IPTOS_RELIABILITY
              to optimize for reliability, IPTOS_MINCOST should be used
              for "filler data" where slow transmission doesn't matter.
              At most one of these TOS values can be specified.  Other
              bits are invalid and shall be cleared.  Linux sends
              IPTOS_LOWDELAY datagrams first by default, but the exact
              behavior depends on the configured queueing discipline.
              Some high-priority levels may require superuser privileges
              (the CAP_NET_ADMIN capability).

To use it, you should build an aiohttp.ClientSession with a custom TCPConnector that sets the desired options on the sockets it creates. Then just pass it as a parameter.
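
As a rough sketch of the socket-level part only, this is how the TOS byte can be set with the standard `socket` module. The helper names `set_tos` and `open_marked_socket` are hypothetical, and the `IPTOS_*` constants are copied by hand from Linux `<netinet/ip.h>` because Python does not export them. How a custom connector would get hold of its sockets is not shown here and depends on the aiohttp version.

```python
import socket

# TOS flag values mirrored from Linux <netinet/ip.h>; Python's socket module
# does not export them, so they are defined here by hand.
IPTOS_LOWDELAY = 0x10
IPTOS_THROUGHPUT = 0x08
IPTOS_RELIABILITY = 0x04
IPTOS_MINCOST = 0x02


def set_tos(sock, tos):
    """Set the IP Type-Of-Service byte on an IPv4 socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)


def open_marked_socket(tos=IPTOS_THROUGHPUT):
    """Create an unconnected TCP socket with the given TOS value applied."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_tos(sock, tos)
    return sock
```

Which flag is appropriate, and what effect it has, depends on the queueing discipline configured on the host and on how the network treats the TOS field, so the default used here is only a placeholder.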

rzvoncek · 2023-07-26T08:25:19Z

Thank you both for the quick responses! This gives us something to work with.

rzvoncek · 2023-09-20T10:19:08Z

Hello, I'd just like to circle back and clean up after myself. Thanks for the help once again.

mosquito added the "question" label Jul 25, 2023

rzvoncek closed this as completed Sep 20, 2023
