BEP-56: Data compression extension #125

Saiv46 · 2021-10-01T11:03:18Z

When BitTorrent was created, compression algorithms were slow and expensive, so users had to share uncompressed files or compress them manually into ZIP/RAR/etc. archives.

People still share files in .zip archives (or even .rar, yikes!), which take additional space and have to be extracted somewhere. Instead, why not compress torrent pieces on the fly?

We now have fast compression algorithms such as LZO, LZ4, Snappy and Zstandard, which would let us torrent uncompressed files directly while keeping the upload-speed benefit of pre-compressed archives.

Discuss here: #124

arvidn

It seems to me that a much simpler approach would be to specify a new extension message, say PIECE_ZSTD, and assign it a message ID in the m dictionary of the BEP10 handshake.

This would tell the other peer that it may send responses to REQUEST messages with a compressed buffer. You could add more extension messages as well, like PIECE_LZ4, etc. It's then up to the uploader to pick whichever supported compression algorithm it prefers and send blocks compressed with it.
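A minimal sketch of that advertisement, assuming the hypothetical extension names `piece_zstd` and `piece_lz4` and the arbitrary local IDs 3 and 4. Only the BEP 10 framing (message ID 20, extended ID 0 for the handshake, ID 0 meaning "disabled") comes from the existing protocol; the helper names are illustrative.

```python
# Minimal bencoder: just enough to serialise a BEP 10 handshake dictionary.
def bencode(value) -> bytes:
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, sorted in raw byte order.
        items = sorted(
            ((k.encode() if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


# Hypothetical extension names and local IDs; each client picks its own non-zero IDs.
HANDSHAKE = {"m": {"piece_zstd": 3, "piece_lz4": 4}}


def extended_handshake_message(payload: dict) -> bytes:
    # Message ID 20 = extended message; extended ID 0 = the handshake itself.
    body = bytes([20, 0]) + bencode(payload)
    return len(body).to_bytes(4, "big") + body


def peer_accepts(peer_handshake: dict, name: str) -> bool:
    # A non-zero ID means the peer understands that message, so we may answer
    # its REQUESTs with it. ID 0 means "disabled" in BEP 10.
    return peer_handshake.get("m", {}).get(name, 0) != 0
```

The uploader would check `peer_accepts(their_handshake, "piece_zstd")` before choosing a compressed response and otherwise fall back to the plain PIECE message.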

One important distinction is that the bytes included in the PIECE message are compressed, not whole pieces. Compressing pieces raises more questions that need to be addressed both in the specification and in the implementation.
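To illustrate "compress the bytes in the message, not the piece", here is a sketch of one possible wire layout: the usual PIECE fields (index, begin) followed by a compressed block, wrapped in a BEP 10 extended message. The layout and the ID value 3 are assumptions, and zlib from the Python standard library stands in for zstd.

```python
import struct
import zlib

# Hypothetical extended-message ID the receiver assigned to "piece_zstd" in its
# handshake. zlib stands in for zstd because it ships with Python.
PIECE_COMPRESSED_ID = 3


def encode_compressed_piece(index: int, begin: int, block: bytes) -> bytes:
    """Frame one requested block; only the block bytes are compressed."""
    payload = struct.pack(">BBII", 20, PIECE_COMPRESSED_ID, index, begin) + zlib.compress(block)
    return struct.pack(">I", len(payload)) + payload


def decode_compressed_piece(message: bytes) -> tuple[int, int, bytes]:
    (length,) = struct.unpack_from(">I", message, 0)
    msg_id, ext_id, index, begin = struct.unpack_from(">BBII", message, 4)
    if msg_id != 20 or ext_id != PIECE_COMPRESSED_ID:
        raise ValueError("not a compressed PIECE message")
    # 4-byte length prefix + 10-byte header precede the compressed block.
    block = zlib.decompress(message[14:4 + length])
    return index, begin, block
```

Because each block is self-contained, the receiver never needs any other block of the piece to decode it.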

As for stream compression, you could have another extension message called START_COMPRESSION_ZSTD. Once it is received, every byte following this message is compressed with zstd. There could be similar messages for other compression algorithms.
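A sender-side sketch of such stream compression, assuming zlib as a stand-in for zstd and a hypothetical start message passed in by the caller. The start message itself goes out in the clear; every later message is fed through one long-lived compressor and flushed with `Z_SYNC_FLUSH`, so the peer can decode each message as soon as it arrives.

```python
import struct
import zlib


class StreamCompressor:
    """Sender side of a hypothetical START_COMPRESSION_* extension."""

    def __init__(self) -> None:
        self._compressor = None  # None until the start message has been sent

    @staticmethod
    def frame(payload: bytes) -> bytes:
        return struct.pack(">I", len(payload)) + payload

    def start(self, start_message: bytes) -> bytes:
        # The start message itself is uncompressed; every byte after it is compressed.
        out = self.frame(start_message)
        self._compressor = zlib.compressobj()
        return out

    def send(self, payload: bytes) -> bytes:
        data = self.frame(payload)
        if self._compressor is None:
            return data
        # Z_SYNC_FLUSH emits everything buffered so far, so the peer can decode
        # this message immediately instead of waiting for later traffic.
        return self._compressor.compress(data) + self._compressor.flush(zlib.Z_SYNC_FLUSH)
```

Flushing after every message costs a few bytes each time; a real client might flush only when its send buffer drains.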

arvidn · 2024-02-21T11:11:34Z

beps/bep_0056.rst

+
+Compression algorithms must satisfy the following requirements:
+
+1. Decompression speed must not be lower than 500 MB/s.


this doesn't really mean anything unless you specify the hardware you run it on

Totally agree. I used figures from the Silesia compression corpus and forgot to include the reference hardware.
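A throughput threshold only becomes meaningful when the figure is reported together with the machine that produced it. A small measurement sketch, with zlib standing in for the candidate algorithms and the test data left to the caller (for example, files from the Silesia corpus):

```python
import os
import platform
import time
import zlib


def decompression_throughput(data: bytes, rounds: int = 20) -> float:
    """Decompression speed in MB/s (10**6 bytes of decompressed output per second)."""
    compressed = zlib.compress(data)
    start = time.perf_counter()
    for _ in range(rounds):
        zlib.decompress(compressed)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return len(data) * rounds / elapsed / 1e6


def report(data: bytes) -> str:
    # A throughput figure is only comparable when the hardware is stated alongside it.
    return (f"{decompression_throughput(data):.0f} MB/s on "
            f"{platform.machine()} / {platform.processor() or 'unknown CPU'}, "
            f"{os.cpu_count()} logical CPUs, Python {platform.python_version()}")
```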

arvidn · 2024-02-21T11:15:42Z

beps/bep_0056.rst

+decompressed when saving to disk or sending to peer, which not supports
+compression. To reduce piece re-compression, client should raise
+current algorithm's priority during handshake. This method has low
+efficiency with pieces smaller than 4 MB.


There are a lot of details omitted here. This needs to fit into the way blocks are requested and sent according to the protocol, see http://bittorrent.org/beps/bep_0003.html

Crucially, when you say the whole piece is compressed, do you mean that I have to request all blocks for that piece from the same peer, in order to decompress any part of it?

Do the offset and size specified in the request message refer to the uncompressed piece (as they do in the current protocol) or to the compressed piece? In the latter case, the requestor would need to know the compressed size of each piece, and there doesn't seem to be a mechanism to learn it.

It seems far more practical to introduce a new PIECE message which indicates which compression algorithm it's using, leaving everything else the same. But that would require compressing each block individually, and maybe even smaller and unaligned parts of pieces. You don't have to request blocks at 16 kiB alignments.
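Keeping REQUEST offsets in uncompressed coordinates means the receiver can accept blocks from different peers at arbitrary, unaligned offsets. A receiver-side sketch, assuming per-block zlib compression as a stand-in for zstd (`PieceAssembler` is a hypothetical helper):

```python
import zlib


class PieceAssembler:
    """Receiver side of per-block compression (zlib stands in for zstd).

    REQUEST offsets and lengths refer to the *uncompressed* piece, so each
    block may come from a different peer and need not be 16 KiB aligned."""

    def __init__(self, piece_length: int) -> None:
        self.buffer = bytearray(piece_length)
        self.have = bytearray(piece_length)  # 1 for every byte already received

    def on_block(self, begin: int, length: int, compressed: bytes) -> None:
        if begin < 0 or begin + length > len(self.buffer):
            raise ValueError("block lies outside the piece")
        d = zlib.decompressobj()
        # Never inflate more than one byte past what was requested.
        block = d.decompress(compressed, length + 1)
        if len(block) != length or not d.eof:
            raise ValueError("decompressed block does not match the request")
        self.buffer[begin:begin + length] = block
        self.have[begin:begin + length] = b"\x01" * length

    def complete(self) -> bool:
        return all(self.have)
```

The bounded `decompress(..., length + 1)` call also keeps a malicious peer from sending a small payload that inflates to an unbounded size.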

Thank you, I should have introduced the CPIECE message in the first place.

arvidn · 2024-02-21T11:16:23Z

beps/bep_0056.rst

+
+1. Decompression speed must not be lower than 500 MB/s.
+
+2. It must not produce a larger piece than the original by 1%.


so there must be an option for the sending side to send a block uncompressed, even if it was requested as compressed then, right?
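One way to provide that option is to tag each block with the algorithm actually used, so incompressible data costs at most one extra byte. A sketch with hypothetical algorithm IDs (0 = none, 1 = zlib standing in for zstd):

```python
import zlib

# Hypothetical algorithm identifiers carried in front of each block.
ALGO_NONE = 0
ALGO_ZLIB = 1  # stand-in for zstd in this sketch


def encode_block(block: bytes) -> bytes:
    """Prefix the block with the algorithm actually used; fall back to raw
    bytes whenever compression would not make the block smaller."""
    compressed = zlib.compress(block)
    if len(compressed) < len(block):
        return bytes([ALGO_ZLIB]) + compressed
    return bytes([ALGO_NONE]) + block


def decode_block(payload: bytes) -> bytes:
    algo, body = payload[0], payload[1:]
    if algo == ALGO_NONE:
        return body
    if algo == ALGO_ZLIB:
        return zlib.decompress(body)
    raise ValueError(f"unknown algorithm id {algo}")
```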

That was a short list of requirements for candidate compression algorithms in the specification.

Removed the requirement list altogether for now.

arvidn · 2024-02-21T11:20:14Z

beps/bep_0056.rst

+
+The compression algorithm is selected by taking the dictionary item with
+highest priority from intersection of items supported by both peers,
+if there isn't any suitable compression algorithm - compression will be disabled.


It seems like an unnecessary requirement that the same algorithm is used in both directions. It also seems like it would complicate things.

The fact that there's no message to ensure the clients agree on which algorithm is used seems risky. You don't specify how to resolve ambiguities. There may be 2 algorithms that are equally good options.
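The risk is easy to reproduce. The sketch below assumes one reading of the draft rule, where each peer applies its own priorities to the intersection; the peer names and priority values are made up.

```python
from typing import Optional


def pick(own_priority: dict[str, int], peer_supported: set[str]) -> Optional[str]:
    """One reading of the draft rule: the highest-priority algorithm, by *this*
    peer's priorities, among those both peers support."""
    common = [name for name in own_priority if name in peer_supported]
    if not common:
        return None
    # max() keeps the first of several equal priorities, i.e. ties are broken by
    # dictionary order, which the other peer cannot see.
    return max(common, key=lambda name: own_priority[name])


alice = {"zstd": 10, "lz4": 5}
bob = {"zstd": 5, "lz4": 10}
```

Here `pick(alice, set(bob))` yields `"zstd"` while `pick(bob, set(alice))` yields `"lz4"`, so the two ends disagree without any message revealing it.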

I have taken the assumption that the same algorithm is used in both directions to simplify the negotiation process. Once both clients have exchanged dictionaries, no further messages are required. It's unlikely that two algorithms would have the same priority on two different clients, but I should have explained it more clearly.

> I have taken the assumption that the same algorithm is used in both directions to simplify the negotiation process.

you're making it more complicated by introducing negotiation in the first place.

> It's unlikely that two algorithms would have the same priority on two different clients

Unlikely things happen all the time, especially when you have ~100 million peers.

> you're making it more complicated by introducing negotiation in the first place.

Yeah, but that's necessary due to clients disabling/implementing various algorithms.

> It's unlikely that two algorithms would have the same priority on two different clients

> Unlikely things happen all the time, especially when you have ~100 million peers.

Tried to resolve this with a TLS-like approach: in crequest, a client enumerates the algorithms it is capable of, and the other client answers with cresponse, selecting the algorithms to use for sending and receiving.
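A sketch of why this removes the ambiguity: only the responder decides, as a TLS server does when choosing a cipher suite, and each direction can get its own algorithm. The dictionary keys (`algorithms`, `send`, `receive`) are hypothetical, not the draft's actual field names.

```python
from typing import Optional


def build_crequest(supported: list[str]) -> dict:
    # Hypothetical payload: the initiator's algorithms, most preferred first.
    return {"algorithms": supported}


def build_cresponse(crequest: dict, own_ranked: list[str]) -> dict:
    """The responder alone decides, so the two sides can never pick differently."""
    offered = crequest["algorithms"]
    own = set(own_ranked)
    # What the responder compresses with: its own favourite that the initiator can decode.
    send: Optional[str] = next((a for a in own_ranked if a in offered), None)
    # What the initiator compresses with: the initiator's favourite the responder can decode.
    receive: Optional[str] = next((a for a in offered if a in own), None)
    return {"send": send, "receive": receive}
```

A `None` in either direction simply means that direction stays uncompressed and the normal protocol is used.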

arvidn · 2024-02-21T11:21:25Z

beps/bep_0056.rst

+specified otherwise.
+
+**NOTE**: Currently, only ``p_zstd`` and ``s_zstd`` algorithms
+are required for implementation.


What's the point of requiring this? Would it be a problem if the negotiations resulted in an empty set of algorithms and normal protocol was used?

There were concerns about different clients supporting non-overlapping sets of algorithms, so the specification should require one that must be implemented universally. There wouldn't be a problem if negotiations resulted in an empty set, as the compression feature could be disabled by the user anyway.

this is an extension to begin with. There will be clients not implementing it. I don't see a problem with that.

The algorithm list needs to be reworked anyway, as there may be additional options.
I removed the note until it becomes necessary.

arvidn · 2024-02-21T11:23:16Z

beps/bep_0056.rst

+clients should lower or raise algorithm's priority depending on expected
+factors that could impact compression efficiency and performance. This
+method can introduce performance issues if used on thousands of
+simultaneous connections.


How do you synchronize which byte to start stream compression at?
Sending messages is asynchronous, in both directions. By the time I receive this handshake, I may have already sent other messages. In fact, I'm quite likely to.

I think you would need a message indicating that everything past it is compressed, and you probably ought to include which compression algorithm you picked in this message as well.
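A receiver-side sketch of that marker, assuming a hypothetical extended-message ID 7 for START_COMPRESSION and a one-byte algorithm field (1 = zlib, standing in for zstd). The key detail is that bytes arriving in the same read right after the marker are already compressed, so the parser must switch exactly at the message boundary.

```python
import struct
import zlib

START_COMPRESSION = 7  # hypothetical extended-message ID assigned to the marker
ALGO_ZLIB = 1          # hypothetical algorithm byte carried in the marker


class Receiver:
    """Splits the incoming byte stream into length-prefixed messages. Everything
    after the marker message is run through the decompressor the marker names."""

    def __init__(self) -> None:
        self._buf = b""            # decoded bytes not yet parsed into messages
        self._decompressor = None  # created when the marker is parsed

    def feed(self, raw: bytes) -> list[bytes]:
        if self._decompressor is None:
            self._buf += raw
        else:
            self._buf += self._decompressor.decompress(raw)
        messages = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from(">I", self._buf)
            if len(self._buf) < 4 + length:
                break  # wait for the rest of this message
            msg, self._buf = self._buf[4:4 + length], self._buf[4 + length:]
            messages.append(msg)
            if (self._decompressor is None and len(msg) >= 3
                    and msg[0] == 20 and msg[1] == START_COMPRESSION):
                if msg[2] != ALGO_ZLIB:
                    raise ValueError(f"unsupported algorithm {msg[2]}")
                self._decompressor = zlib.decompressobj()
                # Bytes that arrived in the same read after the marker were
                # already compressed by the sender, so decode them now.
                self._buf = self._decompressor.decompress(self._buf)
        return messages
```

Since each side sends its own marker, the two directions switch independently and no round trip is needed to agree on the starting byte.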

Done, via the cresponse message.

Alexander Ivanov and others added 2 commits October 1, 2021 19:01

Create bep_0056.rst

bb5ee2f

Update bep_0056.rst

99193e6

Saiv46 mentioned this pull request May 24, 2022

[Proposal] BEP56: Data compression extension #124

Open

Saiv46 added 2 commits February 21, 2024 18:14

Grammar and consistency fix

d4b3ab4

Rework

853c3fa

Saiv46 changed the title ~~BEP-56 draft (compression)~~ BEP-56: Data compression extension Feb 21, 2024

Saiv46 marked this pull request as ready for review February 21, 2024 10:19

Quickfix

3fc2849

Saiv46 mentioned this pull request Feb 21, 2024

[New feature] Compression support (BEP56) arvidn/libtorrent#7630

Open

arvidn reviewed Feb 21, 2024

Saiv46 added 7 commits February 22, 2024 16:44

Rolled back ID prefixing

6c17e1d

Renamed methods to modes

4896a5f

Rewriting protocol extension

f75109a

Describe new messages

7c5bc3c

Update email

dc66ae2

TODO: Make proper algorithm list

5615857

Add stream mode clarification

ed2add6

Saiv46 requested a review from arvidn February 23, 2024 06:58
