*.utf-8 fields in created torrent files violate the BEP standard #1274

Closed
amigdala999 opened this issue Sep 28, 2019 · 4 comments

Comments

@amigdala999

Hi,
There is an old hack in the Bigly code that originally came from Vuze and, before that, from Azureus.
None of them write filenames (or comments and file paths) as UTF-8 in the regular fields of generated torrent files; instead they add an extra non-standard "*.utf-8" field for that purpose:

protected static final String TK_PATH_UTF8 = "path.utf-8";

file_map.put( TOTorrentImpl.TK_PATH_UTF8, utf8_path );

That's a violation of the BEP 3 standard. No *.utf-8 fields are allowed, and the regular fields are expected to be in UTF-8 instead.

In practice this means that if you create a torrent file with such a BitTorrent client and the name of the file you seed is in Russian (for example), it will put two name/path tags in the torrent file: name and name.utf-8, plus path and path.utf-8.
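To make the layout concrete, here is a minimal sketch of how the two name keys would sit next to each other in a bencoded dictionary. It assumes, as this report describes, that the regular `name` field holds locale-encoded bytes (windows-1251 here) while `name.utf-8` holds UTF-8. The class and method names (`TorrentFieldLayout`, `bencodeDict`, `exampleInfoFragment`) are hypothetical and are not taken from the BiglyBT code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class TorrentFieldLayout {

    // The Russian word "Privet" in windows-1251, hard-coded so that no
    // optional charset has to be available at runtime (illustrative data).
    static final byte[] LEGACY_NAME = {
        (byte) 0xCF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2, (byte) 0xE5, (byte) 0xF2
    };

    // The same word as a Java string; Unicode escapes keep the source ASCII-only.
    static final String UNICODE_NAME = "\u041F\u0440\u0438\u0432\u0435\u0442";

    // Bencode a byte string as "<length>:<bytes>".
    static void writeBytes(ByteArrayOutputStream out, byte[] value) {
        byte[] prefix = (value.length + ":").getBytes(StandardCharsets.US_ASCII);
        out.write(prefix, 0, prefix.length);
        out.write(value, 0, value.length);
    }

    // Bencode a flat dictionary of byte strings as "d<key><value>...e".
    // TreeMap iteration keeps the (ASCII) keys sorted, as bencoding requires.
    static byte[] bencodeDict(TreeMap<String, byte[]> dict) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('d');
        for (Map.Entry<String, byte[]> e : dict.entrySet()) {
            writeBytes(out, e.getKey().getBytes(StandardCharsets.US_ASCII));
            writeBytes(out, e.getValue());
        }
        out.write('e');
        return out.toByteArray();
    }

    // Hypothetical fragment of an info dictionary carrying both name keys.
    static byte[] exampleInfoFragment() {
        TreeMap<String, byte[]> dict = new TreeMap<>();
        dict.put("name", LEGACY_NAME);
        dict.put("name.utf-8", UNICODE_NAME.getBytes(StandardCharsets.UTF_8));
        return bencodeDict(dict);
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps each byte to one char, so the key structure stays visible.
        System.out.println(new String(exampleInfoFragment(), StandardCharsets.ISO_8859_1));
    }
}
```

The printed fragment has the shape `d4:name6:<6 legacy bytes>10:name.utf-812:<12 UTF-8 bytes>e`: the same name is stored twice, in two encodings, under two keys.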

When another user with a different BitTorrent client then opens such a torrent file, their client ignores the *.utf-8 fields (because they are not in BEP 3) and tries to decode the regular path/name fields as UTF-8.
This results in broken characters in the names of the files you try to download.
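The failure mode can be reproduced in a few lines. This is only an illustration, assuming the regular field was written in windows-1251; the bytes are hard-coded and the class name `MojibakeDemo` is made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // The Russian word "Privet" as windows-1251 bytes (illustrative data).
    static final byte[] LEGACY_NAME = {
        (byte) 0xCF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2, (byte) 0xE5, (byte) 0xF2
    };

    // What a client does when it assumes the regular field is UTF-8.
    // new String(...) replaces malformed sequences with U+FFFD instead of failing.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = decodeAsUtf8(LEGACY_NAME);
        // The legacy bytes are not valid UTF-8, so replacement characters appear.
        System.out.println("contains U+FFFD: " + broken.contains("\uFFFD"));

        // The same name written as UTF-8 survives the round trip unchanged.
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442";
        String roundTrip = decodeAsUtf8(original.getBytes(StandardCharsets.UTF_8));
        System.out.println("round trip ok: " + roundTrip.equals(original));
    }
}
```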

A few BitTorrent clients added a special case for Azureus's *.utf-8 fields a long time ago (qBittorrent, for example), but most others (KTorrent, for example) have not and still have issues with torrent files generated by Azureus/Vuze.
It seems (not tested, but judging by your code) that this client reproduces the problem too.

Could you please fix this behavior in your client?

@TuxPaper
Member

You have a good point about torrent creation, the main file name/path fields should be written in UTF-8. I'll check that out to make sure we are doing that. The torrent creation code is terribly old and many cobwebs are probably covering it.

You can ignore the rest of my comment if you don't want to read me ranting :P :)

The ".utf-8" keys may not be in the spec, but even uTorrent uses them (and even prioritizes them, last I checked) over the non-".utf-8" ones. In addition, the spec doesn't disallow additional keys, such as ".utf-8" or the common "md5sum" one.

We also can't rely on BEP 3 fully, since BT-Inc has been known to change the spec and break backwards compatibility, specifically regarding UTF-8. The original spec never required filenames to be UTF-8, and a lot of early clients created torrents using the user's current locale. When BT-Inc changed the spec, it broke all those torrents, so most clients still have to deal with non-UTF-8 names. Honestly, I can't remember who came up with the ".utf-8" field in the first place, but at the time it was a great way to ensure one could store the names in UTF-8 while being backwards compatible.
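A reader that copes with both old and new torrents could look roughly like the sketch below. It is not how BiglyBT, uTorrent or any other client actually does it; `TorrentNameReader`, `strictUtf8` and `readField` are hypothetical names, and the fallback charset is left to the caller because guessing the creator's locale is outside the scope of the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class TorrentNameReader {

    // Strict UTF-8 decode: returns null on invalid input instead of
    // silently substituting U+FFFD.
    static String strictUtf8(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Assumed lookup order: "<key>.utf-8" first, then "<key>" as UTF-8,
    // then "<key>" in a caller-supplied legacy charset.
    static String readField(Map<String, byte[]> dict, String key, Charset fallback) {
        byte[] utf8Variant = dict.get(key + ".utf-8");
        if (utf8Variant != null) {
            String decoded = strictUtf8(utf8Variant);
            if (decoded != null) {
                return decoded;
            }
        }
        byte[] plain = dict.get(key);
        if (plain == null) {
            return null; // field absent altogether
        }
        String decoded = strictUtf8(plain);
        return decoded != null ? decoded : new String(plain, fallback);
    }
}
```

Calling `readField(info, "name", StandardCharsets.ISO_8859_1)` on a dictionary that carries `name.utf-8` returns that value; without it, the plain `name` is tried as UTF-8 and only then decoded with the fallback charset.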

@anacrolix

@parg what triggered this being completed? Is there a relevant commit?

@parg
Contributor

parg commented Jan 14, 2024

Nope, cleaning house

@parg parg reopened this Jan 14, 2024
@parg
Contributor

parg commented Jan 14, 2024

Adding random shit (such as .utf entries) is not in violation of the spec; any client that can't handle additional keys is broken.
