Bencode should support [u8] lookups #60

Closed
the8472 opened this issue Dec 28, 2016 · 4 comments

Comments

the8472 commented Dec 28, 2016

Rust strings must be valid UTF-8. Bencode is a binary format whose dictionary keys are byte strings that usually, but not always, happen to be ASCII or UTF-8.

In some cases the keys are not valid UTF-8 sequences. For example, HTTP scrape responses use binary infohashes as dictionary keys.
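To illustrate the point above, here is a minimal sketch using only the standard library. `RawDict`, `sample_scrape`, and `lookup_bytes` are hypothetical names made up for this example and are not part of bip_bencode; the dictionary values are plain integers for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a decoded bencode dictionary: keys are raw
/// bytes, values are just counters to keep the example short.
type RawDict = BTreeMap<Vec<u8>, u64>;

/// Builds a tiny scrape-like dictionary with one binary key (the infohash)
/// and one ASCII key.
fn sample_scrape(infohash: &[u8; 20]) -> RawDict {
    let mut files = RawDict::new();
    files.insert(infohash.to_vec(), 12);
    files.insert(b"complete".to_vec(), 3);
    files
}

/// Looks a key up by raw bytes; no UTF-8 validation is involved, so
/// binary keys and ASCII keys are handled the same way.
fn lookup_bytes(dict: &RawDict, key: &[u8]) -> Option<u64> {
    dict.get(key).copied()
}
```

With an infohash such as `[0xff; 20]`, `std::str::from_utf8` returns an error, so a `&str`-only lookup API could never be given that key, while `lookup_bytes` finds it.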

Owner

GGist commented Dec 28, 2016

Thanks for bringing this up.

This was a case of me misinterpreting the specification. I read that dictionary keys had to be strings, and then read that metainfo file dictionary keys must be UTF-8 and ended up conflating the two. At the time, I figured the specification was just loosely worded (as it tends to be), but it seems reasonable that bencode dictionary keys don't have to be UTF-8.

Luckily, this should be an easy fix. I believe most of the dependency on the Bencode object goes through BencodeConvert, and so far all of my key lookups have used valid UTF-8. So I should just have to modify the actual Bencode object and then, to avoid breaking dependent libraries, either add methods to BencodeConvert that also accept [u8] lookups, or update the existing methods to be generic over both str and [u8].
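As a rough sketch of the second option (methods generic over both str and [u8]), one common approach is an `AsRef<[u8]>` bound, which both types satisfy. The `Dict` type and its methods below are hypothetical and are not the actual bip_bencode or BencodeConvert API.

```rust
use std::collections::BTreeMap;

/// Hypothetical dictionary type that stores keys and values as raw bytes.
struct Dict {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Dict {
    fn new() -> Self {
        Dict { entries: BTreeMap::new() }
    }

    /// Accepts any key that can be viewed as bytes (&str, &[u8], arrays, ...).
    fn insert<K: AsRef<[u8]>>(&mut self, key: K, value: &[u8]) {
        self.entries.insert(key.as_ref().to_vec(), value.to_vec());
    }

    /// One lookup method serves both `str` and `[u8]` callers, so existing
    /// string-based call sites would keep compiling.
    fn lookup<K: AsRef<[u8]> + ?Sized>(&self, key: &K) -> Option<&[u8]> {
        self.entries.get(key.as_ref()).map(|v| v.as_slice())
    }
}
```

Under this design, `dict.lookup("name")` and `dict.lookup(&[0xffu8, 0xfe][..])` both go through the same code path. The trade-off is that every lookup method becomes generic, which is why adding separate byte-oriented methods is the other option mentioned above.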

I will probably push a fix by tomorrow.

sijanec commented Mar 8, 2023

Some clients do not encode name as UTF-8, contrary to what BEP-0003 requires. I am in the process of rewriting a simple torrent analysis script from Python to Rust using your library, and many torrents have non-UTF-8 characters in file names. Naturally this leads to an Err, because name and many other strings are parsed with lookup_and_convert_str.

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseError(BencodeConvert(BencodeConvertError(WrongType { key: [110, 97, 109, 101], expected_type: "UTF-8 Bytes" }, State { next_error: None, backtrace: Some(   0: error_chain::make_backtrace

I have only known Rust for an hour or two, so I absolutely can't help you with a PR or a possible solution to this problem just yet (:

You can download a "test suite" of torrent files that I'm analysing (many have non-UTF-8 strings) -- cumulatively 50k .torrent files -- via rsync from the following two servers:

rsync -avi 83.212.126.242::travnik /tmp/dest # ~10k torrents
rsync-ssl -avi tranzistor.sijanec.eu::travnik /tmp/dest # ~40k torrents

By the way, does bip_metainfo support V2 torrents as well?

Author

the8472 commented Mar 8, 2023

This issue is about [u8] support for dictionary keys. For values you should already be able to get them by calling bytes() instead of str() in bip_bencode.

But if you want to handle non-utf8 filenames then yeah, bip_metainfo seems to do the wrong thing at the moment and you'll have to drop to a lower level, using bip_bencode or one of the serde-bencode plugins instead.
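Once the raw bytes of a value are in hand, a minimal way to cope with non-UTF-8 file names might look like the sketch below. It uses only the standard library; `name_strict` and `name_lossy` are hypothetical helper names, and how the bytes are obtained from bip_bencode is left out on purpose.

```rust
use std::borrow::Cow;

/// Strict conversion: fails on any byte sequence that is not valid UTF-8,
/// which is the behaviour that produces the Err reported above.
fn name_strict(raw: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(raw)
}

/// Lossy conversion: invalid sequences are replaced with U+FFFD, so the
/// result is always displayable but may not round-trip to the original bytes.
fn name_lossy(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}
```

For example, the Latin-1 bytes `b"caf\xe9.txt"` are rejected by `name_strict`, while `name_lossy` yields `"caf\u{FFFD}.txt"`. If the exact name matters (for instance to create the file on disk), keeping the original bytes alongside the lossy string is the safer choice.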

sijanec commented Mar 10, 2023

Thank you for the reply!
