[Possible Bug] imdl fails to deserialize torrents with .utf-8 key variants #534
Comments
Thanks for the report! I'd be fine if someone created a PR supporting this, although it might, in practice, be quite a messy PR. |
I think maybe the best way to support this would be to have a |
The torrent dump command outputs them normally, but show and all other commands fail. I think the dump command uses plain bencode. It can be fixed by copying the values of the .utf-8 fields into the normal ones. With torf, something simple like this did it for me:

#!/usr/bin/env python3
import sys

from torf import Torrent


def main(argv=None):
    t = Torrent.read("my_torrent.torrent")
    # Replace each file's path with its UTF-8 variant, then drop the variant.
    for num, my_file in enumerate(t.metainfo["info"]["files"]):
        t.metainfo["info"]["files"][num]["path"] = my_file["path.utf-8"]
        my_file.pop("path.utf-8")
    # Do the same for the torrent name.
    t.metainfo["info"]["name"] = t.metainfo["info"]["name.utf-8"]
    t.metainfo["info"].pop("name.utf-8")
    t.write("my_torrent_fixed.torrent")


if __name__ == "__main__":
    main(sys.argv)
By the way, just removing the .utf-8 keys doesn't help. In this case the name field can be in any encoding, which leads to the same error. So serde seems to strictly expect UTF-8. |
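To illustrate why dropping the .utf-8 keys alone is not enough, here is a minimal Python sketch. It assumes, purely for illustration, that the legacy name value was written as Latin-1 (the real encoding is generally unknown), and `decodes_as_utf8` is a hypothetical helper. It only shows that a strict UTF-8 decode rejects such bytes; it is not how imdl or serde work internally.

```python
# Hypothetical legacy value: "Müller" encoded as Latin-1 instead of UTF-8.
legacy_name = "M\u00fcller".encode("latin-1")
# The same name as it would appear under the name.utf-8 key.
utf8_name = "M\u00fcller".encode("utf-8")


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8 under strict decoding."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(legacy_name))
print(decodes_as_utf8(utf8_name))
```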
imdl should be able to handle these things natively, because files that get modified like this have new info hashes. This is true both for the variant where optional md5sum keys contain invalid data (not plain strings) and for .utf-8 key variants. That's at least the current state of the implementation. I guess it's possible to do it another way, but that could violate the specification. So the fix above produces a torrent with a different info hash than the original, but the data can still be verified because the piece hashes aren't touched. torf creates a new info hash when writing out the file by hashing the info dict. I guess there is no real specification for torrent modification, so it would be possible to modify the torrent without changing the info hash (just modify the bencode). BUT changing the info hash will be confusing, and if only the bencode is modified, the hash wouldn't be valid for the info dict. |
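The effect on the info hash can be sketched in plain Python. This is a minimal illustration, not torf's or imdl's code: `bencode` and `info_hash` are hypothetical helpers written here, and the info dict is made-up sample data. The info hash is the SHA-1 of the bencoded info dict, so moving name.utf-8 into name changes the hash while the pieces value stays the same.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencode encoder for ints, bytes, lists and dicts with bytes keys."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(value.items())  # bencode requires keys in sorted order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> str:
    """SHA-1 hex digest of the bencoded info dict."""
    return hashlib.sha1(bencode(info)).hexdigest()


# Made-up single-file info dict carrying a .utf-8 key variant.
original = {
    b"length": 4,
    b"name": b"M\xfcller.txt",  # assumed Latin-1 bytes
    b"name.utf-8": "M\u00fcller.txt".encode("utf-8"),
    b"piece length": 16384,
    b"pieces": hashlib.sha1(b"data").digest(),
}

# Apply the same rewrite as the fix above: promote the variant, drop the key.
fixed = dict(original)
fixed[b"name"] = fixed.pop(b"name.utf-8")

print(info_hash(original) != info_hash(fixed))
print(original[b"pieces"] == fixed[b"pieces"])
```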
I think this would just be too complex for the implementation. |
Hi,
imdl fails to deserialize torrents that contain the .utf-8 key variants name.utf-8 and path.utf-8.
They are not explicitly defined in BEP3 but seem to have been introduced by BitTorrent Inc. and are used in uTorrent; also see this.
To explain briefly:
Torrents of this type hold, e.g., both a name key and a name.utf-8 key. The values of these dict entries are encoded differently, so there are two variants of the name (I think ASCII or the system default encoding in name, and UTF-8 in the .utf-8 key). BEP3 says strings should always be UTF-8. The same is common in files dict entries, with path and path.utf-8 behaving like name and name.utf-8.
torf accepts this without problems (the files are simply valid).
This can be fixed by rewriting the torrent, e.g. with torf, using the values of the .utf-8 fields and then removing the .utf-8 variants.
Because this does not seem to violate BEP3 directly and is used in practice, I guess it's best to behave like clients do: if the .utf-8 variant exists, prefer it.
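The "prefer the .utf-8 variant" rule could look roughly like this in Python. This is a minimal sketch over an already-decoded bencode dict with bytes keys; `decode_utf8` and `preferred` are hypothetical helper names, and the sample entries are made up.

```python
def decode_utf8(value):
    """Decode a bytes value, or each element of a list of bytes, as strict UTF-8."""
    if isinstance(value, list):
        return [decode_utf8(v) for v in value]
    return value.decode("utf-8")


def preferred(entry: dict, key: bytes):
    """Return entry[key + b'.utf-8'] if that variant exists, else entry[key]."""
    variant = key + b".utf-8"
    raw = entry[variant] if variant in entry else entry[key]
    return decode_utf8(raw)


# Made-up entries: the legacy values are assumed to be Latin-1 bytes.
info = {b"name": b"M\xfcller", b"name.utf-8": "M\u00fcller".encode("utf-8")}
file_entry = {
    b"path": [b"dir", b"M\xfcller.txt"],
    b"path.utf-8": [b"dir", "M\u00fcller.txt".encode("utf-8")],
}

print(preferred(info, b"name"))
print(preferred(file_entry, b"path"))
```

When no variant is present, the sketch falls back to the plain key, which per BEP3 should already be UTF-8.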
I can provide a sample from the wild via Discord.
Best Regards