Apostrophe error #411
Tartube stores two names for every video: a name matching the video's filename, and a 'nickname' taken from the video's metadata. Open the video's properties window. The name is at the top; the nickname is the one in the **Listed as** box. The nickname is also the one visible in your screenshots. I checked that video's metadata, and it contains a character which is not supposed to be used as an apostrophe (’). Check it for yourself here. The nickname is just for aesthetics; it doesn't affect Tartube operations or your filesystem. As far as I can tell, the text is being rendered correctly. You can write to the video's author, if you like, and politely suggest that they learn how to use their keyboard.
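To check which character the metadata actually contains, a short Python sketch can list the Unicode code point and name of every non-ASCII character in a string. The `describe_chars` helper is hypothetical and not part of Tartube; the sample title is taken from the download log in this thread.

```python
import unicodedata


def describe_chars(text):
    """Return 'U+XXXX NAME' for every non-ASCII character in text (hypothetical helper)."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
        if ord(ch) > 127
    ]


# Title as it appears in the video's metadata (copied from the download log)
nickname = "UNMASKED Dinesh D’Souza Podcast Ep315"
print(describe_chars(nickname))          # ['U+2019 RIGHT SINGLE QUOTATION MARK']
print(describe_chars("Dinesh D'Souza"))  # [] (plain ASCII apostrophe, U+0027)
```

The uploader name uses the ASCII apostrophe, while the title uses U+2019, which is why the two can look different once an encoding step goes wrong.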
So I think I've figured out the root cause of this issue: it's another yt-dlp change, made after the 20211227 release, that seems to have broken things. I went to my test db and reverted to that release of yt-dlp, and sure enough everything was fine in the name and description: So I figured I'd work my way back up to the current yt-dlp release and see where things break, but I didn't have to go far, as the January release no longer displayed correctly: I went to yt-dlp, searched through all the issues for unicode, and found the culprit: Which inspired this change to the code that, you guessed it, was made live in the January release: Since this is yt-dlp's default behavior going forward, is it possible to have Tartube read the description and info.json files as Unicode so that the characters display correctly? Thanks Axcore!
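The thread doesn't spell out what the yt-dlp change was. Assuming it is that non-ASCII characters are now written to disk as raw UTF-8 instead of being escaped (Python's `json` module escapes them by default), this sketch shows the difference and reads the file back with an explicit `encoding="utf-8"` instead of the system's default encoding. The file name `example.info.json` and the `read_info_json` helper are hypothetical; this is not Tartube's code.

```python
import json
import os
import tempfile

title = "UNMASKED Dinesh D’Souza Podcast Ep315"

# Default (ensure_ascii=True): non-ASCII is escaped as \u2019, so the file is pure ASCII
escaped = json.dumps({"title": title})
# ensure_ascii=False: the character itself is written and must be stored as UTF-8
raw = json.dumps({"title": title}, ensure_ascii=False)


def read_info_json(path):
    """Read a .info.json file as UTF-8, regardless of the system locale (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.info.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(raw)
    data = read_info_json(path)

print(data["title"])
```

Escaped files decode identically under any ASCII-compatible encoding, which would explain why the problem only appeared once raw UTF-8 started being written.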
Spent a couple more hours playing around with this tonight, and it looks like the issue is that the JSON/description is being encoded to UTF-8 twice (first by yt-dlp, then by Tartube). (I'm testing on the description file because that was the easiest for me to figure out how to hack a test for.) With the latest Tartube and the 2022-04-08 yt-dlp, this is how the filename and description are displayed: But if I change downloads.py#L4529 to not encode as UTF-8, it displays correctly on my system, with the encoding detected as cp1252: Or, if I leave downloads.py alone but change utils.py#L2855 to force the system encoding to UTF-8, it also displays correctly (but that breaks filenames on Windows): [an hour later] Maybe this is the proper fix? Any thoughts on whether it will break something else? I'm going to submit a PR with that change in the hope that it works, or at least serves as a base for a fix :) Thanks Axcore!
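The "encoded twice" effect can be reproduced in a few lines of Python. This is a minimal illustration, assuming the system code page is cp1252 (the encoding reported in the comment); it does not reproduce Tartube's actual code in downloads.py or utils.py.

```python
# The title character as it appears in the metadata: U+2019
original = "D’Souza"

# Correct path: UTF-8 bytes on disk, decoded once as UTF-8
utf8_bytes = original.encode("utf-8")  # b'D\xe2\x80\x99Souza'
decoded_once = utf8_bytes.decode("utf-8")

# Faulty path: the same bytes are decoded with the Windows ANSI code page
# (cp1252) and then encoded to UTF-8 a second time
double_encoded = utf8_bytes.decode("cp1252").encode("utf-8")
shown = double_encoded.decode("utf-8")

print(decoded_once)  # D’Souza
print(shown)         # Dâ€™Souza

# Repair: undo the extra step (only safe if the text really was double-encoded)
repaired = shown.encode("cp1252").decode("utf-8")
```

The three bytes of U+2019 (E2 80 99) map to 'â', '€' and '™' in cp1252, which is the classic signature of this bug.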
Nice work @ceonelson, I don't think I would have solved that without you.
Win10 / 2.4
If I check a video, the name shows correctly in Tartube:
python3 D:\_YT\yt-dlp-20220408 --newline -i --hls-prefer-native --write-description --write-info-json --write-annotations --cookies D:/_ytt/cookies.txt --write-thumbnail --merge-output-format mkv --write-sub --embed-thumbnail --add-metadata --windows-filenames --convert-thumbnails jpg --sub-lang en --output D:/_ytt/Comedy/%(uploader)s - (%(upload_date)s) - %(title)s - %(id)s - [%(format_id)s#%(height)sp].%(ext)s --get-comments --extractor-args youtube:comment_sort=top -f bestvideo[ext=webm][height<=?480][fps<=?30]+bestaudio[ext=webm]/bestvideo[height<=?480][fps<=?30]+bestaudio/best --dump-json --download-archive D:/_ytt/Comedy/ytdl-archive.txt https://www.youtube.com/watch?v=GVtEzGZP-_s
[Comedy] <Simulated download of: 'Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p]'>
But upon downloading it, the apostrophe gets messed up:
python3 D:\_YT\yt-dlp-20220408 --newline -i --hls-prefer-native --write-description --write-info-json --write-annotations --cookies D:/_ytt/cookies.txt --write-thumbnail --merge-output-format mkv --write-sub --embed-thumbnail --add-metadata --windows-filenames --convert-thumbnails jpg --sub-lang en --output D:/_ytt/Comedy/%(uploader)s - (%(upload_date)s) - %(title)s - %(id)s - [%(format_id)s#%(height)sp].%(ext)s --get-comments --extractor-args youtube:comment_sort=top -f bestvideo[ext=webm][height<=?480][fps<=?30]+bestaudio[ext=webm]/bestvideo[height<=?480][fps<=?30]+bestaudio/best --download-archive D:/_ytt/Comedy/ytdl-archive.txt https://www.youtube.com/watch?v=GVtEzGZP-_s
[youtube] GVtEzGZP-_s: Downloading webpage
[youtube] GVtEzGZP-_s: Downloading android player API JSON
[youtube] Downloading comment section API JSON
[youtube] Downloading ~140 comments
[youtube] Sorting comments by top comments
[youtube] Downloading comment API JSON page 1 (0/140)
[youtube] Downloading comment API JSON reply thread 1 (6/140)
[youtube] Downloading comment API JSON reply thread 2 (9/140)
[youtube] Downloading comment API JSON reply thread 3 (15/140)
[youtube] Downloading comment API JSON reply thread 4 (23/140)
[youtube] Downloading comment replies API JSON page 1 (33/140)
[youtube] Downloading comment API JSON page 2 (47/140)
[youtube] Downloading comment API JSON reply thread 1 (49/140)
[youtube] Downloading comment replies API JSON page 1 (59/140)
[youtube] Downloading comment API JSON reply thread 2 (79/140)
[youtube] Downloading comment API JSON reply thread 3 (82/140)
[youtube] Downloading comment API JSON page 3 (96/140)
[youtube] Downloading comment API JSON reply thread 1 (104/140)
[youtube] Downloading comment API JSON reply thread 2 (109/140)
[youtube] Downloading comment API JSON reply thread 3 (111/140)
[youtube] Downloading comment API JSON reply thread 4 (113/140)
[youtube] Extracted 114 comments
[info] GVtEzGZP-_s: Downloading 1 format(s): 244+251
[info] Writing video description to: D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].description
[info] Downloading video thumbnail 41 ...
[info] Writing video thumbnail 41 to: D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].webp
[info] Writing video metadata as JSON to: D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].info.json
WARNING: There are no annotations to write.
[ThumbnailsConvertor] Converting thumbnail "D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].webp" to jpg
Deleting original file D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].webp (pass -k to keep)
[download] Destination: D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].f244.webm
[download] 100% of 125.51MiB in 00:27
[download] Destination: D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].f251.webm
[download] 100% of 42.52MiB in 00:07
[Merger] Merging formats into "D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].mkv"
Deleting original file D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].f244.webm (pass -k to keep)
Deleting original file D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].f251.webm (pass -k to keep)
[Metadata] Adding metadata to "D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].mkv"
[EmbedThumbnail] ffmpeg: Adding thumbnail to "D:/_ytt/Comedy/Dinesh D'Souza - (20220421) - UNMASKED Dinesh D’Souza Podcast Ep315 - GVtEzGZP-_s - [244+251#480p].mkv"
Tried both with and without --windows-filenames, same result.
Any ideas? Thanks Axcore!