I am trying to run videogrep on an English-language video in which brief lines of Spanish occasionally appear. Subtitles containing non-English characters seem to cause a Unicode decode error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 19669: invalid continuation byte
I can work around this by replacing the accented characters with unaccented ones in the subtitle file, but could this be handled programmatically, without altering the original subtitle file? I'm not sure how common it is for English-language subtitles to carry correct non-English accent marks.
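A minimal sketch of what programmatic handling could look like, assuming each subtitle file is either UTF-8 or a legacy Windows-1252/Latin-1 encoding (common for Spanish text). The names `decode_subtitle_bytes`, `strip_accents`, and `read_subtitle_text` are hypothetical, not part of videogrep's API; the file on disk is only read, never rewritten.

```python
import unicodedata
from pathlib import Path


def decode_subtitle_bytes(raw: bytes) -> str:
    """Hypothetical helper: decode subtitle bytes, falling back from UTF-8 to legacy encodings."""
    # utf-8-sig accepts plain UTF-8 and also strips a leading byte-order mark
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this last fallback cannot fail
    return raw.decode("latin-1")


def strip_accents(text: str) -> str:
    """Hypothetical helper: turn 'í' into 'i', 'ñ' into 'n', and so on."""
    # NFKD splits an accented letter into its base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def read_subtitle_text(path, remove_accents: bool = False) -> str:
    """Hypothetical helper: read a subtitle file without modifying it on disk."""
    text = decode_subtitle_bytes(Path(path).read_bytes())
    return strip_accents(text) if remove_accents else text
```

The order of the fallbacks matters: cp1252 accepts almost any byte sequence, so trying it before UTF-8 would silently turn a real UTF-8 "í" into mojibake such as "Ã­". Stripping accents is optional and only needed if you want the unaccented text the workaround above produces by hand.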
I tried to work around this by converting every subtitle file to ASCII with `find . -type f -name '*.vtt' -print -exec iconv -c -f utf-8 -t ascii {} -o {} \;`, but it didn't resolve the issue, which makes me think I'm overlooking something.
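One possible explanation, assuming the offending file is Windows-1252 or Latin-1 encoded rather than UTF-8: byte 0xED is `í` in those encodings, but in UTF-8 it can only begin a three-byte sequence, so a UTF-8 decoder rejects it as soon as the next byte is not a valid continuation byte. If that is the case, `iconv -f utf-8` is being given the wrong source encoding. This snippet reproduces the same kind of error on a made-up Spanish line; `describe_utf8_error` is a hypothetical helper.

```python
# Made-up sample line, encoded the way a Windows-1252 subtitle file would store it
raw = "Sí, señor".encode("cp1252")  # b'S\xed, se\xf1or'


def describe_utf8_error(data: bytes):
    """Return (reason, offending byte, offset) if data is not valid UTF-8, else None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.reason, data[err.start], err.start
    return None


print(describe_utf8_error(raw))  # ('invalid continuation byte', 237, 1)
print(raw.decode("cp1252"))      # Sí, señor
```

If this matches what you see, converting with the actual source encoding (for example `iconv -f CP1252 -t UTF-8`) into a separate output file may work better than `-f utf-8 -t ascii`.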
Also, with 80+ files and an error message that doesn't name the offending file, I have no idea which file is causing the problem.
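To find the culprit among many files, a small standalone script can try to decode each one as UTF-8 and report where decoding fails. This is a sketch, not a videogrep feature; `find_non_utf8_files` is a hypothetical name, and it assumes the subtitles are `.vtt` files somewhere under the given directory.

```python
from pathlib import Path


def find_non_utf8_files(root=".", pattern="*.vtt"):
    """Yield (path, byte offset, offending byte) for each file that is not valid UTF-8."""
    for path in sorted(Path(root).rglob(pattern)):
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte the decoder rejected
            yield path, err.start, raw[err.start]
```

Looping over `find_non_utf8_files(".")` and printing each tuple lists every offending file. If the tool decodes each file in one piece, the reported offset should correspond to the "position" in the traceback (19669 here), which helps confirm the right file was found.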