Encode error #12

Closed
limpbrains opened this issue Mar 16, 2012 · 20 comments
@limpbrains

I can't run srt on this file: http://dl.dropbox.com/u/1788271/Bones.S07E01.HDTVRip.srt
It is encoded in cp1251.
I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/srt", line 9, in <module>
    load_entry_point('pysrt==0.4.1', 'console_scripts', 'srt')()
  File "/usr/local/lib/python2.7/dist-packages/pysrt/commands.py", line 190, in main
    SubRipShifter().run(sys.argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/pysrt/commands.py", line 118, in run
    self.arguments.action()
  File "/usr/local/lib/python2.7/dist-packages/pysrt/commands.py", line 164, in break_lines
    self.input_file.break_lines(self.arguments.length)
  File "/usr/local/lib/python2.7/dist-packages/pysrt/commands.py", line 177, in input_file
    encoding=encoding, error_handling=SubRipFile.ERROR_LOG)
  File "/usr/local/lib/python2.7/dist-packages/pysrt/srtfile.py", line 131, in open
    new_file.read(source_file, error_handling=error_handling)
  File "/usr/local/lib/python2.7/dist-packages/pysrt/srtfile.py", line 159, in read
    self.extend(self.stream(source_file, error_handling=error_handling))
  File "/usr/lib/python2.7/UserList.py", line 88, in extend
    self.data.extend(other)
  File "/usr/local/lib/python2.7/dist-packages/pysrt/srtfile.py", line 190, in stream
    yield SubRipItem.from_lines(source)
  File "/usr/local/lib/python2.7/dist-packages/pysrt/srtitem.py", line 79, in from_lines
    return cls(index, start, end, body, position)
  File "/usr/local/lib/python2.7/dist-packages/pysrt/srtitem.py", line 21, in __init__
    self.index = int(index)
UnicodeEncodeError: 'decimal' codec can't encode character u'\ufeff' in position 0: invalid decimal Unicode string
@byroot
Owner

byroot commented Mar 17, 2012

Strange, I'm able to shift it without any encoding error.

srt shift 20 russian.srt

Can you paste the whole command you typed?

@ghost ghost assigned byroot Mar 17, 2012
@byroot
Owner

byroot commented Apr 11, 2012

Well, a month without a reply, so I'm closing this issue.

Feel free to reopen it if you still have a problem.

@byroot byroot closed this as completed Apr 11, 2012
@limpbrains
Author

Hi, sorry for the late response.

srt shift 40s 33.srt
Traceback (most recent call last):
  File "/usr/local/bin/srt", line 9, in <module>
    load_entry_point('pysrt==0.4.1', 'console_scripts', 'srt')()
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 192, in main
    SubRipShifter().run(sys.argv[1:])
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 118, in run
    self.arguments.action()
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 136, in shift
    self.input_file.shift(milliseconds=self.arguments.time_offset)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 179, in input_file
    encoding=encoding, error_handling=SubRipFile.ERROR_LOG)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtfile.py", line 127, in open
    new_file.read(source_file, error_handling=error_handling)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtfile.py", line 155, in read
    self.extend(self.stream(source_file, error_handling=error_handling))
  File "/usr/lib/python2.7/UserList.py", line 88, in extend
    self.data.extend(other)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtfile.py", line 186, in stream
    yield SubRipItem.from_lines(source)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtitem.py", line 58, in from_lines
    return cls(index, start, end, body, position)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtitem.py", line 21, in __init__
    self.index = int(index)
UnicodeEncodeError: 'decimal' codec can't encode character u'\ufeff' in position 0: invalid decimal Unicode string

python -V
Python 2.7.2+

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.10
Release: 11.10
Codename: oneiric

@byroot
Owner

byroot commented Apr 16, 2012

Hm, very strange... so it always happens, whatever the subtitle file?

And how did you install it? Because /data/share/_films/Game of Thrones_S02E02/src/ is a very strange location...

@limpbrains
Author

I've only tried it on a few files, all Russian, UTF-8.
I installed it from git:
pip install -e git+https://github.com/byroot/pysrt.git#egg=pysrt

@byroot byroot reopened this Apr 16, 2012
@byroot
Owner

byroot commented Apr 16, 2012

OK, I still can't reproduce it, but now I'm almost sure that it's a BOM issue...

I will ask a friend on Ubuntu to test that.

Did you try the version released on PyPI?
pip install --upgrade pysrt

@limpbrains
Author

I confirm it is a BOM issue.
I successfully edited a file without a BOM, created with Notepad++.
I also tried the following command:
srt -e utf_8_sig ...
but it failed with the same error.
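
For illustration only, here is a minimal sketch of what Python's two UTF-8 codecs do with a leading BOM when used directly. The sample bytes are made up, and the snippet says nothing about why the `-e utf_8_sig` option still failed in the CLI at the time; it only shows the codec behaviour in isolation.

```python
# Illustration only: how the two UTF-8 codecs treat a leading BOM.
# `raw` is a made-up sample, not taken from the reported file.
raw = b"\xef\xbb\xbf1\r\n00:00:01,677 --> 00:00:04,145\r\n"

with_bom = raw.decode("utf-8")         # the BOM survives as U+FEFF
without_bom = raw.decode("utf-8-sig")  # the BOM is consumed by the codec

print(repr(with_bom[:2]))     # first character is the BOM
print(repr(without_bom[:2]))  # first character is the index digit
```

The plain `utf-8` codec keeps the BOM as the first character of the decoded text, which is exactly the character that `int()` later rejects.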

@byroot
Owner

byroot commented Apr 17, 2012

pysrt is supposed to handle the BOM correctly...

And the file you gave me is in cp1252, so why did it have a UTF-8 BOM?
Can you send me another file?

@Diaoul

Diaoul commented Dec 9, 2012

I'm having the same issue
File is here: https://docs.google.com/open?id=0B2q9iBGZdj6qN29uUzBBQXNJM2c

@byroot byroot closed this as completed in f780a06 Dec 14, 2012
@byroot
Owner

byroot commented Dec 14, 2012

I finally found the issue, it was because chardet returned "UTF-8" and the encodings module was only aware of "utf-8".

My bad ...
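
A minimal sketch of this class of bug, assuming a lookup table keyed by lowercase codec names. The `BOMS` table and `normalize_encoding_name` helper are hypothetical names for illustration and are not the code from commit f780a06; only `codecs.lookup` and `codecs.BOM_UTF8` are real standard-library names.

```python
import codecs

def normalize_encoding_name(name):
    """Return Python's canonical codec name, e.g. 'UTF-8' -> 'utf-8'."""
    return codecs.lookup(name).name

# Hypothetical table keyed by canonical (lowercase) codec names.
BOMS = {"utf-8": codecs.BOM_UTF8}

detected = "UTF-8"  # the spelling a detector such as chardet may report

print(detected in BOMS)                           # False: case mismatch
print(normalize_encoding_name(detected) in BOMS)  # True after normalizing
```

Normalizing through `codecs.lookup` also covers aliases such as `utf_8` or `UTF8`, which a plain `.lower()` would not.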

@Diaoul

Diaoul commented Jan 13, 2013

Is this fixed in 0.4.4? Because I still have this error

@byroot
Owner

byroot commented Jan 13, 2013

I think so. Do you still have the issue with the same file and pysrt 0.4.4?

@byroot
Owner

byroot commented Jan 13, 2013

Oh shit ... confirmed, I'll fix that right now.

@byroot
Owner

byroot commented Jan 13, 2013

Oh, I just forgot to release ...

@byroot
Owner

byroot commented Jan 13, 2013

0.4.5 released with the fix.

@Diaoul

Diaoul commented Jan 13, 2013

Thanks, that was fast :)

@Diaoul

Diaoul commented Jan 13, 2013

I'm still having an error 😢
I added a print statement to see what's in lines here and I got this:

[u'\ufeff1\r\n', u'00:00:01,677 --> 00:00:04,145\r\n', u'Alors, sur quel genre de croisi\xe8re\r\n', u'allez-vous embarquer ?\r\n']

@Diaoul

Diaoul commented Jan 13, 2013

Of course int(u'\ufeff1\r\n') fails
File can be downloaded on Addic7ed
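
The failure can be shown without pysrt at all. The sketch below is written for Python 3, where the same call raises a plain `ValueError` (on Python 2.7 it surfaces as the `UnicodeEncodeError` seen in the tracebacks). `parse_index` is a hypothetical helper, not a pysrt function.

```python
BOM = "\ufeff"

def parse_index(line):
    """Parse a subtitle index line, ignoring any leading BOM."""
    # int() already tolerates the trailing '\r\n', but not the BOM.
    return int(line.lstrip(BOM))

first_line = "\ufeff1\r\n"  # the first item of the printed list

try:
    int(first_line)
except ValueError as exc:
    print("int() rejected the raw line:", exc)

print(parse_index(first_line))
```

Stripping the BOM at parse time is only a workaround; decoding the file with a BOM-aware codec avoids the stray character in the first place.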

@Diaoul
Copy link

Diaoul commented Jan 13, 2013

Sample code to reproduce the error:

from charade.universaldetector import UniversalDetector
import codecs
import pysrt

def is_valid_subtitle(path):
    u = UniversalDetector()
    for line in open(path, 'rb'):
        u.feed(line)
    u.close()
    encoding = u.result['encoding']
    source_file = codecs.open(path, 'rU', encoding=encoding, errors='replace')
    try:
        for _ in pysrt.SubRipFile.stream(source_file, error_handling=pysrt.SubRipFile.ERROR_RAISE):
            pass
    except pysrt.Error as e:
        if e.args[0] < 50:  # Error occurs within the 50 first lines
            return False
#    except UnicodeEncodeError:  # Workaround for https://github.com/byroot/pysrt/issues/12
#        pass
    return True

@byroot
Owner

byroot commented Jan 13, 2013

Oh! It makes sense now. If you open the file yourself, pysrt does not strip the BOM.

Anyway chardet is integrated inside pysrt now.

Try something like:

def is_valid_subtitle(path):
    source_file = pysrt.SubRipFile._open_unicode_file(path)
    try:
        for _ in pysrt.SubRipFile.stream(source_file, error_handling=pysrt.SubRipFile.ERROR_RAISE):
            pass
    except pysrt.Error as e:
        if e.args[0] < 50:  # Error occurs within the 50 first lines
            return False
#    except UnicodeEncodeError:  # Workaround for https://github.com/byroot/pysrt/issues/12
#        pass
    return True
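
For callers who would rather not depend on the private `_open_unicode_file` helper, one possible alternative is to check the raw bytes for a BOM and pick a BOM-aware codec before decoding. This is a sketch under stated assumptions, not pysrt's implementation: `sniff_bom_codec` and `decode_subtitle_bytes` are hypothetical names, and the `utf-8` fallback is an arbitrary choice that a detector result could replace.

```python
import codecs

# Checked in order: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes,
# so the longer signatures must be tested first.
_BOM_TO_CODEC = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

def sniff_bom_codec(raw):
    """Return a BOM-consuming codec name for `raw`, or None if no BOM."""
    for bom, codec in _BOM_TO_CODEC:
        if raw.startswith(bom):
            return codec
    return None

def decode_subtitle_bytes(raw, fallback="utf-8"):
    """Decode bytes, letting a BOM (if present) override the fallback."""
    return raw.decode(sniff_bom_codec(raw) or fallback, errors="replace")
```

The `utf-8-sig`, `utf-16`, and `utf-32` codecs all consume the BOM while decoding, so the resulting text starts directly with the first subtitle index.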
