rtf reader: non-ASCII metadata causes UnicodeDecodeError (OpenOffice RTF files) #3

joka · 2010-08-04T19:59:04Z

I have OpenOffice RTF files with non-ASCII metadata (author):

{\info{\author Claudia Jürgens}{\creatim\yr2010\mo7\dy19\hr12\min45}{\author Claudia Jürgens}
{\revtim\yr2010\mo7\dy28\hr13\min27}{\printim\yr0\mo0\dy0\hr0\min0}{\comment    
StarWriter}{\vern3000}}\deftab709

This causes UnicodeDecodeError:

Module pyth.plugins.rtf15.reader, line 93, in read
Module pyth.plugins.rtf15.reader, line 113, in go                                           
Module pyth.plugins.rtf15.reader, line 147, in parse
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 0: ordinal not in range(128)
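
The failure can be reproduced in isolation. This is a minimal sketch in Python 3 (the reader itself is Python 2, where `unicode(next)` on a byte string implicitly decodes as ASCII). The `decode_ascii` helper is hypothetical, and treating the file as Windows-1252 is an assumption based on byte 0xfc being "ü" in that code page:

```python
# Byte 0xfc is "ü" in Windows-1252 / Latin-1, but it lies outside ASCII (0-127).
raw = b"\xfc"


def decode_ascii(data):
    """Mimic the implicit ASCII decode that Python 2's unicode(str) performs."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.reason carries the same message seen in the traceback above.
        return "error: %s" % exc.reason


ascii_result = decode_ascii(raw)      # fails: byte is not ASCII
cp1252_result = raw.decode("cp1252")  # succeeds with an 8-bit code page

print(ascii_result)
print(cp1252_result)
```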

This patch just catches the error and substitutes '?' for the offending character:

*** reader.py  2010-05-04 21:48:14.000000000 +0200 
--- reader.py   2010-08-04 21:47:10.000000000 +0200 
***************       
*** 140,146 ****      
                  control, digits = self.getControl() 
                  self.group.handle(control, digits) 
              else:   
!                 self.group.char(unicode(next)) 


      def getControl(self): 
--- 140,149 ----      
                  control, digits = self.getControl() 
                  self.group.handle(control, digits) 
              else:   
!                 try: 
!                     self.group.char(unicode(next)) 
!                 except UnicodeDecodeError, e: 
!                     self.group.char('?') 


      def getControl(self):


brendonh · 2010-08-14T15:09:12Z

Hi joka,

As with the \f0 issue, please send me a full RTF file to reproduce this, and I'll see if I can figure out the best fix.

brendonh · 2010-08-18T21:31:46Z

Fixed (in trunk) by decoding the char in the current group using its charset (i.e. the doc default charset for metadata), rather than blindly unicode()ing it.
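
A rough sketch of that approach, written in Python 3 rather than the Python 2 of the actual reader. The `Group` class, its `charset` attribute, and the `cp1252` default are hypothetical stand-ins for illustration, not the code committed to trunk:

```python
class Group:
    """Hypothetical stand-in for an RTF group that knows its own charset."""

    def __init__(self, charset="cp1252"):
        self.charset = charset  # assumed document default charset
        self.content = []

    def char(self, raw_byte):
        # Decode with the group's charset instead of assuming ASCII.
        # errors="replace" yields U+FFFD for bytes the charset cannot map.
        self.content.append(raw_byte.decode(self.charset, errors="replace"))


# Feed the bytes of a non-ASCII author name fragment one at a time.
group = Group()
for raw_byte in (b"J", b"\xfc", b"r"):
    group.char(raw_byte)
text = "".join(group.content)

# With a charset that cannot represent the byte, a replacement char appears.
ascii_group = Group(charset="ascii")
ascii_group.char(b"\xfc")
fallback = ascii_group.content[0]
```

Unlike the catch-and-substitute patch, this keeps the original character whenever the charset can represent it and only falls back to a placeholder for bytes that cannot be mapped.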

This issue was closed.
