
rtf reader: non-ASCII metadata causes UnicodeDecodeError (OpenOffice RTF files) #3

Closed
joka opened this issue Aug 4, 2010 · 2 comments


@joka

joka commented Aug 4, 2010

I have OpenOffice RTF files with non-ASCII metadata (author):

{\info{\author Claudia Jürgens}{\creatim\yr2010\mo7\dy19\hr12\min45}{\author Claudia Jürgens}
{\revtim\yr2010\mo7\dy28\hr13\min27}{\printim\yr0\mo0\dy0\hr0\min0}{\comment    
StarWriter}{\vern3000}}\deftab709   

This causes a UnicodeDecodeError:

Module pyth.plugins.rtf15.reader, line 93, in read
Module pyth.plugins.rtf15.reader, line 113, in go                                           
Module pyth.plugins.rtf15.reader, line 147, in parse
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 0: ordinal not in range(128) 
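Why this happens: in Python 2, calling unicode() on a byte string without an encoding argument decodes it as ASCII, so any byte above 0x7F fails. Here the author name appears to be stored as a raw 8-bit byte (0xfc is "ü" in both Latin-1 and Windows-1252). A minimal Python 3 sketch reproducing the same failure, assuming the reader passes the input to unicode() one character at a time as the `unicode(next)` call in the patch suggests; `decode_stream_ascii` is a hypothetical helper, not part of pyth:

```python
# Python 3 reproduction of the Python 2 unicode(next) failure.
# Assumption: the author name is stored as raw Windows-1252 bytes.
raw = "Claudia Jürgens".encode("cp1252")  # b'Claudia J\xfcrgens'

def decode_stream_ascii(data: bytes) -> str:
    """Decode one byte at a time with ASCII, like Python 2's unicode(next)."""
    return "".join(data[i:i + 1].decode("ascii") for i in range(len(data)))

try:
    decode_stream_ascii(raw)
except UnicodeDecodeError as exc:
    # Each decode call sees a one-byte string, hence "position 0".
    print(exc)  # 'ascii' codec can't decode byte 0xfc in position 0: ordinal not in range(128)
```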

This patch simply catches the error and substitutes '?' for the undecodable character:

*** reader.py  2010-05-04 21:48:14.000000000 +0200 
--- reader.py   2010-08-04 21:47:10.000000000 +0200 
***************       
*** 140,146 ****      
                  control, digits = self.getControl() 
                  self.group.handle(control, digits) 
              else:   
!                 self.group.char(unicode(next)) 


      def getControl(self): 
--- 140,149 ----      
                  control, digits = self.getControl() 
                  self.group.handle(control, digits) 
              else:   
!                 try: 
!                     self.group.char(unicode(next)) 
!                 except UnicodeDecodeError, e: 
!                     self.group.char('?') 


      def getControl(self):
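The patch stops the crash, but it is lossy: every non-ASCII byte becomes '?'. A minimal Python 3 sketch of its behaviour, using a hypothetical `decode_stream_lossy` helper that mirrors the try/except above:

```python
def decode_stream_lossy(data: bytes) -> str:
    """Mirror the proposed patch: try ASCII, substitute '?' on failure."""
    chars = []
    for i in range(len(data)):
        byte = data[i:i + 1]
        try:
            chars.append(byte.decode("ascii"))
        except UnicodeDecodeError:
            chars.append("?")  # the original character is discarded here
    return "".join(chars)

print(decode_stream_lossy("Claudia Jürgens".encode("cp1252")))  # Claudia J?rgens
```

Python's built-in `errors="replace"` handler does a similar substitution when decoding, but inserts U+FFFD instead of '?'. Either way, the original character cannot be recovered from the output.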
@brendonh
Owner

Hi joka,

As with the \f0 issue, please send me a full RTF file to reproduce this, and I'll see if I can figure out the best fix.

@brendonh
Owner

Fixed in trunk by decoding each character with the current group's charset (for metadata, the document's default charset) rather than blindly calling unicode() on it.
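A minimal Python 3 sketch of that approach, under stated assumptions: the document default codec is chosen from the RTF header's character-set control words (`\ansi`, `\mac`, `\pc`, `\pca`, and `\ansicpgN`); plain `\ansi` without `\ansicpgN` is treated as cp1252; and control words arrive as `(control, digits)` pairs like those returned by `getControl()` in the patch, with `digits` assumed to be an int or None. `doc_codec` and `Group` are hypothetical stand-ins, not pyth's actual classes, and per-font charsets (`\fcharsetN`) are ignored:

```python
import codecs

# RTF character-set control words mapped to Python codec names.
# Assumption: plain \ansi with no \ansicpgN means Windows-1252.
CHARSET_CODECS = {"ansi": "cp1252", "mac": "mac_roman", "pc": "cp437", "pca": "cp850"}

def doc_codec(controls):
    """Pick the document default codec from (control, digits) pairs."""
    codec = "cp1252"
    for control, digits in controls:
        if control in CHARSET_CODECS:
            codec = CHARSET_CODECS[control]
        elif control == "ansicpg" and digits is not None:
            codec = "cp%d" % digits
    codecs.lookup(codec)  # raises LookupError for an unsupported code page
    return codec

class Group:
    """Hypothetical stand-in for the reader's group object."""

    def __init__(self, charset):
        self.charset = charset
        self.content = []

    def char(self, byte):
        # Decode with the group's charset instead of unicode()'s ASCII default.
        self.content.append(byte.decode(self.charset))

header = [("rtf", 1), ("ansi", None), ("ansicpg", 1252), ("deff", 0)]
group = Group(doc_codec(header))
for b in "Claudia Jürgens".encode("cp1252"):
    group.char(bytes([b]))
print("".join(group.content))  # Claudia Jürgens
```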

This issue was closed.