This repository has been archived by the owner on Oct 16, 2018. It is now read-only.

Handling non UTF-8 files #3

Closed
cgestes opened this issue Jun 25, 2017 · 3 comments

Comments

@cgestes
Contributor

cgestes commented Jun 25, 2017

Calling replacer zsh sss in .config/ctafconf, I got the following backtrace with the latest commits:

  File "/home/ctaf/.config/ctafconf/bin/replacer", line 208, in replace_in_file
    in_lines = in_fd.readlines()
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 10: invalid continuation byte
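
For context, 0xe9 is "é" in Latin-1, while in UTF-8 it is a lead byte that must be followed by continuation bytes. A minimal sketch that triggers the same kind of error, assuming the offending file held Latin-1 text (the sample string is made up for illustration, not taken from the actual file):

```python
# Hypothetical sample (not the real file content): Latin-1 text with "é" at offset 10.
data = "Copyright école".encode("latin-1")

try:
    data.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the byte the UTF-8 decoder rejected.
    error = exc

print(error)
# The same bytes decode cleanly once the right encoding is used.
print(data.decode("latin-1"))
```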

@dmerejkowsky
Owner

dmerejkowsky commented Jun 26, 2017

Oh. This is a regression introduced by this commit: aad74e9

My mistake.

The previous implementation simply skipped files that are not UTF-8 encoded, so I'll restore that.
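
A rough sketch of what "skip files that are not UTF-8" could look like, catching only the decoding error rather than a broad exception. The helper name read_lines_or_skip is hypothetical and this is not the actual replacer code:

```python
def read_lines_or_skip(path):
    """Return the lines of a UTF-8 file, or None when the file is not valid UTF-8.

    Hypothetical helper for illustration; the caller is expected to skip
    the file when None is returned.
    """
    try:
        with open(path, encoding="utf-8") as in_fd:
            return in_fd.readlines()
    except UnicodeDecodeError:
        # Only decoding failures are swallowed; other errors
        # (missing file, permissions) still propagate.
        return None
```

Catching UnicodeDecodeError specifically keeps the narrow exception handling while still letting the tool move on to the next file.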

Better fixes could be:

  • use bytes instead of strings (the re module in Python 3 accepts bytes patterns); this requires "Get rid of Python2 compat" (#1), because life is too short
  • use chardet and preserve the encoding (if input is latin-1, output will be latin-1 too)
  • add an --encoding option?
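
As an illustration of the first idea, here is a minimal sketch that runs the substitution on raw bytes, so unmatched bytes pass through unchanged and a Latin-1 file stays Latin-1. The function name replace_in_bytes and its encoding parameter are assumptions made for this example; the parameter only controls how the pattern and replacement are encoded, which is roughly where an --encoding option could plug in:

```python
import re


def replace_in_bytes(data, pattern, replacement, encoding="utf-8"):
    """Apply a regex substitution directly on bytes.

    Hypothetical helper: the pattern and replacement are encoded with
    `encoding`; the file content itself is never decoded.
    """
    regex = re.compile(pattern.encode(encoding))
    return regex.sub(replacement.encode(encoding), data)


# Made-up Latin-1 content containing the non-ASCII byte 0xe9.
latin1_content = "café zsh\n".encode("latin-1")
result = replace_in_bytes(latin1_content, "zsh", "sss")
print(result)
```

With an ASCII-only pattern the default encoding does not matter for ASCII-compatible encodings; a pattern with non-ASCII characters would need the file's real encoding passed in.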

@dmerejkowsky dmerejkowsky changed the title unicode crash Handling non-UTF-8 files Jun 26, 2017
@dmerejkowsky dmerejkowsky changed the title Handling non-UTF-8 files Handling non UTF-8 files Jun 26, 2017
@dmerejkowsky
Owner

Regression fixed, keeping the issue open for discussion.

dmerejkowsky added a commit that referenced this issue Jun 26, 2017
Fix regression introduced by:

commit aad74e9
Author: Dimitri Merejkowsky <d.merej@gmail.com>
Date:   Sat Jun 24 16:44:58 2017 +0200

    fix too broad exception

We'll just skip non-utf-8 files for now
@cgestes
Contributor Author

cgestes commented Jun 26, 2017

Seems good enough as is.
