Skip to content

Conversation

@pombredanne
Copy link

Signed-off-by: Philippe Ombredanne pombredanne@nexb.com

 * this fixes an issue when no locale is defined on Linux
   and the filesystem encoding is ASCII. By forcing the encoding used
   for filesystem-realted encode/decode in this case we can use
   proper surrogatescape encoding and marshall bytes to unicode and back
   mostly the same way Python 3 does it.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Copy link
Author

@sschuberth ping, FYI.

@codecov
Copy link

codecov bot commented Sep 19, 2017

Codecov Report

Merging #7 into master will decrease coverage by 69.36%.
The diff coverage is 61.29%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master       #7       +/-   ##
===========================================
- Coverage   95.83%   26.47%   -69.37%     
===========================================
  Files           5        5               
  Lines         168      680      +512     
  Branches       26      115       +89     
===========================================
+ Hits          161      180       +19     
- Misses          4      490      +486     
- Partials        3       10        +7
Impacted Files Coverage Δ
src/backports/os.py 13.58% <28.57%> (-80.01%) ⬇️
tests/test_os.py 75.67% <70.83%> (-10.04%) ⬇️
tests/test_helpers.py 95.83% <0%> (-4.17%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 848c092...6024baa. Read the comment docs.

@pombredanne
Copy link
Author

@pjdelport not sure why the code coverage is going down... since I added more tests. Probably a quirk in codecov and the fact that test without a locale/ascii fs encoding run in a subprocess.

pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this pull request Sep 19, 2017
 * see PiDelport/backports.os#7 for details

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@PiDelport
Copy link
Owner

@pombredanne: Thank you for this update!

I investigated this further, and I think I found the root problem: it actually seems to be a bug (or at least an incompatibility) in how Python 2's ASCII codec calls the surrogateescape error handler.

This bug can be reproduced independently of locale:

On Python 2, surrogateescape encodes surrogates for low bytes to ASCII as expected; however, as soon as it encodes surrogates for high bytes, the ASCII codec seems to disregard the result of surrogateescape, and fail anyway:

$ python2.7 -c "import backports.os; print(repr(u'\udc7f'.encode('ascii', 'surrogateescape')))"
'\x7f'

$ python2.7 -c "import backports.os; print(repr(u'\udc80'.encode('ascii', 'surrogateescape')))"
UnicodeEncodeError: 'ascii' codec can't encode character u'\udc80' in position 0: ordinal not in range(128)

(I added instrumentation to the latter case to verify that surrogateescape is actually being called, and is correctly returning \x80.)

On Python 3, by contrast, the ASCII codec just uses the result of surrogateescape as expected:

$ python3.6 -c "print(repr(u'\udc7f'.encode('ascii', 'surrogateescape')))"
b'\x7f'

$ python3.6 -c "print(repr(u'\udc80'.encode('ascii', 'surrogateescape')))"
b'\x80'

@PiDelport
Copy link
Owner

Given the above, I don't think this PR is the right fix: it avoids the bug, but it does so by re-interpreting a filesystem encoding of ascii as utf-8, which actually changes what gets encodes to and from compared to Python 3 on ascii.

However, I implemented a fix for the bug described above in #8: can you check that out, and see if it works for you?

@pombredanne
Copy link
Author

@pjdelport You are my new unicode star!

@sschuberth
Copy link

Thanks all! So, I believe we're good to close this unmerged as @pombredanne confirmed that #8 works.

@pombredanne
Copy link
Author

Closing as invalid in favor of #8

pombredanne added a commit to aboutcode-org/scancode-toolkit that referenced this pull request Sep 22, 2017
 * see PiDelport/backports.os#7 for details

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants