
Bugfix: fix processing of UTF-8 files with BOM #5506

Merged
merged 2 commits into from Jul 29, 2019

@SSE4
Contributor

commented Jul 17, 2019

Changelog: Bugfix: fix processing of UTF-8 files with BOM
Docs: omit
@PYVERS: Macos@py27, Windows@py36, Linux@py27, py34
@tags: svn, slow
@revisions: 1
closes: #5504

  • Refer to the issue that supports this Pull Request.
  • If the issue has missing info, explain the purpose/use case/pain/need that covers this Pull Request.
  • I've read the Contributing guide.
  • I've followed the PEP8 style guides for Python code.
  • I've opened another PR in the Conan docs repo to the develop branch, documenting this one.

Note: By default this PR will skip the slower tests and will use a limited set of python versions. Check here how to increase the testing level by writing some tags in the current PR body text.

- fix processing of UTF-8 files with BOM
Signed-off-by: SSE4 <tomskside@gmail.com>
@@ -176,6 +176,21 @@ def load(path, binary=False):
""" Loads a file content """
with open(path, 'rb') as handle:
tmp = handle.read()
if not binary:
import codecs
encodings = {codecs.BOM_UTF8: "utf_8_sig",

@memsharded

memsharded Jul 17, 2019

Contributor

Wow, I can't believe this is necessary... So every Python application out there that reads text files should do something like this? I am not sure it makes sense, but most likely I am failing to understand the issue. Can't we just require that conanfile.txt use a standard ASCII or UTF-8 encoding? Is this something that will be solved in Python 3, so that the fix is only needed for Python 2? Wdyt @lasote?

@uilianries

uilianries Jul 17, 2019

Member

Is it possible to solve this with an encoding comment like # -*- coding: utf-8 -*- ?

@SSE4

SSE4 Jul 17, 2019

Author Contributor

Yes, it seems so. I have found many such recommendations on Stack Overflow.
Some text editors, mostly on Windows, default to saving files as UTF-8 with BOM or UTF-16 with BOM if the files contain non-ASCII byte sequences.

@SSE4

SSE4 Jul 17, 2019

Author Contributor

from wiki:

Microsoft compilers[9] and interpreters, and many pieces of software on Microsoft Windows such as Notepad treat the BOM as a required magic number rather than use heuristics. These tools add a BOM when saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present or the file contains only ASCII. Google Docs also adds a BOM when converting a document to a plain text file for download.

@lasote

lasote Jul 18, 2019

Contributor

Wow, this is insane. That code should probably be moved to the decode_text function, which is already a bit insane.

@SSE4

SSE4 Jul 18, 2019

Author Contributor

moved to decode_text
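
For readers following the thread, here is a minimal sketch of the kind of BOM-aware decoding being discussed. It is not the merged code: the function name `decode_with_bom`, the table `_BOM_ENCODINGS`, and the `default` parameter are hypothetical and only loosely follow the `encodings = {codecs.BOM_UTF8: "utf_8_sig", ...` fragment visible in the diff. It assumes the content has already been read as raw bytes.

```python
import codecs

# Hypothetical lookup table, checked in order. The UTF-32 LE BOM starts with
# the same two bytes as the UTF-16 LE BOM, so the longer BOMs come first.
_BOM_ENCODINGS = [
    (codecs.BOM_UTF32_LE, "utf_32"),
    (codecs.BOM_UTF32_BE, "utf_32"),
    (codecs.BOM_UTF8, "utf_8_sig"),
    (codecs.BOM_UTF16_LE, "utf_16"),
    (codecs.BOM_UTF16_BE, "utf_16"),
]


def decode_with_bom(data, default="utf-8"):
    """Decode raw bytes, using a leading BOM (if any) to pick the codec."""
    for bom, encoding in _BOM_ENCODINGS:
        if data.startswith(bom):
            # These codecs consume the BOM, so it does not appear in the text.
            return data.decode(encoding)
    # No BOM found: fall back to the assumed default encoding.
    return data.decode(default)
```

With this sketch, `decode_with_bom(codecs.BOM_UTF8 + b"[requires]")` returns the text without a leading U+FEFF character, while BOM-less input is decoded with the default codec unchanged.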

@lasote

lasote Jul 19, 2019

Contributor

So, to confirm, please answer the question from @memsharded: is this something that will be solved in Python 3, so that the fix is only needed for Python 2?

@SSE4

SSE4 Jul 19, 2019

Author Contributor

No, it will not be solved by Python 3. I have exactly the same issue with Python 3 and UTF-8 with BOM files on Windows, which is what Notepad saves by default for me.
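
The Python 3 behavior can be reproduced with a short script. This is an illustrative sketch, not part of the PR: the temporary file and the `[requires]` content are made up for the example. It shows that the plain `utf-8` codec keeps the BOM as a leading U+FEFF character, while `utf-8-sig` strips it.

```python
import codecs
import os
import tempfile

# Write a file the way a BOM-adding editor would: BOM bytes, then the text.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(codecs.BOM_UTF8 + b"[requires]\n")

try:
    # Plain UTF-8 decoding keeps the BOM as the first character.
    with open(path, encoding="utf-8") as handle:
        plain = handle.read()
    # The "utf-8-sig" codec drops a leading BOM when one is present.
    with open(path, encoding="utf-8-sig") as handle:
        stripped = handle.read()
finally:
    os.remove(path)

print(repr(plain))     # '\ufeff[requires]\n'
print(repr(stripped))  # '[requires]\n'
```

A parser that compares the first line against `[requires]` would therefore fail on the first result and succeed on the second.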

- move BOM handling to the decode_text
Signed-off-by: SSE4 <tomskside@gmail.com>

@SSE4 SSE4 force-pushed the SSE4:fix_bom branch from 6282d00 to 7c2b1b2 Jul 18, 2019

@lasote lasote added this to the 1.18 milestone Jul 19, 2019

@lasote

lasote approved these changes Jul 29, 2019

@lasote lasote merged commit 4e37165 into conan-io:develop Jul 29, 2019

2 checks passed

continuous-integration/jenkins/pr-head This commit looks good
license/cla Contributor License Agreement is signed.
4 participants