fix incorrect text codec detection #2585

StoneMoe · 2023-01-20T04:19:30Z

Description

The current implementation of generic.detect_encoding() uses a guess list to detect the text encoding,
which breaks any codec whose byte layout resembles that of a codec ordered before it in the list.
As far as I know, this affects the WineCommand output log.

For example:

Wine uses code page 936 by default when the language is set to Simplified Chinese; this is actually gb2312 or gbk, depending on the Windows version
gb2312 uses one or two bytes per character
utf-16 uses two-byte code units (two or four bytes per character)
the two codecs map bytes to characters through different code point tables
utf-16 is ordered before gb2312 in the guess list
so gb2312-encoded bytes are matched as utf-16

The result is garbled text:

>>> '你好'.encode('gb2312').decode('utf-16')
'\ue3c4쎺'

or

>>> 'こんにちは'.encode('euc-jp').decode('utf-16')
'뎤\uf3a4쮤솤쾤'
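
The failure mode can be reproduced with a minimal sketch of a first-match guess list. The function name `guess_encoding` and the candidate order below are hypothetical (the actual list in Bottles is not shown here); the only assumption carried over from the description is that utf-16 comes before gb2312.

```python
# Hypothetical first-match detector: returns the first codec in the list
# that decodes the bytes without raising. This is NOT the Bottles code,
# only an illustration of the approach described above.
def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "utf-16", "gb2312")):
    for codec in candidates:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes, try the next one
        return codec
    return None


raw = "你好".encode("gb2312")  # 4 bytes, an even length
# Almost any even-length byte string is valid utf-16, so the loop stops
# there and never reaches gb2312.
print(guess_encoding(raw))  # utf-16
```

Reordering the list only moves the problem: with gb2312 first, genuine utf-16 output could be misread as gb2312 instead, because a successful decode does not prove the codec is the right one.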

This PR reimplements detect_encoding() on top of the chardet library,
which is already included in Bottles.
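
A chardet-based detector might look roughly like the sketch below. It is not the code merged in this PR: the `default` parameter, the empty-input shortcut, and the import fallback are assumptions added for illustration. The only library call used is `chardet.detect()`, which returns a dict with `encoding` and `confidence` keys.

```python
try:
    import chardet  # third-party package
except ImportError:  # assumption: keep the sketch importable without chardet
    chardet = None


def detect_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the codec of data statistically, falling back to default."""
    if not data or chardet is None:
        # chardet reports no encoding for empty input, so skip the call.
        return default
    guess = chardet.detect(data)  # e.g. {"encoding": ..., "confidence": ...}
    return guess.get("encoding") or default


sample = "你好，世界。这是一段用于编码检测的示例文本。".encode("gb2312")
print(detect_encoding(sample))
```

Statistical detection is still a guess, and it tends to be less reliable on very short inputs, so callers may want to decode with `errors="replace"` rather than assume the reported codec is always correct.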

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Change the language to a CJK language and install a font via Bottles; the reg add log then prints the correct message piped from the command's stdout.

github-actions · 2023-01-20T04:20:13Z

Pylint result on modified files:

------------- Module bottles.backend.utils.generic
bottles/backend/utils/generic.py:66:0: W1405: Quote delimiter " is inconsistent with the rest of the file (inconsistent-quotes)
bottles/backend/utils/generic.py:79:0: W1405: Quote delimiter " is inconsistent with the rest of the file (inconsistent-quotes)
bottles/backend/utils/generic.py:79:0: W1405: Quote delimiter " is inconsistent with the rest of the file (inconsistent-quotes)
bottles/backend/utils/generic.py:25:0: C0414: Import alias does not rename original package (useless-import-alias)
bottles/backend/utils/generic.py:25:0: E0401: Unable to import 'chardet' (import-error)
bottles/backend/utils/generic.py:61:4: W0702: No exception type(s) specified (bare-except)
bottles/backend/utils/generic.py:54:8: C0415: Import outside toplevel (ctypes) (import-outside-toplevel)
bottles/backend/utils/generic.py:28:0: W0612: Unused variable 'validate_url' (unused-variable)
bottles/backend/utils/generic.py:43:0: W0612: Unused variable 'detect_encoding' (unused-variable)
bottles/backend/utils/generic.py:51:0: W0612: Unused variable 'is_glibc_min_available' (unused-variable)
bottles/backend/utils/generic.py:66:0: W0612: Unused variable 'sort_by_version' (unused-variable)
bottles/backend/utils/generic.py:76:0: W0612: Unused variable 'get_mime' (unused-variable)
bottles/backend/utils/generic.py:85:0: W0612: Unused variable 'random_string' (unused-variable)

Your code has been rated at 5.64/10

fix incorrect text codec detection

dce5f96

mirkobrombin approved these changes Jan 20, 2023


Kinsteen approved these changes Jan 21, 2023


Kinsteen merged commit 6954b8a into bottlesdevs:main Jan 21, 2023

StoneMoe mentioned this pull request Feb 8, 2023

fix inconsistent return type for WineCommand #2658

Merged

4 tasks
