windows: UnicodeDecodeError: 'gbk' codec can't decode #4

BackMountainDevil · 2021-01-15T06:38:23Z

Env

Windows 10 amd64
py 3.9.0
pyflowchart 0.1.0

Err

code

# This is statement is required by the build system to query build info
if __name__ == '__build__':
    raise Exception

# cube.py
# Converted to Python by Jason Petrone 6/00

import sys

try:
    from OpenGL.GLUT import *
    from OpenGL.GL import *
    from OpenGL.GLU import *
except:
    print("ERROR: PyOpenGL not installed properly")


def init():
    glClearColor(0.0, 0.0, 0.0, 0.0)
    glShadeModel(GL_FLAT)


def display():
    glClear(GL_COLOR_BUFFER_BIT)
    glColor3f(1.0, 1.0, 1.0)
    glLoadIdentity()  # clear the matrix
    # viewing transformation
    gluLookAt(0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0)
    glScalef(1.0, 2.0, 1.0)  # modeling transformation
    glutWireCube(1.0)
    glFlush()


def reshape(w, h):
    glViewport(0, 0, w, h)
    glMatrixMode(GL_PROJECTION)
    glLoadIdentity()
    glFrustum(-1.0, 1.0, -1.0, 1.0, 1.5, 20.0)
    glMatrixMode(GL_MODELVIEW)


glutInit(sys.argv)
glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB)
glutInitWindowSize(500, 500)
glutInitWindowPosition(100, 100)
glutCreateWindow('cube')
init()
glutDisplayFunc(display)
glutReshapeFunc(reshape)

glutMainLoop()

Exception

$ python -m pyflowchart cube.py
Traceback (most recent call last):
  File "D:\Program Files\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Program Files\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Program Files\Python\Python39\lib\site-packages\pyflowchart\__main__.py", line 33, in <module>
    main(args.code_file, args.field, args.inner)
  File "D:\Program Files\Python\Python39\lib\site-packages\pyflowchart\__main__.py", line 16, in main
    code = code_file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 355: illegal multibyte sequence

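A second piece of code shows that the code itself is not the problem: it triggers the same error, so the fault lies in the module itself... but I don't yet know enough to maintain this module...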

code2

import math
#  计算磁环的横截面积F  cm^2
def calMagnetRingCrossArea(D, d, h):
    return (0.5 * h * (D + d))

#  计算磁环平均长度l  cm
def calMagnetRingLength(D, d):
    return (0.5 * (D + d) * math.pi)

#  计算磁环绕线电感的电感L  mH
def calInductance(u, N, F, l):
    return 0

#  计算线圈匝数N
def calRingNumber(u, L, F, l):
    return math.sqrt((L * l * math.pow(10,5))/(0.4 * math.pi * u * F))
    
u = 75    
D = 3.3
d = 1.99
h = 1.07
L = 50.0
F = calMagnetRingCrossArea(D,d,h)
l = calMagnetRingLength(D,d)
N = calRingNumber(u,L,F,l)
print(N)

cdfmlr · 2021-01-15T07:03:48Z

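First of all, thank you for the feedback, and sorry for the trouble.

This looks similar to f312d64: an encoding problem when reading a file on Chinese-locale Windows.

I don't have a Windows machine here. I tried a few simple things on a Mac but couldn't reproduce the problem, so I may need more information from you to pin down the bug. @BackMountainDevil could you check which encoding your cube.py file uses?

To check: open the file in an editor such as VS Code; the encoding is shown in the bottom-right corner of the window (for example, "UTF-8" for a UTF-8 encoded file).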

BackMountainDevil · 2021-01-15T12:08:47Z

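Both files are UTF-8 encoded; the status bar at the bottom right of both VS Code and Notepad shows UTF-8.
I changed the encoding to GBK and tried again, and it still fails.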

BackMountainDevil · 2021-01-15T12:25:06Z

I modified the main code in the module, trying each of these two lines:

code = code_file.read().decode(encoding='utf-8', errors='ignore')
code = code_file.read().decode(encoding='utf-8')

Whichever of the two I use, the error is the same:

code = code_file.read().decode(encoding='utf-8')
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 215: illegal multibyte sequence
code = code_file.read().decode(encoding='utf-8', errors='ignore')
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 215: illegal multibyte sequence

cdfmlr · 2021-01-15T12:32:15Z

Haha, that is the same fix I had in mind, but your implementation has a small problem. You need to change one more line:

if __name__ == '__main__':
    parser.add_argument('code_file',  type=argparse.FileType('r'))

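Change 'r' to 'rb'. Otherwise the file is opened in text mode, so read() still decodes with the default GBK codec (which is why the error did not change), and you would be calling decode on a str.

Give it a try.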
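The change described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `read_source` is hypothetical, and it assumes the input file is UTF-8. Opening with `argparse.FileType('rb')` makes `read()` return bytes, so the explicit `decode` is the only decoding step.

```python
import argparse


def read_source(argv):
    """Parse argv and return the text of the given file, decoded as UTF-8.

    Hypothetical helper for illustration; assumes the file is UTF-8 encoded.
    """
    parser = argparse.ArgumentParser()
    # 'rb' opens the file in binary mode, so no platform-default codec
    # (such as GBK on Chinese-locale Windows) is applied while reading.
    parser.add_argument('code_file', type=argparse.FileType('rb'))
    args = parser.parse_args(argv)
    with args.code_file as f:
        raw = f.read()  # bytes, not str
    return raw.decode('utf-8')
```

For example, `read_source(['cube.py'])` would return the contents of cube.py as a str regardless of the system's default encoding.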

BackMountainDevil · 2021-01-15T12:40:30Z

The encoding the module reads the file with is cp936, i.e. GBK:

def main(code_file, field, inner):
    print("code_file: ", code_file, "\nfield: ", field, "\ninner: ", inner)
    code = code_file.read().decode(encoding='utf-8', errors='ignore')
    print(code)
    flowchart = Flowchart.from_code(code, field=field, inner=inner)
    print(flowchart.flowchart())

## output
code_file:  <_io.TextIOWrapper name='cube.py' mode='r' encoding='cp936'>
field:
inner:  True

But whether I set encoding to cp936 or gbk, I still get UnicodeDecodeError: 'gbk' codec can't decode byte.
This is the first time I have seen a parameter setting have no effect.

cube.py really is UTF-8 encoded:

path = "D:/Documents/CAU/Lion/repositiries/Python/pyopengl/beginner/cube.py"
with open(path, 'r', encoding='utf-8') as text:
    words = text.read()
    print(words)

This runs correctly both in VS Code and in PowerShell, but if I specify gbk or leave encoding unset, it fails with UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 215: illegal multibyte sequence

cdfmlr · 2021-01-15T12:46:34Z

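Documentation for open: https://docs.python.org/3/library/functions.html#open

It says that when encoding is not specified, the default is platform dependent. As far as I know, Chinese-locale Windows uses GBK, so without an explicit encoding the file is decoded as GBK, and that fails on a UTF-8 file.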
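A small sketch of the mismatch described above. The helper `try_decode` is a hypothetical name used only for illustration; `locale.getpreferredencoding(False)` reports the locale encoding that `open()` normally falls back to when no `encoding` is passed (cp936 on Chinese-locale Windows).

```python
import locale


def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# The platform-dependent default used by open() when encoding is omitted.
default_encoding = locale.getpreferredencoding(False)

text = "# 计算线圈匝数N"
raw = text.encode("utf-8")  # the bytes as stored in a UTF-8 file

as_utf8 = try_decode(raw, "utf-8")  # round-trips to the original text
as_gbk = try_decode(raw, "gbk")     # None or garbled text, never the original
```

Depending on the exact bytes, decoding UTF-8 data as GBK either raises UnicodeDecodeError (as in this issue) or silently yields garbled text, so passing `encoding='utf-8'` explicitly is the safer choice.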

cdfmlr · 2021-01-15T12:54:10Z

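In c7410be I added some code that auto-detects the text encoding, which should solve your problem. I still haven't managed to borrow a Windows machine to test it 😂, so please try the code in the #4_file_decode_error branch 😂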
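For readers curious what encoding detection can look like, here is a rough sketch using only the standard library. It is not the code from c7410be: the function name `decode_source` and the candidate list are assumptions made for illustration. The idea is to read the file as bytes and try a few likely encodings in order.

```python
def decode_source(raw: bytes, candidates=("utf-8-sig", "gbk")) -> str:
    """Decode raw file bytes by trying each candidate encoding in order.

    Hypothetical helper; the candidate list is an assumption, not the
    project's actual detection logic.
    """
    for encoding in candidates:
        try:
            # "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM.
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: keep going with replacement characters instead of crashing.
    return raw.decode("utf-8", errors="replace")
```

UTF-8 is tried first on purpose: it is strict, so non-UTF-8 input is usually rejected quickly, whereas GBK accepts many byte sequences and could mis-decode a UTF-8 file without raising.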

BackMountainDevil · 2021-01-15T12:54:19Z

Facepalm... converting the file to GBK makes it work, sorry

cdfmlr · 2021-01-15T12:56:36Z

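Hmm, but you shouldn't convert the file to GBK. Python 3 uses UTF-8 as the default source encoding.

Try this instead: in c7410be I added some code that auto-detects the text encoding, which should solve your problem. I still haven't managed to borrow a Windows machine to test it 😂, so please try the code in the #4_file_decode_error branch.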

BackMountainDevil · 2021-01-15T12:59:26Z

https://github.com/cdfmlr/pyflowchart/blob/%234_file_decode_error/pyflowchart/__main__.py
Awesome, I copied and pasted your updated code and it works now!!!

cdfmlr · 2021-01-16T07:52:30Z

This should be fixed in the newly released v0.1.1 🎉. Just upgrade with pip:

$ pip3 install -U pyflowchart

Thanks again, @BackMountainDevil 🙏

P.S. For full PyFlowchart usage, please see the README.

cdfmlr added a commit that referenced this issue Jan 15, 2021

resolve #4 Windows: UnicodeDecodeError: 'gbk' codec can't decode

9226cf6

this will be released as v0.1.1 Closes #4

cdfmlr mentioned this issue Jan 15, 2021

resolve #4 open file decode error #5

Merged

cdfmlr closed this as completed in #5 Jan 16, 2021

cdfmlr added a commit that referenced this issue Jan 16, 2021

Merge pull request #5 from cdfmlr/#4_file_decode_error

dd4bff7

resolve #4 open file decode error

BackMountainDevil mentioned this issue Jan 25, 2021

UnicodeDecodeError exception occurs when .env file contains Non-ASCII characters on Windows theskumar/python-dotenv#300

Closed
