Many errors are reported when parsing the dbc with Chinese characters and special characters #737

Liluoquan · 2023-11-24T02:03:23Z

When I use canmatrix to load a DBC file whose signal names contain Chinese characters and special characters, for example:
_matrix = canmatrix.formats.dbc.load(f, dbcImportEncoding=encoding)
errors like the following are reported:

error with line no: 2004
b' SG_ PSDCU_RR\xe4\xb8\xbb\xe8\xbd\xaf\xe4\xbb\xb6\xe7\x89\x88\xe6\x9c\xac\xe5\x8f\xb7$_W : 63|8@0+(1,0)[0|255] "" Vector__XXX\r\n'

The original line looks like this:
SG_ 冗余制动降级状态$_W : 23|3@0+(1,0)[0|7] "" Vector__XXX
I then found that canmatrix uses a regex to match each line in the DBC. For lines starting with 'SG_' it uses the following regex:
pattern = r"^SG_ +(\w+) *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\) *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)"
The regex group (\w+) cannot match Chinese characters or special characters in Python 3.8, so I suggest changing the regex above to:
pattern = r"^SG_ +(\S+) *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\) *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)"
so that it handles the scenarios described in this issue.
Please reply, this is very important to me!
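
To make the suggestion easy to check, here is a minimal sketch that runs both variants of the pattern against the signal line quoted above. It assumes the line has already been decoded to a Python 3 str; the names TAIL, OLD_PATTERN and NEW_PATTERN are made up for this example and are not canmatrix identifiers. On str input, \w does accept the Chinese characters, so in this sketch it is the `$` in the signal name that stops (\w+); the result may differ if the pattern is applied to raw bytes.

```python
import re

# Everything after the signal name, shared by both variants of the pattern.
TAIL = (
    r" *: *(\d+)\|(\d+)@(\d+)([\+|\-])"
    r" *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\)"
    r" *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\]"
    r' +"(.*)" +(.*)'
)
OLD_PATTERN = re.compile(r"^SG_ +(\w+)" + TAIL)  # signal name restricted to \w
NEW_PATTERN = re.compile(r"^SG_ +(\S+)" + TAIL)  # signal name is any non-whitespace run

line = 'SG_ 冗余制动降级状态$_W : 23|3@0+(1,0)[0|7] "" Vector__XXX'

old_match = OLD_PATTERN.match(line)
new_match = NEW_PATTERN.match(line)
# Same line without the "$", to see what \w does with the Chinese part alone.
old_match_without_dollar = OLD_PATTERN.match(line.replace("$", ""))

print("old pattern matches:", old_match is not None)
print("old pattern matches without '$':", old_match_without_dollar is not None)
if new_match:
    print("new pattern signal name:", new_match.group(1))
```

The trade-off of (\S+) is that it accepts any non-whitespace run as a signal name, so it is more permissive than what a strict DBC name check would allow.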

ebroecker · 2023-11-27T16:41:36Z

Hi @Liluoquan

You have to specify the encoding with "dbcImportEncoding", maybe something like dbcImportEncoding="utf8".

ebroecker · 2023-12-04T13:59:39Z

Hi @Liluoquan

any success?

Liluoquan · 2023-12-07T01:44:31Z

Hi @ebroecker
Sorry, it didn't work when I used utf-8, GB2312, or gbk:
_matrix = canmatrix.formats.dbc.load(f, dbcImportEncoding='utf-8')
The error is as follows:

error with line no: 28
b' SG_ \xca\xfd\xd7\xd6\xd6\xa4\xca\xe9\xb4\xe6\xb4\xa2\xb9\xca\xd5\xcf$_W : 20|1@0+(1,0)[0|1] "" Vector__XXX\r\n'
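
As a side check on the encoding question, the signal-name bytes from the two error messages in this thread can be decoded directly. The sketch below tries a list of candidate encodings in order; the helper name first_working_encoding and the candidate list are assumptions for illustration, not part of canmatrix. It suggests the first file is UTF-8 and the second is GBK, so the two files may need different dbcImportEncoding values, independent of the regex problem.

```python
def first_working_encoding(raw, candidates=("utf-8", "gbk")):
    """Return (encoding, text) for the first candidate that decodes raw."""
    for encoding in candidates:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None, None


# Signal-name bytes copied from the two error messages in this thread.
name_from_first_error = (
    b"PSDCU_RR\xe4\xb8\xbb\xe8\xbd\xaf\xe4\xbb\xb6"
    b"\xe7\x89\x88\xe6\x9c\xac\xe5\x8f\xb7$_W"
)
name_from_second_error = (
    b"\xca\xfd\xd7\xd6\xd6\xa4\xca\xe9\xb4\xe6\xb4\xa2\xb9\xca\xd5\xcf$_W"
)

first_encoding, first_name = first_working_encoding(name_from_first_error)
second_encoding, second_name = first_working_encoding(name_from_second_error)

print(first_encoding, first_name)
print(second_encoding, second_name)
```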

ebroecker · 2023-12-12T12:25:45Z

Hi @Liluoquan

I did not read your issue completely the first time, sorry.

You already provided a potential fix. Thanks for it!
I'll add your provided fix soon.

ebroecker self-assigned this Dec 12, 2023

ebroecker added the bug label Dec 12, 2023

ebroecker added the merged label Dec 20, 2023

ebroecker closed this as completed May 23, 2024
