Many errors are reported when parsing the dbc with Chinese characters and special characters #737

Closed
Liluoquan opened this issue Nov 24, 2023 · 4 comments

@Liluoquan

When I use canmatrix to load a DBC file whose signal names contain Chinese characters and special characters, for example:
_matrix = canmatrix.formats.dbc.load(f, dbcImportEncoding=encoding)
errors like this are reported:

error with line no: 2004
b' SG_ PSDCU_RR\xe4\xb8\xbb\xe8\xbd\xaf\xe4\xbb\xb6\xe7\x89\x88\xe6\x9c\xac\xe5\x8f\xb7$_W : 63|8@0+(1,0)[0|255] "" Vector__XXX\r\n'

An original line in the DBC file looks like this:
SG_ 冗余制动降级状态$_W : 23|3@0+(1,0)[0|7] "" Vector__XXX
Then I found that canmatrix uses a regex to match each line in the DBC. For lines starting with 'SG_' it uses the following regex:
pattern = r"^SG_ +(\w+) *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\) *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)"
The regex group (\w+) does not match signal names like these, which contain Chinese characters and special characters such as $ (tested with Python 3.8), so I suggest changing the regex above to:
pattern = r"^SG_ +(\S+) *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\) *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)"
This would handle the cases described in this issue.
Please reply, it's very important to me!
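
A minimal, standalone check of the two patterns against the sample line above. It does not call canmatrix; the names TAIL, OLD, NEW and the Chinese-only variant of the line are illustrative assumptions. With Python 3 str patterns, \w already matches CJK characters, so the character that blocks the original pattern here appears to be the $ in the signal name:

```python
import re

# Shared tail of the SG_ pattern, copied from the issue text.
TAIL = r" *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\) *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)"

OLD = re.compile(r"^SG_ +(\w+)" + TAIL)  # current name group
NEW = re.compile(r"^SG_ +(\S+)" + TAIL)  # proposed name group

# Sample line from the issue (already decoded to str).
line = 'SG_ 冗余制动降级状态$_W : 23|3@0+(1,0)[0|7] "" Vector__XXX'
# Hypothetical variant without "$", used only to isolate the cause.
chinese_only = 'SG_ 冗余制动降级状态 : 23|3@0+(1,0)[0|7] "" Vector__XXX'

old_match = OLD.match(line)
new_match = NEW.match(line)
old_match_chinese_only = OLD.match(chinese_only)

print("old pattern, name with $:  ", bool(old_match))
print("new pattern, name with $:  ", bool(new_match))
print("old pattern, Chinese only: ", bool(old_match_chinese_only))
if new_match:
    print("signal name:", new_match.group(1))
```

Note that \S+ accepts any run of non-whitespace characters, so it is more permissive than strictly needed; a narrower character class would also be possible if only a few extra symbols have to be allowed.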

@ebroecker
Owner

Hi @Liluoquan

You have to specify the encoding with "dbcImportEncoding".

Maybe something like dbcImportEncoding="utf8".

@ebroecker
Owner

Hi @Liluoquan

Any success?

@Liluoquan
Author

Hi @ebroecker
Sorry, it didn't work when I used utf-8, GB2312 or gbk:
_matrix = canmatrix.formats.dbc.load(f, dbcImportEncoding='utf-8')
The error is as follows:

error with line no: 28
b' SG_ \xca\xfd\xd7\xd6\xd6\xa4\xca\xe9\xb4\xe6\xb4\xa2\xb9\xca\xd5\xcf$_W : 20|1@0+(1,0)[0|1] "" Vector__XXX\r\n'
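
A small diagnostic sketch for the byte string in this error, independent of canmatrix. The helper try_decode and the shortened patterns are assumptions made for illustration. It suggests two separate effects: these bytes are not valid UTF-8 but do decode as GBK, and even after a successful decode the (\w+) group still stops at the $ in the name:

```python
import re

# Raw line as printed in the error message above.
raw = b' SG_ \xca\xfd\xd7\xd6\xd6\xa4\xca\xe9\xb4\xe6\xb4\xa2\xb9\xca\xd5\xcf$_W : 20|1@0+(1,0)[0|1] "" Vector__XXX\r\n'


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


utf8_text = try_decode(raw, "utf-8")
gbk_text = try_decode(raw, "gbk")
print("decodes as utf-8:", utf8_text is not None)
print("decodes as gbk:  ", gbk_text is not None)

# Only the start of the SG_ pattern is needed to compare the name groups.
stripped = gbk_text.strip()
name_w = re.match(r"^SG_ +(\w+) *:", stripped)
name_s = re.match(r"^SG_ +(\S+) *:", stripped)
print("\\w+ matches name:", bool(name_w))
print("\\S+ matches name:", bool(name_s))
```

If this reading is right, choosing the correct dbcImportEncoding is necessary for this file but not sufficient, which would be consistent with the error persisting for gbk as well.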

ebroecker self-assigned this Dec 12, 2023

@ebroecker
Owner

Hi @Liluoquan

I did not read your issue completely the first time - sorry.

You already provided a potential fix. Thanks for it!
I'll add your fix soon.
