
database as_dbc_string() TypeError using bitstruct.c #271

Closed
lhh31 opened this issue Feb 5, 2021 · 18 comments
@lhh31

lhh31 commented Feb 5, 2021

Python Version: 3.8.5
Cantools Version 34.2.0+
OS: Windows 10 (1909)

This works fine in cantools version 34.1.0. From the commits, it appears there was a change from using bitstruct to using bitstruct.c. Reverting this change in 34.2.0 fixes the issue.

Further analysis shows bitstruct was at version 8.10.0; updating it to 8.11.1 also resolves the issue.

I am no expert, but I think this means the required bitstruct version in setup.py needs to be bumped?

Test code

import cantools
dbc = cantools.database.load_file("vehicle.dbc", strict=False)
cantools.database.dump_file(dbc, "vehicle2.dbc")

Error Messages

  File "c:\anaconda3\envs\py38-logger-tool\lib\site-packages\cantools\database\__init__.py", line 216, in dump_file
    output = database.as_dbc_string()
  File "c:\anaconda3\envs\py38-logger-tool\lib\site-packages\cantools\database\can\database.py", line 279, in as_dbc_string
    return dbc.dump_string(InternalDatabase(self._messages,
  File "c:\anaconda3\envs\py38-logger-tool\lib\site-packages\cantools\database\can\formats\dbc.py", line 1524, in dump_string
    database = deepcopy(database)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 270, in _reconstruct
    state = deepcopy(state, memo)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 205, in _deepcopy_list
    append(deepcopy(a, memo))
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 270, in _reconstruct
    state = deepcopy(state, memo)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 264, in _reconstruct
    y = func(*args)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 263, in <genexpr>
    args = (deepcopy(arg, memo) for arg in args)
  File "c:\anaconda3\envs\py38-logger-tool\lib\copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'bitstruct.c.CompiledFormatDict' object
@JulianWgs
Contributor

Hello,

I'm having the same issue: I want to use the decode function with multiprocessing for my project, but the pickling does not work.

It's this commit that introduces the error, specifically these lines.

As a quick fix, I would propose making it configurable which version is imported (bitstruct vs. bitstruct.c). In the current code, bitstruct.c is always imported if it is available.
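A minimal sketch of such a configurable fallback import (the helper name is my own, not actual cantools code):

```python
import importlib

def load_first(*names):
    """Import and return the first available module from `names`.

    Sketch of the proposed fallback; not actual cantools code.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {names} could be imported")

# Intended use: prefer the C extension, fall back to pure Python.
# bitstruct = load_first("bitstruct.c", "bitstruct")
```

Flipping the order of the names (or gating it on an option) would make the pure-Python, picklable implementation the default.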

A long-term solution would be to make bitstruct.c picklable, but I don't know how. Is this possible?

@eerimoq Do you need help implementing any of the changes?

Best regards
Julian

@eerimoq
Collaborator

eerimoq commented Aug 17, 2021

@andlaus is the new maintainer, and I'm sure he appreciates all help he can get.

@andlaus
Member

andlaus commented Aug 17, 2021

yes, help is always appreciated :)

I guess that in this concrete case, I might need to come back to you if changes to bitstruct or bitstruct.c turn out to be the best solution. My current naive view is that bitstruct.c should be replaced by bitstruct if the performance impact is negligible, or, if it is not, all bitstruct.c objects should become picklable. Let's see, it should not be too hard. (Famous last words.)

@JulianWgs
Contributor

JulianWgs commented Aug 18, 2021

Pickling support

It is possible to make C extensions picklable. See more information here. The link in that post is wrong, but the edit is too short to fix it; here is the correct link to PyMethodDef. Unfortunately, I still don't quite understand the way forward, as I have never written a C extension for Python. Maybe this code is relevant?
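At the Python level, the mechanism boils down to giving the compiled object a `__reduce__` that rebuilds it from its format string instead of pickling its internal state; a toy stand-in (not bitstruct's actual class) illustrates the idea:

```python
import pickle

class CompiledFormat:
    """Toy stand-in for a compiled codec; not bitstruct's actual class."""

    def __init__(self, fmt):
        self.fmt = fmt
        # Code objects cannot be pickled, much like compiled C codec state.
        self._code = compile(repr(fmt), "<fmt>", "eval")

    def __reduce__(self):
        # Tell pickle to rebuild the object from its format string
        # rather than trying to serialize the compiled state.
        return (CompiledFormat, (self.fmt,))

cf = CompiledFormat("u1u1s6u7u9")
clone = pickle.loads(pickle.dumps(cf))
assert clone.fmt == "u1u1s6u7u9"
```

Without `__reduce__`, `pickle.dumps(cf)` would fail on the code object, which is analogous to the `TypeError: cannot pickle 'bitstruct.c.CompiledFormatDict' object` in the report; a C extension type would implement the equivalent via its reduce slot.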

Benchmarking

I've just benchmarked the first test case of bitstruct, and the C implementation is much faster:

%timeit packed = pack('u1u1s6u7u9', 0, 0, -2, 65, 22)

# bitstruct.c
# 239 ns ± 1.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

# bitstruct
# 18 µs ± 336 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Also, from looking at the tests (though probably better suited for another issue): cantools and bitstruct would be well suited for hypothesis.

This is a very low-level test; here is a more practical benchmark.

import cantools

db = cantools.database.Database()
db.add_dbc_string("""
BO_ 257 TestMessage: 6 Vector__XXX
 SG_ TestSignal1 : 16|16@1- (1,0) [0|0] "" Vector__XXX
 SG_ TestSignal2 : 32|16@1- (1,0) [0|0] "" Vector__XXX
 SG_ TestSignal3 : 0|16@1- (1,0) [0|0] "" Vector__XXX
""")

%timeit db.decode_message(0x101, b"FFFFFF")

# bitstruct.c
# 7.04 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# bitstruct
# 11.2 µs ± 185 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For this very specific case, bitstruct.c is not much faster. Of course, this does not give the full picture for every decoding operation; maybe it is just a cherry-picked example.

I did a little more digging and used line_profiler to see where time is spent:

bitstruct

Timer unit: 1e-06 s

Total time: 0.000253 s
Function: decode_data at line 102

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   102                                           def decode_data(data, fields, formats, decode_choices, scaling):
   103         1         58.0     58.0     22.9      unpacked = formats.big_endian.unpack(bytes(data))
   104         1         82.0     82.0     32.4      unpacked.update(formats.little_endian.unpack(bytes(data[::-1])))
   105                                           
   106         2        111.0     55.5     43.9      return {
   107                                                   field.name: _decode_field(field,
   108                                                                             unpacked[field.name],
   109                                                                             decode_choices,
   110                                                                             scaling)
   111         1          2.0      2.0      0.8          for field in fields
   112                                               }

bitstruct.c

Timer unit: 1e-06 s

Total time: 7.6e-05 s
Function: decode_data at line 102

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   102                                           def decode_data(data, fields, formats, decode_choices, scaling):
   103         1          7.0      7.0      9.2      unpacked = formats.big_endian.unpack(bytes(data))
   104         1          6.0      6.0      7.9      unpacked.update(formats.little_endian.unpack(bytes(data[::-1])))
   105                                           
   106         2         62.0     31.0     81.6      return {
   107                                                   field.name: _decode_field(field,
   108                                                                             unpacked[field.name],
   109                                                                             decode_choices,
   110                                                                             scaling)
   111         1          1.0      1.0      1.3          for field in fields
   112                                               }

bitstruct is only used on lines 103 and 104. The dict comprehension is still just as slow, since _decode_field is a pure-Python function.

I hope this helps. All points are open for discussion and feedback is welcome.

Best regards
Julian

@andlaus
Member

andlaus commented Aug 25, 2021

hm, this is not exactly what I meant: I am aware that, on its own, bitstruct.c is much faster than bitstruct, but I was wondering how much impact this has at a holistic level. For this purpose, I've modified the decode subparser to measure the relevant timings and disable the terminal output, and then set it to work on a trace containing ~85000 frames using an ARXML database. Here are the performance results:

# bitstruct.c
Setup time: 8.41
Decode time: 2.01

# bitstruct
Setup time: 8.53
Decode time: 3.07

The "Setup time" is the time required to parse the database, while "Decode time" is the time required to decode the raw CAN frames via database.decode_message() (as you can see from the timing, the database is quite large). From the results above, I conclude that bitstruct.c indeed has a significant impact on decoding performance, but also that the regular bitstruct module seems to be "fast enough" in most cases, in particular when considering scenarios that involve a "live" bus instead of decoding a trace as fast as possible (the trace used here captures about two minutes of an almost fully loaded 500kbps bus).
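The measurement described above can be approximated with a small timing helper (the helper name and structure are my own, not the actual subparser change):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result.

    Sketch of the setup/decode timing described above; not cantools code.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}")
    return result

# Hypothetical use, mirroring the reported numbers:
# db = timed("Setup time", cantools.database.load_file, "system.arxml")
# timed("Decode time", decode_all_frames, db, trace)
```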

I guess that I will give making bitstruct.c codecs pickleable a shot in the future anyway, but I will not press very hard on this front...

@leunark

leunark commented Nov 16, 2021

I'm having the same problem: I get the "can't pickle bitstruct.c.CompiledFormatDict objects" exception when trying to use the cache for faster loading. I also have huge BLF files with a lot of messages that need to be decoded. With multiprocessing, the dbc object has to be passed as a parameter, but since it is not fully picklable with the bitstruct.c integration, I can't utilise the benefits of multiprocessing. What I've done so far is check out the repo and enforce the use of bitstruct instead of bitstruct.c by reverting this commit.

I'm really relying on the dbc object being picklable because, although bitstruct.c is faster, with bitstruct the BLF files I'm using can be decoded 50% faster. Can we just make the use of bitstruct.c or bitstruct optional?

@andlaus
Member

andlaus commented Nov 17, 2021

Can we just make the use of bitstruct.c or bitstruct optional?

bitstruct is pretty essential w.r.t. encoding and decoding messages, so making it optional is (probably) not in the cards. I agree picklability is more important than performance, though, i.e. the default should be switched to bitstruct. Mind opening a PR to that end?

@leunark

leunark commented Nov 17, 2021

Mind opening a PR to that end?

Yes, I can do that tomorrow.

@eerimoq
Collaborator

eerimoq commented Nov 18, 2021

If there is a problem with bitstruct, please open an issue on that project.

https://github.com/eerimoq/bitstruct

@andlaus
Member

andlaus commented Nov 18, 2021

done: eerimoq/bitstruct#24

@leunark

leunark commented Nov 18, 2021

Is it then still needed to open a PR to switch the default to bitstruct? If the pickleable issue for bitstruct.c gets fixed soon, this will just lead to switching the default back to bitstruct.c after updating it.
Also, maybe it is smart to integrate a test to check if can.Database remains pickleable?
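The suggested regression test could be a simple pickle round-trip; a sketch with a stand-in object, since the real test would exercise a loaded cantools Database:

```python
import pickle

def assert_picklable(obj):
    """Round-trip obj through pickle; sketch of the suggested regression test."""
    clone = pickle.loads(pickle.dumps(obj))
    assert type(clone) is type(obj)
    return clone

# In cantools' test suite this would be applied to a loaded can.Database;
# a plain dict stands in here.
assert_picklable({"TestMessage": 257})
```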

@andlaus
Member

andlaus commented Nov 18, 2021

If the pickleable issue for bitstruct.c gets fixed soon, this will just lead to switching the default back to bitstruct.c after updating it.

yes. This hopefully will not cause us any headaches, though...

Also, maybe it is smart to integrate a test to check if can.Database remains pickleable?

I agree. But for this, it first needs to become pickleable ;)

@eerimoq: Do you intend to fix this issue or shall I give it a go? (the latter will probably take substantially longer.)

@eerimoq
Collaborator

eerimoq commented Nov 19, 2021

@andlaus I'll give it a try during the weekend, so no need for you to work on it right now.

@andlaus
Member

andlaus commented Nov 22, 2021

since eerimoq/bitstruct#24 should have addressed this issue, I'm closing it. Feel free to re-open if the problem persists...

@andlaus andlaus closed this as completed Nov 22, 2021
@juleq
Collaborator

juleq commented Nov 22, 2021

@andlaus Is the revision of bitstruct pinned or will it update on the next cantools release?

@andlaus
Member

andlaus commented Nov 22, 2021

requirements.txt says it is unpinned. Could you tag a new cantools release? This issue seems to have affected more people than usual...

@juleq
Collaborator

juleq commented Nov 22, 2021

I have released. It has already propagated through for me by now.

@juleq
Collaborator

juleq commented Nov 22, 2021

bitstruct did not update when doing a pip uninstall/install of cantools. I will make another release adding the requirement bitstruct >= 8.12.1. Done.
