Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues when writing lots of data #79

Closed
kbeckmann opened this issue Nov 24, 2018 · 2 comments
Closed

Performance issues when writing lots of data #79

kbeckmann opened this issue Nov 24, 2018 · 2 comments
Labels
software Component: software

Comments

@kbeckmann
Copy link
Contributor

kbeckmann commented Nov 24, 2018

$ python -m glasgow.cli run benchmark sink -c 1000000
I: glasgow.applet.benchmark: running benchmark mode sink for 1.907 MiB
I: glasgow.applet.benchmark: mode sink: 1.182 MiB/s
$ python -m glasgow.cli run benchmark sink -c 5000000 
I: glasgow.applet.benchmark: running benchmark mode sink for 9.537 MiB
I: glasgow.applet.benchmark: mode sink: 0.516 MiB/s
$ python -m glasgow.cli run benchmark sink -c 10000000
I: glasgow.applet.benchmark: running benchmark mode sink for 19.073 MiB
I: glasgow.applet.benchmark: mode sink: 0.302 MiB/s

I fixed this by using arrays of bytearrays in the class DirectDemultiplexerInterface. However when I do this I get some weird errors from libusb if I send lots of data. Maybe I just found a new bug instead, not sure!

With my patch:

$ python -m glasgow.cli run benchmark sink -c 1000000 
I: glasgow.applet.benchmark: running benchmark mode sink for 1.907 MiB
I: glasgow.applet.benchmark: mode sink: 18.284 MiB/s
$ python -m glasgow.cli run benchmark sink -c 5000000
I: glasgow.applet.benchmark: running benchmark mode sink for 9.537 MiB
I: glasgow.applet.benchmark: mode sink: 18.310 MiB/s
$ python -m glasgow.cli run benchmark sink -c 10000000
I: glasgow.applet.benchmark: running benchmark mode sink for 19.073 MiB
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/konrad/dev/Glasgow/software/glasgow/cli.py", line 694, in <module>
    main()
  File "/home/konrad/dev/Glasgow/software/glasgow/cli.py", line 690, in main
    exit(loop.run_until_complete(_main()))
  File "/usr/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "/home/konrad/dev/Glasgow/software/glasgow/cli.py", line 485, in _main
    await task
  File "/home/konrad/dev/Glasgow/software/glasgow/cli.py", line 474, in run_applet
    iface = await applet.run(device, args)
  File "/home/konrad/dev/Glasgow/software/glasgow/applet/benchmark/__init__.py", line 141, in run
    await iface.write(golden)
  File "/home/konrad/dev/Glasgow/software/glasgow/access/direct/demultiplexer.py", line 114, in write
    await self._write_packet()
  File "/home/konrad/dev/Glasgow/software/glasgow/access/direct/demultiplexer.py", line 106, in _write_packet
    await self.device.bulk_write(self._endpoint_out, packet)
  File "/home/konrad/dev/Glasgow/software/glasgow/device/hardware.py", line 182, in bulk_write
    await self._do_transfer(is_read=False, setup=lambda transfer:
  File "/home/konrad/dev/Glasgow/software/glasgow/device/hardware.py", line 148, in _do_transfer
    transfer.submit()
  File "/usr/lib/python3.7/site-packages/usb1/__init__.py", line 821, in submit
    raiseUSBError(result)
  File "/usr/lib/python3.7/site-packages/usb1/__init__.py", line 125, in raiseUSBError
    raise __STATUS_TO_EXCEPTION_DICT.get(value, __USBError)(value)
usb1.USBErrorIO: LIBUSB_ERROR_IO [-1]

The patch in question:

diff --git a/software/glasgow/access/direct/demultiplexer.py b/software/glasgow/access/direct/demultiplexer.py
index e9a6cdc..bd949ed 100644
--- a/software/glasgow/access/direct/demultiplexer.py
+++ b/software/glasgow/access/direct/demultiplexer.py
@@ -63,7 +63,7 @@ class DirectDemultiplexerInterface(AccessDemultiplexerInterface):
 
         self._interface  = self.device.usb.claimInterface(self._pipe_num)
         self._buffer_in  = bytearray()
-        self._buffer_out = bytearray()
+        self._buffer_out = []
 
     async def reset(self):
         self.logger.trace("asserting reset")
@@ -102,19 +102,16 @@ class DirectDemultiplexerInterface(AccessDemultiplexerInterface):
         return result
 
     async def _write_packet(self):
-        packet = self._buffer_out[:self._out_packet_size]
-        self._buffer_out = self._buffer_out[self._out_packet_size:]
+        packet = self._buffer_out.pop(0)
         await self.device.bulk_write(self._endpoint_out, packet)
 
     async def write(self, data):
         if not isinstance(data, (bytes, bytearray)):
             data = bytes(data)
 
-        self.logger.trace("FIFO: write <%s>", data.hex())
-        self._buffer_out += data
-
-        if len(self._buffer_out) > self._out_packet_size:
-            await self._write_packet()
+        #self.logger.trace("FIFO: write <%s>", data.hex())
+        self._buffer_out.append(data)
+        await self._write_packet()
 
     async def flush(self):
         self.logger.trace("FIFO: flush")

It also helped to comment out the trace log even if I don't run with -v. Compile-time logs would probably fix this but I'm not a python-ninja so have no clue how to do that properly. (Goes from 18.x to 17.x MB/s when that line is commented out, so it's not a huge problem but still noticeable.)

@whitequark whitequark added the software Component: software label Nov 24, 2018
@whitequark
Copy link
Member

I fixed this by using arrays of bytearrays in the class DirectDemultiplexerInterface.

Nice! I was aware that there is a bottleneck somewhere but deliberately did not pursue it so far. Thanks for tracking it down!

Compile-time logs would probably fix this but I'm not a python-ninja so have no clue how to do that properly.

This has been on my plans for a while. I'm not yet sure what's the best way to do it, I think some decorator and bytecode hacking could help, but it's somewhat of an open problem.

However when I do this I get some weird errors from libusb if I send lots of data.

Yeah. So, on Linux, you cannot send a chunk more than a few MB in size (I think I got about 16 MiB in my tests) with libusb. I think it's a kernel limitation, or possibly HCI driver. I'm not sure if it's documented anywhere, I certainly haven't found one.

This is why the Python code chunks the packets right now. (Should've probably added a comment there.)

Think you could investigate alternative chunking strategies? If I was doing it, I would have added a new class, perhaps ChunkedFIFO, which would do the following:

  • Store bytes or bytearray objects provided via .push() in a FIFO structure;
  • Each time .pop() is called, locate the object at the head;
  • If it is under some "safe limit" (1 MiB?), return it directly;
  • Otherwise, return Python's "memory view" (zero-copy view into another buffer) pointing into the chunk, and track current offset.

But you're free to go with another strategy if I'm missing something.

whitequark added a commit that referenced this issue Nov 29, 2018
@kbeckmann
Copy link
Contributor Author

Sorry I had quite a busy week behind me, thanks for fixing it! It seems to work really well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
software Component: software
Development

No branches or pull requests

2 participants