-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keybow 2040: Speed up RGB. #982
Conversation
c1b5c1b
to
4026aa6
Compare
Note: All my claimed "FPS" measurements don't include the processing time for the RGB effect, but the transaction time gains are significant enough for this not to matter. Before this change I counted roughly 6 I2C transactions per pixel, for a total of 96 at 100KHz, giving a 60-80ms transaction time, for an absolute best-case 16FPS. After this change, it's three I2C transactions - one to set the bank to the current frame, one to transmit the data and a final one to set the new frame (we're flipping back and forth between frame 0 and frame 1) to be visible. This takes a total of ~6ms- or a 10x speedup. I can drive my Keybow 2040 up to ~650KHz for ~200 updates/second, though I'm not going to push a change to do that based on a sample size of 1 😆 I've split this into multiple commits to hopefully - somewhat - document my reasoning. I have opted not to delete |
ff26328
to
41bc224
Compare
Replace slow RGB wrapper for is31fl3731 driver with a custom implementation which double-buffers with two frames and writes pixels in batches (or all 144 in bulk). Speeds up default RGB breath effect from ~12FPS to ~60FPS. Fixes KMKfw#859. Signed-off-by: Phil Howard <github@gadgetoid.com>
41bc224
to
9b94ff3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good in general
Did you also try the is31fl3731 driver that is built into CP?
@staticmethod | ||
def pixel_addr(x): | ||
return [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to make this a class attribute, like so:
class Keybow2040Leds(PixelBuf):
pixel_addr = (
(120, 88, 104), ...
)
# ... in _transmit
r, g, b = Keybow2040Leds.pixel_addr[x]
Semantics aside, I'd be interested if there's a noticable performance difference. I'm not sure if your version rebuilds the array for every call and thus requires dynamic memory allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good shout- this was one of the few bits I didn’t rip out of the original library but there should be some minor gains to grab here too.
I don’t actually know if it rebuilds either- I would have expected the initial Python-> bytecode step to somehow magically make this efficient, but I’ve probably been around C compilers too much. ( and I’ll be darned if I understand those either )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay back of the napkin maths suggests this is a 30% performance improvement on top of my best case performance, really nice catch!
This really gets us to a 166Hz update rate which I mistakenly reported above before realising I was running my I2C bus at 650KHz.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a pixel_addr = self._pixel_addr
before the hot loop to bring the list into the local scope offers a very tiny performance improvement, too.
Interesting, I did not know about this! Is it possible to subclass so I can handle all the weird pixel ordering? 🤔 though I guess a wrapper won’t hurt. I’d expect this to be a little faster so it’s definitely worth a try! |
No need to subclass:
|
Haha I had not read up, since I was on my phone. Looking into it now! Thank you. Edit: Ah it's only available in the "Adafruit LED Glasses Driver nRF52840" build, though that doesn't preclude it from being added to Keybow 2040 for this very purpose. I'll look into building a firmware and see if it's worth the effort- if it is I'll catch Edit 2: Okay I have had absolutely no luck getting it up and running. Managed to built it into a custom CircuitPython build, but looking over the code it doesn't seem to enable the LEDs or quite do what I'm after. A shame! |
97b7c4c
to
b254bc8
Compare
Speed up Keybow 2040 I2C further to ~5ms (200FPS) by increasing from the default 100KHz to 400KHz I2C. Since Keybow 2040 does not include a Qw'St connector, we need not worry about other I2C devices. Signed-off-by: Phil Howard <github@gadgetoid.com>
Write all 144 LED values (only 48 are used) in one transaction. This is faster with 400KHz than trying to be smart. Signed-off-by: Phil Howard <github@gadgetoid.com>
Speed up RGB a little more by skipping the first 17 and last few unused elements. Signed-off-by: Phil Howard <github@gadgetoid.com>
Create the address lookup only once for a further 30% performance boost. Co-authored-by: xs5871 <xs5871@qn-4.net> Signed-off-by: Phil Howard <github@gadgetoid.com>
Bring references to the output buffer and address lookup into the hot path, gives around a .7ms speedup by using optimised (local scope first) qstr lookups. Signed-off-by: Phil Howard <github@gadgetoid.com>
bc5f43e
to
306d123
Compare
Remove hot path conversion to bytes and drop double-buffering which had no visible impact. Reduce the size of the output buffer to remove leading 17 and trailing 4 bytes, and prime the buffer with the first LED address. Drop offset multiply and cache LED offsets in address lookup. Iterate through address lookup and unpack offset, r, g, b. Update time is now ~4ms and much more stable. Signed-off-by: Phil Howard <github@gadgetoid.com>
306d123
to
dcaaae0
Compare
Okay, despite failing to get anywhere with the C driver I've dramatically simplified the hot path and removed the double-buffering which should be doing effectively nothing (if the i2c transfer takes <4ms we're not going to see any tearing anyway). This gets us to a very stable < 4ms conversion and transfer time for a theoretical (static lighting) 250FPS. According to my very technical "using a stopwatch on my phone" tests we're now getting roughly 60FPS in the RGB fade effect. Changing from I also refactored out the multiply and then rewrote the result for clarity and brevity by iterating the address lookup and unpacking it to I think I need to stop optimising this now... 😆 I've practically optimised out all the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let's go with that.
Replace slow RGB wrapper for is31fl3731 driver with a custom implementation which double-buffers with two frames and writes pixels in batches (or all 144 in bulk).
Speeds up default RGB breath effect from ~12FPS to ~60FPS.
Fixes #859.