Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keybow 2040: Speed up RGB. #982

Merged
merged 7 commits into from
Jun 12, 2024
Merged

Conversation

Gadgetoid
Copy link
Contributor

Replace slow RGB wrapper for is31fl3731 driver with a custom implementation which double-buffers with two frames and writes pixels in batches (or all 144 in bulk).

Speeds up default RGB breath effect from ~12FPS to ~60FPS.

Fixes #859.

@Gadgetoid Gadgetoid force-pushed the keybow2040/fast-leds branch 2 times, most recently from c1b5c1b to 4026aa6 Compare June 9, 2024 22:32
@Gadgetoid
Copy link
Contributor Author

Note: All my claimed "FPS" measurements don't include the processing time for the RGB effect, but the transaction time gains are significant enough for this not to matter.

Before this change I counted roughly 6 I2C transactions per pixel, for a total of 96 at 100KHz, giving a 60-80ms transaction time, for an absolute best-case 16FPS.

After this change, it's three I2C transactions - one to set the bank to the current frame, one to transmit the data and a final one to set the new frame (we're flipping back and forth between frame 0 and frame 1) to be visible. This takes a total of ~6ms- or a 10x speedup.

I can drive my Keybow 2040 up to ~650KHz for ~200 updates/second, though I'm not going to push a change to do that based on a sample size of 1 😆

I've split this into multiple commits to hopefully - somewhat - document my reasoning.

I have opted not to delete is31fl3731_pixelbuf.py since it still works, but it's effectively redundant. Let me know if I should do that.

Replace slow RGB wrapper for is31fl3731 driver with a custom implementation
which double-buffers with two frames and writes pixels in batches (or all 144 in bulk).

Speeds up default RGB breath effect from ~12FPS to ~60FPS.

Fixes KMKfw#859.

Signed-off-by: Phil Howard <github@gadgetoid.com>
Copy link
Collaborator

@xs5871 xs5871 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good in general

Did you also try the is31fl3731 driver that is built into CP?

Comment on lines 65 to 67
@staticmethod
def pixel_addr(x):
return [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to make this a class attribute, like so:

class Keybow2040Leds(PixelBuf):
    pixel_addr = (
        (120, 88, 104), ...
    )
    # ... in _transmit
    r, g, b = Keybow2040Leds.pixel_addr[x]

Semantics aside, I'd be interested if there's a noticable performance difference. I'm not sure if your version rebuilds the array for every call and thus requires dynamic memory allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good shout- this was one of the few bits I didn’t rip out of the original library but there should be some minor gains to grab here too.

I don’t actually know if it rebuilds either- I would have expected the initial Python-> bytecode step to somehow magically make this efficient, but I’ve probably been around C compilers too much. ( and I’ll be darned if I understand those either )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay back of the napkin maths suggests this is a 30% performance improvement on top of my best case performance, really nice catch!

This really gets us to a 166Hz update rate which I mistakenly reported above before realising I was running my I2C bus at 650KHz.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a pixel_addr = self._pixel_addr before the hot loop to bring the list into the local scope offers a very tiny performance improvement, too.

@Gadgetoid
Copy link
Contributor Author

Did you also try the is31fl3731 driver that is built into CP?

Interesting, I did not know about this! Is it possible to subclass so I can handle all the weird pixel ordering? 🤔 though I guess a wrapper won’t hurt. I’d expect this to be a little faster so it’s definitely worth a try!

@xs5871
Copy link
Collaborator

xs5871 commented Jun 11, 2024

No need to subclass:

Parameters: [...] mapping (Tuple[int, ...]) – mapping of matrix locations to LEDs

@Gadgetoid
Copy link
Contributor Author

Gadgetoid commented Jun 11, 2024

Haha I had not read up, since I was on my phone. Looking into it now! Thank you.

Edit: Ah it's only available in the "Adafruit LED Glasses Driver nRF52840" build, though that doesn't preclude it from being added to Keybow 2040 for this very purpose. I'll look into building a firmware and see if it's worth the effort- if it is I'll catch ImportError and fall back to my Python version.

Edit 2: Okay I have had absolutely no luck getting it up and running. Managed to built it into a custom CircuitPython build, but looking over the code it doesn't seem to enable the LEDs or quite do what I'm after. A shame!

@Gadgetoid Gadgetoid force-pushed the keybow2040/fast-leds branch 2 times, most recently from 97b7c4c to b254bc8 Compare June 11, 2024 22:57
Gadgetoid and others added 5 commits June 11, 2024 23:59
Speed up Keybow 2040 I2C further to ~5ms (200FPS) by increasing from the default 100KHz to 400KHz I2C.

Since Keybow 2040 does not include a Qw'St connector, we need not worry about other I2C devices.

Signed-off-by: Phil Howard <github@gadgetoid.com>
Write all 144 LED values (only 48 are used) in one transaction.

This is faster with 400KHz than trying to be smart.

Signed-off-by: Phil Howard <github@gadgetoid.com>
Speed up RGB a little more by skipping the first 17 and last few unused elements.

Signed-off-by: Phil Howard <github@gadgetoid.com>
Create the address lookup only once for a further 30% performance boost.

Co-authored-by: xs5871 <xs5871@qn-4.net>
Signed-off-by: Phil Howard <github@gadgetoid.com>
Bring references to the output buffer and address lookup into the hot path,
gives around a .7ms speedup by using optimised (local scope first) qstr lookups.

Signed-off-by: Phil Howard <github@gadgetoid.com>
@Gadgetoid Gadgetoid force-pushed the keybow2040/fast-leds branch 3 times, most recently from bc5f43e to 306d123 Compare June 11, 2024 23:56
Remove hot path conversion to bytes and drop double-buffering which had no visible impact.

Reduce the size of the output buffer to remove leading 17 and trailing 4 bytes, and prime
the buffer with the first LED address.

Drop offset multiply and cache LED offsets in address lookup.

Iterate through address lookup and unpack offset, r, g, b.

Update time is now ~4ms and much more stable.

Signed-off-by: Phil Howard <github@gadgetoid.com>
@Gadgetoid
Copy link
Contributor Author

Okay, despite failing to get anywhere with the C driver I've dramatically simplified the hot path and removed the double-buffering which should be doing effectively nothing (if the i2c transfer takes <4ms we're not going to see any tearing anyway).

This gets us to a very stable < 4ms conversion and transfer time for a theoretical (static lighting) 250FPS. According to my very technical "using a stopwatch on my phone" tests we're now getting roughly 60FPS in the RGB fade effect.

Changing from i2c.write(bytes([_COLOR_OFFSET + 17]) + self.out_buffer[17:140]) to i2c.write(out) using some classic tricks (priming the buffer with the correct address, and sizing it correctly) also dramatically reduces the variation I was seeing in the update times bringing it to ~.1ms deviation where previously I was seeing up to 1ms.

I also refactored out the multiply and then rewrote the result for clarity and brevity by iterating the address lookup and unpacking it to o, r, g, b.

I think I need to stop optimising this now... 😆 I've practically optimised out all the code.

Copy link
Collaborator

@xs5871 xs5871 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's go with that.

@xs5871 xs5871 merged commit d223935 into KMKfw:main Jun 12, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] keybow2040: RGB causing infinite loop
2 participants