-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some minor performance improvements #35
Conversation
Get rid of FBInkColor, using something that matches the target pixel format directly instead, so we don't have to spend extra time repacking pixels. Mainly aimed at the non-image code, where we only ever use two different colors, so repacking those every pixel felt very stupid. Obviously has no impact on 8/4bpp, where this is not a concern. WIP, currently MINIMAL builds, and the results are... underwhelming :D. (It's veeeeeeery slightly faster, but then again, a side-effect of a MINIMAL build is that they're by default a tiiiiny bit faster, too. So, compared to a MINIMAL build from master, the lead is even harder to see).
Because I was wondering why a quick'n dirty test was twice as fast before realizing I'd hard-coded a put_pixel_*() shortcut instead of using put_pixel() ;).
Do it once at init time. That only leaves inversion to handle at runtime (which involves repacking @ 16bpp, because rgb565 is terrible).
plotting a rectangle pixel-per-pixel to handle scaling. Also fix a corner-case issue with fill_rect being fed OOB x/y values that this unearthed ;)
Unlike a switch, this appears to behave pretty well...
Doesn't change much in practice, but, hey.
Any decent compiler should have already been doing that, so this doesn't change much.
This is subtly broken @ 16bpp, because we're not packing to rgb565 when the color isn't one of our two pen colors...
Did I mention how terrible RGB565 is?
Slightly clunkier than before...
Pass the bg/fg pixel pointers directly instead of assigning them to pixel first.
As early as an A8, branching appears to be ever so slightly better.
familiar eips on legacy 600x800 devices...
Because it turned out to be extremely verbose since the fixed cell rendering switched to using fill_rect (a lot) :D.
As expected, this has almost no impact on 8bpp, because you can't beat that for simplicity ;). The main aim was 32bpp, but it turns out it noticeably helped 4bpp fbs, for some mysterious reason. |
Also uncovered some more 4bpp corner-cases that I won't bother to fix, because dealing with nibbles gives me headaches -_-". (In short: using an odd scaling factor in the fixed-cell rendering codepath will lead to artefacting on the edges, especially in overlay mode). (There are workarounds in place to deal with the same kind of issues in the OT codepath, as well as in dump/restore. AFAICT, this would be trickier to do here.). |
Forgot to rotate the coords -_-". Note that this makes it noticeably slower than before, because it's no longer using memset. On the upside, the color is now accurate ;).
where it's unused. (Essentially, flips it to false in a few 4bpp codepaths where I previously set it to true). No effective change, as this is meaningless @ 4bpp ;).
This initially started as an experiment in replacing the
FBInkColor
struct (which was basically just a triplet of uint8_t for R, G, B) with a magical union (FBInkPixel
) that can hold a pixel representation, properly packed for our target bitdepths (i.e., g8, rgb565, bgra). My train of thought being that, on non-trivial bitdepths (i.e., >= 16bpp), we were storing/using pen-colors as a single g8 value, and constantly repacking it once per pixel.It turned out that the overhead of doing that wasn't as expensive as I would have thought, so, that bit of the PR actually has a minimal impact on performance ;p.
It does make most things slightly more sensible to read (except maybe button_scan), so, I'll take that ;p.
What did noticeably affect performance is switching the scaling in the fixed-cell rendering from a naive rectangle (well, square) per-pixel plotting to using fill_rect instead, because, duh. And since fill_rect is (usually) using memset, zoom!
This uncovered a sneaky issue with fill_rect @ 16bpp, because nothing except black & white actually packs into two identical bytes @ RGB565. So, fixed that by switching to a naive pixel plotting routine if needed.
What also very, very, very moderately affected performance is switching from the Get/Put Pixel function pointer to an if ladder in get/put_pixel. Probably because of the CPU's branch predictor.
That's not the first time I did this experiment, except I usually tried it with a switch: turns out that was noticeably worse, here.