Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some minor performance improvements #35

Merged
merged 33 commits into from
Jul 6, 2019
Merged

Some minor performance improvements #35

merged 33 commits into from
Jul 6, 2019

Conversation

NiLuJe
Copy link
Owner

@NiLuJe NiLuJe commented Jul 5, 2019

This initially started as an experiment in replacing the FBInkColor struct (which was basically just a triplet of uint8_t for R, G, B) with a magical union (FBInkPixel) that can hold a pixel representation, properly packed for our target bitdepths (i.e., g8, rgb565, bgra). My train of thought being that, on non-trivial bitdepths (i.e., >= 16bpp), we were storing/using pen-colors as a single g8 value, and constantly repacking it once per pixel.

It turned out that the overhead of doing that wasn't as expensive as I would have thought, so, that bit of the PR actually has a minimal impact on performance ;p.
It does make most things slightly more sensible to read (except maybe button_scan), so, I'll take that ;p.

What did noticeably affect performance is switching the scaling in the fixed-cell rendering from a naive rectangle (well, square) per-pixel plotting to using fill_rect instead, because, duh. And since fill_rect is (usually) using memset, zoom!
This uncovered a sneaky issue with fill_rect @ 16bpp, because nothing except black & white actually packs into two identical bytes @ RGB565. So, fixed that by switching to a naive pixel plotting routine if needed.

What also very, very, very moderately affected performance is switching from the Get/Put Pixel function pointer to an if ladder in get/put_pixel. Probably because of the CPU's branch predictor.
That's not the first time I did this experiment, except I usually tried it with a switch: turns out that was noticeably worse, here.

NiLuJe added 30 commits July 1, 2019 20:16
Get rid of FBInkColor, using something that matches the target pixel
format directly instead, so we don't have to spend extra time repacking
pixels.

Mainly aimed at the non-image code, where we only ever use two different
colors, so repacking those every pixel felt very stupid.

Obviously has no impact on 8/4bpp, where this is not a concern.

WIP, currently MINIMAL builds, and the results are... underwhelming :D.
(It's veeeeeeery slightly faster, but then again, a side-effect of a
MINIMAL build is that they're by default a tiiiiny bit faster, too.
So, compared to a MINIMAL build from master, the lead is even harder to
see).
Because I was wondering why a quick'n dirty test was twice as fast
before realizing I'd hard-coded a put_pixel_*() shortcut instead of
using put_pixel() ;).
Do it once at init time.

That only leaves inversion to handle at runtime (which involves
repacking @ 16bpp, because rgb565 is terrible).
plotting a rectangle pixel-per-pixel to handle scaling.

Also fix a corner-case issue with fill_rect being fed OOB x/y values
that this unearthed ;)
Unlike a switch, this appears to behave pretty well...
Doesn't change much in practice, but, hey.
Any decent compiler should have already been doing that, so this doesn't
change much.
This is subtly broken @ 16bpp, because we're not packing to rgb565 when
the color isn't one of our two pen colors...
Slightly clunkier than before...
Pass the bg/fg pixel pointers directly instead of assigning them to
pixel first.
As early as an A8, branching appears to be ever so slightly better.
Because it turned out to be extremely verbose since the fixed cell
rendering switched to using fill_rect (a lot) :D.
@NiLuJe
Copy link
Owner Author

NiLuJe commented Jul 5, 2019

As expected, this has almost no impact on 8bpp, because you can't beat that for simplicity ;).

The main aim was 32bpp, but it turns out it noticeably helped 4bpp fbs, for some mysterious reason.
16bpp remains terrible, because RGB565, but it does help a bit there, too.

@NiLuJe
Copy link
Owner Author

NiLuJe commented Jul 5, 2019

Also uncovered some more 4bpp corner-cases that I won't bother to fix, because dealing with nibbles gives me headaches -_-".

(In short: using an odd scaling factor in the fixed-cell rendering codepath will lead to artefacting on the edges, especially in overlay mode).

(There are workarounds in place to deal with the same kind of issues in the OT codepath, as well as in dump/restore. AFAICT, this would be trickier to do here.).

Forgot to rotate the coords -_-".

Note that this makes it noticeably slower than before, because it's no
longer using memset.
On the upside, the color is now accurate ;).
where it's unused.

(Essentially, flips it to false in a few 4bpp codepaths where I
previously set it to true).

No effective change, as this is meaningless @ 4bpp ;).
@NiLuJe NiLuJe merged commit dffe5b7 into master Jul 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant