Optimization for ZX to LCD pixel rendering #34
Pull Request Overview
This PR optimizes the ZX emulator's pixel rendering pipeline by implementing more efficient memory operations and reducing computational overhead when converting the ZX framebuffer to RGB565 LCD format.
Key changes:
- Replaces individual pixel writes with coalesced 32-bit writes (2x uint16_t pixels per write)
- Eliminates the inner X loop by processing pixels in pairs using bit manipulation
- Adds optimized paths for solid color rows and a 4-entry lookup table for mixed content
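As a rough illustration of the pair-wise approach described above, the sketch below expands one byte of ZX bitmap data into eight RGB565 pixels using a 4-entry table indexed by two pixel bits at a time. It is not the code from this PR: `expand_row`, `ink` and `paper` are hypothetical names, a little-endian target is assumed, and `memcpy` stands in for direct 32-bit pointer stores to keep the example alignment-safe.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: expand one ZX bitmap byte (8 pixels, MSB = leftmost)
 * into 8 RGB565 pixels at dst. Assumes a little-endian target, so the low
 * half-word of each 32-bit value is the left pixel of the pair. */
static void expand_row(uint8_t row, uint16_t ink, uint16_t paper, uint16_t *dst)
{
    /* Table index is two pixel bits: bit 1 = left pixel, bit 0 = right pixel. */
    const uint32_t lut[4] = {
        (uint32_t)paper | ((uint32_t)paper << 16), /* 00: paper, paper */
        (uint32_t)paper | ((uint32_t)ink   << 16), /* 01: paper, ink   */
        (uint32_t)ink   | ((uint32_t)paper << 16), /* 10: ink,   paper */
        (uint32_t)ink   | ((uint32_t)ink   << 16), /* 11: ink,   ink   */
    };

    /* Four lookups replace the eight-iteration X loop. */
    const uint32_t out[4] = {
        lut[(row >> 6) & 3],
        lut[(row >> 4) & 3],
        lut[(row >> 2) & 3],
        lut[row & 3],
    };

    /* Four 32-bit values = 16 bytes = 8 RGB565 pixels. */
    memcpy(dst, out, sizeof out);
}
```

Calling `expand_row(0xA5, ink, paper, dst)` on a little-endian machine fills `dst` with ink, paper, ink, paper, paper, ink, paper, ink, matching the bit pattern 10100101.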
for (int y = 0; y < 8; y++)
{
// read the value of the pixels
int screenY = attrY * 8 + y;
int col = ((attrX * 8) & B11111000) >> 3;
int scan = (screenY & B11000000) + ((screenY & B111) << 3) + ((screenY & B111000) >> 3);
uint8_t row = *(pixelBase + 32 * scan + col);
uint8_t rowCopy = *(pixelBaseCopy + 32 * scan + col);
int screenY = attrY * 8;
The calculation of `screenY` inside the loop is inefficient since it's constant for all iterations. Consider moving this calculation outside the loop as it was in the original code (`screenY = attrY * 8 + y`).
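For context, the `scan` expression in the excerpt above reorders the bits of the screen line number to match the ZX Spectrum's interleaved bitmap layout: the top two bits select the screen third, the low three bits (pixel row within a character cell) move up, and the middle three bits (character row within the third) move down. A minimal sketch of that mapping, with hypothetical function names `zx_scan_index` and `zx_byte_offset`:

```c
#include <stdint.h>

/* Hypothetical helper: map a screen line (0..191) to the index of its
 * 32-byte row in ZX bitmap memory, using the same bit shuffle as the
 * `scan` expression in the excerpt. */
static int zx_scan_index(int screenY)
{
    return (screenY & 0xC0)          /* bits 7-6: screen third, unchanged     */
         | ((screenY & 0x07) << 3)   /* bits 2-0: pixel row in cell -> 5-3    */
         | ((screenY & 0x38) >> 3);  /* bits 5-3: character row -> bits 2-0   */
}

/* Byte offset from the start of the bitmap for a line and column (0..31). */
static int zx_byte_offset(int screenY, int col)
{
    return 32 * zx_scan_index(screenY) + col;
}
```

For example, line 1 lands 256 bytes after line 0, while line 8 (the first line of the next character row) lands only 32 bytes after line 0.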
*d32++ = u32Clr;
*d32++ = u32Clr;
*d32++ = u32Clr;
pixelAddress += 8;
The `pixelAddress` increment is unnecessary and potentially incorrect. The pointer arithmetic already advances `d32` by 4 positions (16 bytes = 8 uint16_t elements), so incrementing `pixelAddress` by 8 additional elements may cause buffer overrun or incorrect positioning for subsequent operations.
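Whether the increment is redundant depends on how the two pointers relate, which the excerpt does not show. The sketch below illustrates one possible arrangement, under the assumption that the 32-bit cursor is a local alias and `pixelAddress` is a separate `uint16_t` cursor. In that case advancing the local cursor does not move `pixelAddress`, so a separate step of 8 pixels keeps the two in sync. `fill_solid8` is a hypothetical name, and a byte cursor with `memcpy` stands in for the PR's `d32` stores.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill 8 RGB565 pixels with one colour using four
 * 32-bit stores, then return the advanced uint16_t cursor. */
static uint16_t *fill_solid8(uint16_t *pixelAddress, uint16_t colour)
{
    /* Two identical pixels packed into one 32-bit value. */
    const uint32_t u32Clr = (uint32_t)colour | ((uint32_t)colour << 16);

    /* Local cursor (standing in for d32); moving it leaves pixelAddress alone. */
    unsigned char *d = (unsigned char *)pixelAddress;
    for (int i = 0; i < 4; i++) {
        memcpy(d, &u32Clr, sizeof u32Clr);
        d += sizeof u32Clr;
    }

    /* 4 stores x 4 bytes = 16 bytes = 8 uint16_t pixels. */
    return pixelAddress + 8;
}
```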
*d32++ = u32Clr;
*d32++ = u32Clr;
*d32++ = u32Clr;
pixelAddress += 8;
The `pixelAddress` increment is unnecessary and potentially incorrect. The pointer arithmetic already advances `d32` by 4 positions (16 bytes = 8 uint16_t elements), so incrementing `pixelAddress` by 8 additional elements may cause buffer overrun or incorrect positioning for subsequent operations.
I'll do some local testing and get this approved. There are a couple of things flagged up by Copilot, but it might not understand the code properly.
The things it flagged are not important. Copilot is not useful for this type of change.
Tested locally - it's a measurable improvement! Great work - thank you!
This is a great improvement.
This change improves the "pixel pipeline" speed of the ZX emulator by reducing the number of instructions needed to convert the ZX framebuffer into RGB565 LCD format. The changes include coalesced writes (2 x uint16_t into 1 uint32_t), removal of the X loop, removal of conditional tests and branches, and the addition of a 4-entry lookup table.