Skip to content

Commit

Permalink
Also fix RBGA4 (untested)
Browse files Browse the repository at this point in the history
  • Loading branch information
AuroraWright committed Sep 23, 2017
1 parent a39adc8 commit f27cdb4
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions sysmodules/rosalina/source/draw.c
Original file line number Diff line number Diff line change
Expand Up @@ -246,9 +246,9 @@ static inline void Draw_ConvertPixelToBGR8(u8 *dst, const u8 *src, GSPGPU_Frameb
case GSP_RGBA4_OES:
{
u16 px = *(u32 *)src;
blue = px & 0xF;
green = (px >> 4) & 0xF;
red = (px >> 8) & 0xF;
blue = (px >> 4) & 0xF;
green = (px >> 8) & 0xF;
red = (px >> 12) & 0xF;

dst[0] = (blue << 4) | (blue >> 4);
dst[1] = (green << 4) | (green >> 4);
Expand Down

5 comments on commit f27cdb4

@Metzelmaennchen
Copy link

@Metzelmaennchen Metzelmaennchen commented on f27cdb4 Sep 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear Aurora,
What's this ;)
blue = (px >> 4) & 0xF;
dst[0] = (blue << 4) | (blue >> 4);
The cast to 0xF keeps only the lower 4 bits. So the second term, shifting right by four, leaves always zero.
As a result, the darker colors will never show up. Okay color complete black does.
It's the same as dst[0] = (blue << 4);
I currently don't know for what dst is used on return of function but I assume that this is a conversion from 4b to 8b. In my opinion the colors should be scaled. Devide max 8b with max 4b. We do 255/15=17:
dst[x] = (color * 17);

@9ary
Copy link
Contributor

@9ary 9ary commented on f27cdb4 Sep 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've researched this when I fixed the RGB5_A1 format, it turns out that if you scale the colors linearly, the lower bits are the same as the upper bits, more or less 1. This is accurate enough, it's faster than multiplication*, and it does not pointlessly waste space like a lookup table would.
You are right though, the bits should not be shifted to the right by 4, but by 0. That is definitely a mistake. I'll make a PR next time I compile Luma on my side.

* Multiplying by a fixed step is wrong, you need to do a float division, then multiplication, and then go back to integer. Or you can use wider integers as intermediates, multiply first and divide second, but ARM11 doesn't even have a divide instruction, so that path is also slow.

@Metzelmaennchen
Copy link

@Metzelmaennchen Metzelmaennchen commented on f27cdb4 Oct 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me, I would like to know whats wrong with my thoughts. First float division, then multiplication:
255.0f
------- * (float)color = outColor;
15.0f

is just the same as doing
17 * color = outColor;

Btw. is something like NEON supported on the 3DS ?

@profi200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No NEON. Just vfpv2 (hardware floating point). Using float would not make a difference in that code. It is already accounting for precision loss.

@9ary
Copy link
Contributor

@9ary 9ary commented on f27cdb4 Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is just the same as doing
17 * color = outColor;

This happens to be true for 4 bits, but not for 5 or 3, where the multiplier is not an integer. In all cases, using the upper bits as the lower ones happens to be a very good approximation (off by one at most like I said), and with 4 bits, it's always exact.

Please sign in to comment.