
Rename PixelFormat enum members BGR555 and BGR565 to RGB555 and RGB565 #2823

Merged
merged 2 commits into main from wd/rgb on Sep 7, 2023

Conversation

weirddan455
Collaborator

These were mis-named. Red is stored in the high bits and blue is stored in the low bits.

This is just a re-name. Nothing functional changes in this PR.

I found this because colors were wrong in the demo scene stuff @johnnovak sent me. I was mapping these to FFmpeg's BGR enums when really they're stored as RGB (and a simple swap to the right FFmpeg enums fixed it).

Also, if you look at where these are decoded for image capturing, they get mapped to RGB helpers (not BGR):

case PixelFormat::BGR555: {
    const auto p = host_to_le(
            *reinterpret_cast<const uint16_t*>(pos));
    pixel = Rgb555(p).ToRgb888();
} break;
case PixelFormat::BGR565: {
    const auto p = host_to_le(
            *reinterpret_cast<const uint16_t*>(pos));
    pixel = Rgb565(p).ToRgb888();
} break;

@weirddan455 weirddan455 added the plumbing Issues related to low-level support functions and classes label Sep 3, 2023
@johnnovak
Member

johnnovak commented Sep 3, 2023

Yeah, so you've just described BGR555 and BGR565 in your comments 😄 BGR in the PixelFormat enums refers to the memory layout, not the register layout!

See here:

[screenshot: pixel format memory layout diagram from the Microsoft article referenced below]

(Reference: https://learn.microsoft.com/en-us/archive/msdn-magazine/2008/june/foundations-bitmaps-and-pixel-bits)


Looking at rgb555.h and rgb565.h, the memory layout of the data is BGR555 and BGR565, respectively. But it gets murky what people mean by BGR vs RGB in APIs, whether it's register or memory layout, I'll give you that.

I named these PixelFormat enums from the perspective of how the bitmap data is laid out in memory; maybe we should make that distinction clear. So if I have a bitmap image as a memory blob, tagged with a PixelFormat enum, all I care about is how the data is laid out in memory! Register layout is irrelevant there (but you're correct, knowing both is important for the code to work correctly, absolutely). The memory layout is important to know because this info is needed so you can decode the blob of data into pixels, in whatever register layout you want or need.

Yeah, and Rgb555, Rgb666 and Rgb888 are named fine as they are. They represent "RGB values", and that's it. Storage in memory or on disk is different. I think you're conflating/confusing these two concepts. I.e., Rgb555 could have support to read an RGB555 value from memory assuming either RGB555 or BGR555 memory layouts (or little and big-endian, but fortunately DOSBox only uses little-endian memory layout).

Happy to be proven wrong, but memory dumps please to prove it 😄

@johnnovak
Member

johnnovak commented Sep 3, 2023

Ok, this Intel doco is a bit more "definitive" than the Microsoft one I linked above 😄

https://www.intel.com/content/www/us/en/docs/ipp/developer-reference/2021-7/rgb-image-formats.html

Note the term "memory layout"! The OpenGL pixel formats also always refer to memory layout. IMO, that's the sane way... SDL seems to use register-layout for its pixel formats, which I quite dislike.

[screenshot: RGB image format memory layout table from the Intel page linked above]

This seems pretty sane to me; this is ultimately what I want to represent in PixelFormat: memory layouts. So I think I got it right, but please provide memory dumps of 15/16/24/32-bit hi/true-colour bitmap data, then we can make sure it 100% aligns with the above table.

Also, to be clear, my understanding is that DOSBox always stores the pixel data as little-endian, so the low-order byte is at the lower memory location. So:

byte 0: 1st BGR565 pixel, low-order byte
byte 1: 1st BGR565 pixel, high-order byte
byte 2: 2nd BGR565 pixel, low-order byte
byte 3: 2nd BGR565 pixel, high-order byte
...

Maybe @GranMinigun can tell me if I got everything completely backwards, but I doubt it 😄

@weirddan455
Collaborator Author

weirddan455 commented Sep 3, 2023

OK, I haven't fully read everything you wrote out there (I'm about to) but I'm just going to put this here. This comment is grabbed from the FFmpeg header and this is how they describe the formats:

    AV_PIX_FMT_RGB565BE,  ///< packed RGB 5:6:5, 16bpp, (msb)   5R 6G 5B(lsb), big-endian
    AV_PIX_FMT_RGB565LE,  ///< packed RGB 5:6:5, 16bpp, (msb)   5R 6G 5B(lsb), little-endian
    AV_PIX_FMT_RGB555BE,  ///< packed RGB 5:5:5, 16bpp, (msb)1X 5R 5G 5B(lsb), big-endian   , X=unused/undefined
    AV_PIX_FMT_RGB555LE,  ///< packed RGB 5:5:5, 16bpp, (msb)1X 5R 5G 5B(lsb), little-endian, X=unused/undefined

    AV_PIX_FMT_BGR565BE,  ///< packed BGR 5:6:5, 16bpp, (msb)   5B 6G 5R(lsb), big-endian
    AV_PIX_FMT_BGR565LE,  ///< packed BGR 5:6:5, 16bpp, (msb)   5B 6G 5R(lsb), little-endian
    AV_PIX_FMT_BGR555BE,  ///< packed BGR 5:5:5, 16bpp, (msb)1X 5B 5G 5R(lsb), big-endian   , X=unused/undefined
    AV_PIX_FMT_BGR555LE,  ///< packed BGR 5:5:5, 16bpp, (msb)1X 5B 5G 5R(lsb), little-endian, X=unused/undefined

As you can see, they are describing register layout plus endianness (there's also a macro not shown here that maps to native endianness).

I've seen formats like RGBA be described as both (something like RGBA32 to describe a 32-bit register and RGBA8888 to describe 4 bytes laid out in memory).

However, since these formats are not 8 bit aligned on colors, it doesn't really make sense to describe it that way. You're always loading in a 16 bit word.

But yeah, let me finish reading the sources you linked and I'll get a memory dump.

@johnnovak
Member

However, since these formats are not 8 bit aligned on colors, it doesn't really make sense to describe it that way. You're always loading in a 16 bit word.

I'm open to using a different PixelFormat interpretation. But to me the most useful thing to do is this: I have an array of bytes; what does it represent? I want PixelFormat to describe that with zero ambiguity, so I know how to decode the byte array. So that's why I'm leaning towards doing what the Intel doco says (and I think I've done that already; that's how it is now).

@johnnovak
Member

johnnovak commented Sep 3, 2023

Ok, so this is my preference:

  • PixelFormat describes memory layout. If I screwed it up, it needs to be fixed so we use the same definitions as the Intel doco.
  • Rgb555, Rgb666, and Rgb888 return values in the desired register layout. So by default if you get the value as a uint16_t from Rgb555, it will give you RGB555 register layout.
  • This is actually how I think it all works now. But even if we don't need to change anything code wise, these details must be documented in a bit more detail.
  • Ideally, Rgb555 et al. should have something called Rgb555.DecodeFrom(...) which takes a PixelFormat and a memory pointer to a byte array or something (a rough sketch follows below). So this would involve moving stuff from my image_decode.h into the Rgb* classes, which is arguably the more correct way, as pixel format decoding from memory (raw byte arrays) should be reusable.

I think this would be the most correct way, without requiring the reader of the code to "just know" things or guess them. I just want to make all this explicit; you should be able to understand it in 2 minutes by reading the comments and the code.
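
To make the DecodeFrom() idea concrete, here's a minimal sketch. The stand-in enum, names and signatures are all hypothetical, not the actual DOSBox Staging API:

#include <cstdint>

// Hypothetical sketch of the proposed Rgb555::DecodeFrom() idea only.
enum class PixelFormat { BGR555 /* , ... */ };

struct Rgb555 {
    uint8_t red   = 0;
    uint8_t green = 0;
    uint8_t blue  = 0;

    // Decode one pixel from a raw byte array laid out according to 'format'.
    static Rgb555 DecodeFrom(const PixelFormat format, const uint8_t* pos)
    {
        Rgb555 out = {};
        if (format == PixelFormat::BGR555) {
            // The pixel is a 16-bit value; assemble it from the two
            // bytes so the read is alignment-safe.
            const auto p = static_cast<uint16_t>(pos[0] | (pos[1] << 8));
            out.red   = static_cast<uint8_t>((p >> 10) & 0x1f);
            out.green = static_cast<uint8_t>((p >> 5) & 0x1f);
            out.blue  = static_cast<uint8_t>(p & 0x1f);
        }
        return out;
    }
};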

@weirddan455
Collaborator Author

OK, I have read everything. No memory dump yet. I'm actually not sure how to do that. I would ideally want a structured image (like a pure blue image followed by a pure red image or something). I could maybe write a DOS executable to do that but I'm not doing that tonight 😆

So here's why I think you're wrong about memory order for this particular format.

Pretend you have a 16 bit register here:

bits:
high byte            low byte
0 1 2 3 4 5 6 7      8 9 10 11 12 13 14 15
R R R R R G G G      G G G  B  B  B  B  B

Now when you go to write that register out to memory on a little endian machine you get this:

bits:
low byte                high byte
8 9 10 11 12 13 14 15   0 1 2 3 4 5 6 7
G G G  B  B  B  B  B    R R R R R G G G

So the layout in memory is GBRG? It doesn't make sense to describe it in this way. You also can't treat these formats as an "array of bytes" (well technically you could but it would be harder than it needs to be since the green channel is split between two bytes). You have to treat them as an array of uint16_t.

The Intel and Microsoft sources you linked are kind of vague and it's only making me more confused by reading them.

However, we're definitely treating these as uint16_t in these helper functions.

static constexpr uint16_t r5_mask = 0b1111'1000'0000'0000;
static constexpr uint16_t g6_mask = 0b0000'0111'1110'0000;
static constexpr uint16_t b5_mask = 0b0000'0000'0001'1111;
static constexpr uint8_t r5_offset = 11;
static constexpr uint8_t g6_offset = 5;
static constexpr uint8_t b5_offset = 0;
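
For example, decoding with those masks only works on the whole 16-bit value, read from memory as a little-endian word; something like this sketch (pos being a pointer to the pixel's two bytes):

// Sketch: decode one RGB565 pixel using the masks/offsets above.
// Reassembling the two bytes into a uint16_t first is what makes the
// R/G/B fields contiguous again -- byte by byte they are not.
const auto p = static_cast<uint16_t>(pos[0] | (pos[1] << 8));

const auto r5 = static_cast<uint8_t>((p & r5_mask) >> r5_offset); // 5 red bits
const auto g6 = static_cast<uint8_t>((p & g6_mask) >> g6_offset); // 6 green bits
const auto b5 = static_cast<uint8_t>((p & b5_mask) >> b5_offset); // 5 blue bits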

@johnnovak
Member

The Intel and Microsoft sources you linked are kind of vague and it's only making me more confused by reading them.

Um, they're super exact 😄

Now when you go to write that register out to memory on a little endian machine you get this:

bits:
low byte                high byte
8 9 10 11 12 13 14 15   0 1 2 3 4 5 6 7
G G G  B  B  B  B  B    R R R R R G G G

I don't see where the confusion lies 😄

@weirddan455
Collaborator Author

weirddan455 commented Sep 3, 2023

It’s vague in that it doesn’t say if that’s memory layout or register layout. I’m assuming register layout which means memory layout is actually GBRG when you put the low byte first.

I’m going to have to find or write a DOS program to output some simple colors in these formats to prove it to you…

@johnnovak
Member

It’s vague in that it doesn’t say if that’s memory layout or register layout. I’m assuming register layout which means memory layout is actually GBRG when you put the low byte first.


😄

Maybe get some sleep and go through the Intel doco again with fresh eyes 😄

I’m going to have to find or write a DOS program to output some simple colors in these formats to prove it to you…

I think that's for the best anyway. But you could just use some image viewer like QPV to display some test image in various hi/true-colour modes, then just dump the contents of the image data that gets passed down to the image capturer, for example. I'll prepare a pack with QPV for you.

But don't get me wrong, I'm not overly hung up on this and I don't wanna treat that Intel doco as some holy scripture that we must follow 😄 The way I've described things makes sense to me, but ultimately, it's gonna be you and me who are going to work with various pixel formats in the foreseeable future. So I'm happy to simplify this and change the descriptions to something that makes sense for both of us.

I think your idea wasn't bad to simplify it, e.g. just say that RGB555 and RGB565 are stored as little-endian uint16s, and when you read them as such, you'll end up with this register bit-layout. That's actually simpler than what I proposed a few posts above. But then let's document that as explicitly as we can (I could've documented my assumptions a bit better, admittedly), also make it clear in the Rgb* classes what they expect & return. Then we're done with it for good 😎

I just want comments that describe this all, so if I touch this code again in 6 months, I'll know how things are in 2 minutes.
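
For illustration, the kind of per-member comment I have in mind could look something like this (wording is only a sketch, not the final documentation):

enum class PixelFormat {
    // One byte per pixel; an index into the 256-entry palette.
    Indexed8,

    // One uint16_t per pixel. Read as a uint16_t, the register layout is
    // (msb) 1X 5R 5G 5B (lsb), where X is unused -- i.e. RGB555.
    BGR555,

    // One uint16_t per pixel. Read as a uint16_t, the register layout is
    // (msb) 5R 6G 5B (lsb) -- i.e. RGB565.
    BGR565,
};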

@johnnovak
Member

johnnovak commented Sep 3, 2023

@weirddan455 Image viewer pack. Read dosbox.conf for instructions.

Quick guide:

  • Turn off auto-resolution with #, then you can select the screen mode with the + and - keys
  • One quirk of the program is that once you enter a 32k (15-bit) mode, you can only switch between other 32k modes with + and - while the image is being displayed. Same deal with 16-bit, 24-bit, 16/256-colour modes—you can only switch between modes of the same colour depth once the image is displayed.
  • There is no way to force displaying paletted images as high or true colour! So you need a high or true colour image; there is one example in the PICTURES\16M dir.
  • Alt-X quits the program.

QPV.zip

EDIT: Actually, here's a much simpler idea:

  1. Modify the image capturer that it also writes the raw image byte array to disk.
  2. Run Zilog that can use 15/16/24/32bpp modes.
  3. Create a few screenshots with default_image_capture_formats set to raw. So we'll have the standard raw capture in a PNG, plus our byte array written to disk. If the first 1 or 2 pixels are not black in the raw capture, that's all we need—we note down the hex RGB codes of those 1-2 pixels.
  4. Open the dumped raw byte array in a hex editor, check the first few bytes, and compare them against our colours.
  5. Profit 😎

@weirddan455
Collaborator Author

Thanks for the QPV pack. I created 3 .png images: one solid red, one green, one blue. I then opened them in QPV, which displayed them in the "BGR565" pixel format as named by DOSBox. I then zoomed in on the image and took a screenshot with your raw image capturer (modified, as you said, to also dump the raw bytes to disk).

Here they are in a hex editor. Note the "Binary" box at the bottom right. That's the easiest to visualize IMO:

Green: Occupying the first and last bits of the 16-bit word in memory

[hex editor screenshot: green]

Blue: Laid out in memory right after the first half of green

[hex editor screenshot: blue]

Red: Laid out in memory right after blue (followed by the last half of green)

[hex editor screenshot: red]

@weirddan455
Collaborator Author

This is all I did to make the raw dump. Inside your SaveRawImage function:

std::string filename = "/home/daniel/.config/dosbox/capture/bytes" + std::to_string(++byte_dump_index);
FILE *byte_dump = fopen(filename.c_str(), "wb");
if (byte_dump) {
	fwrite(image.image_data, image.pitch, raw_image_height, byte_dump);
	fclose(byte_dump);
}

And here's the actual files if you would like to examine them yourself. Raw bytes plus the .png generated by your raw image capturer.

dump.zip

@weirddan455
Collaborator Author

So in conclusion, it's laid out in memory exactly as I thought. It's G(3 bits) + B(5 bits) + R(5 bits) + G(3 bits). A weird-ass memory order that doesn't make any sense as an "array of bytes" but does make sense if you're on a little-endian machine reading and writing 16-bit words. It simply does not make sense to describe this format in memory order.

When you load that into a 16 bit register it is R(5 bits) + G(6 bits) + B(5 bits). RGB, not BGR.
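
As a quick sanity check against the green dump (just an illustration; the value is the pure-green RGB565 example):

// A pure-green RGB565 pixel is 0x07E0 as a 16-bit value (R=0, G=63, B=0).
// Stored low byte first, the green bits straddle the two bytes:
//
//   byte 0: 0xE0 = 0b1110'0000  ->  low 3 green bits, then the 5 blue bits
//   byte 1: 0x07 = 0b0000'0111  ->  5 red bits, then the high 3 green bits
//
// which is exactly the G+B / R+G pattern visible in the hex dump above.
const uint16_t green = 0x07E0;
const auto byte0 = static_cast<uint8_t>(green & 0xff);        // 0xE0
const auto byte1 = static_cast<uint8_t>((green >> 8) & 0xff); // 0x07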

@johnnovak
Member

Thanks for investigating this, @weirddan455. Now there's zero uncertainty about it 😄

So in conclusion, it's laid out in memory exactly as I thought. It's G(3 bits) + B(5 bits) + R(5 bits) + G(3 bits). A weird-ass memory order that doesn't make any sense as an "array of bytes" but does make sense if you're on a little-endian machine reading and writing 16-bit words. It simply does not make sense to describe this format in memory order.

I think it still does, to retain consistency with the 24-bit and 32-bit pixel formats, which are byte-based and for those you must describe memory order. And that is exactly what other sources like the Intel docs do. Otherwise with your proposal we'd have register layout assuming LE for 16-bit pixel format descriptors, and memory layout for the 24 and 32-bit descriptors. I really don't want that.

Check the Microsoft image again. The first image exactly describes this memory layout we're dealing with here, and it's BGR555 because

  • the first byte in memory layout contains all the blue bits, and half the green bits, so the first byte is B(G)
  • the second byte in memory layout contains half the green bits and all the red bits, so the second byte is (G)R

So you write them next to each other in memory layout: B(G) (G)R -- BGR! There you go! 😎

This might seem roundabout, but it made sense to me when I looked into it back then; it's just that yesterday you confused me 😄

The Intel doco describes the same thing, of course. I think what confused you was that the Intel doco puts the low-order byte in the second column. So you have to flip the two columns and that gives you the memory layout as you'd see in a hex editor.

So if we do it like these other standards, the 16-bit and 24/32-bit pixel format descriptions are kept consistent and always describe memory layout. People familiar with these standards will understand what they're dealing with. But we should also put the exact memory layout bit patterns there in the comments; that was my omission and that would've saved this whole exercise 😄

Similarly, we need to document the Rgb* classes better and, like I said, decoding memory-layout byte streams into their internal representations should be moved into those classes. Then the register layout used by those classes should be documented as well.

Then everybody would understand everything 😄 Even if someone like you is initially confused about BGR555/565, the bit layout description of the PixelFormat enums will make things crystal clear. Same for the Rgb* classes.

If you're worried about performance, don't 😄 With proper use of inline keywords and host_to_le, the "no conversion" path from memory to ffmpeg should become a no-op after compiler optimisations.

If any of this is unclear, that's fine, but then I'll knock this out in an hour and raise a PR for it myself -- more efficient that way 😄

@weirddan455
Collaborator Author

I think it still does, to retain consistency with the 24-bit and 32-bit pixel formats, which are byte-based and for those you must describe memory order.

You can't really because they are read in different ways. Take BGR888 for example. That format must be read 1 byte at a time because the entire pixel is too large for a 16 bit word and too small for a 32 bit word. So we must specify this in the memory order.

RGB565 must be read one 16-bit word at a time because the individual colors are not byte-aligned. Now, ideally you would call this format RGB16 or something to signify it is laid out (msb)R->G->B(lsb) in a 16-bit word. The only reason we can't do that is because we have to differentiate between RGB565 and RGB555.

On the other hand BGRX8888 can be read in either way and I've seen APIs support both. You can have, say BGRX32LE (read in a 32 bit word in little endian format) or BGRX8888 (read in 4 individual bytes in this memory order). Either one is fine because it's both byte aligned on the channels and fits into a single 32 bit word.

Otherwise with your proposal we'd have register layout assuming LE for 16-bit pixel format descriptors, and memory layout for the 24 and 32-bit descriptors. I really don't want that.

We already have that! The 16-bit formats must assume little endian (or native endian, I don't know how these get laid out in memory on big endian). You're trying to assume 16-bit formats follow memory order but they don't. And they never have. And they never will.

So you write them next to each other in memory layout: B(G) (G)R -- BGR! There you go! 😎

Sorry, but that's some backwards-ass logic IMO (literally, it's reading the bits backwards). I re-read the Intel and Microsoft docs and, yes, they seem to be describing our formats as BGR. I think that is wrong and misleading. BGR describes neither the layout in memory nor the layout in register.

In memory, it is (G)B R(G). The green bits are not contiguous. In register, it is RGB. The FFmpeg enums call our format RGB and I think that is much clearer because it actually describes the register layout correctly.

If you're worried about performance, don't 😄 With proper use of inline keywords and host_to_le, the "no conversion" path from memory to ffmpeg should become a no-op after compiler optimisations.

I'm not talking about performance. This is just about clarity. Plus, I'm not even using these helper functions for FFmpeg. FFmpeg has its own conversion routine that converts from any arbitrary format I give it into YUV color space. I don't need to do any intermediate conversion to RGB888.

@johnnovak
Member

Sorry, but that's some backwards-ass logic IMO (literally, it's reading the bits backwards). I re-read the Intel and Microsoft docs and, yes, they seem to be describing our formats as BGR. I think that is wrong and misleading. BGR describes neither the layout in memory nor the layout in register.

In memory, it is (G)B R(G). The green bits are not contiguous. In register, it is RGB. The FFmpeg enums call our format RGB and I think that is much clearer because it actually describes the register layout correctly.

I get that, and it's a bit backwards. I wanted consistency across pixel formats so all use memory layout, that's all.

Your proposal for treating RGB555/565 as little-endian uint16s makes sense; my only problem was/is the lack of consistency, because then the 24/32-bit formats describe memory layout and the 15/16-bit formats describe register layout.

I just thought this BGR555/565 thing was a standard, given I found the Intel and Microsoft doco on it, and from memory, OpenGL seems to follow the same "ass backwards" logic 😅 I have a tendency to follow established standards 🤷🏻 But then, I'm no expert on OpenGL stuff, nor on video encoding...

So if you feel strongly about it, we can do it like you proposed because arguably it's simpler (barring the slight inconsistency about the interpretation of PixelFormat enums, but comments will clarify that).

Just please add more comments then to the PixelFormats to make it 100% unambiguous everywhere whether it's memory layout, or register layout, or a 16-bit int read as LE, or whatever... The Rgb* classes also need a lot more comments because right now you can only figure it out by reading the code in detail.

So let's do what you proposed, just add a lot more comments, that's the TL;DR. How does that sound? 😄

@weirddan455
Collaborator Author

I just thought this BGR555/565 thing was a standard, given I found the Intel and Microsoft doco on it, and from memory, OpenGL seems to follow the same "ass backwards" logic

"Standard" is kind of a loose term for this type of thing. I searched for OpenGL and only found reference to RGB565, no BGR565 or RGB/BGR555. Also, here's a Microsoft document describing our format as RGB so they kind of contradict themselves sometimes:

https://learn.microsoft.com/en-us/windows/win32/directshow/working-with-16-bit-rgb

So let's do what you proposed, just add a lot more comments, that's the TL;DR. How does that sound? 😄

Yeah, more documentation is good. These pixel formats always confuse me the first time around.

@johnnovak
Member

"Standard" is kind of a loose term for this type of thing. I searched for OpenGL and only found reference to RGB565, no BGR565 or RGB/BGR555. Also, here's a Microsoft document describing our format as RGB so they kind of contradict themselves sometimes:

https://learn.microsoft.com/en-us/windows/win32/directshow/working-with-16-bit-rgb

I think that's fine because they assume you read the value as an LE 16-bit value; then the register layout is the same as what we have. But yeah... confusing topic! Don't get me started about column-major and row-major memory ordering of matrix data between OpenGL... and whatever else, I can't remember this stuff for the life of me! 😅

@weirddan455
Collaborator Author

weirddan455 commented Sep 4, 2023

I'm trying to find where in the code these buffers get written to so I can better document it. For example, are we explicitly converting everything to little endian or are we just using the machine's native endianness? Do we even have any CI machines or tests running on big endian to confirm this?

I found this block of code that suggests we're using native endianness and BGRX8888 is actually more like XRGB32 (writing out a 32-bit word; if you look at memory on a big-endian machine you would see XRGB memory order).

if (vga.draw.render.pixel_format == PixelFormat::Indexed8) {
    std::fill(templine_buffer.begin(),
              templine_buffer.end(),
              bg_color_index);
} else if (vga.draw.render.pixel_format == PixelFormat::BGR565) {
    const auto background_color = from_rgb_888_to_565(
            vga.dac.palette_map[bg_color_index]);
    const auto line_length = templine_buffer.size() / sizeof(uint16_t);
    size_t i = 0;
    while (i < line_length) {
        write_unaligned_uint16_at(TempLine, i++, background_color);
    }
} else if (vga.draw.render.pixel_format == PixelFormat::BGRX8888) {
    const auto background_color = vga.dac.palette_map[bg_color_index];
    const auto line_length = templine_buffer.size() / sizeof(uint32_t);
    size_t i = 0;
    while (i < line_length) {
        write_unaligned_uint32_at(TempLine, i++, background_color);
    }
}

I'm not 100% sure that I'm looking in the right place though. @kcgen Do you have experience with this part of the code? And do you know what the memory order of these types is on big-endian machines?

@kcgen
Member

kcgen commented Sep 4, 2023

I'm trying to find where in the code these buffers get written to so I can better document it. For example, are we explicitly converting everything to little endian or are we just using the machine's native endianness?

Yes, all multi-byte memory devices (i.e. video-card memory IO, IDE-based disk IO, and so on) are written to by DOS in little-endian (we emulate the little-endian x86 CPU), into the device's memory handler, which is a class with byte, word, and dword read and write interfaces.

These memory handlers take care of converting these "multi-byte little-endian DOS values" into host-order values so downstream emulation code can operate natively on the data.

So for example, in vga_memory.cpp, you can see all the video-memory page handlers that do these host_read and host_write multi-byte conversions for each type of video card:

(here's just the class names)

class VGA_UnchainedRead_Handler : public PageHandler {
class VGA_ChainedEGA_Handler final : public PageHandler {
class VGA_UnchainedEGA_Handler : public VGA_UnchainedRead_Handler {
class VGA_ChainedVGA_Handler final : public PageHandler {
class VGA_UnchainedVGA_Handler final : public VGA_UnchainedRead_Handler {
class VGA_TEXT_PageHandler final : public PageHandler {
class VGA_Map_Handler final : public PageHandler {
class VGA_Changes_Handler final : public PageHandler {
class VGA_LIN4_Handler final : public VGA_UnchainedEGA_Handler {
class VGA_LFBChanges_Handler final : public PageHandler {
class VGA_LFB_Handler final : public PageHandler {
class VGA_MMIO_Handler final : public PageHandler {
class VGA_TANDY_PageHandler final : public PageHandler {
class VGA_PCJR_Handler final : public PageHandler {
class VGA_HERC_Handler final : public PageHandler {
class VGA_Empty_Handler final : public PageHandler {

Inside each class, you'll see the host_write calls: host_writeb(), host_writew(...), and host_writed(...), which do the byte-swapping (if needed) on big-endian machines.

The same happens when a DOS program wants to read memory. It goes through the device's memory handler's read-calls, which ensure multi-byte reads are flipped back to little-endian.

The page handler also abstracts the complexities of how DOS memory pages were laid out, wrapped, and so on. So for example, the VGA linear frame buffer handler has the frame buffer details in it.

There are some rare exceptions where the DOS side writes a blob of "bytes", but the emulated device is asked to interpret it differently based on register bits. An example is the GUS, which just has a flat 1 MB byte memory space, and the DOS program can flip from doing 8-bit to 16-bit sample IO at a very fine-grained level (literally per-sample!). So in there, we don't have a GUS PageHandler, and instead do the host_read (from DOS little-endian to host/native type) in the emulation code itself.

// Read a 16-bit sample returned as a float
float Voice::Read16BitSample(const ram_array_t &ram, const int32_t addr) const noexcept
{
        const auto upper = addr & 0b1100'0000'0000'0000'0000;
        const auto lower = addr & 0b0001'1111'1111'1111'1111;
        const auto i = static_cast<uint32_t>(upper | (lower << 1));
        return static_cast<int16_t>(host_readw(&ram.at(i)));
}

Do we even have any CI machines or tests running on big endian to confirm this?

Yes; we've got the IBM System/390 QEMU container:

[screenshot: CI run on the IBM System/390 QEMU container]

I've got a very slow PPC 32-bit laptop with BSD on it, and have confirmed that all the multi-byte VESA (banked, LFB, 15-bit, 16-bit, 24-bit, and 32-bit) video modes work, and have fixed some CDDA (codec-related) and ZMBV endian issues that it helped reveal in the past.

[screenshot: DOSBox Staging running on the PPC BSD laptop]

(you can see all the tests I ran on it last time I had it fired up: #2338 (comment))

Getting reasonable big-endian hardware is quite a pain though! Would love to find a big-endian SBC, like the Pi, that's supported on modern Linux.

@weirddan455
Collaborator Author

Great, @kcgen, thanks for the info! So that means BGRX8888 is a fine description. Loading in 1 byte at a time in that order should work regardless of endianness.

It's just these 16-bit formats that would need to be byte-swapped on big-endian machines, so I'll be sure to document that these are stored in memory as little-endian uint16_t's. The image capturer is already handling this correctly:

case PixelFormat::BGR555: {
    const auto p = host_to_le(
            *reinterpret_cast<const uint16_t*>(pos));
    pixel = Rgb555(p).ToRgb888();
} break;
case PixelFormat::BGR565: {
    const auto p = host_to_le(
            *reinterpret_cast<const uint16_t*>(pos));
    pixel = Rgb565(p).ToRgb888();
} break;

Getting reasonable big-endian hardware is quite a pain though! Would love to find a big-endian SBC, like the Pi, that's supported on modern Linux.

I have a Raspberry Pi 3. Are you saying it can work in big endian? From what I've read, these ARM CPUs have the ability to work in either endian mode but usually default to little endian.

@kcgen
Member

kcgen commented Sep 4, 2023

@weirddan455, I should have also mentioned that the DOS memory space itself is generally left in 'DOS little-endian' space; which is working memory for the DOS program itself.

It's only at the DOS or hardware API points, and specifically the multi-byte value points, where we need to convert to host-endian if we want to operate on the values in some way, especially when those values touch the host itself (16-bit DOS VGA memory going to host SDL output, 16-bit DOS DMA'd audio samples going to host SDL audio, and 16 and 32-bit IDE disk reads and writes going to host file IO).

I think there's even some multi-byte handling at the keyboard IO level (at least from grepping around) :-)

I'm just a messenger though - this excellent endian aware groundwork was laid down by the original team. Very good stuff.

@kcgen
Member

kcgen commented Sep 4, 2023

I have a Raspberry Pi 3. Are you saying it can work in big endian? From what I've read, these ARM CPUs have the ability to work in either endian mode but usually default to little endian.

Back when I looked into it, I couldn't find any Linux distros that supported big-endian ARM (so that's why I bought an old PowerPC-based Apple laptop, because it's 100% big-endian and still has some shreds of Linux support for it 😆 )

Even though ARM can be flipped to big-endian, it was a lost-cause on Linux. But it looks like NetBSD /does have/ some images that can do it: https://mail-index.netbsd.org/port-arm/2020/12/03/msg007117.html ; this is news to me!

@kcgen
Member

kcgen commented Sep 4, 2023

@weirddan455 , @johnnovak:

Oh - just noticed this in the code-snippet above.

const auto p = host_to_le(*reinterpret_cast<const uint16_t*>(pos)); 

Whenever you see a uint8_t-pointer cast up to a bigger type and the value read from it, this code is dangerous because it assumes the memory is aligned to the size of this bigger type (and will generate ASAN errors on some systems).

It's a sure sign we should replace that code with:

uint16_t read_unaligned_uint16(const uint8_t* arr)

The read_unaligned_* (and write) functions replace code like:

auto value = *(uint16_t*)(pointer_to_uint8);

Or if we need it conditionally byte-swapped (from little-endian space to host, or from host-to-little-endian space), like what host_to_le does:

uint16_t host_readw(const uint8_t* arr)

A single host_read* call replaces:

#ifdef WORDS_BIGENDIAN
auto x = byteswap(*(uint16_t*)(arr));
#else
auto x = *(uint16_t*)(arr);
#endif

Check out mem_host.h and mem_unaligned.h. There are also helper functions to do type-sized array-offset reads and writes, and a couple of others.
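
Applied to the image capture snippet quoted earlier, that would look something like this (a sketch only; whether a byte-swap is needed on top depends on the endianness of the capture buffer, which is sorted out below):

case PixelFormat::BGR555: {
    // Alignment-safe read of the 16-bit pixel instead of casting the
    // uint8_t pointer up to uint16_t; use host_readw() here instead if
    // the bytes must also be interpreted as little-endian.
    const auto p = read_unaligned_uint16(pos);
    pixel = Rgb555(p).ToRgb888();
} break;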

@johnnovak
Member

I should have also mentioned that the DOS memory space itself is generally left in 'DOS little-endian' space; which is working memory for the DOS program itself.

Yes, and that is crucial, and it's the only way I'd say. If I take a memory dump of the emulated memory and compare it byte by byte to the dump of the same program running on real x86 hardware, the dumps should be identical. Regardless of whether I'm on a big or little-endian host.

Ultimately, big-endian host support is becoming increasingly theoretical because it's a LE world now. BE hardware is just a historical curiosity in 2023. But ok, we can carry BE support forward because it's not too hard, but then if you don't even have hardware to test BE builds, it becomes rather pointless (and might be completely broken without testing for all we know).

@shermp
Collaborator

shermp commented Sep 5, 2023

It might be too slow (I haven't tested it...), but cross-compiling to a BE architecture (PPC?) in debian, and then running the binaries using qemu-user-static might work for some basic testing.

@johnnovak
Member

It might be too slow (I haven't tested it...), but cross-compiling to a BE architecture (PPC?) in debian, and then running the binaries using qemu-user-static might work for some basic testing.

Maybe, but outside of old Sparc, PPC and IBM z machines and some home routers, BE is dead. Not too many Staging users on those platforms 😅

On one hand, making code endianness-aware is a good idea, but in practical reality it already doesn't matter much, and if LE is the new world standard, everyone might just as well assume LE. It's like how we assume a byte is 8 bits; there were other esoteric architectures with different word sizes, and they belong in the museum.

@weirddan455
Collaborator Author

@kcgen I looked at this more and I still strongly suspect that the RenderedImage data that gets sent to the image and video capturers is in native endian rather than little endian.

I get what you're saying about DOS memory being explicitly little endian and that makes sense. I just think this is on the "native side". I looked at the ZMBV code and its format enum is just bits per pixel, and then it does a reinterpret_cast to either uint16_t or uint32_t with no endianness checks or byte swapping. So if that ever worked on big endian, that should confirm it.

It might be too slow (I haven't tested it...), but cross-compiling to a BE architecture (PPC?) in debian, and then running the binaries using qemu-user-static might work for some basic testing.

I didn't get to the qemu-user-static thing (that requires installing Debian, setting up a full cross-platform environment, and making a fully statically linked executable) but I did spend entirely too long trying to get QEMU full-system emulation working. I got NetBSD to boot with emulated SPARC but couldn't get a GUI up. PowerPC failed to boot after install.

ScummVM has this wiki page: https://wiki.scummvm.org/index.php/HOWTO-Debug-Endian-Issues

They recommend doing QEMU PowerPC on Debian 8. But the compiler on that is ancient and we're using C++17, so I didn't bother trying that. It's almost certain to fail.

Maybe, but outside of old Sparc, PPC and IBM z machines and some home routers, BE is dead. Not too many Staging users on those platforms 😅

And yeah, that's the reality. The new PowerPCs are running little endian. SPARC and IBM z are for mainframes and servers.

It's pretty much just old PowerPC Macs. Linux support for those has dropped off. Looks like Gentoo is one of the last hold-outs, so it's either that or BSD to get a compiler and libraries new enough to run Staging.

@kcgen
Member

kcgen commented Sep 6, 2023

@kcgen I looked at this more and I still strongly suspect that the RenderedImage data that gets sent to the image and video capturers is in native endian rather than little endian.

That's right. If the DOS side is making multi-byte value writes (say, uint32's and uint16's) into the VGA memory space, those are going to pass through the VGA device's memory-handler class. Those handlers will convert the values to native endian.

(and that makes sense: writes to the video card are a one-way ticket outbound to the host; in our case, to capture or to SDL/OpenGL)

@kcgen
Member

kcgen commented Sep 6, 2023

Here's the VGA linear frame buffer memory handler (just for ref); those host_read/writes do the byte-swapping on BE.

class VGA_LFBChanges_Handler final : public PageHandler {
public:
	VGA_LFBChanges_Handler() {
		flags=PFLAG_NOCODE;
	}

	uint8_t readb(PhysPt addr) override
	{
		addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
		addr = CHECKED(addr);
		return host_readb(&vga.mem.linear[addr]);
	}

	uint16_t readw(PhysPt addr) override
	{
		addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
		addr = CHECKED(addr);
		return host_readw_at(vga.mem.linear, addr);
	}

	uint32_t readd(PhysPt addr) override
	{
		addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
		addr = CHECKED(addr);
		return host_readd_at(vga.mem.linear, addr);
	}

	void writeb(PhysPt addr, uint8_t val) override
	{
		addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
		addr = CHECKED(addr);
		host_writeb(&vga.mem.linear[addr], val);
		MEM_CHANGED( addr );
	}

	void writew(PhysPt addr, uint16_t val) override
	{
		addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
		addr = CHECKED(addr);
		host_writew_at(vga.mem.linear, addr, val);
		MEM_CHANGED( addr );
	}

	void writed(PhysPt addr, uint32_t val) override
	{
		addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
		addr = CHECKED(addr);
		host_writed_at(vga.mem.linear, addr, val);
		MEM_CHANGED( addr );
	}
};

@weirddan455
Collaborator Author

OK, I must have misread your post the other day then. I thought you were saying it was in little endian. I looked at those page handlers but I couldn't find the usage code. This block I commented earlier is just doing write_unaligned_uint32_at, which just does a memcpy:

if (vga.draw.render.pixel_format == PixelFormat::Indexed8) {
    std::fill(templine_buffer.begin(),
              templine_buffer.end(),
              bg_color_index);
} else if (vga.draw.render.pixel_format == PixelFormat::BGR565) {
    const auto background_color = from_rgb_888_to_565(
            vga.dac.palette_map[bg_color_index]);
    const auto line_length = templine_buffer.size() / sizeof(uint16_t);
    size_t i = 0;
    while (i < line_length) {
        write_unaligned_uint16_at(TempLine, i++, background_color);
    }
} else if (vga.draw.render.pixel_format == PixelFormat::BGRX8888) {
    const auto background_color = vga.dac.palette_map[bg_color_index];
    const auto line_length = templine_buffer.size() / sizeof(uint32_t);
    size_t i = 0;
    while (i < line_length) {
        write_unaligned_uint32_at(TempLine, i++, background_color);
    }
}

But regardless, I guess we're in agreement now. I'll get some documentation done tomorrow. I'll include the SDL and FFmpeg enum types that our types map to in case anyone has to work with those in the future.

@johnnovak
Member

But regardless, I guess we're in agreement now. I'll get some documentation done tomorrow. I'll include the SDL and FFmpeg enum types that our types map to in case anyone has to work with those in the future.

That sounds good @weirddan455. Thanks for getting to the bottom of this once and for all, for everyone's benefit 🎉

@kcgen
Member

kcgen commented Sep 6, 2023

Oh sorry for that confusion.

The flow is from DOS into the video card's memory handler (at that point, flipped to native), combined with IO port writes into the various video card registers to tell it where the line is and any other properties.

VGA draw comes in after both those have taken place. Its templine pointer is ultimately from some chunk written by the video card's memory handler.

@weirddan455
Collaborator Author

@kcgen @johnnovak This is ready for review now.

I renamed all the enums to line up more closely with SDL's naming scheme, except I explicitly added whether it is a packed format or an array of bytes.

In the 2nd commit, I fixed a couple of endianness bugs in the image decoder and replaced the reinterpret_cast with read_unaligned_uint16 as @kcgen suggested. This commit is also a good example of how the Packed formats should be used. I tested that there were no regressions on little endian. I believe this fixes a bug on big endian but sadly could never get my QEMU test setup working to confirm.

@johnnovak
Member

I believe this fixes a bug on big endian but sadly could never get my Qemu test setup working to confirm.

No great loss 😄

Will review in detail today @weirddan455 .

Clarify how the types are stored (u8, u16, u32)
Clarify endianness concerns in comments
Specify SDL and FFmpeg equivalents in comments

BGR555 -> RGB555_Packed16
BGR565 -> RGB565_Packed16
BGR888 -> BGR24_ByteArray
BGRX8888 -> XRGB8888_Packed32
@johnnovak
Member

johnnovak left a comment

Great job @weirddan455, merge away 🎉

@johnnovak
Member

Sent you an invite to become a maintainer in the project @weirddan455 . You should be able to merge this yourself once you've accepted the invite.

@weirddan455 weirddan455 merged commit 97755b2 into main Sep 7, 2023
50 checks passed
@weirddan455
Collaborator Author

@johnnovak I don't see any invite but it let me merge now anyway.

@weirddan455 weirddan455 deleted the wd/rgb branch September 7, 2023 05:00
@johnnovak johnnovak added the video Graphics and video related issues label Dec 11, 2023