Rename PixelFormat enum members BGR555 and BGR565 to RGB555 and RGB565 #2823
Conversation
Yeah, so you've just described BGR555 and BGR565 in your comments 😄 BGR in the PixelFormat enums refers to the memory layout, not the register layout! See here: https://learn.microsoft.com/en-us/archive/msdn-magazine/2008/june/foundations-bitmaps-and-pixel-bits I named these PixelFormat enums from the perspective of how the bitmap data is laid out in memory; maybe we should make that distinction clear. So I have a bitmap image as a memory blob, and it's tagged with a PixelFormat enum; all I care about is how the data is laid out in memory! Register layout is irrelevant there (but you're correct, knowing both is important for the code to work correctly, absolutely). The memory layout is important to know because this info is needed to decode the blob of data into pixels, in whatever register layout you want or need. Happy to be proven wrong, but memory dumps please to prove it 😄 |
Ok, this Intel doco is a bit more "definitive" than the Microsoft one I linked above 😄 https://www.intel.com/content/www/us/en/docs/ipp/developer-reference/2021-7/rgb-image-formats.html Note the term "memory layout"! The OpenGL pixel formats also always refer to memory layout. IMO, that's the sane way... SDL seems to use register layout for its pixel formats, which I quite dislike. This seems pretty sane to me; memory layouts are ultimately what I want to represent in PixelFormat. So I think I got it right, but please provide memory dumps of 15/16/24/32-bit hi/true-colour bitmap data, then we can make sure it 100% aligns with the above table. Also, to be clear, my understanding is that DOSBox always stores the pixel data as little-endian, so the low-order byte is at the lower memory location. So:
Maybe @GranMinigun can tell me I got everything completely backwards, but I doubt it 😄 |
OK, I haven't fully read everything you wrote out there (I'm about to), but I'm just going to put this here. This comment I grabbed from the FFmpeg header, and this is how they describe the formats:
As you can see, they are describing register layout plus endianness (there's also a macro not shown here that maps to native endianness). I've seen formats like RGBA described as both (something like RGBA32 to describe a 32-bit register, and RGBA8888 to describe 4 bytes laid out in memory). However, since these formats are not 8-bit aligned on colors, it doesn't really make sense to describe them that way. You're always loading in a 16-bit word. But yeah, let me finish reading the sources you linked and I'll get a memory dump. |
I'm open to using a different PixelFormat interpretation. But to me, the most useful thing to do is this: I have an array of bytes; what does it represent? I want PixelFormat to describe that with zero ambiguity, so I know how to decode the byte array. That's why I'm leaning towards doing what the Intel doco says (and I think I've done that already; that's how it is now). |
Ok, so this is my preference:
I think this would be the most correct way, without requiring the reader of the code to "just know" things or guess them. I just want to make all this explicit; you should be able to understand it in 2 minutes by reading the comments and the code. |
OK, I have read everything. No memory dump yet; I'm actually not sure how to do that. I would ideally want a structured image (like a pure blue image followed by a pure red image or something). I could maybe write a DOS executable to do that, but I'm not doing that tonight 😆 So here's why I think you're wrong about memory order for this particular format. Pretend you have a 16-bit register here:
Now when you go to write that register out to memory on a little endian machine you get this:
So the layout in memory is GBRG? It doesn't make sense to describe it this way. You also can't treat these formats as an "array of bytes" (well, technically you could, but it would be harder than it needs to be, since the green channel is split between two bytes). You have to treat them as an array of 16-bit words. The Intel and Microsoft sources you linked are kind of vague, and reading them is only making me more confused. However, we're definitely already treating these as packed 16-bit values in the code: dosbox-staging/include/rgb565.h Lines 104 to 110 in 202cef4
|
It’s vague in that it doesn’t say if that’s memory layout or register layout. I’m assuming register layout which means memory layout is actually GBRG when you put the low byte first. I’m going to have to find or write a DOS program to output some simple colors in these formats to prove it to you… |
😄 Maybe get some sleep and go through the Intel doco again with fresh eyes 😄
I think that's for the best anyway. But you could just use some image viewer like QPV to display a test image in various hi/true-colour modes, then dump the contents of the image data that gets passed down to the image capturer, for example. I'll prepare a pack with QPV for you. But don't get me wrong, I'm not overly hung up on this, and I don't wanna treat that Intel doco as some holy scripture that we must follow 😄 The way I've described things makes sense to me, but ultimately, it's gonna be you and me who are going to work with various pixel formats in the foreseeable future. So I'm happy to simplify this and change the descriptions to something that makes sense for both of us. Your idea to simplify it wasn't bad, e.g. just say that RGB555 and RGB565 are stored as little-endian 16-bit values. I just want comments that describe all this, so if I touch this code again in 6 months, I'll know how things are in 2 minutes. |
@weirddan455 Image viewer pack. Read Quick guide:
EDIT: Actually, here's a much simpler idea:
|
Thanks for the QPV pack. I created 3 .png images: one solid red, one green, one blue. I then opened them in QPV, which displayed them in the "BGR565" pixel format, as named by DOSBox. I then zoomed in on the image and took a screenshot with your raw image capturer (modified as you said to just dump out the raw bytes to disk). Here they are in a hex editor. Note the "Binary" box at the bottom right; that's the easiest to visualize IMO:
Green: occupying the first and last bits of the 16-bit word in memory
Blue: laid out in memory right after the first half of green
Red: laid out in memory right after blue (followed by the last half of green) |
This is all I did to make the raw dump, inside your raw image capturer:

std::string filename = "/home/daniel/.config/dosbox/capture/bytes" +
                       std::to_string(++byte_dump_index);
FILE *byte_dump = fopen(filename.c_str(), "wb");
if (byte_dump) {
	fwrite(image.image_data, image.pitch, raw_image_height, byte_dump);
	fclose(byte_dump);
}

And here are the actual files if you would like to examine them yourself: raw bytes plus the .png generated by your raw image capturer. |
So in conclusion, it's laid out in memory exactly as I thought: G (3 bits) + B (5 bits) + R (5 bits) + G (3 bits). A weird-ass memory order that doesn't make any sense as an "array of bytes", but does make sense if you're on a little-endian machine reading and writing 16-bit words. It simply does not make sense to describe this format in memory order. When you load it into a 16-bit register it is R (5 bits) + G (6 bits) + B (5 bits). RGB, not BGR. |
Thanks for investigating this, @weirddan455. Now there's zero uncertainty about it 😄
I think it still does, to retain consistency with the 24-bit and 32-bit pixel formats, which are byte-based, and for those you must describe memory order. And that is exactly what other sources like the Intel docs do. Otherwise, with your proposal, we'd have register layout assuming LE for the 16-bit pixel format descriptors, and memory layout for the 24 and 32-bit descriptors. I really don't want that. Check the Microsoft image again. The first image exactly describes the memory layout we're dealing with here, and it's BGR555 because:
So you write them next to each other in memory layout. This might seem roundabout, but it made sense to me when I looked into it back then; just yesterday you confused me 😄 The Intel doco describes the same thing, of course. I think what confused you was that the Intel doco puts the low-order byte in the second column, so you have to flip the two columns, and that gives you the memory layout as you'd see in a hex editor. So if we do it like these other standards, the 16-bit and 24/32-bit pixel format descriptions are kept consistent and always describe memory layout. People familiar with these standards will understand what they're dealing with. But we should also put the exact memory-layout bit patterns there in the comments; that was my omission, and that would've saved this whole exercise 😄 Similarly, we need to document the helper functions. Then everybody would understand everything 😄 Even if someone like you is initially confused about BGR555/565, the bit-layout description of the PixelFormat enums will make things crystal clear. If you're worried about performance, don't 😄 If any of this is unclear, that's fine; I'll knock this out in an hour and raise a PR for it myself then -- more efficient that way 😄 |
You can't really because they are read in different ways. Take
On the other hand
We already have that! The 16-bit formats must assume little endian (or native endian; I don't know how these get laid out in memory on big-endian). You're trying to assume the 16-bit formats follow memory order, but they don't. And they never have. And they never will.
Sorry, but that's some backwards-ass logic IMO (literally, it's reading the bits backwards). I re-read the Intel and Microsoft docs and, yes, they do seem to be describing our formats as BGR. I think that is wrong and misleading. BGR describes neither the layout in memory nor in the register. In memory, it is (G)B R(G); the green bits are not contiguous. In the register, it is RGB. The FFmpeg enums call our format RGB, and I think that is much clearer because it actually describes the register layout correctly.
I'm not talking about performance; this is just about clarity. Plus, I'm not even using these helper functions for FFmpeg. FFmpeg has its own conversion routine that converts from any arbitrary format I give it into YUV color space, so I don't need to do any intermediate conversion to RGB888. |
I get that, and it's a bit backwards. I wanted consistency across pixel formats so they all use memory layout, that's all. Your proposal of treating RGB555/565 as little-endian 16-bit values makes sense; my only problem was/is the lack of consistency, because then the 24/32-bit formats describe memory layout, and the 15/16-bit ones register layout. I just thought this BGR555/565 thing is a standard, given I found the Intel and Microsoft doco on it, and from memory, OpenGL seems to follow the same "ass-backwards" logic 😅 I have a tendency to follow established standards 🤷🏻 But then, I'm no expert on OpenGL stuff, nor on video encoding... So if you feel strongly about it, we can do it like you proposed, because arguably it's simpler (barring the slight inconsistency in the interpretation of the PixelFormat enums, but comments will clarify that). Just please add more comments to the PixelFormats to make it 100% unambiguous everywhere whether it's memory layout, or register layout, or a 16-bit int read as LE, or whatever... So let's do what you proposed, just with a lot more comments; that's the TL;DR. How does that sound? 😄 |
"Standard" is kind of a loose term for this type of thing. I searched for OpenGL and only found reference to https://learn.microsoft.com/en-us/windows/win32/directshow/working-with-16-bit-rgb
Yeah, more documentation is good. These pixel formats always confuse me the first time around. |
I think that's fine, because they assume you read the value as an LE 16-bit value; then the register layout is the same as what we have. But yeah... confusing topic! Don't get me started about column-major and row-major memory ordering of matrix data between OpenGL... and whatever else; I can't remember this stuff for the life of me! 😅 |
I'm trying to find where in the code these buffers get written to so I can document it better. For example, are we explicitly converting everything to little endian, or are we just using the machine's native endianness? Do we even have any CI machines or tests running on big endian to confirm this? I found this block of code that suggests we're using native endianness: dosbox-staging/src/hardware/vga_draw.cpp Lines 965 to 984 in 765bcc2
I'm not 100% sure that I'm looking in the right place, though. @kcgen Do you have experience with this part of the code? And do you know what the memory order of these types is on big-endian machines? |
Yes, all multi-byte memory devices (i.e. video-card memory IO, IDE-based disk IO, and so on) are written to by DOS in little-endian (we emulate the little-endian x86 CPU) into the device's memory handler, which is a class with byte, word, and dword read and write interfaces. This memory handler takes care of converting these "multi-byte little-endian DOS values" into host-order values, so downstream emulation code can operate natively on the data. So, for example (here's just the class names):
Inside each class, you'll see the host_write calls. The same happens when a DOS program wants to read memory: it goes through the device's memory handler's read calls, which ensure multi-byte reads are flipped back to little-endian. The page handler also abstracts the complexities of how DOS memory pages were laid out, wrapped, and so on. So, for example, the VGA linear frame buffer handler has the frame buffer details in it. There are some rare exceptions where the DOS side writes a blob of "bytes", but the emulated device is asked to interpret it differently based on register bits. An example is the GUS, which just has a flat 1 MB byte memory space, and the DOS program can flip from doing 8-bit to 16-bit sample IO at a very fine-grained level (literally per-sample!). So in there, we don't have a GUS PageHandler, and instead do the host_read (from DOS little-endian to host/native type) in the emulation code itself:

// Read a 16-bit sample returned as a float
float Voice::Read16BitSample(const ram_array_t &ram, const int32_t addr) const noexcept
{
const auto upper = addr & 0b1100'0000'0000'0000'0000;
const auto lower = addr & 0b0001'1111'1111'1111'1111;
const auto i = static_cast<uint32_t>(upper | (lower << 1));
return static_cast<int16_t>(host_readw(&ram.at(i)));
}
Yes; we've got the IBM System/390 QEMU container. I've got a very slow PPC 32-bit laptop with BSD on it, and have confirmed that all the multi-byte VESA (banked, LFB, 15-bit, 16-bit, 24-bit, and 32-bit) video modes work, and have fixed some CDDA (codec-related) and ZMBV endian issues that it helped reveal in the past. (You can see all the tests I ran on it last time I had it fired up: #2338 (comment).) Getting reasonable big-endian hardware is quite a pain, though! Would love to find a big-endian SBC, like the Pi, that's supported on modern Linux. |
Great, @kcgen, thanks for the info! So that means it's just these 16-bit formats that would need to be byte-swapped on big-endian machines, so I'll be sure to document that these are stored in memory as little-endian: dosbox-staging/src/capture/image/image_decoder.h Lines 108 to 118 in 6b7f364
I have a Raspberry Pi 3. Are you saying it can work in big endian? From what I've read, these ARM CPUs have the ability to work in either endian mode but usually default to little endian. |
@weirddan455, I should have also mentioned that the DOS memory space itself is generally left in 'DOS little-endian' space, which is working memory for the DOS program itself. It's only at the DOS or hardware APIs' points, and specifically the multi-byte value points, where we need to convert to host-endian if we want to operate on the values in some way, especially when those values touch the host itself (16-bit DOS VGA memory going to host SDL output, 16-bit DOS DMA'd audio samples going to host SDL audio, 16 and 32-bit IDE disk reads and writes going to host file IO). I think there's even some multi-byte handling at the keyboard IO level (at least from grepping around) :-) I'm just a messenger, though - this excellent endian-aware groundwork was laid down by the original team. Very good stuff. |
Back when I looked into it, I couldn't find any Linux distros that supported big-endian ARM (so that's why I bought an old PowerPC-based Apple laptop, because it's 100% big-endian and still has some shreds of Linux support for it 😆 ) Even though ARM can be flipped to big-endian, it was a lost-cause on Linux. But it looks like NetBSD /does have/ some images that can do it: https://mail-index.netbsd.org/port-arm/2020/12/03/msg007117.html ; this is news to me! |
Oh - just noticed this in the code snippet above:

const auto p = host_to_le(*reinterpret_cast<const uint16_t*>(pos));

Whenever you see a uint8_t pointer cast up to a bigger type and the value read through it, the code is dangerous, because it assumes the memory is aligned to the size of the bigger type (and will generate ASAN errors on some systems). It's a sure sign we should replace that code with:

uint16_t read_unaligned_uint16(const uint8_t* arr)

which safely performs the equivalent of:

auto value = *(uint16_t*)(pointer_to_uint8);

Or, if we need it conditionally byte-swapped (from little-endian space to host, or from host to little-endian space), that's what host_readw is for:

uint16_t host_readw(const uint8_t* arr)

A single call to it replaces this pattern:

#ifdef WORDS_BIGENDIAN
auto x = byteswap(*(uint16_t*)(arr));
#else
auto x = *(uint16_t*)(arr);
#endif

Check out |
Yes, and that is crucial, and it's the only way I'd say. If I take a memory dump of the emulated memory and compare it byte by byte to the dump of the same program running on real x86 hardware, the dumps should be identical. Regardless of whether I'm on a big or little-endian host. Ultimately, big-endian host support is becoming increasingly theoretical because it's a LE world now. BE hardware is just a historical curiosity in 2023. But ok, we can carry BE support forward because it's not too hard, but then if you don't even have hardware to test BE builds, it becomes rather pointless (and might be completely broken without testing for all we know). |
It might be too slow (I haven't tested it...), but cross-compiling to a BE architecture (PPC?) in Debian, and then running the binaries using |
Maybe, but outside of old Sparc, PPC and IBM z machines and some home routers, BE is dead. Not too many Staging users on those platforms 😅 On one hand, making code endianness aware is a good idea, but in practical reality it already doesn't matter much and if LE is the new world standard, everyone might just as well assume LE. Like we assume a byte is 8 bits; there were other esoteric architectures with different word sizes that belong to the museum. |
@kcgen I looked at this more, and I still strongly suspect these buffers are in native byte order. I get what you're saying about DOS memory being explicitly little-endian, and that makes sense; I just think this is on the "native side". I looked at the ZMBV code, and its format enum is just bits per pixel, and then it does a
I didn't get to the qemu-user-static thing (that requires installing Debian, setting up a full cross-compilation environment, and making a fully statically linked executable), but I did spend entirely too long trying to get QEMU full-system emulation working. I got NetBSD to boot with emulated SPARC but couldn't get a GUI up. PowerPC failed to boot after install. ScummVM has this wiki page: https://wiki.scummvm.org/index.php/HOWTO-Debug-Endian-Issues They recommend QEMU PowerPC on Debian 8, but the compiler on that is ancient and we're using C++17, so I didn't bother trying that. It's almost certain to fail.
And yeah, that's the reality. The new PowerPCs are running little-endian. SPARC and IBM Z are for mainframes and servers. It's pretty much just old PowerPC Macs, and Linux support for those has dropped off. Looks like Gentoo is one of the last hold-outs, so it's either that or BSD to get a compiler and libraries new enough to run Staging. |
That's right. If the DOS side is making multi-byte value writes (say, uint32s and uint16s) into the VGA memory space, those are going to pass through the VGA device's memory-handler class. Those handlers will convert the values to native endian. (And that makes sense: writes to the video card are a one-way ticket outbound to the host; in our case, to capture or to SDL/OpenGL.) |
Here's the VGA linear frame buffer memory handler (just for ref); note the host_read*/host_write* calls:

class VGA_LFBChanges_Handler final : public PageHandler {
public:
VGA_LFBChanges_Handler() {
flags=PFLAG_NOCODE;
}
uint8_t readb(PhysPt addr) override
{
addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
addr = CHECKED(addr);
return host_readb(&vga.mem.linear[addr]);
}
uint16_t readw(PhysPt addr) override
{
addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
addr = CHECKED(addr);
return host_readw_at(vga.mem.linear, addr);
}
uint32_t readd(PhysPt addr) override
{
addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
addr = CHECKED(addr);
return host_readd_at(vga.mem.linear, addr);
}
void writeb(PhysPt addr, uint8_t val) override
{
addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
addr = CHECKED(addr);
host_writeb(&vga.mem.linear[addr], val);
MEM_CHANGED( addr );
}
void writew(PhysPt addr, uint16_t val) override
{
addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
addr = CHECKED(addr);
host_writew_at(vga.mem.linear, addr, val);
MEM_CHANGED( addr );
}
void writed(PhysPt addr, uint32_t val) override
{
addr = PAGING_GetPhysicalAddress(addr) - vga.lfb.addr;
addr = CHECKED(addr);
host_writed_at(vga.mem.linear, addr, val);
MEM_CHANGED( addr );
}
}; |
OK, I must have misread your post the other day then; I thought you were saying it was in little-endian. I looked at those page handlers but I couldn't find the usage code. The block I commented on earlier: dosbox-staging/src/hardware/vga_draw.cpp Lines 965 to 984 in 765bcc2
But regardless, I guess we're in agreement now. I'll get some documentation done tomorrow. I'll include the SDL and FFmpeg enum types that our types map to in case anyone has to work with those in the future. |
That sounds good @weirddan455. Thanks for getting to the bottom of this for once and all for everyone's benefit 🎉 |
Oh, sorry for that confusion. The flow is from DOS into the video card's memory handler (at that point, flipped to native), combined with IO port writes into the various video card registers to tell it where the line is and any other properties. VGA draw comes in after both of those have taken place. Its templine pointer is ultimately from some chunk written by the video card's memory handler. |
@kcgen @johnnovak This is ready for review now. I renamed all the enum names to line up more closely with SDL's naming scheme, except I explicitly added whether it is a packed format or an array of bytes. In the 2nd commit, I fixed a couple of endianness bugs in the image decoder and replaced the |
No great loss 😄 Will review in detail today @weirddan455 . |
Clarify how the types are stored (u8, u16, u32)
Clarify endianness concerns in comments
Specify SDL and FFmpeg equivalents in comments
BGR555 -> RGB555_Packed16
BGR565 -> RGB565_Packed16
BGR888 -> BGR24_ByteArray
BGRX8888 -> XRGB8888_Packed32
Great job @weirddan455, merge away 🎉
Sent you an invite to become a maintainer in the project @weirddan455 . You should be able to merge this yourself once you've accepted the invite. |
@johnnovak I don't see any invite but it let me merge now anyway. |
These were misnamed: red is stored in the high bits and blue in the low bits.
This is just a re-name. Nothing functional changes in this PR.
I found this because colors were wrong in the demo-scene stuff @johnnovak sent me. I was mapping them to FFmpeg's BGR enums, when really they're stored as RGB (a simple swap to the right FFmpeg enum fixed it).
Also, if you look at where these are decoded for image capturing, they get mapped to RGB helpers (not BGR):
dosbox-staging/src/capture/image/image_decoder.h
Lines 108 to 118 in 6b7f364