Test every non tested opcode #5

Open
barotto opened this issue May 23, 2018 · 110 comments

Comments

@barotto
Owner

barotto commented May 23, 2018

A comment from VOGONS thread https://www.vogons.org/viewtopic.php?f=9&t=60095:

XLAT instructions (both 16-bit and 32-bit versions) were faulty: it was checking memory against a write(for the memory operand) instead of a proper read.

@barotto changed the title from "Test XLAT opcode" to "Test every non tested opcode" on May 31, 2018
@superfury

superfury commented Jun 2, 2018

Currently left (ignoring those still pending in the other issue (PUSHF(D)/POPF(D)), as well as protected-mode instructions and the related 0F-range instructions for protected-mode-specific functionality):

  • Logical operations in the 00-3F range.
  • Logical operations in the 80-83 range.
  • INC reg in the 41-47 range.
  • DEC reg in the 49-4F range.
  • String input/output from port in the 6C-6F range, probably testable using e.g. the temporary DMA registers(unused registers which aren't used on the DMA chipset of the DMA Page Registers).
  • TEST instructions(opcode 84-85).
  • XCHG instructions(opcode 86-87).
  • MOV instructions(opcode 88-8B).
  • XCHG instructions(opcode 90-97).
  • SAHF/LAHF instructions (opcode 9E-9F).
  • MOV with immediate offset(opcodes A0-A3).
  • 32-bit address-size string instructions(B/W/D).
  • TEST AL/AX/EAX,imm(opcodes A8&A9).
  • Different variants of SHLD/SHRD with operand-sizes.
  • MOV reg,imm (opcodes B0-BF).
  • RET near (opcode C2).
  • MOV r/m,imm(opcodes C6-C7).
  • ENTER/LEAVE(opcodes C8&C9).
  • XLAT (opcode D7).
  • LOOP/J(E)CXZ instructions 32-bit operand size(opcodes E0-E3).
  • IN/OUT instructions(opcodes E4-E7, EC-EF), see string i/o instructions.
  • JMP instructions(opcodes E9-EB).
  • REP/REPE/REPNE (opcodes F0-F3).
  • Opcode F6/F7 TEST r/m,imm
  • Basic flag operations (opcodes F8-FD).
  • INC r/m8/16/32 instructions(opcodes FE-FF).
  • 32-bit relative jumps(0F80-0F8F instruction range).

Side note: opcode 82h is missing from the intel-opcodes list, which is an alias for opcode 80h( http://ref.x86asm.net/coder32.html#x82 ).

@superfury

superfury commented Jun 4, 2018

Just created a little 'testsuite' that executes many of the basic instructions mentioned in my previous post (although the results have to be logged and verified manually by looking for register/memory reads, writes and changes):
https://bitbucket.org/superfury/unipcemu/src/9c54c037466f0e079e76a35987584b0687d42af8/UniPCemu/assembly/?at=master

I've coded opcodes 00-3F as raw binary statements, to make sure the nasm assembler doesn't somehow emit opcodes outside of that range (the 2-byte versions of them).

The only test that actually verifies itself using assembly instructions is the ret imm testing, much in the way your stack tests run.

Edit: https://bitbucket.org/superfury/unipcemu/src/6c46742934fc8ffba70de4e139671244164270e9/UniPCemu/assembly/?at=master

Fixed some bugs where the basic *D opcodes in the 00-3F range used the wrong opcode byte.

@superfury

superfury commented Jun 4, 2018

So, combining both testsuite results, that only leaves possible errors in:

  • Logical operations in the 80-83 range**.
  • String input/output from port in the 6C-6F range, probably testable using e.g. the temporary DMA registers(unused registers which aren't used on the DMA chipset of the DMA Page Registers).
  • TEST instructions(opcode 84-85).
  • MOV instructions(opcode 88-8B).
  • SAHF/LAHF instructions (opcode 9E-9F).
  • 32-bit address-size string instructions(B/W/D).
  • Different variants of SHLD/SHRD with operand-sizes.
  • MOV r/m,imm(opcodes C6-C7).
  • ENTER/LEAVE(opcodes C8&C9).
  • XLAT (opcode D7).
  • LOOP/J(E)CXZ instructions 32-bit operand size(opcodes E0-E3)**.
  • IN/OUT instructions(opcodes E4-E7, EC-EF), see string i/o instructions.
  • JMP instructions(opcodes E9-EB).
  • REP/REPE/REPNE (opcodes F0-F3).
  • Basic flag operations (opcodes F8-FD).
  • INC r/m8/16/32 instructions(opcodes FE-FF).
  • 32-bit relative jumps(0F80-0F8F instruction range).

** partially tested and verified in both testsuites

@superfury

Just found a little 'bug' in my emulator (which seems to be only half documented in the far return instruction description of the 80386 programmer's reference manual). During a stack switch to an outer privilege level (resulting CS.RPL>CPL), (E)SP is increased by the immediate before popping SS:(E)SP, which is documented, but also after popping SS:(E)SP (thus popping the caller's stack variables). The latter isn't documented within the 80386 programmer's reference manual as far as I can see.
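The double adjustment described above can be sketched roughly as follows. This is only an illustrative model under assumptions, not UniPCemu's code: the function name `far_return_outer`, the dictionary used as a stack, and the example addresses and selectors are all hypothetical, and privilege and limit checks are left out.

```python
def far_return_outer(stack, esp, imm, size=4):
    """Model RETF imm to an outer privilege level.

    stack maps addresses on the inner (called) stack to the values stored there.
    Returns (cs, eip, ss, esp) as seen by the caller after the return.
    """
    eip = stack[esp]                 # pop return EIP
    cs = stack[esp + size]           # pop return CS
    esp += 2 * size
    esp += imm                       # first adjustment: skip parameters on the inner stack
    new_esp = stack[esp]             # pop the caller's ESP
    new_ss = stack[esp + size]       # pop the caller's SS
    new_esp = (new_esp + imm) & 0xFFFFFFFF   # second adjustment: release the caller's parameters
    return cs, eip, new_ss, new_esp

# Hypothetical inner stack: EIP, CS, one dword parameter, then the caller's ESP and SS.
inner = {0x1000: 0x1234, 0x1004: 0x1B, 0x1008: 0xAAAA, 0x100C: 0x2FF8, 0x1010: 0x23}
result = far_return_outer(inner, 0x1000, imm=4)
print([hex(v) for v in result])
```

Without the second adjustment the caller would resume with ESP still pointing at the parameter it pushed before the call.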

@superfury

superfury commented Jun 5, 2018

Having fixed the far return using the immediate on both stack locations(both on source and destination stacks), the Extended Memory Tester v3.0 now properly detects the video card and extended memory, continuing to test memory:D

This app now runs properly: https://archive.org/details/msdos_TESTEXT3_shareware

@superfury

superfury commented Jul 7, 2018

Hmmmm..... One basic device that can be used to test the INS/OUTS instructions is the IDE/ATA hard disk's buffer (command E4 to read the buffer, command E8 to write the buffer). First fill it with a pattern manually (512 bytes), then read it back and verify the pattern. That should be able to verify all IN(S)/OUT(S) instructions.

Edit: I know some of them (8-bit plain, 8-bit string and 16-bit string) already work properly, since they're used during POST (8-bit plain), during disk reads (8-bit or 16-bit reads), disk writes (8-bit or 16-bit writes) and CD-ROM access (16-bit reads and writes). The only ones still untested are the string and normal 32-bit variants and the normal 16/32-bit single input/output.

@superfury

superfury commented Aug 16, 2018

I see you're busy on some more (protected mode) tests. Great!

That now just leaves:

  • String input/output from port in the 6C-6F range, probably testable using e.g. the temporary DMA registers(unused registers which aren't used on the DMA chipset of the DMA Page Registers) and hard disk buffer(using the read/write buffer commands).
  • MOV instructions(opcode 88-8B).
  • SAHF/LAHF instructions (opcode 9E-9F).
  • MOV r/m,imm(opcodes C6-C7).
  • ENTER/LEAVE(opcodes C8&C9).
  • XLAT (opcode D7).
  • LOOP/J(E)CXZ instructions 32-bit operand size(opcodes E0-E3)**.
  • IN/OUT instructions(opcodes E4-E7, EC-EF), see string i/o instructions.
  • JMP instructions(opcodes E9-EB).
  • Basic flag operations (opcodes F8-FD).
  • 32-bit relative jumps(0F80-0F8F instruction range).

So mostly various kinds of move instructions and miscellaneous instructions.

And of course all remaining protected-mode functionality itself(task switching etc.) is still left to test.

I'm currently wondering if protected-mode functionality of UniPCemu still has bugs(concerning call gates and interrupts etc. pushing data on the stack with(out) stack switch) in various cases.

@barotto
Owner Author

barotto commented Aug 17, 2018

I've found a pretty big bug in my emulator by comparing its memory dump with Bochs' after the execution of the test suite: the Dirty bit of a PTE is not properly set on a write when the same page has been previously accessed by a read.
So there are a lot of behavioural tests to be implemented besides the various opcodes...
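A minimal sketch of the read-then-write case, assuming a toy paging unit with a one-level table and a TLB that remembers whether Dirty has already been set. It is not IBMulator's or Bochs' implementation; `translate`, `tlb` and the example PTE value are hypothetical.

```python
# Hypothetical model of a paging unit with a tiny TLB; not taken from any emulator.
PTE_P, PTE_A, PTE_D = 0x01, 0x20, 0x40   # Present, Accessed, Dirty (80386 PTE bits)

page_table = {0x2E: 0x0002E007}          # linear page number -> PTE (illustrative value)
tlb = {}                                 # linear page number -> True once Dirty is known to be set

def translate(linear, is_write):
    """Return the physical address, updating Accessed/Dirty in the PTE."""
    page = linear >> 12
    cached = page in tlb
    # A write to a page cached by an earlier read must still walk the table
    # so that the Dirty bit gets written back.
    if not cached or (is_write and not tlb[page]):
        pte = page_table[page]
        if not pte & PTE_P:
            raise RuntimeError("page fault")
        pte |= PTE_A
        if is_write:
            pte |= PTE_D
        page_table[page] = pte
        tlb[page] = bool(pte & PTE_D)
    return (page_table[page] & 0xFFFFF000) | (linear & 0xFFF)

translate(0x2E000, is_write=False)
after_read = page_table[0x2E] & 0xFF
translate(0x2E000, is_write=True)
after_write = page_table[0x2E] & 0xFF
print(f"{after_read:02X} {after_write:02X}")
```

Dropping the `is_write and not tlb[page]` condition reproduces the described bug: the write hits the cached translation and the PTE keeps its Dirty bit clear.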

Repository owner deleted a comment from superfury Aug 23, 2018
@superfury

superfury commented Aug 25, 2018

I've just run Bochs 2.6.9 and dumped the 640K memory. Then I did the same with UniPCemu's MMU functionality (running the debugger, press and hold circle and then tap square) when reaching the HLT at the end of the POST 0xFF (when reaching the HLT status).

It reveals there's quite a lot wrong, apparently:

00000401 C2 00
00000402 10 00
00000405 8E 00
000020B8 67 07
000023E0 27 07
000023E4 27 07
000023E8 27 07
0000FFFC F5 00
0000FFFD FF 00
0000FFFF F0 00
0001FFB7 05 46
0001FFBB 05 32
0001FFBF 56 00
0001FFC2 00 80
0001FFC3 D3 00
0001FFC4 72 40
0001FFC7 58 DB
0001FFC8 73 FF
0001FFCB 38 00
0001FFCC AD 00
0001FFCF 0F 14
0001FFD0 AB 00
0001FFD1 00 FF
0001FFD2 00 FF
0001FFD3 01 00
0001FFD4 00 C0
0001FFD7 EB 05
0001FFD8 FF 00
0001FFDB 04 0D
0001FFDC AD 6F
0001FFDF 20 32
0001FFE2 80 00
0001FFE3 07 00
0001FFE4 08 00
0001FFE6 00 80
0001FFE7 94 00
0001FFE8 72 40
0001FFED FF 00
0001FFEE FF 00
0001FFEF 07 00
0001FFF0 08 01
0001FFF1 00 01
0001FFF3 08 84
0001FFF4 AD 00
0001FFF5 00 FF
0001FFF6 00 FF
0001FFF7 0D FF
0001FFF8 00 FF
0001FFF9 00 FF
0001FFFA 00 FF
0001FFFB 13 6E
0001FFFC 00 75
0001FFFD 00 57
0002E000 01 00
0002E001 FF 00
0002E002 FF 00
0002E003 FF 00
0009F003 00 50

So that means:

  • First IDT entry has an issue.
  • Page table entry issues? Seems to be the Accessed bit not being updated correctly?
  • Stack overflow? (FFFC-FFFF)
  • More stack issues? (1FFB7-1FFFD)
  • Normal test issues? (2E000-2E003 and 9F003)

@barotto
Owner Author

barotto commented Aug 25, 2018

First IDT entry has an issue.

That's the IDT area, but it's not necessarily a problem with the IDT management, could be something that writes garbage at the wrong address.

Page table entry issues?

Possible.

Stack overflow? (FFFC-FFFF)

Maybe, or again you could be writing garbage to the wrong addresses (see below).

More stack issues? (1FFB7-1FFFD)

Maybe.

Normal test issues? (2E000-2E003 and 9F003)

Can't tell right now what the segment at 0x2e000 is used for, but dword at 0x9F000 should be 0x50465046, for some reason you're missing the last byte 0x50 at 0x9F003, or you have overwritten it with 00.

It seems like you have problems with the MMU.

@superfury

superfury commented Aug 25, 2018

When debugging the direct memory writes by the Paging unit, I see:

  • Write to address 0x000023e0 value 0x000f8027.
  • Write to address 0x000023e4 value 0x000f9027.
  • Write to address 0x000023e8 value 0x000fa027.
  • Write to address 0x000020b8 value 0x0002e067 (at step EE).

So the Paging unit requests the memory mapping to actually write the correct values there at some point. Maybe something else is somehow overwriting it?
Edit: Or maybe there's an issue in the writeback to memory after said breakpoint... Hmmmm...
Edit: Already confirmed that the 0x23E0 memory location receives and stores value 0x27. So it's another bug that's somehow overwriting this value?
Edit: Just added those addresses hardcoded to the MMU unit. Let's see if they're overwritten with different values normally (thus not a general emulator bug).
Edit: Looking directly in the capture of the physical RAM(with memory gaps having been removed when dumping the RAM), the values written are actually in RAM at their supposed locations?

Edit: Looking again at my own created logs(made using MinGW-w64 using the following script), from UniPCemu's ROM directory(which contains the Bochs dump as well as the ROMs):

cp "..\captures\memory.dat" "fullmemory.dat"
dd if=/dev/null of=fullmemory.dat bs=1 count=1 seek=655360
cmp -l fullmemory.dat bochs-640k-memdump.bin | gawk '{printf "%08X %02X %02X\n", ($1-1), strtonum(0$2), strtonum(0$3)}'>memory.cmp.txt

It copies the memory.dat to the ROM directory for processing, then edits it to become 640K large, finally executing the difference dump as per your documentation.
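For readers without the MinGW tools, the same comparison can be sketched in Python. This is a rough equivalent under assumptions, not part of the test386 documentation: the function names are made up, and the tiny byte strings stand in for the real dump files. The first argument plays the role of the first file given to cmp, so its bytes land in the second column.

```python
def compare_dumps(emu: bytes, ref: bytes, size: int = 640 * 1024):
    """Yield (address, emu_byte, ref_byte) for every differing byte."""
    emu = emu[:size].ljust(size, b"\x00")   # zero-pad short dumps, like the dd step
    ref = ref[:size].ljust(size, b"\x00")
    for addr, (a, b) in enumerate(zip(emu, ref)):
        if a != b:
            yield addr, a, b

def format_diff(rows):
    """Format rows as 'ADDRESS EMU REF' in hexadecimal."""
    return [f"{addr:08X} {a:02X} {b:02X}" for addr, a, b in rows]

# Tiny in-memory demonstration instead of real dump files.
emu_dump = bytes([0x00, 0x27, 0x07])          # one byte short, gets zero-padded
ref_dump = bytes([0x00, 0x07, 0x07, 0x50])
lines = format_diff(compare_dumps(emu_dump, ref_dump, size=4))
print("\n".join(lines))
```

The last row of the demonstration shows how a dump that is one byte short produces a spurious 00 in the emulator column after padding.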

I thought that the left was the Bochs dump, while the right value was my own. Looking at it again(as well as reading your documentation on the results of the gawk command again), it's reversed.

So those three pages aren't supposed to be accessed (since bit 5 (Accessed, PTE and PDE) and bit 6 (Dirty, PTE only) are supposed to be cleared instead of set at the end of logging). But they're accessed anyway, which isn't supposed to happen? So are those paged accesses on those locations instructions addressing a wrong point in memory, or is it an incorrect mapping somehow?

@barotto
Owner Author

barotto commented Aug 25, 2018

PTE at 0x23e0 is for the page at 0xF8000. Its Accessed bit should be 0, not 1 (byte at 0x23e0 should be 0x07 not 0x27). It seems like you're fetching the NOPs that are present in that area of the test code. Or maybe your MMU is updating the wrong PTE.
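To make the byte values easier to read, here is a small decoding sketch. It assumes a single page table at physical 0x2000 covering the low linear addresses, which is only inferred from the addresses quoted in this thread; the helper names are hypothetical.

```python
PTE_FLAGS = [(0x01, "P"), (0x02, "R/W"), (0x04, "U/S"), (0x20, "A"), (0x40, "D")]

def decode_pte(pte):
    """Split a 32-bit PTE into its page frame address and the set flag names."""
    frame = pte & 0xFFFFF000
    flags = [name for bit, name in PTE_FLAGS if pte & bit]
    return frame, flags

def page_for_pte_address(pte_addr, table_base=0x2000):
    # table_base is an assumption inferred from the addresses in this thread.
    return ((pte_addr - table_base) // 4) << 12

for addr, value in [(0x23E0, 0x000F8027), (0x23E0, 0x000F8007), (0x20B8, 0x0002E067)]:
    frame, flags = decode_pte(value)
    page = page_for_pte_address(addr)
    print(f"PTE @ {addr:#06x}: page {page:#07x}, frame {frame:#07x}, flags {'|'.join(flags)}")
```

So 0x27 versus 0x07 differs only in the Accessed bit, and 0x67 additionally has Dirty set.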

I also think the dword at 0x9F000 is interesting. See what is writing 00 at 0x9F003, overwriting the 0x50.

@superfury

superfury commented Aug 25, 2018

About 9F003, I see 0x50 being written at 10:0000C455.
Then 23E0 written 0x27 at instruction 0010:7FFB.
Then 23E4 written 0x27 at instruction 0010:8FFE.
Then 23E8 written 0x27 at instruction 0010:98A8.
Then 20B8 written 0x67 at instruction 0010:99F7.

The current testsuite lst used:
test386.zip

@superfury

superfury commented Aug 25, 2018

Just found a bug that caused the highest byte written to memory to remain unlogged (it was storing size-1 in the memory usage variable instead of size). It still needed to add 1 to the current address for detection of memory usage, which it didn't (this is used for logging purposes only).
Since address 9F003 is the last byte in the memory capture, that one was missing and thus filled with zeroes by the dd-command.
Edit: It's gone now:

00000401 C2 00
00000402 10 00
00000405 8E 00
000020B8 67 07
000023E0 27 07
000023E4 27 07
000023E8 27 07
0000FFFC F5 00
0000FFFD FF 00
0000FFFF F0 00
0001FFB7 05 46
0001FFBB 05 32
0001FFBF 56 00
0001FFC2 00 80
0001FFC3 D3 00
0001FFC4 72 40
0001FFC7 58 DB
0001FFC8 73 FF
0001FFCB 38 00
0001FFCC AD 00
0001FFCF 0F 14
0001FFD0 AB 00
0001FFD1 00 FF
0001FFD2 00 FF
0001FFD3 01 00
0001FFD4 00 C0
0001FFD7 EB 05
0001FFD8 FF 00
0001FFDB 04 0D
0001FFDC AD 6F
0001FFDF 20 32
0001FFE2 80 00
0001FFE3 07 00
0001FFE4 08 00
0001FFE6 00 80
0001FFE7 94 00
0001FFE8 72 40
0001FFED FF 00
0001FFEE FF 00
0001FFEF 07 00
0001FFF0 08 01
0001FFF1 00 01
0001FFF3 08 84
0001FFF4 AD 00
0001FFF5 00 FF
0001FFF6 00 FF
0001FFF7 0D FF
0001FFF8 00 FF
0001FFF9 00 FF
0001FFFA 00 FF
0001FFFB 13 6E
0001FFFC 00 75
0001FFFD 00 57
0002E000 01 00
0002E001 FF 00
0002E002 FF 00
0002E003 FF 00

@superfury

The first two are instruction fetches triggering the Accessed bit being set during writeback of the Page tables.
0x23E8 is the memory operand of the MOV at said address, offset AD7C.

@superfury

superfury commented Aug 25, 2018

It's checking an instruction fetch operand there at 10:7FFB. It's checking a DWORD at that location. A part of said operand is at F8000. It's the byte after 0F85. The PTE is at 000023E0. The PTE to write back is 0x000f8027.
Edit: The test386.lst confirms the 0x8000 byte is the final byte of the instruction(the jne error instruction, which is a 0F85 imm32 instruction).

It's the final byte of line 15393 in my test386.lst file.

@barotto
Owner Author

barotto commented Aug 25, 2018

About 9F003, I see 0x50 being written at 10:0000C455.

That's ok, the problem is that your dump at POST FFh shows a 0x00. (Edit: resolved.)

Then 23E0 written 0x27 at instruction 0010:7FFB.

Pages 0xF8000-0xFA000 shouldn't be accessed. (Edit: they should.)

Then 20B8 written 0x67 at instruction 0010:99F7.

This seems correct? (Edit: it is.)

@superfury

superfury commented Aug 25, 2018

You say that Paged memory at 0xF8000-FA000 shouldn't be accessed, but there's executable code there that's executing in my case(as well as being present in the test386.lst).

The test386.lst says this about said instruction:

  1055 00007FFB 0F85FF3F0000        <2>  jne error

It seems to be one of the arithmetic tests that's executed.

The entire block of nasm code:

  1055                              <1>  testBittestWFlags btr, %1, %2, %3, %4
  1055                              <2> 
  1055 00007FBC 66B80100            <2>  mov ax, %4
  1055 00007FC0 6650                <2>  push ax
  1055 00007FC2 9D                  <2>  popf
  1055 00007FC3 66B80100            <2>  mov ax, %2
  1055 00007FC7 660FBAF001          <2>  o16 %1 ax, %3
  1055 00007FCC 9C                  <2>  pushf
  1055 00007FCD 6658                <2>  pop ax
  1055 00007FCF 6625D508            <2>  and ax, PS_ARITH
  1055 00007FD3 663D0008            <2>  cmp ax, %5
  1055 00007FD7 0F8523400000        <2>  jne error
  1055                              <2> 
  1055                              <2> 
  1055 00007FDD 66B80100            <2>  mov ax, %4
  1055 00007FE1 6650                <2>  push ax
  1055 00007FE3 9D                  <2>  popf
  1055 00007FE4 66B80100            <2>  mov ax, %2
  1055 00007FE8 66B90100            <2>  mov cx, %3
  1055 00007FEC 660FB3C8            <2>  o16 %1 ax, cx
  1055 00007FF0 9C                  <2>  pushf
  1055 00007FF1 6658                <2>  pop ax
  1055 00007FF3 6625D508            <2>  and ax, PS_ARITH
  1055 00007FF7 663D0008            <2>  cmp ax, %5
  1055 00007FFB 0F85FF3F0000        <2>  jne error

That's bit_m.asm, row 85 being assembled.
Edit: Looking further up the inclusion list, it's test386.asm row 1052.
In test386.asm, I see it's part of the E0 undefined instruction tests.
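The last `jne error` in the listing straddles a page boundary, which would explain the instruction fetch touching the page at 0xF8000. A quick check, assuming the code segment base is 0xF0000 (an assumption consistent with the part of the operand being at F8000) and using hypothetical helper names:

```python
CS_BASE = 0xF0000          # assumption: code segment base of the test code
PAGE_SIZE = 0x1000

def pages_touched(offset, length, base=CS_BASE):
    """Return the linear page addresses covered by an instruction fetch."""
    first = (base + offset) // PAGE_SIZE
    last = (base + offset + length - 1) // PAGE_SIZE
    return [p * PAGE_SIZE for p in range(first, last + 1)]

def jcc_rel32_target(offset, encoding):
    """Target offset of a 0F 8x Jcc with a little-endian signed 32-bit displacement."""
    disp = int.from_bytes(encoding[2:6], "little", signed=True)
    return (offset + len(encoding) + disp) & 0xFFFFFFFF

jne = bytes.fromhex("0F85FF3F0000")   # encoding from the listing line at 00007FFB
print([hex(p) for p in pages_touched(0x7FFB, len(jne))])
print(hex(jcc_rel32_target(0x7FFB, jne)))
```

The six bytes occupy offsets 7FFB through 8000, so the final byte lives in the next page, and the displacement resolves to offset C000.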

@barotto
Owner Author

barotto commented Aug 25, 2018

I'm getting confused by all the edits.
PLEASE verify that 2nd column are the values of your emu, 3rd column are Bochs'
Because I'm starting to think your post shows the opposite...

@superfury

superfury commented Aug 25, 2018

I've recompiled the test386.asm source code(it uses the BOCHS version and 386-specific tests(POST E0) enabled).

As I said in my last post, the strange page (which has a PTE/PDE located at paged address F8000 and onwards) is part of the E0 tests that are executing and are present in the test386.lst generated by the nasm compiler.

This is the result of UniPCemu's memory dump(which is a direct dump of physical RAM, which is zero-padded to 640K) compared against the Bochs memory dump, as in your instructions at the test386.asm main code page(Readme).

00000401 C2 00
00000402 10 00
00000405 8E 00
000020B8 67 07
000023E0 27 07
000023E4 27 07
000023E8 27 07
0000FFFC F5 00
0000FFFD FF 00
0000FFFF F0 00
0001FFB7 05 46
0001FFBB 05 32
0001FFBF 56 00
0001FFC2 00 80
0001FFC3 D3 00
0001FFC4 72 40
0001FFC7 58 DB
0001FFC8 73 FF
0001FFCB 38 00
0001FFCC AD 00
0001FFCF 0F 14
0001FFD0 AB 00
0001FFD1 00 FF
0001FFD2 00 FF
0001FFD3 01 00
0001FFD4 00 C0
0001FFD7 EB 05
0001FFD8 FF 00
0001FFDB 04 0D
0001FFDC AD 6F
0001FFDF 20 32
0001FFE2 80 00
0001FFE3 07 00
0001FFE4 08 00
0001FFE6 00 80
0001FFE7 94 00
0001FFE8 72 40
0001FFED FF 00
0001FFEE FF 00
0001FFEF 07 00
0001FFF0 08 01
0001FFF1 00 01
0001FFF3 08 84
0001FFF4 AD 00
0001FFF5 00 FF
0001FFF6 00 FF
0001FFF7 0D FF
0001FFF8 00 FF
0001FFF9 00 FF
0001FFFA 00 FF
0001FFFB 13 6E
0001FFFC 00 75
0001FFFD 00 57
0002E000 01 00
0002E001 FF 00
0002E002 FF 00
0002E003 FF 00

The second column IS confirmed to be UniPCemu's data in memory. The third column is Bochs' dump.

test386.lst.zip

@barotto
Owner Author

barotto commented Aug 25, 2018

BTW dword at 0x2E000 is currently used as scratch memory for arith-logic tests 0xEE.
Memory is accessed as DS:[0], where DS=0x14, with base=0x2E000

@superfury

superfury commented Aug 25, 2018

Just ran the testsuite on Bochs again. Ran the continue command to let it run until the permanent HLT. Then used Ctrl-C to break into the debugger. It's not a problem with UniPCemu there! Bochs errors out at the processor-specific arithmetic logic tests! It's at 0010:0000C002(the error jumped location) when it's processing those. So Bochs is having a problem with those tests(the E0 tests)!

00001382369e[CPU0  ] write_virtual_checks(): no write access to seg
00001382421e[CPU0  ] read_virtual_checks(): read beyond limit
00001382471e[CPU0  ] write_virtual_checks(): write beyond limit, r/w
00001382521e[CPU0  ] read_virtual_checks(): read beyond limit
00001382571e[CPU0  ] write_virtual_checks(): write beyond limit, r/w
00001385741i[CPU0  ] WARNING: HLT instruction with IF=0!
64799855992i[      ] Ctrl-C detected in signal handler.
Next at t=64813965152
(0) [0x0000000fc002] 0010:000000000000c002 (unk. ctxt): jmp .-4 (0x000fc000)      ; ebfc
<bochs:2>

Edit: Having disabled those tests using the flag, I now see Bochs dumping all EE test results into its debugger window.

@barotto
Owner Author

barotto commented Aug 25, 2018

The second column IS confirmed to be UniPCemu's data in memory. The third column is Bochs' dump.

Are you absolutely 100% sure? Because if I compare your post with a Bochs dump I've done, the 2nd middle column seems exactly like Bochs' data. For example the correct value for dword 2E000 is 0xFFFFFF01.
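For reference, this is how the four differing bytes at 2E000 combine into that dword. The sketch just reassembles the second-column values from the posted listing in little-endian order; the names are hypothetical.

```python
# Second-column bytes of the posted listing for addresses 2E000-2E003.
second_column = {0x2E000: 0x01, 0x2E001: 0xFF, 0x2E002: 0xFF, 0x2E003: 0xFF}

def dword_at(rows, addr):
    """Assemble a little-endian 32-bit value from four consecutive byte rows."""
    return int.from_bytes(bytes(rows[addr + i] for i in range(4)), "little")

value = dword_at(second_column, 0x2E000)
print(f"{value:#010x}")
```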

@superfury

Well, I know for sure that the dumps of UniPCemu's Paging table locations contain those values in the dump and memory variable (0x27 and 0x67). So the second column is the UniPCemu memory and the third column is the Bochs memory.
The first column is the address, the second column is UniPCemu, the third column is the crashing Bochs output.

@barotto
Owner Author

barotto commented Aug 25, 2018

So Bochs is having a problem with those tests(the E0 tests)!

Yes, 0xE0 are the undefined behaviours tests. Bochs is not a faithful replica of the i80386.
Disable TEST_UNDEF and rerun the tests.
Thanks for pointing out that test E0h should be disabled before comparing memory dumps. I'll add a note in the readme.

@superfury

This is my new dump of the comparison of UniPCemu and Bochs after having disabled said test:

0000FFFC F5 00
0000FFFD FF 00
0000FFFF F0 00
0001FFBF 56 46
0001FFE4 08 00
0001FFF0 08 00

Only a few bugs left! :D

@barotto
Owner Author

barotto commented Aug 25, 2018

:D
Now it seems the stack still has some issues....
Keep me posted.

@barotto
Owner Author

barotto commented Mar 2, 2019

Interestingly, your emulator, IBMulator, seems to do that behaviour as well, according to your source code:
https://github.com/barotto/IBMulator/blob/master/src/hardware/cpu/executor/opcodes.cpp
Lines 4707(sldt), 4992(str), 4863(smsw).

At least for SMSW my emu differs from Bochs, as my emu stores 0s in the upper 16 bits, whereas Bochs stores the whole cr0 (32bit).
I honestly can't remember where I got that information, but I intend to verify the correct behaviour on real hw (386 and 486).

I can confirm smsw eax puts 32bit CR0 in eax on 386sx and 486dx.

@superfury

OK. So that's correct behaviour. What about 32-bit SLDT/STR?

@barotto
Owner Author

barotto commented Mar 3, 2019

OK. So that's correct behaviour. What about 32-bit SLDT/STR?

Upper 16 bits are set to 0.
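Summarised as a sketch, the register-destination results reported in this thread look like this. The function names and the example CR0 and selector values are hypothetical; only the widths follow what was stated above.

```python
def smsw_r32(cr0):
    # Reported above for 386SX and 486DX: the full 32-bit CR0 is stored.
    return cr0 & 0xFFFFFFFF

def smsw_r16(cr0):
    # 16-bit operand size: only the machine status word (low 16 bits of CR0).
    return cr0 & 0xFFFF

def sldt_or_str_r32(selector):
    # Reported above: the upper 16 bits of the 32-bit destination are set to 0.
    return selector & 0xFFFF

example_cr0 = 0x80000011          # illustrative value: PG, ET and PE set
print(f"{smsw_r32(example_cr0):#010x} {smsw_r16(example_cr0):#06x} {sldt_or_str_r32(0x0028):#010x}")
```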

@superfury

superfury commented Mar 3, 2019

So that's the same behaviour as UniPCemu has already implemented.

The only thing that keeps me wondering is what the remaining bugs in my instruction emulation and related emulation are. I still can't get NT 3.1 to boot from the hard disk. I did manage to fix Ctrl+Alt+Del just now, which causes the error code of the BSOD within brackets (0x0000007B (< thisone >, 0x00000000, 0x00000000, 0x00000000)) to change. But it still fails to boot correctly from the hard disk due to some weird driver problems that crash the disk driver, although all HDD timings should be OK according to PCem (based on my timing of the ATAPI PACKET command being 20us). Eventually ntoskrnl crashes on itself after the write (and readback) of the ATAPI sector count register (actually the ATAPI Interrupt Identification register, which is R/O). The primary drive (IDE HDD) seems to be detected just fine, but immediately afterwards it tries to detect the ATAPI secondary drives (secondary master & slave), which it somehow doesn't like, causing the kernel to panic somehow (perhaps due to a CPU bug)?

@superfury

superfury commented Mar 5, 2019

Just been reading http://www.rcollins.org/Productivity/DescriptorCache.html . It makes sense, but the tables at the bottom seem to have the descriptor S-bit (System) inverted?

Also, it seems that the Executable bit isn't required for execution permissions on the code segment descriptor cache(code fetches using said descriptor)?

@superfury

superfury commented Mar 6, 2019

Any idea if something happens besides the CPU reset line being raised and lowered on a triple fault? So what happens if IDTR.limit<3, an interrupt occurs(leading to a triple fault) and the A20 gate is disabled(forcing A20 to 0)? Will the CPU fetch from FFFFFFF0(A20 gate on) or FFEFFFF0(A20 gate off, probably leading to an #UD fault that might be used by software for CPU Identification(using EDX etc.)) after reset toggles back to 0(due to bus logic)?
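The two candidate fetch addresses follow directly from masking address bit 20. A minimal sketch, with a hypothetical helper name:

```python
RESET_VECTOR = 0xFFFFFFF0   # first fetch address after a 386 reset

def bus_address(addr, a20_enabled):
    """Apply the A20 gate: when disabled, address line 20 is forced to 0."""
    if a20_enabled:
        return addr & 0xFFFFFFFF
    return addr & ~(1 << 20) & 0xFFFFFFFF

print(f"{bus_address(RESET_VECTOR, True):08X} {bus_address(RESET_VECTOR, False):08X}")
```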

@barotto
Owner Author

barotto commented Mar 7, 2019

Just been reading http://www.rcollins.org/Productivity/DescriptorCache.html . It makes sense, but the tables at the bottom seem to have the descriptor S-bit (System) inverted?

Yep, it's inverted. S=1 is code/data, S=0 is system

Also, it seems that the Executable bit isn't required for execution permissions on the code segment descriptor cache(code fetches using said descriptor)?

Code segment bit is checked in protected mode when CS is loaded.
I'm not sure what happens if you load CS with LOADALL with a non-executable (data) descriptor cache. #GP? nothing? LOADALL is undocumented and it's used only on 286s.

@barotto
Owner Author

barotto commented Mar 7, 2019

Any idea if something happens besides the CPU reset line being raised and lowered on a triple fault? So what happens if IDTR.limit<3, an interrupt occurs(leading to a triple fault) and the A20 gate is disabled(forcing A20 to 0)? Will the CPU fetch from FFFFFFF0(A20 gate on) or FFEFFFF0(A20 gate off, probably leading to an #UD fault that might be used by software for CPU Identification(using EDX etc.)) after reset toggles back to 0(due to bus logic)?

I've always thought A20 is enabled when the CPU is reset, because that's what every emu does. I've never questioned that behaviour. But this is telling another story:
http://www.rcollins.org/Productivity/A20Reset.html
Do you know of any specific program that exploits the A20 line with a CPU reset to do CPU identification?

@superfury

superfury commented Mar 7, 2019

Yes, found that article as well.

So that means that A20 still keeps its state when resetting externally (using the 8042 or system port A).

But what's explained nowhere is what happens when the CPU sends a shutdown signal on the bus (due to a triple fault). Various documentation says the reset line is triggered, which makes sense. But what's happening to the A20 gate in that case isn't explained ANYWHERE.

One simple way to verify it is, as said earlier, LIDT in real mode with a limit of 0, then throw an interrupt. Maybe hook Interrupt 06h and make it set the top-left character of the display (direct VGA A0000&A0001 writes), then CLI HLT? If that triggers instead of the BIOS rebooting, you'd know that (at least for the motherboard it's run on) A20 isn't affected by the shutdown?

Such a program would be trivial with nasm:

bits 16
org 100h
mov ax,2
int 10h ; Set 80x25 text mode
mov ax,0
mov ds,ax
mov ax,cs
mov word [24],handler ; Install #UD (INT 06h) handler offset at 0000:0018h
mov word [26],ax ; ...and its segment at 0000:001Ah
lidt [cs:idtr] ; Limit 0: make sure INT triple faults
int 0 ; Any int # to triple fault
idtr: dw 0 ; Limit, must be 0
dd 0 ; Base, unused
handler:
mov ax,0xb800 ; Color text mode segment
mov ds,ax
mov byte [0],65
mov byte [1],4 ; Put red A in top-left position
cli
hlt ; Stop the CPU!

What happens when you run said program on your 386/486 machine? Does it print A or reboot?

@superfury

superfury commented Mar 15, 2019

Any plans on testing the remainder of protected-mode instructions and functionality(far jmp/call in protected mode(call gates and normal segments), interrupts(both INT and IRET), far return)?

I also see some untested control transfers(opcodes E9 and EB(near jumps), C2(near return))?

@barotto
Owner Author

barotto commented Mar 17, 2019

What happens when you run said program on your 386/486 machine? Does it print A or reboot?

I've tested your program on all my retro gear and the results are interesting:
PS/1 2011 (286) = red A
PS/1 2121 (386sx) = red A
PS/1 2133 (486dx) = reboot
generic 486dx-50 = hangs with a blank screen and blinking cursor on the top left but no red A

So it appears to depend on the specific system.

Any plans on testing the remainder of protected-mode instructions and functionality

As the title of this issue implies, eventually yes, but real life issues take precedence.

@superfury

superfury commented Mar 29, 2019

I've just given Windows 3.0, 3.1 and WFW 3.11 another go in real and protected modes (WFW in its only mode, 386 enhanced). Both 3.0 run fine now (when in Windows apps).

The only issue left there is an unresponsive MS-DOS program when run from Windows (real (3.0) and protected (3.0&3.1) modes). In 3.0, typing "dir" a few times, after about 3-4 times "r" is printed at the MS-DOS 6.22 prompt. In 3.1 it's fully unresponsive.

COMMAND.COM reaches the command line and input in both cases.

386 enhanced mode is broken for all 3.x distributions(thus WFW3.11 doesn't boot at all) and all result in a text-mode black screen(80x25) with blinking cursor at character at 0,0.

Any ideas on possible causes? If even 3.0 in real mode goes wrong, does that mean a CPU bug or perhaps a 8042/Keyboard issue?
Edit: Just was debugging the PS/2 keyboard. Then I noticed an undefined case for the generic 0xf3(set typematic rate/delay) handling, called when starting Windows. Whoops. It's fully emulated, but bugged since accurate timing was implemented in the PS/2 keyboard. :S

Edit: Just improved the 8042 a bit to not change the buffer while translating the scancode set: only 1c and 9c appear in the buffer, instead of 1c (status bit 0=1), f0 (status bit 0=0), 9c (status bit 0=1). The f0 step no longer happens, so it's just 1c and 9c (both with status bit 0=1; in between, a read gives status bit 0=0 and value 1c, not incorrectly changing to f0).
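The translation behaviour described in the last edit can be sketched like this. It is a simplified model, not UniPCemu's 8042 code: the table is deliberately partial (Enter, A, Esc and Space only), extended E0/E1 sequences are ignored, and the function name is hypothetical.

```python
# Partial scancode set 2 -> set 1 table: Enter, A, Esc, Space.
SET2_TO_SET1 = {0x5A: 0x1C, 0x1C: 0x1E, 0x76: 0x01, 0x29: 0x39}

def translate_stream(set2_bytes):
    """Translate set 2 bytes to set 1, never exposing the F0 break prefix."""
    out = []
    break_pending = False
    for b in set2_bytes:
        if b == 0xF0:
            break_pending = True      # swallow the prefix; the output buffer is left untouched
            continue
        code = SET2_TO_SET1[b]
        if break_pending:
            code |= 0x80              # a set 1 break code is the make code with bit 7 set
            break_pending = False
        out.append(code)
    return out

enter_press_release = translate_stream([0x5A, 0xF0, 0x5A])
print([f"{c:02X}" for c in enter_press_release])
```

Pressing and releasing Enter therefore shows up as just 1C and 9C on the translated side.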

@superfury

Just fixed a 'bug' in the 8042 that was clearing empty 8042 output buffers to become zeroed when checking for new input to fill the buffer with. But somehow, MS-DOS and Windows 3.x seem to require said value to still be there to run properly?
After fixing said clearing(and leaving whatever part of a scancode in the buffer for software to read until something is actually received), MS-DOS from Windows 3.0(tested in real mode so far) seems to properly receive input now! :D

Also, ever noticed that you can actually nest Windows 3.0 real mode sessions within each other this way? Run Windows (on a 80(1)86 or with the /r parameter), then from within Windows, run the MS-DOS prompt shortcut, after which you can once again type "win /r" to run a second Windows 3.0a session.

This is proven to actually be nested, because when I looked at Windows' memory stats(using the Help->About option), less free memory was seen left in the nested Windows session compared to the one that was already loaded before executing the MS-DOS prompt shortcut.
Also, returning from the nested Windows 3.0 session and exiting back to Windows uses different paths (D:\WINDOWS becoming D: (where I originally ran "win /r" from) instead). Also, the selected MS-DOS prompt option from the first Windows session was still selected when returning using the MS-DOS exit command! :D

@superfury

superfury commented Apr 5, 2019

Just looking through the opcodes again (the normal opcodes seem to be fine. The 0F opcodes I'm still reviewing). Strangely enough, http://ref.x86asm.net/coder32.html#x0FB7 and http://ref.x86asm.net/coder32.html#x0FBF seem to imply that a 16-bit version of said opcodes does exist, as it mentions r16/32 in the op1 column?

Edit: Just found and fixed a bug in the PUSH/POP SegReg instructions that caused them to increase/decrease the virtual ESP(for protection checks) by 2 instead of 4 while verifying against 16-bit memory accesses. I also fixed 16-bit SIDT/SGDT to properly fill the final byte with 0x00(386+) or 0xFF(286).
Edit: The 80386 emulation also seems to have had errors in loading the IDTR/GDTR with LIDT/LGDT when the operand size was 16-bit, causing it to load the highest byte incorrectly(92h ending up in the MSB of the IDTR/GDTR base).
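For illustration, the 16-bit operand-size handling described above could be sketched as follows. This is a minimal model, not UniPCemu's or IBMulator's code: the function names `lgdt` and `sgdt` and the `is_286` flag are hypothetical, and the fill byte on store simply follows the values given in the comment above.

```python
import struct

def lgdt(operand, operand_size):
    """Return (limit, base) loaded from a 6-byte memory operand (hypothetical helper)."""
    limit, base = struct.unpack("<HI", operand[:6])
    if operand_size == 16:
        # Only a 24-bit base is loaded; the top byte of the operand is ignored.
        base &= 0x00FFFFFF
    return limit, base

def sgdt(limit, base, operand_size, is_286=False):
    """Return the 6 bytes stored to memory (hypothetical helper)."""
    if operand_size == 16:
        # Final byte per the comment above: 0x00 on a 386+, 0xFF on a 286.
        top = 0xFF if is_286 else 0x00
        base = (base & 0x00FFFFFF) | (top << 24)
    return struct.pack("<HI", limit & 0xFFFF, base & 0xFFFFFFFF)

# A 286-style table pointer whose sixth byte happens to be 92h.
operand = struct.pack("<HI", 0x07FF, 0x92123456)
limit16, base16 = lgdt(operand, 16)
limit32, base32 = lgdt(operand, 32)
```

With a 16-bit operand size the 92h byte is dropped from the base, which is the case the original bug got wrong.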

@barotto
Owner Author

barotto commented Apr 5, 2019

> Strangely enough, http://ref.x86asm.net/coder32.html#x0FB7 and http://ref.x86asm.net/coder32.html#x0FBF seem to imply that a 16-bit version of said opcodes does exist, as they mention r16/32 in the op1 column?

16-bit 0FB7 and 0FBF are both MOV Gw,Ew.

@superfury

superfury commented Apr 5, 2019

So, 0FBF is MOVSX Gv,Ew and 0FB7 is MOVZX Gv,Ew. Is that correct?

The Gv being either a 32-bit register or 16-bit register(depending on the operand size) and Ew always being a 16-bit memory location or register?

@superfury

superfury commented Apr 5, 2019

Just ran the Bochs comparison of memory again. Now I end up with the following in the compare file:
00000525 9F 9E
0000083D F1 F0
0000084D B3 B2

Now the question, what is it?
Edit: So byte 5 of selector 20h in the GDT is supposed to not set bit 0. That's CC_SEG_PROT32 ?
Byte 5 of selector 38 of the LDT has the same problem(setting bit 0). That's ROU_SEG_PROT ?
Byte 5 of selector 48 of the LDT is supposed to have the same issue. That's DPL1_SEG_PROT ?

Hmmmm... They're all accessed(A-bit) bits of said descriptors? Aren't those set when said code/data descriptors are loaded into the processor(before parsing faults for their contents)?
Edit: After improving touching of segments to only occur once a segment is fully loaded into the descriptor cache, the 0000084D error disappears. But the other two entries(CC_SEG_PROT32 and ROU_SEG_PROT) remain incorrect?

Is it the case that VERR and VERW don't 'touch' the descriptor(setting the Accessed bit, bit 0 of the selected descriptor's access rights byte, to 1)?

Edit: Changing the VERR and VERW instructions to not touch the descriptor(just load it, nothing more), those differences compared to Bochs disappear?

Interestingly enough, Bochs' LAR, LSL, VERR and VERW instructions all have this behaviour(as far as I can see in the source code file). They don't modify the access rights(set the Accessed bit)?

@barotto
Owner Author

barotto commented Apr 6, 2019

> So, 0FBF is MOVSX Gv,Ew and 0FB7 is MOVZX Gv,Ew. Is that correct?
>
> The Gv being either a 32-bit register or 16-bit register(depending on the operand size) and Ew always being a 16-bit memory location or register?

Yes.
32-bit 0FBF = MOVSX Gv,Ew
16-bit 0FBF = MOV Gw,Ew
32-bit 0FB7 = MOVZX Gv,Ew
16-bit 0FB7 = MOV Gw,Ew
Gv=r32, Gw=r16, Ew=r/m16 (see http://ref.x86asm.net/index.html#Instruction-Operand-Codes)

@barotto
Owner Author

barotto commented Apr 6, 2019

> Interestingly enough, Bochs' LAR, LSL, VERR and VERW instructions all have this behaviour(as far as I can see in the source code file). They don't modify the access rights(set the Accessed bit)?

The accessed bit is set when the descriptor is loaded into a segment register. LAR, LSL, VERR and VERW load the descriptor data from memory without altering any seg reg and the Intel docs don't mention any a-bit alteration for those instructions. So I think the Bochs behaviour is correct.
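The distinction can be sketched in a few lines. This is a simplified model under stated assumptions: the helper names are hypothetical, the descriptor table is a plain `bytearray`, the selector's TI and RPL bits are simply masked off, and all privilege and type checks are omitted.

```python
ACCESSED = 0x01  # bit 0 of the access rights byte (byte 5 of an 8-byte descriptor)

def descriptor_offset(selector):
    # Mask off RPL (bits 0-1) and TI (bit 2) to get the byte offset in the table.
    return selector & 0xFFF8

def load_segment_register(table, selector):
    """Model of a segment register load: marks the descriptor as accessed."""
    off = descriptor_offset(selector)
    table[off + 5] |= ACCESSED
    return bytes(table[off:off + 8])

def read_descriptor_only(table, selector):
    """Model of LAR/LSL/VERR/VERW: reads the descriptor, writes nothing back."""
    off = descriptor_offset(selector)
    return bytes(table[off:off + 8])

gdt = bytearray(0x40)
gdt[0x25] = 0x9E                      # access rights byte of selector 20h
read_descriptor_only(gdt, 0x20)
after_verr = gdt[0x25]                # descriptor memory is unchanged
load_segment_register(gdt, 0x20)
after_load = gdt[0x25]                # Accessed bit is now set
```

The 9E versus 9F values mirror the kind of one-bit difference that showed up in the memory compare against Bochs earlier in the thread.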

@superfury

superfury commented Apr 6, 2019

OK. So my current implementation of MOVSX and MOVZX is now correct again.
As for the accessed bit, I'd assume it's set to 1 after the descriptor cache has been loaded with said value(or reached a point where that can't be undone anymore)?

Also, Ev=r/m16/32 and Gv(which is used for the register part of opcode 0FBF/0FB7)=r16/32. The 'v' part of that means that it's a 16-bit register or memory or 32-bit register or memory, depending on the operand size(e.g. Ev means AX in 16-bit operand size, EAX in 32-bit operand size with R/M being 0 and MOD being 3, the same applies to Gv being EAX or AX when reg=0, depending on the operand size).

So the 'v' suffix is a 32-bit memory location(r/m only) or register(either r/m or reg) or 16-bit one, depending on the operand size.
A 'b' suffix always means a byte register or memory location. A 'w' suffix means the same(but a word instead); see also the official documentation of MOV SegReg(which we already know is wrong for moving to memory and register with respect to the upper bits, since quite a while ago, hence the tests in test386.asm checking for those in the segmentation section for said opcodes). A 'd' means doubleword(used for e.g. CR0-CR7 and related opcodes).

You seem to think the 'v' is always 32-bit register or memory. But all documentation I can find implies it's actually a word or doubleword location in memory or register(which is determined by the operand size instead of being hardcoded for the instruction(as is the case with 'b', 'w' and 'd')).

So, my current implementation according to the above logic is:
32-bit 0FBF = MOVSX Gv,Ew (same as MOVSX Gd,Ew)
16-bit 0FBF = MOVSX Gv,Ew (same as MOVSX Gw,Ew)
32-bit 0FB7 = MOVZX Gv,Ew (same as MOVZX Gd,Ew)
16-bit 0FB7 = MOVZX Gv,Ew (same as MOVZX Gw,Ew)
And that's exactly how UniPCemu now interprets those instructions.
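A minimal sketch of that interpretation, where the destination width follows the operand size and the source is always a word. The function names are hypothetical and this is not UniPCemu's actual code.

```python
def movsx_gv_ew(src16, operand_size):
    """0FBF: sign-extend a 16-bit source into a 16- or 32-bit destination."""
    src16 &= 0xFFFF
    if operand_size == 16:
        return src16                       # nothing to extend: behaves like MOV Gw,Ew
    if src16 & 0x8000:
        return src16 | 0xFFFF0000          # replicate the sign bit into bits 16-31
    return src16

def movzx_gv_ew(src16, operand_size):
    """0FB7: zero-extend a 16-bit source into a 16- or 32-bit destination."""
    # The upper half is zero either way, so the result is the same value;
    # only the width of the register that gets written differs.
    return src16 & 0xFFFF

sx32 = movsx_gv_ew(0x8001, 32)
sx16 = movsx_gv_ew(0x8001, 16)
zx32 = movzx_gv_ew(0x8001, 32)
```

With a 16-bit operand size both forms degenerate into a plain word move, which is consistent with both readings given in this thread.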

The original crash to a black screen, with a message about Windows 95 not being able to boot and asking to reinstall, is now gone. Instead, Windows 95 now hangs during the graphical boot animation(when not booted in Safe mode). I see it infinitely #UD faulting on an ARPL instruction now?

Edit: The Windows 95 bootlog.txt became very small now, only a few rows:

[000D00EF] LoadSuccess    = C:\WINDOWS\HIMEM.SYS
[000D00EF] Loading Device = C:\WINDOWS\IFSHLP.SYS
[000D00F0] LoadSuccess    = C:\WINDOWS\IFSHLP.SYS
[000D00F0] Loading Device = C:\WINDOWS\SETVER.EXE
[000D00F0] LoadSuccess    = C:\WINDOWS\SETVER.EXE
[000D0280] Loading Vxd = VMM
[000D0292] LoadSuccess = VMM
[000D0292] Loading Vxd = C:\WINDOWS\SMARTDRV.EXE
[000D0292] LoadSuccess = C:\WINDOWS\SMARTDRV.EXE
[000D0294] Loading Vxd = CONFIGMG
[000D029C] LoadSuccess = CONFIGMG
[000D029C] Loading Vxd = VSHARE
[000D029E] LoadSuccess = VSHARE
[000D029E] Loading Vxd = VWIN32
[000D02A2] LoadSuccess = VWIN32
[000D02A2] Loading Vxd = VFBACKUP
[000D02A3] LoadSuccess = VFBACKUP
[000D02A3] Loading Vxd = VCOMM
[000D02A4] LoadSuccess = VCOMM
[000D02A4] Loading Vxd = COMBUFF
[000D02A5] LoadSuccess = COMBUFF
[000D02A5] Loading Vxd = C:\WINDOWS\system\VMM32\IFSMGR.VXD
[000D02A9] LoadSuccess = C:\WINDOWS\system\VMM32\IFSMGR.VXD
[000D02A9] Loading Vxd = C:\WINDOWS\system\VMM32\IOS.VXD

So IOS.VXD is now inexplicably crashing?
Edit: So since loading the IOS.VXD crashes(and everything earlier loads fine), the problem is in IFSMGR.VXD in this case(which is likely, since it's a filesystem and block device driver, as well as everything before it loads fine)?
Edit: Undoing the new behaviour on the setting of the Accessed bit still makes it triple fault on a stack access(and reboot) during the same code block. So the problem isn't anywhere inside said code, but somewhere else... Hmmm...

Edit: Reinstalling Windows 95 seems to fix that(see vogons thread https://www.vogons.org/viewtopic.php?f=9&t=65223&p=747883#p747883 ).

In safe mode it still gets a BSOD on the "Initializing KERNEL" being the last thing that's logged?

@superfury

I notice something, looking through the arithmetic(EE) tests: you're not testing all memory-related cases of the tested opcodes.
Opcodes 00-3F only seem to test memory as the destination(being read and written to), not as the source(only being read from, the destination being the register of the modr/m(opcodes ending in the 0 and 1 nibbles)).
Opcodes 80-83 don't test the memory cases at all.
The same for opcodes F6 and F7, as well as FE&FF INC/DEC(but for register operands).
0FAF isn't tested for memory operands.
And F6/F7 (I)DIV don't verify reads from memory operands.

Also C* and D* shift/rotate instructions aren't tested for memory operands.

@superfury

Just tried the OS/2 Warp 3 installer again. Now I noticed that it was throwing a #GP(0) fault at a MOV CR2,ECX instruction. Then, digging deeper, I noticed that it was faulting because it was checking for the invalid condition of CR0 with PG set and PE cleared at the loading of ANY CR-register! Whoops. After fixing that, it continues on to DISK1's boot screen(which finally appears now!) and seems to hang(the floppy disk indicator stays on, so the drive motor is never being stopped).
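The fix described above amounts to applying the PG-without-PE check to CR0 writes only. A minimal sketch, assuming a hypothetical `mov_to_cr` helper and a plain dict for the control registers:

```python
CR0_PE = 1 << 0    # protection enable
CR0_PG = 1 << 31   # paging enable

class GeneralProtectionFault(Exception):
    """Stands in for raising #GP(error_code) in an emulator core."""

def mov_to_cr(control_regs, index, value):
    value &= 0xFFFFFFFF
    # The PG=1/PE=0 combination is only meaningful for CR0. Applying the
    # check to every CRn write makes e.g. MOV CR2,ECX fault whenever the
    # written value happens to have bit 31 set and bit 0 clear.
    if index == 0 and (value & CR0_PG) and not (value & CR0_PE):
        raise GeneralProtectionFault(0)
    control_regs[index] = value

regs = {0: CR0_PE, 2: 0, 3: 0}
mov_to_cr(regs, 2, 0x80001000)      # a page-fault address: must not fault

def cr0_write_faults():
    try:
        mov_to_cr(regs, 0, CR0_PG)  # PG without PE
    except GeneralProtectionFault:
        return True
    return False
```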

@superfury

superfury commented Apr 20, 2019

I'm now looking into the Expand-Down segments. Some sources state that the segment actually starts 64KB or 4GB below the base address(so with no additional effect compared to Expand-Up segments, except for the limit algorithm)? So does that mean that if the Base address in the descriptor is set to e.g. 0x80000000 and an offset of FFFC is used(on a 64KB data descriptor with the Big bit set to 0), said dereference actually results in a linear address of 0x7FFFFFFC?
Edit: This probably isn't the case, seeing as implementing that crashes OS/2 during boot even earlier than it already does(it's currently still crashing on an address(what it's reporting) that doesn't fault according to my fault raising breakpoints?).
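As far as the Intel documentation describes it, an expand-down segment only changes which offsets are valid (everything above the limit, up to FFFFh or FFFFFFFFh depending on the B bit); the linear address is still base plus offset. A minimal sketch under that reading, with a hypothetical function name and a generic exception in place of #GP/#SS:

```python
class SegmentLimitViolation(Exception):
    """Stands in for #GP(0) or #SS(0), depending on the segment used."""

def expand_down_linear(base, limit, offset, size, big):
    upper = 0xFFFFFFFF if big else 0xFFFF
    last = offset + size - 1
    # Valid offsets lie strictly above the limit and at or below the upper bound.
    if offset <= limit or last > upper:
        raise SegmentLimitViolation()
    # The base is applied exactly as for an expand-up segment.
    return (base + offset) & 0xFFFFFFFF

linear = expand_down_linear(0x80000000, 0x0FFF, 0xFFFC, 4, big=False)

def faults(offset, size):
    try:
        expand_down_linear(0x80000000, 0x0FFF, offset, size, big=False)
    except SegmentLimitViolation:
        return True
    return False
```

For the example in the question this gives 0x8000FFFC rather than 0x7FFFFFFC, which fits the observation that the alternative interpretation made OS/2 crash earlier.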

@superfury

superfury commented May 3, 2019

One good thing to report now: Windows 3.11 is now properly booting in 386 enhanced mode! :D

It just seems that one 3.x 386-enhanced mode application(using win32s) crashes within the kernel: the kernel double faults and then triple faults because the stack overflows(due to nested page faults) at address 0x80000FFC, seemingly inside the page fault handler. It faults on an ADD instruction(either 8-bit or 32-bit) addressing some user-mode memory(just below 80000000), which fails because said page isn't in memory?

Edit: Still can't get Linux to boot, since it tries to return using a near return to a high user-mode address(in the 7XXXXXXX range) while it's having a CS base of c0000000 and a limit of 3fffffff(thus mapping lower memory to high memory for easy transition to 32-bit split user and kernel modes transparently). Since said address breaks the CS limit, it throws a #GP(0) fault, which Linux obviously doesn't like. I do see that said return address on the stack is never written to, except by the boot extraction program(its final instruction being a block move writing to said address(probably the extracted kernel image)) just before jumping to the kernel entry point. It's still in its initial kernel software, as CR3 is still 101000(4KB(its PDE) past the 1MB barrier). So the kernel itself that's used during the setup is crashing in that case(due to some strange unmodified memory location being RETN'ed to).
I also see values of 2 through 30h(not all of them, just some of them) being loaded into EBP, which is strange, seeing as a stack frame is thus starting within the zero-page, which is invalid in both Linux and Windows?
I do see that the values written to the register have been OR'ed with 0x200000, so it looks like it's clearing the top 16 bits of the register somehow? That looks like some 32-bit MOVZX or MOVSX behaviour?
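The limit check that fires on that near return can be sketched as below. It is a simplified model with hypothetical names: only the check of the popped EIP against the CS limit is shown, and the stack handling is reduced to a list.

```python
class GeneralProtectionFault(Exception):
    """Stands in for raising #GP(error_code)."""

def ret_near(stack, cs_limit):
    """Pop a return address and validate it against the code segment limit."""
    new_eip = stack.pop()
    if new_eip > cs_limit:
        # A return target beyond the CS limit is rejected with #GP(0).
        raise GeneralProtectionFault(0)
    return new_eip

CS_BASE, CS_LIMIT = 0xC0000000, 0x3FFFFFFF   # values from the comment above

ok_eip = ret_near([0x00100000], CS_LIMIT)
ok_linear = (CS_BASE + ok_eip) & 0xFFFFFFFF

def bad_return_faults():
    try:
        ret_near([0x70001234], CS_LIMIT)      # hypothetical 7XXXXXXX target
    except GeneralProtectionFault:
        return True
    return False
```

So with this CS any return address at or above 40000000h faults, which suggests the bad value on the stack is the real problem rather than the check itself.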

@superfury

superfury commented May 15, 2019

Hmmmm... Running JEMM386 in MS-DOS seems to run fine in all settings(386, Pentium with and without VME). But when I try to run it in FreeDOS, it faults(exception 03h, 08h and 09h, as it dumps) on the Pentium and even triple faults on the 80386!

Edit: The 80386 even triple faults on JEMM386 when run in FreeDOS. Runs fine in MS-DOS 6.22, though. So perhaps a FreeDOS problem with my x86 emulation?

@superfury

Managed to fix some more bugs, including the PIC triggering incorrect INT 0 problems(providing interrupts when there were none). :S

Now, with some more fixes, Windows 95 can (sometimes) boot in Normal mode until "Initializing the kernel"(and at all other times it gives a Windows Protection Error on the second ESDI_506.pdr(hard disk driver, probably the secondary ATA controller?)).

Trying to boot the Basic Linux 3.5 from a floppy crashes with an IRET(D) trying to execute a task return to task 0x0001(descriptor type 02h), due to the task register still being NULL(0x0000, type=0x82)? That's odd, as TR is never loaded by LTR? No LTR is observed during boot and FLAGS' high 4 bits are 0x7?

@superfury

Just added IN/OUT instruction tests to the remainder of the testsuite. These are fully automated checks(not requiring manually looking at the results) which use OUT and IN instructions to ports 0 and 2(the DMA address ports), combined with resetting the flip-flop(by writing port 0xC).
It simply writes some test patterns(0x55 for direct ports, 0x22 for DX ports) and resets the AL/AX register to 0xAA or 0xAAAA(or 0x44 and 0x4444 for DX ports) before each read. They also check out properly. So the I/O port instructions have no issues at all.

Still trying to get MS-DOS 6.22 setup disk 1 to boot. It keeps hanging when handling some interrupt or something like that, while detecting the system configuration(it literally says on its cyan screen: "Please wait. Setup is checking your system configuration."). Somehow, the CPU has some bug causing it to not work properly there?

@superfury

superfury commented Jan 24, 2021

Interesting! With the new RCR checks fixed, I get errors on UniPCemu's 32-bit RCR instruction, according to the reference!

test386-EE-reference.zip
Edit: The cause was a modulo 33 on the 32-bit RCR's amount of bits shifted, instead of a proper masking to 5 bits. The issue disappeared once I fixed that. :D
