Emulator Crashes #116

Vutshi · 2024-05-18T12:53:26Z

Describe the bug
I run a program in ELKS and martypc crashes with the following output:

Loaded keyboard mapping file: ./configs/keyboard_layouts/keyboard_US.toml
thread 'main' panicked at core/src/cpu_808x/cycle.rs:632:26:
index out of bounds: the len is 2 but the index is 2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at /Users/x2241064/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.29.15/src/platform_impl/macos/app_state.rs:387:33:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }

To Reproduce
Steps to reproduce the behavior:

Boot the attached ELKS image
login: root
Execute mand_el4
Following the rendering of the Mandelbrot set, martypc will crash

Expected behavior
Do not crash.

Environment (please complete the following information):

macOS 13.6.6 (22G630)
2,3 GHz Quad-Core Intel Core i7
Intel Iris Plus Graphics
Rust 1.78.0

Build info

Built from source fdf1231

Disk image
fd360-fat-ELKS.img.zip

The text was updated successfully, but these errors were encountered:

dbalsom · 2024-05-18T15:41:31Z

thanks. does it crash with the 0.2.0 release build?

Vutshi · 2024-05-18T20:45:44Z

Hi @dbalsom,

thanks. does it crash with the 0.2.0 release build?

Yes, it does.

dbalsom · 2024-05-22T18:39:23Z

Looks to be a bug with 0xFE - I execute 3 cycles but only provide 2 microcode addresses. The instruction in question is an undefined form, which is a bit unusual. My 8088 tests don't exercise the undefined 0xFE forms it so they didn't catch it.

This raises another point, that doing the bounds checking for the microcode line number slice every time I call cycles_i() is expensive, and the microcode logging was never fully accurate, so I may just tear that whole feature out.

ghaerr · 2024-05-22T20:11:10Z

Hello @dbalsom,

Looks to be a bug with 0xFE
The instruction in question is an undefined form, which is a bit unusual.

May I ask exactly what is the instruction sequence being referred to? Are we talking about an undefined 8086 instruction starting with 0xFE, where the following byte modRMReg bits indicate (byte operand size and 2 <= modRMReg <= 6) or modRMReg == 7? (That is 0xFE-based CALL/JMP/PUSH with byte operand or regmod == 7).

@Vutshi, I am wondering whether this was reproduced using unmodified NASM output, or whether this is the CS segment having been scribbled upon and creating an invalid opcode... ?

Thank you!

Vutshi · 2024-05-22T20:18:29Z

@Vutshi, I am wondering whether this was reproduced using unmodified NASM output, or whether this is the CS segment having been scribbled upon and creating an invalid opcode... ?

This is minimally modified NASM code for DOS which barely runs in ELKS. I guess it writes over some parts of ELKS and creates invalid opcode.

Best

dbalsom · 2024-05-22T20:59:46Z

Hello @dbalsom,

Looks to be a bug with 0xFE
The instruction in question is an undefined form, which is a bit unusual.

May I ask exactly what is the instruction sequence being referred to? Are we talking about an undefined 8086 instruction starting with 0xFE, where the following byte modRMReg bits indicate (byte operand size and 2 <= modRMReg <= 6) or modRMReg == 7? (That is 0xFE-based CALL/JMP/PUSH with byte operand or regmod == 7).

@Vutshi, I am wondering whether this was reproduced using unmodified NASM output, or whether this is the CS segment having been scribbled upon and creating an invalid opcode... ?

Thank you!

Sorry, the opcode is actually FF. The same bug is present in both FE and FF forms, and the instruction byte sequence is FF FF. The bug is present in both FE and FF forms, probably due to copy-paste. In any case both bugs involve the invalid form of CALL FAR with a register operand (mod=3,r/m=7) However REG is 7 which should be PUSH so I am not sure how I am winding up there. The disassembly viewer can be inaccurate if there is self modifying code, but IRET should clear the queue. I guess I have more debugging to do.

This value is actually from video memory at B800:000A. Here is where things go off the rails:

For some reason the trap flag is on and the trap handler iret returns us to a bad segment which wraps into VRAM.

dbalsom · 2024-05-22T21:11:53Z

Ah, I figured it out. the disassembler does not resolve MMIO addresses (I suppose I never assumed one would try to execute code from VRAM). A peek at the memory viewer which does resolve MMIO shows the true instruction sequence being executed: FF DD.

Mystery solved there I suppose.

ghaerr · 2024-05-22T21:13:47Z

the instruction byte sequence is FF FF.

This value is actually from video memory at B800:000A

Thanks for the clarification and screenshot. Since the IP is in video RAM that pretty much confirms a major problem. I would guess that since the code seems to be executing IRETs, a stack/data overwrite could be the cause. The application program is running as a first-port under real mode ELKS which doesn't allow setting interrupt vectors, and a hardware timer interrupt during an application interrupt handling could cause this sort of thing, as ELKS may have switched stacks, etc.

What I do find interesting is that special handling of 0xFFFF (like 0x00) is probably a good idea for emulators since those two values seem highly probable when a program takes a wrong turn... I've made a note to do the same for some of my emulators.

dbalsom · 2024-05-22T21:16:24Z

What I do find interesting is that special handling of 0xFFFF (like 0x00) is probably a good idea for emulators since those two values seem highly probable when a program takes a wrong turn... I've made a note to do the same for some of my emulators.

I do have an 'off rails detection' option, that triggers on 5+ repeated opcodes, which was very useful for development as I was off the rails often. It does have some false positives in rare code paths, but you could bump that up to 10 and probably be fine.

ghaerr · 2024-05-22T21:17:18Z

shows the true instruction sequence being executed: FF DD.

Do you suppose the emulator ran through the 0xFFFF's then died on 0xFFDD? In any case, I've noted that invalid 0xFF-starting instruction sequences might want to trap for my emulators.

I do have an 'off rails detection' option, that triggers on 5+ repeated opcodes

That's a good idea :)

dbalsom · 2024-05-22T21:19:18Z

Do you suppose the emulator ran through the 0xFFFF's then died on 0xFFDD? In any case, I've noted that 0xFF-starting instruction sequences might want to trap for my emulators.

yes FF FF itself is valid, PUSH DI.

dbalsom · 2024-05-22T21:21:57Z

That's another good idea :)

you can either have a list of opcodes to trigger on repeat (0x00 and 0xFF would catch a lot) or have exceptions for things like NOP, as NOP can be repeated a lot for timing purposes. Also probably avoid IN and OUT as IO was used for timing things, like adlib ports.

I have encountered a few titles (demos and such) that count on 0x00 0x00 being a fairly innocuous ADD instruction so they can roll through blocks of initialized but otherwise empty memory...

ghaerr · 2024-05-22T21:28:53Z

you can either have a list of opcodes to trigger on repeat (0x00 and 0xFF would catch a lot) or have exceptions for things like NOP, as NOP can be repeated a lot for timing purposes. Also probably avoid IN and OUT as IO was used for timing things, like adlib ports.

Very good ideas. I'll take a look at how this might be added to my Blink16 and 86Sim projects. You've done a great job on MartyPC especially being cycle-exact, the GUI looks super also :)

dbalsom · 2024-05-22T21:33:39Z

Very good ideas. I'll take a look at how this might be added to my Blink16 and 86Sim projects. You've done a great job on MartyPC especially being cycle-exact, the GUI looks super also :)

Thanks, I appreciate the kind words. As an emulator author I don't suppose you've seen my 8088 tests?
https://github.com/singleStepTests/8088

This issue is good motivation to add tests for the illegal FE and FF forms :D

ghaerr · 2024-05-22T21:42:22Z

I don't suppose you've seen my 8088 tests?

Wow, you've really gone to town on that!! Thank you, I'l take a closer look at them.

This issue is good motivation to add tests for the illegal FE and FF forms

Yes, my emulators (but not disassembler) currently ignore many invalid opcodes but now I realize that misses an opportunity to flag to the application developer something they likely don't know about. Of course, a runtime flag to allow the program to keep running with undefined behavior anyways is probably also needed, as that's what real hardware does in real mode!

dbalsom · 2024-05-22T21:45:37Z

Yes, my emulators (but not disassembler) currently ignore many invalid opcodes but now I realize that misses an opportunity to flag to the application developer something they likely don't know about. Of course, a runtime flag to allow the program to keep running with undefined behavior anyways is probably also needed, as that's what real hardware does in real mode!

Yeah, an 8088 is unstoppable, you can literally feed it random bytes (as long as you filter out HLT) and it will keep going...

I am working on a new test set for the NEC V20. It is not quite as resilient as the 8088 - a few undefined forms will make it unhappy and it enters a halt-like state (with no HALT bus condition)

ghaerr · 2024-05-22T21:52:52Z

Interesting on the NEC V20. Is that chip the same internal Intel-licensed silicon design, or did NEC come up with a new implementation and only implement the chip ISA?

dbalsom · 2024-05-22T21:58:41Z

Interesting on the NEC V20. Is that chip the same internal Intel-licensed silicon design, or did NEC come up with a new implementation and only implement the chip ISA?

NEC originally cloned the 8088 and 8086, got sued by Intel over it, and settled the lawsuit by signing an agreement to become an official second-source provider. Everything was fine until they got restless and extended the 8088/86 with 186 opcodes, new NEC-specific opcodes and 8080 support with the V20/V30 and got sued a second time. They won that lawsuit and would go on to make a bunch more V-series CPUs, none of which were quite as notable.

One of the reasons they won the second lawsuit is they proved that they had a clean-room approach to writing the V20 microcode and didn't simply copy Intel's. What I find hilarious is that the ruling also said Intel lost their copyright to the microcode because they didn't stamp a (C) symbol on all of their chips. You used to have to register copyright for it to be in effect...

dbalsom · 2024-05-22T22:11:43Z

This weekend I will pull the 5150 back out and see if this mandelbrot works on hardware. If MartyPC is correctly emulating things, bugs aside, I would expect the mandelbrot program to hang when it gets stuck in VRAM.

it's still possible I have some bug in handling the trap flag. That is one of the areas I have done the least testing as I cannot turn the trap flag on in my arduino validator.

I guess another good question is - is the trap flag supposed to be on? If not, it's possible we went off the rails earlier and did a POPF from garbage.

ghaerr · 2024-05-22T22:16:38Z

What I find hilarious is that the ruling also said Intel lost their copyright to the microcode because they didn't stamp a (C) symbol on all of their chips.

No wonder lawyers have to be cautious when one can lose IP over not having copyright stamped on just documents surrounding the work, but having to stamp the work of art itself. Now I'm wondering whether the replacement click-to-agree license agreements are valid for associated hardware. I guess I'll have to check my smart toaster for possible non-compliance.

it's still possible I have some bug in handling the trap flag.

IIRC, there's an IBM application note out on TF for the XT vs AT PCs when the trap handling changed, as well as for exactly what happens when PUSH SP is executed.

dbalsom · 2024-05-22T22:19:12Z

IIRC, there's an IBM application note out on TF for the XT vs AT PCs when the trap handling changed, as well as for exactly what happens when PUSH SP is executed.

Yeah 286 changed quite a few things. Division exceptions are handled differently as well

dbalsom · 2024-06-09T14:30:08Z

fixed in 8d48c74

Vutshi added the bug Something isn't working label May 18, 2024

Vutshi assigned dbalsom May 18, 2024

dbalsom added good first issue Good for newcomers question Further information is requested labels May 18, 2024

Vutshi mentioned this issue May 22, 2024

Games development on ELKS ghaerr/elks#871

Open

dbalsom closed this as completed Jun 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Emulator Crashes #116

Emulator Crashes #116

Vutshi commented May 18, 2024

dbalsom commented May 18, 2024

Vutshi commented May 18, 2024

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024

Vutshi commented May 22, 2024

dbalsom commented May 22, 2024 •

edited

Loading

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024 •

edited

Loading

dbalsom commented May 22, 2024

dbalsom commented May 22, 2024 •

edited

Loading

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024 •

edited

Loading

dbalsom commented May 22, 2024 •

edited

Loading

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024

dbalsom commented Jun 9, 2024

Emulator Crashes #116

Emulator Crashes #116

Comments

Vutshi commented May 18, 2024

dbalsom commented May 18, 2024

Vutshi commented May 18, 2024

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024

Vutshi commented May 22, 2024

dbalsom commented May 22, 2024 • edited Loading

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024 • edited Loading

dbalsom commented May 22, 2024

dbalsom commented May 22, 2024 • edited Loading

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024 • edited Loading

dbalsom commented May 22, 2024 • edited Loading

ghaerr commented May 22, 2024

dbalsom commented May 22, 2024

dbalsom commented Jun 9, 2024

dbalsom commented May 22, 2024 •

edited

Loading

ghaerr commented May 22, 2024 •

edited

Loading

dbalsom commented May 22, 2024 •

edited

Loading

dbalsom commented May 22, 2024 •

edited

Loading

dbalsom commented May 22, 2024 •

edited

Loading