-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emulator Crashes #116
Comments
thanks. does it crash with the 0.2.0 release build? |
Hi @dbalsom,
Yes, it does. |
Looks to be a bug with 0xFE - I execute 3 cycles but only provide 2 microcode addresses. The instruction in question is an undefined form, which is a bit unusual. My 8088 tests don't exercise the undefined 0xFE forms it so they didn't catch it. This raises another point, that doing the bounds checking for the microcode line number slice every time I call cycles_i() is expensive, and the microcode logging was never fully accurate, so I may just tear that whole feature out. |
Hello @dbalsom,
May I ask exactly what is the instruction sequence being referred to? Are we talking about an undefined 8086 instruction starting with 0xFE, where the following byte modRMReg bits indicate (byte operand size and 2 <= modRMReg <= 6) or modRMReg == 7? (That is 0xFE-based CALL/JMP/PUSH with byte operand or regmod == 7). @Vutshi, I am wondering whether this was reproduced using unmodified NASM output, or whether this is the CS segment having been scribbled upon and creating an invalid opcode... ? Thank you! |
This is minimally modified NASM code for DOS which barely runs in ELKS. I guess it writes over some parts of ELKS and creates invalid opcode. Best |
Sorry, the opcode is actually FF. The same bug is present in both FE and FF forms, and the instruction byte sequence is FF FF. The bug is present in both FE and FF forms, probably due to copy-paste. In any case both bugs involve the invalid form of CALL FAR with a register operand (mod=3,r/m=7) However REG is 7 which should be PUSH so I am not sure how I am winding up there. The disassembly viewer can be inaccurate if there is self modifying code, but IRET should clear the queue. I guess I have more debugging to do. This value is actually from video memory at B800:000A. Here is where things go off the rails: For some reason the trap flag is on and the trap handler iret returns us to a bad segment which wraps into VRAM. |
Thanks for the clarification and screenshot. Since the IP is in video RAM that pretty much confirms a major problem. I would guess that since the code seems to be executing IRETs, a stack/data overwrite could be the cause. The application program is running as a first-port under real mode ELKS which doesn't allow setting interrupt vectors, and a hardware timer interrupt during an application interrupt handling could cause this sort of thing, as ELKS may have switched stacks, etc. What I do find interesting is that special handling of 0xFFFF (like 0x00) is probably a good idea for emulators since those two values seem highly probable when a program takes a wrong turn... I've made a note to do the same for some of my emulators. |
I do have an 'off rails detection' option, that triggers on 5+ repeated opcodes, which was very useful for development as I was off the rails often. It does have some false positives in rare code paths, but you could bump that up to 10 and probably be fine. |
Do you suppose the emulator ran through the 0xFFFF's then died on 0xFFDD? In any case, I've noted that invalid 0xFF-starting instruction sequences might want to trap for my emulators.
That's a good idea :) |
yes FF FF itself is valid, PUSH DI. |
you can either have a list of opcodes to trigger on repeat (0x00 and 0xFF would catch a lot) or have exceptions for things like NOP, as NOP can be repeated a lot for timing purposes. Also probably avoid IN and OUT as IO was used for timing things, like adlib ports. I have encountered a few titles (demos and such) that count on 0x00 0x00 being a fairly innocuous ADD instruction so they can roll through blocks of initialized but otherwise empty memory... |
Very good ideas. I'll take a look at how this might be added to my Blink16 and 86Sim projects. You've done a great job on MartyPC especially being cycle-exact, the GUI looks super also :) |
Thanks, I appreciate the kind words. As an emulator author I don't suppose you've seen my 8088 tests? This issue is good motivation to add tests for the illegal FE and FF forms :D |
Wow, you've really gone to town on that!! Thank you, I'l take a closer look at them.
Yes, my emulators (but not disassembler) currently ignore many invalid opcodes but now I realize that misses an opportunity to flag to the application developer something they likely don't know about. Of course, a runtime flag to allow the program to keep running with undefined behavior anyways is probably also needed, as that's what real hardware does in real mode! |
Yeah, an 8088 is unstoppable, you can literally feed it random bytes (as long as you filter out HLT) and it will keep going... I am working on a new test set for the NEC V20. It is not quite as resilient as the 8088 - a few undefined forms will make it unhappy and it enters a halt-like state (with no HALT bus condition) |
Interesting on the NEC V20. Is that chip the same internal Intel-licensed silicon design, or did NEC come up with a new implementation and only implement the chip ISA? |
NEC originally cloned the 8088 and 8086, got sued by Intel over it, and settled the lawsuit by signing an agreement to become an official second-source provider. Everything was fine until they got restless and extended the 8088/86 with 186 opcodes, new NEC-specific opcodes and 8080 support with the V20/V30 and got sued a second time. They won that lawsuit and would go on to make a bunch more V-series CPUs, none of which were quite as notable. One of the reasons they won the second lawsuit is they proved that they had a clean-room approach to writing the V20 microcode and didn't simply copy Intel's. What I find hilarious is that the ruling also said Intel lost their copyright to the microcode because they didn't stamp a (C) symbol on all of their chips. You used to have to register copyright for it to be in effect... |
This weekend I will pull the 5150 back out and see if this mandelbrot works on hardware. If MartyPC is correctly emulating things, bugs aside, I would expect the mandelbrot program to hang when it gets stuck in VRAM. it's still possible I have some bug in handling the trap flag. That is one of the areas I have done the least testing as I cannot turn the trap flag on in my arduino validator. I guess another good question is - is the trap flag supposed to be on? If not, it's possible we went off the rails earlier and did a POPF from garbage. |
No wonder lawyers have to be cautious when one can lose IP over not having copyright stamped on just documents surrounding the work, but having to stamp the work of art itself. Now I'm wondering whether the replacement click-to-agree license agreements are valid for associated hardware. I guess I'll have to check my smart toaster for possible non-compliance.
IIRC, there's an IBM application note out on TF for the XT vs AT PCs when the trap handling changed, as well as for exactly what happens when |
Yeah 286 changed quite a few things. Division exceptions are handled differently as well |
fixed in 8d48c74 |
Describe the bug
I run a program in ELKS and martypc crashes with the following output:
To Reproduce
Steps to reproduce the behavior:
mand_el4
Expected behavior
Do not crash.
Environment (please complete the following information):
Build info
Disk image
fd360-fat-ELKS.img.zip
The text was updated successfully, but these errors were encountered: