Fix: IDE: restore shared writes to ATA Command Block registers #1541
1b90d2e changed writes to ports 0x1F1-0x1F5 (Features, Sector Count, LBA Low/Mid/High) from writing both master and slave to writing only current_interface. The commit message marked it experimental: 'It is not clear why these 6 functions simultaneously modify master and slave attributes... experimental until there is some explanation.' The explanation: these are channel registers, not device registers. There is one physical register file on the cable; both drives latch the same values. Only the Device register's DEV bit (port 0x1F6 bit 4) selects which drive responds to a subsequent COMMAND write — the other registers are shared by definition. Windows 95/98's ESDI_506.PDR depends on this when using LBA mode (disks above ~528MB). It writes LBA registers, switches DEV, issues a command — and the newly-selected drive sees stale registers from before the writes. The IDE read never completes; the OS spins at the splash screen waiting for an IRQ that never fires. Found via JS-only bisect against a 1GB Windows 95 image: 3c944a0 boots, 1b90d2e hangs deterministically. Fixes copy#1462, fixes copy#1502.
I haven't tested this, but this argument (and its implications) makes sense and also explains the old state of the code (before 1b90d2e), nice catch! I wish I had known this back then; my mistake was to assume that a device always gets selected before being otherwise addressed through these 6 registers.
Good analysis, thanks!
This took me a while to figure out, so bear with me! I promise there's a reason this is so long.
What
Restores the dual master+slave writes for ATA Command Block registers (Features, Sector Count, LBA Low/Mid/High; ports 0x1F1-0x1F5) that 1b90d2e changed to write only current_interface.
Why
The 1b90d2e commit message asks: "It is not clear why these 6 functions simultaneously modify master and slave attributes instead of just modifying this.current_interface (as is the case everywhere else in the code). This patch changes that to use this.current_interface, experimental until there is some explanation."
The explanation is the hardware: the Command Block registers are channel registers, not device registers. There's one physical register file on the IDE cable interface; both drives are wired to the same I/O port lines and latch the same value on a write. Only the Device register's DEV bit (port 0x1F6 bit 4) selects which drive responds to a subsequent COMMAND write (port 0x1F7). The pre-1b90d2e7 code was modeling this correctly.
What it broke
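As a reference point for the breakage, the channel-register behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the v86 source: the Device and Channel classes, their field names, and the single writeSectorCount handler are all hypothetical stand-ins for the real handlers on ports 0x1F1-0x1F5.

```javascript
// Hypothetical model (not the v86 code): one IDE channel with two drives.
class Device {
  constructor(name) {
    this.name = name;
    this.sectorCount = 0;
  }
}

class Channel {
  constructor() {
    this.master = new Device("master");
    this.slave = new Device("slave");
    this.current = this.master;
  }

  // Stand-in for a write to one of ports 0x1F1-0x1F5:
  // both drives latch the same value, whichever one is selected.
  writeSectorCount(value) {
    this.master.sectorCount = value & 0xFF;
    this.slave.sectorCount = value & 0xFF;
  }

  // Port 0x1F6: bit 4 (DEV) only decides who answers the next command.
  writeDevice(value) {
    this.current = (value & 0x10) ? this.slave : this.master;
  }
}

const channel = new Channel();
channel.writeSectorCount(8); // written while the master is selected
channel.writeDevice(0x10);   // slave selected afterwards
console.log(channel.current.name, channel.current.sectorCount); // slave 8
```

In this model the slave already holds the value written before it was selected, which is the property the single-interface change removed.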
Windows 95/98's protected-mode IDE driver (ESDI_506.PDR), when the disk is large enough to require LBA mode (~528MB+), does this during slave detection:
1. Writes the LBA registers.
2. Switches DEV through port 0x1F6.
3. Issues a command.
With single-interface writes, step 3 sees the slave's stale registers from before step 1. The 16-bit FIFO shift (reg = (reg << 8 | data) & 0xFFFF) makes the divergence worse on each write. The read never completes → IRQ never fires → boot hangs at the splash screen with the CPU spinning indefinitely.
This is the root cause of copy#1462 and copy#1502.
The 535MiB threshold matches the CHS→LBA transition. Images under that size don't hit this code path, which is why the copy.sh demo (450MB) and smaller test images haven't shown it.
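The stale-register sequence (write registers, switch DEV, write again) can be reproduced in isolation with the shift expression quoted above. The sketch below is an assumption-laden toy, not the emulator code: shiftIn, run, and the regs object are hypothetical names, and the byte values are arbitrary.

```javascript
// Hypothetical helper mirroring the quoted 16-bit shift.
function shiftIn(reg, data) {
  return ((reg << 8) | data) & 0xFFFF;
}

// Replays the same write sequence with shared or single-interface writes.
function run(shared) {
  const regs = { master: 0, slave: 0 };
  let selected = "master";

  const write = (data) => {
    if (shared) {
      regs.master = shiftIn(regs.master, data);
      regs.slave = shiftIn(regs.slave, data);
    } else {
      regs[selected] = shiftIn(regs[selected], data);
    }
  };

  write(0x34);        // register write while the master is selected
  selected = "slave"; // DEV switched through the Device register
  write(0x56);        // next register write lands after the switch
  return regs;
}

const hex = (r) => `master=${r.master.toString(16)} slave=${r.slave.toString(16)}`;
console.log(hex(run(true)));  // master=3456 slave=3456
console.log(hex(run(false))); // master=34 slave=56
```

With shared writes both copies carry the full write history; with single-interface writes each copy only sees the bytes written while it happened to be selected, so the two drift further apart with every write.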
Found via
JS-only bisect against a 1GB Windows 95 OSR2 image (Closure-rebuild libv86.js at each commit, freeze the wasm to sidestep toolchain drift). 3c944a02 boots to desktop in ~30s; 1b90d2e7 hangs at the splash screen across 3/3 attempts.
Not changed
Port 0x1F6 (Device register) writes still go to current_interface only. That handler does the drive-select switch first, so it's already pointing at the right interface when it writes device_reg, is_lba, head. (Those last two arguably should also be shared, but the bisect shows they're not load-bearing for this fix.)
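For readers unfamiliar with the Device register layout, the fields that handler derives from a single write can be sketched as below. The bit positions follow the ATA register definition (bit 4 DEV, bit 6 LBA, bits 3:0 head or LBA bits 27:24); the function name and the returned field names are hypothetical and do not correspond to the actual handler.

```javascript
// Hypothetical decode of one Device register write (port 0x1F6).
function decodeDeviceRegister(value) {
  return {
    slaveSelected: (value & 0x10) !== 0, // bit 4: DEV
    isLba: (value & 0x40) !== 0,         // bit 6: LBA addressing
    head: value & 0x0F,                  // bits 3:0: head, or LBA bits 27:24
  };
}

// 0xF3 = 1111 0011b: DEV set, LBA set, low nibble 3.
const decoded = decodeDeviceRegister(0xF3);
console.log(decoded.slaveSelected, decoded.isLba, decoded.head); // true true 3
```

Because DEV is decoded from the same byte, doing the select first and then storing the remaining fields on the newly selected interface matches the ordering described above.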