
Interrupts are not cycle accurate #651

Closed
tomcw opened this issue May 29, 2019 · 18 comments

@tomcw
Contributor

commented May 29, 2019

Email from Arnaud:


Hello Tom,

I may have found a problem with AppleWin.
This time, I remain cautious!
I'm attaching a zip containing test.dsk and the sources so you can test it yourself.

Once again, I get the expected result with AiPC, but we saw last time that this proves nothing ;)

So here is the starting idea behind this program:

  • I wait until the beginning of the display and immediately set up a timer interrupt whose period is one entire refresh cycle (VBL+DISPLAY), depending on the refresh rate (50Hz or 60Hz).

Here is a snippet of the code (from main.a):

; waiting next DISPLAY to init INT
            LDA bMachine      
-           CMP VERTBLANK         
            BMI -                ; wait end of current display

            LDA bMachine                                                                   
-           CMP VERTBLANK         
            BPL -                ; wait end of current VBL

            ; Here beginning of DISPLAY
            LDA bRefresh
            BNE .NTSC           
.PAL
            ; TIMER PAL
            ; => $4F38 (20280) = 65*(262+50)
            LDA #$38
            LDY #04
            STA (OUT2),Y        ; STA $Cx04         ; T1C-Lower
            LDA #$4F
            INY
            STA (OUT2),Y        ; STA $Cx05         ; T1C-High
            JMP +

.NTSC
            ; TIMER NTSC
            ; => $4286 (17030) = 65*262
            LDA #$86
            LDY #04
            STA (OUT2),Y        ; STA $Cx04         ; T1C-Lower
            LDA #$42
            INY
            STA (OUT2),Y        ; STA $Cx05         ; T1C-High

+           ; (continued)
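
As a quick cross-check of the two timer constants in the snippet above, here is a minimal C++ sketch that derives them from the cycle counts given in the comments (65 cycles per line, 262 lines at 60Hz, 262+50 lines at 50Hz) and splits each into the low and high bytes written to $Cx04 and $Cx05. The names `cyclesPerFrame`, `splitTimerValue` and `TimerBytes` are hypothetical helpers for illustration only; they are not part of main.a or AppleWin.

```cpp
#include <cassert>
#include <cstdint>

// Cycle counts taken from the comments in the assembly snippet.
constexpr unsigned kCyclesPerLine = 65;        // CPU cycles per scan line
constexpr unsigned kLinesNtsc     = 262;       // scan lines per 60Hz frame
constexpr unsigned kLinesPal      = 262 + 50;  // scan lines per 50Hz frame

// Hypothetical container for the two bytes stored in T1C-Lower and T1C-High.
struct TimerBytes {
    std::uint8_t low;
    std::uint8_t high;
};

// Total CPU cycles in one full refresh (VBL + DISPLAY).
constexpr unsigned cyclesPerFrame(unsigned lines) {
    return kCyclesPerLine * lines;
}

// Split a 16-bit timer value into the bytes written to $Cx04 and $Cx05.
inline TimerBytes splitTimerValue(unsigned cycles) {
    assert(cycles <= 0xFFFF);  // the 6522 timer is 16 bits wide
    return TimerBytes{static_cast<std::uint8_t>(cycles & 0xFF),
                      static_cast<std::uint8_t>(cycles >> 8)};
}
```

For 60Hz this gives 17030 ($4286), so low = $86 and high = $42; for 50Hz it gives 20280 ($4F38), so low = $38 and high = $4F. Both match the immediates loaded in the snippet.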

What is expected with this code is that each interrupt will occur systematically at the beginning of the display (I know it will not be EXACTLY on the first cycle of the display, but that is not important here).

Next, just before doing a JSR PLAY (to play one tick with the PT3 player), I switch to page 2 (filled with $20), and just after the JSR PLAY I switch back to page 1 (filled with $A0).
With this trick, I should be able to see visually the time used by my PLAY routine...

As the interrupt routine containing the JSR PLAY is, theoretically, always executed at the beginning of the DISPLAY (on the same cycle), I should see several lines of "$20": not the same number of lines for each tick, but with a fixed start.
Sorry, I have a little trouble explaining the operation clearly ;)

In short, the expected result is OK with AiPC (at 50 and 60 Hz), but not with AppleWin, where the display of the visual trick is not stable.

I tested different values for the interrupt delay but was never able to stabilize the display with AppleWin. The display moved forward or backward but was never stable...
I do not know where the problem comes from...

Test the disk (test.dsk in DSK directory) with AiPC and I think you will immediately understand the basic idea.

note: the main part of the code is in "main.a" ;)
(boot.a, floadm.a and ppt3.a are - I think - not concerned by the issue)

In any case, it demonstrates the need to have several powerful emulators covering all needs!
And I realize that it is complicated to do something concrete without having a real machine...

Arnaud

@tomcw tomcw added the bug label May 29, 2019

@tomcw

Contributor Author

commented May 29, 2019

Yes, this is an issue with AppleWin.

Due to how I have implemented interrupt support in the 65(C)02 emulation, I only check for interrupts every 128 cycles. I took this approach for speed, so that I wasn't checking for interrupts after every opcode.

NB. All I/O accesses (eg. VERTBLANK=$C019) are cycle accurate - it's only interrupts (eg. Mockingboard TIMER) that have this issue.

I have a comment in the AppleWin code suggesting to change to cycle accurate when running at normal speed, and to only use this 128 cycle periodic check when running in "Full Speed" mode. I will consider making this change now.
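
To illustrate why a periodic check makes the test display unstable, here is a minimal sketch of a simplified model. It is an assumption for illustration, not AppleWin's actual code: interrupt sources are polled only when the cycle counter reaches a multiple of the check interval, and opcode granularity is ignored. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption): interrupt sources are polled only when the
// cycle counter reaches a multiple of checkInterval.
// Returns the cycle at which an underflow at underflowCycle is first noticed.
inline std::uint64_t detectionCycle(std::uint64_t underflowCycle,
                                    std::uint64_t checkInterval) {
    assert(checkInterval > 0);
    const std::uint64_t remainder = underflowCycle % checkInterval;
    return remainder == 0 ? underflowCycle
                          : underflowCycle + (checkInterval - remainder);
}

// Cycles between the timer underflow and the moment the emulator notices it.
inline std::uint64_t detectionLatency(std::uint64_t underflowCycle,
                                      std::uint64_t checkInterval) {
    return detectionCycle(underflowCycle, checkInterval) - underflowCycle;
}
```

In this model, with a 128-cycle interval and the 17030-cycle NTSC period, the first underflow is noticed 122 cycles late and the second 116 cycles late, so the interrupt lands at a different point of the display on each frame. With an interval of 1 the latency is always 0.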

@tomcw

Contributor Author

commented May 30, 2019

See also #612.

@tomcw

Contributor Author

commented May 30, 2019

Triage: it turned out to be more than just checking the 6522 TIMER underflow every cycle. In free-running mode (on underflow), the COUNTER was being reloaded with the LATCH value, but the reload wasn't accounting for the cycles by which the COUNTER had already underflowed, ie:

	pMB->sy6522.TIMER1_COUNTER.w = pMB->sy6522.TIMER1_LATCH.w;

Should be this:

	pMB->sy6522.TIMER1_COUNTER.w += pMB->sy6522.TIMER1_LATCH.w;

In fact, leaving the 128-cycle check in (together with the above fix) does produce a TIMER interrupt in sync with the VBL, except that the first 1-128 cycles will show as black (not white) due to the reduced accuracy in detecting the TIMER interrupt.
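
The difference between the two reload statements can be sketched with a small stand-alone model. This is an illustration under assumptions, not the AppleWin implementation: the `Timer1` struct, the `Reload` enum and both functions are hypothetical, the timer is advanced once per opcode, and an underflow is taken to mean the counter reaching zero or below.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the emulated 6522 timer; names are illustrative.
struct Timer1 {
    std::uint16_t counter;
    std::uint16_t latch;
};

enum class Reload { Assign, Accumulate };

// Advance the timer by one opcode's worth of cycles. Returns true on underflow.
inline bool tick(Timer1& t, unsigned opcodeCycles, Reload mode) {
    assert(opcodeCycles <= t.latch);
    const bool underflow = opcodeCycles >= t.counter;
    // 16-bit wrap-around keeps the overshoot as a "negative" count.
    t.counter = static_cast<std::uint16_t>(t.counter - opcodeCycles);
    if (underflow) {
        if (mode == Reload::Assign) {
            t.counter = t.latch;  // "=": the overshoot cycles are lost
        } else {
            // "+=": the overshoot is carried into the next period
            t.counter = static_cast<std::uint16_t>(t.counter + t.latch);
        }
    }
    return underflow;
}

// Total cycles executed when the n-th underflow is seen, for fixed-length opcodes.
inline unsigned long cyclesUntilUnderflow(unsigned n, std::uint16_t latch,
                                          unsigned opcodeCycles, Reload mode) {
    Timer1 t{latch, latch};
    unsigned long elapsed = 0;
    unsigned seen = 0;
    while (seen < n) {
        elapsed += opcodeCycles;
        if (tick(t, opcodeCycles, mode)) ++seen;
    }
    return elapsed;
}
```

With a latch of 17030 and a stream of 4-cycle opcodes, the "=" variant sees every underflow 17032 cycles after the previous one, so the error grows by 2 cycles per frame. The "+=" variant sees the n-th underflow at the first opcode boundary at or after n*17030, so the error never accumulates.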

@Archange427


commented May 31, 2019

I plan to write a routine that will use 3 "cascading" interrupts, where the delay of the last INT will be only a few cycles (I don't know exactly how many for now, but fewer than 65, because it will have to take place before the end of one line).
The goal is to get exact cycle synchronization at the beginning of EACH display, no matter what you do during the VBL.
For example, for Crazy Cycles I and II, once I had the synchronization I could not afford to lose it, so I had to count all the cycles during VBL and DISPLAY for each refresh.
But with a PT3 player, for example, the execution time is not at all the same from one tick to the next.
And making the player "constant cycle" would be a nightmare.

I confess that I do not understand everything about the fixes that you intend to make to AppleWin (in this thread and in #612), but will AppleWin be accurate with this kind of routine?

@tomcw

Contributor Author

commented May 31, 2019

Thanks for sharing your interrupt strategy, as this makes a clear case for cycle-accurate interrupts.

The fix I plan will support your cascading interrupts and short (<65 cycle) interrupt.

I guess you'll use the 6522's TIMER1 and TIMER2? And both 6522's, to give you 3 interrupt sources?

(I've noticed an AppleWin bug: when both TIMER1 and TIMER2 underflow at the same time, TIMER2 won't trigger an interrupt. I'll fix this too.)

@Archange427


commented May 31, 2019

Actually, I plan to use just one TIMER. When INT1 occurs, inside the IRQ routine, I will redefine the vectors (to point to a new IRQ routine) and set a new delay. Then, when INT2 occurs, I will do the same to go to INT3.
So there is no need for different TIMERs...
For the moment it's just a theoretical idea: I have only just started the code, and I will still have to find the right values for the delays of INT2 and INT3.
And to be totally honest, this idea is largely inspired by what is done on the C64 to obtain "a stable raster"!
Except that, alas, we do not really have the same interrupt capability...

@tomcw

Contributor Author

commented Jun 1, 2019

Implementation note:
Currently 65(C)02 emulation has this macro after every opcode:

#define CYC(a)	 uExecutedCycles += (a)+uExtraCycles; g_nIrqCheckTimeout -= (a)+uExtraCycles;

But since normal speed now calls CheckInterruptSources() after every opcode, the g_nIrqCheckTimeout code is redundant for the normal (ie. non-full-speed) case.

If we still want the full-speed case to only periodically call CheckInterruptSources(), then assuming 3 cycles on average per opcode, just count opcodes and call CheckInterruptSources() after ~40 opcodes.
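
The opcode-counting idea can be sketched as follows. This is only a rough illustration under assumptions: `IrqPollScheduler` and `pollsIssued` are hypothetical names, the default of 40 opcodes comes from the estimate above, and the real AppleWin change may be structured quite differently.

```cpp
#include <cassert>

// Hypothetical scheduler (names are illustrative, not AppleWin's): decides
// after each opcode whether the interrupt sources should be polled.
class IrqPollScheduler {
public:
    IrqPollScheduler(bool fullSpeed, unsigned opcodesPerCheck)
        : fullSpeed_(fullSpeed),
          opcodesPerCheck_(opcodesPerCheck),
          opcodesUntilCheck_(opcodesPerCheck) {
        assert(opcodesPerCheck > 0);
    }

    // Call once per executed opcode; true means "poll the interrupt sources now".
    bool shouldPollAfterOpcode() {
        if (!fullSpeed_) return true;                // normal speed: every opcode
        if (--opcodesUntilCheck_ > 0) return false;  // full speed: skip most opcodes
        opcodesUntilCheck_ = opcodesPerCheck_;
        return true;
    }

private:
    bool fullSpeed_;
    unsigned opcodesPerCheck_;
    unsigned opcodesUntilCheck_;
};

// Number of polls issued while executing `opcodes` opcodes.
inline unsigned pollsIssued(bool fullSpeed, unsigned opcodes,
                            unsigned opcodesPerCheck = 40) {
    IrqPollScheduler scheduler(fullSpeed, opcodesPerCheck);
    unsigned polls = 0;
    for (unsigned i = 0; i < opcodes; ++i) {
        if (scheduler.shouldPollAfterOpcode()) ++polls;
    }
    return polls;
}
```

In this sketch 400 opcodes cause 400 polls at normal speed but only 10 in full-speed mode. The per-opcode cost in full-speed mode is a single decrement and compare, with no cycle arithmetic.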

TODO: measure the impact in full-speed of calling CheckInterruptSources() after every opcode.

@tomcw

Contributor Author

commented Jun 1, 2019

Some quick benchmarking 1.28.5.0 against 3a41061:

Config:

  • Enhanced //e.
  • No cards in slot 4, 5, 7.
  • AMD Phenom II based machine on Win7-64.
  • Video mode = 50% Color (RGB Monitor) (x1 Windowed), no vertical blend

Looking at "Pure CPU MHz (video update)"

  • 1.28.5.0: mean=17.1MHz (ref: #424)
  • 3a41061: mean=16.3MHz

So a 0.8MHz (or ~5%) drop when checking interrupts after every opcode. In fact this will be worse when Mockingboards are installed as there will be checks on the 6522's (but #612 will help).

Looking at "Pure CPU MHz (full-speed)"

  • 1.28.5.0: mean=233.8MHz (ref: #424)
  • 3a41061: mean=223.3MHz

Again a ~5% drop.
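
For reference, the rounded "~5%" figures can be reproduced from the means listed above with a one-line calculation. The helper name `percentChange` is hypothetical and only serves this illustration.

```cpp
#include <cassert>

// Relative change from a baseline mean to a candidate mean, in percent.
// A negative result means the candidate build is slower.
inline double percentChange(double baselineMHz, double candidateMHz) {
    assert(baselineMHz > 0.0);
    return (candidateMHz - baselineMHz) / baselineMHz * 100.0;
}
```

Calling `percentChange(17.1, 16.3)` gives roughly -4.7 for the "video update" case and `percentChange(233.8, 223.3)` gives roughly -4.5 for the "full-speed" case.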

@tomcw

Contributor Author

commented Jun 1, 2019

> TODO: measure the impact in full-speed of calling CheckInterruptSources() after every opcode.

ie. changed the CYC(a) macro, and completely removed all refs to g_nIrqCheckTimeout - fs_irq1.patch.txt

Looking at "Pure CPU MHz (video update)"

  • mean=16.9MHz

Looking at "Pure CPU MHz (full-speed)"

  • mean=206.2MHz

NB. the full-speed number is more important as this mode is actually used; whereas the "video update" number isn't needed, since cycle-accurate emulation is currently capped at 3.9MHz.

@Archange427


commented Jun 1, 2019

So does this mean that checking the interrupt at each opcode is OK and does not impact the execution speed too much?
And what about checking interrupt at each cycle like the 6502 does ? ;)

@tomcw

Contributor Author

commented Jun 2, 2019

> So does this mean that checking the interrupt at each opcode is OK [?]

Yes, it looks OK, but I want to run the complete set of regression tests to be sure.

> and does not impact the execution speed too much?

Right: at normal speed (1.0 - 3.9MHz) you won't notice this extra overhead, and at full-speed there's a small (~5%) speed hit. (See the AppleWin.CHM's appendix for details on full-speed mode.)

> And what about checking interrupt at each cycle like the 6502 does ? ;)

Actually, opcodes are atomic, so the interrupt only occurs after the opcode has completed.

@Archange427


commented Jun 2, 2019

I was misunderstood. The Interrupt only occurs after the current opcode has completed. But...
the cycles elapsed at the moment when the 6502 Interrupt Sequence starts are different according to the opcode executed when the interrupt takes place.

@tomcw

Contributor Author

commented Jun 2, 2019

Some quick benchmarking 1.28.5.0 against 1f2dc6e:

The patch (for full-speed mode) only does interrupt checking every 40 opcodes.

Config (same as before):

  • Enhanced //e.
  • No cards in slot 4, 5, 7.
  • AMD Phenom II based machine on Win7-64.
  • Video mode = 50% Color (RGB Monitor) (x1 Windowed), no vertical blend

Looking at "Pure CPU MHz (video update)"

  • 1.28.5.0: mean=17.3MHz (regenerated results today)
  • 1f2dc6e: mean=16.9MHz

So a tiny bit slower.

Looking at "Pure CPU MHz (full-speed)"

  • 1.28.5.0: mean=238.9MHz (regenerated results today)
  • 1f2dc6e: mean=251.1MHz

So a ~5% improvement! (Probably because it is more efficient to count opcodes than to count cycles after every opcode.)

@tomcw

Contributor Author

commented Jun 2, 2019

@Archange427 -

> But... the cycles elapsed at the moment when the 6502 Interrupt Sequence starts are different according to the opcode executed when the interrupt takes place.

Assuming the 6522 TIMER1 count underflows mid-way through the opcode: then when the opcode completes, the TIMER1 count gets reset, decremented by a few cycles and the interrupt occurs.

This is now correctly accounted for in 941ef46.

@Archange427


commented Jun 2, 2019

OK, it is time to clarify my idea a little more:
What if I manage to generate an interrupt during a sequence of NOPs?
Normally, with a "real" 6502, by the time I get to the first instruction of my interrupt routine (the one pointed to by $FFFE/$FFFF), the following will have elapsed:

  • 2 OR 3 cycles depending on where exactly the interruption occurred (during or between two NOPs)
  • 7 cycles from the 6502 interrupt sequence (always).
    -> so at that exact moment 9 OR 10 cycles will have elapsed. No more no less.
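
The arithmetic in the list above can be written down as a tiny sketch. It simply takes the figures from this comment at face value (a wait of 2 or 3 cycles, then a fixed 7-cycle interrupt sequence); the function names are hypothetical and nothing here is measured on hardware.

```cpp
#include <cassert>

// Figure from the comment above: the 6502 interrupt sequence always takes 7 cycles.
constexpr unsigned kIrqSequenceCycles = 7;

// Cycles elapsed when the first handler instruction starts, given the wait
// (in cycles) before the interrupt sequence begins.
inline unsigned cyclesAtHandlerEntry(unsigned waitCycles) {
    return waitCycles + kIrqSequenceCycles;
}

// Spread between the earliest and latest handler entry, i.e. the jitter
// that the handler has to absorb.
inline unsigned entryJitter(unsigned minWaitCycles, unsigned maxWaitCycles) {
    assert(minWaitCycles <= maxWaitCycles);
    return cyclesAtHandlerEntry(maxWaitCycles) - cyclesAtHandlerEntry(minWaitCycles);
}
```

With waits of 2 and 3 cycles the handler is entered after 9 or 10 cycles, so under these figures the remaining uncertainty is a single cycle.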

Will AppleWin be accurate with this behavior?

@tomcw

Contributor Author

commented Jun 2, 2019

Yes, AppleWin will behave as you describe... and I've built a new AppleWin 1.28.6.0 here.

@Archange427


commented Jun 2, 2019

Thanks Tom, I will test that ASAP!

@tomcw

Contributor Author

commented Jun 3, 2019

Reply from @Archange427 via email:

> I also tested the new version and it works perfectly with the test disk I sent you (but I guess you already know;)

Thanks for confirming. (Closing)

@tomcw tomcw closed this Jun 3, 2019

@tomcw tomcw added this to the 1.28.6 milestone Jun 15, 2019
