6522 TIMER1 counter write and/or read is 1 cycle out #701

Open
tomcw opened this issue Oct 13, 2019 · 15 comments
Labels: bug

tomcw commented Oct 13, 2019

Attached is a small bit of code that does:

  • Set TIMER1 for polled mode (i.e. not interrupt-driven)
  • Set COUNTER=0x100
  • Poll 6522.IFR until bit6 (TIMER1) is set. Count number of loops in X
  • On underflow, read COUNTER to Y:A
  • Then check: X=0x14, Y=0xFF, A=0xED
  • If all values match then print "OK" at $400, otherwise BRK

This works "OK" on AppleWin.
But on a real Apple II, the value for A is 0xEE.

So perhaps the 6522 counter needs 1 extra cycle to start after the write to $C405 (TIMER1H_COUNTER)?
NB. Both LDA abs16 and STA abs16 read/write the data on the 4th cycle.

In fact, this is the code from this AppleWin-Test commit: AppleWin/AppleWin-Test@549e0ff

tomcw added the bug label Oct 13, 2019

tomcw commented Oct 13, 2019

> So perhaps 1 extra cycle to start the 6522 counter after writing to $C405 / TIMER1H_COUNTER?

No. From the Rockwell 6522 timing diagrams (Fig. 15 and Fig. 16), countdown starts immediately after the T1C_H write.

So perhaps when reading T1C_L (i.e. the A register above), the 6502 reads the value before the 6522 has decremented T1C_L on that cycle?

tomcw commented Oct 13, 2019

Another thought:

LDA #4
STA $C404  ; T1C_L=4
LDA #0
STA $C405  ; T1C_H=0

AppleWin does the I/O on the STA's 1st cycle, so after executing STA $C405 it decrements the counter by 4 (to 0) and then asserts the IRQ.

But on a real 6502, T1C will be 0x0004 after the STA $C405 completes, so a 2-, 3- or 4-cycle opcode could still execute before T1C underflows and the IRQ is asserted. I should check this.

Archange427 commented Oct 14, 2019

My two cents (and that's not worth more ;):
This is probably a simplistic view of how an emulator like AppleWin works, but why not "just" add a table giving, for each instruction, the number of cycles to add to reach the cycle on which the real read/write occurs?
For LDA/STA abs this would be +3, and for LDA (ind) +5
(relative to cycle 1, which AppleWin uses).

Would that really inflate the execution time?

There are several issues/enhancements in progress that are related to this (#699, #665, this one, and probably others to come; see my "v2" test program).
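
A minimal Python sketch of the kind of table suggested above (not AppleWin's actual C++ code; `ACCESS_OFFSET` and `access_cycle` are hypothetical names). It only lists 4-cycle absolute-mode opcodes, which read or write their operand on the 4th cycle; other addressing modes, such as the LDA (ind) case mentioned above, would need their own entries. An opcode missing from the table falls back to offset 0, i.e. the current cycle-1 behaviour.

```python
# Hypothetical per-opcode table: how many cycles after an opcode's first cycle
# its data read/write actually happens on the bus.
ACCESS_OFFSET = {
    0x8C: 3,  # STY abs: 4 cycles, write on cycle 4
    0x8D: 3,  # STA abs: 4 cycles, write on cycle 4
    0xAD: 3,  # LDA abs: 4 cycles, read on cycle 4
    0xED: 3,  # SBC abs: 4 cycles, read on cycle 4
}


def access_cycle(opcode_start_cycle: int, opcode: int) -> int:
    """Cycle on which the opcode touches its operand address.

    Opcodes missing from the table fall back to offset 0, i.e. the
    opcode's first cycle (the current cycle-1 behaviour).
    """
    return opcode_start_cycle + ACCESS_OFFSET.get(opcode, 0)


# An STA abs starting on cycle 1, followed by an LDA abs starting on cycle 5.
sta_write = access_cycle(1, 0x8D)
lda_read = access_cycle(5, 0xAD)
print(sta_write, lda_read)  # 4 8
```

A lookup like this costs one dictionary (or array) access per opcode, so the main cost is in filling in the table correctly for every addressing mode.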

tomcw commented Oct 14, 2019

Simplified repro:

LDA #$FF
STA $C404 ; T1C_L
STA $C405 ; T1C_H
LDA $C404 ; T1C_L
BRK

300:A9 FF 8D 4 C4 8D 5 C4 AD 4 C4 0

Results:

  • AppleWin: A=FB
  • real unenhanced A//e (PAL): A=FC

tomcw commented Oct 14, 2019

AppleWin:

  1. STA: addr_l (0x05)
  2. STA: addr_h (0xC4)
  3. STA: 0xC405 on address bus / T1C=FFFF
  4. STA: 0xFF on data bus / End of opcode: T1C=FFFF-4=FFFB
  5. LDA: addr_l (0x04)
  6. LDA: addr_h (0xC4)
  7. LDA: 0xC404 on address bus
  8. LDA: 0xFB on data bus / End of opcode: T1C=FFFB-4=FFF7

Real hardware:

  1. STA: addr_l (0x05)
  2. STA: addr_h (0xC4)
  3. STA: 0xC405 on address bus
  4. STA: 0xFF on data bus
  5. LDA: addr_l (0x04) / T1C=FFFF
  6. LDA: addr_h (0xC4) / T1C=FFFE
  7. LDA: 0xC404 on address bus / T1C=FFFD
  8. LDA: 0xFC on data bus / T1C=FFFC
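
The two traces can be reproduced with a small Python model. This is a sketch built only on the assumptions in the traces themselves, not AppleWin's code, and the function names are hypothetical. In the AppleWin model the write loads T1C, the whole opcode's cycles are then subtracted, and the following LDA samples T1C before its own cycles are counted. In the hardware model T1C holds $FFFF on the LDA's first cycle, decrements once per cycle, and is sampled on the LDA's 4th cycle.

```python
def applewin_read(latch: int = 0xFFFF, sta_cycles: int = 4) -> int:
    """Low byte of T1C read by LDA $C404 under the AppleWin trace."""
    t1c = latch                          # STA $C405 copies the latch into T1C
    t1c = (t1c - sta_cycles) & 0xFFFF    # end of STA: subtract its 4 cycles
    return t1c & 0xFF                    # LDA does its I/O before its cycles elapse


def hardware_read(latch: int = 0xFFFF, read_cycle: int = 4) -> int:
    """Low byte of T1C read by LDA $C404 under the real-hardware trace."""
    t1c = latch                          # T1C = $FFFF on the LDA's 1st cycle
    for _ in range(read_cycle - 1):      # decrement once per elapsed cycle
        t1c = (t1c - 1) & 0xFFFF
    return t1c & 0xFF                    # sampled on the LDA's 4th cycle


print(hex(applewin_read()), hex(hardware_read()))  # 0xfb 0xfc
```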

Archange427 commented Oct 14, 2019

REAL A2:

STA: addr_l (0x05)          
STA: addr_h (0xC4)                      
STA: 0xC405 on address bus              
STA: 0xFF on data bus / End of opcode:  / T1C = FFFF <- effective write 

LDA: addr_l (0x04)                      / T1C = FFFF <- still the same $FFFF
LDA: addr_h (0xC4)                      / T1C = FFFE
LDA: 0xC404 on address bus              / T1C = FFFD
LDA: 0xFC on data bus                   / T1C = FFFC <= effective read

APPLEWIN:

STA: addr_l (0x05)                      / T1C = FFFF <- effective write
STA: addr_h (0xC4)                      / T1C = FFFE
STA: 0xC405 on address bus              / T1C = FFFD
STA: 0xFF on data bus / End of opcode:  / T1C = FFFC

LDA: addr_l (0x04)                      / T1C = FFFB <== effective read
LDA: addr_h (0xC4)                      / T1C = FFFA
LDA: 0xC404 on address bus              / T1C = FFF9
LDA: 0xFx on data bus                   / T1C = FFF8 

tomcw commented Oct 14, 2019

Hi Arnaud - yes, AppleWin vs real h/w is a little different!

Do you think this is related to the issue(s) with Mad Effect demo? (#699)

> why not "just" add a table that would give the number of cycles to add for each instruction to get when the real write/read cycle occurs?

Before I start looking for a solution, I want to understand the problem space first.

Archange427 commented Oct 14, 2019

Hi Tom,

Since I needed to add 6 cycles (which is a lot, given that AppleWin is now accurate for page and mode changes), I think AppleWin does indeed "gain" cycles here compared to real hardware (at least partly).
It's an intuition, at least!

But in my T1C count above, there are two possible explanations for this one-cycle difference (a real A2 returns $FC):

  • the counter does not start immediately after the STA (so T1C is still $FFFF on the LDA's first cycle), as I wrote above;
  • the counter starts immediately (so T1C = $FFFE on the LDA's first cycle), but the read returns T1C+1 (i.e. the value from the previous cycle).

We need another test to tell these two possibilities apart.
Both seem logical to me, each in its own way ;)
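
Both explanations predict the same $FC for this repro, which is why a different test is needed. A small Python sketch of the two hypotheses (function names are hypothetical; each assumes T1C decrements exactly once per cycle and that the LDA reads on its 4th cycle):

```python
def t1c_on_lda_cycles(first_cycle_value: int) -> list[int]:
    """T1C on the LDA's cycles 1..4, decrementing once per cycle."""
    return [(first_cycle_value - i) & 0xFFFF for i in range(4)]


def hypothesis_delayed_start() -> int:
    # Counter only starts after the STA: T1C = $FFFF on the LDA's 1st cycle,
    # and the read on cycle 4 returns the current value.
    return t1c_on_lda_cycles(0xFFFF)[3] & 0xFF


def hypothesis_stale_read() -> int:
    # Counter starts immediately: T1C = $FFFE on the LDA's 1st cycle,
    # but the read returns the previous cycle's value (T1C + 1).
    return (t1c_on_lda_cycles(0xFFFE)[3] + 1) & 0xFF


print(hex(hypothesis_delayed_start()), hex(hypothesis_stale_read()))  # 0xfc 0xfc
```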

tomcw commented Oct 15, 2019

Hi Arnaud,
In your v2 main.a code, you have this comment:

            ; define DELAY for INT1
            ; PAL delay = 65*(192+70+50) = 20280
            ; -2 (6522 takes 2 cycles to generate INT)
            ; = 20278 = $4F36

Why does the 6522 "take 2 cycles to generate INT"?
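
The arithmetic in that comment can be checked quickly. This Python sketch only reproduces the comment's numbers (the PAL frame as 192+70+50 lines of 65 cycles, as written there) and the split of the result into the two bytes written to T1C_L/T1C_H; it makes no claim about why the 2-cycle adjustment is needed.

```python
# PAL Apple II frame, using the line counts from the main.a comment:
# 65 cycles per line x (192 + 70 + 50) lines.
CYCLES_PER_LINE = 65
frame_cycles = CYCLES_PER_LINE * (192 + 70 + 50)

# Subtract the 2 cycles the comment attributes to the 6522 generating the INT.
delay = frame_cycles - 2

t1c_l = delay & 0xFF         # low byte, written to T1C_L
t1c_h = (delay >> 8) & 0xFF  # high byte, written to T1C_H

print(frame_cycles, delay, hex(delay))  # 20280 20278 0x4f36
print(hex(t1c_l), hex(t1c_h))           # 0x36 0x4f
```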

tomcw commented Oct 15, 2019

Is this just the N+2 cycles, as discussed in #652 ?

Archange427 commented Oct 15, 2019

> Is this just the N+2 cycles, as discussed in #652 ?

Yes that's it!
I know that my comments are not always very precise or very technical ;)

tomcw commented Oct 19, 2019

Here's where I am currently:
(I'm using your v2 main.a code to debug this.)

TL;DR: You are doing a 3-way sync with the 6502, video-scanner & 6522!

In main.a, line 170 ("ici synchro précise => dernière ligne VBL cycle 0", i.e. "precise sync here => last VBL line, cycle 0"), you have synced the 6502 & video-scanner, then you write T1C_L/H with 0x4F36... getting all 3 in sync.

AppleWin currently has a bug where it does this for the 4-cycle STA $C405 (a STY in your case):

  • STA opcode: set T1L_H, and copy T1L to T1C
  • call MB_UpdateCycles(), which does:
    • update T1C = T1C - 4

But now there will only be 0x4F32 cycles until the TIMER1 interrupt is asserted - so we lost 4 cycles!

So I can "defer the copy of T1L to T1C" until after the STA opcode (or the 1st cycle of the next opcode), e.g.:

  • STA opcode: set T1L_H
  • call MB_UpdateCycles(), which does:
    • update T1C = T1C - 4
    • copy T1L to T1C
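
A Python sketch of the two orderings listed above. The function names are hypothetical and this is not AppleWin's `MB_UpdateCycles()`; it only models the order of the two steps. With T1L = $4F36, the copy-first order leaves $4F32 after the 4-cycle STA, while the deferred order leaves the full $4F36. The old T1C value ($1234) is arbitrary.

```python
def t1c_after_sta_copy_first(latch: int, old_t1c: int, opcode_cycles: int = 4) -> int:
    """Current order: copy T1L -> T1C during the STA, then count down.

    old_t1c is discarded by the copy, so it does not affect the result.
    """
    t1c = latch                                # STA: set T1L_H and copy T1L to T1C
    t1c = (t1c - opcode_cycles) & 0xFFFF       # update step: T1C -= 4
    return t1c


def t1c_after_sta_deferred(latch: int, old_t1c: int, opcode_cycles: int = 4) -> int:
    """Deferred order: count down the old T1C, then copy T1L -> T1C."""
    t1c = (old_t1c - opcode_cycles) & 0xFFFF   # update step: the old countdown runs during the STA
    t1c = latch                                # then the deferred copy of T1L to T1C
    return t1c


print(hex(t1c_after_sta_copy_first(0x4F36, 0x1234)))  # 0x4f32
print(hex(t1c_after_sta_deferred(0x4F36, 0x1234)))    # 0x4f36
```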

Because of this deferral, the 6502 and 6522 are now correctly in sync at the cycle level. So I can no longer emulate the 6502's 6522 I/O operations on the opcode's 1st cycle. (NB. The 1st cycle is still fine for the video scanner.)

This means that the SBC $C404 (in your interrupt handler) also has to now happen on the opcode's 4th cycle (not the 1st cycle).

As for the other missing 2 cycles, I still need to figure this out. It could be the 6522 TIMER1 'N+2' thing above, but I'm already compensating for this (i.e. #652), so perhaps it's something else? (e.g. a 6502 cycle emulation bug? Unlikely though.)

Archange427 commented Oct 19, 2019

A few comments:

  • Unlike the sync code ('v1') used in MAD EFFECT, v2 contains a two-part fix for the emulators: adding 6 cycles AND changing the value of the second sub (line 208).
    I tested these values empirically until my code worked with AppleWin!
    But I was never able to get a working version with only one change, as with MAD EFFECT (v1).

  • Is AppleWin accurate for the 6502 interrupt sequence (7 cycles)?

  • v2 compensates for the cycles spent when the interrupt occurs during any opcode. So even if the value AppleWin returns here is not the right one (compared to a real A2), the code would still be OK.
    But that value must be in the range 2 to 9 cycles; otherwise it no longer works. Is that the case?

tomcw added this to the 1.29.6 milestone Nov 18, 2019

tomcw commented Nov 18, 2019

With 1.29.6.0, the code at the start of this issue is now worse under emulation (due to the fix for this issue, i.e. delaying the load of T1L into T1C until after the opcode completes).

  • AppleWin 1.29.4.0: A = 0xED
  • AppleWin 1.29.5.0 (internal): A = 0xED
  • AppleWin 1.29.6.0: A = 0xEA
  • Real Apple II: A = 0xEE

But the polling of $C40D (IFR) still occurs on the opcode's 1st cycle (in AppleWin), whereas for LDA $C40D, it should occur on the 4th (3 or 4 cycles?). NB. 0xEE-0xEA = 4 cycles.

None of the FT demos depend on this behaviour, but I have this code as a regression test, and it's currently failing :-/

Archange427 commented Nov 18, 2019

Just for information, the "simplified repro version" is now OK with AppleWin 1.29.6.0.
