Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

joystick_get() and JOY() are corruptable by irq handler in v39 [includes patch] #203

Closed
irmen opened this issue Apr 14, 2021 · 10 comments · Fixed by #204
Closed

joystick_get() and JOY() are corruptable by irq handler in v39 [includes patch] #203

irmen opened this issue Apr 14, 2021 · 10 comments · Fixed by #204

Comments

@irmen
Copy link
Contributor

irmen commented Apr 14, 2021

I discovered that there is a race condition in the joystick_get() kernal routine (and thus also in the basic JOY() function): it seems that when it gets interrupted by the system's irq handler, the values returned from the routine can be corrupted, thus sending fake joystick triggers to the program.

Here is an attempt to reproduce the problem in a basic program:
image

The result shown is the program running for about a minute while not touching the joypad or keyboard.

This is a tricky issue that we found is very timing sensitive. Sometimes it takes a LONG time before the first ERROR appears. Sometimes you need to tweak the program a bit for instance by introducing a variable. Enabling WARP mode speeds up the process. However in my own program that I work on, the issue almost always immediately and consistently pops up at a certain point int he program, so it is a real problem (for me at least).

The big difference we think from my code vs other people's code, is that I am not reading the joystick status from within my own irq handler but simply from the normal program flow (and system irq handler is still running normally) -- like the basic program above does too. That may explain the fact that nobody else ran into this yet. (assumption)

As far as I was able to test, the issue does not occur on the previous V38 version of the emulator + roms.

A possible fix revolves around disabling IRQ during the joystick_get(). Suggestion:

  • at least make the basic JOY() function do a SEI / CLI to avoid returning corrupted results
  • joystick_get() can't do this because then you can't use it anymore from within an IRQ handler
  • joystick_get() could perhaps check processor status bit to see if it is running inside IRQ handler and if not, do the SEI/CLI.
  • if joystick_get() doesn't get changed to make it robust against this, at least the documentation should mention that user code should SEI/CLI if not calling from an irq handler
  • or perhaps whatever it was that got changed from v38 to v39 can be changed back...
@Elektron72
Copy link
Contributor

Could not replicate with x16-rom HEAD at 5496e0b and x16-emulator HEAD at 8852ddd, running the test program on warp mode for over a minute. Aside from a few errors being printed at the beginning of the program (as a result of the enter key being pressed), no incorrect inputs were detected. Looking at the code in joystick_get (located in kernal/drivers/x16/joystick.s), it seems unlikely that there could be a race condition, as all it does is retrieve values stored by joystick_scan. If there is an issue, it is likely in joystick_scan or the emulator itself.

@irmen
Copy link
Contributor Author

irmen commented Apr 15, 2021

@Elektron72 yeah we found it's pretty timing critical in the basic program. @indigodarkwolf had to change the program around a little bit to trigger the error as well.

I still have to isolate the issue from my game's code, where it pretty much always triggers consistently. I solved it (for now) by doing a SEI/CLI around joystick_get() so honestly I don't think the issue in is another routine

@indigodarkwolf
Copy link
Collaborator

@Elektron72 My theory is that kvswitch_tmp1 is being clobbered by code that runs during the interrupt, resulting in the A register getting clobbered if execution is interrupted during the sensitive window of time.

This is a very small window of time, but small just means it's more sensitive to exact timing. Small does not mean impossible.

@indigodarkwolf
Copy link
Collaborator

I notice that KVARS_START and KVARS_END employ sei to prevent corruption from untimely interruption. Perhaps joystick_get's usage of KVARS_START_TRASH_NZ is ultimately wrongheaded, and it simply needs to use KVARS_START and KVARS_END, which are interrupt-safe.

@greg-king5
Copy link

greg-king5 commented Apr 15, 2021

I notice that KVARS_START and KVARS_END employ sei to prevent corruption from untimely interruption. Perhaps joystick_get's usage of KVARS_START_TRASH_NZ is ultimately wrongheaded, and it simply needs to use KVARS_START and KVARS_END, which are interrupt-safe.

Either only a reversion to KVARS_END or a different temporary variable (kvswitch_tmp3) is needed.

@greg-king5
Copy link

greg-king5 commented Apr 15, 2021

By the way, this faster program shows the "error" word more often, for me:

10 IFJOY(2)THENPRINT"ERROR",
20 GOTO10

@Elektron72
Copy link
Contributor

I think it would be a good idea to add a warning to inc/banks.inc that KVARS_START_TRASH_NZ and KVARS_END_TRASH_NZ are not interrupt-safe.

@irmen
Copy link
Contributor Author

irmen commented Apr 15, 2021

I finally isolated the issue into a small assembler program, you can find it below. It prints "!!!" without touching the joypad buttons pretty quickly on my emulator.

➡️ Interesting detail: the problem also doesn't occur if the ROM bank is not set to 0 (to enable kernal rom) but instead simply left at the default....!

.cpu  'w65c02'

* = $4000
	;select rom bank 0 (kernal)
	;if remainig at default, the issue doesn't occur!
	stz  $01

_loop
	lda  #'J'
	jsr  $ffd2  ;chrout
	lda  #' '
	jsr  $ffd2

_joy	
	; sei
	jsr  $ff56   ;joystick_get -- corruptable by IRQ
	; cli
	cmp  #255
	beq  _joy

	lda  #'!'
	jsr  $ffd2  ;chrout
	jsr  $ffd2  ;chrout
	jsr  $ffd2  ;chrout
	lda  #13
	jsr  $ffd2
	jmp  _loop

@irmen
Copy link
Contributor Author

irmen commented Apr 16, 2021

I can try to make it into a PR but i'm not really familiar with those kernal source macro's....

@irmen
Copy link
Contributor Author

irmen commented Apr 17, 2021

thanks @Elektron72 for making the patch

@irmen irmen changed the title joystick_get() and JOY() seem corruptable by irq handler in v39 joystick_get() and JOY() are corruptable by irq handler in v39 May 1, 2021
@irmen irmen changed the title joystick_get() and JOY() are corruptable by irq handler in v39 joystick_get() and JOY() are corruptable by irq handler in v39 [with patch] Jun 24, 2021
@irmen irmen changed the title joystick_get() and JOY() are corruptable by irq handler in v39 [with patch] joystick_get() and JOY() are corruptable by irq handler in v39 [includes patch] Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants