[C128] serial driver random crashes. #2443

Open
Divarin opened this issue Apr 11, 2024 Discussed in #2441 · 22 comments
Labels
bug maybe not sure if this is actually a bug libs

Comments

@Divarin

Divarin commented Apr 11, 2024

Discussed in #2441

Originally posted by Divarin April 10, 2024
It looks like there might be an issue with both the standard serial driver (the one used in the sample 'terminal.c') and the swiftlink driver for the Commodore 128 target.

Using samples/terminal.c, if I compile it for C128 it works okay for a while but at some point it will crash back to BASIC.

I've tested this both in VICE and on real hardware.

I have been able to get this to happen by typing a lot of characters (or just holding down one key). It happens on any BBS at random times (Commodore boards as well as others), but usually only while doing a lot of typing. If you're just reading posts and occasionally pressing Enter to go to the next post, it isn't likely to happen; it is more likely while writing a post or email.

For some strange reason I have been able to reproduce it more quickly on my own BBS with the specific repro steps below. (I have tried this on other Synchronet boards and it doesn't happen there, except randomly while typing a message as described above.)

  • Log into mutinybbs.com:2332
  • go to (D) to configure preferences
  • press (T) to configure terminal mode
  • answer (N) for all questions
  • At some point during the questions or while displaying the menu afterwards it crashes.

Again, I'm not saying this only happens on my BBS. It happens on any BBS, but the quickest repro steps are those listed above. Otherwise, the longer repro steps are:

  • Connect to any BBS
  • start typing a message (email to yourself maybe?)
  • either type a lot, like 3 or 4 pages' worth, or just hold down any key and wait
  • eventually it'll crash

I first noticed this happening in my own program using the swiftlink driver (c128-swlink.ser), so as a sanity check I went to the sample (terminal.c) to see if the same behavior was happening there. I didn't change the sample code at all. I compiled it twice: once for C128 and once for C64. The C64 version ran fine, but the C128 version showed the same random crashing behavior.

P.S. If anyone knows a way I could provide debugging information, such as a stack trace, please let me know.

@colinleroy
Contributor

One of the first tests you could do is see whether one of the last three optimisation commits to the c128 driver is responsible for that:
https://github.com/cc65/cc65/commits/master/libsrc/c128/ser/c128-swlink.s

It'll require you to build cc65 from git, and do a git bisect or something like that.

@Divarin
Author

Divarin commented Apr 16, 2024

Okay but before I do that keep in mind this happens when using ser_static_stddrv as well (in addition to the swiftlink driver).

@colinleroy
Contributor

It's the same, only statically linked :)

@Divarin
Author

Divarin commented Apr 16, 2024

I switched to the commit before your optimizations (ffa83c3), I couldn't do a MAKE though (Fatal error: Cannot open input file 'common/asctime.s': No such file or directory)
I tried compiling my terminal program anyway (though probably pointless since I didn't successfully compile cc65)
the issue is still there (but as I said since I can't MAKE, probably isn't using the older driver).

@colinleroy
Contributor

colinleroy commented Apr 16, 2024

Indeed, that didn't change a thing. To fix compilation, do:
make clean all && sudo make install

Also, uninstall the cc65 package from your distro to make sure you build using the correct one.

@Divarin
Author

Divarin commented Apr 16, 2024

Okay thanks for that. Well I went as far back as 008b4c4 and the issue remained. I also added checking cc65's and cl65's --version into the process just to be sure I was compiling / linking under the intended version.
So it doesn't look like this issue was caused by your optimizations.

It's possible this never worked. After all, it appears to work at first: I spent a good 45 minutes on particles reading messages with no issue, and it wasn't until I started writing one that the issue popped up.

@colinleroy
Contributor

Thanks for confirming that. Sadly that doesn't narrow your issue down... You'll probably have to figure out a way to trace execution, look at the registers, etc.; basically, set up a good enough debugging environment. I don't know what you can use for that on the C128 platform, maybe VICE or MAME?

@Divarin
Author

Divarin commented Apr 16, 2024

Yeah, that's a bit over my head; I'm not an assembly programmer (hence my using C for my project instead of assembly). For now my only path forward is to target c64 instead and try to use the VDC and other 128-specific functionality (such as the numeric keypad) while in 64 mode. All doable, but since the term program I'm trying to write is specifically for the 128, it would make more sense to target the 128.

Well thanks for taking a look.

@mrdudz
Contributor

mrdudz commented Apr 16, 2024

I'd first check whether the same thing happens with the respective C64 drivers, and if so, debug those first. C128 should use the same code, but adds some additional quirks and problems, so tackle that later :)

@Divarin
Author

Divarin commented Apr 16, 2024

I tried the same code targeting c64 and that worked without issue.

At first I was working with my own code but when this issue popped up I started doing the testing with the sample program "terminal.c", same issue there. I didn't change that code at all from the sample. I built it once for c128 and once for c64. The c128 build exhibited the crashes and the c64 build did not.

In my own terminal program I have used the c64 driver (c64-swlink.ser), and when targeting c64 this works fine (in 64 mode of course), but I can't use this driver while targeting c128. I tried, and as soon as I try to send anything (press a key) the program breaks into the machine code monitor.

@mrdudz
Contributor

mrdudz commented Apr 16, 2024

OK - you should open an issue about this then... Unfortunately quite tricky to debug - if you can provide a reliable way to reproduce the problem, please describe that as well. My guess would be some kind of problem related to the banking on C128 - but this would really need a C128 expert to look at.

@Divarin
Author

Divarin commented Apr 16, 2024

This is an open issue (2443). I provided repro steps in my initial message.

Actually there are two sets of repro steps. I can reproduce it on any BBS eventually by doing a lot of typing, but for whatever reason it is more easily reproduced on my BBS in particular. I've also been mostly testing there because I figured that if I'm going to be constantly crashing my session and leaving the BBS hanging, I'd rather do that to my own than someone else's.

@mrdudz
Contributor

mrdudz commented Apr 16, 2024

Ah ok, I find it really confusing to have both this discussion and that issue :)

As for reproducing... anything that involves "BBS" - or even any sort of remote computer - is not going to help much unfortunately, whoever is going to look at this probably prefers something that works completely local.

@skeetor

skeetor commented Apr 16, 2024

If you want to debug this with VICE on Windows I recommend using the VICE-Win-3.2-x64 version, as this is the last one which has the built-in debugger included.

@Divarin
Author

Divarin commented Apr 16, 2024

I have been using a mixture of real hardware and VICE (not WinVICE, since I'm on Linux), but the behavior is the same whether through VICE or real hardware.

I am not able to reproduce it without connecting to something. When I'm testing against my BBS that's local network for me (though not the same machine) so in theory it should work against anything local that you can telnet into.

@mrdudz
Contributor

mrdudz commented Apr 16, 2024

> If you want to debug this with VICE on Windows I recommend using the VICE-Win-3.2-x64 version, as this is the last one which has the built-in debugger included.

Especially for RS232 work this is a really, really bad idea, since tons of stuff has been fixed in that area since 3.2. And of course recent versions can still do everything 3.2 could regarding debugging, and much more; they just don't have the fancy GUI for (a few) monitor things.

> I am not able to reproduce it without connecting to something. When I'm testing against my BBS that's local network for me (though not the same machine) so in theory it should work against anything local that you can telnet into.

I know, it would still be great to find a simple testcase that works like this... perhaps using tcpser, or perhaps even using a little script with netcat (or whatever).

@Divarin
Author

Divarin commented Apr 16, 2024

Okay, I'll try to figure out how to do that. I can connect using netcat (nc -l), but that doesn't echo anything back, and so far I have been unable to reproduce the issue this way, so the crash might be happening on receive rather than on send. If that's the case I'll need an echo of each character.

@mrdudz
Contributor

mrdudz commented Apr 16, 2024

Perhaps try sending a huge binary file (using minicom or so)?

@Divarin
Author

Divarin commented Apr 16, 2024

Neither the term program I'm working on nor the sample (terminal.c) supports file transfers, and I wasn't intending to implement them. I have tried putting a few thousand "X"s in my clipboard and pasting them into VICE, but that didn't really cause any issues.

@mrdudz
Contributor

mrdudz commented Apr 16, 2024

Yeah, but you should still be able to use minicom to send a huge file to the emulator... and at least test the transfer in that direction. It would already be helpful to know whether receiving or sending is the problem :)

@Divarin
Author

Divarin commented Apr 16, 2024

A friend of mine helped me set up something on my Linux box: an ncat session I can ATDT into that just echoes back each keypress. But so far I am unable to reproduce the crash this way.
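
For anyone who wants to repeat this kind of local echo test without ncat, here is a minimal sketch of a TCP echo endpoint in C for a POSIX host. It is not what was actually used in this thread (that was ncat); the function names `echo_until_eof` and `run_echo_server`, the choice to bind only to the loopback address, and the one-client-at-a-time design are all assumptions made for illustration. Note that, as reported above, a plain echo did not trigger the crash, so this only provides the local endpoint, not a confirmed reproduction.

```c
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Echo every byte read from fd back to the same fd until the peer closes.
   Returns 0 on clean end-of-stream, -1 on a read or write error. */
int echo_until_eof(int fd)
{
    unsigned char buf[256];
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                       /* cope with short writes */
            ssize_t w = write(fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n == 0 ? 0 : -1;
}

/* Hypothetical helper: listen on 127.0.0.1:port and serve one client at a
   time. Returns -1 if setup or accept fails; otherwise it keeps serving. */
int run_echo_server(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    signal(SIGPIPE, SIG_IGN);                   /* a vanished client must not kill us */
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* assumption: local only */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0) {
        close(lfd);
        return -1;
    }
    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0) {
            close(lfd);
            return -1;
        }
        echo_until_eof(cfd);                    /* returns when the client hangs up */
        close(cfd);
    }
}
```

Calling `run_echo_server` with a free port number from a small `main` gives an endpoint that anything able to open a TCP connection to the host can dial into; the port number and how the emulator or modem bridge is pointed at it are left to the tester.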

mrdudz added bug maybe not sure if this is actually a bug libs labels Apr 26, 2024

@Divarin
Author

Divarin commented Apr 27, 2024

OldWoman37 from the Lemon 64 Forum has found the cause of the issue and a workaround until it's fixed.
Forum thread: https://www.lemon64.com/forum/viewtopic.php?p=1019020#p1019020

For completeness I'll quote her here:
"Okay. I was able to see the program crash and I back traced it. Whenever something comes in on the RS232 device (i.e. swiftlink) an NMI interrupt is initiated. By default, the c128 checks to see if it was associated with the stock RS232 code (CIA#2) and then it checks to see if you pressed RUN/STOP. An NMI is also caused when you press the RESTORE key.
What appears to be happening here is that when the NMI occurs, sometimes, when it checks for the RUN/STOP, it detects it even if you didn't press it. Well that is what happened to me. I was just pressing a bunch of text and then it was like I hit RUN/STOP-RESTORE. You can see the check happen at $f63d and it gets up to $f65b where it stores a $7f at $91 which is the flag for RUN/STOP being pressed.
This detection can be an artifact if you are just using the keyboard. Since the keyboard is scanned, it is possible to generate the wrong signals especially if the NMI happens during a standard keyboard scan. Now the 128 kernal is careful not to do this if it is an NMI associated with the RS232; it just processes it and leaves. But if the RS232 is not the source, it checks the RUN/STOP.
In your case, the stock NMI handler doesn't know you are doing RS232 stuff. It just thinks you are hammering the RESTORE key up to 960 times a second. So eventually, it will get a false read on the keyboard when it checks for the RUN/STOP keypress.
The best way to avoid this is to stop having the NMI handler run this code. Since the NMI can happen so often, you should avoid all the stock stuff the 128 does. My suggestion is to write a $33 to $318, and a $FF to $319 before you initialize the swiftlink stuff. Right now, when the swiftlink code is done running, it just goes and runs the default handler at $318. By writing these values there beforehand, after the swiftlink code is done, it ignores all the other stock stuff like RUN/STOP detection. You won't be able to "RESTORE" back if something goes wrong, but it will be more stable."
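
A minimal C sketch of the workaround quoted above, for anyone wanting to try it from a cc65 program. It only encodes the two byte writes named in the quote ($33 to $318 and $FF to $319); the helper name `set_vector` and the two macro names are hypothetical, and the sketch assumes it runs before the serial driver is installed and opened, as the quote suggests. It is not a fix to the driver itself.

```c
#include <stdint.h>

/* Addresses taken from the quoted forum post: the two-byte NMI vector at
   $0318/$0319, and the value $FF33 formed by the suggested bytes $33/$FF. */
#define NMI_VECTOR_ADDR  0x0318u
#define NMI_BYPASS_ADDR  0xFF33u

/* Hypothetical helper: store a 16-bit address into a two-byte vector,
   low byte first (6502 byte order). */
void set_vector(volatile uint8_t *vec, uint16_t target)
{
    vec[0] = (uint8_t)(target & 0xFFu);   /* low byte  -> $0318 */
    vec[1] = (uint8_t)(target >> 8);      /* high byte -> $0319 */
}

/* On the C128, before initializing the swiftlink driver:
 *
 *     set_vector((volatile uint8_t *)NMI_VECTOR_ADDR, NMI_BYPASS_ADDR);
 *
 * The two stores are not atomic, so this is best done while no NMI source
 * (serial traffic, RESTORE key) is expected to fire.
 */
```

With cc65's `<peekpoke.h>`, the same two writes could also be spelled `POKE(0x318, 0x33); POKE(0x319, 0xFF);`. As the quote notes, RESTORE will no longer break out of the program once the vector is redirected.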


4 participants