Add c65 sub for py65 with blockio + make ctests (> 100x faster on current test suite) #37
I'll want to see if this passes the Klaus Dormann test suite (https://github.com/Klaus2m5/6502_65C02_functional_tests), which is a pretty exhaustive test for 65(C)02 simulators, before switching to it, but it looks pretty good. I'm happy to see a compatible license for fake65c02.h (Tali is also CC0/Public Domain). Do you know what the licensing status of c65.c is? Is that something you wrote? It looks like cycle counting could be implemented easily to allow the entire test suite to be run. I had to add it in Python to py65, and I think I see where it could be added in C. How did you feed the tests to c65? In the Makefile, is the executable named c65 or prof65 (or am I missing something there)? |
Sorry I wrote I ran and compared test output like this:
The diff looks good ignoring some header comments between tests, and the cycle counting. Cycle count is available as It would be trivial to write that to some fixed memory[] location as a double word if you wanted to access it from forth within the simulator, or, vice versa, you could add a magic memory location which the sim could hit to log the current cycle count. I didn't look at how the cycle counting is currently working in the tests. |
I don't think I'm going to have enough free time to test this thoroughly this weekend, so this PR will likely stay open for a few weeks. You're welcome to push more updates in the meantime. This will add an extra dependency (a C compiler, which is a given on Linux but not on Windows, and I don't think on Mac unless you've installed the Xcode tools), but it may be worth it for the speedup. Tali in py65mon works on Linux, Mac, and Windows (tested native and in Cygwin - haven't tested in WSL yet) on both 2.7 and 3.x flavors of Python. I know Windows, especially, is funny about its console I/O, so I'll want to make sure that works as expected. What platform are you working on? I have access to Windows 10, 11, and Linux. |
The license for Tali is CC0 Public Domain. Ideally all of the software that Tali comes with would have this license, although we do have one item in the forth_examples folder that has a different license. If you are OK with that, then CC0 is the license I'd like to use. If you want to see how the cycle count works, you can look in tests/talitest.py, which extends py65mon. The cycle counting uses addresses $F006-$F00B (I think it just fits without overlapping your block I/O): a read from $F006 starts the cycle counting, a read from $F007 stops it, and the result can then be read from $F008-$F00B, but it's in NUXI format (you can ask the interwebs about nUxi endianness if you are not familiar). You can look at I also overrode the py65mon I/O so I could spoon feed from the test files and capture the results. The tester program checks to see if Tali crashed before reading all of the input and also searches for specific error messages in the output and displays a summary of what went wrong at the end, after running all of the tests. I think all of this can be rewritten in C, and we can have one version for general use and another that will be augmented for running the tests (perhaps c65 and talitest as the executable names). |
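The magic-address protocol described above could be modelled roughly as follows. This is only a sketch, not the code in tests/talitest.py: the class and function names are made up for illustration, and the byte order assumes that "NUXI" here means high 16-bit word first with each word stored little-endian (the layout a Forth double-cell would have on a little-endian 6502).

```python
# Hypothetical model of the cycle-count magic addresses from this thread.
START_ADDR = 0xF006   # a read here starts counting
STOP_ADDR = 0xF007    # a read here stops counting
RESULT_ADDR = 0xF008  # result occupies $F008-$F00B


def encode_nuxi(value):
    """Pack a 32-bit value: high word first, each word little-endian (assumed layout)."""
    hi, lo = (value >> 16) & 0xFFFF, value & 0xFFFF
    return bytes([hi & 0xFF, hi >> 8, lo & 0xFF, lo >> 8])


def decode_nuxi(raw):
    """Inverse of encode_nuxi for a 4-byte sequence."""
    hi = raw[0] | (raw[1] << 8)
    lo = raw[2] | (raw[3] << 8)
    return (hi << 16) | lo


class CycleCounter:
    """Watches reads of the magic addresses and stores the elapsed cycles."""

    def __init__(self, memory):
        self.memory = memory
        self.started_at = 0

    def on_read(self, address, total_cycles):
        if address == START_ADDR:
            self.started_at = total_cycles
        elif address == STOP_ADDR:
            elapsed = (total_cycles - self.started_at) & 0xFFFFFFFF
            self.memory[RESULT_ADDR:RESULT_ADDR + 4] = encode_nuxi(elapsed)


memory = bytearray(0x10000)
counter = CycleCounter(memory)
counter.on_read(START_ADDR, total_cycles=1000)
counter.on_read(STOP_ADDR, total_cycles=1000 + 0x00012345)
print(memory[RESULT_ADDR:RESULT_ADDR + 4].hex())          # 01004523
print(hex(decode_nuxi(memory[RESULT_ADDR:RESULT_ADDR + 4])))  # 0x12345
```

A simulator would call something like `on_read` from its memory-read hook with its running cycle total; the same idea should carry over to C with a `uint8_t memory[]` array.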
I'm on mac os x |
Oh good - OSX is the only platform I don't have access to. |
Added test headers plus cycle counting magic for c65. Now I looked manually at 0=. The diff says:
I count 43 (beq taken) or 45 (beq not taken) plus 4 for one of magic reads (lda $abs) which is 47 or 49. So different from both reports :)
.a790                          40+3 or 38+7   xt_zero_equal:
.a790  20 25 d8  jsr $d825     10+6           jsr underflow_1
.a793  b5 00     lda $00,x     4              lda 0,x
.a795  15 01     ora $01,x     4              ora 1,x
.a797  f0 04     beq $a79d     2+             beq _zero
.a799  a9 00     lda #$00      2              lda #0
.a79b  80 02     bra $a79f     3+             bra _store
.a79d                                         _zero:
.a79d  a9 ff     lda #$ff      2              lda #$ff
.a79f                                         _store:
.a79f  95 00     sta $00,x     4              sta 0,x
.a7a1  95 01     sta $01,x     4              sta 1,x
.a7a3  60        rts           6              z_zero_equal: rts
It's slightly weird how pymon codes BEQ vs BRA - they should both be 2+branch taken+page cross so 3+ for BRA and 2++ for BEQ?
|
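The manual count for 0= can be reproduced with a small tally. The per-instruction costs are copied from the listing in the comment above; the only assumptions are that no branch crosses a page boundary and that a taken BEQ costs 3 cycles.

```python
# Cycle costs from the listing; jsr is 6 for the call plus 10 inside underflow_1.
COMMON_HEAD = [("jsr underflow_1", 16), ("lda 0,x", 4), ("ora 1,x", 4)]
COMMON_TAIL = [("sta 0,x", 4), ("sta 1,x", 4), ("rts", 6)]

# Branch costs: 2 when not taken, 3 when taken (no page crossing assumed).
BEQ_TAKEN = [("beq _zero (taken)", 3), ("lda #$ff", 2)]
BEQ_NOT_TAKEN = [("beq _zero (not taken)", 2), ("lda #0", 2), ("bra _store", 3)]

MAGIC_READ = 4  # one lda $abs on a magic address included in the measurement


def total(path):
    """Sum the cycles for the shared head, the chosen branch path, and the shared tail."""
    return sum(cycles for _, cycles in COMMON_HEAD + path + COMMON_TAIL)


for name, path in (("beq taken", BEQ_TAKEN), ("beq not taken", BEQ_NOT_TAKEN)):
    print(f"{name}: {total(path)} cycles, {total(path) + MAGIC_READ} with magic read")
```

This prints 43 and 47 for the taken path and 45 and 49 for the not-taken path, matching the figures in the comment.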
I agree that the cycle count for BRA looks wrong. You can file that as an issue with py65, but it may take a while for Mike to get to it. It looks like he hasn't been working on py65 recently. The cycle counts are mainly a double check that things didn't get radically slower (or faster), which might indicate that something was broken while making a change. I'm not concerned with the exact values, as some of them will change any time the code moves around and different words end up crossing a page boundary (as you've already seen). I've been able to play a bit with c65, and I have the following notes: Getting it to work character by character will be a hassle if you want it to work cross platform because Windows and Linux do that fundamentally differently, and OSX has some differences to Linux as well. You end up writing special code for all three platforms. It's doable (I did it for py65mon), but it's a real hassle and requires digging into some nitty gritty details, especially if you want to switch back to line editing mode. If the input were handled, I could see that it shouldn't take too much effort to bring it up to approximately py65mon levels of functionality. The part that is lacking is that the current test setup can generate a summary at the end that repeats the errors from all failed tests, as well as telling if Tali did not finish the tests. The former could just look at the output of c65, but the latter requires some method of telling if the tests did not finish. The most common reason for stopping early is hitting a BRK instruction - usually when something horribly breaks and the PC ends up in a place it's not supposed to be. Your current testing solution uses pipes, but Windows does pipes differently enough that it may not be a good fit here. Adding an option to c65 to feed (multiple) files as input to Tali would solve that issue.
Are you thinking that c65 is a solution to make just the testing for Tali2 go faster, or are you thinking of it as a total replacement for py65mon? |
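One way the end-of-run summary mentioned above could work on captured c65 output is sketched here. This is not how talitest.py does it: the error marker strings and the end-of-tests sentinel are placeholders chosen for illustration, and checking for a sentinel is just one possible method of telling that the tests ran to completion.

```python
# Hypothetical markers; substitute whatever the real test output prints.
ERROR_MARKERS = ("INCORRECT RESULT", "WRONG NUMBER OF RESULTS")
END_SENTINEL = "ALL TESTS DONE"  # assumed to be printed by the last test file


def summarize(output):
    """Return (failures, finished) for one captured test run.

    failures is a list of (line_number, line) for lines containing a marker;
    finished is True only if the sentinel appears somewhere in the output.
    """
    lines = output.splitlines()
    failures = [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if any(marker in line for marker in ERROR_MARKERS)
    ]
    finished = any(END_SENTINEL in line for line in lines)
    return failures, finished


sample = "\n".join([
    "T{ 0 0= -> -1 }T ok",
    "INCORRECT RESULT: T{ 1 0= -> -1 }T",
    "T{ 2 0= -> 0 }T ok",
])
failures, finished = summarize(sample)
for number, line in failures:
    print(f"line {number}: {line}")
if not finished:
    print("Tali did not finish the tests")
```

If the simulator stops early, for example on a BRK, the sentinel never appears and the run is reported as unfinished.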
I don't think of c65 as a full replacement for py65mon - for example its monitor functionality for examining/changing registers and memory etc is great for debugging. More like an optional add-on which can streamline your workflow, and helpful if you want to play more with block devices. In my own workflow c65 is great for speeding up iteration when experimenting with tali source: super fast to run all tests and get a quick "all success?" indicator. I can always run the same tests slowly if I need more granularity. I also find it very useful for my forth dev cycle where I can ingest a large volume of forth code very quickly and experiment with new changes. That gets painful in py65. Also I find the block device useful for loading/dumping code or memory easily from forth. For these purposes I don't mind the input duplication, and I'm happy to have the line-editing while I'm experimenting. Since it's a tty thing it doesn't affect batch execution when I pipe input into c65, so I find that's fine for checking test output and so forth. If you think the input duplication is important I could take a look at an option to bypass the terminal line mode; I have access to a windows box so probably doable. For testing I just quickly hacked up the |
That sounds fine. Let's plan on leaving If we can get c65 to where the output of If you are interested, I think it's only a medium amount of work to get feature parity with py65mon, at which point we could consider removing py65mon as a requirement for Tali2 and adding a C compiler as a requirement instead. For Windows folk, we could either give cygwin instructions or WSL instructions. The default Ubuntu that Microsoft installed when you turn on WSL might already have Are you interested in going this route? I can help with the I/O. Most of the python code I wrote for py65mon is actually just calling the underlying C functions, so much of that is reusable here. Also, if you are interested in going this route, do you want to set up c65 as a separate project or would you rather leave it here as part of Tali2? If it's here as part of Tali, then we would only need to support it for use with Tali, which might make things a bit easier. If the remaining utilities were rewritten in C, we could also remove the python requirement altogether. I'm not averse to that. I don't have a full idea of exactly what I need py65mon to do to get |
i will probably poke around with this in the next couple of weeks but might not get to it right away |
@SamCoVT here's a first pass at non-blocking, unbuffered IO for c65. I haven't done extensive testing but it builds and seems to work on my mac without duplicated text or mangling the terminal. I also ssh'd to my windows 10 box and used wsl + ubuntu to build and run there which also seems to work without changes (somewhat to my surprise; see c65/README). Not sure if that's what you intended or looking for a native windows exe. lmk and we can figure out what makes sense next |
Works on Linux as well. I think there is enough functionality now (ability to load binary at arbitrary location and start running at arbitrary location) to run the Dormann suite if I enable the I/O. I'm not sure when I'll have time to do that. It always seems to take me multiple tries to get it assembled and running properly. |
One extra thing I did here since it's easy with the new IO was add a non-blocking peekc location at $f005, so people could fool around with KEY? if they wanted. I changed to just one address to move the whole IO block since I've never needed to move individual magic locations on their own, but I do move the whole block. See updated README.
|
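Moving the whole I/O block with a single base address might look like this. It is a sketch with assumed offsets: putc and getc use py65mon's default addresses ($F001 and $F004), peekc and the cycle counter use the addresses quoted in this thread, and the block I/O registers are left out because their location isn't stated here. The c65 README is the authority on the real layout.

```python
# Assumed offsets from the I/O base; names are illustrative only.
OFFSETS = {
    "putc": 0x1,          # py65mon default $F001
    "getc": 0x4,          # py65mon default $F004
    "peekc": 0x5,         # non-blocking peek, $F005 in this thread
    "cycle_start": 0x6,   # read starts cycle counting
    "cycle_stop": 0x7,    # read stops cycle counting
    "cycle_result": 0x8,  # four result bytes, $F008-$F00B
}


def io_addresses(base=0xF000):
    """Return every magic address for an I/O block placed at the given base."""
    return {name: base + offset for name, offset in OFFSETS.items()}


for name, address in io_addresses().items():
    print(f"{name:13s} ${address:04x}")
```

Calling `io_addresses(0xC000)` would relocate every magic location together, which is the behaviour the comment above describes.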
also just for amusement here's a proof of concept replacing py65 with c65 within talitest.py. The tick count at the end claims 147.5M cycles for all tests in 1.1s => 134MHz ? I think I had a '286 machine @ 133MHz back in the day :-)
|
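The effective clock rate quoted above is just cycles divided by wall-clock time. A quick check using the two figures from the comment:

```python
# Figures quoted in the comment above.
cycles = 147.5e6   # simulated cycles for all tests
seconds = 1.1      # wall-clock run time

effective_mhz = cycles / seconds / 1e6
print(f"{effective_mhz:.0f} MHz")  # 134 MHz
```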
This looks really good. I'm not sure if I'll have time this weekend to get the Dormann tests to run, but that's the last piece before I'm interested in adding this as a complete replacement for the current test system. We will also want the Makefile to build c65 if it doesn't exist, which is slightly complicated by Windows wanting a .exe extension on the end (does compiling under WSL result in an extensionless binary? I don't actually know). |
ya, no extension in wsl, the existing c65 makefile worked unchanged. so top-level should just need to try a sub-make in the c65 folder
|
That's good news. Recommending WSL for windows users who need to run tests is probably the easiest way to get GNU Make and a C compiler on a windows box. Those who have installed make and gcc natively on Windows and have them working from a command prompt can probably also handle adjusting the Makefiles as needed. |
Here's a C-based substitute for py65mon which also supports blockio and heatmap profiling. First build it (requires gcc):
Now run the regular interactive taliforth session:
You can run the test suite with make ctests, which will produce tests/results.txt in about 1.2s compared to 160s on my macbook (100x faster). git diff on the results seems very close to talitest.py other than the header lines between files, and the cycle counting stuff at the end (which could probably be extracted here).
As a bonus you get a read/write coverage data dump so you can see hotspots (see c65/profile.ipynb as an example).
The blockio support is explained in the README, but it's very easy. I use it like this:
c65/c65 -r taliforth-py65mon.bin -b data.blk
where data.blk is a file with binary data, then this word: