A Haskell 6502 Emulator
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


Neskell - A Haskell 6502 Emulator


This repository contains the Haskell source code of an emulator for the 6502 CPU. The goal of the project was to learn more about Haskell, the 6502 and CPU emulation as well as developing an important piece of a future Nintendo Entertainment System (NES) emulator. So far, only the emulation of the NMOS 6502 / 2A03 is largely complete, no work on the PPU / APU has been attempted yet.


(Update: Now also builds with Stack/Cabal)

Tested on OS X 10.6 with the Haskell Platform 2013.2.0.0, building should succeed with a simple

$ make

The only dependency not already provided by the Haskell Platform is the ansi-terminal package, please install it from Hackage using Cabal.



Having comprehensive and easy to run tests is key for developing any emulator. Neskell features a test suite comprised of a number of custom and well-established tests from the community.

There's a nice progress display as the tests run using multiple CPU cores:


The command line interface for running tests is as follows:

Usage: neskell [OPTION...]
  -t[NAME]  --test[=NAME]        run full test suite / all tests containing NAME
  -q        --quick-test         run quick (<1s) test suite
  -l[NAME]  --list-tests[=NAME]  list all tests / all tests containing NAME
  -s        --no-verbose         less flashy test result display
+RTS -? runtime system help, +RTS -N2 use two cores, +RTS -N use all cores

The Makefile will automatically run the quick test suite after each successful build:

time ./neskell --quick-test --no-verbose +RTS -N -s
All 31 tests OK

The time and -s RTS option should help detect performance regression during development as well.


The most powerful tool to debug why a given test fails is looking at the emulator's trace log. The build-in tests have various default levels of tracing, but those can be overridden with the following command line option:

  -r [nbe]  --trace=[nbe]        override test tracing to [n]one, [b]asic, full [e]xecution

Traces for each test go into a per-test log file in the src/trace directory. A basic trace looks as follows:

Trace Log 2013-11-04 13:18:32.526098 CET

--- Branch Pagecrossing Test ---

Load Binary:

02F9 A9 01 D0 00 D0 01 EA D0 00 A9 FF

Setup: FF→SP 24→SR 0000→PC 02F9→PC

Cycles  PC   AC IX IY Status Reg. SP Instr. Operand ILnCycl Op. Load  Stores
Elapsed $PC  $A $X $Y $P:NV1BDIZC $S $I:Mne Data    [OIU]bC $Adr→$Val $Val→$Dst
(Execution trace disabled)

Zero Page:

(All Zero)


(All Zero)

    Stop Reason      [ OpCode(PC) == BRK ]
    Unmet Conditions [ ]
    Met Conditions   [ A == $FF
                     , Cycle ∈ [ 14 , 14 ]
    CPU State        C:0000014 PC:$0304 A:$FF X:$00 Y:$00 SR:$A4:N·1··I·· SP:$FF
    Next Instruction BRK

The emulator APIs allow to set up the CPU and RAM state, set a termination criteria and verify a number of success / failure conditions to be met.

For further debugging, a full execution trace can be enabled with --trace=e (it's already the default for all tests with a short runtime):

Trace Log 2013-11-04 13:28:10.597951 CET

--- BRK Test ---

Load Binary:

0600 4C 06 06 C6 FF 40 A9 03 8D FE FF A9 06 8D FF FF
0610 A9 45 85 FF 00 EA E6 FF 00 EA A9 00 38 00 EA 65
0620 FF 85 FF EA

Setup: FF→SP 24→SR 0000→PC 0600→PC 

Cycles  PC   AC IX IY Status Reg. SP Instr. Operand ILnCycl Op. Load  Stores
Elapsed $PC  $A $X $Y $P:NV1BDIZC $S $I:Mne Data    [OIU]bC $Adr→$Val $Val→$Dst
0000000 0600 00 00 00 24:··1··I·· FF 4C:JMP $0606   O3b3C   -         0606→PC 
0000003 0606 00 00 00 24:··1··I·· FF A9:LDA #$03    O2b2C   -         24→SR 03→A 0608→PC 
0000005 0608 03 00 00 24:··1··I·· FF 8D:STA $FFFE   O3b4C   -         03→FFFE 060B→PC 
0000009 060B 03 00 00 24:··1··I·· FF A9:LDA #$06    O2b2C   -         24→SR 06→A 060D→PC 
0000011 060D 06 00 00 24:··1··I·· FF 8D:STA $FFFF   O3b4C   -         06→FFFF 0610→PC 
0000015 0610 06 00 00 24:··1··I·· FF A9:LDA #$45    O2b2C   -         24→SR 45→A 0612→PC 
0000017 0612 45 00 00 24:··1··I·· FF 85:STA $FF     O2b3C   -         45→00FF 0614→PC 
0000020 0614 45 00 00 24:··1··I·· FF 00:BRK         O1b7C   -         06→01FF 16→01FE FD→SP 34→01FD FC→SP 24→SR 0603→PC 
0000027 0603 45 00 00 24:··1··I·· FC C6:DEC $FF     O2b5C   00FF→45   24→SR 44→00FF 0605→PC 
0000032 0605 45 00 00 24:··1··I·· FC 40:RTI         O1b6C   -         FD→SP 24→SR FF→SP 0616→PC 
0000038 0616 45 00 00 24:··1··I·· FF E6:INC $FF     O2b5C   00FF→44   24→SR 45→00FF 0618→PC 
0000043 0618 45 00 00 24:··1··I·· FF 00:BRK         O1b7C   -         06→01FF 1A→01FE FD→SP 34→01FD FC→SP 24→SR 0603→PC 
0000050 0603 45 00 00 24:··1··I·· FC C6:DEC $FF     O2b5C   00FF→45   24→SR 44→00FF 0605→PC 
0000055 0605 45 00 00 24:··1··I·· FC 40:RTI         O1b6C   -         FD→SP 24→SR FF→SP 061A→PC 
0000061 061A 45 00 00 24:··1··I·· FF A9:LDA #$00    O2b2C   -         26→SR 00→A 061C→PC 
0000063 061C 00 00 00 26:··1··IZ· FF 38:SEC         O1b2C   -         27→SR 061D→PC 
0000065 061D 00 00 00 27:··1··IZC FF 00:BRK         O1b7C   -         06→01FF 1F→01FE FD→SP 37→01FD FC→SP 27→SR 0603→PC 
0000072 0603 00 00 00 27:··1··IZC FC C6:DEC $FF     O2b5C   00FF→44   25→SR 43→00FF 0605→PC 
0000077 0605 00 00 00 25:··1··I·C FC 40:RTI         O1b6C   -         FD→SP 27→SR FF→SP 061F→PC 
0000083 061F 00 00 00 27:··1··IZC FF 65:ADC $FF     O2b3C   00FF→43   44→A 24→SR 0621→PC 
0000086 0621 44 00 00 24:··1··I·· FF 85:STA $FF     O2b3C   -         44→00FF 0623→PC 

Zero Page:

00F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 44


01F0 00 00 00 00 00 00 00 00 00 00 00 00 00 37 1F 06

    Stop Reason      [ OpCode(PC) == NOP ]
    Unmet Conditions [ ]
    Met Conditions   [ $00FF == $44
                     , SR == $24:··1··I··
                     , SP == $FF
                     , Cycle ∈ [ 89 , 89 ]
    CPU State        C:0000089 PC:$0623 A:$44 X:$00 Y:$00 SR:$24:··1··I·· SP:$FF
    Next Instruction NOP

The execution trace has been carefully designed to be as compact as possible while still conveying all aspects of the CPU's state and the executed instructions acting upon it. Cycle data (included penalty cycles for page crossing etc.) as well as memory loads and stores are recorded, helping with comparison against a reference or figuring out when and for which reason execution diverged.

The trace is written to a fixed-size ringbuffer during execution (see RingBufferLog.hs), so the beginning may be truncated for very long sessions.

Note that the emulator recognizes Blargg's test ROMs and will output extra diagnostic information into the trace:

Blargg Test:
    Memory at 0x6001 indicates we are running one of Blargg's test ROMs
    Result Code: 0x00
    Status Text: \n01-basics\n\nPassed\n

Test Suite

All tests are passing and located in src/tests.

  • 6502_functional_tests contains Klaus Dormann's well-known 6502 Functional Tests. It's a very comprehensive test for all of the documented behavior of the 6502. Here's the readme.txt, documenting how Neskell's version of the test was obtained / assembled:
6502 functional tests version 23-jul-2013
Taken from http://2m5.de/6502_Emu/6502_functional_tests.zip

6502_functional_test.bin is 6502_functional_test.a65 assembled with the
AS65 assembler:

as65 -z 6502_functional_test.a65

This gives a single binary image. I used the Linux version of AS65 from:


6502_functional_test.lst is a listing for debugging purposes generated by:

as65 -l -m -h0

The source needs to have to have the following flag set to produce
tighter object code (<16K):

load_data_direct = 0

The binary can then be loaded at 0x400. Set the PC to that address and go.

See here for a hacked version of Visual 6502 and instructions on how to
generate a reference trace log:


The file v6502_first_235k_cycles.zip contains a trace of the first ~235K cycles
from Visual 6502.
  • decoding contains the instruction decoding and disassembly test. The file instr_test.asm is not only a comprehensive test of all instructions, but contains documentation collected from various sources, see the preamble from the file:
Test file containing all possible instruction and addressing mode
combinations, including a description of their operation, binary
representation, effect on CPU registers and execution timing.

This file is basically a carefully merged version of the references below
and used as ground truth during emulator development, also suitable as an
assembler / disassembler test. Nice to have it all in one place, plus each
of these documents has typos, omissions and errors which were discovered
during the N-way merge and some Visual 6502 debugging sessions.

For testing disassembly / instruction decoding, the file 'instr_test.bin' is
an assembled version of this file, and its disassembly should match
'instr_test_ref_disasm.asm'. Those two files were generated using the tools
of 6502js (https://github.com/skilldrick/6502js). The hexdump output
can be converted to a binary with 'cat hexdmp | xxd -r -seek -1536 > bin'.
Note that all instruction arguments are sequential numbers from $00. A
bug in 6502js prevents the relative addressing in branch instructions to
assemble, only labels can be targeted. As a workaround, all branch
instructions target 'lbl' and were later manually fixed in the binary /
disassembly to their correct targets (sequential numbers). There's also a
fixed version available at http://biged.github.io/6502js/. At the end of the
file are also all supported illegal / unofficial opcodes.

References / Sources / Originals:

  • hmc-6502 contains the test suite of the hmc-6502 project. It's not very in-depth and comprehensive, but provided a great first set of tests to run against during development. Tests have been extended and fixed, assembled using 6502js. Also see the notes in the .asm files and the 'readme.txt' from the directory:
All the tests in this directory have been taken from the hmc-6502 project, specifically


The code is licensed under BSD 3 / New BSD License. Many of the tests have been changed / extended.
  • instr_misc contains Blargg's 'NES CPU Instruction Behavior Misc Tests'. The .bin files are extracted raw binaries of just the 6502 code from the test's original .nes files. See the 'readme.txt' (another one in the 'src' subdirectory) for more information on the tests. We only run the tests which can be verified / work on a pure 6502 platform (no other hardware connected).

  • instr_test-v4 contains Blargg's 'NES CPU Instruction Behavior Tests'. These tests verify most instruction behavior fairly thoroughly, including unofficial instructions. All tests passing.

  • nestest contains Kevin Horton's 'Ultimate NES CPU test ROM', a quick running series of CPU tests covering a large amount of documented and undocumented behavior.

  • unit contains (mostly) new tests written just for Neskell. Specifically: - add_sub_cvzn_flag_test.asm Test CV flags as described in http://www.6502.org/tutorials/vflag.html - ahx_tas_shx_shy_pagecross_test.asm Test the pagecrossing behavior of AHX/TAS/SHX/SHY - ahx_tas_shx_shy_test.asm Test AHX,TAS,SHX,SHY (the & ADDR_HI instructions) in all addressing modes - arr_bcd_test.asm Test the decimal mode of the illegal ARR opcode - bcd_add_sub_test.asm Tests from http://www.6502.org/tutorials/decimal_mode.html - branch_backwards_test.asm Simple test checking if we correctly reach branch targets with negative offsets - branch_pagecross_test.asm Test extra cycle added for page crossing in branch instruction - brk_test.asm BRK interrupt test - full_bcd_test.asm Full BCD test from http://www.6502.org/tutorials/decimal_mode.html#B - illegal_bcd_test.asm Test undocumented BCD behavior of ADC/SBC with invalid BCD operands and status of NVZ after BCD operations - illegal_rmw_test.asm Test all illegal RMW opcodes (DCP,ISC,RLA,RRA,SLO,SRE) in all of their addressing modes - illegal_xb_test.asm Test ANC,ALR,ARR,XAA,LAS,AXS (0x*B opcodes, all with AND) in all addressing 3 modes - jump_bug_test.asm Test the page wrapping bug in the indirect addressing mode of the JMP instruction - kil_test.asm Test undocumented KIL opcode - lax_test.asm Test illegal / unofficial LAX opcode - nop_test.asm Test all variants of illegal NOP opcode - sax_test.asm Test unofficial SAX opcode

These references and tests have been invaluable during the development of Neskell, many thanks to all the individuals who spent the time and energy to make them!



Neskell aspires to emulate an NMOS 6502 or 2A03 (NES CPU) as accurately as possible. Every difference between the real hardware and the emulation observable by the program (or the user, once it is more than a CPU emulator) is considered a bug to be fixed. Currently, Neskell emulates all official and unofficial instructions with bit and cycle accuracy, including all known bugs and BCD behaviors. Certain aspects such as fake / double writes and sub-instruction timing have not been implemented yet, as they are only observable once extra hardware besides the CPU is emulated.


Visual 6502 and VICE have been used extensively as references for the test cases used.

During the development of Neskell a few discrepancies between the two reference emulators and / or the actual hardware have been discovered. See src/tests/unit/ahx_tas_shx_shy_pagecross_test.asm / src/tests/unit/illegal_xb_test.asm and the discussion at Illegal Instructions doing (A & Immediate), ANC/ALR/ARR/.. for some interesting details.

Notes from src/tests/vice.txt on how to run a test with the VICE emulator:

Basic steps to load a binary into the VICE emulator

Open the Monitor and type

f $0100 $01ff 0

to zero out data on the stack, just for clarity

Load the program at address $0200 with

fill $0200 $0300 aa bb cc 11 22 433 ...

Set the program counter to the starting address

r pc=0200

and then step with 'z', use 'r' to inspect registers

Finally, type

m $0100 $01ff

to inspect the stack.

Notes from src/tests/visual6502.txt on how to run a test with Visual 6502:

Basic steps to load a binary into the Visual 6502 simulator

Convert a 6502 binary into an ASCII hex dump string

mapM_ (\x -> Text.Printf.printf "%02x" x) . Data.ByteString.unpack =<< Data.ByteString.readFile "load_store_test.bin"

Form URL: First part (r=0600) sets the PC, graphics=false hides the rendering of
the simulation, a=0000&d=0... overwrites some pre-loaded code on the zero page.
The a=0600&d=a9... pair is the hex dump of the binary generated above


The 'runv6502' script automates this process completely and opens the URL in the

Now trace with the 'Step' button and watch the zero page / stack writes and
register changes. Be sure to click 'Log Up/Down' so that the trace doesn't
immediately go off screen

A quick hack on the Visual 6502 source was required to speed up generation of reference traces. It disables all updates of the GUI/DOM and instead just dumps the instruction trace into the JS Debug Console:

[16:44:25.988] "369 040d    ca  1   DEX 040d    41  2b  00  fd  nv-BdIzc"
[16:44:26.011] "370 040e    10  1       040e    41  2b  00  fd  nv-BdIzc"
[16:44:26.032] "371 040e    10  1   BPL     040e    41  2b  00  fd  nv-BdIzc"
[16:44:26.054] "372 040f    f8  1       040f    41  2a  00  fd  nv-BdIzc"
[16:44:26.076] "373 0410    a2  1       0410    41  2a  00  fd  nv-BdIzc"
[16:44:26.097] "374 0408    bd  1   LDA Abs,X   0408    41  2a  00  fd
[16:44:26.120] "375 0409    5a  1       0409    41  2a  00  fd  nv-BdIzc"
[16:44:26.178] "376 040a    36  1       040a    41  2a  00  fd  nv-BdIzc"
[16:44:26.200] "377 3684    02  1       040b    41  2a  00  fd  nv-BdIzc"
[16:44:26.221] "378 040b    95  1   STA zp,X    040b    02  2a  00  fd
[16:44:26.243] "379 040c    13  1       040c    02  2a  00  fd  nv-BdIzc"
[16:44:26.264] "380 0013    00  1       040d    02  2a  00  fd  nv-BdIzc"
[16:44:26.284] "381 003d    00  0       040d    02  2a  00  fd  nv-BdIzc"
[16:44:26.294] "381 003d    02  0       040d    02  2a  00  fd  nv-BdIzc"
[16:44:26.306] "382 040d    ca  1   DEX 040d    02  2a  00  fd  nv-BdIzc"

See here for the fork.

The script at src/tests/runv6502 can be used to quickly load the test binaries into Visual 6502.


The build-in disassembler can be used through the --dasm flag like this:

$ neskell --dasm tests/unit/nop_test.bin

All 256 instructions are supported and uniquely decoded. The folder src/tests/decoding contains a test for the disassembler. Here is how all possible instructions are represented in the disassembly:

ADC #$01          SBC ($98),Y          INC $4C           LAX $00,Y         
ADC $02           SEC                  INC $4D,X         LAX $0000         
ADC $03,X         SED                  INC $4E4F         LAX $0000,Y       
ADC $0405         SEI                  INC $5051,X       LAX ($00,X)       
ADC $0607,X       STA $99              INX               LAX ($00),Y       
ADC $0809,Y       STA $9A,X            INY               SAX $00           
ADC ($0A,X)       STA $9B9C            JMP $5253         SAX $00,Y         
ADC ($0B),Y       STA $9D9E,X          JMP ($5455)       SAX $0000         
AND #$0C          STA $9FA0,Y          JSR $5657         SAX ($00,X)       
AND $0D           STA ($A1,X)          LDA #$58          SBC($EB) #$00     
AND $0E,X         STA ($A2),Y          LDA $59           DCP $00           
AND $0F10         STX $A3              LDA $5A,X         DCP $00,X         
AND $1112,X       STX $A4,Y            LDA $5B5C         DCP $0000         
AND $1314,Y       STX $A5A6            LDA $5D5E,X       DCP $0000,X       
AND ($15,X)       STY $A7              LDA $5F60,Y       DCP $0000,Y       
AND ($16),Y       STY $A8,X            LDA ($61,X)       DCP ($00,X)       
ASL A             STY $A9AA            LDA ($62),Y       DCP ($00),Y       
ASL $17           TAX                  LDX #$63          ISC $00           
ASL $18,X         TAY                  LDX $64           ISC $00,X         
ASL $191A         TSX                  LDX $65,Y         ISC $0000         
ASL $1B1C,X       TXA                  LDX $6667         ISC $0000,X       
BCC $1D           TXS                  LDX $6869,Y       ISC $0000,Y       
BCS $1E           TYA                  LDY #$6A          ISC ($00,X)       
BEQ $1F           KIL($02)             LDY $6B           ISC ($00),Y       
BIT $20           KIL($12)             LDY $6C,X         RLA $00           
BIT $2122         KIL($22)             LDY $6D6E         RLA $00,X         
BMI $23           KIL($32)             LDY $6F70,X       RLA $0000         
BNE $24           KIL($42)             LSR A             RLA $0000,X       
BPL $25           KIL($52)             LSR $71           RLA $0000,Y       
BRK               KIL($62)             LSR $72,X         RLA ($00,X)       
BVC $26           KIL($72)             LSR $7374         RLA ($00),Y       
BVS $27           KIL($92)             LSR $7576,X       RRA $00           
CLC               KIL($B2)             NOP               RRA $00,X         
CLD               KIL($D2)             ORA #$77          RRA $0000         
CLI               KIL($F2)             ORA $78           RRA $0000,X       
CLV               NOP($7A)             ORA $79,X         RRA $0000,Y       
CMP #$28          NOP($5A)             ORA $7A7B         RRA ($00,X)       
CMP $29           NOP($1A)             ORA $7C7D,X       RRA ($00),Y       
CMP $2A,X         NOP($3A)             ORA $7E7F,Y       SLO $00           
CMP $2B2C         NOP($DA)             ORA ($80,X)       SLO $00,X         
CMP $2D2E,X       NOP($FA)             ORA ($81),Y       SLO $0000         
CMP $2F30,Y       NOP($80) #$00        PHA               SLO $0000,X       
CMP ($31,X)       NOP($82) #$00        PHP               SLO $0000,Y       
CMP ($32),Y       NOP($89) #$00        PLA               SLO ($00,X)       
CPX #$33          NOP($C2) #$00        PLP               SLO ($00),Y       
CPX $34           NOP($E2) #$00        ROL A             SRE $00           
CPX $3536         NOP($04) $00         ROL $82           SRE $00,X         
CPY #$37          NOP($64) $00         ROL $83,X         SRE $0000         
CPY $38           NOP($44) $00         ROL $8485         SRE $0000,X       
CPY $393A         NOP($0C) $000        ROL $8687,X       SRE $0000,Y       
DEC $3B           NOP($14) $00,        ROR A             SRE ($00,X)       
DEC $3C,X         NOP($34) $00,        ROR $88           SRE ($00),Y       
DEC $3D3E         NOP($54) $00,        ROR $89,X         ANC($0B) #$00     
DEC $3F40,X       NOP($74) $00,        ROR $8A8B         ANC($2B) #$00     
DEX               NOP($D4) $00,        ROR $8C8D,X       ALR #$00          
DEY               NOP($F4) $00,        RTI               ARR #$00          
EOR #$41          NOP($1C) $000        RTS               XAA #$00          
EOR $42           NOP($3C) $000        SBC #$8E          AHX ($00),Y       
EOR $43,X         NOP($5C) $000        SBC $8F           AHX $0000,Y       
EOR $4445         NOP($7C) $000        SBC $90,X         TAS $0000,Y       
EOR $4647,X       NOP($DC) $000        SBC $9192         SHX $0000,Y       
EOR $4849,Y       NOP($FC) $000        SBC $9394,X       SHY $0000,X       
EOR ($4A,X)       LAX #$00             SBC $9596,Y       LAS $0000,Y       
EOR ($4B),Y       LAX $00              SBC ($97,X)       AXS #$00          



The emulator itself runs inside the ST monad and provides a very small load/store interface for manipulation of CPU / RAM state from the emulation code. The design is strongly inspired by the A bit of ST and a bit of IO: Designing a DCPU-16 emulator blog post. The typeclass interface causes quite significant overhead, some of which can be mitigated through {-# SPECIALIZE #-} pragmas. This unfortunately is also not without its issues. See Ticket #8331, for instance.


The basic emulator interface looks as follows:

runEmulator ::
    Processor                -> -- Model of the processor to be emulated
    [(B.ByteString, Word16)] -> -- List of program binaries and their offsets
    [(LoadStore, L8R16)]     -> -- Store operations to set up simulator state
    [Cond]                   -> -- The simulator will stop when any of these conditions are met
    [Cond]                   -> -- Success conditions to verify once stopped
    TraceMode                -> -- Level of tracing
    Int                      -> -- MB of trace log ring buffer space
    ( [Cond]                    -- Success conditions which were met
    , [Cond]                    -- ...not met
    , [Cond]                    -- Stopping conditions met
    , String                    -- Debug string of last CPU state
    , String                    -- Instruction the PC is pointing at
    , B.ByteString              -- Last traceMB MB of the execution trace

And an example:

runEmulator NMOS_6502
            [ (bin, 0x0600) ]
            [ (PC, Right 0xC000)
            , (SP, Left 0xFD)
            [ CondOpC BRK
            , CondLoopPC
            , CondCycleR 100000 (maxBound :: Word64)
            ( [ CondLS SP $ Left 0x81
              , CondLS SR $ Left $ srFromString "N-1--I--"
              , CondLS A  $ Left 0x55
              , CondLS X  $ Left 0x2A
              , CondLS Y  $ Left 0x73
              , CondLS (Addr 0x022A) (Left 0x55)
              , CondCycleR 25735 25735
              ++ ( makeStackCond 0xFF $
                       "      00 34 0F 4C B5 D5 4E 35 7D 7B B4 8D 04 35 " ++
                       "0D 50 B4 A0 76 B5 9B 00 B5 BF 32 B5 BB 3A 35 3B " ++
                       "EC B5 FE 22 B5 B3 42 B5 F2 BA B5 FF 80 B5 80 CC " ++
                       "75 66 EF 35 7B 78 35 6A 9D B4 D7 79 35 59 11 34 " ++
                       "35 01 34 01 33 35 11 3B 35 33 ED B5 E4 21 37 00 " ++
                       "42 35 40 43 34 01 00 B5 FE 9A B4 FE 35 B4 97 F1 " ++
                       "B4 FE 3B B4 FE F3 B4 FC 38 B4 FF FF 37 FF 98 35 " ++
                       "99 33 37 33 EF 35 F0 39 35 3A F1 B4 F0 36 35 37 "

The actual emulation code operates on the CPU / RAM state using a small typeclass interface defined in MonadEmulator.hs:

data LoadStore = A | X | Y | SR | SP | PC | PCL | PCH | Addr Word16

class (Monad m, Applicative m) => MonadEmulator m where
    load8      :: LoadStore -> m Word8
    load16     :: LoadStore -> m Word16
    store8     :: LoadStore -> Word8  -> m ()
    store16    :: LoadStore -> Word16 -> m ()
    trace      :: String -> m ()
    traceM     :: m String -> m ()
    runNoTrace :: m a -> m a
    advCycles  :: Word64 -> m ()
    getCycles  :: m Word64
    getModel   :: m Processor

Future Work

The final goal would be extending the CPU emulator to a full system, most likely the NES. Concerning the CPU emulator currently implemented, there is still plenty of room for improvement.

The src/sandbox folder contains various test cases and experiments. One of the biggest gripes with the current codebase is the redundancy and verbosity of the actual emulation code in the Emulator module. The idea is to first get a working emulator and test suite in place, and then find the optimal code structure to meet the requirements. The folder src/sandbox/THTest contains some early experiments to generate the instruction emulation code from small fragments using Template Haskell. This is heavily inspired by Bisqwit's incredible sub-1000 line NES emulator and the clever encoding / CPP + template code for representing and generating the effects of each instruction.

One especially nasty bug still exists, and I suspect it to be an issue in GHC's garbage collector. See the comment from Test.hs:252:

-- Workaround for a GC bug. Without this, the GC just isn't collecting. If we
-- allocate some memory with an unboxed mutable vector inside the ST monad in
-- the emulator (ring buffer trace log), the memory seems to be retained
-- sometimes. There's no reference to the vector outside of ST. Even if we
-- never do anything but create the vector and put it inside the Reader record,
-- and then only return a single Int from ST, which is immediately evaluated,
-- the memory is retained (sometimes...). All memory profiles always show we
-- never allocate more than one vector at a time, yet multiple runs would cause
-- OS memory for multiple vectors to be used and would eventually cause an
-- out-of-memory error. Even then the RTS would not collect. Forcing collection
-- after leaving ST and evaluating all return values seems to solve the problem
-- entirely. This seems like a bug in the GC, running out of OS memory instead
-- of garbage collecting allocations it (demonstrably) knows how to free, while
-- all RTS memory profiles confirm it is indeed not referenced. In the parallel
-- case this only alleviates the situation, not fixing it entirely like for the
-- serial version before. Unfortunately, the bug is rather hard to reduce to a
-- simple reproduction case

Any suggestions regarding this, workarounds, analysis, ideas on how to best report this bug etc., would be much appreciated.

Performance could be described as 'barely acceptable'. Running all 92608051 cycles of the 'Functional 6502 Test' takes ~33.5sec on an older 2.2Ghz Core 2 Duo processor, giving us ~2.75Mhz for just the CPU.

Here's a profile run generated while running the entire test suite:

COST CENTRE                MODULE        %time %alloc

decodeInstructionM         Instruction    15.3    1.6
runEmulator.loop           Emulator       11.8   59.5
execute                    Execution      11.8   13.7
decodeOpCode               Instruction     5.0    0.0
load8                      MonadEmulator   3.2    1.8
detectLoopOnPC             Execution       2.8    1.0
bne                        Execution       2.6    0.7
setNZ.isN                  Execution       2.5    1.4
checkCond                  Condition       2.1    2.5
store16                    MonadEmulator   2.0    0.0
update16                   Execution       1.8    0.8
lda                        Execution       1.8    0.4
load16                     MonadEmulator   1.7    0.3
trace                      MonadEmulator   1.7    0.0
store16.(...)              MonadEmulator   1.6    2.2
advCycles                  MonadEmulator   1.6    0.5
store8Trace                Execution       1.6    0.9
loadOperand8.loadAndTrace  Execution       1.5    0.9
store16Trace               Execution       1.4    0.1
cmp                        Execution       1.2    0.3
execute.ilen               Execution       1.2    0.3
updateNZ                   Execution       1.1    0.4
getOperandPageCrossPenalty Execution       1.1    0.9
makeRingBuffer             RingBufferLog   0.0    2.3

The amount of inlining (into runEmulator.loop, for instance) used makes the results unfortunately somewhat harder to interpret.

One idea to speed up the test suite would be to not only run individual tests in parallel, but to use snapshots to further parallelize segments of long-running tests. Like the parallel testing this wouldn't help actual emulator speed, but being able to run more tests in shorter time is helpful during development.

Exploring type system features to represent more invariants of the 6502 directly on the type level would be useful. We could also split MonadEmulator into separate read / write interfaces for more access control.

It would be interesting to explore if Haskell's natural laziness could be used effectively to defer emulation work until it is actually needed. Often the result of computations like status register updates is never used.

The Makefile based build system should be replaced with Cabal.

There's very little error handling throughout the program, this should be improved as the emulator matures.


This program is published under the MIT License.

Some of the test suite entries have been developed by other authors and a different license may apply.


Developed by Tim C. Schroeder, visit my website to learn more.