Lumees Lab — FPGA-Verified, Production-Ready IP
The Lumees Lab RISC-V Debug Module IP Core is a complete implementation of the RISC-V External Debug Support Specification v0.13.2. It provides full debug access to RISC-V harts via a standard JTAG interface, enabling halt, resume, single-step, register/memory inspection, and program buffer execution — all controlled by industry-standard tools like OpenOCD and GDB.
The design spans two clock domains (JTAG TCK and system CLK) with a proven gray-code CDC async FIFO, and includes a System Bus Access (SBA) master for direct memory reads/writes without involving the CPU.
Unlike soft debug solutions that require CPU cooperation, this module operates independently — it can halt a hung processor, inspect crash state, and resume execution, making it essential for any production RISC-V SoC.
| Feature | Detail |
|---|---|
| Specification | RISC-V External Debug Support v0.13.2 |
| JTAG TAP | IEEE 1149.1 compliant, 5-bit IR, IDCODE/DTMCS/DMI |
| DMI | 7-bit address, 32-bit data, gray-code CDC async FIFO |
| Hart Control | Halt, resume, single-step, halt-on-reset |
| Abstract Commands | Access Register (GPR x0–x31, CSRs), 32/64-bit transfer |
| Program Buffer | 4-word (16 bytes) for arbitrary instruction execution |
| System Bus Access | 8/16/32-bit direct memory R/W via bus master |
| Data Registers | 12 × 32-bit (data0–data11) for command data exchange |
| Authentication | Hardwired authenticated=1 (v1.0), extensible |
| Clock Domains | TCK (JTAG) + CLK (system), 2-FF gray-code CDC |
| Bus Interfaces | AXI4-Lite slave + SBA master, Wishbone B4 slave |
| Technology | Pure synchronous RTL, no vendor primitives |
| Language | SystemVerilog |

| Resource | Full SoC | Core Alone | Available | SoC % |
|---|---|---|---|---|
| LUT | 953 | 566 | 63,400 | 1.50% |
| FF | 1,316 | 824 | 126,800 | 1.04% |
| DSP48 | 0 | 0 | 240 | 0% |
| Block RAM | 0 | 0 | 135 | 0% |
Timing: WNS = +1.194 ns @ 100 MHz. All endpoints met, 0 failing paths. Zero DSP and zero BRAM — the entire debug module fits in LUT/FF fabric.
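
The utilization percentages and the timing margin can be cross-checked with a few lines of arithmetic. This is a rough sketch, not part of the deliverables: the function names are made up for illustration, and the Fmax figure assumes the critical path would stay the same length under a tighter clock constraint, which a real re-run of place and route may not confirm.

```python
def utilization_pct(used: int, available: int) -> float:
    """Percentage of a device resource consumed."""
    return 100.0 * used / available


def estimated_fmax_mhz(clock_mhz: float, wns_ns: float) -> float:
    """Rough Fmax estimate: 1 / (clock period - worst negative slack)."""
    period_ns = 1000.0 / clock_mhz
    critical_path_ns = period_ns - wns_ns
    return 1000.0 / critical_path_ns


# Figures taken from the resource table and timing line above.
lut_pct = utilization_pct(953, 63_400)
ff_pct = utilization_pct(1_316, 126_800)
fmax = estimated_fmax_mhz(100.0, 1.194)

print(f"LUT {lut_pct:.2f}%  FF {ff_pct:.2f}%  est. Fmax {fmax:.1f} MHz")
```

With a 10 ns period and +1.194 ns of slack, the longest path is about 8.8 ns, so the estimate lands a little above 113 MHz.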
JTAG pins (TCK, TMS, TDI, TDO, TRST_n)
│
┌────┴────┐
│ debug │ IEEE 1149.1 TAP
│ _jtag │ 5-bit IR, IDCODE/DTMCS/DMI
│ │ 16-state FSM (TCK domain)
└────┬────┘
│ DMI req/resp
┌────┴────┐
│ debug │ Gray-code CDC async FIFO
│ _dmi │ 2-entry req + 2-entry resp
│ │ TCK → CLK domain crossing
└────┬────┘
│ DMI req/resp (CLK domain)
┌────┴────┐
│ debug │ Register decode, abstract commands,
│ _dm │ program buffer, hart control,
│ │ System Bus Access FSM
└────┬────┘
╱ ╲
┌────────┘ └────────┐
Hart Control SBA Master
(halt/resume/step) (AXI4-Lite bus master)
JTAG TAP (debug_jtag): Full IEEE 1149.1 state machine with 16 states. Supports IDCODE (0x01), DTMCS (0x10), and DMI (0x11) scan registers. The DMI shift register is 41 bits (7-bit addr + 32-bit data + 2-bit op).
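
The 16-state machine is the standard IEEE 1149.1 TAP controller, so its transitions can be written down as a lookup table. The sketch below is a standalone Python model for illustration only; it is not the `debug_model.py` shipped in the repository, and the state names are chosen here rather than taken from the RTL.

```python
# (next state when TMS=0, next state when TMS=1) for each TAP state
TAP_NEXT = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_DR":        ("CAPTURE_DR", "SELECT_IR"),
    "CAPTURE_DR":       ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR":         ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR":         ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR":         ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR":         ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR":        ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_IR":        ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR":       ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR":         ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR":         ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR":         ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR":         ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR":        ("RUN_TEST_IDLE", "SELECT_DR"),
}


def walk(state: str, tms_bits) -> str:
    """Apply a sequence of TMS values, one per TCK rising edge."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# Five TCK cycles with TMS=1 reach Test-Logic-Reset from any state.
reset_ok = all(walk(s, [1] * 5) == "TEST_LOGIC_RESET" for s in TAP_NEXT)

# Paths a driver would take before shifting the IR or a data register.
to_shift_ir = walk("TEST_LOGIC_RESET", [0, 1, 1, 0, 0])
to_shift_dr = walk("RUN_TEST_IDLE", [1, 0, 0])
print(reset_ok, to_shift_ir, to_shift_dr)
```

A bit-bang driver typically starts with the five-cycle reset, then uses the `SHIFT_IR` path to load 0x10 or 0x11 and the `SHIFT_DR` path to move DTMCS or DMI data.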
DMI CDC (debug_dmi): Bridges the TCK and CLK domains using a 2-deep gray-code async FIFO for requests and responses. Handles DMI busy/error status.
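
The request that crosses this bridge is the 41-bit DMI scan word. A minimal sketch of how a host might pack and unpack it follows, assuming the field layout of the v0.13.2 specification (address in the upper 7 bits, data in bits 33:2, op in bits 1:0). The helper names are hypothetical and do not come from the repository.

```python
DMI_ABITS = 7                      # matches the 7-bit DMI address in this core
OP_NOP, OP_READ, OP_WRITE = 0, 1, 2

# Meaning of the 2-bit op field when it is read back (v0.13.2);
# value 1 is reserved.
DMI_STATUS = {0: "success", 2: "failed", 3: "busy"}


def pack_dmi(addr: int, data: int, op: int) -> int:
    """Build the 41-bit word shifted in through the DMI data register."""
    if not 0 <= addr < (1 << DMI_ABITS):
        raise ValueError("address does not fit in 7 bits")
    if not 0 <= data < (1 << 32):
        raise ValueError("data does not fit in 32 bits")
    if not 0 <= op < 4:
        raise ValueError("op does not fit in 2 bits")
    return (addr << 34) | (data << 2) | op


def unpack_dmi(word: int):
    """Split a captured 41-bit word into (addr, data, op/status)."""
    return (word >> 34) & 0x7F, (word >> 2) & 0xFFFF_FFFF, word & 0x3


# Write DMCONTROL (0x10) with haltreq (bit 31) and dmactive (bit 0) set.
word = pack_dmi(0x10, 0x8000_0001, OP_WRITE)
print(hex(word), unpack_dmi(word))
```

If a scan returns status `busy`, the host normally clears the sticky error through DTMCS and retries with more idle cycles between accesses.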
Debug Module (debug_dm): The core logic. It decodes DMI register reads/writes, manages hart control (halt/resume), executes abstract commands (Access Register), provides a 4-word program buffer, and drives the System Bus Access master.
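
To make the Access Register command concrete, here is a sketch of how the 32-bit COMMAND value and the ABSTRACTCS fields are laid out in the v0.13.2 specification. The function names are illustrative only, and whether this core accepts every combination shown (for example `postexec`) is an assumption to verify against the RTL.

```python
def access_register_cmd(regno: int, *, write: bool = False, aarsize: int = 2,
                        transfer: bool = True, postexec: bool = False) -> int:
    """Encode an Access Register command (cmdtype = 0).

    aarsize: 2 = 32-bit, 3 = 64-bit.
    """
    cmdtype = 0
    return ((cmdtype << 24) | (aarsize << 20) | (int(postexec) << 18)
            | (int(transfer) << 17) | (int(write) << 16) | (regno & 0xFFFF))


def gpr(n: int) -> int:
    """Abstract register number of GPR x<n> (0x1000 to 0x101F)."""
    if not 0 <= n <= 31:
        raise ValueError("GPR index out of range")
    return 0x1000 + n


def decode_abstractcs(value: int) -> dict:
    """Extract the ABSTRACTCS fields listed in the register map."""
    return {
        "datacount": value & 0xF,
        "cmderr": (value >> 8) & 0x7,
        "busy": bool((value >> 12) & 1),
        "progbufsize": (value >> 24) & 0x1F,
    }


read_x5 = access_register_cmd(gpr(5))               # result lands in data0
write_x5 = access_register_cmd(gpr(5), write=True)  # value taken from data0
read_dpc = access_register_cmd(0x7B1)               # CSRs use their CSR number
print(hex(read_x5), hex(write_x5), hex(read_dpc))
print(decode_abstractcs(0x0400_000C))
```

A debugger writes the value to COMMAND (0x17), polls ABSTRACTCS (0x16) until `busy` clears, and checks that `cmderr` is 0 before reading DATA0.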
CDC FIFO (debug_cdc_fifo): Generic parameterized async FIFO with gray-code pointers. Vendor-neutral, proven design.
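
Gray-code pointers are used because only one bit changes per increment, so a pointer sampled in the other clock domain is either the old or the new value, never a mix. The sketch below models the conversion and the usual empty/full comparisons for a 2-entry FIFO. It follows the common textbook scheme (one extra wrap bit per pointer) and is an assumption about how `debug_cdc_fifo` behaves, not a transcription of its RTL.

```python
ADDR_BITS = 1                 # 2-entry FIFO
PTR_BITS = ADDR_BITS + 1      # extra MSB distinguishes full from empty
PTR_MASK = (1 << PTR_BITS) - 1


def bin2gray(b: int) -> int:
    return b ^ (b >> 1)


def gray2bin(g: int) -> int:
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def is_empty(wr_ptr: int, rd_ptr: int) -> bool:
    """Empty when both gray pointers are identical."""
    return bin2gray(wr_ptr) == bin2gray(rd_ptr)


def is_full(wr_ptr: int, rd_ptr: int) -> bool:
    """Full when the gray pointers differ only in their top two bits."""
    top2 = 0b11 << (PTR_BITS - 2)
    return bin2gray(wr_ptr) == (bin2gray(rd_ptr) ^ top2)


# Every increment, including the wrap, flips exactly one gray bit.
one_bit_steps = all(
    bin(bin2gray(i) ^ bin2gray((i + 1) & PTR_MASK)).count("1") == 1
    for i in range(1 << PTR_BITS)
)
print(one_bit_steps, is_empty(0, 0), is_full(2, 0), is_full(1, 0))
```

In hardware each gray pointer passes through a 2-FF synchronizer before the comparison, so the flags are conservative: a side may see "full" or "empty" slightly longer than strictly necessary, but never overruns.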
| DMI Addr | Register | Description |
|---|---|---|
| 0x04–0x0F | DATA0–DATA11 | Abstract command data exchange (12 × 32-bit) |
| 0x10 | DMCONTROL | haltreq, resumereq, hartreset, dmactive, ndmreset |
| 0x11 | DMSTATUS | Hart state (halted, running, unavailable, havereset) |
| 0x12 | HARTINFO | Data register configuration (hardcoded) |
| 0x16 | ABSTRACTCS | cmderr, busy, progbufsize=4, datacount=12 |
| 0x17 | COMMAND | Write triggers abstract command execution |
| 0x18 | ABSTRACTAUTO | Auto-execute on data/progbuf access |
| 0x20–0x23 | PROGBUF0–3 | 4-word program buffer |
| 0x38 | SBCS | SBA control/status (sbaccess, sbusy, sberror) |
| 0x39 | SBADDRESS0 | SBA target address |
| 0x3C | SBDATA0 | SBA data (read/write triggers bus transaction) |
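
As an illustration of how these registers are used together, the sketch below halts a hart and performs one 32-bit System Bus Access write. Bit positions follow the v0.13.2 specification. The `dmi_write` and `dmi_read` callables are hypothetical stand-ins for whatever transport is in use (the repository's `jtag_driver.py` may expose a different interface), and the fake transport exists only to make the example self-contained.

```python
# DMI addresses from the register map above
DMCONTROL, DMSTATUS = 0x10, 0x11
SBCS, SBADDRESS0, SBDATA0 = 0x38, 0x39, 0x3C

# DMCONTROL bits (v0.13.2)
DMACTIVE, NDMRESET = 1 << 0, 1 << 1
RESUMEREQ, HALTREQ = 1 << 30, 1 << 31

# DMSTATUS flag positions (v0.13.2)
DMSTATUS_BITS = {
    "authenticated": 7, "anyhalted": 8, "allhalted": 9,
    "anyrunning": 10, "allrunning": 11, "anyunavail": 12, "allunavail": 13,
    "anyresumeack": 16, "allresumeack": 17,
    "anyhavereset": 18, "allhavereset": 19,
}


def decode_dmstatus(value: int) -> dict:
    flags = {name: bool((value >> bit) & 1) for name, bit in DMSTATUS_BITS.items()}
    flags["version"] = value & 0xF
    return flags


def halt_hart(dmi_write, dmi_read, max_polls: int = 100) -> bool:
    """Activate the DM, request a halt, and poll until the hart reports halted."""
    dmi_write(DMCONTROL, DMACTIVE)
    dmi_write(DMCONTROL, DMACTIVE | HALTREQ)
    for _ in range(max_polls):
        if decode_dmstatus(dmi_read(DMSTATUS))["allhalted"]:
            dmi_write(DMCONTROL, DMACTIVE)   # drop haltreq, keep the DM active
            return True
    return False


def sba_write32(dmi_write, addr: int, data: int) -> None:
    """One 32-bit bus write; the SBDATA0 write starts the transaction."""
    dmi_write(SBCS, 2 << 17)                 # sbaccess = 2 selects 32-bit
    dmi_write(SBADDRESS0, addr)
    dmi_write(SBDATA0, data)


# Fake transport: records writes and always reports a halted hart.
log = []
halted = halt_hart(lambda a, d: log.append((a, d)), lambda a: 0x382)
sba_write32(lambda a, d: log.append((a, d)), 0x8000_0000, 0xDEAD_BEEF)
print(halted, [(hex(a), hex(d)) for a, d in log])
```

After an SBA access, a real host would also read SBCS and check that the busy flag has cleared and the error field is 0 before trusting the result.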

| Offset | Register | Access | Description |
|---|---|---|---|
| 0x00 | CTRL | R/W | [0] = dmactive (via ndmreset) |
| 0x04 | STATUS | RO | [0] = halted, [1] = running, [2] = unavailable |
| 0x08 | INFO | RO | [6:0] = DMI_ABITS (7), [15:8] = DMI_VERSION (1) |
| 0x0C | VERSION | RO | IP version = 0x00010000 |
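
The bit fields in the offset table above can be decoded with simple masks. A minimal sketch follows; the function names are invented for illustration, and the sample register values are constructed from the table rather than captured from hardware.

```python
def decode_info(value: int) -> dict:
    """INFO (offset 0x08): [6:0] = DMI_ABITS, [15:8] = DMI_VERSION."""
    return {"dmi_abits": value & 0x7F, "dmi_version": (value >> 8) & 0xFF}


def decode_status(value: int) -> dict:
    """STATUS (offset 0x04): [0] halted, [1] running, [2] unavailable."""
    return {
        "halted": bool(value & 0x1),
        "running": bool(value & 0x2),
        "unavailable": bool(value & 0x4),
    }


# Values built from the table: DMI_ABITS = 7 and DMI_VERSION = 1.
info = decode_info((1 << 8) | 7)
status = decode_status(0b010)        # a hart that is running
print(info, status)
```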
debug_core u_debug (
.clk (clk),
.rst_n (rst_n),
// JTAG pins
.tck (jtag_tck),
.tms (jtag_tms),
.tdi (jtag_tdi),
.tdo (jtag_tdo),
.trst_n (jtag_trst_n),
// Hart control
.halt_req_o (halt_req),
.resume_req_o (resume_req),
.ndmreset_o (ndmreset),
.halted_i (halted),
.running_i (running),
.havereset_i (havereset),
.unavailable_i (unavailable),
// Abstract command interface
.cmd_valid_o (cmd_valid),
.cmd_data_o (cmd_data),
.cmd_ready_i (cmd_ready),
.cmd_rdata_i (cmd_rdata),
.cmd_resp_valid_i (cmd_resp_valid),
.cmd_resp_err_i (cmd_resp_err),
// System Bus Access master
.sba_req_o (sba_req),
.sba_we_o (sba_we),
.sba_addr_o (sba_addr),
.sba_wdata_o (sba_wdata),
.sba_size_o (sba_size),
.sba_gnt_i (sba_gnt),
.sba_rvalid_i (sba_rvalid),
.sba_rdata_i (sba_rdata),
.sba_err_i (sba_err),
// Info
.core_version (version)
);

| Test Suite | Tests | Status |
|---|---|---|
| JTAG TAP (test_debug_jtag) | IDCODE, DTMCS, DMI shift, state transitions | 7/7 PASS |
| Full-stack (test_debug_core) | DM activate, halt, resume, abstract cmd, SBA | 5/5 PASS |
| AXI4-Lite (test_debug_axil) | VERSION, INFO, STATUS, IRQ | 4/4 PASS |
| Wishbone (test_debug_wb) | VERSION, INFO, STATUS, IRQ | 4/4 PASS |
- Full environment: agent, driver (JTAG bit-bang), monitor, scoreboard, coverage
- Sequences: directed (halt/resume/SBA), random (50 DMI ops), stress (200 rapid ops)
- Coverage: `dmi_op × addr_range × response`, `halt_req × halted` cross
- 10/10 UART regression tests at 100 MHz via LiteX SoC + UARTBone
- VERSION, STATUS, TAP state, SBA idle, CTRL write/read, field consistency
debug/
├── rtl/ # 9 SystemVerilog files
│ ├── debug_pkg.sv # Types, DMI addresses, FSM states
│ ├── debug_jtag.sv # IEEE 1149.1 JTAG TAP controller
│ ├── debug_dmi.sv # DMI layer + CDC (TCK → CLK)
│ ├── debug_cdc_fifo.sv # Generic async FIFO (gray-code)
│ ├── debug_dm.sv # Debug Module core (registers, cmds, SBA)
│ ├── debug_core.sv # Integration: JTAG + DMI + DM
│ ├── debug_top.sv # Top-level wrapper
│ ├── debug_axil.sv # AXI4-Lite slave
│ └── debug_wb.sv # Wishbone B4 slave
├── model/
│ └── debug_model.py # Python golden model (TAP + DM)
├── tb/
│ ├── directed/ # cocotb tests (4 suites, 20 tests)
│ │ ├── jtag_driver.py # Reusable JTAG bit-bang driver
│ │ ├── test_debug_jtag.py
│ │ ├── test_debug_core.py
│ │ ├── test_debug_axil.py
│ │ └── test_debug_wb.py
│ └── uvm/ # UVM testbench (11 files)
├── sim/
│ └── Makefile.cocotb # One-command sim (Verilator + cocotb)
├── litex/ # LiteX SoC integration
│ ├── debug_litex.py
│ ├── debug_soc.py # Arty A7-100T reference SoC
│ └── debug_uart_test.py
├── README.md
├── LICENSE # Apache 2.0 + Commons Clause
└── .gitignore
cd sim/
make -f Makefile.cocotb sim-top # Run all tests

cd litex/
python3 debug_soc.py --build # Vivado synthesis + P&R
python3 debug_soc.py --load # Program via JTAG
litex_server --uart --uart-port /dev/ttyUSB1 --uart-baudrate 115200
python3 debug_uart_test.py # Run hardware regression

- Robust `resumeack`: latch on rising edge of `running_i` after `resumereq` (currently requires single-cycle response)
- OpenOCD configuration file for `riscv013` target
- Multi-hart support (`hasel`, hart array mask)
- `ndmreset` active control from AXI4-Lite CTRL register
- Formal verification of CDC FIFO (SVA properties)
- Access Memory abstract command (type 2)
- SBA address auto-increment (`sbreadonaddr` + `sbautoincrement`)
- SBA 64-bit addressing
- Trigger Module integration (hardware breakpoints)
- Authentication state machine
- ASIC-targeted synthesis scripts (SkyWater 130nm)
- Silicon-proven validation
| Differentiator | Detail |
|---|---|
| Full spec compliance | RISC-V Debug v0.13.2 — JTAG TAP, DMI, DM, SBA |
| Two clock domains | Proven gray-code CDC, not a clock-domain hack |
| Zero BRAM / DSP | 566 LUTs — fits alongside any RISC-V core |
| System Bus Access | Direct memory R/W without CPU involvement |
| Hardware-verified | 10/10 HW PASS on Arty A7-100T at 100 MHz |
| OpenOCD compatible | Standard JTAG TAP with IDCODE/DTMCS/DMI |
| Source-available | Full RTL included, not an encrypted netlist |
Licensed under Apache License 2.0 with Commons Clause.
- Non-commercial use (academic, research, hobby, education): Free
- Commercial use: Requires a Lumees Lab commercial license
See LICENSE for full terms.
Lumees Lab · Hasan Kurşun · lumeeslab.com
Copyright © 2026 Lumees Lab. All rights reserved.