
RISC-V Debug Module IP Core

Lumees Lab — FPGA-Verified, Production-Ready IP



Overview

The Lumees Lab RISC-V Debug Module IP Core is a complete implementation of the RISC-V External Debug Support Specification v0.13.2. It provides full debug access to RISC-V harts via a standard JTAG interface, enabling halt, resume, single-step, register/memory inspection, and program buffer execution — all controlled by industry-standard tools like OpenOCD and GDB.

The design spans two clock domains (JTAG TCK and system CLK) with a proven gray-code CDC async FIFO, and includes a System Bus Access (SBA) master for direct memory reads/writes without involving the CPU.

Unlike soft debug solutions that require CPU cooperation, this module operates independently — it can halt a hung processor, inspect crash state, and resume execution, making it essential for any production RISC-V SoC.


Key Features

| Feature | Detail |
|---|---|
| Specification | RISC-V External Debug Support v0.13.2 |
| JTAG TAP | IEEE 1149.1 compliant, 5-bit IR, IDCODE/DTMCS/DMI |
| DMI | 7-bit address, 32-bit data, gray-code CDC async FIFO |
| Hart Control | Halt, resume, single-step, halt-on-reset |
| Abstract Commands | Access Register (GPR x0–x31, CSRs), 32/64-bit transfer |
| Program Buffer | 4-word (16 bytes) for arbitrary instruction execution |
| System Bus Access | 8/16/32-bit direct memory R/W via bus master |
| Data Registers | 12 × 32-bit (data0–data11) for command data exchange |
| Authentication | Hardwired authenticated=1 (v1.0), extensible |
| Clock Domains | TCK (JTAG) + CLK (system), 2-FF gray-code CDC |
| Bus Interfaces | AXI4-Lite slave + SBA master, Wishbone B4 slave |
| Technology | Pure synchronous RTL, no vendor primitives |
| Language | SystemVerilog |

Performance — Arty A7-100T (XC7A100T) @ 100 MHz

Resource Utilization

| Resource | Full SoC | Core Alone | Available | SoC % |
|---|---|---|---|---|
| LUT | 953 | 566 | 63,400 | 1.50% |
| FF | 1,316 | 824 | 126,800 | 1.04% |
| DSP48 | 0 | 0 | 240 | 0% |
| Block RAM | 0 | 0 | 135 | 0% |

Timing: WNS = +1.194 ns @ 100 MHz. All endpoints met, 0 failing paths. Zero DSP and zero BRAM — the entire debug module fits in LUT/FF fabric.
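As a rough cross-check of the figures above, the positive slack can be turned into an estimated maximum clock frequency, and the utilization percentages can be recomputed from the raw counts. The sketch below assumes the usual relation f_max = 1 / (T_clk - WNS); the function names are illustrative, and the resulting frequency is an estimate, not a reported measurement.

```python
def fmax_mhz(clock_mhz: float, wns_ns: float) -> float:
    """Estimate f_max from the target clock and the worst negative slack."""
    period_ns = 1000.0 / clock_mhz          # 100 MHz -> 10 ns period
    critical_path_ns = period_ns - wns_ns   # slack eats into the period
    return 1000.0 / critical_path_ns


def utilization_pct(used: int, available: int) -> float:
    """Percentage of a device resource consumed."""
    return 100.0 * used / available


if __name__ == "__main__":
    print(f"estimated f_max: {fmax_mhz(100.0, 1.194):.1f} MHz")
    print(f"LUT utilization: {utilization_pct(953, 63_400):.2f} %")
    print(f"FF utilization:  {utilization_pct(1_316, 126_800):.2f} %")
```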


Architecture

                    JTAG pins (TCK, TMS, TDI, TDO, TRST_n)
                         │
                    ┌────┴────┐
                    │  debug  │  IEEE 1149.1 TAP
                    │  _jtag  │  5-bit IR, IDCODE/DTMCS/DMI
                    │         │  16-state FSM (TCK domain)
                    └────┬────┘
                         │ DMI req/resp
                    ┌────┴────┐
                    │  debug  │  Gray-code CDC async FIFO
                    │  _dmi   │  2-entry req + 2-entry resp
                    │         │  TCK ↔ CLK domain crossing
                    └────┬────┘
                         │ DMI req/resp (CLK domain)
                    ┌────┴────┐
                    │  debug  │  Register decode, abstract commands,
                    │  _dm    │  program buffer, hart control,
                    │         │  System Bus Access FSM
                    └────┬────┘
                    ╱          ╲
          ┌────────┘            └────────┐
     Hart Control              SBA Master
     (halt/resume/step)        (AXI4-Lite bus master)

JTAG TAP (debug_jtag): Full IEEE 1149.1 state machine with 16 states. Supports IDCODE (0x01), DTMCS (0x10), and DMI (0x11) scan registers. The DMI shift register is 41 bits (7-bit addr + 32-bit data + 2-bit op).
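The 41-bit DMI scan word can be illustrated with a small packing helper. This is a minimal sketch that assumes the field order given in the Debug Specification v0.13.2 (op in bits 1:0, data in bits 33:2, address from bit 34 upward); `pack_dmi` and `unpack_dmi` are hypothetical names, not functions from this repository.

```python
DMI_ABITS = 7                        # address width stated above
DMI_WIDTH = DMI_ABITS + 32 + 2       # 7 + 32 + 2 = 41 bits

# Request op codes defined by the Debug Specification.
OP_NOP, OP_READ, OP_WRITE = 0, 1, 2


def pack_dmi(addr: int, data: int, op: int) -> int:
    """Build the value shifted into the DMI data register."""
    if not 0 <= addr < (1 << DMI_ABITS):
        raise ValueError("address does not fit in DMI_ABITS")
    if not 0 <= data < (1 << 32):
        raise ValueError("data must be a 32-bit value")
    if not 0 <= op < 4:
        raise ValueError("op must be a 2-bit value")
    return (addr << 34) | (data << 2) | op


def unpack_dmi(word: int) -> tuple:
    """Split a captured DMI word into (addr, data, op)."""
    addr = (word >> 34) & ((1 << DMI_ABITS) - 1)
    data = (word >> 2) & 0xFFFF_FFFF
    op = word & 0x3
    return addr, data, op


if __name__ == "__main__":
    # Write dmactive=1 to DMCONTROL (DMI address 0x10).
    word = pack_dmi(0x10, 0x0000_0001, OP_WRITE)
    print(f"{word:#013x}", unpack_dmi(word))
```

On the response side the same two low bits carry the status (0 = success, 2 = failed, 3 = busy), so `unpack_dmi` can be reused for captured data.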

DMI CDC (debug_dmi): Bridges the TCK and CLK domains using a 2-deep gray-code async FIFO for requests and responses. Handles DMI busy/error status.

Debug Module (debug_dm): The core logic. It decodes DMI register reads/writes, manages hart control (halt/resume), executes abstract commands (Access Register), provides a 4-word program buffer, and drives the System Bus Access master.
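To make the Access Register abstract command concrete, the sketch below encodes the 32-bit value that a debugger would write to COMMAND. It assumes the field layout from the Debug Specification v0.13.2 (cmdtype 31:24, aarsize 22:20, postexec 18, transfer 17, write 16, regno 15:0) and the spec's register numbering (CSRs at 0x0000–0x0FFF, GPRs at 0x1000–0x101F). The helper names are hypothetical, and which optional fields this core honours is not stated here.

```python
AARSIZE = {32: 2, 64: 3}   # aarsize encodings for 32-bit and 64-bit transfers


def gpr(n: int) -> int:
    """Abstract register number of GPR x<n>."""
    if not 0 <= n <= 31:
        raise ValueError("GPR index must be 0..31")
    return 0x1000 + n


def access_register_cmd(regno: int, *, write: bool = False, size_bits: int = 32,
                        transfer: bool = True, postexec: bool = False) -> int:
    """Encode an Access Register command (cmdtype = 0)."""
    cmdtype = 0
    return ((cmdtype << 24)
            | (AARSIZE[size_bits] << 20)
            | (int(postexec) << 18)
            | (int(transfer) << 17)
            | (int(write) << 16)
            | (regno & 0xFFFF))


if __name__ == "__main__":
    print(f"read  x10: {access_register_cmd(gpr(10)):#010x}")
    print(f"write x10: {access_register_cmd(gpr(10), write=True):#010x}")
    print(f"read  dpc: {access_register_cmd(0x7B1):#010x}")   # dpc is CSR 0x7b1
```

For a read, the debugger writes the command and then collects the result from DATA0; for a write, it loads DATA0 first and then issues the command.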

CDC FIFO (debug_cdc_fifo): Generic parameterized async FIFO with gray-code pointers. Vendor-neutral, proven design.
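The reason gray-code pointers are safe to pass through a 2-FF synchronizer is that consecutive values differ in exactly one bit. The following behavioural sketch shows the conversions and the common empty/full comparison for a 2-entry FIFO. It assumes 2-bit pointers (one address bit plus one wrap bit) and the widely used "invert the top two bits" full test; the actual logic in `debug_cdc_fifo` may differ.

```python
PTR_BITS = 2   # assumed: 1 address bit + 1 wrap bit for a 2-entry FIFO


def bin_to_gray(n: int) -> int:
    """Binary to reflected gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Gray code back to binary by XOR-folding all right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def is_empty(rd_gray: int, wr_gray_synced: int) -> bool:
    """Read side: empty when both pointers are identical."""
    return rd_gray == wr_gray_synced


def is_full(wr_gray: int, rd_gray_synced: int) -> bool:
    """Write side: full when the pointers differ only in the top two bits."""
    return wr_gray == (rd_gray_synced ^ (0b11 << (PTR_BITS - 2)))


if __name__ == "__main__":
    for i in range(1 << PTR_BITS):
        nxt = (i + 1) % (1 << PTR_BITS)
        changed = bin_to_gray(i) ^ bin_to_gray(nxt)
        print(f"{i} -> {nxt}: gray {bin_to_gray(i):02b} -> {bin_to_gray(nxt):02b}, "
              f"bits changed = {bin(changed).count('1')}")
```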


DM Register Map (DMI Address Space)

| DMI Addr | Register | Description |
|---|---|---|
| 0x04–0x0F | DATA0–DATA11 | Abstract command data exchange (12 × 32-bit) |
| 0x10 | DMCONTROL | haltreq, resumereq, hartreset, dmactive, ndmreset |
| 0x11 | DMSTATUS | Hart state (halted, running, unavailable, havereset) |
| 0x12 | HARTINFO | Data register configuration (hardcoded) |
| 0x16 | ABSTRACTCS | cmderr, busy, progbufsize=4, datacount=12 |
| 0x17 | COMMAND | Write triggers abstract command execution |
| 0x18 | ABSTRACTAUTO | Auto-execute on data/progbuf access |
| 0x20–0x23 | PROGBUF0–3 | 4-word program buffer |
| 0x38 | SBCS | SBA control/status (sbaccess, sbusy, sberror) |
| 0x39 | SBADDRESS0 | SBA target address |
| 0x3C | SBDATA0 | SBA data (read/write triggers bus transaction) |
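The register map translates into short sequences of DMI writes. The sketch below builds DMCONTROL values, decodes ABSTRACTCS, and lists the writes for a hart halt and for a 32-bit System Bus Access write. Bit positions are taken from the Debug Specification v0.13.2 and are assumed to match this core; all helper names are hypothetical.

```python
# DMI addresses from the register map above.
DMCONTROL, ABSTRACTCS = 0x10, 0x16
SBCS, SBADDRESS0, SBDATA0 = 0x38, 0x39, 0x3C


def dmcontrol(*, dmactive: bool = True, haltreq: bool = False,
              resumereq: bool = False, hartreset: bool = False,
              ndmreset: bool = False) -> int:
    """Build a DMCONTROL value (haltreq=31, resumereq=30, hartreset=29,
    ndmreset=1, dmactive=0)."""
    return ((int(haltreq) << 31) | (int(resumereq) << 30)
            | (int(hartreset) << 29) | (int(ndmreset) << 1) | int(dmactive))


def decode_abstractcs(value: int) -> dict:
    """Extract the ABSTRACTCS fields listed in the table."""
    return {
        "progbufsize": (value >> 24) & 0x1F,
        "busy": bool((value >> 12) & 1),
        "cmderr": (value >> 8) & 0x7,
        "datacount": value & 0xF,
    }


def halt_sequence() -> list:
    """(address, data) DMI writes: activate the DM, then request a halt."""
    return [(DMCONTROL, dmcontrol()), (DMCONTROL, dmcontrol(haltreq=True))]


def sba_write32(addr: int, data: int) -> list:
    """(address, data) DMI writes for one 32-bit bus write."""
    sbcs = 2 << 17   # sbaccess = 2 selects 32-bit accesses
    return [(SBCS, sbcs),
            (SBADDRESS0, addr & 0xFFFF_FFFF),
            (SBDATA0, data & 0xFFFF_FFFF)]   # writing SBDATA0 starts the access


if __name__ == "__main__":
    for reg, val in halt_sequence() + sba_write32(0x8000_0000, 0x1234_5678):
        print(f"DMI write {reg:#04x} <- {val:#010x}")
    print(decode_abstractcs(0x0400_000C))
```

After the halt request, a debugger would poll DMSTATUS until the hart reports halted, and after the SBA write it would poll SBCS until the busy flag clears and check sberror.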

AXI4-Lite / Wishbone Register Map

| Offset | Register | Access | Description |
|---|---|---|---|
| 0x00 | CTRL | R/W | [0] = dmactive (via ndmreset) |
| 0x04 | STATUS | RO | [0] = halted, [1] = running, [2] = unavailable |
| 0x08 | INFO | RO | [6:0] = DMI_ABITS (7), [15:8] = DMI_VERSION (1) |
| 0x0C | VERSION | RO | IP version = 0x00010000 |
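Host software reading these registers over AXI4-Lite or Wishbone would typically decode the raw words as follows. This is a minimal sketch based only on the bit fields in the table; the function names are illustrative, and how the 32-bit values are fetched from the bus is left out.

```python
# Register offsets from the table above.
CTRL, STATUS, INFO, VERSION = 0x00, 0x04, 0x08, 0x0C
EXPECTED_VERSION = 0x0001_0000


def decode_status(value: int) -> dict:
    """STATUS: [0] halted, [1] running, [2] unavailable."""
    return {
        "halted": bool(value & 0x1),
        "running": bool(value & 0x2),
        "unavailable": bool(value & 0x4),
    }


def decode_info(value: int) -> dict:
    """INFO: [6:0] DMI_ABITS, [15:8] DMI_VERSION."""
    return {
        "dmi_abits": value & 0x7F,
        "dmi_version": (value >> 8) & 0xFF,
    }


if __name__ == "__main__":
    print(decode_status(0x2))     # a running hart
    print(decode_info(0x0107))    # DMI_VERSION = 1, DMI_ABITS = 7
    print(f"VERSION matches: {0x0001_0000 == EXPECTED_VERSION}")
```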

Interface — Bare Core (debug_core)

debug_core u_debug (
  .clk           (clk),
  .rst_n         (rst_n),
  // JTAG pins
  .tck           (jtag_tck),
  .tms           (jtag_tms),
  .tdi           (jtag_tdi),
  .tdo           (jtag_tdo),
  .trst_n        (jtag_trst_n),
  // Hart control
  .halt_req_o    (halt_req),
  .resume_req_o  (resume_req),
  .ndmreset_o    (ndmreset),
  .halted_i      (halted),
  .running_i     (running),
  .havereset_i   (havereset),
  .unavailable_i (unavailable),
  // Abstract command interface
  .cmd_valid_o   (cmd_valid),
  .cmd_data_o    (cmd_data),
  .cmd_ready_i   (cmd_ready),
  .cmd_rdata_i   (cmd_rdata),
  .cmd_resp_valid_i (cmd_resp_valid),
  .cmd_resp_err_i   (cmd_resp_err),
  // System Bus Access master
  .sba_req_o     (sba_req),
  .sba_we_o      (sba_we),
  .sba_addr_o    (sba_addr),
  .sba_wdata_o   (sba_wdata),
  .sba_size_o    (sba_size),
  .sba_gnt_i     (sba_gnt),
  .sba_rvalid_i  (sba_rvalid),
  .sba_rdata_i   (sba_rdata),
  .sba_err_i     (sba_err),
  // Info
  .core_version  (version)
);

Verification

Simulation (cocotb + Verilator)

| Test Suite | Tests | Status |
|---|---|---|
| JTAG TAP (test_debug_jtag) | IDCODE, DTMCS, DMI shift, state transitions | 7/7 PASS |
| Full-stack (test_debug_core) | DM activate, halt, resume, abstract cmd, SBA | 5/5 PASS |
| AXI4-Lite (test_debug_axil) | VERSION, INFO, STATUS, IRQ | 4/4 PASS |
| Wishbone (test_debug_wb) | VERSION, INFO, STATUS, IRQ | 4/4 PASS |

UVM Testbench

  • Full environment: agent, driver (JTAG bit-bang), monitor, scoreboard, coverage
  • Sequences: directed (halt/resume/SBA), random (50 DMI ops), stress (200 rapid ops)
  • Coverage: dmi_op × addr_range × response, halt_req × halted cross

FPGA Hardware (Arty A7-100T)

  • 10/10 UART regression tests at 100 MHz via LiteX SoC + UARTBone
  • VERSION, STATUS, TAP state, SBA idle, CTRL write/read, field consistency

Directory Structure

debug/
├── rtl/                          # 9 SystemVerilog files
│   ├── debug_pkg.sv              # Types, DMI addresses, FSM states
│   ├── debug_jtag.sv             # IEEE 1149.1 JTAG TAP controller
│   ├── debug_dmi.sv              # DMI layer + CDC (TCK → CLK)
│   ├── debug_cdc_fifo.sv         # Generic async FIFO (gray-code)
│   ├── debug_dm.sv               # Debug Module core (registers, cmds, SBA)
│   ├── debug_core.sv             # Integration: JTAG + DMI + DM
│   ├── debug_top.sv              # Top-level wrapper
│   ├── debug_axil.sv             # AXI4-Lite slave
│   └── debug_wb.sv               # Wishbone B4 slave
├── model/
│   └── debug_model.py            # Python golden model (TAP + DM)
├── tb/
│   ├── directed/                 # cocotb tests (4 suites, 20 tests)
│   │   ├── jtag_driver.py        # Reusable JTAG bit-bang driver
│   │   ├── test_debug_jtag.py
│   │   ├── test_debug_core.py
│   │   ├── test_debug_axil.py
│   │   └── test_debug_wb.py
│   └── uvm/                      # UVM testbench (11 files)
├── sim/
│   └── Makefile.cocotb           # One-command sim (Verilator + cocotb)
├── litex/                        # LiteX SoC integration
│   ├── debug_litex.py
│   ├── debug_soc.py              # Arty A7-100T reference SoC
│   └── debug_uart_test.py
├── README.md
├── LICENSE                       # Apache 2.0 + Commons Clause
└── .gitignore

Quick Start

Simulation

cd sim/
make -f Makefile.cocotb sim-top    # Run all tests

FPGA Build (Arty A7-100T)

cd litex/
python3 debug_soc.py --build       # Vivado synthesis + P&R
python3 debug_soc.py --load        # Program via JTAG
litex_server --uart --uart-port /dev/ttyUSB1 --uart-baudrate 115200
python3 debug_uart_test.py          # Run hardware regression

Roadmap

v1.1

  • Robust resumeack — latch on rising edge of running_i after resumereq (currently requires single-cycle response)
  • OpenOCD configuration file for riscv013 target
  • Multi-hart support (hasel, hart array mask)
  • ndmreset active control from AXI4-Lite CTRL register
  • Formal verification of CDC FIFO (SVA properties)
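The first roadmap item above, latching resumeack on a rising edge of running_i, can be pictured with a cycle-based behavioural model. This is only a sketch of one possible behaviour, not the planned RTL: the class and its `step` method are hypothetical, and clearing resumeack when resumereq is written follows the Debug Specification rather than anything stated in this README.

```python
class ResumeAckLatch:
    """Cycle-based model: remember a resume request until the hart runs."""

    def __init__(self) -> None:
        self.pending = False        # a resumereq is waiting for the hart
        self.resumeack = False      # value reported to the debugger
        self._prev_running = False  # previous-cycle running_i for edge detect

    def step(self, resumereq: bool, running: bool) -> bool:
        """Advance one clock cycle and return the resumeack value."""
        if resumereq:
            self.pending = True
            self.resumeack = False  # a new request clears the old ack
        rising = running and not self._prev_running
        if self.pending and rising:
            self.resumeack = True   # latched; held until the next request
            self.pending = False
        self._prev_running = running
        return self.resumeack


if __name__ == "__main__":
    latch = ResumeAckLatch()
    # resumereq pulses once; the hart takes three more cycles to start running.
    stimulus = [(True, False), (False, False), (False, False),
                (False, True), (False, True)]
    for cycle, (req, run) in enumerate(stimulus):
        print(cycle, latch.step(req, run))
```

Because the request is remembered in `pending`, the hart may respond any number of cycles later, which is the difference from a design that expects a single-cycle response.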

v1.2

  • Access Memory abstract command (type 2)
  • SBA address auto-increment (sbreadonaddr + sbautoincrement)
  • SBA 64-bit addressing
  • Trigger Module integration (hardware breakpoints)

v2.0

  • Authentication state machine
  • ASIC-targeted synthesis scripts (SkyWater 130nm)
  • Silicon-proven validation

Why Lumees Debug?

| Differentiator | Detail |
|---|---|
| Full spec compliance | RISC-V Debug v0.13.2: JTAG TAP, DMI, DM, SBA |
| Two clock domains | Proven gray-code CDC, not a clock-domain hack |
| Zero BRAM / DSP | 566 LUTs; fits alongside any RISC-V core |
| System Bus Access | Direct memory R/W without CPU involvement |
| Hardware-verified | 10/10 HW PASS on Arty A7-100T at 100 MHz |
| OpenOCD compatible | Standard JTAG TAP with IDCODE/DTMCS/DMI |
| Source-available | Full RTL included, not an encrypted netlist |

License

Licensed under Apache License 2.0 with Commons Clause.

See LICENSE for full terms.


Lumees Lab · Hasan Kurşun · lumeeslab.com

Copyright © 2026 Lumees Lab. All rights reserved.
