
Nios Vg Boot Copier Reference Design AXC3000 26.1

skravats edited this page May 28, 2026 · 12 revisions

Table of Contents

Quick Start Guide
Build Instructions

Quick Start Guide

This Quick Start Guide describes the architecture of this reference design. It also describes the hardware and software setup needed to run it.

  1. Overview
  2. Theory of Operation
  3. QSPI Flash Layout
  4. HyperRAM
  5. Install the Quartus Prime Pro Programmer
  6. Configure the Board for the Demo
  7. Flash the FPGA Configuration File
  8. Run the Demo

Overview

The Nios V/g processor is a soft-core RISC-V processor implemented across many Altera FPGA device families. Unlike a Hard Processor System (HPS), it has no dedicated boot ROM or autonomous boot hardware. The Agilex 3 family of devices uses a Secure Device Manager (SDM) to configure the fabric. When the SDM releases the processor from reset, the processor begins executing from wherever its reset vector points. This makes the boot strategy entirely the responsibility of the system designer.

A common and effective approach for larger embedded designs is the bootcopier pattern: a small, self-contained application pre-initialised into On-Chip Memory (OCM) as part of the FPGA bitstream, which runs immediately at reset and copies the main user application from external QSPI flash into fast external HyperRAM before jumping to it. This pattern is relevant for several practical reasons. OCM is limited in size and cannot hold a large application. HyperRAM offers far greater capacity for a full-featured embedded application, and QSPI flash provides non-volatile persistent storage that survives power cycles.

The bootcopier bridges all three: it lives in OCM, reads from flash via the SDM mailbox interface (the only path to QSPI available to the Nios V on Agilex 3), copies the application to HyperRAM, and hands off execution cleanly. Understanding how the bootcopier works — including the flash record format, the toolchain steps that produce it, and the cache coherency requirements when the Nios V/g instruction and data caches are enabled — is essential for any engineer deploying a larger Nios V-based application on Agilex 3.

Theory of Operation

Block diagram.

The reference design includes:

  • Nios® V/g Microcontroller
  • On-Chip RAM (64 KB)
  • SLL xSPI (HyperRAM) memory controller
  • System ID peripheral
  • Mailbox Client
  • Push-Button
  • RGB LED
  • Light Weight UART


Power-on sequence of events.

When power is applied to the Agilex 3 device, the SDM (Secure Device Manager) takes control of the boot process. The SDM is a dedicated hardened processor inside the device whose sole responsibility at this stage is to read the FPGA configuration bitstream from QSPI flash and program the FPGA fabric. Embedded within that bitstream is the pre-initialised OCM containing the bootcopier binary — so by the time the FPGA fabric is fully configured, the bootcopier code is already resident in OCM and ready to execute. The SDM then releases the Nios V/g processor from reset, at which point the processor begins executing from its reset vector, which points to the base of OCM.

The bootcopier runs entirely from OCM and has one job — to locate the user application image in QSPI flash, copy it to HyperRAM, and transfer execution to it. It communicates with the SDM via a mailbox interface to gain access to the QSPI flash, reads a structured image from a known offset in flash, and copies the application binary directly into HyperRAM. Once the copy is complete, the bootcopier performs the necessary cache maintenance operations to ensure the freshly written data is coherent with the instruction fetch path, then jumps to the application entry point in HyperRAM. From that point forward, the user application is in full control of the processor and the system.

Managing Cache

When the bootcopier copies the user application binary from QSPI flash into HyperRAM, the writes go through the Nios V/g data cache. Because the data cache operates in write-back mode, dirty cache lines — lines that have been written but not yet propagated to physical HyperRAM — may still be sitting in the cache when the copy loop completes. Physical HyperRAM may therefore not yet contain the correct application code. At the same time, the instruction cache has no knowledge of what the data cache has written and will fetch instructions directly from HyperRAM when the processor jumps to the application entry point. If the data cache has not been flushed before that jump, the instruction cache fetches stale or uninitialised memory, and the application fails to run correctly.

Three operations must be performed in order immediately before the jump:

First, alt_dcache_flush_all() is called to write every dirty data cache line back to physical HyperRAM. After this call, HyperRAM contains the complete and correct application binary.

Second, alt_icache_flush_all() is called to invalidate the entire instruction cache. This forces the processor to fetch fresh instructions from HyperRAM rather than from any speculatively cached lines that may have been loaded before the copy was complete.

Third, a fence.i instruction is executed. This is the RISC-V architectural instruction-fetch fence: it guarantees that instruction fetches following it on the same hart observe the data writes that preceded it. It also flushes the processor pipeline, discarding any prefetched instructions that may have been loaded from stale memory addresses.

Only after these three steps, performed in this order, is the jr jump to the application entry point guaranteed to execute the correct code.
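The ordering described above can be sketched as a single hand-off routine. This is a minimal illustration rather than the reference design's actual source: handoff(), FENCE_I(), JUMP_TO(), and the host-side stand-ins under the #else branch are hypothetical names introduced here. Only alt_dcache_flush_all(), alt_icache_flush_all(), fence.i, and jr come from the text.

```c
#include <stdint.h>
#include <string.h>

#ifdef __riscv
#include <sys/alt_cache.h>  /* HAL cache API named in the text */
#define FENCE_I()  __asm__ volatile ("fence.i" ::: "memory")
#define JUMP_TO(a) __asm__ volatile ("jr %0" :: "r"(a))
#else
/* Hypothetical host-side stand-ins: they only record the call order,
 * so the sequence can be checked without target hardware. */
static char call_log[8];
static void log_step(char c)
{
    size_t n = strlen(call_log);
    if (n < sizeof call_log - 1) {
        call_log[n] = c;
    }
}
static void alt_dcache_flush_all(void) { log_step('D'); }
static void alt_icache_flush_all(void) { log_step('I'); }
#define FENCE_I()  log_step('F')
#define JUMP_TO(a) ((void)(a), log_step('J'))
#endif

/* Hand control to the application copied into HyperRAM. */
static void handoff(uint32_t entry_point)
{
    alt_dcache_flush_all();  /* 1: write dirty data lines back to HyperRAM */
    alt_icache_flush_all();  /* 2: invalidate the instruction cache */
    FENCE_I();               /* 3: order instruction fetch after the writes */
    JUMP_TO(entry_point);    /* only now is the jump safe */
}
```

On the target, handoff() would be called with the entry point taken from the terminator record and would not return.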

QSPI Flash Layout


The diagram shows the three regions of the QSPI flash as seen after the quartus_pfg utility has generated and programmed the JIC file. The SDM region at the bottom of flash is always present and owned entirely by the SDM — the application designer does not place content there. The user application payload sits at 0x200000, fixed by the PAYLOAD_OFFSET constant in the bootcopier and the s_addr in the .pfg file. The FPGA SOF bitstream sits above it at 0x300000, grows upward toward higher addresses, and contains the FPGA fabric configuration with the OCM pre-initialised bootcopier embedded within it. The gap between 0x200000 and 0x300000 is a deliberate 1 MB separation ensuring the User Application has enough space to grow and not overwrite the contents of the FPGA SOF bitstream.
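The address arithmetic behind the 1 MB gap can be made explicit with a short sketch. PAYLOAD_OFFSET and the two addresses come from the text; BITSTREAM_OFFSET, payload_window_bytes(), and image_fits() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAYLOAD_OFFSET   0x200000u  /* user application payload (from the text) */
#define BITSTREAM_OFFSET 0x300000u  /* FPGA SOF bitstream (hypothetical name) */

/* Bytes available to the packaged user application before it would
 * reach the start of the FPGA bitstream. */
static uint32_t payload_window_bytes(void)
{
    return BITSTREAM_OFFSET - PAYLOAD_OFFSET;
}

/* True if a packaged image of image_bytes fits inside the window. */
static bool image_fits(uint32_t image_bytes)
{
    return image_bytes <= payload_window_bytes();
}
```

A check like this could run as a post-build step so that an application growing past the window is caught before the flash image is generated.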

Packaging the User Application into QSPI Flash

The user application is not stored raw in QSPI flash. It is wrapped in a simple record-based format by the elf2flash tool before being placed in flash. This format allows the bootcopier to locate, copy, and jump to the application without needing any knowledge of the ELF file format or linker internals — it only needs to understand the simple record structure.

The Record Structure

Each record consists of an 8-byte header followed by the data payload:

┌───────────────────────────────────────┐
│  Word 0 (+0x00) : Length  (4 bytes)   │
│  Word 1 (+0x04) : Dest    (4 bytes)   │
├───────────────────────────────────────┤
│  Data payload   (Length bytes)        │
└───────────────────────────────────────┘

Length is the number of bytes in the data payload. A length of zero is the terminator — it signals the end of the image and carries no data.

Dest is the destination address in DRAM where the bootcopier writes the payload. For the terminator record, the Dest field is repurposed as the application entry point — the address the bootcopier jumps to after copying is complete.

The bootcopier treats one further length value as an error condition: a length of 0xFFFFFFFF indicates erased or unprogrammed flash, and causes the bootcopier to halt rather than attempt a copy of uninitialised data.
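As a rough sketch, the header and its special length values could be modelled as follows. The type and function names are hypothetical, and little-endian field encoding is an assumption (it matches the RISC-V byte order, but the text does not state it).

```c
#include <stdint.h>

#define LEN_TERMINATOR 0x00000000u  /* end of image; Dest holds the entry point */
#define LEN_ERASED     0xFFFFFFFFu  /* erased or unprogrammed flash */

/* The 8-byte record header: Word 0 is Length, Word 1 is Dest. */
typedef struct {
    uint32_t length;
    uint32_t dest;
} record_header_t;

typedef enum { RECORD_DATA, RECORD_TERMINATOR, RECORD_ERASED } record_kind_t;

/* Assemble a 32-bit value from four bytes, least significant first
 * (assumed byte order). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the two header words from 8 raw flash bytes. */
static record_header_t decode_header(const uint8_t raw[8])
{
    record_header_t h;
    h.length = read_le32(raw);
    h.dest   = read_le32(raw + 4);
    return h;
}

/* Classify a header by its Length field. */
static record_kind_t classify(record_header_t h)
{
    if (h.length == LEN_ERASED) {
        return RECORD_ERASED;
    }
    if (h.length == LEN_TERMINATOR) {
        return RECORD_TERMINATOR;
    }
    return RECORD_DATA;
}
```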


The Complete Flash Image Layout

A typical single-section application produces two records in flash:

QSPI Flash @ PAYLOAD_OFFSET (0x200000):

 ┌────────────────────────────────────────────────┐
 │  +0x00  Length  = N  (e.g. 90,724 bytes)       │  ← Record 1 header
 │  +0x04  Dest    = 0x00000000  (DRAM base)      │
 ├────────────────────────────────────────────────┤
 │  +0x08  [N bytes of application binary]        │  ← Record 1 data
 │         .text, .rodata, .data sections         │
 ├────────────────────────────────────────────────┤
 │  +0x08+N  Length  = 0x00000000  (terminator)   │  ← Record 2 header
 │  +0x0C+N  Dest    = 0x000004D4  (entry point)  │
 └────────────────────────────────────────────────┘
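To make the layout above concrete, the following sketch assembles a two-record image in a memory buffer. It is not how elf2flash is implemented; build_image() and write_le32() are hypothetical helpers, and little-endian field encoding is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value as four bytes, least significant first
 * (assumed byte order). */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Build [Length][Dest][payload][0][entry] into out.
 * Returns the number of bytes written, or 0 if the payload is empty
 * or the image does not fit in out_cap bytes. */
static size_t build_image(uint8_t *out, size_t out_cap,
                          const uint8_t *payload, uint32_t len,
                          uint32_t dest, uint32_t entry)
{
    /* Two 8-byte headers (data record + terminator) are always needed. */
    if (len == 0u || out_cap < 16u || len > out_cap - 16u) {
        return 0;
    }
    write_le32(out, len);              /* Record 1: Length */
    write_le32(out + 4, dest);         /* Record 1: Dest */
    memcpy(out + 8, payload, len);     /* Record 1: data */
    write_le32(out + 8 + len, 0u);     /* Record 2: Length = 0 (terminator) */
    write_le32(out + 12 + len, entry); /* Record 2: Dest = entry point */
    return (size_t)len + 16u;
}
```

For the example figures in the diagram, a 90,724-byte payload would yield an image of 90,740 bytes.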

How the Bootcopier Processes the Records

The bootcopier maintains a pointer starting at PAYLOAD_OFFSET and walks through the records sequentially:

record_ptr = PAYLOAD_OFFSET (0x200000)
      │
      ▼
Read 8-byte header
      │
      ├── Length == 0xFFFFFFFF? ──► error() — flash not programmed
      │
      ├── Length == 0? ──────────► Dest = entry point — break out of loop
      │
      └── Length > 0? ──────────► Copy Length bytes from flash
                                   to Dest address in DRAM
                                   advance record_ptr by 8 + Length
                                   read next header
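The flow above can be sketched in C against in-memory buffers. In the real bootcopier the flash reads go through the SDM mailbox and the destination is HyperRAM; here both are modelled as plain arrays so the loop can be exercised on a host. All names are hypothetical, the bounds checks are an addition not described in the text, and little-endian headers are an assumption.

```c
#include <stdint.h>
#include <string.h>

typedef enum { BOOT_OK, BOOT_ERASED, BOOT_OUT_OF_RANGE } boot_status_t;

/* Assemble a 32-bit value from four bytes, least significant first. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the records starting at offset, copy each payload to dram + Dest,
 * and return the entry point through *entry when the terminator is found. */
static boot_status_t load_records(const uint8_t *flash, size_t flash_size,
                                  size_t offset,
                                  uint8_t *dram, size_t dram_size,
                                  uint32_t *entry)
{
    size_t ptr = offset;

    for (;;) {
        if (ptr > flash_size || flash_size - ptr < 8u) {
            return BOOT_OUT_OF_RANGE;        /* no room for a header */
        }
        uint32_t length = read_le32(flash + ptr);
        uint32_t dest   = read_le32(flash + ptr + 4);
        ptr += 8u;

        if (length == 0xFFFFFFFFu) {
            return BOOT_ERASED;              /* flash not programmed */
        }
        if (length == 0u) {
            *entry = dest;                   /* terminator: Dest is the entry point */
            return BOOT_OK;
        }
        if (length > flash_size - ptr ||
            dest > dram_size || length > dram_size - dest) {
            return BOOT_OUT_OF_RANGE;        /* payload would overrun a buffer */
        }
        memcpy(dram + dest, flash + ptr, length);
        ptr += length;                       /* header already skipped: 8 + Length in total */
    }
}
```

On the target, offset would be PAYLOAD_OFFSET (0x200000) and the returned entry point would be passed to the jump after the cache maintenance steps.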

The Entry Point Jump

After the loop exits on the zero-length terminator, the bootcopier retrieves the entry point from the terminator's Dest field and jumps to it:

DRAM after copy:

 0x00000000  ┌─────────────────────────────┐
             │  HAL startup / init code    │  ← first byte copied
             │  crt0, BSS clear, etc.      │
             │                             │
 0x000004D4  │  ► entry point ◄            │  ← jr jumps here
             │  main() or _start           │
             │                             │
             │  rest of application...     │
             │                             │
 0x00016263  └─────────────────────────────┘  ← last byte copied

The entry point is not necessarily the first byte of the copied image. The HAL places startup and initialisation code at the lowest address, and the linker places the application entry symbol (_start or main) at a higher offset. The elf2flash tool reads the ELF e_entry field — the official entry point declared by the linker — and stores it in the terminator record's Dest field, ensuring the bootcopier always jumps to exactly the right address regardless of where it falls within the image.
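For illustration, the e_entry field can be read directly from a 32-bit little-endian ELF header, where it sits at byte offset 24. The sketch below is not the elf2flash implementation; elf32_entry() is a hypothetical helper that only handles the ELFCLASS32, little-endian case.

```c
#include <stddef.h>
#include <stdint.h>

#define ELF32_E_ENTRY_OFFSET 24u  /* e_ident[16] + e_type + e_machine + e_version */

/* Extract e_entry from the first bytes of an ELF file.
 * Returns 0 on success, -1 if the buffer is not a 32-bit
 * little-endian ELF header. */
static int elf32_entry(const uint8_t *hdr, size_t len, uint32_t *entry)
{
    if (len < ELF32_E_ENTRY_OFFSET + 4u) {
        return -1;
    }
    if (hdr[0] != 0x7F || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F') {
        return -1;                         /* bad magic */
    }
    if (hdr[4] != 1 || hdr[5] != 1) {
        return -1;                         /* not ELFCLASS32 / ELFDATA2LSB */
    }
    const uint8_t *p = hdr + ELF32_E_ENTRY_OFFSET;
    *entry = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
             ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return 0;
}
```

For the example image shown above, this would return 0x000004D4, the value stored in the terminator record's Dest field.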

HyperRAM

What Is HyperRAM?

HyperRAM is a low-pin-count DRAM device that uses the HyperBus interface, originally developed by Cypress Semiconductor (now Infineon). It is designed for embedded systems that need significantly more RAM than on-chip memory can provide, but where a full DDR SDRAM controller would be too complex, too power-hungry, or too large in PCB footprint.

Synaptic Labs

Synaptic Labs developed the xSPI Memory Controller used in this reference design. See the Synaptic Labs website to learn more about their solution.

Interface Characteristics

HyperRAM connects to the FPGA using a very small number of signals compared to DDR:

Signal      Count   Purpose
CK / CK#    2       Differential clock
CS#         1       Chip select
RWDS        1       Read/write data strobe
DQ[7:0]     8       Bidirectional data bus
Total       12      vs ~40+ for DDR3

The bus is source-synchronous and double-data-rate on the data lines, giving an effective bandwidth of 2 x clock frequency x 8 bits. At 200 MHz this yields 400 MB/s — sufficient for a Nios V/g application processor with instruction and data caches providing the burst access pattern the interface needs.
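The peak figure quoted above follows from simple arithmetic: two transfers per clock, eight bits per transfer, divided by eight bits per byte. A small helper (hypothetical name) makes the calculation reusable for other clock rates. It gives the theoretical peak only and ignores command overhead and initial access latency.

```c
#include <stdint.h>

/* Theoretical peak transfer rate in MB/s for a double-data-rate bus:
 * 2 transfers per clock x bus_bits per transfer / 8 bits per byte. */
static uint32_t ddr_peak_mbytes_per_s(uint32_t clock_mhz, uint32_t bus_bits)
{
    return (2u * clock_mhz * bus_bits) / 8u;
}
```

With clock_mhz = 200 and bus_bits = 8 this gives 400, matching the 400 MB/s in the text.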

Nios V application execution is the primary use case in this system. HyperRAM provides the working memory into which the bootcopier copies the user application binary. With instruction and data caches on the Nios V/g, the bursty nature of cache-line fills maps well onto HyperRAM's pipelined access model, hiding the relatively high initial latency behind the cache hierarchy.

Virtues for use on AXC3000 Development Kit

Minimal pin count. 12 signals versus 40+ for DDR3 is a dramatic reduction. On an FPGA where I/O bank resources are shared with other peripherals, saving 30 pins is significant. It also simplifies PCB layout considerably — HyperRAM requires no complex impedance-matched differential pairs across long traces.

Simple controller. The HyperBus protocol is straightforward enough that the controller IP in Platform Designer is compact and consumes far less FPGA fabric than a DDR memory controller. This matters in applications where logic resources are at a premium.

No training or calibration. DDR interfaces require complex initialisation sequences, write levelling, read training, and ongoing calibration. HyperRAM requires none of this — it is ready to use after a simple initialisation sequence. This greatly simplifies bring-up and eliminates a common source of board-level debugging pain.

Single supply voltage. HyperRAM typically operates at 1.8 V, the same voltage as many FPGA I/O banks, avoiding the need for an additional power rail or level shifters.

Integrated DRAM and controller in one package. Unlike DDR which requires a separate external controller, HyperRAM packages the memory array and the interface logic together. The FPGA only needs to implement the HyperBus master side.

Small PCB footprint. HyperRAM is available in compact BGA packages as small as 5x5 mm, making it suitable for space-constrained designs where a DDR SODIMM or TSOP package would not fit.

For a Nios V/g processor running a typical embedded application — which is exactly the use case in this system — HyperRAM hits a practical sweet spot: enough capacity for a full application binary plus working data, low enough pin count and controller complexity to be practical in an FPGA design, and fast enough with caching to deliver adequate application performance.

Configure the Board for the Demo

The following components are required for the demo:

  • AXC3000 (TEI0131) development board
  • USB-C cable


Assemble the Hardware

  • Plug the USB-C cable into J9, the USB-C connector.

Flash the FPGA Configuration File

Open the Quartus Programmer

    Tools --> Programmer

Add the JTAG hardware if Hardware Setup shows 'No hardware':


    Press the Hardware Setup button
    Double-click USB Blaster III, then press Close


Flash the QSPI device

Select the A3CY100BM16A device in the topology diagram


    Edit --> Change File. Select axc3000_top.jic
    Tick the 'Program/Configure' check box
    Processing --> Start


This will take up to a minute to complete.

Run the Demo

Open a terminal

     Open Tera Term or an equivalent terminal program
     Set its baud rate to 115200
     Press the S1 button to configure the FPGA


At start-up, the application steps the RGB LED through 16 different colors. When the sequence completes, it reads the System ID peripheral and displays the value, then waits in an infinite while loop. Press S2 to invoke the Interrupt Service Routine (ISR), which changes the LED color.


Build Instructions

Release Contents
Prerequisites
Build the Reference Design

Release Contents

Latest Source Code Release Contents - Branches and Commit IDs

Component   Location                                              Branch   Tag/Commit ID
GHRD        https://github.com/ArrowElectronics/refdes-agilex3    master   QPDS26.1_REL_AGILEX3_REFDES/d8216b8ce7b41e912fec688bc8152cdec09ceebb

Prerequisites

  • Host machine running Windows or Linux.
  • Internet connection to download the tools and clone the repositories from GitHub. If you are behind a firewall, ask your system administrator to enable access to the Git repositories.
  • Quartus Prime Pro version 26.1 (with Nios V license)
  • Ashling RiscFree IDE for Intel FPGAs 26.1, if modifying software
  • Synaptic Labs xSPI IP License

Build the Reference Design

Open a niosv shell.

For Windows: Start --> Altera 26.1.xx Pro Edition --> Nios V Command Shell.

For Linux or WSL: Open a shell and then enter niosv-shell at the prompt.

Set up the Environment

    $ rm -rf agilex_3
    $ mkdir agilex_3
    $ cd agilex_3
    $ export TOP_FOLDER=$(pwd)

Clone the repository

    $ git clone -b QPDS26.1_REL_AGILEX3_REFDES https://github.com/ArrowElectronics/refdes-agilex3 refdes-agilex3
    $ cd refdes-agilex3/axc3000/niosv_qspi_hyperram_refdes

Build the Software

Create the Boot Copier Application

Create the Boot Copier BSP

    niosv-bsp \
      ./software/bootcopier_bsp/settings.bsp \
      --create \
      --qsys=./top_system.qsys \
      --quartus-project=./axc3000_top.qpf \
      --type=hal \
      --cpu-instance=niosv_system_0_niosv_g \
      --script=./software/bootcopier_bsp/settings.tcl

Create the project

    niosv-app \
      -b=software/bootcopier_bsp \
      -a=software/bootcopier_app \
      -s=software/bootcopier_app/mailbox_bootloader.c

Create the project Makefile

    cmake \
      -G "Unix Makefiles" \
      -DCMAKE_BUILD_TYPE=Release \
      -B software/bootcopier_app/build/release \
      -S software/bootcopier_app

Build the project

    cmake \
      --build software/bootcopier_app/build/release

Create the OCM payload

    elf2hex software/bootcopier_app/build/release/bootcopier_app.elf \
      -b 0x1000000 \
      -w 32 \
      -e 0x100ffff \
      -o software/bootcopier_app/build/release/niosv_system_0_bootcopier_ocm.hex

Create the User Application

Create the User BSP

    niosv-bsp \
      ./software/user_bsp/settings.bsp \
      --create \
      --qsys=./top_system.qsys \
      --quartus-project=./axc3000_top.qpf \
      --type=hal \
      --cpu-instance=niosv_system_0_niosv_g \
      --script=./software/user_bsp/settings.tcl

Create the project

    niosv-app \
      -b=software/user_bsp \
      -a=software/user_app \
      -s=software/user_app/user_application.c

Create the project Makefile

    cmake \
      -G "Unix Makefiles" \
      -DCMAKE_BUILD_TYPE=Release \
      -B software/user_app/build/release \
      -S software/user_app

Build the project

    make \
      -C software/user_app/build/release

Convert the ELF to S-record format, wrapping the payload in the record structure that the bootcopier expects

    elf2flash \
      --input software/user_app/build/release/user_app.elf \
      --output=software/user_app/build/release/user_app.srec \
      --epcs \
      --offset 0x200000

Convert to a hex file format for deployment to Flash

    riscv32-unknown-elf-objcopy \
      -I srec \
      -O ihex \
      software/user_app/build/release/user_app.srec \
      software/user_app/build/release/user_app.hex

Build the Hardware

Add pin assignments

    quartus_sh -t sources/axc3000_top.tcl

Compile the Quartus project

    quartus_sh --flow compile axc3000_top

The following file is created:

  • agilex_3/refdes-agilex3/axc3000/niosv_qspi_hyperram_refdes/output_files/axc3000_top.sof

Create the QSPI Flash Image

    quartus_pfg -c axc3000.pfg 

The following file is created:

  • agilex_3/refdes-agilex3/axc3000/niosv_qspi_hyperram_refdes/output_files/axc3000_top.jic

