Nios Vg Boot Copier Reference Design AXC3000 26.1
Quick Start Guide
Build Instructions
This Quick Start Guide describes the architecture of this reference design. It also describes the hardware and software setup needed to run it.
- Overview
- Theory of Operation
- QSPI Flash Layout
- HyperRAM
- Install the Quartus Prime Pro Programmer
- Configure the Board for the Demo
- Flash the FPGA configuration file
- Run the Demo
The Nios V/g processor is a soft-core RISC-V processor available across many Altera FPGA device families. Unlike a Hard Processor System (HPS), it has no dedicated boot ROM or autonomous boot hardware. The Agilex 3 family of devices uses a Secure Device Manager (SDM) to configure the fabric. When the SDM releases the processor from reset, the processor begins executing from wherever its reset vector points. This makes the boot strategy entirely the responsibility of the system designer.
A common and effective approach for larger embedded designs is the bootcopier pattern: a small, self-contained application pre-initialised into On-Chip Memory (OCM) as part of the FPGA bitstream, which runs immediately at reset and copies the main user application from external QSPI flash into fast external HyperRAM before jumping to it. This pattern is relevant for several practical reasons. OCM is limited in size and cannot hold a large application. HyperRAM offers far greater capacity for a full-featured embedded application, and QSPI flash provides non-volatile persistent storage that survives power cycles.
The bootcopier bridges all three: it lives in OCM, reads from flash via the SDM mailbox interface (the only path to QSPI available to the Nios V on Agilex 3), copies the application to HyperRAM, and hands off execution cleanly. Understanding how the bootcopier works — including the flash record format, the toolchain steps that produce it, and the cache coherency requirements when the Nios V/g instruction and data caches are enabled — is essential for any engineer deploying a larger Nios V-based application on Agilex 3.
The reference design includes
- Nios® V/g Microcontroller
- On-Chip RAM (64 KB)
- SLL xSPI (HyperRAM) memory controller
- System ID peripheral
- Mailbox Client
- Push-Button
- RGB LED
- Lightweight UART

When power is applied to the Agilex 3 device, the SDM (Secure Device Manager) takes control of the boot process. The SDM is a dedicated hardened processor inside the device whose sole responsibility at this stage is to read the FPGA configuration bitstream from QSPI flash and program the FPGA fabric. Embedded within that bitstream is the pre-initialised OCM containing the bootcopier binary — so by the time the FPGA fabric is fully configured, the bootcopier code is already resident in OCM and ready to execute. The SDM then releases the Nios V/g processor from reset, at which point the processor begins executing from its reset vector, which points to the base of OCM.
The bootcopier runs entirely from OCM and has one job — to locate the user application image in QSPI flash, copy it to HyperRAM, and transfer execution to it. It communicates with the SDM via a mailbox interface to gain access to the QSPI flash, reads a structured image from a known offset in flash, and copies the application binary directly into HyperRAM. Once the copy is complete, the bootcopier performs the necessary cache maintenance operations to ensure the freshly written data is coherent with the instruction fetch path, then jumps to the application entry point in HyperRAM. From that point forward, the user application is in full control of the processor and the system.
When the bootcopier copies the user application binary from QSPI flash into HyperRAM, the writes go through the Nios V/g data cache. Because the data cache operates in write-back mode, dirty cache lines — lines that have been written but not yet propagated to physical HyperRAM — may still be sitting in the cache when the copy loop completes. Physical HyperRAM may therefore not yet contain the correct application code. At the same time, the instruction cache has no knowledge of what the data cache has written and will fetch instructions directly from HyperRAM when the processor jumps to the application entry point. If the data cache has not been flushed before that jump, the instruction cache fetches stale or uninitialised memory, and the application fails to run correctly.
Three operations must be performed in order immediately before the jump:
First, alt_dcache_flush_all() is called to write every dirty data cache line back to physical HyperRAM. After this call, HyperRAM contains the complete and correct application binary.
Second, alt_icache_flush_all() is called to invalidate the entire instruction cache. This forces the processor to fetch fresh instructions from HyperRAM rather than from any speculatively cached lines that may have been loaded before the copy was complete.
Third, a fence.i instruction is executed. This is the RISC-V instruction-fetch fence: it guarantees that instruction fetches following it on the same hart observe all data stores that precede it. It also flushes the processor pipeline, discarding any prefetched instructions that may have been loaded from stale memory.
Only after these three steps in this order can the jr jump to the application entry point be guaranteed to execute the correct code.
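The ordering above can be sketched in C as follows. This is a minimal sketch, not the bootcopier's actual source: the header name `sys/alt_cache.h`, the `FENCE_I` macro, and `jump_to_application` are assumptions made for illustration, and the host-side stand-ins exist only so the call order can be checked off-target.

```c
#include <stdint.h>

#if defined(__riscv)
/* Target build. Assumption: the HAL declares both flush calls in this header. */
#include <sys/alt_cache.h>
#define FENCE_I() __asm__ volatile ("fence.i" ::: "memory")
#else
/* Host build: hypothetical stand-ins that only record the call order. */
#include <assert.h>
static char handoff_log[8];
static int  handoff_len;
static void log_step(char step)
{
    assert(handoff_len < 7);            /* keep the log NUL-terminated */
    handoff_log[handoff_len++] = step;
}
static void alt_dcache_flush_all(void) { log_step('D'); }
static void alt_icache_flush_all(void) { log_step('I'); }
static void host_dummy_app(void)       { log_step('J'); }
#define FENCE_I() log_step('F')
#endif

typedef void (*app_entry_t)(void);

/* Make the copied image visible to instruction fetch, then jump to it. */
static void jump_to_application(uintptr_t entry)
{
    alt_dcache_flush_all();   /* 1. write dirty data-cache lines back to HyperRAM */
    alt_icache_flush_all();   /* 2. invalidate the instruction cache */
    FENCE_I();                /* 3. order instruction fetch after the stores */
    ((app_entry_t)entry)();   /* 4. transfer control; a real application never returns */
}
```

On the target, `entry` would be the address taken from the terminator record. On a host, calling `jump_to_application((uintptr_t)host_dummy_app)` leaves the sequence D, I, F, J in `handoff_log`.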

The diagram shows the three regions of the QSPI flash as seen after the quartus_pfg utility has generated the JIC file and it has been programmed into the flash. The SDM region at the bottom of flash is always present and owned entirely by the SDM; the application designer does not place content there. The user application payload sits at 0x200000, fixed by the PAYLOAD_OFFSET constant in the bootcopier and the s_addr in the .pfg file. The FPGA SOF bitstream sits above it at 0x300000, grows upward toward higher addresses, and contains the FPGA fabric configuration with the OCM pre-initialised bootcopier embedded within it. The gap between 0x200000 and 0x300000 is a deliberate 1 MB separation that gives the user application room to grow without overwriting the FPGA SOF bitstream.
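The 1 MB window can be expressed as a simple size check. The sketch below assumes only the two addresses given in the text; `BITSTREAM_OFFSET`, `PAYLOAD_WINDOW_BYTES`, and `image_fits` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAYLOAD_OFFSET   0x200000u  /* start of the user application image (from the text) */
#define BITSTREAM_OFFSET 0x300000u  /* start of the FPGA SOF bitstream (hypothetical name) */

static_assert(PAYLOAD_OFFSET < BITSTREAM_OFFSET,
              "payload region must sit below the bitstream");

/* Bytes available to the user application image: 0x100000 = 1 MB. */
#define PAYLOAD_WINDOW_BYTES (BITSTREAM_OFFSET - PAYLOAD_OFFSET)

/* True if an image of image_bytes (headers included) stays below the bitstream. */
static bool image_fits(uint32_t image_bytes)
{
    return image_bytes <= PAYLOAD_WINDOW_BYTES;
}
```

A check like this could run in a build script or a host-side test before the image is written to flash, so an oversized application is caught before it collides with the bitstream.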
The user application is not stored raw in QSPI flash. It is wrapped in a simple record-based format by the elf2flash tool before being placed in flash. This format allows the bootcopier to locate, copy, and jump to the application without needing any knowledge of the ELF file format or linker internals — it only needs to understand the simple record structure.
Each record consists of an 8-byte header followed by the data payload:
┌───────────────────────────────────────┐
│ Word 0 (+0x00) : Length (4 bytes) │
│ Word 1 (+0x04) : Dest (4 bytes) │
├───────────────────────────────────────┤
│ Data payload (Length bytes) │
└───────────────────────────────────────┘
Length is the number of bytes in the data payload. A length of zero is the terminator — it signals the end of the image and carries no data.
Dest is the destination address in DRAM where the bootcopier writes the payload. For the terminator record, the Dest field is repurposed as the application entry point — the address the bootcopier jumps to after copying is complete.
The bootcopier treats a length of 0xFFFFFFFF as an error condition: it indicates erased or unprogrammed flash, and the bootcopier halts rather than attempt to copy uninitialised data.
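The header fields and the special length values can be decoded as sketched below. This is an illustration under stated assumptions: the words are taken to be stored little-endian (the byte order the RISC-V core uses), and the type and function names are hypothetical rather than taken from the bootcopier source.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte record header as described above. */
typedef struct {
    uint32_t length;  /* payload size in bytes; 0 marks the terminator */
    uint32_t dest;    /* copy destination, or entry point in the terminator */
} boot_record_header_t;

typedef enum { RECORD_DATA, RECORD_TERMINATOR, RECORD_ERASED } record_kind_t;

/* Assemble a 32-bit word from four little-endian bytes (assumed byte order). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 8 raw header bytes read from flash. */
static boot_record_header_t decode_header(const uint8_t raw[8])
{
    assert(raw != 0);
    boot_record_header_t h = { read_le32(raw), read_le32(raw + 4) };
    return h;
}

/* Classify a header by its length field. */
static record_kind_t classify(const boot_record_header_t *h)
{
    if (h->length == 0xFFFFFFFFu) return RECORD_ERASED;     /* unprogrammed flash */
    if (h->length == 0u)          return RECORD_TERMINATOR; /* dest = entry point */
    return RECORD_DATA;
}
```

Reading the header byte by byte avoids unaligned 32-bit accesses, which matters when the header follows a payload whose length is not a multiple of four.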
A typical single-section application produces two records in flash:
QSPI Flash @ PAYLOAD_OFFSET (0x200000):
┌────────────────────────────────────────────────┐
│ +0x00 Length = N (e.g. 90,724 bytes) │ ← Record 1 header
│ +0x04 Dest = 0x00000000 (DRAM base) │
├────────────────────────────────────────────────┤
│ +0x08 [N bytes of application binary] │ ← Record 1 data
│ .text, .rodata, .data sections │
├────────────────────────────────────────────────┤
│ +0x08+N Length = 0x00000000 (terminator) │ ← Record 2 header
│ +0x0C+N Dest = 0x000004D4 (entry point) │
└────────────────────────────────────────────────┘
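To make the two-record layout concrete, the sketch below builds such an image in a memory buffer. It is not the elf2flash implementation; `build_image` and `write_le32` are hypothetical helpers, and little-endian word order is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit word as four little-endian bytes (assumed byte order). */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Build a data record followed by a terminator record.
 * Returns the number of bytes written, or 0 if out_cap is too small. */
static size_t build_image(uint8_t *out, size_t out_cap,
                          const uint8_t *app, uint32_t app_len,
                          uint32_t dest, uint32_t entry)
{
    /* Lengths 0 and 0xFFFFFFFF are reserved for terminator and erased flash. */
    assert(app_len != 0u && app_len != 0xFFFFFFFFu);

    size_t total = 8u + (size_t)app_len + 8u;
    if (out_cap < total) {
        return 0;
    }
    write_le32(out, app_len);               /* record 1: Length */
    write_le32(out + 4, dest);              /* record 1: Dest */
    memcpy(out + 8, app, app_len);          /* record 1: payload */
    write_le32(out + 8 + app_len, 0u);      /* record 2: Length = 0 (terminator) */
    write_le32(out + 12 + app_len, entry);  /* record 2: Dest = entry point */
    return total;
}
```

For the example figures above, a 90,724-byte application occupies 8 + 90,724 + 8 = 90,740 bytes of flash.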
The bootcopier maintains a pointer starting at PAYLOAD_OFFSET and walks through the records sequentially:
record_ptr = PAYLOAD_OFFSET (0x200000)
│
▼
Read 8-byte header
│
├── Length == 0xFFFFFFFF? ──► error() — flash not programmed
│
├── Length == 0? ──────────► Dest = entry point — break out of loop
│
└── Length > 0? ──────────► Copy Length bytes from flash
to Dest address in DRAM
advance record_ptr by 8 + Length
read next header
After the loop exits on the zero-length terminator, the bootcopier retrieves the entry point from the terminator's Dest field and jumps to it:
DRAM after copy:
0x00000000 ┌─────────────────────────────┐
│ HAL startup / init code │ ← first byte copied
│ crt0, BSS clear, etc. │
│ │
0x000004D4 │ ► entry point ◄ │ ← jr jumps here
│ main() or _start │
│ │
│ rest of application... │
│ │
0x00016263 └─────────────────────────────┘ ← last byte copied
The entry point is not necessarily the first byte of the copied image. The HAL places startup and initialisation code at the lowest address, and the linker places the application entry symbol (_start or main) at a higher offset. The elf2flash tool reads the ELF e_entry field — the official entry point declared by the linker — and stores it in the terminator record's Dest field, ensuring the bootcopier always jumps to exactly the right address regardless of where it falls within the image.
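The record walk described above can be sketched as a loop over an in-memory copy of the flash contents. This is a host-side illustration only: the real bootcopier reads flash through the SDM mailbox, whereas here `flash` and `dram` are plain buffers, `copy_records` is a hypothetical name, and little-endian header words are assumed. The bounds checks are additions for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RECORD_HEADER_SIZE 8u
#define LENGTH_ERASED      0xFFFFFFFFu

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the records starting at offset, copying each payload to dram + Dest.
 * Returns 0 and stores the entry point on success, -1 on any error. */
static int copy_records(const uint8_t *flash, size_t flash_size, size_t offset,
                        uint8_t *dram, size_t dram_size, uint32_t *entry)
{
    assert(flash != 0 && dram != 0 && entry != 0);

    for (;;) {
        if (offset > flash_size || flash_size - offset < RECORD_HEADER_SIZE) {
            return -1;                      /* header would run past the flash */
        }
        uint32_t length = read_le32(flash + offset);
        uint32_t dest   = read_le32(flash + offset + 4);
        offset += RECORD_HEADER_SIZE;

        if (length == LENGTH_ERASED) {
            return -1;                      /* flash not programmed */
        }
        if (length == 0u) {
            *entry = dest;                  /* terminator: Dest is the entry point */
            return 0;
        }
        if (length > flash_size - offset) {
            return -1;                      /* payload would run past the flash */
        }
        if (dest > dram_size || length > dram_size - dest) {
            return -1;                      /* payload would not fit in DRAM */
        }
        memcpy(dram + dest, flash + offset, length);
        offset += length;                   /* advance to the next header */
    }
}
```

The returned entry point is what the bootcopier would jump to once cache maintenance is complete.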
HyperRAM is a low-pin-count DRAM device that uses the HyperBus interface, originally developed by Cypress Semiconductor (now Infineon). It is designed for embedded systems that need significantly more RAM than on-chip memory can provide, but where a full DDR SDRAM controller would be too complex, too power-hungry, or too large in PCB footprint.
Synaptic Labs developed the xSPI Memory Controller used in this reference design.
HyperRAM connects to the FPGA using a very small number of signals compared to DDR:
| Signal | Count | Purpose |
|---|---|---|
| CK / CK# | 2 | Differential clock |
| CS# | 1 | Chip select |
| RWDS | 1 | Read/write data strobe |
| DQ[7:0] | 8 | Bidirectional data bus |
| Total | 12 | vs ~40+ for DDR3 |
The bus is source-synchronous and double-data-rate on the data lines, giving a peak bandwidth of 2 × clock frequency × 8 bits. At 200 MHz this yields 3,200 Mbit/s, or 400 MB/s, which is sufficient for a Nios V/g application processor whose instruction and data caches provide the burst access pattern the interface needs.
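The bandwidth arithmetic can be checked with a few lines of C. The function name is hypothetical, and the result is a theoretical peak that ignores command overhead, initial latency, and refresh.

```c
#include <assert.h>
#include <stdint.h>

/* Peak transfer rate in MB/s for a double-data-rate bus:
 * two transfers per clock, each moving bus_width_bits / 8 bytes. */
static uint32_t hyperbus_peak_mbytes_per_s(uint32_t clock_mhz, uint32_t bus_width_bits)
{
    assert(bus_width_bits % 8u == 0u);  /* whole bytes per transfer */
    return 2u * clock_mhz * (bus_width_bits / 8u);
}
```

With a 200 MHz clock and the 8-bit DQ bus, `hyperbus_peak_mbytes_per_s(200, 8)` evaluates to 2 × 200 × 1 = 400.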
Nios V application execution is the primary use case in this system. HyperRAM provides the working memory into which the bootcopier copies the user application binary. With instruction and data caches on the Nios V/g, the bursty nature of cache-line fills maps well onto HyperRAM's pipelined access model, hiding the relatively high initial latency behind the cache hierarchy.
Minimal pin count. 12 signals versus 40+ for DDR3 is a dramatic reduction. On an FPGA where I/O bank resources are shared with other peripherals, saving roughly 30 pins is significant. It also simplifies PCB layout considerably: apart from the clock pair, HyperRAM requires no impedance-matched differential pairs routed across long traces.
Simple controller. The HyperBus protocol is straightforward enough that the controller IP in Platform Designer is compact and consumes far less FPGA fabric than a DDR memory controller. This matters in applications where logic resources are at a premium.
No training or calibration. DDR interfaces require complex initialisation sequences, write levelling, read training, and ongoing calibration. HyperRAM requires none of this — it is ready to use after a simple initialisation sequence. This greatly simplifies bring-up and eliminates a common source of board-level debugging pain.
Single supply voltage. HyperRAM typically operates at 1.8 V, the same voltage as many FPGA I/O banks, avoiding the need for an additional power rail or level shifters.
Integrated DRAM and controller in one package. Unlike DDR which requires a separate external controller, HyperRAM packages the memory array and the interface logic together. The FPGA only needs to implement the HyperBus master side.
Small PCB footprint. HyperRAM is available in compact BGA packages as small as 5x5 mm, making it suitable for space-constrained designs where a DDR SODIMM or TSOP package would not fit.
For a Nios V/g processor running a typical embedded application — which is exactly the use case in this system — HyperRAM hits a practical sweet spot: enough capacity for a full application binary plus working data, low enough pin count and controller complexity to be practical in an FPGA design, and fast enough with caching to deliver adequate application performance.
The following components are required for the demo:
- AXC3000 (TEI0131) development board
- USB-C cable

- Plug the USB Cable into J9, the USB C connector.
- Download the FPGA jic image axc3000_top.jic
Open the Quartus Programmer
$ Tools --> Programmer
If Hardware Setup shows 'No Hardware', add the JTAG hardware:

$ Press the Hardware Setup button
$ Double click USB Blaster III. Press Close

Select the A3CY100BM16A device in the topology diagram

$ Edit --> Change File. Select axc3000_top.jic
$ Click the 'Program/Configure' check box.
$ Processing --> Start

This will take up to a minute to complete.
$ Open Tera Term or an equivalent terminal program
$ Set its baud rate to 115200
$ Press the S1 button to configure the FPGA

Initially, the code displays 16 different colors on the RGB LED. Once the sequence completes, it reads the System ID peripheral and displays its value, then spins in an infinite while loop. Press S2 to invoke the Interrupt Service Routine (ISR), which changes the LED color.

Release Contents
Prerequisites
Build the Reference Design
| Component | Location | Branch | Tag/Commit ID |
|---|---|---|---|
| GHRD | https://github.com/ArrowElectronics/refdes-agilex3 | master | QPDS26.1_REL_AGILEX3_REFDES/d8216b8ce7b41e912fec688bc8152cdec09ceebb |
- Host machine running Windows or Linux.
- Internet connection to download the tools and clone the repositories from GitHub. If you are behind a firewall, ask your system administrator to allow access to the Git repositories.
- Quartus Prime Pro version 26.1 (with Nios V license)
- Ashling RiscFree IDE for Intel FPGAs 26.1, if modifying software
- Synaptic Labs xSPI IP License
For Windows: Start --> Altera 26.1.xx Pro Edition --> Nios V Command Shell.
For Linux or WSL: Open a shell and then enter niosv-shell at the prompt.
$ rm -rf agilex_3
$ mkdir agilex_3
$ cd agilex_3
$ export TOP_FOLDER=$(pwd)
$ git clone -b QPDS26.1_REL_AGILEX3_REFDES https://github.com/ArrowElectronics/refdes-agilex3 refdes-agilex3
$ cd refdes-agilex3/axc3000/niosv_qspi_hyperram_refdes
Create the Boot Copier BSP
niosv-bsp \
./software/bootcopier_bsp/settings.bsp \
--create \
--qsys=./top_system.qsys \
--quartus-project=./axc3000_top.qpf \
--type=hal \
--cpu-instance=niosv_system_0_niosv_g \
--script=./software/bootcopier_bsp/settings.tcl
Create the Boot Copier application project
niosv-app \
-b=software/bootcopier_bsp \
-a=software/bootcopier_app \
-s=software/bootcopier_app/mailbox_bootloader.c
Create the project Makefile
cmake \
-G "Unix Makefiles" \
-DCMAKE_BUILD_TYPE=Release \
-B software/bootcopier_app/build/release \
-S software/bootcopier_app
Build the project
cmake \
--build software/bootcopier_app/build/release
Create the OCM payload
elf2hex software/bootcopier_app/build/release/bootcopier_app.elf \
-b 0x1000000 \
-w 32 \
-e 0x100ffff \
-o software/bootcopier_app/build/release/niosv_system_0_bootcopier_ocm.hex
Create the User BSP
niosv-bsp \
./software/user_bsp/settings.bsp \
--create \
--qsys=./top_system.qsys \
--quartus-project=./axc3000_top.qpf \
--type=hal \
--cpu-instance=niosv_system_0_niosv_g \
--script=./software/user_bsp/settings.tcl
Create the user application project
niosv-app \
-b=software/user_bsp \
-a=software/user_app \
-s=software/user_app/user_application.c
Create the project Makefile
cmake \
-G "Unix Makefiles" \
-DCMAKE_BUILD_TYPE=Release \
-B software/user_app/build/release \
-S software/user_app
Build the project
make \
-C software/user_app/build/release
Convert the ELF to S-record format, adding the wrapper that defines the record structure of the payload
elf2flash \
--input software/user_app/build/release/user_app.elf \
--output=software/user_app/build/release/user_app.srec \
--epcs \
--offset 0x200000
Convert to a hex file format for deployment to Flash
riscv32-unknown-elf-objcopy \
-I srec \
-O ihex software/user_app/build/release/user_app.srec software/user_app/build/release/user_app.hex
Add pin assignments
quartus_sh -t sources/axc3000_top.tcl
Compile the Quartus project
quartus_sh --flow compile axc3000_top
The following file is created:
- agilex_3/refdes-agilex3/axc3000/niosv_qspi_hyperram_refdes/output_files/axc3000_top.sof
Generate the JIC file
quartus_pfg -c axc3000.pfg
The following file is created:
- agilex_3/refdes-agilex3/axc3000/niosv_qspi_hyperram_refdes/output_files/axc3000_top.jic