M16C5x Soft-Core Microcomputer
Copyright (C) 2013-2017, Michael A. Morris firstname.lastname@example.org. All Rights Reserved.
Released under LGPL.
This project demonstrates the use of a PIC16C5x-compatible core as an FPGA-based processor. It implements the 12-bit instruction set, the timer 0 module, the pre-scaler, and the watchdog timer. The core provided here is compatible with the instruction set, but it is not a cycle-accurate model of any particular PIC microcomputer. As configured, the core supports single (1) cycle operation with internal block RAM serving as program memory.
Three I/O ports are supported, but they are accessed as external registers and buffers using a bidirectional data bus. The TRIS I/O control registers are similarly supported. Thus, the core's user is able to map the TRIS and I/O port registers in a manner appropriate to the intended application.
Read-modify-write operations on the I/O ports do not generate read strobes. Read strobes of the three I/O ports are generated only if the ports are being read using MOVF xxx,0 instructions. Similarly, the write enables for the three I/O ports are asserted whenever the ports are updated. This occurs during MOVWF instructions, or during read-modify-write operations such as XORF, MOVF, etc.
(Clarification: RMW instructions do not generate read strobes in order to prevent the read portion of the operation from advancing a FIFO or generating a clear flag strobe in a status register. Instead, the data from the addressed I/O port is read "quietly", and only during the write portion of the RMW instruction is a strobe generated to write the result back to the addressed I/O port. Thus, RMW operations such as XORF can be used with status and FIFOs without corrupting their state.)
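The difference between a strobed read and a "quiet" read can be illustrated with a small behavioral model. This is only a sketch, not the Verilog implementation: the `FifoPort` class and the `movf_to_w` and `xorf` helpers are hypothetical names introduced here, and the port is assumed to be backed by a FIFO whose head advances on each read strobe.

```python
class FifoPort:
    """Hypothetical I/O port backed by a FIFO (illustration only)."""

    def __init__(self, data):
        self.fifo = list(data)   # pending FIFO contents, head at index 0
        self.written = []        # values captured by write strobes

    def quiet_read(self):
        # No read strobe: return the head of the FIFO without advancing it.
        return self.fifo[0] if self.fifo else 0

    def strobed_read(self):
        # Read strobe asserted: return the head and advance the FIFO.
        return self.fifo.pop(0) if self.fifo else 0

    def strobed_write(self, value):
        # Write enable asserted: capture the 8-bit result.
        self.written.append(value & 0xFF)


def movf_to_w(port):
    """Model of MOVF port,0: the only kind of access that strobes a read."""
    return port.strobed_read()


def xorf(port, w):
    """Model of XORF port,1: quiet read, then a single write strobe."""
    result = port.quiet_read() ^ w
    port.strobed_write(result)
    return result
```

In this model, `xorf` leaves the FIFO contents untouched while `movf_to_w` removes the head entry, which mirrors the behavior described in the clarification above.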
The implementation of the core provided consists of several Verilog source files and memory initialization files:
M16C5Xv2.v - Top level module
P16C5x.v - PIC16C5x-compatible processor core
P16C5x_IDEC.v - ROM-based instruction decoder for PIC16C5x core
P16C5x_ALU.v - Arithmetic & Logic Unit for PIC16C5x core
M16C5x_SPI.v - High-Speed, FIFO-buffered SPI Master Interface
DPSFmnCE.v - Configurable Depth/Width LUT-based Synch FIFO
TF_Init.coe - Transmit FIFO Initialization file
RF_Init.coe - Receive FIFO Initialization file
SPIxIF.v - Configurable Master SPI I/F with clock Generator
M16C5x_UARTv2.v - UART with Serial Interface
SSPx_Slv.v - SSP-compatible Slave Interface
SSP_UART.v - SSP-compatible UART
re1ce.v - Rising Edge Clock Domain Crossing Synchronizer
DPSFmnCE.v - Configurable Depth/Width LUT-based Synch FIFO
UART_TF.coe - UART Transmit FIFO Initialization file
UART_RF.coe - UART Receive FIFO Initialization file
UART_BRG.v - UART Baud Rate Generator
UART_TXSM.v - UART Transmit State Machine (includes SR)
UART_RXSM.v - UART Receive State Machine (includes SR)
UART_RTO.v - UART Receive Timeout Generator
UART_INT.v - UART Interrupt Generator
M16C5x_Test.coe - M16C5x Test Program Memory Initialization File
M16C5x_Tst2.coe - M16C5x Test #2 Program Memory Initialization File
M16C5x_Tst3.coe - M16C5x Test #3 Program Memory Initialization File
M16C5x_Tst4.coe - M16C5x Test #4 Program Memory Initialization File
M16C5x.ucf - M16C5x User Constraint File
M16C5x.bmm - M16C5x Block RAM Memory Map File

Verilog testbench files are included for the processor core, the FIFO, and the SPI modules.

tb_M16C5Xv2.v - testbench for the soft-core processor module
tb_P16C5x.v - testbench for the processor core module
tb_DPSFmnCE.v - testbench for the LUT-based FIFO module
tb_SPIxIF.v - testbench for the SPI Master Interface module
Also provided are the MPLAB project and the source files used to create the memory initialization files for testing the microcomputer application. These files are found in the MPLAB subdirectory of the Code directory. An example CCS C project is also included that performs the same functions as the assembly language test programs included in the MPLAB subdirectory.
Finally, the configuration of the Xilinx tools used to synthesize, map, place, and route the design is captured in the TCL file:
M16C5x_3S50A.tcl - TCL file for XC3S50A-4VQG100I FPGA
Run this TCL script from within the TCL console of ISE, or examine it in a text editor, to set up the project files and to set the tools to the options used to achieve the results provided here.
A utility program has been added that converts MPLAB/CCS Intel Hex programming files into MEM files for use with the Xilinx Data2MEM utility. This speeds the process of incorporating program, data, and parameter values into block RAMs. The TCL script also incorporates the process parameter changes needed for Map/PAR/Bitgen to process the BMM file.

IH2MEM.c - Source code for Intel Hex to MEM utility
IH2MEM.exe - Windows Executable (32-bit)
M16C5x_Tst3.mem - M16C5x Test #3 Program Memory Data2Mem File
M16C5x_Tst4.mem - M16C5x Test #4 Program Memory Data2Mem File
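The kind of conversion performed by such a utility can be sketched as follows. This is not the IH2MEM source; it is a minimal illustration under several assumptions: only Intel Hex data (type 00) and end-of-file (type 01) records are handled, each 12-bit instruction word is stored as a little-endian byte pair, and the MEM output uses `@` followed by a hexadecimal word address with each word written as three hex digits. The function names `parse_intel_hex` and `to_mem_text` are hypothetical.

```python
def parse_intel_hex(text):
    """Return {byte_address: byte} from Intel Hex data records (type 00)."""
    memory = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue
        raw = bytes.fromhex(line[1:])
        # All bytes of a record, including its checksum, sum to 0 modulo 256.
        if sum(raw) & 0xFF != 0:
            raise ValueError("checksum error: " + line)
        count, addr, rectype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        if rectype == 0x01:       # end-of-file record
            break
        if rectype != 0x00:       # other record types are ignored in this sketch
            continue
        for offset, value in enumerate(raw[4:4 + count]):
            memory[addr + offset] = value
    return memory


def to_mem_text(memory, words_per_line=8):
    """Format byte pairs as 12-bit words (assumed layout, see lead-in)."""
    words = {}
    for byte_addr, value in memory.items():
        word_addr = byte_addr >> 1
        word = words.get(word_addr, 0xFFFF)
        if byte_addr & 1:
            word = (word & 0x00FF) | (value << 8)   # odd address: high byte
        else:
            word = (word & 0xFF00) | value          # even address: low byte
        words[word_addr] = word

    lines, run, previous = [], [], None

    def flush():
        for i in range(0, len(run), words_per_line):
            lines.append(" ".join(run[i:i + words_per_line]))
        run.clear()

    for word_addr in sorted(words):
        if previous is None or word_addr != previous + 1:
            flush()
            lines.append("@%04X" % word_addr)       # start of a contiguous run
        run.append("%03X" % (words[word_addr] & 0xFFF))
        previous = word_addr
    flush()
    return "\n".join(lines)
```

For example, `to_mem_text(parse_intel_hex(":04000000000A020CE4\n:00000001FF\n"))` yields an `@0000` address line followed by the two words `A00 C02`. The address units and word width expected by a particular BMM file may differ from what is assumed here.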
The primary objective of the M16C5Xv2 project is to synthesize a processor core, 4kW of program memory, a buffered SPI master, and two (2) buffered UARTs into a Xilinx XC3S200A-5VQG100I FPGA. The present implementation includes the P16C5x core, 4kW of program memory, a dual-channel SPI Master I/F, and two SSP-compatible UARTs supporting baud rates from 3M bps to less than 1200 bps using a clock frequency of 48.0000 MHz for the core and UART. (Note: synthesis and PAR allow an input frequency of 60 MHz for a -4 Spartan-3A FPGA.)
Using ISE 10.1i SP3, the implementation results for an XC3S200A-5VQ100I are as follows:
Number of Slice FFs:             772 of 3584   21%
Number of 4-input LUTs:         1671 of 3584   46%
Number of Occupied Slices:       907 of 1792   50%
Total Number of 4-input LUTs:   1722 of 3584   48%
    Logic:                      1325
    Route-Through:                51
    16x1 RAMs:                     8
    Dual-Port RAMs:              306
    32x1 RAMs:                    32
Number of BUFGMUXs:                3 of 24     12%
Number of RAMB16BWEs:              3 of 3     100%

Best Case Achievable:           14.902 ns (0.099 ns Setup, 0.535 ns Hold)

Design and verification are complete. Verification was performed using ISim, MPLAB, and a board with an XC3S200AN-4VQG100I FPGA, various oscillators, SEEPROMs, and RS-232/RS-485 transceivers.
In this release, the M16C5x has been synthesized, mapped, placed, routed, and used to configure an FPGA. The FPGA used for this initial test of the M16C5x was the XC3S200A-4VQG100I FPGA. The test program provided demonstrated that the M16C5x was executing the program in the same manner as simulated with the MPLAB simulator.
Using an external 14.7456 MHz oscillator, selected for use with the UART, square waves were generated by the core to illuminate external LEDs using the upper 6 bits of PortA. The square waves have the appropriate ratios, and the frequency of the fastest LED drive signal is ~4.753 kHz.
The clock generator multiplies the input frequency to 58.9824 MHz, which results in an effective instruction frequency of 29.4912 MHz because of the two-cycle nature of the core. The instruction loop is essentially 8 × (8 + 3 × 256), which equals 6208 cycles per LED toggle. The measured toggle frequency of the fastest LED is approximately equal to 29.4912 MHz / 6208, or 4.750 kHz.
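The arithmetic above can be checked with a few lines of code. The sketch below assumes the loop count expands as 8 × (8 + 3 × 256), which matches the stated 6208 cycles, and a clock multiplier of 4 (58.9824 MHz / 14.7456 MHz). The constant and function names are introduced here for illustration only.

```python
OSC_HZ = 14_745_600            # external oscillator frequency from the text
CLOCK_MULTIPLIER = 4           # 14.7456 MHz x 4 = 58.9824 MHz
CYCLES_PER_INSTRUCTION = 2     # two-cycle core in this release
LOOP_CYCLES = 8 * (8 + 3 * 256)  # assumed expansion of the loop count (6208)


def led_toggle_frequency_hz():
    """Expected toggle rate of the fastest LED drive signal, in Hz."""
    core_hz = OSC_HZ * CLOCK_MULTIPLIER
    instruction_hz = core_hz / CYCLES_PER_INSTRUCTION
    return instruction_hz / LOOP_CYCLES
```

The result is roughly 4750.5 Hz, which agrees with the 4.750 kHz figure and is close to the ~4.753 kHz observed on the board.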
Work will continue to verify the testbench results with the FPGA. The next release should include the UART, and test the ability of the core to send/receive data using the FIFOs at rates of 115,200 baud or greater.
In this release, the UART has been added. The SPI Master I/F function has also been updated to correct a fault in the framing of SPI Mode 3 frames with shift lengths greater than 1 byte. A correction, not fully tested or verified, was made to the P16C5x core to correct anomalous behavior for BTFSC/BTFSS instructions.
UART integrated with the Release 1.0 core. Verification of the integrated interface is underway.
Testing with an M16C5x core processor program assembled using MPLAB and ISim showed that polling of the UART status register to determine whether the transmit FIFO was empty or not (using the iTFE interrupt flag) would clear the generated interrupt flags before they had actually been captured and shifted in the SSP response to the core.
This indicated a clock domain crossing issue in the interrupt clearing logic. This release fixes that issue. Previous uses of the UART did not poll the USR, so this problem did not manifest itself in a reasonable amount of time, if ever. In other words, the synchronization fault has been present all along in the implementation, but the module's usage in the application (or testbench) did not present the conditions under which the fault manifests.
The correction required registering the USR data on the SSP clock domain, and qualifying the clearing of the interrupt flags on the basis of whether the flag is set in both domains when the USR is read. The addition of the register reduced the logic utilization, and only a small additional time delay was incurred. The resulting design is still able to fit into a Spartan 3A XC3S50A-4VQG100I FPGA.
Modified the UART Baud Rate Generator. Removed the fixed 16x12 ROM that provided the pre-scaler and divider constants for a fixed set of 16 baud rates. Added a 12-bit, write-only register, BRR - Baud Rate Register, that sets the baud rate by dividing down from a maximum of 1/16 of the processor clock. With a 58.9824 MHz oscillator, the baud rate can range from 3.6864 Mbps down to 900 bps. Set the default baud rate to 9600 for a 58.9824 MHz UART clock.
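The stated range is consistent with a divider of the form baud = clock / (16 × (BRR + 1)), since a 12-bit register then spans divisors of 1 to 4096. That encoding is an assumption inferred from the end points given above, not taken from UART_BRG.v, and the helper names below are hypothetical.

```python
def baud_for_brr(clk_hz, brr):
    """Baud rate for a 12-bit BRR value, assuming divisor = BRR + 1."""
    if not 0 <= brr <= 0xFFF:
        raise ValueError("BRR is a 12-bit register")
    return clk_hz / (16 * (brr + 1))


def brr_for_baud(clk_hz, baud):
    """Nearest BRR setting for a requested baud rate (same assumption)."""
    divisor = round(clk_hz / (16 * baud))
    if not 1 <= divisor <= 4096:
        raise ValueError("baud rate out of range for this clock")
    return divisor - 1
```

Under this assumption, a 58.9824 MHz clock gives 3,686,400 bps for BRR = 0, 900 bps for BRR = 4095, and `brr_for_baud(58_982_400, 9600)` returns 383.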
Utilization for an XC3S50A-4VQG100I FPGA is 100%. The 128 byte LUT-based receive FIFO can be reduced to accommodate some additional functions. Synthesis and MAP/PAR are able to implement the design. There is also some placeholder logic that can be used for other purposes.
Updated the soft-core so that the microcontroller can be parameterized from the top module. Changed the frequency multiplication from 4 to 5 in order to test operation at the frequency to which the UCF constrains the Map/PAR tools. The input clock is driven by a 14.7456 MHz oscillator, and the clock multiplier (DCM) generates 73.7280 MHz. The default baud rate, 9600, required that the default settings be adjusted. All other parameters remain the same.
Also added a Block RAM Memory Map file to the project. Utilized Xilinx's Data2MEM tool to insert modified program contents into the affected Block RAMs using MEM files derived from standard MPLAB outputs. A tutorial on this subject is being prepared and will be released on an associated Wiki soon.
Updated the soft-core microcomputer. Fixed the UART clock, Clk_UART, to twice the input frequency. This means that the UART operates with a fixed reference frequency unlike Release 2.2 where Clk_UART was set to the system clock frequency.
Also added asynchronous resets to several registers in the UART so that it would simulate correctly with ISim. Direct control of the UART prescaler and divider was previously untested in simulation. With the change made to the UART's baud rate generator, the reset/power-on values of these two logic functions were unknown. The unknowns, "X", propagated through the baud rate generator and prevented the simulator from resolving the state of the internal baud rate clock of the UART. Thus, although the rest of the circuits simulated as expected, the transmit shift register never shifted because an unknown signal level was applied on the bit clock.
Polling the UART's Receive Data Register (RDR) uncovered a race condition like that previously found and corrected with regard to polling the UART Status Register (USR). The correction required registering the RDR in the SCK clock domain, and qualifying the read enable pulse for the receive FIFO so that it is only generated if the Receive Rdy flag is present in the SCK clock domain. Otherwise, the Receive FIFO is not read, which prevents the inadvertent clearing of the FIFO empty flag.
Test Program 4, M16C5x_Tst4.asm, is used to test the receive signal path. Hyper-Terminal and Tera Term were used to send (without local echo) several large text files through the M16C5x UART. The test program polls the RDR, and if a character is received without error, then upper case characters are converted to lower case, and vice versa. Using a Keyspan Quad Port USB serial port adapter, characters were sent to the M16C5x at a rate of 921.6k baud, the highest programmable baud rate supported by the Keyspan device. The echo back to the terminal emulator appeared to be without error. (Note: the two-wire RS-232 mode of the UART was used for this test. The ADM3232 charge-pump RS-232 transceiver appeared to work well at this frequency. Some slew rate limiting is visible on an O-scope, but it appears to be tolerable. These tests were conducted while the core was operating at 117.9648 MHz.)
Converted the core to operate in a single cycle mode with the block RAM memories of the FPGA. Operating frequency, in a -4 Spartan 3A FPGA, is 60+ MHz. This rate is equivalent to the 117.9648 MHz reported above for Release 2.4. Some combinatorial path improvements were made to the processor core, P16C5x, by using wired-OR bus connections rather than explicit multiplexers. These improvements also provided some reductions in the resource utilization of the project.
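The case conversion performed on the echoed characters can be modeled in a few lines. This is a behavioral sketch only, not a translation of M16C5x_Tst4.asm; it assumes 7-bit ASCII input and toggles bit 5 of alphabetic characters, which is one common way to swap case. The function names are hypothetical.

```python
def swap_ascii_case(byte):
    """Swap the case of an ASCII letter; pass other bytes through unchanged."""
    is_upper = 0x41 <= byte <= 0x5A   # 'A'..'Z'
    is_lower = 0x61 <= byte <= 0x7A   # 'a'..'z'
    if is_upper or is_lower:
        return byte ^ 0x20            # bit 5 distinguishes the two cases
    return byte


def echo(data):
    """Model of the echo path: every received byte is returned case-swapped."""
    return bytes(swap_ascii_case(b) for b in data)
```

Sending a text file through such an echo path and comparing the result against a host-side case swap of the original file is one way to check a transfer for errors.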
Modified the BMM file to allow the MEM file data fields to be represented in natural order. In other words, unlike the previous release, the most significant nibble is the first (leftmost) character of each data word, and the least significant nibble is the last (rightmost) character in a data word. Also modified the utility provided that converts Intel Hex programming files into files compatible with the Xilinx Data2MEM utility program.
Cleaned up comments, added missing source HDL files for processor top and UART top for the dual UART configuration, and added file and testbench for Single Port Synchronous LIFO module.