Xyene/t258-cpu

T258

This project contains an implementation of a CPU in Verilog targeting a Cyclone II FPGA (though it works with few modifications on a Cyclone V); an assembler for the architecture; and a non-optimizing compiler for a small subset of C. It supports VGA output through a dedicated memory region.

(Image: a simple animation running on the T258)

The name comes from the project having been developed, in part, as a final project for the CSC258 course offered by the University of Toronto.

Design

I should preface all this by saying I'm not a hardware guy. This was my first time working with Verilog in any nontrivial capacity. A number of concessions were made in the process just to simplify the circuitry. Notably, the smallest unit of data is 16 bits, as opposed to 8 bits as on most (all?) modern architectures.

Overall, the CPU models a Harvard architecture. That is, the ROM bus is separate from the RAM bus, and instructions cannot be executed from RAM. Address and data buses are 16 bits wide.

12-bit color 160x120 VGA output is supported through writes to VRAM, positioned from $0000 to $4B00.
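As a rough illustration of the VRAM region, 160 × 120 = 19200 = $4B00, which is consistent with one 16-bit word per pixel. The sketch below assumes a row-major layout starting at $0000 and a 4-4-4 RGB packing in the low 12 bits of each word. Both are assumptions for illustration and are not confirmed by the Verilog source; the helper names `pixel_address` and `pack_rgb444` are hypothetical.

```python
# Sketch only: row-major layout and RGB444 packing are ASSUMPTIONS,
# not taken from the T258 sources.
WIDTH, HEIGHT = 160, 120
VRAM_BASE = 0x0000


def pixel_address(x, y):
    """Return the assumed VRAM word address of pixel (x, y)."""
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError("pixel out of range")
    return VRAM_BASE + y * WIDTH + x


def pack_rgb444(r, g, b):
    """Pack three 4-bit channels into the low 12 bits of a 16-bit word (assumed order: R, G, B)."""
    for channel in (r, g, b):
        if not 0 <= channel <= 0xF:
            raise ValueError("channel must fit in 4 bits")
    return (r << 8) | (g << 4) | b
```

Under these assumptions, the last pixel (159, 119) lands at $4AFF, the final word before $4B00.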

The stack is positioned at $4C00, and grows upwards.

Assembly

The assembly syntax is loosely based on that of the Zilog Z80.

All instructions are 16 bits wide. The Opcode column below lists the lower 8 bits of each instruction, in hexadecimal. If an instruction involves a single register, the register number is written to bits 15-11. If it involves two, bits 13-11 hold the first register and bits 10-8 the second. There are 7 registers available.
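The bit layout described above can be sketched in Python as follows. This is not the code from assembler.py; the function names are hypothetical, and numbering the registers 0 to 6 is an assumption based on there being 7 of them.

```python
# Sketch of the instruction word layout described in this section.
# Function names are hypothetical; register numbering 0..6 is an ASSUMPTION.
NUM_REGISTERS = 7


def _check_register(reg):
    if not 0 <= reg < NUM_REGISTERS:
        raise ValueError("register number out of range")


def encode_one_reg(opcode, reg):
    """Single-register form: opcode in bits 7-0, register starting at bit 11."""
    _check_register(reg)
    return (reg << 11) | (opcode & 0xFF)


def encode_two_reg(opcode, first, second):
    """Two-register form: first register in bits 13-11, second in bits 10-8."""
    _check_register(first)
    _check_register(second)
    return (first << 11) | (second << 8) | (opcode & 0xFF)
```

For example, with these assumptions `encode_two_reg(0x83, 1, 2)` (ADD with registers 1 and 2) gives 0x0A83, and `encode_one_reg(0x81, 3)` (INC register 3) gives 0x1881.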

An assembler is provided (assembler.py), which generates a rom.v file that must be built alongside the rest of the project.

Opcode Table
Opcode  Mnemonic        Operation
81      INC r0          r0++
82      DEC r0          r0--
83      ADD r0, r1      r0 += r1
85      SUB r0, r1      r0 -= r1
87      MUL r0, r1      r0 *= r1
88      OR r0, r1       r0 |= r1
89      AND r0, r1      r0 &= r1
8A      XOR r0, r1      r0 ^= r1
8B      CMP r0, r1      cmp(r0, r1)
8C      PUSH r0         sp++; ram[0x9000 + sp] = r0
8D      PUSH @const     sp++; ram[0x9000 + sp] = @const
8E      POP r0          r0 = ram[0x9000 + sp]; sp--;
8F      POP @const      r0 = ram[0x9000 + sp]; sp--;
90      JEQ @const      pc = Z ? @const : pc
91      JLE @const      pc = N ? pc : @const
92      JMP @const      pc = @const
93      LD r0, r1       r0 = r1
94      LD r0, (r1)     r0 = ram[r1]
95      LD (r0), r1     ram[r0] = r1
96      LD r0, @const   r0 = @const
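
To make the PUSH and POP rows concrete, here is a minimal Python model of the stack behaviour exactly as written in the Operation column (pre-increment on push, post-decrement on pop, base 0x9000). The class name `StackModel`, the initial `sp` of 0, and the 16-bit masking of stored values are assumptions for illustration; this is not derived from the Verilog.

```python
# Sketch of PUSH/POP as listed in the Operation column.
# StackModel, the initial sp value, and the masking are ASSUMPTIONS.
STACK_BASE = 0x9000  # base address used in the Operation column


class StackModel:
    def __init__(self):
        self.sp = 0    # assumed initial stack pointer
        self.ram = {}  # sparse model of RAM: address -> 16-bit word

    def push(self, value):
        """PUSH: sp++; ram[0x9000 + sp] = value"""
        self.sp += 1
        self.ram[STACK_BASE + self.sp] = value & 0xFFFF

    def pop(self):
        """POP: result = ram[0x9000 + sp]; sp--"""
        value = self.ram[STACK_BASE + self.sp]
        self.sp -= 1
        return value
```

Pushing 5 and then 7 stores them at 0x9001 and 0x9002 in this model; two pops return 7 and then 5, leaving `sp` back at 0.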

Transpiler

A transpiler from a subset of C ("TudorC") is implemented in transpiler.py. Some demo programs that can be compiled using it can be found in the demo directory, demonstrating the syntax supported.

To run it, you'll first need to pip install lark-parser. The grammar used can be found in tudorc.g.

Limitations & Known Issues

  • The compiler is a bottom-up, non-optimizing process. It was written in 2 hours, and the assembly it generates is far from optimal.
  • For assembly, a MIF (Memory Initialization File) could have (and should have) been used instead of generating hardcoded opcode arrays.
  • The finite state automaton is too slow, and cannot be clocked at 50MHz. A rate divider is introduced to scale down to 5MHz, but this causes clock delivery issues during synthesis. As a result, a number of opcode pairs cannot be executed in sequence without throwing the CPU into an undefined state. This also means that often, the code generated by the transpiler won't actually run correctly.

About

A simple RISC CPU implemented in Verilog, as well as a compilation toolchain for it.
