Julian Kemmerer edited this page Mar 18, 2024 · 225 revisions

PipelineC

# Install 'pipelinec' executable
git clone https://github.com/JulianKemmerer/PipelineC.git
cd PipelineC/
export PATH=$PATH:$(pwd)/src
# Run blink example
pipelinec ./examples/blink.c

Example Output: (final product is VHDL/Verilog files)

██████╗ ██╗██████╗ ███████╗██╗     ██╗███╗   ██╗███████╗ ██████╗
██╔══██╗██║██╔══██╗██╔════╝██║     ██║████╗  ██║██╔════╝██╔════╝
██████╔╝██║██████╔╝█████╗  ██║     ██║██╔██╗ ██║█████╗  ██║     
██╔═══╝ ██║██╔═══╝ ██╔══╝  ██║     ██║██║╚██╗██║██╔══╝  ██║     
██║     ██║██║     ███████╗███████╗██║██║ ╚████║███████╗╚██████╗
╚═╝     ╚═╝╚═╝     ╚══════╝╚══════╝╚═╝╚═╝  ╚═══╝╚══════╝ ╚═════╝

Output directory: /pipelinec_output_blink.c_304105
================== Parsing C Code to Logical Hierarchy ================================
Parsing: /PipelineC/examples/blink.c
Preprocessing file...
Parsing C syntax...
Parsing non-function definitions...
Parsing derived fsm logic functions...
Doing old-style code generation based on PipelineC supported text patterns...
Parsing function logic...
Parsing function: blink
Elaborating pipeline hierarchies down to raw HDL logic...
Doing obvious logic trimming/collapsing...
Writing generated PipelineC code from elaboration to output directories...
Writing cache of parsed information to file...
================== Writing Resulting Logic to File ================================
Building map of combinatorial logic...
Writing log of integer math module instances: /pipelinec_output_blink.c_304105/integer_module_instances.log
Writing VHDL files for all functions (as combinatorial logic)...
Writing func blink ...
Writing func BIN_OP_EQ_uint25_t_uint25_t ...
Writing func MUX_uint1_t_uint1_t_uint1_t ...
Writing func MUX_uint1_t_uint25_t_uint25_t ...
Writing func UNARY_OP_NOT_uint1_t ...
Writing func BIN_OP_PLUS_uint25_t_uint1_t ...
Writing multi main top level files...
Writing the constant struct+enum definitions as defined from C code...
Writing clock cross definitions as parsed from C code...
Writing finalized comb. logic synthesis tool files...
Output VHDL files: /pipelinec_output_blink.c_304105/vhdl_files.txt

What is PipelineC in < 10 mins? See the overview presentation (with slides and paper).

Tools:

Currently Supported Tools (Linux only):
Synthesis: 
  Xilinx Vivado, 
  Intel Quartus, 
  Lattice Diamond, 
  GHDL+Yosys+nextpnr,
  Gowin EDA, 
  Efinix Efinity, 
  PyRTL Models
Simulation: 
  Modelsim, 
  Verilator,
  cocotb,
  CXXRTL, 
  EDAPlayground

Quick Start

Pure functions can be pipelined!

Quickly render basic un-pipelined combinatorial logic VHDL:

pipelinec ./examples/pipeline.c --comb

To produce a pipeline of a user-selected N clock cycles (N+1 total stages), first edit the example pipeline.c.

In the code, specify your target FPGA part with a PART pragma and make sure the matching tools are installed (or install the free PyRTL Python package for experimental ASIC timing models and provide no PART pragma).

Ex. #pragma PART "LFE5UM5G-85F-8BG756C" for ghdl+yosys+nextpnr ECP5U flow.

Then run this command:

pipelinec ./examples/pipeline.c --coarse --sweep --start N --stop N

To produce a pipeline that meets timing at operating frequency F:

Open and edit pipeline.c to also specify the target frequency:

Ex. #pragma MAIN_MHZ my_pipeline F says the my_pipeline function is a single top level MAIN function intended to run at F MHz.

Since my_pipeline is a pure function, the PipelineC tool will autopipeline it to meet the target operating frequency.

pipelinec ./examples/pipeline.c # Default no-arguments autopipelines when possible.
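
As a rough sketch of what such an edited file might look like: the real examples/pipeline.c is not reproduced here, so the function body and the 100.0 MHz figure are illustrative assumptions. The standard <stdint.h> include is a stand-in so the snippet also compiles with an ordinary C compiler; PipelineC sources include "uintN_t.h" as in the blink example.

```c
#include <stdint.h> // stand-in for "uintN_t.h" (assumption, for ordinary C compilers)

// Target part, copied from the ECP5 example pragma above
#pragma PART "LFE5UM5G-85F-8BG756C"
// Hypothetical target frequency: 100.0 MHz is chosen only for illustration
#pragma MAIN_MHZ my_pipeline 100.0

// A pure function: no static or global state and a single return.
// The arithmetic itself is illustrative only.
uint32_t my_pipeline(uint32_t a, uint32_t b, uint32_t c)
{
  uint32_t product = a * b;
  uint32_t result = product + c;
  return result;
}
```

An ordinary C compiler ignores the two pragmas, which is what makes it possible to exercise the same function in a software test before running the tool.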

Simulate using EDAPlayground right now!

pipelinec ./examples/edaplay.c --comb --sim --edaplay
# Drag output files into EDAPlayground window

Try simulating using Verilator (or CXXRTL)!

pipelinec ./examples/verilator/blink.c --comb --sim --verilator
# Template 10 cycle simulation will compile+run

Try simulating using cocotb+GHDL!

pipelinec ./examples/cocotb/blink.c --comb --sim --cocotb --ghdl
# Template 10 cycle simulation will compile+run

Everyone wants to blink LEDs right? (or "more quick examples")

Check out this very old intro video about blinking LEDs.

// Example foo() (named blink here): blinks an LED using counter state registers
#include "uintN_t.h"  // uintN_t types for any N
// 'Executes' every 5ns (200MHz)
#pragma MAIN_MHZ blink 200.0
uint1_t blink()
{
  // Count to 200000000 iterations * 5ns each = 1sec
  static uint28_t counter;
  // LED on off state
  static uint1_t led;

  // If reached 1 second
  if(counter==(200000000-1))
  {
    // Toggle led
    led = !led;
    // Reset counter
    counter = 0;
  }
  else
  {
    counter += 1; // one 5ns increment
  }
  return led;
}
// Example generated state machine for blinking four leds
uint4_t main()
{
  // a 28 bits unsigned integer register
  uint28_t counter = 0;
  while (1) {
    // LEDs updated every clock
    // with the 4 most significant bits
    uint4_t led = counter(27, 24);
    counter = counter + 1;
    __out(led); // Makes loop body take 1 clock
  }
}
// Example bar() which is a pure function
// that can be pipelined to a specified fmax
float bar(float a, float b)
{
  float c = a/b;
  float d = b*a;
  ...
  return z;
}
// Pseudo code of a multi cycle feed forward dataflow pipeline
// Composed of
//     pure,variable compile time latency,autopipelined pipelines
//     static,stateful func calls of single clock cycles
multi_cycle_pipeline(i)
{
   a = pipeline1(i)
   b = a_single_clk_w_stateful_static(a)
   c = pipeline2(b)
   ...
   return c
}

Overview


Functions = combinatorial logic to be pipelined (a single C function describes an N>=0 clock pipeline). Pure functions, ex. bar() above or my_func below, can be pipelined to 'arbitrary' N>0 clock cycle pipelines. If a function is marked with a MAIN pragma, then its inputs and return value become top level input and output ports.

Figure: autopipelining example (my_func)

Static local variables = registers. Once a function uses a register (ex. foo() above, or my_counter below), N=0: the function now describes a "stateful function" of combinatorial logic and registers, much like a process in HDL. Generally speaking, isolate and minimize your use of static local variables for higher operating frequencies. Volatile static local variables = registers attached to regular N>0 clock pipeline logic.
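
A minimal sketch of a stateful function, written so it can also be checked as ordinary C. The name enabled_counter, its enable input, and the uint1_t typedef are assumptions made for this illustration; they are not taken from the PipelineC examples.

```c
#include <stdint.h>

// Stand-in for PipelineC's uint1_t so this compiles as ordinary C (assumption)
typedef uint8_t uint1_t;

// Hypothetical stateful function: an 8-bit counter with an enable input.
// In a software test, each call models one clock cycle.
uint8_t enabled_counter(uint1_t enable)
{
  // The register, initial value 0
  static uint8_t count;
  // Value of the register as seen during this cycle
  uint8_t current = count;
  if(enable)
  {
    // Next value of the register
    count += 1;
  }
  return current;
}
```

Calling the function repeatedly with enable set returns 0, 1, 2, and so on, and a call with enable cleared returns the held value without advancing it.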

Figure: static local variable example (my_counter)

'Invocation is instantiation' is the default behavior of function calls. Each function call location is a new instance of the function's module.

Figure: a three-iteration loop instantiates the module three times.
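
The three-iteration case can be sketched as below. The names add_sample and sum3 are hypothetical, and the code has only been reasoned about as ordinary C; the claim that each unrolled call becomes its own module instance comes from the rule stated above.

```c
#include <stdint.h>

// Hypothetical pure helper: each call site becomes its own hardware instance
uint16_t add_sample(uint16_t acc, uint8_t sample)
{
  return acc + sample;
}

// Three loop iterations: after unrolling there are three add_sample call sites,
// so three chained instances of the add_sample module
uint16_t sum3(uint8_t s0, uint8_t s1, uint8_t s2)
{
  uint8_t samples[3];
  samples[0] = s0;
  samples[1] = s1;
  samples[2] = s2;
  uint16_t acc = 0;
  // Loop iterator declared before the loop
  uint32_t i;
  for(i = 0; i < 3; i += 1)
  {
    acc = add_sample(acc, samples[i]);
  }
  return acc;
}
```

Because add_sample is pure, the per-call-site instances behave the same as repeated calls in software. A helper with static local variables would differ: every call site would own a separate copy of the state.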

Global variables work similarly to static local variables but can also be used as a mechanism for moving data between functions. Multiple locations can read a global variable, but there can be only one instance of a function that writes to it. If a global variable is used in multiple clock domains, then special clock crossing functionality is required. Functions that maintain state are called stateful functions and cannot be autopipelined.
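
A small sketch of the one-writer, many-readers pattern, assuming a single clock domain. The variable brightness and all three function names are hypothetical, and the snippet is checked only as ordinary C.

```c
#include <stdint.h>

// Hypothetical global register shared between functions
uint8_t brightness;

// The single writer: only one instance of this function may exist in a design
void set_brightness(uint8_t value, uint8_t write_enable)
{
  if(write_enable)
  {
    brightness = value;
  }
}

// Readers: any number of functions may read the global
uint8_t led_is_bright()
{
  return brightness > 127;
}

uint8_t half_brightness()
{
  return brightness >> 1;
}
```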

Arrays in code have a specific meaning/hardware implementation.

Complex 'clock-by-clock' derived state machines can be written using functions like the __clk() clock step operator in FSM style code.

In regular (non-FSM-style) code, the user does not see a clock signal; it is implied by the specified or inferred clock domains. Clock enable and reset signals are written like any other input signal. They can behave unexpectedly when combined with autopipelining, but work fine with HDL-like stateful functions that are not autopipelined.

C isn't a great hardware description language in itself. To bridge the gap between C and traditional HDLs, some functionality is auto generated for you, providing very basic 'template'-like functionality for common needs (ex. bit manipulation).

PipelineC can replace VHDL/Verilog almost entirely. However, if the need arises there are hooks for writing arbitrary VHDL instead of PipelineC code.

Here is the short list of rules for writing PipelineC. It's regular C except...

  1. 'Invocation is instantiation'. Each place a function appears in code is a new instance of the function/module (including any static local variables the function uses, unlike regular C). The same function instance cannot be reentered twice in a single iteration. No recursion.
  2. No dynamic memory. No pointers.
  3. If it can execute in parallel, it will be implemented as parallel hardware. Can explicitly code sequentially as needed.
  4. Only loops that can be unrolled can be pipelined by the tool. (Otherwise the loop is likely a finite state machine.)
  5. A function to be pipelined must have one return statement (or none). Non-pipelined state machines can have multiple return statements.
  6. No real standard libraries. (some headers are faked, ideally gcc compatible for 'simulation')
  7. Reserved keywords are a union of both C and VHDL keywords for now...
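
As an illustration of rule 5, here is one way to restructure early returns into a single return. The clamp function is a hypothetical example, not taken from the PipelineC sources.

```c
#include <stdint.h>

// Regular C style with early returns (per rule 5, not suitable for pipelining):
//   if(x < lo) return lo;
//   if(x > hi) return hi;
//   return x;

// Single-return form of the same logic
int16_t clamp(int16_t x, int16_t lo, int16_t hi)
{
  int16_t result = x;
  if(x < lo)
  {
    result = lo;
  }
  else if(x > hi)
  {
    result = hi;
  }
  return result;
}
```

Holding the answer in one result variable also fits the dataflow view of rule 3: both comparisons can be evaluated in parallel and the if/else chain simply selects among the candidates.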

The PipelineC tool is pure Python other than calls to the synthesis+simulation tools. See how to setup and run the tool.

PipelineC is synthesized into hardware so we can't avoid talking hardware for long.

Hardware modules have input ports. Input ports are function arguments. The function's return value is the single output port. Return multiple outputs as a struct.
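
A short sketch of the struct approach to multiple outputs. The type min_max_t and the function min_max are hypothetical names chosen for this example.

```c
#include <stdint.h>

// Two outputs bundled into one return value
typedef struct min_max_t
{
  uint8_t min;
  uint8_t max;
} min_max_t;

// Inputs a and b are the input ports; the returned struct is the output port
min_max_t min_max(uint8_t a, uint8_t b)
{
  min_max_t result;
  result.min = a;
  result.max = b;
  if(a > b)
  {
    result.min = b;
    result.max = a;
  }
  return result;
}
```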

Each function describes comb. logic (possibly to be auto-pipelined) that exists in a single clock domain. Functions marked with pragmas like MAIN_MHZ are single instance top level design modules (i.e. where you can have board connections). The clock domain for most functions is inferred from use within frequency-specified MAIN functions.

The comb. logic body of a PipelineC function is synthesized to a hardware pipeline: a sequential series of combinatorial logic stages separated by registers.

// Simple example of math pipeline
float main(float x1, float x2, float y1, float y2)
{
   float x_sum;
   x_sum = x1 + x2;
   float y_sum;
   y_sum = y1 + y2;
   return x_sum + y_sum;
}

The above example instantiates 3 floating point adders: two in parallel, and a third for the return. You can think of the PipelineC main function as executing over and over again in a loop, each time getting a new set of inputs. The bodies of PipelineC functions are data flow graphs.

Examples: Please see the examples table of contents page.

PipelineC can generate a hardware pipeline for almost any operating frequency by increasing the depth/latency of the pipeline. All functions (including floating point operations) are broken down into subpipelines, thus allowing for fine grained control of synthesis results.

Any function that follows these rules should work - only the most basic C stuff works for now: (as you go down the list this becomes a 'known issues' section rather than permanent rules, also see issues)

  1. Only u/intN_t types, float, char, enums, structs, and arrays are supported. (I mean any N - ex. uint13_t is OK).
  2. No dynamic memory allocation. *Fundamentally non existent feature - implies a memory model. Unless you want to implement POSIX malloc together :)
  3. Autopipelining only loops that can be unrolled (need to manually code up unbounded/variable runtime finite state machine behavior for now). Loop iterators must be declared before the loop.
  4. No case statements, only IFs. *Will implement in time
  5. Autopipelining requires only a single return statement (or none) per function.
  6. No increment or decrement (ex. ++) operators. *Use +=1
  7. Only casting to+from float<->integers is supported. *Can implement others if needed?
  8. Modulo(%) only for floating point *Will implement integer similar to integer division in time
  9. Unions are not yet supported. *Will implement in time
  10. Avoid putting a bunch of stuff on a single line of code. The C parser is a little loose with its accuracy on column numbers when identifying code locations. *Think I have a working fix but don't push your luck
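
A few of the items above can be worked around as sketched below: an if/else chain in place of case statements (item 4), += 1 in place of ++ (item 6), and a loop iterator declared before an unrollable loop (item 3). The functions alu and popcount8 are hypothetical, and the code has been checked only as ordinary C; whether the tool accepts every construct used here (for example the shift by the loop iterator) is an assumption.

```c
#include <stdint.h>

// Hypothetical opcode decoder written as an if/else chain instead of switch/case
uint8_t alu(uint8_t op, uint8_t a, uint8_t b)
{
  uint8_t result = 0;
  if(op == 0)
  {
    result = a + b;
  }
  else if(op == 1)
  {
    result = a - b;
  }
  else if(op == 2)
  {
    result = a & b;
  }
  else
  {
    result = a | b;
  }
  return result;
}

// Counts set bits with a fixed-bound loop that can be fully unrolled
uint8_t popcount8(uint8_t x)
{
  uint8_t count = 0;
  // Iterator declared before the loop
  uint8_t i;
  // += 1 instead of ++
  for(i = 0; i < 8; i += 1)
  {
    count += (x >> i) & 1;
  }
  return count;
}
```

Both functions keep one statement per line, in line with item 10.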

Additional hardware description specific functionality is auto generated for you.