
Example: VGA Graphics

JulianKemmerer edited this page May 30, 2022 · 79 revisions


This page describes using an Arty and a VGA PMOD to do basic VGA graphics.

This example is from a series of examples designed for the Arty Board. See that page for instructions on using the Arty board with PipelineC generated files.

Setup

Digilent provides reference files, including the .xdc file describing the PMOD ports for the VGA adapter. Connecting the internal VGA signals to the external VGA PMOD is handled in vga_pmod.c.

The first step was copying Digilent's VHDL example and confirming that a basic VGA test pattern worked. VGA timing parameters (front porch, back porch, etc.) for a fixed resolution can be found in vga_timing.h. The code for the VGA test pattern, which uses the timing logic and the included PMOD port, can be seen in test_pattern_modular.c.
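To make the timing parameters concrete, here is a minimal plain C sketch of a horizontal/vertical counter pair that is called once per pixel clock. The numbers are the standard 1920x1080 at 60 Hz values, picked only because they match the 148.5 MHz pixel clock used later on this page; they are an assumption and are not copied from vga_timing.h. The names timing_sketch_t and timing_sketch_step are hypothetical, and sync polarity is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumed 1920x1080@60Hz timing: pixels horizontally, lines vertically
#define H_ACTIVE 1920
#define H_FRONT_PORCH 88
#define H_SYNC_WIDTH 44
#define H_BACK_PORCH 148
#define H_TOTAL (H_ACTIVE + H_FRONT_PORCH + H_SYNC_WIDTH + H_BACK_PORCH) // 2200
#define V_ACTIVE 1080
#define V_FRONT_PORCH 4
#define V_SYNC_WIDTH 5
#define V_BACK_PORCH 36
#define V_TOTAL (V_ACTIVE + V_FRONT_PORCH + V_SYNC_WIDTH + V_BACK_PORCH) // 1125

// Hypothetical output bundle, loosely modeled on vga_signals_t
typedef struct timing_sketch_t
{
  uint16_t x;
  uint16_t y;
  bool active;       // Inside the visible area
  bool hsync;        // Horizontal sync pulse asserted
  bool vsync;        // Vertical sync pulse asserted
  bool end_of_frame; // Last clock of the whole frame, blanking included
} timing_sketch_t;

// Call once per pixel clock: reports the current position, then advances
timing_sketch_t timing_sketch_step(void)
{
  static uint16_t h = 0;
  static uint16_t v = 0;
  timing_sketch_t o;
  o.x = h;
  o.y = v;
  o.active = (h < H_ACTIVE) && (v < V_ACTIVE);
  // Sync pulses sit between the front porch and the back porch
  o.hsync = (h >= H_ACTIVE + H_FRONT_PORCH) &&
            (h < H_ACTIVE + H_FRONT_PORCH + H_SYNC_WIDTH);
  o.vsync = (v >= V_ACTIVE + V_FRONT_PORCH) &&
            (v < V_ACTIVE + V_FRONT_PORCH + V_SYNC_WIDTH);
  o.end_of_frame = (h == H_TOTAL - 1) && (v == V_TOTAL - 1);
  // Advance the counters, wrapping at the end of each line and frame
  if(h == H_TOTAL - 1)
  {
    h = 0;
    v = (v == V_TOTAL - 1) ? 0 : (uint16_t)(v + 1);
  }
  else
  {
    h += 1;
  }
  return o;
}
```

With these assumed values one frame takes 2200 x 1125 = 2,475,000 clocks, which at 60 frames per second corresponds to a 148.5 MHz pixel clock.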

Bouncing Images

Based off of Digilent's test pattern which includes a bouncing box, the small black box was replaced with a colorful PipelineC logo and several of them were made to bounce around the screen. See bouncing_images.c. The file starts by including the board/pmod/vga things via #include "vga_pmod.c".

The logic for drawing and moving a rectangle filled with an image is also included via #include "image_rect.h". Inside that file is another include, #include "pipelinec_color.h", of a RAM initialization text file generated by the make_image_files.py helper script.

Using the helper functions from image_rect.h, the main function snippet below shows the per-clock work of moving the rectangles and getting the pixel color.

// Set design to run at pixel clock
MAIN_MHZ(app, PIXEL_CLK_MHZ)
void app()
{
  // VGA timing for fixed resolution
  vga_signals_t vga_signals = vga_timing();
  
  // N image rectangles all moving in parallel
  // Initial state values
  rect_t start_states[NUM_IMAGES];
  RECT_INIT(start_states) // Constants macro
  
  // Rectangle moving animation func/module outputs current state
  rect_t rects[NUM_IMAGES];
  uint32_t i;
  for(i=0;i<NUM_IMAGES;i+=1)
  {
    // Logic to make a rectangle move
    rects[i] = rect_move(start_states[i]);
  }
  
  // Color pixel at x,y
  color_12b_t color = get_pixel_color(vga_signals.active, vga_signals.pos, rects);
  
  // Drive output signals/registers
  ...
}

The rect_move() function contains static local variables that maintain a single rectangle's position+color state updated with each call from app(). The output from that function, image rectangle states rects, is then passed to get_pixel_color(). Inside get_pixel_color() a few things occur:

  // Func from pipelinec_color.h
  uint32_t pixel_addr = pipelinec_color_pixel_addr(rel_pos);

  // In pipelineable luts, too slow for pycparser (and probably rest of PipelineC too)
  //color_12b_t pipelinec_color[pipelinec_color_W*pipelinec_color_H];
  // return pipelinec_color[pixel_addr]; 
  
  // As synthesis tool inferred (LUT)RAM
  pipelinec_color_DECL // Macro from pipelinec_color.h
  color_12b_t unused_write_data;
  // (LUT)RAM template function
  color_12b_t c = pipelinec_color_RAM_SP_RF_0(pixel_addr, unused_write_data, 0);

A function from the generated image file header is used to get the RAM address holding a single pixel of color data. Another piece of generated code, in the form of a macro, is used to declare and initialize the RAM variable named pipelinec_color. Then, using the pixel address, a special PipelineC RAM template function is invoked read-only (so it acts as a ROM) to retrieve the pixel color values (single/same cycle LUT RAM for simplicity).
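As a rough software analogy of the address calculation and lookup, the sketch below stores a tiny image in a constant array and reads one pixel from it. It assumes row-major storage and a 12-bit color packed as 0xRGB; neither is confirmed by the generated pipelinec_color.h, and all names (logo_rom, logo_pixel_addr, logo_pixel_color, color_sketch_t) are hypothetical.

```c
#include <stdint.h>

// Hypothetical 4x2 image, each entry a 12-bit color packed as 0xRGB
#define LOGO_W 4
#define LOGO_H 2
static const uint16_t logo_rom[LOGO_W * LOGO_H] = {
  0xF00, 0x0F0, 0x00F, 0xFFF,
  0x000, 0xFF0, 0x0FF, 0xF0F
};

// 4 bits per channel, stored one channel per byte for convenience
typedef struct color_sketch_t
{
  uint8_t r;
  uint8_t g;
  uint8_t b;
} color_sketch_t;

// Row-major address of the pixel at (rel_x, rel_y),
// relative to the top left corner of the image
uint32_t logo_pixel_addr(uint16_t rel_x, uint16_t rel_y)
{
  return (uint32_t)rel_y * LOGO_W + rel_x;
}

// Read-only lookup followed by unpacking into separate channels
color_sketch_t logo_pixel_color(uint16_t rel_x, uint16_t rel_y)
{
  uint16_t packed = logo_rom[logo_pixel_addr(rel_x, rel_y)];
  color_sketch_t c;
  c.r = (uint8_t)((packed >> 8) & 0xF);
  c.g = (uint8_t)((packed >> 4) & 0xF);
  c.b = (uint8_t)(packed & 0xF);
  return c;
}
```

In hardware the array read becomes the RAM/ROM primitive; in this sketch it is just an ordinary array index.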

Finally, RGB color values are swapped by using a color_mode state variable per stored rectangle. That value is incremented as the images bounce around and collide with the walls.
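The bounce and color swap behavior can be sketched in plain C as below. This is only an illustration: the struct layout, the three-way channel rotation, and the names bounce_sketch_step and swap_channels_sketch are assumptions, not the contents of image_rect.h. State is passed in and returned explicitly because a static local in plain C would be shared by every call, whereas the text above describes each rectangle keeping its own state.

```c
#include <stdint.h>

typedef struct rgb_sketch_t
{
  uint8_t r;
  uint8_t g;
  uint8_t b;
} rgb_sketch_t;

typedef struct bounce_sketch_t
{
  int32_t x;          // Top left corner
  int32_t y;
  int32_t w;          // Size
  int32_t h;
  int32_t vx;         // Signed velocity, pixels per update
  int32_t vy;
  uint8_t color_mode; // 0, 1 or 2
} bounce_sketch_t;

// One animation step: reverse direction and bump color_mode
// whenever the next move would leave the frame
bounce_sketch_t bounce_sketch_step(bounce_sketch_t s, int32_t frame_w, int32_t frame_h)
{
  if((s.x + s.vx < 0) || (s.x + s.w + s.vx > frame_w))
  {
    s.vx = -s.vx;
    s.color_mode = (uint8_t)((s.color_mode + 1) % 3);
  }
  if((s.y + s.vy < 0) || (s.y + s.h + s.vy > frame_h))
  {
    s.vy = -s.vy;
    s.color_mode = (uint8_t)((s.color_mode + 1) % 3);
  }
  s.x += s.vx;
  s.y += s.vy;
  return s;
}

// Rotate the color channels according to color_mode (assumed rotation order)
rgb_sketch_t swap_channels_sketch(rgb_sketch_t c, uint8_t color_mode)
{
  rgb_sketch_t out = c; // Mode 0: unchanged
  if(color_mode == 1)
  {
    out.r = c.g; out.g = c.b; out.b = c.r;
  }
  else if(color_mode == 2)
  {
    out.r = c.b; out.g = c.r; out.b = c.g;
  }
  return out;
}
```

The sketch assumes rectangles are much smaller than the frame, so a single reversal is always enough to stay in bounds.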

Pong

Similar in spirit to the above bouncing images example, Pong has 3 rectangles, two paddles and one 'ball'. The ball bounces off walls and user paddles. The paddles move from user input button presses on the Arty board.

pong.c starts off with inclusion of PMOD/VGA related things via #include "vga_pmod.c", and rectangle helper functions from rect.h.

Additionally buttons.c is included for access to the button state. Reading the state of the buttons and assigning them to user inputs looks like this:

// User input buttons
typedef struct user_input_t
{
  uint1_t paddle_r_up;
  uint1_t paddle_r_down;
  uint1_t paddle_l_up;
  uint1_t paddle_l_down;
}user_input_t;
user_input_t get_user_input()
{
  // Read buttons wire/board IO port
  uint4_t btns;
  WIRE_READ(uint4_t, btns, buttons)
  user_input_t i;
  // Select which buttons are up and down
  i.paddle_r_up = btns >> 0;
  i.paddle_r_down = btns >> 1;
  i.paddle_l_up = btns >> 2;
  i.paddle_l_down = btns >> 3;
  return i;
}
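When prototyping this in plain C there is no 1-bit integer type, so the bit selection needs an explicit mask. The sketch below assumes that assigning the shifted value to a uint1_t in PipelineC keeps only the least significant bit; the names user_input_sketch_t and decode_buttons_sketch are hypothetical, and the button word is passed in as an argument instead of being read from a wire.

```c
#include <stdint.h>

// One byte per flag, since plain C lacks uint1_t
typedef struct user_input_sketch_t
{
  uint8_t paddle_r_up;
  uint8_t paddle_r_down;
  uint8_t paddle_l_up;
  uint8_t paddle_l_down;
} user_input_sketch_t;

// btns holds the four button bits in its low nibble
user_input_sketch_t decode_buttons_sketch(uint8_t btns)
{
  user_input_sketch_t i;
  // Shift the wanted bit down to position 0, then mask off everything else
  i.paddle_r_up = (btns >> 0) & 1;
  i.paddle_r_down = (btns >> 1) & 1;
  i.paddle_l_up = (btns >> 2) & 1;
  i.paddle_l_down = (btns >> 3) & 1;
  return i;
}
```

For example, a button word of 0x5 sets paddle_r_up and paddle_l_up and leaves both down flags clear.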

Several tiny functions are declared for collision detection (helpers in rect.h that are not specific to Pong are used too). One example:

// Ball hit top of frame?
uint1_t ball_hit_roof(rect_animated_t ball)
{
  return (ball.vel_y_dir==UP) & (ball.rect.pos.y == 0);
}

There are also helper functions for moving the paddles based on user input. One example:

// How to move paddle from user input, with screen limits
vga_pos_t move_paddle(vga_pos_t pos, uint1_t paddle_up, uint1_t paddle_down)
{
  if(paddle_up & !paddle_down)
  {
    if(pos.y >= BTN_POS_INC)
    {
      pos.y -= BTN_POS_INC;
    }
  }
  else if(paddle_down & !paddle_up)
  {
    if((pos.y + BTN_POS_INC) <= (FRAME_HEIGHT-PADDLE_HEIGHT))
    {
      pos.y += BTN_POS_INC;
    }
  }
  return pos;
}

Which then leads to the functionality described in the top level app() main function:

// State of objects in the game
typedef struct game_state_t
{
  rect_animated_t ball;
  rect_animated_t lpaddle;
  rect_animated_t rpaddle;
}game_state_t;

// Set design to run at pixel clock
MAIN_MHZ(app, PIXEL_CLK_MHZ)
void app()
{
  // VGA timing for fixed resolution
  vga_signals_t vga_signals = vga_timing();
  
  // Reset register
  static uint1_t reset = 1; // Start in reset
  // State registers
  static game_state_t state;
  // Per clock game logic:
  // Render the pixel at x,y pos given state
  pixel_t color = render_pixel(vga_signals.pos, state);
  // Do animation state update, not every clock, but on every frame
  if(vga_signals.end_of_frame)
  {
    // Read input controls from user
    user_input_t user_input = get_user_input();
    //printf("user input: %d\n", (int) user_input.paddle_r_up);

    state = next_state_func(reset, state, user_input);
    reset = 0; // Out of reset after first frame
  }  
  
  // Drive output signals/registers
  vga_pmod_register_outputs(vga_signals, color);
}

On each clock cycle the current game_state_t object (ball position+velocity, paddle position, etc) is passed to render_pixel() to determine the pixel color at a given VGA (x,y) position. Upon completing each frame, if(vga_signals.end_of_frame), the game is animated by a standard next state = f(current state) function via state = next_state_func(reset, state, user_input);.

render_pixel() is a simple function that checks whether the pixel position belongs to the background, the ball, or one of the user paddles. It uses the rect_contains() helper function, for example:

  if(rect_contains(state.ball.rect, pos))
  {
    c.r = BALL_RED;
    c.g = BALL_GREEN;
    c.b = BALL_BLUE;
  }
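For readers without rect.h at hand, a point-in-rectangle test and a rectangle overlap test might look like the plain C sketch below. The struct layout (top left corner plus width/height) and the _sketch names are assumptions and may differ from the real helpers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct pos_sketch_t
{
  uint16_t x;
  uint16_t y;
} pos_sketch_t;

typedef struct rect_sketch_t
{
  pos_sketch_t pos; // Top left corner
  pos_sketch_t dim; // Width in x, height in y
} rect_sketch_t;

// True when point p lies inside r (right and bottom edges exclusive)
bool rect_contains_sketch(rect_sketch_t r, pos_sketch_t p)
{
  return (p.x >= r.pos.x) && (p.x < r.pos.x + r.dim.x) &&
         (p.y >= r.pos.y) && (p.y < r.pos.y + r.dim.y);
}

// True when the two rectangles share at least one pixel:
// they must overlap on both the x axis and the y axis
bool rects_collide_sketch(rect_sketch_t a, rect_sketch_t b)
{
  return (a.pos.x < b.pos.x + b.dim.x) && (b.pos.x < a.pos.x + a.dim.x) &&
         (a.pos.y < b.pos.y + b.dim.y) && (b.pos.y < a.pos.y + a.dim.y);
}
```

With this convention, rectangles that merely touch along an edge do not count as colliding.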

next_state_func() is where the majority of the Pong game logic resides. An example snippet of that game logic:

  // Ball passing goal lines?
  if(ball_in_l_goal(state.ball))
  {
    if(rects_collide(state.ball.rect, state.lpaddle.rect))
    {
      // Bounce off left paddle
      next_state.ball.rect.pos = state.ball.rect.pos;
      next_state.ball.vel_x_dir = RIGHT;
      next_state.ball.vel = ball_paddle_inc_vel(state.ball.vel);
    }
    else
    {
      // Left scored on by right
      reset = 1; // Start over
      // TODO keep+display score
    }
  }

There absolutely exist optimizations that can be made for better resource utilization, but I did not feel the need as it's already quite small: resources

Pretty device picture - look at that little chunk of logic, aw :) device

Mandelbrot Viewer

The above examples were quite simple in terms of the computation required to produce the image on screen. In fact, neither of the above examples uses PipelineC's autopipelining capability since all operations can be completed in a single cycle (no pipelining required).

Computing the Mandelbrot set image on the other hand requires use of complex fractional/floating point numbers and requires many multiply and addition operations. The extent of the computation can be scaled to as many loop iterations as you desire (wiki pseudo code):

while (x2 + y2 ≤ 4 and iteration < max_iteration) do
    y := 2 × x × y + y0
    x := x2 - y2 + x0
    x2 := x × x
    y2 := y × y
    iteration := iteration + 1
return iteration

To pipeline a loop it must have a fixed number of iterations (for unrolling). mandelbrot.c is therefore written using a for loop instead:

uint1_t not_found_n = 1;
for(i=0;i<MAX_ITER;i+=1)
{
  // Mimic while loop
  if(not_found_n) 
  {
    if((z_squared.re+z_squared.im) <= (ESCAPE*ESCAPE))
    {
      z.im = ((z.re*z.im)<< 1) + c.im;
      z.re = z_squared.re - z_squared.im + c.re;
      z_squared.re = z.re * z.re;
      z_squared.im = z.im * z.im;
      n += 1;
    }
    else
    {
      not_found_n = 0;
    }
  }
}
return n;

Note that the body of the loop uses the optimized Mandelbrot escape time calculation and makes use of the built-in floating point shift << operation for power-of-two multiplies, which do not consume an entire multiplier/divider of resources. Both the ~fractalness MAX_ITER parameter and the floating point mantissa size (~screen detail) can be scaled to as many resources as your FPGA allows.
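For experimenting on a PC, the fixed-iteration loop can be written as self-contained plain C, roughly as sketched below. Several details are assumptions: z and z_squared start at zero, ESCAPE is 2 (matching the 4 in the pseudo code), and ordinary 32-bit float is used. Plain C has no shift operator for floats, so the << 1 is replaced with a multiply by 2. The names mandelbrot_sketch and complex_sketch_t are hypothetical.

```c
#include <stdint.h>

#define MAX_ITER 14
#define ESCAPE 2.0f // Assumed escape radius

typedef struct complex_sketch_t
{
  float re;
  float im;
} complex_sketch_t;

// Escape time for point c, using a fixed trip count so the loop can be unrolled
uint32_t mandelbrot_sketch(complex_sketch_t c)
{
  complex_sketch_t z = {0.0f, 0.0f};
  complex_sketch_t z_squared = {0.0f, 0.0f};
  uint32_t n = 0;
  int not_found_n = 1;
  for(uint32_t i = 0; i < MAX_ITER; i += 1)
  {
    // Mimic the while loop: once escaped, remaining iterations do nothing
    if(not_found_n)
    {
      if((z_squared.re + z_squared.im) <= (ESCAPE * ESCAPE))
      {
        // Multiply by 2 stands in for the floating point << 1
        z.im = 2.0f * (z.re * z.im) + c.im;
        z.re = z_squared.re - z_squared.im + c.re;
        z_squared.re = z.re * z.re;
        z_squared.im = z.im * z.im;
        n += 1;
      }
      else
      {
        not_found_n = 0;
      }
    }
  }
  return n;
}
```

With these assumptions the origin never escapes and returns MAX_ITER, while a point far outside the set such as 2 + 2i returns after a single counted iteration.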

In this demo MAX_ITER=14 iterations is used. C language FP32 float has an 8b exponent and 23b mantissa. An alias for this type in PipelineC is float_8_23_t. In this demo a reduced width floating point format is used:

#define float float_8_11_t // 8b exponent, 11b mantissa
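One way to get a feel for an 11-bit mantissa in a software prototype is to clear the low mantissa bits of a regular 32-bit float, as in the sketch below. This is an approximation under stated assumptions: it truncates rather than rounds, it assumes IEEE 754 single precision on the host, and it says nothing about how PipelineC actually implements float_8_11_t. The function name truncate_mantissa_sketch is hypothetical.

```c
#include <stdint.h>
#include <string.h>

// Keep only the top kept_bits of the 23-bit mantissa (kept_bits must be 0..23).
// Sign and exponent are left untouched.
float truncate_mantissa_sketch(float f, unsigned kept_bits)
{
  uint32_t bits;
  // memcpy is the portable way to view the float's bit pattern
  memcpy(&bits, &f, sizeof bits);
  uint32_t dropped = 23u - kept_bits;
  // Clear the low 'dropped' mantissa bits
  bits &= ~((1u << dropped) - 1u);
  memcpy(&f, &bits, sizeof f);
  return f;
}
```

With kept_bits set to 11, a value like 1 + 1/2048 survives unchanged, while the smaller increment in 1 + 1/4096 is lost and the result is exactly 1.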

There is a state_t struct holding state maintained from frame to frame. In this case, the bounds of the real and imaginary window:

typedef struct state_t
{
  // Plot window
  float re_start;
  float re_width;
  float im_start;
  float im_height;
}state_t;

Rendering a pixel involves using this state to compute the coordinate in the complex plane to run Mandelbrot iterations on.

// Convert pixel coordinate to complex number
complex_t c = {state.re_start + ((float)pos.x * (1.0f/(float)FRAME_WIDTH)) * state.re_width,
              state.im_start + ((float)pos.y * (1.0f/(float)FRAME_HEIGHT)) * state.im_height};
// Compute the number of iterations
uint32_t m = mandelbrot(c);
// The color depends on the number of iterations
uint8_t color = 255 - (int32_t)((float)m *(255.0/(float)MAX_ITER));

Similar to how Pong used buttons to control the paddle positions, this demo uses buttons and switches to move the complex plane window bounds (held in state) of the displayed image. However, unlike Pong, the state update computation requires pipelining, as the floating point add and multiply operations cannot complete in a single pixel clock cycle. The state_t registers below are declared volatile so pipelining can still occur in this non-pure function that maintains state:

// Logic to update the state in a multiple cycle volatile feedback pipeline
inline state_t do_state_update(uint1_t reset, uint1_t end_of_frame)
{
  // Volatile state registers
  volatile static state_t state;
  
  // Use 'slow' end of frame pulse as 'now' valid flag occurring 
  // every N cycles > pipeline depth/latency
  uint1_t update_now = end_of_frame | reset;

  // Update state
  if(reset)
  {
    // Reset condition?
    state = reset_values();
  }
  else if(end_of_frame)
  {
    // Normal next state update
    state = next_state_func(reset, state);
  }  
  
  // Buffer/save state as it periodically is updated/output from above
  state_t curr_state = curr_state_buffer(state, update_now);
  
  // Overwrite potentially invalid volatile 'state' circulating in feedback
  // replacing it with always valid buffered curr state. 
  // This way state will be known good when the next frame occurs
  state = curr_state;
         
  return curr_state;
}

The above code uses a volatile static local variable called state. If the above code omitted the volatile keyword while keeping the static state then this function (which includes calling next_state_func(), floating point mults+adds, etc) would not be pipelined. The entire function logic would be squeezed into one long clock cycle - far too long to meet the pixel clock requirement. Instead volatile allows pipelining to occur and the user is responsible for maintaining non-volatile state via the curr_state_buffer() function.
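The source of curr_state_buffer() is not shown here, so the following plain C sketch is only a guess at its general shape: a holding register that captures the input when the valid flag is set and otherwise keeps returning the last captured value. The names state_sketch_t and curr_state_buffer_sketch are hypothetical, and whether the real function outputs the new value in the same cycle is an assumption.

```c
#include <stdbool.h>

typedef struct state_sketch_t
{
  float re_start;
  float re_width;
  float im_start;
  float im_height;
} state_sketch_t;

// Holding register: capture 'in' only when update_now is set,
// otherwise ignore 'in' and keep returning the last captured value
state_sketch_t curr_state_buffer_sketch(state_sketch_t in, bool update_now)
{
  static state_sketch_t held; // Zero-initialized, plays the role of a register
  if(update_now)
  {
    held = in;
  }
  return held;
}
```

The point of such a buffer is that whatever possibly invalid values pass through on the clocks between updates are ignored, so the returned state only changes on clocks where the valid flag is set.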

Results

SDL C Prototype

Prior examples were small enough to need minimal debug if any at all. However, the scope of design iterations and debug needed for more complicated graphics demos is substantial. This work would not have been possible without the help of Victor Suarez Rovere @suarezvictor. Beginning many months ago he was invaluable in working to expand the verification/simulation capabilities of PipelineC and explore reusable hardware architectures focused on graphics - we found many bugs together. His main.cpp PipelineC-as-C OR Verilator code structure is the core of the simulation environment for this work. Thanks Victor!

With some preprocessor use it is possible to compile the PipelineC per pixel mandelbrot.c app() function as regular C code. The resulting pixels are drawn to the screen using the Simple DirectMedia Layer library. This same main.cpp is used for running Verilator based simulations as well. See the top of that .cpp file for how to build and run.

Full 32b floating point software compile of the PipelineC code: ccodesim

Generated VHDL

Verilator

It is possible to set up PipelineC and use the --sim --comb --verilator arguments to prepare a Verilator simulation (VHDL->GHDL->Yosys->Verilog->Verilator flow). Similar to above, the SDL library is used in main.cpp to display pixels. See the #define USE_VERILATOR preprocessor directive to switch between compiling PipelineC as C vs running the C++ based Verilator simulation. Build instructions are at the top of the file.

Reduced 11b mantissa floating point format Verilator based simulation of the PipelineC code: ver

Autopipelining

This design is complex enough to require autopipelining from the PipelineC tool to meet timing at the 148.5 MHz target operating frequency. A summary of design autopipelining:

* render_pixel() : 252 stages
  * float_8_11_t adders : 6 stages each
  * float_8_11_t multipliers : 5 stages each
  * mandelbrot() : 222 stages
    * float_8_11_t adders : 6 stages each
    * float_8_11_t multipliers : 5 stages each
* do_state_update() : 15 stages
  * next_state_func() : 13 stages
    * float_8_11_t adders : 6 stages each
    * float_8_11_t multipliers : 4 or 5 stages each
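To put the stage counts above in perspective, the small plain C sketch below converts a pipeline depth into latency and compares it with the number of clocks in a frame. It assumes one pipeline stage adds one clock of latency and that the display refreshes at 60 Hz; both are assumptions, and the helper names are hypothetical.

```c
// Latency in nanoseconds of a pipeline with the given number of stages,
// assuming one clock per stage
double pipeline_latency_ns(unsigned stages, double clk_mhz)
{
  return (double)stages * 1000.0 / clk_mhz;
}

// Number of clock cycles between consecutive end-of-frame pulses
double clocks_per_frame(double clk_mhz, double frames_per_sec)
{
  return clk_mhz * 1.0e6 / frames_per_sec;
}
```

Under those assumptions, the 252-stage render_pixel() pipeline has a latency of about 1.7 microseconds at 148.5 MHz, and the 15-stage do_state_update() pipeline is far shorter than the roughly 2.475 million clocks between frames, which is why the end-of-frame pulse can serve as a valid flag that occurs less often than the pipeline depth.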

Vivado

Instantiation of the PipelineC entity inside the dev board top level VHDL file board.vhd:

-- The PipelineC generated entity
top_inst : entity work.top port map (   
    -- Main function clocks
    clk_148p5 => vga_pixel_clk,
    
    -- Switches
    switches_module_sw => unsigned(sw),
    
    -- Buttons
    buttons_module_btn => unsigned(btn),

    -- PMODB
    pmod_jb_return_output.jb0(0) => jb(0),
    pmod_jb_return_output.jb1(0) => jb(1),
    pmod_jb_return_output.jb2(0) => jb(2),
    pmod_jb_return_output.jb3(0) => jb(3),
    pmod_jb_return_output.jb4(0) => jb(4),
    pmod_jb_return_output.jb5(0) => jb(5),
    pmod_jb_return_output.jb6(0) => jb(6),
    pmod_jb_return_output.jb7(0) => jb(7),
    -- PMODC
    pmod_jc_return_output.jc0(0) => jc(0),
    pmod_jc_return_output.jc1(0) => jc(1),
    pmod_jc_return_output.jc2(0) => jc(2),
    pmod_jc_return_output.jc3(0) => jc(3),
    pmod_jc_return_output.jc4(0) => jc(4),
    pmod_jc_return_output.jc5(0) => jc(5),
    pmod_jc_return_output.jc6(0) => jc(6),
    pmod_jc_return_output.jc7(0) => jc(7)  
);

Resource utilization: resources device

Autopipelined design critical path meets timing: timereport

Demo

Check out the demo!

Next Steps

Want to add color? More iterations? Math optimizations? There are so many options for improvement. Reach out to say hello! I want to help :)