
JCAP Log #7: Video Part 2

cspang1 edited this page Nov 15, 2017 · 1 revision

VGA Waveform

VGA Video Generation

The first video mode to be developed is VGA, due to the wide availability and accessibility of VGA monitors for testing and verification. The question has arisen as to why I don't just re-purpose or reuse existing video drivers, and the answer is two-fold:

  • I want to be able to tailor the graphics driver from scratch as directly as possible for retro arcade games, and
  • It's a good learning experience in the fields of graphics development and assembly programming
Some more detailed information on VGA timing is necessary to produce the signals programmatically.

VGA Timings

Probably the most vanilla and straightforward video mode to generate is the standard 640x480 @ 60 Hz. Down the road, more esoteric resolutions such as the classic 224x288 can be extrapolated from this "baseline":

VGA Signal Timing
Resolution 640 x 480
Horizontal Frequency 31.46875 kHz
Vertical Frequency 60 Hz
Pixel Clock Frequency 25.175 MHz

The pixel clock frequency tells us exactly the rate at which the bits have to be banged out of the microcontroller. Once this pixel rate has been established, the timings for the active video, sync pulses, and all the porches in-between can be generated. But how?

The Propeller Video Generator

Why is it necessary?

Let's assume an unpleasant scenario where we have to manually toggle the various VGA outputs, pin by pin. Toggle for HSync, toggle for VSync, toggle to change between all 64 possible 6-bit RGB color combinations (right now we're working with an 8-bit VGA signal from the Propeller, meaning 6 bits for color in the form of RRGGBB and 2 sync signals for HSync and VSync, giving RRGGBBHV)... doesn't sound appealing. As a matter of fact, for most practical cases it's not even possible! Consider the shortest period with which you can toggle a pin using the mov instruction: 1 second/(CLK_FREQ/CLK_PER_INSTR/2). For an 80 MHz clock at 4 clock cycles per instruction, each instruction takes 50 ns and a full high-low cycle needs two of them, so the smallest period of a pulse would be 100 ns. That's 10 MHz at BEST; not even half the bandwidth we need to generate our 25.175 MHz VGA signal.
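The arithmetic above can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope estimate, not Propeller code; the constant and function names are made up for illustration.

```python
# Rough estimate of the fastest square wave achievable by toggling a pin
# in software, assuming one instruction per edge (illustrative names only).
CLK_FREQ_HZ = 80_000_000      # system clock from the text
CLOCKS_PER_INSTR = 4          # clock cycles per instruction from the text
PIXEL_CLOCK_HZ = 25_175_000   # 640x480 @ 60 Hz pixel clock

def max_toggle_frequency(clk_freq_hz, clocks_per_instr):
    """Highest full-cycle frequency when each edge costs one instruction."""
    instructions_per_second = clk_freq_hz / clocks_per_instr
    return instructions_per_second / 2  # one high edge + one low edge per period

max_freq_hz = max_toggle_frequency(CLK_FREQ_HZ, CLOCKS_PER_INSTR)
min_period_ns = 1e9 / max_freq_hz

print(f"max toggle frequency: {max_freq_hz / 1e6} MHz")
print(f"min pulse period:     {min_period_ns} ns")
print(f"fast enough for VGA?  {max_freq_hz >= PIXEL_CLOCK_HZ}")
```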

What does it do?

The Video Generator remedies this shortfall by using a phase-locked loop (PLL) in combination with a counter module to output data at a rate greater than the base clock speed limited by instruction time (toggles are done in hardware, without the constraints of n clock-cycle instruction execution time). The base clock provides the input for the Counter A module, which is configured to use a PLL in combination with a specific frequency to increment (and therefore cause the LSB to toggle each time, creating a clock signal) at a rate between 500 kHz and 128 MHz. This new clock signal is then used by the Video Generator module to generate video data.

How does it do it?

Both Counter A (and by extension its PLL) and the Video Generator have control registers which they use to define their behavior. There are several fields within these registers, however most are extremely straightforward and can be understood easily from the Propeller documentation. Only the more complex aspects of setting up the counter and generator are described here.

Counter A and the PLL

The first thing to configure is the PLL mode of Counter A. The PLL mode works by adding a user-defined value in the frqa register to the current value in a 32-bit accumulation register, phsa (such that phsa = phsa + frqa), on every clock. The MSB of this register is used as the input for the PLL itself, so the rate at which the PLL input is toggled (aka the rate that the MSB of phsa is toggled) depends on the size of the value in frqa (higher values will result in a faster-toggling PLL input, and vice-versa). The phsa register simply wraps around on overflow, i.e. the accumulation is modulo 2^32.


Counter A PLL Mode Block Diagram

For example, with a base clock rate of 80 MHz and the value 2147483648 (10000000000000000000000000000000) loaded into frqa, the input to the PLL will be a 40 MHz clock signal (each positive edge of the 80 MHz clock will add the frqa value to the phsa register, resulting in the phsa value switching between 2147483648 and rolling over to 0, making the MSB of the register toggle between 1 and 0). This is a deliberately simple illustration of how the frqa and phsa registers work together to create an input for the PLL, but this example PLL input clock rate of 40 MHz isn't actually usable by the PLL. In fact, the PLL will only work with an input clock rate between 4 MHz and 8 MHz due to PLL hardware constraints. But that's ok, because the PLL does the rest of the magic to produce our high clock rates.
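To make the accumulator behavior concrete, here is a small Python simulation of the phsa = phsa + frqa update described above. It is a software model only (the function name and the cycle count are my own choices), counting rising edges of bit 31 to estimate the frequency fed to the PLL.

```python
# Software model of the Counter A phase accumulator (illustrative only).
def msb_frequency(frqa, clk_freq_hz, cycles=1 << 16):
    """Estimate the toggle frequency of phsa's MSB by simulating `cycles` clocks."""
    phsa = 0
    previous_msb = 0
    rising_edges = 0
    for _ in range(cycles):
        phsa = (phsa + frqa) & 0xFFFFFFFF   # 32-bit register wraps on overflow
        msb = phsa >> 31                    # bit 31 is what the PLL sees
        if msb == 1 and previous_msb == 0:
            rising_edges += 1
        previous_msb = msb
    return rising_edges * clk_freq_hz / cycles

# The example from the text: frqa = 2^31 at 80 MHz gives a 40 MHz PLL input.
print(msb_frequency(1 << 31, 80_000_000))
# A smaller frqa gives a slower input: 2^28 yields 80 MHz / 16 = 5 MHz.
print(msb_frequency(1 << 28, 80_000_000))
```

Both results agree with the closed form frqa * CLKFREQ / 2^32, which is the relationship used further down to solve for frqa.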

To complete the modulation of the clock signal, the PLL itself takes this input and does two things to create the final output: multiplies it by 16, and then applies one of the following divisions to the result:

PLL Input Divisions
rate ÷ 128 rate ÷ 64 rate ÷ 32 rate ÷ 16 rate ÷ 8 rate ÷ 4 rate ÷ 2 rate ÷ 1

We can see how the min and max output ranges of the PLL are calculated from this schema:

PLL Output Range
Min 4 MHz (minimum PLL input clock rate) * 16 * (1/128) = 500 kHz
Max 8 MHz (maximum PLL input clock rate) * 16 * (1/1) = 128 MHz

The multiplication by 16 acts as the main clock rate "amplifier", and the various divisions give the user finer control over the final PLL output. For our 640x480 @ 60 Hz VGA signal, the two most important numbers needed for this system are the input frequency and the target output frequency:

Key Frequencies
Base Input Clock Frequency 80 MHz
Target Output Pixel Clock Frequency 25.175 MHz

At this point we have all the information we need to configure the Counter A module to convert our 80 MHz base clock signal into our 25.175 MHz pixel clock signal to be used by the video generator. We simply need to determine what value to put into frqa and what divider to use for the PLL. Based on the min and max PLL inputs and possible dividers, choosing the right divider is a simple matter of calculating the min and max possible PLL outputs with each and finding the one our target PLL output falls into:

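Ranges of PLL Outputs per Divider

| PLL input | 1 ÷ 128 | 1 ÷ 64 | 1 ÷ 32 | 1 ÷ 16 | 1 ÷ 8 | 1 ÷ 4 | 1 ÷ 2 | 1 ÷ 1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 8 MHz (maximum PLL input) | 1 MHz | 2 MHz | 4 MHz | 8 MHz | 16 MHz | 32 MHz | 64 MHz | 128 MHz |
| 4 MHz (minimum PLL input) | 500 kHz | 1 MHz | 2 MHz | 4 MHz | 8 MHz | 16 MHz | 32 MHz | 64 MHz |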
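The per-divider ranges can be reproduced, and the divider lookup automated, with a short Python sketch. The names here are hypothetical, and the tie-breaking rule is my own assumption: adjacent ranges share a boundary (16 MHz is reachable with both ÷8 and ÷4), and this sketch simply returns the first match when scanning from ÷128 down to ÷1.

```python
# Reproduce the PLL output range for each divider and pick one for a target.
PLL_IN_MIN_HZ = 4_000_000    # minimum PLL input from the text
PLL_IN_MAX_HZ = 8_000_000    # maximum PLL input from the text
PLL_MULTIPLIER = 16
DIVIDERS = [128, 64, 32, 16, 8, 4, 2, 1]

def output_range(divider):
    """Return (min_hz, max_hz) of the PLL output for a given divider."""
    low = PLL_IN_MIN_HZ * PLL_MULTIPLIER / divider
    high = PLL_IN_MAX_HZ * PLL_MULTIPLIER / divider
    return low, high

def pick_divider(target_hz):
    """Return the first divider (scanning 128 down to 1) whose range holds target_hz."""
    for divider in DIVIDERS:
        low, high = output_range(divider)
        if low <= target_hz <= high:
            return divider
    raise ValueError(f"{target_hz} Hz is outside the 500 kHz to 128 MHz PLL range")

for divider in DIVIDERS:
    low, high = output_range(divider)
    print(f"1/{divider}: {low / 1e6} MHz to {high / 1e6} MHz")

print("divider for 25.175 MHz: 1 /", pick_divider(25_175_000))
```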

Our target clock rate of 25.175 MHz falls within the 1÷4 divider range, so we will use that as our divider. Finally, simple arithmetic is used to calculate the value to put into frqa:

  • PLLOUT = PLLIN * 16 * PLLDIV
  • PLLIN = (FRQA * CLKFREQ)/(2^32) [remember: the MSB, or the 32nd bit, of PHSA is the one that is ingested by the PLL]
  • ∴ PLLOUT = 16 * PLLDIV * [(FRQA * CLKFREQ)/(2^32)]

Solving for frqa, we get:

  • FRQA = (2^28 * PLLOUT)/(PLLDIV * CLKFREQ)

And if:

  • PLLOUT = 25,175,000 (25.175 MHz)
  • PLLDIV = 1 ÷ 4 (referencing from table)
  • CLKFREQ = 80,000,000 (80 MHz)

Then:

  • FRQA = (2^28 * 25,175,000)/[(1/4) * 80,000,000]
  • FRQA = 337,893,130

Therefore, by starting the Counter A module in PLL mode via the ctra control register, and setting the frqa register to 337,893,130, our Counter A module will run with an output of 25.175 MHz for use by the Video Generator. It may seem like an excessive number of indirect steps to get from A to B here, but it's a powerful paradigm that allows for precision.
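As a sanity check on the derivation, the following Python sketch computes frqa from the formula above and then runs the result back through the forward equations. The function names are illustrative, and `divider` is the denominator of PLLDIV (4 for the 1 ÷ 4 setting).

```python
# FRQA = 2^28 * PLLOUT / (PLLDIV * CLKFREQ), with PLLDIV = 1 / divider.
def frqa_for(pll_out_hz, divider, clk_freq_hz):
    """Return the integer frqa value that best approximates pll_out_hz."""
    return round((1 << 28) * pll_out_hz * divider / clk_freq_hz)

def pll_frequencies(frqa, divider, clk_freq_hz):
    """Forward direction: return (PLLIN, PLLOUT) in Hz for a given frqa."""
    pll_in = frqa * clk_freq_hz / (1 << 32)
    pll_out = pll_in * 16 / divider
    return pll_in, pll_out

frqa = frqa_for(25_175_000, 4, 80_000_000)
pll_in, pll_out = pll_frequencies(frqa, 4, 80_000_000)

print("frqa   =", frqa)
print("PLLIN  =", pll_in, "Hz")
print("PLLOUT =", pll_out, "Hz")
```

The back-calculated PLLIN lands at roughly 6.29 MHz, inside the 4 MHz to 8 MHz window the PLL requires, and PLLOUT comes out within a fraction of a hertz of the 25.175 MHz target because frqa has to be rounded to an integer.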

The Video Generator

With the Counter A module properly set up, configuration of the Video Generator can be performed. Two registers are necessary for producing a VGA signal with the generator: vcfg (the video generator control register) and vscl (the video scale register). The control register serves the same purpose for the generator as ctra does for the counter: setting it up in the correct mode for our uses. In the case of the generator control register, this means:

  • setting the mode to VGA,
  • specifying the color depth (1 bit per pixel for 2-color mode or 2 bits per pixel for 4-color mode), and
  • specifying which GPIO pins to output the 8-bit signal on.

It also has fields for TV video signal attributes, which are "don't cares" for VGA mode.

The vscl register is used to set the pixel and frame output rates for the generator. To understand what this means, you have to understand the final critical element of the Video Generator: the waitvid instruction. This instruction is how we actually send data to the generator to be output to our screen. When called, it waits for the video generator to finish printing the pixels from the previous waitvid and then hands over its own pixels. It takes two arguments:

  • a long containing a 32-bit color palette (4 8-bit colors for 4-color mode, or 2 8-bit colors in the lower two bytes for 2-color mode)
  • a long containing a 32-bit pixel pattern (16 2-bit pixels, or 32 1-bit pixels)


The waitvid Instruction (Source = Pixel Pattern, Destination = Color Palette)

Each 1 or 2-bit pixel in the pixel pattern indexes the color palette, with 0 being the lowest byte. For example, given a color palette containing red_green_blue_black, a 2-bit pixel of 01 in the pixel pattern would reference blue in 4-color mode. In 2-color mode, the red and green color bytes wouldn't be accessible (1-bit pixels can only be 1 or 0; blue or black in our example). The waitvid instruction sends the color palette and pixel pattern to the Video Generator which takes each 1 or 2-bit pixel in the pixel pattern starting with the LSB, determines which color it references in the color palette, and outputs that color on the GPIO pins. This means you could change what colors are printed to the screen by changing the color palette, or change which pixels are printed to the screen by changing the pixel pattern. You can use a color palette for sync signals as well in order to generate the off-screen areas of video. So how does the vscl register fit in?
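The palette lookup described above can be modeled in a few lines of Python. This is a behavioral sketch rather than anything that runs on the Propeller; the function name is made up, and the color byte values are placeholders in RRGGBBHV order with the sync bits left at 0 purely for readability.

```python
# Model of how a pixel pattern indexes a color palette, LSB first.
RED, GREEN, BLUE, BLACK = 0xC0, 0x30, 0x0C, 0x00   # placeholder RRGGBBHV bytes
PALETTE = (RED << 24) | (GREEN << 16) | (BLUE << 8) | BLACK  # red_green_blue_black

def lookup_pixels(colors, pixels, bits_per_pixel, count):
    """Return the color byte selected by each of the first `count` pixels."""
    mask = (1 << bits_per_pixel) - 1
    output = []
    for i in range(count):
        index = (pixels >> (i * bits_per_pixel)) & mask   # next pixel, LSB first
        output.append((colors >> (index * 8)) & 0xFF)     # byte 0 is the lowest byte
    return output

# 4-color mode: the pixels 00, 01, 10, 11 (read from the LSB end) pick
# black, blue, green, red in that order.
print([hex(c) for c in lookup_pixels(PALETTE, 0b11_10_01_00, 2, 4)])

# 2-color mode: 1-bit pixels can only reach the two lowest bytes (black, blue).
print([hex(c) for c in lookup_pixels(PALETTE, 0b10, 1, 2)])
```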

vscl tells the Video Generator what rate to print the pixels at, and how many to print. But wait, didn't we just set the Counter A module up to GIVE the generator its pixel clock? And didn't we just learn that the waitvid instruction prints out specifically either 16 or 32 pixels depending on the color mode? Absolutely! But being able to change the video scale register on the fly allows for some neat instruction-saving tricks to simplify code by giving us very fine control over the Video Generator's output.

Assuming 4-color mode (16 2-bit pixels in the pixel pattern), the default behavior of waitvid can be implemented by simply setting vscl for 1 clock per pixel and 16 clocks per frame. This would result in the Video Generator printing 16 pixels, with each pixel outputting at a rate of 25.175 MHz. And this is in fact exactly how you would want to print the visible parts of an image at a 640x480 resolution @ 60 Hz: a 1:1 ratio of pixels to the pixel clock as God intended, each visible line requiring 640/16=40 waitvid instructions in 4-color mode (or 640/32=20 waitvid instructions for 2-color mode). So why would you bother changing it at all?

How about video data we need to print which we know won't change over several "frames"? The following table contains 640x480 @ 60 Hz horizontal sync and porch info:

640x480 @ 60 Hz Horizontal Sync/Porch Data

| Interval | Duration (Pixels) |
| --- | --- |
| Front Porch | 16 |
| HSync | 96 |
| Back Porch | 48 |
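These durations tie back to the timing table near the top of the page: adding them to the 640 visible pixels gives the total line length, and dividing the pixel clock by that total should reproduce the 31.46875 kHz horizontal frequency. A quick Python check (variable names are my own):

```python
# Cross-check the horizontal timing figures against the pixel clock.
PIXEL_CLOCK_HZ = 25_175_000
VISIBLE_PIXELS = 640
FRONT_PORCH, HSYNC, BACK_PORCH = 16, 96, 48

blanking_pixels = FRONT_PORCH + HSYNC + BACK_PORCH
pixels_per_line = VISIBLE_PIXELS + blanking_pixels
line_frequency_hz = PIXEL_CLOCK_HZ / pixels_per_line

print("blanking pixels per line:", blanking_pixels)
print("total pixels per line:   ", pixels_per_line)
print("horizontal frequency:    ", line_frequency_hz / 1000, "kHz")
```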

Assuming 4-color mode and therefore 16 pixels per waitvid, and vscl set for 1 clock per pixel and 16 clocks per frame, printing the entire horizontal non-visible screen area would require (16+96+48)/16=10 waitvid instructions. Not the end of the world, but why do in 10 what you can do in 2: one vscl change and one waitvid call? If we know that all 16 front porch pixels are the same, all HSync pixels are the same, and all back porch pixels are the same, we can simply stretch the display of those pixels out by changing the scale. If you look at those durations above, you might notice they all have a common divisor of 16. Dividing them by such gives us a front porch, HSync, and back porch duration of 1, 6, and 3 respectively (for a total of 10). So: if we change vscl so that the number of clocks per frame is 160, and clocks per pixel is 16, then the next pixel in the pixel pattern will be printed every 16 clocks, and 160/16=10 pixels will be printed. And if our pixel pattern contains 1+6+3=10 porch/sync pixels, and we use a color palette containing the porch/sync "colors", we can print:

  • 1 16-pixel-long front porch frame
  • 6 16-pixel-long HSync frames
  • 3 16-pixel-long back porch frames

Doing this uses only 10 of the 16 pixels that fit in the pixel pattern, and we can display it using a single waitvid instruction. Usually, for 2 and 4-color modes, the clocks per frame is either 32 or 16 times the clocks per pixel when displaying active video. Making the ratio smaller results in only some of the pixels in the pixel pattern being displayed (such as the 10 in our horizontal sync example, with 10 times as many frame clocks as pixel clocks), and making the ratio larger causes the most significant bit(s) of the pattern to be repeated until the frame size has been reached (do with that information what you will). Changing vscl to control how many pixels are displayed, or how long each is displayed, will be used extensively when implementing game graphics and upscaling (displaying lower resolution graphics at a higher resolution) later on.
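The horizontal blanking trick can be modeled in Python to confirm that one frame with 16 clocks per pixel and 160 clocks per frame really produces 16 + 96 + 48 pixel clocks of porch, sync, and porch. Several things here are assumptions rather than facts from this page: the function names, the sync byte values (HSync treated as active-low in the H bit of RRGGBBHV), and the vscl bit layout (8-bit clocks-per-pixel field in bits 19..12, 12-bit clocks-per-frame field in bits 11..0), which should be checked against the Propeller datasheet before use.

```python
# Behavioral model of one Video Generator frame under a given vscl setting.
def vscl_value(pixel_clocks, frame_clocks):
    """Pack vscl, assuming PixelClocks in bits 19..12 and FrameClocks in bits 11..0."""
    return ((pixel_clocks & 0xFF) << 12) | (frame_clocks & 0xFFF)

def render_frame(colors, pixels, bits_per_pixel, pixel_clocks, frame_clocks):
    """Return the color byte driven on the pins for each clock of the frame."""
    mask = (1 << bits_per_pixel) - 1
    last_pixel = 32 // bits_per_pixel - 1       # MSB pixel repeats once the pattern runs out
    output = []
    for tick in range(frame_clocks):
        pixel = min(tick // pixel_clocks, last_pixel)
        index = (pixels >> (pixel * bits_per_pixel)) & mask
        output.append((colors >> (index * 8)) & 0xFF)
    return output

# Hypothetical sync "colors": index 0 = porch (H and V idle), index 1 = HSync asserted.
PORCH, HSYNC_ACTIVE = 0b000000_11, 0b000000_01
sync_palette = (HSYNC_ACTIVE << 8) | PORCH

# 10 two-bit pixels, LSB first: 1 front porch, 6 HSync, 3 back porch.
sync_pixels = 0
for position in range(1, 7):
    sync_pixels |= 0b01 << (2 * position)

blanking = render_frame(sync_palette, sync_pixels, 2, pixel_clocks=16, frame_clocks=160)
print(len(blanking), "pixel clocks of blanking from a single frame")
print("vscl for blanking:", hex(vscl_value(16, 160)))
print("vscl for active video (4-color):", hex(vscl_value(1, 16)))
```

The same model also shows the "ratio larger than the pattern" case: asking for more clocks per frame than the pattern has pixels just keeps repeating the top pixel.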

Conclusion


Propeller Video Output

The details of generating video on the Propeller are certainly a lot to digest, and the datasheet is an invaluable resource for understanding what's going on under the hood. But with the counter/PLL and generator details out of the way, the next step is actually pushing a VGA signal with them frame by frame, line by line, screen by screen.