# Microblaze Compilation

This notebook shows how to compile Microblaze code from within Jupyter and IPython. The examples presented here use the base overlay and its IOPs, but the process can be extended to other Microblaze systems.

The first thing we need is to load the Microblaze library.

In [1]:
import ipython_microblaze as ipmb

Importing this package installs a new `%%microblaze` cell magic, which takes at least two parameters. The first is the IOP to compile against and load the code onto, and the second is the name of the variable in which to store the program. The easiest way to show the various use cases is through a series of examples.

## Example 1: Lower-casing some text

This example gives the simplest implementation of reading from and writing to a stream inside the C code, and shows how that stream is exposed in Python. It reads a block of data, lower-cases it all and then writes it back again. First we need to import the base overlay to access the IOPs.

In [2]:
from pynq.overlays.base import BaseOverlay

base = BaseOverlay('base.bit')

Then we can use one of our new magic cells to compile the program and load it on to the IOP. The first argument is the Microblaze to use and the second parameter is the variable to store the compiled program in.

In [3]:
%%microblaze base.ARDUINO lower_case
#include <unistd.h>
#include <ctype.h>

int main() {
        char buf[1024];
        int bytes;
        int remain = 0;
        while (1) {
                bytes = read(STDIN_FILENO, buf, 1024);
                for (int i = 0; i < bytes; ++i) {
                        buf[i] = (char)tolower(buf[i]);
                }
                remain = bytes;
                while (remain > 0) {
                        remain -= write(STDOUT_FILENO, buf + bytes - remain, remain);
                }
        }
}

We can now run some text through the program using the `stream` member of the returned variable. Note that the read and write functions on the stream take binary strings so the Unicode strings will need to be `encode`d or `decode`d.

In [4]:
test_string = 'HELLO, WORLD!'.encode()
lower_case.stream.write(test_string)
result = None
while not result:
    result = lower_case.stream.read()
print(result.decode())

hello, world!
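
The polling loop above spins until data arrives. As a minimal sketch, the same idea can be given a timeout so that a stalled program does not hang the notebook. `read_with_timeout` and `FakeStream` are hypothetical names that are not part of the library; the sketch assumes, as the loop above does, that `read()` returns an empty (falsy) value when nothing is available. `FakeStream` only stands in for `lower_case.stream` so that the block runs without a board.

```python
import time


def read_with_timeout(stream, timeout=1.0, poll_interval=0.01):
    """Poll stream.read() until it returns data or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        data = stream.read()
        if data:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("no data received from stream")
        time.sleep(poll_interval)


class FakeStream:
    """Stand-in for program.stream: lower-cases whatever is written."""

    def __init__(self):
        self._pending = b""

    def write(self, data):
        self._pending += data.lower()
        return len(data)

    def read(self):
        data, self._pending = self._pending, b""
        return data


stream = FakeStream()
stream.write("HELLO, WORLD!".encode())
print(read_with_timeout(stream).decode())
```

On the board, the call would presumably be `read_with_timeout(lower_case.stream)`.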


## Example 2: `printf` and its variants

This example looks at some of the ways of handling and printing strings inside the Microblaze code. The most common function is `printf` which, due to its design, is too large to fit in the Microblaze code memory. Instead there are a variety of functions that offer restricted subsets of `printf`. The full list is detailed at [https://www.xilinx.com/support/answers/19592.html]. The two most useful here are `print`, which prints a plain string, and `xil_printf`, which offers non-reentrant printing to stdout without support for floating point numbers. `xil_io.h` should be included for the `xil_printf` and `print` function prototypes.

The code below prints a header before starting with `print` and then reads a line of text, character by character, from stdin using `getchar`, echoing the characters with `putchar`. Once a whole line has been read the number of characters in the line is written using `xil_printf`.

In [5]:
%%microblaze base.PMODB letter_count
#include <stdio.h>
#include <xil_io.h>

int main() {
    print("Starting Letter Count\n");
    while (1) {
        int letter_count = 0;
        int c = getchar();
        while (c != '\n') {
            putchar(c);
            c = getchar();
            letter_count++;
        }
        fflush(stdout);
        xil_printf(" (%d letters)\n", letter_count);
    }

}

In [6]:
import time

test_string = "Hello, World!\nA really really long string\n"
letter_count.stream.write(test_string.encode())

time.sleep(0.2)
print(letter_count.stream.read().decode())

Starting Letter Count
Hello, World! (13 letters)
A really really long string (27 letters)
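
To check the counts by hand, the behaviour of the C program can be mirrored on the host. The following is only a reference model written for this purpose (`letter_count_model` is a hypothetical name). It reproduces the echoed lines and counts but not the `Starting Letter Count` header, and it ignores any trailing text that has not yet been terminated by a newline, as the C loop would still be waiting on it.

```python
def letter_count_model(text):
    """Return what the letter-count program would echo for complete lines."""
    lines = text.split("\n")
    # The last element is whatever follows the final newline (possibly
    # empty); the C program would still be waiting for its terminator.
    complete = lines[:-1]
    return "".join(f"{line} ({len(line)} letters)\n" for line in complete)


print(letter_count_model("Hello, World!\nA really really long string\n"), end="")
```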



## Example 3: Asynchronous Communication
The IOPs have interrupt support on reading so the PS can be idle while waiting for data. The use of `asyncio` allows us to chain the IOPs together using coroutines. This example will feed the result of the lower-case IOP into the input of the counting IOP.

As the whole process will happen asynchronously, we use three tasks to feed the data, transfer the data, and print the result. To confirm that interrupts rather than polling are being used, we also have a fourth task that prints the CPU utilisation periodically throughout the program's execution.

In [7]:
import asyncio
import psutil

async def write_data():
    for i in range(20):
        lower_case.stream.write(f"TeSt String {i}\n".encode())
        await asyncio.sleep(0.5)
        
async def transfer_data():
    while True:
        data = await lower_case.stream.read_async()
        letter_count.stream.write(data)

async def read_print():
    while True:
        data = await letter_count.stream.read_async()
        print(data.decode().strip('\n'))

async def print_cpu_usage():
    # Calculate the CPU utilisation by the amount of idle time
    # each CPU has had in three second intervals
    last_idle = [c.idle for c in psutil.cpu_times(percpu=True)]
    while True:
        await asyncio.sleep(3)
        next_idle = [c.idle for c in psutil.cpu_times(percpu=True)]
        usage = [(1-(c2-c1)/3) * 100 for c1,c2 in zip(last_idle, next_idle)]
        print("CPU Usage: {0:3.2f}%, {1:3.2f}%".format(*usage))
        last_idle = next_idle


        
write_task = asyncio.ensure_future(write_data())
transfer_task = asyncio.ensure_future(transfer_data())
read_task = asyncio.ensure_future(read_print())
usage_task = asyncio.ensure_future(print_cpu_usage())

Finally we can run the event loop until the writing task has finished.

In [8]:
loop = asyncio.get_event_loop()
loop.run_until_complete(write_task)

test string 0 (13 letters)
test string 1 (13 letters)
test string 2 (13 letters)
test string 3 (13 letters)
test string 4 (13 letters)
test string 5 (13 letters)
CPU Usage: 3.67%, 5.00%
test string 6 (13 letters)
test string 7 (13 letters)
test string 8 (13 letters)
test string 9 (13 letters)
test string 10 (14 letters)
test string 11 (14 letters)
CPU Usage: 2.67%, 4.33%
test string 12 (14 letters)
test string 13 (14 letters)
test string 14 (14 letters)
test string 15 (14 letters)
test string 16 (14 letters)
test string 17 (14 letters)
CPU Usage: 2.67%, 3.67%
test string 18 (14 letters)
test string 19 (14 letters)
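
The utilisation figures above come from comparing per-CPU idle time across an interval. A small sketch of that arithmetic in isolation follows, using made-up idle counters in place of `psutil.cpu_times(percpu=True)`; `cpu_usage_percent` is a hypothetical helper name.

```python
def cpu_usage_percent(idle_before, idle_after, interval_s):
    """Convert per-CPU idle-time deltas (seconds) into busy percentages."""
    return [
        (1 - (after - before) / interval_s) * 100
        for before, after in zip(idle_before, idle_after)
    ]


# Made-up counters: CPU 0 idled for 1.5 s and CPU 1 for 2.25 s of a 3 s window.
usage = cpu_usage_percent([10.0, 20.0], [11.5, 22.25], 3)
print("CPU Usage: {0:3.2f}%, {1:3.2f}%".format(*usage))
```

With these inputs the sketch prints `CPU Usage: 50.00%, 25.00%`. The formula assumes the interval really was `interval_s` long; `asyncio.sleep(3)` can overrun slightly, so the reported values are approximate.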


To clean up we need to cancel our never-ending tasks to avoid polluting the event loop.

In [9]:
transfer_task.cancel()
read_task.cancel()
usage_task.cancel()

lower_case.reset()
letter_count.reset()
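
The chain-and-cancel pattern used in this example can be tried without hardware. The sketch below replaces the two IOP streams with `asyncio.Queue` objects and plain Python functions, so it only illustrates the coroutine structure, not the interrupt-driven `read_async` calls. All names in it are hypothetical. It also awaits the cancelled tasks, which gives the event loop a chance to finish the cancellation.

```python
import asyncio


async def relay(inbox, outbox, transform):
    """Forward items from inbox to outbox forever, like transfer_data()."""
    while True:
        item = await inbox.get()
        await outbox.put(transform(item))


def count_letters(line):
    """Mimic the letter-count program for one newline-terminated line."""
    text = line.rstrip(b"\n")
    return text + b" (%d letters)" % len(text)


async def main():
    source, middle, sink = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    tasks = [
        asyncio.ensure_future(relay(source, middle, bytes.lower)),
        asyncio.ensure_future(relay(middle, sink, count_letters)),
    ]
    for i in range(3):
        await source.put(f"TeSt String {i}\n".encode())
    results = [await sink.get() for _ in range(3)]
    for task in tasks:
        task.cancel()
    # Awaiting the cancelled tasks lets the cancellation run to completion.
    await asyncio.gather(*tasks, return_exceptions=True)
    return results, all(task.cancelled() for task in tasks)


results, all_cancelled = asyncio.run(main())
for line in results:
    print(line.decode())
```

`asyncio.run` is meant for a plain script; inside a notebook whose event loop is already running, `await main()` would be the equivalent call.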

## Example 4 - Direct Communication between IOPs

As the API is symmetric (with the exception of the interrupts, which are special-cased), two IOPs can talk to each other without needing the PS to do anything. The system implements the `open` syscall, where the path is the pointer to the base address of the buffer. Here we recreate our lower-case/counting demo with the IOP communication done without the PS.

First we need to allocate some memory for the buffer using the Xlnk driver and re-download the bitstream to clear the IOP programs.

In [10]:
from pynq import Xlnk

xlnk = Xlnk()
buffer = xlnk.cma_alloc(0x800)
phys_buffer_ptr = xlnk.cma_get_phy_addr(buffer)

We are going to pass this pointer into each program as part of the initialisation procedure. The Microblaze does not see memory at the same addresses as the PS, so the pointer needs to be OR'ed with 0x20000000 to be valid in the Microblaze's address space.

In [11]:
%%microblaze base.PMODA letter_count
#include <stdio.h>
#include <xil_io.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    print("Starting Letter Count\n");
    char* stream_ptr;
    int fd;
    read(STDIN_FILENO, &stream_ptr, 4);
    fd = open(stream_ptr, O_RDONLY);
    FILE* f = fdopen(fd, "r");
    xil_printf("Using descriptor %d on buffer %d\n", fd, (int)stream_ptr);
    while (1) {
        int letter_count = 0;
        int c;
        c = getc(f);
        while (c != '\n') {
            putchar(c);
            c = getc(f);
            letter_count++;
        }
        fflush(stdout);
        xil_printf(" (%d letters)\n", letter_count);
    }

}

In [12]:
%%microblaze base.PMODB lower_case
#include <unistd.h>
#include <ctype.h>
#include <fcntl.h>
#include <stdio.h>
#include <xil_io.h>
int main() {
        char buf[256];
        int bytes;
        int remain = 0;
        char* buffer_ptr;
        int fd;
        read(STDIN_FILENO, &buffer_ptr, 4);
        fd = open(buffer_ptr, O_WRONLY);
        xil_printf("Using descriptor %d on buffer %d\n", fd, (int)buffer_ptr);
        while (1) {
                bytes = read(STDIN_FILENO, buf, 256);
                for (int i = 0; i < bytes; ++i) {
                        buf[i] = (char)tolower(buf[i]);
                }
                remain = bytes;
                while (remain > 0) {
                        int written = write(fd, buf + bytes - remain, remain);
                        remain -= written;
                }
        }
}

In [13]:
import struct

mb_ptr = phys_buffer_ptr | 0x20000000
letter_count.stream.write(struct.pack('I',mb_ptr))
lower_case.stream.write(struct.pack('I',mb_ptr))

4
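
The address translation and packing step can be checked in isolation. In the sketch below, `to_microblaze_message` is a hypothetical helper, and the physical address is an assumed value chosen so that the translated pointer matches the `914644992` printed by the programs in this notebook. The explicit little-endian format `'<I'` is also an assumption; the cell above uses the native-order `'I'`.

```python
import struct

MB_DDR_OFFSET = 0x20000000  # offset quoted in the text above


def to_microblaze_message(phys_addr):
    """Return the 4-byte message carrying phys_addr as a Microblaze pointer."""
    mb_ptr = phys_addr | MB_DDR_OFFSET
    return struct.pack('<I', mb_ptr)


example_phys = 0x16846000  # assumed physical address, for illustration only
message = to_microblaze_message(example_phys)
print(len(message), struct.unpack('<I', message)[0])
```

This prints `4 914644992`: four bytes are sent, which is what the C programs read with `read(STDIN_FILENO, &stream_ptr, 4)`.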

We can then pass our string into the lower-case Microblaze and read the result back from the count program without having to do any data transfer in the PS.

In [14]:
lower_case.stream.write(b'Hello, World\n')
time.sleep(0.2)
print(letter_count.stream.read().decode())

Starting Letter Count
Using descriptor 2 on buffer 914644992
hello, world (12 letters)



We can also read the debug information being printed from the lower program as well.

In [15]:
print(lower_case.stream.read())

b'Using descriptor 2 on buffer 914644992\n'


Unlike other Microblaze programs, any that touch the DDR memory need to be shut down properly. Otherwise invalid transactions can occur when the bitstream is reprogrammed, leaving the PS interconnect in an ill-defined state and leading to many hours spent debugging. We can also take this opportunity to free the Xlnk buffer.

In [16]:
lower_case.reset()
letter_count.reset()
xlnk.cma_free(buffer)
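
Because a forgotten reset can be costly, one option is to tie the teardown to a context manager so it also runs when a cell raises. This is a sketch only: `shared_buffer`, `FakeXlnk` and `FakeProgram` are hypothetical, and the fakes exist purely so the block runs without a board. The calls it relies on (`cma_alloc`, `cma_free`, `reset`) are the ones used in the cells above.

```python
from contextlib import contextmanager


@contextmanager
def shared_buffer(xlnk, programs, size=0x800):
    """Allocate a buffer; on exit reset the programs, then free the buffer."""
    buffer = xlnk.cma_alloc(size)
    try:
        yield buffer
    finally:
        # Stop the Microblazes before releasing the memory they access.
        for program in programs:
            program.reset()
        xlnk.cma_free(buffer)


class FakeXlnk:
    def __init__(self, log):
        self.log = log

    def cma_alloc(self, size):
        self.log.append("alloc")
        return object()

    def cma_free(self, buffer):
        self.log.append("free")


class FakeProgram:
    def __init__(self, name, log):
        self.name, self.log = name, log

    def reset(self):
        self.log.append(f"reset {self.name}")


log = []
programs = [FakeProgram("lower_case", log), FakeProgram("letter_count", log)]
try:
    with shared_buffer(FakeXlnk(log), programs):
        raise RuntimeError("simulated failure while the programs run")
except RuntimeError:
    pass
print(log)
```

The log shows the allocation, both resets and then the free, in that order, even though the body of the `with` block failed.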

## Example 5 - Peripheral Libraries

To make peripherals easier to use, the driver code for peripherals in the PYNQ standard library has been refactored into libraries, which can be used by passing additional parameters to the `%%microblaze` magic. The first two examples are the Grove ADC and LEDBar attached to G4 and G1 of a Pmod/Grove adapter on PMODA.

To look at the APIs for the devices we can look at the `declaration` attribute of the Peripheral.

In [17]:
print(ipmb.LEDBar.declaration)
print(ipmb.GroveADC.declaration)



typedef struct {
    short data;
    short clk;
} ledbar;

ledbar ledbar_init(unsigned char port);
void ledbar_set_level(ledbar, unsigned char i);
void ledbar_set_data(ledbar, unsigned char data[10]);



void adc_init(unsigned char port);
float adc_read_sample();



These two peripherals can now be used independently, as with the current API, by creating simple Python and C wrappers. The LED bar has a simple write-only interface: the program reads 10-byte blocks from stdin and sets the LEDs accordingly.

In [18]:
%%microblaze base.PMODA ledbar_program ipmb.LEDBar
#include <unistd.h>

int main() {
    ledbar g1 = ledbar_init(G1);
    
    unsigned char data[10];
    while (1) {
        read(STDIN_FILENO, data, 10);
        ledbar_set_data(g1, data);
    }
}

In [19]:
ledbar_program.stream.write(b'\xFF\x00' * 5)

10
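
Since the program consumes exactly 10 bytes per update, a short or long write would leave later frames misaligned. A minimal host-side guard might look like the sketch below. `ledbar_frame` is a hypothetical helper, and the notebook does not state how each byte value maps to LED behaviour, so the sketch only checks the frame's shape.

```python
def ledbar_frame(values):
    """Build one 10-byte frame for the LED bar program from ten integers."""
    values = list(values)
    if len(values) != 10:
        raise ValueError(f"expected 10 values, got {len(values)}")
    # bytes() itself raises ValueError for values outside 0..255.
    return bytes(values)


frame = ledbar_frame([0xFF, 0x00] * 5)
print(len(frame), frame == b'\xFF\x00' * 5)
```

The result could then be passed to `ledbar_program.stream.write(frame)`.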

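A slightly more complex driver program for the ADC can record periodic samples based on two parameters: the number of samples and the delay between them in milliseconds. The LED bar program is reset first, since both programs are loaded onto PMODA.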

In [20]:
ledbar_program.reset()

In [21]:
%%microblaze base.PMODA adc_program ipmb.GroveADC
#include <unistd.h>

int main(void) {
    adc_init(G4);
    while (1) {
        int num;
        int delay;
        read(STDIN_FILENO, &num, 4);
        read(STDIN_FILENO, &delay, 4);
        for (int i = 0; i < num; ++i) {
            float voltage = adc_read_sample();
            write(STDOUT_FILENO, &voltage, 4);
            delay_ms(delay);
        }
    }

}

In [22]:
# 10 samples spaced by 200 ms
import struct
import time
adc_program.stream.write(struct.pack('II', 10, 200))
time.sleep(3)
struct.unpack('10f', adc_program.stream.read())

(1.6287109851837158,
 1.627197265625,
 1.627197265625,
 1.6287109851837158,
 1.6287109851837158,
 1.6287109851837158,
 1.627197265625,
 1.627197265625,
 1.627197265625,
 1.6287109851837158)
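
The request and reply formats for this program are simple enough to wrap in helpers. The sketch below is one possible arrangement; the function names are hypothetical and the little-endian `'<'` prefix is an assumption (the cell above uses native byte order). It derives the sample count from the reply length instead of hard-coding `'10f'`, and estimates the minimum capture time from the delays alone, ignoring the time each ADC read takes.

```python
import struct


def pack_request(num_samples, delay_ms):
    """Encode the two unsigned 32-bit parameters the ADC program reads."""
    return struct.pack('<II', num_samples, delay_ms)


def min_capture_time_s(num_samples, delay_ms):
    """Lower bound on capture time: only the programmed delays are counted."""
    return num_samples * delay_ms / 1000.0


def unpack_samples(raw):
    """Decode however many complete 4-byte floats the reply contains."""
    count = len(raw) // 4
    return struct.unpack(f'<{count}f', raw[:count * 4])


request = pack_request(10, 200)
print(len(request), min_capture_time_s(10, 200))

# Stand-in for adc_program.stream.read(): three made-up samples.
fake_reply = struct.pack('<3f', 1.5, 1.625, 1.75)
print(unpack_samples(fake_reply))
```

The 2.0 second lower bound is consistent with the `time.sleep(3)` used above, which leaves some margin for the reads themselves.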

The LEDBar peripheral is also supported on the Arduino PYNQ Shield. If we plug it into G1 on the shield, we can reuse exactly the same code.

In [23]:
%%microblaze base.ARDUINO ledbar_program ipmb.LEDBar
#include <unistd.h>

int main() {
    ledbar g1 = ledbar_init(G1);
    
    unsigned char data[10];
    while (1) {
        read(STDIN_FILENO, data, 10);
        ledbar_set_data(g1, data);
    }
}

In [24]:
ledbar_program.stream.write(b'\xFF\x00' * 5)

10

The final peripheral in this demo is the OLED Pmod, which currently has a simplified API for testing purposes.

In [25]:
print(ipmb.PmodOLED.declaration)


void oled_init();
void oled_clear();
void oled_print_string(char* string, int x, int y);



And again we can have a simple driver program that reads strings from stdin and writes them to the OLED screen.

In [26]:
%%microblaze base.PMODB oled_program ipmb.PmodOLED
#include <stdio.h>
#include <unistd.h>

int main() {
    oled_init();
    char buf[64];
    while (1) {
        int bytes = read(STDIN_FILENO, buf, 63);
        buf[bytes] = 0;
        oled_clear();
        oled_print_string(buf, 0, 0);
    }
}

In [27]:
oled_program.stream.write(b'Hello, World!')

13

In [28]:
oled_program.reset()

## Example 6 - Offloaded Computation

Now that we have libraries for both our LED bar and ADC/temperature sensor, we can combine them into a single program that reads the voltage and displays it on the LED bar. To determine what the levels should be, the program reads 10 float values at startup that correspond to the threshold voltages for each LED.

In [29]:
base.download()

In [30]:
%%microblaze base.PMODA led_adc ipmb.LEDBar ipmb.GroveADC
#include <unistd.h>
#include "pmod.h"
#include "xil_io.h"

int main(void) {
    adc_init(G4);
    ledbar output = ledbar_init(G1);

    float thresholds[10];
    
    ssize_t bytes = read(STDIN_FILENO, thresholds, 40);
    xil_printf("Read %d bytes\n", bytes);
    while (1) {
        float voltage = adc_read_sample();
        int val = 0;
        for (int i = 0; i < 10; ++i) {
            if (voltage > thresholds[i]) val = i + 1;
        }
        ledbar_set_level(output, val);
    }
}

In [31]:
import struct
led_adc.stream.write(struct.pack('10f', *[0 + 0.3 * i for i in range(10)]))

40
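
To see which level a given voltage should produce, the threshold loop from the C program can be mirrored in Python. This is a host-side model only (`led_level` is a hypothetical name); it uses Python's double-precision floats, whereas the Microblaze compares single-precision values, so results could differ for voltages that sit almost exactly on a threshold.

```python
def led_level(voltage, thresholds):
    """Return the level the C loop computes: index of the last threshold exceeded, plus one."""
    level = 0
    for i, threshold in enumerate(thresholds):
        if voltage > threshold:
            level = i + 1
    return level


thresholds = [0 + 0.3 * i for i in range(10)]
print(led_level(1.6287, thresholds))
```

A reading of about 1.63 V, like the samples recorded earlier, lies between the 1.5 V and 1.8 V thresholds, so the model prints `6`.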

## Example 7 - Bringing it all together

In the final example we will bring together the offloaded computation and the inter-IOP communication. The goal is to read the temperature from the ADC, update the LED bar and write the current temperature to the OLED screen, all without involving the host CPU.

The starting point is to write our two driver programs. The LED bar/ADC code needs little modification, only reading in a buffer address on startup and writing out the voltage on each pass. The `fsync` system call is used to wait for the buffer to be read by the remote end before starting the cycle again.

In [32]:
base.download()

In [33]:
%%microblaze base.PMODA ledbar_adc ipmb.GroveADC ipmb.LEDBar
#include <unistd.h>
#include <fcntl.h>
#include "pmod.h"
#include "xil_io.h"

int main(void) {
    adc_init(G4);
    ledbar output = ledbar_init(G1);
    char* buffer_ptr;
    float thresholds[10];
    
    ssize_t bytes = read(STDIN_FILENO, thresholds, 40);
    read(STDIN_FILENO, &buffer_ptr, 4);
    int fd = open(buffer_ptr, O_WRONLY);
    
    xil_printf("Read %d bytes\n", bytes);
    while (1) {
        float voltage = adc_read_sample();
        int val = 0;
        for (int i = 0; i < 10; ++i) {
            if (voltage > thresholds[i]) val = i + 1;
        }
        write(fd, &voltage, 4);
        fsync(fd);  // wait for the remote end to read the buffer
        ledbar_set_level(output, val);
    }
}


In [34]:
ledbar_adc.stream.write(struct.pack('10f', *[1.5 + 0.04 * i for i in range(10)]))

40

We also need to modify the OLED program to read a value from a buffer, format it and write it out. Note that `sprintf` would be the usual option for this, but the function is far too big to fit in the code memory of the Microblaze, so we roll our own.

In [35]:
%%microblaze base.PMODB oled_program ipmb.PmodOLED
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include "pmod.h"

int main() {
    pmod_init(0, 0);
    oled_init();
    char buf[64] = "Voltage: xxxx mV\n";
    char* buffer_ptr;
    read(STDIN_FILENO, &buffer_ptr, 4);
    int fd = open(buffer_ptr, O_RDONLY);
    while (1) {
        float val;
        read(fd, &val, 4);
        char* ptr = buf + 9;
        int mv = (int)(val / 0.001);
        for (int i = 0; i < 4; ++i) {
            ptr[3-i] = '0' + (mv % 10);
            mv /= 10;
        }
        oled_print_string(buf, 0, 0);
    }
}
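
The hand-rolled formatting is easy to get wrong by one position, so it can help to model it on the host first. The sketch below mirrors the C loop digit for digit (`format_millivolts` is a hypothetical name). As in the C code, only the four least significant digits are kept, so a value of 10 V or more would wrap, and negative inputs are not handled.

```python
def format_millivolts(voltage):
    """Fill the 'xxxx' field of the display string the way the C loop does."""
    buf = bytearray(b"Voltage: xxxx mV\n")
    field_start = 9  # index of the first 'x', just after "Voltage: "
    mv = int(voltage / 0.001)
    for i in range(4):
        # Write digits right to left, least significant first.
        buf[field_start + 3 - i] = ord('0') + (mv % 10)
        mv //= 10
    return bytes(buf)


print(format_millivolts(1.6287).decode(), end="")
```

For a reading of 1.6287 V the model produces `Voltage: 1628 mV`, with the fractional millivolt truncated rather than rounded.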

Now that the two programs are started, all that remains is to allocate a buffer for them and feed the pointer into the devices.

In [36]:
buffer = xlnk.cma_alloc(0x800)
phys_buffer_ptr = xlnk.cma_get_phy_addr(buffer)
mb_ptr = phys_buffer_ptr | 0x20000000
ledbar_adc.stream.write(struct.pack('I',mb_ptr))
oled_program.stream.write(struct.pack('I',mb_ptr))

4

Once done, we need to reset the Microblazes properly, since they access the DRAM, and then free the buffer.

In [37]:
ledbar_adc.reset()
oled_program.reset()
xlnk.cma_free(buffer)