## Contents

| 1 | Intr             | roduction                | 1 |
|---|------------------|--------------------------|---|
| 2 | Pirmware Project |                          | 1 |
|   | 2.1              | Using the Stream Modules | 1 |
|   | 2.2              | Application Logic        | 2 |
|   | 2.3              | Simulation               | 2 |
| 3 | Soft             | tware                    | 3 |

## 1 Introduction

The StreamLoopback128 sample project demonstrates how to use the streaming data flow model to communicate with your firmware in the FPGA. The Pico API docs explain the concept and motivation for streams. You may wish to keep a copy of those docs handy while you read this.

This sample sets up one stream going from the host to the FPGA, and another stream going from the FPGA back to the host. The logic keeps a running sum of the data received from the host. For each piece of input data, it generates an output consisting of the sum of all input data seen so far.

# 2 Firmware Project

This project uses two streams: one in and one out. We use the PicoDefines.v file to tell the Pico framework that we need these two streams. The framework then takes care of impelementing the support logic to connect these streams to the PCIe interface, which gives them access to the host computer or other FPGAs.

Once we have the two streams setup, we'll simply connect the incoming stream to the outgoing stream. In your application, of course, you'll want to transform the data before you send it back. This is the point at which you would insert your application logic.

## 2.1 Using the Stream Modules

Here's what our PicoDefines.v file looks like:

```
'define USER_MODULE_NAME StreamLoopback128

'define STREAM1_IN_WIDTH 128

'define STREAM1_OUT_WIDTH 128

// We define the type of FPGA and card we're using.
'define PICO_MODEL_M501
'define PICO_FPGA_LX240T

'include "BasePicoDefines.v"
```

You can see the relevant STREAM1\_IN\_WIDTH and STREAM1\_OUT\_WIDTH defines that specify both streams are 128b wide. The Pico framework sees these defines and connects the two streams to our module. Here's what the port list of our module looks like:

```
module StreamLoopback128 (
    input
                          clk,
    input
                          rst,
                          s1i_valid,
    input
    output
                          s1i_rdy,
    input [127:0]
                          s1i_data,
    output
                          s1o_valid,
    input
                          s1o_rdy,
    output [127:0]
                          s1o_data
    );
```

Note the common clock and reset signals, and the two stream interfaces.

When s1i\_valid is high and we assert s1i\_rdy, we'll grab the incoming data on s1i\_data. We'll know our output has been accepted on s1o\_data when we're asserting s1o\_valid and we see s1o\_rdy go high.

#### 2.2 Application Logic

As we mentioned earlier, the "application logic" in this example sums the incoming data. It generates an output stream consisting of the sum of the input at each point in the input stream. For example, given an input stream of five values: 0, 1, 2, 3, 4, it will produce an output stream of five values: 0, 1, 3, 6, 10.

We sum the low 32 bits of the input stream rather than the whole 128. We report the last value received on the low 32 bits and use the high 64 bits of the output stream to inject a "tag." This tag is a just a constant value in this sample, but in practice it could be additional computed statistics for the input stream beyond just a sum. Here's how we construct the output data from our running sum and the tag data:

```
assign \ slo_data = \{32'h42424242, \ 32'hdeadbeef, \ sum[31:0], \ last_input[31:0]\};
```

The main piece of logic in this module watches the input and output streams. It consumes the next input as long as we're not receiving backpressure from the output. We keep track of the sum in the "sum" register. This introduces a single-cycle pipeline in the design. We use the "empty" register to track whether the pipeline contains valid data or a bubble. For example, while we're receiving input the pipeline stays full, but when s1i\_valid goes low, a bubble gets into the pipeline and we set "empty" true. When s1i\_valid goes high again, "empty" goes low and the pipeline runs full.

```
sum <= sum + s1i_data[31:0];
    last_input <= s1i_data[31:0];
    end
end
end</pre>
```

#### 2.3 Simulation

We can now simulate the design using Isim to demonstrate the power of the streaming system. To launch Isim, select the *Simulation* radio button in ISE, select the *PicoTestbench*, and then double-click *Simulate Behavioral Model*.



After Isim launches, we add the UserModule signals to the waveform and run the simulation. Here



Note the s1i\_data to the left of the yellow line (0x10) is added to the sum when s1i\_valid and s1i\_rdy are both asserted. Also note the s1o\_data is updated to reflect the new sum to the right of the yellow line.

## 3 Software

The software portion of this sample sends a sequence of data to the FPGA and reads the results back.

First, we load the bit file into the FPGA and obtain a handle using the RunBitFile function, taking the file name from the command line:

Next, we use this handle to open the stream to communicate with the FPGA:

```
// data goes out to the firmware on stream #1 and also comes back on stream
#1
printf("Opening streams to test counter\n");
stream = pico->CreateStream(1);
if (stream < 0) {
    fprintf(stderr, "couldn't open stream 1! (return code: %i)\n", stream);
    exit(1);
}</pre>
```

Now we fill our buffer with an increasing sequence, and send it to the card. Before we send it, we verify that the card is ready to handle the data.

```
// fill the buffer with data we'll recognize when it comes back.
for (i=0; i < sizeof(buf)/sizeof(buf[0]); ++i)</pre>
    buf[i] = 0x42000000 | i;
printf("%i bytes of room in stream to firmware.\n", i=pico->
   GetBytesAvailable(1, false /* writing */));
// write 256 words if there's room. (and there should be if all is well.)
if (i > 256*16)
    i = 256*16;
i &= ~^{\circ}Oxf; // process only whole words.
if (i > 0) {
    err = pico->WriteStream(stream, buf, i);
    printf("Wrote %i B\n", err);
    if (err != i) {
        fprintf(stderr, "write error. returned %i, but should have been %i\n
           ", err, i);
        exit(1);
```

```
}
```

At this point the data has been sent to the FPGA, and we can ask the system to read back the returned data from the FPGA.

```
printf("%i B available to read from firmware.\n", i=pico->GetBytesAvailable
   (1, true /* reading */));
// read up to 32 words back from the firmware.
// (due to the way the buffering in the firmware works, we might not see all
     128 words at once right now.)
if (i > 32*16)
    i = 32*16;
i &= ^{\circ}Oxf; // process only whole words.
if (i > 0) {
    printf("Reading %i B\n", i);
    err = pico->ReadStream(stream, buf, i);
    if (err != i) {
        fprintf(stderr, \ "write \ error. \ returned \ \%i, \ but \ should \ have \ been \ \%i \ \backslash n
            ", err, i);
        exit(1);
    printf("Data received back from firmware:\n");
    print128(stdout, buf, i/16);
}
```

That's it! You can look at the values printed out and see the running sum the FPGA computed.