In [None]:
from pygments import highlight
from pygments.lexers import CLexer
from pygments.lexers.python import PythonLexer
from pygments.lexers.shell import BashLexer
from pygments.lexers.make import MakefileLexer
from pygments.formatters import HtmlFormatter
from pygments.lexers.special import TextLexer
import IPython

def get(path):
    if path.endswith(".c"):
        lexer = CLexer()
    elif path.endswith(".py"):
        lexer = PythonLexer()
    elif path.endswith(".sh"):
        lexer = BashLexer()
    elif "Makefile" in path:
        lexer = MakefileLexer()
    else:
        lexer = TextLexer()
    with open(path) as f:
        code = f.read()
        return '<style type="text/css">{}</style>{}'.format(
            HtmlFormatter().get_style_defs('default'),
            highlight(code, lexer, HtmlFormatter()))

In [None]:
import os
os.environ["RISCV"] = "/pulp-riscv-gnu-toolchain"

# Bare Metal Deployment on RISC-V Host core

The Diana system-on-chip (SoC) is based on [PULPissimo](https://github.com/pulp-platform/pulpissimo) and consists of three main parts:
1. An open-source RISC-V core. In this case, a [**RI5CY**](https://github.com/openhwgroup/cv32e40p) core with the `RV32IMFCXpulp` ISA.
2. A digital accelerator named **SOMA** with a 16-by-16 PE array supporting e.g. Conv2D operations at 8-bit precision.
3. An analog/mixed-signal compute-in-memory (CIM) core named **ANIA** with a 1152-by-512 PE array supporting e.g. Conv2D operations on 7-bit inputs and ternary weight values (\[-1,0,1\]).

In this tutorial you will learn how to:
* Write a simple "hello world" program on Diana's RISC-V core
* Step through a program with `gdb` on the RISC-V core.
* Cross-compile your C code for deployment on the RISC-V core with `gcc` from the pulp RISC-V GCC toolchain.
* Perform memory management for the SoC
* Troubleshoot standard errors and common problems

This tutorial assumes some familiarity with:
* C programming
* Basic knowledge of GCC compilation options
* Building C code with GNU `make` and `Makefile`s
* Static and dynamic memory management in C (e.g. `malloc` and `free`)
* A debugger like `gdb`, `lldb`, `pdb` or the debuggers found in common IDEs.

**Note that this notebook runs in an `IPython` interactive shell, but that commands with an exclamation mark (!) in front are actually passed on to the Linux shell (`bash` in this case).**

## Writing a simple "hello world" program

Writing programs on a platform like Diana is quite different from writing a program on your average desktop platform:
* Diana's code is running *bare-metal*, meaning there is *no OS* on the Diana platform. Your program is directly running on the RISC-V core with no operating system in between.
* You cannot attach a screen to Diana, nor can you SSH into Diana. You can only interface with Diana over *UART* or *JTAG* (with the help of `gdb`).
* Diana has very little memory (*only 512 kiB!*) to work with and has no file system.
* Diana has *no hardware caches*, but only *scratchpad memories*, meaning that all data caching has to be controlled by the programmer/compiler.
* Diana's host core works with the quite recent and fully open-source *RISC-V ISA* as opposed to the more traditional x86 or ARM ISAs.

### Writing a simple hello world on x86

Let's write a basic C program to illustrate how to work with this platform.
In this case we'll use a lot of programs which are available in the GNU `binutils` software collection.

In [None]:
IPython.display.HTML(get("helloworld.c"))

In [None]:
IPython.display.HTML(get("Makefile_ex1.x86"))

In [None]:
!make -f Makefile_ex1.x86 clean all

Now you can run the `helloworld` binary:

In [None]:
!./helloworld

### What is wrong with this example?

This binary is not deployable on Diana though! Given the earlier comments, do you know what's wrong?

#### Wrong binary format

Let's inspect what type of binary we just created:

In [None]:
!file helloworld

We've just created a binary file in ELF format.
We can look into the header of the ELF file with `readelf -h helloworld`

In [None]:
!readelf -h helloworld

In this case you can see that the binary is **not built for a `RISC-V`** machine. In fact, the `Machine` entry shows that this is an **`X86-64`** binary.
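
The `Machine` entry printed by `readelf -h` corresponds to the 16-bit `e_machine` field stored at byte offset 18 of the ELF header. As a minimal sketch of what such a check involves, the C function below reads that field from a buffer holding the first bytes of a file. The name `elf_machine` and the two `_VALUE` constants are hypothetical names chosen for this illustration; the numeric values (62 for x86-64, 243 for RISC-V) come from the ELF specification.

```c
#include <stddef.h>
#include <string.h>

/* e_machine values as defined by the ELF specification. */
#define EM_X86_64_VALUE 62
#define EM_RISCV_VALUE  243

/* Return the e_machine field of the ELF header in `hdr`,
 * or -1 if the buffer is too short or does not start with the ELF magic.
 * Hypothetical helper for illustration, not part of binutils. */
int elf_machine(const unsigned char *hdr, size_t len)
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

    if (len < 20 || memcmp(hdr, magic, sizeof magic) != 0) {
        return -1;
    }
    /* Byte 5 (EI_DATA) encodes the byte order: 1 = little, 2 = big endian. */
    if (hdr[5] == 2) {
        return (hdr[18] << 8) | hdr[19];
    }
    return hdr[18] | (hdr[19] << 8);
}
```

For the `helloworld` binary built above, such a function would be expected to return the x86-64 value, whereas a binary that can run on Diana must report the RISC-V value.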

#### No OS support

Even though Diana **doesn't support an OS**, GCC has wrapped our program inside **Linux** startup (`_start`) functions. Furthermore, as the output of `file` already showed, this binary is dynamically linked. Since there's no OS support for dynamic linking on Diana, we'll have to make a statically linked binary instead. The startup code can be seen by *disassembling* the binary at its entry point address (see previous cell) with `objdump`:

In [None]:
!objdump -S --start-address=0x1060 --stop-address=0x1090 helloworld

#### No `printf` support

On x86 we can simply read the output of `printf` from the comfort of the terminal. As stated earlier, this is not possible with Diana.

In fact, with the current lab setup, it is **not possible to read the output of `printf()`** statements at all. To work around this restriction, we will demonstrate how to use `gdb` to print globally defined variables.

Take for example this adapted C file and corresponding makefile:

In [None]:
IPython.display.HTML(get("helloworld_gdb.c"))

In [None]:
IPython.display.HTML(get("Makefile_ex2.x86"))

In [None]:
!make -f Makefile_ex2.x86 clean all

Now we can start the program and inspect it with `gdb`.
To do this we run `gdb -x gdb_script.sh`, where `gdb_script.sh` is a script containing all the commands that `gdb` should execute. For example:

In [None]:
IPython.display.HTML(get("gdb_script.sh"))

This script will do the following:
* `file` : Load in the `helloworld_gdb` binary
* `break`: Insert a breakpoint at the `gdb_anchor` function
* `run`  : Start execution of the program
* `printf` : Print the string `global_string`
* `continue`: Continue execution of the program after it hit the breakpoint
* `quit` : Exit `gdb`'s interactive shell.
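
The C side of this breakpoint pattern can be sketched as follows. This is not the content of `helloworld_gdb.c` (which is displayed above), only an illustration of the idea under stated assumptions: the buffer size of 64 bytes and the helper `report` are hypothetical, while `global_string` and `gdb_anchor` reuse the names that the script refers to.

```c
#include <string.h>

/* Global buffer that gdb reads once the breakpoint is hit.
 * The size of 64 bytes is an arbitrary choice for this sketch. */
char global_string[64];

/* Empty function that only serves as a breakpoint location.
 * noinline plus the empty asm statement keep GCC from removing
 * the call when optimizations are enabled. */
__attribute__((noinline)) void gdb_anchor(void)
{
    __asm__ volatile("");
}

/* Hypothetical helper: copy a message into the global buffer
 * (truncating it if needed), then stop at the anchor. */
void report(const char *msg)
{
    strncpy(global_string, msg, sizeof(global_string) - 1);
    global_string[sizeof(global_string) - 1] = '\0';
    gdb_anchor();
}
```

Because the message lives in a global variable, `gdb` can find it by name at the breakpoint without any support for `printf` on the target.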

In [None]:
!gdb -x gdb_script.sh

#### Binary size is crucial

As stated before, **Diana has very little memory**. Hence, keeping track of your memory usage is very important.

We can inspect how large the sections of our binary are by using `size` or `readelf`.

In [None]:
!size helloworld

Here we can see some of the basic sections that make up an [ELF binary](https://wiki.osdev.org/ELF).
The `.text` section is 1566 bytes large, and the total size is 2174 bytes.
   
   
| Sections making up an ELF binary ([source](https://wiki.osdev.org/ELF))  |
|-------------------------------------------------------------------------------------------------|

| Section | Description |
|---------|-------------|
| `.text` | Where code lives. `objdump -drS .process.o` will show you that. |
| `.data` | Where global tables, variables, etc. live. `objdump -s -j .data .process.o` will hexdump it. |
| `.bss` | Don't look for bits of `.bss` in your file: there are none. That's where your uninitialized arrays and variables are, and the loader 'knows' they should be filled with zeroes ... there's no point storing more zeroes on your disk than there already are, is there? |
| `.rodata` | That's where your strings go, usually the things you forgot when linking and that cause your kernel not to work. `objdump -s -j .rodata .process.o` will hexdump it. Note that depending on the compiler, you may have more sections like this. |
| `.comment` & `.note` | Just comments put there by the compiler/linker toolchain. |
| `.stab` & `.stabstr` | Debugging symbols & similar information. |

A more detailed description can be found by looking at the ELF Section headers:

In [None]:
!readelf helloworld -Wt

### Writing a simple hello world program on Diana

Taking into account all the previous comments about why you cannot just call `gcc` and run the resulting binary on the RI5CY core on Diana, what does it take to run a binary on Diana?

1. We need a custom version of `gcc` that cross-compiles to RISC-V ELF files.
2. We need a version of (part of) the C standard library that allows programs to run on Diana.

These components are provided in different software projects:
1. `pulp-riscv-gnu-toolchain` provides versions of `gcc` and `binutils` for use with PULPissimo-derived platforms (like Diana)
2. `pulp-sdk-diana` provides an adapted version of the pulp runtime for use with Diana.

The easiest way to use these tools is to use the wrappers provided by the PULPissimo developers by setting up a makefile like so:

In [None]:
IPython.display.HTML(get("Makefile_ex3.pulprt"))

------------
*Let's go over what it takes to create such a makefile:*

1. The line at the bottom of the makefile calls the wrapper:
    ```makefile
    include $(PULP_SDK_HOME)/install/rules/pulp.mk
    ```
2. `BUILD_DIR` sets the name of the build directory
3. `PULP_APP` sets the name of the generated binary
4. `PULP_APP_SRCS` lists all C source files used in this build, including the main file. Note that you should always specify absolute paths. This can easily be achieved by using the `$(abspath ...)` function in GNU Make.
5. `PULP_INC_PATHS` should always include a macro to be set to `SDK`
6. `PULP_CFLAGS` controls the compiler flags for `gcc`:
    * `-Wall` enables a broad set of common warnings (despite its name, not all of them)
    * `-pedantic` issues the warnings demanded by strict ISO C
    * `-O0` controls `gcc`'s optimization level. You should always compile with `-O0` for debugging purposes. For optimal performance `-O3` should be used, and for binary size reduction the `-Os` option can be used.
    * `-g` turns on debug symbol generation.
    * `-I` adds a directory to the header search path
    
------------

Now, inside a Jupyter terminal, run the following (there will be a lot of terminal output):
```bash
make -f Makefile_ex3.pulprt clean all 
```
Unless any errors show up, the binary will then be generated at `build/pulpissimo/helloworld_gdb_app/helloworld_gdb_app`.
Also note that the `clean` recipe in the Makefile was automatically generated by inclusion of the PULP-SDK.

We can now inspect the created file:

In [None]:
!file build/pulpissimo/helloworld_gdb_app/helloworld_gdb_app

Hurray! A statically linked RISC-V ELF file!
Let's look at its header and size:

In [None]:
!readelf -h build/pulpissimo/helloworld_gdb_app/helloworld_gdb_app

In [None]:
!size build/pulpissimo/helloworld_gdb_app/helloworld_gdb_app

Cool! And now let's dump the disassembly at the entry point address:

In [None]:
!objdump -S --start-address=0x1c008080 --stop-address=0x1c008086 build/pulpissimo/helloworld_gdb_app/helloworld_gdb_app

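That doesn't work, because the host's `objdump` is built for x86 binaries rather than RISC-V ones. Let's instead directly use the `binutils` version that comes with the `pulp-riscv-gnu-toolchain`: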

In [None]:
!$RISCV/bin/riscv32-unknown-elf-objdump -S --start-address=0x1c008080 --stop-address=0x1c008086 build/pulpissimo/helloworld_gdb_app/helloworld_gdb_app

### Running the "hello world" example on Diana

WIP

## Memory management on Diana

As a platform built on PULPissimo, Diana has a special memory map (this can be found on page 9 of [the PULPissimo datasheet](https://github.com/pulp-platform/pulpissimo/blob/master/doc/datasheet/datasheet.pdf)).

The memory map largely consists of 3 parts:

|Start address | Memory |
|-|-|
|`0x1A00 0000`| Boot ROM (8 kiB)|
|`0x1A10 0000`| Memory-mapped peripherals |
|`0x1C00 0000`| RAM a.k.a. L2 memory (512 kiB)|

Note that the first part (the first 64 kiB) of L2 memory is only available to the RI5CY core (the fc, or _fabric controller_, in PULPissimo terminology). This is called the **private part of L2**. The other 448 kiB are called the **shared part of L2**.

The accelerators on Diana are configured as **HWPEs** (= Hardware Processing Engines) in PULPissimo terminology.
The HWPEs can only access the shared part of L2 for performance reasons.
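
The split described above can be turned into a small address check. The sketch below assumes exactly the layout stated in this section (L2 starts at `0x1C00 0000`, is 512 kiB large, and its first 64 kiB are private); the `DIANA_` constants and the function `in_shared_l2` are hypothetical names for this illustration and are not provided by the PULP-SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout taken from the memory map described in this section. */
#define DIANA_L2_BASE         0x1C000000u
#define DIANA_L2_SIZE         (512u * 1024u)
#define DIANA_L2_PRIVATE_SIZE (64u * 1024u)
#define DIANA_L2_SHARED_BASE  (DIANA_L2_BASE + DIANA_L2_PRIVATE_SIZE)
#define DIANA_L2_END          (DIANA_L2_BASE + DIANA_L2_SIZE) /* one past the last byte */

/* Hypothetical helper: true if the byte range [addr, addr + len)
 * lies entirely inside the shared part of L2. */
bool in_shared_l2(uint32_t addr, uint32_t len)
{
    if (len == 0 || addr < DIANA_L2_SHARED_BASE || addr >= DIANA_L2_END) {
        return false;
    }
    /* Written as a subtraction so that addr + len cannot overflow. */
    return len <= DIANA_L2_END - addr;
}
```

On the target, a check such as `in_shared_l2((uint32_t)(uintptr_t)buffer, sizeof(buffer))` before handing a buffer to an accelerator could catch a misplaced allocation early.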


|***VERY IMPORTANT***|
|--------------------|
|**This means that you cannot send data to Diana's accelerators if the data is allocated on the private part of L2. Please be very mindful of this, since you won't be warned about it (the code compiles just fine), and it can lead to bugs which are very difficult to debug.**|


This is all fine, but how do you get these values to be placed in the shared part of your memory in the first place?

### Statically allocated data on shared L2

Static data allocation is the job of the linker. Say you have a global variable that is used inside a function, like so:

```c
#include <stdlib.h>  // for abort()

int bias[] = {32, 15, 88, 99};

int mult_add_bias(int a, int b, int bias_index){
    // Avoid an out-of-bounds read here
    if (bias_index > 3 || bias_index < 0){
        abort();
    }
    int result = a * b + bias[bias_index];
    return result;
}
```
You'd like to make sure that `bias` is in the shared part of your memory.
You can tell the linker to place it there with the `L2_DATA` macro provided by the PULP-SDK like so:

```c
#include <stdlib.h>  // for abort()
#include <pulp.h>

L2_DATA int bias[] = {32, 15, 88, 99};

int mult_add_bias(int a, int b, int bias_index){
    // Avoid an out-of-bounds read here
    if (bias_index > 3 || bias_index < 0){
        abort();
    }
    int result = a * b + bias[bias_index];
    return result;
}
```
_Note that at this point, your code is no longer portable to x86: `pulp.h` and `L2_DATA` are only available through the PULP-SDK, so you cannot compile it with a regular makefile anymore._
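
If you still want to compile such a file on your desktop, for example to test the arithmetic, one possible workaround is to fall back to an empty `L2_DATA` macro whenever the compiler does not target RISC-V. This is an assumption of this sketch and not something the PULP-SDK provides; it relies on the `__riscv` macro that RISC-V versions of GCC predefine.

```c
#include <stdlib.h>  /* for abort() */

#ifdef __riscv
#include <pulp.h>    /* provides L2_DATA when building with the PULP-SDK */
#else
#define L2_DATA      /* host build: expands to nothing (assumed fallback) */
#endif

L2_DATA int bias[] = {32, 15, 88, 99};

int mult_add_bias(int a, int b, int bias_index)
{
    /* Avoid an out-of-bounds read. */
    if (bias_index < 0 || bias_index > 3) {
        abort();
    }
    return a * b + bias[bias_index];
}
```

On the host the placement information is simply dropped, so this only helps to check the logic. Whether `bias` really ends up in shared L2 still has to be verified on the RISC-V build.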

### Dynamically allocated data on shared L2



