

# **LAB06: Tiling on PULP**

- Lorenzo Lamberti – lorenzo.lamberti@unibo.it
- Luka Macan – luka.macan@unibo.it
- Francesco Conti – f.conti@unibo.it

# **Objective of the Class**

**Intro:** Tiling

Tasks:

- 2D convolution in L1
- 2D convolution in L2
- Layer Tiling

**Programming Language:** C

**Lab duration**: 3h

**Assignment:** 

Time for delivery: 1 week

Submission deadline: Nov 23rd 2022 (16:00)

The class is meant to be interactive: coding together and on your own!



# How to deliver the Assignment

#### You will deliver ONLY the GDOC assignment, no code

- Copy the Google Doc to your drive, so that you can modify it. (File -> Make a copy)
- Complete the tasks in this Google Doc.
- Export to PDF format.
- Rename the file to: LAB<number\_of\_the\_lesson>\_APAI\_<your\_name>.pdf
- Upload ONLY your .pdf file to the Virtuale platform.





#### **SETUP:** How to access the server

- Open this web page: <a href="https://compute.eees.dei.unibo.it:8443/guacamole/">https://compute.eees.dei.unibo.it:8443/guacamole/</a> (works only from the ALMA WIFI network!)
- Log in. Credentials are distributed by hand.
- Open a terminal (right-click, then open a new terminal)
- Open a text editor (for example "VSCode"): \$ code .
   Now you can use the integrated terminal to run your applications!

IMPORTANT: activate the pulp-sdk module file <u>every</u> time a new shell is opened.

\$ module load pulp-sdk
\$ module load dory-conda









#### **SETUP:** How to access the server

- 1. Open this web page: <a href="https://compute.eees.dei.unibo.it:8443/guacamole/">https://compute.eees.dei.unibo.it:8443/guacamole/</a> (works only from the ALMA WIFI network!)
- 2. Log in. Credentials are distributed by hand.
- 3. Open a terminal (right-click, then open a new terminal)
- 4. Clone the repository:
   git clone https://github.com/EEESlab/<insert\_here\_the\_right\_repo!>
- 5. module load pulp-sdk
- 6. cd <insert\_here\_the\_right\_repo!>
- 7. make clean all run







# INTRO



# TASK1: fit in L1

# Case study: 1x1 conv2D

We tackle a 1x1 convolution with these sizes:

- Input = SPATIAL\_DIM → defined by you
- Output = SPATIAL\_DIM → defined by you
- Kernel = 1x1
- Stride = 1
- Padding = 0

NB: with a 1x1 convolution (stride 1, no padding), the input and output have the same spatial size!
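The note above follows from the standard output-size relation for a convolution. A minimal sketch in C, where the function name `conv_out_dim` is hypothetical and not part of the lab code:

```c
#include <assert.h>

/* Output spatial size of a convolution along one dimension:
 * out = (in + 2 * padding - kernel) / stride + 1 (integer division). */
int conv_out_dim(int in, int kernel, int stride, int padding)
{
    assert(in > 0 && kernel > 0 && stride > 0 && padding >= 0);
    assert(in + 2 * padding >= kernel);
    return (in + 2 * padding - kernel) / stride + 1;
}
```

With kernel = 1, stride = 1, and padding = 0 the expression reduces to `in`, so a call such as `conv_out_dim(16, 1, 1, 0)` returns 16, while a 3x3 kernel with the same settings returns 14.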

We want the layer (input, weights, and output) to fit into the L1 memory!
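To reason about whether a layer fits, it can help to estimate its memory footprint first. The sketch below assumes one byte per element (int8 data) and a square input; the function names and the idea of passing the L1 budget as a parameter are illustrative assumptions, not taken from the lab code. Check the datasheet for the actual L1 size.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold a whole 1x1 conv layer (input, weights, output),
 * assuming one byte per element (int8 data, an assumption). */
size_t conv1x1_footprint_bytes(size_t spatial_dim, size_t ch_in, size_t ch_out)
{
    size_t pixels  = spatial_dim * spatial_dim;
    size_t input   = pixels * ch_in;
    size_t weights = ch_in * ch_out;   /* 1x1 kernel: one weight per channel pair */
    size_t output  = pixels * ch_out;  /* same spatial size as the input */
    return input + weights + output;
}

/* Returns 1 if the layer fits in the given L1 budget, 0 otherwise. */
int fits_in_l1(size_t spatial_dim, size_t ch_in, size_t ch_out,
               size_t l1_budget_bytes)
{
    assert(l1_budget_bytes > 0);
    return conv1x1_footprint_bytes(spatial_dim, ch_in, ch_out) <= l1_budget_bytes;
}
```

For example, a 16x16 layer with 32 input and 32 output channels needs 8192 + 1024 + 8192 = 17408 bytes under these assumptions. In practice the usable budget is smaller than the physical L1 size, since the stack and other buffers also live there.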





#### 1x1 Convolution

Used today!



#### 3x3 Convolution

Used in lab04!



**LAB APAI 22/23** 


# PULP Platform: today we focus on the <u>8-cores cluster</u>



**GitHub** HW Project: <a href="https://github.com/pulp-platform/pulp">https://github.com/pulp-platform/pulp</a>

**HW Documentation:** 

https://raw.githubusercontent.com/pulp-platform/pulp/master/doc/datasheet.pdf

- Cores: 1 + 8
- On-chip Memories
  - A level 2 Memory, shared among all cores
  - A level 1 Memory, shared by the 8-cores cluster
- cluster-DMA: A multi-channel 1D/2D DMA, controlling the transactions between the L2 and L1 memories
- micro-DMA: A smart, lightweight, and completely autonomous DMA capable of handling complex I/O schemes
  - **Bus+Peripherals:** HyperBus, I2S, CPI, timers, SPI, GPIOs, etc...

NB: this is the architecture you find on the nano-drone!






# **Convolution Operation: naive**



Credits: Daniele Palossi, Lorenzo Lamberti
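For the 1x1 case study of this lab, the naive convolution reduces to one dot product over the input channels for every pixel and every output channel. The sketch below is an illustration only: the HWC data layout, the int8/int32 types, and the function name `conv1x1_naive` are assumptions, not the lab's actual kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Naive 1x1 convolution, stride 1, no padding.
 * Layouts (assumed): input HWC [dim*dim][ch_in], weights [ch_out][ch_in],
 * output HWC [dim*dim][ch_out]. */
void conv1x1_naive(const int8_t *in, const int8_t *w, int32_t *out,
                   int dim, int ch_in, int ch_out)
{
    assert(in && w && out && dim > 0 && ch_in > 0 && ch_out > 0);
    for (int p = 0; p < dim * dim; p++) {        /* every pixel */
        for (int co = 0; co < ch_out; co++) {    /* every output channel */
            int32_t acc = 0;
            for (int ci = 0; ci < ch_in; ci++)   /* dot product over input channels */
                acc += (int32_t)in[p * ch_in + ci] * (int32_t)w[co * ch_in + ci];
            out[p * ch_out + co] = acc;
        }
    }
}
```

A kernel larger than 1x1 would add two more inner loops over the kernel rows and columns.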

# **Convolution Operation: im2col and MatMul**
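The im2col approach rearranges each receptive field into one row of a matrix, so that the convolution becomes a matrix multiplication. The following is a minimal sketch assuming stride 1, no padding, a square HWC input, and int8 data; the names `im2col` and `matmul` and their signatures are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* im2col: for each output pixel, copy its k x k x ch receptive field into
 * one row of `col`. Stride 1, no padding, HWC input layout (assumptions).
 * col has out_dim*out_dim rows of k*k*ch elements, out_dim = dim - k + 1. */
void im2col(const int8_t *in, int8_t *col, int dim, int ch, int k)
{
    assert(in && col && k > 0 && dim >= k && ch > 0);
    int out_dim = dim - k + 1;
    int idx = 0;
    for (int oy = 0; oy < out_dim; oy++)
        for (int ox = 0; ox < out_dim; ox++)
            for (int ky = 0; ky < k; ky++)
                for (int kx = 0; kx < k; kx++)
                    for (int c = 0; c < ch; c++)
                        col[idx++] = in[((oy + ky) * dim + (ox + kx)) * ch + c];
}

/* out[rows][n] = col[rows][len] x w[n][len]^T: one dot product per
 * (output pixel, output channel) pair. */
void matmul(const int8_t *col, const int8_t *w, int32_t *out,
            int rows, int len, int n)
{
    assert(col && w && out && rows > 0 && len > 0 && n > 0);
    for (int r = 0; r < rows; r++)
        for (int j = 0; j < n; j++) {
            int32_t acc = 0;
            for (int i = 0; i < len; i++)
                acc += (int32_t)col[r * len + i] * (int32_t)w[j * len + i];
            out[r * n + j] = acc;
        }
}
```

With k = 1 and this layout, `im2col` produces an exact copy of the input, so for a 1x1 convolution the matrix multiplication can run directly on the input buffer.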






# EX1: find maximum dimensions of layers fitting L1 without tiling

#### Prerequisites:

\$ module load pulp-sdk
\$ module load dory-conda

#### Run the code:

\$ python3 parameters\_generate.py --channels=<add\_here> --spatial\_dimension=<add\_here>
\$ make clean all run

#### Follow the assignment document.

**NB:** Choose the exercise by uncommenting one of the following defines in main.h:

```
#define EXERCISE1
// #define EXERCISE2
// #define EXERCISE3
```
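Before trying sizes by hand, a rough upper bound for the largest layer that fits can be computed offline. The sketch below assumes equal input and output channels, one byte per element, and a caller-supplied L1 budget; the function name `max_spatial_dim` is hypothetical, and the real limit will be lower because L1 also holds the stack and other buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Largest square spatial dimension such that input + weights + output of a
 * 1x1 conv (ch_in == ch_out == channels, 1 byte per element: assumptions)
 * fit in l1_budget_bytes. Returns 0 if not even a 1x1 image fits. */
size_t max_spatial_dim(size_t channels, size_t l1_budget_bytes)
{
    assert(channels > 0);
    size_t weights = channels * channels;
    size_t dim = 0;
    /* footprint(dim) = 2 * dim^2 * channels + channels^2 */
    while (2 * (dim + 1) * (dim + 1) * channels + weights <= l1_budget_bytes)
        dim++;
    return dim;
}
```

For instance, with 32 channels and an illustrative 64 KiB budget this returns 31. Treat the result as a starting point for the experiments in the assignment, not as the answer.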





# TASK2: fetch from L2

#### EX2: fetch data from L2

#### Run the code:

- \$ python3 parameters\_generate.py --channels=<add\_here> --spatial\_dimension=<add\_here>
- \$ make clean all run

Follow the assignment document.

**NB:** Choose the exercise by uncommenting one of the following defines in main.h:

```
#define EXERCISE1
// #define EXERCISE2
// #define EXERCISE3
```
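Conceptually, fetching from L2 means that the data lives in the larger L2 memory and is copied into an L1 buffer before the kernel runs on it. The sketch below is a host-side model only: `memcpy` stands in for the cluster DMA transfer, and `L1_BUFFER_BYTES`, `l1_buffer`, and `fetch_to_l1` are hypothetical names, not part of the pulp-sdk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define L1_BUFFER_BYTES 1024  /* hypothetical L1 scratch size for this sketch */

static int8_t l1_buffer[L1_BUFFER_BYTES];  /* stands in for cluster L1 memory */

/* Host-side stand-in for an L2 -> L1 transfer: returns a pointer to the L1
 * copy, or NULL if the data does not fit in the L1 buffer.
 * On PULP the copy would be performed by the cluster DMA instead of memcpy. */
const int8_t *fetch_to_l1(const int8_t *l2_src, size_t bytes)
{
    assert(l2_src != NULL);
    if (bytes > L1_BUFFER_BYTES)
        return NULL;
    memcpy(l1_buffer, l2_src, bytes);
    return l1_buffer;
}
```

The NULL return marks the case where the data is too large for a single transfer into L1.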






# TASK3: Tiling

# Tiling from L2 to L1
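The tiling idea can be sketched as a loop that brings one tile of the input into L1, computes on it, and writes the result back to L2. The code below is a host-side model under several assumptions: only the spatial dimension is tiled, the weights are fetched once and stay resident, `memcpy` stands in for the DMA transfers, and `TILE_PIXELS`, `MAX_CH`, `conv1x1`, and `conv1x1_tiled` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TILE_PIXELS 4   /* hypothetical tile size, in pixels */
#define MAX_CH      8   /* hypothetical upper bound on channels */

/* 1x1 conv on `pixels` pixels; HWC layout, weights [ch_out][ch_in]. */
void conv1x1(const int8_t *in, const int8_t *w, int32_t *out,
             int pixels, int ch_in, int ch_out)
{
    for (int p = 0; p < pixels; p++)
        for (int co = 0; co < ch_out; co++) {
            int32_t acc = 0;
            for (int ci = 0; ci < ch_in; ci++)
                acc += (int32_t)in[p * ch_in + ci] * (int32_t)w[co * ch_in + ci];
            out[p * ch_out + co] = acc;
        }
}

/* Tiled version: process up to TILE_PIXELS pixels at a time through small
 * buffers that stand in for L1; memcpy stands in for the DMA transfers. */
void conv1x1_tiled(const int8_t *l2_in, const int8_t *l2_w, int32_t *l2_out,
                   int pixels, int ch_in, int ch_out)
{
    static int8_t  l1_in[TILE_PIXELS * MAX_CH];
    static int8_t  l1_w[MAX_CH * MAX_CH];
    static int32_t l1_out[TILE_PIXELS * MAX_CH];

    assert(ch_in > 0 && ch_in <= MAX_CH && ch_out > 0 && ch_out <= MAX_CH);
    memcpy(l1_w, l2_w, (size_t)(ch_in * ch_out));  /* weights: fetched once */

    for (int start = 0; start < pixels; start += TILE_PIXELS) {
        /* The last tile may be smaller than TILE_PIXELS. */
        int n = pixels - start < TILE_PIXELS ? pixels - start : TILE_PIXELS;
        memcpy(l1_in, l2_in + start * ch_in, (size_t)(n * ch_in));  /* L2 -> L1 */
        conv1x1(l1_in, l1_w, l1_out, n, ch_in, ch_out);             /* compute in L1 */
        memcpy(l2_out + start * ch_out, l1_out,
               (size_t)(n * ch_out) * sizeof(int32_t));             /* L1 -> L2 */
    }
}
```

Because a 1x1 convolution has no overlap between neighbouring pixels, the tiled and untiled versions produce identical outputs. Larger kernels would need overlapping input tiles, and tiling over channels would need partial sums or separate weight tiles.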
















# **EX3: Tiling layer**

#### Run the code:

- \$ python3 parameters\_generate.py --channels=<add\_here> --spatial\_dimension=<add\_here>
- \$ make clean all run

Follow the assignment document.

**NB:** Choose the exercise by uncommenting one of the following defines in main.h:

```
#define EXERCISE1
// #define EXERCISE2
// #define EXERCISE3
```
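When choosing tile sizes for this exercise, a quick estimate of how many pixels fit in one tile can be useful. The sketch below assumes spatial-only tiling with all weights resident in L1, one byte per element, and no double buffering; the function names and the budget parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Largest number of pixels per tile such that input tile + output tile +
 * all weights fit in l1_budget_bytes (1 byte per element, assumed).
 * Returns 0 if the weights leave no room for any pixel data. */
size_t max_tile_pixels(size_t ch_in, size_t ch_out, size_t l1_budget_bytes)
{
    assert(ch_in > 0 && ch_out > 0);
    size_t weights = ch_in * ch_out;
    if (weights >= l1_budget_bytes)
        return 0;
    /* Each pixel costs ch_in input bytes plus ch_out output bytes. */
    return (l1_budget_bytes - weights) / (ch_in + ch_out);
}

/* Number of tiles needed to cover total_pixels, rounding up. */
size_t num_tiles(size_t total_pixels, size_t tile_pixels)
{
    assert(tile_pixels > 0);
    return (total_pixels + tile_pixels - 1) / tile_pixels;
}
```

With 32 input and output channels and an illustrative 64 KiB budget, a tile holds at most 1008 pixels, so a 64x64 layer (4096 pixels) needs 5 tiles. Double buffering, which overlaps transfers with computation, would roughly halve the pixels available per tile.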





DEI – Università di Bologna