# Lab 3: Design-Space Exploration with MPARM

Björn Hvass, Cyril Barrelet

March 14, 2019

Course: TDTS07. LiU IDs: bjohv276 (Hvass), cyrba593 (Barrelet)

## 1 Design-Space Exploration for Energy Minimization

In this section, different parameters, such as the processor frequency and the cache size, are tested to find the values that minimize energy consumption.

Here is the test table:

| Test | Frequency (MHz) | Divider | Energy spent (µJ)      | Exec time (ms) | Cache size (bytes)    | Cache type        |
|------|-----------------|---------|------------------------|----------------|-----------------------|-------------------|
| 1    | 200             | 1       | 45.03                  | 3.42           | 4096                  | Fully associative |
| 2    | 100             | 2       | 34.37                  | 3.42           | 4096                  | Fully associative |
| 3    | 67              | 3       | 33.53                  | 4.99           | 4096                  | Fully associative |
| 4    | 50              | 4       | 33.43                  | 6.35           | 4096                  | Fully associative |
| 5    | 25              | 8       | 33.35                  | 12.39          | 4096                  | Fully associative |
| 6    | 200             | 1       | 45.64                  | 1.69           | 8192                  | Fully associative |
| 7    | 100             | 2       | 34.97                  | 3.42           | 8192                  | Fully associative |
| 8    | 200             | 1       | 44.76                  | 1.69           | 2048                  | Fully associative |
| 9    | 100             | 2       | 34.09                  | 3.42           | 2048                  | Fully associative |

Regarding energy consumption, the cache size has little impact at a given frequency: at 200 MHz, the energy only varies between 44.76 µJ and 45.64 µJ across the three cache sizes. On the other hand, when the divider is increased to lower the frequency, the energy consumption decreases as well and levels off at around $33.5\,\mu\mathrm{J}$. However, the execution time increases accordingly.

According to the test table, the parameter set that gives the best balance between energy consumption and execution time is a divider of 2 (100 MHz) with a 2048-byte cache (test 9: 34.09 µJ, 3.42 ms).
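To make this trade-off explicit, the table can also be filtered programmatically. The Python sketch below copies the nine measurements, keeps the Pareto-optimal runs (those that no other run beats on both energy and execution time) and then picks the lowest-energy run that meets an execution-time deadline. The 3.5 ms deadline is a hypothetical constraint chosen for illustration; the lab does not specify one, and the names `Run`, `pareto_front` and `best_under_deadline` are likewise only illustrative.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    test: int
    energy_uj: float
    time_ms: float


# Values copied from the test table above (tests 1-9).
RUNS: List[Run] = [
    Run(1, 45.03, 3.42), Run(2, 34.37, 3.42), Run(3, 33.53, 4.99),
    Run(4, 33.43, 6.35), Run(5, 33.35, 12.39), Run(6, 45.64, 1.69),
    Run(7, 34.97, 3.42), Run(8, 44.76, 1.69), Run(9, 34.09, 3.42),
]


def dominates(a: Run, b: Run) -> bool:
    """a is at least as good as b on both metrics and strictly better on one."""
    return (a.energy_uj <= b.energy_uj and a.time_ms <= b.time_ms
            and (a.energy_uj < b.energy_uj or a.time_ms < b.time_ms))


def pareto_front(runs: List[Run]) -> List[Run]:
    """Runs that are not dominated by any other run."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


def best_under_deadline(runs: List[Run], deadline_ms: float) -> Run:
    """Lowest-energy run whose execution time meets the (assumed) deadline."""
    feasible = [r for r in runs if r.time_ms <= deadline_ms]
    return min(feasible, key=lambda r: r.energy_uj)


print([r.test for r in pareto_front(RUNS)])  # [3, 4, 5, 8, 9]
print(best_under_deadline(RUNS, 3.5).test)   # 9
```

Under the assumed 3.5 ms deadline the sketch selects test 9, which matches the choice above; a deadline tighter than 3.42 ms would instead favour test 8 (44.76 µJ, 1.69 ms).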

## 2 Shared Memory vs. Message Passing

This section compares the efficiency of shared memory and distributed message passing. Two different GSM voice codec implementations, one communicating through message queues and one through shared memory, were simulated in MPARM, and the results are summarized in the table below. Several tests were run to assess how the frequency of each processor affects the system. Reducing the frequency of all three processors with a divider of 2 (test 6) considerably reduces both the bus traffic and the bus busy time: compared with the default 200 MHz settings (tests 1 and 2), bus busy time drops from 56.25 % and 44.91 % to 23.57 % of master cycles, and the energy falls to 589 µJ from 605 µJ and 895 µJ. This improves efficiency and energy consumption; the trade-off is a longer execution time (26.23 ms versus 10 ms and 14 ms).

| Test | p0 frequency (MHz) | p1 frequency (MHz) | p2 frequency (MHz) | Communication | Bus busy (% of master cycles) | Bus transferring (% of master cycles) | Energy spent (µJ) | Exec time (ms) |
|------|--------------------|--------------------|--------------------|---------------|-------------------------------|---------------------------------------|-------------------|----------------|
| 1    | 200                | 200                | 200                | Queue         | 56.25                         | 20.63                                 | 605               | 10             |
| 2    | 200                | 200                | 200                | Shared        | 44.91                         | 17.2                                  | 895               | 14             |
| 3    | 100                | 200                | 200                | Shared        | 37.39                         | 13.31                                 | 910               | 18.85          |
| 4    | 200                | 100                | 200                | Shared        | 39.43                         | 14.22                                 | 769               | 14.09          |
| 5    | 200                | 200                | 100                | Shared        | 36.61                         | 13                                    | 999               | 21.5           |
| 6    | 100                | 100                | 100                | Shared        | 23.57                         | 8.25                                  | 589               | 26.23          |
| 7    | 67                 | 67                 | 67                 | Shared        | 16.54                         | 5.77                                  | 578               | 37.6           |
| 8    | 67                 | 100                | 100                | Shared        | 20.39                         | 7.13                                  | 614               | 31.5           |
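The size of the improvement from lowering the frequency can be quantified directly from the table. The Python sketch below computes the relative change of each metric from test 2 (shared memory, all processors at 200 MHz) to test 6 (all processors at 100 MHz); the dictionary keys and the `relative_change` helper are hypothetical names chosen for this example.

```python
# Metrics copied from tests 2 and 6 of the table (shared memory, 200 MHz vs 100 MHz).
TEST2 = {"bus_busy_pct": 44.91, "bus_transfer_pct": 17.2, "energy_uj": 895, "exec_ms": 14}
TEST6 = {"bus_busy_pct": 23.57, "bus_transfer_pct": 8.25, "energy_uj": 589, "exec_ms": 26.23}


def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


changes = {metric: relative_change(TEST2[metric], TEST6[metric]) for metric in TEST2}
for metric, pct in changes.items():
    print(f"{metric:>16}: {pct:+.1f} %")
# bus_busy_pct: -47.5 %, bus_transfer_pct: -52.0 %, energy_uj: -34.2 %, exec_ms: +87.4 %
```

Bus activity and energy drop by 34 to 52 %, while the execution time grows by about 87 %.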

## 3 Mapping/Scheduling Exercise

This exercise aims to reduce the execution time of an application by changing the scheduling and the mapping of its tasks.

### 3.1 New schedule

Taking the data dependencies into account, this schedule was constructed to have the minimal length.



Figure 1: New schedule

Keeping the mapping fixed and changing only the schedule, the goal is to find the task order that avoids wasted (idle) time. Task T3 must complete before T5, the longest task, can start. Since T5 does not depend on T2 or T4, T3 should be scheduled before T2 so that T5 can start as early as possible instead of waiting for T2 to finish.
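The effect of moving T3 ahead of T2 can be illustrated with a small simulation of a fixed, non-preemptive execution order. This is a sketch under hypothetical assumptions: the task durations, the two-processor mapping (P0 runs T1, T2 and T3; P1 runs T4 and T5) and the dependency set (T1 before T3, T3 before T5) are invented for illustration rather than taken from the lab, and communication delays are ignored.

```python
from typing import Dict, List

# Hypothetical task durations (time units) and dependencies; not the lab's real values.
DURATIONS: Dict[str, int] = {"T1": 2, "T2": 3, "T3": 2, "T4": 1, "T5": 6}
DEPS: Dict[str, List[str]] = {"T3": ["T1"], "T5": ["T3"]}  # T1 -> T3 -> T5

# Same (hypothetical) mapping, two execution orders on processor P0.
ORDER_T2_FIRST = {"P0": ["T1", "T2", "T3"], "P1": ["T4", "T5"]}
ORDER_T3_FIRST = {"P0": ["T1", "T3", "T2"], "P1": ["T4", "T5"]}


def makespan(durations: Dict[str, int], deps: Dict[str, List[str]],
             order: Dict[str, List[str]]) -> int:
    """Simulate a fixed per-processor order (non-preemptive, zero communication cost)."""
    finish: Dict[str, int] = {}          # finish time of each scheduled task
    free_at = {p: 0 for p in order}      # time at which each processor becomes idle
    nxt = {p: 0 for p in order}          # index of the next task in each queue
    remaining = sum(len(q) for q in order.values())
    while remaining:
        progressed = False
        for p, queue in order.items():
            if nxt[p] == len(queue):
                continue
            task = queue[nxt[p]]
            preds = deps.get(task, [])
            if all(d in finish for d in preds):
                # Start once the processor is free and all inputs are available.
                start = max([free_at[p]] + [finish[d] for d in preds])
                finish[task] = start + durations[task]
                free_at[p] = finish[task]
                nxt[p] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("execution order violates the dependencies")
    return max(finish.values())


print(makespan(DURATIONS, DEPS, ORDER_T2_FIRST))  # 13
print(makespan(DURATIONS, DEPS, ORDER_T3_FIRST))  # 10
```

With these made-up numbers, scheduling T3 first lets T5 start at time 4 instead of 7, which shortens the schedule from 13 to 10 time units; T2 simply fills the slot on P0 while T5 runs on P1.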

### 3.2 New mapping

This new schedule has been constructed by changing the mapping of tasks to processors.



Figure 2: New mapping

By changing the mapping, the goal is to pack the tasks as tightly as possible. Tasks T1, T3, and T5 form a dependency chain and must execute in that order, so they could never run in parallel anyway; a single processor can therefore run them back to back without waiting for another processor. On the other hand, tasks T2 and T4 can run on another processor in parallel with T1, T3, and T5.
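Why this mapping is a good one can be checked against the critical path: no schedule can finish before the longest dependency chain does. The Python sketch below computes that lower bound and the makespan of the chain-on-one-processor mapping. The durations and dependencies are hypothetical values invented for illustration (the lab's actual task times are not reproduced here), and the helper names are likewise illustrative.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical durations and dependencies (T1 -> T3 -> T5); T2 and T4 are independent.
DURATIONS: Dict[str, int] = {"T1": 2, "T2": 3, "T3": 2, "T4": 1, "T5": 6}
DEPS: Dict[str, List[str]] = {"T3": ["T1"], "T5": ["T3"]}

# New mapping: the dependent chain on one processor, the independent tasks on the other.
MAPPING: Dict[str, List[str]] = {"P0": ["T1", "T3", "T5"], "P1": ["T2", "T4"]}


def critical_path(durations: Dict[str, int], deps: Dict[str, List[str]]) -> int:
    """Length of the longest dependency chain: a lower bound on any schedule."""
    @lru_cache(maxsize=None)
    def earliest_finish(task: str) -> int:
        return durations[task] + max((earliest_finish(d) for d in deps.get(task, [])), default=0)
    return max(earliest_finish(t) for t in durations)


def preds_are_local(deps: Dict[str, List[str]], mapping: Dict[str, List[str]]) -> bool:
    """True if every predecessor of a task runs earlier on the same processor."""
    return all(d in tasks[:i]
               for tasks in mapping.values()
               for i, t in enumerate(tasks)
               for d in deps.get(t, []))


def mapped_makespan(durations: Dict[str, int], deps: Dict[str, List[str]],
                    mapping: Dict[str, List[str]]) -> int:
    """Makespan when no task waits on another processor: each processor runs back to back."""
    if not preds_are_local(deps, mapping):
        raise ValueError("mapping has cross-processor dependencies; simulate it instead")
    return max(sum(durations[t] for t in tasks) for tasks in mapping.values())


print(critical_path(DURATIONS, DEPS))              # 10
print(mapped_makespan(DURATIONS, DEPS, MAPPING))   # 10
```

With these numbers the mapping reaches the critical-path bound, so no other schedule could be shorter; the two independent tasks fit entirely within the time the chain occupies P0.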