# Memory Hierarchy

## 2 Procedure

### 2.1 Understanding the Program
Without delving into the details of the signal processing application, analyze the flow of the C program.
Observe the data access patterns and identify the critical sequence of accesses which may have a larger
impact on the performance of the system.

```
A: The accesses made while processing the macroblocks (L241 to L281).
```

### 2.3 Cache L1

#### 2.3.1 Theory of Cache

1. Explain the different types of cache misses: compulsory, capacity, and conflict.



```
Compulsory: Unavoidable misses on contents that have never been loaded into the cache, i.e. the first reference to each block during an execution (cold start).
Capacity: Occur when a given cache level cannot hold all the data that an execution needs, so blocks have to be replaced.
Conflict: Occur in direct-mapped or set-associative caches when an execution needs to store a block in a location where another block already resides, so the resident block has to be replaced.
```
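The three categories above can be told apart mechanically. The following is a minimal sketch, not the method used by any particular simulator: it assumes a direct-mapped cache, treats a miss as compulsory on the first reference to a block, as capacity if a fully associative LRU cache of the same size would also miss, and as conflict otherwise. The function name `classify_misses` and its parameters are illustrative.

```python
from collections import OrderedDict


def classify_misses(block_addresses, num_blocks):
    """Classify the misses of a direct-mapped cache into the 3C categories.

    block_addresses: sequence of block numbers (byte address // block size).
    num_blocks: cache capacity, in blocks.
    """
    seen = set()          # blocks referenced at least once
    direct = {}           # cache index -> resident block (direct-mapped cache)
    lru = OrderedDict()   # fully associative LRU cache with the same capacity
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_addresses:
        # Reference the fully associative cache (used only as a yardstick).
        fa_hit = block in lru
        if fa_hit:
            lru.move_to_end(block)
        else:
            lru[block] = True
            if len(lru) > num_blocks:
                lru.popitem(last=False)  # evict the least recently used block

        # Reference the direct-mapped cache.
        index = block % num_blocks
        dm_hit = direct.get(index) == block
        direct[index] = block

        if not dm_hit:
            if block not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(block)

    return counts


# Blocks 0 and 4 share index 0 in a 4-block cache and keep evicting each other.
print(classify_misses([0, 4, 0, 4], num_blocks=4))
```

In the usage line, the two first references are compulsory misses and the two re-references are conflict misses, since a fully associative cache of four blocks would have kept both blocks.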

2. Explain the different types of cache writing-policies.
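As a rough illustration of how write-through and write-back differ in the traffic they send to main memory, the sketch below counts memory writes for a direct-mapped, write-allocate cache. It is a simplified model under stated assumptions: write-through sends every store to memory, while write-back only writes a dirty block when it is evicted (plus a final flush). The function name `count_memory_writes` and the trace format are illustrative, and the two counts are in different units (words versus blocks).

```python
def count_memory_writes(accesses, num_blocks, write_back):
    """Count the writes that reach main memory.

    accesses: sequence of (op, block) pairs, with op "r" (read) or "w" (write).
    num_blocks: capacity of the direct-mapped cache, in blocks.
    write_back: True for write-back, False for write-through.
    """
    resident = {}  # cache index -> (block, dirty flag)
    writes = 0

    for op, block in accesses:
        index = block % num_blocks
        tag, dirty = resident.get(index, (None, False))

        if tag != block:
            # Miss: evict the resident block, writing it back if it is dirty.
            if write_back and dirty:
                writes += 1
            tag, dirty = block, False  # allocate the new block (write-allocate)

        if op == "w":
            if write_back:
                dirty = True   # defer the memory update until eviction
            else:
                writes += 1    # write-through: update memory immediately

        resident[index] = (tag, dirty)

    if write_back:
        # Flush the blocks that are still dirty at the end of the trace.
        writes += sum(1 for _, dirty in resident.values() if dirty)
    return writes


trace = [("w", 0), ("w", 0), ("w", 0), ("r", 4), ("w", 1)]
print(count_memory_writes(trace, num_blocks=4, write_back=False))
print(count_memory_writes(trace, num_blocks=4, write_back=True))
```

Repeated stores to the same block are what separate the two policies: write-through pays for each store, write-back pays once per dirty block.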


#### 2.3.2 Cache L1: dimension and block size

a) Consider a memory hierarchy composed of a single cache memory (L1), which interconnects the
SDRAM frame memory and the CPU.\
Considering the characteristics of the available memory devices (see Table 1), and the maximum total cost of the memory hierarchy, determine the maximum
storage space of cache L1.
- NOTES:
    - the size of any of the memory modules (frame buffer, any cache) must be an integer power of 2:
        - L1_size = $2^{MAX}$;
    - do not forget to consider the cost of the 128 kByte frame memory.

In [24]:
from math import pow

BUDGET = 0.02                 # maximum total cost of the memory hierarchy
PRICE_PER_MBYTE_L1 = 10
PRICE_PER_MBYTE_SDRAM = 0.01
SIZE_SDRAM = pow(2,17)        # 128 kByte frame memory

def calculate_price(size_in_bytes, price_per_Mbyte):
    """Return the price of a memory with size_in_bytes bytes at price_per_Mbyte."""
    price_per_byte = price_per_Mbyte / pow(2,20)

    return price_per_byte * size_in_bytes


SDRAM_PRICE = calculate_price(SIZE_SDRAM,PRICE_PER_MBYTE_SDRAM)

print("SDRAM price: ", SDRAM_PRICE)

final_budget = BUDGET - SDRAM_PRICE

print("Budget left for L1: " ,final_budget)

i = 0
while (calculate_price(pow(2,i),PRICE_PER_MBYTE_L1) < final_budget):
    i += 1
i -= 1                        # the loop stops one step past the largest affordable size

print("Value of i: " , i)



SDRAM price:  0.00125
Budget left for L1:  0.01875
Value of i:  10
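The same exponent can be cross-checked in closed form: the largest power of two that fits the remaining budget is the floor of the base-2 logarithm of the number of affordable bytes. The sketch below repeats the constants used in the cell above; the helper name `max_power_of_two_exponent` is illustrative. Unlike the loop, which uses a strict `<`, this form would also accept a size whose price equals the remaining budget exactly, which does not happen with these values.

```python
from math import floor, log2

BUDGET = 0.02
PRICE_PER_MBYTE_L1 = 10
PRICE_PER_MBYTE_SDRAM = 0.01
SDRAM_BYTES = 2 ** 17  # 128 kByte frame memory


def max_power_of_two_exponent(budget_left, price_per_mbyte):
    """Largest e such that 2**e bytes cost at most budget_left."""
    price_per_byte = price_per_mbyte / 2 ** 20
    return floor(log2(budget_left / price_per_byte))


sdram_price = SDRAM_BYTES * PRICE_PER_MBYTE_SDRAM / 2 ** 20
max_exp = max_power_of_two_exponent(BUDGET - sdram_price, PRICE_PER_MBYTE_L1)
print(max_exp, 2 ** max_exp)  # 10 1024
```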


b) Consider three different dimensions for the L1 data cache: L1_size $\in$ {$ 2^{MAX}, 2^{MAX−1}, 2^{MAX−2} $}.\
For each of these dimensions, and assuming a direct mapping configuration, use the dineroIV
simulator to evaluate the resulting average data miss-rate considering the following block sizes:
- Block_size $\in$ {$8, 16, 32, 64$}.\
Fill the following table with the obtained data:

$$
\begin{matrix}
& L1\_size = 2^{10} & L1\_size = 2^{9} & L1\_size = 2^{8}\\
Block\ size = 8\ Bytes & 0.0305 & 0.1247 & 0.1960\\
Block\ size = 16\ Bytes & 0.0363 & 0.1184 & 0.1829\\
Block\ size = 32\ Bytes & 0.0770 & 0.1492 & 0.2288\\
Block\ size = 64\ Bytes & 0.1181 & 0.2021 & 0.3340
\end{matrix}
$$
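To read the table more easily, the miss rates can be put in a small dictionary and the block size with the lowest miss rate extracted for each cache size. This is only a convenience sketch over the values reported above; the names `MISS_RATES` and `best_block_size` are illustrative.

```python
# Miss rates from the table: cache size (bytes) -> block size (bytes) -> miss rate
MISS_RATES = {
    2 ** 10: {8: 0.0305, 16: 0.0363, 32: 0.0770, 64: 0.1181},
    2 ** 9: {8: 0.1247, 16: 0.1184, 32: 0.1492, 64: 0.2021},
    2 ** 8: {8: 0.1960, 16: 0.1829, 32: 0.2288, 64: 0.3340},
}


def best_block_size(rates_by_block):
    """Return the block size with the lowest miss rate."""
    return min(rates_by_block, key=rates_by_block.get)


best = {size: best_block_size(rates) for size, rates in MISS_RATES.items()}
for size, block in best.items():
    print(f"L1 = {size} B: block = {block} B, miss rate = {MISS_RATES[size][block]}")
```

For these measurements the minimum sits at 8-byte blocks for the 1024-byte cache and at 16-byte blocks for the two smaller caches; larger blocks increase the miss rate in every column.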

c) For each L1 cache size, plot the variation of the miss-rate with the size of the block. 

![Graph L1 cache size miss rate with size of the block](./plot1.png#gh-dark-mode-only)

d) By considering the obtained results, select two L1 cache configurations (dimension and block size)
that offer the best trade-off between the cost of the device and the resulting average miss-rate.\
Label in the previous plot the two configurations chosen.

In [25]:
L1_MISS_RATES = (
    ("Size = 2¹⁰", 10, (0.0305, 0.0363, 0.0770, 0.1181)),
    ("Size = 2⁹", 9, (0.1247, 0.1184, 0.1492, 0.2021)),
    ("Size = 2⁸", 8, (0.1960, 0.1829, 0.2288, 0.3340)),
)

for label, exponent, miss_rates in L1_MISS_RATES:
    print(label)
    for miss_rate in miss_rates:  # block sizes of 8, 16, 32 and 64 bytes, in this order
        print(calculate_price(pow(2,exponent), PRICE_PER_MBYTE_L1) * miss_rate + SDRAM_PRICE)

Size = 2¹⁰
0.0015478515625
0.0016044921875
0.002001953125
0.0024033203125000003
Size = 2⁹
0.0018588867187500002
0.001828125
0.001978515625
0.0022368164062499998
Size = 2⁸
0.001728515625
0.001696533203125
0.00180859375
0.0020654296875


In [26]:
L1_CONFIG_1_COST = calculate_price(pow(2,10), PRICE_PER_MBYTE_L1) * 0.0305
L1_CONFIG_2_COST = calculate_price(pow(2,10), PRICE_PER_MBYTE_L1) * 0.0363

price_miss_rate_config_1 = L1_CONFIG_1_COST + SDRAM_PRICE # I do not agree that SDRAM_PRICE should be included
price_miss_rate_config_2 = L1_CONFIG_2_COST + SDRAM_PRICE # I do not agree that SDRAM_PRICE should be included

print(price_miss_rate_config_1)
print(price_miss_rate_config_2)

0.0015478515625
0.0016044921875


$$
\begin{matrix}
 & L1\ config \ 1\\
Cache\ size & 2^{10}\\
Block\ size & 2^{3}\\
Miss\ rate & 0.0305\\
Cost & 0.0015478515625
\end{matrix}
$$

$$
\begin{matrix}
 & L1\ config \ 2\\
Cache\ size & 2^{10}\\
Block\ size & 2^{4}\\
Miss\ rate & 0.0363\\
Cost & 0.0016044921875
\end{matrix}
$$

#### 2.3.3 Cache L1: set associativity

a) For each of the two L1 cache setups previously selected, evaluate the compulsory, capacity, conflict and total miss-rates when the following configurations are considered:
- set associativity of 1 (direct-mapped), 2, 4, 8.

Fill the following table with the obtained data:

$$
\begin{matrix}
 &  & L1\ config\ 1 &  & \\
Miss\ rate & 1-way & 2-way & 4-way & 8-way\\
Compulsory & 0.0007747 &  0.00077112 & 0.00077193 & 0.00077652\\
Capacity & 0.00171105 & 0.00174216 & 0.00190593 & 0.0019116\\
Conflict & 0.0280173 & 0.03318315 & 0.00002214 & 0.00001188 \\
Total & 0.0305 & 0.0357 & 0.0027 & 0.0027
\end{matrix}
$$

$$
\begin{matrix}
 &  & L1\ config\ 2 &  & \\
Miss\ rate & 1-way & 2-way & 4-way & 8-way\\
Compulsory & 0.00038478 & 0.00038584 & 0.00039104 & 0.00038608 \\
Capacity & 0.00109989 & 0.00119756 & 0.00120896 & 0.00121392  \\
Conflict & 0.03481533 & 0.0348166  & 0.0000 & 0.0000\\
Total & 0.0363 & 0.0364 & 0.0016  & 0.0016
\end{matrix}
$$

b) For each L1 cache setup, draw a plot with the variation of the obtained compulsory, capacity,
conflict and total miss-rates for the considered set associativity ways.

c) Comment on the results above.

d) Write the expression that provides the mean access time as a function of the L1 cache hit and miss rates, the L1 cache hit and miss access times, and the time penalty associated with each associativity level, as expressed in Table 1.\
Consider a non-blocking critical-word-first load policy, where the bus occupancy rate has a lower impact on the performance of the cache.

In [27]:
from math import log2

MISS_PENALTY = 140

def mean_acess_time(miss_rate, num_of_ways):
    """Mean access time: hit time for the given associativity plus miss_rate * MISS_PENALTY."""
    hit_time = 2 * (0.7 + 0.35 * log2(num_of_ways))
    
    return hit_time + miss_rate * MISS_PENALTY
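
Written out, the function above corresponds to the expression below. The symbols $t_{avg}$, $t_{hit}$, $MR$ and $t_{penalty}$ are notation introduced here, $n$ is the number of ways, and the constants are the ones used in the code; the time unit is assumed to be ns. The hit time is paid on every access, and the penalty is added only for the fraction of accesses that miss.

$$
t_{avg}(n) = t_{hit}(n) + MR(n) \cdot t_{penalty}, \qquad t_{hit}(n) = 2\,(0.7 + 0.35 \log_2 n), \qquad t_{penalty} = 140
$$

For example, with $n = 4$ and $MR = 0.0016$:

$$
t_{avg}(4) = 2\,(0.7 + 0.35 \cdot 2) + 0.0016 \cdot 140 = 2.8 + 0.224 = 3.024
$$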


e) Evaluate the mean access time of each configuration, considering the obtained miss-rates and the
time penalty associated to each associativity level.\
Evaluate the resulting cost function, as defined
in Eq. 1 (including the frame memory).\
Fill the following table with the obtained data:

In [28]:
print("####### Access Time #######")
print("L1 config 1")
print(mean_acess_time(0.0305,1))
print(mean_acess_time(0.0357,2))
print(mean_acess_time(0.0027,4))
print(mean_acess_time(0.0027,8))
print("L1 config 2")
print(mean_acess_time(0.0363,1))
print(mean_acess_time(0.0364,2))
print(mean_acess_time(0.0016,4))
print(mean_acess_time(0.0016,8))
print("\n####### Price #######")
print("Config 1 and 2 price")
CONFIG_PRICE = calculate_price(pow(2,10), PRICE_PER_MBYTE_L1) + SDRAM_PRICE
print(CONFIG_PRICE)
print("\n####### Cost Function #######")
print("L1 config 1")
print(mean_acess_time(0.0305,1) * CONFIG_PRICE)
print(mean_acess_time(0.0357,2) * CONFIG_PRICE)
print(mean_acess_time(0.0027,4) * CONFIG_PRICE)
print(mean_acess_time(0.0027,8) * CONFIG_PRICE)
print("L1 config 2")
print(mean_acess_time(0.0363,1) * CONFIG_PRICE)
print(mean_acess_time(0.0364,2) * CONFIG_PRICE)
print(mean_acess_time(0.0016,4) * CONFIG_PRICE)
print(mean_acess_time(0.0016,8) * CONFIG_PRICE)



####### Access Time #######
L1 config 1
5.67
7.098
3.178
3.8779999999999997
L1 config 2
6.481999999999999
7.196
3.024
3.7239999999999998

####### Price #######
Config 1 and 2 price
0.011015625

####### Cost Function #######
L1 config 1
0.06245859374999999
0.07818890625
0.03500765625
0.04271859374999999
L1 config 2
0.07140328124999999
0.0792684375
0.03331125
0.041022187499999994


$$
\begin{matrix}
 &  & L1\ config\ 1 &  & \\
 & 1-way & 2-way & 4-way & 8-way\\
Miss\ rate &   0.0305 & 0.0357 & 0.0027 & 0.0027 \\
Access\ time & 5.67 & 7.098 & 3.178 & 3.878\\
Price & & 0.011015625  & & \\
Cost\ function & 0.062459 & 0.078189 & 0.035008 & 0.042719
\end{matrix}
$$
$$
\begin{matrix}
 &  & L1\ config\ 2 &  & \\
 & 1-way & 2-way & 4-way & 8-way\\
Miss\ rate &  0.0363 & 0.0364 & 0.0016  & 0.0016 \\
Access\ time & 6.482 & 7.196 & 3.024 & 3.724\\
Price & & 0.011015625 & &  \\
Cost\ function & 0.071403 & 0.079268 & 0.033311 & 0.041022
\end{matrix}
$$
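The two tables can be regenerated and compared in one pass. The sketch below is a compact restatement under the same assumptions as the cells above (hit time of `2 * (0.7 + 0.35 * log2(ways))`, miss penalty of 140, cost function equal to access time times total price); the names `access_time`, `MISS_RATES` and `TOTAL_PRICE` are illustrative.

```python
from math import log2

MISS_PENALTY = 140
L1_PRICE = 10 * 2 ** 10 / 2 ** 20       # 2^10 bytes of L1 at 10 per MByte
SDRAM_PRICE = 0.01 * 2 ** 17 / 2 ** 20  # 128 kByte frame memory
TOTAL_PRICE = L1_PRICE + SDRAM_PRICE

# Total miss rates from the tables: (configuration, ways) -> miss rate
MISS_RATES = {
    ("config 1", 1): 0.0305, ("config 1", 2): 0.0357,
    ("config 1", 4): 0.0027, ("config 1", 8): 0.0027,
    ("config 2", 1): 0.0363, ("config 2", 2): 0.0364,
    ("config 2", 4): 0.0016, ("config 2", 8): 0.0016,
}


def access_time(miss_rate, ways):
    """Hit time for the given associativity plus the average miss penalty."""
    return 2 * (0.7 + 0.35 * log2(ways)) + miss_rate * MISS_PENALTY


costs = {
    key: access_time(miss_rate, key[1]) * TOTAL_PRICE
    for key, miss_rate in MISS_RATES.items()
}
best = min(costs, key=costs.get)
print(best, round(costs[best], 6))
```

With the reported miss rates, the lowest cost function is obtained by configuration 2 with 4 ways: 8 ways gives the same miss rate but a longer hit time.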

f) Draw conclusions:

#### 2.3.4 Cache L1: write policy

a) By analyzing the sequence of memory accesses generated by the motion estimation algorithm (see
Fig. 3), select the best setup for the cache writing-policy: write-back versus write-through, write-allocate versus write-not-allocate.\
Justify. (Note that the number of writes is much smaller than
the number of reads.)

#### 2.3.5 Cache L1: final selection

a) By considering the obtained results, select the L1 cache setup that offers the best compromise
between the cost of the device and the resulting average access time.

$$
\begin{matrix}
 & L1\ config\\
Cache\ dimension & 2^{10}\\
Block\ size & 2^4\\
Associativity & 4-way\\
Write\ policy & TODO\\
Miss\ rate & 0.0016\\
Access\ time & 3.024\\
Price & 0.011015625\\
Cost\ function & 0.03331125
\end{matrix}
$$

### 2.4 Cache L2

#### 2.4.1 Cache L2: dimension

a) Considering the maximum cost of the whole memory hierarchy, as well as the price of L1 cache
and the 128 kByte frame memory, determine the maximum storage space of L2 cache (an integer
power of 2), considering the characteristics of the available memory devices (see Table 1).
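
The calculation mirrors the one used for L1: subtract the price of the frame memory and of the chosen L1 cache from the budget, then take the largest power of two that the remainder can pay for. The sketch below assumes the $2^{10}$-byte L1 selected earlier. The L2 price per MByte comes from Table 1, which is not reproduced here, so the value passed in the example is a placeholder and not the real price; the function name `max_l2_exponent` is illustrative.

```python
from math import floor, log2

BUDGET = 0.02
L1_PRICE = 10 * 2 ** 10 / 2 ** 20       # 2^10 bytes of L1 at 10 per MByte
SDRAM_PRICE = 0.01 * 2 ** 17 / 2 ** 20  # 128 kByte frame memory


def max_l2_exponent(price_per_mbyte_l2):
    """Largest e such that 2**e bytes of L2 fit in the remaining budget."""
    budget_left = BUDGET - L1_PRICE - SDRAM_PRICE
    affordable_bytes = budget_left * 2 ** 20 / price_per_mbyte_l2
    return floor(log2(affordable_bytes))


# Placeholder price, NOT taken from Table 1: replace it with the real L2 price.
HYPOTHETICAL_L2_PRICE_PER_MBYTE = 1.0
print(max_l2_exponent(HYPOTHETICAL_L2_PRICE_PER_MBYTE))
```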

b) For the obtained maximum storage space for L2 cache, adopting a direct mapping configuration,
use dineroIV simulator to evaluate the resulting average data miss-rate considering the following
block sizes: (1 × L1_block), (2 × L1_block), (4 × L1_block) and (8 × L1_block).\
Fill the following table with the obtained data:

c) Plot the variation of the miss-rate with the size of the block. 

d) From the obtained results, select the block size that offers the best trade-off between the resulting
average miss-rate and the time penalty associated with each data fetch from the primary memory.\
Justify.

#### 2.4.2 Cache L2

a) Evaluate the compulsory, capacity, conflict and total miss-rates for the direct-mapped L2 data
cache.\
Fill the following table with the obtained data:


$$
\begin{matrix}
& Miss\ Rate\\
Compulsory &   \\
Capacity   & \\
Conflict   & \\
Total      &
\end{matrix}
$$

b) Plot the variation of the obtained compulsory, capacity, conflict and total miss-rate.

c) Write the expression which provides the mean access time as a function of the L1 and L2 cache
hit and miss rates, L1 and L2 cache hit and miss access
times, and the time penalty, as expressed in Table 1.
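
A common way to write this expression, given here as a sketch in notation introduced for this report, nests the L2 term inside the L1 miss term. $MR_{L1}$ and $MR_{L2}$ are local miss rates (misses divided by the accesses that reach that level), $t_{hit,L1}$ and $t_{hit,L2}$ are the hit times of each level, and $t_{penalty}$ is the penalty of going to the frame memory:

$$
t_{avg} = t_{hit,L1} + MR_{L1} \cdot \left( t_{hit,L2} + MR_{L2} \cdot t_{penalty} \right)
$$

Under the same definitions, the global miss rate of the hierarchy, i.e. the fraction of all CPU accesses that reach the frame memory, is:

$$
MR_{global} = MR_{L1} \cdot MR_{L2}
$$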

d) Evaluate the mean access time provided by the chosen configuration, considering the obtained
miss-rate and the time penalty. Evaluate the resulting cost function, as defined in Eq. 1.\
Fill the following table with the obtained data:

$$
\begin{matrix}
 Miss\ Rate &\\
Access\ time &   \\
Price   & \\
Cost\ function   & \\
\end{matrix}
$$

### 2.5 Memory Hierarchy Configuration

a) By considering the obtained results, fill the following table with the selected characteristics for L1
and L2 cache memories, as well as the corresponding performance results of the overall memory
hierarchy.

$$
\begin{matrix}
 & Cache\ L1 & Cache\ L2 & Frame\ Memory\\
Dimension\ ( Bytes) &  &  & 128\ *\ 1024\\
Block\ size\ ( Bytes) &  &  & -\\
Associativity &  &  & -\\
Write\ policy &  &  & -\\
Local\ Miss\ rate\ ( \%) &  &  & -\\
Price\ ( € ) &  &  & \\
Global\ Miss\ rate\ ( \%) &  &  & \\
Global\ access\ time\ ( ns) &  &  & \\
Total\ Price\ ( € ) &  &  & \\
Cost\ function\ ( € \ *\ ns) &  &  & 
\end{matrix}
$$