# Memory Hierarchy

## 2 Procedure

### 2.1 Understanding the Program
Without delving into the details of the signal processing application, analyze the flow of the C program.
Observe the data access patterns and identify the critical sequence of accesses that is likely to have the
largest impact on the performance of the system.

### 2.3 Cache L1

#### 2.3.1 Theory of Cache

1. Explain the different types of cache misses: compulsory, capacity, and conflict.



2. Explain the different types of cache write policies.
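
The three miss types named in question 1 can be illustrated with a small trace-driven model. The sketch below is only an illustration, not the method used by dineroIV: it assumes LRU replacement, and the function name `classify_misses` and the example traces are hypothetical. A miss is counted as compulsory on the first reference to a block, as capacity if a fully associative LRU cache of the same size would also miss, and as conflict otherwise.

```python
from collections import OrderedDict

def classify_misses(addresses, cache_size, block_size, assoc):
    """Count compulsory, capacity and conflict misses for a byte-address trace."""
    num_blocks = cache_size // block_size
    num_sets = num_blocks // assoc
    sets = [OrderedDict() for _ in range(num_sets)]  # modelled cache, LRU per set
    full = OrderedDict()   # fully associative LRU cache with the same capacity
    seen = set()           # blocks referenced at least once
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for addr in addresses:
        block = addr // block_size

        # Reference the fully associative cache.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) == num_blocks:
                full.popitem(last=False)  # evict the least recently used block
            full[block] = True

        # Reference the modelled cache and classify the miss, if any.
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)
        else:
            if block not in seen:
                counts["compulsory"] += 1
            elif not full_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
            if len(lines) == assoc:
                lines.popitem(last=False)
            lines[block] = True
        seen.add(block)
    return counts

# Two blocks that map to the same line of a 32-byte direct-mapped cache.
print(classify_misses([0, 32, 0, 32], cache_size=32, block_size=8, assoc=1))
# The same trace with 2-way set associativity.
print(classify_misses([0, 32, 0, 32], cache_size=32, block_size=8, assoc=2))
```

In the first call the two blocks keep evicting each other, so the repeated references show up as conflict misses; with two ways they fit in the same set and only the compulsory misses remain.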


#### 2.3.2 Cache L1: dimension and block size

a) Consider a memory hierarchy composed of a single cache memory (L1) that connects the
SDRAM frame memory to the CPU.\
Considering the characteristics of the available memory devices (see Table 1) and the maximum total cost of the memory hierarchy, determine the maximum
storage space of cache L1.
- NOTES:
    - the size of every memory module (frame buffer or cache) must be an integer power of 2:
        - L1_size = $2^{MAX}$;
    - do not forget to include the cost of the 128 kByte frame memory.

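The maximum L1 size was computed with the following Python script:

    from math import pow

    BUDGET = 0.02                  # maximum total cost of the memory hierarchy
    PRICE_PER_MBYTE_L1 = 10
    PRICE_PER_MBYTE_SDRAM = 0.01
    SIZE_SDRAM = pow(2, 17)        # 128 kByte frame memory

    def calculate_price(size_in_bytes, price_per_mbyte):
        # Convert the price per MByte (2^20 bytes) into a price per byte.
        price_per_byte = price_per_mbyte / pow(2, 20)
        return price_per_byte * size_in_bytes

    # Budget left for L1 after paying for the frame memory.
    final_budget = BUDGET - calculate_price(SIZE_SDRAM, PRICE_PER_MBYTE_SDRAM)

    print("Budget before L1:", final_budget)

    # Find the largest power of 2 whose L1 cost stays below the remaining budget.
    i = 0
    while calculate_price(pow(2, i), PRICE_PER_MBYTE_L1) < final_budget:
        i += 1
    i -= 1

    print("Value of i:", i)

Output:

    Budget before L1: 0.01875
    Value of i: 10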


b) Consider three different dimensions for the L1 data cache: L1_size $\in \{2^{MAX}, 2^{MAX-1}, 2^{MAX-2}\}$.\
For each of these dimensions, and assuming a direct-mapped configuration, use the dineroIV
simulator to evaluate the resulting average data miss rate for the following block sizes:
- Block_size $\in \{8, 16, 32, 64\}$ bytes.

Fill the following table with the obtained data:

$$
\begin{matrix}
Block\ size\ \backslash\ L1\ size & 2^{10}\ Bytes & 2^{9}\ Bytes & 2^{8}\ Bytes\\
8\ Bytes & 0.0305 & 0.1247 & 0.1960\\
16\ Bytes & 0.0363 & 0.1184 & 0.1829\\
32\ Bytes & 0.0770 & 0.1492 & 0.2288\\
64\ Bytes & 0.1181 & 0.2021 & 0.3340
\end{matrix}
$$
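
As a quick cross-check of the table, the block size with the lowest miss rate for each cache size can be found programmatically. This is a minimal sketch: the miss rates are copied from the table, while the dictionary layout and the helper name `best_block_size` are illustrative choices.

```python
# Miss rates copied from the table: MISS_RATE[block_size][log2(L1 size)]
MISS_RATE = {
    8:  {10: 0.0305, 9: 0.1247, 8: 0.1960},
    16: {10: 0.0363, 9: 0.1184, 8: 0.1829},
    32: {10: 0.0770, 9: 0.1492, 8: 0.2288},
    64: {10: 0.1181, 9: 0.2021, 8: 0.3340},
}

def best_block_size(log2_size):
    """Return the block size (bytes) with the lowest miss rate for an L1 of 2**log2_size bytes."""
    return min(MISS_RATE, key=lambda block: MISS_RATE[block][log2_size])

for log2_size in (10, 9, 8):
    block = best_block_size(log2_size)
    print(f"L1 = 2^{log2_size} B: block = {block} B, miss rate = {MISS_RATE[block][log2_size]}")
```

The lowest miss rate alone does not settle the choice of configuration, since the cost of each cache size also has to be weighed against it.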

c) For each L1 cache size, plot the variation of the miss-rate with the size of the block. 

![Graph L1 cache size miss rate with size of the block](./plot1.png#gh-dark-mode-only)

d) By considering the obtained results, select two L1 cache configurations (dimension and block size)
that offer the best trade-off between the cost of the device and the resulting average miss-rate.\
Label in the previous plot the two configurations chosen.

#### 2.3.3 Cache L1: set associativity

a) For each of the two L1 cache setups previously selected, evaluate the compulsory, capacity, conflict and total miss-rates when the following configurations are considered:
- set associativity of 1 (direct-mapped), 2, 4, 8.
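
The sweep over the associativities listed above could be scripted by generating one dineroIV command line per configuration. The sketch below only builds the command strings and does not run the simulator. The trace file name `trace.din`, the example size and block values, and the helper name `dinero_command` are hypothetical, and the options shown should be checked against the help output of the installed dineroIV version.

```python
def dinero_command(size, block, assoc, trace="trace.din"):
    """Build a dineroIV command line for an L1 data cache (trace file name is hypothetical)."""
    return (f"dineroIV -l1-dsize {size} -l1-dbsize {block} "
            f"-l1-dassoc {assoc} -l1-dccc -informat d < {trace}")

# Example values only; substitute the two configurations actually selected.
for assoc in (1, 2, 4, 8):
    print(dinero_command(1024, 16, assoc))
```

Here `-l1-dccc` is the option expected to request the compulsory/capacity/conflict breakdown for the L1 data cache.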

Fill the following table with the obtained data:

b) For each L1 cache setup, draw a plot with the variation of the obtained compulsory, capacity,
conflict and total miss-rates across the considered degrees of set associativity.

c) Comment on the results above.

d) Write the expression that gives the mean access time as a function of the L1 cache hit
and miss rates, the L1 cache hit and miss access times, and the time penalty
associated with each associativity level, as given in Table 1.\
Consider a non-blocking, critical-word-first load policy, where the bus occupancy rate has a lower impact on the performance of the
cache.
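
One possible form of this expression is sketched below as a Python function. It is an illustration under stated assumptions, not the expected answer: `t_miss` is taken to be the total access time on a miss, the associativity penalty is assumed to be added to every access, and the numeric values in the usage example are hypothetical because Table 1 is not reproduced here.

```python
def mean_access_time(miss_rate, t_hit, t_miss, t_assoc_penalty=0.0):
    """Mean access time of a single-level cache.

    miss_rate:        L1 miss rate (hit rate is 1 - miss_rate)
    t_hit:            access time on a hit
    t_miss:           total access time on a miss (assumed convention)
    t_assoc_penalty:  extra time per access for the chosen associativity (assumed)
    """
    hit_rate = 1.0 - miss_rate
    return hit_rate * (t_hit + t_assoc_penalty) + miss_rate * (t_miss + t_assoc_penalty)

# Hypothetical timings, in arbitrary time units.
print(mean_access_time(miss_rate=0.0363, t_hit=1.0, t_miss=20.0, t_assoc_penalty=0.1))
```

With a critical-word-first policy, `t_miss` would reflect the latency of the first word delivered rather than the transfer of the whole block.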

e) Evaluate the mean access time of each configuration, considering the obtained miss-rates and the
time penalty associated with each associativity level.\
Evaluate the resulting cost function, as defined in Eq. 1 (including the frame memory).\
Fill the following table with the obtained data:

f) Draw conclusions:

#### 2.3.4 Cache L1: write policy

a) By analyzing the sequence of memory accesses generated by the motion estimation algorithm (see
Fig. 3), select the best setup for the cache write policy: write-back versus write-through, and write-allocate versus write-not-allocate.\
Justify. (Note that the number of writes is much smaller than
the number of reads.)
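
To make the write-back versus write-through comparison concrete, the sketch below counts the write transactions sent to the next memory level under each policy. It is a simplified model under stated assumptions: a direct-mapped cache with write-allocate, a hypothetical `(op, address)` trace format, and a hypothetical function name `memory_writes`. It does not model the motion estimation trace itself.

```python
def memory_writes(trace, cache_size, block_size, write_back):
    """Count write transactions sent to the next memory level.

    trace: iterable of (op, address) pairs, with op equal to 'r' or 'w'.
    The cache is direct-mapped with write-allocate (assumed for brevity).
    """
    num_lines = cache_size // block_size
    tags = [None] * num_lines    # block number held by each line
    dirty = [False] * num_lines
    writes = 0
    for op, addr in trace:
        block = addr // block_size
        line = block % num_lines
        if tags[line] != block:          # miss: replace the line
            if write_back and dirty[line]:
                writes += 1              # write the evicted dirty block back
            tags[line] = block
            dirty[line] = False
        if op == "w":
            if write_back:
                dirty[line] = True       # defer the write until eviction
            else:
                writes += 1              # write-through: every store goes to memory
    if write_back:
        writes += sum(dirty)             # flush the remaining dirty blocks
    return writes

# Four stores to the same block, then a read that evicts it.
trace = [("w", 0)] * 4 + [("r", 64)]
print("write-through:", memory_writes(trace, 32, 8, write_back=False))
print("write-back:   ", memory_writes(trace, 32, 8, write_back=True))
```

Note that the two counts are not directly comparable in bytes: a write-back transaction moves a whole block, whereas a write-through transaction moves a single word.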

#### 2.3.5 Cache L1: final selection

a) By considering the obtained results, select the L1 cache setup that offers the best compromise
between the cost of the device and the resulting average access time.

### 2.4 Cache L2

#### 2.4.1 Cache L2: dimension

a) Considering the maximum cost of the whole memory hierarchy, as well as the price of L1 cache
and the 128 kByte frame memory, determine the maximum storage space of L2 cache (an integer
power of 2), considering the characteristics of the available memory devices (see Table 1).
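
The same budget reasoning used for L1 can be reused for L2. The sketch below is a minimal illustration: the budget, the SDRAM price and the L1 price are the values used in the L1 calculation, while the L2 price per MByte is a hypothetical placeholder (Table 1 is not reproduced here) and the L1 size of $2^{10}$ bytes is assumed rather than the final selection. The helper name `max_power_of_two` is also illustrative.

```python
def max_power_of_two(budget, price_per_mbyte):
    """Largest exponent n such that 2**n bytes cost at most `budget`, or None if none fits."""
    price_per_byte = price_per_mbyte / 2**20
    if price_per_byte > budget:
        return None
    n = 0
    while price_per_byte * 2**(n + 1) <= budget:
        n += 1
    return n

BUDGET = 0.02
COST_SDRAM = 0.01 * 2**17 / 2**20   # 128 kByte frame memory
COST_L1 = 10 * 2**10 / 2**20        # assumes L1 = 2^10 bytes
PRICE_PER_MBYTE_L2 = 1.0            # hypothetical value, replace with Table 1

remaining = BUDGET - COST_SDRAM - COST_L1
print("Budget before L2:", remaining)
print("Maximum L2 exponent:", max_power_of_two(remaining, PRICE_PER_MBYTE_L2))
```

Choosing a smaller or cheaper L1 configuration leaves more of the budget for L2, so the result depends on the L1 setup finally selected.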

b) For the obtained maximum storage space of the L2 cache, and adopting a direct-mapped configuration,
use the dineroIV simulator to evaluate the resulting average data miss-rate for the following
block sizes: (1 × L1_block), (2 × L1_block), (4 × L1_block) and (8 × L1_block).\
Fill the following table with the obtained data:

c) Plot the variation of the miss-rate with the size of the block. 

d) From the obtained results, select the block size that offers the best trade-off between the resulting
average miss-rate and the time penalty associated with each data fetch from the primary memory.\
Justify.

#### 2.4.2 Cache L2

a) Evaluate the compulsory, capacity, conflict and total miss-rates for the direct-mapped L2 data
cache.\
Fill the following table with the obtained data:

b) Plot the variation of the obtained compulsory, capacity, conflict and total miss-rate.

c) Write the expression that gives the mean access time as a function of the L1 and L2 cache
hit and miss rates, the L1 and L2 cache hit and miss access
times, and the time penalty, as given in Table 1.
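
For a two-level hierarchy, a commonly used form of the mean access time nests the L2 term inside the L1 miss term. The sketch below assumes that `m2_local` is the local L2 miss rate (L2 misses divided by the accesses that reach L2) and that each level's time is paid by every access reaching it. The function name and all numeric values are hypothetical, and any associativity penalty from Table 1 would be added to the corresponding hit time.

```python
def mean_access_time_two_level(m1, m2_local, t_l1, t_l2, t_mem):
    """Mean access time for an L1 + L2 hierarchy.

    m1:        L1 miss rate
    m2_local:  L2 local miss rate (assumed definition)
    t_l1:      L1 hit time
    t_l2:      L2 hit time, paid on every L1 miss
    t_mem:     main memory penalty, paid on every L2 miss
    """
    return t_l1 + m1 * (t_l2 + m2_local * t_mem)

# Hypothetical timings, in arbitrary time units.
print(mean_access_time_two_level(m1=0.1, m2_local=0.5, t_l1=1.0, t_l2=10.0, t_mem=100.0))
```

Setting `m2_local` to 0 reduces the expression to a single-level cache backed by a memory with access time `t_l2`.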

d) Evaluate the mean access time provided by the chosen configuration, considering the obtained
miss-rate and the time penalty. Evaluate the resulting cost function, as defined in Eq. 1.\
Fill the following table with the obtained data:

### 2.5 Memory Hierarchy Configuration

a) By considering the obtained results, fill the following table with the selected characteristics for L1
and L2 cache memories, as well as the corresponding performance results of the overall memory
hierarchy.