# Exercises on thermal policies

To investigate and optimize the thermal behavior of a system, we can combine gem5-X, a power model, and 3D-ICE into a full-system simulation flow. We have prepared a set of thermal management exercises for you to follow. To save time, we have already collected power traces of the perlbench benchmark from SPEC CPU 2017, so you do not need to run the time-consuming gem5-X simulations yourself.

You can find the power traces in the "data" folder. They were generated by running perlbench with gem5-X at nine frequency levels: 0.12720, 0.24020, 0.64980, 1.0045, 1.3174, 1.5973, 1.8505, 2.0817, and 2.2943 GHz.
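
The helper functions below load the trace for a frequency level from `data/freq_level{N}.csv`, where N is the level's 0-based index in the frequency list plus one. A minimal sketch of that mapping, assuming the file naming used by those helpers (`FREQS_GHZ` and `trace_file` are illustrative names, not part of the provided code):

```python
# Frequencies (GHz) of the nine power traces, in the same order as freq_level_list.
FREQS_GHZ = [0.12720, 0.24020, 0.64980, 1.0045, 1.3174, 1.5973, 1.8505, 2.0817, 2.2943]


def trace_file(level: int) -> str:
    """Return the power-trace path for a 0-based frequency level (files are numbered from 1)."""
    if not 0 <= level < len(FREQS_GHZ):
        raise ValueError(f"level must be in 0..{len(FREQS_GHZ) - 1}, got {level}")
    return f"data/freq_level{level + 1}.csv"


# List every level with its frequency and trace file.
for lvl, freq_ghz in enumerate(FREQS_GHZ):
    print(lvl, freq_ghz, trace_file(lvl))
```

The off-by-one is the detail to remember: level 8 (2.2943 GHz) reads `data/freq_level9.csv`.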

To complete these exercises, you can use the following predefined functions and code snippets:

In [None]:
# Import necessary packages, and define the useful functions
import pandas as pd
import numpy as np
import subprocess
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

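# Path to the 3D-ICE emulator binary; replace PATHTO with your installation directory
path_3d_ice = "PATHTO/3d-ice/bin/3D-ICE-Emulator"
# Frequencies (GHz) of the nine power traces; the list index is the "frequency level"
freq_level_list = [0.12720, 0.24020, 0.64980, 1.0045, 1.3174, 1.5973, 1.8505, 2.0817, 2.2943]
# Thermal threshold (K): drawn as the dashed line in the plots and used by the evaluation helpers
temp_lim = 337.5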

# Create a simple demo flp file for 3D-ICE: 4 cores on a 2x2 grid, all using the power trace of the same frequency level.
def create_flp_uniform(freq_level, flp_name):
    # The power trace is taken from the first line of data/freq_levelN.csv (N = freq_level + 1)
    with open('data/freq_level'+str(freq_level+1)+'.csv', 'r') as f:
        content=f.readlines()
    with open(flp_name, 'w') as f:
        for i in range(4):
            block_name='core'+str(i)
            # Core i is placed at column i%2, row i//2 of the grid
            f.write("{0} :\n  position  {1},   {2} ;\n  dimension  {3},   {4} ;\n  power values  {5} ;\n\n".format(block_name, 1000*(i%2), 1000*(i//2), 1000, 1000, content[0].strip()))

# Create a demo flp file for 3D-ICE with 4 cores on a 2x2 grid; each core uses the power trace of its own frequency level.
# start_index offsets the core names (e.g., core4..core7 for the bottom layer of the 3D stack).
def create_flp(freq_level, flp_name, start_index):
    with open(flp_name, 'w') as fflp:
        for i in range(4):
            with open('data/freq_level'+str(freq_level[i]+1)+'.csv', 'r') as fcsv:
                content=fcsv.readlines()
            block_name='core'+str(i+start_index)
            fflp.write("{0} :\n  position  {1},   {2} ;\n  dimension  {3},   {4} ;\n  power values  {5} ;\n\n".format(block_name, 1000*(i%2), 1000*(i//2), 1000, 1000, content[0].strip()))

# Plot the temperature trace of each core in the result file, with a dashed line at the thermal threshold temp_lim.
def plot_cores(result_file):
    df = pd.read_csv(result_file, sep=r'\s\t\s', header=1, engine='python')
    fig, ax = plt.subplots()
    for i in range(len(df.columns)-1):
        df.plot(x='% Time(s)', y=df.columns.values[i+1],ax=ax)
    ax.legend(df.columns.values[1:])
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Temperature (K)")
    plt.hlines(temp_lim, 0, df.shape[0]/10, colors='k', linestyles='dashed')

# Plot the temperature traces of the cores in two layers (one result file per layer), with a dashed line at the thermal threshold temp_lim.
def plot_2layer_cores(result_file_top, result_file_bottom):
    df1 = pd.read_csv(result_file_top, sep=r'\s\t\s', header=1, engine='python')
    df2 = pd.read_csv(result_file_bottom, sep=r'\s\t\s', header=1, engine='python')
    # Drop the bottom layer's time column so only one time column remains after merging
    df2.pop(df2.columns[0])
    df = pd.concat([df1, df2], axis=1)
    fig, ax = plt.subplots()
    for i in range(len(df.columns)-1):
        df.plot(x='% Time(s)', y=df.columns.values[i+1],ax=ax)
    ax.legend(df.columns.values[1:])
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Temperature (K)")
    plt.hlines(temp_lim, 0, df.shape[0]/10, colors='k', linestyles='dashed')

# function to read the flp file and return a dictionary of positions and dimensions of each core
def parse_flp_file(filename):
    data = {}
    counter = 0
    with open(filename, 'r') as file:
        for line in file:
            line = line.strip()
            if line.startswith('position') or line.startswith('dimension'):
                key, x, y =line.replace(', ', ' ').split()[0:3]
                key=key+str(counter)
                data[key] = [float(x), float(y)]
                if line.startswith('dimension'):
                    counter += 1
            else:
                continue
    return data

# function to plot the floorplan of the flp file
def plot_rectangles(rectangles_dict):
    fig, ax = plt.subplots()
    ax.set_aspect('equal')

    for key in rectangles_dict:
        if key.startswith('position'):
            position = rectangles_dict[key]
            dimension_key = 'dimension' + key[-1]
            dimension = rectangles_dict[dimension_key]
            rect = Rectangle(position, dimension[0], dimension[1], alpha=0.5, linewidth=2, edgecolor='r', facecolor='none')
            ax.add_patch(rect)
            ax.text(position[0]+dimension[0]/2, position[1]+dimension[1]/2, 'core'+key[-1],
                horizontalalignment='center',
                verticalalignment='center',)

    plt.xlim(0, max(rectangles_dict['position0'][0], rectangles_dict['position1'][0]) +
              rectangles_dict['dimension0'][0])
    plt.ylim(0, max(rectangles_dict['position2'][1], rectangles_dict['position3'][1]) +
              rectangles_dict['dimension2'][1])

    plt.xlabel('X (µm)')
    plt.ylabel('Y (µm)')
    plt.title('Floorplan')
    plt.show()
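
For reference, each core written by `create_flp_uniform`, `create_flp`, and `create_flp_dynamic` is a 3D-ICE floorplan element with a name, a position, a dimension, and a list of power values. The sketch below reproduces that text layout as a pure function so you can inspect it without writing a file; `flp_block` is a hypothetical helper, and the power values in the example are made up.

```python
from typing import Sequence


def flp_block(name: str, x: float, y: float, width: float, height: float,
              power_values: Sequence[float]) -> str:
    """Format one floorplan element in the same layout the helpers above write."""
    powers = ", ".join(str(p) for p in power_values)
    return (f"{name} :\n"
            f"  position  {x},   {y} ;\n"
            f"  dimension  {width},   {height} ;\n"
            f"  power values  {powers} ;\n\n")


# Core k sits at column k % 2, row k // 2 of a 2x2 grid of 1000 x 1000 tiles.
core = 1
print(flp_block(f"core{core}", 1000 * (core % 2), 1000 * (core // 2), 1000, 1000,
                [1.5, 1.6]), end="")  # made-up power values
```

Each entry in `power values` is one time slot of the trace, which is why the dynamic helper in EX4 rebuilds this list as the simulation advances.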

## EX1: Simple thermal policy with frequency scaling

The first exercise tries out a simple thermal policy that scales down the frequency of the whole system to keep the temperature below a given threshold.

1. Run 3D-ICE with the power trace collected at 2.2943 GHz. Since 2.2943 GHz is the highest frequency, its index in **freq_level_list** is **8**, so set **freq_level=8** and run the following code:

In [None]:
freq_level = 8
create_flp_uniform(freq_level,'flp_demo.flp')
print("run at freq: ", freq_level_list[freq_level], "GHz")
run_output = subprocess.run([path_3d_ice, "stk_demo.stk"], stdout=subprocess.DEVNULL)

2. After the simulation, 3D-ICE writes the temperature output to **CPU_DIE_flp.txt**, as specified in the stk file. Plot the temperature trace and check whether the temperature stays within the safe range, i.e., below the threshold `temp_lim` (the dashed line).


In [None]:
plot_cores('CPU_DIE_flp.txt')

3. Now go back to step 1, lower the frequency, and run 3D-ICE again. A key idea of thermal management is to lower the frequency to reduce power consumption and, in turn, temperature. Try to find the highest frequency that keeps the temperature below `temp_lim`.

4. Once you have found the highest frequency that keeps the temperature below `temp_lim`, you have completed this exercise and seen how frequency scaling works in thermal management: it keeps the system thermally safe (temperature not too high) while sacrificing as little performance as possible (frequency not reduced more than necessary).
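
Step 3 is a search over the nine frequency levels. A minimal sketch of that search, assuming the peak temperature never decreases as the frequency level rises; `highest_safe_level` is a hypothetical helper, and the `is_safe` check in the example is a stand-in, not a simulation result.

```python
from typing import Callable, Iterable, Optional


def highest_safe_level(levels: Iterable[int],
                       is_safe: Callable[[int], bool]) -> Optional[int]:
    """Scan frequency levels from highest to lowest and return the first safe one.

    Assumes the peak temperature never decreases as the frequency level rises,
    so the first safe level found from the top is the highest safe level.
    Returns None if no level is safe.
    """
    for level in sorted(levels, reverse=True):
        if is_safe(level):
            return level
    return None


# Stand-in check for illustration only: pretend levels 0-4 are safe.
print(highest_safe_level(range(9), lambda level: level <= 4))  # prints 4
```

With the real simulator, `is_safe(level)` would repeat steps 1 and 2: call `create_flp_uniform(level, 'flp_demo.flp')`, run 3D-ICE on `stk_demo.stk`, and check that every core temperature in `CPU_DIE_flp.txt` stays below `temp_lim`. If the monotonicity assumption holds, a binary search over the nine levels would need fewer 3D-ICE runs than this top-down scan.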

## EX2: Frequency scaling for each core


In the previous exercise, we lowered the frequency of all cores simultaneously. This approach is common in designs where all cores share a single global frequency. However, cores can also run at different frequencies, which lets us lower the frequency of idle cores while keeping the frequency of active cores unchanged. In this exercise, we investigate this per-core thermal policy.
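
One way to organize the per-core search in this exercise, sketched under the assumption that you read each core's peak temperature from the 3D-ICE output after every run, is to lower only the cores that exceeded the threshold and then re-run. `lower_hot_cores` is a hypothetical helper, and the temperatures in the example are invented.

```python
from typing import List, Sequence


def lower_hot_cores(levels: Sequence[int], peak_temps: Sequence[float],
                    temp_lim: float) -> List[int]:
    """Lower by one the level of each core whose peak temperature exceeded temp_lim.

    Cores at or below the threshold keep their level; levels never go below 0.
    """
    return [max(level - 1, 0) if peak > temp_lim else level
            for level, peak in zip(levels, peak_temps)]


# Invented peak temperatures (K) for four cores, for illustration only.
print(lower_hot_cores([8, 8, 8, 8], [339.0, 336.0, 338.0, 330.0], 337.5))  # prints [7, 8, 7, 8]
```

Because heat spreads between neighboring cores, lowering one core can also cool the others, so it is worth re-running 3D-ICE after each adjustment instead of dropping every hot core by several levels at once.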

1. Each core's frequency level is specified with a freq_level array whose four elements give the frequency levels of the four cores, as in the following example:

In [None]:
freq_level = [8,8,8,8]
create_flp(freq_level,'flp_demo.flp',start_index=0)
print("run at freq: ", freq_level_list[freq_level[0]],freq_level_list[freq_level[1]],freq_level_list[freq_level[2]],freq_level_list[freq_level[3]], "GHz")
run_output = subprocess.run([path_3d_ice, "stk_demo.stk"], stdout=subprocess.DEVNULL)

2. Plot the temperature of the cores

In [None]:
plot_cores('CPU_DIE_flp.txt')

3. Go back to step 1 and change the frequency of each core until the temperature is within the safe range, while keeping the frequencies of the four cores as high as possible.

4. Propose a hypothesis that explains why your final combination of frequency levels keeps the temperature within the safe range. Use the temperature traces and the core floorplan to support it.


5. The following code draws the floorplan of the cores.

In [None]:
flp_data=parse_flp_file('flp_demo.flp')
plot_rectangles(flp_data)

## EX3: Task mapping on 3D MPSoC

Now we move on to a 3D MPSoC. A 3D MPSoC usually stacks multiple layers of cores; for instance, it can have two core layers with four cores each.
In this exercise, we investigate the thermal behavior of the 3D MPSoC by specifying the frequency level of each core. As in the following example, the frequency levels are given by a freq_level array with eight elements, one per core.

1. Set the frequency level of each core and run 3D-ICE.

In [None]:
# first 4 elements are top layer cores (near the heatsink), last 4 elements are bottom layer cores (near the substrate)
freq_level = [6,6,6,6,6,6,6,6]
create_flp(freq_level[0:4],'flp_top_demo.flp',start_index=0)
create_flp(freq_level[4:8],'flp_bot_demo.flp',start_index=4)
print("Top layer cores run at freq: ", freq_level_list[freq_level[0]],freq_level_list[freq_level[1]],freq_level_list[freq_level[2]],freq_level_list[freq_level[3]], "GHz")
print("Bottom layer cores run at freq: ", freq_level_list[freq_level[4]],freq_level_list[freq_level[5]],freq_level_list[freq_level[6]],freq_level_list[freq_level[7]], "GHz")
run_output = subprocess.run([path_3d_ice, "stk_3D_demo.stk"], stdout=subprocess.DEVNULL)

2. Plot the temperature of the cores

In [None]:
plot_2layer_cores('CPU_DIE_TOP_flp.txt', 'CPU_DIE_BOT_flp.txt')

3. Compare the temperatures of the cores and explain why they differ even though every core runs at the same frequency and power.

4. Go back to step 1 and devise a task mapping strategy that activates 6 of the 8 cores on the target MPSoC. You can use frequency level 0 (0.12720 GHz, the lowest available frequency) to approximate a deactivated core and a higher frequency level to activate a core.
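
Step 4 has a small search space: choosing which 6 of the 8 cores are active gives 28 mappings. A sketch that enumerates them as freq_level arrays in the layout used above (cores 0-3 top layer, 4-7 bottom layer), assuming level 0 stands in for an inactive core; `mapping_to_levels`, `all_mappings`, and `candidates` are hypothetical names.

```python
from itertools import combinations
from typing import Iterable, List

N_CORES = 8     # cores 0-3: top layer, cores 4-7: bottom layer
IDLE_LEVEL = 0  # lowest frequency level, used to approximate an inactive core


def mapping_to_levels(active_cores: Iterable[int], active_level: int,
                      n_cores: int = N_CORES, idle_level: int = IDLE_LEVEL) -> List[int]:
    """Build a freq_level array: active cores get active_level, the rest idle_level."""
    active = set(active_cores)
    return [active_level if core in active else idle_level for core in range(n_cores)]


def all_mappings(n_active: int, active_level: int, n_cores: int = N_CORES) -> List[List[int]]:
    """Every way to pick n_active cores, each returned as a freq_level array."""
    return [mapping_to_levels(chosen, active_level, n_cores)
            for chosen in combinations(range(n_cores), n_active)]


candidates = all_mappings(6, active_level=6)
print(len(candidates))  # prints 28
print(candidates[0])    # prints [6, 6, 6, 6, 6, 6, 0, 0]
```

Each candidate can be pasted into `freq_level` in step 1. Since both layers use the same 2x2 grid, many candidates are mirror images of each other, so (assuming the die and heat sink are symmetric) only a handful of distinct placements need to be simulated.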

## EX4: Dynamic thermal management

The analyses above all concern static thermal management: the frequency is fixed for the whole execution. In a real system, however, the frequency can be adjusted at runtime based on the current system status, e.g., power and thermal constraints. In this exercise, we investigate dynamic thermal management by designing a simple thermal policy that dynamically adjusts the core frequencies to keep the temperature within the safe range.

For this exercise, we have predefined the following additional functions and code snippets:

In [None]:
# Number of elements in each power trace file (one entry per frequency level)
ncol = np.array([6342, 3401, 1370,  954,  769,  668,  598,  557,  520])
# Speed-up of each frequency level relative to level 0; a synthetic metric used to locate
# the current runtime phase of the application in each trace
acc_factor = ncol[0]/ncol
# Remaining work of each core, measured in level-0 trace samples
ins_left = np.array([6342]*4)
# Initial frequency level of each core; there are four cores in this example
freq_level = np.array([8,8,8,8])
# Per-core history of the power values and frequencies applied so far
power_list = ([[],[],[],[]])
freq_list = ([[],[],[],[]])
# Create the flp file from each core's power history, appending the core's current power sample
def create_flp_dynamic(freq_level, flp_name, start_index):
    with open(flp_name, 'w') as fflp:
        for i in range(4):
            file_name='data/freq_level'+str(freq_level[i]+1)+'.csv'
            df = pd.read_csv(file_name, header=None)
            power_data=df.to_numpy().squeeze()
            # Map the remaining work onto a sample index of this level's trace
            sim_t = len(power_data) - ins_left[i]/acc_factor[freq_level[i]]
            power_data_sel=power_data[int(sim_t)]
            power_list[i].append(power_data_sel)
            freq_list[i].append(freq_level_list[freq_level[i]])
            # Join all values explicitly: np.array2string abbreviates arrays longer than 1000 elements with "..."
            power_string=', '.join(str(p) for p in power_list[i])
            block_name='core'+str(i+start_index)
            fflp.write("{0} :\n  position  {1},   {2} ;\n  dimension  {3},   {4} ;\n  power values  {5} ;\n\n".format(block_name, 1000*(i%2), 1000*(i//2), 1000, 1000, power_string))

# Read the latest temperature sample of every core from the result file (the time column is dropped)
def read_temp(result_file):
    df = pd.read_csv(result_file, sep=r'\s\t\s', header=1, engine='python')
    return df.tail(1).values.squeeze()[1:]

# Evaluate the results: number of core temperature samples above temp_lim, and the execution time
def eva_results(result_file):
    df = pd.read_csv(result_file, sep=r'\s\t\s', header=1, engine='python')
    # Count samples above the threshold across all four cores
    core_cols = ['core0(K)', 'core1(K)', 'core2(K)', 'core3(K)']
    num = int((df[core_cols] > temp_lim).to_numpy().sum())
    print("number of events larger than",temp_lim,"K : ", num)
    # Assumes one output row per 0.1 s of simulated time
    print("execution time: ", df['core0(K)'].shape[0]/10, "s")

# Plot the temperature, frequency, and power traces of core0 (x axis: sample index)
def plot_all_trace(temp_file,freq,power):
    df = pd.read_csv(temp_file, sep=r'\s\t\s', header=1, engine='python')
    temp = df['core0(K)'].values.squeeze()
    fig, axs = plt.subplots(3, 1)
    axs[0].plot(temp)
    axs[0].hlines(temp_lim, 0, temp.shape[0], colors='k', linestyles='dashed')
    axs[0].set(xlabel='time step', ylabel='Temperature (K)')

    axs[1].plot(freq)
    axs[1].set(xlabel='time step', ylabel='Frequency (GHz)')

    axs[2].plot(power)
    axs[2].set(xlabel='time step', ylabel='Power (W)')

    for ax in axs.flat:
        ax.label_outer()
    plt.show()
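
Reading the helper code above, `ins_left` counts each core's remaining work in units of level-0 trace samples, and `acc_factor[level]` is how many such units one loop step at that level covers. The sketch below re-derives both in plain Python to show how `create_flp_dynamic` picks a sample from a trace and how many loop iterations (and therefore 3D-ICE runs) the loop in step 1 would take if a core stayed at one level. The names `NCOL`, `level_speedup`, `trace_index`, and `steps_at_fixed_level` are illustrative and chosen so they do not overwrite the notebook's `ncol` and `acc_factor`.

```python
# Trace lengths per frequency level, copied from `ncol` above.
NCOL = [6342, 3401, 1370, 954, 769, 668, 598, 557, 520]


def level_speedup(level, ncol=NCOL):
    """Level-0 samples covered by one step at `level` (same as acc_factor[level])."""
    return ncol[0] / ncol[level]


def trace_index(ins_left, level, ncol=NCOL):
    """Sample of the level's trace matching the remaining work, as in create_flp_dynamic."""
    return int(ncol[level] - ins_left / level_speedup(level, ncol))


def steps_at_fixed_level(level, start=6342, stop_below=5000, ncol=NCOL):
    """Loop iterations until the remaining work drops below `stop_below` at a fixed level."""
    ins_left, steps = start, 0
    while ins_left >= stop_below:
        ins_left -= level_speedup(level, ncol)
        steps += 1
    return steps


for level in (0, 8):
    print(level, round(level_speedup(level), 3), trace_index(6342, level),
          steps_at_fixed_level(level))
# prints:
# 0 1.0 0 1343
# 8 12.196 0 111
```

At the start every level reads sample 0 of its own trace, and the same stop condition is reached after 111 iterations at level 8 versus 1343 at level 0. Assuming one output row per iteration, this is why lowering the frequency lengthens the execution time reported by `eva_results`.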

1. Here is the predefined simple thermal policy and the corresponding code snippets for dynamic thermal management.

In [None]:
def thermal_policy(temp, freq_level):
    # Baseline policy: if any core exceeds temp_lim, lower the frequency level of all cores by one
# ↓↓↓ Write your thermal policy here  ↓↓↓ #
    if np.any(temp > temp_lim):
        freq_level = freq_level - 1
# ↑↑↑ Write your thermal policy here  ↑↑↑ #
    return freq_level


# Initialize the runtime parameters before the running
ins_left = np.array([6342]*4)
freq_level = np.array([8,8,8,8])
power_list = ([[],[],[],[]])
freq_list = ([[],[],[],[]])

# Run the simulation
while True: 
    create_flp_dynamic(freq_level,'flp_demo.flp',start_index=0)
    ins_left = ins_left - acc_factor[freq_level]
    print("run at freq: ", freq_level_list[freq_level[0]],freq_level_list[freq_level[1]],freq_level_list[freq_level[2]],freq_level_list[freq_level[3]], "GHz")
    run_output = subprocess.run([path_3d_ice, "stk_demo.stk"], stdout=subprocess.DEVNULL)
    temp=read_temp('CPU_DIE_flp.txt')
    
    # Thermal policy
    freq_level = thermal_policy(temp, freq_level)

    # Check and maintain the frequency level in the right range
    for i in range(4):
        if freq_level[i] < 0:
            freq_level[i] = 0
        if freq_level[i] > 8:
            freq_level[i] = 8
    # Stop condition
    if np.any(ins_left < 5000):
        break

2. Plot the temperature traces and check the evaluation results.

In [None]:
plot_cores('CPU_DIE_flp.txt')
eva_results('CPU_DIE_flp.txt')

3. You can also dig into the temperature, frequency, and power traces of core0 to see how the system behaves.

In [None]:
plot_all_trace('CPU_DIE_flp.txt',freq_list[0],power_list[0])

4. Do you understand how this thermal policy works? If so, go back to step 1 and propose a simple way to improve it.
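
One possible direction, sketched here as an assumption rather than the intended solution: the baseline policy only ever lowers the frequency, and it lowers all cores at once. The sketch below acts per core instead and raises a core's level again once it has cooled down by a margin below the threshold. `thermal_policy_hysteresis` and the 2 K margin are hypothetical choices.

```python
import numpy as np

TEMP_LIM = 337.5  # same value as temp_lim in the notebook
MARGIN = 2.0      # hypothetical headroom (K) required before raising a core's level
MAX_LEVEL = 8     # highest index in freq_level_list


def thermal_policy_hysteresis(temp, freq_level, temp_lim=TEMP_LIM,
                              margin=MARGIN, max_level=MAX_LEVEL):
    """Per-core policy: step hot cores down, step cool cores back up.

    A core above temp_lim loses one level; a core more than `margin` below
    temp_lim gains one level; cores in between keep their level.
    """
    temp = np.asarray(temp, dtype=float)
    level = np.asarray(freq_level, dtype=int).copy()
    level[temp > temp_lim] -= 1
    level[temp < temp_lim - margin] += 1
    return np.clip(level, 0, max_level)


# Invented temperatures (K) for four cores, for illustration only.
print(thermal_policy_hysteresis([340.0, 330.0, 336.0, 320.0], [8, 8, 5, 0]))
# prints [7 8 5 1]
```

It keeps the signature of `thermal_policy`, so it can replace it in the loop of step 1. The margin acts as a hysteresis band: without it, a core sitting right at the threshold could alternate between two levels at every step.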

## Additional EX5: Design a heat sink for the 3D MPSoC
Based on what you have learned in this course, including power modeling, 3D-ICE thermal modeling, and heat sink design, design a heat sink for the 3D MPSoC configuration provided in EX3. Then compare the thermal behavior obtained with the default heat sink in 3D-ICE against that of your own design.