# Workload


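To build: `COUNT=5000000 make build/runALU`. The loop body holds two instructions (`addq` and `popcnt`), so with 5 million repetitions (assuming the Makefile passes `COUNT` through as `REPEAT_COUNT`) every call to `alu` executes about 10 million instructions.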

`work.S`:

```
	.global alu
	.intel_syntax noprefix
	.text
alu:
	.rept REPEAT_COUNT
	addq    rsi, rdi
	popcnt  rax, rsi
	.endr

	ret
```

### Validation with perf

Running one iteration executes around 28 million instructions: about 10 million from the `alu` function and about 18 million from the rest of the code. Going from one iteration to two adds another 10 million, for a total of around 38 million instructions.
```
[peaks@kube-worker-68 microbenchmarks]$ for ((i=0;i<10;i++)); do perf stat -e instructions,cache-misses,ref-cycles -x, taskset -c 87 ./build/runALU 1 && echo ""; done
28004730,,instructions,15375277,100.00,,
344200,,cache-misses,15375277,100.00,,
33689634,,ref-cycles,15375277,100.00,,

28026515,,instructions,17130345,100.00,,
336375,,cache-misses,17130345,100.00,,
37541658,,ref-cycles,17130345,100.00,,

27997892,,instructions,13704671,100.00,,
318343,,cache-misses,13704671,100.00,,
30017966,,ref-cycles,13704671,100.00,,

28000442,,instructions,14172964,100.00,,
302967,,cache-misses,14172964,100.00,,
31078234,,ref-cycles,14172964,100.00,,

28006342,,instructions,14456407,100.00,,
292586,,cache-misses,14456407,100.00,,
31642512,,ref-cycles,14456407,100.00,,

28018571,,instructions,13723581,100.00,,
283121,,cache-misses,13723581,100.00,,
30070920,,ref-cycles,13723581,100.00,,

27987235,,instructions,13564456,100.00,,
272930,,cache-misses,13564456,100.00,,
29742328,,ref-cycles,13564456,100.00,,

28004690,,instructions,13669173,100.00,,
267673,,cache-misses,13669173,100.00,,
29955068,,ref-cycles,13669173,100.00,,

27991981,,instructions,14824238,100.00,,
269503,,cache-misses,14824238,100.00,,
32460098,,ref-cycles,14824238,100.00,,

28089998,,instructions,13856998,100.00,,
260315,,cache-misses,13856998,100.00,,
30351530,,ref-cycles,13856998,100.00,,


[peaks@kube-worker-68 microbenchmarks]$ for ((i=0;i<10;i++)); do perf stat -e instructions,cache-misses,ref-cycles -x, taskset -c 87 ./build/runALU 2 && echo ""; done
38044946,,instructions,23147417,100.00,,
470815,,cache-misses,23147417,100.00,,
50711518,,ref-cycles,23147417,100.00,,

38059694,,instructions,20969259,100.00,,
177823,,cache-misses,20969259,100.00,,
46005828,,ref-cycles,20969259,100.00,,

38052264,,instructions,23218097,100.00,,
201321,,cache-misses,23218097,100.00,,
50865342,,ref-cycles,23218097,100.00,,

38055470,,instructions,21022039,100.00,,
151106,,cache-misses,21022039,100.00,,
46023516,,ref-cycles,21022039,100.00,,

38032770,,instructions,21166507,100.00,,
156159,,cache-misses,21166507,100.00,,
46358972,,ref-cycles,21166507,100.00,,

38049037,,instructions,22772074,100.00,,
151531,,cache-misses,22772074,100.00,,
49950142,,ref-cycles,22772074,100.00,,

38046255,,instructions,20270277,100.00,,
147346,,cache-misses,20270277,100.00,,
44493394,,ref-cycles,20270277,100.00,,

38042996,,instructions,21290801,100.00,,
136496,,cache-misses,21290801,100.00,,
46657952,,ref-cycles,21290801,100.00,,

38048794,,instructions,21466750,100.00,,
158948,,cache-misses,21466750,100.00,,
47082772,,ref-cycles,21466750,100.00,,

38052733,,instructions,20083389,100.00,,
137960,,cache-misses,20083389,100.00,,
44047344,,ref-cycles,20083389,100.00,,
```
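The two sets of runs above suggest a simple linear model: a fixed startup cost plus a constant cost per `alu` call. A minimal sketch that fits this model to the first sample of each set; the variable and function names are illustrative and not part of the benchmark.

```python
# Instruction counts from the first perf sample of each set of runs above.
ins_1_iter = 28_004_730  # ./build/runALU 1
ins_2_iter = 38_044_946  # ./build/runALU 2

# Two points determine the line: total = overhead + per_iter * iterations.
per_iter = ins_2_iter - ins_1_iter
overhead = ins_1_iter - per_iter


def predicted_instructions(iterations):
    """Linear estimate of the instruction count for a given iteration count."""
    return overhead + per_iter * iterations


print(f"per-iteration cost: {per_iter:,}")
print(f"fixed overhead:     {overhead:,}")
```

With these two samples the fit gives about 10.04 million instructions per call and about 17.96 million of fixed overhead, in line with the 10 million / 18 million split described above.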

Longer runs of 9000 and 27000 iterations, which take about 60 seconds and 180 seconds respectively:

```
[peaks@kube-worker-68 microbenchmarks]$ for ((i=0;i<10;i++)); do perf stat -e instructions,cache-misses,ref-cycles -x, taskset -c 87 ./build/runALU 9000 && echo ""; done
90346696263,,instructions,62172728041,100.00,,
1795414,,cache-misses,62172728041,100.00,,
135869153134,,ref-cycles,62172728041,100.00,,

90346721901,,instructions,62171493809,100.00,,
1260656,,cache-misses,62171493809,100.00,,
135867139738,,ref-cycles,62171493809,100.00,,
......

[peaks@kube-worker-68 microbenchmarks]$ for ((i=0;i<10;i++)); do perf stat -e power/energy-pkg/,power/energy-ram/ -x, taskset -c 87 ./build/runALU 9000 && echo ""; done
6553.08,Joules,power/energy-pkg/,249397611610,100.00,,
4914.94,Joules,power/energy-ram/,249397605610,100.00,,

6599.03,Joules,power/energy-pkg/,249230672389,100.00,,
4919.16,Joules,power/energy-ram/,249230665172,100.00,,
......

[peaks@kube-worker-68 microbenchmarks]$ for ((i=0;i<3;i++)); do perf stat -e instructions,cache-misses,ref-cycles -x, taskset -c 87 ./build/runALU 27000 && echo ""; done
271019342020,,instructions,186566646449,100.00,,
5422180,,cache-misses,186566646449,100.00,,
407704045012,,ref-cycles,186566646449,100.00,,

271005179581,,instructions,186505627492,100.00,,
2469731,,cache-misses,186505627492,100.00,,
407560339098,,ref-cycles,186505627492,100.00,,

271004543468,,instructions,186502085043,100.00,,
2818592,,cache-misses,186502085043,100.00,,
407568635628,,ref-cycles,186502085043,100.00,,

[peaks@kube-worker-68 microbenchmarks]$ for ((i=0;i<3;i++)); do perf stat -e power/energy-pkg/,power/energy-ram/ -x, taskset -c 87 ./build/runALU 27000 && echo ""; done
19765.18,Joules,power/energy-pkg/,747671052414,100.00,,
14772.30,Joules,power/energy-ram/,747671054744,100.00,,

19816.03,Joules,power/energy-pkg/,747963365249,100.00,,
14778.91,Joules,power/energy-ram/,747963359915,100.00,,

19770.85,Joules,power/energy-pkg/,747652525908,100.00,,
14769.34,Joules,power/energy-ram/,747652515511,100.00,,
```

## Summary
9000 iterations take about 60 seconds: 90346696263 instructions, 6553.08 J package energy, 4914.94 J DRAM energy.

27000 iterations take about 180 seconds: 271019342020 instructions, 19765.18 J package energy, 14772.30 J DRAM energy.
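Since 27000 is three times 9000, every metric should grow by about a factor of three if the workload scales linearly. The sketch below checks that and derives the average power from the rounded durations; the dictionary layout and function name are illustrative only.

```python
# Values copied from the summary above (first perf sample of each configuration).
# Durations are the rounded figures, so the derived power is approximate.
runs = {
    9000: {"seconds": 60, "instructions": 90_346_696_263,
           "pkg_joules": 6553.08, "dram_joules": 4914.94},
    27000: {"seconds": 180, "instructions": 271_019_342_020,
            "pkg_joules": 19765.18, "dram_joules": 14772.30},
}


def scaling_ratios(small, large):
    """Ratio large/small for every metric; about 3.0 means linear scaling here."""
    return {key: large[key] / small[key] for key in small}


ratios = scaling_ratios(runs[9000], runs[27000])
for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.3f}")

# Average power over each run: energy divided by the approximate duration.
for iterations, m in runs.items():
    pkg_watts = m["pkg_joules"] / m["seconds"]
    dram_watts = m["dram_joules"] / m["seconds"]
    print(f"{iterations} iterations: {pkg_watts:.1f} W package, {dram_watts:.1f} W DRAM")
```

All four ratios land within about 2% of 3, so the longer run behaves like three back-to-back shorter runs.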

# Kepler measurements

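These measurements were collected manually: run `kubectl port-forward svc/kepler-exporter 9102:9102`, then `curl localhost:9102/metrics`, saving the output before and after each benchmark run (the `metricsS{i}` and `metricsE{i}` files read below).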
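The manual `curl` step could be scripted. The following is only a sketch, assuming the port-forward above is already running; the helper names `snapshot_name` and `save_snapshot` are hypothetical, and the file naming simply mirrors the `kepler.ALU9000.metricsS1` pattern used by the notebook cells.

```python
import urllib.request

# Endpoint exposed by the port-forward above.
METRICS_URL = "http://localhost:9102/metrics"


def snapshot_name(workload, phase, run):
    """Build a file name such as kepler.ALU9000.metricsS1 (phase is 'S' or 'E')."""
    return f"kepler.{workload}.metrics{phase}{run}"


def save_snapshot(workload, phase, run, url=METRICS_URL, timeout=10):
    """Fetch the metrics page once and write it to the snapshot file."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    path = snapshot_name(workload, phase, run)
    with open(path, "w") as f:
        f.write(body)
    return path


# Intended use around one benchmark run (not executed here):
#   save_snapshot("ALU9000", "S", 1)
#   ... run ./build/runALU 9000 ...
#   save_snapshot("ALU9000", "E", 1)
```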

```python
# Compare Kepler counter deltas with the perf values for ./build/runALU 9000.
for i in range(1,6):
    fs = f"kepler.ALU9000.metricsS{i}"  # snapshot at the start of run i
    fe = f"kepler.ALU9000.metricsE{i}"  # snapshot at the end of run i

    # End-of-run counters; when several lines match, the last one wins.
    with open(fe) as f:
        lines = f.readlines()
    for line in lines:
        if "kepler_container_cpu_instructions_total" in line:
            inse = float(line.split(' ')[1])
        if "kepler_container_cache_miss_total" in line:
            ce = float(line.split(' ')[1])
        if "kepler_container_cpu_cycles_total" in line:
            cyce = float(line.split(' ')[1])

        # Energy counters: only lines mentioning "dynamic" are used.
        if "dynamic" in line:
            if "kepler_container_package_joules_total" in line:
                jpe = float(line.split(' ')[1])
            if "kepler_container_dram_joules_total" in line:
                jre = float(line.split(' ')[1])

    # Start-of-run counters, parsed the same way.
    with open(fs) as f:
        lines = f.readlines()
    for line in lines:
        if "kepler_container_cpu_instructions_total" in line:
            inss = float(line.split(' ')[1])
        if "kepler_container_cache_miss_total" in line:
            cs = float(line.split(' ')[1])
        if "kepler_container_cpu_cycles_total" in line:
            cycs = float(line.split(' ')[1])

        if "dynamic" in line:
            if "kepler_container_package_joules_total" in line:
                jps = float(line.split(' ')[1])
            if "kepler_container_dram_joules_total" in line:
                jrs = float(line.split(' ')[1])

    # Each line prints the Kepler delta, then the ratio perf value / Kepler delta.
    print(f"Run {i}:")
    print(f"\t kepler_container_cpu_instructions_total: {inse-inss}, {round(90346696263/(inse-inss), 2)}")
    print(f"\t kepler_container_cache_miss_total: {ce-cs}, {round(1795414/(ce-cs), 2)}")
    print(f"\t kepler_container_cpu_cycles_total: {cyce-cycs}, {round(135869153134/(cyce-cycs), 2)}")
    print(f"\t kepler_container_package_joules_total: {round(jpe-jps,2)}, {round(6553.08/(jpe-jps), 2)}")
    print(f"\t kepler_container_dram_joules_total: {round(jre-jrs, 2)}\n")

    # Same package ratio after subtracting 1063.66 J from the perf value.
    print(f"\t kepler_container_package_joules_total: {round(jpe-jps, 2)}, {round((6553.08-1063.66)/(jpe-jps), 2)}")
```

```
Run 1:
	 kepler_container_cpu_instructions_total: 28458575512.0, 3.17
	 kepler_container_cache_miss_total: 60152.0, 29.85
	 kepler_container_cpu_cycles_total: 47532988281.0, 2.86
	 kepler_container_package_joules_total: 1851.39, 3.54
	 kepler_container_dram_joules_total: 1.84

	 kepler_container_package_joules_total: 1851.39, 2.97
Run 2:
	 kepler_container_cpu_instructions_total: 37516475291.0, 2.41
	 kepler_container_cache_miss_total: 36444.0, 49.27
	 kepler_container_cpu_cycles_total: 58841169703.0, 2.31
	 kepler_container_package_joules_total: 2020.03, 3.24
	 kepler_container_dram_joules_total: 0.54

	 kepler_container_package_joules_total: 2020.03, 2.72
Run 3:
	 kepler_container_cpu_instructions_total: 33213024638.0, 2.72
	 kepler_container_cache_miss_total: 63453.0, 28.3
	 kepler_container_cpu_cycles_total: 47816475145.0, 2.84
	 kepler_container_package_joules_total: 2037.34, 3.22
	 kepler_container_dram_joules_total: 1.0

	 kepler_container_package_joules_total: 2037.34, 2.69
Run
```

```python
# Compare Kepler counter deltas with the perf values for ./build/runALU 27000.
for i in range(1,4):
    fs = f"kepler.ALU27000.metricsS{i}"  # snapshot at the start of run i
    fe = f"kepler.ALU27000.metricsE{i}"  # snapshot at the end of run i

    # End-of-run counters; when several lines match, the last one wins.
    with open(fe) as f:
        lines = f.readlines()
    for line in lines:
        if "kepler_container_cpu_instructions_total" in line:
            inse = float(line.split(' ')[1])
        if "kepler_container_cache_miss_total" in line:
            ce = float(line.split(' ')[1])
        if "kepler_container_cpu_cycles_total" in line:
            cyce = float(line.split(' ')[1])

        # Energy counters: only lines mentioning "dynamic" are used.
        if "dynamic" in line:
            if "kepler_container_package_joules_total" in line:
                jpe = float(line.split(' ')[1])
            if "kepler_container_dram_joules_total" in line:
                jre = float(line.split(' ')[1])

    # Start-of-run counters, parsed the same way.
    with open(fs) as f:
        lines = f.readlines()
    for line in lines:
        if "kepler_container_cpu_instructions_total" in line:
            inss = float(line.split(' ')[1])
        if "kepler_container_cache_miss_total" in line:
            cs = float(line.split(' ')[1])
        if "kepler_container_cpu_cycles_total" in line:
            cycs = float(line.split(' ')[1])

        if "dynamic" in line:
            if "kepler_container_package_joules_total" in line:
                jps = float(line.split(' ')[1])
            if "kepler_container_dram_joules_total" in line:
                jrs = float(line.split(' ')[1])

    # Each line prints the Kepler delta, then the ratio perf value / Kepler delta.
    print(f"Run {i}:")
    print(f"\t kepler_container_cpu_instructions_total: {inse-inss}, {round(271019342020/(inse-inss), 2)}")
    print(f"\t kepler_container_cache_miss_total: {ce-cs}, {round(2818592/(ce-cs), 2)}")
    print(f"\t kepler_container_cpu_cycles_total: {cyce-cycs}, {round(407568635628/(cyce-cycs), 2)}")
    print(f"\t kepler_container_package_joules_total: {round(jpe-jps,2)}, {round(19765.18/(jpe-jps), 2)}")
    print(f"\t kepler_container_dram_joules_total: {round(jre-jrs, 2)}\n")
```

```
Run 1:
	 kepler_container_cpu_instructions_total: 81695242554.0, 3.32
	 kepler_container_cache_miss_total: 117117845.0, 0.02
	 kepler_container_cpu_cycles_total: 134887793734.0, 3.02
	 kepler_container_package_joules_total: 5314.17, 3.72
	 kepler_container_dram_joules_total: 595.49

Run 2:
	 kepler_container_cpu_instructions_total: 113709154733.0, 2.38
	 kepler_container_cache_miss_total: 406986848.0, 0.01
	 kepler_container_cpu_cycles_total: 183228650584.0, 2.22
	 kepler_container_package_joules_total: 6980.64, 2.83
	 kepler_container_dram_joules_total: 1982.71

Run 3:
	 kepler_container_cpu_instructions_total: 117433212110.0, 2.31
	 kepler_container_cache_miss_total: 422058682.0, 0.01
	 kepler_container_cpu_cycles_total: 195343072402.0, 2.09
	 kepler_container_package_joules_total: 6820.92, 2.9
	 kepler_container_dram_joules_total: 2006.13
```
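Both notebook cells repeat the same parsing for the start and end snapshots. A possible shared helper is sketched below, assuming the snapshots are in the Prometheus text exposition format. The function names and the label names in the sample lines are hypothetical, and unlike the cells above it matches metric names exactly and skips `#` comment lines.

```python
def parse_sample(line):
    """Split one exposition-format line into (name, labels, value).

    Returns None for blank lines and for '#' comment lines (HELP/TYPE).
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if "{" in line:
        name, rest = line.split("{", 1)
        labels, tail = rest.rsplit("}", 1)
    else:
        name, _, tail = line.partition(" ")
        labels = ""
    # The value is the first token after the labels; a timestamp may follow it.
    return name, labels, float(tail.split()[0])


def last_values(lines, wanted, label_filter=""):
    """Keep the last value seen for each wanted metric, as the cells above do."""
    values = {}
    for line in lines:
        sample = parse_sample(line)
        if sample is None:
            continue
        name, labels, value = sample
        if name in wanted and label_filter in labels:
            values[name] = value
    return values


def delta(start, end):
    """End minus start for every metric present in both snapshots."""
    return {name: end[name] - start[name] for name in end if name in start}


# Tiny made-up snapshots; the label names are assumptions for illustration.
start_lines = [
    "# TYPE kepler_container_package_joules_total counter",
    'kepler_container_package_joules_total{mode="dynamic"} 10.5',
    'kepler_container_package_joules_total{mode="idle"} 99',
]
end_lines = ['kepler_container_package_joules_total{mode="dynamic"} 25.5']
wanted = {"kepler_container_package_joules_total"}
result = delta(last_values(start_lines, wanted, "dynamic"),
               last_values(end_lines, wanted, "dynamic"))
print(result)
```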



# Summary

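In these runs, `kepler_container_cpu_instructions_total`, `kepler_container_cpu_cycles_total`, and `kepler_container_package_joules_total` all come out roughly 3x lower than the corresponding perf values (perf/Kepler ratios between 2.09 and 3.72).

`kepler_container_cache_miss_total` and `kepler_container_dram_joules_total` do not seem reliable at the moment: the cache-miss ratio ranges from 0.01 to 49.27 across runs, and the Kepler DRAM energy ranges from 0.54 J to 2006.13 J against perf values of 4914.94 J and 14772.30 J.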
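To put a number on "roughly 3x", the perf/Kepler ratios printed by the notebook cells can be summarized directly. A small sketch; the dictionary keys are shorthand chosen here, not Kepler metric names.

```python
from statistics import mean

# perf / Kepler ratios copied from the notebook output above:
# three runs at 9000 iterations, then three runs at 27000 iterations.
ratios = {
    "instructions": [3.17, 2.41, 2.72, 3.32, 2.38, 2.31],
    "cycles": [2.86, 2.31, 2.84, 3.02, 2.22, 2.09],
    "package_joules": [3.54, 3.24, 3.22, 3.72, 2.83, 2.90],
}

# (min, mean, max) per metric.
summary = {name: (min(v), mean(v), max(v)) for name, v in ratios.items()}
for name, (low, avg, high) in summary.items():
    print(f"{name}: min {low:.2f}, mean {avg:.2f}, max {high:.2f}")
```

The means fall between about 2.5 and 3.3, with a noticeable spread from run to run.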