# Energy Model sanity checks

The Energy Models (EM) of real platforms rarely match the theoretical models. This notebook is an attempt to centralize a series of checks on the EM of a live target in order to detect abnormalities.

The presence of undesired properties in an EM doesn't necessarily mean that something needs fixing, especially if it reflects a real hardware issue. The goal of this notebook is mainly to be informative and to quickly summarize the level of brokenness of a new platform. What should be done with this information (if anything can be done) depends entirely on the target and the use case.

## Setup

In [1]:
import logging
from conf import LisaLogging
LisaLogging.setup()
from env import TestEnv
from energy_model import EnergyModel
import libs.utils.energy_model as em

2018-07-24 17:05:34,090 INFO    : root         : Using LISA logging configuration:
2018-07-24 17:05:34,090 INFO    : root         :   /home/queper01/dev/lisa/logging.conf
2018-07-24 17:05:34,124 INFO    : root         : Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
2018-07-24 17:05:34,140 INFO    : root         : Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt


In [2]:
# Global variables
target = None
nrg_model = None

### Create EM from a live target

In [3]:
# Minimal configuration for a Juno
my_conf = {
    
    # Target platform and board
    "platform"    : "linux",
    "board"       : "juno",
    "host"        : "10.4.13.128",
    "username"    : "linaro",
    "password"    : "linaro",
    
    "modules" : ["cpufreq", "cpuidle"],
    "exclude_modules" : ["bl", "hwmon"],
}

target = TestEnv(target_conf=my_conf, force_new=True).target
nrg_model = EnergyModel.from_target(target)

2018-07-24 17:05:35,062 INFO    : TestEnv      : Using base path: /data/work/dev/lisa
2018-07-24 17:05:35,063 INFO    : TestEnv      : Loading custom (inline) target configuration
2018-07-24 17:05:35,064 INFO    : TestEnv      : Devlib modules to load: ['cpuidle', 'cpufreq']
2018-07-24 17:05:35,065 INFO    : TestEnv      : Connecting linux target:
2018-07-24 17:05:35,065 INFO    : TestEnv      :   username : linaro
2018-07-24 17:05:35,066 INFO    : TestEnv      :       host : 10.4.13.128
2018-07-24 17:05:35,067 INFO    : TestEnv      :   password : linaro
2018-07-24 17:05:35,067 INFO    : TestEnv      : Connection settings:
2018-07-24 17:05:35,068 INFO    : TestEnv      :    {'username': 'linaro', 'host': '10.4.13.128', 'password': 'linaro'}
2018-07-24 17:05:42,790 INFO    : TestEnv      : Initializing target workdir:
2018-07-24 17:05:42,791 INFO    : TestEnv      :    /home/linaro/devlib-target
2018-07-24 17:05:43,559 INFO    : TestEnv      : Topology:
2018-07-24 17:05:43,560 INFO    

### Create EM from a static energy model

In [4]:
#from platforms.hikey_energy import hikey_energy as nrg_model
#from platforms.hikey960_energy import hikey960_energy as nrg_model
#from platforms.juno_r0_energy import juno_r0_energy as nrg_model
#from platforms.pixel_energy import pixel_energy as nrg_model

## Energy efficiency

On sane hardware, OPPs at high frequencies should be less energy-efficient than OPPs at low frequencies. If that is not true, the heuristics used in schedutil and EAS are sub-optimal, since they always request the lowest frequency that can serve a given utilization level, on the assumption that it is also the most efficient one.

The following check computes the 'capacity / power' ratio of all OPPs of the Energy Model, and asserts that this ratio is monotonically decreasing.

In [5]:
passed, msg = em.is_efficiency_decreasing(nrg_model)
if not passed:
    print('ERROR: ' + msg)
else:
    print('OK: The power efficiency of all CPUs decreases monotonically with growing frequencies')

OK: The power efficiency of all CPUs decreases monotonically with growing frequencies
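To make the check concrete, here is a minimal standalone sketch of the 'capacity / power' test. It does not use `em.is_efficiency_decreasing`; the OPP tables, the helper names and the choice of a strict comparison are all assumptions made for illustration.

```python
# Hypothetical (capacity, power) pairs, one per OPP, sorted by increasing
# frequency. The numbers are illustrative and do not come from a real platform.
sane_opps = [(200, 50), (400, 120), (600, 220), (800, 360), (1024, 560)]
broken_opps = [(200, 80), (400, 120), (600, 220), (800, 360), (1024, 560)]


def efficiency_ratios(opps):
    """Return the capacity / power ratio of each OPP."""
    return [capacity / float(power) for capacity, power in opps]


def is_strictly_decreasing(values):
    """Return True if each value is strictly smaller than the previous one."""
    return all(b < a for a, b in zip(values, values[1:]))


sane_ok = is_strictly_decreasing(efficiency_ratios(sane_opps))
broken_ok = is_strictly_decreasing(efficiency_ratios(broken_opps))
print('sane table: {}, broken table: {}'.format(sane_ok, broken_ok))
```

In the second table the lowest OPP draws too much power for its capacity, so its ratio is lower than that of the next OPP and the check fails.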


## Number of active states

In the wake-up path, EAS tries to predict what the frequency request will be for a given utilization level. The (private) tables used by EAS are supposed to match the OPPs of the target, hence enabling a quick look-up. However, on some platforms (QCOM SoCs, for example) it can be hard to know the exact number of OPPs prior to boot, so reporting a static Energy Model in DT that stays synchronized with the real set of OPPs isn't easy.

The following check compares the number of entries in the Energy Model tables with the actual available frequencies as reported by CPUFreq in order to detect any mismatch between the two.

In [6]:
if not target:
    raise RuntimeError("Impossible to check what CPUFreq reports without a target")

freqs = []
for fd in nrg_model.freq_domains:
    cpu = fd[0]
    freqs.append(len(target.cpufreq.list_frequencies(cpu)))
passed, msg = em.check_active_states_nb(nrg_model, freqs)

if not passed:
    print('ERROR: ' + msg)
else:
    print('OK: the EM and CPUFreq both report the same number of OPPs')

OK: the EM and CPUFreq both report the same number of OPPs
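The comparison itself is simple enough to sketch without a target. The frequency lists and the `count_mismatches` helper below are hypothetical; on a live target the second list would come from `target.cpufreq.list_frequencies(cpu)` for the first CPU of each frequency domain, as in the cell above.

```python
# Hypothetical data: OPP frequencies (kHz) per frequency domain as listed in
# the EM, and the frequencies CPUFreq would report for the same domains.
em_freqs = [
    [450000, 800000, 950000],
    [600000, 1000000, 1200000],
]
cpufreq_freqs = [
    [450000, 800000, 950000],
    [600000, 1000000],  # one OPP is missing at runtime
]


def count_mismatches(em_freqs, cpufreq_freqs):
    """Return (domain index, EM count, CPUFreq count) where the counts differ."""
    mismatches = []
    for i, (em_f, cf_f) in enumerate(zip(em_freqs, cpufreq_freqs)):
        if len(em_f) != len(cf_f):
            mismatches.append((i, len(em_f), len(cf_f)))
    return mismatches


mismatches = count_mismatches(em_freqs, cpufreq_freqs)
for domain, em_nb, cf_nb in mismatches:
    print('Domain {}: EM has {} OPPs, CPUFreq reports {}'.format(domain, em_nb, cf_nb))
```

Only the number of entries is compared here, as in the check above; comparing the frequency values themselves would be a stricter variant.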


## Over-utilized band

When no CPU is over-utilized, EAS will try to avoid over-utilization as much as possible to avoid losing 'control' of the system. As such, the OPPs that are only available in the over-utilized band (between 80% and 100% of the CPU capacity) are generally used very infrequently.

The following hunk reports the OPPs in the over-utilized band.

In [7]:
opp_overutilized = em.get_opp_overutilized(nrg_model)
msg = "The OPPs in the overutilized band are:\n"
for i, opp in enumerate(opp_overutilized):
    msg += "\tGroup {}: {}\n".format(i, opp[1])
print(msg)

The OPPs in the overutilized band are:
	Group 0: [367, 406, 446]
	Group 1: [884, 1024]



The fact that some OPPs are mostly unused because of the over-utilization mechanism can lead to sub-optimal task placements. Indeed, the over-utilized OPPs of the little CPUs won't be used much, even if they are actually more energy-efficient than the low OPPs of the big CPUs.

The over-utilized threshold is a fairly strong design choice for EAS, so it is not obvious if anything needs fixing. However, some OPPs of the little CPUs might be wasted because of this. The following hunk checks if that is the case.

In [8]:
passed, msg = em.compare_big_little_opp(nrg_model)
if not passed:
    print('ERROR: ' + msg)
else:
    print('OK: the overutilized OPPs of the little CPUs are less efficient than the low OPPs of bigs')

ERROR: It is more energy efficient to run for the utilization [367, 406, 446, 356.8, 418] on the little cpu 
		but it is run on the big cpu due to the overutilization zone


## Ideal placement

It is not desirable to have a lot of overlap in the power / perf curves of different types of CPUs. As an attempt to detect completely bogus Energy Models, the following hunk checks that a task using exactly the average capacity of a certain type of CPU would indeed be run on this type of CPU, if EAS were optimal.

In [9]:
passed, msg = em.ideal_placements(nrg_model)
if not passed:
    print('ERROR: ' + msg)
else:
    print('OK: Average-capacity tasks end up where expected')

OK: Average-capacity tasks end up where expected
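A possible reading of this check is sketched below. It assumes that the 'average capacity' of a CPU type is the mean of its OPP capacities, and that an optimal EAS picks the CPU type with the lowest estimated cost (power of the lowest fitting OPP, scaled by `util / capacity`). The tables and helpers are hypothetical and are not the implementation of `em.ideal_placements`.

```python
# Hypothetical (capacity, power) tables per CPU type, sorted by capacity.
cpu_types = {
    'little': [(150, 30), (250, 60), (350, 100), (400, 130), (450, 170)],
    'big': [(400, 200), (600, 350), (800, 560), (900, 720), (1024, 950)],
}


def estimated_cost(opps, util):
    """Cost of serving util at the lowest OPP that fits, or None if none fits."""
    for capacity, power in opps:
        if capacity >= util:
            return power * util / float(capacity)
    return None


def average_capacity(opps):
    """Mean of the OPP capacities of one CPU type (an assumed definition)."""
    return sum(capacity for capacity, _ in opps) / float(len(opps))


def cheapest_type(cpu_types, util):
    """Name of the CPU type that serves util at the lowest estimated cost."""
    costs = {}
    for name, opps in cpu_types.items():
        cost = estimated_cost(opps, util)
        if cost is not None:
            costs[name] = cost
    return min(costs, key=costs.get)


placements = {name: cheapest_type(cpu_types, average_capacity(opps))
              for name, opps in cpu_types.items()}
for name in sorted(placements):
    print('average {} task -> {}'.format(name, placements[name]))
```

An Energy Model would fail this sketch if, for example, the average little task turned out to be cheaper on a big CPU.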
