Skip to content

Initial energy model RFC#246

Closed
bjackman wants to merge 10 commits intoARM-software:masterfrom
bjackman:initial-energy-model-rfc
Closed

Initial energy model RFC#246
bjackman wants to merge 10 commits intoARM-software:masterfrom
bjackman:initial-energy-model-rfc

Conversation

@bjackman
Copy link
Contributor

This class provides a model of systems with:

  • CPU capacity at different frequencies.
  • Power usage at different frequency.
  • Power usage in different idle states.

The model is aware of topologically shared resources (clusters),
topological dependencies for idle states (power domains) and frequency
domains.

The intended use case for this model is for testing energy-aware
scheduling in a platform-agnostic way.


This is an RFC of the EnergyModel class. This doesn't include the complex stuff, just the expression of the data. A later version will introduce code to estimate energy usage for a workload both by imagining ideal kernel behaviour and examining Trappy traces. This will be used to compare traces against "ideal" behaviour for automated tests of Energy Aware Scheduling.

The first commit (energy_model: Add EnergyModel class) contains the interesting bit, the rest of the commits demonstrate some intended uses.

This leaves lots of room for refactoring around LISA, for example the TestEnv::platform attribute becomes redundant. However I'm leaving that until the design is finalised here as the way that refactoring is done will probably need quite a lot of discussion.

Brendan Jackman added 10 commits November 30, 2016 18:19
This class provides a model of systems with:

- CPU capacity at different frequencies.
- Power usage at different frequency.
- Power usage in different idle states.

The model is aware of topologically shared resources (clusters),
topological dependencies for idle states (power domains) and frequency
domains.

The intended use case for this model is for testing energy-aware
scheduling in a platform-agnostic way.
The migrators probably won't start on big CPUs.
Copy link
Contributor

@derkling derkling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite missing some support material (https://goo.gl/2wiF5b) I've tried to go through all the patches.
Overall really appreciable effort, from the data structures side I guess we are almost there.

I would just suggest to add a little bit of documentation and perhaps few dummy wrapping functions to make code more similar in the EM initialization side.

Some doubts about the way we represent Pixel like systems and how the EM methods affects workloads generation (e.g. number of tasks based on number of little CPUs)

Regarding tests, well done... perhaps we should try better to consolidate hard-coded values... but I'm not sure if the effort is worth. At the end the test is written once... thus perhaps having the same hard-coded value in multiple places is not a big issue. A midway solution can be that to use global constants.

# limitations under the License.
#

import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not used by this patch, can we remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

#

import logging
from collections import namedtuple, OrderedDict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OrderedDict also not used.

# TODO check that this is the highest cap available
capacity_scale = 1024

def __init__(self, levels=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is levels? We should add at least a minimal documentation to describe parameters.

in various configurations.

The topology is stored in "levels", currently hard-coded to be "cpu" and
"cluster". Each level is a list of EnergyModelNode objects. An EnergyModel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use the same naming used by TRAPpy? I mean: CPUs and Clusters, while in kernel space usually we talk about CORE and DIE...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point. I think probably they should have no name at all and just be numerically indexed. We can assume that level 0 is the logical CPU. I'll play around with that and see if it works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all in for the usage of a logical index... maybe with the possibility to pass in somehow a map which defined labels to be used for logging and/or reporting...


import devlib

import platforms.juno_energy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

from platform.juno_energy import juno_energy

which simplifies also the following assignment?

}

# Set to true to run a test only on heterogeneous systems
skip_on_smp = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since for the time being, all the tests we have are only for !smp systems, and actually most of the tests we will develop will be for !smp systems, should not be easier to set this default to True?


sched_assert = self.get_multi_assert(experiment)
sched_assert = SchedMultiAssert(
experiment.out_dir, self.te.topology, tasks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to belong to a different patch, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this should be here at all.

])

gold_cpu_active_states = OrderedDict([
(307200, ActiveState(capacity=149, power=93)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is... we have same lower OPP capacities but different max OPP capacities for SILVER and GOLD.
Isn't the EnergyModel::littlest_cpus() returning all the CPUs while EnergyModel::biggest_cpus() returns only the GOLD ones? Does this affects workloads generated by some tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@experiment_test
def test_big_cpus_fully_loaded(self, experiment, tasks):
"""Offload Migration and Idle Pull: Big cpus are fully loaded as long as there are tasks left to run in the system"""
num_big_cpus = len(self.target.bl.bigs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this patch can be squashed with these previous two:

  • Remove bl reference in first CPU test
  • Remove big.LITTLE assumption from SmallTaskPacking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sounds good to just have one patch that updates the acceptance tests to remove big.LITTLE assumptions.

if self.parent:
self.parent.add_cpus(self.cpus)

def __repr__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar method would be appreciated both for EnergyModelNode and EnergyModel classes

Copy link
Contributor Author

@bjackman bjackman Dec 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnergyModelNode inherits a __repr__ fom namedtuple. Good point on EnergyModel - I do have one in a subsequent patch but I'll need to think about how it should look, it's a little bit awkward because it has quite a lot of data so the same style as the namedtuple __repr__ might be totally unreadable.

@bjackman
Copy link
Contributor Author

bjackman commented Dec 8, 2016

OK thanks a lot for the review, I'm going to do some more work on this, especially on the testing, and do a v2 PR.

@derkling derkling added this to the 17.01 milestone Dec 13, 2016
@bjackman
Copy link
Contributor Author

bjackman commented Jan 3, 2017

superseded by #263

@bjackman bjackman closed this Jan 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants