Conversation
This class provides a model of systems with: - CPU capacity at different frequencies. - Power usage at different frequency. - Power usage in different idle states. The model is aware of topologically shared resources (clusters), topological dependencies for idle states (power domains) and frequency domains. The intended use case for this model is for testing energy-aware scheduling in a platform-agnostic way.
The migrators probably won't start on big CPUs.
derkling
left a comment
There was a problem hiding this comment.
Despite missing some support material (https://goo.gl/2wiF5b) I've tried to go through all the patches.
Overall really appreciable effort, from the data structures side I guess we are almost there.
I would just suggest to add a little bit of documentation and perhaps few dummy wrapping functions to make code more similar in the EM initialization side.
Some doubts about the way we represent Pixel like systems and how the EM methods affects workloads generation (e.g. number of tasks based on number of little CPUs)
Regarding tests, well done... perhaps we should try better to consolidate hard-coded values... but I'm not sure if the effort is worth. At the end the test is written once... thus perhaps having the same hard-coded value in multiple places is not a big issue. A midway solution can be that to use global constants.
| # limitations under the License. | ||
| # | ||
|
|
||
| import logging |
There was a problem hiding this comment.
That's not used by this patch, can we remove it?
| # | ||
|
|
||
| import logging | ||
| from collections import namedtuple, OrderedDict |
There was a problem hiding this comment.
OrderedDict also not used.
| # TODO check that this is the highest cap available | ||
| capacity_scale = 1024 | ||
|
|
||
| def __init__(self, levels=None): |
There was a problem hiding this comment.
What is levels? We should add at least a minimal documentation to describe parameters.
| in various configurations. | ||
|
|
||
| The topology is stored in "levels", currently hard-coded to be "cpu" and | ||
| "cluster". Each level is a list of EnergyModelNode objects. An EnergyModel |
There was a problem hiding this comment.
Do we want to use the same naming used by TRAPpy? I mean: CPUs and Clusters, while in kernel space usually we talk about CORE and DIE...
There was a problem hiding this comment.
Yeah, that's a good point. I think probably they should have no name at all and just be numerically indexed. We can assume that level 0 is the logical CPU. I'll play around with that and see if it works.
There was a problem hiding this comment.
I'm all in for the usage of a logical index... maybe with the possibility to pass in somehow a map which defined labels to be used for logging and/or reporting...
|
|
||
| import devlib | ||
|
|
||
| import platforms.juno_energy |
There was a problem hiding this comment.
What about:
from platform.juno_energy import juno_energywhich simplifies also the following assignment?
| } | ||
|
|
||
| # Set to true to run a test only on heterogeneous systems | ||
| skip_on_smp = False |
There was a problem hiding this comment.
Since for the time being, all the tests we have are only for !smp systems, and actually most of the tests we will develop will be for !smp systems, should not be easier to set this default to True?
|
|
||
| sched_assert = self.get_multi_assert(experiment) | ||
| sched_assert = SchedMultiAssert( | ||
| experiment.out_dir, self.te.topology, tasks) |
There was a problem hiding this comment.
This seems to belong to a different patch, isn't it?
There was a problem hiding this comment.
Don't think this should be here at all.
| ]) | ||
|
|
||
| gold_cpu_active_states = OrderedDict([ | ||
| (307200, ActiveState(capacity=149, power=93)), |
There was a problem hiding this comment.
Here it is... we have same lower OPP capacities but different max OPP capacities for SILVER and GOLD.
Isn't the EnergyModel::littlest_cpus() returning all the CPUs while EnergyModel::biggest_cpus() returns only the GOLD ones? Does this affects workloads generated by some tests?
| @experiment_test | ||
| def test_big_cpus_fully_loaded(self, experiment, tasks): | ||
| """Offload Migration and Idle Pull: Big cpus are fully loaded as long as there are tasks left to run in the system""" | ||
| num_big_cpus = len(self.target.bl.bigs) |
There was a problem hiding this comment.
Maybe this patch can be squashed with these previous two:
- Remove
blreference in first CPU test - Remove big.LITTLE assumption from SmallTaskPacking
There was a problem hiding this comment.
Yeah, sounds good to just have one patch that updates the acceptance tests to remove big.LITTLE assumptions.
| if self.parent: | ||
| self.parent.add_cpus(self.cpus) | ||
|
|
||
| def __repr__(self): |
There was a problem hiding this comment.
A similar method would be appreciated both for EnergyModelNode and EnergyModel classes
There was a problem hiding this comment.
EnergyModelNode inherits a __repr__ fom namedtuple. Good point on EnergyModel - I do have one in a subsequent patch but I'll need to think about how it should look, it's a little bit awkward because it has quite a lot of data so the same style as the namedtuple __repr__ might be totally unreadable.
|
OK thanks a lot for the review, I'm going to do some more work on this, especially on the testing, and do a v2 PR. |
|
superseded by #263 |
This class provides a model of systems with:
The model is aware of topologically shared resources (clusters),
topological dependencies for idle states (power domains) and frequency
domains.
The intended use case for this model is for testing energy-aware
scheduling in a platform-agnostic way.
This is an RFC of the EnergyModel class. This doesn't include the complex stuff, just the expression of the data. A later version will introduce code to estimate energy usage for a workload both by imagining ideal kernel behaviour and examining Trappy traces. This will be used to compare traces against "ideal" behaviour for automated tests of Energy Aware Scheduling.
The first commit (energy_model: Add EnergyModel class) contains the interesting bit, the rest of the commits demonstrate some intended uses.
This leaves lots of room for refactoring around LISA, for example the TestEnv::platform attribute becomes redundant. However I'm leaving that until the design is finalised here as the way that refactoring is done will probably need quite a lot of discussion.