Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for new upstream-friendly load-tracking events #406

Merged
merged 1 commit into from Dec 7, 2017

Conversation

bjackman
Copy link
Contributor

There is a new upstream-friendly version 'sched_load_se' of the
load-tracking trace events added by EAS. Add an abstraction so that
LISA can work with both this new version and the older
'sched_load_avg_task' events.

Trace._sanitize_SchedLoadAvgTask is removed and the logic is moved
into the new function _dfg_task_lt_events in tasks_analysis.

The users of sched_load_avg_cpu in tasks_analysis are updated to use
the new _dfg method (except where their use is specific to the old
sched_load_avg_cpu event).

Also add basic unit tests to exercise the code changed in this
commit.

@bjackman bjackman force-pushed the generic-task-signals branch 5 times, most recently from 975a987 to e8448f8 Compare May 30, 2017 14:49
@bjackman
Copy link
Contributor Author

bjackman commented May 30, 2017

The CI failure is due to this change introduced by Pandas 0.20.1 breaking this tasks_analysis code, which happens to be exercised by the unit tests added in this PR. So I'll need to

  1. Split out a commit that adds that test
  2. Fix the issue and submit a PR with that test and fix
  3. Rework this PR

But I'm going to leave it open so it doesn't get forgotten.

@bjackman bjackman requested a review from derkling May 30, 2017 17:05
@bjackman bjackman force-pushed the generic-task-signals branch 4 times, most recently from f02ba5a to 5eb1ed5 Compare May 31, 2017 12:06
@bjackman
Copy link
Contributor Author

Phew, finally got the tests passing. Decided I'm just gona leave this as one PR rather than faffing around too much.

@bjackman bjackman changed the title trace: Add abstraction for per-task load-tracking trace events Add support for new upstream-friendly load-tracking events Jun 20, 2017
@bjackman
Copy link
Contributor Author

Added support for the CPU events (sched_load_cfs_rq) too.

Copy link
Contributor

@derkling derkling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've got at the end of this review... and I would say that we can get this in instead of mine #517 .

There are few things to fix and the series has to be rebased on top of #518.
Tests are not more valid... they need to be re-coded into the existing tests/lisa/test_trace.py version.

I'll rebased the remaining bits of #517 on top of this one as soon as you refresh it.

else:
return None

# TODO: Remove these additional columns? It doesn't work without
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would instead keep this code, which is used by some plotting APIs.

Just merge in a fix similar to what we already have in:
https://github.com/ARM-software/lisa/blob/master/libs/utils/trace.py#L515

if not self._trace.has_big_little:
        return df

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, of course, remove this comment.

I still think these columns, whenever they can be provided, are a useful help for the table consumer.

df = df.rename(columns={'util': 'util_avg', 'load': 'load_avg'})
# In sched_load_se, PID shows -1 for task groups.
df = df[df.pid != -1]
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: add an empty line before

@@ -388,7 +432,7 @@ def plotBigTasks(self, max_tasks=10, min_samples=100,
return

# Get the list of events for all big frequent tasks
df = self._dfg_trace_event('sched_load_avg_task')
df = self._dfg_task_load_events()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add check for df being None similar to L283.

@@ -648,7 +692,7 @@ def _plotTaskSignals(self, axes, tid, signals, is_last=False):
:type is_last: bool
"""
# Get dataframe for the required task
util_df = self._dfg_trace_event('sched_load_avg_task')
util_df = self._dfg_task_load_events()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add check for df being None similar to L283.

@@ -711,7 +755,7 @@ def _plotTaskResidencies(self, axes, tid, signals, is_last=False):
:param is_last: if True this is the last plot
:type is_last: bool
"""
util_df = self._dfg_trace_event('sched_load_avg_task')
util_df = self._dfg_task_load_events()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add check for df being None similar to L283.

import matplotlib
# Prevent matplotlib from trying to connect to X11 server, for headless testing.
# Must be done before importing matplotlib.pyplot or pylab
matplotlib.use('Agg')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this is required... we should post it as a separate PR.
Maybe this fixes as issue that @valschneider has also hit while running tests on intel-eas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was about to comment on that. Could this maybe be added one layer above (tests/__init__.py) ? I'm hitting that issue with tests/eas/generics.py so it's a common issue - I have that import in generic.py (in a local bugfix branch) but it'll be redundant.

Imho you could push this and I'll take care of moving that up one level - I'm the one who created the bug after all :)

else:
return None

# TODO: Remove these additional columns? It doesn't work without
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following code was not part of the original saniteze method... it's likely a copy/past from the task's _dfg... and it does not even make sense for the CPU signals.

@@ -125,7 +173,7 @@ def _plotCPU(self, cpus, label=''):

# Add CPU utilization
axes.set_title('{0:s}CPU [{1:d}]'.format(label1, cpu))
df = self._dfg_trace_event('sched_load_avg_cpu')
df = self._dfg_cpu_load_events()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add check for df being None similar to L283:

df = self._dfg_task_load_events() is None:
if df is None:
        self._log.warning('No trace events for task signals, plot DISABLED')

@@ -26,7 +26,9 @@ class TraceBase(TestCase):

traces_dir = os.path.join(os.path.dirname(__file__),
'example_traces')
events = ['sched_switch', 'sched_load_se', 'sched_load_avg_task']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has to be rebased/integrated in the current test_trace.py version

df['cluster'] = np.select(
[df.cpu.isin(platform['clusters']['little'])],
['LITTLE'], 'big')
# Add a column which represents the max capacity of the smallest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you should add before this check:

if 'nrg_model' not in platform:
        return df

@valschneider
Copy link
Contributor

Will you drop #518 since you have the same commits in here ? We want to merge both anyway, it's just a matter or ordering the contents.

@bjackman
Copy link
Contributor Author

bjackman commented Dec 7, 2017

I'll drop the commits from here once 518 is merged.

@bjackman bjackman force-pushed the generic-task-signals branch 5 times, most recently from 1368b6a to 84abcb2 Compare December 7, 2017 11:28
@bjackman
Copy link
Contributor Author

bjackman commented Dec 7, 2017

Fixed failing test

Copy link
Contributor

@derkling derkling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a couple of checks still missing?

@@ -186,6 +186,42 @@ def _dfg_rt_tasks(self, min_prio=100):

return rt_tasks

def _dfg_task_lt_events(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not df be initialised to None?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover: I think task_load_events was a better name... actually, to be more scheduler aligned, what about sched_task_load?

@@ -384,7 +419,7 @@ def plotBigTasks(self, max_tasks=10, min_samples=100,
return

# Get the list of events for all big frequent tasks
df = self._dfg_trace_event('sched_load_avg_task')
df = self._dfg_task_lt_events()
big_frequent_tasks_events = df[df.pid.isin(big_frequent_task_pids)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing check for df not None?
Like the one you have at line 581

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup actually it's needed above for big_frequenct_tasks_df

@@ -636,7 +671,7 @@ def _plotTaskSignals(self, axes, tid, signals, is_last=False):
:type is_last: bool
"""
# Get dataframe for the required task
util_df = self._dfg_trace_event('sched_load_avg_task')
util_df = self._dfg_task_lt_events()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing check for util_df not None?
Like the one you have at line 581

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one isn't strictly needed as it's checked by the caller. But I'm gona add it in anyway, rather than have invisible dependencies between methods like that. Thanks for pointing it out.

cdata.plot(ax=axes, style=[ccolor+'+'], legend=False)

util_df = self._dfg_task_lt_events()
data = util_df[util_df.pid == tid][['cluster', 'cpu']]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing check for util_df not None?
Like the one you have at line 581

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

There is a new upstream-friendly version 'sched_load_se' of the
load-tracking trace events added by EAS. Add an abstraction so that
LISA can work with both this new version and the older
'sched_load_avg_task' events.

Trace._sanitize_SchedLoadAvgTask is removed and the logic is moved
into the new function _dfg_task_lt_events in tasks_analysis.

The users of sched_load_avg_cpu in tasks_analysis are updated to use
the new _dfg method (except where their use is specific to the old
sched_load_avg_cpu event).

Also add basic unit tests to exercise the code changed in this
commit.
Copy link
Contributor Author

@bjackman bjackman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks, updated!

@@ -384,7 +419,7 @@ def plotBigTasks(self, max_tasks=10, min_samples=100,
return

# Get the list of events for all big frequent tasks
df = self._dfg_trace_event('sched_load_avg_task')
df = self._dfg_task_lt_events()
big_frequent_tasks_events = df[df.pid.isin(big_frequent_task_pids)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup actually it's needed above for big_frequenct_tasks_df

@@ -636,7 +671,7 @@ def _plotTaskSignals(self, axes, tid, signals, is_last=False):
:type is_last: bool
"""
# Get dataframe for the required task
util_df = self._dfg_trace_event('sched_load_avg_task')
util_df = self._dfg_task_lt_events()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one isn't strictly needed as it's checked by the caller. But I'm gona add it in anyway, rather than have invisible dependencies between methods like that. Thanks for pointing it out.

cdata.plot(ax=axes, style=[ccolor+'+'], legend=False)

util_df = self._dfg_task_lt_events()
data = util_df[util_df.pid == tid][['cluster', 'cpu']]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Contributor

@derkling derkling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some minor things I would like to update... but, since from a functional stanpoint it's not ok, I can easily fix them myself on a following patch when this is in and I'm adding the utilest support.

Let's merge it! ;-)

@@ -57,15 +57,15 @@ def _dfg_top_big_tasks(self, min_samples=100, min_utilization=None):
default: capacity of a little cluster
:type min_utilization: int
"""
if not self._trace.hasEvents('sched_load_avg_task'):
self._log.warning('Events [sched_load_avg_task] not found')
if self._dfg_task_load_events() is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind-of confusing: we check here (L60) and get the df after (L68). That's error prone if we change the code.

I would prefer something like:

df = self._dfg_task_load_events()
if df is None:
    self._log.warning('No trace events for task signals, plot DISABLED')
    return None

# So that from now on we don't use `_dfg_task_load_events()` in this function.

[df.cpu.isin(self._trace.platform['clusters']['little'])],
['LITTLE'], 'big')

if 'nrg_model' in self._trace.platform:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here also I would prefer a:

if `nrg_model` not in self._trace.platform:
    return df

@derkling derkling merged commit edfed50 into ARM-software:master Dec 7, 2017
@bjackman bjackman deleted the generic-task-signals branch December 15, 2017 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants