Add support for new upstream-friendly load-tracking events #406

bjackman · 2017-05-25T13:31:08Z

There is a new upstream-friendly version 'sched_load_se' of the
load-tracking trace events added by EAS. Add an abstraction so that
LISA can work with both this new version and the older
'sched_load_avg_task' events.

Trace._sanitize_SchedLoadAvgTask is removed and the logic is moved
into the new function _dfg_task_lt_events in tasks_analysis.

The users of sched_load_avg_cpu in tasks_analysis are updated to use
the new _dfg method (except where their use is specific to the old
sched_load_avg_cpu event).

Also add basic unit tests to exercise the code changed in this
commit.
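As a rough illustration of the abstraction described above, the sketch below normalises either event into one table. The column rename and the `pid != -1` filter are taken from the diff quoted later in this review; the `task_load_events` helper and the `events` dict are hypothetical stand-ins for the real `Trace` API, which is not reproduced here.

```python
import pandas as pd


def task_load_events(events):
    """Return per-task load-tracking events with uniform column names.

    `events` is a hypothetical dict mapping trace event names to
    DataFrames; the real code fetches them from the Trace object.
    """
    if 'sched_load_se' in events:
        df = events['sched_load_se']
        # Align the new event's column names with the old event's.
        df = df.rename(columns={'util': 'util_avg', 'load': 'load_avg'})
        # In sched_load_se, PID shows -1 for task groups.
        return df[df.pid != -1]

    if 'sched_load_avg_task' in events:
        return events['sched_load_avg_task']

    # Neither event is in the trace: callers must handle None.
    return None
```

Callers then only ever see `util_avg` and `load_avg`, whichever event the kernel emitted.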

bjackman · 2017-05-30T17:02:27Z

The CI failure is due to this change introduced by Pandas 0.20.1 breaking this tasks_analysis code, which happens to be exercised by the unit tests added in this PR. So I'll need to:

1. Split out a commit that adds that test
2. Fix the issue and submit a PR with that test and fix
3. Rework this PR

But I'm going to leave it open so it doesn't get forgotten.

bjackman · 2017-05-31T12:12:57Z

Phew, finally got the tests passing. Decided I'm just gonna leave this as one PR rather than faffing around too much.

bjackman · 2017-06-20T10:47:45Z

Added support for the CPU events (sched_load_cfs_rq) too.

derkling

OK, I've got to the end of this review... and I would say that we can get this in instead of my #517.

There are a few things to fix and the series has to be rebased on top of #518.
The tests are no longer valid... they need to be re-coded into the existing tests/lisa/test_trace.py version.

I'll rebase the remaining bits of #517 on top of this one as soon as you refresh it.

derkling · 2017-12-06T14:22:38Z

libs/utils/analysis/tasks_analysis.py

+        else:
+            return None
+
+        # TODO: Remove these additional columns? It doesn't work without


I would instead keep this code, which is used by some plotting APIs.

Just merge in a fix similar to what we already have in:
https://github.com/ARM-software/lisa/blob/master/libs/utils/trace.py#L515

    if not self._trace.has_big_little:
        return df

And, of course, remove this comment.

I still think these columns, whenever they can be provided, are a useful help for the table consumer.

derkling · 2017-12-06T14:23:24Z

libs/utils/analysis/tasks_analysis.py

+            df = df.rename(columns={'util': 'util_avg', 'load': 'load_avg'})
+            # In sched_load_se, PID shows -1 for task groups.
+            df = df[df.pid != -1]
+        else:


nit-pick: add an empty line before

derkling · 2017-12-06T14:31:17Z

libs/utils/analysis/tasks_analysis.py

@@ -388,7 +432,7 @@ def plotBigTasks(self, max_tasks=10, min_samples=100,
            return

        # Get the list of events for all big frequent tasks
-        df = self._dfg_trace_event('sched_load_avg_task')
+        df = self._dfg_task_load_events()


Add check for df being None similar to L283.

derkling · 2017-12-06T14:32:44Z

libs/utils/analysis/tasks_analysis.py

@@ -648,7 +692,7 @@ def _plotTaskSignals(self, axes, tid, signals, is_last=False):
        :type is_last: bool
        """
        # Get dataframe for the required task
-        util_df = self._dfg_trace_event('sched_load_avg_task')
+        util_df = self._dfg_task_load_events()


Add check for df being None similar to L283.

derkling · 2017-12-06T14:33:02Z

libs/utils/analysis/tasks_analysis.py

@@ -711,7 +755,7 @@ def _plotTaskResidencies(self, axes, tid, signals, is_last=False):
        :param is_last: if True this is the last plot
        :type is_last: bool
        """
-        util_df = self._dfg_trace_event('sched_load_avg_task')
+        util_df = self._dfg_task_load_events()


Add check for df being None similar to L283.

derkling · 2017-12-06T14:39:25Z

tests/lisa/__init__.py

+import matplotlib
+# Prevent matplotlib from trying to connect to X11 server, for headless testing.
+# Must be done before importing matplotlib.pyplot or pylab
+matplotlib.use('Agg')


If this is required... we should post it as a separate PR.
Maybe this fixes an issue that @valschneider has also hit while running tests on intel-eas

valschneider · 2017-12-06

Yeah, I was about to comment on that. Could this maybe be added one layer above (tests/__init__.py)? I'm hitting that issue with tests/eas/generics.py, so it's a common issue. I have that import in generic.py (in a local bugfix branch) but it'll be redundant.

Imho you could push this and I'll take care of moving that up one level - I'm the one who created the bug after all :)

derkling · 2017-12-06T14:45:28Z

libs/utils/analysis/cpus_analysis.py

+        else:
+            return None
+
+        # TODO: Remove these additional columns? It doesn't work without


The following code was not part of the original sanitize method... it's likely a copy/paste from the task's _dfg... and it does not even make sense for the CPU signals.

derkling · 2017-12-06T14:46:20Z

libs/utils/analysis/cpus_analysis.py

@@ -125,7 +173,7 @@ def _plotCPU(self, cpus, label=''):

            # Add CPU utilization
            axes.set_title('{0:s}CPU [{1:d}]'.format(label1, cpu))
-            df = self._dfg_trace_event('sched_load_avg_cpu')
+            df = self._dfg_cpu_load_events()


Add check for df being None similar to L283:

    df = self._dfg_task_load_events()
    if df is None:
        self._log.warning('No trace events for task signals, plot DISABLED')

derkling · 2017-12-06T14:49:17Z

tests/lisa/test_trace.py

@@ -26,7 +26,9 @@ class TraceBase(TestCase):

    traces_dir = os.path.join(os.path.dirname(__file__),
                              'example_traces')
-    events = ['sched_switch', 'sched_load_se', 'sched_load_avg_task']


This test has to be rebased/integrated in the current test_trace.py version

derkling · 2017-12-06T14:55:10Z

libs/utils/analysis/tasks_analysis.py

+        df['cluster'] = np.select(
+                [df.cpu.isin(platform['clusters']['little'])],
+                ['LITTLE'], 'big')
+        # Add a column which represents the max capacity of the smallest


Here you should add before this check:

    if 'nrg_model' not in platform:
        return df
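A minimal sketch of where that guard sits, assuming a `platform` dict shaped like the one in the hunk above. The `np.select` call mirrors the diff; the `add_cluster_columns` name, the `min_cluster_cap` column and the `little_cap_max` key are hypothetical and only stand in for the capacity columns that the truncated hunk does not show.

```python
import numpy as np
import pandas as pd


def add_cluster_columns(df, platform):
    """Tag each event with its cluster; add capacity data only if present."""
    df = df.copy()
    # First matching condition wins; everything else falls back to 'big'.
    df['cluster'] = np.select(
        [df.cpu.isin(platform['clusters']['little'])],
        ['LITTLE'], 'big')

    # Guard suggested in the review: without an energy model there is no
    # capacity information, so return what we have instead of failing.
    if 'nrg_model' not in platform:
        return df

    # Hypothetical key and column name, for illustration only.
    df['min_cluster_cap'] = platform['nrg_model']['little_cap_max']
    return df
```

Returning early keeps the cluster column available to table consumers even on platforms that ship no energy model.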

valschneider · 2017-12-07T10:32:24Z

Will you drop #518 since you have the same commits in here? We want to merge both anyway, it's just a matter of ordering the contents.

bjackman · 2017-12-07T10:45:09Z

I'll drop the commits from here once 518 is merged.

bjackman · 2017-12-07T11:29:01Z

Fixed failing test

derkling

Maybe a couple of checks still missing?

derkling · 2017-12-07T16:43:39Z

libs/utils/analysis/tasks_analysis.py

@@ -186,6 +186,42 @@ def _dfg_rt_tasks(self, min_prio=100):

        return rt_tasks

+    def _dfg_task_lt_events(self):


Should not df be initialised to None?

Moreover: I think task_load_events was a better name... actually, to be more scheduler aligned, what about sched_task_load?

derkling · 2017-12-07T16:44:23Z

libs/utils/analysis/tasks_analysis.py

@@ -384,7 +419,7 @@ def plotBigTasks(self, max_tasks=10, min_samples=100,
            return

        # Get the list of events for all big frequent tasks
-        df = self._dfg_trace_event('sched_load_avg_task')
+        df = self._dfg_task_lt_events()
        big_frequent_tasks_events = df[df.pid.isin(big_frequent_task_pids)]


Still missing check for df not None?
Like the one you have at line 581

derkling · 2017-12-07T16:45:41Z

libs/utils/analysis/tasks_analysis.py

@@ -636,7 +671,7 @@ def _plotTaskSignals(self, axes, tid, signals, is_last=False):
        :type is_last: bool
        """
        # Get dataframe for the required task
-        util_df = self._dfg_trace_event('sched_load_avg_task')
+        util_df = self._dfg_task_lt_events()


Still missing check for util_df not None?
Like the one you have at line 581

derkling · 2017-12-07T16:45:55Z

libs/utils/analysis/tasks_analysis.py

-                    cdata.plot(ax=axes, style=[ccolor+'+'], legend=False)
-
+        util_df = self._dfg_task_lt_events()
+        data = util_df[util_df.pid == tid][['cluster', 'cpu']]


Still missing check for util_df not None?
Like the one you have at line 581

bjackman

Cool thanks, updated!

bjackman · 2017-12-07T16:53:51Z

libs/utils/analysis/tasks_analysis.py

@@ -384,7 +419,7 @@ def plotBigTasks(self, max_tasks=10, min_samples=100,
            return

        # Get the list of events for all big frequent tasks
-        df = self._dfg_trace_event('sched_load_avg_task')
+        df = self._dfg_task_lt_events()
        big_frequent_tasks_events = df[df.pid.isin(big_frequent_task_pids)]


Yup, actually it's needed above for big_frequent_tasks_df

bjackman · 2017-12-07T16:56:12Z

libs/utils/analysis/tasks_analysis.py

@@ -636,7 +671,7 @@ def _plotTaskSignals(self, axes, tid, signals, is_last=False):
        :type is_last: bool
        """
        # Get dataframe for the required task
-        util_df = self._dfg_trace_event('sched_load_avg_task')
+        util_df = self._dfg_task_lt_events()


This one isn't strictly needed as it's checked by the caller. But I'm gonna add it in anyway, rather than have invisible dependencies between methods like that. Thanks for pointing it out.

bjackman · 2017-12-07T16:56:37Z

libs/utils/analysis/tasks_analysis.py

-                    cdata.plot(ax=axes, style=[ccolor+'+'], legend=False)
-
+        util_df = self._dfg_task_lt_events()
+        data = util_df[util_df.pid == tid][['cluster', 'cpu']]


derkling

Still some minor things I would like to update... but, since from a functional standpoint it's now OK, I can easily fix them myself in a following patch when this is in and I'm adding the utilest support.

Let's merge it! ;-)

derkling · 2017-12-07T18:28:20Z

libs/utils/analysis/tasks_analysis.py

@@ -57,15 +57,15 @@ def _dfg_top_big_tasks(self, min_samples=100, min_utilization=None):
            default: capacity of a little cluster
        :type min_utilization: int
        """
-        if not self._trace.hasEvents('sched_load_avg_task'):
-            self._log.warning('Events [sched_load_avg_task] not found')
+        if self._dfg_task_load_events() is None:


This is kind-of confusing: we check here (L60) and get the df after (L68). That's error prone if we change the code.

I would prefer something like:

    df = self._dfg_task_load_events()
    if df is None:
        self._log.warning('No trace events for task signals, plot DISABLED')
        return None
    # So that from now on we don't use `_dfg_task_load_events()` in this function.
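The fetch-once pattern suggested here can be sketched in isolation as follows. This is only an illustration under stated assumptions: `TasksAnalysisSketch` is a hypothetical stand-in class, its events are plain dicts rather than a DataFrame, and only the warning text and the `_dfg_task_load_events` name come from the review.

```python
import logging


class TasksAnalysisSketch:
    """Hypothetical stand-in for the analysis class under review."""

    def __init__(self, load_events):
        self._log = logging.getLogger('TasksAnalysisSketch')
        self._load_events = load_events

    def _dfg_task_load_events(self):
        # The real method builds a DataFrame from the trace, or returns
        # None when no supported event is present.
        return self._load_events

    def top_big_tasks(self, min_utilization):
        # Fetch once and check the same object, so the check and the use
        # cannot drift apart when the code is later changed.
        events = self._dfg_task_load_events()
        if events is None:
            self._log.warning(
                'No trace events for task signals, plot DISABLED')
            return None

        return sorted({e['pid'] for e in events
                       if e['util_avg'] > min_utilization})
```

Because the result is bound to a local name before the check, nothing further down the function needs to call the getter again.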

derkling · 2017-12-07T18:30:04Z

libs/utils/analysis/tasks_analysis.py

+                [df.cpu.isin(self._trace.platform['clusters']['little'])],
+                ['LITTLE'], 'big')
+
+        if 'nrg_model' in self._trace.platform:


Here also I would prefer a:

    if 'nrg_model' not in self._trace.platform:
        return df

bjackman force-pushed the generic-task-signals branch 5 times, most recently from 975a987 to e8448f8 Compare May 30, 2017 14:49

bjackman requested a review from derkling May 30, 2017 17:05

bjackman force-pushed the generic-task-signals branch 4 times, most recently from f02ba5a to 5eb1ed5 Compare May 31, 2017 12:06

bjackman force-pushed the generic-task-signals branch from 5eb1ed5 to f00154c Compare June 20, 2017 10:46

bjackman changed the title ~~trace: Add abstraction for per-task load-tracking trace events~~ Add support for new upstream-friendly load-tracking events Jun 20, 2017

bjackman force-pushed the generic-task-signals branch from f00154c to bc18ffb Compare June 27, 2017 16:41

bjackman mentioned this pull request Aug 24, 2017

Add test for NOHZ load updates #471

Closed

derkling mentioned this pull request Dec 5, 2017

Add new task tracepoints #517

Closed

derkling suggested changes Dec 6, 2017

bjackman force-pushed the generic-task-signals branch from bc18ffb to 6b0bd7b Compare December 6, 2017 16:27

bjackman force-pushed the generic-task-signals branch 5 times, most recently from 1368b6a to 84abcb2 Compare December 7, 2017 11:28

derkling suggested changes Dec 7, 2017

bjackman force-pushed the generic-task-signals branch from 84abcb2 to d957f38 Compare December 7, 2017 16:59

bjackman commented Dec 7, 2017

derkling approved these changes Dec 7, 2017

derkling merged commit edfed50 into ARM-software:master Dec 7, 2017

bjackman deleted the generic-task-signals branch December 15, 2017 14:49
