New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for new upstream-friendly load-tracking events #406
Conversation
975a987
to
e8448f8
Compare
The CI failure is due to this change introduced by Pandas 0.20.1 breaking this
But I'm going to leave it open so it doesn't get forgotten. |
f02ba5a
to
5eb1ed5
Compare
Phew, finally got the tests passing. Decided I'm just gona leave this as one PR rather than faffing around too much. |
5eb1ed5
to
f00154c
Compare
Added support for the CPU events (sched_load_cfs_rq) too. |
f00154c
to
bc18ffb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've got at the end of this review... and I would say that we can get this in instead of mine #517 .
There are few things to fix and the series has to be rebased on top of #518.
Tests are not more valid... they need to be re-coded into the existing tests/lisa/test_trace.py
version.
I'll rebased the remaining bits of #517 on top of this one as soon as you refresh it.
else: | ||
return None | ||
|
||
# TODO: Remove these additional columns? It doesn't work without |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would instead keep this code, which is used by some plotting APIs.
Just merge in a fix similar to what we already have in:
https://github.com/ARM-software/lisa/blob/master/libs/utils/trace.py#L515
if not self._trace.has_big_little:
return df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, of course, remove this comment.
I still think these columns, whenever they can be provided, are a useful help for the table consumer.
df = df.rename(columns={'util': 'util_avg', 'load': 'load_avg'}) | ||
# In sched_load_se, PID shows -1 for task groups. | ||
df = df[df.pid != -1] | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit-pick: add an empty line before
@@ -388,7 +432,7 @@ def plotBigTasks(self, max_tasks=10, min_samples=100, | |||
return | |||
|
|||
# Get the list of events for all big frequent tasks | |||
df = self._dfg_trace_event('sched_load_avg_task') | |||
df = self._dfg_task_load_events() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check for df being None similar to L283.
@@ -648,7 +692,7 @@ def _plotTaskSignals(self, axes, tid, signals, is_last=False): | |||
:type is_last: bool | |||
""" | |||
# Get dataframe for the required task | |||
util_df = self._dfg_trace_event('sched_load_avg_task') | |||
util_df = self._dfg_task_load_events() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check for df being None similar to L283.
@@ -711,7 +755,7 @@ def _plotTaskResidencies(self, axes, tid, signals, is_last=False): | |||
:param is_last: if True this is the last plot | |||
:type is_last: bool | |||
""" | |||
util_df = self._dfg_trace_event('sched_load_avg_task') | |||
util_df = self._dfg_task_load_events() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check for df being None similar to L283.
tests/lisa/__init__.py
Outdated
import matplotlib | ||
# Prevent matplotlib from trying to connect to X11 server, for headless testing. | ||
# Must be done before importing matplotlib.pyplot or pylab | ||
matplotlib.use('Agg') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this is required... we should post it as a separate PR.
Maybe this fixes as issue that @valschneider has also hit while running tests on intel-eas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was about to comment on that. Could this maybe be added one layer above (tests/__init__.py
) ? I'm hitting that issue with tests/eas/generics.py
so it's a common issue - I have that import in generic.py
(in a local bugfix branch) but it'll be redundant.
Imho you could push this and I'll take care of moving that up one level - I'm the one who created the bug after all :)
libs/utils/analysis/cpus_analysis.py
Outdated
else: | ||
return None | ||
|
||
# TODO: Remove these additional columns? It doesn't work without |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following code was not part of the original saniteze method... it's likely a copy/past from the task's _dfg... and it does not even make sense for the CPU signals.
libs/utils/analysis/cpus_analysis.py
Outdated
@@ -125,7 +173,7 @@ def _plotCPU(self, cpus, label=''): | |||
|
|||
# Add CPU utilization | |||
axes.set_title('{0:s}CPU [{1:d}]'.format(label1, cpu)) | |||
df = self._dfg_trace_event('sched_load_avg_cpu') | |||
df = self._dfg_cpu_load_events() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check for df being None similar to L283:
df = self._dfg_task_load_events() is None:
if df is None:
self._log.warning('No trace events for task signals, plot DISABLED')
tests/lisa/test_trace.py
Outdated
@@ -26,7 +26,9 @@ class TraceBase(TestCase): | |||
|
|||
traces_dir = os.path.join(os.path.dirname(__file__), | |||
'example_traces') | |||
events = ['sched_switch', 'sched_load_se', 'sched_load_avg_task'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test has to be rebased/integrated in the current test_trace.py version
df['cluster'] = np.select( | ||
[df.cpu.isin(platform['clusters']['little'])], | ||
['LITTLE'], 'big') | ||
# Add a column which represents the max capacity of the smallest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you should add before this check:
if 'nrg_model' not in platform:
return df
bc18ffb
to
6b0bd7b
Compare
Will you drop #518 since you have the same commits in here ? We want to merge both anyway, it's just a matter or ordering the contents. |
I'll drop the commits from here once 518 is merged. |
1368b6a
to
84abcb2
Compare
Fixed failing test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a couple of checks still missing?
@@ -186,6 +186,42 @@ def _dfg_rt_tasks(self, min_prio=100): | |||
|
|||
return rt_tasks | |||
|
|||
def _dfg_task_lt_events(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not df
be initialised to None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover: I think task_load_events
was a better name... actually, to be more scheduler aligned, what about sched_task_load
?
@@ -384,7 +419,7 @@ def plotBigTasks(self, max_tasks=10, min_samples=100, | |||
return | |||
|
|||
# Get the list of events for all big frequent tasks | |||
df = self._dfg_trace_event('sched_load_avg_task') | |||
df = self._dfg_task_lt_events() | |||
big_frequent_tasks_events = df[df.pid.isin(big_frequent_task_pids)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still missing check for df not None
?
Like the one you have at line 581
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup actually it's needed above for big_frequenct_tasks_df
@@ -636,7 +671,7 @@ def _plotTaskSignals(self, axes, tid, signals, is_last=False): | |||
:type is_last: bool | |||
""" | |||
# Get dataframe for the required task | |||
util_df = self._dfg_trace_event('sched_load_avg_task') | |||
util_df = self._dfg_task_lt_events() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still missing check for util_df not None
?
Like the one you have at line 581
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one isn't strictly needed as it's checked by the caller. But I'm gona add it in anyway, rather than have invisible dependencies between methods like that. Thanks for pointing it out.
cdata.plot(ax=axes, style=[ccolor+'+'], legend=False) | ||
|
||
util_df = self._dfg_task_lt_events() | ||
data = util_df[util_df.pid == tid][['cluster', 'cpu']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still missing check for util_df not None
?
Like the one you have at line 581
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There is a new upstream-friendly version 'sched_load_se' of the load-tracking trace events added by EAS. Add an abstraction so that LISA can work with both this new version and the older 'sched_load_avg_task' events. Trace._sanitize_SchedLoadAvgTask is removed and the logic is moved into the new function _dfg_task_lt_events in tasks_analysis. The users of sched_load_avg_cpu in tasks_analysis are updated to use the new _dfg method (except where their use is specific to the old sched_load_avg_cpu event). Also add basic unit tests to exercise the code changed in this commit.
84abcb2
to
d957f38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks, updated!
@@ -384,7 +419,7 @@ def plotBigTasks(self, max_tasks=10, min_samples=100, | |||
return | |||
|
|||
# Get the list of events for all big frequent tasks | |||
df = self._dfg_trace_event('sched_load_avg_task') | |||
df = self._dfg_task_lt_events() | |||
big_frequent_tasks_events = df[df.pid.isin(big_frequent_task_pids)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup actually it's needed above for big_frequenct_tasks_df
@@ -636,7 +671,7 @@ def _plotTaskSignals(self, axes, tid, signals, is_last=False): | |||
:type is_last: bool | |||
""" | |||
# Get dataframe for the required task | |||
util_df = self._dfg_trace_event('sched_load_avg_task') | |||
util_df = self._dfg_task_lt_events() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one isn't strictly needed as it's checked by the caller. But I'm gona add it in anyway, rather than have invisible dependencies between methods like that. Thanks for pointing it out.
cdata.plot(ax=axes, style=[ccolor+'+'], legend=False) | ||
|
||
util_df = self._dfg_task_lt_events() | ||
data = util_df[util_df.pid == tid][['cluster', 'cpu']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still some minor things I would like to update... but, since from a functional stanpoint it's not ok, I can easily fix them myself on a following patch when this is in and I'm adding the utilest support.
Let's merge it! ;-)
@@ -57,15 +57,15 @@ def _dfg_top_big_tasks(self, min_samples=100, min_utilization=None): | |||
default: capacity of a little cluster | |||
:type min_utilization: int | |||
""" | |||
if not self._trace.hasEvents('sched_load_avg_task'): | |||
self._log.warning('Events [sched_load_avg_task] not found') | |||
if self._dfg_task_load_events() is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind-of confusing: we check here (L60) and get the df after (L68). That's error prone if we change the code.
I would prefer something like:
df = self._dfg_task_load_events()
if df is None:
self._log.warning('No trace events for task signals, plot DISABLED')
return None
# So that from now on we don't use `_dfg_task_load_events()` in this function.
[df.cpu.isin(self._trace.platform['clusters']['little'])], | ||
['LITTLE'], 'big') | ||
|
||
if 'nrg_model' in self._trace.platform: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also I would prefer a:
if `nrg_model` not in self._trace.platform:
return df
There is a new upstream-friendly version 'sched_load_se' of the
load-tracking trace events added by EAS. Add an abstraction so that
LISA can work with both this new version and the older
'sched_load_avg_task' events.
Trace._sanitize_SchedLoadAvgTask is removed and the logic is moved
into the new function _dfg_task_lt_events in tasks_analysis.
The users of sched_load_avg_cpu in tasks_analysis are updated to use
the new _dfg method (except where their use is specific to the old
sched_load_avg_cpu event).
Also add basic unit tests to exercise the code changed in this
commit.