New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add profiling analysis #302
Conversation
Here is a more interesting examples of the produced plots which have been generated focusing on Surfaceflinger running on an HiKey board: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not at all familiar with the kernel stuff that's actually being analysed so I can only give superficial input here but I trust your logic :D, all looks good aside from my nits. Some of the comments apply to both analyses but I just put them in one.
Also can I suggest that you put those wonderful docstrings from the plot*
methods into the notebooks (help
or just print foo.bar.__doc__
)? I was a bit lost until I read them.
@@ -146,6 +146,19 @@ def _dfg_latency_preemption_df(self, task): | |||
df.rename(columns={'t_delta' : 'preempt_latency'}, inplace=True) | |||
return df | |||
|
|||
@memoized | |||
def _dfg_activations_df(self, task): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a docstring saying what the column in the output df is, and the format of task
(comm string or PID?). Need to figure out how it can be done but eventually I'd like to have Sphinx API docs for the trace analysis biz.
where the trace is. | ||
|
||
:param task: the task to report latencies for | ||
:type task: int or list(str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth mentioning that int
is PID and list(str)
is comm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's existing code.. will fix in another patch
|
||
All plots are parameterized based on the value of threshold_ms, which | ||
can be used to filter activations intervals bigger than 2 times this | ||
value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this might be naive.. why is it bigger than 2 times this value? Why not just use the value itself and make the default 32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm naive as well... ;-)
I would say that, provided a certain threshold, we can still be interested in checking if there are many samples right above that threshold. Thus, I filter the dataframe at 2x threshold [ms] which should allow to see focus the analysis around the region of interest.
That should make sense, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK... still seems strange to me but doesn't add any complexity so no problem I guess.
len_plt = len(wkp_df) | ||
if len_plt < len_tot: | ||
len_dif = len_tot - len_plt | ||
len_pct = float(len_dif) / len_tot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... * 100
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
|
||
:param threshold_ms: the minimum acceptable [ms] value to report | ||
graphically in the generated plots | ||
:type threshold_ms: int or float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should mention that this function returns the stats
pl.savefig(figname, bbox_inches='tight') | ||
|
||
# Return statistics | ||
return wkp_df.describe(percentiles=[0.95, 0.99]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any matplotllib fu so can't give much input on this code but the output looks great 👍 😛
@@ -159,6 +159,54 @@ def _dfg_activations_df(self, task): | |||
wkp_df = wkp_df[['activation_interval']].shift(-1) | |||
return wkp_df | |||
|
|||
@memoized | |||
def _dfg_runtimes_df(self, task): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto on request for docstring
'(red: {} [ms] threshold)'\ | ||
.format(threshold_ms)) | ||
axes.axhline(y=threshold_ms / 1000., linewidth=2, | ||
color='r', linestyle='--') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed IRL I think "cumulative" is a misnomer, @credp suggested "ranked distribution". Also some suggestion to replace this with a real cumulative distribution. Not a big deal for me though, especially since we've already used "cumulative" for this type of plot elsewhere.
return cr.runtime | ||
if row['next_state'] in ['n']: | ||
return cr.runtime | ||
print "Unexpected next state: ", row['next_state'], ' @ ', row['t_start'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._log.error
?
9763216
to
9f46353
Compare
Not a strict dependency, but perhaps this should be merged after #303 |
DataFrame of task's wakeup/suspend events | ||
|
||
The returned DataFrame has these columns | ||
- Time: the time an event related to this task happened |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't actually a column in the DataFrame it's just the index, I don't think you need to mention it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well... because you know Pandas internals... but to the "average" user, that's nothing else that another column. I just want to describe what the timstamp represents, in this case it's "just" the task events... but in other cases, like the ones below, it's a "generated time" which corresponds explicitly the wakup or blocking time.
Perhaps I can specify that this is also the DF's index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per f2f discussion: I think it should say something like "The index is the time, in seconds, an event related to this task happened". The index is not a column and I think the "average" user of pandas is required to understand this (and might be confused by it being listed like this).
DataFrame of task's wakeup latencies | ||
|
||
The returned DataFrame has these columns: | ||
- Time: the time the task wakeups |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
DataFrame of task's preemption latencies | ||
|
||
The returned DataFrame has these columns: | ||
- Time: the time the has been preempted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto again
The new CDF looks good to me although my judgement is very inexpert. Other than that, looks great apart from the nitpick about 'Time'. |
While testing with Jankbench on HiKey I noticed this problem with plotRuntimes(). Debug output reports
Also log says that 100% of samples fell under 40ms, but tabular data (df.T) reports a max of ~54ms. Trace uploaded here https://drive.google.com/open?id=0B0gETIMiqtYIN2hVN2JNNjFlRDA |
The reported 100% value is due to rounding at format time... there is just one sample (the max value) above the threshold... in a set of +5k samples. The actual percentage should be: 99.981% Regarding the events, I cannot find them in the trace you linked. I've used: grep "==>" results_customers/Juri/trace.txt | grep -v -e "[R|S|D|x] ==" | grep "D|K" However, it's true we do not currently parse such events, which are related to UNINTERRUPTIBLE tasks being killed (quite an interesting condition)... will try to make this case covered as well. If you can find the trace with these events and share it would be useful for testing. |
9f46353
to
78647e7
Compare
Fixed synchronization with setXTimeRange for existing and new plots. |
The currently cumulative function is just a plot of ordered latencies. This plots and reports a proper cumulative distribution function. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
78647e7
to
807ea78
Compare
Fixed documentation according to @bjackman comments. |
Perhaps the current version can still have some issues depending on specific trace events and tasks states specifically, but I would say that it's still worth to have it merged now and address small fixes in future PR. |
f7a3ecbbf Merge pull request ARM-software#302 from valschneider/ftrace_function 2ea55a58e ftrace: Add support to parse function tracing ('function' tracer) 412924f28 Merge pull request ARM-software#301 from JaviMerino/fix_documentation e231aca72 Merge pull request ARM-software#300 from douglas-raillard-arm/fix_custom_scope b25800328 doc: Update Dynamic traces 5babb40b6 ftrace: Add endtime attribute 23a35ecae ftrace: Avoid storing trace-cmd report output in memory 9d67e5555 tests: Fix unit tests 08c7da476 ftrace: Only add dynamic classes on non-custom scopes git-subtree-dir: external/trappy git-subtree-split: f7a3ecbbfb4a92031431dca31c48eb67ac9db08d
f7a3ecbbf Merge pull request ARM-software#302 from valschneider/ftrace_function 2ea55a58e ftrace: Add support to parse function tracing ('function' tracer) 412924f28 Merge pull request ARM-software#301 from JaviMerino/fix_documentation e231aca72 Merge pull request ARM-software#300 from douglas-raillard-arm/fix_custom_scope b25800328 doc: Update Dynamic traces 5babb40b6 ftrace: Add endtime attribute 23a35ecae ftrace: Avoid storing trace-cmd report output in memory 9d67e5555 tests: Fix unit tests 08c7da476 ftrace: Only add dynamic classes on non-custom scopes git-subtree-dir: external/trappy git-subtree-split: f7a3ecbbfb4a92031431dca31c48eb67ac9db08d
No description provided.