Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add profiling analysis #302

Merged
merged 6 commits into from Feb 27, 2017
Merged

Conversation

derkling
Copy link
Contributor

No description provided.

@derkling
Copy link
Contributor Author

derkling commented Feb 17, 2017

Here is a more interesting examples of the produced plots which have been generated focusing on Surfaceflinger running on an HiKey board:
https://gist.github.com/derkling/256256f47bc9daf4883f3cb6e356e26b

Copy link
Contributor

@bjackman bjackman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not at all familiar with the kernel stuff that's actually being analysed so I can only give superficial input here but I trust your logic :D, all looks good aside from my nits. Some of the comments apply to both analyses but I just put them in one.

Also can I suggest that you put those wonderful docstrings from the plot* methods into the notebooks (help or just print foo.bar.__doc__)? I was a bit lost until I read them.

@@ -146,6 +146,19 @@ def _dfg_latency_preemption_df(self, task):
df.rename(columns={'t_delta' : 'preempt_latency'}, inplace=True)
return df

@memoized
def _dfg_activations_df(self, task):
Copy link
Contributor

@bjackman bjackman Feb 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring saying what the column in the output df is, and the format of task (comm string or PID?). Need to figure out how it can be done but eventually I'd like to have Sphinx API docs for the trace analysis biz.

where the trace is.

:param task: the task to report latencies for
:type task: int or list(str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth mentioning that int is PID and list(str) is comm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's existing code.. will fix in another patch


All plots are parameterized based on the value of threshold_ms, which
can be used to filter activations intervals bigger than 2 times this
value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this might be naive.. why is it bigger than 2 times this value? Why not just use the value itself and make the default 32?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm naive as well... ;-)
I would say that, provided a certain threshold, we can still be interested in checking if there are many samples right above that threshold. Thus, I filter the dataframe at 2x threshold [ms] which should allow to see focus the analysis around the region of interest.

That should make sense, isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK... still seems strange to me but doesn't add any complexity so no problem I guess.

len_plt = len(wkp_df)
if len_plt < len_tot:
len_dif = len_tot - len_plt
len_pct = float(len_dif) / len_tot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... * 100

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!


:param threshold_ms: the minimum acceptable [ms] value to report
graphically in the generated plots
:type threshold_ms: int or float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should mention that this function returns the stats

pl.savefig(figname, bbox_inches='tight')

# Return statistics
return wkp_df.describe(percentiles=[0.95, 0.99])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any matplotllib fu so can't give much input on this code but the output looks great 👍 😛

@@ -159,6 +159,54 @@ def _dfg_activations_df(self, task):
wkp_df = wkp_df[['activation_interval']].shift(-1)
return wkp_df

@memoized
def _dfg_runtimes_df(self, task):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on request for docstring

'(red: {} [ms] threshold)'\
.format(threshold_ms))
axes.axhline(y=threshold_ms / 1000., linewidth=2,
color='r', linestyle='--')
Copy link
Contributor

@bjackman bjackman Feb 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed IRL I think "cumulative" is a misnomer, @credp suggested "ranked distribution". Also some suggestion to replace this with a real cumulative distribution. Not a big deal for me though, especially since we've already used "cumulative" for this type of plot elsewhere.

return cr.runtime
if row['next_state'] in ['n']:
return cr.runtime
print "Unexpected next state: ", row['next_state'], ' @ ', row['t_start']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._log.error?

@derkling derkling force-pushed the add-profiling-analysis branch 2 times, most recently from 9763216 to 9f46353 Compare February 20, 2017 12:16
@derkling
Copy link
Contributor Author

Addressed main @bjackman comments and added some small updates required by Joel.
After #301 gets merged I think we can merge this as well ;-)

@derkling derkling added this to the 17.02 milestone Feb 20, 2017
@derkling derkling mentioned this pull request Feb 20, 2017
@derkling
Copy link
Contributor Author

Not a strict dependency, but perhaps this should be merged after #303

DataFrame of task's wakeup/suspend events

The returned DataFrame has these columns
- Time: the time an event related to this task happened
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually a column in the DataFrame it's just the index, I don't think you need to mention it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... because you know Pandas internals... but to the "average" user, that's nothing else that another column. I just want to describe what the timstamp represents, in this case it's "just" the task events... but in other cases, like the ones below, it's a "generated time" which corresponds explicitly the wakup or blocking time.

Perhaps I can specify that this is also the DF's index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per f2f discussion: I think it should say something like "The index is the time, in seconds, an event related to this task happened". The index is not a column and I think the "average" user of pandas is required to understand this (and might be confused by it being listed like this).

DataFrame of task's wakeup latencies

The returned DataFrame has these columns:
- Time: the time the task wakeups
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

DataFrame of task's preemption latencies

The returned DataFrame has these columns:
- Time: the time the has been preempted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto again

@bjackman
Copy link
Contributor

The new CDF looks good to me although my judgement is very inexpert. Other than that, looks great apart from the nitpick about 'Time'.

@jlelli
Copy link
Contributor

jlelli commented Feb 20, 2017

While testing with Jankbench on HiKey I noticed this problem with plotRuntimes().

Debug output reports

2017-02-20 14:15:46,293 WARNING : Unexpected next state: D|K @ 11.605174
2017-02-20 14:15:46,294 WARNING : Unexpected next state: D|K @ 11.623388
2017-02-20 14:15:46,295 WARNING : Unexpected next state: D|K @ 11.627947
2017-02-20 14:15:46,296 WARNING : Unexpected next state: D|K @ 11.634416
2017-02-20 14:15:46,297 WARNING : Unexpected next state: D|K @ 11.638789
2017-02-20 14:15:46,298 WARNING : Unexpected next state: D|K @ 11.641854
2017-02-20 14:15:46,299 WARNING : Unexpected next state: D|K @ 11.644686
2017-02-20 14:15:46,300 WARNING : Unexpected next state: D|K @ 11.646673
2017-02-20 14:15:46,301 WARNING : Unexpected next state: D|K @ 11.649564
2017-02-20 14:15:46,304 WARNING : Unexpected next state: D|K @ 11.686784
2017-02-20 14:15:46,309 WARNING : Unexpected next state: D|K @ 11.691463
2017-02-20 14:15:46,310 WARNING : Unexpected next state: D|K @ 11.693135
2017-02-20 14:15:46,312 WARNING : Unexpected next state: D|K @ 11.699686
2017-02-20 14:15:46,314 WARNING : Unexpected next state: D|K @ 11.700721

Also log says that 100% of samples fell under 40ms, but tabular data (df.T) reports a max of ~54ms.

Trace uploaded here https://drive.google.com/open?id=0B0gETIMiqtYIN2hVN2JNNjFlRDA
Gist @ https://gist.github.com/jlelli/cd31da0f22cb859ee842ba44209425c7 (see last two cells).

@derkling
Copy link
Contributor Author

The reported 100% value is due to rounding at format time... there is just one sample (the max value) above the threshold... in a set of +5k samples. The actual percentage should be: 99.981%
Will fix by adding a bit of resolution to the formatting strings.

Regarding the events, I cannot find them in the trace you linked. I've used:

grep "==>" results_customers/Juri/trace.txt  | grep -v -e "[R|S|D|x] ==" | grep "D|K"

However, it's true we do not currently parse such events, which are related to UNINTERRUPTIBLE tasks being killed (quite an interesting condition)... will try to make this case covered as well.

If you can find the trace with these events and share it would be useful for testing.

@derkling
Copy link
Contributor Author

Fixed synchronization with setXTimeRange for existing and new plots.

The currently cumulative function is just a plot of ordered latencies.
This plots and reports a proper cumulative distribution function.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
@derkling
Copy link
Contributor Author

Fixed documentation according to @bjackman comments.

@derkling
Copy link
Contributor Author

Perhaps the current version can still have some issues depending on specific trace events and tasks states specifically, but I would say that it's still worth to have it merged now and address small fixes in future PR.

@jlelli jlelli merged commit f0b6afb into ARM-software:master Feb 27, 2017
douglas-raillard-arm added a commit to douglas-raillard-arm/lisa that referenced this pull request Dec 4, 2019
f7a3ecbbf Merge pull request ARM-software#302 from valschneider/ftrace_function
2ea55a58e ftrace: Add support to parse function tracing ('function' tracer)
412924f28 Merge pull request ARM-software#301 from JaviMerino/fix_documentation
e231aca72 Merge pull request ARM-software#300 from douglas-raillard-arm/fix_custom_scope
b25800328 doc: Update Dynamic traces
5babb40b6 ftrace: Add endtime attribute
23a35ecae ftrace: Avoid storing trace-cmd report output in memory
9d67e5555 tests: Fix unit tests
08c7da476 ftrace: Only add dynamic classes on non-custom scopes

git-subtree-dir: external/trappy
git-subtree-split: f7a3ecbbfb4a92031431dca31c48eb67ac9db08d
douglas-raillard-arm added a commit to douglas-raillard-arm/lisa that referenced this pull request Dec 4, 2019
f7a3ecbbf Merge pull request ARM-software#302 from valschneider/ftrace_function
2ea55a58e ftrace: Add support to parse function tracing ('function' tracer)
412924f28 Merge pull request ARM-software#301 from JaviMerino/fix_documentation
e231aca72 Merge pull request ARM-software#300 from douglas-raillard-arm/fix_custom_scope
b25800328 doc: Update Dynamic traces
5babb40b6 ftrace: Add endtime attribute
23a35ecae ftrace: Avoid storing trace-cmd report output in memory
9d67e5555 tests: Fix unit tests
08c7da476 ftrace: Only add dynamic classes on non-custom scopes

git-subtree-dir: external/trappy
git-subtree-split: f7a3ecbbfb4a92031431dca31c48eb67ac9db08d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants