Add profiling analysis #302

derkling · 2017-02-17T11:38:56Z

No description provided.

derkling · 2017-02-17T11:45:20Z

Here are some more interesting examples of the produced plots, generated by focusing on SurfaceFlinger running on a HiKey board:
https://gist.github.com/derkling/256256f47bc9daf4883f3cb6e356e26b

bjackman

I'm not at all familiar with the kernel stuff that's actually being analysed so I can only give superficial input here but I trust your logic :D, all looks good aside from my nits. Some of the comments apply to both analyses but I just put them in one.

Also can I suggest that you put those wonderful docstrings from the plot* methods into the notebooks (help or just print foo.bar.__doc__)? I was a bit lost until I read them.
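A minimal sketch of the suggestion above, for readers who want to surface a docstring in a notebook cell. The class, method name, default value and docstring text here are hypothetical stand-ins, not the code from this PR:

```python
import inspect


class LatencyAnalysis(object):
    """Hypothetical stand-in for the analysis class touched by this PR."""

    def plotActivations(self, task, threshold_ms=16):  # default is assumed
        """
        Plot the activation intervals of the specified task.

        :param task: the task to report activations for
        :param threshold_ms: threshold, in milliseconds, marked in the plot
        """


# inspect.getdoc() returns the docstring with its indentation cleaned up.
doc = inspect.getdoc(LatencyAnalysis.plotActivations)
print(doc)
```

In an interactive session, `help(LatencyAnalysis.plotActivations)` shows the same text together with the call signature.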

bjackman · 2017-02-17T12:46:54Z

libs/utils/analysis/latency_analysis.py

@@ -146,6 +146,19 @@ def _dfg_latency_preemption_df(self, task):
        df.rename(columns={'t_delta' : 'preempt_latency'}, inplace=True)
        return df

+    @memoized
+    def _dfg_activations_df(self, task):


Can you add a docstring saying what the column in the output df is, and the format of task (comm string or PID?). Need to figure out how it can be done but eventually I'd like to have Sphinx API docs for the trace analysis biz.

bjackman · 2017-02-17T12:51:47Z

libs/utils/analysis/latency_analysis.py

+        where the trace is.
+
+        :param task: the task to report latencies for
+        :type task: int or list(str)


Might be worth mentioning that int is PID and list(str) is comm

That's existing code.. will fix in another patch

bjackman · 2017-02-17T12:53:15Z

libs/utils/analysis/latency_analysis.py

+
+        All plots are parameterized based on the value of threshold_ms, which
+        can be used to filter activations intervals bigger than 2 times this
+        value.


Sorry this might be naive.. why is it bigger than 2 times this value? Why not just use the value itself and make the default 32?

I'm naive as well... ;-)
I would say that, given a certain threshold, we may still be interested in checking whether there are many samples right above it. Thus, I filter the dataframe at 2x threshold [ms], which should focus the analysis on the region of interest.

That makes sense, doesn't it?

OK... still seems strange to me but doesn't add any complexity so no problem I guess.
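The 2x filtering discussed above can be illustrated roughly as follows. This sketch uses a plain list instead of the DataFrame used in the PR, and the sample values, the threshold, and the choice of `<=` for the cutoff are all assumptions made for illustration:

```python
threshold_ms = 16  # assumed value, for illustration only

# Made-up activation intervals, in seconds.
intervals_s = [0.004, 0.012, 0.017, 0.020, 0.031, 0.033, 0.250]

# Drop everything above 2x the threshold, so that outliers such as the
# 250 ms sample do not stretch the plotted range.
cutoff_s = 2 * threshold_ms / 1000.0
kept = [v for v in intervals_s if v <= cutoff_s]

# Samples just above the threshold survive the filter and stay visible.
above = [v for v in kept if v > threshold_ms / 1000.0]

print(kept)
print(above)
```

With these numbers, the region between 16 ms and 32 ms is kept, which is the "region of interest" mentioned in the reply.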

bjackman · 2017-02-17T12:54:11Z

libs/utils/analysis/latency_analysis.py

+        len_plt = len(wkp_df)
+        if len_plt < len_tot:
+            len_dif = len_tot - len_plt
+            len_pct = float(len_dif) / len_tot


... * 100

Good catch!
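For reference, a small sketch of the difference the missing factor makes. The variable names follow the hunk above, but the counts are made up:

```python
len_tot = 5200  # hypothetical total number of samples
len_plt = 5187  # hypothetical number of samples left after filtering

len_dif = len_tot - len_plt

# As written in the hunk: a fraction in [0, 1], not a percentage.
len_frac = float(len_dif) / len_tot

# With the suggested "* 100": a value that can be reported as a percentage.
len_pct = 100.0 * len_dif / len_tot

print('{} samples ({:.2f}%) not plotted'.format(len_dif, len_pct))
```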

bjackman · 2017-02-17T13:00:08Z

libs/utils/analysis/latency_analysis.py

+
+        :param threshold_ms: the minimum acceptable [ms] value to report
+                             graphically in the generated plots
+        :type threshold_ms: int or float


Should mention that this function returns the stats

bjackman · 2017-02-17T13:01:33Z

libs/utils/analysis/latency_analysis.py

+        pl.savefig(figname, bbox_inches='tight')
+
+        # Return statistics
+        return wkp_df.describe(percentiles=[0.95, 0.99])


I don't have any matplotlib fu so can't give much input on this code but the output looks great 👍 😛

bjackman · 2017-02-17T13:02:11Z

libs/utils/analysis/latency_analysis.py

@@ -159,6 +159,54 @@ def _dfg_activations_df(self, task):
        wkp_df = wkp_df[['activation_interval']].shift(-1)
        return wkp_df

+    @memoized
+    def _dfg_runtimes_df(self, task):


Ditto on request for docstring

bjackman · 2017-02-17T13:08:30Z

libs/utils/analysis/latency_analysis.py

+                                          '(red: {} [ms] threshold)'\
+                                          .format(threshold_ms))
+        axes.axhline(y=threshold_ms / 1000., linewidth=2,
+                     color='r', linestyle='--')


As discussed IRL I think "cumulative" is a misnomer, @credp suggested "ranked distribution". Also some suggestion to replace this with a real cumulative distribution. Not a big deal for me though, especially since we've already used "cumulative" for this type of plot elsewhere.
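To make the naming distinction concrete, here is a sketch of a "ranked distribution" next to an empirical cumulative distribution function. It is plain Python with made-up latencies and hypothetical function names, and it is not the implementation used in the PR:

```python
def ranked(samples):
    """Ranked distribution: the sample values in ascending order,
    meant to be plotted against their rank."""
    return sorted(samples)


def ecdf(samples):
    """Empirical CDF: (value, fraction of samples <= value) pairs."""
    ordered = sorted(samples)
    n = float(len(ordered))
    points = []
    for rank, value in enumerate(ordered, start=1):
        if points and points[-1][0] == value:
            # Repeated value: keep a single point with the highest fraction.
            points[-1] = (value, rank / n)
        else:
            points.append((value, rank / n))
    return points


latencies_ms = [2.0, 1.0, 4.0, 2.0, 8.0]  # made-up samples
print(ranked(latencies_ms))
print(ecdf(latencies_ms))
```

The ranked plot has latency on the y-axis, while the CDF maps each latency to the share of samples at or below it, so the share of samples under a threshold can be read directly from the curve.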

bjackman · 2017-02-17T13:35:14Z

libs/utils/analysis/latency_analysis.py

+                return cr.runtime
+            if row['next_state'] in ['n']:
+                return cr.runtime
+            print "Unexpected next state: ", row['next_state'], ' @ ', row['t_start']


self._log.error?
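A sketch of what replacing the `print` with a logger call could look like. The logger name and helper function are hypothetical; the suggestion above names `error`, while the debug output quoted later in this thread shows `WARNING`, which is the level used here:

```python
import logging

logging.basicConfig(format='%(asctime)s %(levelname)-8s: %(message)s')
log = logging.getLogger('LatencyAnalysis')  # hypothetical logger name


def report_unexpected(next_state, t_start):
    """Log an unexpected task state instead of printing it."""
    # Lazy %-style arguments: the message is only formatted if emitted.
    log.warning('Unexpected next state: %s @ %f', next_state, t_start)


report_unexpected('D|K', 11.605174)
```

Going through the logging module lets notebook users control verbosity per logger instead of getting unconditional output on stdout.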

derkling · 2017-02-20T15:37:05Z

Addressed main @bjackman comments and added some small updates required by Joel.
After #301 gets merged I think we can merge this as well ;-)

derkling · 2017-02-20T15:53:51Z

Not a strict dependency, but perhaps this should be merged after #303

bjackman · 2017-02-20T18:47:02Z

libs/utils/analysis/latency_analysis.py

+        DataFrame of task's wakeup/suspend events
+
+        The returned DataFrame has these columns
+        - Time: the time an event related to this task happened


This isn't actually a column in the DataFrame it's just the index, I don't think you need to mention it here.

Well... that's because you know Pandas internals... but to the "average" user, that's nothing other than another column. I just want to describe what the timestamp represents; in this case it's "just" the task events... but in other cases, like the ones below, it's a "generated time" which corresponds explicitly to the wakeup or blocking time.

Perhaps I can specify that this is also the DF's index.

As per f2f discussion: I think it should say something like "The index is the time, in seconds, an event related to this task happened". The index is not a column and I think the "average" user of pandas is required to understand this (and might be confused by it being listed like this).
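The index-versus-column distinction can be checked with a few lines of pandas. The column name and values below are made up for illustration and do not come from the PR:

```python
import pandas as pd

# Hypothetical events DataFrame: 'Time' is the index, not a column.
df = pd.DataFrame(
    {'curr_state': ['A', 'S', 'A']},
    index=pd.Index([0.10, 0.25, 0.40], name='Time'),
)

print(list(df.columns))   # 'Time' is not listed here
print(df.index.name)      # the index carries the 'Time' label

# reset_index() turns the index into a regular column when one is needed.
flat = df.reset_index()
print(list(flat.columns))
```

Since `df['Time']` would raise a `KeyError` here, documenting `Time` as the index rather than as a column matches what a user will actually see.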

bjackman · 2017-02-20T18:47:12Z

libs/utils/analysis/latency_analysis.py

+        DataFrame of task's wakeup latencies
+
+        The returned DataFrame has these columns:
+        - Time: the time the task wakeups


bjackman · 2017-02-20T18:47:31Z

libs/utils/analysis/latency_analysis.py

+        DataFrame of task's preemption latencies
+
+        The returned DataFrame has these columns:
+        - Time: the time the has been preempted


Ditto again

bjackman · 2017-02-20T19:01:51Z

The new CDF looks good to me although my judgement is very inexpert. Other than that, looks great apart from the nitpick about 'Time'.

jlelli · 2017-02-20T22:22:59Z

While testing with Jankbench on HiKey I noticed this problem with plotRuntimes().

Debug output reports

2017-02-20 14:15:46,293 WARNING : Unexpected next state: D|K @ 11.605174
2017-02-20 14:15:46,294 WARNING : Unexpected next state: D|K @ 11.623388
2017-02-20 14:15:46,295 WARNING : Unexpected next state: D|K @ 11.627947
2017-02-20 14:15:46,296 WARNING : Unexpected next state: D|K @ 11.634416
2017-02-20 14:15:46,297 WARNING : Unexpected next state: D|K @ 11.638789
2017-02-20 14:15:46,298 WARNING : Unexpected next state: D|K @ 11.641854
2017-02-20 14:15:46,299 WARNING : Unexpected next state: D|K @ 11.644686
2017-02-20 14:15:46,300 WARNING : Unexpected next state: D|K @ 11.646673
2017-02-20 14:15:46,301 WARNING : Unexpected next state: D|K @ 11.649564
2017-02-20 14:15:46,304 WARNING : Unexpected next state: D|K @ 11.686784
2017-02-20 14:15:46,309 WARNING : Unexpected next state: D|K @ 11.691463
2017-02-20 14:15:46,310 WARNING : Unexpected next state: D|K @ 11.693135
2017-02-20 14:15:46,312 WARNING : Unexpected next state: D|K @ 11.699686
2017-02-20 14:15:46,314 WARNING : Unexpected next state: D|K @ 11.700721

Also log says that 100% of samples fell under 40ms, but tabular data (df.T) reports a max of ~54ms.

Trace uploaded here https://drive.google.com/open?id=0B0gETIMiqtYIN2hVN2JNNjFlRDA
Gist @ https://gist.github.com/jlelli/cd31da0f22cb859ee842ba44209425c7 (see last two cells).

derkling · 2017-02-22T10:31:37Z

The reported 100% value is due to rounding at format time... there is just one sample (the max value) above the threshold... in a set of +5k samples. The actual percentage should be: 99.981%
Will fix by adding a bit of resolution to the formatting strings.
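A quick sketch of the rounding effect described above. The sample count is hypothetical (the comment only says "+5k"), and the format strings are assumptions rather than the ones used in the PR:

```python
total = 5263   # hypothetical sample count
above = 1      # the single sample above the threshold

pct_below = 100.0 * (total - above) / total

coarse = '{:.0f}%'.format(pct_below)   # '100%': the outlier is rounded away
fine = '{:.3f}%'.format(pct_below)     # '99.981%': the outlier stays visible

print(coarse, fine)
```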

Regarding the events, I cannot find them in the trace you linked. I've used:

grep "==>" results_customers/Juri/trace.txt  | grep -v -e "[R|S|D|x] ==" | grep "D|K"

However, it's true we do not currently parse such events, which are related to UNINTERRUPTIBLE tasks being killed (quite an interesting condition)... will try to make this case covered as well.

If you can find the trace with these events and share it would be useful for testing.

derkling · 2017-02-24T15:05:37Z

Fixed synchronization with setXTimeRange for existing and new plots.

The current cumulative function is just a plot of ordered latencies. This plots and reports a proper cumulative distribution function. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>

derkling · 2017-02-27T14:29:09Z

Fixed documentation according to @bjackman comments.

derkling · 2017-02-27T14:33:46Z

The current version may still have some issues with specific trace events and task states, but I would say it's still worth merging now and addressing small fixes in future PRs.

f7a3ecbbf Merge pull request ARM-software#302 from valschneider/ftrace_function
2ea55a58e ftrace: Add support to parse function tracing ('function' tracer)
412924f28 Merge pull request ARM-software#301 from JaviMerino/fix_documentation
e231aca72 Merge pull request ARM-software#300 from douglas-raillard-arm/fix_custom_scope
b25800328 doc: Update Dynamic traces
5babb40b6 ftrace: Add endtime attribute
23a35ecae ftrace: Avoid storing trace-cmd report output in memory
9d67e5555 tests: Fix unit tests
08c7da476 ftrace: Only add dynamic classes on non-custom scopes

git-subtree-dir: external/trappy
git-subtree-split: f7a3ecbbfb4a92031431dca31c48eb67ac9db08d

derkling assigned jlelli Feb 17, 2017

derkling requested review from ionela-voinescu, jlelli and bjackman February 17, 2017 11:38

bjackman reviewed Feb 17, 2017

derkling force-pushed the add-profiling-analysis branch 2 times, most recently from 9763216 to 9f46353 Compare February 20, 2017 12:16

derkling added this to the 17.02 milestone Feb 20, 2017

derkling mentioned this pull request Feb 20, 2017

Fix platform parsing #303

Merged

bjackman reviewed Feb 20, 2017

derkling force-pushed the add-profiling-analysis branch from 9f46353 to 78647e7 Compare February 24, 2017 15:04

derkling added 6 commits February 27, 2017 14:28

analysis/latency: fixes to return a proper CDF

a5fd1b5

The current cumulative function is just a plot of ordered latencies. This plots and reports a proper cumulative distribution function. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>

analysis/latency: fixes to keep plots in sync with XTimeRange

a2f0a76

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>

analysis/latency: add missing documentation

c3e5038

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>

utils/analysis/latency: add support to activations analysis

22cbe16

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>

utils/analysis/latency: add support to runtimes analysis

2fe4ef1

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>

examples/latency: add examples for Runtimes and Activations Analysis

807ea78

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>

derkling force-pushed the add-profiling-analysis branch from 78647e7 to 807ea78 Compare February 27, 2017 14:28

jlelli merged commit f0b6afb into ARM-software:master Feb 27, 2017
