extension trace issues (occasional -1 on TTI) #753

brendankenny · 2016-10-06T21:12:46Z

I spent a little while looking into the latest trace issues, which seem to occur only in the extension. The usual manifestation is a -1 on TTI, caused by an error thrown while calculating FMP, even though the exact same FMP calculation on the same trace succeeded in an earlier audit. news.google.com often reproduces this. Information dump in case this takes me a while to get back to or someone else wants to take over.

tl;dr, I'm currently thinking:

Sort in ts order for FMP calculation. This would make us more correct, but also unable to calculate FMP in more cases, at least in the extension.
Fix anywhere else we're dependent on ordering of entries in traceEvents array
Fix #618 (Adopt firstMeaningfulPaint trace event) when M54 comes out (~end of October)
Find out why the ordering of events from the same thread can differ between ts and tts and (if that's expected) which timestamp should win in those cases
Better surface debugStrings in the extension so it's clearer why -1s are popping up. They're in the pretty print version of the report for the CLI, so it would be nice to at least log.warn them somewhere (in lighthouse-background.js before populating the report, maybe?) so you don't have to set breakpoints just to see them on the extension side.
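A minimal sketch of the last item. The `AuditResult` shape and the `warnDebugStrings` helper are hypothetical; the object actually assembled in lighthouse-background.js may be structured differently, so treat this as the idea rather than a drop-in patch.

```typescript
// Hypothetical shape of an audit result; not Lighthouse's actual type.
interface AuditResult {
  name: string;
  score: number | boolean;
  debugString?: string;
}

// Warn about every audit that carries a debugString, so the reason for a
// -1 is visible in the extension's console without setting breakpoints.
// Returns the messages as well, which makes the helper easy to test.
function warnDebugStrings(results: AuditResult[]): string[] {
  const messages: string[] = [];
  for (const result of results) {
    if (result.debugString) {
      const message = `${result.name}: ${result.debugString}`;
      console.warn(message);
      messages.push(message);
    }
  }
  return messages;
}
```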

What I've found so far:
The proximate cause is weird trace event ordering. To find what we consider to be navigation start in a trace, we look for the first navigationStart event after the TracingStartedInPage event. In a typical problem trace, looking at the TracingStartedInPage and navigationStart events:

If we look at event.ts ("the tracing clock timestamp") in this example, both navigationStart events occur before the TracingStartedInPage event, so according to ts, there is no nav start event after tracing started.
If we look at event.tts ("the thread clock timestamp of the event"), there is a navigationStart event after the tracing started.
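The lookup described above can be sketched with the clock as a parameter, so the ts and tts interpretations can be compared directly. `findNavStartAfterTracingStart` and the trimmed `TraceEvent` interface are illustrative assumptions, not Lighthouse's actual code; this version decides "after" purely by timestamp, so it does not depend on array order.

```typescript
// Minimal trace event shape: only the fields this sketch reads.
interface TraceEvent {
  name: string;
  pid: number;
  tid: number;
  ts: number;   // tracing clock timestamp
  tts?: number; // thread clock timestamp, when present
}

type Clock = "ts" | "tts";

// Returns the earliest navigationStart strictly after TracingStartedInPage
// according to the chosen clock, or undefined if there is none.
function findNavStartAfterTracingStart(
  events: TraceEvent[],
  clock: Clock,
): TraceEvent | undefined {
  const time = (e: TraceEvent): number | undefined => e[clock];

  const tracingStart = events.find((e) => e.name === "TracingStartedInPage");
  if (tracingStart === undefined) return undefined;
  const startTime = time(tracingStart);
  if (startTime === undefined) return undefined;

  let best: TraceEvent | undefined;
  let bestTime = Infinity;
  for (const e of events) {
    const t = time(e);
    if (e.name !== "navigationStart" || t === undefined) continue;
    // Keep the earliest navigationStart that comes after tracing started.
    if (t > startTime && t < bestTime) {
      best = e;
      bestTime = t;
    }
  }
  return best;
}
```

With synthetic timestamps shaped like the problem trace (both navigationStart events earlier than TracingStartedInPage by ts, one of them later by tts), the "ts" call returns undefined and the "tts" call returns the later navigationStart.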

All three events are on the same thread, so I don't see how they could be ordered differently depending on the clock used. Interestingly, after many, many runs I haven't been able to recreate this situation in the CLI, so I'm wondering if this is a timestamp bug somehow caused by tracing from an extension.
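One way to check how often this happens is to scan a trace for same-thread pairs of events whose ts order disagrees with their tts order. A rough sketch; `findClockInversions` is a hypothetical helper, and the quadratic pair scan assumes you first filter down to a handful of event names such as TracingStartedInPage and navigationStart.

```typescript
interface TraceEvent {
  name: string;
  pid: number;
  tid: number;
  ts: number;
  tts?: number;
}

// Returns pairs of events from the same pid/tid whose relative order under
// the tracing clock (ts) is the opposite of their order under the thread
// clock (tts). Ties on either clock are not counted as inversions.
function findClockInversions(
  events: TraceEvent[],
  names: string[],
): Array<[TraceEvent, TraceEvent]> {
  // Only the events we care about that actually carry a thread timestamp.
  const relevant = events.filter(
    (e) => names.includes(e.name) && e.tts !== undefined,
  );
  const inversions: Array<[TraceEvent, TraceEvent]> = [];
  for (let i = 0; i < relevant.length; i++) {
    for (let j = i + 1; j < relevant.length; j++) {
      const a = relevant[i];
      const b = relevant[j];
      if (a.pid !== b.pid || a.tid !== b.tid) continue;
      const tsOrder = Math.sign(a.ts - b.ts);
      const ttsOrder = Math.sign(a.tts! - b.tts!);
      if (tsOrder !== 0 && ttsOrder !== 0 && tsOrder !== ttsOrder) {
        inversions.push([a, b]);
      }
    }
  }
  return inversions;
}
```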

I've uploaded an example trace to a gist, which you can look at in timeline viewer.

Other issues at work here:

The reason one FMP calculation succeeds while the next fails is that catapult/traceviewer changes the array order of the trace events out from under us when we pass the trace into it between the two FMP calls. Specifically, it appears to sort the events purely by ts, which is not their order before we pass them to traceviewer (that appears to be mostly, but not completely, tts order).
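The copy-versus-in-place distinction, sketched with plain `Array.prototype.sort` standing in for whatever traceviewer does internally (its actual sorting code is not shown here). Both helper names are hypothetical.

```typescript
interface TraceEvent {
  name: string;
  ts: number;
  tts?: number;
}

// Sorting in place reorders the caller's array: anyone else holding a
// reference to `events` sees the new order afterwards.
function sortInPlaceByTs(events: TraceEvent[]): TraceEvent[] {
  return events.sort((a, b) => a.ts - b.ts);
}

// Sorting a shallow copy leaves the caller's array in its original order.
function sortedCopyByTs(events: readonly TraceEvent[]): TraceEvent[] {
  return [...events].sort((a, b) => a.ts - b.ts);
}
```

Handing traceviewer a copy would keep our own array in its original order, although that only hides the underlying sensitivity to ordering rather than removing it.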

The calculation code assumes that a navigationStart event will be found after TracingStartedInPage. As stated above, in tts order we find the navigationStart where we expect it. Once traceviewer sorts in ts order, no navigationStart comes after tracing start, so we fail.

We could prevent traceviewer from changing things by passing in only a copy of the trace, or we could cache the FMP calculation, but...
We're sensitive to ordering of events in the trace, which we shouldn't be. In this case, if we explicitly order by tts before calculating FMP (as we implicitly depend on in the first run through FMP), we can continue to calculate successfully. However, we use ts for basically all other calculations, so it feels like we should be explicitly ordering by ts, always fail at calculating FMP in this situation, and hope this can be fixed upstream.
When Chrome 54 hits stable, we'll have a totally new way of calculating FMP that won't be sensitive to this, so some of this will soon (~end of October) be irrelevant. However, we should be more certain about the timestamps we use and be insensitive to trace event ordering, and we'll still need to know which navigationStart event we care about for other calculations.
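A sketch combining the two mitigations discussed above: the metric is computed from an explicitly ts-sorted copy, so the array order it receives doesn't matter, and the result is memoized per trace object with a `WeakMap`. `computeMetric` is a hypothetical stand-in for the real FMP calculation (it just returns the ts of the earliest event named `firstMeaningfulPaint`), and `cachedMetric` is likewise illustrative.

```typescript
interface TraceEvent {
  name: string;
  ts: number;
}

interface Trace {
  traceEvents: TraceEvent[];
}

// Hypothetical stand-in for the FMP calculation. It never relies on the
// incoming array order: it sorts a copy by ts before looking anything up.
function computeMetric(trace: Trace): number | undefined {
  const ordered = [...trace.traceEvents].sort((a, b) => a.ts - b.ts);
  return ordered.find((e) => e.name === "firstMeaningfulPaint")?.ts;
}

// Memoize per trace object; entries disappear when the trace is collected.
const cache = new WeakMap<Trace, number | undefined>();

function cachedMetric(trace: Trace): number | undefined {
  if (cache.has(trace)) return cache.get(trace);
  const value = computeMetric(trace);
  cache.set(trace, value);
  return value;
}
```

Keying the cache on object identity means a second audit receiving the same trace object reuses the first result even if the array was reordered in between; the explicit sort is what makes a fresh computation give the same answer anyway.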


* Use fMP from chrome timings instead of calculating it ourselves.
* Cleanup audit & driver
* Cleanup unused functions
* Fix fmp calculations
* Fix test records

fixes #618, #1010, #890 and a part of #753
pr: #1066

paulirish · 2016-12-02T01:02:37Z

FMP is now coming from the trace event, thanks to #1066


brendankenny mentioned this issue Oct 7, 2016

sort trace by timestamp before calculating FMP #756

Merged

wardpeet mentioned this issue Nov 28, 2016

Use fMP from chrome timings instead of calculating it ourselves. #1066

Merged

paulirish mentioned this issue Dec 13, 2016

FMP metrics: Don't require tracingStartedInPage to precede navStart #1152

Merged

brendankenny closed this as completed in #1152 Dec 14, 2016

addyosmani mentioned this issue May 30, 2017

navigationStart was not found in the trace #2394

Closed

brendankenny commented Oct 6, 2016 (edited by paulirish)