How to best handle "child cannot end after parent" errors? #8

soulgalore · 2019-05-03T07:04:19Z

Hi @aslushnikov
when I test cnet.com sometimes my trace logs is broken with the child cannot end after parent error. I've pushed an example trace here: https://github.com/soulgalore/tracium/blob/failing-cnet/examples/cnet.json

I think this started to happen 1-2 Chrome releases ago. I guess it should be fixed upstream but I wonder if Tracium should just swallow the error and continue?

Best
Peter

mbalabash · 2019-05-03T07:27:11Z

Hello
I have same error (Fatal trace logic error - child cannot end after parent). But only when i run Tracium on CI server (Travis CI).
On my personal laptop it doesn't happen.

patrickhulce · 2019-05-03T13:13:06Z

Our thinking was that if this has happened, all our assumptions about trace processing might not hold so it's better to alert the consumer than provide potentially false numbers.

Anything extra about the type of activity you're recording you can share?

mbalabash · 2019-05-03T13:32:09Z

Hi @patrickhulce
Thanks for quick response.

I have next html file:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Estimo Template</title>
  </head>
  <body>
    <script src="https://unpkg.com/react@16/umd/react.development.js"></script>
    <h1>Estimo</h1>
  </body>
</html>

And next code to capture Chrome trace:

...
await page.tracing.start({ path: traceFile })
try {
  await page.goto(getUrlToHtmlFile(lib.html), { timeout: options.timeout })
} catch (err) {
  handleSessionError(err, browser)
}
await page.tracing.stop()
...

Then i'm trying to parse it with Tracium:

async function generateTasksReport(pathToTraceFile) {
  const content = JSON.parse(await readFile(pathToTraceFile))
  const tasks = tracium.computeMainThreadTasks(content, {
    flatten: true,
  })
  return tasks
}

On my local machine it works perfectly.
But when i run it on CI it throws an error:

  /home/travis/build/mbalabash/estimo/node_modules/tracium/lib/main-thread-tasks.js:155
  Rejected promise returned by test. Reason:
  Error {
    message: 'Fatal trace logic error - child cannot end after parent',
  }
  Function._computeRecursiveSelfTime (node_modules/tracium/lib/main-thread-tasks.js:155:13)
  node_modules/tracium/lib/main-thread-tasks.js:159:37
  Function._computeRecursiveSelfTime (node_modules/tracium/lib/main-thread-tasks.js:159:8)
  node_modules/tracium/lib/main-thread-tasks.js:159:37
  Function._computeRecursiveSelfTime (node_modules/tracium/lib/main-thread-tasks.js:159:8)
  Function.getMainThreadTasks (node_modules/tracium/lib/main-thread-tasks.js:244:23)
  Function.compute (node_modules/tracium/lib/main-thread-tasks.js:272:28)
  Object.computeMainThreadTasks (node_modules/tracium/index.js:13:36)

This may be due to the fact that script refers to remote url?

soulgalore · 2019-05-03T13:43:33Z

From my side I think it's content related somehow? It happens sometimes when I test cnet.com, I test the same way all the time, but sometimes the trace fails. Since it's CNET I was thinking there's one of the ads or something that is acting up somehow?

patrickhulce · 2019-05-03T15:05:46Z

Hm interesting, thanks for the repro case @mbalabash.

@soulgalore are your failures also on a linux box or CI environment only?

soulgalore · 2019-05-03T15:26:09Z

@patrickhulce I could reproduce today on my Mac, else we are running in Docker on Linux (Ubuntu) where we also get it. I've also got reports from users that got the same error but on other pages, but they haven't shared their URLs. We used to use our own parsing lib but wanna move over to Tracium asap. And we got the same problem with events in the wrong order.

soulgalore · 2019-05-03T20:11:04Z

Had a go and tried to reproduce with puppeteer (I use Chromedriver normally). I didn't get the same error but the same behaviour that one out of three/four runs fails. The one failing are waiting a long time on some content (I'm running without any throttling) and the runs that are ok finish within a couple of seconds, but when it fails it takes 15-20s (maybe). And when it fails I get LHError: NO_TRACING_STARTED. So I think there's some content that somehow creates the problem?

const puppeteer = require('puppeteer');
const tracium = require('tracium');
const fs = require('fs');
const util = require('util');
const readFile = util.promisify(fs.readFile);

puppeteer.launch({ headless: false })
    .then(async (browser) => {
            const page = await browser.newPage();
            await page.setViewport({ width: 1024, height: 800 });
            await page.tracing.start({ path: 'trace.json' });
            await page.goto('http://www.cnet.com', { waitUntil: 'load', timeout: 0 });
            await page.tracing.stop();
            browser.close();
            const content = JSON.parse(await readFile('trace.json'));
            const tasks = tracium.computeMainThreadTasks(content, {
                    flatten: true,
             });
})

patrickhulce · 2019-05-04T04:21:40Z

Thanks @soulgalore! NO_TRACING_STARTED is a different problem that's particularly prevalent in m74+. We've got a fix for it in LH master but it hasn't been ported over here yet.

I'll have to dig in to those repro cases you've provided when I've got time. I haven't seen those cases locally before, but I'll try with your settings :)

mbalabash · 2019-05-04T07:27:16Z

Thanks! @patrickhulce
I have small update for my case.

This may be due to the fact that script refers to remote url?

No. I got same error when used local scripts.

soulgalore · 2019-05-09T19:55:07Z

Yeah, I understand that it's another error but I thought I saw the same kind of behaviour. I've seen this on another URL (https://www.aftonbladet.se), it's a Swedish newspaper with a lot of ads. Let me see if I can find a way to reproduce.

mbalabash · 2019-05-09T20:20:27Z

Hi @patrickhulce
Do you have any updates or some ideas?

patrickhulce · 2019-05-10T17:02:24Z

So far the only case where I've seen this happen where the trace isn't totally useless/non-sensical is when the ending trace event of the child that is B/E is missing and it was nested under an X.

Available options I see:

Anytime we are missing an end event for a B/E pair, we just end it at the last logical place that would've made sense (the end of the parent for example). This is a bit aggressive and could lead to some misleading conclusions, but might be good enough.
We omit any toplevel tasks (and all their children) that contained this nonsensical scenario.
We only support the above case if it's the last event in the trace.
We continue to fatally throw for a case of invalid data.

I tend to support option 4 in Lighthouse, but I see the value of 1/2 in the tracium case where other consumers might be ready to be more forgiving (maybe it's even a setting to pass in failLoudly: true or something?)

soulgalore · 2019-05-10T21:52:17Z

failLoudly: true

If that's ok for you all that would be great. I've been testing Tracium in different scenarios the last days and have gotten others errors too. I wonder if it could be because of that I use Chromedriver and somehow some events is dropped? I also do other scenarios, like testing a SPA.

I've got cases where one event don't get a duration/selfTime so that the https://github.com/aslushnikov/tracium/blob/master/lib/main-thread-tasks.js#L258 check fails.

Also got traces where I get end event without a start as in https://github.com/aslushnikov/tracium/blob/master/lib/main-thread-tasks.js#L111

Also using the debug log in those places would help.

patrickhulce · 2019-05-10T22:30:22Z

I wonder if it could be because of that I use Chromedriver and somehow some events is dropped? I also do other scenarios, like testing a SPA.

Interesting, yeah our trace processing up until tracium has always been cold load of page with raw protocol so it's possible these other scenarios increase the likelihood of these otherwise unlikely events from LH perspective.

@soulgalore if you could get us copies of any of these traces you're talking about I'd love to add some test cases and make sure we can handle them appropriately :)

soulgalore · 2019-05-11T08:56:42Z

I can collect a couple traces today/tomorrow. For SPA I'm injecting a nav start event for it to work. However I think the problem is on my side since I use the Chromedriver. Checking the docs it says "...will continue tracing until Chrome closes.". Without checking the code, I guess when we get the trace, the driver has not stopped the trace and it doesn't wait on Tracing.tracingComplete, so we probably don't get all the events. I'll try to switch to use CDP directly instead, that will minimise the risk (and work) for me :)

soulgalore · 2019-05-11T09:43:12Z

I made a quick POC and it works much better using CDP directly, sorry my fault, I'll try to push it to our test servers later today and then I will know for sure.

soulgalore · 2019-05-13T13:22:45Z

Ok been running these for a while and it works much better, so it was on my side! I've got one use case where Tracium sometimes breaks using a SPA. The use case is like this: Hit a menu item to open up a menu, wait for it to appear, start a trace, click on the new menu item and measure that request and load.

Sometimes (not always) I run into Fatal trace logic error - unexpected end event with an event that looks like this:

{ pid: 4034,
  tid: 4034,
  ts: 17974008372878,
  ph: 'E',
  cat: 'blink,devtools.timeline',
  name: 'HitTest',
  args:
   { endData:
      { x: 683,
        y: 282,
        rectilinear: true,
        move: true,
        nodeId: 95,
        nodeName: 'CANVAS class=\'flot-overlay\'' } },
  tts: 2163034 }

Maybe I can fix that on my side, not sure though. I can post the full trace log if it helps!

patrickhulce · 2019-05-13T14:18:53Z

Thanks @soulgalore that's good to hear many of the issues sorted themselves out!

wait for it to appear, start a trace, click on the new menu item and measure that request and load.

Ah I can see this scenario if we start a trace in the middle of some event and we only get the end part of it which isn't the case in normal LH operations.

I think turning this into a flag to drop these invalid cases (maybe stick the events into an array so the user can fail on their own if there were lots of them?) seems like the best way to go.

patrickhulce · 2019-05-13T14:19:17Z

Oh and a copy of that full trace would be awesome for test case prepping if you're willing :)

soulgalore · 2019-05-13T17:08:45Z

Sharing is caring: https://gist.github.com/soulgalore/6b5d13a485334afe009eab3866379e5f

mbalabash · 2019-06-04T12:26:09Z

Hello @patrickhulce @aslushnikov

I added docker support for reproducting Travis CI errors in estimo.

And it seems like strange chrome / puppeteer behavior.
Test results are unstable. Same test for same container may pass few times and fails few times with error: Fatal trace logic error - child cannot end after parent

I run puppeteer like this:

const browser = await puppeteer.launch({ headless, args: ['--no-sandbox', '--disable-setuid-sandbox'] })

All code for work with puppeteer here.

Docker config

I'm still do stuff like this (to analyze only js parse/execution time):

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Estimo Template</title>
  </head>
  <body>
    <h1>Estimo</h1>
    <script src="https://unpkg.com/react@16/umd/react.development.js"></script>
  </body>
</html>

And i got errors only in Docker / CI (on my host machine all works perfectly).
Unstable tests confusing me a little.
Do you have any ideas?

Thanks

patrickhulce · 2019-06-04T14:18:31Z

Ah thanks so much for the docker repro @mbalabash! There's been movement on getting tracium publishable from within LH itself, so once that is done I'll work on getting a version published that can better handle these cases.

Trace events are not sorted by default. Proper sorting not only uses timestamps, but also accounts for trace event duration. Reference #8

aslushnikov · 2019-06-18T17:17:43Z

I pushed the 0.2.1 version that sorts out some of the errors here that relate to trace event sorting.
Unfortunately, these don't help with the original cnet trace there - I'll have to look into it separately.

soulgalore · 2019-06-18T21:51:19Z

Thanks @aslushnikov! Right now I'm using a modified version of Tracium, where we don't throw the errors. I've been continuously running Tracium and got a couple of more errors (that happens sometimes, not all the time). I get them all on a single navigation on "normal" pages.

Fatal trace logic error - unexpected end event
Fatal trace logic error - expected start event, got
Fatal trace logic error - child cannot end after parent
Invalid task timing data

I can revert and run them on another server just to collect the trace log. But I wonder if you don't also get them in Lighthouse? We run the same trace config and since I skipping enabling trace using Chromedriver and instead using CDP I'm 99.9% sure the trace log that we collect is correct in the sense it's what we expect out of Chrome.

patrickhulce · 2019-06-18T21:55:42Z

I've been digging more into these and there's a major root issue we've identified that Chrome's invariants for trace event ordering have drifted slightly from what was expected when the original code was written ~2 years ago. To be fair to our Chromium colleagues, there was a document that notes order could be different in the future, but it was difficult to find these bugs without any examples of how it could be different. Now that we have them it'll just take time to sort through :)

I'm working on some LH PRs that should hopefully help a lot with this class of error.

soulgalore mentioned this issue May 10, 2019

Support missing TracingStartedInBrowser #10

Open

patrickhulce mentioned this issue Jun 8, 2019

Fatal trace logic error - expected start event, got X GoogleChrome/lighthouse#7764

Closed

mbalabash mentioned this issue Jun 9, 2019

Use Puppeter Core mbalabash/estimo#7

Closed

aslushnikov added a commit that referenced this issue Jun 18, 2019

fix: properly sort trace events

7f608d4

Trace events are not sorted by default. Proper sorting not only uses timestamps, but also accounts for trace event duration. Reference #8

aslushnikov added a commit that referenced this issue Jun 18, 2019

fix: properly sort trace events

86cc046

Trace events are not sorted by default. Proper sorting not only uses timestamps, but also accounts for trace event duration. Reference #8

aslushnikov mentioned this issue Jun 18, 2019

fix: properly sort trace events #11

Merged

aslushnikov added a commit that referenced this issue Jun 18, 2019

fix: properly sort trace events (#11)

24ac55c

Trace events are not sorted by default. Proper sorting not only uses timestamps, but also accounts for trace event duration. Reference #8

patrickhulce mentioned this issue Jun 18, 2019

core(tracehouse): sort trace events by nesting order GoogleChrome/lighthouse#9230

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to best handle "child cannot end after parent" errors? #8

How to best handle "child cannot end after parent" errors? #8

soulgalore commented May 3, 2019

mbalabash commented May 3, 2019 •

edited

patrickhulce commented May 3, 2019

mbalabash commented May 3, 2019 •

edited

soulgalore commented May 3, 2019

patrickhulce commented May 3, 2019

soulgalore commented May 3, 2019

soulgalore commented May 3, 2019

patrickhulce commented May 4, 2019

mbalabash commented May 4, 2019

soulgalore commented May 9, 2019

mbalabash commented May 9, 2019

patrickhulce commented May 10, 2019

soulgalore commented May 10, 2019

patrickhulce commented May 10, 2019

soulgalore commented May 11, 2019

soulgalore commented May 11, 2019 •

edited

soulgalore commented May 13, 2019

patrickhulce commented May 13, 2019

patrickhulce commented May 13, 2019

soulgalore commented May 13, 2019

mbalabash commented Jun 4, 2019

patrickhulce commented Jun 4, 2019

aslushnikov commented Jun 18, 2019

soulgalore commented Jun 18, 2019

patrickhulce commented Jun 18, 2019 •

edited

How to best handle "child cannot end after parent" errors? #8

How to best handle "child cannot end after parent" errors? #8

Comments

soulgalore commented May 3, 2019

mbalabash commented May 3, 2019 • edited

patrickhulce commented May 3, 2019

mbalabash commented May 3, 2019 • edited

soulgalore commented May 3, 2019

patrickhulce commented May 3, 2019

soulgalore commented May 3, 2019

soulgalore commented May 3, 2019

patrickhulce commented May 4, 2019

mbalabash commented May 4, 2019

soulgalore commented May 9, 2019

mbalabash commented May 9, 2019

patrickhulce commented May 10, 2019

soulgalore commented May 10, 2019

patrickhulce commented May 10, 2019

soulgalore commented May 11, 2019

soulgalore commented May 11, 2019 • edited

soulgalore commented May 13, 2019

patrickhulce commented May 13, 2019

patrickhulce commented May 13, 2019

soulgalore commented May 13, 2019

mbalabash commented Jun 4, 2019

patrickhulce commented Jun 4, 2019

aslushnikov commented Jun 18, 2019

soulgalore commented Jun 18, 2019

patrickhulce commented Jun 18, 2019 • edited

mbalabash commented May 3, 2019 •

edited

mbalabash commented May 3, 2019 •

edited

soulgalore commented May 11, 2019 •

edited

patrickhulce commented Jun 18, 2019 •

edited