Flaky E2E: Jest failed to execute suite #70674

worldomonation · 2022-12-01T23:01:01Z

Spec file

various

TeamCity ID

9073509

Logs

page.screenshot: Target crashed
=========================== logs ===========================
taking page screenshot
============================================================
page.screenshot: Target crashed
=========================== logs ===========================
taking page screenshot
============================================================
    at JestEnvironmentPlaywright.handleTestEvent (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/packages/calypso-e2e/src/jest-playwright-config/environment.ts:344:28)
    at dispatch (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/jest-circus/build/state.js:75:5)
    at runAndTransformResultsToJestFormat (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapterInit.js:234:3)
    at jestAdapter (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapter.js:82:19)
    at runTestInternal (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/jest-runner/build/runTest.js:389:16)
    at runTest (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/jest-runner/build/runTest.js:475:34)
    at Object.worker (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/jest-runner/build/testWorker.js:133:12)

The text was updated successfully, but these errors were encountered:

worldomonation · 2022-12-01T23:03:27Z

This has been an ongoing issue since the early days of the framework, but it's gotten a lot more frequent in the last few months.

There have been 8 occurrences for the editor/editor__revision.ts spec alone for the last week:

and 13 cases in the editor__navbar.ts spec in the same time period:

worldomonation · 2022-12-06T23:36:11Z

Enabled the pw:browser debug level in Playwright and after the same build several times, I came across the crash.

This is the resulting log:

01:29:45 
  2022-12-03T01:29:45.152Z pw:browser [pid=4174][err] [1203/012945.152220:ERROR:render_process_host_impl.cc(5157)] Terminating render process for bad Mojo message: Received bad user message: Prefetch/IsValidCrossOrigin: frame origin mismatch

01:29:45 
  2022-12-03T01:29:45.152Z pw:browser [pid=4174][err] [1203/012945.152304:ERROR:bad_message.cc(29)] Terminating renderer for bad IPC message, reason 123

01:29:45 
  2022-12-03T01:29:45.153Z pw:browser writing frame 1670030985.146939

01:29:45 
  2022-12-03T01:29:45.153Z pw:browser writing frame 1670030985.148676

01:29:45 
  2022-12-03T01:29:45.153Z pw:browser writing frame 1670030985.148827

01:29:45 
  2022-12-03T01:29:45.155Z pw:browser writing frame 1670030985.151879

01:29:45 
  2022-12-03T01:29:45.158Z pw:browser writing frame 1670030985.151862

01:29:45 
  2022-12-03T01:29:45.162Z pw:browser writing frame 1670030985.156923

01:29:45 
  Event for import__site: hook_failure

01:29:45 
  ERROR: Site Import > beforeAll

01:29:45 
  

01:29:45 
   page.goto: net::ERR_ABORTED; maybe frame was detached?

01:29:45 
  =========================== logs ===========================

01:29:45 
  navigating to "https://container-vigorous-khayyam.calypso.live/", waiting until "load"

01:29:45 
  ============================================================

01:29:45 
      at TestAccount.authenticate (/home/teamcity-0/buildAgent/work/c4a9d5b38c1dacde/packages/calypso-e2e/src/lib/test-account.ts:41:15)

01:29:45 
      at /home/teamcity-0/buildAgent/work/c4a9d5b38c1dacde/test/e2e/specs/tools/import__site.ts:18:3 {

01:29:45 
    name: 'Error'

01:29:45 
  }

worldomonation · 2022-12-06T23:41:24Z

Further logs:

01:29:45 
  Event for import__site: test_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_fn_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_done

01:29:45 
  Event for import__site: test_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_fn_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_done

01:29:45 
  Event for import__site: test_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_fn_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_done

01:29:45 
  Event for import__site: test_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_fn_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_done

01:29:45 
  Event for import__site: test_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_fn_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_done

01:29:45 
  Event for import__site: test_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_fn_start

01:29:45 
  In test_fn_start, is browser connected: true

01:29:45 
  Event for import__site: test_done

01:29:45 
  Event for import__site: run_describe_finish

01:29:45 
  Event for import__site: run_describe_finish

worldomonation · 2022-12-06T23:44:59Z

Searching the web for the error messages don't lead to much, but here are some possibly related issues:

I wonder if this line is related.

worldomonation · 2022-12-08T23:53:10Z

I have a potential fix in #70970. After merging them, I will have to monitor for a bit.

worldomonation · 2022-12-09T23:17:48Z

Reopening to monitor.

worldomonation · 2022-12-13T18:29:59Z

Looks like #70970 did not really have the intended effect.

For example, in the support__popover-invalid spec alone, we have seen 4 new instances of browser crashing:

Even today (December 13, 2022) there has been multiple occurrences of this across a variety of specs.

worldomonation · 2022-12-13T18:30:21Z

@dpasque I think it's time we try increasing the shm allocation for docker. I'm going to spin up a new PR for that.

worldomonation · 2022-12-21T22:00:00Z

Let's see if #71356 helps - all failure we've had today (2022/12/21) have been this failure.

worldomonation · 2023-01-24T22:36:27Z

Incidents of this issue have dropped off quite significantly, but still occurs occasionally.

Looking at the trend, this issue appears to occur with media related tests.

worldomonation · 2023-02-01T00:59:31Z

Still happening occasionally after increasing the shared memory to 4GB, albeit at a rate lower than it used to at 3GB.

dpasque · 2023-04-06T19:02:50Z

Just adding what the newest trace stacks look like:

page.screenshot: Target crashed
      =========================== logs ===========================
      taking page screenshot
      ============================================================
 page.screenshot: Target crashed
      =========================== logs ===========================
      taking page screenshot
      ============================================================
          at JestEnvironmentPlaywright.teardown (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/packages/calypso-e2e/src/jest-playwright-config/environment.ts:165:22)
          at runTestInternal (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/jest-runner/build/runTest.js:461:5)
          at runTest (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/jest-runner/build/runTest.js:475:34)
          at Object.worker (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/jest-runner/build/testWorker.js:133:12)

dpasque · 2023-04-07T14:19:09Z

I've been playing around with this on my 🎩 rotation, and I think I have some next steps to try...

Get a fresh set of logs. I'm pushing up a PR that captures browser logging and am going to run tests on CI till I get the crash. I also added some other logs here. I'm actually a tiny suspicious the browser may already be crashed by the time we are taking the screenshot?
Do a deep dive on machine resources during these test failures. I've been noticing some sluggishness during tests -- maybe looks like congested network? I wonder if any resource constraints are lining up with these failures?
With all that info, if we still don't know... I think it's time to ask again on the Playwright repo 😅
We could add a try-catch here so that we stop crashing the whole suite -- that might actually make the failures more meaningful to people (we would see what test actually failed). But, it would make detecting these crashes harder on CI.

dpasque · 2023-04-07T16:13:50Z

Adding some more logs!

Got it to fail, here is what the console logging showed:

2023-04-07T16:06:28.554Z pw:browser <launching> /home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/playwright-core/.local-browsers/chromium-1055/chrome-linux/chrome --disable-field-trial-config --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-back-forward-cache --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-component-update --no-default-browser-check --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,DialMediaRouteProvider,AcceptCHFrame,AutoExpandDetailsElement,CertificateTransparencyComponentUpdater,AvoidUnnecessaryBeforeUnloadCheckSync,Translate --allow-pre-commit-input --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --no-service-autorun --export-tagged-pdf --headless --hide-scrollbars --mute-audio --blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4 --no-sandbox --user-data-dir=/home/teamcity-2/buildAgent/temp/buildTmp/playwright_chromiumdev_profile-ae3r7K --remote-debugging-pipe --no-startup-window
2023-04-07T16:06:28.573Z pw:browser <launched> pid=3191
2023-04-07T16:06:28.757Z pw:browser [pid=3191][err] [0407/160628.755709:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
2023-04-07T16:06:28.837Z pw:browser [pid=3191][err] [0407/160628.836220:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
2023-04-07T16:06:28.837Z pw:browser [pid=3191][err] [0407/160628.800476:WARNING:sandbox_linux.cc(393)] InitializeSandbox() called with multiple threads in process gpu-process.
2023-04-07T16:06:28.838Z pw:browser [pid=3191][err] [0407/160628.836346:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
2023-04-07T16:06:28.844Z pw:browser [pid=3191][err] [0407/160628.840861:WARNING:bluez_dbus_manager.cc(247)] Floss manager not present, cannot set Floss enable/disable.
2023-04-07T16:06:33.693Z pw:browser <launching> /home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/playwright-core/.local-browsers/ffmpeg-1008/ffmpeg-linux -loglevel error -f image2pipe -avioflags direct -fpsprobesize 0 -probesize 32 -analyzeduration 0 -c:v mjpeg -i - -y -an -r 25 -c:v vp8 -qmin 0 -qmax 50 -crf 8 -deadline realtime -speed 8 -b:v 1M -threads 1 -vf pad=800:450:0:0:gray,crop=800:450:0:0 /home/teamcity-2/buildAgent/temp/buildTmp/431fea4297650193e2792c6b1385ee6c.webm
2023-04-07T16:06:33.770Z pw:browser <launched> pid=4686
  console.log
    TestAccount:gutenbergSimpleSiteUser => Logging in via Login Page

      at TestAccount.log (../../packages/calypso-e2e/src/lib/test-account.ts:205:12)

2023-04-07T16:06:34.383Z pw:browser [pid=3191][err] [0407/160634.383470:ERROR:render_process_host_impl.cc(5252)] Terminating render process for bad Mojo message: Received bad user message: Prefetch/IsValidCrossOrigin: frame origin mismatch
2023-04-07T16:06:34.384Z pw:browser [pid=3191][err] [0407/160634.383549:ERROR:bad_message.cc(29)] Terminating renderer for bad IPC message, reason 123
ERROR: Theme: Preview And Activate > beforeAll

 page.goto: net::ERR_ABORTED; maybe frame was detached?
=========================== logs ===========================
navigating to "https://container-flamboyant-hoover.calypso.live/log-in", waiting until "load"
============================================================
    at LoginPage.visit (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/packages/calypso-e2e/src/lib/pages/login-page.ts:34:26)
    at TestAccount.logInViaLoginPage (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/packages/calypso-e2e/src/lib/test-account.ts:64:3)
    at TestAccount.authenticate (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/packages/calypso-e2e/src/lib/test-account.ts:47:4)
    at /home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/test/e2e/specs/appearance/theme__details-preview-activate.ts:38:3 {
  name: 'Error'
} 

Beginning page.screenshot()...
2023-04-07T16:06:34.582Z pw:browser [pid=3191] <gracefully close start>
2023-04-07T16:06:34.600Z pw:browser [pid=4686] <gracefully close start>
2023-04-07T16:06:34.617Z pw:browser Closing stdin...
2023-04-07T16:06:34.617Z pw:browser ffmpeg finished input.
2023-04-07T16:06:34.658Z pw:browser [pid=3191] <process did exit: exitCode=0, signal=null>
2023-04-07T16:06:34.658Z pw:browser [pid=3191] starting temporary directories cleanup
2023-04-07T16:06:34.668Z pw:browser [pid=3191] finished temporary directories cleanup
2023-04-07T16:06:34.668Z pw:browser [pid=3191] <gracefully close end>
2023-04-07T16:06:34.796Z pw:browser [pid=4686] <process did exit: exitCode=0, signal=null>
2023-04-07T16:06:34.796Z pw:browser ffmpeg onkill exitCode=0 signal=null
2023-04-07T16:06:34.796Z pw:browser [pid=4686] starting temporary directories cleanup
2023-04-07T16:06:34.796Z pw:browser [pid=4686] finished temporary directories cleanup
2023-04-07T16:06:34.796Z pw:browser [pid=4686] <gracefully close end>

And here is what the Playwright API logs showed:

2023-04-07T16:06:33.631Z 946 api info: => browser.newPage started
2023-04-07T16:06:33.659Z 946 api info: <= browser.newPage succeeded
2023-04-07T16:06:33.661Z 946 api info: => browser.newPage started
2023-04-07T16:06:33.805Z 946 api info: <= browser.newPage succeeded
2023-04-07T16:06:33.807Z 946 api info: => page.on started
2023-04-07T16:06:33.808Z 946 api info: => tracing.start started
2023-04-07T16:06:33.812Z 946 api info: <= page.on succeeded
2023-04-07T16:06:33.901Z 946 api info: <= tracing.start succeeded
2023-04-07T16:06:33.923Z 946 api info: => browserContext.clearCookies started
2023-04-07T16:06:33.929Z 946 api info: <= browserContext.clearCookies succeeded
2023-04-07T16:06:33.991Z 946 api info: => page.route started
2023-04-07T16:06:33.995Z 946 api info: <= page.route succeeded
2023-04-07T16:06:33.996Z 946 api info: => page.goto started
2023-04-07T16:06:34.406Z 946 api info: <= page.goto failed
2023-04-07T16:06:34.470Z 946 api info: => tracing.stop started
2023-04-07T16:06:34.573Z 946 api info: <= tracing.stop succeeded
2023-04-07T16:06:34.576Z 946 api info: => page.screenshot started
2023-04-07T16:06:34.580Z 946 api info: <= page.screenshot failed
2023-04-07T16:06:34.581Z 946 api info: => browser.close started
2023-04-07T16:06:34.668Z 946 api info: <= browser.close succeeded

dpasque · 2023-04-07T16:19:10Z

For comparison, here is a console log from a healthy executing test:

2023-04-07T16:06:28.476Z pw:browser <launching> /home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/playwright-core/.local-browsers/chromium-1055/chrome-linux/chrome --disable-field-trial-config --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-back-forward-cache --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-component-update --no-default-browser-check --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,DialMediaRouteProvider,AcceptCHFrame,AutoExpandDetailsElement,CertificateTransparencyComponentUpdater,AvoidUnnecessaryBeforeUnloadCheckSync,Translate --allow-pre-commit-input --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --no-service-autorun --export-tagged-pdf --headless --hide-scrollbars --mute-audio --blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4 --no-sandbox --user-data-dir=/home/teamcity-2/buildAgent/temp/buildTmp/playwright_chromiumdev_profile-6bb9AZ --remote-debugging-pipe --no-startup-window
2023-04-07T16:06:28.511Z pw:browser <launched> pid=3045
2023-04-07T16:06:28.741Z pw:browser [pid=3045][err] [0407/160628.740159:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
2023-04-07T16:06:28.741Z pw:browser [pid=3045][err] [0407/160628.740703:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
2023-04-07T16:06:28.742Z pw:browser [pid=3045][err] [0407/160628.740781:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
2023-04-07T16:06:28.854Z pw:browser [pid=3045][err] [0407/160628.854251:WARNING:bluez_dbus_manager.cc(247)] Floss manager not present, cannot set Floss enable/disable.
2023-04-07T16:06:33.657Z pw:browser [pid=3045][err] [0407/160628.886265:WARNING:sandbox_linux.cc(393)] InitializeSandbox() called with multiple threads in process gpu-process.
2023-04-07T16:06:33.686Z pw:browser <launching> /home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/node_modules/playwright-core/.local-browsers/ffmpeg-1008/ffmpeg-linux -loglevel error -f image2pipe -avioflags direct -fpsprobesize 0 -probesize 32 -analyzeduration 0 -c:v mjpeg -i - -y -an -r 25 -c:v vp8 -qmin 0 -qmax 50 -crf 8 -deadline realtime -speed 8 -b:v 1M -threads 1 -vf pad=800:450:0:0:gray,crop=800:450:0:0 /home/teamcity-2/buildAgent/temp/buildTmp/cae3e409edeb015dcf966d7ca4024c9c.webm
2023-04-07T16:06:33.717Z pw:browser <launched> pid=4575
  console.log
    TestAccount:siteEditorSimpleSiteUser => Logging in via Login Page

      at TestAccount.log (../../packages/calypso-e2e/src/lib/test-account.ts:205:12)

2023-04-07T16:07:02.556Z pw:browser [pid=3045] <gracefully close start>
2023-04-07T16:07:02.560Z pw:browser [pid=4575] <gracefully close start>
2023-04-07T16:07:02.560Z pw:browser Closing stdin...
2023-04-07T16:07:02.560Z pw:browser ffmpeg finished input.
2023-04-07T16:07:02.578Z pw:browser [pid=4575] <process did exit: exitCode=0, signal=null>
2023-04-07T16:07:02.578Z pw:browser ffmpeg onkill exitCode=0 signal=null
2023-04-07T16:07:02.578Z pw:browser [pid=4575] starting temporary directories cleanup
2023-04-07T16:07:02.579Z pw:browser [pid=4575] finished temporary directories cleanup
2023-04-07T16:07:02.579Z pw:browser [pid=4575] <gracefully close end>
2023-04-07T16:07:02.588Z pw:browser [pid=3045] <process did exit: exitCode=0, signal=null>
2023-04-07T16:07:02.588Z pw:browser [pid=3045] starting temporary directories cleanup
2023-04-07T16:07:02.627Z pw:browser [pid=3045] finished temporary directories cleanup
2023-04-07T16:07:02.627Z pw:browser [pid=3045] <gracefully close end>

dpasque · 2023-04-07T16:20:54Z

So comparing the two, this seems to be the real error:

2023-04-07T16:06:34.383Z pw:browser [pid=3191][err] [0407/160634.383470:ERROR:render_process_host_impl.cc(5252)] Terminating render process for bad Mojo message: Received bad user message: Prefetch/IsValidCrossOrigin: frame origin mismatch
2023-04-07T16:06:34.384Z pw:browser [pid=3191][err] [0407/160634.383549:ERROR:bad_message.cc(29)] Terminating renderer for bad IPC message, reason 123
ERROR: Theme: Preview And Activate > beforeAll

 page.goto: net::ERR_ABORTED; maybe frame was detached?
=========================== logs ===========================
navigating to "https://container-flamboyant-hoover.calypso.live/log-in", waiting until "load"
============================================================
    at LoginPage.visit (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/packages/calypso-e2e/src/lib/pages/login-page.ts:34:26)
    at TestAccount.logInViaLoginPage (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/packages/calypso-e2e/src/lib/test-account.ts:64:3)
    at TestAccount.authenticate (/home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/packages/calypso-e2e/src/lib/test-account.ts:47:4)
    at /home/teamcity-2/buildAgent/work/c4a9d5b38c1dacde/test/e2e/specs/appearance/theme__details-preview-activate.ts:38:3 {
  name: 'Error'
}

So, this confirms my suspicion that the screenshot taking process is not actually the root issue! It's just exposing a page that is in a bad state from an earlier error.

dpasque · 2023-04-07T16:30:29Z

So, I'm not entirely sure what these errors mean haha... But here's where the error is thrown in the Chromium source code:

https://source.chromium.org/chromium/chromium/src/+/main:content/browser/loader/prefetch_url_loader_service.cc;l=258

Gonna keep digging

dpasque · 2023-04-07T16:46:39Z

So, I'm not entirely sure how the IPC / Mojo communication (which appears to be how Chromium processes talk to each other) plays a roll...

But this might be suggesting that some iframe on our login page is violating our frame security (same-origin) policy? But this only happens very rarely? I'm not sure what that might be 🤔

dpasque · 2023-04-07T16:58:15Z

This is the only iframe I'm seeing on the login page:

https://public-api.wordpress.com/wp-admin/rest-proxy/?v=2.0#https://container-flamboyant-hoover.calypso.live

dpasque · 2023-04-07T17:14:37Z

Going back through a bunch of recent versions of this crash, I actually was able to find in the Playwright traces that they are all similarly crashes when trying to do a navigation, often the same ERR_ABORTED error, sometimes it manifests in other ways. It always seems to happen in the beforeAll hooks of scripts too.

So, pending discovering what it causing some initial navigations to fail in these specific conditions, I wonder if we can get around this with just some more defensive coding.

For one, we should be using try/catches in our teardown code to catch any failed page operations. This way, we'll make sure we are actually seeing the real errors that crash the page, not the errors thrown by trying to act on a failed page.

Secondly, we may want to wrap our initial authentication or landing in a couple of retry loops, so that if there's an unexpected crash right on the first navigation, we just try again.

dpasque · 2023-04-07T17:16:30Z

@noahtallen or @WunderBart, I'm wondering if you have any quick insight on the error as outlined in this comment. No worries if not, just curious if that rings any initial bells as you might be more familiar with the Calypso / DotCom security policies than I am 😁

noahtallen · 2023-04-07T19:02:29Z

No quick insights unfortunately, but a big fan of capturing as much log info as we can in normal runs!

You're right, the only iframe in that scenario I can think of is the rest proxy. 🤔 Though a side point is that calypso.live really doesn't work well when 3rd party cookies are disabled, just because *.calypso.live is a weird different origin from normal production. (When testing calypso.live, I have to turn off enhanced tracking protection in firefox, for example)

dpasque · 2023-04-07T19:24:22Z

Sounds good, thanks for weighing in @noahtallen! 🙇 You bring up a good point, and I'm also realizing we tend to only see these errors on our TC builds that use Calypso.live. That makes me think that maybe a good next step is adding some retry logic to our initial navigation/login, and see if that smooths things out!

worldomonation · 2023-04-12T02:16:09Z

Just catching up on this conversation and adding a few of my observations.

The observation that the crash only appears to happen on builds that use the calypso.live target is true. I don't think I've ever encountered this crash locally (as it hits staging when proxied/production if not proxied) and in the Pre-Release Tests (as it hits wpcalypso.wordpress.com).
Playwright runs with third-party cookies completely disabled and cannot be enabled. I'm not certain whether this is a legitimate bug in Chromium and I am more than happy to report this upstream, though the challenge in that is they typically ask us for a minimally reproducible snippet...which in theory is our login process. But there may be additional parts involved (eg. our particular CI/docker image setup) that they may not be able to replicate.
Defensive coding seems to be the right way to solve this problem, especially with the time risk of going upstream and digging really deep.

On retries, I am not sure where we can add retrie. The browser startup is handled in environment.ts while the initial navigation to the login page is handled in test-accounts.ts. Since the actual page object itself crashes upon navigation to the /log-in endpoint, I'm not sure how to "cross bridges" from POMs to the framework code itself.

WunderBart · 2023-05-11T09:05:55Z

👋 Sorry for such a late response, somehow I missed this one.

I'm wondering (and please ignore if you've already checked this scenario) if it's not the same case as microsoft/playwright#18137.

From the explaining comment:

The intent was to authenticate user and redirect to them to /bar.

However, what happens in practice, the polling mechanism of the app detects that the user was > authenticated, and attempts to redirect them to post-login page.

The end result was that sometimes when baz() executes we would be on /home and sometimes on > /baz.

...which seems to be describing our problem as well. The proposed solution looks pretty straightforward:

The reason for adding page.reload() after authenticate() is to ensure that before we are calling goto there are no more outstanding redirects.

So it should be sufficient to reload the page before calling the goto. There's also another solution proposed right in the next comment:

await authenticate(page);
await page.waitForURL('/home');
await page.goto('/bar');
await baz();

Not sure which of the above would be more viable in our case, but maybe we could try with simple reload before goto to clear the page's state and see what happens?

dpasque · 2023-05-11T13:55:42Z

@WunderBart ahhh, I didn't see that issue, that seems exactly like it!!! 😄 Nice find!

I think either approach would be viable, and hopefully that does the trick! CC @karenroldan who is 🎩 rotation this week, in case you have some time to squeeze in this fix! Otherwise, I can pick it up later 👍

worldomonation · 2023-05-12T05:03:01Z

Excellent find @WunderBart! We can give this a try later today or next week, and see if it helps reduce or eliminate this intermittent failure.

karenroldan · 2023-05-12T12:45:32Z

CC @karenroldan who is 🎩 rotation this week, in case you have some time to squeeze in this fix

I can make time for this next week!

worldomonation added the [Type] Flaky E2E Denotes an issue created for fixing flaky e2e tests label Dec 1, 2022

worldomonation self-assigned this Dec 1, 2022

worldomonation added this to KitKat Project Board Dec 1, 2022

worldomonation moved this from Backlog to Needs Review in KitKat Project Board Dec 1, 2022

worldomonation moved this to Backlog in KitKat Project Board Dec 1, 2022

worldomonation moved this from Needs Review to Backlog in KitKat Project Board Dec 1, 2022

worldomonation moved this from Backlog to In Progress in KitKat Project Board Dec 1, 2022

worldomonation mentioned this issue Dec 5, 2022

chore: update Playwright to 1.28. #70724

Merged

13 tasks

worldomonation mentioned this issue Dec 8, 2022

E2E: disable Chromium from using shared memory. #70970

Merged

9 tasks

worldomonation linked a pull request Dec 8, 2022 that will close this issue

E2E: disable Chromium from using shared memory. #70970

Merged

9 tasks

worldomonation closed this as completed in #70970 Dec 9, 2022

Repository owner moved this from In Progress to Done🎊 in KitKat Project Board Dec 9, 2022

worldomonation reopened this Dec 9, 2022

Repository owner moved this from Done🎊 to In Progress in KitKat Project Board Dec 9, 2022

worldomonation mentioned this issue Dec 13, 2022

E2E: allocate 1GB of shared memory when running Docker images using bashNodeScript. #71154

Merged

11 tasks

worldomonation linked a pull request Dec 19, 2022 that will close this issue

E2E: allocate 1GB of shared memory when running Docker images using bashNodeScript. #71154

Merged

11 tasks

worldomonation mentioned this issue Dec 19, 2022

CI: increase shared memory for the E2E execution step to 3GB. #71356

Merged

7 tasks

worldomonation added the [Type] Monitoring Issues or tasks that are being actively monitored. label Jan 24, 2023

worldomonation mentioned this issue Jan 25, 2023

CI: increase shared memory to 4GB when running Playwright E2E tests. #72617

Merged

7 tasks

worldomonation mentioned this issue Mar 17, 2023

CI: increase shared memory to 6GB for E2E builds. #74593

Closed

7 tasks

dpasque mentioned this issue Apr 7, 2023

DO NOT MERGE: Capture detailed e2e crash logs #75432

Closed

7 tasks

dpasque mentioned this issue Apr 7, 2023

Make Jest teardown safer for e2e tests #75436

Merged

7 tasks

lezama mentioned this issue May 11, 2023

Onboarding: Newsletter, change Subscribers step copy #76828

Merged

github-actions bot assigned lezama May 11, 2023

github-actions bot added the [Status] In Progress label May 11, 2023

lezama removed their assignment May 11, 2023

This was referenced May 18, 2023

Reload the page to clear the state #77089

Closed

Reload the page to clear the state #77181

Closed

worldomonation mentioned this issue May 25, 2023

E2E: reload the page after the log in process completes, but before the test steps take over. #77360

Closed

11 tasks

worldomonation removed the [Status] In Progress label Sep 6, 2023

worldomonation moved this from In Progress to Current Priorities in KitKat Project Board Sep 6, 2023

worldomonation moved this from Current Priorities to Backlog in KitKat Project Board Sep 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flaky E2E: Jest failed to execute suite #70674

Flaky E2E: Jest failed to execute suite #70674

worldomonation commented Dec 1, 2022

worldomonation commented Dec 1, 2022

worldomonation commented Dec 6, 2022

worldomonation commented Dec 6, 2022

worldomonation commented Dec 6, 2022

worldomonation commented Dec 8, 2022

worldomonation commented Dec 9, 2022

worldomonation commented Dec 13, 2022

worldomonation commented Dec 13, 2022

worldomonation commented Dec 21, 2022

worldomonation commented Jan 24, 2023

worldomonation commented Feb 1, 2023

dpasque commented Apr 6, 2023

dpasque commented Apr 7, 2023 •

edited

Loading

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

noahtallen commented Apr 7, 2023

dpasque commented Apr 7, 2023

worldomonation commented Apr 12, 2023

WunderBart commented May 11, 2023 •

edited

Loading

dpasque commented May 11, 2023

worldomonation commented May 12, 2023

karenroldan commented May 12, 2023

Flaky E2E: Jest failed to execute suite #70674

Flaky E2E: Jest failed to execute suite #70674

Comments

worldomonation commented Dec 1, 2022

Spec file

TeamCity ID

Logs

worldomonation commented Dec 1, 2022

worldomonation commented Dec 6, 2022

worldomonation commented Dec 6, 2022

worldomonation commented Dec 6, 2022

worldomonation commented Dec 8, 2022

worldomonation commented Dec 9, 2022

worldomonation commented Dec 13, 2022

worldomonation commented Dec 13, 2022

worldomonation commented Dec 21, 2022

worldomonation commented Jan 24, 2023

worldomonation commented Feb 1, 2023

dpasque commented Apr 6, 2023

dpasque commented Apr 7, 2023 • edited Loading

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

dpasque commented Apr 7, 2023

noahtallen commented Apr 7, 2023

dpasque commented Apr 7, 2023

worldomonation commented Apr 12, 2023

WunderBart commented May 11, 2023 • edited Loading

dpasque commented May 11, 2023

worldomonation commented May 12, 2023

karenroldan commented May 12, 2023

dpasque commented Apr 7, 2023 •

edited

Loading

WunderBart commented May 11, 2023 •

edited

Loading