Chapter 1: The Long Day
=======================
Previously our E2E tests started failing with the following error.
chrome not reachable
12:41:37 (Session info: headless chrome=73.0.3683.75)
This issue was limited to the CI agent machines. In order to debug the
issue, we SSH'd to ci-agent-8, became the jenkins user and changed to
one of the workspace directories where the E2E tests are run. We then
ran the E2E tests manually in order to reproduce the issue, as follows.
make clone
make pull
make start
docker-compose run publishing-e2e-tests bash
Once inside the container where the tests are run, we were able to
reproduce the issue with 'bundle exec rspec'. To investigate further,
we installed vim so that we could install the irb gem, then started an
irb console using 'bundle exec irb -Ispec' and ran the following.
require 'spec_helper'
driver = Capybara.drivers[Capybara.current_driver].call
driver.visit('https://google.com')
driver.visit('https://google.com')
Running the visit method twice yielded the same error as when running
the tests. Using 'docker exec' to start another bash console and inspect
the running processes showed that Chrome itself was failing to start.
root 252 245 11 17:26 pts/1 00:00:00 [chrome] <defunct>
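A quick way to spot such defunct (zombie) processes is to filter the
process table on its state column. The following is a minimal sketch,
assuming a procps-style 'ps' that accepts '-eo pid,ppid,stat,comm'; the
function name list_defunct is hypothetical and not part of the E2E test
repo.

```shell
# list_defunct: print PID, PPID and command name of every zombie process.
# Reads `ps -eo pid,ppid,stat,comm` style output on stdin, so it works on
# captured output as well as on a live process table.
list_defunct() {
  # Skip the header row; a STAT value beginning with "Z" marks a zombie.
  awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage inside the container:
#   ps -eo pid,ppid,stat,comm | list_defunct
```

Reading from stdin rather than calling 'ps' directly keeps the filter
easy to try against output saved from a failing CI run.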
Running Chrome manually with the options from the spec_helper then
yielded the following error, even though we hadn't changed these options.
google-chrome-stable \
  --disable-dev-shm-usage \
  --disable-gpu \
  --disable-web-security \
  --disable-infobars \
  --disable-notifications \
  --headless \
  --no-sandbox \
  --window-size=1400,1400 \
  https://google.com
[0318/173719.964171:FATAL:gpu_data_manager_impl_private.cc(892)] The display compositor is frequently crashing. Goodbye.
Searching online for this error indicates it's related to a new version
of Chrome, as per the following issue on the puppeteer repo.
puppeteer/puppeteer#3774
Unfortunately it's not possible for us to downgrade Chrome, since Google
only provide the latest version in their package repo, and the E2E tests
are being run in transient containers, which have no older versions
available to downgrade to. This is the point where we lost all hope.
Chapter 2: The New Dawn
=======================
Following on from Chapter 1, we experimented with running Chrome
manually with fewer options, based on
https://developers.google.com/web/updates/2017/04/headless-chrome.
google-chrome-stable --headless --no-sandbox https://google.com
The success of this command indicated one of the options specified in
the spec_helper was causing Chrome to crash, and experimentation showed
this was '--disable-dev-shm-usage'. Removing this parameter fixed the
error, but caused Chrome to crash for a different reason.
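This kind of experimentation can be scripted by adding one candidate
option at a time to the baseline that is known to work. The following is
only a sketch of that idea, not necessarily how we did it: the
CHROME_BIN variable and the find_bad_flags function are hypothetical
names, and it assumes the browser exits with a non-zero status when it
crashes and with zero otherwise.

```shell
# Browser executable to test; override CHROME_BIN to point elsewhere.
CHROME_BIN="${CHROME_BIN:-google-chrome-stable}"

# find_bad_flags URL FLAG...: run the browser once per candidate flag on
# top of the known-good baseline and print each flag that makes it fail.
find_bad_flags() {
  url="$1"
  shift
  for flag in "$@"; do
    # --headless --no-sandbox is the baseline that was seen to succeed.
    if ! "$CHROME_BIN" --headless --no-sandbox "$flag" "$url" >/dev/null 2>&1; then
      echo "$flag"
    fi
  done
}

# Usage:
#   find_bad_flags https://google.com --disable-dev-shm-usage \
#     --disable-gpu --disable-web-security --disable-infobars \
#     --disable-notifications --window-size=1400,1400
```

Testing each option in isolation only finds options that fail on their
own; a crash caused by a combination of options would need a different
search.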
/dev/shm is a tmpfs partition, but in a Docker container it is only 64M
in size by default.
Previously, we had specified the '--disable-dev-shm-usage' option to use
/tmp instead, but the new release of Chrome makes this option unusable
for some reason. The obvious remedy is to increase the size of /dev/shm.
publishing-e2e-tests:
  shm_size: 2G  # <<<
  build: .
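To confirm that the larger size has taken effect, the size of /dev/shm
can be checked from inside the container. The following is a minimal
sketch, assuming a POSIX 'df' that supports '-Pk'; the helper names
shm_kb and shm_at_least are hypothetical.

```shell
# shm_kb: print the total size of a filesystem in KiB, parsed from
# POSIX `df -Pk` output supplied on stdin (second column, second row).
shm_kb() {
  awk 'NR == 2 { print $2 }'
}

# shm_at_least SIZE_KB MIN_KB: succeed if SIZE_KB is at least MIN_KB.
shm_at_least() {
  [ "$1" -ge "$2" ]
}

# Usage inside the container (2G = 2097152 KiB):
#   size=$(df -Pk /dev/shm | shm_kb)
#   shm_at_least "$size" 2097152 || echo "/dev/shm is only ${size}K" >&2
```

With the default container settings the check would be expected to fail,
since 64M is 65536 KiB.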
The combination of removing the faulty option and specifying a larger
size for /dev/shm meant we could then run the E2E tests successfully.
And they all lived happily ever after.