
Conversation

@alex-page
Member

@alex-page alex-page commented Sep 21, 2020

WHY are these changes introduced?

Pa11y tests are taking ~45 minutes to complete on Travis. This is due to the 600+ pages we currently test. This PR removes the need to test 600+ pages by running pa11y only on the kitchen sink page, which puts all the components in one place. This reduces the run time to ~5 minutes.
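For illustration, the simplified script boils down to a single pa11y call against the kitchen sink story. This is a minimal sketch, assuming pa11y 5.x; the story URL is hypothetical, and the real script points at the built Storybook:

const pa11y = require('pa11y');

// Sketch of the single-page approach: audit one story that renders
// every component example at once. The URL below is illustrative only.
async function run() {
  const {issues} = await pa11y(
    'http://localhost:6006/iframe.html?id=playground--kitchen-sink',
    {timeout: 120000},
  );
  if (issues.length > 0) {
    console.error(issues);
    process.exit(1);
  }
}

run();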

The current setup leads to people skipping the tests or waiting a long time to do a release. Neither is acceptable. This is a request for comments to start solving this issue.

Some questions to start conversation:

  • Is this something we would want moving forward?
  • What are some of the downsides or upsides of this approach?
  • Can we make smaller changes to help with the performance issues?
  • Can we get off Travis and use Buildkite or GitHub Actions?
  • I am not sure if we are missing any pages with this approach; should we add the details page?
  • Should we use axe-puppeteer instead of pa11y? It is showing more issues, which makes me think it is more thorough.

WHAT is this pull request doing?

  • Updates pa11y, puppeteer, and Node.js
  • Removes duplicate ID errors when testing the kitchen sink
  • Simplifies the pa11y test, as we no longer need multiple browser instances

How to 🎩

$ yarn run storybook:build --quiet && node ./scripts/pa11y.js

🎩 checklist

  • Tested on mobile
  • Tested on multiple browsers
  • Tested for accessibility
  • Updated the component's README.md with documentation changes
  • Tophatted documentation changes in the style guide
  • For visual design changes, pinged one of @ HYPD, @ mirualves, @ sarahill, or @ ry5n to update the Polaris UI kit

@github-actions
Contributor

github-actions bot commented Sep 21, 2020

🟢 No significant changes to src/**/*.tsx were detected.

@alex-page alex-page self-assigned this Sep 21, 2020
@alex-page
Member Author

alex-page commented Sep 21, 2020

I experimented with puppeteer and performance, and the previous implementation with parallel browsers is excellent. Moving to pages instead of browsers, I could reduce the test time by a few minutes, but it adds the risk that a single page can crash a browser shared by multiple pages. The biggest fixable bottleneck I can see is the number of pages tested.
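A sketch of the pages-instead-of-browsers variant, with made-up function and variable names, using puppeteer's standard API:

const puppeteer = require('puppeteer');

// One shared browser, one tab per URL. Cheaper than launching a browser
// per URL, but a single crashed tab can take the whole browser (and every
// other in-flight test) down with it.
async function auditAll(urls) {
  const browser = await puppeteer.launch();
  try {
    await Promise.all(
      urls.map(async (url) => {
        const page = await browser.newPage();
        await page.goto(url, {waitUntil: 'networkidle0'});
        // ...run the accessibility checks against `page` here...
        await page.close();
      }),
    );
  } finally {
    await browser.close();
  }
}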

If this solution causes problems, the other idea I have is using splash to generate the URLs for the accessibility tests. It would still cause long releases when changes touch multiple components, and I am also afraid that it would overcomplicate the tests.

I am not the biggest fan of the ids having to be unique in every example. I feel like users and developers won't care as long as the pages and examples work. The trade-off here feels good.

@alex-page alex-page changed the title Pa11y performance bump, request for comments [Development workflow request] Pa11y performance bump Sep 21, 2020
@alex-page alex-page marked this pull request as ready for review September 21, 2020 12:23
@alex-page alex-page changed the title [Development workflow request] Pa11y performance bump [RFC][Development workflow] Pa11y performance bump Sep 21, 2020
@kaelig
Contributor

kaelig commented Sep 21, 2020

Thanks for this, I'm surprised the build time climbed this high!

I have a few thoughts and some context.

What are some of the downsides or upsides of this approach?

My gut feeling tells me we'd be getting speed improvements and more value by running tests on "All examples" pages for each component, rather than having a single huge test.

Have you considered this approach?

Is this something we would want moving forward?

I'd say yes, in order to catch silly mistakes, and as long as it only takes a few minutes.

Can we get off Travis and use Buildkite or GitHub Actions?

+1 to GitHub Actions if it makes it any faster. Do you expect that to be the case?

(I believe Buildkite would be opaque to open-source contributors)

I am not sure if we are missing any pages with this approach; should we add the details page?

Some tests might need to run separately:

const testIndividualExamples = [
'Modal',
'Card',
'Top bar',
'App provider',
'Contextual save bar',
'Frame',
'Loading',
'Sheet',
'Theme provider',
].includes(readme.name);

Should we use axe-puppeteer instead of pa11y? It is showing more issues, which makes me think it is more thorough.

Yes to using Axe!

Given that we can run axe with Pa11y using this configuration:

{
    runners: [
        'axe',
    ]
}

Source: https://github.com/pa11y/pa11y#runners-array

Is there anything else we'd get from switching to axe-puppeteer?

@BPScott
Member

BPScott commented Sep 21, 2020

I'd be interested to know the root cause of this speed regression. Was it Travis changing their images/resource allocation, or the result of some PR we merged?

Can we get off Travis and use Buildkite or GitHub Actions?

As Kaelig alluded to, Buildkite is private-only, so it's not suited for open source stuff. If the problem here is "the performance of Travis runners dropped through the floor", then I'd be interested to see if the performance of GH Actions is better. Do we know when this became drastically slower? IIRC the a11y tests took about 10 mins before.

GH Actions is supported, so it is certainly an option, and I think it would also help solve some of our other "should this test run when people make a fork" issues.

I am not sure if we are missing any pages with this approach; should we add the details page?

Bunging everything in one page (even the per-component "all examples" pages) doesn't always work, because some components have overlay effects that would interfere with other component demos. For example, Modal and Sheet end up overlaying everything, so they don't appear on the "All examples" pages. The same goes for AppProvider, Frame, and the others in that testIndividualExamples array.

One test per story feels like overkill, but it does enforce encapsulation, so we know the examples don't interact, and it means folks won't need to worry about "is this id used by any other test".

I experimented with puppeteer and performance, and the previous implementation with parallel browsers is excellent. Moving to pages instead of browsers, I could reduce the test time by a few minutes, but it adds the risk that a single page can crash a browser shared by multiple pages. The biggest fixable bottleneck I can see is the number of pages tested.

That sounds like a great approach. If one of our stories does something janky enough to crash a Google Chrome tab, then we've got bigger problems than incomplete CI tests :D I'd prefer using tabs rather than multiple browsers (possibly in combo with "move to GH Actions") over the "test one giant page" approach.

@kaelig
Contributor

kaelig commented Sep 21, 2020

I experimented with puppeteer and performance, and the previous implementation with parallel browsers is excellent. Moving to pages instead of browsers, I could reduce the test time by a few minutes, but it adds the risk that a single page can crash a browser shared by multiple pages. The biggest fixable bottleneck I can see is the number of pages tested.

That sounds like a great approach. If one of our stories does something janky enough to crash a Google Chrome tab, then we've got bigger problems than incomplete CI tests :D I'd prefer using tabs rather than multiple browsers (possibly in combo with "move to GH Actions") over the "test one giant page" approach.

Note that stability with pages was a big issue when Andre and I worked on testing performance a couple of years ago. Things were really brittle, as builds kept breaking with timeouts or loading issues.

I'd expect dependencies such as puppeteer to have gotten a lot better since then, so it's worth a shot.

@BPScott
Member

BPScott commented Sep 22, 2020

Been mucking about with this in #3284 too.

Baselines: current runtime ~45 mins; pre-regression runtime ~13 mins.

  • Disabling SB Docs removes a big chunk of JS from the preview page bundle. Getting rid of that brings the CI runtime down to 23 mins.
  • Converting from multiple browsers with pa11y to a single browser with multiple pages, reusing the version of axe already embedded in the Storybook preview, brings the CI runtime down to 13.5 mins - so sliiightly slower than before but within a margin of error (see the sketch below).
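A sketch of that single-browser, many-pages pattern, here using the axe-puppeteer package mentioned earlier for brevity; the actual change reuses the axe build already shipped in the Storybook preview bundle:

const puppeteer = require('puppeteer');
const {AxePuppeteer} = require('axe-puppeteer');

// Open each story in its own tab of one shared browser and run axe on it.
async function auditStory(browser, url) {
  const page = await browser.newPage();
  await page.goto(url);
  const {violations} = await new AxePuppeteer(page).analyze();
  await page.close();
  return violations;
}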

My error-reporting output could do with some love, but the raw data is there.

As a bonus, using axe means the errors reported are the same as what the a11y panel in Storybook reports, e.g. https://polaris-react.herokuapp.com/?path=/story/all-components-top-bar--top-bar-with-all-of-its-elements contains one of the now-failing contrast tests.

To go further, we could set the isSkippedStory logic to be const isSkippedStory = story.parameters.percy.skip; - which would run a11y tests on the same pages we send to Percy - mostly the "All examples" pages with a few single-story exceptions. That takes a bit under 2 mins on my local machine, but comparing runtimes locally vs on CI is apples to oranges - my MBP has quite a bit more grunt compared to a CI container.
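Roughly, with hypothetical names for the surrounding story list:

// Run a11y tests only on the stories Percy snapshots, i.e. the ones
// not flagged with percy.skip in their story parameters.
const storiesToTest = allStories.filter(
  (story) => !(story.parameters.percy && story.parameters.percy.skip),
);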

@alex-page
Member Author

Thanks @BPScott, this seems like a much safer approach. Let's keep moving forward with your pull request.

@alex-page alex-page closed this Sep 22, 2020
@tmlayton tmlayton deleted the pa11y-perf-bump-rfc branch October 5, 2020 18:32
