Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨Performance task locally served pages & wg handlers #27283

Merged
merged 15 commits into from
Apr 21, 2020

Conversation

micajuine-ho
Copy link
Contributor

@micajuine-ho micajuine-ho commented Mar 18, 2020

Adding functionality to the performance gulp task to track how long it takes for requests (pageview and ini-load) to fire for pages with amp-analytics tags.

This PR is now for supporting locally served pages, along with building a structure for wgs easily add in handlers to measure requests (i.e. wg-ads/wg-analytics measuring delay til the visible ping is sent out). This is one of the steps in adding the performance task into our CI (since our test pages will be locally served pages).

Changes:

  • Config changes to support handlers for different wgs
    • Each handler will test their own defined metrics and include values relevant to those measurements (timeout value, threshold, etc...)
  • Changes to rewrite-script-tags:
    • Parse script tags of locally hosted websites and download the correct files from CDN or local
    • Download documents that are not explicitly stated in script tags (auto-amp-lightbox-0.1.js)
  • Creation of rewrite-analytics-tags:
    • Add special identifier to inline config to track visibility pings
    • Fetch and merge vendor configs, that would otherwise fail
  • Changes to measureDocuments:
    • Capture all outgoing networks requests
    • Apply request handlers, timeout promise, and metric measurer based upon the url being tested
    • Redirect requests for documents that are not explicitly stated in script tags (auto-amp-lightbox-0.1.js), to our downloaded version

@micajuine-ho micajuine-ho requested review from rsimha and removed request for ajwhatson March 18, 2020 07:24
build-system/tasks/performance/measure-documents.js Outdated Show resolved Hide resolved
build-system/tasks/performance/measure-documents.js Outdated Show resolved Hide resolved
@@ -2,6 +2,6 @@

Use `gulp performance` to run this test. It measures performance for the current branch compared to the current release. By default, the command first needs to compile minified runtime and components from the current branch by executing the `gulp dist` task. To skip this step, use the `--nobuild` flag, but the task will throw an error if the dist files are missing. Only the runtime and components used by documents from the configured URLs are compiled.

The `config.json` file contains settings. If you set `headless` to `false`, you will be able to watch the test run in Chrome. The default value for `runs` is 100, which takes ~10 minutes to run, and generates results with ~5% margin of error. Differences larger than this should be investigated. The `urls` array of strings must be set before running the test, and each URL will be tested separately.
The `config.json` file contains settings. If you set `headless` to `false`, you will be able to watch the test run in Chrome. The default value for `runs` is 100, which takes ~10 minutes to run, and generates results with ~5% margin of error. Differences larger than this should be investigated. The `urls` array of strings must be set before running the test, and each URL will be tested separately. If you wish to test `urls` hosted locally, use `gulp --cdn` to serve the binaries from cdn.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should ask all to use gulp serve --compiled when testing locally.
It should be up to user to decide to use gulp serve --compiled or gulp serve --cdn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well if you use the --compiled version the task won't work for locally served pages. It only works with --cdn.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't it worked with --compiled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To download the amp scripts (local and master), we look for the CDN_URL (https://cdn.ampproject.org/), and that's only present when serving with --cdn

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I see, that's interesting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I wonder why don't we use the local server proxy /proxy.

Comment on lines 85 to 86
const matchArray = interceptedUrl.match(CDN_ANALYTICS_REGEXP);
if (matchArray) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this is a good opportunity to setup an extensible pattern here, possibly some mapping of URLs to handlers since it seems likely there would be more cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! I know the ads team want to collect two requests delay. Not sure if they'll use this specific task. Maybe useful to expand the feature here.
@powerivq Any comment? Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of this: each team still implements its own handleRequest to respond the payload it desires, and we make another interceptor to record the timing of the request. Does it make sense? @zhouyx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinkimball good point!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@powerivq I tested out multiple request handlers and it seems to work fine. Seems like a good route to go down in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at the definition of networkidle0. I don't know if this would satisfy ads team's request. ActiveView requires the ad to be visible for at least 50% for 1 second. It's very possible that the task ends before activeView. @kevinkimball @micajuine-ho any workaround?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe just adding a setTimeout would do the trick, however I'm not sure if we want to always include it. Maybe we could add a flag (--set-timeout=500) that would set how long to wait for, if any. Thoughts @zhouyx ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A race between timeout, and the analytics request sounds good to me.
My recommendation is to always include the timeout, but set it to a value that's long enough. People will go with the default setting most of the time, and a timeout will serve as a fallback and reduce flakiness.

BTW this is something I'm testing with, please ignore the console log part : )

let timeout_count = 0;
const promise = new Promise(resolve => {
    r = resolve;
    setTimeout(() => {
      timeout_count++;
      resolve();
   }, TIMEOUT);
});
  page.on('console', message => {
    if (String(message._text).includes('request delay')) {
      const delay = Number(message._text.replace('analytics request delay is  ', ''));
      all.push(delay);
      r();
    }
  });
.....
await promise; 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhouyx Ok that sounds good to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add @estherkim for the best way to customize timeout for different test cases.

@rsimha rsimha requested review from estherkim and removed request for rsimha March 19, 2020 18:38
@rsimha
Copy link
Contributor

rsimha commented Mar 19, 2020

Adding @estherkim to review changes to the gulp performance task.

build-system/tasks/performance/measure-documents.js Outdated Show resolved Hide resolved
@@ -1,5 +1,15 @@
{
"headless": true,
"runs": 100,
"urls": []
"urls": [],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the top level urls do? should this be?

"noHandlers": {
  "urls": [],
  ....
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhouyx Yes, that's what it does. Would it be more clear to have noHandlerUrls: [] instead of making it an object, because right now I don't see any other config data that fits in to the noHandlers category (headless and runs applies to all urls)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the declaration a bit confusing because headless and runs applies to all urls, but the top level urls doesn't. The mix and match makes the configuration structure not intuitive imo.

I don't have strong opinion as long as everything agrees with the config setting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm what do you guys think of the following:

urls: {
  'http://normal.test',
  'http://analytics-1.test': {handler: analyticsHandler},
  'http://analytics-2.test': {handler: analyticsHandler},
  'http://ads-1.test': {handler: adsHandler},
}

I got the inspo from eslint configs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@estherkim I like it, but would it have 'http://normal.test': {},?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah hmm. Could be ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still prefer categorize by handler, because it is more often that one wants to test all handler related urls locally. But this would also work!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinkimball Any preference?

Edit: @estherkim actually after thinking about this for a I while I think the eslint config isn't optimal here. Since each handler could be applied many urls, we would end up having duplicate configs for a lot of urls. Also I don't think that urls with the same handler
would have any different values (aka timeout: 2000 vs timeout:5000). Also, having urls in within the handler definition ensures that each url gets treated the same way with less chance for flake or bias.

build-system/tasks/performance/rewrite-analytics-tags.js Outdated Show resolved Hide resolved
await cacheDocuments(urls);
await compileScripts(urls);
await rewriteScriptTags(urls);
await getMetrics(urls, {headless, runs});
await rewriteAnalyticsTags(urls);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rewriteScriptTags and rewriteAnalyticsTags can be merged to one to avoid read and write files multiple times.
Another optimization would be to not invoke rewriteAnalyticsTags for urls without the analytics handler defined.
Not blocking, but may be a good optimization to have once we start to run the task on travis. cc @estherkim

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another optimization would be to not invoke rewriteAnalyticsTags for urls without the analytics handler defined.

Sounds good to me.

I think rewriteScriptTags and rewriteAnalyticsTags can be merged to one to avoid read and write files multiple times.

Maybe once the ads team checks in their handler, we can see how we can group the rewrites (assuming that they need to rewrite something).

build-system/tasks/performance/measure-documents.js Outdated Show resolved Hide resolved
build-system/tasks/performance/measure-documents.js Outdated Show resolved Hide resolved
@micajuine-ho
Copy link
Contributor Author

Ready for another round of review. Changes since last review:

  • Config change to include defaultHandler
  • Rewrite-script changes to find locally served scripts and correctly pull them from the CDN or dist
  • Refactor measureDocuments to only have 1 request listener and then call upon sub handlers (for analytics, ads, etc...)
  • Added requestsFailed metric to give us more insight into the task

const CDN_URL = 'https://cdn.ampproject.org/';
const AMP_JS_PATH = '/dist/amp.js';
const LOCAL_PATH_REGEXP = /dist\/(v0\/amp-[A-Za-z\-0-9\.]+).max(.js)/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain when would it fetch non compiled script? My understanding is that this should never happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, we thought that gulp --cdn to run local pages would work, since that changes the script tags to cdn.ampproject.... and we can parse out the filepath.

However, to use gulp --cdn, we have to run gulp dist --fortesting, which then causes amp-analytics/config.js to not work correctly; it requests file://...v0/analytics-vendors/gtag.json, which is blocked because AMP doesn't allow requests using the file protocol (it uses the file protocol because this.win.location.protocol is file on puppeteer.

So the fix is to use neither gulp --cdn to serve local pages, nor gulp --fortesting, but just parse out the filepaths manually. Also the change wasn't too bad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please verify with @estherkim that running gulp dist without --fortesting works on travis.

But this doesn't explain why you need .max.js and also amp.js, they should never appear when you run gulp dist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this doesn't explain why you need .max.js and also amp.js, they should never appear when you run gulp dist.

Turns out I was making mistakes when running the gulp dist --forstesting + gulp serve --compiled.

Verified that we will be running gulp dist --forstesting + gulp serve --compiled on Travis.

build-system/tasks/performance/helpers.js Outdated Show resolved Hide resolved
build-system/tasks/performance/measure-documents.js Outdated Show resolved Hide resolved
@@ -28,11 +29,12 @@ const {printReport} = require('./print-report');
*/
async function performance() {
installPackages(__dirname);
const {headless, runs, urls} = new loadConfig();
const {headless, runs, urls, handlers} = new loadConfig();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so so confusing.
We first flat the config object to an array of urls and the raw handlers config.
Then later in measureDocument we iterate the handler config and find the corresponding handler type again?
Could you please optimize the design a bit. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops sorry for the confusion!

  • Changed the config API: handlers (Object) => handlers (List)
  • Changed loadConfing() to do a little more preprocessing to avoid extra work in measureDocuments
  • Moved analytic specific parameters to the config and removed necessary helpers
  • Passed in full config to measureDocuments (we were basically doing this before)
  • Split up setUpRequestHandlers()

build-system/tasks/performance/load-config.js Show resolved Hide resolved
});
return mapping;
}, {});
if (Object.keys(config.urlToHandlers).length < 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So running gulp performance without changing the config will end up with this error? Is that expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes. Plan is that before we integrate with our CI, we add urls.

"extraUrlParam": {
"analytics": "amp-analytics-performance-param"
},
"analyticsParam": "analytics=amp-analytics-performance-param"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two general recommendation.

  1. The config should be simple and only contains essential information. analyticsParam should be able to be generated from extraUrlParam in code.
  2. extraUrlParam and analyticsParam don't apply to all handlers. Specifying here may be confusing, a nested handlerConfig may be helpful.

This is fine with only three handler types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhouyx Agree with removing analyticsParam from the config.

extraUrlParam and analyticsParam don't apply to all handlers.

Yep, they already live in the analyticsHandler. Is that good enough?

const CDN_URL = 'https://cdn.ampproject.org/';
const AMP_JS_PATH = '/dist/amp.js';
const LOCAL_PATH_REGEXP = /dist\/(v0\/amp-[A-Za-z\-0-9\.]+).max(.js)/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please verify with @estherkim that running gulp dist without --fortesting works on travis.

But this doesn't explain why you need .max.js and also amp.js, they should never appear when you run gulp dist.

*/
function setUpAddionalHandlers(handlersList, handlerOptions) {
// Setting up timing and adding specified handler
if (handlerOptions.handlerName === 'analyticsHandler') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if this wrong.
All urls within the same handlers config share the same handlerOptions config. (we're not creating new instances of the handlerOptions object within load-config.js file). I'm afraid rewriting the handlerOptions object would affect other instances that fall within the same handler. Have you tried the test with multiple urls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I have tried it with multiple urls; within measureDocument(), we create handlerOptionsForUrl by creating a shallow copy of the original handlerOptions config. This works fine for storing the startTime and endTime, but would cause some bugs if someone were to mutate objects within the handlerOptions. Instead of shallow copy, should I create a deep copy?

build-system/tasks/performance/measure-documents.js Outdated Show resolved Hide resolved
build-system/tasks/performance/measure-documents.js Outdated Show resolved Hide resolved
@micajuine-ho micajuine-ho changed the title ✨Analytics request performance measurer ✨Performance task locally served pages & wg handlers Apr 10, 2020
Copy link
Contributor

@zhouyx zhouyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good if this runs correctly, please also get infra team's approval.

build-system/tasks/performance/README.md Outdated Show resolved Hide resolved
build-system/tasks/performance/README.md Show resolved Hide resolved
@@ -19,13 +19,16 @@ const fs = require('fs');
const path = require('path');

const CDN_URL = 'https://cdn.ampproject.org/';
const V0_PATH = '/dist/v0.js';
const LOCAL_PATH_REGEXP = /dist\/(v0\/amp-[A-Za-z\-0-9\.]+.js)/;
const ANALYTICS_VENDORS_PATH = '../../../extensions/amp-analytics/0.1/vendors/';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the compiled vendor which lives under v0/analytics-vendors/xxx.json

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if we're using that path then we should also add the gulp vendor-configs command as part of this task...

That command will be run in the Travis build, but I think it's good to have it for local testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we always run that with gulp build or gulp dist, so it should be fine?

build-system/tasks/performance/rewrite-analytics-tags.js Outdated Show resolved Hide resolved
@micajuine-ho
Copy link
Contributor Author

@estherkim PTAL

Copy link
Collaborator

@estherkim estherkim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this!

@@ -28,6 +29,7 @@ async function compileScripts(urls) {
if (!argv.nobuild) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--nobuild is only used when you have already built the binary and don't want to build it again.
There's no need to await for vendorConfigs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants