Optimizing, improving, and measuring speed/performance #1422

Closed
icarusmiles opened this issue Nov 19, 2017 · 2 comments

@icarusmiles

icarusmiles commented Nov 19, 2017

Running Puppeteer v0.13.0 (headless) on Ubuntu Server LTS (also tried on Windows 10).

I am trying to get my Puppeteer system as optimized as possible. I do not need the pages to look pretty for screenshots; I mainly need to obtain data after filling in some forms and performing simple actions.

My end-goal is to have a user-requested system that goes to 4 different sites (does a few actions specific to the user request), then returns fresh data.

What I've Done

  • Forced Images to Not Load
  • Using an already spawned chrome/puppeteer instance (connect())
  • High Resource Allocation

  1. Does anyone know other options to speed things up in this scenario?
  2. Is there any way to efficiently cache JS/CSS on the client side for future requests?
  3. Is there a way to measure the timing/performance of each individual action? (Connect, goto, click, etc.)

Sorry if this is starting to feel like a Stack Overflow question, but I've invested a good amount of time in Puppeteer and it seems great. However, if I can't find a way to speed up this process, I may have to find another option. In addition, I am planning to pool Chrome instances because of the lack of browser contexts (#85), in case multiple users send requests at the same time.
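For illustration only, the pooling idea mentioned above could be sketched roughly as follows. The `Pool` class and its method names are hypothetical (they are not part of Puppeteer), and the pooled resources are assumed to be already-connected browser objects; any value works for the purpose of the sketch.

```javascript
// Hypothetical minimal pool, not a Puppeteer API.
// `resources` would be already-connected Browser objects in practice.
class Pool {
  constructor(resources) {
    this.idle = [...resources]; // resources nobody is using right now
    this.waiters = [];          // resolve callbacks of callers waiting for one
  }

  // Resolves with a free resource, waiting if all of them are busy.
  acquire() {
    if (this.idle.length > 0) {
      return Promise.resolve(this.idle.pop());
    }
    return new Promise(resolve => this.waiters.push(resolve));
  }

  // Hands the resource to the oldest waiter, or marks it idle again.
  release(resource) {
    const next = this.waiters.shift();
    if (next) {
      next(resource);
    } else {
      this.idle.push(resource);
    }
  }

  // Runs `task` with a resource and always gives the resource back.
  async use(task) {
    const resource = await this.acquire();
    try {
      return await task(resource);
    } finally {
      this.release(resource);
    }
  }
}
```

Each user request would then go through something like `pool.use(browser => handleRequest(browser))`, where `handleRequest` is a placeholder for the per-site actions. Requests that arrive while every instance is busy simply wait their turn.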

@HanXHX

HanXHX commented Dec 12, 2017

Hey @miles-collier

  1. You can use something like this, but it doesn't work on all websites (images/CSS fetched via XHR, for instance). Check in headful mode first.
// Interception must be enabled before abort()/continue() can be called.
await page.setRequestInterception(true);
const blockedResources = ['image', 'stylesheet', 'media', 'font', 'texttrack', 'object', 'beacon', 'csp_report', 'imageset'];
// In Puppeteer 1.0 and later, resourceType and url are methods:
// request.resourceType() and request.url().
page.on('request', request => {
	if (
		blockedResources.includes(request.resourceType)
		// Be careful with the check above: blocking by resource type can break some pages
		|| request.url.includes('.jpg')
		|| request.url.includes('.jpeg')
		|| request.url.includes('.png')
		|| request.url.includes('.gif')
		|| request.url.includes('.css')
	)
		request.abort();
	else
		request.continue();
});
  2. No idea
  3. The easiest way: https://nodejs.org/api/console.html#console_console_time_label
console.time("pagegoto");
const response = await page.goto('https://github.com');
console.timeEnd("pagegoto");
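To time every action (connect, goto, click, and so on) without repeating the `console.time` pair, one option is a small wrapper like the sketch below. The `timed` helper is hypothetical and not part of Puppeteer; it uses `performance.now()` from Node's `perf_hooks` module, and the selector in the usage comment is a placeholder.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper (not a Puppeteer API): runs an async action,
// logs how long it took, and passes the action's result through.
async function timed(label, action, log = console.log) {
  const start = performance.now();
  try {
    return await action();
  } finally {
    // Logged even when the action throws, so failed steps are timed too.
    const elapsedMs = performance.now() - start;
    log(`${label}: ${elapsedMs.toFixed(1)} ms`);
  }
}

// Usage sketch inside an async function, assuming `page` is an open page
// and '#submit' stands in for a real selector:
//   const response = await timed('goto', () => page.goto('https://github.com'));
//   await timed('click', () => page.click('#submit'));
```

Passing a custom `log` function makes it easy to collect the timings in an array or send them to a metrics system instead of printing them.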

@aslushnikov
Contributor

For (2), I'd try re-using userDataDir and relying on the browser's own HTTP cache.
Other than that, it looks like @HanXHX gave a reasonable answer.
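A minimal sketch of the userDataDir suggestion, assuming Chrome is started through `puppeteer.launch` (when attaching with `connect()`, the profile directory would have to be set on the Chrome process that was started separately). The `cachedLaunchOptions` helper and the `chrome-profile` directory name are hypothetical; `userDataDir` itself is a `puppeteer.launch` option.

```javascript
const path = require('path');

// Hypothetical helper: builds launch options that point Chrome at a
// persistent profile directory, so its HTTP cache survives between runs.
// 'chrome-profile' is an assumed directory name, resolved against the cwd.
function cachedLaunchOptions(profileDir = path.resolve('chrome-profile')) {
  return { headless: true, userDataDir: profileDir };
}

// Usage sketch (requires the puppeteer package, inside an async function):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(cachedLaunchOptions());
```

Whether this helps depends on the cache headers the target sites send, so it is worth measuring before and after.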


3 participants