Lighthouse/Puppeteer integration #3837

Closed
Khady opened this issue Nov 17, 2017 · 47 comments

Comments

@Khady

Khady commented Nov 17, 2017

I am using lighthouse from javascript to check a few pages of a website. I would like to be able to tell lighthouse to use a specific tab that is already open in my chrome, maybe by giving it a link to a ws endpoint. That's because I create the browser using puppeteer, and I want to do some operations before running lighthouse (like setting the user agent or some request interception configuration) and once the lighthouse check is done (like getting the html of the page or interacting with the page).

Is it possible to tell lighthouse to use an already existing tab?

@patrickhulce
Collaborator

Thanks for filing @Khady!

In short, it's not possible today. Lighthouse always controls the navigation to the page. There are some settings (user agent, cookies, logging in) that you can set using puppeteer before providing the same port to LH, but you'll need to make sure LH isn't overriding them by disabling mobile emulation/storage reset where applicable.
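
As a rough sketch of the "same port" idea: Puppeteer exposes the browser's WebSocket endpoint through `browser.wsEndpoint()`, and the debugging port can be parsed out of it and handed to Lighthouse as the `port` flag. The helper name `lighthouseFlagsFor` is hypothetical. `disableStorageReset` is assumed to be the storage-reset flag; the flag that turns off mobile emulation has changed names across Lighthouse versions, so it is left to the caller via `extraFlags`.

```javascript
// Hypothetical helper: derive Lighthouse flags from a Puppeteer WebSocket endpoint.
// `wsEndpoint` is the string returned by Puppeteer's browser.wsEndpoint(),
// for example "ws://127.0.0.1:9222/devtools/browser/<id>".
function lighthouseFlagsFor(wsEndpoint, extraFlags = {}) {
  const port = Number(new URL(wsEndpoint).port);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`No port found in endpoint: ${wsEndpoint}`);
  }
  return {
    port,
    // Assumption: keeps Lighthouse from clearing cookies/storage set up beforehand.
    disableStorageReset: true,
    // Version-specific flags (e.g. the one disabling mobile emulation) go here.
    ...extraFlags,
  };
}
```

The returned object would be passed as the second argument of `lighthouse(url, flags)`.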

We have auditing without a navigation on our roadmap, though, at which point this sort of thing becomes possible :)

related #1769, #3833

@paulirish
Member

We're not ready to open up the full multiclient story where Lighthouse and Puppeteer/CRI talk to the same page. There are some dragons within here that we're not ready to fight yet.

There's another approach we're discussing and we currently favor:

  1. Don't use puppeteer to launch chrome and set up the lifecycle. Use lighthouse for this instead.
  2. Use a custom config and custom gatherer for lighthouse. In the gatherer's beforePass, set up the environment with puppeteer and then resolve.

You can do all this today without any code changes (although #3864 should help quite a bit..) Your custom gatherer won't actually return a useful artifact, but that's OK. We're just abusing its lifecycle hooks.
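
A minimal sketch of that idea, assuming the gatherer lifecycle of that era (a class extending `require('lighthouse').Gatherer` with `beforePass` and `afterPass` methods). `createSetupGatherer` and `setupPage` are hypothetical names, and the base class is passed in as a parameter so the snippet runs without Lighthouse installed.

```javascript
// Build a "setup-only" gatherer class.
// `BaseGatherer` would be require('lighthouse').Gatherer in a real run (assumption).
// `setupPage` is a hypothetical async callback that connects Puppeteer and
// prepares the environment (user agent, request interception, ...).
function createSetupGatherer(BaseGatherer, setupPage) {
  return class PuppeteerSetup extends BaseGatherer {
    // Lifecycle hook called before Lighthouse navigates to the page.
    async beforePass(options) {
      await setupPage(options);
    }

    // The artifact is intentionally empty: only the lifecycle hook is used.
    afterPass() {
      return {};
    }
  };
}
```

The resulting class would then be referenced from the `gatherers` list of a custom config; how exactly it is registered depends on the Lighthouse version.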

@patrickhulce does this match what you were thinking?

@patrickhulce
Collaborator

Yeah, this seems like the quickest way to achieve as many of the goals as possible today. As a long-term vision, reusing an existing tab and making LH more flexible in analyzing existing pages is definitely the direction we should be moving in to play better with DevTools and puppeteer 👍

@wardpeet
Collaborator

how do we feel about creating an example:
https://github.com/GoogleChrome/lighthouse/tree/master/docs/recipes

@Khady
Author

Khady commented Dec 1, 2017

I'm working on a new version of my tool following your advice.

  1. I keep using puppeteer to launch chrome and set up the lifecycle because it is easier this way, and it is easier to keep the same chrome alive for multiple tasks. This part is not a problem because lighthouse properly supports an already running chrome process.

  2. I'm using a custom gatherer to set up what I need. But this is not convenient at all. A lot of information is missing on the gatherer side, and I have to transfer it there to do the whole setup properly. If I understand the code I read correctly, there are two options to do that:

    • using global state on my side, which is neither clean nor convenient. I have no unique identifier available in both my code and the gatherer to which I can attach the information. The best I found is wsEndpoint × url, but it is not unique if the same url is opened multiple times in the same browser. It would be nice to have wsEndpoint × pageId, but this information is not publicly available in lighthouse and puppeteer (options.driver._connection._pageId in lh, page._client._targetId in pptr). And anyway, I can't know the id of the page before launching lighthouse. ¯\_(ツ)_/¯
    • storing the values I need in the flags object, which is transferred all the way to the gatherer. I am a bit scared to use this solution because there is no guarantee that the flags will always be given to the gatherer, and it smells like a bit of a hack. Is there an object in which I can put some data to use in my gatherer and be sure it won't disappear in the future? With better semantics than the flags object :)?

    Also, if there is an error during the setup of the page (it shouldn't happen, but we are dealing with computers => it will happen), I didn't find a way to interrupt the whole lighthouse operation from the gatherer. I can see in the artifacts that my gatherer has returned an empty object and invalidate the results that way, but it is cumbersome. Plus, it still costs me the duration of the lighthouse run, which can be pretty long, and it can create an unnecessary crawl of the page I am trying to evaluate.
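
To illustrate the first option, here is a small sketch of a shared registry keyed by wsEndpoint × url. `SetupRegistry` and its methods are hypothetical names, not part of lighthouse or puppeteer. The sketch refuses a second registration for the same pair, which is exactly where the non-unique key becomes a problem.

```javascript
// Hypothetical registry shared between the launching code and the gatherer.
class SetupRegistry {
  constructor() {
    this.entries = new Map();
  }

  // wsEndpoint × url, serialized so the two parts cannot run into each other.
  static keyFor(wsEndpoint, url) {
    return JSON.stringify([wsEndpoint, url]);
  }

  // Called by the launching code before starting lighthouse.
  register(wsEndpoint, url, setupData) {
    const key = SetupRegistry.keyFor(wsEndpoint, url);
    if (this.entries.has(key)) {
      // Same url opened twice in the same browser: the key is ambiguous.
      throw new Error(`Ambiguous key, already registered: ${key}`);
    }
    this.entries.set(key, setupData);
  }

  // Called by the gatherer; removes the entry so it is used only once.
  take(wsEndpoint, url) {
    const key = SetupRegistry.keyFor(wsEndpoint, url);
    const setupData = this.entries.get(key);
    this.entries.delete(key);
    return setupData;
  }
}
```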

aslushnikov says in puppeteer/puppeteer#1398 (comment) that pptr could move from ws connection to pipe connection. If the pageId/targetId system is not portable over the pipe connection then I guess there is not much choice but to keep the connection system as it is currently. No point adding #3857 if it is going to be deprecated soon.

ps: I think I found a possible bug in cri.js while reading the code, but I don't have the time to investigate it properly (and it probably doesn't come up very often). When a page is closed here, it can be the last page of the browser, because of the condition at this point. It is possible to reuse an existing tab, and if this tab is the last tab and lighthouse closes it at the end of the run, the browser will be closed too.

@patrickhulce
Collaborator

Great feedback, @Khady! You're somewhat of a pioneer in this area, so it's great to be aware of the pain points :) A few responses to your comments below.

using global state on my side, which is neither clean nor convenient. I have no unique identifier available in both my code and the gatherer to which I can attach the information. The best I found is wsEndpoint × url, but it is not unique if the same url is opened multiple times in the same browser. It would be nice to have wsEndpoint × pageId, but this information is not publicly available in lighthouse and puppeteer

Ah, you're having trouble finding the target to use in puppeteer once the page has been loaded, correct? Yeah, we should expand #3864 to communicate the target/page ID as well.

storing the values I need in the flags object, which is transferred all the way to the gatherer. I am a bit scared to use this solution because there is no guarantee that the flags will always be given to the gatherer, and it smells like a bit of a hack. Is there an object in which I can put some data to use in my gatherer and be sure it won't disappear in the future? With better semantics than the flags object :)?

Yes, we had a plan for this but haven't gotten around to it since there wasn't an immediate need. We want to implement audit and gatherer options to pass in dynamic runtime information that can control audit/gatherer behavior separately from the gatherer/audit code itself.

Also, if there is an error during the setup of the page (it shouldn't happen, but we are dealing with computers => it will happen), I didn't find a way to interrupt the whole lighthouse operation from the gatherer. I can see in the artifacts that my gatherer has returned an empty object and invalidate the results that way, but it is cumbersome. Plus, it still costs me the duration of the lighthouse run, which can be pretty long, and it can create an unnecessary crawl of the page I am trying to evaluate.

You should be able to mark an error with a .fatal property to have LH exit immediately rather than just fail the gatherer.

/**
 * Test any error output from the promise, absorbing non-fatal errors and
 * throwing on fatal ones so that run is stopped.
 * @param {!Promise<*>} promise
 * @return {!Promise<*>}
 */
static recoverOrThrow(promise) {
  return promise.catch(err => {
    if (err.fatal) {
      throw err;
    }
  });
}

pass(/** stuff */) {
  const error = new Error("Uh-oh something went wrong!");
  error.fatal = true;
  throw error;
}
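
To make the difference concrete, here is a standalone sketch that copies the logic quoted above into a plain function and runs it on one non-fatal and one fatal error. `fatalError` and `demo` are hypothetical helper names used only for illustration.

```javascript
// Standalone copy of the quoted logic, for illustration only.
function recoverOrThrow(promise) {
  return promise.catch(err => {
    if (err.fatal) {
      throw err;
    }
  });
}

// Hypothetical helper: build an error that should abort the whole run.
function fatalError(message) {
  const error = new Error(message);
  error.fatal = true;
  return error;
}

async function demo() {
  const results = [];
  // A non-fatal error is absorbed: the promise resolves to undefined.
  results.push(await recoverOrThrow(Promise.reject(new Error('flaky'))));
  // A fatal error is rethrown, so the caller can stop.
  try {
    await recoverOrThrow(Promise.reject(fatalError('setup failed')));
  } catch (err) {
    results.push(err.message);
  }
  return results;
}
```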

It is possible to reuse an existing tab and if this tab is the latest tab and closed by lighthouse at the end of the run, the browser will be closed too

Ah, good find! You're right we've never really run into this, especially since we discourage using headless for its lack of throttling, but we should update that to throw loudly at this point if we can't create a tab :)

@Khady
Author

Khady commented Dec 8, 2017

Thank you for your help!

Ah, you're having trouble finding the target to use in puppeteer once the page has been loaded, correct? Yeah, we should expand #3864 to communicate the target/page ID as well.

Correct.

Yes, we had a plan for this but haven't gotten around to it since there wasn't an immediate need. We want to implement audit and gatherer options to pass in dynamic runtime information that can control audit/gatherer behavior separately from the gatherer/audit code itself.

Good news. I can manage to do what I want in the current situation. But it's great to have visibility on the future plans.

You should be able to mark an error with a .fatal property to have LH exit immediately rather than just fail the gatherer.

Awesome. I should have read the whole code related to the gatherers and not only some parts.

Nothing is blocking me for now, thanks to your advice. I just exploit a few undocumented details (driver._connection._pageId, the flags object, ...). I understand puppeteer is pretty young and it's not common (yet?) to connect it with lighthouse. My hope is that this feedback helps in understanding which bits are necessary for possible improvements.

@paulirish changed the title from "Use lighthouse from javascript on a tab already opened" to "Lighthouse/Puppeteer integration" on Jan 16, 2018

@paulirish
Member

We will sort this out in the next 2 quarters. Thanks!

@unindented

I'm also interested in the request interception side of things.

We're running Lighthouse as part of our CI/CD pipeline. However, our API endpoints have really erratic behavior, and requests can take anything from 500ms to 2s. That's forcing us to make our TTI checks much laxer than what we'd want.

If we could intercept requests to those endpoints and immediately respond with a fixture, we'd have much more deterministic numbers, and we could tighten our TTI checks.
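
For reference, a sketch of what such fixture-based interception could look like with Puppeteer's request interception (`page.setRequestInterception`, `request.respond`, `request.continue`). `createFixtureHandler` and the shape of `fixtures` are hypothetical, and whether this can run alongside a Lighthouse audit of the same page depends on the Lighthouse version, as discussed earlier in the thread.

```javascript
// Build a handler for Puppeteer's 'request' event that serves canned responses.
// `fixtures` maps a URL prefix to a JSON-serializable body (hypothetical shape).
function createFixtureHandler(fixtures) {
  return request => {
    const url = request.url();
    const match = Object.keys(fixtures).find(prefix => url.startsWith(prefix));
    if (match === undefined) {
      // Not an API call we want to stub: let it go to the network.
      return request.continue();
    }
    // Answer immediately with the fixture instead of hitting the endpoint.
    return request.respond({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify(fixtures[match]),
    });
  };
}

// Intended wiring (needs a Puppeteer `page`, so it is not executed here):
//   await page.setRequestInterception(true);
//   page.on('request', createFixtureHandler({'https://api.example.com/': {ok: true}}));
```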

@brendankenny
Member

See #5472 for another use case

@niieani

niieani commented Dec 14, 2018

I'm running Chrome with chrome-launcher, then connecting to it with puppeteer. The only thing I'm setting up is this:

    // add HTTP BasicAuth credentials on new tab creation:
    browser.on('targetcreated', async (target) => {
      const page = await target.page()
      if (page) await page.authenticate(basicAuth)
    })

It works, but once Puppeteer is connected, Lighthouse (and Chrome's devtools, for that matter) stops gathering the size of requests.

Anybody know why, or how to mitigate this (size 0 everywhere)?

[Image: screenshot 2018-12-14 at 11 43 53]

@iamEAP

iamEAP commented Jan 13, 2020

Going to add my perspective because I did not see mention of this after reading through the thread:

My use case: I'm looking to run multiple concurrent Lighthouse audits in a single instance of Chrome, with a new incognito browser context for each audit, so that no data storage/state is shared between concurrent audits.

Ideally each audit could be preceded by a series of actions (e.g. a log in), and state would be maintained per incognito context (tab).

However, following the Puppeteer recipes in this repo, it seems like Lighthouse always opens the URL in the default (shared) browser context.

@connorjclark
Collaborator

connorjclark commented Jan 13, 2020

Have you seen this? https://github.com/GoogleChrome/lighthouse/blob/master/docs/recipes/auth/example-lh-auth.js

Puppeteer, by default, uses a fresh Chrome profile, so if you launch it like the above script does you shouldn't see any state persist.

multiple concurrent Lighthouse audits

FYI we recommend against this. If you rely on the performance category, the results will be skewed. Even if you don't, you risk protocol timeouts by asking Chrome to do too much at once.

@iamEAP

iamEAP commented Jan 13, 2020

Thanks @connorjclark.

Yep, I've seen that recipe. My specific problem with the fresh Chrome profile on launch approach is that I'm exposing Puppeteer as a micro-service (so it only launches when the service re/starts). Multiple clients can hit this service, but their requests are sandboxed from each other via Incognito contexts; I was hoping to borrow the same sandboxing approach for Lighthouse performance audits as well.

FYI we recommend against this. If you rely on the performance category, the results will be skewed.

This is good to know (I'm looking at just the performance category for right now). Is this something that can be mitigated by throwing additional resources at Chrome (e.g. CPU cores / Memory)? Any documentation you have on this would be very much appreciated.

I'd also be curious to hear how Google approaches scaling the PageSpeed Insights API, given the recommendation against concurrent audits in a single Chrome instance.

@connorjclark
Collaborator

connorjclark commented Jan 13, 2020

so it only launches when the service re/starts

I'd suggest this is a micro-optimization. Also, LH directs the browser to clear the cache on each run (by default), so you're also at risk of runs stomping on the cache of other runs.

I'd also be curious to hear how Google approaches scaling the PageSpeed Insights API, given the recommendation against concurrent audits in a single Chrome instance.

We have many machines, a load balancer, and queue things up in the worst case.

You could probably get away with a few concurrent runs, but I'd measure to be safe. 3 is probably fine on any non-network constrained machine. In any case, you certainly should queue up LH runs if you get more than 3 req/minute.
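
One way to queue runs is a small concurrency limiter in front of whatever starts a Lighthouse run. The sketch below is generic JavaScript, not a Lighthouse API; `createQueue` is a hypothetical name and the limit of 3 simply mirrors the number mentioned above.

```javascript
// Minimal concurrency limiter: at most `limit` tasks run at once, the rest wait.
// Each task is a function returning a promise (e.g. one Lighthouse run).
function createQueue(limit) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active++;
    const {task, resolve, reject} = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        // Free the slot and start the next waiting task, if any.
        active--;
        next();
      });
  };

  // Returns a promise that settles with the task's own result or error.
  return task => new Promise((resolve, reject) => {
    waiting.push({task, resolve, reject});
    next();
  });
}

// Usage sketch: const enqueue = createQueue(3); enqueue(() => runOneAudit(url));
// `runOneAudit` is a placeholder for whatever starts a single run.
```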

@patrickhulce
Collaborator

In addition to connor's advice, if you're going to run LH concurrently (again we recommend you don't or your performance variability will be quite high), run each Lighthouse in its own child process and dedicate at least 2 cores to its execution.

Scaling horizontally has been shown to yield more consistent results than scaling vertically, i.e. using 8 smaller 2-core machines as opposed to running 8 runs on a 16-core machine. Just avoid any burstable instance types.

@niieani

niieani commented Jan 14, 2020

@iamEAP In our case, we run Lighthouse in a serverless compute service (e.g. AWS Lambda). We do this to run 60 tests simultaneously and then extract median performance data to see whether a given code change causes a performance regression (or is an improvement). This makes it easy to run LH concurrently (and scalably) and you get meaningful results as soon as the longest run completes.
You also get a fresh run of Chrome with every hit of the API, so you won't hit any of the issues you mentioned.
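
The aggregation step can be as simple as taking the median of one metric across all runs. A minimal sketch follows; the `median` helper is illustrative, and which metric is extracted from each result is up to the caller.

```javascript
// Median of a list of numbers (e.g. one performance metric from each run).
function median(values) {
  if (values.length === 0) {
    throw new Error('Cannot take the median of an empty list');
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```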

@Siilwyn

Siilwyn commented Oct 21, 2020

Sorry if this was already obvious, but as far as I understand, most use cases above could be solved by adding a way to connect to a chrome instance by providing a browser websocket url, right? Just like puppeteer.connect accepts a browserWSEndpoint.

@patrickhulce
Collaborator

Yes that is in fact the plan @Siilwyn but not the hardest part :) The full story is in #11313 and the associated links therein if you're interested in following along 👍

@praveenralla

praveenralla commented Sep 16, 2021

Can I please get an example of a client calling lighthouse and passing the browser in the userConnect parameter? I am trying to call lighthouse on a url after navigating in puppeteer, and a new tab is launched every time lighthouse is called. I don't want a new tab to be launched; I want the existing tab to be reused.

Thanks in advance!

puppeteer version 7.6.0
lighthouse version 7.6.0

@connorjclark
Collaborator

We have puppeteer examples here: https://github.com/GoogleChrome/lighthouse/blob/master/docs/puppeteer.md

@Khady
Author

Khady commented Sep 17, 2021

From the look of the doc it only partially solves the original issue. For example it doesn't offer a way to force lighthouse to use a specific tab.

@praveenralla

That is true. I went through all these examples but couldn't find a way to run the lighthouse analysis on an existing tab opened in puppeteer. I can partially achieve the result using lighthouse.snapshot as in this code, but the resulting report is in JSON format rather than the desired HTML format.

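it('should compute accessibility results on the page as-is', async () => {
  await setupTestPage();
  // Audits the page in its current state, without a new navigation.
  const result = await lighthouse.snapshot({page});
  // (the rest of the test is not shown in the original comment)
});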

It would be good to know whether running in the existing tab, instead of opening a new tab, will come with an official lighthouse release.

Thanks.

@patrickhulce
Collaborator

@Khady forcing Lighthouse on a particular tab will be solved by #11313. The issues @praveenralla ran into are unrelated to whether it can be used on a particular tab or not (just about consuming the output).

@Khady
Author

Khady commented Sep 17, 2021

@patrickhulce I was actually reacting because this issue (which I opened and might be different from the ones of @praveenralla) is being closed without being solved. But thanks for the link to 11313! I'll follow the progress there

@niieani

niieani commented Sep 29, 2021

@Khady the issue is indeed solved by Fraggle Rock, as @patrickhulce pointed out.

The new API is solid enough that we've started using Fraggle Rock in production.
Though beware of small breaking changes in FR configuration that are still happening between versions.

Example usage:

import {navigation} from 'lighthouse/lighthouse-core/fraggle-rock/api'
import puppeteer from 'puppeteer'

const browser = await puppeteer.launch()
const [thisIsYourCustomTab] = await browser.pages()
const result = await navigation({
  page: thisIsYourCustomTab,
  url: 'https://google.com',

  config: {
    // add your navigation config / settings here
  }
})

@vandana-k13

Will lighthouse support SPAs (single-page apps)?

@adamraine
Member

Will lighthouse support SPAs (single-page apps)?

It is a planned feature as part of #11313

@samarth-gupta-traceable

samarth-gupta-traceable commented Feb 10, 2022

I am using the lighthouse node module along with puppeteer to record perf metrics of pages behind auth.

I would also like to achieve the following:

  1. Figure out if any error occurred in browser console while page was being loaded
  2. Get a handle to the page once it is loaded, so that I can look for, or assert the presence of, certain elements.

I am not sure whether the above can be achieved purely with the lighthouse node module or needs the assistance of puppeteer.
I tried using

I am new to both lighthouse & puppeteer so any pointers will be helpful

cc @adamraine @patrickhulce @Khady

@adamraine
Member

adamraine commented Feb 10, 2022

Figure out if any error occurred in browser console while page was being loaded

Lighthouse has an audit under "Best practices" that checks for errors in the console
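
A small sketch of reading that audit from a finished run. It assumes the audit id is `errors-in-console` and that its entries live under `details.items` of the result object (`lhr`); both should be checked against the Lighthouse version in use. `consoleErrorsFrom` is a hypothetical helper name.

```javascript
// Extract console error entries from a Lighthouse result object (`lhr`).
// Assumption: the audit id is 'errors-in-console' and entries are in details.items.
function consoleErrorsFrom(lhr) {
  const audit = lhr.audits && lhr.audits['errors-in-console'];
  if (!audit || !audit.details || !Array.isArray(audit.details.items)) {
    // Audit missing or not applicable: report no errors rather than failing.
    return [];
  }
  return audit.details.items;
}
```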

Get a handle to the page once it is loaded, so that I can look for, or assert the presence of, certain elements.

Puppeteer will close the page after you call page.close() or browser.close(), but you don't need to call those methods if you want to inspect the page after Lighthouse runs. Additionally, you can try using page.$ and page.$$ to query for elements from puppeteer.

I forgot Lighthouse will create its own page and close it automatically when running from the node module. The easiest way to check for elements is to open a separate page and test for the elements using Puppeteer without Lighthouse:

const page = await browser.newPage();
await page.goto('https://example.com');
const check1 = await page.$('button.class');

@samarth-gupta-traceable

Thanks @adamraine! I will try the above out.
