Lighthouse/Puppeteer integration #3837

Khady · 2017-11-17T07:31:42Z

I am using lighthouse from JavaScript to check a few pages of a website. I would like to be able to tell lighthouse to use a specific tab that is already open in my Chrome, maybe by giving it a link to a WebSocket endpoint. That's because I create the browser with puppeteer, and I want to do some operations before running lighthouse (like setting the user agent or configuring request interception) and after the lighthouse check is done (like getting the HTML of the page or interacting with the page).

Is it possible to tell lighthouse to use an existing tab?

patrickhulce · 2017-11-18T06:21:11Z

Thanks for filing @Khady!

In short, it's not possible today. Lighthouse always controls the navigation to the page. There are some settings (user agent, cookies, logging in) you'd be able to set up using puppeteer before providing the same port to LH, but you'll need to make sure LH isn't overriding them by disabling mobile emulation/storage reset where applicable.

We've got it on our roadmap to enable auditing without a navigation though in which case this sort of thing becomes possible :)

related #1769, #3833

paulirish · 2017-11-21T00:29:00Z

We're not ready to open up the full multiclient story where Lighthouse and Puppeteer/CRI talk to the same page. There are some dragons within here that we're not ready to fight yet.

There's another approach we're discussing and we currently favor:

1. Don't use puppeteer to launch chrome and set up the lifecycle. Use lighthouse for this instead.
2. Use a custom config and custom gatherer for lighthouse. In the gatherer's beforePass, set up the environment with puppeteer and then resolve.

You can do all this today without any code changes (although #3864 should help quite a bit). Your custom gatherer won't actually return a useful artifact, but that's OK. We're just abusing its lifecycle hooks.
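A minimal sketch of that approach, assuming a gatherer only needs its `beforePass`/`afterPass` hooks here. `createSetupGatherer` and `setupPage` are hypothetical names, and in a real Lighthouse config the class would extend Lighthouse's `Gatherer` base class instead of standing alone:

```javascript
// Hypothetical factory: wraps your own async setup callback (e.g. puppeteer
// calls setting the user agent or request interception) in a gatherer whose
// only purpose is its lifecycle hook. A real gatherer would extend
// Lighthouse's Gatherer base class; it is standalone here so the sketch runs.
function createSetupGatherer(setupPage) {
  return class SetupGatherer {
    // Per the approach above: set up the environment in beforePass and
    // only resolve once the setup has finished.
    async beforePass(passContext) {
      await setupPage(passContext);
    }

    // The artifact is not useful; we only needed the hook above.
    afterPass() {
      return {};
    }
  };
}
```

A factory is used so the gatherer class can still be constructed without arguments while closing over your setup function.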

@patrickhulce does this match what you were thinking?

patrickhulce · 2017-11-21T01:04:46Z

Yeah, this seems like the quickest way to achieve as many of the goals as possible today. As for the long-term vision, reusing an existing tab and making LH more flexible in analyzing existing pages is definitely the direction we should be moving in to play better with DevTools and puppeteer 👍

wardpeet · 2017-11-21T20:22:57Z

How do we feel about creating an example in the recipes folder?
https://github.com/GoogleChrome/lighthouse/tree/master/docs/recipes

Khady · 2017-12-01T10:51:06Z

I'm working on a new version of my tool following your advice.

I keep using puppeteer to launch chrome and set up the lifecycle because it is easier this way, and it makes it easier to keep the same chrome alive for multiple tasks. This part is not a problem because lighthouse correctly supports an already running chrome process.
I'm using a custom gatherer to set up what I need, but this is not convenient at all. There is a lot of information I have to pass to the gatherer to do the whole setup properly. If I understand the code I read correctly, there are two options to do that:
- using global state on my side, which is not very clean and not very convenient. I have no unique identifier, available both in my code and in the gatherer, to which I can attach the information. The best I found is wsEndpoint × url, but it is not unique if the same url is opened multiple times in the same browser. It would be nice to have wsEndpoint × pageId, but this information is not publicly available in lighthouse and puppeteer (options.driver._connection._pageId in lh, page._client._targetId in pptr). And anyway I can't know what the id of the page will be before launching lighthouse. ¯\_(ツ)_/¯
- storing the values I need in the flags object, which is transferred all the way to the gatherer. I am a bit scared to use this solution because there is no guarantee that the flags will always be given to the gatherer, and it smells a bit like a hack. Is there an object in which I can put some data to use in my gatherer and be sure it won't disappear in the future? With better semantics than the flags object :)?
Also, if there is an error during the setup of the page (it shouldn't happen, but we are dealing with computers => it will happen), I didn't find a way for the gatherer to interrupt the whole lighthouse operation. I can see in the artifacts that my gatherer has returned an empty object and invalidate the results that way, but it is cumbersome. Plus it still costs me the duration of the lighthouse run, which can be pretty long, and it can cause an unnecessary crawl of the page I am trying to evaluate.

aslushnikov says in puppeteer/puppeteer#1398 (comment) that pptr could move from ws connection to pipe connection. If the pageId/targetId system is not portable over the pipe connection then I guess there is not much choice but to keep the connection system as it is currently. No point adding #3857 if it is going to be deprecated soon.

ps: I think I found a possible bug in cri.js while reading the code, but I don't have the time to investigate it properly (and it's probably not encountered very often). When a page is closed here, it can be the last page of the browser, because of the condition at this point. It is possible to reuse an existing tab, and if this tab is the last tab and is closed by lighthouse at the end of the run, the browser will be closed too.

patrickhulce · 2017-12-07T19:01:27Z

Great feedback @Khady you're somewhat of a pioneer in this area, so it's great to be aware of the pain points :) A few responses to your comments below

using global state on my side, which is not very clean and not very convenient. I have no unique identifier, available both in my code and in the gatherer, to which I can attach the information. The best I found is wsEndpoint × url, but it is not unique if the same url is opened multiple times in the same browser. It would be nice to have wsEndpoint × pageId, but this information is not publicly available in lighthouse and puppeteer

Ah, you're having trouble finding the target to use in puppeteer once the page has been loaded, correct? Yeah, we should expand #3864 to communicate the target/page ID as well.

storing the values I need in the flags object, which is transferred all the way to the gatherer. I am a bit scared to use this solution because there is no guarantee that the flags will always be given to the gatherer, and it smells a bit like a hack. Is there an object in which I can put some data to use in my gatherer and be sure it won't disappear in the future? With better semantics than the flags object :)?

Yes, we had a plan for this and haven't gotten around to it since there wasn't an immediate need, but we want to implement audit and gatherer options to pass in dynamic runtime information that can control audit/gatherer behavior separately from the gatherer/audit code itself.

Also, if there is an error during the setup of the page (it shouldn't happen, but we are dealing with computers => it will happen), I didn't find a way for the gatherer to interrupt the whole lighthouse operation. I can see in the artifacts that my gatherer has returned an empty object and invalidate the results that way, but it is cumbersome. Plus it still costs me the duration of the lighthouse run, which can be pretty long, and it can cause an unnecessary crawl of the page I am trying to evaluate.

You should be able to mark an error with a .fatal property to have LH exit immediately rather than just fail the gatherer.

lighthouse/lighthouse-core/gather/gather-runner.js

Lines 126 to 138 in 407b1af

    /**
     * Test any error output from the promise, absorbing non-fatal errors and
     * throwing on fatal ones so that run is stopped.
     * @param {!Promise<*>} promise
     * @return {!Promise<*>}
     */
    static recoverOrThrow(promise) {
      return promise.catch(err => {
        if (err.fatal) {
          throw err;
        }
      });
    }

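// Inside a custom gatherer class: an error flagged `fatal` stops the whole
// Lighthouse run instead of only failing this gatherer.
pass(options) {
  const error = new Error('Uh-oh something went wrong!');
  error.fatal = true;
  throw error;
}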
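To illustrate what the `.fatal` flag changes, here is a standalone copy of the `recoverOrThrow` logic from gather-runner.js (as of 407b1af) together with a hypothetical `fatalError` helper. It mimics that behavior rather than calling into Lighthouse:

```javascript
// Copy of the recoverOrThrow logic quoted from gather-runner.js, reproduced
// only to show how the `fatal` property decides the outcome.
function recoverOrThrow(promise) {
  return promise.catch(err => {
    if (err.fatal) {
      throw err; // fatal: propagate and stop the whole run
    }
    // non-fatal: swallowed here, so only this gatherer fails
  });
}

// Hypothetical helper for a setup gatherer: build an error Lighthouse
// will treat as fatal.
function fatalError(message) {
  const error = new Error(message);
  error.fatal = true;
  return error;
}
```

A setup gatherer can therefore `throw fatalError('page setup failed')` from its hooks to abort early instead of spending a full run on a page that was never set up correctly.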

It is possible to reuse an existing tab and if this tab is the latest tab and closed by lighthouse at the end of the run, the browser will be closed too

Ah, good find! You're right we've never really run into this, especially since we discourage using headless for its lack of throttling, but we should update that to throw loudly at this point if we can't create a tab :)

Khady · 2017-12-08T02:35:26Z

Thank you for your help!

Ah, you're having trouble finding the target to use in puppeteer once the page has been loaded, correct? Yeah, we should expand #3864 to communicate the target/page ID as well.

Correct.

Yes, we had a plan for this and haven't gotten around to it since there wasn't an immediate need, but we want to implement audit and gatherer options to pass in dynamic runtime information that can control audit/gatherer behavior separately from the gatherer/audit code itself.

Good news. I can manage to do what I want in the current situation. But it's great to have visibility on the future plans.

You should be able to mark an error with a .fatal property to have LH exit immediately rather than just fail the gatherer.

Awesome. I should have read all of the gatherer-related code and not only some parts.

Nothing is blocking me for now, thanks to your advice. I just rely on a few undocumented pieces of information (driver._connection._pageId, the flags object, ...). I understand puppeteer is pretty young and it's not common (yet?) to connect it with lighthouse. My hope is that this feedback helps identify the necessary bits for future improvements.

paulirish · 2018-01-16T22:47:49Z

We will sort this out in the next 2 quarters. Thanks!

unindented · 2018-03-20T19:50:59Z

I'm also interested in the request interception side of things.

We're running Lighthouse as part of our CI/CD pipeline. However, our API endpoints have really erratic behavior, and requests can take anything from 500ms to 2s. That's forcing us to make our TTI checks much laxer than what we'd want.

If we could intercept requests to those endpoints and immediately respond with a fixture, we'd have much more deterministic numbers, and we could tighten our TTI checks.
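A sketch of that idea with Puppeteer's request interception, assuming the page you intercept on is the page Lighthouse actually audits (which is exactly the part this issue is about). `stubApiResponses` and the shape of `fixtures` (URL prefix mapped to a JSON body) are hypothetical:

```javascript
// Answer requests to flaky API endpoints with fixtures so timing metrics
// depend less on backend latency. `page` is a Puppeteer Page.
async function stubApiResponses(page, fixtures) {
  // With interception on, every request must be continued, responded to,
  // or aborted, otherwise it hangs.
  await page.setRequestInterception(true);
  page.on('request', request => {
    const url = request.url();
    const prefix = Object.keys(fixtures).find(p => url.startsWith(p));
    if (prefix === undefined) {
      request.continue(); // not a stubbed endpoint: let it through
      return;
    }
    request.respond({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify(fixtures[prefix]),
    });
  });
}
```

Matching on URL prefixes keeps query strings (pagination, filters) served by the same fixture; a stricter matcher may be preferable if responses differ per query.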

brendankenny · 2018-08-10T18:43:39Z

See #5472 for another use case

niieani · 2018-12-14T10:39:30Z

I'm running Chrome with chrome-launcher, then connecting to it with puppeteer. The only thing I'm setting up is this:

    // add HTTP BasicAuth credentials on new tab creation:
    browser.on('targetcreated', async (target) => {
      const page = await target.page()
      if (page) await page.authenticate(basicAuth)
    })

It works, but once Puppeteer is connected, Lighthouse (and Chrome's devtools, for that matter) stops gathering the size of requests.

Does anybody know why, or how to mitigate this (size 0 everywhere)?

iamEAP · 2020-01-13T20:07:07Z

I'm going to add my perspective because I did not see it mentioned after reading through the thread:

My use-case: I'm looking to run multiple concurrent Lighthouse audits using a single instance of Chrome using a new Incognito Browser Context for each audit so that no data storage/state is shared between concurrent audits.

Ideally each audit could be preceded by a series of actions (e.g. logging in), and state would be maintained per incognito context (tab).

However, following the Puppeteer recipes in this repo, it seems like Lighthouse always opens the URL in the default (shared) browser context.

connorjclark · 2020-01-13T20:14:36Z

Have you seen this? https://github.com/GoogleChrome/lighthouse/blob/master/docs/recipes/auth/example-lh-auth.js

Puppeteer, by default, uses a fresh Chrome profile, so if you launch it like the above script does you shouldn't see any state persist.

multiple concurrent Lighthouse audits

FYI we recommend against this. If you rely on the performance category, the results will be skewed. Even if you don't, you risk protocol timeouts by asking Chrome to do too much at once.

iamEAP · 2020-01-13T23:48:11Z

Thanks @connorjclark.

Yep, I've seen that recipe. My specific problem with the fresh Chrome profile on launch approach is that I'm exposing Puppeteer as a micro-service (so it only launches when the service re/starts). Multiple clients can hit this service, but their requests are sandboxed from each other via Incognito contexts; I was hoping to borrow the same sandboxing approach for Lighthouse performance audits as well.

FYI we recommend against this. If you rely on the performance category, the results will be skewed.

This is good to know (I'm looking at just the performance category for right now). Is this something that can be mitigated by throwing additional resources at Chrome (e.g. CPU cores / Memory)? Any documentation you have on this would be very much appreciated.

I'd also be curious to hear how Google approaches scaling the PageSpeed Insights API, given the recommendation against concurrent audits in a single Chrome instance.

connorjclark · 2020-01-13T23:55:26Z

so it only launches when the service re/starts

I'd suggest this is a micro-optimization. Also, LH directs the browser to clear the cache on each run (by default), so you're also at risk of runs stomping on the cache of other runs.

I'd also be curious to hear how Google approaches scaling the PageSpeed Insights API, given the recommendation against concurrent audits in a single Chrome instance.

We have many machines, a load balancer, and queue things up in the worst case.

You could probably get away with a few concurrent runs, but I'd measure to be safe. 3 is probably fine on any machine that isn't network-constrained. In any case, you should certainly queue up LH runs if you get more than 3 requests per minute.
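One way to do that queueing inside a single service is a small worker pool. `runQueued` and `runAudit` are hypothetical names (`runAudit` stands for whatever async function wraps your Lighthouse call), and the default limit of 3 simply mirrors the number suggested above:

```javascript
// Run `runAudit` over `urls` with at most `limit` runs in flight at once.
// Results are returned in the same order as `urls`.
async function runQueued(urls, runAudit, limit = 3) {
  const results = new Array(urls.length);
  let next = 0;

  // Each worker keeps pulling the next URL until none are left. Claiming
  // the index happens synchronously, so two workers never take the same one.
  async function worker() {
    while (next < urls.length) {
      const index = next++;
      results[index] = await runAudit(urls[index]);
    }
  }

  const workerCount = Math.min(limit, urls.length);
  await Promise.all(Array.from({length: workerCount}, () => worker()));
  return results;
}
```

Across multiple service instances you would still need a shared queue; this only caps concurrency within one process.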

patrickhulce · 2020-01-14T00:14:47Z

In addition to connor's advice, if you're going to run LH concurrently (again we recommend you don't or your performance variability will be quite high), run each Lighthouse in its own child process and dedicate at least 2 cores to its execution.

Scaling horizontally has been shown to yield more consistent results than scaling vertically, i.e. using 8 smaller 2-core machines as opposed to running 8 runs on a 16-core machine. Just avoid any burstable instance types.
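The "at least 2 cores per run" guideline reduces to a simple sizing rule. `maxConcurrentRuns` is a hypothetical helper; on a 16-core machine it yields 8 runs, although per the comment above, 8 separate 2-core machines remain the more consistent option:

```javascript
const os = require('os');

// How many Lighthouse child processes one machine should run at once if
// each gets `coresPerRun` dedicated cores. Always allows at least one run,
// even on a machine with fewer cores than `coresPerRun`.
function maxConcurrentRuns(cpuCount = os.cpus().length, coresPerRun = 2) {
  return Math.max(1, Math.floor(cpuCount / coresPerRun));
}
```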

niieani · 2020-01-14T14:02:44Z

@iamEAP In our case, we run Lighthouse in a serverless compute service (e.g. AWS Lambda). We do this to run 60 tests simultaneously and then extract median performance data to see whether a given code change causes a performance regression (or is an improvement). This makes it easy to run LH concurrently (and scalably) and you get meaningful results as soon as the longest run completes.
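Extracting the median from many runs is straightforward. This sketch assumes each run produced a Lighthouse result object (LHR) exposing metric values as `audits[id].numericValue`, as recent Lighthouse versions do; `medianMetric` is a hypothetical helper and `'interactive'` (TTI) is just an example audit id:

```javascript
// Median of a list of numbers (mean of the two middle values for even counts).
function median(values) {
  if (values.length === 0) throw new Error('median of an empty list');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median of one metric across many Lighthouse runs.
function medianMetric(lhrs, auditId = 'interactive') {
  return median(lhrs.map(lhr => lhr.audits[auditId].numericValue));
}
```

Comparing the median of a baseline batch against the median of a candidate batch is less sensitive to a single slow run than comparing means.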
You also get a fresh run of Chrome with every hit of the API, so you won't hit any of the issues you mentioned.

Siilwyn · 2020-10-21T13:22:54Z

Sorry if this was already obvious, but as far as I understand, most use cases above could be solved by adding a way to connect to a chrome instance by providing a browser WebSocket url, right? Just like puppeteer.connect accepts a browserWSEndpoint.
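For reference, the browser-level WebSocket URL can be read from Chrome's remote-debugging HTTP endpoint `/json/version` (its `webSocketDebuggerUrl` field), which is the kind of value `puppeteer.connect({browserWSEndpoint})` accepts. This is a sketch: `getBrowserWSEndpoint` is a hypothetical helper, and `fetchImpl` defaults to the global `fetch` (Node 18+) but can be swapped out for testing:

```javascript
// Look up the browser WebSocket endpoint of a Chrome started with
// --remote-debugging-port=<port>.
async function getBrowserWSEndpoint(port, fetchImpl = fetch) {
  const response = await fetchImpl(`http://127.0.0.1:${port}/json/version`);
  if (!response.ok) {
    throw new Error(`DevTools endpoint returned HTTP ${response.status}`);
  }
  const info = await response.json();
  return info.webSocketDebuggerUrl;
}
```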

patrickhulce · 2020-10-21T15:30:13Z

Yes that is in fact the plan @Siilwyn but not the hardest part :) The full story is in #11313 and the associated links therein if you're interested in following along 👍

praveenralla · 2021-09-16T23:23:58Z

Can I please get an example of the client calling lighthouse and passing the browser in the parameter userConnect? I am trying to call lighthouse on a url after navigating in puppeteer, and a new tab gets launched every time lighthouse is called. I don't want a new tab to be launched; I want the existing tab to be reused.

Thanks in advance!

puppeteer version 7.6.0
lighthouse version 7.6.0

connorjclark · 2021-09-16T23:31:52Z

We have puppeteer examples here: https://github.com/GoogleChrome/lighthouse/blob/master/docs/puppeteer.md

Khady · 2021-09-17T12:31:09Z

From the look of the doc it only partially solves the original issue. For example it doesn't offer a way to force lighthouse to use a specific tab.

praveenralla · 2021-09-17T13:02:00Z

That is true. I went through all these examples but couldn't find a way to run a lighthouse analysis on an existing tab opened in puppeteer. I can partially achieve the result using lighthouse.snapshot as in the code below, but the resulting report is in JSON format rather than the HTML format I want.

lighthouse/lighthouse-core/test/fraggle-rock/api-test-pptr.js

Lines 115 to 118 in 6b95928

    it('should compute accessibility results on the page as-is', async () => {
      await setupTestPage();

      const result = await lighthouse.snapshot({page});

It would be good to know whether running in an existing tab, instead of opening a new one, will ship in an official lighthouse release.

Thanks.

patrickhulce · 2021-09-17T13:55:37Z

@Khady forcing Lighthouse on a particular tab will be solved by #11313. The issues @praveenralla ran into are unrelated to whether it can be used on a particular tab or not (just about consuming the output).

Khady · 2021-09-17T14:13:58Z

@patrickhulce I was actually reacting because this issue (which I opened and might be different from the ones of @praveenralla) is being closed without being solved. But thanks for the link to 11313! I'll follow the progress there

niieani · 2021-09-29T06:27:44Z

@Khady the issue is indeed solved by Fraggle Rock, as @patrickhulce pointed out.

The new API is solid enough that we've started using Fraggle Rock in production.
Beware, though, of small breaking changes in FR configuration that are still happening between versions.

Example usage:

import {navigation} from 'lighthouse/lighthouse-core/fraggle-rock/api'
import puppeteer from 'puppeteer'

const browser = await puppeteer.launch()
const [thisIsYourCustomTab] = await browser.pages()
const result = await navigation({
  page: thisIsYourCustomTab,
  url: 'https://google.com',

  config: {
    // add your navigation config / settings here
  }
})

vandana-k13 · 2022-01-12T10:43:55Z

Will lighthouse support SPAs (single-page apps)?

adamraine · 2022-01-12T16:09:36Z

Will lighthouse support SPAs (single-page apps)?

It is a planned feature as part of #11313

samarth-gupta-traceable · 2022-02-10T13:16:21Z

I am using lighthouse node module along with puppeteer to record perf metrics of pages behind auth.

I would also like to achieve the following:

- Figure out if any error occurred in browser console while page was being loaded
- Get handle to page once page is loaded, so that I can look for or assert of certain elements being present.

Not sure if the above can be achieved purely with the lighthouse node module or only with the assistance of puppeteer.
I tried using

I am new to both lighthouse & puppeteer so any pointers will be helpful

cc @adamraine @patrickhulce @Khady

adamraine · 2022-02-10T16:17:23Z

Figure out if any error occurred in browser console while page was being loaded

Lighthouse has an audit under "Best practices" that checks for errors in the console

Get handle to page once page is loaded, so that I can look for or assert of certain elements being present.

Puppeteer will close the page after you call page.close() or browser.close(), but you don't need to call those methods if you want to inspect the page after Lighthouse runs. Additionally, you can try using page.$ and page.$$ to query for elements from puppeteer.

I forgot Lighthouse will create its own page and close it automatically when running from the node module. The easiest way to check for elements is to open a separate page and test for the elements using Puppeteer without Lighthouse:

const page = await browser.newPage();
await page.goto('https://example.com');
const check1 = await page.$('button.class');
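For the first question, if you would rather collect the errors yourself than read them from the Best Practices audit, a sketch using Puppeteer's `console` and `pageerror` page events could look like this (`collectPageErrors` is a hypothetical helper; `page` is a Puppeteer Page):

```javascript
// Collect error-level console messages and uncaught page exceptions.
// Returns an array that keeps filling as the page emits errors.
function collectPageErrors(page) {
  const errors = [];
  // console.error(...) calls and other error-level console messages
  page.on('console', message => {
    if (message.type() === 'error') errors.push(message.text());
  });
  // exceptions thrown by page scripts that nothing caught
  page.on('pageerror', error => errors.push(error.message));
  return errors;
}
```

Attach it before calling `page.goto(...)` so that errors emitted during the load are captured.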

samarth-gupta-traceable · 2022-02-11T09:50:31Z

Figure out if any error occurred in browser console while page was being loaded

Lighthouse has an audit under "Best practices" that checks for errors in the console

Get handle to page once page is loaded, so that I can look for or assert of certain elements being present.

Puppeteer will close the page after you call page.close() or browser.close(), but you don't need to call those methods if you want to inspect the page after Lighthouse runs. Additionally, you can try using page.$ and page.$$ to query for elements from puppeteer.

I forgot Lighthouse will create its own page and close it automatically when running from the node module. The easiest way to check for elements is to open a separate page and test for the elements using Puppeteer without Lighthouse:
const page = await browser.newPage();
await page.goto('https://example.com');
const check1 = await page.$('button.class');

Thanks @adamraine!! Will try the above out.

patrickhulce added needs-priority feature labels Nov 18, 2017

Khady mentioned this issue Nov 20, 2017

core(connection): add support to use existing tab #3857

Closed

paulirish mentioned this issue Nov 21, 2017

core(driver): add driver.wsEndpoint() #3864

Merged

Khady mentioned this issue Nov 21, 2017

Method to uniquely identify Pages/Targets? puppeteer/puppeteer#1428

Closed

wardpeet mentioned this issue Nov 21, 2017

Create more useful examples/guides #3877

Closed

paulirish added P1.5 and removed needs-priority labels Jan 16, 2018

paulirish changed the title ~~Use lighthouse from javascript on a tab already opened~~ Lighthouse/Puppeteer integration Jan 16, 2018

paulirish mentioned this issue Jan 16, 2018

add ability to accept geolocation permissions #3836

Open

patrickhulce mentioned this issue Jan 18, 2018

Navigation of the web application #4283

Closed

patrickhulce mentioned this issue Jan 30, 2018

Doing Network throttling programmatically? #4376

Closed

patrickhulce mentioned this issue Feb 15, 2018

custom configuration difficulties #4526

Closed

patrickhulce mentioned this issue Feb 26, 2018

Unable to find chrome session started by Protractor #4606

Closed

ebidel mentioned this issue Feb 27, 2018

Puppeteer and Lighthouse for page flow puppeteer/puppeteer#2105

Closed

patrickhulce mentioned this issue Jun 12, 2018

How to generate a Lighthouse Report on dynamic URL? #5472

Closed

justinribeiro mentioned this issue Sep 17, 2018

How to test page flow in Lighthouse #6028

Closed

brendankenny mentioned this issue Sep 17, 2018

RFC: Allow passing in of a existing page debugger URL to lighthouse #6038

Closed

patrickhulce mentioned this issue Nov 14, 2018

Lighthouse Audits on Multiple SPA pages with only 1 unique URL #6555

Closed

patrickhulce mentioned this issue Oct 3, 2019

Make page refresh optional, depending on config #1769

Closed

iamEAP mentioned this issue Jan 14, 2020

[WIP] Initial Lighthouse Performance Checks against current page run-crank/cog-web#56

Merged

peterwilliams-atl mentioned this issue Apr 29, 2020

Run lighthouse report at each step in an Angular SPA #10435

Closed

bebraw mentioned this issue May 5, 2020

--puppeteer-script - lhci fails after logging in GoogleChrome/lighthouse-ci#300

Closed

patrickhulce mentioned this issue Jul 3, 2020

Lighthouse audit from Incognito mode #11055

Closed

patrickhulce mentioned this issue Jul 17, 2020

Run on page with interactions / flows GoogleChrome/lighthouse-ci#383

Open

patrickhulce mentioned this issue Aug 25, 2020

Flow Support (Fraggle Rock) #11313

Closed

mikestead mentioned this issue Oct 17, 2020

Feature Request: Run concurrently mikestead/lighthouse-batch#49

Closed

connorjclark closed this as completed Sep 16, 2021

theadityasam mentioned this issue Feb 22, 2023

Analogous to sessionattached puppeteer event in playwright microsoft/playwright#21107

Closed
