
Serverless (function-as-a-service) & lighthouse (NO_SCREENSHOTS, Load timeout) #14955

Open · doteric opened this issue Apr 7, 2023 · 15 comments

doteric commented Apr 7, 2023

Hello 👋
First of all thank you for maintaining this awesome open source tool!

I'm writing this issue to gather the problems I've stumbled upon so far using Lighthouse on AWS Lambda.

The setup I'm using is:

Problems I've faced running Lighthouse on AWS Lambda:

1. Load timeout
This happens every time, for every test. It looks like one (or more) of the wait conditions in https://github.com/GoogleChrome/lighthouse/blob/main/core/gather/driver/wait-for-condition.js#L405 never passes in a serverless environment, so the timeout (45s by default) is always reached.
I haven't investigated which one exactly is causing the issue yet, but maybe you have already? If not, I could find some time next week to look into it.

Read: #14955 (comment)

2. NO_SCREENSHOTS
This problem happens very rarely, and it causes the performance score (via Speed Index) to not be calculated. It's very hard to pinpoint the exact reason, as it seems very random. It might be related to the particular Chrome instance: I've noticed that on one instance no test would contain this error, while on a different one every test would. However, I cannot confirm this theory. If you could point me to the code that actually captures the screenshots and what could potentially fail in that process, maybe I could investigate that too.

I am fully aware of "Avoid function-as-a-service infrastructure (Lambda, GCF, etc)" in https://github.com/GoogleChrome/lighthouse/blob/main/docs/variability.md#run-on-adequate-hardware, but I would like to know the reasoning behind it and whether it would be possible to actually support serverless, since it's used very often. I'm guessing someone has already done some investigation around this, so I'd like to avoid duplicating that work, hear the reasoning behind not supporting serverless infrastructure, and learn whether there are any possible fixes for the issues listed above. If you lack time to investigate particular parts but think something should be possible, let us (the community) know so that somebody can help out.
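For illustration, a minimal sketch of this kind of Lambda setup; the @sparticuz/chromium binary package and the exact wiring below are my assumptions for the example, not necessarily the setup used here:

```js
import chromium from '@sparticuz/chromium';
import * as chromeLauncher from 'chrome-launcher';
import lighthouse from 'lighthouse';

export const handler = async (event) => {
  // Launch the Lambda-compatible Chromium build shipped with the package.
  const chrome = await chromeLauncher.launch({
    chromePath: await chromium.executablePath(),
    chromeFlags: [...chromium.args, '--headless=new'],
  });
  try {
    // Point Lighthouse at the launched Chrome over its DevTools protocol port.
    const result = await lighthouse(event.url, {
      port: chrome.port,
      output: 'json',
      logLevel: 'error',
    });
    return {performanceScore: result.lhr.categories.performance.score};
  } finally {
    await chrome.kill();
  }
};
```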

I've also noticed the following issues, but none of them have provided a perfect solution to the above problems:

Appreciate any replies and help 💪


doteric commented Apr 11, 2023

@adamraine, would you by any chance have some time to check this issue? 🙏 Would appreciate it 🙇


doteric commented Apr 11, 2023

As for point 1, I think I've found the main reason:

```js
// From core/gather/driver/wait-for-condition.js: waits for "critical"
// network requests to go idle before the page is considered loaded.
const resolveOnCriticalNetworkIdle = waitForNetworkIdle(session, networkMonitor, {
  networkQuietThresholdMs,
  busyEvent: 'network-critical-busy',
  idleEvent: 'network-critical-idle',
  isIdle: recorder => recorder.isCriticalIdle(),
});
```

inside https://github.com/GoogleChrome/lighthouse/blob/main/core/gather/driver/wait-for-condition.js#L440
Without it, the timeout seems not to happen. Another thing I noticed is that this does not happen on most websites, only on particular ones (those that I want to test, for example), which made me realize this is not a strictly serverless-related problem. Therefore I'm removing point 1 from this issue, as it should be treated separately...
As for the problem itself, it seems that an auth check that happens periodically is preventing the test from finishing successfully, so I will just try to block that request at the LH level and see if that works (sketch below).
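A minimal sketch of that blocking approach, reusing the `chrome` instance from the earlier sketch; the URL pattern is a made-up placeholder, not the real endpoint:

```js
// blockedUrlPatterns tells Lighthouse to block matching requests, so a
// periodic auth check can't keep the "critical" network busy forever.
// '*auth-check*' is a hypothetical placeholder for the real endpoint.
const result = await lighthouse('https://example.com', {
  port: chrome.port,
  blockedUrlPatterns: ['*auth-check*'],
});
```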


So now only point 2 remains (as far as the serverless problem goes), which I think is more important overall.
I haven't found a good starting point for investigating that yet.
I have a question, though: where and how are the screenshots kept during the test? Maybe they're put in some path unsupported by Lambda, and even though writing the files works, they get cleaned up almost instantly 🤔 (just a guess/assumption, I haven't investigated this).
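As far as I understand the trace format (not confirmed in this thread), the in-flight screenshots are not written to disk at all; they travel as 'Screenshot' events inside the Chrome trace. So one way to check a failed run is to inspect a saved trace directly; the path below is a hypothetical -G output location:

```js
import {readFileSync} from 'node:fs';

// Count the filmstrip frames in a saved trace; zero frames would explain
// a NO_SCREENSHOTS error. The file path is a hypothetical example.
const trace = JSON.parse(readFileSync('./latest-run/defaultPass.trace.json', 'utf8'));
const frames = trace.traceEvents.filter((e) => e.name === 'Screenshot');
console.log(`filmstrip frames: ${frames.length}`);
```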

@doteric doteric changed the title Serverless (function-as-a-service) & ligthhouse (NO_SCREENSHOTS, Load timeout) Serverless (function-as-a-service) & lighthouse (NO_SCREENSHOTS, Load timeout) Apr 11, 2023
paulirish commented Apr 11, 2023

Are you using headless=new?

@doteric
Copy link
Contributor Author

doteric commented Apr 11, 2023

Hey @paulirish, thank you for the reply 💪
I've tried both --headless='new' and --headless, and the behavior seems pretty similar, though the odds might be slightly different (possibly due to not enough tests run); it seems pretty random whether a test will be good or bad.


doteric commented Apr 13, 2023

@paulirish Would you maybe be able to point me to the place where the screenshots are gathered? I guess it's in the gatherers, but I couldn't find how it's done exactly :/ Maybe I could try to debug it.


doteric commented Apr 18, 2023

Bump on this topic.
@paulirish @connorjclark @adamraine @brendankenny
Really sorry to bother you, but could any of you provide some more details on what could be happening, and has this been investigated before? If not, then with some extra details I could try to investigate this myself. I'm guessing this could be an issue in Chrome itself, with the DevTools protocol not returning all the needed artifacts?
Appreciate it 🙇


connorjclark commented Apr 18, 2023

1. re: timeout. There is the --max-wait-for-load option, which defaults to 45000 (45s). You could set it higher for machines with variable load (see the sketch after this list).
2. re: NO_SCREENSHOTS. Maybe you need xvfb. See what we do in GHCI:

```yaml
- run: sudo apt-get install xvfb
- name: Run smoke tests
  run: |
    xvfb-run --auto-servernum yarn c8 yarn smoke --debug -j=2 --retries=2 --shard=${{ matrix.smoke-test-shard }}/$SHARD_TOTAL
```
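For reference, the same knob through the Node API, reusing the chrome/lighthouse wiring from the earlier Lambda sketch; the value is illustrative:

```js
// maxWaitForLoad is the programmatic equivalent of --max-wait-for-load.
// Raise it above the 45s default for slow or variable environments.
const result = await lighthouse('https://example.com', {
  port: chrome.port,
  maxWaitForLoad: 120_000, // ms
});
```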


doteric commented Apr 19, 2023

Thanks @connorjclark for the reply 🙇

1. I've already managed to handle this, hence the strikethrough :D
2. That would be very interesting, as it doesn't always fail, only sometimes (roughly 50/50). I will try to look into whether some Lambda containers happen to have something extra installed that others don't, but that would be very weird... What do you think? I'll also try adding xvfb and check whether that helps at all.
Please also keep in mind that the final screenshot (full-page screenshot) is always created fine; it's just the screenshots during the loading process that seem to be missing 🤷‍♂️


doteric commented May 17, 2023

I've recently started working with Lighthouse user flows, and I noticed that some LH navigation tests work fine while some result in the exact same NO_SCREENSHOTS error within the exact same user flow, which means the same browser and all the same settings, but still something is wrong. I initially thought it only happens on the first run and is always fine on subsequent runs, but then I got a result where the 1st attempt was fine while the 2nd and 3rd errored with NO_SCREENSHOTS, and then the 4th and 5th were fine.
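Roughly the shape of such a flow, as a sketch with placeholder URLs rather than the actual pages under test:

```js
import puppeteer from 'puppeteer';
import {startFlow} from 'lighthouse';

// Each navigate() produces one LH navigation report; all steps share the
// same browser instance, which is the scenario described above.
const browser = await puppeteer.launch({headless: 'new'});
const page = await browser.newPage();
const flow = await startFlow(page);

await flow.navigate('https://example.com/step-1'); // placeholder URLs
await flow.navigate('https://example.com/step-2');

console.log(await flow.generateReport()); // combined HTML report
await browser.close();
```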

Examples:
[two screenshots of flow results attached, showing a mix of passing steps and NO_SCREENSHOTS failures]

FYI. @connorjclark / @paulirish
🙇‍♂️


connorjclark commented Jan 23, 2024

Could you extract the traces / LH artifacts from these bad runs and upload them here? We don't support running LH on Lambda, so I can't promise any resolution, but I can take a look at the trace/artifacts and see if there is anything obviously wrong.


doteric commented Jan 25, 2024

Thank you @connorjclark for replying 🙇
Sure, I can get some failed-run artifacts for you. Just let me know: do you mean the RunnerResult.artifacts object as JSON, to be precise, or something else? Also, can it contain any sensitive information apart from what's on the actual website, i.e. can I post it here publicly? I don't have the time to look through it.
Thank you for the help btw 💪


doteric commented Feb 26, 2024

@connorjclark ping 🙏 if you have some time.


connorjclark commented Feb 26, 2024

> do you mean the RunnerResult.artifacts object as JSON, to be precise, or something else?

Yes, but it would be better as a zip of latest-run, the folder of artifacts that is generated when you use the -G flag.

> Also, can it contain any sensitive information apart from what's on the actual website?

Treat it like you're giving someone full access to anything the browser DevTools can show you. In general, this is not an issue.


doteric commented Feb 27, 2024

@connorjclark Big thanks for the reply

Below is a JSON of the artifacts:
example-fail-artifacts.json

Whenever I have time, I will also do a run in gather mode (-G) as you described, to produce the zip of latest-run. It will require some fiddling with the AWS Lambda, so it's not as straightforward, but hopefully I'll have time to do it this week.
Appreciate it btw 🙇


doteric commented May 22, 2024

Hello @connorjclark ,
Sorry for taking so long, but I kind of forgot about this and never had a longer moment to get back to it. However, today I decided to return to this topic and grab the artifacts you asked for.

Artifacts of the failed run in zip:
latest-run.zip

Hopefully that will help you identify the problem.

Cheers and big thanks again 🙇
