Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runTimeError exists when there are NO_ERRORs #6336

Closed
ebidel opened this issue Oct 18, 2018 · 17 comments
Closed

runTimeError exists when there are NO_ERRORs #6336

ebidel opened this issue Oct 18, 2018 · 17 comments
Assignees
Labels

Comments

@ebidel
Copy link
Contributor

ebidel commented Oct 18, 2018

Ran into this working on a web server that returns LH results. It's unexpected that runTimeErrors exists in the LHR if there were no errors! It also requires users to check for a "NO_ERROR" string code which overcomplicates matters.

    if (lhr.runtimeError && lhr.runtimeError.code !== 'NO_ERROR') {
      throw new Error(`${lhr.runtimeError.code} ${lhr.runtimeError.message}`);
    }

screen shot 2018-10-17 at 10 55 55 pm

Environment Information

  • Affected Channels: all
  • Lighthouse version: 3.2.1
  • Node.js version: 10+
@ebidel ebidel changed the title runTimeErrors exists when there are NO_ERRORs runTimeError exists when there are NO_ERRORs Oct 18, 2018
@patrickhulce
Copy link
Collaborator

I'm missing rationale for this too.

In #6014, @brendankenny said

I'd prefer to continue throwing exceptions for these cases and then checking outside core in runLighthouseForLR that the thrown error has lhrRuntimeError set on it. Most of our clients just want a thrown error in those cases, not a pseudo-LHR, and dealing with a pseudo-LHR just makes tracking the LHR type and what's on it messier when we really just want an exception floated up.

Which makes a lot of sense to me too. Why did we not go that route? :)

@ebidel
Copy link
Contributor Author

ebidel commented Oct 18, 2018 via email

@exterkamp
Copy link
Member

This has implications for proto enums as well. @paulirish

@brendankenny
Copy link
Member

#6330 is also part of this

@paulirish
Copy link
Member

runtimeError for failed document requests

@ebidel currently in the case of a 500 or a DNS resolving error for the page being tested, we do return an LHR with a populated runtimeError object. So you're going to have to check this prop before considering the rest of the LHR valid enough to read. (But.. more on NO_ERROR below)

On throwing and pseudo-LHRs

I'm not sold that throwing to our node module consumers is useful, but we can leave that be. I'll live with our current approach.

However, for CLI users (and I'm mostly considering CLI use at scale here), I don't think communicating over exit code is tremendously effective. In order to convey any detail about the error we'd need to log something, but that now means the user has to parse stderr, which we've been logging status to this whole time. :/

So I'd argue that CLI users would be better served with a psuedo-LHR with the runtimeError object populated.

NO_ERROR

Sounds like nobody likes lhr.runtimeError being set if there was no error. I can understand that. While it's more idomatic in JS land to leave it off, in the proto, we do need the field defined and initialized with NO_ERROR. However, we think we can populate it after going JSON->Proto. So that seems like the rough plan here.

API flexibility

Last, we need to confirm what we can do on the PSI API side of things. We don't have unlimited flexibility there, but after discussing it, @exterkamp thinks we can probably handle this well. But whatever the plan is, it needs to be compatible there, because those LHRs will be consumed more than LHRs from the CLI. ;)

@patrickhulce
Copy link
Collaborator

I'm not sold that throwing to our node module consumers is useful

😮 I am so curious why, but I'll table that for now too :)

Thanks for the context!

I don't think communicating over exit code is tremendously effective

I can understand getting more information about the error is more difficult in the CLI case, but I'm curious who those CLI users are where getting the error string easily is more important than detecting that an error happened easily.

I would expect that the only actionable information it gives anyone not on this team is that they need to re-run LH. From bash script angle, being able to tell LH failed just by exit code is a huge win over parsing JSON and checking if a specific property is equal to a particular value to figure out there was a failure. This also assumes they're not using the CLI to generate HTML output in which case determining there was an error seems impossibly difficult. Are we at least considering doing both? Populating a JSON runtimeError for those who need more detail and still exiting with 1 + basic logs to stderr?

in the proto, we do need the field defined and initialized with NO_ERROR

I should've known proto was to blame :) does this mean it's the default and won't be set in the API response? I forget where we landed on if our proto-version does that.

Last, we need to confirm what we can do on the PSI API side of things.

I probably missed this, but do our error responses and the LHR have to be same proto? In my experience, it's very common for error responses with a non-200 status to have very different schemas.

@ebidel
Copy link
Contributor Author

ebidel commented Oct 19, 2018

I probably missed this, but do our error responses and the LHR have to be same proto? In my experience, it's very common for error responses with a non-200 status to have very different schemas.

In fact, I think that's what a lot of Google service APIs do. The return object is something like {errors: ...} instead of the normal object you get back with an embedded errors property you have to check. IOW, you get the response status code and response body tell similar stories (there was a failure).

@paulirish
Copy link
Member

A few updates:

  1. The proto story looks okay
  2. Yes you're right about Google service APIs. If we return a real runtimeError, the API will return a 500 with a lil object. No LHR.

@paulirish
Copy link
Member

paulirish commented Oct 22, 2018

Are we at least considering doing both? Populating a JSON runtimeError for those who need more detail and still exiting with 1 + basic logs to stderr?

Yes we'd def still set an exit code correctly.


Current status is...

  • Node module has a few cases where it throws. Any clients who don't like that have to catch on their own.
  • lhr.runtimeError object should only be created if there's an error. If there's not, then that prop will be undefined.
  • API will not return an empty or NO_ERROR runtimeError object
  • API will respond with 500 in case of a runtimeError, though the response will be some other JSON, but not the LHR.
  • CLI should return non-0 exit codet if runtimeError is populated
  • CLI will set an exit code for if it's catching an exception.
  • In the case mentioned immediately above, there's no LHR, so no lhr.runtimeError. Perhaps there should be?

I believe this reflects the consensus. AFAICT, node module + API behavior are settled, but we have some unresolved questions around CLI behavior.

@exterkamp
Copy link
Member

exterkamp commented Dec 3, 2018

I will try to organize this a little bit so that we don't get out of hand with #6671.

Error Definitions

It seems like there are two main kinds of Lighthouse errors:

  1. "Audit Errors" which are small localized errors that are reported at the audit level.
    • "Gatherer Errors" can cause Audit Errors. Think PageLoadError which is fatal, but presents itself by erroring out all gatherers which leads to most audits having an error.
  2. "Runtime Errors" which are large errors that are catastrophic to the Lighthouse Run.
    • "Fatal Errors" Errors that immediately cause the run to terminate e.g. PROTOCOL_ERROR when getting browser version, i.e. errors we can never recover from and LH will never make sense with one of these errors occurring. For clarity I will only refer to "runtime errors" since they are equivalent to "fatal errors", I think I introduced confusion around this 😨 sorry

Error Actions

As noted in this convo, surfacing these runtime errors can be a problem. This seems like the consensus.

CLI Lightrider node DevTools/Extension
Non 0 Exit Code + save an LHR with runtimeError populated Graceful exit with LHR with runtimeError populated. Throw error + save output with runtimeError populated (either LHR if json or html report if html) Don't worry about it?

PSI

PSI should return an HTTP status code that is consistent with the LHR inside it. Currently it will return a 200 if there is a response from LR, even if that response is that a runtime error has occured. This is not desired and should return 200 if successful (no runtimeErrors or runtimeWarnings) and a 5XX if the run was not successful.

Also PSI shouldn't mirror what LR encountered e.g. if Lightrider gets a 404 PSI should return a 5XX not a 404. This does make me want a better way to show that there was an underlying status code/details to the error. After #6457 the error codes will have some more explicit status codes appended to them, but it might be prudent to add a field to Lighthouse_result.runtime_error something like underlying_status_code so that it is simple to parse what the cause was exactly if it was a page load problem, or possibly error_details. Basically something that could be parsed easier for better error reporting/bucketing.

Eng to make this happen

  • Runtime errors should generate the most descriptive LHR possible and return it. Not just throw and error.
    • The CLI wil have to return exit codes based on the runtimeError, not catching thrown errors, but it should also always save an LHR.
    • LR will return any LHR it gets, not construct a little error-msg-only pseudo LHR.
    • PSI should return 5XX on errors w/explicit status codes or protocol methods included in reseponse
  • Refactor PageLoadError into the Runtime Error that it is and stop erroring the gatherers, and just exit with an error like a protocol error during artifact generation.

@patrickhulce
Copy link
Collaborator

Thanks for the clarification @exterkamp this helps quite a bit!! Few Qs

This seems like the consensus.

We're missing node module in the table and it sounds like the result of this is that the node module will not throw on "Runtime errors" anymore? This seems somewhat unfortunate as the first thing all consumers will have to do is the same reason the issue was filed in the first place if (lhr.runtimeError) throw new Error(...).

The CLI, LR, and PSI end-behavior all sound great, but given that an error is just an object, can we just attach whatever additional information to it we need?

Refactor PageLoadError into the Runtime Error that it is

This isn't really the case for every pass, some passes it's OK for PageLoadError to just be a gatherer error, and move on and in fact is why we converted it from a fatal error to a non-fatal error. Is the core of what we're getting at here when the root document of the default pass errors we want the run to stop and produce a runtimeError? Because that seems like a more specific problem than all PageLoadErrors Maybe just better discussed on the actual PR though.

Graceful exit with LHR, save an LHR with runtimeError populated

I think a hurdle for me is the idea that you get an "LHR" when there's a fatal runtimeError. WDYT if we treated it exactly like we want the PSI API to work, you don't get back an LHR but an error response with details. I think the fact that you get an LHR is one of the hurdles of the original issue filing and if the node module instead returned an object like {runtimeError: error, lhr: undefined} instead of {lhr: {runtimeError: error}} I could begin to see the argument for not throwing (though I'd still fight it 😃).

@exterkamp
Copy link
Member

So @patrickhulce response made me look more into the LR errors, and I finally found the root cause (#6739)! So I am going to table more advanced error handling (#6671) for later. But I definitely think that PSI error handling needs to be re-evaluated next, but I'm going to leave LH CLI, node and others for later.

@exterkamp
Copy link
Member

Okay picking this back up.

Current state of affairs:

node output=json PSI-API Lightrider Proto
"runtimeError": {"code": "NO_ERROR", "message": ""}, "runWarnings":[] and no runtimeError 😄 No runtimeError and an empty runWarnings ListValue! 😄

This is the result of the proto definition for Lighthouse Errors using NO_ERROR as the default enum of 0, therefore in proto3 they are left blank since it is the default.

Are we satisfied with this outcome? Do we also want to remove NO_ERRORs from the LHR by default (I think the main reason they were there in the beginning was PSI, which doesn't need them anymore).

Generating a set of, what I'm still going to call emergency-artifacts and a bailout-LHR has come up again in #6802. So that might be related to LR error handling as well.

@patrickhulce
Copy link
Collaborator

I'm not really satisfied with the node outcome right now. Eliminating the "runtimeError": {"code": "NO_ERROR"} was the original point of the issue I believe, so seeing that through in all environments seems reasonable.

I think I still stand by my opinions/suggestions in #6336 (comment), so good to see a few months hasn't changed me too much :)

@exterkamp
Copy link
Member

I thought this would be the answer 😉 I'll work on refactoring the node to have no runtimeError.

@connorjclark
Copy link
Collaborator

@exterkamp follow up pr for exit code (#7358 (comment)) then we can close this issue?

@exterkamp
Copy link
Member

Yup afaict I think this is 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants