Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add audit to check if start_url is cached by SW #2040

Merged

Conversation

wardpeet
Copy link
Collaborator

I couldn't reuse the offline gatherer as it checks the current page from being cached by SW or not for the works-offline audit.

I added a start-url gatherer which lives in a pass other than offline as I need the manifest to work which fails if the current requested page is offline.

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very clean approach I like it!

return navigationRecord ? navigationRecord.statusCode : -1;
});
return options.driver.goOnline(options)
.then(_ => navigationRecord ? navigationRecord.statusCode : -1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 👏 👏

@@ -0,0 +1,45 @@
/**
* @license
* Copyright 2016 Google Inc. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2017 :)

return super.afterPass(options)
.then(manifest => {
if (!manifest) {
return Promise.reject(`Unable to retrieve manifest at ${options.url}`);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it doesn't reject if it couldn't find it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it doesn't reject if it couldn't find it?

I believe this handles the case when the page has no manifest and so Manifest.afterPass returns null. Do we want to say that more explicitly? No web app manifest found on page ${options.url} or something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to check manifest.value to make sure there wasn't a parse error (and reject on manifest.debugString in that case, maybe

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I change the manifest gatherer to reject instead?

Code below how it is now

if (!response.data) {
          if (response.url) {
            throw new Error(`Unable to retrieve manifest at ${response.url}`);
          }

          // If both the data and the url are empty strings, the page had no manifest.
          return null;
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I think it should stay as it is there, because not having a manifest for other audits isn't an error, there just isn't one. Not having a manifest here is also arguably not an error, but I'm fine with treating it as one for now

})
.then(_ => options.driver.goOffline())
.then(_ => options.driver.evaluateAsync(
`fetch('${startUrl}').then(response => response.status)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we decide we're ok with this failing for CORS and https redirect issues?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we decide we're ok with this failing for CORS and https redirect issues?

for now, at least, start_url is required to be same-origin, so CORS shouldn't be an issue but https redirect could definitely be a problem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, at least, start_url is required to be same-origin, so CORS shouldn't be an issue but https redirect could definitely be a problem

actually, I guess whether it was cross domain or http while the start_url was https that would still be cross origin, so manifest.value.start_url would fall back to options.url and have a debugString of 'ERROR: start_url must be same-origin as document'

Copy link
Collaborator Author

@wardpeet wardpeet Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can always check if both urls (start_url & lighthouse url) are the same origin. If they are not I can throw an error as start_url should be same origin as @brendankenny said but don't let the fetch fail I can just throw earlier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good point I was thinking of the LH test URL redirecting to a different domain but we're getting this from the manifest.

aside: should we stringify here in case there are non-unicode escaped 's? or should we throw if they have a wonky start_url in the manifest anyway?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should fail if it's a wonky start_url. The manifest audit should fail as well as I guess it's not a valid start_url?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can always check if both urls (start_url & lighthouse url) are the same origin. If they are not I can throw an error as start_url should be same origin as @brendankenny said but don't let the fetch fail I can just throw earlier.

just to be clear, I don't think this gatherer/audit should check this, since ManifestParser already took care of all that. I would say test the start_url and we should pipe through the debugString if there is one notifying if manifestParser noticed anything weird about their manifest

@@ -0,0 +1,105 @@
/**
* Copyright 2016 Google Inc. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2017

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good. Thanks for taking this on! Many many people are looking forward to this landing :)

]
},
{
"passName": "redirectPass",
"useThrottling": false,
"gatherers": [
"http-redirect",
"html-without-javascript"
"html-without-javascript",
"start-url",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putting it in this pass seems like it could be problematic if this site doesn't do http->https redirect since the base url would then be the http version

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so should I move it to the dbw pass or create a new one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we think it actually could live in the offlinePass. order of all the afterPasses should work!

plus nicer for people that only do PWA stuff. just 2 passes instead of 3.

Copy link
Collaborator Author

@wardpeet wardpeet Apr 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly it can not :( As I need the start url from the manifest and I can't retrieve the manifest unless the page loaded so if I go offline and the page can not be fetched I cannot get the manifest anymore.

Another option is to fail with a debugstring as I could not get the manifest?
For example if you test https://www.chromestatus.com/ offline it won't work as this page is not cached by SW. If you test https://www.chromestatus.com/features directly it will work as I can get the manifest because it's cached by SW

return super.afterPass(options)
.then(manifest => {
if (!manifest) {
return Promise.reject(`Unable to retrieve manifest at ${options.url}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it doesn't reject if it couldn't find it?

I believe this handles the case when the page has no manifest and so Manifest.afterPass returns null. Do we want to say that more explicitly? No web app manifest found on page ${options.url} or something?

afterPass(options) {
let startUrl = options.url;

return super.afterPass(options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is kind of horrifying :) What do you think about moving most of the manifest gatherer's afterPass method to driver.getParsedManifest or something? The only awkward part is maybe passing in options.url, but that's not the worst thing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind moving it, I only need the url if I want to parse the manifest. So an option is to get the raw manifest inside driver and let the gatherer parse it with the lib. Which isn't a big problem as the logic is inside another file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this makes sense to me

return Promise.reject(`Unable to retrieve manifest at ${options.url}`);
}

startUrl = manifest.value.start_url.raw;
Copy link
Member

@brendankenny brendankenny Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we want manifest.value.start_url.value here and then need to reject if there isn't a value.

We may also want to surface the manifest.value.start_url.debugString (e.g. if it's an invalid URL or cross-origin) and warn that Lighthouse is testing the default URL (options.url in this case) instead.

I wonder if the report should show which start_url we tested regardless, actually, since that could be really annoying if you have an error in your start_url and so we use the default instead, or we're calculating the url relative to the document URL differently than the developer expects

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't test the the original url as it's an other audit that takes care of this (offline.js). I should just fail with extra information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't test the the original url as it's an other audit that takes care of this (offline.js). I should just fail with extra information.

I think we actually need to use it.

For all error cases (missing start_url or invalid URL, etc), the web app manifest spec has the value of start_url fall back to the URL of the document instead, so I think that's what we need to do. As long as the manifest exists and is JSON, there will always be a start_url on the parsed result.

ManifestParser takes care of all that (here), so this code doesn't need to handle any of this, it just might want to surface manifest.value.start_url.debugString as a warning if there was one

Copy link
Member

@brendankenny brendankenny Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then need to reject if there isn't a value

I was actually wrong here. As above, as long as there is some manifest made up of some kind of native JSON, there will be a manifest.value.start_url.value with a valid URL on it.

(It'll just be the document URL if there's not a valid start_url in that JSON)

return super.afterPass(options)
.then(manifest => {
if (!manifest) {
return Promise.reject(`Unable to retrieve manifest at ${options.url}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to check manifest.value to make sure there wasn't a parse error (and reject on manifest.debugString in that case, maybe

})
.then(_ => options.driver.goOffline())
.then(_ => options.driver.evaluateAsync(
`fetch('${startUrl}').then(response => response.status)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we decide we're ok with this failing for CORS and https redirect issues?

for now, at least, start_url is required to be same-origin, so CORS shouldn't be an issue but https redirect could definitely be a problem

.then(_ => options.driver.goOffline())
.then(_ => options.driver.evaluateAsync(
`fetch('${startUrl}').then(response => response.status)
.catch(err => -1)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hated keeping the 200/-1 thing from the offline test. Is there more information we could surface here, especially in the case of failure? (definitely ok to punt this to a followup, though)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we could keep the network gatherer alive I could get a network record. If that's the case I can also move this audit back to offline pass. (i'll take this discussing offline with you)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we could keep the network gatherer alive I could get a network record.

as discussed, let's file an issue on this and do it in the future. I think it'll be really useful beyond this gatherer, too

@@ -42,7 +42,8 @@ module.exports = {
{
gatherers: [
'dobetterweb/domstats',
'http-redirect'
'http-redirect',
'start-url',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely need some expectations for the smoke tests as this is going to be a tricky one :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do :)

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice thanks for this! a lot of anxious customers :)

},

{
initialUrl: 'https://www.chromestatus.com/features',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused, why is this being added twice? Maybe the GitHub diff is just misleading

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it twice, the only difference is webapp-install-banner XD
Or maybe a a smoketest is enough?

@@ -675,20 +697,26 @@ class Driver {
});
}

endNetworkCollect() {
endNetworkCollect(opts = {}) {
const pauseAfterLoadMs = (opts.flags && opts.flags.pauseAfterLoad) || PAUSE_AFTER_LOAD;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't love that we have to have this logic twice :/ also seems a little strange that we have to wait for idle again since we could have explicitly timed out trying to wait for network idle the first time, I'm guessing you want to be sure you're not giving up while it's being requested?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also as an aside, the implementation of consistently interactive (#2023) increases the pauseafterload to 5s which would be kinda painful to wait an extra 5s here too

@wardpeet wardpeet force-pushed the feature/1903-start-url-offline branch from 8a56f19 to 085fa2d Compare April 28, 2017 15:09
@paulirish
Copy link
Member

We need to run this new PWA checks including this across a bunch of PWAs to be confident we're happy with this.

@wardpeet wanna run that if we give you like 50 URLs? :)

@wardpeet
Copy link
Collaborator Author

wardpeet commented May 7, 2017

@paulirish sure send me a list! :)

@paulirish
Copy link
Member

@wardpeet just sent you a list

merging this so that we can get more eyes on it (and its results).

Copy link
Member

@paulirish paulirish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything looks in place. this requires more testing so merging a bit earlier than typical so it can get some usage.

@paulirish paulirish merged commit 691157f into GoogleChrome:master May 9, 2017
@wardpeet wardpeet deleted the feature/1903-start-url-offline branch May 11, 2017 18:39
@fchristant
Copy link

I hope this is the right place to offer some feedback on this new check. I'm specifically triggered because yesterday I scored 100% on the PWA check, yet this morning it fails due to this new check. I don't care too much about the score, but the bigger issue is that my PWA no longer classifies as installable.

My SW strategy is very modest and basic. I may extend upon it in the future, but right now my strategy is to simply shows users a dedicated "offline" page when they are disconnected. So naturally, I cache that offline page's content during the install step. My SW simply passes through every network request to the network stack, and only if it fails and it was a HTML request, do I redirect to the cached 'offline' page.

To further satisfy PWA requirements, I'm setting a start_url, to the only value that makes sense in my case: '/'. It does not make sense in my strategy to cache '/', since my SW will never show a cached version of '/'. But I'm caching it anyway, just to see if that passes this check. It doesn't.

Here's my SW code:

https://gist.github.com/fchristant/440ea5077db60fff2a14108ce152ca0b

It seems to me that the error text is misleading in this setup. I do cache the start URL, but the check does not seem to detect it. Does this mean my PWA really does not qualify as installable? Or does it pass and is this a false positive?

@wardpeet
Copy link
Collaborator Author

@fchristant thanks for reaching out! Do you have a live url we can test it on? If not I can get your sw running but a live url is easier to start debugging

@fchristant
Copy link

fchristant commented May 20, 2017

Thanks for the quick reply :) I do have a live URL but it running on a Debian VM from my Windows desktop. The URL will only be accessible as long as my machine is on. Here it is:

https://www.eon.earth/template/empty

As you can see, I'm coding a minimum template here for a new web app. This page pretty much is about testing all the stuff and setup in the HTML header.

I'll be away for the afternoon and return in the evening, yet keeping the machine on. I hope this is enough opportunity for you to have a look. Thanks a lot in advance for any help!

@fchristant
Copy link

I just saw something weird. Not sure if it makes a difference for this issue, but reporting it anyway. This is the valid test URL:

https://www.eon.earth/template/empty

If I run Lighthouse on that URL (from the extension), by the time it is done the URL has become:

https://www.eon.earth/index.php/template/empty

I wonder how that happened? Or whether it matters?

The back-end of this app is PHP CodeIgniter. All URLs go through a front controller which is index.php. However, I have specific URL rewriters in place at Apache to hide this front controller so that the URLs become prettier. I'm just surprised/curious what triggers this strange redirect.

@wardpeet
Copy link
Collaborator Author

It does matter as the url isn't the same anymore and lighthouse loses track of what url to look at. I'll see why the redirect happens

@fchristant
Copy link

fchristant commented May 21, 2017

Just checking in to see if there's any news?

PS: I removed the unwanted redirect on my end. The issue was that the root URL (/) was an invalid HTML page (valid status code and mime type, but not a valid document). As expected for me as my app is under development, yet somehow this throws off Lighthouse.

Without the redirect, the initial issue still stands though.

@fchristant
Copy link

Reporting on a few experiments I did:

Setting my start URL in the manifest to the full absolute path of the root domain (instead of using '/') and matching the exact same path in my cache URL makes no difference. This test is to exclude relative/absolute paths being a factor.

Temporarily setting my 'offline' page to be the root page and responding with that when being offline, makes no difference either.

Being more explicit about named caches makes no difference.

It seems that Lighthouse in this scenario doesn't detect what is in cache at all, no matter what I put in it. Which leaves me wondering...is this strategy fundamentally incompatible with this check, or is the check itself faulty (a false positive)?

It's a highly common recipe that is repeated all over the web in cookbooks, MDN docs, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants