HTML from build sometimes doesn't include content #40

icidasset · 2020-01-06T21:07:26Z

Hey 👋 Thanks for this great project!

I noticed that sometimes when I make a build, the resulting static HTML says Missing content (ie. the content isn't there). Also, sometimes the static HTML content is wrong, instead of missing. For example, sometimes I'm getting content for the index page instead of the blog article.

It appears to happen randomly, also tested it with removing the .cache directory.


dillonkearns · 2020-01-07T16:33:20Z

Hello! Glad you're enjoying the project 😄

I've never seen this behavior, could you try to narrow down how to reproduce it, and give some more context like screenshots, or the files with the missing content, etc. That would be a big help. Thank you!

icidasset · 2020-01-08T17:50:42Z

Sure thing. So for example, the elm-pages blog has the "wrong content" issue as well.

Here's the static html for your https://elm-pages.com/blog/introducing-elm-pages blog post. If you'll look at the title, you'll see it's the wrong one. It's the one of the home page, not the blog post.

If you disable javascript, you can see it renders the home page:

Will look into this a bit more.

PS. When you refresh the elm-pages blog, it says "Missing content" for a second.

icidasset · 2020-01-09T17:25:10Z

Some notes:

Haven't found the determining factor yet and still happens randomly
elm-pages generates a Pages.elm file in both ./gen and ./elm-stuff/elm-pages, is there a reason for that?
All content is present in the generated Pages.elm, but not the resulting HTML.
Could it have something to do with this?

elm-pages/src/Pages/ContentCache.elm

Line 170 in fc6f0e3

-- TODO do I need to handle this case?

dillonkearns · 2020-01-09T17:42:58Z

Hey! Thanks for taking the time to give more context. I have definitely seen the flash of the "Missing content" message, but I don't think I've been able to reproduce the wrong content issue (maybe I'm missing something in the steps for how to reproduce it).

elm-pages generates a Pages.elm file in both ./gen and ./elm-stuff/elm-pages, is there a reason for that?

Yeah, so what that's doing is running your app twice. First it runs as a headless CLI app (Platform.worker is the thing that Elm provides to let you run an app with no view). It copies your elm.json file and tweaks it, and generates a different file for the CLI version. It uses this to generate the manifest.json file based on your configuration, etc. Then it sends that data over to the Webpack process in the JavaScript nodejs stuff... and then after all that, it can generate your assets to run in the actual web version (Browser.application). Hope that helps clarify it a little. But it's totally normal that it generates that file in the elm-stuff folder, and that piece of it only affects the CLI process.

dillonkearns · 2020-01-09T17:44:39Z

All content is present in the generated Pages.elm, but not the resulting HTML.

In the headless CLI version, the content is placed in the generated Pages.elm file.

In the browser version (not the CLI version) of the generated Pages.elm, it doesn't include the content to reduce the bundle size. So all of that content for the body of all the posts is fetched through HTTP requests. But to simplify the CLI process, it's just put straight into the generated file so that we don't need to make HTTP requests to fetch the content. Hope that makes sense!

icidasset · 2020-01-09T18:40:47Z

No worries! The small start-up I work at would love to use elm-pages. This issue is sadly a deal breaker, so I'm doing what I can to help it get fixed ☺️

I don't think I've been able to reproduce the wrong content issue (maybe I'm missing something in the steps for how to reproduce it).

Sorry for not communicating this clearly. But the screenshots I posted above were from your elm-pages website (ie. https://elm-pages.com/blog/introducing-elm-pages).

The issue being here that your blog post ☝️ has the static HTML of the index page. Or, in other words, the rendered HTML file you've put on the server does not contain the actual blog post. Another way you can see this, is when you disable javascript and refresh the page (see screenshot).

dillonkearns · 2020-01-09T19:23:11Z

Hey! Thanks, I appreciate the help in trying to reproduce the issue!

I tried reproducing it with JavaScript and I see what you're talking about. It's pretty strange, I didn't see that when I disabled JavaScript in Brave (it showed the correct page).

Also, if you do a curl command:

curl https://elm-pages.com/blog/introducing-elm-pages/

You can see it has the correct title:

<title>Introducing elm-pages 🚀 - a type-centric static site generator</title>

So I'm not sure what's going on here, it seems like it might be something strange with Firefox? I'm not sure what would cause fetching a simple HTML file with no JavaScript running to have different behavior in different browsers. Any ideas?

dillonkearns · 2020-01-09T19:46:02Z

This is interesting, it looks like Firefox isn't fetching that document from the server with JavaScript turned off. It's getting it from the service worker.

Strange that the service worker is used at all with JS turned off 🤷‍♂

But that might explain why the homepage comes back in that case (and why the behavior is different in other browsers with JS turned off).

What happens is that when the app is offline, the service worker will always give you the shell application. The shell is the same for all pages, and when it's offline you don't want to store the HTML for each individual page because it has redundant information.

So it seems like turning JS off here is creating a sort of artificial condition that wouldn't happen otherwise. The fact that it serves from the service worker when JS is turned off seems like a strange choice, and like it wouldn't match up with the real behavior for a user that has JS turned off.

But if you have something else in mind, could you describe the use case that you're trying to support here?

Thanks for the discussion, it's a good topic! 😄

dillonkearns · 2020-01-09T19:49:35Z

~~Oh, and by the way if you check the box for "Disable HTTP Cache (when toolbox is open)", then it loads the correct page:~~

Scratch that, it doesn't make a difference. But the correct content shows up in Firefox if you do a hard refresh (i.e. Firefox bypasses the service worker when you do a hard refresh).

lukewestby · 2020-01-09T19:50:08Z

If it helps I'm also seeing this at https://sunrisemvmtsb.org in Chrome. I spotted it loading the correct HTML from the server, and then once the client takes over it renders the missing content warning followed by the page content again.

dillonkearns · 2020-01-09T19:55:31Z

If you go to about:serviceworkers in Firefox and click "Unregister" for the elm-pages.com service worker, then it works as expected.

So to summarize, the problem you're encountering is only specifically when both:

JS is disabled, AND
Service workers are enabled

I think this is just an eccentricity of the Firefox dev tools. I can't think of a reason a user would have JS disabled but keep service workers enabled. What do you think?

icidasset · 2020-01-09T20:15:53Z

Oh interesting, I didn't think the service worker would kick in at this point. Damn, caching can get complicated 😅

So, some thoughts here:

I also have issues with JS enabled (most likely, because like you said, service worker runs no matter what)
I got the wrong HTML as well by looking at the page source, I guess the service worker intervenes here as well? The Date header was from a few days ago, so I guess that indicates it's coming from some cache.
Shouldn't the service worker only return the cached content when we're offline? Or is the idea to only hit the server when the content has changed?
I'm guessing that something is wrong with when a cache is invalidated. I should have gotten the proper static html (ie. page source) without having to reset the service worker. Right?
Is the Missing content flash everyone's seeing related to my original issue with having Missing content in the html output in dist?

dillonkearns · 2020-01-09T22:50:16Z

Oh man, yeah caching and service worker logic gets really tricky! I've spent so much time digging into it and thinking about how to handle these different cases!

Using fallback of the shell HTML is one of the things that I've arrived at as a best practice. Here's a brief description of how to do that with workbox (which is what elm-pages uses under the hood): https://developers.google.com/web/tools/workbox/modules/workbox-routing#how_to_register_a_navigation_route

It's worth noting, I am just rendering the home page here, which is just a simple way of doing that. It should really be a blank page, it's just a lot of work to get it to do that and for not too much benefit. It's on the roadmap, but not at the top of the list, to have the fallback page have an empty body.
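
The fallback idea described above can be sketched without Workbox as a plain decision function. This is only an illustration of the intended behaviour as described in this comment, not the elm-pages service worker or the Workbox API; `chooseResponse`, `SHELL_URL`, and the request shape are hypothetical names.

```javascript
// Hypothetical sketch of an app-shell fallback decision; not the Workbox API.
const SHELL_URL = "/index.html"; // assumed path of the precached shell

// request: { url: string, isNavigation: boolean }
// networkAvailable: whether the server can currently be reached
function chooseResponse(request, networkAvailable) {
  if (!request.isNavigation) {
    // Scripts, images and other assets are passed through untouched.
    return { source: "network", url: request.url };
  }
  if (networkAvailable) {
    // Online: the pre-rendered HTML for this exact page comes from the server.
    return { source: "network", url: request.url };
  }
  // Offline: every page gets the same shell, and the client-side app
  // renders the right route once it has booted.
  return { source: "cache", url: SHELL_URL };
}

console.log(chooseResponse({ url: "/blog/introducing-elm-pages/", isNavigation: true }, false));
```

Under this model, seeing the home page for a blog URL means the last branch was taken, which is why the discussion focuses on whether the browser wrongly behaves as if it were offline.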

Regarding Firefox using the service worker when you have JS turned off... it's pretty odd because, 1) the service worker itself is just a JS process, and 2) the service worker is registered by running this JS code:

<script>
  if ("serviceWorker" in navigator) {
    window.addEventListener("load", () => {
      navigator.serviceWorker.register("/service-worker.js");
    });
  } else {
    console.log("No service worker registered.");
  }
</script>

So the only time you would run into that situation is if you first load the page with JS, and then turn off JS through Firefox's dev mode options (I'm guessing turning off JS through other means in Firefox disables service workers? but maybe not?).

Also, yes it is really strange that it gets the fallback URL in those cases... seems like the service worker code itself is behaving strangely and is hitting the code for when the site is not reachable. Seems like a Firefox bug to me, because you're not offline so it shouldn't get the fallback, it should go directly to the network.

For the other points:

I also have issues with JS enabled (most likely, because like you said, service worker runs no matter what)

Could you describe those other points more? The service worker stuff is definitely tricky, and if there's something I'm missing I'd love to hear more details. There is one known issue with the service workers, which is that you can't access files from the static and images folders directly because the service worker fallback catches it and serves up the fallback page. I'm working on a fix for this, let's track that particular issue in another thread. Would love to hear about any other behavior you've been seeing!

I got the wrong HTML as well by looking at the page source, I guess the service worker intervenes here as well? The Date header was from a few days ago, so I guess that indicates it's coming from some cache.

Yeah, if you look at the network tab, it tells you where the data is being served from, and you can see that it says it's coming from the service worker cache in these cases.

Shouldn't the service worker only return the cached content when we're offline? Or is the idea to only hit the server when the content has changed?

Yeah, as I went into in the beginning of the message, I believe this is a Firefox bug. Correct me if I'm missing anything, but this workbox code is what I'm using to tell it to use the fallback routing in the case that it's offline. So I guess the bug is either with that workbox code, or with the Firefox No JS setting.

Is the Missing content flash everyone's seeing related to my original issue with having Missing content in the html output in dist?

This is definitely a bug in the elm-pages code, and one that is a high priority. @lukewestby mentioned that he might take a look at this one, actually! I created this as a placeholder for that conversation, let's track it further here: Missing Content message flashes on initial load #42.

That's a lot of stuff! Feel free to also DM me on Slack or message in the #elm-pages channel there. If there's more to discuss, I'd be happy to do a video call some time, too! Thanks for all the feedback!

icidasset · 2020-01-10T14:29:54Z

Thanks for the detailed description, appreciate it!
Might be worth writing some of this down in the docs, so people know what to expect? 🤷‍♂

Using fallback of the shell HTML is one of the things that I've arrived at as a best practice.

Makes sense yeah 👍

Seems like a Firefox bug to me, because you're not offline so it shouldn't get the fallback, it should go directly to the network.

I did have the problem in Chrome as well. I guess we can confirm this behaviour by looking at the response in the dev tools?

Testing:

I'm looking at https://elm-pages.com/blog/introducing-elm-pages/ in Chrome
Devtools are open and "Disable cache" is activated
When I'm looking at the document in the network tab, the date response header says Wed, 08 Jan 2020 18:09:15 GMT
Status code 200 from service worker

So, I'm online, I shouldn't get the version from the service worker, right? And the date response header should be the current time.
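
The check described in these testing steps can be written down as a small helper. This is a rough sketch; `responseAgeMs` is a hypothetical name, and the "now" value below is simply this comment's timestamp, used for illustration.

```javascript
// Returns how old a response is, based on its Date response header.
function responseAgeMs(dateHeader, nowMs) {
  const sentAt = Date.parse(dateHeader);
  if (Number.isNaN(sentAt)) {
    throw new Error("Unparseable Date header: " + dateHeader);
  }
  return nowMs - sentAt;
}

// Date header observed in the network tab, compared with the time of testing.
const observedHeader = "Wed, 08 Jan 2020 18:09:15 GMT";
const timeOfTest = Date.parse("Fri, 10 Jan 2020 14:29:54 GMT");
const ageMs = responseAgeMs(observedHeader, timeOfTest);

// A freshly fetched document should be seconds old, not days.
console.log(ageMs > 60 * 1000); // true
```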

Thanks for all the feedback!

My pleasure ☺️

icidasset · 2020-01-10T14:45:04Z

Looking at the service worker code, I'm guessing this code:

elm-pages/generator/src/service-worker-template.js

Line 3 in 40089da

workbox.precaching.precacheAndRoute(self.__precacheManifest);

should be changed? This will return the cached assets even when the network is available (ie. when not offline).

See https://github.com/GoogleChrome/workbox/blob/194cdeb63d5abb21490f88f01f02f4bcf7e6d54b/packages/workbox-precaching/utils/addFetchListener.mjs#L51

Which is called from https://github.com/GoogleChrome/workbox/blob/194cdeb63d5abb21490f88f01f02f4bcf7e6d54b/packages/workbox-precaching/precacheAndRoute.mjs#L30

Or am I misinterpreting something here? 🤔

dillonkearns · 2020-01-10T16:34:09Z

Interesting, precacheAndRoute was what all of the documentation used. For example, see this note:

**Important**: `workbox.precaching.precacheAndRoute(self.__precacheManifest)` reads a list of URLs to precache from an externally defined variable, `self.__precacheManifest`. At build-time, Workbox injects code needed set `self.__precacheManifest` to the correct list of URLs.

(From this section of the docs: https://developers.google.com/web/tools/workbox/guides/codelabs/webpack#inject).

See also this page which talks about how to precache assets with Workbox: https://developers.google.com/web/tools/workbox/guides/precache-files.

It uses some magic to turn self.__precacheManifest into the list that's injected by Webpack (right now, it's only index.html, but in the future I'm going to expose an API to let you choose a set of pages to precache the data for so it's available offline without visiting it first).

Since it's only adding /index.html to that precache list right now (see

elm-pages/generator/src/develop.js

Line 210 in 815dec7

include: [/^index\.html$/],

), it doesn't seem like that should cause it to do any offline routing for anything other than the root route, right?

icidasset · 2020-01-13T16:25:54Z

Oh interesting. Yeah, you're right, this should indeed work using precacheAndRoute. It'll always come from the service worker, and will only be updated when the revision identifier has changed. My bad sorry 😅
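
The revision behaviour described here can be illustrated by comparing two precache manifests. A sketch under assumptions: the `{ url, revision }` entry shape follows Workbox's precache manifest format, while `entriesToRefetch` is a hypothetical helper and not a Workbox function.

```javascript
// Given what is already cached and a newly built manifest, list the URLs
// whose revision changed (or that are new) and therefore need refetching.
function entriesToRefetch(cachedEntries, newManifest) {
  const cachedRevisions = new Map(
    cachedEntries.map((entry) => [entry.url, entry.revision])
  );
  return newManifest
    .filter((entry) => cachedRevisions.get(entry.url) !== entry.revision)
    .map((entry) => entry.url);
}

const cached = [{ url: "/index.html", revision: "abc123" }];

// Same revision: the cached copy keeps being served.
console.log(entriesToRefetch(cached, [{ url: "/index.html", revision: "abc123" }]));

// New revision after a rebuild: the entry is fetched again.
console.log(entriesToRefetch(cached, [{ url: "/index.html", revision: "def456" }]));
```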

dillonkearns · 2020-01-15T03:15:42Z

Hey @icidasset, no worries, it's a lot to keep track of! I'm definitely not fully understanding things myself, so I appreciate talking through things with you.

icidasset · 2020-01-17T19:32:28Z

I've been going through the original issue and some elm-pages code.

I would guess the issue is that the "prerenderer" sometimes renders the page when the content.json file is still being loaded (as described in #42).

It might be that

elm-pages/index.js

Line 34 in 35fbee6

document.dispatchEvent(new Event("prerender-trigger"));

is called before the content.json has finished loading.

What do you think? Could that possibly be the issue?
I'm trying to investigate further when toJsPort is exactly called,
but it's a little much for me to take in 😄

dillonkearns · 2020-01-19T01:12:29Z

Yeah, there's just a ton of functionality in a static site generator framework, it's tricky to keep track of everything that's going on! Especially if you didn't write the code!

The content is already rendered in these cases, you can see that it only calls that prerender-trigger event when it has successfully parsed the content:

https://github.com/dillonkearns/elm-pages/blob/master/src/Pages/Internal/Platform.elm#L406-L411

Also, if you do a simple curl to request the page, I believe you will see the content rendered properly, right? I would check on that first, because if curl gives back the right HTML, then it's something to do with which HTML the browser chooses to serve (due to some caching logic somehow), and not anything to do with the actual pre-rendered HTML content being incorrect.

My impression is still that it's a bug (or strange design decision) with Firefox's behavior when you have 1) a service worker that was previously registered (with JS turned on), and then 2) you turn JS off (using Firefox dev tools). In this case, it appears that Firefox is treating it as if you are offline and not fetching those pre-rendered HTML files from the server, which seems... broken to me.

I'd love clarity on what's going on (even if it means my theory being proven wrong 😄)!

icidasset · 2020-01-19T13:29:55Z

Also, if you do a simple curl to request the page, I believe you will see the content rendered properly, right?

I'm checking the actual file on disk, so no server or Firefox involved here. Screenshot:

dillonkearns · 2020-01-19T16:28:07Z

Oooh... so I think I know what's going on here. Thanks for giving the screenshot and context, that's helpful!

The way the Elm lifecycle works, it calls update before it calls view...

So I think what's happening is:

Cycle 1
update - need to fetch content.json... making HTTP request
view - render "Missing content"

Cycle 2
update - got HTTP response with content.json... storing in Model. Sending port to tell the pre-renderer that it can take a snapshot now
Port is received by JS
Pre-rendering snapshot happens (but the view still shows "Missing content")
view - Renders content, but it's too late
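
The ordering described above can be reproduced in a toy model. This is a simplified simulation, not Elm's runtime: it assumes that `update` runs immediately, that the port message reaches JavaScript before the next render, and that `view` is applied on a later frame. All names (`createApp`, `flushFrame`, `snapshots`) are hypothetical.

```javascript
function createApp() {
  let model = { content: null };
  let dom = "";               // stands in for document.body
  let renderScheduled = false;
  const snapshots = [];       // what the pre-renderer captured

  const view = (m) => (m.content === null ? "Missing content" : m.content);

  // Stands in for the next animation frame, when the view is applied.
  function flushFrame() {
    if (renderScheduled) {
      dom = view(model);
      renderScheduled = false;
    }
  }

  // Stands in for the JS side of the port: take the snapshot right away.
  function onPort() {
    snapshots.push(dom);
  }

  function update(msg) {
    if (msg.type === "GotContent") {
      model = { content: msg.body };
      onPort(); // the port fires before the view has been re-rendered
    }
    renderScheduled = true;
  }

  return { update, flushFrame, snapshots, getDom: () => dom };
}

const app = createApp();
app.update({ type: "Init" });    // cycle 1: content.json requested
app.flushFrame();                // view renders "Missing content"
app.update({ type: "GotContent", body: "<h1>Blog post</h1>" }); // cycle 2
app.flushFrame();                // view renders the content, but too late

console.log(app.snapshots[0]);   // "Missing content"
console.log(app.getDom());       // "<h1>Blog post</h1>"
```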

The way to fix this would be by using a MutationObserver, which is something the web platform provides to listen for changes to the DOM. That gets complicated and a little hacky, but it would work. But I think that the changes I'm working on for #42 will actually also solve this problem. So let me try getting a fix out for #42 first, and then let's see if the issue goes away.
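
A minimal sketch of the MutationObserver idea, assuming the page signals the pre-renderer with a `prerender-trigger` document event as elm-pages' index.js does. `triggerAfterNextRender` is a hypothetical helper, not code from elm-pages; the document, the observer constructor, and the event factory are passed in as parameters so the sketch does not depend on browser globals.

```javascript
// Dispatch "prerender-trigger" only after the body has actually changed.
function triggerAfterNextRender(doc, ObserverCtor, makeEvent) {
  const observer = new ObserverCtor(() => {
    // One DOM change is enough: stop observing and signal the pre-renderer.
    observer.disconnect();
    doc.dispatchEvent(makeEvent("prerender-trigger"));
  });
  observer.observe(doc.body, {
    childList: true,
    subtree: true,
    characterData: true,
  });
}
```

In a browser this would be called as `triggerAfterNextRender(document, MutationObserver, (name) => new Event(name))` at the point where the event is currently dispatched directly.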

I believe it will solve it reliably because I'm going to make sure that the Elm app has the content.json passed in as a flag, so the Elm on the client won't even be initialized until it has content and can correctly render the view 👍

icidasset · 2020-01-21T13:58:15Z

Thanks so much for fixing #42 @dillonkearns! 🙏
Sadly it did not fix this issue though. It now looks like this:

So no more "Missing content" as before, but just an empty body element.
Still happening randomly, as before.

dillonkearns · 2020-01-21T15:02:34Z

Hey @icidasset, thanks for the fast feedback!

Sounds like we'll have to explicitly listen for the DOM to change from Elm's view function being called. I'll try to get a branch that you can test out soon (seems like you're able to reproduce it reliably, I haven't been able to reproduce it on my machine). Thanks for helping to narrow this down!

dillonkearns · 2020-01-28T04:03:44Z

@icidasset since I'm not able to reproduce this issue on my machine, could you try an experiment for me and let me know how it goes?

I just want to prove the theory that the snapshot is being taken before the view function has taken over.

Could you add some code right below this line:

elm-pages/generator/src/develop.js

Line 308 in cae3344

routes: routes,

// you'll need an import like this at the top, too
const Renderer = PrerenderSPAPlugin.PuppeteerRenderer;

// and then add this in the options for the PrerenderSPAPlugin
new PrerenderSPAPlugin({
  staticDir: path.join(process.cwd(), "dist"),
  routes: routes,
  renderer: new Renderer({
    renderAfterTime: 5000 // Wait 5 seconds.
  })
})

You can just take the cloned repo, make that change, and then run npm install -g . from the cloned repo (the . is the current path) to test it out (you may have already used that trick).

After we confirm that, I'll try working on a fix, just want to confirm the theory first. Thank you!

icidasset · 2020-01-28T14:16:56Z

@dillonkearns That seems to fix it 👍

dillonkearns · 2020-01-28T19:56:19Z

@icidasset awesome! Thank you for checking!

Could you try out this branch and see if the fix works? #61

dillonkearns · 2020-01-30T17:41:56Z

Fixed by #62.
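
Going only by the title of #62 ("Properly setup the renderAfterDocumentEvent option"), the fix presumably makes the renderer wait for the event the page dispatches rather than for a fixed delay. A rough sketch of the two option sets side by side: `renderAfterTime` and `renderAfterDocumentEvent` are documented options of prerender-spa-plugin's PuppeteerRenderer, but the exact values used in #62 are an assumption here.

```javascript
// Options one might pass to `new PrerenderSPAPlugin.PuppeteerRenderer(...)`.

// The experiment from this thread: always wait a fixed five seconds.
const experimentOptions = { renderAfterTime: 5000 };

// Assumed shape of the event-based alternative: take the snapshot once the
// page dispatches this event on `document`.
const eventOptions = { renderAfterDocumentEvent: "prerender-trigger" };

console.log(experimentOptions, eventOptions);
```

The event name matches the `document.dispatchEvent(new Event("prerender-trigger"))` call quoted earlier in the thread.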

dillonkearns added this to To prioritize in Roadmap Jan 26, 2020

dillonkearns moved this from To prioritize to Prioritized in Roadmap Jan 28, 2020

dillonkearns mentioned this issue Jan 28, 2020

Try prerender-trigger event only after body has been updated to make sure Elm has rendered. #61

Merged

dillonkearns moved this from Prioritized to In progress in Roadmap Jan 29, 2020

icidasset mentioned this issue Jan 30, 2020

Properly setup the renderAfterDocumentEvent option #62

Merged

dillonkearns moved this from In progress to Ready for release in Roadmap Jan 30, 2020

dillonkearns closed this as completed Jan 30, 2020

dillonkearns moved this from Ready for release to Released in Roadmap Jan 30, 2020
