Drop .html from URLs? #121

Closed
trieloff opened this issue May 8, 2020 · 43 comments
Labels
question Further information is requested

Comments

@trieloff
Contributor

trieloff commented May 8, 2020

Some time ago, @davidnuescheler made an offhand remark that we made a mistake when we included the extension in the URL, because people get confused by it, as it is exceedingly uncommon. Also it seems old-fashioned, but not in a cute, ironic cgi-bin-way.

So, should we drop extensions from the URL, and if yes, how?

@trieloff trieloff added the question Further information is requested label May 8, 2020
@rofe
Contributor

rofe commented May 10, 2020

Hmm. Call me old-fashioned and biased 😜, but I don't think dropping extensions would be a very good idea. I personally like the explicitness of it a lot, i.e. the comfort of knowing that the content delivered from a URL ending in .html will be HTML, while .json will be JSON etc. I'd also like to point out that we already support extensionless requests in the form of directory listing (i.e. a request to /foo where foo is a directory will redirect to /foo/index.html internally).

If the goal here is to make the non-technical end user's life easier, we might add an internal redirect from any URL without an extension to the same URL with .html, as long as there isn't a folder with the same name. So /foo would either look for a directory 'foo' and deliver /foo/index.html, or /foo.html.
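A minimal sketch of the lookup described above, for illustration only. The function name and the `isDirectory` callback are hypothetical; nothing like them is named in this thread.

```javascript
// Hypothetical helper: resolve an extension-less path as proposed above.
// `isDirectory` is an assumed callback that reports whether a folder exists at the path.
function resolveExtensionless(pathname, isDirectory) {
  // A trailing slash always means "directory": serve its index.
  if (pathname.endsWith('/')) {
    return `${pathname}index.html`;
  }
  const lastSegment = pathname.slice(pathname.lastIndexOf('/') + 1);
  // Paths that already carry an extension are left alone.
  if (lastSegment.includes('.')) {
    return pathname;
  }
  // A folder with the same name wins; otherwise fall back to <name>.html.
  return isDirectory(pathname) ? `${pathname}/index.html` : `${pathname}.html`;
}

const dirs = new Set(['/products']);
console.log(resolveExtensionless('/products', (p) => dirs.has(p))); // /products/index.html
console.log(resolveExtensionless('/about', (p) => dirs.has(p)));    // /about.html
```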

@tripodsan
Contributor

well... technically, only /foo/ should serve /foo/index.html.

@trieloff
Contributor Author

My natural inclination would be to agree with you, @rofe and say that the rest of the world that prefers extension-less URLs is wrong. There are two things that give me pause:

  1. Our manifesto which says something along the lines of "intuitiveness eats consistency for breakfast"
  2. My experience telling me that disagreeing with David not on the basis of facts, but on principles, typically works out in favor of David, not principles.

I think I'm going to make a PR in helix-dispatch, so that

  1. we can decide how much we hate it once we have it on our hands
  2. it gives me another emotional reason for liking the proposal

@kptdobe
Contributor

kptdobe commented May 11, 2020

Why does it have to be exclusive? Why can't we support both?
Also, we should be careful with the selectors, which would look like an... extension if there is no .html :)

@trieloff
Contributor Author

trieloff commented May 11, 2020

adobe/helix-dispatch#247 does not support selectors in extension-less paths. And it is entirely additive, i.e. existing URLs work as they did before. You can now leave out .html, but if you prefer it, it will still work.

@tripodsan
Contributor

I don't like this and I'd rather have at least a redirect to a resource that is less ambiguous.
I wonder what @royfielding thinks about this :-)

Also, @davidnuescheler do you still think this is a good idea?

@tripodsan
Contributor

we made a mistake when we included the extension in the URL, because people get confused by it, as it is exceedingly uncommon.

do you know people that still enter urls manually?

@auniverseaway
Member

Is it old-fashioned? Yes. Does that mean it's bad? No.

For consistency, I would argue .html is not consistent. No {reasonable} site does www.mykillersite.com/index.html. It's always www.mykillersite.com with no extension and no resource name.

I do know many devs in the wild that consider it to be a sign of old technology.

My heart says leave .html, my brain says we should get with the times.

@tripodsan
Contributor

I think that serving content from /whatever is of course important. especially for the root directory :-) also i see that people don't want to create empty folders for allowing url w/o extensions. but the mix would probably be a bit confusing:

/index.md
/about.md
/contact.md
/products
  /index.md
  /shipping.md
  /mens
    /index.md

I just think that for helix, it's quite expensive to test all possible resources. and to make it more stable, I would prefer to send a redirect to the resolved resources, rather than serving it from the non-extension url.

@royfielding
Contributor

I think we usually have to end up supporting both, since some sites will want extensions and others won't.

A problem with not having extensions is that a site needs to accurately provide the right media type on responses. The problem with having extensions is that a page becomes associated with the type normally associated with that extension, which means some tools/browsers will obey the type and others will obey the extension.

Personally, I prefer sites that separately identify computed resources from their source/template files. Typically that means non-extension URIs and extensions on templates/source files.

Given the amount of generation we are doing, it might make better performance sense to generate all routes when the source files change, rather than fail backwards. In other words, generate a b-tree or hash table of the site's valid URI-space rather than path traverse on each request.
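To illustrate the hash-table idea, here is a rough sketch that precomputes every valid URI once from a list of source files. The file list, the `.md` to `.html` naming, and the function name are assumptions made for the example, not Helix code.

```javascript
// Build a lookup table of every valid URI once, when the source files change.
// The ".md -> .html" naming convention here is an assumption for illustration.
function buildRouteTable(sourceFiles) {
  const routes = new Map();
  for (const file of sourceFiles) {
    if (!file.endsWith('.md')) continue; // only Markdown sources are mapped in this sketch
    const html = file.replace(/\.md$/, '.html');
    routes.set(html, file); // e.g. /about.html
    if (html.endsWith('/index.html')) {
      // directory form, e.g. /products/
      routes.set(html.slice(0, -'index.html'.length), file);
    } else {
      // extension-less form, e.g. /about
      routes.set(html.slice(0, -'.html'.length), file);
    }
  }
  return routes;
}

const routes = buildRouteTable(['/index.md', '/about.md', '/products/index.md']);
console.log(routes.get('/about'));     // /about.md
console.log(routes.get('/products/')); // /products/index.md
console.log(routes.has('/missing'));   // false
```

Each request then costs a single map lookup, and a miss is known to be a 404 without probing any backend.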

@rofe
Contributor

rofe commented May 12, 2020

Maybe we could do the mapping in Fastly instead of Runtime?

@trieloff
Contributor Author

Maybe we could do the mapping in Fastly instead of Runtime?

The implementation in adobe/helix-dispatch#247 simply tries a hard-coded /foo.html when /foo/index.html isn't there. If we wanted to support a number of fallbacks, i.e. /foo.html, then /foo.json, we would need to do some normalization of the Accept header in Fastly, so that we get the order right.
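As a rough illustration of what such a normalization might look like, the sketch below orders candidate extensions by the q-values of an Accept header. It is plain JavaScript rather than Fastly VCL, the media-type table is an assumption, and wildcards such as `*/*` are simply ignored.

```javascript
// Assumed mapping from media type to the extension to try; not taken from helix-dispatch.
const EXTENSION_FOR_TYPE = new Map([
  ['text/html', '.html'],
  ['application/json', '.json'],
]);

// Parse an Accept header and return the candidate extensions, most preferred first.
function fallbackOrder(acceptHeader) {
  return acceptHeader
    .split(',')
    .map((part) => {
      const [type, ...params] = part.split(';').map((s) => s.trim());
      const qParam = params.find((p) => p.startsWith('q='));
      // A missing q parameter means the default weight of 1.
      const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1;
      return { type, q };
    })
    .filter(({ type, q }) => EXTENSION_FOR_TYPE.has(type) && q > 0)
    .sort((a, b) => b.q - a.q)
    .map(({ type }) => EXTENSION_FOR_TYPE.get(type));
}

console.log(fallbackOrder('application/json;q=0.9, text/html')); // [ '.html', '.json' ]
```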

@rofe
Contributor

rofe commented May 12, 2020

I don't think we would need any fallback other than .html. We can still rely on app devs to append .json, right? 😉

@davidnuescheler
Contributor

Maybe we could do the mapping in Fastly instead of Runtime?

haha, i just got tired of sending people links that end in .html which really i haven't seen on real websites in a very long time... so i quickly added this to my outer cdn code...

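      // `url` is the parsed request URL available in the CDN handler
      let pathname = url.pathname;
      // only rewrite paths that carry no extension
      if (!pathname.includes('.')) {
        // a trailing slash denotes a directory: serve its index
        if (pathname.endsWith('/')) {
          pathname += 'index';
        }
        pathname += '.html';
      }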

and it works beautifully, see here:
https://thinktanked.org/pretzels

which makes me agree with @rofe that this is easy and fast to add on the edge without adding any additional requests.

ps: this also supports the distinction between /foo and /foo/ which both may have merit.

@tripodsan
Contributor

I would love such a pragmatic approach. with the current (intended) implementation we would try to load:

  • /foo.html
  • /foo/index.html
  • /foo/default.html

from 3 locations (pipeline, content, static).... which results in 9 requests and a gazillion activations....

if we could remove this logic from the dispatcher, and only use index.html as fallback for /foo/, that would make the implementation leaner and less expensive.
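The arithmetic behind the 9 requests can be spelled out with a small sketch. The candidate names and locations come from the comment above; the function names are made up for the example.

```javascript
// Candidate resources for an extension-less path, as listed above.
const candidates = (path) => [`${path}.html`, `${path}/index.html`, `${path}/default.html`];
// The three places each candidate is looked up.
const locations = ['pipeline', 'content', 'static'];

// Every (location, resource) pair the dispatcher would have to try.
function lookups(path) {
  return locations.flatMap((location) =>
    candidates(path).map((resource) => ({ location, resource })));
}

console.log(lookups('/foo').length); // 9
```

With the path already resolved to a single resource before dispatch, the same product shrinks to 3 lookups, one per location.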

@tripodsan
Contributor

(who wanted the default.html fallback anyways?)

@royfielding
Contributor

royfielding commented May 15, 2020

Just be sure that any change in the path hierarchy is done as a redirect to the client, since otherwise the relative links will break. That's why httpd redirects directory name to name/ instead of just sending the content.
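The effect on relative links can be checked with the standard `URL` class. The host `example.com` and the file `logo.png` are placeholders.

```javascript
// The same relative link, resolved against the two forms of the page URL.
const withoutSlash = new URL('logo.png', 'https://example.com/foo').pathname;
const withSlash = new URL('logo.png', 'https://example.com/foo/').pathname;

console.log(withoutSlash); // /logo.png
console.log(withSlash);    // /foo/logo.png
```

If /foo silently served the content of /foo/index.html, the browser would still resolve links against /foo and request /logo.png instead of /foo/logo.png.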

@tripodsan
Contributor

tripodsan commented May 21, 2020

so I suggest:

  • handle all mappings in the CDN
  • / -> /index.html
  • /foo -> /foo.html
  • /foo/ -> /foo/index.html
  • drop the fallback for default.html
  • drop the fallback for README.html

@davidnuescheler are you ok with this?

the only question remains:

  1. do we want real redirects (the url changes in the browser) or
  2. internal redirects

(I'm for real redirects)
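The three mappings in the list above, together with the two redirect flavors, could be sketched roughly as follows. The `{ status, location }` and `{ rewrite }` result shapes and the 301 status are invented for illustration and are not a CDN API. Unlike the earlier snippet in this thread, this version only looks for a dot in the last path segment.

```javascript
// Map an extension-less path to its .html resource, per the list above.
function mapPath(pathname) {
  const lastSegment = pathname.slice(pathname.lastIndexOf('/') + 1);
  if (lastSegment.includes('.')) return pathname; // already has an extension
  return pathname.endsWith('/') ? `${pathname}index.html` : `${pathname}.html`;
}

// mode 'external': the client receives a redirect and the URL changes in the browser.
// mode 'internal': the request is rewritten and the URL stays as typed.
function handle(pathname, mode) {
  const target = mapPath(pathname);
  if (target === pathname) return { rewrite: pathname };
  return mode === 'external'
    ? { status: 301, location: target }
    : { rewrite: target };
}

console.log(mapPath('/'));            // /index.html
console.log(mapPath('/foo'));         // /foo.html
console.log(mapPath('/foo/'));        // /foo/index.html
console.log(handle('/foo', 'external')); // { status: 301, location: '/foo.html' }
```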

@rofe
Contributor

rofe commented May 21, 2020

  • /foo -> /foo.html

The only thing I would add - for backward compatibility and convenience - is a fallback from /foo -> /foo/index.html (or /foo) if /foo.html returns 404. Example: https://theblog-adobe.hlx.page/fr

I'm on the fence regarding the redirects, slightly leaning towards internal...

@tripodsan
Contributor

The only thing I would add - for backward compatibility and convenience - is a fallback from /foo -> /foo/index.html (or /foo) if /foo.html returns 404. Example: https://theblog-adobe.hlx.page/fr

well, my goal is to avoid internal fallbacks and have the entire logic on the CDN with simple redirects.

the project can still add a fr.html with a javascript redirect to /fr/ if they really want.

@davidnuescheler
Contributor

i think the README fallback can easily be dropped, the fallback to the default is something that we probably would want to have some solution for. the workaround for that would simply be a query string based solution.

The only thing I would add - for backward compatibility and convenience - is a fallback from /foo -> /foo/index.html (or /foo) if /foo.html returns 404. Example: https://theblog-adobe.hlx.page/fr

i don't think there is material backwards compatibility there, and the convenience in code to go to /fr instead of /fr/ is relatively immaterial... if this is a vanity URL it can also easily be handled as a redirect.

@trieloff
Contributor Author

The / -> README.html fallback is super important for the OOTB experience. If you start with a fresh GitHub repo, you will have a README.md, but not an index.md.

I wouldn't want to drop that.

@trieloff
Contributor Author

What would be the query string based solution for default.html? I think it would just lead to overloading the 404.html page (at least in Helix Pages) and would make it very hard to contain the logic by page type or path.

@trieloff
Contributor Author

And I think that 30x redirects to index.html or foo.html are not just jarring and surprising to the user, but also defeat the purpose of having short, extension-less URLs.

URLs are far more often copied from the browser than written up from scratch, so if you redirect from /foo to /foo.html, then /foo.html will be what's in the browser, will be what's copied to the email, will be what will show up in the PowerPoint and will be what confuses the hell out of some HiPPO, which will lead the project team trying to configure it away.

For /foo/ the story is even worse: it becomes the longer, more confusing (to non technical users, i.e. visitors) /foo/index.html.

@trieloff
Contributor Author

Sorry to disagree with all of you on every point, but I see the number of activations as a minor detail (as long as they are not in sequence, thus visible as latency), but the URL space as a super important aspect of the UI, and one where we made every addition so far with good reason.

@tripodsan
Contributor

And I think that 30x redirects to index.html or foo.html are not just jarring and surprising to the user, but also defeat the purpose of having short, extension-less URLs.

fine for me.

For /foo/ the story is even worse: it becomes the longer, more confusing (to non technical users, i.e. visitors) /foo/index.html.

as long as we don't internally redirect /foo to /foo/index.html we're good. as roy mentioned above, this would lead to broken relative links.

@rofe
Contributor

rofe commented May 22, 2020

The / -> README.html fallback is super important for the OOTB experience

I agree, but I think it should stay an external redirect for transparency. If customers don't want to go live with a /README.html homepage, they can (and should) add an index.md or index.html.

@davidnuescheler
Contributor

davidnuescheler commented May 22, 2020

i agree that the mapping should not be external redirects.

i also agree that we need to be careful with the URL as an important part of the user experience. adding to that, i think we should have been more careful with the URL space in the past and the multitude of fallbacks actually make it more complicated than needed.

while i generally agree that the number of activations should not be our sole guiding principles, i think it is important that we are careful and prudent with all resources, even if they seem "free".

having said that, here are some things to consider

  • i have not seen a project that uses the / -> README.md, nor do i think that this is what a current out of the box (helix pages) experience looks like, especially as every single time i showed someone an ootb experience, it started with an fstab.yaml mounting a google drive into /

  • for the default.md the query string based solution would not require any particular support from helix. considering that we primarily conceived this on a concept that we were looking at for something like the topics on the blog, it could easily be served with an index.md in the topics directory that would take a query parameter of ?topic=<mytopic> or even ?<mytopic> which would change the URL from /topics/<mytopic> to /topics/?<mytopic> or /topics/?topic=<mytopic>. agreed that it is mildly less desirable, but is this small difference worth the extra cost and complexity on every single request? also, we never implemented the default.md or similar in AEM.
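For illustration, client-side code on such a topics page might read the topic from either query-string form along these lines. The function name and the example host are hypothetical.

```javascript
// Read the topic from either /topics/?topic=<mytopic> or /topics/?<mytopic>.
function topicFromUrl(href) {
  const { searchParams, search } = new URL(href);
  if (searchParams.has('topic')) {
    return searchParams.get('topic');
  }
  // Bare form: everything after "?" is treated as the topic name.
  const bare = decodeURIComponent(search.slice(1));
  return bare === '' ? null : bare;
}

console.log(topicFromUrl('https://example.com/topics/?topic=creativity')); // creativity
console.log(topicFromUrl('https://example.com/topics/?creativity'));       // creativity
console.log(topicFromUrl('https://example.com/topics/'));                  // null
```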

@trieloff
Contributor Author

i have not seen a project that uses the / -> README.md, nor do i think that this is what a current out of the box (helix pages) experience looks like, especially as every single time i showed someone an ootb experience, it started with an fstab.yaml mounting a google drive into /

What I've been thinking of in terms of OOTB experience is what we have on https://www.hlx.page:

Welcome to Helix Pages!

To use it, change the current URL to https://<repo>-<owner>.project-helix.page. <owner> and <repo> must refer to a valid Git repository.

Example: https://helix-home-adobe.project-helix.page/README.html

Ironically, this feature seems to be broken or disabled for Helix Pages, resulting in the ugly URL.

If we say the OOTB experience does start with fstab, then we need to polish the path to get there, so that it is as easy as entering a GitHub URL in a text field.

Anyway, the directory index lookup order is configurable, and if you all want to change the default from index.html,README.html to index.html (https://github.com/adobe/helix-dispatch/blob/20be5386cd3dfcaa9f8c5da244bff919f701e645/src/fetchers.js#L365) then I'll shut up. If you want me to agree to dropping it entirely, or building an external redirect for README.html as @tripodsan suggested, then we have to keep arguing.
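A small sketch of how a comma-separated directory index setting such as `index.html,README.html` could be expanded into an ordered list of lookups. This is an illustration only and not the helix-dispatch implementation linked above; the function name is made up.

```javascript
// Expand a comma-separated directory index setting into the resources to try, in order.
function indexCandidates(directory, directoryIndex = 'index.html') {
  const base = directory.endsWith('/') ? directory : `${directory}/`;
  return directoryIndex
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean) // ignore empty entries such as a trailing comma
    .map((name) => `${base}${name}`);
}

console.log(indexCandidates('/', 'index.html,README.html')); // [ '/index.html', '/README.html' ]
console.log(indexCandidates('/docs'));                       // [ '/docs/index.html' ]
```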

@trieloff
Contributor Author

for the default.md the query string based solution would not require any particular support from helix. considering that we primarily conceived this on a concept that we were looking at for something like the topics on the blog, it could easily be served with an index.md in the topics directory that would take a query parameter of ?topic=<mytopic> or even ?<mytopic> which would change the URL from /topics/<mytopic> to /topics/?<mytopic> or /topics/?topic=<mytopic>. agreed that it is mildly less desirable…

Mildly less desirable in the same way as a pizza that fell on the floor, face-down. It might taste the same, but your guests will still think there's something weird about it. URLs with question marks are by definition more questionable than those without.

There is another aspect of this: the client-side fallback logic would need to be duplicated for every kind of resource that can have a fallback, e.g. topics, products, authors. So I'd say the default fallback takes a lot of complexity away from the project, at minimal complexity cost for us, and only moderate (and minimizable) runtime overhead.

, but is this small difference worth the extra cost and complexity on every single request?

The extra complexity: definitely. Let's see what we can do about the extra cost and "every single request" aspect.

The default.html requests are easily cacheable, so they won't have to happen for every request, only for every request to a so far unseen directory.
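The "once per directory" behavior can be sketched with an in-memory map standing in for the cache. In practice the caching would presumably happen in the CDN; `probe` is a hypothetical stand-in for the real lookup.

```javascript
// Remember, per directory, whether a default.html exists, so the probe runs only once.
// `probe` is an assumed callback that reports whether the resource exists.
function createDefaultLookup(probe) {
  const seen = new Map();
  return (pathname) => {
    const dir = pathname.slice(0, pathname.lastIndexOf('/') + 1);
    if (!seen.has(dir)) {
      seen.set(dir, probe(`${dir}default.html`));
    }
    return seen.get(dir) ? `${dir}default.html` : null;
  };
}

let probes = 0;
const lookup = createDefaultLookup((resource) => {
  probes += 1;
  return resource === '/topics/default.html';
});
console.log(lookup('/topics/creativity.html')); // /topics/default.html
console.log(lookup('/topics/design.html'));     // /topics/default.html
console.log(probes);                            // 1
```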

But maybe we can find a way to create URLs that enable the dynamic default feature selectively, so that we can further reduce it:

/topics*/creativity.html
/topics/*creativity.html
/topics/creativity.html*

We could pick a different delimiter, but * seems to be well understood as marking "slightly special" URLs.

also, we never implemented the default.md or similar in AEM.

I think we did. If /topics/creativity does not exist, but /topics does, then a request to /topics/creativity would resolve with /topics as the content and /creativity as a path extra. But even if we didn't, AEM developers have plenty of ways of intercepting the URL resolution and implementing placeholder content. Helix has different constraints and does benefit from an explicit solution.

@trieloff
Contributor Author

trieloff commented May 25, 2020

Just spoke with @davidnuescheler and came up with an alternative implementation that would allow us to handle the default.md use case at much lower cost:

In fstab.yaml, we would have something like this:

mountpoints:
  /authors: https://adobe.sharepoint.com/sites/TheBlog/Shared%20Documents/theblog/authors?fallback=default-author.docx
  /topics: https://adobe.sharepoint.com/sites/TheBlog/Shared%20Documents/theblog/topics?fallback=default-topic.docx
  /: https://adobe.sharepoint.com/sites/TheBlog/Shared%20Documents/theblog

We then push the fallback handling either into helix-content-proxy or helix-word2md and have a very small surface where we still need to make additional requests.
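A sketch of how the `?fallback=` parameter could be separated from the mount URL, using the standard `URL` class. The function name and the returned shape are assumptions; this is not how helix-content-proxy or helix-word2md are known to work.

```javascript
// Split a mountpoint URL into the source and the optional ?fallback= document.
function parseMountpoint(value) {
  const url = new URL(value);
  const fallback = url.searchParams.get('fallback'); // null when absent
  url.search = ''; // drop the query string from the source URL
  return { source: url.href, fallback };
}

const authors = parseMountpoint(
  'https://adobe.sharepoint.com/sites/TheBlog/Shared%20Documents/theblog/authors?fallback=default-author.docx',
);
console.log(authors.fallback); // default-author.docx
console.log(authors.source);
// https://adobe.sharepoint.com/sites/TheBlog/Shared%20Documents/theblog/authors
```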

@tripodsan
Contributor

We then push the fallback handling either into helix-content-proxy or helix-word2md and have a very small surface where we still need to make additional requests.

I suggest handling it in the content-proxy. also, maybe it's time to convert/allow the fstab to be a list of objects again:

mountpoints:
  - root: /authors
    source: https://adobe.sharepoint.com/sites/TheBlog/Shared%20Documents/theblog/authors
    fallback: default-author.docx
  - root: /topics
    source: https://adobe.sharepoint.com/sites/TheBlog/Shared%20Documents/theblog/topics
    fallback: default-topic.docx
  - root: /
    source: https://adobe.sharepoint.com/sites/TheBlog/Shared%20Documents/theblog

this would also allow better ordering of the resolution order and allow multiple roots:

mountpoints:
  - root:  /
    source: github
  - root: /
    source: https://adobe.sharepoint.com/sites/TheBlog/Shared%20Documents/theblog/authors

@trieloff
Contributor Author

I'm not sure what multiple roots would give us, other than complexity and lengthy explanations, but I can agree that a mount-point entry could be an object (in addition to a string), this would not break compatibility.
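To show why accepting both forms need not break compatibility, here is a sketch that normalizes an already-parsed `mountpoints` value, whether it is the existing map of strings, a map with object entries, or the proposed list of objects. The function name and the normalized shape are assumptions for illustration.

```javascript
// Accept the existing string form, an object form, and the proposed list-of-objects form.
// Input is the already-parsed `mountpoints` value from fstab.yaml.
function normalizeMountpoints(mountpoints) {
  if (Array.isArray(mountpoints)) {
    return mountpoints.map(({ root, source, fallback = null }) => ({ root, source, fallback }));
  }
  return Object.entries(mountpoints).map(([root, value]) => (
    typeof value === 'string'
      ? { root, source: value, fallback: null }
      : { root, source: value.source, fallback: value.fallback ?? null }
  ));
}

console.log(normalizeMountpoints({
  '/authors': { source: 'https://example.com/authors', fallback: 'default-author.docx' },
  '/': 'https://example.com/theblog',
}));
```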

@tripodsan
Contributor

I'm not sure what multiple roots would give us

maybe it's out of the context here, but we discussed somewhere whether the pipeline (or content-proxy) should first load the md from github or the external source, and/or whether it should fall back.
having multiple roots would be one way to let the user configure the lookup order.

@trieloff
Contributor Author

trieloff commented May 26, 2020

I suggest a quick vote to see if we have consensus. Just use 👍 and 👎

We will then create separate PRs for each item

@tripodsan
Contributor

tripodsan commented May 26, 2020

@trieloff ^^^ this also implies:

trieloff added a commit to adobe/helix-dispatch that referenced this issue Jun 26, 2020
BREAKING CHANGE: As discussed in adobe/helix-home#121 (comment) `README.html` is no longer a default directory index. Either rename your `README.md` to `index.md` or set the [`directoryIndex`](https://github.com/adobe/helix-shared/blob/master/docs/strains-definitions-anystrain-oneof-runtime-strain.md#directoryIndex) property in your strain config to `index.html,README.html` to restore the old behavior

fixes #268
tripodsan pushed a commit to adobe/helix-dispatch that referenced this issue Jul 1, 2020
BREAKING CHANGE: As discussed in adobe/helix-home#121 (comment) `README.html` is no longer a default directory index. Either rename your `README.md` to `index.md` or set the [`directoryIndex`](https://github.com/adobe/helix-shared/blob/master/docs/strains-definitions-anystrain-oneof-runtime-strain.md#directoryIndex) property in your strain config to `index.html,README.html` to restore the old behavior

- fixes #267 
- fixes #268
- fixes #269
adobe-bot pushed a commit to adobe/helix-dispatch that referenced this issue Jul 1, 2020
# [4.0.0](v3.2.25...v4.0.0) (2020-07-01)

### Bug Fixes

* **path:** changes to extension less request fallbacks ([8ea8b65](8ea8b65)), closes [#267](#267) [#268](#268) [#269](#269)

### BREAKING CHANGES

* **path:** As discussed in adobe/helix-home#121 (comment) `README.html` is no longer a default directory index. Either rename your `README.md` to `index.md` or set the [`directoryIndex`](https://github.com/adobe/helix-shared/blob/master/docs/strains-definitions-anystrain-oneof-runtime-strain.md#directoryIndex) property in your strain config to `index.html,README.html` to restore the old behavior
@rofe
Contributor

rofe commented Jul 2, 2020

Looks like adobe/helix-dispatch#267 isn't enough yet for a URL like https://theblog-adobe.hlx.page/en/publish/2020/06/10/listening-learning-and-taking-action to work... do we need adobe/helix-content-proxy#32 - the final piece of the puzzle - for that?

@tripodsan
Contributor

I think that helix-pages hasn't been deployed since we updated helix-publish to use dispatch V4. looking at
https://acapt.adobeio-static.net/helix-tracedebug-latest/index.html#https://theblog-adobe.hlx.page/en/publish/2020/06/10/listening-learning-and-taking-action

shows me that 3.x is still used:

image

@trieloff
Contributor Author

trieloff commented Jul 3, 2020

Yes, we need to bump the version in helix-cli, too.

@trieloff
Contributor Author

trieloff commented Jul 3, 2020

adobe/helix-cli#1460

@tripodsan
Contributor

implemented.

@rofe
Contributor

rofe commented Jul 9, 2020

adobe/helix-content-proxy#32 is still open ☝️

@rofe rofe reopened this Jul 9, 2020
@tripodsan
Contributor

the fallbacks to github have nothing to do with the extension to .html.


7 participants