Improve URL scheme #1205

Closed
andrewdavidwong opened this Issue Sep 22, 2015 · 14 comments

4 participants

andrewdavidwong commented Sep 22, 2015

Currently, many of our page URLs use CamelCase, which is an artifact of the old TracWiki system:

https://www.qubes-os.org/doc/GettingStarted/
https://www.qubes-os.org/doc/SplitGpg/
https://www.qubes-os.org/doc/Templates/

I think it would look cleaner and more professional if all of our URLs were lowercase and used only the characters a-z, 0-9, -, and possibly _:

https://www.qubes-os.org/doc/getting-started/
https://www.qubes-os.org/doc/split-gpg/
https://www.qubes-os.org/doc/templates/
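As a rough illustration, the proposed character rule can be checked one path segment at a time. This is a minimal Python sketch; `is_clean_path` is a hypothetical helper, and it treats `_` as allowed even though the proposal leaves that open.

```python
import re

# Proposed rule (hypothetical helper): each path segment may contain only
# lowercase ASCII letters, digits, hyphens and underscores.
SEGMENT_RE = re.compile(r"[a-z0-9_-]+")


def is_clean_path(path: str) -> bool:
    """Return True if every non-empty segment of the URL path follows the rule."""
    segments = [s for s in path.split("/") if s]
    return all(SEGMENT_RE.fullmatch(s) for s in segments)


if __name__ == "__main__":
    for p in ["/doc/GettingStarted/", "/doc/getting-started/", "/doc/split-gpg/"]:
        print(p, is_clean_path(p))
```

The old CamelCase URLs fail the check and the proposed ones pass.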

The website is already set up to handle redirects, so that's not a problem. However, I'm not sure if there's a relatively easy, programmatic way to change all the files. We would want to change the YAML front matter from this:


---
layout: doc
title: SplitGpg
permalink: /doc/SplitGpg/
redirect_from:
- "/doc/UserDoc/SplitGpg/"
- "/wiki/UserDoc/SplitGpg/"

---

to this:


---
layout: doc
title: Split GPG
permalink: /doc/split-gpg/
redirect_from:
- "/doc/SplitGpg/"
- "/doc/UserDoc/SplitGpg/"
- "/wiki/UserDoc/SplitGpg/"

---

for each file/page.
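The per-file change above amounts to three edits: set the new title, set the new permalink, and prepend the old permalink to redirect_from. Here is a minimal Python sketch, assuming the front matter uses the simple one-key-per-line layout shown and already has a redirect_from: key. The helper name is hypothetical, and the new permalink and title are passed in by hand, since a title like "Split GPG" can't be derived mechanically.

```python
def rewrite_front_matter(text: str, new_permalink: str, new_title: str) -> str:
    """Rewrite a Jekyll front-matter block (hypothetical helper).

    Assumes one "key: value" per line and redirect_from entries written as
    '- "..."' list items directly below the redirect_from: key.
    """
    lines = text.splitlines()
    # First pass: remember the current permalink, wherever it appears.
    old = next(
        (ln.split(":", 1)[1].strip() for ln in lines if ln.startswith("permalink:")),
        None,
    )
    out = []
    for line in lines:
        if line.startswith("title:"):
            out.append(f"title: {new_title}")
        elif line.startswith("permalink:"):
            out.append(f"permalink: {new_permalink}")
        elif line.startswith("redirect_from:"):
            out.append(line)
            # The old URL becomes the first redirect so existing links keep working.
            if old and old != new_permalink:
                out.append(f'- "{old}"')
        else:
            out.append(line)
    return "\n".join(out) + "\n"


EXAMPLE = """---
layout: doc
title: SplitGpg
permalink: /doc/SplitGpg/
redirect_from:
- "/doc/UserDoc/SplitGpg/"
- "/wiki/UserDoc/SplitGpg/"

---
"""

if __name__ == "__main__":
    print(rewrite_front_matter(EXAMPLE, "/doc/split-gpg/", "Split GPG"), end="")
```

Running it on the SplitGpg example produces the target front matter shown above.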

marmarek commented Sep 23, 2015

@woju could you help here?

woju commented Sep 23, 2015

andrewdavidwong commented Sep 24, 2015

> If we are manipulating URIs, can we also put /en/ somewhere in the path, preferably at the beginning? Currently all pages are in English, but in the future that may change and someone may like to translate the Qubes manual.

Yes, good idea.

For example:
https://www.qubes-os.org/doc/en/split-gpg/
Right?

> Second question: if we are changing URIs, should we rename source files in the repo to reflect the respective URI?

Yes, I was planning on changing, e.g., SplitGpg.md to split-gpg.md.

> If the answer to the first question is yes, will we include lower-case characters from the respective script? (Polish example: ąćęłńóśźż, but not ĄĆĘŁŃÓŚŹŻ)

This I'm not so sure about. Naming source files using non-ASCII characters could cause compatibility issues with certain file systems, couldn't it?

> Yes, that can be done at any time, but the results have to be checked manually. It saves keystrokes, but probably not the eyeballing. I can provide you with the tool, but I don't have time to go through all the pages, so you'd have to promise to point out all the errors that are left.

No problem, I can do the manual checking.

woju commented Sep 24, 2015

> No problem, I can do the manual checking.

OK. I pushed the processed repo to woju/qubes-doc, and the tools are in woju/qubesos.github.io. Check them out and merge as you like.

> For example:
> https://www.qubes-os.org/doc/en/split-gpg/
> Right?

I don't know; probably /en/doc/split-gpg/, because we may want to translate press releases or whatever. @bnvk, what's your opinion?

>> Second question: if we are changing URIs, should we rename source files in the repo to reflect the respective URI?
>
> Yes, I was planning on changing, e.g., SplitGpg.md to split-gpg.md.

OK. There is a tool, qubesos.github.io/_utils/camel2hyphen.pl, which processes the file path. I haven't done that yet.

>> If the answer to the first question is yes, will we include lower-case characters from the respective script? (Polish example: ąćęłńóśźż, but not ĄĆĘŁŃÓŚŹŻ)
>
> This I'm not so sure about. Naming source files using non-ASCII characters could cause compatibility issues with certain file systems, couldn't it?

I don't know. @marmarek, will the offline docs reside in dom0, with UTF-8 support, or on a USB stick with FAT16/32?
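Converting a CamelCase name to the lowercase-hyphen form is the core step for both permalinks and file names. The internals of camel2hyphen.pl aren't shown in this thread, so the following is only a hypothetical Python sketch of one way to do it. It keeps runs of capitals together (VPN becomes vpn rather than v-p-n), and its output still needs manual review.

```python
import re


def camel_to_hyphen(name: str) -> str:
    """Convert a CamelCase page name to lowercase-hyphen form (hypothetical helper)."""
    # Split between a lowercase letter or digit and a following capital: SplitGpg -> Split-Gpg
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", name)
    # Split an acronym from a following capitalised word: VPNSetup -> VPN-Setup
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1-\2", s)
    return s.lower()


if __name__ == "__main__":
    for name in ["GettingStarted", "SplitGpg", "VPN", "VPNSetup", "AntiEvilMaid"]:
        print(name, "->", camel_to_hyphen(name))
```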

marmarek commented Sep 24, 2015

On Thu, Sep 24, 2015 at 12:12:25PM -0700, Wojtek Porczyk wrote:

> I don't know. @marmarek, will the offline docs reside in dom0 with
> support for UTF-8, or usb stick with FAT16/32?

Most likely in some VM. But I'd still avoid non-ASCII characters in file
names and URLs.

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

bnvk commented Sep 25, 2015

A big YES to making the URLs nicer and non-camelCased, great call :-)

I'm not sure how best to do multi-language support with Jekyll (or whether it's possible), unless it's just simple copies of all the Markdown files scoped inside subfolders: en, de, pl.

In that case, if we are going to do the whole site (not just the docs), then /en/doc/split-gpg makes the most sense, but if just the docs, then /doc/en/split-gpg is preferable, I guess!

andrewdavidwong commented Sep 26, 2015

@woju:

> OK. I pushed processed repo to woju/qubes-doc and the tools are in woju/qubesos.github.io. Check them out and merge as you like.

Thank you! Could you give me some example commands using these scripts? I'm trying to figure out how to use them on my own, but I haven't been very successful so far.

(I would just use your already-processed repo, but I had to sort a bunch of unsorted doc pages and clean up the developer documentation after you created it.)

woju commented Sep 28, 2015

> @woju:
>
>> OK. I pushed processed repo to woju/qubes-doc and the tools are in
>> woju/qubesos.github.io. Check them out and merge as you like.
>
> Thank you! Could you give me some example commands using these
> scripts? I'm trying to figure out how to use them on my own, but
> I haven't been very successful so far.

Sure. First of all, cd qubesos.github.io. Then:

find _doc/ -name \*.md | while IFS= read -r file; do echo "$file"; _utils/rewrite-camel-permalinks.pl < "$file" > /tmp/relink; cat /tmp/relink > "$file"; done

Now every file's permalink: is in the lower-hyphen convention, with some caveats: for example, VPN is rewritten as v-p-n. The algorithm does not catch capitalised words. The old permalink is added as the first redirect. Now the results have to be checked manually, and any permalink can be rewritten by hand (there is no need to add another redirect). Then run:

find _doc/ -name \*.md | while IFS= read -r file; do echo "$file" >&2; _utils/get-redirects.pl < "$file"; done > /tmp/redirects

Now /tmp/redirects contains a list of all redirects, one per line, in the form $redirect_from $permalink. The filename /tmp/redirects is important, because it is hardcoded in the next tool. Finally, run:

find _doc/ -name \*.md | while IFS= read -r file; do echo "$file"; _utils/redirect-links.pl < "$file" > /tmp/relink; cat /tmp/relink > "$file"; done

It rewrites all links in [link](uri) format.

The order of the commands is important, since rewriting links in page content depends on redirect_from:, not on another regexp. This allows for manual correction (which would otherwise have to be done twice and be an opportunity for error) and gets rid of the https -> http redirect at the same time.

If you need to rewrite just one file, pipe it to the standard input of the respective tool.
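To show the idea of the last step without the Perl, here is a hypothetical Python sketch: read the "$redirect_from $permalink" pairs, then rewrite the target of every [link](uri) whose uri is an old URL. It assumes link targets are written exactly as they appear in redirect_from (no host, no #anchor) and skips any line that doesn't have exactly two fields.

```python
import re

# [text](target) with no whitespace in the target.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def load_redirects(lines):
    """Parse '<redirect_from> <permalink>' lines into a dict; skip other lines."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def redirect_links(text, mapping):
    """Point every Markdown link whose target is an old URL at the new permalink."""
    def fix(m):
        target = m.group(2)
        return f"[{m.group(1)}]({mapping.get(target, target)})"

    return LINK_RE.sub(fix, text)


if __name__ == "__main__":
    redirects = load_redirects(["/doc/SplitGpg/ /doc/split-gpg/"])
    print(redirect_links("See [Split GPG](/doc/SplitGpg/).", redirects))
```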

@marmarek marmarek added the C: doc label Oct 5, 2015

marmarek commented Oct 5, 2015

Is this ticket completed?

andrewdavidwong commented Oct 6, 2015

No, I haven't had time to do this yet.

andrewdavidwong commented Oct 11, 2015

Thank you, @woju! Your tools were extremely helpful. There are just two more places where we should make changes:

  1. Markdown file names (same as URI change, i.e., from CamelCase to lowercase-hyphen-separated).
  2. Page titles (add spaces in-between capitalized words, e.g., AntiEvilMaid to Anti Evil Maid).

Can your tools be tweaked to make these changes as well?

andrewdavidwong commented Oct 11, 2015

I've made all the changes (except, of course, the ones mentioned in my last message).

However, after looking at the results, I wonder if we really want to prepend /en/ to every subpage. Realistically, how likely is it that any content will get translated? Will enough of it get translated for it to make sense to have both https://www.qubes-os.org/en/ and https://www.qubes-os.org/de/, for example? And even if so, how long will it be before the non-English versions go out of date and out of sync with the more current English version? (Also, consider the security implications. If we have only one <language> speaker in the community who volunteers to translate the pages into <language>, we may not be able to tell whether misinformation is being inserted (deliberately or mistakenly) into the translated pages.)

I can certainly see the benefit of going with /en/doc/ rather than /doc/en/, as @woju suggested, because it allows us to translate things like press releases. But one of the main disadvantages is that now every page with any kind of English on it (in other words, every page) gets redirected from the bare URL to the /en/ version. So, for example, any external site which links to https://www.qubes-os.org/downloads/ is getting redirected to https://www.qubes-os.org/en/downloads/.

So, even if we stick with using /en/ for some pages, it probably makes sense to exempt certain pages, such as:

/
/downloads/
/hcl/
/screenshots/
/people/
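The exemption idea boils down to a small lookup. A hypothetical Python sketch, assuming exactly the five paths listed above stay at their bare URLs and every other page gets a language prefix:

```python
# Paths proposed above to keep at their bare URLs.
EXEMPT = {"/", "/downloads/", "/hcl/", "/screenshots/", "/people/"}


def served_path(path: str, lang: str = "en") -> str:
    """Return where a page would live under a /<lang>/ prefix with exemptions."""
    if path in EXEMPT:
        return path
    return f"/{lang}{path}"


if __name__ == "__main__":
    for p in ["/downloads/", "/doc/split-gpg/"]:
        print(p, "->", served_path(p))
```

Under this rule, an external link to https://www.qubes-os.org/downloads/ would no longer be redirected.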

@marmarek marmarek added C: website and removed C: doc labels Oct 12, 2015

woju added a commit to woju/qubesos.github.io that referenced this issue Oct 13, 2015

woju added a commit to woju/qubesos.github.io that referenced this issue Oct 13, 2015

woju commented Oct 13, 2015

On Sun, Oct 11, 2015 at 12:51:12AM -0700, Axon wrote:

> I've made all the changes (except, of course, the ones mentioned in my last message).

Here are the tools for you: woju/qubesos.github.io@master.

Usage: go to qubesos.github.io/_doc (it is important to be in this directory) and launch ../_utils/rename_to_permalink.py (no shell loop this time...). This script will run "git mv" on every *.md file it finds beneath the directory, renaming it based on the last segment of its permalink.

The tool will not rename directories; it will just point them out as warnings. Currently there are only four, so those can be renamed manually.
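The renaming rule described above can be sketched as a hypothetical Python helper. It is not rename_to_permalink.py itself: it only builds the git mv argument list (which could then be passed to subprocess.run) and keeps the file in its current directory, matching the note that directories are left alone. The UserGuide directory in the example is made up.

```python
import posixpath


def rename_command(md_path: str, permalink: str) -> list:
    """Build a 'git mv' command naming the file after its permalink's last segment."""
    segments = [s for s in permalink.split("/") if s]
    if not segments:
        raise ValueError(f"permalink has no path segments: {permalink!r}")
    new_path = posixpath.join(posixpath.dirname(md_path), segments[-1] + ".md")
    return ["git", "mv", md_path, new_path]


if __name__ == "__main__":
    print(rename_command("UserGuide/SplitGpg.md", "/doc/split-gpg/"))
```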

I didn't want to change titles automagically, because some of them are wrong anyway, so the second tool, ../_utils/find_camel_title.py, will only list all the files that have a wrong title. Please change them manually, maybe to something else, like the equivalent of

inside.
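The exact rule find_camel_title.py uses isn't described here. One plausible heuristic, sketched as a hypothetical Python helper: a title is flagged if it contains no spaces and has a lowercase letter immediately followed by an uppercase one (SplitGpg, AntiEvilMaid).

```python
import re

# A single word with a lowercase-to-uppercase transition somewhere inside it.
CAMEL_TITLE_RE = re.compile(r"\S*[a-z][A-Z]\S*")


def has_camel_title(front_matter: str) -> bool:
    """Return True if the front matter's title: value looks like CamelCase."""
    for line in front_matter.splitlines():
        if line.startswith("title:"):
            title = line.split(":", 1)[1].strip()
            return bool(CAMEL_TITLE_RE.fullmatch(title))
    return False


if __name__ == "__main__":
    for fm in ["title: SplitGpg", "title: Split GPG", "title: Templates"]:
        print(fm, has_camel_title(fm))
```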

> However, after looking at the results, I wonder if we really want to
> prepend /en/ to every subpage. Realistically, how likely is it that
> any content will get translated? Will enough of it get translated for
> it to make sense to have both https://www.qubes-os.org/en/ and
> https://www.qubes-os.org/de/, for example? And even if so, how
> quickly before the non-English versions go out of date and out of sync
> with the more current English version?

Because of this problem, I don't think putting /en as the main directory in _doc was a good idea. I think keeping everything in one tree and just naming files *.en.md, *.pl.md, etc. would be a better idea, because then it would be easier to see which files went out of sync just from the GitHub tree listing.

> (Also, consider the security implications. If we have only one
> <language> speaker in the community who volunteers to translate the
> pages into <language>, we may not be able to tell whether
> misinformation is being inserted (deliberately or mistakenly) into the
> translated pages.)

I don't know, but it seems like a valid concern. @rootkovska, what do you think about this? Maybe you could appoint a maintainer for each language version, who would be personally responsible?

> I can certainly see the benefit of going with /en/doc/ rather than
> /doc/en/, as @woju suggested, because it allows us to translate
> things like press releases. But one of the main disadvantages is that
> now every page with any kind of English on it (in other words, every
> page) gets redirected from the bare URL to the /en/ version. So, for
> example, any external site which links to
> https://www.qubes-os.org/downloads/ is getting redirected to
> https://www.qubes-os.org/en/downloads/.

> So, even if we stick with using /en/ for some pages, it probably makes sense to exempt certain pages, such as:
>
> /downloads/
> /hcl/
> /screenshots/
> /people/

Downloads should be localised more than anything else: there should be a big green "Download" button in as many languages as possible. As for the HCL and people/team pages, I don't know. As for screenshots, @bnvk, could you weigh in?

andrewdavidwong commented Oct 14, 2015

> Here are the tools for you: woju/qubesos.github.io@master.
> [...]

Thank you, @woju!

> Because of this problem I don't think putting /en as main directory in _doc was good idea. I think keeping everything in one tree and just naming things *.en.md, *.pl.md etc would be a better idea, because then would be easier to see which files went out of sync just in github tree listing.

I agree with not having everything in en/, but I wonder if we should just leave the current files as plain .md, then add language codes to any translated files.
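Under that convention (plain name.md for English, name.<code>.md for a translation), spotting missing translations is a simple grouping. A hypothetical Python sketch, assuming two-letter lowercase language codes. It only shows which translations are missing, not which existing ones are stale; that would need something like commit dates.

```python
import re
from collections import defaultdict

# "page.md" -> English; "page.<xx>.md" -> translation with two-letter code xx.
NAME_RE = re.compile(r"(?P<page>.+?)(?:\.(?P<lang>[a-z]{2}))?\.md")


def languages_by_page(filenames):
    """Map each page name to the set of languages it exists in."""
    pages = defaultdict(set)
    for name in filenames:
        m = NAME_RE.fullmatch(name)
        if m:
            pages[m.group("page")].add(m.group("lang") or "en")
    return dict(pages)


def missing_translations(pages, lang):
    """Sorted list of pages that have no file for the given language."""
    return sorted(p for p, langs in pages.items() if lang not in langs)


if __name__ == "__main__":
    files = ["split-gpg.md", "split-gpg.pl.md", "templates.md"]
    print(missing_translations(languages_by_page(files), "pl"))
```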

Then there's the question of how the language should be represented in the URL. I'd like to get people's opinions about the pros and cons of these two options:

de.qubes-os.org/page/

vs.

qubes-os.org/de/page/

(The first way is how Wikipedia handles it.)

Also, one possibility is to have the English version without any language code, then insert a language code for any translated pages (similar to what I said above about the .md files).
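The two URL options, combined with leaving English unprefixed, can be compared side by side in a small hypothetical Python helper. The scheme names "path" and "subdomain" are made up for illustration.

```python
def page_url(page: str, lang: str = "en", scheme: str = "path") -> str:
    """Build a page URL; English keeps the bare URL under either scheme."""
    if lang == "en":
        return f"https://www.qubes-os.org/{page}/"
    if scheme == "subdomain":  # de.qubes-os.org/page/
        return f"https://{lang}.qubes-os.org/{page}/"
    if scheme == "path":  # qubes-os.org/de/page/
        return f"https://www.qubes-os.org/{lang}/{page}/"
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    for scheme in ("path", "subdomain"):
        print(scheme, page_url("doc/split-gpg", "de", scheme))
```

One practical difference worth weighing: each language subdomain needs its own DNS entry and must be covered by the site's TLS certificate, whereas the path scheme works on the existing host.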

marmarek added a commit to QubesOS/qubesos.github.io that referenced this issue Oct 14, 2015

autoupdate: _doc
_doc:
    tag axon_6deb63eb
    tagger Axon <axon@openmailbox.org> 1444793563 +0000

    Tag for commit 6deb63ebcb56d3f4cda69543b8da8128edf34b53
    gpg: Signature made Wed 14 Oct 2015 05:32:43 AM CEST using RSA key ID 2A019A17
    gpg: Good signature from "Axon (Qubes Documentation Signing Key)"

    6deb63e Update page titles from CamelCase to lowercase-hyphen-separated (closes QubesOS/qubes-issues#1205)

woju added a commit to woju/qubesos.github.io that referenced this issue Nov 26, 2015

woju added a commit to woju/qubesos.github.io that referenced this issue Nov 26, 2015
