Simplify Permalink/Permaview URLs #7729

Merged
merged 13 commits into from Nov 21, 2023

CrossEye
Contributor

Closes #7728

We create new functions for encoding and decoding the fragment (hash) portion of a URL, so that what would previously have been encoded as https://tiddlywiki.com/#Foo%20Bar%20Baz can now be encoded as https://tiddlywiki.com/#Foo+Bar+Baz and https://tiddlywiki.com/#Foo%20Bar%20Baz:%5B%5BFoo%20Bar%20Baz%5D%5D%20Qux%20%5B%5BCorge%20Grault%5D%5D can become https://tiddlywiki.com/#Foo+Bar+Baz:Foo+Bar+Baz&Qux&Corge+Grault.

This does not try to capture any additional punctuation. If it appears in a tiddler title, it will still be percent-encoded. There is a reasonably strong argument for keeping intact any punctuation the specification allows, excepting the +, :, and , this format claims. That would be a fairly simple extension of this PR.

Note that this is intended not to break any existing permalinks/permaviews. Those should continue to work as they have. There is one minor, unavoidable caveat. Although TiddlyWiki would not have generated a permalink/permaview that included a + sign, it's possible that someone hand-crafted such a URL for a tiddler with a + in its title. That link would no longer work.

Open Questions

  • Do we want to extend to allowing other characters? All the following characters could be added without breaking the implementation or the spec: -, ., _, ~, !, $, ', (, ), *, ;, =, and @. This would mean many fewer ugly percent-encoded characters.
  • Should we switch the separator for the list portion from &, which is likely to appear in titles, to the much less likely ~?
  • What other formats of fragment identifier does TW support? This only tested for the target and a title list. I have vague memories of seeing other filters used as fragment identifiers in the URL. Am I remembering correctly? If so, how are those formatted?
  • How well does this match TW's coding standards? I don't think I've added any linting errors, although that's hard to tell at the moment. But my personal style uses many more modern JS features than the code-base seems to. Some of those may have slipped in. One specific question: does this project still try to support ES3? Do I need to replace Array.prototype.map and Array.prototype.filter with other implementations?
  • Do you have other suggestions for a contributor new to the code-base?

@pmario
Contributor

pmario commented Sep 10, 2023

  • Do we want to extend to allowing other characters? All the following characters could be added without breaking the implementation or the spec: -, ., _, ~, !, $, ', (, ), *, ;, =, and @. This would mean many fewer ugly percent-encoded characters.

I would be in favour of more readable characters in the links if it creates more human readable URLs

@CrossEye
Contributor Author

CrossEye commented Sep 11, 2023

Do we want to extend to allowing other characters? All the following characters could be added without breaking the implementation or the spec: -, ., _, ~, !, $, ', (, ), *, ;, =, and @. This would mean many fewer ugly percent-encoded characters.

I would be in favour of more readable characters in the links if it creates more human readable URLs

The latest commit adds this feature. Now instead of

https://tiddlywiki.com/#A%20(surprising%3F)%20tiddler%20with%20various%20*special%2Funusual*%20characters%2C%20%26%20more!

We can have

https://tiddlywiki.com/#A_(surprising?)_tiddler_with_various_*special/unusual*_characters,_&_more!

I have tried this with _ instead of + as @AnthonyMuscio recommended, and I agree that I like this much better. I also switched the , separator to a ;, as I think commas are more likely to actually appear in titles.

So for instance, we might have this permaview:

http://tiddlywiki.com/#Foo_Bar_Baz:Foo_Bar_Baz;Qux;Corge_Grault

which is decent; it's hard to imagine better. But for permalinks, this is hard to beat:

http://tiddlywiki.com/#Working_with_TiddlyWiki

Code

The code is structured a little oddly for TW. Part of this is to enable us to easily try various combinations of space characters and separators:

/*
The character that will substitute for a space in the URL
*/
var SPACE = "_";
/*
The character that will separate out the list elements in the URL
*/
var CONJUNCTION = ";";

These -- along with a list of punctuation characters allowed in fragments -- are then used to build a lookup object, and some regexes that are used to remove unwanted percent encoding, and to add our own space encoding.
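
As a rough illustration of the structure described above, here is a minimal sketch. It is not the code from this PR: only SPACE and CONJUNCTION come from the snippet above, while RESTORABLE, RESTORE_LOOKUP, RESTORE_REGEX, encodeTitle, decodeTitle and encodePermaview, along with the particular list of punctuation, are names and choices assumed for the example.

```javascript
// Sketch only: everything except SPACE and CONJUNCTION is hypothetical.
var SPACE = "_";
var CONJUNCTION = ";";

// Punctuation that encodeURIComponent escapes but that this sketch keeps readable.
// The colon and CONJUNCTION are deliberately absent because they act as separators.
var RESTORABLE = ["$", "&", ",", "/", "=", "?", "@"];

// Lookup object from percent-encoded form to literal character, e.g. "%3F" -> "?"
var RESTORE_LOOKUP = {};
for(var i = 0; i < RESTORABLE.length; i++) {
	var code = "%" + RESTORABLE[i].charCodeAt(0).toString(16).toUpperCase();
	RESTORE_LOOKUP[code] = RESTORABLE[i];
}

// Regex matching any of the percent-encoded forms in the lookup
var RESTORE_REGEX = new RegExp(Object.keys(RESTORE_LOOKUP).join("|"), "g");

function encodeTitle(title) {
	return encodeURIComponent(title)
		// Remove the unwanted percent encoding
		.replace(RESTORE_REGEX, function(match) { return RESTORE_LOOKUP[match]; })
		// Add our own space encoding
		.replace(/%20/g, SPACE);
}

function decodeTitle(fragment) {
	// Turn our space character back into %20, then let the built-in decoder do the rest.
	// Note that a literal underscore in a title is indistinguishable from a space here.
	return decodeURIComponent(fragment.split(SPACE).join("%20"));
}

function encodePermaview(target, titles) {
	var encoded = [];
	for(var t = 0; t < titles.length; t++) {
		encoded.push(encodeTitle(titles[t]));
	}
	return encodeTitle(target) + ":" + encoded.join(CONJUNCTION);
}

console.log(encodeTitle("JavaScript: The Good Parts"));
console.log(encodePermaview("Foo Bar Baz", ["Foo Bar Baz", "Qux", "Corge Grault"]));
```

Keeping the two separator characters in named variables is what makes it cheap to experiment with alternatives: the lookup object and regex are derived from them once rather than on every call.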

But this involves having a number of non-function variables inside the _boot closure, to be used by these functions. I would expect that the indirection (which I'm using now to make these values easy to change) will go away, and there will be fewer of these _boot-global variables, but they won't all go away. The regexes and lookup object are extracted, I guess, mostly for performance, but this does not feel at all like an early optimization. However, if we do decide to do this, and there is a real objection to having those, I can inline them once we decide on the space and conjunction values.

Note

I really didn't stress that this is not a panacea. There are plenty of URLs that will still have percent-encoding in them. Probably the worst offender will be the colon (:). To keep backward compatibility, that has to remain the separator between the target tiddler and the list of open tiddlers. But it's a key part of the title of all system tiddlers, and I think it's a common character in titles. So we will still have to deal with percent-encoding:

https://tiddlywiki.com/#JavaScript%3A_The_Good_Parts
https://tiddlywiki.com/#$%3A/DefaultTiddlers

I really wish we could get around that, but I think the break in backward compatibility is too severe.

@CrossEye
Contributor Author

The latest commit (0cabd09) fixes more of the ES6+ stuff added in the previous commit.

It does one other thing. In the forum thread @AnthonyMuscio had this comment:

  • The following additional characters are also valid when encoding with encodeURI: ; / ? : @ & = + $ , #. The + symbol and others are in this subset. So if we modify your proposed change to “not encode these additional characters” the permalinks will look even easier to read.
    • However I have read that not all systems may consider this a valid URI even if only used in the fragment. Thus I believe we need to allow people to opt out or in.

While I didn't get the full details of what he expects to be a problem, I did recognize that trailing sentence-ending punctuation is often not considered part of a link by many programs, assuming that that is part of the surrounding sentence.

Thus

This is a link to https://tiddlywiki.com/#Foo_Bar_Baz!

acts like it has a link to https://tiddlywiki.com/#Foo_Bar_Baz and a separate ! to end the sentence. I correct this by adding a configured trailing character if the title ends in ., ?, !, or that trailing character itself. When decoding I strip off one copy of that trailing character from any URL.

I originally used the tilde (~) as the trailing character, but soon realized that I could reuse underscore (_) and it's prettier. So now the title Foo Bar Baz! is encoded as #Foo_Bar_Baz!_.
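
The trailing-character rule can be sketched in a few lines. This is only an illustration under stated assumptions: TRAILER, addTrailer and stripTrailer are hypothetical names, and the functions operate on a title that has already had its spaces replaced.

```javascript
// Sketch only: TRAILER, addTrailer and stripTrailer are hypothetical names.
var TRAILER = "_";

// Append the trailer when the encoded title ends in sentence-ending
// punctuation, or in the trailer character itself.
function addTrailer(encoded) {
	var last = encoded.charAt(encoded.length - 1);
	if(last === "." || last === "?" || last === "!" || last === TRAILER) {
		return encoded + TRAILER;
	}
	return encoded;
}

// When decoding, strip exactly one copy of the trailer, if present.
function stripTrailer(fragment) {
	if(fragment.charAt(fragment.length - 1) === TRAILER) {
		return fragment.slice(0, -1);
	}
	return fragment;
}

console.log(addTrailer("Foo_Bar_Baz!"));
console.log(stripTrailer("Foo_Bar_Baz!_"));
```

Adding the trailer to titles that already end in the trailer character is what keeps the rule reversible: stripping one copy on decode always undoes exactly what encoding added.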

I haven't tested all the remaining characters yet to see if there are others besides ./?/! which need this treatment. But before I bother, does it make more sense to simply do this for all non-alphanumeric characters?


There's one other major decision. Tony's concern above is probably not only about these sentence-enders. He would like this to be opt-in or opt-out, so that the current URL generation is still available.

I understand the concern, but I'm loath to do that. It certainly is possible, but I think it adds needless complexity.

What do others think?

@CrossEye
Contributor Author

I originally used the tilde (~) as the trailing character, but soon realized that I could reuse underscore (_) and it's prettier. So now the title Foo Bar Baz! is encoded as #Foo_Bar_Baz!_.

And I just realized that this fixes an issue with the current permalink generation, which, while it converts Foo? to #Foo%3F, also converts Foo! to #Foo!. That will fail when used in most places that autolink URLs, because they will treat the ! as outside the link. The code above will turn these into #Foo?_ and #Foo!_, which, while not as pretty as we might like, are not bad.

@pmario
Contributor

pmario commented Sep 19, 2023

@CrossEye -- at Talk you wrote:

It’s not clear, though, if there is any support. This discussion has been mostly you and me, with some technical details from @pmario, some interesting interjections from @oeyoews, and a somewhat mixed-signals message from @jeremyruston. The only interactions on the actual pull request have been a few well-appreciated corrections and suggestions from @pmario. I’m not sure whether to take this as discouragement or only a sign that things may move slowly around here.

I can assure you that this contribution is valuable and appreciated by everyone here.

As you wrote, things move slowly in the core, especially if a change will be as visible to end users as this one is. Once it's merged and published, we have to live with it for a long time.

Making things go away will have side effects for the community, as can be seen with v5.3.0, which lived for only several weeks before v5.3.1 was needed.

So I hope this comment is encouraging.


We will also need to extend the documentation at: https://tiddlywiki.com/#PermaLinks

@saqimtiaz
Contributor

I would also add that core development does tend to slow down for a while after a release, a natural consequence of the work and time commitment that each release takes. Also, I would recommend that for the future, directly creating an issue for a core improvement will move things along faster than posting on the community forums.

I support the proposed change and will try to find time in the near future to review the implementation. Thank you for your work on this.

@pmario
Contributor

pmario commented Sep 19, 2023

There is one thing we have not talked about yet: non-Latin alphabets like Greek, Russian and so on. See the screenshot.

If a tiddler is titled: заметка (note) the current URL is:
https://tiddlywiki5-2vg2v0leq-jermolene.vercel.app/#%D0%B7%D0%B0%D0%BC%D0%B5%D1%82%D0%BA%D0%B0

But using the existing "slugify" mechanism it could be:
https://tiddlywiki5-2vg2v0leq-jermolene.vercel.app/#zametka

image

Since I do not speak Russian or Greek I cannot say if this makes sense. But visually there is a big difference.

Any idea about this one?

@Jermolene
Owner

Thank you @CrossEye, this is an ingenious solution to a long term blemish on TiddlyWiki's usability, and it is a great achievement to do so whilst retaining backwards compatibility.

One minor point is that I think that new definitions in boot.js are not actually used until after all module loading has been completed, and so they can be moved to $:/core/modules/utils/utils.js

  • Do we want to extend to allowing other characters? All the following characters could be added without breaking the implementation or the spec: -, ., _, ~, !, $, ', (, ), *, ;, =, and @. This would mean many fewer ugly percent-encoded characters.

I think that would make sense, yes.

  • Should we switch the separator for the list portion from &, which is likely to appear in titles, to the much less likely ~?

That's a tricky one. Using & is familiar and clear. The tilde character can be hard to distinguish from a dash.

Perhaps the difficulties of making such a change backwards compatible make it impractical?

  • What other formats of fragment identifier does TW support? This only tested for the target and a title list. I have vague memories of seeing other filters used as fragment identifiers in the URL. Am I remembering correctly? If so, how are those formatted?

The format for permalinks is the encoded title of the target tiddler followed by an optional colon and then a filter giving the tiddlers to display in the story (ie a permaview). The two parts can be given independently:

https://tiddlywiki.com/#:[search[jeremy]]
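
To make the two-part format concrete, here is a minimal sketch of splitting a fragment at its first colon. It illustrates the format as described and is not a copy of TiddlyWiki's startup code; parseFragment and the shape of its return value are assumptions made for this example.

```javascript
// Sketch only: parseFragment is a hypothetical helper, not a TiddlyWiki API.
function parseFragment(hash) {
	// Drop the leading "#" if present
	var body = hash.charAt(0) === "#" ? hash.slice(1) : hash;
	var split = body.indexOf(":");
	if(split === -1) {
		// Permalink only: no story filter given
		return {target: decodeURIComponent(body), storyFilter: null};
	}
	return {
		// Either part may be empty, as in "#:[search[jeremy]]"
		target: decodeURIComponent(body.slice(0, split)),
		storyFilter: decodeURIComponent(body.slice(split + 1))
	};
}

console.log(parseFragment("#:[search[jeremy]]"));
console.log(parseFragment("#Foo%20Bar%20Baz:%5B%5BFoo%20Bar%20Baz%5D%5D%20Qux"));
```

Because the split happens at the first colon, any colon that belongs to the target title has to stay percent-encoded, which is why titles such as $:/DefaultTiddlers cannot be made fully readable without breaking existing links.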

  • How well does this match TW's coding standards? I don't think I've added any linting errors, although that's hard to tell at the moment. But my personal style uses many more modern JS features than the code-base seems to. Some of those may have slipped in. One specific question: does this project still try to support ES3? Do I need to replace Array.prototype.map and Array.prototype.filter with other implementations?

We don't use map/filter/reduce, sadly. There has been some recent discussion about updating the JS dialect we use, but it has not been resolved yet.

  • Do you have other suggestions for a contributor new to the code-base?

This is a great PR, the top post is nice and clear, and the code has the hallmarks of being written for ease of comprehension, good stuff.

@CrossEye
Contributor Author

Damn, I wrote a long response to everyone, in great detail, and think I must have left it in Preview mode when my machine needed to reboot. I will create it again soon, but I don't have the heart right now.

In short, though, thanks for the encouragement! I will make (most of) the requested changes for my next commit.

@CrossEye
Contributor Author

@pmario:

I can assure you that this contribution is valuable and appreciated by everyone here.

So I hope this comment is encouraging

It is, very much so. Thank you very much!

We will also need to extend the documentation at: https://tiddlywiki.com/#PermaLinks

Oops. I remembered to do this in my previous PR, but that was a tiny bug fix. Can't believe I forgot it here. I will include this in my next commit. (Tomorrow, I'm hoping.)

@CrossEye
Contributor Author

@saqimtiaz:

I would also add that core development does tend to slow down for a while after a release, a natural consequence of the work and time commitment that each release takes.

I do understand. I created and help maintain a large project with varied periods of frenetic activity and others of relative quiescence.

Also, I would recommend that for the future, directly creating an issue for a core improvement will move things along faster than posting on the community forums.

Ok, I will probably get to that point eventually. But I don't have enough experience with the core to get a good sense of whether something is useful or has been considered and rejected or is extremely controversial. I'll get there, though.

I support the proposed change and will try to find time in the near future to review the implementation. Thank you for your work on this.

Thank you.

@CrossEye
Contributor Author

@pmario:

There is one thing we have not talked about yet: non-Latin alphabets like Greek, Russian and so on. See the screenshot.

If a tiddler is titled: заметка (note) the current URL is:
https://tiddlywiki5-2vg2v0leq-jermolene.vercel.app/#%D0%B7%D0%B0%D0%BC%D0%B5%D1%82%D0%BA%D0%B0

But using the existing "slugify" mechanism it could be:
https://tiddlywiki5-2vg2v0leq-jermolene.vercel.app/#zametka

I think there's a real problem that this is not reversible. If we had a tiddler named заметка and one named zametka, which one would we choose to show? It's the same problem I mentioned on Talk, where I asked which of the following three existing tiddlers #more-coffee should link to: More Coffee?, More Coffee!, or MORE COFFEE!!!?
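
The loss of information is easy to demonstrate with any slug function. The sketch below uses a deliberately naive one, naiveSlug, which is a stand-in invented for this example and not TiddlyWiki's own slugify mechanism.

```javascript
// Sketch only: naiveSlug is a hypothetical stand-in, not TiddlyWiki's slugify.
function naiveSlug(title) {
	return title
		.toLowerCase()
		// Collapse every run of non-alphanumerics into one dash
		.replace(/[^a-z0-9]+/g, "-")
		// Trim leading and trailing dashes
		.replace(/^-+|-+$/g, "");
}

var titles = ["More Coffee?", "More Coffee!", "MORE COFFEE!!!"];
var bySlug = {};
for(var i = 0; i < titles.length; i++) {
	var slug = naiveSlug(titles[i]);
	bySlug[slug] = (bySlug[slug] || []).concat(titles[i]);
}

// All three titles end up under the same key, so the slug alone
// cannot say which tiddler was meant.
console.log(bySlug);
```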

But this is a serious problem you raise. My proposal will make for cleaner URLs, but only for those using the Latin alphabet. That's related to the specifications for URL fragment strings. It might be interesting to see if we can find a way to do this more readably, even if the generated URLs are not legally allowed, if they would work in any reasonable place that people use URLs. I've been mostly focused on the space and other punctuation, but perhaps if we do just generate

https://tiddlywiki.com/#заметка

and so long as the characters are letters, it would just work. I will have to think about this more.

@CrossEye
Contributor Author

Thank you @CrossEye, this is an ingenious solution to a long term blemish on TiddlyWiki's usability, and it is a great achievement to do so whilst retaining backwards compatibility.

😊 Thank you. I should reiterate that there is one backward-incompatible situation: if the user has lying around a permalink/permaview that was handcrafted and contains an _, then we will fail to decode it properly. This shouldn't affect permalinks actually generated by TW. I think this is likely to be an extremely rare case, but we should note it.

One minor point is that I think that new definitions in boot.js are not actually used until after all module loading has been completed, and so they can be moved to $:/core/modules/utils/utils.js

While I have no problem moving them, the existing code that I'm modifying/replacing is in boot.js. I don't know the lifecycle well enough. Clearly the encoding part will happen late enough, but what about the decoding? Have we already loaded everything at the time we try to parse the incoming fragment?

Should we switch the separator for the list portion from &, which is likely to appear in titles, to the much less likely ~?

That's a tricky one. Using & is familiar and clear. The tilde character can be hard to distinguish from a dash.

The latest incarnation uses _ for the space (also appended after trailing sentence-ending punctuation) and uses a semicolon between list items. I think the semicolon is much less likely in titles than the comma or ampersand, although I have no evidence to back that up. The biggest disappointment is the colon. That is much more likely to appear in titles, but we can't do anything about it without badly breaking backwards compatibility.

The format for permalinks is the encoded title of the target tiddler followed by an optional colon and then a filter giving the tiddlers to display in the story (ie a permaview). The two parts can be given independently:

https://tiddlywiki.com/#:[search[jeremy]]

I will have to do more testing with these, but am not expecting any problems.

We don't use map/filter/reduce, sadly. There has been some recent discussion about updating the JS dialect we use, but it has not been resolved yet.

Understood. The Ramda library I founded and still sporadically maintain supports ES3!

I guess I have some refactoring to do. Are there TW functions I can use in place of these? Could I write my own plain functional versions of them? Or should I just resort to for/while loops?

This is a great PR, the top post is nice and clear, and the code has the hallmarks of being written for ease of comprehension, good stuff.

Thank you. I'm happy to hear it.

@CrossEye
Contributor Author

A different idea altogether

I will try to make another commit tonight to address some issues, but I just had a thought about an entirely different permalink/view approach and wanted to see what people thought.

I want to make it clear up front that I still prefer the version I've been coding. But I think this alternative is compelling enough to deserve some discussion.

Use a short SHA-1 hash

We can make simple URLs that are not transparent but are short, easy to type, and would work everywhere by using one-way hashes of the titles and then shortening them to a useful length. For instance, The First Rule of Using TiddlyWiki would convert with the SHA-1 algorithm to d8d158cb0dc0b0ab8baf69a29cf2b33c328abfa9, which we could shorten to d8d158cb0d. (In a wiki with 5000 tiddler titles, the chance of there being any collisions at all is approximately one in one hundred thousand. If that scared us, we could use more characters.) With this, we might have a permalink like https://tiddlywiki.com/#/d8d158cb0d. If we wanted one a little easier to type, we could break it up with a punctuation character: https://tiddlywiki.com/#/d8d-158-cb0d
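
The collision estimate can be checked with the usual birthday approximation, p ≈ 1 − exp(−n(n−1)/(2N)), where n is the number of titles and N = 16^k is the number of possible k-digit hex prefixes. The sketch below assumes the hashes behave like uniformly random values; collisionProbability is a name made up for this example.

```javascript
// Sketch only: assumes truncated hashes are uniformly distributed.
function collisionProbability(titleCount, hexDigits) {
	var space = Math.pow(16, hexDigits);              // N: number of possible prefixes
	var pairs = titleCount * (titleCount - 1) / 2;    // number of title pairs
	// 1 - exp(-pairs / space), computed with expm1 to keep precision for tiny values
	return -Math.expm1(-pairs / space);
}

var p = collisionProbability(5000, 10);
console.log(p, "about one in", Math.round(1 / p));
```

For 5000 titles and 10 hex digits this comes out at a little over 1 in 100,000 (roughly 1.1e-5), the same order of magnitude as the figure quoted above. Each extra hex digit divides the probability by about 16.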

On loading with that URL, TW would scan tiddlers for one with a title whose SHA-1 title hash starts with that value. (If we use this elsewhere besides on load we should pre-cache the values.)
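
A minimal sketch of that lookup, with the pre-cached index mentioned above, might look like this. It assumes Node.js and its built-in crypto module, which is not how TiddlyWiki would compute hashes in the browser; shortHash, buildHashIndex and resolveHashFragment are hypothetical names.

```javascript
// Sketch only: runs under Node.js; all function names are hypothetical.
var crypto = require("crypto");

function shortHash(title, length) {
	return crypto.createHash("sha1").update(title, "utf8").digest("hex").slice(0, length);
}

// Pre-cache: map each shortened hash to the titles that produce it
function buildHashIndex(titles, length) {
	var index = Object.create(null);
	for(var i = 0; i < titles.length; i++) {
		var key = shortHash(titles[i], length);
		index[key] = (index[key] || []).concat(titles[i]);
	}
	return index;
}

// Resolve a fragment such as "/d8d158cb0d" or "/d8d-158-cb0d".
// Returns null if the fragment is not in this style, is unknown, or is ambiguous.
function resolveHashFragment(fragment, index) {
	if(fragment.charAt(0) !== "/") {
		return null;
	}
	var key = fragment.slice(1).replace(/-/g, ""); // dashes are purely cosmetic
	var matches = index[key] || [];
	return matches.length === 1 ? matches[0] : null;
}

var titles = ["The First Rule of Using TiddlyWiki", "HelloThere", "PermaLinks"];
var index = buildHashIndex(titles, 10);
var key = shortHash("HelloThere", 10);
console.log(resolveHashFragment("/" + key, index));
```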

This could be done in an entirely backwards-compatible manner if we signal this style url fragment with an initial character we wouldn't otherwise create in a URL. Here / makes the most sense, as it's widely seen around the web for navigating within single-page apps. (Ok, the same caveat applies to pre-existing hand-crafted URL fragments starting with /, but that's still ignorable.)

Possible extension

A possible extension of this idea would be to store this slug as a field in the tiddler, to be updated when the title changes. (I don't know just how fraught that might be, though.) But this could also let users override that slug for custom needs. (https://tiddlywiki.com/#/first-rule) There are several complexities here, and I don't know if it would be worth it.

Downsides

There are two significant downsides I see:

  • There is a noted lack of transparency: You're going somewhere, but who knows where? If this slug is not found, you do not have the nice signal such as

    Missing tiddler "The First Rule of Using TiddlyWik" – click to create

    which might make it clear that you dropped a character when copying the URL. And this is simply nicer to read: https://tiddlywiki.com/#The_First_Rule_of_Using_TiddlyWiki than is any variant of https://tiddlywiki.com/#/d8d158cb0d.

  • Although this could convert permalinks the way my current one does, with a hash for each title, there is no clear extension mechanism for an arbitrary filter. It probably could not extend to covering something equivalent to https://tiddlywiki.com/#:[tag[Reference]] or, these days, https://tiddlywiki.com/#:%5Btag%5BReference%5D%5D

Question

Is this worth pursuing? I want to finish up the approach I'm working on. But is there something more compelling about this alternative that makes it worth following up in parallel?

@Jermolene Jermolene merged commit 145a8d6 into Jermolene:master Nov 21, 2023
4 checks passed
@CrossEye CrossEye deleted the plus-for-space-in-permalink branch November 21, 2023 15:10
@CrossEye
Contributor Author

@Jermolene:

Thank you again for all your hard work on this @CrossEye

It was very much my pleasure. I learned a lot about TW in doing it!

@AnthonyMuscio
Contributor

I am afraid I could not follow this whole process and change, but I just took the preview to test it, and see the safe _ was used.

I just created a tiddler with the underscore in the name and it creates a separate tiddler, but a permalink will only open the one with spaces. And fail if that is deleted, even if the underscore version is available.

  • At a minimum this needs to be well documented.
  • My concern is this is technically not backward compatible.
    • If someone made extensive use of underscores in their titles for this very purpose, of readable URLs, in the past, it could be a disaster for their wiki. Although they could batch change them, as long as they do not also have space-separated titles.
    • At the very least we should provide a way for them to defeat or turn off this new behaviour, even if it is hacky.

I have not explored the use of other characters at this point.

Nevertheless, thanks @CrossEye for your effort.

@CrossEye
Contributor Author

@AnthonyMuscio:

I just created a tiddler with the underscore in the name and it creates a separate tiddler, but a permalink will only open the one with spaces. And fail if that is deleted, even if the underscore version is available.

Yeah, this is a flaw I should have seen. I will try a trick to alleviate it. Probably tomorrow, but maybe not until Sunday.

We knew that this was not entirely backward compatible, but this is a bigger gap than expected. I think I can get this close enough not to matter as a practical concern.

At the very least we should provide a way for them to defeat or turn off this new behaviour, even if it is hacky.

I'm quite loath to do this, unless absolutely necessary. Technically it would not be very difficult, but maintaining both behaviors feels like bloat.

@pmario
Contributor

pmario commented Nov 24, 2023

I think it should be easy to test whether a tiddler with spaces exists and then test whether tiddlers with underscores exist. The only problem is that every combination has to be tested, starting with the one that has underscores instead of spaces, which is the most likely one.

@Jermolene
Owner

Hi @CrossEye. I'm afraid I'm going to revert this PR for the moment due to the backwards compatibility issues. We'll return to it for v5.3.3

Jermolene added a commit that referenced this pull request Nov 24, 2023
@Jermolene
Owner

Reverted in 0716ed4

I should add that even if this were to be fixed now, I think it would be unwise to rush it into this release. I would also prefer to have tests for the encoding/decoding functions.

@CrossEye
Contributor Author

@pmario:

I think it should be easy to test whether a tiddler with spaces exists and then test whether tiddlers with underscores exist. The only problem is that every combination has to be tested, starting with the one that has underscores instead of spaces, which is the most likely one.

I'd really rather avoid that. I believe the URL conversion should not depend on the actual tiddler names in the wiki. I'm thinking that I can double-up underscores to represent single underscores. There is still a backward incompatibility, but it is a much less likely scenario: if you have spaces next to underscores in titles, then there is an ambiguity between <underscore><space> and <space><underscore>, which both would map to a triple underscore. And any other combination of the two would have similar ambiguities. But I think finding wikis where people are using combinations of these two is a vanishingly small set.
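
A minimal sketch of the doubling idea, and of the ambiguity it leaves behind, under the assumption that only spaces and underscores are being handled (the rest of the encoding is omitted). The names encodeSpaces and decodeSpaces are hypothetical.

```javascript
// Sketch only: handles just spaces and underscores; names are hypothetical.
function encodeSpaces(title) {
	// A space becomes "_", a literal underscore becomes "__"
	return title.replace(/[_ ]/g, function(ch) {
		return ch === "_" ? "__" : "_";
	});
}

function decodeSpaces(fragment) {
	// Read left to right: a doubled underscore is a literal one, a single one is a space
	return fragment.replace(/__|_/g, function(match) {
		return match === "__" ? "_" : " ";
	});
}

console.log(encodeSpaces("Status WONT_FIX today")); // Status_WONT__FIX_today

// The ambiguity: both of these encode to the same triple underscore
console.log(encodeSpaces("a _b"), encodeSpaces("a_ b"));
```

With this left-to-right decoder a triple underscore always decodes as underscore then space, so a title containing a space followed by an underscore does not survive a round trip. That is the narrow incompatibility described above.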

If we don't want to accept this incompatibility, then I would prefer to go with Tony's suggestion of a toggle for the behavior, so that you can revert to percent-encoding. I don't like the idea of continuing to support both, but I'm far too new to the dev community to argue too hard against it. And I would definitely prefer it to testing for the existence of any of the 2^n potential matching tiddlers.
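
To show where the exponential count comes from, here is a small sketch that enumerates every title a fragment could stand for if each underscore might be either a space or a literal underscore. candidateTitles is a hypothetical helper written only for this illustration.

```javascript
// Sketch only: candidateTitles is a hypothetical helper.
function candidateTitles(fragment) {
	var parts = fragment.split("_");
	var n = parts.length - 1; // number of underscores in the fragment
	var results = [];
	// Each bit of mask decides one underscore: 0 means space, 1 means literal underscore
	for(var mask = 0; mask < Math.pow(2, n); mask++) {
		var title = parts[0];
		for(var i = 0; i < n; i++) {
			title += ((mask >> i) & 1 ? "_" : " ") + parts[i + 1];
		}
		results.push(title);
	}
	return results;
}

console.log(candidateTitles("a_b_c"));
console.log(candidateTitles("Team_decided_to_WONT_FIX_the_issue").length);
```

A fragment with six underscores already yields 64 candidates, each of which would need an existence check against the wiki.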

@Jermolene:

Hi @CrossEye. I'm afraid I'm going to revert this PR for the moment due to the backwards compatibility issues. We'll return to it for v5.3.3

Of course. I'm disappointed, but it's definitely the right call.

I would also prefer to have tests for the encoding/decoding functions.

Yes, I did forget to ask about that. I didn't see tests anywhere near the root of the repo, so I thought there was probably not much of a test suite. Now I see there's a dedicated edition -- a fascinating concept! When I revisit this, I'll convert my personal mocha tests to match the style.

@AnthonyMuscio
Contributor

AnthonyMuscio commented Nov 25, 2023

Would it not be adequate to simply test for a tiddler with underscores and open it, otherwise open the equivalent with spaces as if it were underscores?

I.e., if someone has gone out of their way to use underscores in the tiddler title, it will open them. If not, they will have a link generated from the new permalink (now containing underscores) that opens the tiddler of the same name with spaces.

To me this is adequate if documented, it even allows underscore tiddlers to be generated specifically to intercept a link incoming with underscores if desired. As I suggested earlier this will allow you to maintain a destination url even when the tiddler title needs to change within the wiki.

  • Trust me, this is of great value, allowing a form of redirect for SEO and link maintenance.

@CrossEye
Contributor Author

CrossEye commented Nov 26, 2023

@AnthonyMuscio:

Would it not be adequate to simply test for a tiddler with underscores and open it, otherwise open the equivalent with spaces as if it were underscores?

There's no perfect solution. I would live with this if we had to, but as I replied to @pmario, I believe the URL conversion should not depend on the actual tiddler names in the wiki.

It also would have a problem with a tiddler title like "Team decided to WONT_FIX the issue", as neither "Team decided to WONT FIX the issue" nor "Team_decided_to_WONT_FIX_the_issue" actually exists. But my main objection is simply philosophical. The job of this code is to encode/decode between URLs and titles/title lists. It should do the job the same way regardless of the actual tiddler titles in the wiki.

My current thinking, which seems quite possible, and I will try it soon, is to convert this to

Team_decided_to_WONT__FIX_the_issue
                     ^--- double underscore

and then possibly add the flag you were suggesting earlier, the one I didn't and still don't really like, but which would let people exit this new behavior if they have a need to mix underscores and spaces together or have titles with consecutive spaces.

To me this is adequate if documented, it even allows underscore tiddlers to be generated specifically to intercept a link incoming with underscores if desired. As I suggested earlier this will allow you to maintain a destination url even when the tiddler title needs to change within the wiki.

This is a fascinating concept. I can definitely see the uses for it. But I think if we want this, we should have an explicit mechanism for it, and not depend upon a convention about spaces and underscores.

To me that mechanism sounds simple both to write and to configure. It would probably run before the decode above and might use dictionary tiddlers something like this:

title: SEO Animals
tags: $:/tags/TitleMapping
type: application/x-tiddler-dictionary

Horses: Mammals - Equine
Dolphins: Mammals - Delphinidae
Dogs: Mammals - Canine

or the equivalent in JSON. (Please excuse any lapses in biological taxonomy!)
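
As a rough sketch of how such a mapping might be consulted before the normal decode, assuming the dictionary text has already been read from the tagged tiddlers. parseDictionary and resolveTitle are hypothetical names, the key/value parsing is a simplification, and the decode function is passed in rather than being any real TiddlyWiki API.

```javascript
// Sketch only: hypothetical helpers, simplified "key: value" parsing.
function parseDictionary(text) {
	var mapping = Object.create(null);
	var lines = text.split(/\r?\n/);
	for(var i = 0; i < lines.length; i++) {
		var split = lines[i].indexOf(":");
		if(split > 0) {
			mapping[lines[i].slice(0, split).trim()] = lines[i].slice(split + 1).trim();
		}
	}
	return mapping;
}

// Check the explicit mapping first; fall back to the ordinary decoding otherwise
function resolveTitle(incoming, mapping, decode) {
	return incoming in mapping ? mapping[incoming] : decode(incoming);
}

var mapping = parseDictionary([
	"Horses: Mammals - Equine",
	"Dolphins: Mammals - Delphinidae",
	"Dogs: Mammals - Canine"
].join("\n"));

// Stand-in for the real fragment decoder
function simpleDecode(fragment) {
	return fragment.replace(/_/g, " ");
}

console.log(resolveTitle("Horses", mapping, simpleDecode));
console.log(resolveTitle("Working_with_TiddlyWiki", mapping, simpleDecode));
```

Keeping the redirect table explicit means the encode/decode functions stay independent of which tiddlers happen to exist.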

@pmario
Contributor

pmario commented Nov 26, 2023

..and then possibly add the flag you were suggesting earlier, the one I didn't and still don't really like, but which would let people exit this new behavior if they have a need to mix underscores and spaces together or have titles with consecutive spaces.

There should be no option to deactivate the new behaviour. IMO it only creates confusion.

@CrossEye
Contributor Author

@pmario:

There should be no option to deactivate the new behaviour. IMO it only creates confusion.

That would be my preference too. I would only be willing to do this if it's the only way to make this change widely acceptable. Otherwise we should treat it as a break with past behavior.

Here there's significant backward compatibility because -- except in some narrow cases we've been discussing -- older permalinks will still keep working. New ones would look different, true, but that doesn't cause issues with the existing ones.

@pmario
Contributor

pmario commented Nov 26, 2023

Here there's significant backward compatibility because -- except in some narrow cases we've been discussing -- older permalinks will still keep working.

IMO there is another way to handle it. As soon as there is an underscore in the tiddler title, use the "old behaviour" for this tiddler.
It would create an ugly, but compatible link.

So users with underscores in their titles can change them if they want or stay with the old behaviour. -> Done.

@AnthonyMuscio
Contributor

This seems far simpler to me than the responses raise. If a url is given, then open the tiddler to which it refers whether it has underscores or not. Now if such a tiddler is not found as per this activity, look to open the same tiddler where the underscores are replaced with spaces.

A wiki already using underscored tiddlers (for whatever reason) will still work, and new-style permalinks, in which spaces have been converted to underscores, will also open the required tiddler.

  • The only side effect is that if you have both an underscored tiddler and an underscored permalink, the underscored tiddler wins.

This side effect is unlikely to cause problems because it only affects legacy wikis: it becomes a question only if you upgrade to a wiki containing these new permalinks, and the preference there is for backward compatibility.

However, this side effect can be documented and called out as a feature, so that it is not a problem at all.

  • Any permalink or permaview URL that is generated and published is stored elsewhere, "outside the current wiki's control". However, it is within the wiki's ability to maintain the effectiveness of such links, so that no published link ends up being a dead link. If an underscored tiddler exists, or a permalink using this mechanism is used, we want the underscored URL to always be valid.
  • However, if you use the new mechanism and publish an underscored link to a space-delimited tiddler (highly likely), there is a chance you will want to rename the tiddler one day; Relink may even do it for you. Your published link is then a dead one, because it can no longer find the space-delimited version.
  • In such cases, however, it is trivial to maintain a tiddler with the original title in which spaces are replaced with underscores. It will be valid for eternity, because any incoming links will find this tiddler. This tiddler can redirect the user, present a note, simply transclude the now-renamed tiddler, or even give a deprecated-link message.

My preference here is driven by a reasonably deep understanding of search engine optimisation, maintaining valid links, website publishing, and more, so please take my appeal seriously, especially since this is a core change that sets a standard we will need to maintain indefinitely to support backward compatibility.

@CrossEye
Contributor Author

@pmario:

IMO there is another way to handle it. As soon as there is an underscore in the tiddler title, use the "old behaviour" for this tiddler.
It would create an ugly, but compatible link.

I wish we could. The reason that we're having this discussion, though, is that the "old behavior" does not do anything to the underscore characters. They were not percent-encoded. So we would have no way to distinguish between Foo Bar Baz and Foo_Bar_Baz.

@CrossEye
Contributor Author

@AnthonyMuscio:

This seems far simpler to me than the responses raise. If a url is given, then open the tiddler to which it refers whether it has underscores or not. Now if such a tiddler is not found as per this activity, look to open the same tiddler where the underscores are replaced with spaces.

That reduces the problem, but doesn't eliminate it. Again, if we were to try to load the URL https://tiddlywiki.com/#Team_decided_to_WONT_FIX_the_issue, we could well find that there is no tiddler titled "Team_decided_to_WONT_FIX_the_issue" and so try to load "Team decided to WONT FIX the issue", which will also fail because the original tiddler was titled "Team decided to WONT_FIX the issue" (note the single underscore still used). Then we have to decide whether it's all-or-none, or whether we want to search for all 2^6, or 64, possible combinations, including such monstrosities as "Team_decided to WONT_FIX_the issue".

Again, maybe it's mostly a philosophical issue, but I think that we really would prefer our encoded fragment to represent a single title, and not multiple ones depending upon what titles happen to be present in the current wiki.
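
The combinatorial growth described above can be illustrated with a small sketch. The function name `candidateTitles` is hypothetical; it simply treats every underscore in the fragment as either a literal underscore or a space:

```javascript
// Hypothetical helper: list every title a fragment could stand for if each
// "_" may be either a real underscore or an encoded space.
function candidateTitles(fragment) {
    var parts = fragment.split("_");
    var results = [parts[0]];
    for(var i = 1; i < parts.length; i++) {
        var next = [];
        results.forEach(function(prefix) {
            next.push(prefix + "_" + parts[i]); // keep the underscore
            next.push(prefix + " " + parts[i]); // treat it as a space
        });
        results = next;
    }
    return results;
}

var candidates = candidateTitles("Team_decided_to_WONT_FIX_the_issue");
console.log(candidates.length);
// 64
console.log(candidates.indexOf("Team decided to WONT_FIX the issue") !== -1);
// true
```

The count doubles with every underscore in the fragment, and which candidate is "right" depends on the titles present in the wiki.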

This side effect is unlikely to cause problems because it only affects legacy wikis: it becomes a question only if you upgrade to a wiki containing these new permalinks, and the preference there is for backward compatibility.

The backward compatibility I've been striving for is simply that legacy permalinks/views should continue to properly load the appropriate tiddlers. You have correctly demonstrated that there was a potentially larger class of permalinks for which this would fail with my first code. I'm simply trying to reduce the damage. Your suggestion gets us part-way there. It may be our best bet, but it introduces an intellectual overhead I'd like to avoid.

@saqimtiaz
Contributor

@CrossEye I think the goal needs to be ensuring that a permalink only maps to a single tiddler title while maximizing backwards compatibility. We should reconsider the original proposal of using the + character as the encoding for a space.

@Jermolene
Owner

Hi @CrossEye thanks for your patience with this. To add to @saqimtiaz's comment, it is also very important that neither the encoding nor decoding should depend on the state of the tiddler store. In other words, encode/decode must be pure functions.
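
As an illustration of what pure encode/decode functions could look like with `+` standing for a space, here is a minimal sketch. The names `encodeFragmentTitle` and `decodeFragmentTitle` are hypothetical and this is not the code from the PR; the point is only that neither function consults the tiddler store:

```javascript
// Hypothetical sketch: the output depends only on the input string.
function encodeFragmentTitle(title) {
    // encodeURIComponent turns " " into "%20" and a literal "+" into "%2B",
    // so after this replace every "+" in the output stands for a space.
    return encodeURIComponent(title).replace(/%20/g, "+");
}

function decodeFragmentTitle(fragment) {
    // Turn "+" back into spaces first, then undo the percent-encoding.
    return decodeURIComponent(fragment.replace(/\+/g, " "));
}

console.log(encodeFragmentTitle("Foo Bar Baz")); // Foo+Bar+Baz
console.log(encodeFragmentTitle("Foo+Bar"));     // Foo%2BBar
console.log(decodeFragmentTitle("Foo+Bar+Baz")); // Foo Bar Baz
```

Because a literal `+` is always percent-encoded on the way out, each fragment maps back to exactly one title.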

@pmario
Contributor

pmario commented Nov 27, 2023

We should reconsider the original proposal of using the + character as the encoding for a space.

I think with a + my suggestion from #7729 (comment) could work.

@AnthonyMuscio
Contributor

If spaces are changed to underscores in the custom permalinks and existing underscores remain untouched, there is no complexity here.

Test for the incoming tiddler title, and if it does not exist, translate it to the new permalink form and open that.

munnox pushed a commit to munnox/TiddlyWiki5 that referenced this pull request Dec 24, 2023
* Simplify Permalink/Permaview URLs

* Fix lint warnings by removing arrow functions

* Remove commented sample code

* Remove post-ES5 code

* Add many more allowable non-percent-encodedcharacters

* Fix more ES6+ stuff, add end-of-sentence padding character.

* Fix to match standards

* Move the new code from boot to util

* Change from custom map/filter to $tw.utils.each

* Make `each` blocks multi-line

* Move the permalink handling to its own file

* Remove auto-navigation

* Revert "Remove auto-navigation"

This reverts commit ca1e5cf.
@pmario
Contributor

pmario commented Jan 23, 2024

@CrossEye -- I think this PR still has value. We should change the SPACE_SUBSTITUTE from underscore _ back to plus +, which is probably less likely to be in a tiddler title.

On the other hand, it will never be backwards compatible, since we only restrict 5 characters | [ ] { } from titles.

@Jermolene -- If we would like to go with +, we should also add "+" to the warning in tiddler titles and see what happens. We may want to do that sooner rather than later, to get some feedback.

@CrossEye
Contributor Author

CrossEye commented Feb 3, 2024

@pmario:

I think this PR still has value. We should change the SPACE_SUBSTITUTE from underscore _ back to plus +, which is probably less likely to be in a tiddler title.

Yes, that was my thought too. I expect to get to this next weekend, if not before.

On the other hand, it will never be backwards compatible, since we only restrict 5 characters | [ ] { } from titles.

Because the current mechanism converts + to %2B, the only backward incompatibility will be if someone handcrafted a URL to a tiddler using + rather than %2B. For example, if I have a tiddler Foo+Bar, TW would give a permalink of https://tiddlywiki.com/#Foo%2BBar. But if the user happened to hand-create the URL https://tiddlywiki.com/#Foo+Bar, TiddlyWiki currently would happily load Foo+Bar. With this change, it would try to load Foo Bar.

This is a much smaller issue than what happens with _, which isn't converted.
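
The difference can be seen in a small comparison. This is a sketch that assumes the existing behavior amounts to plain percent-decoding; `legacyDecode` and `plusDecode` are hypothetical names:

```javascript
// Assumed legacy behaviour: plain percent-decoding of the fragment.
function legacyDecode(fragment) {
    return decodeURIComponent(fragment);
}

// Proposed behaviour (sketch): "+" stands for a space.
function plusDecode(fragment) {
    return decodeURIComponent(fragment.replace(/\+/g, " "));
}

// A generated permalink for the title "Foo+Bar" is unaffected by the change.
var generated = encodeURIComponent("Foo+Bar");
console.log(generated);               // Foo%2BBar
console.log(legacyDecode(generated)); // Foo+Bar
console.log(plusDecode(generated));   // Foo+Bar

// Only a hand-crafted fragment with a raw "+" changes meaning.
console.log(legacyDecode("Foo+Bar")); // Foo+Bar
console.log(plusDecode("Foo+Bar"));   // Foo Bar

// The underscore was never percent-encoded, which is why "_" is the harder case.
console.log(encodeURIComponent("Foo_Bar")); // Foo_Bar
```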

If we would like to go with +, we should also add "+" to the warning in tiddler titles and see what happens. We may want to do that sooner rather than later, to get some feedback.

That isn't necessary. We will continue to encode + as %2B. We are trying to make as many URLs as possible more readable, but there will still be percent encoding for many characters.

@pmario
Contributor

pmario commented Feb 3, 2024

That isn't necessary. We will continue to encode + as %2B. We are trying to make as many URLs as possible more readable, but there will still be percent encoding for many characters.

I personally removed spaces from my tiddler titles and use "titles-with-hyphens" instead. So with the new mechanism I would probably be able to go back to having "titles with spaces" again -- maybe :)

IMO you should change the char to + and create a new PR.

CrossEye mentioned this pull request Feb 15, 2024
@CrossEye
Contributor Author

New version in #7990
