Finalise wiki link format #337

Closed
Jermolene opened this issue Jan 11, 2014 · 21 comments

@Jermolene
Owner

TiddlyWiki5 currently uses the same regular expression for matching WikiLinks as TiddlyWiki Classic:

var textPrimitives = {
    upperLetter: "[A-Z\u00c0-\u00de\u0150\u0170]",
    lowerLetter: "[a-z0-9_\\-\u00df-\u00ff\u0151\u0171]",
    anyLetter:   "[A-Za-z0-9_\\-\u00c0-\u00de\u00df-\u00ff\u0150\u0170\u0151\u0171]",
    anyLetterStrict: "[A-Za-z0-9\u00c0-\u00de\u00df-\u00ff\u0150\u0170\u0151\u0171]"
};

textPrimitives.unWikiLink = "~";
textPrimitives.wikiLink = textPrimitives.upperLetter + "+" +
    textPrimitives.lowerLetter + "+" +
    textPrimitives.upperLetter +
    textPrimitives.anyLetter + "*";

In plain language:

  • An uppercase letter is defined as the letters A-Z plus the characters: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞŐŰ
  • A lowercase letter is defined as the letters a-z and the digits 0-9 plus the characters: underscore, minus, and ßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿőű
  • Any letter is defined as either an uppercase letter or a lowercase letter
  • A link mustn't be preceded by any letter (except minus and underscore)
  • A link is:
    • one or more uppercase letters
    • followed by one or more lowercase letters
    • followed by one upper case letter
    • followed by any combination of upper and lowercase letters

There are several obvious problems:

  • The characters × and ÷ shouldn't be classified as letters
  • The rule that allows links to be preceded by minus or underscore means that the string "HelloThere" in the text "something_HelloThere" is wikified, which seems incorrect
  • Allowing the minus sign as a lower case letter means that many compound nouns in German are erroneously rendered as links
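
To make these problems concrete, here is a minimal sketch that applies the character classes quoted above to a few strings. The `findWikiLinks` helper is hypothetical and is not TiddlyWiki's parser; it assumes the "must not be preceded by a letter" check uses `anyLetterStrict`, which is what the plain-language rules suggest.

```javascript
// Character classes copied from the snippet quoted in this issue.
const textPrimitives = {
    upperLetter: "[A-Z\u00c0-\u00de\u0150\u0170]",
    lowerLetter: "[a-z0-9_\\-\u00df-\u00ff\u0151\u0171]",
    anyLetter: "[A-Za-z0-9_\\-\u00c0-\u00de\u00df-\u00ff\u0150\u0170\u0151\u0171]",
    anyLetterStrict: "[A-Za-z0-9\u00c0-\u00de\u00df-\u00ff\u0150\u0170\u0151\u0171]"
};
textPrimitives.wikiLink = textPrimitives.upperLetter + "+" +
    textPrimitives.lowerLetter + "+" +
    textPrimitives.upperLetter +
    textPrimitives.anyLetter + "*";

// Hypothetical helper (an assumption, not TiddlyWiki code): return every
// wikilink match that is not directly preceded by a "strict" letter.
function findWikiLinks(text) {
    const linkRegExp = new RegExp(textPrimitives.wikiLink, "g");
    const precededByLetter = new RegExp(textPrimitives.anyLetterStrict);
    const links = [];
    let match;
    while ((match = linkRegExp.exec(text)) !== null) {
        const previous = match.index > 0 ? text.charAt(match.index - 1) : "";
        if (!precededByLetter.test(previous)) {
            links.push(match[0]);
        }
    }
    return links;
}

console.log(findWikiLinks("something_HelloThere"));  // [ 'HelloThere' ]
console.log(findWikiLinks("somethingHelloThere"));   // []
console.log(findWikiLinks("Mund-zu-Mund-Beatmung")); // [ 'Mund-zu-Mund-Beatmung' ]
console.log(findWikiLinks("A\u00f7B"));              // [ 'A÷B' ]
```

The last call matches only because ÷ (\u00f7) falls inside the lowercase range \u00df-\u00ff.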

Generally, though, the rules are arguably far too loose. I think it makes sense for the wikilink rules to be on the conservative side, as it is easier to explicitly link a text than it is to suppress a wiki link.

@pmario
Contributor

pmario commented Jan 12, 2014

Generally, though, the rules are arguably far too loose. I think it makes sense for the wikilink rules to be on the conservative side, as it is easier to explicitly link a text than it is to suppress a wiki link.

hmmm,
I think [[create a wikilink]] is more work than ~NoLink. Both escape mechanisms are annoying, if you have a lot of them. So imo it depends on the usecase and probably the language settings.

@Jermolene
Owner Author

It's not just the work of typing that we should be concerned with, it's also the cognitive overhead of links appearing unexpectedly versus explicitly forcing a link. The trouble with complex rules is that the user has to be able to rerun the rules in their head in order to understand what's going on.

@pmario
Contributor

pmario commented Jan 12, 2014

I think the rules are not complex. There's just a bug with the "word" detection.

something_HelloThere and something-HelloThere are treated as 2 words, where the second word is CamelCase. -> If something_HelloThere and something-HelloThere were treated the same as somethingHelloThere, there would be no problem.

So for me this is just a word detection bug

The question is whether "underline" and "hyphens" join words, so that the parser sees them as one.


In German there are rules that are not "optional" (rules 26-30)[1] ...
eg: Mund-zu-Mund-Beatmung ... (TW5 does create a wikilink here)

If we apply the first assumption (hyphens and underlines combine words), we get perfect CamelCase words. Those words can be escaped by ~ -> ~Mund-zu-Mund-Beatmung. Which imo is a simple rule that actually works.
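
As an illustration of the ~ escape, here is a minimal sketch. It assumes the tilde is matched as an optional prefix of the wikilink pattern and that a match starting with ~ is rendered as plain text; `classifyCandidates` is a hypothetical name, and the character classes are the ones quoted at the top of this issue.

```javascript
// Character classes from the snippet at the top of this issue.
const upper = "[A-Z\u00c0-\u00de\u0150\u0170]";
const lower = "[a-z0-9_\\-\u00df-\u00ff\u0151\u0171]";
const any = "[A-Za-z0-9_\\-\u00c0-\u00de\u00df-\u00ff\u0150\u0170\u0151\u0171]";
const wikiLink = upper + "+" + lower + "+" + upper + any + "*";

// Hypothetical helper (an assumption, not TiddlyWiki code): classify each
// candidate as a link, or as suppressed text when it starts with "~".
function classifyCandidates(text) {
    const candidate = new RegExp("~?" + wikiLink, "g");
    const result = [];
    let match;
    while ((match = candidate.exec(text)) !== null) {
        if (match[0].charAt(0) === "~") {
            // Drop the tilde and keep the word as plain text.
            result.push({ text: match[0].slice(1), link: false });
        } else {
            result.push({ text: match[0], link: true });
        }
    }
    return result;
}

console.log(classifyCandidates("~Mund-zu-Mund-Beatmung und HelloThere"));
```

Under these assumptions the first word is reported with `link: false` and the second with `link: true`.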

The problem I see is that adding ~ sucks if you have a lot of combined words like this.

I'd use \rules except wikilink in this case, or I'd like to have a way to switch off wikilinks globally and only allow [[wikilinks]] like that.

IMO it depends on the use case whether Mund-zu-Mund-Beatmung is a CamelCase word or not.

  • If I write an online "first aid" manual, that contains the text "Mund-zu-Mund-Beatmung", I'd like the automatic link to an "explanation tiddler"
    • Since TW5 actually does what I expect, I have no reason to complain.
    • If this rule were changed, I might have a reason :)
  • If I write a German prose "single tiddler paper" that contains a lot of combined words, I'd probably have a language-specific problem. Maybe this can be solved with "translation plugins"

So I really think it depends very much on the use case.
eg: "egoism" in German can be "Ichsucht" or "Ich-Sucht". While the first one is "just a word", the second one "wants to indicate importance", which may need a more detailed explanation. -> new tiddler -> so it should be a wikilink.

[1] http://www.duden.de/sprachwissen/rechtschreibregeln/bindestrich

@ghost

ghost commented Jan 16, 2014

Hi... I hope you won't mind a little input from a very appreciative and fairly new user.

Regarding this:

  • I find any sort of link inference annoying (in practice and in theory).
  • I obviously find the situation worse if such inference is active by default.
  • It's obviously even worse if one can't ever turn the inference off at all!

In the interest of simplicity (and interoperability), though, it's better to have fewer settings to toggle, so I'd actually vote for no inference at all.

But of course you have the question of backward compatibility to consider.

I believe that 'no inference' (i.e. explicit links only) is the Markdown (and the MediaWiki) way, and I am very keen (for reasons of interoperability etc.) on seeing increased support for that format in TW5; I actually visited here today to raise what turns out to be #352!

Reasons for finding link inference annoying:

  • I'm always typing upper-camel-case and similar words.
  • I'm forced to pepper my text with tildes, which reduces portability.
  • It should be up to me what gets (potentially, eventually) construed as a link; that's why I'm using a wiki!
  • Conversely, it's a poor use of developer time (and a complication of your code base) trying to concoct universal rules to second-guess requirements that might satisfy everyone!

I hope these viewpoints are of some use, though I'm sure they are obvious to you anyway.

@Jermolene
Owner Author

Hi @pipedelimited that's very useful feedback, thank you. I think you might like the proposal in #345. It's basically a way to allow authors to control the wikification of their own tiddlers whilst giving them interoperability so that their content can be displayed properly in other wikis.

@tgirod

tgirod commented Jan 17, 2014

Just a quick word from another somewhat new user to say that I totally agree with @pipedelimited - from my personal experience with wikis, I always end up disabling CamelCase links.

Emphasis is already expressed by enclosing text between double chars - isn't it logical to do the same for links ?

@Spangenhelm
Contributor

+1 for CamelCase and ~tildes, very annoying when using lots of them

@pmario
Contributor

pmario commented May 16, 2014

There are several obvious problems:

  • The characters × and ÷ shouldn't be classified as letters

IMO this is a bug and needs to be fixed anyway

@ssokolow

For the record, if one of my TWClassic wikis isn't running DisableWikiLinksPlugin from TiddlyTools, it's either because I forgot or because I haven't touched it since I discovered that plugin.

@matthias-ronge

I would like to provide this example image (sentences from German Wikipedia article about Wikipedia itself) to show that this is a real issue in our language. Here are six undesired links showing up in four sentences:

[Image: sentences from the German Wikipedia article about Wikipedia, rendered with six undesired links]

I vote for no inference at all, too. Another reason is that practically none of the links automatically match wiki entries, even if they exist, because of linguistic transformations (singular, plural, genitive, compound nouns, …). It seems to me that the common case is that you need the “pipe”, like
If you want to [[sail on tidal waters|Sailing on tidal waters]], you will need to …
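
For readers unfamiliar with the piped form, here is a minimal sketch of how an explicit link splits into display text and target title. `parseExplicitLinks` is a hypothetical helper and the regular expression is an assumption for illustration, not TiddlyWiki's actual link parser.

```javascript
// Hypothetical helper (an assumption, not TiddlyWiki code): extract explicit
// [[display text|Target title]] links from a piece of wikitext.
function parseExplicitLinks(text) {
    const pattern = /\[\[(.*?)(?:\|(.*?))?\]\]/g;
    const links = [];
    let match;
    while ((match = pattern.exec(text)) !== null) {
        const display = match[1];
        // Without a pipe, the display text doubles as the target title.
        const target = match[2] !== undefined ? match[2] : match[1];
        links.push({ display: display, target: target });
    }
    return links;
}

console.log(parseExplicitLinks(
    "If you want to [[sail on tidal waters|Sailing on tidal waters]], you will need to"
));
// [ { display: 'sail on tidal waters', target: 'Sailing on tidal waters' } ]
```

The pipe lets the sentence keep its natural wording while the link still points at the tiddler's exact title.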

@Jermolene
Owner Author

I'll try to include this ticket in 5.0.14. My view is that automatic wikilinking should only occur for classical camelcase words. (In your example above @matthias-ronge, "ShareAlike" would be the only wikified word).

@pmario
Contributor

pmario commented Aug 8, 2014

@Jermolene
Is this issue still on your radar for beta 14? .. It would be nice, if so.

Jermolene pushed a commit that referenced this issue Aug 8, 2014
Jermolene pushed a commit that referenced this issue Aug 8, 2014
Jermolene pushed a commit that referenced this issue Aug 8, 2014
“HelloThere” in “My-HelloThere” shouldn’t be wikified.

Part of #337
@Jermolene
Owner Author

I've made a series of changes to the camelcase rules for 5.0.14:

  • Removed support for underscore and dash within camelcase words (d7390db)
  • Stop classifying "÷" (\u00f7) as a lower case letter (f8548cc)
  • Stop classifying "×" (\u00d7) as an upper case letter (9c8564d)
  • Disabled camelcase recognition when preceded by any letter, dash or underscore

The idea is that these changes take us back to a much stricter, more conservative recognition scheme, with fewer false positives.
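
A rough sketch of what the stricter scheme could look like follows. The character ranges and the names `blockPrefix` and `findStrictWikiLinks` are assumptions derived from the list above, not copied from the referenced commits, so the real code may differ.

```javascript
// Assumed classes after the changes: no underscore or dash inside words,
// and × (\u00d7) and ÷ (\u00f7) are excluded from the letter ranges.
const upperLetter = "[A-Z\u00c0-\u00d6\u00d8-\u00de\u0150\u0170]";
const lowerLetter = "[a-z0-9\u00df-\u00f6\u00f8-\u00ff\u0151\u0171]";
const anyLetter = "[A-Za-z0-9\u00c0-\u00d6\u00d8-\u00de\u00df-\u00f6\u00f8-\u00ff\u0150\u0170\u0151\u0171]";
// Assumed: a preceding letter, digit, dash or underscore blocks recognition.
const blockPrefix = "[A-Za-z0-9_\\-\u00c0-\u00d6\u00d8-\u00de\u00df-\u00f6\u00f8-\u00ff\u0150\u0170\u0151\u0171]";
const wikiLink = upperLetter + "+" + lowerLetter + "+" + upperLetter + anyLetter + "*";

// Hypothetical helper (not TiddlyWiki code): list the camelcase words that
// would still be wikified under the assumed stricter rules.
function findStrictWikiLinks(text) {
    const linkRegExp = new RegExp(wikiLink, "g");
    const blocked = new RegExp(blockPrefix);
    const links = [];
    let match;
    while ((match = linkRegExp.exec(text)) !== null) {
        const previous = match.index > 0 ? text.charAt(match.index - 1) : "";
        if (!blocked.test(previous)) {
            links.push(match[0]);
        }
    }
    return links;
}

console.log(findStrictWikiLinks("Say HelloThere now"));    // [ 'HelloThere' ]
console.log(findStrictWikiLinks("My-HelloThere"));         // []
console.log(findStrictWikiLinks("something_HelloThere"));  // []
console.log(findStrictWikiLinks("Mund-zu-Mund-Beatmung")); // []
```

With these assumed classes, only classical camelcase words such as "HelloThere" or "ShareAlike" are still recognised.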

@AwesomeAxolotl

Sounds nice, when can we expect 5.0.14 to be out?
I've downloaded TiddlyWiki for the first time today, stumbled over the problem (being a German speaker) and found it a nice coincidence that it just seems to be fixed by today.

@Jermolene
Owner Author

Thanks @mindfaQ - 5.0.14 will be out in the next couple of days.

@pmario
Contributor

pmario commented Aug 9, 2014

I did a short test and it seems nice till now :) thx a lot!

@Jermolene
Owner Author

Great, glad to hear that @pmario

@tgirod

tgirod commented Sep 12, 2014

Just a quick word on disabling wikilinks - for a project I have to generate tiddlers automatically from an external source which is not wiki-aware, and I get a few false wikilinks. Not many, but enough to make my sense of consistency tingle.

As regular imports need to be done, it is not feasible to correct the links manually. Correcting them automatically would mean reimplementing TiddlyWiki's wikilink rules in my generator, which is boring.

@pmario
Contributor

pmario commented Sep 13, 2014

Hi @tgirod Your importer can add the following line to the imported tiddler.

\rules except wikilink
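
A minimal sketch of such an importer step might look like the following. The function name `disableWikiLinks` and the plain-string tiddler text are assumptions about the importer, which is not shown in this thread.

```javascript
// The pragma line quoted above; it has to be the first line of the tiddler text.
const PRAGMA = "\\rules except wikilink";

// Hypothetical importer step: prepend the pragma unless it is already there,
// so that repeated imports do not stack duplicate lines.
function disableWikiLinks(tiddlerText) {
    if (tiddlerText.startsWith(PRAGMA)) {
        return tiddlerText;
    }
    return PRAGMA + "\n" + tiddlerText;
}

console.log(disableWikiLinks("Device T4LB07 was replaced by HelloThere."));
```

Because the check makes the step idempotent, it can run on every regular import without further bookkeeping.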

What do your false positive links look like?

@tgirod

tgirod commented Sep 14, 2014

Hi @pmario
Ok, I didn't know about this feature, it's pretty nice. Most of my false positives look like regular WikiWords, so it's normal to see them wikified. I think there is only one that could justify a change in the wikilink detection: T4LB07

A mix of uppercase letters and numbers can appear as a designation code, or a password, or something else and should not be wikified by default, I think.
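
The following sketch shows why a code like T4LB07 matches: digits count as lowercase letters, so "4" satisfies the "one or more lowercase letters" step. The classes are an ASCII-only simplification, and `lowerNoDigits` is a purely hypothetical variant for comparison, not an agreed change.

```javascript
// ASCII-only simplification of the camelcase classes (assumption: the
// accented ranges are omitted because they do not affect this example).
const upper = "[A-Z]";
const lowerWithDigits = "[a-z0-9]"; // digits classified as lowercase letters
const lowerNoDigits = "[a-z]";      // hypothetical variant, not an agreed change
const any = "[A-Za-z0-9]";

// Build an anchored camelcase pattern for a given lowercase class.
function buildWikiLink(lower) {
    return new RegExp("^" + upper + "+" + lower + "+" + upper + any + "*$");
}

const current = buildWikiLink(lowerWithDigits);
const digitFree = buildWikiLink(lowerNoDigits);

console.log(current.test("T4LB07"));       // true
console.log(digitFree.test("T4LB07"));     // false
console.log(digitFree.test("HelloThere")); // true
```

Dropping digits from the lowercase class would also stop words such as "Web2Print" from matching, so the trade-off would need discussion.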

@pmario
Contributor

pmario commented Sep 14, 2014

T4LB07

A mix of uppercase letters and numbers can appear as a designation code, or a password, or something else and should not be wikified by default, I think.

I think, you should create a new issue for this one.
