Clarify wording in spec for character groups #618

Merged
merged 21 commits into from May 23, 2020

Conversation

wooorm
Contributor

@wooorm wooorm commented Oct 2, 2019

  • Remove line tab and form feed from “whitespace”
  • Rename newline to line feed (the LF, \n character) or line ending (the concept)
  • Remove the “whitespace” grouping, and be explicit everywhere about what is allowed: spaces as indentation; spaces; spaces or tabs; or spaces, tabs, and up to one line ending
  • Rename “Punctuation” to “Unicode punctuation”, to disambiguate it more clearly from “ASCII punctuation”
  • Reword “line breaks”: in some cases they meant line endings, blank lines, or either hard or soft line breaks
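
To make the groups in this list concrete, here is a minimal Python sketch of how they might be expressed as predicates. The function names are hypothetical, and the exact membership of each group (for example, treating every Unicode `P*` general category as “Unicode punctuation”) is an assumption based on my reading of the spec wording, not something stated in this PR.

```python
import string
import unicodedata

# All predicates expect a single-character string.

def is_space_or_tab(ch: str) -> bool:
    """Space (U+0020) or tab (U+0009)."""
    return ch in (" ", "\t")

def is_ascii_punctuation(ch: str) -> bool:
    """One of the 32 ASCII punctuation characters, such as * or _."""
    return ch in string.punctuation

def is_unicode_punctuation(ch: str) -> bool:
    """Assumed here: ASCII punctuation, or any Unicode P* general category."""
    return is_ascii_punctuation(ch) or unicodedata.category(ch).startswith("P")

def is_unicode_whitespace(ch: str) -> bool:
    """Assumed here: tab, line feed, form feed, carriage return, or category Zs."""
    return ch in ("\t", "\n", "\f", "\r") or unicodedata.category(ch) == "Zs"
```

Under these assumptions, `is_ascii_punctuation("“")` is false while `is_unicode_punctuation("“")` is true, which is the distinction the rename is meant to make visible.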

This is a substantial change and needs a thorough review

Open questions:

  • Do tabs play a role in how spaces in the inline code span algorithm are stripped and merged?
  • What about whitespace in HTML tags? Those are used in HTML blocks of kind 7 (which can only be on one line and therefore do not support line endings as whitespace) and in inline HTML. HTML treats form feed as whitespace here

@wooorm
Contributor Author

wooorm commented Oct 2, 2019

Closes GH-604.

@wooorm
Contributor Author

wooorm commented Oct 3, 2019

Thanks for the review @JohelEGP! I’ll leave the rest up to see what John thinks

Member

@jgm jgm left a comment

This looks really great! Thanks for going through the spec with such care.
Most of my comments are the same thing. You may have had a reason for writing "indented N spaces of indentation"; so maybe there's something I'm missing?

@wooorm
Contributor Author

wooorm commented Nov 6, 2019

It’s a bit hidden, so to clarify, my response to your “of indentation” question is here: #618 (comment)

@jgm
Member

jgm commented Nov 6, 2019

Summarizing here for convenience:

I commented:

"indented four or more spaces of indentation" seems a bit redundant.

You said:

The reason for having this term in there, instead of dropping it, was to signal that “spaces of indentation” also includes “virtual spaces” (a term I coined to describe the sizes created from tabs). I’m open to dropping them (but that makes it a bit ambiguous that tabs also work), or a different term!

I figured that was the reason. But I wonder whether adding "of indentation" is going to convey this reliably. Is the notion of "virtual space" defined under that name in the spec as you've revised it? If so, maybe we could say "indented four or more virtual spaces"? (with a hyperlink on virtual spaces everywhere it is used)

Another possibility would be just to rephrase. Instead of "indented four spaces of indentation," one could say something like "with four spaces of indentation"?
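
For readers unfamiliar with the idea behind “virtual spaces”, the following is a minimal sketch of how indentation could be measured when tabs are counted as advancing to the next tab stop. The function name `indentation_width` and the constant `TAB_STOP` are hypothetical, and the tab stop of 4 is an assumption taken from how the spec describes tabs, not from this thread.

```python
TAB_STOP = 4  # assumed tab stop width

def indentation_width(line: str) -> int:
    """Count leading indentation in columns, letting a tab jump to the next tab stop."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # A tab contributes between 1 and TAB_STOP "virtual spaces".
            column += TAB_STOP - (column % TAB_STOP)
        else:
            break
    return column
```

With this sketch, `"    foo"`, `"\tfoo"`, and `"  \tfoo"` all measure as four columns, which is why wording such as “four spaces of indentation” needs to cover tabs as well as literal spaces.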

@wooorm
Contributor Author

wooorm commented Nov 6, 2019

I figured that was the reason. But I wonder whether adding "of indentation" is going to convey this reliably.

Is the notion of "virtual space" defined under that name in the spec as you've revised it? If so, maybe we could say "indented four or more spaces or virtual spaces"?
@jgm

I'm going to move this discussion to the main thread, since it gets lost here.
@jgm

The parsing spec I’m working on defines virtual spaces here (and elaborates more in the tabs and input stream sections). That spec is focussed on parsing though, and CM isn’t. I think it is good for CM to not focus on parsing (in line with the content question in GH-605 and GH-622), both have a “raison d'être” (reason for being).

The terms may be useful though, and I could include some of that here (depending on how you want to go about it)

@jgm
Member

jgm commented Nov 6, 2019

What about the minimal change suggested here?

Instead of "indented four spaces of indentation," one could say something like "with four spaces of indentation"?

@wooorm
Contributor Author

wooorm commented Nov 6, 2019

(edit: whoops, messed up)

@wooorm
Contributor Author

wooorm commented Nov 6, 2019

Sorry about that, I now see the problem was only referencing “indentation” twice, and that you suggest using it once. I think that’s good!

wooorm and others added 11 commits May 19, 2020 15:47
Co-Authored-By: Johel Ernesto Guerrero Peña <johelegp@gmail.com>
@wooorm
Contributor Author

wooorm commented May 19, 2020

@jgm Sorry for forgetting about this. Rebased and fixed those duplicate mentions of indentation.

Gah, sorry again—what do you think?

@jgm
Member

jgm commented May 19, 2020

Thanks! I'll have a look before too long.
It will be good to get these changes merged.

@jgm jgm merged commit 1103710 into commonmark:master May 23, 2020
@jgm
Member

jgm commented May 23, 2020

Excellent, thanks for all of this!

@tats-u

tats-u commented Nov 13, 2023

FOUND YOU, THE CULPRIT! (Edit: NOT GUILTY; The culprit is in v0.14; Sorry)

Rename “Punctuation” to “Unicode punctuation”, to disambiguate it more clearly from “ASCII punctuation”

This change for the worse has broken many Chinese and Japanese documents. The following are all due to this change:

These didn't use to happen in MDX v1 but do happen in MDX v2 and v3.
I didn't know this until I started using Docusaurus v3, which uses MDX v3 instead of v1.
This annoying change was mixed into CM spec v0.30. This is just a regression!

https://spec.commonmark.org/0.30/changes

This regression happened because none of you know that Chinese and Japanese don't use spaces to separate words, and that spaces are not always adjacent to punctuation marks in Chinese and Japanese.
To make matters worse, they use U+201C and U+201D (left and right double quotation marks) without a preceding or following space.
I don't think filtering punctuation marks by Unicode blocks is a good idea.

A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a Unicode punctuation character, or (2b) preceded by a Unicode punctuation character and followed by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

The definitions of left- and right-flanking for * and ** must use ASCII punctuation characters instead of Unicode ones.
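
The two definitions quoted above can be sketched in Python as follows. This is only an illustration of the quoted wording: the function name `flanking` and its helpers are hypothetical, and approximating “Unicode punctuation” as ASCII punctuation plus the Unicode `P*` general categories is an assumption.

```python
import string
import unicodedata

def _is_whitespace(ch):
    # None stands for the beginning or end of the line, which counts as whitespace.
    return ch is None or ch in ("\t", "\n", "\f", "\r") or unicodedata.category(ch) == "Zs"

def _is_punctuation(ch):
    # Assumption: ASCII punctuation or any Unicode P* general category.
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )

def flanking(text, start, end):
    """Return (left_flanking, right_flanking) for the delimiter run text[start:end]."""
    before = text[start - 1] if start > 0 else None
    after = text[end] if end < len(text) else None
    left = not _is_whitespace(after) and (
        not _is_punctuation(after) or _is_whitespace(before) or _is_punctuation(before)
    )
    right = not _is_whitespace(before) and (
        not _is_punctuation(before) or _is_whitespace(after) or _is_punctuation(after)
    )
    return left, right
```

For the text `**重要。**这是`, the second `**` occupies indices 5 to 7. It is preceded by the punctuation character `。` and followed by a letter, so `flanking(text, 5, 7)` reports it as left-flanking but not right-flanking under these definitions. In `**important.** this`, the second `**` is followed by a space and is right-flanking.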

@wooorm
Contributor Author

wooorm commented Nov 13, 2023

This PR has nothing to do with that.
This has nothing to do with MDX.
If you care about #650, comment there.

@tats-u

tats-u commented Nov 13, 2023

@wooorm THIS HAPPENS IN ALL MARKDOWN DOCS, NOT ONLY MDX!!!!!!!!!!!!!!!!!!!!!!!!!!!

@wooorm
Contributor Author

wooorm commented Nov 13, 2023

Read what I said. You are spamming an unrelated old issue. You are rude.

@tats-u

tats-u commented Nov 13, 2023

This has nothing to do with that discussion!

@wooorm
Contributor Author

wooorm commented Nov 13, 2023

Again, please read. Yes it does. This PR is about words. It doesn’t change anything in markdown. The behavior has always been the same. #650.

@tats-u

tats-u commented Nov 13, 2023

You are rude.

I'm sorry for going a little overboard. I was a bit ticked off to discover a much more serious and much neglected issue than that discussion.

Again, please read. Yes it does. This PR is about words. It doesn’t change anything in markdown. The behavior has always been the same. #650.

I got it. I'd like to ask whether the concrete amendment plan should be discussed there or in a new issue. I'll try there first.

@tats-u

tats-u commented Nov 14, 2023

@wooorm I would like to take this opportunity to apologize for making you feel very uncomfortable by making a false accusation due to a lack of prior research, calling your important changes "the culprit" in anger, and shutting down dialogue with too much rhetoric. (You might not be able to read this comment anymore, though.)
Your changes, including the one I mistakenly called the "culprit" due to a lack of prior research, are all correct and should be welcomed. A much older change in fact causes the problem.
I do not expect to be forgiven.


I apologize to everyone else for messing up this place with unnecessary and offensive comments.

@wooorm
Contributor Author

wooorm commented Nov 14, 2023

Thank you! I do accept your apology. You are forgiven :)

4 participants