
non-latin characters (such as japanese) and Markdown preview anchor #20626

Closed
satokaz opened this issue Feb 15, 2017 · 4 comments
Labels
verification-needed Verification of issue is requested verified Verification succeeded

@satokaz
Contributor

satokaz commented Feb 15, 2017

  • VSCode Version: Code 1.9.1 (f9d0c68, 2017-02-08T23:31:51.320Z)
  • OS Version: Darwin x64 16.4.0

This issue is not a problem in VS Code itself. It comes from a library used by the Markdown extension.

PR: markdown-it-named-header custom slugify for non-latin characters #20628

PROBLEM:

I want to create in-page links using anchors in the Markdown preview, but this does not work for headers containing non-Latin characters.
If a header is written in non-Latin characters (such as Japanese), those characters are treated as invalid and removed, so the value of the id attribute is empty.

BACKGROUND:

markdown-it-named-headers (mdnh) generates an id attribute for each Markdown header and uses the slugify() function provided by string.js to convert the header text into a valid URL slug.

example:

# Example Header   -->   <h1 id="example-header">Example Header</h1>

The problem is with non-Latin characters: they are all stripped, so a header written only in such characters (for example Japanese) ends up with an empty id.

After investigating, I found that the slugify step cannot process non-Latin characters correctly.
(Though this may be intentional, in the sense that it deletes characters that cannot be used in a URL.)
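To illustrate why the ids come out empty, here is a rough approximation of an ASCII-only slugifier. `asciiSlug` is a hypothetical name and this is not the string.js source; it only assumes, as described above, that characters outside ASCII letters, digits, `_`, whitespace and `-` are deleted.

```typescript
// Hypothetical approximation of an ASCII-only slugifier (NOT the string.js
// source): anything other than ASCII letters, digits, '_', whitespace and '-'
// is deleted, so kana, kanji and emoji vanish entirely.
function asciiSlug(text: string): string {
	return text
		.trim()
		.replace(/[^\w\s-]/g, '') // \w is ASCII-only in JavaScript regexes
		.toLowerCase()
		.replace(/[\s_-]+/g, '-') // collapse whitespace/underscores/hyphens into one hyphen
		.replace(/^-+|-+$/g, ''); // trim leading and trailing hyphens
}

for (const header of ['Example Header', 'test', 'さくら', 'さくら 桜', '🌸']) {
	console.log(JSON.stringify(header), '->', JSON.stringify(asciiSlug(header)));
}
// "Example Header" -> "example-header"
// "test" -> "test"
// "さくら" -> ""
// "さくら 桜" -> ""
// "🌸" -> ""
```

Under this assumption, every header in the sample below except `test` gets an empty slug, which matches the empty `id=""` attributes in the rendered HTML.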

Sample Markdown:

* [test](#test)
* [さくら](#さくら)
* [さくら 桜](#さくら-桜)
* [🌸](#🌸)

## test
## さくら
## さくら 桜
## 🌸

Rendered HTML:

<li data-line="0" class="code-line"><a href="#test">test</a></li>
<li data-line="1" class="code-line"><a href="#%E3%81%95%E3%81%8F%E3%82%89">さくら</a></li>
<li data-line="2" class="code-line"><a href="#%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C">さくら 桜</a></li>
<li data-line="3" class="code-line"><a href="#%F0%9F%8C%B8">🌸</a></li>
</ul>
<h2 data-line="5" class="code-line" id="test">test</h2>
<h2 data-line="6" class="code-line" id="">さくら</h2>      <--- here
<h2 data-line="7" class="code-line" id="">さくら 桜</h2>   <--- here
<h2 data-line="8" class="code-line" id="">🌸</h2>         <--- here

In the Markdown extension, when a link points to an anchor (#anchor), an <a href=""> tag is rendered for it.
In this process, non-Latin characters in the href appear to be encoded with encodeURI.
I think this is a very good approach.
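As a quick check of that observation, `encodeURI` applied to the link targets from the sample Markdown reproduces the href values above: each non-ASCII code point becomes the percent-encoded bytes of its UTF-8 form. This only assumes the extension's link encoding agrees with `encodeURI` for these inputs; it does not claim that the extension calls `encodeURI` itself.

```typescript
// Link targets taken from the sample Markdown (without the leading '#').
const linkTargets: string[] = ['test', 'さくら', 'さくら-桜', '🌸'];

for (const target of linkTargets) {
	// encodeURI leaves ASCII letters, digits and '-' alone and
	// percent-encodes everything else as UTF-8 bytes.
	console.log(`${target} -> #${encodeURI(target)}`);
}
// test -> #test
// さくら -> #%E3%81%95%E3%81%8F%E3%82%89
// さくら-桜 -> #%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C
// 🌸 -> #%F0%9F%8C%B8
```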

markdown-it-named-headers has an option for defining a custom slugify function.
So, to handle headers written in non-Latin characters, I defined my own slugify function:

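.use(mdnh, {
	slugify: function (header: string): string {
		// Strip ASCII punctuation, turn whitespace into hyphens, then
		// percent-encode the result so non-Latin text survives in the id.
		return encodeURI(header.trim()
			.toLowerCase()
			.replace(/[\]\[\!\"\#\$\%\&\'\(\)\*\+\,\.\/\:\;\<\=\>\?\@\\\^\_\{\|\}\~]/g, '') // remove symbols
			.replace(/\s+/g, '-')) // replace runs of whitespace with a hyphen
			.replace(/\-+$/, ''); // remove trailing hyphens
	}
})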
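To try the slugify logic outside the extension, here is the same function as a standalone TypeScript sketch. The name `customSlugify` is hypothetical, and the punctuation set is the one from the option above, written with fewer escapes. Run on the sample headers, it should produce the id values in the rendered HTML that follows, which are identical to the encoded hrefs, so the links resolve.

```typescript
// Standalone copy of the custom slugify option, for testing without markdown-it.
function customSlugify(header: string): string {
	return encodeURI(
		header
			.trim()
			.toLowerCase()
			// remove ASCII punctuation (same character set as the option above)
			.replace(/[\]\[!"#$%&'()*+,.\/:;<=>?@\\^_{|}~]/g, '')
			// replace each run of whitespace with a single hyphen
			.replace(/\s+/g, '-')
	).replace(/-+$/, ''); // remove trailing hyphens
}

for (const h of ['test', 'さくら', 'さくら 桜', '🌸']) {
	console.log(`${h} -> id="${customSlugify(h)}"`);
}
// test -> id="test"
// さくら -> id="%E3%81%95%E3%81%8F%E3%82%89"
// さくら 桜 -> id="%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C"
// 🌸 -> id="%F0%9F%8C%B8"
```

Because `%` is in the removed punctuation set, the text is never double-encoded.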

HTML rendered with custom slugify applied:

<li data-line="0" class="code-line"><a href="#test">test</a></li>
<li data-line="1" class="code-line"><a href="#%E3%81%95%E3%81%8F%E3%82%89">さくら</a></li>
<li data-line="2" class="code-line"><a href="#%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C">さくら 桜</a></li>
<li data-line="3" class="code-line"><a href="#%F0%9F%8C%B8">🌸</a></li>
</ul>
<h2 data-line="5" class="code-line" id="test">test</h2>
<h2 data-line="6" class="code-line" id="%E3%81%95%E3%81%8F%E3%82%89">さくら</h2>
<h2 data-line="7" class="code-line" id="%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C">さくら 桜</h2>
<h2 data-line="8" class="code-line code-active-line" id="%F0%9F%8C%B8">🌸</h2>

[Screenshot: anchor]

This works best for me. Please let me know if there is a better way.

@mjbvz mjbvz self-assigned this Feb 15, 2017
@mjbvz mjbvz added this to the February 2017 milestone Feb 15, 2017
@mjbvz mjbvz added the verification-needed Verification of issue is requested label Feb 15, 2017
@mjbvz
Collaborator

mjbvz commented Feb 15, 2017

Fixed by #20628

@mjbvz mjbvz closed this as completed Feb 15, 2017
@satokaz
Contributor Author

satokaz commented Feb 15, 2017

@mjbvz

Thank you! I will review it and report back.
Your commit was also very helpful to me.

Thanks

@satokaz
Contributor Author

satokaz commented Feb 15, 2017

@mjbvz

perfect!

Thanks.

[Screenshot: anchor_test]

@bpasero
Member

bpasero commented Feb 24, 2017

Marking as verified given @satokaz assessment.

@bpasero bpasero added the verified Verification succeeded label Feb 24, 2017
@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 18, 2017
3 participants