mkdx TOC does not support utf8 title #130

Cyperwu · 2020-09-25T11:17:55Z

OS type:

[x] Unix
[ ] Windows
[ ] Other ([SPECIFY])

Vim:

[ ] vim
[x] neovim
[ ] Other ([SPECIFY])

Vim version:

NVIM v0.5.0-nightly

Reproduce steps:

Open a markdown file.
Paste the following example into the file:

## 测试

Then, attempt to generate TOC

Expected:

# Table of Contents

- [Table of Contents](#table-of-contents)
  - [测试](#测试)

## 测试

Actual:

outputs:

# Table of Contents

- [Table of Contents](#table-of-contents)
  - [测试](#)

## 测试

SidOfc · 2020-09-25T11:26:57Z

Yep, that looks like a bug indeed, will check it out sometime this week.

Cheers for the report 👍

Cyperwu · 2020-09-25T11:32:51Z

Thanks! There is also another bug:

## 1. test

generates

[1. test](#1-test)

The period was missing. Should I open a new issue or leave it here?

SidOfc · 2020-09-25T15:20:49Z

@Cyperwu so it actually generated [1 test](#1-test) instead of [1. test](#1-test)?

SidOfc · 2020-09-25T16:05:13Z

Okay, so the issue with Chinese characters should now be fixed by including the Unicode range for Chinese in the pattern (see https://stackoverflow.com/questions/41318003/how-to-match-chinese-characters-with-grep).

About this other bug, I checked to see what gets generated when using # 1. some heading and I can't find any errors in the link text or actual link, it ends up like this: [1. some heading](#1-some-heading) which is correct in both text and generated fragment link on GH.
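For readers following along, here is a minimal Python sketch of the behaviour discussed above. It is not mkdx's implementation (mkdx is written in Vimscript); the function names `ascii_only_fragment` and `heading_to_fragment` are hypothetical, and the slug rules only approximate GitHub's. It shows how an ASCII-only filter could produce the empty `#` fragment from the report, and how a Unicode-aware filter keeps the Chinese title while still dropping the period in `1. some heading`.

```python
import re


def ascii_only_fragment(title: str) -> str:
    """Hypothetical reproduction: keep only ASCII letters, digits, '-' and spaces."""
    slug = title.strip().lower()
    slug = re.sub(r"[^a-z0-9\- ]", "", slug)
    return slug.replace(" ", "-")


def heading_to_fragment(title: str) -> str:
    """Unicode-aware variant: CJK characters count as word characters."""
    slug = title.strip().lower()
    # For str patterns in Python 3, \w matches Unicode letters and digits,
    # so Chinese characters survive while punctuation such as '.' is removed.
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")


if __name__ == "__main__":
    for title in ["Table of Contents", "测试", "1. some heading"]:
        print(f"[{title}](#{heading_to_fragment(title)})")
```

With the ASCII-only variant the Chinese title collapses to an empty string, which matches the `[测试](#)` output reported at the top of this issue.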

Cyperwu · 2020-09-27T02:42:26Z

Sorry, I checked my setup and found that it was caused by my markdown preview tool, so that second bug does not actually exist.
I also tried out the latest commit, which raises E945: Range too large in character class when generating the TOC.

SidOfc · 2020-09-27T21:02:44Z

I can't reproduce this on either neovim (0.5.0) or regular vim (8.1); it works even after set re=1. I did however find some useful information in :h E945 which shows how to "fix" it. I've pushed the commit to master, so you can try it out by updating.

I also patched another bug in there where the last line of YAML frontmatter showed up in the generated table of contents, because in markdown --- is used for titles as well as for YAML front matter.
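The frontmatter clash can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the code from the commit: `collect_headings` is a hypothetical helper, it treats front matter as a block delimited by `---` on the very first line, and it ignores fenced code blocks. It shows why the closing `---` of the front matter looks like a setext heading underline unless the block is skipped first.

```python
import re

# A setext underline is a line made only of '=' or '-' characters.
SETEXT_UNDERLINE = re.compile(r"(=+|-+)\s*")


def collect_headings(lines):
    """Return heading texts, skipping a leading YAML front matter block."""
    start = 0
    if lines and lines[0].strip() == "---":
        # Front matter only counts when '---' is the very first line.
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                start = i + 1
                break

    body = lines[start:]
    headings = []
    for i, line in enumerate(body):
        if line.startswith("#"):
            # ATX heading such as '# Intro'.
            headings.append(line.lstrip("#").strip())
        elif i > 0 and SETEXT_UNDERLINE.fullmatch(line):
            previous = body[i - 1].strip()
            # Setext heading: a text line followed by an underline.
            if previous and not previous.startswith("#"):
                headings.append(previous)
    return headings


if __name__ == "__main__":
    doc = ["---", "title: Demo", "---", "# Intro", "Section", "---", "text"]
    print(collect_headings(doc))
```

Without the front matter skip, `title: Demo` followed by `---` would be picked up as a heading, which is the symptom described above.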

Cyperwu · 2020-09-28T06:26:18Z

It still raises E945, but the TOC is generated. And updating TOC won't throw the error.

Here is a piece of the error message:

Error detected while processing function mkdx#GenerateOrUpdateTOC[10]..mkdx#GenerateTOC[7]..324[23]..326:
line    6:
E945: Range too large in character class
E945: Range too large in character class
...
E945: Range too large in character class
Error detected while processing function mkdx#GenerateOrUpdateTOC[10]..mkdx#GenerateTOC[85]..<lambda>4[1]..327[1]..326:
line    6:
E945: Range too large in character class
E945: Range too large in character class
E945: Range too large in character class
Error detected while processing function mkdx#GenerateOrUpdateTOC[10]..mkdx#GenerateTOC[79]..326:
line    6:
E945: Range too large in character class
Error detected while processing function mkdx#GenerateOrUpdateTOC[10]..mkdx#GenerateTOC[82]..<lambda>4[1]..327[1]..326:
line    6:
E945: Range too large in character class
Error detected while processing function mkdx#GenerateOrUpdateTOC[10]..mkdx#GenerateTOC[85]..<lambda>4[1]..327[1]..326:
line    6:
E945: Range too large in character class
E945: Range too large in character class
E945: Range too large in character class

I found some code in vim-markdown-toc. It supports Chinese and Korean specifically, but I think that's not the common use case for markdown writing.

SidOfc · 2020-09-28T08:44:33Z

Hmm, very strange, and also not very easy for me to fix as I don't get this error at all :(

What I think I'll try to do is split this one big range up into multiple smaller ones that are no more than 256 chars apart since that is the limit of the old regex engine. Unfortunately though I'll also have to ask for your feedback on it once I've committed some change since I can't reproduce it myself 😅

I'll keep you posted when I've worked more on it, will try to apply another patch after work.

SidOfc · 2020-09-28T15:55:59Z

Alright @Cyperwu, can you update again and give it another shot?

The ranges are now split and should all be < 256 characters apart, as per Vim's limit. The only thing I can imagine still causing a failure is that they all sit in the same character class, just as many small ranges rather than one giant range.

Cheers for your patience and feedback so far, looking forward to more info 👍
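The splitting approach described in this comment can be sketched in Python. The sketch makes several assumptions that do not come from the thread: it uses the CJK Unified Ideographs block (U+4E00 to U+9FFF) as the range to split, the helper names `split_range` and `vim_char_class` are hypothetical, and the output uses Vim's `\uXXXX` escapes inside a collection. The exact ranges in the mkdx commit may differ.

```python
def split_range(first: int, last: int, max_span: int = 256):
    """Split [first, last] into consecutive chunks of at most max_span code points."""
    chunks = []
    lo = first
    while lo <= last:
        hi = min(lo + max_span - 1, last)
        chunks.append((lo, hi))
        lo = hi + 1
    return chunks


def vim_char_class(chunks):
    """Render the chunks as one Vim collection using \\uXXXX escapes."""
    parts = [f"\\u{lo:04x}-\\u{hi:04x}" for lo, hi in chunks]
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    # Assumed range: the CJK Unified Ideographs block.
    chunks = split_range(0x4E00, 0x9FFF)
    print(len(chunks), "sub-ranges")
    print(vim_char_class(chunks[:2]))
```

Each chunk has endpoints less than 256 code points apart, which matches the limit mentioned above, and all chunks still live in a single character class.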

Cyperwu · 2020-09-29T07:54:22Z

Cheers, your patch works!

SidOfc · 2020-09-29T08:34:27Z

Good stuff, will close this, thanks for reporting!

SidOfc self-assigned this Sep 25, 2020

SidOfc added the bug:report Something does not work as intended / expected. label Sep 25, 2020

SidOfc added a commit that referenced this issue Sep 25, 2020

Fix chinese unicode filtered out of fragment links (#130)

ad11a1c

SidOfc added a commit that referenced this issue Sep 25, 2020

once again strip punct / special chars from fragment links (#130)

2278bae

SidOfc added a commit that referenced this issue Sep 27, 2020

Hopefully fix "Range too large in character class" (#130) and fixed frontmatter last line showing up in generated TOC

d6cd5e2

SidOfc added a commit that referenced this issue Sep 28, 2020

split chinese character range into smaller ones that dont exceed Vim's limit (#130) :|

cf8a451

SidOfc closed this as completed Sep 29, 2020
