[Feature] More friendly support for CJK words (characters) #529

Closed
Mikachu2333 opened this issue Mar 21, 2024 · 18 comments
Labels
answered question has been answered cjk Chinese/Japanese/Korean question user-centred question about behaviour of latexindent.pl

Comments

@Mikachu2333
Contributor

Here is a table that has been formatted by latexindent, but as you can see, because of the "~" symbol, the table is not aligned the way an all-English table would be.

As we all know, one Chinese character takes up the width of two English characters. All of these problems come from that, especially when I write sentences that mix Chinese and English, as shown in the following figure. Therefore, I hope you can improve this...

\begin{table}[H]
  \centering
  \begin{tblr}{
      hlines,
      vlines,
      cells = {c,m}}
    5:30        & 起床 & 12:30~14:00 & 午休  \\
    6:00~7:00   & 早操 & 14:00~17:30 & 集训  \\
    7:00~7:50   & 早餐 & 17:30~18:30 & 晚餐  \\
    7:50~11:00  & 集训 & 19:00~21:00 & 晚自习 \\
    11:00~12:30 & 午餐 & 23:00       & 熄灯  \\
  \end{tblr}
\end{table}

[Image]

Catastrophic

@trarer

trarer commented Mar 21, 2024

Use a half-width English font and enable unicode-string.

@Mikachu2333
Contributor Author

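enable unicode-string

That's only a stopgap after all; to really fix it, the author would need to improve the way widths are counted.

Use a half-width English font

That's exactly what I'm using now: Source Han Sans HW (思源黑HW), with the standard 1:2 width ratio of English to Chinese characters.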

@trarer

trarer commented Mar 21, 2024

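I don't understand what you mean by "the way widths are counted". Enabling the unicode-string feature treats Unicode (CJK) characters as two characters wide, so tables containing Chinese characters can be aligned. Without it, all characters are treated as equally wide by default. Or do you mean the program should measure according to the actual font size? That is the job of the typesetting program. The underlying problem is that Chinese characters are far more complex than Latin letters: making Chinese and Latin characters equally wide in mixed text makes the Latin letters too large or the Chinese characters too small, while using Latin characters half the width of a Chinese character makes the Latin text relatively small or the Chinese relatively large. I think there should be a setting that lets users choose the width ratio between Chinese and Latin fonts, for example one Chinese character equals 1.5 Latin letters, rather than it being fixed at 2.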
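trarer's proposal of a user-set ratio could be sketched as follows. This is not a latexindent feature: `cjk_ratio`, `weighted_width`, and `pad` are hypothetical names, and characters are classified with the East Asian Width property from Python's standard `unicodedata` module (Wide or Fullwidth counts as `cjk_ratio`, everything else as 1).

```python
import math
import unicodedata


def weighted_width(text: str, cjk_ratio: float = 2.0) -> float:
    """Width in Latin-character units; East Asian Wide/Fullwidth characters count as cjk_ratio.

    cjk_ratio is a hypothetical user setting: 2.0 suits a Latin font exactly half as wide
    as the CJK font, 1.5 suits one where a Chinese glyph is 1.5 Latin characters wide.
    """
    return sum(
        cjk_ratio if unicodedata.east_asian_width(ch) in ("W", "F") else 1.0
        for ch in text
    )


def pad(text: str, target: float, cjk_ratio: float = 2.0) -> str:
    """Append whole spaces until text is at least `target` units wide."""
    missing = target - weighted_width(text, cjk_ratio)
    return text + " " * max(0, math.ceil(missing))


# Pad two cells from the last column of the table to a common width, for two ratios.
for ratio in (2.0, 1.5):
    cells = ["午休", "晚自习"]
    target = max(weighted_width(c, ratio) for c in cells)
    for c in cells:
        padded = pad(c, target, ratio)
        print(ratio, repr(padded), weighted_width(padded, ratio))
```

With a ratio of 2.0 both cells pad to exactly 6 units. With 1.5, 午休 can only be padded to 5.0 units against 4.5 for 晚自习, because spaces come in whole units, so a non-integer ratio could only ever approximate alignment in the source file.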

@Mikachu2333
Copy link
Contributor Author

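Enabling the unicode-string feature treats Unicode characters as two characters wide, so tables containing Chinese characters can be aligned

Sorry, I don't follow. Going by what you said, how should I adjust my VSCode settings so that the table looks properly aligned?

do you mean the program should measure according to the actual font size

Yes. My idea is to look at each character's Unicode code point: if it falls within certain blocks, count that character as two and then format. But that sounds a bit like a pipe dream...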

[Image]
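The code-point idea in the comment above can be sketched as below. `DOUBLE_WIDTH_RANGES`, `is_double_width` and `block_width` are hypothetical names, and the ranges are an illustrative, incomplete selection of Unicode blocks; a more complete approach would use the Unicode East Asian Width property (UAX #11) rather than hand-picked ranges.

```python
# Hypothetical, deliberately incomplete list of code point ranges treated as double-width.
DOUBLE_WIDTH_RANGES = [
    (0x3000, 0x303E),  # CJK Symbols and Punctuation (e.g. 。、「」)
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
    (0xFF01, 0xFF60),  # Fullwidth ASCII variants (e.g. ，（）); half-width katakana start at U+FF61
]


def is_double_width(ch: str) -> bool:
    """True if the character's code point falls in one of the listed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DOUBLE_WIDTH_RANGES)


def block_width(text: str) -> int:
    """Count listed-range characters as 2 columns and everything else as 1."""
    return sum(2 if is_double_width(ch) else 1 for ch in text)


for cell in ["11:00~12:30", "午餐", "晚自习"]:
    print(f"{cell!r}: {len(cell)} characters, {block_width(cell)} columns")
```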

@trarer

trarer commented Mar 21, 2024

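You're right, it's not a pipe dream. latexindent's unicode::string feature does exactly that: characters in the CJK blocks of Unicode are counted as 2 characters. It does not use the actual font size, though, so you must pair it with a monospaced Western font whose characters are half the width of a Chinese character. To turn on the unicode::string feature, add -GCString to the arguments of the latexindent command.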
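To illustrate how the width measure drives alignment at `&`, here is a minimal sketch that pads each cell of two rows from the tblr example above to the widest cell in its column. It is not latexindent's actual alignment code: `columns` and `align_rows` are hypothetical helpers, `columns` uses the East Asian Width property from Python's `unicodedata` module as a stand-in for the 2-column CJK counting described above, and `str.removesuffix` needs Python 3.9 or later. Passing `len` instead mimics the default behaviour where every character counts as one.

```python
import unicodedata
from typing import Callable, List


def columns(text: str) -> int:
    """Display columns, counting East Asian Wide/Fullwidth characters as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def align_rows(rows: List[str], width: Callable[[str], int] = columns) -> List[str]:
    """Pad every &-separated cell to the widest cell of its column (hypothetical helper)."""
    # Drop the trailing \\ row terminator, then split each row into trimmed cells.
    table = [[cell.strip() for cell in row.strip().removesuffix(r"\\").split("&")] for row in rows]
    ncols = max(len(cells) for cells in table)
    widths = [max(width(cells[i]) for cells in table if i < len(cells)) for i in range(ncols)]
    aligned = []
    for cells in table:
        padded = [cell + " " * (widths[i] - width(cell)) for i, cell in enumerate(cells)]
        aligned.append(" & ".join(padded) + r" \\")
    return aligned


rows = [
    r"5:30 & 起床 & 12:30~14:00 & 午休 \\",
    r"7:50~11:00 & 集训 & 19:00~21:00 & 晚自习 \\",
]
for line in align_rows(rows):             # CJK-aware widths
    print(line)
for line in align_rows(rows, width=len):  # every character counts as 1
    print(line)
```

With `width=columns`, both rows occupy the same number of display columns, so they line up in an editor whose Latin font is exactly half as wide as its CJK font (such as the Source Han Sans HW setup mentioned earlier). With `width=len`, the rows have equal character counts, but the trailing `\\` after 午休 and after 晚自习 ends up one column apart on screen.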

@Mikachu2333
Contributor Author

To turn on the unicode::string feature, add -GCString to the arguments of the latexindent command.

There's a solution like that?! Awesome!

@Mikachu2333
Contributor Author

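which turns on the unicode::string feature

Something isn't right. I'm using Windows and the standalone latexindent executable, and according to the docs the standalone executable on Windows already has it enabled by default, so I shouldn't see any difference in the output whether I use it or not...

So it still doesn't really work (laughing through tears)

[Image]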

@trarer

trarer commented Mar 22, 2024

No idea then, but it definitely works. It definitely works from the command line; if you use VSCode, you need to add the argument in the settings.

@cmhughes
Owner

cmhughes commented Mar 22, 2024 via email

I'm sorry, I have no idea what this means

On Fri, 22 Mar 2024, 00:55 trarer wrote: No idea then, but it definitely works. It definitely works from the command line; if you use VSCode, you need to add the argument in the settings.

@trarer

trarer commented Mar 22, 2024

@Mikachu2333 has trouble using Unicode::GCString on vscode.

@cmhughes
Owner

This repository has nothing to do with vscode, I recommend posting your issue on the vscode repository.

@saxyx
Contributor

saxyx commented Mar 22, 2024

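which turns on the unicode::string feature

Something isn't right. I'm using Windows and the standalone latexindent executable, and according to the docs the standalone executable on Windows already has it enabled by default, so I shouldn't see any difference in the output whether I use it or not...

So it still doesn't really work (laughing through tears)

[Image]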

In the latest version 3.23.7, from the code in C:\Users\AAA\AppData\Local\Temp\par-5875\cache-915a8c0064c09de5b911dab5529ae5b236f00de7\inc\lib/LatexIndent/AlignmentAtAmpersand.pm extracted by latexindent.exe,

sub get_column_width {

    my $stringToBeMeasured = $_[0];

    # default length measurement
    # credit/reference: https://perldoc.perl.org/perlunicook#%E2%84%9E-33:-String-length-in-graphemes
    unless ( $switches{GCString} ) {
        my $count = 0;
        while ( $stringToBeMeasured =~ /\X/g ) { $count++ }
        return $count;
    }

    # if GCString active, then use Unicode::GCString
    return Unicode::GCString->new($stringToBeMeasured)->columns();
}

it can be seen that latexindent.exe only loads Unicode::GCString automatically; it does not use Unicode::GCString by default to count CJK characters as 2 characters when measuring length. You still need to use --GCString to enable this feature.

@cmhughes cmhughes added question user-centred question about behaviour of latexindent.pl answered question has been answered cjk Chinese/Japanese/Korean labels Mar 22, 2024
@Mikachu2333
Contributor Author

You still need to use --GCString to enable this feature.

Thanks for answering!

I also ran the command D:\texlive\2024\bin\windows\latexindent.exe -c d:/LanguageLearning/Latex/test/ d:\\LanguageLearning\\Latex\\test\\main -y=defaultIndent: ' ' -GCString in PowerShell, and everything worked fine, with no errors.


So I changed the arguments to add --screenlog, and finally found the cause of both #528 and this issue.

As the following output and picture show, VSCode uses the latexindent in the Temp folder instead of the exe file I specified in the settings, and the two have different SHA-1 values...

#sha1#
C:\Users\mikac\AppData\Local\Temp\par-6d696b6163\cache-915a8c0064c09de5b911dab5529ae5b236f00de7\latexindent.exe
D252625B4EA7FDE482F1F4EC291625D2F9F1C7CB

#sha1#
D:\texlive\2024\bin\windows\latexindent.exe
2EA50CF2185A183F4982E3391A7A54367A1666E7
INFO:  ANSI Code Page:  936
INFO:  Current console output code page: 936 
INFO:  Change the current console output code page to 65001
INFO:  Command line:
       C:\Users\mikac\AppData\Local\Temp\par-6d696b6163\cache-915a8c0064c09de5b911dab5529ae5b236f00de7/latexindent.exe --screenlog --overwriteIfDifferent --cruft=d:/LanguageLearning/Latex/test/ --modifylinebreaks "--yaml=defaultIndent: '    '" --GCString d:\LanguageLearning\Latex\test\main
       Command line arguments:
       --screenlog, --overwriteIfDifferent, --cruft=d:/LanguageLearning/Latex/test/, --modifylinebreaks, --yaml=defaultIndent: '    ', --GCString, d:\LanguageLearning\Latex\test\main

INFO:  latexindent.exe version 3.23.7, 2024-03-16, a script to indent .tex files
       latexindent.exe lives here: D:/texlive/2024/bin/windows/
       Sat Mar 23 12:22:02 2024
       Filename: d:\LanguageLearning\Latex\test\main
INFO:  Processing switches:
       -sl|--screenlog: log file will also be output to the screen
       -wd|--overwriteIfDifferent: will overwrite ONLY if indented text is different
       -y|--yaml: YAML settings specified via command line
       -m|--modifylinebreaks: modify line breaks
       -c|--cruft: cruft directory
       --GCString switch active, loading Unicode::GCString module
INFO:  Directory for backup files and d:/LanguageLearning/Latex/test//indent.log:
       d:/LanguageLearning/Latex/test/
INFO:  YAML settings read: defaultSettings.yaml
       Reading defaultSettings.yaml from D:/texlive/2024/bin/windows/defaultSettings.yaml
       Reading defaultSettings.yaml (2nd attempt) from D:/texlive/2024/bin/windows/../../texmf-dist/scripts/latexindent/defaultSettings.yaml
       and then, if necessary, D:/texlive/2024/bin/windows/LatexIndent/defaultSettings.yaml
INFO:  YAML reading settings
       Home directory is C:\Users\mikac
       latexindent.pl didn't find indentconfig.yaml or .indentconfig.yaml
       see all possible locations: https://latexindentpl.readthedocs.io/en/latest/sec-appendices.html#indentconfig-options)
INFO:  YAML settings read: -y switch
       YAML setting: defaultIndent:'    '
       single-quoted string found in -y switch: '    ', substitute to     
       Updating mainSettings with defaultIndent:     
INFO:  File extension work:
       latexindent called to act upon d:\LanguageLearning\Latex\test\main without a file extension;
       searching for files in the following order (see fileExtensionPreference):
       d:\LanguageLearning\Latex\test\main.tex
       d:\LanguageLearning\Latex\test\main.sty
       d:\LanguageLearning\Latex\test\main.cls
       d:\LanguageLearning\Latex\test\main.bib
       d:\LanguageLearning\Latex\test\main.tex found!
       Updated fileName to d:\LanguageLearning\Latex\test\main.tex
INFO:  Phase 1: searching for objects
INFO:  Phase 2: finding surrounding indentation
INFO:  Phase 3: indenting objects
INFO:  Phase 4: final indentation check
INFO:  -wd switch active
       Original body matches indented body, NOT overwriting, no backup files created
INFO:  Output routine:
       Not outputting to file; see -w and -o switches for more options.
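The SHA-1 comparison above can be reproduced with a short script. This is a minimal sketch using Python's standard `hashlib`; `sha1_of` and `same_file_contents` are hypothetical helpers, and the paths passed in are whichever files you want to compare (for example the two latexindent.exe locations listed above). Digests are printed in upper-case hex to match the format shown.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 digest of a file as upper-case hex, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def same_file_contents(a: Path, b: Path) -> bool:
    """True if both files have the same SHA-1 digest."""
    return sha1_of(a) == sha1_of(b)
```

For example, `sha1_of(Path(r"D:\texlive\2024\bin\windows\latexindent.exe"))`. A different digest only shows that two files are not byte-identical; on its own it does not tell you which executable actually processed the document.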

Full vscode settings.

"latex-workshop.bibtex-fields.sort.enabled": true,
    "latex-workshop.bibtex-format.sort.enabled": true,
    "latex-workshop.bibtex-format.tab": "4 spaces",
    "latex-workshop.intellisense.file.base": "both",
    "latex-workshop.intellisense.package.enabled": true,
    "latex-workshop.intellisense.triggers.latex": [],
    "latex-workshop.latex.autoClean.run": "onBuilt",
    "latex-workshop.latex.build.clearLog.everyRecipeStep.enabled": false,
    "latex-workshop.latex.clean.fileTypes": [
        "*.aux",
        "*.bbl",
        "*.blg",
        "*.idx",
        "*.ind",
        "*.lof",
        "*.lot",
        "*.out",
        "*.toc",
        "*.acn",
        "*.acr",
        "*.alg",
        "*.glg",
        "*.glo",
        "*.gls",
        "*.ist",
        "*.fls",
        "*.log",
        "*.fdb_latexmk",
        "*.synctex.gz"
    ],
    "latex-workshop.latex.recipe.default": "lastUsed",
    "latex-workshop.latex.recipes": [
        {
            "name": "XeLaTeX *2",
            "tools": [
                "xelatex",
                "xelatex"
            ]
        },
        {
            "name": "XeLaTeX *3",
            "tools": [
                "xelatex",
                "xelatex",
                "xelatex"
            ]
        },
        {
            "name": "XeLaTeX -> BibTeX",
            "tools": [
                "xelatex",
                "bibtex",
                "xelatex",
                "xelatex"
            ]
        }
    ],
    "latex-workshop.latex.tools": [
        {
            "args": [
                "-synctex=1",
                "-interaction=nonstopmode",
                "-file-line-error",
                "%DOCFILE%"
            ],
            "command": "xelatex",
            "name": "xelatex"
        },
        {
            "args": [
                "%DOCFILE%"
            ],
            "command": "bibtex",
            "name": "bibtex"
        }
    ],
    "latex-workshop.latexindent.args": [
        "--screenlog",
        "--overwriteIfDifferent",
        "--cruft=%DIR%/",
        "--modifylinebreaks",
        "--yaml=defaultIndent: '    '",
        "--GCString",
        "%DOC_W32%"
    ],
    "latex-workshop.latexindent.path": "D:\\texlive\\2024\\bin\\windows\\latexindent.exe",
    "latex-workshop.message.error.show": false,
    "latex-workshop.message.information.show": true,
    "latex-workshop.message.warning.show": false,
    "latex-workshop.showContextMenu": true,
    "latex-workshop.synctex.afterBuild.enabled": true,
    "latex-workshop.texcount.autorun": "onSave",
    "latex-workshop.view.autoFocus.enabled": true,
    "latex-workshop.view.pdf.internal.synctex.keybinding": "double-click",
    "latex-workshop.view.pdf.invertMode.enabled": "auto",
    "latex-workshop.view.pdf.viewer": "browser",

@saxyx
Contributor

saxyx commented Mar 23, 2024

As the following output and picture show, VSCode uses the latexindent in the Temp folder instead of the exe file I specified in the settings, and the two have different SHA-1 values..

When using D:\texlive\2024\bin\windows\latexindent.exe, it will automatically unzip under the Temp path and re-run a new command in the background. The --screenlog option merely displays the final command run in the background. Of course, you cannot directly use C:\Users\mikac\AppData\Local\Temp\par-6d696b6163\cache-915a8c0064c09de5b911dab5529ae5b236f00de7\latexindent.exe to format your file.
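If you want to inspect the unpacked copy that the standalone latexindent.exe runs from (as was done earlier in the thread to read AlignmentAtAmpersand.pm), a sketch like the following lists matching files under the temporary directory. The glob pattern is an assumption inferred only from the paths quoted in this thread (`par-...\cache-...\...`), not a documented interface; in those paths, `6d696b6163` happens to be the hexadecimal encoding of the user name `mikac`. `find_unpacked` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

# Assumed layout, inferred from the paths quoted in this thread:
#   <TEMP>/par-<hex-encoded user name>/cache-<hash>/latexindent.exe
#   <TEMP>/par-<hex-encoded user name>/cache-<hash>/inc/lib/LatexIndent/*.pm
DEFAULT_PATTERN = "par-*/cache-*/inc/lib/LatexIndent/AlignmentAtAmpersand.pm"


def find_unpacked(temp_dir: Optional[Path] = None, pattern: str = DEFAULT_PATTERN) -> List[Path]:
    """List files under the temp directory that match the assumed unpacked layout."""
    base = temp_dir if temp_dir is not None else Path(tempfile.gettempdir())
    return sorted(base.glob(pattern))


for path in find_unpacked():
    print(path)
```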

@Mikachu2333
Contributor Author

Mikachu2333 commented Mar 23, 2024

When using D:\texlive\2024\bin\windows\latexindent.exe, it will automatically unzip under the Temp path and re-run a new command in the background. The --screenlog option merely displays the final command run in the background. Of course, you cannot directly use C:\Users\mikac\AppData\Local\Temp\par-6d696b6163\cache-915a8c0064c09de5b911dab5529ae5b236f00de7\latexindent.exe to format your file.

But I can't see any other possible cause for this issue. I'm also not sure whether it is related to the paths not being wrapped in double quotation marks.

Besides, when running from the command line, everything works correctly, so I believe this is due to VSCode.

@Mikachu2333
Contributor Author

Here is the exe file I repackaged after modifying Document.pm.

After I switched to the exe file repacked by @fengzyf and tried to use the --GCString argument, latexindent told me that Locale 'Chinese (Simplified)_China.936' is unsupported... After I removed the argument, latexindent formatted the file successfully but still could not align the contents of the cells.

[Image]

[Image]

@cmhughes
Owner

I believe this is fixed as of https://github.com/cmhughes/latexindent.pl/releases/tag/V3.23.8; let me know if not.

@Mikachu2333
Contributor Author

Wonderful! Tremendous! The problem that has been bothering me for a long time has been perfectly fixed!
