xeCJKfntef: A "solution" for formulas inside underlined text #614

Sophanatprime · 2022-04-15T16:39:33Z

xeCJK version 3.8.8, TeX Live 2022, expl3 released 2022-04-10.

Compile the following code with XeLaTeX:

\documentclass{ctexart}
\usepackage{xeCJKfntef}
\begin{document}

\uline{不好$n$}% error
\uline{不好 $n$}% the space is swallowed

\CJKunderline{不好$n$} % error
\CJKunderline{不好 $n$}% the space is swallowed

\end{document}

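The first \uline and the first \CJKunderline raise an error. In the second \uline and the second \CJKunderline, the space between the Chinese text and the formula is swallowed, which is not the intended behavior.

Moreover, the error occurs if and only if the character before $ belongs to the CJK class, that is, in the CJK and Boundary case; it does not occur otherwise. For example, a Western character or a Chinese punctuation mark before $ does not trigger it. In the case CJK + space (catcode 10) + Boundary there is no error; the space is merely absorbed.

This issue can also be seen in #530.

Three macros are involved: \__xeCJK_ulem_CJK_and_Boundary:w, \__xeCJK_ulem_glue:n, and \__xeCJK_peek_catcode_ignore_spaces_branches:w.

The first and the third are modified as follows: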

\cs_gset_protected:Npn \__xeCJK_ulem_CJK_and_Boundary:w
  {
    \xeCJK_if_ulem_patch:TF
      {
        \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token
          {
            \xeCJK_class_group_end: %\UL@stop %% remove
            \CJKecglue
            %\UL@start %% remove
          }
          {
            \bool_if:NTF \l__xeCJK_peek_ignore_spaces_bool
              {
                \xeCJK_class_group_end: \UL@stop
                \UL@start 
                { \xeCJK_make_node:n { CJK-space } }
              }
              {
                \xeCJK_class_group_end: \UL@stop
                \UL@start { \xeCJK_make_node:n { CJK } }
              }
            \xeCJK_make_group_tag:
          }
      }
      { \__xeCJK_ulem_CJK_and_Boundary:w }
  }

That is, \UL@stop and \UL@start are removed (see the comments above for the positions).

\cs_gset_protected:Npn \__xeCJK_peek_catcode_ignore_spaces_branches:w
  {
    \if_meaning:w \l_peek_token \c_space_token
      \bool_set_true:N \l__xeCJK_peek_ignore_spaces_bool
      \exp_after:wN \peek_after:Nw
      \exp_after:wN \__xeCJK_peek_catcode_ignore_spaces_branches:w
      \exp:w \exp_end_continue_f:w %% add
      \tex_romannumeral:D 0
    \else:
      \if_catcode:w
        \exp_not:N \l_peek_token \exp_not:N \l__xeCJK_peek_search_token
        \exp_after:wN \exp_after:wN
        \exp_after:wN \__xeCJK_peek_catcode_true:w
      \else:
        \exp_after:wN \exp_after:wN
        \exp_after:wN \__xeCJK_peek_catcode_false:w
      \fi:
    \fi:
  }

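Here \exp:w \exp_end_continue_f:w is added, while \tex_romannumeral:D 0 is kept (see the comment above for the position).

Nothing else was changed.

Surprisingly, the code above now compiles correctly. I did not expect this; the original code should have worked without modification.

For the analysis that follows, here is the code of the second macro mentioned above.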

\cs_gset_protected:Npn \__xeCJK_ulem_glue:n #1 %% unchanged
  {
    \xeCJK_if_ulem_patch:TF
      {
        \tl_if_empty:NTF \l__xeCJK_group_tag_tl
          { \UL@stop \__xeCJK_ulem_hskip:n {#1} \UL@start }
          {
            \str_if_eq:eeTF { \l__xeCJK_group_tag_tl } { \c__xeCJK_group_tag_tl }
              { \UL@stop \__xeCJK_ulem_hskip:n {#1} \UL@start }
              { \skip_horizontal:n {#1} }
          }
      }
      { \skip_horizontal:n {#1} }
  }

Inside UL, \CJKecglue is defined as:

\cs_set_protected:Npn \CJKecglue { \__xeCJK_ulem_glue:n \l__xeCJK_ecglue_skip }

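My analysis (without the patches above) is as follows. For 不好$n$, we go straight into CJK and Boundary, the math toggle is matched, and \UL@stop \CJKecglue \UL@start should be inserted. But at this point \CJKecglue takes one of the two T branches of \__xeCJK_ulem_glue:n, so a second \UL@stop appears and the groups become unbalanced. This is how I understand the cause of the error.

The so-called fix is simply to delete that \UL@stop and \UL@start. In my tests this change did not introduce other errors, though I may not have covered every case.

The second case is 不好 $n$. It also reaches CJK and Boundary, and the result should be the same as above, because spaces are ignored when peeking the catcode. In practice it is not; consider the following code: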

\documentclass{ctexart}
\usepackage{xeCJKfntef}
\begin{document}

\ExplSyntaxOn

\xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} ~~$x$ %%:: tx

\CJKunderline{ \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} ~~$x$ } %%:: fx

\ExplSyntaxOff

\end{document}

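The first gives the T branch and the second the F branch. This shows that inside UL the original peek-catcode macro does not behave as intended; with the (second) patch it is handled "correctly".

Without the patch, debugging shows that inside UL the last \peek_after:Nw of the peek-catcode code peeks \expandafter, whereas in normal text it correctly peeks $. This is the second thing I do not understand.

As a result the wrong branch is taken, so no glue is inserted between the CJK text and the formula.

The patch merely adds \exp:w \exp_end_continue_f:w before \romannumeral 0 \else ..., which in effect adds one more \romannumeral. After that, $ is peeked correctly both inside UL and in normal text. I do not understand why this extra \exp:w is needed.

In short, it seems to me that the original code should handle this correctly but does not, while the modified code does.

Finally, a working example: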

\documentclass{ctexart}
\usepackage{xeCJKfntef}

\makeatletter
\ExplSyntaxOn
\cs_gset_protected:Npn \__xeCJK_ulem_CJK_and_Boundary:w
  {
    \xeCJK_if_ulem_patch:TF
      {
        \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token
          {
            \xeCJK_class_group_end: %\UL@stop %% remove
            \CJKecglue
            %\UL@start %% remove
          }
          {
            \bool_if:NTF \l__xeCJK_peek_ignore_spaces_bool
              {
                \xeCJK_class_group_end: \UL@stop
                \UL@start 
                { \xeCJK_make_node:n { CJK-space } }
              }
              {
                \xeCJK_class_group_end: \UL@stop
                \UL@start { \xeCJK_make_node:n { CJK } }
              }
            \xeCJK_make_group_tag:
          }
      }
      { \__xeCJK_ulem_CJK_and_Boundary:w }
  }
\cs_gset_protected:Npn \__xeCJK_peek_catcode_ignore_spaces_branches:w
  {
    \if_meaning:w \l_peek_token \c_space_token
      \bool_set_true:N \l__xeCJK_peek_ignore_spaces_bool
      \exp_after:wN \peek_after:Nw
      \exp_after:wN \__xeCJK_peek_catcode_ignore_spaces_branches:w
      \exp:w \exp_end_continue_f:w %% add
      \tex_romannumeral:D 0
    \else:
      \if_catcode:w
        \exp_not:N \l_peek_token \exp_not:N \l__xeCJK_peek_search_token
        \exp_after:wN \exp_after:wN
        \exp_after:wN \__xeCJK_peek_catcode_true:w
      \else:
        \exp_after:wN \exp_after:wN
        \exp_after:wN \__xeCJK_peek_catcode_false:w
      \fi:
    \fi:
  }
\ExplSyntaxOff
\makeatother

\begin{document}

\uline{如果按某种对应关系 $f$，对于集合$A$中的任意 \relax 一}

\uline{$f$}

\CJKunderline*{好$k$}

\CJKunderline{好。好 $k$ 好$x$ 好。$x$ 好 \relax 好\relax 好}

好。好 $k$ 好$x$ 好。$x$ 好 \relax 好\relax 好

\end{document}

RuixiZhang42 · 2022-04-17T18:14:10Z

Regarding \__xeCJK_peek_catcode_ignore_spaces_branches:w, the current code has the following structure:

\ifx<token1><token2>% \ifx does not expand tokens; it compares meaning of <token1> and <token2>
  ...
  \expandafter \peek_after:Nw \expandafter \__xeCJK_peek_catcode_ignore_spaces_branches:w
  \romannumeral 0%
\else
  ...
\fi

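First of all, this is not LaTeX3 style at all; it is low-level TeX. My guess is that it is a historical leftover.

The space is most likely swallowed by that \romannumeral 0 (leaving aside that it is entirely unnecessary). The intent here is to first expand away the remaining, unused \else ... \fi part and then look ahead with \peek_after:Nw \__xeCJK_peek_catcode_ignore_spaces_branches:w. Normally this would be enough: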

\ifx<token1><token2>%
  ...
  \expandafter \peek_after:Nw \expandafter \__xeCJK_peek_catcode_ignore_spaces_branches:w
\else
  ...
\fi

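\romannumeral is followed by a number (decimal digits 0-9, for instance), and a trailing space is expected to mark the end of the number. TeX either finds a space and swallows it or finds some other token that terminates the number, and while searching it keeps expanding the following tokens. With the old code, not only is \else expanded (everything up to \fi is skipped), but any tokens after that are expanded too. In particular, if a space follows, that space is what actually terminates the number 0 (and is swallowed), after which \romannumeral 0␣ expands to nothing.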
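
The space-gobbling behavior described here can be observed in isolation. The following is a minimal sketch, independent of xeCJK and ulem; \demospace is a hypothetical name introduced only for this illustration.

```latex
\documentclass{article}
% \demospace is a hypothetical helper whose replacement text is one space token.
\def\demospace{ }
\begin{document}

% The space after 0 ends the number and is consumed; \romannumeral0 yields nothing.
A\romannumeral0 B% typesets "AB"

% The brace ends the number without being consumed, so the later space survives.
A\romannumeral0{} B% typesets "A B"

% While looking for the end of the number, TeX expands \demospace; the space
% token it produces ends the number and is consumed.
A\romannumeral0\demospace B% typesets "AB"

\end{document}
```

The third line is the closest to the situation in this issue: the space that gets consumed does not have to be written directly after the 0, it only has to be the first thing TeX meets after expanding whatever follows.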

The canonical LaTeX3 version would be:

\cs_gset_protected:Npn \__xeCJK_peek_catcode_ignore_spaces_branches:w
  {
    \token_if_eq_meaning:NNTF \l_peek_token \c_space_token
      {
        \bool_set_true:N \l__xeCJK_peek_ignore_spaces_bool
        \peek_after:Nw \__xeCJK_peek_catcode_ignore_spaces_branches:w
      }
      {
        \token_if_eq_catcode:NNTF \l_peek_token \l__xeCJK_peek_search_token
          { \__xeCJK_peek_catcode_true:w }
          { \__xeCJK_peek_catcode_false:w }
      }
  }

Here \token_if_eq_meaning:NNTF \l_peek_token \c_space_token {<True code>} {<False code>} can be replaced by the more appropriate \token_if_space:NTF \l_peek_token {<True code>} {<False code>}. Also, \cs_gset_protected:Npn should not be needed; \cs_set_protected:Npn ought to suffice.

Sophanatprime · 2022-04-18T02:30:15Z

@RuixiZhang42 Your code does not seem to work in the following example:

\documentclass{ctexart}
\usepackage{xeCJKfntef}

\makeatletter
\ExplSyntaxOn
\cs_set_protected:Npn \__xeCJK_peek_catcode_ignore_spaces_branches:w
  {
    \token_if_eq_meaning:NNTF \l_peek_token \c_space_token
      {
        \bool_set_true:N \l__xeCJK_peek_ignore_spaces_bool
        \peek_after:Nw  \__xeCJK_peek_catcode_ignore_spaces_branches:w
      }
      {
        \token_if_eq_catcode:NNTF \l_peek_token \l__xeCJK_peek_search_token
          { \__xeCJK_peek_catcode_true:w }
          { \__xeCJK_peek_catcode_false:w }
      }
  }
\ExplSyntaxOff
\makeatother

\begin{document}

好 $x$

\ExplSyntaxOn

\xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} ~~$x$ %%:: tx

\CJKunderline{ \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} ~~$x$ } %%:: fx

\ExplSyntaxOff

\end{document}

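The code falls into an infinite loop.
My understanding is that when a space is peeked, \peek_after:Nw does not remove the peeked token (the space is not removed in this example), so every peek sees the same first token. Even after changing it to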

\documentclass{ctexart}
\usepackage{xeCJKfntef}

\makeatletter
\ExplSyntaxOn
\cs_set_protected:Npn \__xeCJK_peek_catcode_ignore_spaces_branches:w
  {
    \token_if_eq_meaning:NNTF \l_peek_token \c_space_token
      {
        \bool_set_true:N \l__xeCJK_peek_ignore_spaces_bool
        \exp_after:wN \peek_after:Nw 
        \exp_after:wN \__xeCJK_peek_catcode_ignore_spaces_branches:w
        \exp:w \exp_end_continue_f:w
      }
      {
        \token_if_eq_catcode:NNTF \l_peek_token \l__xeCJK_peek_search_token
          { \__xeCJK_peek_catcode_true:w }
          { \__xeCJK_peek_catcode_false:w }
      }
  }
\ExplSyntaxOff
\makeatother

\begin{document}

好 $x$

\ExplSyntaxOn

\xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} ~~$x$ %%:: tx

\CJKunderline{ \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} ~~$x$ } %%:: fx

\ExplSyntaxOff

\end{document}

inside UL it still takes the F branch.
What I do not understand is why the same code works in normal text but not inside UL.

In practice one still needs

...
\exp:w \exp_end_continue_f:w
\exp:w 0
...

for it to work, but that is hardly different from the low-level version.

RuixiZhang42 · 2022-04-18T03:00:15Z

@Sophanatprime

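"My understanding is that when a space is peeked, \peek_after:Nw does not remove the peeked token (the space is not removed in this example), so every peek sees the same first token."

Ah, yes, that was careless of me: nothing is removed, hence the infinite loop.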
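
For reference, expl3 itself provides \peek_remove_spaces:n, which removes each space token it finds before continuing, so a loop built on it terminates. The sketch below is not what xeCJK does: \demo_peek_math:TF is a hypothetical name, it does not set \l__xeCJK_peek_ignore_spaces_bool, and it only illustrates the behavior in normal text, not inside \CJKunderline.

```latex
\documentclass{article}
\ExplSyntaxOn
% \demo_peek_math:TF is a hypothetical helper for this illustration only.
% \peek_remove_spaces:n removes any following space tokens and then runs its
% argument, which peeks (without removing) the next non-space token.
\cs_new_protected:Npn \demo_peek_math:TF #1#2
  {
    \peek_remove_spaces:n
      { \peek_catcode:NTF \c_math_toggle_token {#1} {#2} }
  }
\ExplSyntaxOff
\begin{document}
\ExplSyntaxOn
% ~ is a space token under \ExplSyntaxOn; the intent is that it is removed
% and $ is then peeked, so the first (true) argument is used.
\demo_peek_math:TF { t } { f } ~ $x$
\ExplSyntaxOff
\end{document}
```

The design point is the same one made above: a peek only looks, so something else has to take the space out of the input stream before the next peek.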

"In practice one still needs
...
\exp:w \exp_end_continue_f:w
\exp:w 0
...
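for it to work, but that is hardly different from the low-level version."

But \romannumeral`^^@\romannumeral0\else...\fi makes no logical sense... Still, I think I now understand (?) why the original has \romannumeral0\else...\fi: it is there precisely to remove the space token that was just peeked, so that the next token (namely $) can be peeked.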

Sophanatprime · 2022-04-18T03:18:29Z

@RuixiZhang42

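"But \romannumeral`^^@\romannumeral0\else...\fi makes no logical sense... Still, I think I now understand (?) why the original has \romannumeral0\else...\fi: it is there precisely to remove the space token that was just peeked, so that the next token (namely $) can be peeked."

Yes, I do not understand it either; I just tried writing it this way, and it happened to work.

I ran another test. Without the two-\romannumeral version, \use:nn {...} {space} is needed for it to work: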

% peek catcode unmodified
\CJKunderline{ \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} ~~ $x$ } %%:: fx
\CJKunderline{ \use:nn { \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} } {~} $x$ } %%:: tx

With two \romannumeral, on the other hand, one extra space is removed:

% peek catcode modified, using \exp:w twice
%%:: tx
\CJKunderline{ \use:nn { \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} } {~} $x$ } 
%%:: tx, but each peek removes one extra space
\CJKunderline{ \use:nnn { \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} } {~} {~} $x$ }

RuixiZhang42 · 2022-04-18T04:13:18Z

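@Sophanatprime {␣} vs. ␣ is a good way in, and "the same code works in normal text but not inside UL" is also a useful hint.

I looked briefly at what happens after \CJKunderline. The relevant part is \xeCJK_ulem_on:n (that is, \ULon, which is why \uline shows almost the same problem). According to ulem.sty, \ULon is normally \UL@on, and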

\long\def\UL@on#1{...
  \UL@word\@empty#1\xdef\UL@spfactor{\the\spacefactor} \UL@end * }

Consider the following test:

\ExplSyntaxOn
\CJKunderline{ \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t} {f} ~~$x$ }
\ExplSyntaxOff

First, the two consecutive ~~ are normalized by TeX into a single space token when the input is read, so #1 of \UL@on is \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t}{f}␣$x$, which expands to

\UL@word \@empty \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t}{f}␣$x$%
  \xdef\UL@spfactor{\the\spacefactor}␣\UL@end *␣%

The space token on the first line is exposed to the parameter text of \UL@word:

\long\def\UL@word#1␣{\expandafter\UL@start#1␣%
  ...\UL@word\@empty}

So #1 of \UL@word is \@empty \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t}{f}, which expands to

\expandafter \UL@start \@empty \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t}{f}␣%
  ...\UL@word\@empty

One more expansion step gives

\UL@start \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t}{f}␣%
  ...\UL@word\@empty

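So \xeCJK_peek_catcode_ignore_spaces:NTF naturally cannot find $: the whole $x$ is treated by \UL@word as the next chunk and has not even been read yet. And since this is ignore_spaces, the space is of course swallowed as well.

In your \use:nn {...} {~} experiment the situation is completely different. \UL@on expands to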

\UL@word \@empty
  \use:nn {\xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t}{f}}{␣}$x$%
  \xdef\UL@spfactor{\the\spacefactor}␣\UL@end *␣%

The first "exposed" space token appears only on the third line, so after two expansion steps \UL@word gives

\UL@start
  \use:nn {\xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token {t}{f}}{␣}$x$%
  \xdef\UL@spfactor{\the\spacefactor}␣%
  ...\UL@word\@empty

The subsequent \xeCJK_peek_catcode_ignore_spaces:NTF can then skip over the surviving ␣ and peek $.
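
The difference between an exposed space and a space hidden in braces can be reproduced without ulem. This is a minimal sketch; \firstchunk is a hypothetical macro that only imitates the space-delimited parameter text of \UL@word.

```latex
\documentclass{article}
% \firstchunk is a hypothetical macro; like \UL@word, its argument is
% delimited by the first space token that is not hidden inside braces.
\def\firstchunk#1 {[#1]}
\begin{document}

% #1 is "ab": the first exposed space ends the argument.
\firstchunk ab cd

% #1 is "{a b}c": the space inside the braces is not seen as a delimiter.
\firstchunk {a b}c d

\end{document}
```

In the second call the argument keeps its braces, because TeX strips the outer braces of a delimited argument only when the whole argument is a single brace group.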

Sophanatprime · 2022-04-18T09:39:08Z

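This still does not explain why using \romannumeral twice works, but it does give a way in.

With \uline{好 $x$}, the text is split into two different chunks, so $ cannot be detected. With two \romannumeral, however, $x$ and 好 end up being processed in the same chunk.

My understanding is as follows.

First, some important code: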

\cs_new_protected:Npn \xeCJK_ulem_word:nw #1 ~
  {
    \exp_after:wN \UL@start #1 ~ %% note this trailing space
    \exp_after:wN \if_meaning:w \exp_after:wN \UL@end #1
      \exp_after:wN \__xeCJK_ulem_end:
    \else:
      \exp_after:wN \__xeCJK_ulem_loop:nw
    \fi:
  }
\cs_new_protected:Npn \__xeCJK_ulem_loop:nw
  {
    \reverse_if:N \if_mode_math:
      \reverse_if:N \if_dim:w \tex_lastskip:D = \c_zero_dim
        \skip_gset_eq:NN \UL@skip \tex_lastskip:D
        \tex_unskip:D
        \UL@stop \UL@leaders
      \fi:
    \fi:
    \xeCJK_ulem_word:nw \prg_do_nothing:
  }
\cs_set_eq:NN \UL@word \xeCJK_ulem_word:nw

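Each chunk is processed by \xeCJK_ulem_word:nw, with \prg_do_nothing: prepended to its argument, i.e. \prg_do_nothing:好, \prg_do_nothing:$x$, and so on. The first \exp_after:wN in \xeCJK_ulem_word:nw is there to remove it. During execution, CJK and Boundary is inserted automatically at the end by XeLaTeX; what matters here is the peek-catcode part of that code. With only one \romannumeral, it is terminated by the trailing space of \exp_after:wN \UL@start #1 ~ in \xeCJK_ulem_word:nw! So the code after it is not expanded any further.

With two \romannumeral, however, \tracingall shows that only the second \romannumeral is terminated by the space; the first keeps expanding. Since this is not yet the end of UL, \__xeCJK_ulem_loop:nw is executed; although it is \protected, it is still expanded by \romannumeral (in LaTeX3 terms, by f-type expansion). This reaches \xeCJK_ulem_word:nw again, which absorbs its argument ($x$...). At this point \UL@start is empty, so \romannumeral runs into $ and stops expanding. Only then does the peek happen, and it "correctly" peeks $, so the hskip is inserted.

So the two \romannumeral end up peeking the right token by accident. Without the two \romannumeral, what is peeked is the \exp_after:wN before \if_meaning:w.

To illustrate, see the following example (it needs the two earlier patches):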

\documentclass{ctexart}
\usepackage{xeCJKfntef}
\makeatletter
\ExplSyntaxOn
%% the earlier patches are required
\cs_gset_protected:Npn \__xeCJK_ulem_CJK_and_Boundary:w
  {
    \xeCJK_if_ulem_patch:TF
      {
        \xeCJK_peek_catcode_ignore_spaces:NTF \c_math_toggle_token
          {
            \xeCJK_class_group_end: %\UL@stop %% remove
            \CJKecglue
            %\UL@start %% remove
          }
          {
            \bool_if:NTF \l__xeCJK_peek_ignore_spaces_bool
              {
                \xeCJK_class_group_end: \UL@stop
                \UL@start 
                { \xeCJK_make_node:n { CJK-space } }
              }
              {
                \xeCJK_class_group_end: \UL@stop
                \UL@start { \xeCJK_make_node:n { CJK } }
              }
            \xeCJK_make_group_tag:
          }
      }
      { \__xeCJK_ulem_CJK_and_Boundary:w }
  }
\cs_gset_protected:Npn \__xeCJK_peek_catcode_ignore_spaces_branches:w
  {
    \if_meaning:w \l_peek_token \c_space_token
      \bool_set_true:N \l__xeCJK_peek_ignore_spaces_bool
      \exp_after:wN \peek_after:Nw
      \exp_after:wN \__xeCJK_peek_catcode_ignore_spaces_branches:w
      \exp:w \exp_end_continue_f:w %% add
      \exp:w \exp_end_continue_f:w %% \tex_romannumeral:D 0
    \else:
      \if_catcode:w
        \exp_not:N \l_peek_token \exp_not:N \l__xeCJK_peek_search_token
        \exp_after:wN \exp_after:wN
        \exp_after:wN \__xeCJK_peek_catcode_true:w
      \else:
        \exp_after:wN \exp_after:wN
        \exp_after:wN \__xeCJK_peek_catcode_false:w
      \fi:
    \fi:
  }
\ExplSyntaxOff
\makeatother

\begin{document}

\def\test{\uline{好 $x$ 好 $y$}\par
  \CJKunderline{好 $x$ 好 $y$}}

\test

\ExplSyntaxOn
\makeatletter
\cs_set_protected:Npn \xeCJK_ulem_word:nw #1 ~
  {
    % \exp_after:wN \UL@start #1 ~
    \use:nnn { \exp_after:wN \UL@start #1 } {~} {~} %% add one more space
    \exp_after:wN \if_meaning:w \exp_after:wN \UL@end #1
      \exp_after:wN \__xeCJK_ulem_end:
    \else:
      \exp_after:wN \__xeCJK_ulem_loop:nw
    \fi:
  }
\cs_set_eq:NN \UL@word \xeCJK_ulem_word:nw
\ExplSyntaxOff

\test

\end{document}

As you can see, the output is no longer correct once the extra space is added.

RuixiZhang42 · 2022-04-18T12:29:39Z

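"With \uline{好 $x$}, the text is split into two different chunks, so $ cannot be detected. With two \romannumeral, however, $x$ and 好 end up being processed in the same chunk. [...]
[...] With only one \romannumeral, it is terminated by the trailing space of \exp_after:wN \UL@start #1 ~ in \xeCJK_ulem_word:nw! So the code after it is not expanded any further.
With two \romannumeral, however, \tracingall shows that only the second \romannumeral is terminated by the space; the first keeps expanding [...]
[...] So the two \romannumeral end up peeking the right token by accident. Without the two \romannumeral, what is peeked is the \exp_after:wN before \if_meaning:w."

This analysis is correct. The question, though, is this: if $x$ and 好 are processed in the same chunk with \CJKecglue inserted between them, can a line break still occur there?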

Sophanatprime · 2022-04-18T13:01:16Z

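With text given directly, lines can break automatically on both sides of a formula (a space is required to the right of the formula);
when the text is stored in a macro, lines can break automatically to the left of a formula, and \allowbreak after the formula allows a break to its right;
but lines cannot break inside a formula, and it seems the original ulem cannot break inside a formula either.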

\documentclass{ctexart}
\usepackage{xeCJKfntef}
%%% the two patches
\patch

\begin{document}

\hfuzz=1pt
\overfullrule=5pt
\lineskip=2.5pt

\def\test{好。好 $k$ 好$x$ 好。$x\displaystyle\int$ 好 \relax 好\relax 好\hbox{内}好 \hbox{内}好\parbox[t]{1\ccwd}{呐\par 讷}哦。我能吞下玻璃而不伤身 $E=mc^2$ 体。}

\CJKunderline{\test}

\CJKunderline*{\test}

\test


\def\test{好。好 $k$ 好$x$ 好。$x\displaystyle\int$ 好 \relax 好\relax 好\hbox{内}好 \hbox{内}好\parbox[t]{1\ccwd}{呐\par 讷}哦。我能吞下玻璃 $E=mc^2$ 而不伤身体。}

\textbf{展开，可以自动断行：}% "Expanded: lines break automatically:"

\expandafter\CJKunderline\expandafter{\test}

\textbf{不展开，则不能自动断行：}% "Not expanded: lines do not break automatically:"

\CJKunderline*{\test}

\test


\def\test{好。好 $k$ 好$x$ 好。$x\displaystyle\int$ 好 \relax 好\relax 好\hbox{内}好 \hbox{内}好\parbox[t]{1\ccwd}{呐\par 讷}哦。我能吞下玻璃而不 $E^2=m^2c^4+c^2p^2$ 伤身体，我能吞下玻璃而不伤身体。}

\CJKunderline{\test}

\CJKunderline*{\test}

\test

\end{document}

With ulem alone:

\documentclass{article}
\usepackage{ulem}
\begin{document}

\overfullrule=5pt
\parskip=5pt

\def\test{I can eat glass, it doesn't hurt me. I can eat glass, it doesn't hurt $E^2=m^2c^4+c^2p^2$ me. I can eat glass, it doesn't hurt me. I can eat glass, it doesn't hurt me.}

\expandafter\uline\expandafter{\test}

\uline{\test}

\test

\end{document}

qinglee · 2022-07-27T16:01:53Z

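The \UL@stop and \UL@start around \CJKecglue in \__xeCJK_ulem_CJK_and_Boundary:w are indeed redundant and should be removed, because the modified \CJKecglue used inside the ulem environment already contains them.
As for why two f-expansions give the expected result: the second f-expansion removes one space, and the first f-expansion goes on to expand all the other expandable material after the Chinese character, so \peek_after:Nw can see $, the T branch is taken, and \CJKecglue is inserted.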
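
The interplay of the two expansions can be sketched outside xeCJK. In the example below, \demonext, \resultone and \resulttwo are hypothetical names, and \romannumeral-`0 is the classic plain-TeX idiom used here in the role of \exp:w \exp_end_continue_f:w; it is an illustration of the mechanism, not the xeCJK code.

```latex
\documentclass{article}
% \demonext is a hypothetical macro standing in for "expandable material".
\def\demonext{X}
\begin{document}

% One \romannumeral: the space ends the number 0, expansion stops there,
% and \demonext is left unexpanded.
\expandafter\def\expandafter\resultone\expandafter
  {\romannumeral0 \demonext Y}
\texttt{\meaning\resultone}% macro:->\demonext Y

% Two: the inner \romannumeral0 consumes the space and stops, while the
% outer \romannumeral-`0 keeps expanding until it meets the unexpandable X.
\expandafter\def\expandafter\resulttwo\expandafter
  {\romannumeral-`0\romannumeral0 \demonext Y}
\texttt{\meaning\resulttwo}% macro:->XY

\end{document}
```

In the issue, the role of \demonext is played by the conditional and \__xeCJK_ulem_loop:nw that follow the chunk, and the role of X by $.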

Note that spaces in ulem's argument serve as macro-argument delimiters. Specifically, for \uline{不好 $n$}, the first argument read is

\xeCJK_ulem_word:nw 不好 ~

which expands to (tokens irrelevant to the analysis are written as xxx)

\exp_after:wN \UL@start xxxxxx 不好 ~
\exp_after:wN \if_meaning:w \exp_after:wN \UL@end #1
  \exp_after:wN \__xeCJK_ulem_end:
\else:
  \exp_after:wN \__xeCJK_ulem_loop:nw
\fi:

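Here \UL@start begins building the box. When execution reaches 好 ~, the Chinese character is followed by a space, so the T branch of \__xeCJK_peek_catcode_ignore_spaces_branches:w is taken.

In the xeCJK version, the space is eaten by \tex_romannumeral:D 0, \peek_after:Nw then sees the \exp_after:wN on the next line, and processing ends;
in the version modified by @Sophanatprime, the space is eaten by the second \exp:w \exp_end_continue_f:w, whose expansion stops there, while the first \exp:w \exp_end_continue_f:w keeps expanding until it reaches \__xeCJK_ulem_loop:nw: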

\cs_new_protected:Npn \__xeCJK_ulem_loop:nw
  {
    \reverse_if:N \if_mode_math:
      \reverse_if:N \if_dim:w \tex_lastskip:D = \c_zero_dim
        \skip_gset_eq:NN \UL@skip \tex_lastskip:D
        \tex_unskip:D
        \UL@stop \UL@leaders
      \fi:
    \fi:
    \xeCJK_ulem_word:nw \prg_do_nothing:
  }

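Here the \if tests before \xeCJK_ulem_word:nw are all expanded. Since we are not in math mode and the space has been eaten, the \if branches expand to nothing, so expansion continues with \xeCJK_ulem_word:nw, which starts reading an argument again. This time the argument read is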

 \prg_do_nothing: $n$ xxx

which expands to

\exp_after:wN \UL@start \prg_do_nothing: $n$ xxx ~
\exp_after:wN \if_meaning:w \exp_after:wN \UL@end #1
  \exp_after:wN \__xeCJK_ulem_end:
\else:
  \exp_after:wN \__xeCJK_ulem_loop:nw
\fi:

Note that here \UL@start has already been redefined as \@empty by the previous \UL@start:

% \UL@start: start of each chunk. It gives two levels of grouping.
% Each chunk is ended by \UL@stop.  Local intermissions go like
% \UL@stop...\UL@start.
\def\UL@start{\setbox\UL@box\hbox\bgroup\everyhbox{\UL@hrest}%
% the following are to cope with stops (\ ,\- etc) within extra braces
  \let\UL@start\@empty \def\UL@unegroup{\bgroup\bgroup}\let\UL@leadtype\@empty
  \bgroup \kern-3sp\kern3sp % kerns so I can test for beginning of list
  \if@ignore \global\@ignorefalse \ignorespaces \fi}

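So it now expands to nothing. Next, the earlier \exp:w finally meets the first unexpandable token, $, and its expansion stops. \peek_after:Nw then sees $, the T branch is taken, the character-class group is ended, and \CJKecglue is inserted, producing the spacing.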

xkwxdyy · 2022-07-29T07:36:27Z

I installed xeCJK at ad44c66 into my local tree and have this MWE:

\documentclass{article}
\usepackage{xeCJKfntef}

\begin{document}

\CJKunderline{张量 $A$ 的维度}

\CJKunderline{张量 \,$A$ 的维度}

\end{document}

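In the result, the spacing between the text and the math formula is not added properly. Has this not been fully fixed, or is it something else?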

qinglee · 2022-07-29T07:53:45Z

@xkwxdyy Check the log and make sure your example is using the development version.

syvshc · 2022-07-29T08:03:29Z

With the development version, my test behaves normally:

\documentclass{article}
\usepackage{xeCJKfntef}
\listfiles
\begin{document}

\CJKunderline{张量 $A$ 的维度}

\CJKunderline{张量$A$ 的维度}

\end{document}

 *File List*
 article.cls    2021/10/04 v1.4n Standard LaTeX document class
  size10.clo    2021/10/04 v1.4n Standard LaTeX file (size option)
   xeCJK.sty    2022/07/28 v3.9.0 Typesetting CJK scripts with XeLaTeX
   expl3.sty    2022-07-15 L3 programming layer (loader)
l3backend-xetex.def    2022-07-01 L3 backend support: XeTeX
ctexhook.sty    2022/07/14 v2.5.10 Document and package hooks (CTEX)
xtemplate.sty    2022-06-22 L3 Experimental prototype document functions
fontspec.sty    2022/01/15 v2.8a Font selection for XeLaTeX and LuaLaTeX
  xparse.sty    2022-06-22 L3 Experimental document command parser
fontspec-xetex.sty    2022/01/15 v2.8a Font selection for XeLaTeX and LuaLaTeX
 fontenc.sty    2021/04/29 v2.0v Standard LaTeX package
fontspec.cfg
   xeCJK.cfg    2022/07/28 v3.9.0 Configuration file for xeCJK package
xeCJKfntef.sty    2022/07/28 v3.9.0 xeCJK font effect
    ulem.sty    2019/11/18
  ts1cmr.fd    2019/12/16 v2.5j Standard LaTeX font definitions
 ***********

xkwxdyy · 2022-07-29T08:05:23Z

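"@xkwxdyy Check the log and make sure your example is using the development version."

Sorry, I put it in local and ran texhash, but for some reason it was not picked up. It works when placed in the MWE's directory.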

Package: expl3 2022-07-15 L3 programming layer (loader) 
 (/usr/local/texlive/2022/texmf-dist/tex/latex/l3backend/l3backend-xetex.def
File: l3backend-xetex.def 2022-07-01 L3 backend support: XeTeX
\g__graphics_track_int=\count189
\l__pdf_internal_box=\box51
\g__pdf_backend_object_int=\count190
\g__pdf_backend_annotation_int=\count191
\g__pdf_backend_link_int=\count192
))
Package: xeCJKfntef 2022/07/28 v3.9.0 xeCJK font effect
 (./xeCJK.sty
Package: xeCJK 2022/07/28 v3.9.0 Typesetting CJK scripts with XeLaTeX
 (/usr/local/texlive/2022/texmf-dist/tex/latex/ctex/ctexhook.sty
Package: ctexhook 2022/07/14 v2.5.10 Document and package hooks (CTEX)
) (/usr/local/texlive/2022/texmf-dist/tex/latex/l3packages/xtemplate/xtemplate.sty
Package: xtemplate 2022-06-22 L3 Experimental prototype document functions
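
Besides reading the log, the loaded version can also be checked from inside the document. The following is a minimal sketch, assuming that the date 2022/07/28 shown in the logs above identifies the development version; the package name demo in the warning is a hypothetical label.

```latex
\documentclass{article}
\usepackage{xeCJKfntef}
\makeatletter
% 2022/07/28 is the date of the development version shown in the logs above.
% \@ifpackagelater is true when the loaded package has this date or a later one.
\@ifpackagelater{xeCJK}{2022/07/28}
  {\typeout{xeCJK dated 2022/07/28 or later is in use}}
  {\PackageWarningNoLine{demo}{an xeCJK older than 2022/07/28 was loaded}}
\makeatother
\begin{document}

\CJKunderline{张量 $A$ 的维度}

\end{document}
```

This only compares release dates; the file path printed in the log, as in the excerpt above, is still what shows which copy of xeCJK.sty was picked up.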

stone-zeng added the package/xeCJKfntef label Apr 25, 2022

qinglee added the bug label Jul 27, 2022

qinglee self-assigned this Jul 27, 2022

qinglee closed this as completed in ad44c66 Jul 27, 2022
