
Fatal:expected:id Cannot find a node with xml:id #812

Closed
dginev opened this issue Nov 6, 2016 · 8 comments

Comments

@dginev
Collaborator

dginev commented Nov 6, 2016

This type of error currently accounts for Fatal errors in 1.89% of the arXiv dataset in CorTeX (statistics here).

I have managed to obtain a minimal example that triggers the Fatal, although it is likely not the only combination of constructs that can produce such an id mismatch. A general solution could give a very significant boost to the arXiv conversion results.

Example:

\documentclass{article}
\usepackage{amsmath}
\begin{document}

\begin{equation}
\begin{split}
x &= \mbox{tanh}
\end{split}
\end{equation}

\end{document}

Running:

latexmlc --pmml --cmml --preload=[ids]latexml.sty ~/testbed/fatal_expected_id/sigmax.tex --dest=/tmp/test.html

Has the conversion log:

(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/TeX.pool.ltxml...
(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/eTeX.pool.ltxml... 0.00 sec)
(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/pdfTeX.pool.ltxml... 0.01 sec) 0.12 sec)
(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/latexml.sty.ltxml... 0.01 sec)

latexmlc (LaTeXML version 0.8.2; revision d536e8f)
processing started Sun Nov  6 16:49:04 2016

(Digesting TeX sigmax...
(Processing content /home/dreamweaver/testbed/fatal_expected_id/sigmax.tex...
(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/LaTeX.pool.ltxml... 0.14 sec)
(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/article.cls.ltxml... 0.02 sec)
(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/amsmath.sty.ltxml...
(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/amsbsy.sty.ltxml... 0.00 sec)
(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/amstext.sty.ltxml... 0.00 sec)
(Loading /home/dreamweaver/perl5/lib/perl5/LaTeXML/Package/amsopn.sty.ltxml... 0.00 sec) 0.04 sec) 0.23 sec) 0.23 sec)
(Building...
(Loading compiled schema /home/dreamweaver/perl5/lib/perl5/LaTeXML/resources/RelaxNG/LaTeXML.model... 0.01 sec). 0.05 sec)
(Rewriting... 0.00 sec)
(Math Parsing...1 formulae ...[1]
Math parsing succeeded:
   ltx:XMArg: 2/2
   ltx:XMath: 1/1
   ltx:XMWrap: 1/1
 0.02 sec)
(Finalizing... 0.00 sec)
Conversion complete: No obvious problems.

(post-processing...
(Scan test.html processing... 0.00 sec)
(CrossRef test.html processing... 0.00 sec)
(MathML::Presentation[w/MathML::Content] test.html processing...
Fatal:expected:id Cannot find a node with xml:id='S0.E1.m1.1.1.1.2.1.2'
	In Post::Document[test.html] ->realizeXMNode
	 <= Post::MathML::Content[@0x5638597d2d38] <= Post::MathML::Presentation[@0x5638597...
1 fatal error

Post-processing complete: 1 fatal error

Importantly, this error is only triggered when all of pmml, cmml, and [ids]latexml.sty are used. Also, the current example only triggers the error if there is an \mbox inside a {split} environment, which seems too specific a use to account for nearly 2% of arXiv. So there is some more digging to do to diagnose the underlying general issue.
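Since the Fatal only appears when all three options are combined, one way to confirm that on other failing documents is to run latexmlc once per subset of those options. The following is a minimal Python sketch, assuming latexmlc is on the PATH; `build_commands` and `run_all` are hypothetical helpers, and searching the combined stdout/stderr for the `Fatal:expected:id` string is an assumption about where the conversion log ends up.

```python
import itertools
import shutil
import subprocess

# Option flags exactly as used in the latexmlc invocation above.
OPTIONS = ["--pmml", "--cmml", "--preload=[ids]latexml.sty"]


def build_commands(tex_path, dest="/tmp/test.html"):
    """Hypothetical helper: one latexmlc argument list per subset of OPTIONS (2**3 = 8)."""
    commands = []
    for r in range(len(OPTIONS) + 1):
        for subset in itertools.combinations(OPTIONS, r):
            commands.append(["latexmlc", *subset, tex_path, f"--dest={dest}"])
    return commands


def run_all(tex_path):
    """Hypothetical helper: run every option subset and report which ones hit the Fatal."""
    if shutil.which("latexmlc") is None:
        raise RuntimeError("latexmlc not found on PATH")
    for cmd in build_commands(tex_path):
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Assumption: the Fatal message appears somewhere in stdout or stderr.
        fatal = "Fatal:expected:id" in result.stdout + result.stderr
        label = " ".join(cmd[1:-2]) or "(no options)"
        print("FATAL" if fatal else "ok   ", label)
```

For the minimal example, `run_all("sigmax.tex")` would print one line per option subset, which makes it easy to see whether only the full combination fails.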

@dginev dginev added this to the LaTeXML-0.8.3 milestone Nov 6, 2016
@dginev
Collaborator Author

dginev commented Nov 6, 2016

Hm, when I simply run the core conversion and output the latexml XML, the ID in question is indeed missing:

<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths="/home/dreamweaver/testbed/fatal_expected_id,/home/dreamweaver/git/my-LaTeXML"?>
<?latexml package="latexml" options="ids"?>
<?latexml class="article"?>
<?latexml package="amsmath"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML">
  <resource src="LaTeXML.css" type="text/css"/>
  <resource src="ltx-article.css" type="text/css"/>
  <para xml:id="p1">
    <equation frefnum="(1)" refnum="1" xml:id="S0.E1">
      <Math mode="display" tex="\begin{split}\displaystyle x&amp;\displaystyle=\mbox{tanh}\end{split}" text="x = [tanh]" xml:id="S0.E1.m1">
        <XMath xml:id="S0.E1.m1.1">
          <XMDual xml:id="S0.E1.m1.1.2">
            <XMApp xml:id="S0.E1.m1.1.2.4">
              <XMRef idref="S0.E1.m1.1.1.1.2.1.1" xml:id="S0.E1.m1.1.2.2"/>
              <XMRef idref="S0.E1.m1.1.1.1.1.1" xml:id="S0.E1.m1.1.2.1"/>
              <XMRef idref="S0.E1.m1.1.1.1.2.1.2" xml:id="S0.E1.m1.1.2.3"/>
            </XMApp>
            <XMArray colsep="0pt" name="aligned" xml:id="S0.E1.m1.1.1">
              <XMRow xml:id="S0.E1.m1.1.1.1">
                <XMCell align="right" xml:id="S0.E1.m1.1.1.1.1">
                  <XMTok font="italic" role="UNKNOWN" xml:id="S0.E1.m1.1.1.1.1.1">x</XMTok>
                </XMCell>
                <XMCell align="left" xml:id="S0.E1.m1.1.1.1.2">
                  <XMApp xml:id="S0.E1.m1.1.1.1.2.1">
                    <XMTok meaning="equals" role="RELOP" xml:id="S0.E1.m1.1.1.1.2.1.1">=</XMTok>
                    <XMTok meaning="absent" xml:id="S0.E1.m1.1.1.1.2.1.3"/>
                    <XMText xml:id="S0.E1.m1.1.1.1.2.1.2.1">tanh</XMText>
                  </XMApp>
                </XMCell>
              </XMRow>
            </XMArray>
          </XMDual>
        </XMath>
      </Math>
    </equation>
  </para>
</document>

Note in particular the "almost correct" id on the tanh XMText:

<XMText xml:id="S0.E1.m1.1.1.1.2.1.2.1">tanh</XMText>
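A quick way to spot this class of problem directly in the core XML is to collect every xml:id and check each ltx:XMRef idref against that set. A minimal sketch using Python's standard xml.etree.ElementTree, assuming the LaTeXML namespace shown on the <document> element above; `dangling_xmrefs` is a hypothetical helper, not part of LaTeXML.

```python
import xml.etree.ElementTree as ET

LTX = "{http://dlmf.nist.gov/LaTeXML}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def dangling_xmrefs(xml_text):
    """Hypothetical helper: return (xmref_id, idref) pairs whose idref has no target.

    An XMRef that has no idref attribute at all is reported with idref=None.
    """
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    problems = []
    for ref in root.iter(LTX + "XMRef"):
        idref = ref.get("idref")
        if idref is None or idref not in ids:
            problems.append((ref.get(XML_ID), idref))
    return problems
```

Run on the XML above, it should flag only the third XMRef (S0.E1.m1.1.2.3), whose idref S0.E1.m1.1.1.1.2.1.2 has no matching element.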

@brucemiller
Owner

I found a bug when collapsing ltx:XMText/ltx:text, where the id was being copied from the inner node, overwriting the correct one. Fixed and checked in; this probably happened fairly commonly, but it will be interesting to see how many of the Fatals it caused.
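The actual fix lives in LaTeXML's Perl code. As a purely illustrative Python/ElementTree sketch of the invariant involved: when an ltx:XMText wrapping a single ltx:text is collapsed, the inner attributes may be copied up, but the outer xml:id must be kept, since that is the id the XMRef nodes point at. `collapse_xmtext` is a hypothetical name, and this is not how LaTeXML implements it.

```python
import xml.etree.ElementTree as ET

LTX = "{http://dlmf.nist.gov/LaTeXML}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collapse_xmtext(xmtext):
    """Hypothetical helper: merge a sole ltx:text child into its ltx:XMText parent, in place.

    Inner attributes are copied up, except xml:id: the outer id is the one
    XMRef/@idref points at, so letting the inner id overwrite it leaves the
    reference dangling.
    """
    children = list(xmtext)
    if len(children) != 1 or children[0].tag != LTX + "text":
        return xmtext  # nothing to collapse
    inner = children[0]
    for key, value in inner.attrib.items():
        if key != XML_ID:  # keep the outer id
            xmtext.set(key, value)
    xmtext.text = inner.text
    xmtext.remove(inner)
    xmtext.extend(list(inner))  # hoist any nested markup of the inner node
    return xmtext


if __name__ == "__main__":
    node = ET.fromstring(
        '<XMText xmlns="http://dlmf.nist.gov/LaTeXML" xml:id="outer">'
        '<text font="upright" xml:id="outer.1">tanh</text></XMText>'
    )
    collapse_xmtext(node)
    print(node.get(XML_ID), node.text)  # outer tanh
```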

@dginev
Copy link
Collaborator Author

dginev commented Nov 10, 2016

Indeed, fixes this example, thanks!

It seemed a bit too tricky for me to trace. I have downloaded about 20 examples of the Fatal:expected:id problem and will test each now; I'll write down any new cases in this issue, and close it if they're all OK.

@dginev
Collaborator Author

dginev commented Nov 10, 2016

Only 2 out of my sample of 20 still exhibit the Fatal, which is pretty neat.

I have extracted a minimal example from each:

Example 1

\documentclass{article}
\begin{document}
\begin{eqnarray}
1. \line(0,1){60} &=& 2
\end{eqnarray}
\end{document}

Minimal post-processing log snip (there is a malformed line error in core processing prior to this):

Warning:uninitialized:$id Use of uninitialized value $id  in hash element
    at /home/dreamweaver/perl5/lib/perl5/LaTeXML/Post.pm line 1303
    In Post::Document[test.html] ->findNodeByID

Warning:uninitialized:$id Use of uninitialized value $id  in concatenation (.) or string
    at /home/dreamweaver/perl5/lib/perl5/LaTeXML/Post.pm line 1328
    In Post::Document[test.html] ->realizeXMNode

Fatal:expected:id Cannot find a node with xml:id=''
    In Post::Document[test.html] ->realizeXMNode
     <= Post::MathML::Content[@0x563bcf191180] <= Post::MathML::Presentation[@0x563bcf3...

Example 2

\documentclass{article}

\setbox0=\hbox{
  \begin{picture}(3624,690)(289,-544)
  \put(3301, 14){\makebox(0,0){$k'^4$}}
  \end{picture}
}

\begin{document}
\begin{eqnarray}
1 &=& \,\,\hbox{\box0}
\end{eqnarray}
\end{document}

This one is making my head spin a bit: it requires an interplay of two hboxes, a \setbox, the picture environment, and the bizarre formula k'^4.

Minimal post-processing log snip:

Warning:unexpected:nested-math We're getting m:math nested within an m:mtext
    In Post::MathML::Presentation[@0x55fd3f7... ->pmml_top

Warning:unexpected:nested-math We're getting m:math nested within an m:mtext
    In Post::MathML::Presentation[@0x55fd3f7... ->pmml_top

Fatal:expected:id Cannot find a node with xml:id='S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.2'
    In Post::Document[test.html] ->realizeXMNode
     <= Post::MathML::Content[@0x55fd3f8be460] <= Post::MathML::Presentation[@0x55fd3f7...

@brucemiller I'll try to take a look at the first example. If we can resolve these two cases, I'll start a rerun on the Fatals + the new arXiv articles. Exciting!

@dginev
Collaborator Author

dginev commented Nov 10, 2016

Interesting, so the problem behind the first example comes from <line> appearing in a place where a formula subtree is expected. Maybe this calls for artificially wrapping unexpected elements in formulas with an XMWrap? Or, alternatively, dropping such content from XMDuals altogether?

Error in question is:

Error:malformed:ltx:line <ltx:line> isn't allowed here
[... snip ...]
Warning:expected:id Missing idref on ltx:XMRef

and the XML tree missing an idref is this XMDual:

<XMDual xml:id="S0.E1.m1.1.3">
  <XMApp xml:id="S0.E1.m1.1.3.1">
    <XMTok meaning="formulae" xml:id="S0.E1.m1.1.3.1.1"/>
    <XMRef idref="S0.E1.m1.1.1" xml:id="S0.E1.m1.1.3.1.2"/>
    <XMRef xml:id="S0.E1.m1.1.3.1.3"/>
  </XMApp>
  <XMWrap>
    <XMTok meaning="1" role="NUMBER" xml:id="S0.E1.m1.1.1">1</XMTok>
    <XMTok role="PERIOD" xml:id="S0.E1.m1.1.2">.</XMTok>
    <line points="0,0 0,60" stroke-width="0.4"/>
  </XMWrap>
</XMDual>
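To gauge how often this happens across a corpus, one could scan for non-math elements sitting directly inside formula trees, such as the ltx:line in the XMWrap above. A minimal sketch with Python's xml.etree.ElementTree, assuming (as a simplification) that any element whose local name does not start with `XM` is out of place unless it sits under an ltx:XMText, where text-mode markup such as ltx:picture is legitimate; `stray_elements` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

LTX = "{http://dlmf.nist.gov/LaTeXML}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def stray_elements(xml_text):
    """Hypothetical helper: list (parent_xml_id, local_name) for non-XM* elements in math.

    Content under ltx:XMText is skipped, since text-mode markup is allowed there.
    Nested XMath (e.g. inside a picture) is still visited by root.iter below.
    """
    root = ET.fromstring(xml_text)
    found = []

    def walk(node):
        for child in node:
            local = child.tag.split("}")[-1]
            if not local.startswith("XM"):
                found.append((node.get(XML_ID), local))
            elif local != "XMText":
                walk(child)

    for xmath in root.iter(LTX + "XMath"):
        walk(xmath)
    return found
```

On the XMDual above, this would report the ltx:line with a parent id of None, since the enclosing XMWrap carries no xml:id.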

@dginev
Collaborator Author

dginev commented Nov 10, 2016

The second example seems to be related to MathFork and the way IDs are suffixed with .mf there. The suffix is never applied to the XMath nested inside <picture> elements, which I assume is the cause of the post-processing Fatal. Here is the rather large XML in question:

<equationgroup class="ltx_eqn_eqnarray" xml:id="S0.EGx1">
  <equation frefnum="(1)" refnum="1" xml:id="S0.E1">
    <MathFork>
      <Math tex="\displaystyle 1=\,\,\hbox{\hbox{\begin{picture}(3624.0,690.0)(289.0,-544.0)%&#10;\put(3301.0,14.0){\makebox(0.0,0.0){$k^{\prime 4}$}}\end{picture}}}" text="1 =   * [k′⁣4]" xml:id="S0.E1.m4">
        <XMath xml:id="S0.E1.m4.1">
          <XMApp xml:id="S0.E1.m4.1.1">
            <XMTok meaning="equals" role="RELOP" xml:id="S0.E1.m2.1.1.mf">=</XMTok>
            <XMTok meaning="1" role="NUMBER" xml:id="S0.E1.m1.1.1.mf">1</XMTok>
            <XMApp xml:id="S0.E1.m4.1.1.1">
              <XMTok meaning="times" role="MULOP" xml:id="S0.E1.m4.1.1.1.1">⁢</XMTok>
              <XMTok role="UNKNOWN" rpadding="1.7pt" xml:id="S0.E1.m3.1.1.mf"> </XMTok>
              <XMText xml:id="S0.E1.m3.1.3.mf"><picture fill="black" height="690.0pt" stroke="black" tex="\begin{picture}(3624.0,690.0)(289.0,-544.0)\put(3301.0,14.0){\makebox(0.0,0.0)%&#10;{$k^{\prime 4}$}}\end{picture}" unitlength="1.0pt" width="3624.0pt" xml:id="S0.E1.m3.1.3.1.1.pic1">
                  <g transform="translate(-289,544)">
                    <g transform="translate(3301,14)">
                      <g height="0" width="0">
                        <Math mode="inline" tex="k^{\prime 4}" text="k ^ (list@(prime1, 4))" xml:id="S0.E1.m3.1.3.1.1.pic1.m1">
                          <XMath xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1">
                            <XMApp xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.3">
                              <XMTok role="SUPERSCRIPTOP" scriptpos="post5" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.3.1"/>
                              <XMTok font="italic" role="UNKNOWN" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.1">k</XMTok>
                              <XMDual xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1">
                                <XMApp xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.3">
                                  <XMTok meaning="list" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.3.1"/>
                                  <XMRef idref="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.4" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.3.2"/>
                                  <XMRef idref="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.2" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.3.3"/>
                                </XMApp>
                                <XMWrap>
                                  <XMTok name="prime1" role="SUPOP" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.4">′</XMTok>
                                  <XMTok role="PUNCT" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.5">⁣</XMTok>
                                  <XMTok fontsize="70%" meaning="4" role="NUMBER" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.2">4</XMTok>
                                </XMWrap>
                              </XMDual>
                            </XMApp>
                          </XMath>
                        </Math>
                      </g>
                    </g>
                  </g>
                </picture></XMText>
            </XMApp>
          </XMApp>
        </XMath>
      </Math>
      <MathBranch>
        <tr xml:id="S0.E1.1">
          <td align="right" xml:id="S0.E1.1.1"><Math mode="inline" tex="\displaystyle 1" text="1" xml:id="S0.E1.m1">
              <XMath xml:id="S0.E1.m1.1">
                <XMTok meaning="1" role="NUMBER" xml:id="S0.E1.m1.1.1">1</XMTok>
              </XMath>
            </Math></td>
          <td align="center" xml:id="S0.E1.1.2"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S0.E1.m2">
              <XMath xml:id="S0.E1.m2.1">
                <XMTok meaning="equals" role="RELOP" xml:id="S0.E1.m2.1.1">=</XMTok>
              </XMath>
            </Math></td>
          <td align="left" xml:id="S0.E1.1.3"><Math mode="inline" tex="\displaystyle\,\,\hbox{\hbox{\begin{picture}(3624.0,690.0)(289.0,-544.0)\put(3%&#10;301.0,14.0){\makebox(0.0,0.0){$k^{\prime 4}$}}\end{picture}}}" text="  * [k′⁣4]" xml:id="S0.E1.m3">
              <XMath xml:id="S0.E1.m3.1">
                <XMApp xml:id="S0.E1.m3.1.4">
                  <XMTok meaning="times" role="MULOP" xml:id="S0.E1.m3.1.4.1">⁢</XMTok>
                  <XMTok role="UNKNOWN" rpadding="1.7pt" xml:id="S0.E1.m3.1.1"> </XMTok>
                  <XMText xml:id="S0.E1.m3.1.3"><picture fill="black" height="690.0pt" stroke="black" tex="\begin{picture}(3624.0,690.0)(289.0,-544.0)\put(3301.0,14.0){\makebox(0.0,0.0)%&#10;{$k^{\prime 4}$}}\end{picture}" unitlength="1.0pt" width="3624.0pt" xml:id="S0.E1.m3.1.3.1.1.pic1">
                      <g transform="translate(-289,544)">
                        <g transform="translate(3301,14)">
                          <g height="0" width="0">
                            <Math mode="inline" tex="k^{\prime 4}" text="k ^ (list@(prime1, 4))" xml:id="S0.E1.m3.1.3.1.1.pic1.m1">
                              <XMath xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1">
                                <XMApp xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.3a">
                                  <XMTok role="SUPERSCRIPTOP" scriptpos="post5" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.3a.1"/>
                                  <XMTok font="italic" role="UNKNOWN" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.1">k</XMTok>
                                  <XMDual xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1">
                                    <XMApp xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.3a">
                                      <XMTok meaning="list" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.3a.1"/>
                                      <XMRef idref="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.4a" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.3a.2"/>
                                      <XMRef idref="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.2" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.3a.3"/>
                                    </XMApp>
                                    <XMWrap>
                                      <XMTok name="prime1" role="SUPOP" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.4a">′</XMTok>
                                      <XMTok role="PUNCT" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.5a">⁣</XMTok>
                                      <XMTok fontsize="70%" meaning="4" role="NUMBER" xml:id="S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.2">4</XMTok>
                                    </XMWrap>
                                  </XMDual>
                                </XMApp>
                              </XMath>
                            </Math>
                          </g>
                        </g>
                      </g>
                    </picture></XMText>
                </XMApp>
              </XMath>
            </Math></td>
        </tr>
      </MathBranch>
    </MathFork>
  </equation>
</equationgroup>
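Comparing the two branches above, the ids under the <picture> element in the MathFork's Math are the same as those in the MathBranch (for example S0.E1.m3.1.3.1.1.pic1 and S0.E1.m3.1.3.1.1.pic1.m1.1.2.1.2, the id named in the Fatal), so the document ends up with duplicate xml:id values. A minimal Python sketch for listing such duplicates, assuming the XML is available as a string; `duplicate_ids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(xml_text):
    """Hypothetical helper: return the sorted xml:id values used on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None)
    return sorted(i for i, n in counts.items() if n > 1)
```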

@brucemiller
Owner

So, the first example has picture markup outside of a picture environment. I had once tried to make ltx:picture autoOpen, but it caused infinite recursion and other havoc. The solution seems to be to make it autoOpen, but as a less-preferred route in most cases. I think I got that working right, without other damage.

The second example was a case of the tree being rearranged, but not correctly keeping track of ids, so that when the fork was being created, it didn't realize all the picture nodes already had their ids used. Also should be fixed.
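Roughly, the bookkeeping at stake looks like this: before allocating ids for the copied branch, every id already present in the tree, including those nested inside picture content, has to be registered as used. This is a hypothetical Python sketch, not LaTeXML's implementation; `collect_ids` and `fresh_id` are made-up names, and the letter-suffix scheme only mimics the `...1.3a` ids visible in the XML above.

```python
import string
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_ids(root):
    """Hypothetical helper: every xml:id in the subtree, including picture content."""
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}


def fresh_id(base, used):
    """Hypothetical helper: base if unused, else base+'a', base+'b', ...; records the choice."""
    if base not in used:
        used.add(base)
        return base
    for letter in string.ascii_lowercase:
        candidate = base + letter
        if candidate not in used:
            used.add(candidate)
            return candidate
    raise ValueError(f"no free suffix left for {base!r}")


if __name__ == "__main__":
    picture = ET.fromstring(
        '<picture xmlns="http://dlmf.nist.gov/LaTeXML" xml:id="pic1">'
        '<Math xml:id="pic1.m1"/></picture>'
    )
    used = collect_ids(picture)
    print(fresh_id("pic1.m1", used))  # pic1.m1a
```

If `used` were built without the picture subtree, `fresh_id` would hand back the base id unchanged, producing exactly the kind of duplicate seen in the MathFork above.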

So, these fixes ought to handle a lot of those Fatals, although there could be other corner cases. The first fix should also resolve some regular malformed Errors (of course, not all). So, if you're setting up a rerun of arXiv, you might want to include those!

Whew! OK, so I needed some programming fun, and wanted to fix this stuff, but I'll have to get back to the grind...

@dginev
Collaborator Author

dginev commented Nov 11, 2016

Thanks!!!

@dginev dginev closed this as completed Nov 11, 2016