Paste from External Sources (MathJax in the wild, Wikipedia, etc.) #17

Open
benrbray opened this issue Apr 1, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@benrbray
Owner

benrbray commented Apr 1, 2021

It would be great to modify the default paste behavior to automatically detect math markup in HTML pasted from external sources. Unfortunately, the solution will be messy, as there is not yet a universally accepted way to render math on the web.

See Robert Miner 2010, "MathType, Math Markup, and the Goal of Cut and Paste" for a brief summary of the challenges faced in this area. Here's an excerpt from one of the slides:

Math on the Web formats in the Wild

  • Image with TeX code (alt tags, comments, urls)
  • Some content is in text (HTML math, TeX source, ASCII art)
  • Some is in the DOM (MathML, <span>s and CSS)

The following tasks are relatively low-effort and high-reward:

  • support pasting MathML expressions when the source TeX code is included as an annotation (this seems to be a standard feature in some MathJax configurations)
  • support pasting inline math images from Wikipedia when there is an alt tag present
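A minimal sketch of these two low-effort cases. It assumes a small structural `NodeLike` interface (hypothetical; a DOM `Element` should fit it) instead of a real DOM, so it can run without a browser. It also assumes that Wikipedia wraps its TeX as `{\displaystyle ...}` in both the image alt text and the MathML annotation, and it only accepts images whose alt text has that wrapper, so ordinary images are not mistaken for math.

```typescript
// Minimal structural view of a DOM element (hypothetical). A real DOM
// `Element` should satisfy it; tests can pass plain objects instead.
interface NodeLike {
  readonly tagName: string;
  getAttribute(name: string): string | null;
  readonly textContent: string | null;
  readonly children: ArrayLike<NodeLike>;
}

const TEX_ENCODING = "application/x-tex";

// Depth-first search for the node itself or its first descendant matching `pred`.
function findFirst(node: NodeLike, pred: (n: NodeLike) => boolean): NodeLike | null {
  if (pred(node)) return node;
  for (let i = 0; i < node.children.length; i++) {
    const hit = findFirst(node.children[i], pred);
    if (hit !== null) return hit;
  }
  return null;
}

// Assumption: Wikipedia wraps its TeX as "{\displaystyle ...}".
// Returns the inner source, or null when the wrapper is absent.
function unwrapDisplayStyle(s: string): string | null {
  const m = /^\{\\displaystyle\s*([\s\S]*)\}$/.exec(s.trim());
  return m ? m[1].trim() : null;
}

// TeX from a <math> element carrying an application/x-tex annotation.
function texFromMathML(math: NodeLike): string | null {
  if (math.tagName.toLowerCase() !== "math") return null;
  const ann = findFirst(
    math,
    (n) => n.tagName.toLowerCase() === "annotation" && n.getAttribute("encoding") === TEX_ENCODING,
  );
  const tex = (ann?.textContent ?? "").trim();
  if (!tex) return null;
  // Strip the Wikipedia-style wrapper if present, otherwise keep the source as-is.
  return unwrapDisplayStyle(tex) ?? tex;
}

// TeX from an <img> alt text, accepted only with the Wikipedia-style wrapper.
function texFromImage(img: NodeLike): string | null {
  if (img.tagName.toLowerCase() !== "img") return null;
  const alt = img.getAttribute("alt");
  return alt ? unwrapDisplayStyle(alt) : null;
}
```

Unwrapping `\displaystyle` is a choice: keeping it would force display-style rendering on inline math.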

Some higher-effort tasks:

Things to be cautious of:

  • MathJax and KaTeX both include the same math expression multiple times in the same block, e.g. rendered both as MathML and as HTML or SVG for compatibility reasons. We need to identify the common parent element and ensure that it is replaced by a single math expression, rather than two or three.
  • Differences in pasting behavior between browsers
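One way to handle the duplicate renderings is to treat a known wrapper element as a single expression: take the first TeX found anywhere inside it and skip the rest of its subtree. A sketch, assuming these wrapper markers: KaTeX's `<span class="katex">` (which holds both a `katex-mathml` and a `katex-html` copy), StackExchange's `.math-container` from the list of sites, and MathJax 3's `<mjx-container>`. The per-node `extract` function is supplied by the caller, and `NodeLike` is a hypothetical structural stand-in for a DOM element.

```typescript
// Hypothetical structural stand-in for a DOM element.
interface NodeLike {
  readonly tagName: string;
  getAttribute(name: string): string | null;
  readonly textContent: string | null;
  readonly children: ArrayLike<NodeLike>;
}

// Assumed outermost wrappers around one rendered expression (illustrative, not exhaustive).
const CONTAINER_CLASSES = ["katex", "math-container"];
const CONTAINER_TAGS = ["mjx-container"];

function isMathContainer(n: NodeLike): boolean {
  if (CONTAINER_TAGS.indexOf(n.tagName.toLowerCase()) !== -1) return true;
  const classes = (n.getAttribute("class") ?? "").split(/\s+/);
  return classes.some((c) => CONTAINER_CLASSES.indexOf(c) !== -1);
}

// First TeX found anywhere in the subtree (pre-order), or null.
function firstTex(n: NodeLike, extract: (n: NodeLike) => string | null): string | null {
  const tex = extract(n);
  if (tex !== null) return tex;
  for (let i = 0; i < n.children.length; i++) {
    const found = firstTex(n.children[i], extract);
    if (found !== null) return found;
  }
  return null;
}

// Returns one TeX string per expression. Inside a container only the first
// TeX is kept, and the parallel renderings in its subtree are skipped.
function collectMath(root: NodeLike, extract: (n: NodeLike) => string | null): string[] {
  const out: string[] = [];
  const visit = (n: NodeLike): void => {
    if (isMathContainer(n)) {
      const tex = firstTex(n, extract);
      // A container with no recoverable TeX is dropped here; a real paste
      // handler might keep its HTML instead.
      if (tex !== null) out.push(tex);
      return;
    }
    const tex = extract(n);
    if (tex !== null) {
      out.push(tex);
      return;
    }
    for (let i = 0; i < n.children.length; i++) visit(n.children[i]);
  };
  visit(root);
  return out;
}
```

Without a recognized wrapper, two TeX-bearing copies of the same expression would still both be reported, so the container list is what actually prevents duplicates.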

Here are some places we might expect users to paste from:

  • Wikipedia: Extremely inconsistent -- pages have a mix of MathJax, HTML math, and pre-rendered images
  • StackExchange: Uses MathJax. The source code is evidently stored in a <script type="math/tex; mode=display"> tag within an element with class math-container.
  • ncatlab: Uses MathJax. Source is stored in a <annotation encoding="application/x-tex"> tag.
  • Planet Math: Uses MathJax, with some weird layouts. Display math is sometimes wrapped in a <table class="ltx_equation ltx_eqn_table"> element. The MathML node has an alttext attribute containing the TeX source.
  • arXiv: (example) Uses MathJax with the source stored in a <script type="math/tex"> tag.
  • ProofWiki: uses MathJax with source in a <script type="math/tex"> tag
  • Google Docs: ???
  • Microsoft Word: ???
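A sketch of how the source locations listed above could be folded into one pass that also records inline vs. display mode. The names `findMathSources` and `MathSource` are hypothetical, as is the `NodeLike` stand-in for a DOM element. Display mode is taken from `mode=display` in the script type, the standard MathML `display="block"` attribute, or Planet Math's `ltx_equation` table. Per the later comments in this thread, `<script>` tags don't survive copying from StackExchange, so that branch would mainly help when the page HTML itself is available rather than clipboard HTML.

```typescript
// Hypothetical structural stand-in for a DOM element.
interface NodeLike {
  readonly tagName: string;
  getAttribute(name: string): string | null;
  readonly textContent: string | null;
  readonly children: ArrayLike<NodeLike>;
}

interface MathSource {
  tex: string;
  display: boolean;
}

function hasClass(n: NodeLike, cls: string): boolean {
  return (n.getAttribute("class") ?? "").split(/\s+/).indexOf(cls) !== -1;
}

// <script type="math/tex"> or <script type="math/tex; mode=display">
// (StackExchange, arXiv, ProofWiki in the list above).
function fromScript(n: NodeLike): MathSource | null {
  const parts = (n.getAttribute("type") ?? "").split(";").map((s) => s.trim());
  if (parts[0] !== "math/tex") return null;
  const tex = (n.textContent ?? "").trim();
  return tex ? { tex, display: parts.indexOf("mode=display") !== -1 } : null;
}

// <math alttext="..."> (Planet Math) or an application/x-tex annotation
// inside the <math> element (ncatlab).
function fromMathML(n: NodeLike, inDisplayTable: boolean): MathSource | null {
  const display = inDisplayTable || n.getAttribute("display") === "block";
  const alt = (n.getAttribute("alttext") ?? "").trim();
  if (alt) return { tex: alt, display };
  const stack: NodeLike[] = [n];
  while (stack.length > 0) {
    const cur = stack.pop()!;
    if (cur.tagName.toLowerCase() === "annotation" && cur.getAttribute("encoding") === "application/x-tex") {
      const tex = (cur.textContent ?? "").trim();
      if (tex) return { tex, display };
    }
    for (let i = 0; i < cur.children.length; i++) stack.push(cur.children[i]);
  }
  return null;
}

function findMathSources(root: NodeLike): MathSource[] {
  const out: MathSource[] = [];
  const visit = (n: NodeLike, inDisplayTable: boolean): void => {
    const tag = n.tagName.toLowerCase();
    const found: MathSource | null =
      tag === "script" ? fromScript(n) : tag === "math" ? fromMathML(n, inDisplayTable) : null;
    if (found !== null) {
      out.push(found);
      return;
    }
    // Planet Math wraps display equations in <table class="ltx_equation ...">.
    const nowDisplay = inDisplayTable || (tag === "table" && hasClass(n, "ltx_equation"));
    for (let i = 0; i < n.children.length; i++) visit(n.children[i], nowDisplay);
  };
  visit(root, false);
  return out;
}
```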
@benrbray benrbray added the enhancement New feature or request label Apr 1, 2021
@benrbray
Owner Author

benrbray commented Apr 3, 2021

I started to implement pasting of math from Wikipedia using a custom ProseMirror ParseRule (and its .getContent property), but ran into unexpected behavior where the pasted math nodes all come up empty. I posted a question on the ProseMirror forum, which will hopefully resolve the issue.

@benrbray
Owner Author

benrbray commented Apr 6, 2021

This website has math rendered using Madoko, which renders math and diagram SVGs server-side and includes them in the following format:

<svg class="snippet math-display math-render-svg math" data-math-full="true" style="..." viewBox="...">
    <desc>\begin{tikzpicture}
    \matrix[nodes={draw}, row sep=0.3cm,column sep=0.5cm] {
      \node [rectangle, draw=none] (eq) {$a = b, b = c, d = e, b = s, d = t: $};&
      \node [circle, draw] (abcs) {$a, b, c, s$}; &
      \node [circle, draw] (det) {$d, e, t$}; \\
    };
    \end{tikzpicture}
    </desc>
    <g id="math-a6e187">...</g>
</svg>

This example contains an SVG rendering of a TikZ diagram, which is problematic for KaTeX (the current default renderer), since KaTeX cannot render TikZ. Once MathJax is supported, an extension like TikzJax can be used to render diagrams.

UPDATE: It won't be possible to paste from documents rendered with Madoko. The TeX source is contained in a <desc> tag within an SVG element, and apparently the <desc> tags are stripped away in both Chrome and Firefox when copying.

@benrbray
Owner Author

benrbray commented Apr 6, 2021

UPDATE: StackExchange keeps its TeX code in <script type="math/tex"> tags, but these are stripped away when copying for security reasons. To copy from StackExchange, we'll need to parse the MathML directly.
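A minimal sketch of parsing the MathML directly: it converts a small subset of Presentation MathML (`mi`, `mn`, `mo`, `mtext`, `mrow`, `msup`, `msub`, `msubsup`, `mfrac`, `msqrt`) back to TeX. The `NodeLike` interface, the function name `mathmlToTex`, and the tiny symbol table are assumptions for illustration. A usable converter would need a much larger symbol table plus `munderover`, `mtable`, stretchy operators and so on, and the output is valid TeX but won't match the author's original source character for character.

```typescript
// Hypothetical structural stand-in for a DOM element.
interface NodeLike {
  readonly tagName: string;
  getAttribute(name: string): string | null;
  readonly textContent: string | null;
  readonly children: ArrayLike<NodeLike>;
}

// A few Unicode operators/identifiers and their TeX commands; far from complete.
const SYMBOLS: Record<string, string> = {
  "−": "-",
  "×": "\\times",
  "≤": "\\leq",
  "≥": "\\geq",
  "→": "\\to",
  "∑": "\\sum",
  "α": "\\alpha",
  "π": "\\pi",
};

function tokenToTex(text: string, isIdentifier: boolean): string {
  const t = text.trim();
  const sym = SYMBOLS[t];
  if (sym !== undefined) return sym;
  // Multi-letter identifiers such as "sin" are set upright (a guess; \sin would be nicer).
  return isIdentifier && t.length > 1 ? `\\mathrm{${t}}` : t;
}

function mathmlToTex(n: NodeLike): string {
  const kids: NodeLike[] = [];
  for (let i = 0; i < n.children.length; i++) kids.push(n.children[i]);
  const arg = (i: number): string => (i < kids.length ? mathmlToTex(kids[i]) : "");
  // Convert every child, dropping empty results (e.g. skipped annotations).
  const all = (): string => kids.map((k) => mathmlToTex(k)).filter((s) => s !== "").join(" ");

  switch (n.tagName.toLowerCase()) {
    case "mi":
      return tokenToTex(n.textContent ?? "", true);
    case "mn":
    case "mo":
      return tokenToTex(n.textContent ?? "", false);
    case "mtext":
      return `\\text{${n.textContent ?? ""}}`;
    case "msup":
      return `{${arg(0)}}^{${arg(1)}}`;
    case "msub":
      return `{${arg(0)}}_{${arg(1)}}`;
    case "msubsup":
      return `{${arg(0)}}_{${arg(1)}}^{${arg(2)}}`;
    case "mfrac":
      return `\\frac{${arg(0)}}{${arg(1)}}`;
    case "msqrt":
      return `\\sqrt{${all()}}`;
    case "annotation":
    case "annotation-xml":
      return ""; // alternate encodings are not part of the rendered math
    default:
      // math, mrow, semantics, mstyle, ...: concatenate the children
      return all();
  }
}
```

Bracing every base and script (`{x}^{2}`) is redundant for single tokens but keeps nested cases correct without tracking precedence.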
