Understanding MathJax performance

Peter Krautzberger edited this page Aug 5, 2013 · 13 revisions

This page gives an overview of the different aspects that affect MathJax performance as well as several potential development options to improve performance.

"Real" size

A full download of the MathJax code is ~22MB, but most of it is due to the legacy image fonts (~9.5MB), the unpacked folder (containing the code before it was compressed -- ~4.1MB), and the configuration folder (~2.8MB -- most pages need only one configuration file, but more on that later).

In other words, what is "really" MathJax is MathJax.js together with the extensions, localization and jax folders, and the webfonts -- summing up to ~5MB.

However, MathJax will never actually need all of these 5MB. For example, we offer the webfonts in 4 formats, most of which exist only for specific (older) browsers that can't use the current webfont standard, WOFF.

So as a first approximation: "all of MathJax", i.e., all input and output options and their extensions that a user would ever have to download is ~3.5MB.

But in real life most pages only use 1 input + 1 output jax, which is ~1.5MB (and sending compressed files should bring it down to ~650KB).

As a comparison: the average web page was ~1.5MB in June 2013 according to the http-archive.
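The size figures in this section can be cross-checked with a little arithmetic. The following is only a sketch using the approximate numbers quoted above; the variable names are ours and nothing here measures an actual MathJax download.

```javascript
// Approximate sizes in MB, taken from the figures quoted in this section.
const fullDownloadMB = 22;
const nonCoreMB = { imageFonts: 9.5, unpacked: 4.1, config: 2.8 };

// What remains once the three large non-core folders are removed.
const remainderMB =
  fullDownloadMB - Object.values(nonCoreMB).reduce((a, b) => a + b, 0);

// Compression ratio implied by ~1.5MB -> ~650KB for one input + one output jax.
const typicalKB = 1500;
const compressedKB = 650;
const ratio = compressedKB / typicalKB;

console.log(remainderMB.toFixed(1)); // "5.6"
console.log(ratio.toFixed(2));       // "0.43"
```

The remainder of roughly 5.6MB is an upper bound on the core and is in line with the ~5MB estimate, and the quoted compressed size corresponds to a bit less than half of the uncompressed size.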

Effective size and MathJax's modularity

The effective load a visitor experiences is lower still since most pages don't use all MathJax features at once.

MathJax is highly modular even within a single input or output option. MathJax will only load those components which are actually needed for the mathematical content found on a page.

For example, if MathJax is configured to render TeX input to HTML output, it won't load the components needed for certain LaTeX packages unless there's content in the page using them. Similarly, it will only load those webfont files containing the characters actually needed.

The same principle applies to multiple input options: if e.g. the configuration allows both MathML and TeX input, but the page only contains MathML, then the main TeX processing code will not be loaded (only a small configuration file that allows TeX to be loaded had it appeared on the page).
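The principle described above can be modelled in a few lines. This is a toy sketch, not MathJax's loader: the component names, the detection regexes and the function names are all hypothetical and chosen only for illustration.

```javascript
// Hypothetical component names; real MathJax file names differ.
const COMPONENTS = {
  TeX: { stub: "tex-config-stub", main: "tex-input-jax" },
  MathML: { stub: "mathml-config-stub", main: "mathml-input-jax" },
};

// Crude content detection, for illustration only.
function detectInputs(pageSource) {
  const found = new Set();
  if (/<math[\s>]/.test(pageSource)) found.add("MathML");
  if (/\\\(|\\\[|\$\$/.test(pageSource)) found.add("TeX");
  return found;
}

// Always load the small stub for each configured input format, but load
// the main processing code only for formats that appear on the page.
function componentsToLoad(configuredInputs, pageSource) {
  const found = detectInputs(pageSource);
  const load = [];
  for (const input of configuredInputs) {
    load.push(COMPONENTS[input].stub);
    if (found.has(input)) load.push(COMPONENTS[input].main);
  }
  return load;
}

const page = "<p>Area: <math><mi>x</mi></math></p>";
console.log(componentsToLoad(["TeX", "MathML"], page));
```

For a page that contains only MathML, the model loads both stubs but only the MathML processing code.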

We do not have specific data, but we estimate that the effective size is ~500KB compressed (or ~1MB uncompressed).

We need to balance the benefit of modularizing with the number of network connections.

We offer some ways for authors to optimize the effective size via the combined MathJax configuration files (see below).

But this balance must be revisited regularly and more options could help.

Caching

In addition to the size of the MathJax components, browser caching improves performance after the first load.

Once any MathJax components have downloaded, they will remain in the browser cache for a specific time (usually 1 week) so a visitor will usually only download them on the visit to the very first page using MathJax and skip this particular performance drain in later visits.

While browsers separate their caching per domain (for security reasons), the MathJax CDN lets page authors benefit from each other: if a user visits one site using the CDN, then any other site using the CDN will benefit from the MathJax components already cached during the visit to the first site.

While not helping performance on an initial visit to a site, caching improves speed on any future visit.
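The caching behavior described in this section can be pictured with a toy model. This is a sketch under simplifying assumptions: the cache is keyed by URL only, the one-week lifetime is the typical value mentioned above, and the class name and URLs are made up for illustration.

```javascript
// Toy model of a URL-keyed browser cache with an expiry time per entry.
class BrowserCache {
  constructor() {
    this.entries = new Map(); // url -> expiry time in ms
  }

  // Returns "hit" if the URL is cached and still fresh at time `now`;
  // otherwise stores it with a new expiry time and returns "miss".
  fetch(url, now, maxAgeMs) {
    const expires = this.entries.get(url);
    if (expires !== undefined && now < expires) return "hit";
    this.entries.set(url, now + maxAgeMs);
    return "miss";
  }
}

const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;
const cdnUrl = "https://cdn.example/mathjax/MathJax.js";          // hypothetical
const selfHostedUrl = "https://site-c.example/js/MathJax.js";     // hypothetical

const cache = new BrowserCache();
console.log(cache.fetch(cdnUrl, 0, ONE_WEEK_MS));           // site A, first visit
console.log(cache.fetch(cdnUrl, 1000, ONE_WEEK_MS));        // site B, same CDN URL
console.log(cache.fetch(selfHostedUrl, 2000, ONE_WEEK_MS)); // site C hosts its own copy
```

In this model the second site gets a cache hit because it requests the same URL as the first, while the self-hosted copy has to be downloaded again.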

Alternative and additional caching methods (implemented as part of MathJax) could expand and optimize this performance benefit.

Optimizing loading via configuration files

The download of MathJax components can be optimized by the page author.

On the one end of the spectrum, we provide combined configuration files which compile specific input and output components into a single file. These are useful for page authors who know exactly which MathJax components their content will require.

As the name suggests, combined configuration files combine various components into one large file. This allows page authors to specify the components they want to load up front as one big file rather than many parallel files later, speeding up processing. For example, the TeX-AMS_HTML configuration file loads the TeX-input with its AMS-math extensions as well as configuring the HTML-output.

On the other end of the spectrum, a page author who wants everything to load asynchronously can use extremely light configurations which leave it to MathJax to queue the download of its components. This is often good for community sites that have pages with math, but also pages without it.
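As an illustration of the first end of the spectrum: a combined configuration file is selected through the config parameter on the MathJax.js URL. The helper below is a small sketch that builds such a URL; the function name and the base URL are hypothetical, and only the TeX-AMS_HTML name comes from the text above.

```javascript
// Build the script URL for MathJax.js, optionally selecting a combined
// configuration file via the "config" query parameter.
function mathJaxScriptUrl(baseUrl, combinedConfig) {
  const url = `${baseUrl.replace(/\/+$/, "")}/MathJax.js`;
  return combinedConfig
    ? `${url}?config=${encodeURIComponent(combinedConfig)}`
    : url;
}

// Hypothetical installation path; replace it with wherever MathJax is hosted.
console.log(mathJaxScriptUrl("https://example.org/mathjax", "TeX-AMS_HTML"));
// -> https://example.org/mathjax/MathJax.js?config=TeX-AMS_HTML
```

A page author would put the resulting URL into the src attribute of the script tag that loads MathJax.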

Many sites do not configure MathJax efficiently. We could provide tools to analyze configurations and create more options.

MathJax Processing

MathJax processing of a page has three stages, one pre-processing and two processing stages as described at http://docs.mathjax.org/en/latest/model.html.

Pre-processing

Pre-processing identifies mathematical content on a page (MathML, TeX, different TeX-delimiters etc) and converts it into a standard input format (script-tags). While this pre-processing can be done server-side, it's not a bottleneck and very little performance is gained by optimizing here.
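To make the idea concrete, here is a heavily simplified sketch of pre-processing for TeX delimiters. It works on a plain string rather than on the DOM, handles only the \( \) and \[ \] delimiters, and the function name is ours; MathJax's own pre-processors are considerably more careful.

```javascript
// Simplified model of the pre-processing stage: find TeX delimiters in a
// text string and wrap the enclosed math in script tags.
function preprocessTeX(text) {
  return text
    // display math: \[ ... \]
    .replace(/\\\[([\s\S]+?)\\\]/g, (_, tex) =>
      `<script type="math/tex; mode=display">${tex}</script>`)
    // inline math: \( ... \)
    .replace(/\\\(([\s\S]+?)\\\)/g, (_, tex) =>
      `<script type="math/tex">${tex}</script>`);
}

console.log(preprocessTeX("Let \\(x^2\\) be the area."));
// -> Let <script type="math/tex">x^2</script> be the area.
```

Because the work amounts to a linear scan over the text, it is cheap compared with the later stages.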

Input-processing

An Input-Jax will process the input into MathJax's internal format (which is essentially MathML).

This process is already very fast. While it could theoretically benefit from parallelization (e.g. via a web worker), the benefits will only be noticeable in pages with a very large amount of mathematical content or extremely large equations (e.g. we've seen an 80,000-line MathML equation a while ago). Other bottlenecks are much more critical.

Since the input processors are modular, network latency can create delays as components are loaded as they are needed. This is the core problem of balancing modularity vs network activity and needs to be revisited as network speed and processing power develop. We also need to develop more quantitative tools to make it easier to analyze the trade-offs.

Because network connections for different users vary (e.g., mobile users have much slower connections, in general), there is no "one size fits all" solution to this problem. The settings that work best for a user with a desktop computer on a high-speed network may not be the best ones for a tablet user on a wi-fi network.

Output-processing

The third part of MathJax processing is the generation of its output which currently comes in one of two forms: HTML-CSS or SVG.

The output generation is the second performance bottleneck of MathJax.

The key problem with the MathJax output lies in the fact that math layout is a bottom-up process while HTML-CSS display is a top-down process. The CSS layout algorithm determines the width of a parent element and then descends to its children to determine their widths, and only later determines the heights. This limits the output speed that can be achieved with current HTML methods.

MathJax essentially implements the Knuth-Plass algorithm, which goes bottom-up, determining the widths and heights of the children before determining the width and height of a parent.

This is the core problem: top-down vs bottom-up.
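The bottom-up direction can be illustrated with a tiny box model. This is a sketch only: the node types (leaf, hbox, vbox) and the numbers are invented for illustration and are far simpler than real math layout, but they show that a parent cannot be sized until all of its children have been.

```javascript
// Bottom-up measurement of a tiny box tree.
// Leaves carry explicit metrics; an "hbox" places its children side by
// side, a "vbox" stacks them on top of each other.
function measure(node) {
  if (node.type === "leaf") {
    return { width: node.width, height: node.height };
  }
  // Children must be measured before the parent can be sized.
  const sizes = node.children.map(measure);
  if (node.type === "hbox") {
    return {
      width: sizes.reduce((sum, s) => sum + s.width, 0),
      height: Math.max(...sizes.map((s) => s.height)),
    };
  }
  // vbox
  return {
    width: Math.max(...sizes.map((s) => s.width)),
    height: sizes.reduce((sum, s) => sum + s.height, 0),
  };
}

// A fraction-like stack next to a single symbol (made-up dimensions).
const fraction = {
  type: "vbox",
  children: [
    { type: "leaf", width: 10, height: 5 },
    { type: "leaf", width: 20, height: 5 },
  ],
};
const expression = {
  type: "hbox",
  children: [{ type: "leaf", width: 4, height: 8 }, fraction],
};

console.log(measure(expression)); // { width: 24, height: 10 }
```

A top-down engine, by contrast, wants to fix the width of the outer box first, which is exactly the information this procedure produces last.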

However, the SVG output is often ~25% faster than the HTML output, which is due to an additional problem with HTML layout. While the SVG output can reliably calculate relative sizes within an equation internally, browser limitations prevent the HTML/CSS output from determining them reliably.

First, browsers do not reliably allow the calculation of width -- simply put, the sum of the width of characters is not the width of the string as it's laid out by the browser. To get around this, MathJax has to measure the substrings/subequations by laying them out and asking the browser to measure them. This problem naturally occurs recursively and shows dramatically in complex equations.

Next, browsers do not give JavaScript access to all font metrics (let alone modern features like OpenType MATH tables). That's why MathJax needs to provide the metrics separately, which in turn is why MathJax supports only a handful of fonts.

To work around the width issues, MathJax recursively asks the browser to lay out and measure subexpressions. This is a performance drain, as browsers are not designed to lay out content repeatedly.
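To get a feeling for why this adds up, the sketch below counts measurement requests under the simplifying assumption that every subexpression needs exactly one lay-out-and-measure round trip. The tree shape and the function name are hypothetical; the real number of measurements depends on the expression and the output jax.

```javascript
// Count how many lay-out-and-measure requests a nested expression would
// trigger if each subexpression had to be measured once by the browser.
function countMeasurements(node) {
  const children = node.children || [];
  return 1 + children.reduce((n, child) => n + countMeasurements(child), 0);
}

// (a + b) / c as a tree: fraction -> [sum -> [a, b], c]
const nested = {
  name: "fraction",
  children: [
    { name: "sum", children: [{ name: "a" }, { name: "b" }] },
    { name: "c" },
  ],
};

console.log(countMeasurements(nested)); // 5
```

Under this assumption the cost grows with the total number of subexpressions, so deeply nested equations are hit hardest.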

While widths can be measured correctly as mentioned above, heights cannot be measured accurately since browsers provide only the font height/depth (the maximal height/depth of any character in the font). Since this is the same for every character in the font, MathJax has to compensate for these incorrect measurements itself.

Preliminary tests have shown that deactivating these measurements will speed up the HTML output to the level of the SVG output. However, this will currently come at a loss of rendering quality (although the preliminary tests have shown that modern browsers do a much better job than those in place when MathJax was initially conceived). We can work with browser vendors to improve things on their end, e.g. the Chrome team seems interested in this; the necessary browser improvements could increase typesetting quality in browsers in general.

Ways forward

While MathJax is a large javascript library, the effective size is much smaller in practice thanks to its modular structure. But this modular structure adds overhead. The rendering process itself is complex and can be slow on older and mobile devices.

We face a difficult situation: on slow machines (like mobile), the download of MathJax is overshadowed by slow rendering whereas on fast machines rendering is overshadowed by network calls for missing components.

Options for moving forward

The following are not exclusive to each other.

Optimizing MathJax loading

On current desktop/laptop CPUs, the dominant performance issues are the delays due to asynchronous download of components. This also affects mobile even if actual rendering performance is a problem there.

We should investigate how to optimize this. Some ideas are

  • improved combined files
    • Creating better combined configuration files as well as tools for page authors to build optimized packages would reduce latency issues.
  • lazy pre-loading
    • Creating an option for MathJax components to download in the background after a page has finished. This would improve performance on subsequent pages and dynamically created content.
      • in particular webfonts could be loaded separately
  • perform component loading in parallel
    • Currently, if an expression loads an extension, processing waits until that extension is loaded; instead, MathJax could continue to process other equations while the needed component is being delivered.
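The last idea in the list can be compared against the current behavior with a simple timing model. This is only a sketch: costs and latencies are abstract time units, each equation needs at most one extension, and the function names are ours rather than anything in MathJax.

```javascript
// Blocking: processing stalls whenever an equation needs an extension
// that has not been loaded yet.
function blockingTotal(equations, latency) {
  const loaded = new Set();
  let t = 0;
  for (const eq of equations) {
    if (eq.needs && !loaded.has(eq.needs)) {
      t += latency; // wait for the extension to arrive
      loaded.add(eq.needs);
    }
    t += eq.cost;
  }
  return t;
}

// Non-blocking: request the extension in the background, keep processing
// other equations, and return to the deferred ones afterwards.
function nonBlockingTotal(equations, latency) {
  const readyAt = new Map(); // extension -> time at which it is available
  const deferred = [];
  let t = 0;
  for (const eq of equations) {
    if (eq.needs && !readyAt.has(eq.needs)) {
      readyAt.set(eq.needs, t + latency);
    }
    if (eq.needs && readyAt.get(eq.needs) > t) {
      deferred.push(eq);
    } else {
      t += eq.cost;
    }
  }
  for (const eq of deferred) {
    t = Math.max(t, readyAt.get(eq.needs)); // wait only if still in flight
    t += eq.cost;
  }
  return t;
}

const equations = [{ cost: 10, needs: "ext" }, { cost: 10 }, { cost: 10 }];
console.log(blockingTotal(equations, 50));    // 80
console.log(nonBlockingTotal(equations, 50)); // 60
```

In this model the saving is bounded by the amount of other work that is available while the component is in flight.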

Optimizing the current output algorithm

One way forward is to seek new ways to optimize the SVG and HTML output.

As mentioned, the SVG output is often 25% faster than the HTML output. The HTML output could catch up to SVG if certain measurements could be dropped. But this can only succeed if browsers themselves become more reliable.

Another idea is to re-use formulas or even subexpressions, i.e., if an equation appears multiple times in a page, we could try to only render it once. It's not clear how much of an advantage this is. Most formulas do not appear the exact same way over and over again. Small changes in CSS (size, positioning) could damage the quality when re-using.
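A minimal sketch of such re-use is a cache keyed by both the source and the styling context, so that an equation is only re-used when nothing relevant has changed. The function names and the string used for the style are assumptions made for illustration; the render function passed in stands for whatever produces the output.

```javascript
// Wrap a render function so that identical (source, style) pairs are
// rendered only once and re-used afterwards.
function makeReusingRenderer(render) {
  const cache = new Map();
  return function renderOnce(source, style) {
    const key = JSON.stringify([source, style]);
    if (!cache.has(key)) cache.set(key, render(source, style));
    return cache.get(key);
  };
}

// Stand-in renderer that just counts how often it is called.
let renderCalls = 0;
const renderOnce = makeReusingRenderer((source, style) => {
  renderCalls += 1;
  return `[${source} @ ${style}]`;
});

renderOnce("x^2", "100%");
renderOnce("x^2", "100%"); // re-used
renderOnce("x^2", "80%");  // different size, rendered again
console.log(renderCalls);  // 2
```

Including the style in the key avoids the quality problems mentioned above, at the price of fewer cache hits.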

We can investigate current JavaScript optimization techniques.

We can also develop speed profiling tools for content providers to narrow down performance problems related to MathJax on individual sites.

Optimizing perceived performance

Both the latency and the rendering-speed issues are to a large extent a problem of perception. Even though the page becomes readable quickly, users perceive the processing as slow.

By tweaking the way content appears on the page, we could reduce this impression.

  • multi-pass layout
    • We can add a first "quick & dirty" rendering and then re-render until full TeX-quality is achieved.
    • We could provide tools for server-side pre-processing of SVG which gets replaced with MathJax rendering on the fly.
  • rendering small equations before large ones
    • Due to the recursive nature of MathJax output, complex equations take much longer. In combination with equation-chunking (the number of equations MathJax will reveal on a page at once), this can lead to negative perceived performance. For example, a page rarely starts with a highly complex equation but usually has a number of small inline equations before a complicated one shows up. However, the chunking can prevent those small ones from showing up until the large ones are typeset. A size-oriented chunking could reduce this problem.
  • local storage
    • Local storage could save rendered output and MathJax wouldn't have to re-typeset while a user browses back and forth.
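The size-oriented chunking idea from the list above could look roughly like the following sketch. It uses the length of the source as a crude stand-in for complexity, which is purely an assumption, and the function name is hypothetical. Each entry keeps its original index so that the output can still be placed at the right position in the page.

```javascript
// Order equations by an estimated complexity (here: source length) and
// split them into chunks, so that small equations are revealed first.
function sizeOrderedChunks(equations, chunkSize) {
  const ordered = equations
    .map((source, index) => ({ source, index }))
    .sort((a, b) => a.source.length - b.source.length || a.index - b.index);
  const chunks = [];
  for (let i = 0; i < ordered.length; i += chunkSize) {
    chunks.push(ordered.slice(i, i + chunkSize));
  }
  return chunks;
}

const sources = ["x", "\\frac{a}{b} + \\sum_{i=1}^{n} i", "y^2", "z"];
const chunks = sizeOrderedChunks(sources, 2);
console.log(chunks.map((chunk) => chunk.map((eq) => eq.index)));
// -> [ [ 0, 3 ], [ 2, 1 ] ]
```

Here the large sum ends up in the last chunk, so the three short equations are not held back by it.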

Improving browser infrastructure

We can try to work with browser vendors to improve the browser behavior.

  • enable better webfont APIs (e.g., to reduce our hacks for detecting the arrival of webfonts)
  • remove the width-measuring problems
  • allow JavaScript to access font metrics, OpenType MATH tables, etc., so that MathJax can become font agnostic
  • develop a new layout algorithm that is HTML-focused

The advantage would be that MathJax could help move browser vendors to enable better typesetting tools, which would be a big step forward in general.

Creating a new HTML output algorithm

A very basic problem is that the Knuth-Plass bottom-up algorithm we use has to work against the top-down HTML/CSS layout algorithms. This problem cannot be resolved and affects performance.

We could investigate a fundamentally new approach, letting the browser do the layout for us. The latest CSS modules such as flexbox could enable native rendering speed but are still in their infancy.

Dropping support for legacy browsers

A big question is how much support for legacy browsers, in particular IE<9, is holding speed back. Browser JavaScript engines have changed the way they optimize JavaScript execution.

Caveat emptor: this would probably lead to a re-write of much of MathJax.

Remarks on speed

A simple test indicates that the output rendering speed varies greatly across platforms.

For example, on a 2011 MacBook Pro, rendering https://en.wikipedia.org/wiki/Matrix_multiplication (no downloads, everything cached) took:

  • Chrome: html: ~2500ms, svg:~1850ms
  • Safari: html: ~1450ms, svg:~1000ms, mathml: ~300ms
  • Firefox: html: ~3300ms, svg:~2400ms, mathml:~880ms
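The HTML and SVG timings listed above can be turned into relative savings with a short calculation. This sketch only re-uses the numbers from the list; the function name is ours.

```javascript
// Rendering times in ms, copied from the list above.
const timingsMs = {
  Chrome: { html: 2500, svg: 1850 },
  Safari: { html: 1450, svg: 1000 },
  Firefox: { html: 3300, svg: 2400 },
};

// Percentage of time saved by SVG output relative to HTML output.
function svgTimeSavingPercent(t) {
  return Math.round((1 - t.svg / t.html) * 100);
}

for (const [browser, t] of Object.entries(timingsMs)) {
  console.log(`${browser}: ${svgTimeSavingPercent(t)}%`);
}
// Chrome: 26%
// Safari: 31%
// Firefox: 27%
```

On this page, SVG output therefore took roughly 26% to 31% less time than HTML output, in line with the ~25% figure mentioned earlier.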

Disabling the measurements that HTML needs but SVG doesn't brings HTML output up to SVG speed (but comes at the cost of rendering quality).

Notes

Safari SVG output is already close to the performance of Firefox Native MathML. But it's hard to judge the 300ms for Safari's MathML since that implementation is incomplete -- it's easy to be fast if you're not doing the job. However, in Safari's defense, the page is within the range of its abilities.

Another comparison: a copy of the page, with SVG output and equation chunking to 100 (so that it's one go).

  • Nexus 7, Galaxy Nexus / Chrome: ~18sec
  • Nexus 7 / Dolphin Browser: ~8.5sec
  • iPad (2013) / Safari: ~3sec
  • iPhone 4 / Safari: ~6sec

On Ubuntu 13.04, i7:

  • Firefox 22: ~2sec
  • Chromium 28: ~1.6sec
  • Chrome 29: ~1.7sec
  • Windows 8 / IE 10: ~2.5sec (virtualbox + Microsoft's free testing VM)
  • Windows 7 / IE9: ~3sec (virtualbox + Microsoft's free testing VM)
  • Windows 7 / Firefox 22: ~3.8sec (virtualbox + Microsoft's free testing VM)
  • Windows 8.1 / IE11: ~3sec (virtualbox + release candidate)

A copy of the page with TeX converted to MathML was ~0.5-1 sec slower.
