Initial CDN investigation

pkra edited this page Apr 17, 2012 · 1 revision

This document is now archival.

It set out requirements and considerations for picking a CDN hosting service, as well as some information about the CDN hosts we looked at. We have selected CloudFront, part of Amazon Web Services.

Requirement: The HTTP Header Issue

To allow script and font code to be hosted on a separate site from the HTML page content, Mozilla-based browsers require a custom HTTP response header, Access-Control-Allow-Origin: *, defined by the W3C Cross-Origin Resource Sharing specification (http://www.w3.org/TR/access-control/).
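As a rough illustration of the check involved, the sketch below decides whether a response's headers would permit a page from a given origin to use the resource. It is a deliberately simplified model of the header comparison, not a description of any browser's actual implementation; the function name and the example origins are hypothetical.

```python
def allows_cross_origin(response_headers, page_origin):
    """Return True if Access-Control-Allow-Origin permits page_origin.

    Simplified sketch: only the wildcard and an exact origin match are
    considered. Real browsers apply further rules (credentials, preflight).
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip()
                  for name, value in response_headers.items()}
    allowed = normalized.get("access-control-allow-origin")
    if allowed is None:
        # No header at all: the cross-origin font or script is refused.
        return False
    return allowed == "*" or allowed == page_origin


# Hypothetical responses: one from a host that sends the header, one without.
with_header = {"Content-Type": "font/otf", "Access-Control-Allow-Origin": "*"}
without_header = {"Content-Type": "font/otf"}
print(allows_cross_origin(with_header, "http://example.org"))
print(allows_cross_origin(without_header, "http://example.org"))
```

A host that cannot be configured to send the header always falls into the second case, which is the situation described for the popular CDN services.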

Unfortunately, the most popular CDN services do not provide a way to customize HTTP headers for hosted content. This applies to both fonts and scripts. For example, we have determined that the MathJax fonts hosted at Amazon S3 and used by the GitHub forum software are not accessible from Mozilla-based browsers, so one gets the fallback images instead. (Images are not subject to the "same origin" restriction.)

At the moment, the options we can think of are:

  1. Find a CDN that does allow custom headers. See, for example, the list of CDN services on the Wikipedia CDN page.
  2. Host it ourselves. This gives us only the benefit of a central public copy; it does not address network latency the way a true CDN, with its network of geographically distributed mirror servers, would.
  3. Persuade an existing CDN to allow custom headers. Lots of others are trying to do this too. We have discussed using our contacts (such as they are) at Google, to see if they would host it as part of their Google APIs.
  4. Use data URLs. TypeKit does this in some instances. E.g. http://typekit.com/css/families/d/default/all/1049.css
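To make option 4 concrete, here is a minimal sketch of embedding a font in a stylesheet as a base64 data URL, which avoids a separate cross-origin font request. The function names, the font family name, and the MIME type are assumptions for illustration; this is not taken from TypeKit's implementation.

```python
import base64


def font_data_url(font_bytes, mime_type="application/x-font-woff"):
    """Encode raw font bytes as a base64 data URL.

    The default MIME type is an assumption; use whatever type matches
    the font format actually being embedded.
    """
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


def font_face_rule(family, font_bytes):
    """Build an @font-face rule whose src is an embedded data URL."""
    return '@font-face {{ font-family: "{}"; src: url({}); }}'.format(
        family, font_data_url(font_bytes))


# Placeholder bytes stand in for a real font file read from disk.
print(font_face_rule("ExampleMathFont", b"\x00\x01\x02\x03"))
```

The trade-off is size: base64 inflates the payload by roughly a third, and the font is downloaded with the stylesheet whether or not it is needed.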

Next Steps

  • Check on other CDNs
    • Rackspace doesn't support it. They offer cloud file storage that can be CDN-enabled through Limelight Networks. They pointed out that we could develop a custom app to customize the headers, but then I don't think it would be CDN-enabled.
    • EdgeCast seems to. Following up. They allow us to use our own server as the content origin and push it up to a network of HTTP caching proxies.
      • performance was good
      • good user interface
      • no logging of referrer URLs for objects < 1MB (important!)
      • powerful rules for allowing access to CDN content (probably capable of blocking high-volume sites if necessary)
      • can gzip content for a custom list of MIME types, though it would take a support request for the web font types.
      • Easy configuration of CNAMEs to have cdn.mathjax.org point to the CDN network
      • Fairly pricey, ~$350/month for 500GB depending on various options
    • Amazon CloudFront just added the capability to use a web server as the content origin, instead of S3. It looks like this enables cross-domain use of web fonts by passing on the custom headers set by the origin server.
      • decent performance
      • UI is a work in progress by comparison to EdgeCast
      • logs for referrer URLs are available
      • seems to pass through gzip settings of content origin
      • Need to investigate CNAMEs
      • Less expensive, ~$100/month for 500GB
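Both EdgeCast and CloudFront rely on an origin web server that we control, so the header has to be set there. The following is a minimal sketch of such an origin using only the Python standard library; the list of file extensions, the port, and the names extra_headers_for, CorsOriginHandler, and serve are assumptions for illustration, and a production origin would more likely set the header in its web server configuration. Whether a given CDN forwards the header unchanged should be verified against that CDN.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Assumed set of script and web font extensions that need the header.
CORS_EXTENSIONS = (".js", ".otf", ".eot", ".woff", ".svg")


def extra_headers_for(path):
    """Return the extra response headers for a request path."""
    # Ignore any query string when looking at the file extension.
    stem = path.split("?", 1)[0].lower()
    if stem.endswith(CORS_EXTENSIONS):
        return {"Access-Control-Allow-Origin": "*"}
    return {}


class CorsOriginHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, adding the CORS header."""

    def end_headers(self):
        # Add our headers just before the header section is closed.
        for name, value in extra_headers_for(self.path).items():
            self.send_header(name, value)
        super().end_headers()


def serve(port=8000):
    """Run the origin server until interrupted."""
    HTTPServer(("", port), CorsOriginHandler).serve_forever()
```

Calling serve() from the directory that holds the scripts and fonts starts the origin; the CDN would then be pointed at that host as its content origin.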

In the end, CloudFront was the best choice. See other docs in this section of the site.

Versioning and Updating Plan

Working out a good versioning and updating plan will be key. This should be tied to our source control plan, and the build/test/release process.

Next Steps

  • Robert will put together a source control page to get the design started
  • When our build/test engineer begins, this will be a work item for him
  • This became the Directory Structure document

Data Mining Support

We would like to collect statistics about MathJax usage to both guide future development decisions and to develop semantic services. There are two aspects:

Technical Capabilities

  • We can identify web pages and sites using MathJax via server logs
  • We would like to enable automatic error reporting to monitor problems due to new/updated browsers
  • Ideally, we would like to collect usage statistics about particular MathJax capabilities, e.g. how often cut and paste is used, what percentage of pages use TeX, etc. In many cases, we can answer these questions by examining which script files a page accesses, e.g. the TeX input jax or the MathML input jax. However, other capabilities would require instrumentation and automated reporting, which also carry a performance cost.
  • We are using the normal CloudFront HTTP logs, but haven't done anything about usage stats or automatic error reporting in v1.1
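The log-based approach in the list above can be sketched as follows. The code assumes tab-separated access logs with a "#Fields:" header line naming the columns, including cs-uri-stem and cs(Referer), and assumes that input jax files live under a /jax/input/<name>/ path. The sample log lines are made up for illustration and are not real traffic.

```python
from collections import Counter
from urllib.parse import urlsplit


def parse_log(lines):
    """Yield one dict per log record, keyed by the names in '#Fields:'."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        # Skip blank lines, other directives, and records before the header.
        if not line or line.startswith("#") or not fields:
            continue
        yield dict(zip(fields, line.split("\t")))


def summarize(lines):
    """Count requests per referring site and per input jax."""
    sites = Counter()
    input_jax = Counter()
    for record in parse_log(lines):
        host = urlsplit(record.get("cs(Referer)", "-")).netloc
        if host:
            sites[host] += 1
        stem = record.get("cs-uri-stem", "")
        if "/jax/input/" in stem:
            name = stem.split("/jax/input/", 1)[1].split("/", 1)[0]
            input_jax[name] += 1
    return sites, input_jax


# Made-up sample records in the assumed format.
SAMPLE = [
    "#Version: 1.0",
    "#Fields: date time cs-uri-stem cs(Referer)",
    "2011-01-01\t00:00:01\t/mathjax/jax/input/TeX/jax.js\thttp://example.org/a.html",
    "2011-01-01\t00:00:02\t/mathjax/jax/input/MathML/jax.js\thttp://example.com/b.html",
    "2011-01-01\t00:00:03\t/mathjax/jax/input/TeX/jax.js\thttp://example.org/c.html",
    "2011-01-01\t00:00:04\t/mathjax/MathJax.js\t-",
]

sites, input_jax = summarize(SAMPLE)
print(sites.most_common())
print(input_jax.most_common())
```

This covers only what the server logs already reveal; questions such as how often cut and paste is used would still need instrumentation in the client.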

Privacy Policy

Use of the CDN would imply acceptance of terms of service. While the hosting is separate from the open source software, privacy will nonetheless be an issue for many users. The policy needs to be clearly stated. A major dividing line lies between passive monitoring via web logs and active monitoring through instrumentation and reporting. The policy clearly depends on a number of non-technical factors, and some basic decisions are required before finalizing the technical design.

This became the Terms of Service agreement on mathjax.org.
