<!doctype html>
<html lang=en>
<meta charset=utf-8>
<title>Ambiguities in the "data" URL scheme</title>
<link rel=stylesheet href=https://www.whatwg.org/style/specification>
<h1>Ambiguities in the <code>data</code> URL scheme</h1>
<p>Last updated on [DATE].
<dl>
<dt>Feedback:</dt>
<dd><a href="https://github.com/SimonSapin/data-urls/issues">File an issue</a>
<dd><a href="https://wiki.whatwg.org/wiki/IRC">IRC: #whatwg on Freenode</a>
</dl>
<p>
The <code>data</code> URL scheme is defined by
<a href="http://tools.ietf.org/html/rfc2397">RFC 2397</a>,
which unfortunately is vague regarding many details of the syntax.
This document lists some of the details that should be specified in a future,
more precise specification.
<p>
See also
<a href="https://www.w3.org/Bugs/Public/show_bug.cgi?id=19494">Bug 19494</a>
on the W3C Bugzilla
and the related discussions linked from there.
<ul>
<li>
If the URL has a <a href="http://url.spec.whatwg.org/#concept-url-query">query</a>,
the <code>?</code> separator and the query string should be part of the input
to the <code>data</code> URL parsing algorithm.
<li>
However, if the URL has a <a href="http://url.spec.whatwg.org/#concept-url-fragment">fragment</a>,
the <code>#</code> separator and the fragment identifier string should
<strong>not</strong> be part of the input.
Instead, the fragment identifier has the same meaning and behavior
as it would with, for example, an <code>http</code> URL.
<li>
Although it is often not necessary,
<a href="https://url.spec.whatwg.org/#percent-encoded-bytes">percent-encoding</a>
still applies to base64-encoded <code>data</code> URLs.
<li>
The first U+002C comma of the input separates the MIME type from the data.
Does this still apply if that comma is inside a MIME quoted string for a parameter value?
Example: <code>data:text/plain;foo="bar,baz";charset=utf8,body</code>
<li>
What about a percent-encoded comma?
Example: <code>data:text/plain;foo=bar%2Cbaz;charset=utf8,body</code>
<li>
<p>How strictly should the parser look for <code>;base64</code>?
Examples:
<pre>
data:text/plain;base64,Rm9vCg==
data:text/plain; base64,Rm9vCg==
data:text/plain;base64 ,Rm9vCg==
data:text/plain;base 64,Rm9vCg==
data:text/plain;Base64,Rm9vCg==
data:text/plain;%62ase64,Rm9vCg==
data:text/plain%3Bbase64,Rm9vCg==
</pre>
<p>When RFC 2397 says:
<blockquote>
The ";base64" extension is distinguishable from a content-type
parameter by the fact that it doesn't have a following "=" sign.
</blockquote>
<p>Does this mean that other MIME parsing rules apply?
<li>
How should percent-encoding interact with MIME type parsing?
Examples:
<pre>
data:text/plain;charset=utf8,%F0%9F%92%A9
data:text/plain%3Bcharset=utf8,%F0%9F%92%A9
data:text/plain;charset%3Dutf8,%F0%9F%92%A9
data:text/plain;charset="utf8%22,%F0%9F%92%A9
data:text/plain;charset=utf8,%F0%9F%92%A9
</pre>
<li>
Although RFC 2397 doesn’t bother with a normative reference,
base64 in IETF-land is defined by <a href="https://tools.ietf.org/html/rfc4648">RFC 4648</a>,
which defines both <em>The Base 64 Alphabet</em>
and <em>The "URL and Filename safe" Base 64 Alphabet</em>.
Which of them should be used?
The former looks like the one to use by default, but the latter seems relevant since the data is embedded in a URL.
Or should both of them be accepted?
<li>
What should happen to non-alphabet characters in base64 data?
Options include ignoring them or making parsing fail (returning the equivalent of a network error).
Should this differ for whitespace and other non-alphabet characters?
<li>
What should happen if base64 data has too little padding (including none) or too much?
</ul>
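<p>
To make the questions above concrete, here is a minimal, non-normative Python sketch
of <em>one</em> possible set of answers.
The function name <code>parse_data_url</code> is hypothetical,
and every choice marked "Assumption" in the comments is just one of the options
discussed in the list, not something RFC 2397 specifies.
<pre>
```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_url(url):
    """Return (mime_type, body_bytes) under ONE possible set of answers.

    Hypothetical helper, not taken from RFC 2397 or any specification.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "data":
        raise ValueError("not a data URL")
    # The fragment is not part of the input; the query (if any) is kept.
    rest = rest.partition("#")[0]
    # Assumption: split at the first literal comma, ignoring MIME quoted
    # strings and percent-encoded commas.
    header, comma, data = rest.partition(",")
    if not comma:
        raise ValueError("missing comma")
    # Percent-decoding applies to the data even when it is base64.
    body = unquote_to_bytes(data)
    # Assumption: only an exact, case-sensitive ";base64" suffix counts.
    if header.endswith(";base64"):
        header = header[: -len(";base64")]
        # Assumption: standard RFC 4648 alphabet; non-alphabet bytes and
        # bad padding make parsing fail (binascii.Error, a ValueError).
        body = base64.b64decode(body, validate=True)
    return header, body
```
</pre>
<p>
Under these assumptions,
<code>parse_data_url("data:text/plain;base64,Rm9vCg==")</code>
returns <code>("text/plain", b"Foo\n")</code>,
whereas the <code>;Base64</code> variant is treated as part of the MIME type and its data is left undecoded.
For the quoted-string example, the MIME type comes out as <code>text/plain;foo="bar</code>,
which illustrates why the comma question needs an explicit answer.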