<!doctype html>
<html lang=en>
<meta charset=utf-8>
<title>Ambiguities in the "data" URL scheme</title>
<link rel=stylesheet href=>
<h1>Ambiguities in the <code>data</code> URL scheme</h1>
<p>Last updated on [DATE].
<dd><a href="">File an issue</a>
<dd><a href="">IRC: #whatwg on Freenode</a>
The <code>data</code> URL scheme is defined by
<a href="">RFC 2397</a>,
which unfortunately is vague regarding many details of the syntax.
This document lists some of the details that should be specified in a future,
more precise specification.
See also
<a href="">Bug 19494</a>
on the W3C Bugzilla
and the related discussions linked from there.
If the URL has a <a href="">query</a>,
the <code>?</code> separator and the query string should be part of the input
to the <code>data</code> URL parsing algorithm.
However if the URL has a <a href="">fragment</a>,
the <code>#</code> separator and the fragment identifier string should
<strong>not</strong> be part of the input.
Instead, the fragment identifier has the same meaning and behavior
as it would with, for example, an <code>http</code> URL.
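<p>A minimal Python sketch of the split described above, assuming the fragment is cut at the first <code>#</code> and everything else after <code>data:</code> (including any <code>?</code> and query string) is handed to the parser. The function name <code>split_data_url</code> is hypothetical and not taken from any specification.

```python
from typing import Optional, Tuple


def split_data_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (parser_input, fragment) for a data: URL (hypothetical helper)."""
    scheme, colon, rest = url.partition(":")
    if not colon or scheme.lower() != "data":
        raise ValueError("not a data: URL")
    # Assumption: the first "#" starts the fragment. Any "?" and query
    # string before it remain part of the parser input.
    parser_input, hash_sep, fragment = rest.partition("#")
    return parser_input, (fragment if hash_sep else None)


parser_input, fragment = split_data_url("data:text/plain,a?b=c#frag")
print(parser_input)  # text/plain,a?b=c
print(fragment)      # frag
```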
Although it is often not necessary,
<a href="">percent-encoding</a>
still applies to base64-encoded <code>data</code> URLs.
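<p>As an illustration, one possible order of operations is to percent-decode the body first and base64-decode the result. The sketch below assumes that order; <code>decode_base64_body</code> is a hypothetical name.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_base64_body(body: str) -> bytes:
    """Percent-decode, then base64-decode (assumed order, hypothetical helper)."""
    raw = unquote_to_bytes(body)  # "%3D" becomes "="
    return base64.b64decode(raw)


# The padding may arrive percent-encoded or literal; both give the same bytes.
print(decode_base64_body("Rm9vCg%3D%3D"))  # b'Foo\n'
print(decode_base64_body("Rm9vCg=="))      # b'Foo\n'
```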
The first U+002C comma of the input separates the MIME type from the data.
Does this still apply if that comma is inside a MIME quoted string for a parameter value?
Example: <code>data:text/plain;foo="bar,baz";charset=utf8,body</code>
What about a percent-encoded comma?
Example: <code>data:text/plain;foo=bar%2Cbaz;charset=utf8,body</code>
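<p>To make the difference concrete, the following Python sketch applies two candidate policies to the examples above: splitting at the first literal comma, and splitting at the first comma outside a quoted string. It also shows what happens if percent-decoding runs before the split. Both function names are hypothetical, and neither policy is endorsed by RFC 2397.

```python
from typing import Tuple
from urllib.parse import unquote


def split_first_comma(payload: str) -> Tuple[str, str]:
    """Policy A: split at the first literal comma, ignoring quoting."""
    header, _, body = payload.partition(",")
    return header, body


def split_outside_quotes(payload: str) -> Tuple[str, str]:
    """Policy B: split at the first comma that is not inside a quoted string."""
    in_quotes = False
    escaped = False
    for i, ch in enumerate(payload):
        if escaped:
            escaped = False              # character after a backslash is literal
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            return payload[:i], payload[i + 1:]
    return payload, ""                   # no usable comma found


quoted = 'text/plain;foo="bar,baz";charset=utf8,body'
encoded = "text/plain;foo=bar%2Cbaz;charset=utf8,body"

print(split_first_comma(quoted))
# ('text/plain;foo="bar', 'baz";charset=utf8,body')
print(split_outside_quotes(quoted))
# ('text/plain;foo="bar,baz";charset=utf8', 'body')
print(split_first_comma(encoded))
# ('text/plain;foo=bar%2Cbaz;charset=utf8', 'body')
print(split_first_comma(unquote(encoded)))
# ('text/plain;foo=bar', 'baz;charset=utf8,body')
```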
<p>How strictly should the parser look for <code>;base64</code>?
<pre><code>data:text/plain; base64,Rm9vCg==
data:text/plain;base64 ,Rm9vCg==
data:text/plain;base 64,Rm9vCg==</code></pre>
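<p>The three examples above separate under different matching rules. The Python sketch below compares three hypothetical policies applied to the part before the comma: an exact suffix match, a match that tolerates whitespace around the last parameter, and a match that ignores spaces entirely. None of these is prescribed by RFC 2397.

```python
def is_base64_exact(header: str) -> bool:
    """Only a literal ';base64' suffix counts."""
    return header.endswith(";base64")


def is_base64_trimmed(header: str) -> bool:
    """The last ';'-separated token equals 'base64' after trimming whitespace."""
    head, sep, last = header.rpartition(";")
    return bool(sep) and last.strip().lower() == "base64"


def is_base64_ignoring_spaces(header: str) -> bool:
    """All spaces are removed before looking for the ';base64' suffix."""
    return header.replace(" ", "").lower().endswith(";base64")


headers = ["text/plain; base64", "text/plain;base64 ", "text/plain;base 64"]
for header in headers:
    print(repr(header),
          is_base64_exact(header),
          is_base64_trimmed(header),
          is_base64_ignoring_spaces(header))
# 'text/plain; base64' False True True
# 'text/plain;base64 ' False True True
# 'text/plain;base 64' False False True
```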
<p>RFC 2397 says:
<blockquote>The ";base64" extension is distinguishable from a content-type
parameter by the fact that it doesn't have a following "=" sign.</blockquote>
<p>Does this mean that other MIME parsing rules apply?
How should percent-encoding interact with MIME type parsing?
Although RFC 2397 doesn’t bother with a normative reference,
base64 in IETF-land is defined by <a href="">RFC 4648</a>,
which defines both <em>The Base 64 Alphabet</em>
and <em>The "URL and Filename safe" Base 64 Alphabet</em>.
Which of them should be used?
The former looks like the default choice, but the latter, given its name, seems relevant to URLs.
Or should both of them be accepted?
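<p>The two alphabets differ only in the characters for values 62 and 63 (<code>+</code> and <code>/</code> versus <code>-</code> and <code>_</code>). The Python sketch below shows the difference and one possible way to accept both, by mapping the URL-safe characters onto the standard ones before decoding. The helper names are hypothetical.

```python
import base64
import binascii

raw = bytes([0xFB, 0xFF])
standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard)  # +/8=
print(url_safe)  # -_8=


def decode_standard_only(text: str) -> bytes:
    """Accept only the standard alphabet; anything else raises binascii.Error."""
    return base64.b64decode(text, validate=True)


def decode_either_alphabet(text: str) -> bytes:
    """Accept both alphabets by normalizing '-' and '_' to '+' and '/'."""
    normalized = text.replace("-", "+").replace("_", "/")
    return base64.b64decode(normalized, validate=True)


print(decode_either_alphabet(standard) == raw)  # True
print(decode_either_alphabet(url_safe) == raw)  # True
try:
    decode_standard_only(url_safe)
except binascii.Error:
    print("standard-only decoder rejects the URL-safe form")
```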
What should happen to non-alphabet characters in base64 data?
Options include ignoring them, or making parsing fail (returning the equivalent of a network error).
Should this differ for whitespace and other non-alphabet characters?
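<p>The options above can be sketched as three hypothetical decoders in Python: one that fails on any non-alphabet character, one that ignores only ASCII whitespace, and one that ignores every non-alphabet character. The set of characters treated as whitespace here is an assumption.

```python
import base64
import re
import string

BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def decode_strict(text: str) -> bytes:
    """Fail on any character outside the standard alphabet."""
    return base64.b64decode(text, validate=True)


def decode_ignoring_whitespace(text: str) -> bytes:
    """Drop ASCII whitespace (assumed set), fail on anything else."""
    return base64.b64decode(re.sub(r"[ \t\n\f\r]", "", text), validate=True)


def decode_ignoring_all(text: str) -> bytes:
    """Drop every character outside the alphabet before decoding."""
    kept = "".join(ch for ch in text if ch in BASE64_CHARS)
    return base64.b64decode(kept, validate=True)


def attempt(decoder, text: str):
    """Return the decoded bytes, or the string 'error' if decoding fails."""
    try:
        return decoder(text)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return "error"


for sample in ("Rm9v Cg==", "Rm9v!Cg=="):
    print(repr(sample),
          attempt(decode_strict, sample),
          attempt(decode_ignoring_whitespace, sample),
          attempt(decode_ignoring_all, sample))
# 'Rm9v Cg==' error b'Foo\n' b'Foo\n'
# 'Rm9v!Cg==' error error b'Foo\n'
```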
What should happen if base64 data has too little padding (including none) or too much?
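<p>For comparison, the Python sketch below contrasts a decoder that requires the input length to be a multiple of four with a hypothetical forgiving decoder that strips all trailing <code>=</code> characters and re-adds the right amount. Whether either behavior is desirable is exactly the open question.

```python
import base64


def decode_strict_padding(text: str) -> bytes:
    """Require a length that is a multiple of four."""
    if len(text) % 4 != 0:
        raise ValueError("length is not a multiple of 4")
    return base64.b64decode(text, validate=True)


def decode_repaired_padding(text: str) -> bytes:
    """Ignore whatever padding is present and recompute it (assumed policy)."""
    stripped = text.rstrip("=")
    if len(stripped) % 4 == 1:
        # One leftover character carries only 6 bits, less than one byte.
        raise ValueError("impossible base64 length")
    padded = stripped + "=" * (-len(stripped) % 4)
    return base64.b64decode(padded, validate=True)


for sample in ("Rm9vCg", "Rm9vCg=", "Rm9vCg==", "Rm9vCg==="):
    try:
        strict = decode_strict_padding(sample)
    except ValueError:
        strict = "error"
    print(repr(sample), strict, decode_repaired_padding(sample))
# 'Rm9vCg' error b'Foo\n'
# 'Rm9vCg=' error b'Foo\n'
# 'Rm9vCg==' b'Foo\n' b'Foo\n'
# 'Rm9vCg===' error b'Foo\n'
```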