
More HTML formatting

1 parent 66151e5 commit 886dd7e31f32f37a4d9a6933b867a72b95555db4 Adam Barth committed Jul 28, 2011
Showing with 71 additions and 262 deletions.
  1. +71 −262 drafts/sniff.html
333 drafts/sniff.html
@@ -56,19 +56,26 @@ <h2 class="no-num no-toc" id=work-in-progress-&mdash;-last-update-27-july-2011>W
</div>
-<h2 class=no-num id=status-of-this-document>Status of this Document</h2>
-
-<p>TODO
-
-
<h2 class=no-num id=table-of-contents>Table of contents</h2>
<!--begin-toc-->
<ol class=toc>
- <li><a class=no-num href=#status-of-this-document>Status of this Document</a></li>
<li><a class=no-num href=#table-of-contents>Table of contents</a></li>
<li><a class=no-num href=#abstract>Abstract</a></li>
<li><a href=#introduction><span class=secno>1 </span>Introduction</a></li>
+ <li><a href=#conventions><span class=secno>2 </span>Conventions</a></li>
+ <li><a href=#metadata><span class=secno>3 </span>Metadata</a></li>
+ <li><a href=#web-pages><span class=secno>4 </span>Web Pages</a></li>
+ <li><a href=#text-or-binary><span class=secno>5 </span>Text or Binary</a></li>
+ <li><a href=#unknown-type><span class=secno>6 </span>Unknown Type</a>
+ <ol>
+ <li><a href=#signature-for-mp4><span class=secno>6.1 </span>Signature for MP4</a>
+ </ol>
+ </li>
+ <li><a href=#images><span class=secno>7 </span>Images</a></li>
+ <li><a href=#video><span class=secno>8 </span>Video</a></li>
+ <li><a href=#fonts><span class=secno>9 </span>Fonts</a></li>
+ <li><a href=#feed-or-html><span class=secno>10 </span>Feed or HTML</a></li>
<li><a class=no-num href=#acknowledgements>Acknowledgements</a></li>
</ol>
<!--end-toc-->
@@ -877,7 +884,7 @@ <h2 id=unknown-type><span class=secno>6 </span>Unknown Type</h2>
"text or binary" section, to avoid sniffing text/plain content as a type that
can be used for a privilege escalation attack.
-<h3 id=mp4-signature><span class=secno>6.1 </span>Signature for MP4</h3>
+<h3 id=signature-for-mp4><span class=secno>6.1 </span>Signature for MP4</h3>
<p>This section defines whether a sequence of <var>n</var> octets <dfn
id=matches-the-signature-for-mp4>matches the signature for MP4</dfn>.
@@ -1065,265 +1072,67 @@ <h3 id=feed-or-html><span class=secno>10 </span>Feed or HTML</h3>
<td>Let the sniffed-type be "application/atom+xml" and abort these steps.</td>
<td>feed</td>
</tr>
-+----------------------+------------------------------------+---------+
-| 72 64 66 3A 52 44 46 | Continue to the next step in this | rdf:RDF |
-| | algorithm. | |
-+----------------------+------------------------------------+---------+
- </artwork>
- <postamble>
- If none of the octet sequences above match the octets in s
- starting at pos, then let the sniffed-type be "text/html" and
- abort these steps.
- </postamble>
- </figure>
- </t>
-
- <t>Initialize RDF-flag to 0.</t>
-
- <t>Initialize RSS-flag to 0.</t>
-
- <t>If the octets with positions pos to pos+23 in s are exactly equal
- to 0x68, 0x74, 0x74, 0x70, 0x3A, 0x2F, 0x2F, 0x70, 0x75, 0x72, 0x6C,
- 0x2E, 0x6F, 0x72, 0x67, 0x2F, 0x72, 0x73, 0x73, 0x2F, 0x31, 0x2E,
- 0x30, 0x2F respectively (ASCII for "http://purl.org/rss/1.0/"), then:
- <list style="numbers">
- <t>Increase pos by 23.</t>
-
- <t>Set RSS-flag to 1.</t>
- </list>
- </t>
-
- <t>If the octets with positions pos to pos+42 in s are exactly equal
- to 0x68, 0x74, 0x74, 0x70, 0x3A, 0x2F, 0x2F, 0x77, 0x77, 0x77, 0x2E,
- 0x77, 0x33, 0x2E, 0x6F, 0x72, 0x67, 0x2F, 0x31, 0x39, 0x39, 0x39,
- 0x2F, 0x30, 0x32, 0x2F, 0x32, 0x32, 0x2D, 0x72, 0x64, 0x66, 0x2D,
- 0x73, 0x79, 0x6E, 0x74, 0x61, 0x78, 0x2D, 0x6E, 0x73, 0x23
- respectively (ASCII for
- "http://www.w3.org/1999/02/22-rdf-syntax-ns#"), then:
- <list style="numbers">
- <t>Increase pos by 42.</t>
-
- <t>Set RDF-flag to 1.</t>
- </list>
- </t>
-
- <t>Increase pos by 1.</t>
-
- <t>If RDF-flag is 1 and RSS-flag is 1, then let the sniffed-type be
- "application/rss+xml" and abort these steps.</t>
-
- <t>If pos points beyond the end of the octet stream s, then continue
- to step 19 of this algorithm.</t>
-
- <t>Jump back to step 13 of this algorithm.</t>
-
- <t>Let the sniffed-type be "text/html" and abort these steps.</t>
- </list>
- </t>
-
- <t>For efficiency reasons, implementations might wish to implement this
- algorithm and the algorithm for detecting the character encoding of HTML
- documents in parallel.</t>
- </section>
- </middle>
- <back>
- <references title="Normative References">
-<reference anchor="RFC2046">
-<front>
-<title abbrev="Media Types">
-Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
-</title>
-<author initials="N." surname="Freed" fullname="Ned Freed">
-<organization>Innosoft International, Inc.</organization>
-<address>
-<postal>
-<street>1050 East Garvey Avenue South</street>
-<city>West Covina</city>
-<region>CA</region>
-<code>91790</code>
-<country>US</country>
-</postal>
-<phone>+1 818 919 3600</phone>
-<facsimile>+1 818 919 3614</facsimile>
-<email>ned@innosoft.com</email>
-</address>
-</author>
-<author initials="N." surname="Borenstein" fullname="Nathaniel S. Borenstein">
-<organization>First Virtual Holdings</organization>
-<address>
-<postal>
-<street>25 Washington Avenue</street>
-<city>Morristown</city>
-<region>NJ</region>
-<code>07960</code>
-<country>US</country>
-</postal>
-<phone>+1 201 540 8967</phone>
-<facsimile>+1 201 993 3032</facsimile>
-<email>nsb@nsb.fv.com</email>
-</address>
-</author>
-<date year="1996" month="November"/>
-<abstract>
-<t>
-STD 11, RFC 822 defines a message representation protocol specifying considerable detail about US-ASCII message headers, but which leaves the message content, or message body, as flat US-ASCII text. This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages to allow for
-</t>
-<t>
-(1) textual message bodies in character sets other than US-ASCII,
-</t>
-<t>
-(2) an extensible set of different formats for non-textual message bodies,
-</t>
-<t>(3) multi-part message bodies, and</t>
-<t>
-(4) textual header information in character sets other than US-ASCII.
-</t>
-<t>
-These documents are based on earlier work documented in RFC 934, STD 11 and RFC 1049, but extends and revises them. Because RFC 822 said so little about message bodies, these documents are largely orthogonal to (rather than a revision of) RFC 822.
-</t>
-<t>
-The initial document in this set, RFC 2045, specifies the various headers used to describe the structure of MIME messages. This second document defines the general structure of the MIME media typing sytem and defines an initial set of media types. The third document, RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text data in Internet mail header fields. The fourth document, RFC 2048, specifies various IANA registration procedures for MIME-related facilities. The fifth and final document, RFC 2049, describes MIME conformance criteria as well as providing some illustrative examples of MIME message formats, acknowledgements, and the bibliography.
-</t>
-<t>
-These documents are revisions of RFCs 1521 and 1522, which themselves were revisions of RFCs 1341 and 1342. An appendix in RFC 2049 describes differences and changes from previous versions.
-</t>
-</abstract>
-</front>
-<seriesInfo name="RFC" value="2046"/>
-<format type="TXT" octets="105854" target="http://www.rfc-editor.org/rfc/rfc2046.txt"/>
-</reference>
- <reference anchor="RFC2119">
- <front>
- <title abbrev="RFC Key Words">
- Key words for use in RFCs to Indicate Requirement Levels
- </title>
- <author initials="S." surname="Bradner" fullname="Scott Bradner">
- <organization>Harvard University</organization>
- <address>
- <postal>
- <street>1350 Mass. Ave.</street>
- <street>Cambridge</street>
- <street>MA 02138</street>
- </postal>
- <phone>- +1 617 495 3864</phone>
- <email>sob@harvard.edu</email>
- </address>
- </author>
- <date year="1997" month="March"/>
- <area>General</area>
- <keyword>keyword</keyword>
- <abstract>
- <t>In many standards track documents several words are used to
- signify the requirements in the specification. These words are
- often capitalized. This document defines these words as they
- should be interpreted in IETF documents. Authors who follow these
- guidelines should incorporate this phrase near the beginning of
- their document:
- <list>
- <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
- NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
- "OPTIONAL" in this document are to be interpreted as described
- in RFC 2119.</t>
- </list>
- </t>
- <t>Note that the force of these words is modified by the
- requirement level of the document in which they are used.</t>
- </abstract>
- </front>
- <seriesInfo name="BCP" value="14"/>
- <seriesInfo name="RFC" value="2119"/>
- <format type="TXT" octets="4723"
- target="ftp://ftp.isi.edu/in-notes/rfc2119.txt"/>
- <format type="HTML" octets="17491"
- target="http://xml.resource.org/public/rfc/html/rfc2119.html"/>
- <format type="XML" octets="5777"
- target="http://xml.resource.org/public/rfc/xml/rfc2119.xml"/>
- </reference>
- <reference anchor="RFC2616">
- <front>
- <title>Hypertext Transfer Protocol -- HTTP/1.1</title>
- <author initials="R." surname="Fielding" fullname="R. Fielding">
- <organization>University of California, Irvine</organization>
- <address><email>fielding@ics.uci.edu</email></address>
- </author>
- <author initials="J." surname="Gettys" fullname="J. Gettys">
- <organization>W3C</organization>
- <address><email>jg@w3.org</email></address>
- </author>
- <author initials="J." surname="Mogul" fullname="J. Mogul">
- <organization>Compaq Computer Corporation</organization>
- <address><email>mogul@wrl.dec.com</email></address>
- </author>
- <author initials="H." surname="Frystyk" fullname="H. Frystyk">
- <organization>MIT Laboratory for Computer Science</organization>
- <address><email>frystyk@w3.org</email></address>
- </author>
- <author initials="L." surname="Masinter" fullname="L. Masinter">
- <organization>Xerox Corporation</organization>
- <address><email>masinter@parc.xerox.com</email></address>
- </author>
- <author initials="P." surname="Leach" fullname="P. Leach">
- <organization>Microsoft Corporation</organization>
- <address><email>paulle@microsoft.com</email></address>
- </author>
- <author initials="T." surname="Berners-Lee"
- fullname="T. Berners-Lee">
- <organization>W3C</organization>
- <address><email>timbl@w3.org</email></address>
- </author>
- <date month="June" year="1999"/>
- </front>
- <seriesInfo name="RFC" value="2616"/>
- </reference>
- </references>
- <references title="Informative References">
- <reference anchor="RFC0959">
- <front>
- <title abbrev="File Transfer Protocol">File Transfer Protocol</title>
- <author initials="J." surname="Postel" fullname="J. Postel">
- <organization>Information Sciences Institute (ISI)</organization>
- </author>
- <author initials="J." surname="Reynolds" fullname="J. Reynolds">
- <organization/>
- </author>
- <date year="1985" day="1" month="October"/>
- </front>
- <seriesInfo name="STD" value="9"/>
- <seriesInfo name="RFC" value="959"/>
- <format type="TXT" octets="147316" target="http://www.rfc-editor.org/rfc/rfc959.txt"/>
- </reference>
- <reference anchor="BarthCaballeroSong2009" target="http://www.adambarth.com/papers/2009/barth-caballero-song.pdf">
- <front>
- <title>Secure Content Sniffing for Web Browsers, or How to Stop
- Papers from Reviewing Themselves</title>
- <author initials="A." surname="Barth" fullname="Adam Barth">
- <organization>UC Berkeley</organization>
- </author>
- <author initials="J." surname="Caballero" fullname="Juan Caballero">
- <organization>UC Berkeley and CMU</organization>
- </author>
- <author initials="D." surname="Song" fullname="Dawn Song">
- <organization>UC Berkeley</organization>
- </author>
- <date year="2009"/>
- </front>
- </reference>
- </references>
- <!--
- TODO:
- * Transcribe the tables into C and auto generate the tables.
- -->
- <!-- Ack Alfred Hönes, Mark Pilgrim -->
- </back>
-</rfc>
+ <tr>
+ <td>72 64 66 3A 52 44 46</td>
+ <td>Continue to the next step in this algorithm.</td>
+ <td>rdf:RDF</td>
+ </tr>
+ </table>
+
+ <p>If none of the octet sequences above match the octets in <var>s</var>
+ starting at <var>pos</var>,
+ then let the <var>sniffed-type</var> be "text/html" and abort these steps.
+ <li>Initialize <var>RDF-flag</var> to 0.
+ <li>Initialize <var>RSS-flag</var> to 0.
+
+ <li>If the octets with positions <var>pos</var> to <var>pos</var>+23 in
+ <var>s</var> are exactly equal to 0x68, 0x74, 0x74, 0x70, 0x3A, 0x2F, 0x2F,
+ 0x70, 0x75, 0x72, 0x6C, 0x2E, 0x6F, 0x72, 0x67, 0x2F, 0x72, 0x73, 0x73, 0x2F,
+ 0x31, 0x2E, 0x30, 0x2F respectively (ASCII for "http://purl.org/rss/1.0/"),
+ then:
+ <ol>
+ <li>Increase <var>pos</var> by 23.
+
+ <li>Set <var>RSS-flag</var> to 1.
+ </ol>
+
+ <li>If the octets with positions <var>pos</var> to <var>pos</var>+42 in
+ <var>s</var> are exactly equal to 0x68, 0x74, 0x74, 0x70, 0x3A, 0x2F, 0x2F,
+ 0x77, 0x77, 0x77, 0x2E, 0x77, 0x33, 0x2E, 0x6F, 0x72, 0x67, 0x2F, 0x31, 0x39,
+ 0x39, 0x39, 0x2F, 0x30, 0x32, 0x2F, 0x32, 0x32, 0x2D, 0x72, 0x64, 0x66, 0x2D,
+ 0x73, 0x79, 0x6E, 0x74, 0x61, 0x78, 0x2D, 0x6E, 0x73, 0x23 respectively (ASCII
+ for "http://www.w3.org/1999/02/22-rdf-syntax-ns#"), then:
+ <ol>
+ <li>Increase <var>pos</var> by 42.
+
+ <li>Set <var>RDF-flag</var> to 1.
+ </ol>
+
+ <li>Increase <var>pos</var> by 1.
+
+ <li>If <var>RDF-flag</var> is 1 and <var>RSS-flag</var> is 1, then let the
+ <var>sniffed-type</var> be "application/rss+xml" and abort these steps.
+
+ <li>If <var>pos</var> points beyond the end of the octet stream <var>s</var>,
+ then continue to step 19 of this algorithm.
+
+ <li>Jump back to step 13 of this algorithm.
+
+ <li>Let the <var>sniffed-type</var> be "text/html" and abort these steps.
+</ol>
+
+<p>For efficiency reasons, implementations might wish to implement this
+algorithm and the algorithm for detecting the character encoding of HTML
+documents in parallel.
+
+<h2 class=no-num id=references>References</h2>
+
+<p class=XXX>TODO
<h2 class=no-num id=acknowledgements>Acknowledgements</h2>
-<p>Thanks to:
-<ul>
- <li>TODO
-</ul>
+<p>Thanks to Alfred Hönes, Boris Zbarsky, David Singer, Mark Pilgrim, and Russ Cox.
<!-- <script src=http://www.whatwg.org/specs/web-apps/current-work/dfn.js></script> -->
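
The rdf:RDF branch of the "Feed or HTML" algorithm in the hunk above can be sketched roughly as follows. This is a minimal illustration, not the draft's reference implementation: the function name `sniff_rdf_feed` and the boolean flags are hypothetical, and it assumes that the "step 13" the algorithm jumps back to is the RSS namespace check (the diff does not show the step numbering). The pos increments of 23 and 42 in the text correspond to the namespace lengths (24 and 43 octets) minus one, followed by the unconditional "Increase pos by 1".

```python
# Namespace URIs spelled out octet by octet in the draft text.
RSS_NS = b"http://purl.org/rss/1.0/"                      # 24 octets
RDF_NS = b"http://www.w3.org/1999/02/22-rdf-syntax-ns#"   # 43 octets


def sniff_rdf_feed(s: bytes, pos: int = 0) -> str:
    """Scan s from pos; report RSS only if both namespaces are seen."""
    rdf_flag = False  # "RDF-flag" in the draft
    rss_flag = False  # "RSS-flag" in the draft

    # Assumed loop head: the point the draft calls "step 13".
    while pos < len(s):
        if s.startswith(RSS_NS, pos):
            pos += len(RSS_NS) - 1   # "Increase pos by 23."
            rss_flag = True
        if s.startswith(RDF_NS, pos):
            pos += len(RDF_NS) - 1   # "Increase pos by 42."
            rdf_flag = True
        pos += 1                     # "Increase pos by 1."
        if rdf_flag and rss_flag:
            return "application/rss+xml"

    # pos ran past the end of the octet stream without both flags set.
    return "text/html"
```

The two namespaces may appear in either order; a document that declares only one of them falls through to "text/html".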
