biilmann / javascript-xhtml-purifier

Provides a way to clean up messy HTML (pasted from Word, for example) and returns valid, pretty-printed XHTML.

Now converting <i> to <em> 
Mathias Biilmann Christensen (author)
Thu May 21 06:30:49 -0700 2009
commit  e8bb4ed605c59002a0c85014723328ccce7ffe60
tree    e8c2591f51b8c358f3d6257b4dfa83beb9280a41
parent  f59798c03b7696cafbb0ad7220a9416c05b023d6
javascript-xhtml-purifier /
file README.textile
directory test/
file xhtml_purifier.html
file xhtml_purifier.js
README.textile

Javascript XHTML Purifier

This script provides a method to clean up dirty HTML. It takes a string of dirty, badly formatted HTML and returns a pretty-printed, valid XHTML string.

Usage

XHTMLPurifier.purify(html_string);
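
As a minimal sketch of how the call might be wired into an editor, the helper below guards against empty input before delegating to the purifier. The function name `cleanPastedHtml` and its `purifier` parameter are hypothetical and not part of the library; the only assumption about the library is the `purify(html_string)` call shown above.

```javascript
// Hypothetical helper (not part of the library): cleans pasted editor
// content before it is stored. `purifier` is expected to expose
// purify(htmlString), as XHTMLPurifier does in the usage line above.
function cleanPastedHtml(rawHtml, purifier) {
  if (typeof rawHtml !== "string" || rawHtml.trim() === "") {
    return ""; // nothing to purify
  }
  return purifier.purify(rawHtml);
}
```

In a page that has loaded xhtml_purifier.js, this would be called as `cleanPastedHtml(textarea.value, XHTMLPurifier)`, where `textarea` stands for whatever element holds the pasted markup.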

About the Implementation

The purifying is based on section 8.2 of the HTML5 specification (http://www.whatwg.org/specs/web-apps/current-work/#parsing) and implements a subset of the algorithm described there.

Only a limited subset of HTML5 elements and attributes is permitted; all other tags and attributes are dropped from the resulting XHTML.

Allowed elements

  • strong (b and all heading elements are currently transformed to strong)
  • em
  • blockquote
  • ol
  • ul
  • li
  • p
  • pre
  • a
  • img
  • br
  • table
  • caption
  • col
  • colgroup
  • tbody
  • td
  • tfoot
  • th
  • thead
  • tr

All other elements will be stripped from the resulting XHTML, although the inner text will be left intact.
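
The allow-list behaviour described above can be illustrated with a deliberately simplified sketch. This is not the library's implementation: it uses a regular expression instead of the HTML5 tokenizing and tree-building steps, drops all attributes, and does not balance tags or pretty-print. The names `ALLOWED`, `mapTagName`, and `filterTags` are hypothetical.

```javascript
// Element names taken from the "Allowed elements" list.
const ALLOWED = new Set([
  "strong", "em", "blockquote", "ol", "ul", "li", "p", "pre", "a", "img", "br",
  "table", "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr",
]);

// b and h1..h6 become strong (per the list above); i becomes em
// (per the latest commit message, "Now converting <i> to <em>").
function mapTagName(name) {
  const lower = name.toLowerCase();
  if (lower === "b" || /^h[1-6]$/.test(lower)) return "strong";
  if (lower === "i") return "em";
  return lower;
}

// Simplified filter: removes tags that are not allowed and keeps the text
// between them. Attributes are discarded entirely, which is an assumption
// of this sketch and stricter than what the README describes.
function filterTags(html) {
  return html.replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (match, slash, name) => {
    const mapped = mapTagName(name);
    return ALLOWED.has(mapped) ? "<" + slash + mapped + ">" : "";
  });
}

console.log(filterTags('<div><b>Bold</b> and <span class="x">plain</span></div>'));
// prints: <strong>Bold</strong> and plain
```

A regular expression is enough to show the stripping rule, but it cannot repair unclosed or misnested tags, which is the part the parser-based approach is meant to handle.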

The script was originally created for use with a rich text editor in a CMS, and it deliberately puts very firm limits on what can be included in the resulting XHTML. Because it is based on the HTML5 parsing specification, it is very robust when it comes to cleaning up tag soup.

License

Copyright © 2008 Mathias Biilmann Christensen / Domestika INTERNET S.L., released under the MIT license (see MIT-LICENSE)

Includes John Resig’s and Erik Arvidsson’s HTML Parser, which is used as a tokenizer.

HTML Parser By John Resig (ejohn.org)
Original code by Erik Arvidsson, Mozilla Public License
http://erik.eae.net/simplehtmlparser/simplehtmlparser.js
