queso / dryopteris forked from brynary/dryopteris

HTML sanitization using Nokogiri

Dryopteris

Dryopteris erythrosora is the Japanese Shield Fern. It can also be used to sanitize HTML to help prevent cross-site scripting (XSS) attacks.

Usage

Let's say you run a web site, and you allow people to post HTML snippets.

Let's also say some script-kiddie from Norland posts this to your site, in an effort to swipe some credit cards:

<SCRIPT SRC=http://ha.ckers.org/xss.js></SCRIPT>

Oooh, that could be bad. Here's how to fix it:

safe_html_snippet = Dryopteris.sanitize(dangerous_html_snippet)

Yeah, it's that easy.

In this example, safe_html_snippet will have all of its broken markup fixed by libxml2, and it will also be scrubbed of harmful tags and attributes. That's twice as clean!
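To make the whitelist idea concrete, here is a minimal sketch in plain Ruby. It is not Dryopteris's implementation, and it skips the parsing step entirely: it assumes libxml2 (or any parser) has already turned the markup into a tree, modeled here by a hypothetical `Node` struct with plain Strings as text nodes. The whitelists are tiny and invented for illustration, disallowed elements are dropped together with their contents, `href` values must use a whitelisted URL scheme, and all text is HTML-escaped on output.

```ruby
# Hypothetical sketch of whitelist sanitization over an already-parsed tree.
# Not Dryopteris's code: parsing and markup repair (libxml2's job) are assumed done.

# Element node: tag name, attribute hash, child list. Text nodes are plain Strings.
Node = Struct.new(:name, :attrs, :children)

# Tiny illustrative whitelists; real ones are far larger.
ALLOWED_TAGS    = %w[p b i em strong a ul ol li].freeze
ALLOWED_ATTRS   = %w[href title].freeze
URL_ATTRS       = %w[href].freeze
ALLOWED_SCHEMES = %w[http https mailto].freeze

ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;" }.freeze

def escape_html(text)
  text.gsub(/[&<>"']/, ESCAPES)
end

# Relative URLs pass; absolute ones must use a whitelisted scheme.
def safe_url?(value)
  # Strip whitespace/control characters first so "java\tscript:" is still caught.
  normalized = value.gsub(/[\u0000-\u0020\u007F-\u00A0]/, "").downcase
  scheme = normalized[/\A([a-z0-9][a-z0-9+.\-]*):/, 1]
  scheme.nil? || ALLOWED_SCHEMES.include?(scheme)
end

def sanitize_tree(node)
  return escape_html(node) if node.is_a?(String) # text: keep, escaped

  name = node.name.downcase
  return "" unless ALLOWED_TAGS.include?(name)    # drop element and its subtree

  attr_html = node.attrs.map do |attr, value|
    attr = attr.downcase
    next unless ALLOWED_ATTRS.include?(attr)              # e.g. onclick is dropped
    next if URL_ATTRS.include?(attr) && !safe_url?(value) # e.g. javascript: URLs
    " #{attr}=\"#{escape_html(value)}\""
  end.compact.join

  inner = node.children.map { |child| sanitize_tree(child) }.join
  "<#{name}#{attr_html}>#{inner}</#{name}>"
end

fragment = [
  Node.new("p", { "onclick" => "steal()" }, ["Hi ", Node.new("b", {}, ["there"])]),
  Node.new("SCRIPT", { "src" => "http://ha.ckers.org/xss.js" }, []),
  Node.new("a", { "href" => "java\tscript:alert(1)", "title" => "x" }, ["link"])
]
puts fragment.map { |node| sanitize_tree(node) }.join
# prints: <p>Hi <b>there</b></p><a title="x">link</a>
```

Dropping a disallowed element whole, as this sketch does with the `<SCRIPT>` tag, is one design choice; escaping it so it shows up as visible text is another.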

More Usage

You're still here? Ok, let me tell you a little something about the two different methods of sanitizing that Dryopteris offers.

Fragments

The first method is for HTML fragments, which are small snippets of markup such as those used in forum posts, emails and homework assignments.

Usage is the same as above:

safe_html_snippet = Dryopteris.sanitize(dangerous_html_snippet)

Generally speaking, unless you expect to have <html> and <body> tags in your HTML, this is the sanitizing method to use.

The only real limitation of this method is that the snippet must be a String. (Support for IO objects was sacrificed at the altar of fixer-uppery-ness. If you need to sanitize data coming from an IO object, such as a socket or a file, check out the next section on Documents.)

Documents

Sometimes you need to sanitize an entire HTML document. (Well, maybe not you, but other people, certainly.)

safe_html_document = Dryopteris.sanitize_document(dangerous_html_document)

The returned string will contain exactly one (1) well-formed HTML document, with all broken HTML fixed and all harmful tags and attributes removed.

Coolness: dangerous_html_document can be a String OR an IO object (a file, or a socket, or ...), which makes it particularly easy to sanitize large numbers of documents.
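Accepting either a String or an IO is typically done with duck typing: drain anything that responds to `read`, and treat anything that responds to `to_str` as a string. A minimal sketch of that pattern follows; `read_markup` is a hypothetical helper for illustration, not part of Dryopteris, and this is an assumption about the general technique rather than a description of how `sanitize_document` handles its argument.

```ruby
require "stringio"

# Hypothetical helper (not part of Dryopteris): normalize a String-or-IO argument.
def read_markup(source)
  if source.respond_to?(:read)
    source.read   # File, Socket, StringIO, ... read until EOF
  elsif source.respond_to?(:to_str)
    source.to_str # String, or anything that converts implicitly
  else
    raise ArgumentError, "expected a String or an IO, got #{source.class}"
  end
end

puts read_markup("<p>hi</p>")               # prints <p>hi</p>
puts read_markup(StringIO.new("<p>hi</p>")) # prints <p>hi</p>
```

Note that calling `read` with no arguments on a socket blocks until the peer closes the connection, so the whole document ends up in memory either way.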

Standing on the Shoulders of Giants

Dryopteris uses Nokogiri and libxml2, so it's fast.

Dryopteris also takes its tag and attribute whitelists and its CSS sanitizer directly from HTML5 (specifically, the html5lib sanitizer).
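The CSS sanitizer exists because `style` attributes can smuggle in script or remote resources. The sketch below is a deliberately simplified illustration of the whitelist approach, not the html5lib code: `ALLOWED_CSS_PROPERTIES`, `DANGEROUS_CSS_VALUE` and `sanitize_css` are hypothetical names, declarations are split naively on `;` and `:`, and any value containing `url(`, `expression(`, a backslash or an angle bracket is rejected outright.

```ruby
# Hypothetical, simplified filter for the value of a style="" attribute.
# Not html5lib's sanitize_css: the property list is tiny and parsing is naive.
ALLOWED_CSS_PROPERTIES = %w[color background-color font-weight text-align].freeze

# url( and expression( can load resources or (in old IE) run script;
# backslashes (CSS escapes) and angle brackets are rejected to stay conservative.
DANGEROUS_CSS_VALUE = /url\s*\(|expression\s*\(|[\\<>]/i

def sanitize_css(style)
  style.split(";").filter_map do |declaration|
    property, value = declaration.split(":", 2).map(&:strip)
    next if property.nil? || value.nil? || value.empty?

    property = property.downcase
    next unless ALLOWED_CSS_PROPERTIES.include?(property) # unknown property: drop
    next if value.match?(DANGEROUS_CSS_VALUE)             # risky value: drop

    "#{property}: #{value};"
  end.join(" ")
end

puts sanitize_css("color: red; background-image: url(javascript:alert(1)); font-weight: bold")
# prints: color: red; font-weight: bold;
```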

Authors

Quotes About Dryopteris

"dryopteris shields you from xss attacks using nokogiri and NY attitude" - hasmanyjosh

"I just wanted to say thank you for your dryopteris plugin. It is by far the best sanitization I've found." - catalystmediastudios