Bleach is an HTML sanitizing library that escapes or strips markup and
attributes based on a whitelist. Bleach can also linkify text safely, applying
filters that Django's urlize filter cannot, and optionally setting
attributes, even on links already in the text.
The version on GitHub is the most up-to-date and contains the latest bug fixes.
The simplest way to use Bleach is:
    >>> import bleach
    >>> bleach.clean('an <script>evil()</script> example')
    u'an &lt;script&gt;evil()&lt;/script&gt; example'
    >>> bleach.linkify('an http://example.com url')
    u'an <a href="http://example.com" rel="nofollow">http://example.com</a> url'
NB: Bleach always returns a unicode object, whether you give it a bytestring
or a unicode object. Bleach does not attempt to detect incoming character
encodings and will assume UTF-8. If you are using a different character
encoding, you should convert from a bytestring to unicode before passing the
text to Bleach.
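
The conversion described above might look like the following. This is a minimal sketch, assuming the input is known to be Latin-1; `to_text` is a hypothetical helper and not part of Bleach.

```python
# Hypothetical helper (not part of Bleach): decode a bytestring with a
# known encoding so that Bleach never has to assume UTF-8.
def to_text(raw, encoding):
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already decoded text, returned unchanged


raw = b'caf\xe9 <b>menu</b>'    # Latin-1 bytes; not valid UTF-8
text = to_text(raw, 'latin-1')  # decoded text, safe to hand to Bleach
# `text` can now be passed to bleach.clean() or bleach.linkify().
```

Decoding at the boundary, where the encoding is still known, avoids the UTF-8 assumption being applied to bytes that are not UTF-8.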
clean() and linkify() can take several optional keyword arguments
to customize their behavior.
||A whitelist of HTML tags. Must be a list. Defaults to
||A whitelist of HTML attributes. Either a list, in
which case all attributes are allowed on all elements,
or a dict, with tag names as keys and lists of allowed
attributes as values ('*' is a wildcard key to allow
an attribute on any tag). A callable that accepts the
attribute name and value and returns True or False may be
passed instead of a list.
||A whitelist of allowed CSS properties within a
||Strip disallowed HTML instead of escaping it. A
boolean. Defaults to
||Strip HTML comments. A boolean. Defaults to
||A callable through which the
||A callable through which the text of links (only
those created by
||Do not create new links inside
||Linkify email addresses with
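
The three forms of the attribute whitelist described in the table above (list, dict with a '*' wildcard, callable) can be illustrated with a small lookup function. This is only a sketch of the rules as worded here, not Bleach's implementation: `attribute_allowed`, `only_http_links`, and the sample whitelists are hypothetical names, and accepting a callable both as the whole whitelist and in place of a per-tag list is an assumption.

```python
def attribute_allowed(attributes, tag, name, value):
    """Return True if attribute `name` with `value` is permitted on `tag`."""
    # Assumption: a callable may stand in for the whole whitelist.
    if callable(attributes):
        return bool(attributes(name, value))
    if isinstance(attributes, dict):
        # Check the tag's own entry first, then the '*' wildcard entry.
        for key in (tag, '*'):
            rule = attributes.get(key)
            if rule is None:
                continue
            if callable(rule):
                if rule(name, value):
                    return True
            elif name in rule:
                return True
        return False
    # A plain list allows its attributes on every element.
    return name in attributes


def only_http_links(name, value):
    # Hypothetical callable: allow href only for http(s) URLs.
    return name == 'href' and value.startswith(('http://', 'https://'))


as_list = ['href', 'title']
as_dict = {'a': ['href'], '*': ['class']}
```

With these sample values, `as_list` permits `title` on any tag, `as_dict` permits `href` only on `a` but `class` everywhere, and `only_http_links` rejects a `javascript:` URL.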