Python HTML purifier

About

Removes from HTML all tags and attributes that are not on the whitelist. The content of removed tags is kept. The whitelist has the following form:

{'enabled tag name' : ['list of enabled tag\'s attributes']}

You can use the symbol * to allow all tags and/or attributes.

Note that script and style tags are removed together with their content.

This module is based on the HTMLParser class from the Python standard library. It has no other dependencies, which can sometimes be a plus.
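To illustrate the idea, here is a minimal sketch of how a whitelist filter can be built on the standard library parser. It is not the source code of this package: the names WhitelistFilter and purify are hypothetical, it targets Python 3 (html.parser), and treating '*' as a dictionary key that matches any tag is an assumption about the whitelist format.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistFilter(HTMLParser):
    """Hypothetical whitelist filter; NOT the html-purifier implementation."""

    # Tags that are dropped together with everything inside them.
    DROP_WITH_CONTENT = {"script", "style"}

    def __init__(self, whitelist):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.whitelist = whitelist
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def _allowed_attrs(self, tag):
        # Returns the allowed attribute list, or None if the tag is not allowed.
        # Assumption: a '*' key in the whitelist matches any tag.
        if tag in self.whitelist:
            return self.whitelist[tag]
        return self.whitelist.get("*")

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        allowed = self._allowed_attrs(tag)
        if self.skip_depth or allowed is None:
            return  # tag is dropped, its text content is still emitted
        kept = [(k, v) for k, v in attrs if "*" in allowed or k in allowed]
        rendered = "".join(
            ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and self._allowed_attrs(tag) is not None:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append("&#%s;" % name)


def purify(html, whitelist):
    """Run the sketch filter over an HTML string and return the result."""
    parser = WhitelistFilter(whitelist)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

For example, purify('<p>a<script>alert(1)</script>b</p>', {'p': []}) keeps the p tag without attributes and drops the script element with its content.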

Some more information is on my blog

Package on PyPi

Installation

$ pip install html-purifier

Basic Usage

>>> from purifier.purifier import HTMLPurifier
>>> purifier = HTMLPurifier({
    'div': ['*'],  # all attributes are allowed for div
    'span': ['attr-2'],  # only the "attr-2" attribute is allowed for span
    # all other tags and attributes are removed, but their content is kept
})
>>> print(purifier.feed('<div class="e1" id="e1">Some <b>HTML</b> for <span attr-1="1" attr-2="2">purifying</span></div>'))
<div class="e1" id="e1">Some HTML for <span attr-2="2">purifying</span></div>

Django Usage

The purifier is typically used in models and forms. The package provides purifier.models.PurifyedCharField and purifier.models.PurifyedTextField for the Django ORM, and purifier.forms.PurifyedCharField for Django forms.
