Merge pull request #408 from disko/feature/sanitize_html
Sanitizers
Showing 15 changed files with 441 additions and 43 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
.. _api-kotti.sanitizers: | ||
|
||
kotti.sanitizers | ||
---------------- | ||
|
||
.. automodule:: kotti.sanitizers | ||
  :members: | ||
  :member-order: bysource |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,3 +15,4 @@ Advanced Topics | |
blobs | ||
static-resource-management | ||
understanding-kotti-startup | ||
sanitizers |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
.. _sanitizers: | ||
|
||
Sanitizers | ||
========== | ||
|
||
Kotti provides a mechanism to *sanitize* arbitrary strings. | ||
|
||
You can configure *available* sanitizers via ``kotti.sanitizers``. | ||
This setting takes a list of strings, with each specifying a ``name:callable`` pair. | ||
``name`` is the name under which this sanitizer is registered. | ||
``callable`` is a dotted path to a function taking an unsanitized string and returning a sanitized version of it. | ||
|
||
The default configuration is:: | ||
|
||
  kotti.sanitizers = | ||
      xss_protection:kotti.sanitizers.xss_protection | ||
      minimal_html:kotti.sanitizers.minimal_html | ||
      no_html:kotti.sanitizers.no_html | ||
|
||
For a thorough explanation of the included sanitizers, see :mod:`kotti.sanitizers`. | ||
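As a rough illustration of how a list of ``name:callable`` pairs can be turned into a registry of functions, here is a minimal sketch. It is not Kotti's own code: ``resolve_dotted`` and ``parse_sanitizers`` are hypothetical helper names, the resolver uses the standard library's ``importlib`` instead of Pyramid's resolver, and the example setting points at ``html.escape`` and ``html.unescape`` only so that it runs without Kotti installed.

```python
from importlib import import_module


def resolve_dotted(dotted_name):
    """Import ``package.module.attr`` and return ``attr``.

    Hypothetical stdlib stand-in for a dotted-name resolver.
    """
    module_name, attr = dotted_name.rsplit(".", 1)
    return getattr(import_module(module_name), attr)


def parse_sanitizers(setting):
    """Turn whitespace-separated ``name:callable`` pairs into a dict."""
    sanitizers = {}
    for pair in setting.split():
        name, dotted_name = pair.split(":", 1)
        sanitizers[name.strip()] = resolve_dotted(dotted_name)
    return sanitizers


# Assumed example setting; the names and targets are illustrative only.
setting = """
    escape:html.escape
    unescape:html.unescape
"""

registry = parse_sanitizers(setting)
print(registry["escape"]("<b>hi</b>"))  # &lt;b&gt;hi&lt;/b&gt;
```

Because the setting is split on whitespace, each pair has to be written without spaces around the colon.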
|
||
Explicit sanitization | ||
--------------------- | ||
|
||
You can explicitly use any configured sanitizer like this:: | ||
|
||
  from kotti.sanitizers import sanitize | ||
|
||
  sanitized = sanitize(unsanitized, 'xss_protection') | ||
|
||
The sanitize function is also available as a method of the :class:`kotti.views.util.TemplateAPI`. | ||
This is just a convenience wrapper to ease usage in templates:: | ||
|
||
  ${api.sanitize(context.foo, 'minimal_html')} | ||
|
||
Sanitize on write (implicit sanitization) | ||
----------------------------------------- | ||
|
||
The second setting related to sanitization is ``kotti.sanitize_on_write``. | ||
It defines *what* is filtered *how* when values are assigned to object attributes. | ||
|
||
This setting takes a list of ``dotted_path:sanitizer_name(s)`` pairs. | ||
``dotted_path`` is a dotted path to a resource class attribute that will be sanitized implicitly with the respective sanitizer(s) upon write access. | ||
``sanitizer_name(s)`` is a comma-separated list of available sanitizer names as configured above. | ||
|
||
Kotti will set up :ref:`listeners <events>` for the :class:`kotti.events.ObjectInsert` and :class:`kotti.events.ObjectUpdate` events for the given classes and attach a function that filters the respective attributes with the specified sanitizer(s). | ||
|
||
This means that *any* write access to configured attributes through your application (including from correctly set up command line scripts) will be sanitized *implicitly*. | ||
|
||
The default configuration is:: | ||
|
||
  kotti.sanitize_on_write = | ||
      kotti.resources.Document.body:xss_protection | ||
      kotti.resources.Content.title:no_html | ||
|
||
You can also use multiple sanitizers:: | ||
|
||
  kotti.sanitize_on_write = | ||
      kotti.resources.Document.body:xss_protection,some_other_sanitizer | ||
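To make the order of application concrete, the following sketch mimics a sanitize-on-write handler outside of Kotti. Everything in it is an assumption made for illustration: ``FakeEvent`` and ``FakeDocument`` stand in for Kotti's event and resource objects, ``SANITIZERS`` stands in for the resolved ``kotti.sanitizers`` setting, and the two toy sanitizers are not safe HTML filters.

```python
import re


def strip_tags(text):
    # Toy stand-in only; a regex is NOT a safe HTML sanitizer.
    return re.sub(r"<[^>]*>", "", text)


def collapse_whitespace(text):
    # Replace every run of whitespace with a single space.
    return " ".join(text.split())


# Hypothetical registry, playing the role of the resolved setting.
SANITIZERS = {
    "strip_tags": strip_tags,
    "collapse_whitespace": collapse_whitespace,
}


class FakeDocument:
    def __init__(self, body):
        self.body = body


class FakeEvent:
    def __init__(self, obj):
        self.object = obj


def create_handler(attribute_name, sanitizer_names):
    """Build a handler that runs the named sanitizers left to right."""
    def handler(event):
        value = getattr(event.object, attribute_name)
        for name in sanitizer_names.split(","):
            value = SANITIZERS[name](value)
        setattr(event.object, attribute_name, value)
    return handler


handler = create_handler("body", "strip_tags,collapse_whitespace")
doc = FakeDocument("<p>Hello   <b>world</b></p>")
handler(FakeEvent(doc))
print(doc.body)  # Hello world
```

Each sanitizer receives the output of the previous one, so the order in the comma-separated list matters.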
|
||
Implementing a custom sanitizer | ||
------------------------------- | ||
|
||
A sanitizer is just a function that takes and returns a string. | ||
It can be as simple as:: | ||
|
||
  def no_dogs_allowed(html): | ||
      return html.replace('dogs', 'cats') | ||
|
||
  no_dogs_allowed('<p>I love dogs.</p>') | ||
  ... '<p>I love cats.</p>' | ||
|
||
You can also look at :mod:`kotti.sanitizers` for examples. |
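Since implicit sanitization is attached to both insert and update events, a custom sanitizer may end up running again on text it has already processed. A quick way to check whether that is harmless is to test that applying the sanitizer twice gives the same result as applying it once. The sketch below is only an illustration; ``is_idempotent`` is a hypothetical helper, and ``html.escape`` is used purely as a counterexample.

```python
import html


def no_dogs_allowed(text):
    return text.replace("dogs", "cats")


def is_idempotent(sanitizer, sample):
    """Return True if a second pass leaves the first pass's output unchanged."""
    once = sanitizer(sample)
    return sanitizer(once) == once


sample = "<p>I love dogs & cats.</p>"

print(is_idempotent(no_dogs_allowed, sample))  # True
print(is_idempotent(html.escape, sample))      # False: '&amp;' becomes '&amp;amp;'
```

A function like ``html.escape`` would escape its own output again on every further pass, which makes it a poor fit for sanitize-on-write without extra care.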
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
# -*- coding: utf-8 -*- | ||
|
||
""" | ||
For a high level introduction and available configuration options | ||
see :ref:`sanitizers`. | ||
""" | ||
|
||
from bleach import clean | ||
from bleach_whitelist import all_styles | ||
from bleach_whitelist import generally_xss_safe | ||
from bleach_whitelist import markdown_attrs | ||
from bleach_whitelist import markdown_tags | ||
from bleach_whitelist import print_attrs | ||
from bleach_whitelist import print_tags | ||
from pyramid.util import DottedNameResolver | ||
|
||
from kotti import get_settings | ||
from kotti.events import objectevent_listeners | ||
from kotti.events import ObjectInsert | ||
from kotti.events import ObjectUpdate | ||
|
||
|
||
def sanitize(html, sanitizer): | ||
    """ Sanitize HTML | ||
    :param html: HTML to be sanitized | ||
    :type html: basestring | ||
    :param sanitizer: name of the sanitizer to use | ||
    :type sanitizer: str | ||
    :result: sanitized HTML | ||
    :rtype: unicode | ||
    """ | ||
|
||
    sanitized = get_settings()['kotti.sanitizers'][sanitizer](html) | ||
|
||
    return sanitized | ||
|
||
|
||
def xss_protection(html): | ||
    """ Sanitizer that removes tags that are not considered XSS safe. See | ||
    ``bleach_whitelist.generally_xss_unsafe`` for a complete list of tags that | ||
    are removed. Attributes and styles are left untouched. | ||
    :param html: HTML to be sanitized | ||
    :type html: basestring | ||
    :result: sanitized HTML | ||
    :rtype: unicode | ||
    """ | ||
|
||
    sanitized = clean( | ||
        html, | ||
        tags=generally_xss_safe, | ||
        attributes=lambda self, key, value: True, | ||
        styles=all_styles, | ||
        strip=True, | ||
        strip_comments=True) | ||
|
||
    return sanitized | ||
|
||
|
||
def minimal_html(html): | ||
    """ Sanitizer that only leaves a basic set of tags and attributes. See | ||
    ``bleach_whitelist.markdown_tags``, ``bleach_whitelist.print_tags``, | ||
    ``bleach_whitelist.markdown_attrs``, ``bleach_whitelist.print_attrs`` for a | ||
    complete list of tags and attributes that are allowed. All styles are | ||
    completely removed. | ||
    :param html: HTML to be sanitized | ||
    :type html: basestring | ||
    :result: sanitized HTML | ||
    :rtype: unicode | ||
    """ | ||
|
||
    attributes = dict(zip( | ||
        markdown_attrs.keys() + print_attrs.keys(), | ||
        markdown_attrs.values() + print_attrs.values())) | ||
|
||
    sanitized = clean( | ||
        html, | ||
        tags=markdown_tags + print_tags, | ||
        attributes=attributes, | ||
        styles=[], | ||
        strip=True, | ||
        strip_comments=True) | ||
|
||
    return sanitized | ||
|
||
|
||
def no_html(html): | ||
    """ Sanitizer that removes **all** tags. | ||
    :param html: HTML to be sanitized | ||
    :type html: basestring | ||
    :result: plain text | ||
    :rtype: unicode | ||
    """ | ||
|
||
    sanitized = clean( | ||
        html, | ||
        tags=[], | ||
        attributes={}, | ||
        styles=[], | ||
        strip=True, | ||
        strip_comments=True) | ||
|
||
    return sanitized | ||
|
||
|
||
def _setup_sanitizers(settings): | ||
|
||
    # step 1: resolve sanitizer functions and make ``kotti.sanitizers`` a | ||
    # dictionary containing resolved functions | ||
|
||
    if not isinstance(settings['kotti.sanitizers'], basestring): | ||
        return | ||
|
||
    sanitizers = {} | ||
|
||
    for s in settings['kotti.sanitizers'].split(): | ||
        name, dottedname = s.split(':') | ||
        sanitizers[name.strip()] = DottedNameResolver(None).resolve(dottedname) | ||
|
||
    settings['kotti.sanitizers'] = sanitizers | ||
|
||
|
||
def _setup_listeners(settings): | ||
|
||
    # step 2: setup listeners | ||
|
||
    for s in settings['kotti.sanitize_on_write'].split(): | ||
        dotted, sanitizers = s.split(':') | ||
|
||
        classname, attributename = dotted.rsplit('.', 1) | ||
        _class = DottedNameResolver(None).resolve(classname) | ||
|
||
        def _create_handler(attributename, sanitizers): | ||
            def handler(event): | ||
                value = getattr(event.object, attributename) | ||
                for sanitizer_name in sanitizers.split(','): | ||
                    value = settings['kotti.sanitizers'][sanitizer_name](value) | ||
                setattr(event.object, attributename, value) | ||
            return handler | ||
|
||
        objectevent_listeners[(ObjectInsert, _class)].append( | ||
            _create_handler(attributename, sanitizers)) | ||
        objectevent_listeners[(ObjectUpdate, _class)].append( | ||
            _create_handler(attributename, sanitizers)) | ||
|
||
|
||
def includeme(config): | ||
|
||
    _setup_sanitizers(config.registry.settings) | ||
    _setup_listeners(config.registry.settings) |