Merge pull request #408 from disko/feature/sanitize_html
Sanitizers
disko committed Apr 7, 2015
2 parents 01d37be + 8a38d52 commit f26e0ee
Showing 15 changed files with 441 additions and 43 deletions.
10 changes: 8 additions & 2 deletions CHANGES.txt
@@ -1,6 +1,14 @@
Change History
==============

1.1.0-alpha.2 - unreleased
--------------------------

- Add ``target`` option to ``kotti.util.Link``. See #405.

- Add sanitizers. See :ref:`sanitizers` and :mod:`kotti.sanitizers` for
details. This fixes #296.

1.1.0-alpha.1 - 2015-03-19
--------------------------

@@ -54,8 +62,6 @@ Change History
- Change ``height`` property on ``body``'s widget (``RichTextField``) for
improved usability. See #403.

- Add ``target`` option to ``kotti.util.Link``. See #405.

1.0.0 - 2015-01-20
------------------

1 change: 1 addition & 0 deletions docs/api/index.rst
@@ -18,6 +18,7 @@ API Documentation
kotti.request
kotti.resources
kotti.filedepot
kotti.sanitizers
kotti.security
kotti.sqla
kotti.testing
8 changes: 8 additions & 0 deletions docs/api/kotti.sanitizers.rst
@@ -0,0 +1,8 @@
.. _api-kotti.sanitizers:

kotti.sanitizers
----------------

.. automodule:: kotti.sanitizers
    :members:
    :member-order: bysource
1 change: 1 addition & 0 deletions docs/conf.py
@@ -120,6 +120,7 @@

# -- Options for Intersphinx ---------------------------------------------------
intersphinx_mapping = {
'bleach': ('http://bleach.readthedocs.org/en/latest/', None),
'colander': ('http://colander.readthedocs.org/en/latest/', None),
'deform': ('http://deform.readthedocs.org/en/latest/', None),
'fanstatic': ('http://www.fanstatic.org/en/latest/', None),
1 change: 1 addition & 0 deletions docs/developing/advanced/index.rst
@@ -15,3 +15,4 @@ Advanced Topics
blobs
static-resource-management
understanding-kotti-startup
sanitizers
73 changes: 73 additions & 0 deletions docs/developing/advanced/sanitizers.rst
@@ -0,0 +1,73 @@
.. _sanitizers:

Sanitizers
==========

Kotti provides a mechanism to *sanitize* arbitrary strings.

You can configure *available* sanitizers via ``kotti.sanitizers``.
This setting takes a list of strings, with each specifying a ``name:callable`` pair.
``name`` is the name under which this sanitizer is registered.
``callable`` is a dotted path to a function taking an unsanitized string and returning a sanitized version of it.

The default configuration is::

  kotti.sanitizers =
      xss_protection:kotti.sanitizers.xss_protection
      minimal_html:kotti.sanitizers.minimal_html
      no_html:kotti.sanitizers.no_html

For a thorough explanation of the included sanitizers, see :mod:`kotti.sanitizers`.
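
The ``name:callable`` format described above can be sketched in plain Python.
This is only an illustration of the idea, not Kotti's implementation: ``resolve_dotted`` and ``parse_sanitizers`` are hypothetical helpers, ``importlib`` stands in for Pyramid's ``DottedNameResolver``, and the example setting points at standard library functions instead of the real sanitizers so that it runs without Kotti installed.

```python
import importlib


def resolve_dotted(dotted_name):
    """Import ``package.module.attr`` and return ``attr`` (hypothetical helper)."""
    module_name, _, attr = dotted_name.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def parse_sanitizers(setting):
    """Turn a whitespace-separated list of ``name:callable`` pairs into a dict."""
    registry = {}
    for pair in setting.split():
        name, dotted_name = pair.split(':', 1)
        registry[name.strip()] = resolve_dotted(dotted_name)
    return registry


# Stand-in setting: stdlib functions take the place of Kotti's sanitizers.
registry = parse_sanitizers("""
    escape:html.escape
    unescape:html.unescape
""")

print(registry['escape']('<b>bold</b>'))  # &lt;b&gt;bold&lt;/b&gt;
```

Once the setting is resolved into a dictionary like this, looking up a sanitizer by name is a plain dictionary access.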

Explicit sanitization
---------------------

You can explicitly use any configured sanitizer like this::

  from kotti.sanitizers import sanitize

  sanitized = sanitize(unsanitized, 'xss_protection')

The sanitize function is also available as a method of the :class:`kotti.views.util.TemplateAPI`.
This is just a convenience wrapper to ease usage in templates::

  ${api.sanitize(context.foo, 'minimal_html')}

Sanitize on write (implicit sanitization)
-----------------------------------------

The second setting related to sanitization is ``kotti.sanitize_on_write``.
It defines *what* is filtered *how* when values are assigned to object attributes.

This setting takes a list of ``dotted_path:sanitizer_name(s)`` pairs.
``dotted_path`` is a dotted path to a resource class attribute that will be sanitized implicitly with the respective sanitizer(s) upon write access.
``sanitizer_name(s)`` is a comma-separated list of available sanitizer names as configured above.

Kotti will set up :ref:`listeners <events>` for the :class:`kotti.events.ObjectInsert` and :class:`kotti.events.ObjectUpdate` events for the given classes and attach a function that filters the respective attributes with the specified sanitizer(s).

This means that *any* write access to the configured attributes through your application (including correctly set up command line scripts) will be sanitized *implicitly*.
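
The listener mechanism can be sketched without Kotti as follows.
Everything in this block is a simplified stand-in: the event classes, the ``listeners`` registry, ``notify`` and the toy sanitizer are assumptions made for illustration, and unlike a real event system the sketch only matches the exact class of the object.

```python
from collections import defaultdict


class ObjectEvent(object):
    """Hypothetical base event carrying the affected object."""
    def __init__(self, obj):
        self.object = obj


class ObjectInsert(ObjectEvent):
    pass


class ObjectUpdate(ObjectEvent):
    pass


# Maps (event type, object class) to a list of handlers.
listeners = defaultdict(list)


def strip_angle_brackets(value):
    """Toy sanitizer used only for this sketch."""
    return value.replace('<', '').replace('>', '')


def make_handler(attribute, sanitizer):
    """Build a handler that rewrites one attribute of the event's object."""
    def handler(event):
        value = getattr(event.object, attribute)
        setattr(event.object, attribute, sanitizer(value))
    return handler


def notify(event):
    """Call every handler registered for this event type and object class."""
    for handler in listeners[(type(event), type(event.object))]:
        handler(event)


class Document(object):
    def __init__(self, title):
        self.title = title


# Register the same kind of handler for both insert and update events.
for event_type in (ObjectInsert, ObjectUpdate):
    listeners[(event_type, Document)].append(
        make_handler('title', strip_angle_brackets))

doc = Document('<Hello>')
notify(ObjectInsert(doc))
print(doc.title)  # Hello
```

The handler is built by a factory function so that each registration captures its own attribute name and sanitizer.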

The default configuration is::

  kotti.sanitize_on_write =
      kotti.resources.Document.body:xss_protection
      kotti.resources.Content.title:no_html
      kotti.resources.Content.description:no_html

You can also use multiple sanitizers::

  kotti.sanitize_on_write =
      kotti.resources.Document.body:xss_protection,some_other_sanitizer
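
A rule with several sanitizers is applied left to right, each sanitizer receiving the output of the previous one.
The following sketch shows how such a rule could be split and applied. The helper names, the ``myapp.resources.Page`` path and both toy sanitizers are hypothetical, and the regular expression is far too naive for real HTML sanitization.

```python
import re


def strip_tags(value):
    """Toy sanitizer: drop anything that looks like a tag (not XSS safe)."""
    return re.sub(r'<[^>]*>', '', value)


def collapse_whitespace(value):
    """Toy sanitizer: squeeze runs of whitespace into single spaces."""
    return ' '.join(value.split())


SANITIZERS = {
    'strip_tags': strip_tags,
    'collapse_whitespace': collapse_whitespace,
}


def parse_rule(rule):
    """Split ``dotted.path.Class.attr:name1,name2`` into its three parts."""
    dotted_path, names = rule.split(':', 1)
    class_path, attribute = dotted_path.rsplit('.', 1)
    return class_path, attribute, names.split(',')


def apply_chain(value, names, registry=SANITIZERS):
    """Run the value through each named sanitizer in order."""
    for name in names:
        value = registry[name](value)
    return value


class_path, attribute, names = parse_rule(
    'myapp.resources.Page.body:strip_tags,collapse_whitespace')

print(class_path, attribute, names)
print(apply_chain('<p>I   love\n cats.</p>', names))  # I love cats.
```

Order matters when sanitizers interact, so the most restrictive one is usually best placed where its output cannot be undone by a later step.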

Implementing a custom sanitizer
-------------------------------

A sanitizer is just a function that takes and returns a string.
It can be as simple as::

  def no_dogs_allowed(html):
      return html.replace('dogs', 'cats')

  no_dogs_allowed('<p>I love dogs.</p>')
  ... '<p>I love cats.</p>'

You can also look at :mod:`kotti.sanitizers` for examples.
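
Because sanitize-on-write handlers are attached to both insert and update events, a stored value will presumably pass through the same sanitizer more than once over its lifetime.
A custom sanitizer should therefore return its own output unchanged when applied again.
The sketch below is one way to check this property; ``is_idempotent`` and ``escape_everything`` are hypothetical names used only for illustration.

```python
import html


def no_dogs_allowed(value):
    """The sanitizer from the example above."""
    return value.replace('dogs', 'cats')


def escape_everything(value):
    """Counter-example: escaping again changes already escaped text."""
    return html.escape(value)


def is_idempotent(sanitizer, sample):
    """Return True if sanitizing twice gives the same result as once."""
    once = sanitizer(sample)
    return sanitizer(once) == once


sample = '<p>I love dogs.</p>'
print(is_idempotent(no_dogs_allowed, sample))    # True
print(is_idempotent(escape_everything, sample))  # False
```

A sanitizer like ``escape_everything`` would turn ``&lt;`` into ``&amp;lt;`` on the second pass, so it is a poor fit for sanitize on write.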
59 changes: 25 additions & 34 deletions docs/developing/basic/configuration.rst
@@ -77,6 +77,10 @@ kotti.max_file_size Max size for file uploads, default: ``10`` (MB)

kotti.depot.*.* Configure the blob storage. More details below

kotti.sanitizers Configure available :ref:`sanitizers`.
kotti.sanitize_on_write Configure :ref:`sanitizers` to be used on write
access to resource objects.

pyramid.default_locale_name Set the user interface language, default ``en``
============================ ==================================================

@@ -312,42 +316,29 @@ The default configuration here is:
Blob storage configuration
--------------------------

By default, Kotti will store blob data (files uploaded in File and Image
instances) in the database. Internally, Kotti integrates with :app:`filedepot`,
so it is possible to use any :app:``filedepot`` compatible storage, including those
provided by :app:``filedepot`` itself:

- :class:``depot.io.local.LocalFileStorage``
- :class:``depot.io.awss3.S3Storage``
- :class:``depot.io.gridfs.GridFSStorage``

The default storage for :app:``Kotti`` is
:class:``~kotti.filedepot.DBFileStorage``. The benefit of storing files in
``DBFileStorage`` is having *all* content in a single place (the DB) which
makes backups, exporting and importing of your site's data easy, as long as you
don't have too many or too large files. The downsides of this approach appear
when your database server resides on a different host (network performance
becomes a greater issue) or your DB dumps become too large to be handled
efficiently.

To configure a depot, several ``kotti.depot.*.*`` lines need to be added. The
number in the first position is used to group backend configuration and to
order the file storages in the configuration of :app:``filedepot``. The depot
configured with number 0 will be the default depot, where all new blob data
will be saved. There are 2 options that are required for every storage
configuration: ``name`` and ``backend``. The ``name`` is a unique string that
will be used to identify the path of saved files (it is recorded with each blob
info), so once configured for a particular storage, it should never change. The
``backend`` should point to a dotted path for the storage class. Then, any
number of keyword arguments can be added, and they will be passed to the
backend class on initialization.
By default, Kotti will store blob data (files uploaded in File and Image instances) in the database.
Internally, Kotti integrates with :app:`filedepot`, so it is possible to use any :app:`filedepot` compatible storage, including those provided by :app:`filedepot` itself:

- :class:`depot.io.local.LocalFileStorage`
- :class:`depot.io.awss3.S3Storage`
- :class:`depot.io.gridfs.GridFSStorage`

The default storage for :app:`Kotti` is :class:`~kotti.filedepot.DBFileStorage`.
The benefit of storing files in ``DBFileStorage`` is having *all* content in a single place (the DB) which makes backups, exporting and importing of your site's data easy, as long as you don't have too many or too large files.
The downsides of this approach appear when your database server resides on a different host (network performance becomes a greater issue) or your DB dumps become too large to be handled efficiently.

To configure a depot, several ``kotti.depot.*.*`` lines need to be added.
The number in the first position is used to group backend configuration and to order the file storages in the configuration of :app:`filedepot`.
The depot configured with number 0 will be the default depot, where all new blob data will be saved.
There are 2 options that are required for every storage configuration: ``name`` and ``backend``.
The ``name`` is a unique string that will be used to identify the path of saved files (it is recorded with each blob info), so once configured for a particular storage, it should never change.
The ``backend`` should point to a dotted path for the storage class.
Then, any number of keyword arguments can be added, and they will be passed to the backend class on initialization.
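
The ``kotti.depot.<number>.<option>`` naming scheme can be pictured with a short sketch that groups flat settings by their number.
This is not how Kotti or :app:`filedepot` read the settings; ``group_depot_settings`` is a hypothetical helper, and the storage names ``localfs`` and ``dbfiles`` as well as the ``storage_path`` keyword are assumptions chosen for the example.

```python
from collections import defaultdict


def group_depot_settings(settings, prefix='kotti.depot.'):
    """Group ``kotti.depot.N.option`` keys into one dict per storage.

    Returns the storages ordered by N, so index 0 is the default depot.
    """
    grouped = defaultdict(dict)
    for key, value in settings.items():
        if not key.startswith(prefix):
            continue
        index, option = key[len(prefix):].split('.', 1)
        grouped[int(index)][option] = value
    return [grouped[i] for i in sorted(grouped)]


settings = {
    'kotti.depot.0.name': 'localfs',
    'kotti.depot.0.backend': 'depot.io.local.LocalFileStorage',
    'kotti.depot.0.storage_path': '/var/local/files',
    'kotti.depot.1.name': 'dbfiles',
    'kotti.depot.1.backend': 'kotti.filedepot.DBFileStorage',
    'kotti.site_title': 'ignored by the helper',
}

depots = group_depot_settings(settings)
for depot in depots:
    # Everything except name and backend would be passed as keyword arguments.
    kwargs = {k: v for k, v in depot.items() if k not in ('name', 'backend')}
    print(depot['name'], depot['backend'], kwargs)
```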

Example of a possible configuration that stores blob data on the disk, in
``/var/local/files`` using the :app:``filedepot``
:class:``depot.io.local.LocalFileStorage`` provided backend. Kotti's default
backend, ``DBFileStorage`` has been moved to position **1** and all data stored
there will continue to be available. See :ref:`blobs` to see how to migrate
blob data between storages.
``/var/local/files`` using the :app:`filedepot` :class:`depot.io.local.LocalFileStorage` provided backend.
Kotti's default backend, ``DBFileStorage`` has been moved to position **1** and all data stored there will continue to be available.
See :ref:`blobs` to see how to migrate blob data between storages.

.. code-block:: ini

11 changes: 11 additions & 0 deletions kotti/__init__.py
@@ -61,6 +61,7 @@ def none_factory(**kwargs): # pragma: no cover
'kotti',
'kotti.filedepot',
'kotti.events',
'kotti.sanitizers',
'kotti.views',
'kotti.views.cache',
'kotti.views.view',
@@ -112,6 +113,16 @@ def none_factory(**kwargs): # pragma: no cover
'kotti.register.group': '',
'kotti.register.role': '',
'pyramid_deform.template_search_path': 'kotti:templates/deform',
'kotti.sanitizers': ' '.join([
    'xss_protection:kotti.sanitizers.xss_protection',
    'minimal_html:kotti.sanitizers.minimal_html',
    'no_html:kotti.sanitizers.no_html',
]),
'kotti.sanitize_on_write': ' '.join([
    'kotti.resources.Document.body:xss_protection',
    'kotti.resources.Content.title:no_html',
    'kotti.resources.Content.description:no_html',
]),
}

conf_dotted = set([
158 changes: 158 additions & 0 deletions kotti/sanitizers.py
@@ -0,0 +1,158 @@
# -*- coding: utf-8 -*-

"""
For a high level introduction and available configuration options
see :ref:`sanitizers`.
"""

from bleach import clean
from bleach_whitelist import all_styles
from bleach_whitelist import generally_xss_safe
from bleach_whitelist import markdown_attrs
from bleach_whitelist import markdown_tags
from bleach_whitelist import print_attrs
from bleach_whitelist import print_tags
from pyramid.util import DottedNameResolver

from kotti import get_settings
from kotti.events import objectevent_listeners
from kotti.events import ObjectInsert
from kotti.events import ObjectUpdate


def sanitize(html, sanitizer):
    """ Sanitize HTML
    :param html: HTML to be sanitized
    :type html: basestring
    :param sanitizer: name of the sanitizer to use
    :type sanitizer: str
    :result: sanitized HTML
    :rtype: unicode
    """

    sanitized = get_settings()['kotti.sanitizers'][sanitizer](html)

    return sanitized


def xss_protection(html):
    """ Sanitizer that removes tags that are not considered XSS safe. See
    ``bleach_whitelist.generally_xss_unsafe`` for a complete list of tags that
    are removed. Attributes and styles are left untouched.
    :param html: HTML to be sanitized
    :type html: basestring
    :result: sanitized HTML
    :rtype: unicode
    """

    sanitized = clean(
        html,
        tags=generally_xss_safe,
        attributes=lambda self, key, value: True,
        styles=all_styles,
        strip=True,
        strip_comments=True)

    return sanitized


def minimal_html(html):
    """ Sanitizer that only leaves a basic set of tags and attributes. See
    ``bleach_whitelist.markdown_tags``, ``bleach_whitelist.print_tags``,
    ``bleach_whitelist.markdown_attrs``, ``bleach_whitelist.print_attrs`` for a
    complete list of tags and attributes that are allowed. All styles are
    completely removed.
    :param html: HTML to be sanitized
    :type html: basestring
    :result: sanitized HTML
    :rtype: unicode
    """

    attributes = dict(zip(
        markdown_attrs.keys() + print_attrs.keys(),
        markdown_attrs.values() + print_attrs.values()))

    sanitized = clean(
        html,
        tags=markdown_tags + print_tags,
        attributes=attributes,
        styles=[],
        strip=True,
        strip_comments=True)

    return sanitized


def no_html(html):
    """ Sanitizer that removes **all** tags.
    :param html: HTML to be sanitized
    :type html: basestring
    :result: plain text
    :rtype: unicode
    """

    sanitized = clean(
        html,
        tags=[],
        attributes={},
        styles=[],
        strip=True,
        strip_comments=True)

    return sanitized


def _setup_sanitizers(settings):

    # step 1: resolve sanitizer functions and make ``kotti.sanitizers`` a
    # dictionary containing resolved functions

    if not isinstance(settings['kotti.sanitizers'], basestring):
        return

    sanitizers = {}

    for s in settings['kotti.sanitizers'].split():
        name, dottedname = s.split(':')
        sanitizers[name.strip()] = DottedNameResolver(None).resolve(dottedname)

    settings['kotti.sanitizers'] = sanitizers


def _setup_listeners(settings):

    # step 2: setup listeners

    for s in settings['kotti.sanitize_on_write'].split():
        dotted, sanitizers = s.split(':')

        classname, attributename = dotted.rsplit('.', 1)
        _class = DottedNameResolver(None).resolve(classname)

        def _create_handler(attributename, sanitizers):
            def handler(event):
                value = getattr(event.object, attributename)
                for sanitizer_name in sanitizers.split(','):
                    value = settings['kotti.sanitizers'][sanitizer_name](value)
                setattr(event.object, attributename, value)
            return handler

        objectevent_listeners[(ObjectInsert, _class)].append(
            _create_handler(attributename, sanitizers))
        objectevent_listeners[(ObjectUpdate, _class)].append(
            _create_handler(attributename, sanitizers))


def includeme(config):

    _setup_sanitizers(config.registry.settings)
    _setup_listeners(config.registry.settings)
2 changes: 2 additions & 0 deletions kotti/scaffolds/package/requirements.txt
@@ -14,6 +14,8 @@ Unidecode==0.04.17
WebOb==1.4
alembic==0.6.7
argparse==1.3.0
bleach==1.4.1
bleach-whitelist==0.0.7
colander==1.0
deform==2.0a2
docopt==0.6.2