security.rst unicode.rst done!
dormouse committed Apr 8, 2015
1 parent 39cbdf5 commit 5682ed6
Showing 2 changed files with 130 additions and 166 deletions.
144 changes: 62 additions & 82 deletions security.rst
@@ -10,17 +10,17 @@ Web applications often face all kinds of security problems, so getting all the problems
-------------------

Cross-site scripting is the concept of injecting arbitrary HTML (and with
it JavaScript) into the context of a website. To defend against this,
developers have to properly escape text so that it cannot include
malicious HTML tags. For more information on that, have a look at the
Wikipedia article on `Cross-Site Scripting
<http://en.wikipedia.org/wiki/Cross-site_scripting>`_.
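
As a minimal illustration of what "properly escaping" means, here is a
sketch using Python 3's standard library :func:`html.escape`. This is an
assumption for illustration only: Flask and Jinja2 do their own escaping
through MarkupSafe, and ``render_comment`` is a hypothetical helper.

```python
import html


def render_comment(user_text):
    """Hypothetical helper: embed untrusted text in an HTML fragment.

    html.escape() replaces &, <, > and (with quote=True, the default)
    both quote characters with HTML entities, so the browser displays
    the text instead of interpreting it as markup.
    """
    return '<p class="comment">%s</p>' % html.escape(user_text)


if __name__ == '__main__':
    # prints: <p class="comment">&lt;script&gt;alert(document.cookie)&lt;/script&gt;</p>
    print(render_comment('<script>alert(document.cookie)</script>'))
```

The injected ``<script>`` tag arrives as plain text, so the browser shows
it rather than running it.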

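In Flask, Jinja2 automatically escapes all values unless explicitly told
otherwise. This should rule out all XSS problems caused in templates, but
there are still other places where you have to be careful:

- generating HTML without the help of Jinja2
- calling :class:`~flask.Markup` on data submitted by users
- sending out HTML from uploaded files (never do that; use the
  `Content-Disposition: attachment` header to prevent that problem)
- sending out text files from uploaded files. Some browsers use
  content-type guessing based on the first few bytes, so users could
  trick a browser into executing HTML disguised as a text file.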
@@ -40,78 +40,61 @@ Web applications often face all kinds of security problems, so getting all the problems

    onmouseover=alert(document.cookie)

When the user then moves the mouse over the link, the cookie is presented
to the user in an alert window. Instead of showing the cookie to the
user, a clever attacker might also execute any other JavaScript code. In
combination with CSS injection, the attacker might even make the element
fill out the entire page, so that the user would just have to have the
mouse anywhere on the page to trigger the attack.
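
The injected snippet above only works when a value ends up inside an HTML
attribute that is not quoted. The following Python 3 sketch (hypothetical
helper names; it uses the standard library's :func:`html.escape` rather
than Jinja2) shows why escaping alone is not enough: the payload contains
no characters that need escaping, so only the surrounding quotes keep it
inside the attribute value.

```python
import html

PAYLOAD = 'x onmouseover=alert(document.cookie)'


def link_unquoted(href):
    # Escaped, but the attribute is not quoted: the space in the value
    # ends the attribute and the rest becomes a new attribute.
    return '<a href=%s>link</a>' % html.escape(href)


def link_quoted(href):
    # Escaped *and* quoted: quote=True turns any " in the value into
    # &quot;, so the value cannot break out of the double quotes.
    return '<a href="%s">link</a>' % html.escape(href, quote=True)


if __name__ == '__main__':
    print(link_unquoted(PAYLOAD))  # onmouseover becomes its own attribute
    print(link_quoted(PAYLOAD))    # the whole payload stays inside href="..."
```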

Cross-Site Request Forgery (CSRF)
---------------------------------

Another big problem is CSRF. This is a very complex topic and I won't
outline it here in detail; I will just mention what it is and how to
prevent it in theory.
If your authentication information is stored in cookies, you have implicit
state management. The state of "being logged in" is controlled by a
cookie, and that cookie is sent with each request to a page.
Unfortunately, that includes requests triggered by third-party sites. If
you don't keep that in mind, some people might be able to use social
engineering to trick your application's users into doing stupid things
without them knowing.
Say you have a specific URL that, when you send `POST` requests to it,
will delete a user's profile (say `http://example.com/user/delete`). If
an attacker now creates a page that sends a post request to that URL with
some JavaScript, they just have to trick some users into loading that
page and their profiles will end up being deleted.
Imagine you were to run Facebook with millions of concurrent users and
someone were to send out links to images of little kittens. When users
went to that page, their profiles would get deleted while they were
looking at images of fluffy cats.
How can you prevent that? Basically, for each request that modifies
content on the server you would have to use a one-time token, store that
in the cookie **and** also transmit it with the form data. After
receiving the data on the server again, you would then have to compare
the two tokens and ensure they are equal.
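
The comparison step can be sketched in Python 3 as follows. This is a
minimal sketch of the cookie-plus-form-field scheme described above,
using only the standard library (:mod:`secrets` and :mod:`hmac`); the
function names are hypothetical, and it leaves out how the cookie is set
and how the hidden form field is rendered.

```python
import hmac
import secrets


def issue_csrf_token():
    """Create a fresh random token (hypothetical helper).

    The caller is expected to store it in a cookie *and* embed it in a
    hidden form field when rendering the form.
    """
    return secrets.token_urlsafe(32)


def tokens_match(cookie_token, form_token):
    """Check that the cookie copy and the form copy of the token are equal.

    A missing value on either side fails the check. The tokens are
    compared as bytes with hmac.compare_digest, whose running time does
    not depend on where the values differ, and which therefore does not
    leak the token through timing.
    """
    if not cookie_token or not form_token:
        return False
    return hmac.compare_digest(cookie_token.encode('utf-8'),
                               form_token.encode('utf-8'))
```

To keep the token one-time, a new one would be issued after each
successful check. A third-party page can make the browser send the
cookie, but it cannot read it, so it cannot put a matching value into
the form data.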

Why does Flask not do that for you? The ideal place for this to happen is
the form validation framework, which does not exist in Flask.

.. _json-security:

JSON Security
-------------

.. admonition:: ECMAScript 5 Changes

   Starting with ECMAScript 5 the behavior of literals changed. Now they
   are not constructed with the constructor of ``Array`` and others, but
   with the builtin constructor of ``Array``, which closes this particular
   attack vector.

JSON itself is a high-level serialization format, so there is barely
anything that could cause security problems, right? You can't declare
recursive structures that could cause problems, and the only thing that
could possibly break are very large responses that can cause some kind of
denial of service at the receiver's side.

However, there is a catch. Due to how browsers work, the CSRF issue
unfortunately comes up with JSON as well. Fortunately, there is also a
weird part of the JavaScript specification that can be used to solve that
problem easily, and Flask partly does that for you by preventing you from
doing dangerous stuff. Unfortunately, that protection is only there for
:func:`~flask.jsonify`, so you are still at risk when using other ways to
generate JSON.

So what is the issue and how can you avoid it? The problem is arrays at
the top level of JSON. Imagine you send the following data out in a JSON
request. Say that's exporting the names and email addresses of all your
friends for a part of the user interface that is written in JavaScript,
which is not very uncommon:

.. sourcecode:: javascript

@@ -120,13 +103,12 @@ Not very uncommon:
         "email": "admin@localhost"}
    ]

Of course it only does that as long as you are logged in, and only for
you. And it does that for all `GET` requests to a certain URL; say the
URL for that request is ``http://example.com/api/get_friends.json``.

So now what happens if a clever hacker embeds this in his website and
uses social engineering to get a victim to visit his site:

.. sourcecode:: html

@@ -149,18 +131,16 @@ and social engineers a victim to visiting his site:
      // now we have all the data in the captured array.
    </script>

If you know a bit of JavaScript internals, you might know that it's
possible to patch constructors and register callbacks for setters. An
attacker can use this (like above) to get all the data you exported in
your JSON file. The browser will totally ignore the ``application/json``
mimetype if ``text/javascript`` is defined as the content type in the
script tag, and will evaluate the response as JavaScript. Because
top-level array elements are allowed (albeit useless) and we hooked in
our own constructor, after that page has loaded, the data from the JSON
response is in the `captured` array.

Because it is a syntax error in JavaScript to have an object literal
(``{...}``) at the top level (a leading brace there is parsed as the
start of a block, not an object), an attacker could not just request an
external URL with the script tag to load up the data. So what Flask does
is to only allow objects as top-level elements when using
:func:`~flask.jsonify`. Make sure to do the same when using an ordinary
JSON generation function.
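
The same rule can be applied to any JSON generation function. The
following Python 3 sketch wraps the standard library's
:func:`json.dumps` so that only objects are allowed at the top level;
``dumps_object_only`` is a hypothetical name, not part of Flask, and the
friend data is illustrative.

```python
import json


def dumps_object_only(data, **kwargs):
    """Serialize data to JSON, refusing non-objects at the top level.

    Only a dict (a JSON object) is accepted, mirroring the restriction
    described above for flask.jsonify. Extra keyword arguments are
    passed through to json.dumps.
    """
    if not isinstance(data, dict):
        raise TypeError('top-level JSON value must be an object, got %s'
                        % type(data).__name__)
    return json.dumps(data, **kwargs)


if __name__ == '__main__':
    friends = [{'username': 'admin', 'email': 'admin@localhost'}]
    # Wrap the list in an object instead of sending it at the top level.
    print(dumps_object_only({'friends': friends}))
```

Wrapping the list in an object costs one extra key on the client side
and turns the response into something a ``<script>`` tag cannot
evaluate.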

152 changes: 68 additions & 84 deletions unicode.rst
@@ -1,107 +1,91 @@
Unicode in Flask
================

Flask, like Jinja2 and Werkzeug, is totally Unicode based when it comes to
text. Not only these libraries, but also the majority of web-related
Python libraries that deal with text work this way. If you don't know
Unicode so far, you should probably read `The Absolute Minimum Every
Software Developer Absolutely, Positively Must Know About Unicode and
Character Sets <http://www.joelonsoftware.com/articles/Unicode.html>`_.
This part of the documentation just tries to cover the very basics so
that you have a pleasant experience with Unicode-related things.

Automatic Conversion
--------------------

Flask makes a few assumptions about your application (which you can
change, of course) that give you basic and painless Unicode support:

- the encoding for text on your website is UTF-8
- internally you will always use Unicode exclusively for text, except
  for literal strings with only ASCII code points
- encoding and decoding happens whenever you are talking over a protocol
  that requires bytes to be transmitted

So what does this mean to you?

HTTP is based on bytes: not only the protocol, but also the system used
to address documents on servers (so-called URIs or URLs). However, HTML,
which is usually transmitted on top of HTTP, supports a large variety of
character sets, and the one in use is declared in an HTTP header. To keep
this from getting too complex, Flask just assumes that if you are sending
Unicode out you want it to be UTF-8 encoded. Flask will do the encoding
and set the appropriate headers for you.
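
What this amounts to can be pictured with a small Python 3 sketch: the
Unicode text is encoded to UTF-8 bytes and the same charset is announced
in the ``Content-Type`` header. The helper ``encode_html_response`` is
hypothetical; Flask and Werkzeug handle this internally.

```python
def encode_html_response(text, charset='utf-8'):
    """Turn Unicode text into a (body, headers) pair (hypothetical helper).

    HTTP transmits bytes, so the text is encoded with charset, and the
    same charset is declared in the Content-Type header so the browser
    knows how to decode the body again.
    """
    body = text.encode(charset)
    headers = {'Content-Type': 'text/html; charset=%s' % charset}
    return body, headers


if __name__ == '__main__':
    body, headers = encode_html_response(u'<p>Hänsel und Gretel</p>')
    print(headers['Content-Type'])
    print(body)
```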

The same is true if you are talking to databases with the help of
SQLAlchemy or a similar ORM system. Some databases have a protocol that
already transmits Unicode, and if they do not, SQLAlchemy or your other
ORM should take care of that.

The Golden Rule
---------------

So the rule of thumb: if you are not dealing with binary data, work with
Unicode. What does working with Unicode in Python 2.x mean?

- as long as you are using ASCII code points only (basically numbers,
  some special characters and Latin letters without umlauts or anything
  fancy) you can use regular string literals (``'Hello World'``).
- if you need anything other than ASCII in a string, you have to mark
  this string as a Unicode string by prefixing it with a lowercase `u`
  (like ``u'Hänsel und Gretel'``).
- if you are using non-ASCII characters in your Python files, you have
  to tell Python which encoding your file uses. Again, I recommend
  UTF-8 for this purpose. To tell the interpreter your encoding, you can
  put ``# -*- coding: utf-8 -*-`` into the first or second line of
  your Python source file.
- Jinja is configured to decode the template files from UTF-8, so make
  sure to tell your editor to save the file as UTF-8 there as well.
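
Put together, a Python source file that follows these rules might look
like the sketch below. The variable names are illustrative only; the
file runs under Python 2 and Python 3.3+ (which accepts the ``u`` prefix
again).

```python
# -*- coding: utf-8 -*-
# The declaration above must be on the first or second line; it tells
# Python 2 how this file is encoded (Python 3 assumes UTF-8 already).

# Plain literal: ASCII only, so no prefix is needed.
GREETING = 'Hello World'

# Non-ASCII text: marked as a Unicode string with the u prefix.
TITLE = u'Hänsel und Gretel'

# Unicode strings count characters; the UTF-8 encoding counts bytes.
# 'ä' is one character but two bytes in UTF-8.
print(len(TITLE))                  # 17
print(len(TITLE.encode('utf-8')))  # 18
```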

Encoding and Decoding Yourself
------------------------------

If you are talking with a filesystem or something that is not really based
on Unicode, you will have to ensure that you decode properly when working
with Unicode interfaces. So for example, if you want to load a file from
the filesystem and embed it into a Jinja2 template, you will have to
decode it from the encoding of that file. Here the old problem that text
files do not specify their encoding comes into play. So do yourself a
favour and limit yourself to UTF-8 for text files as well.

Anyway, to load such a file as Unicode you can use the built-in
:meth:`str.decode` method::

    def read_file(filename, charset='utf-8'):
        # Read the raw bytes (binary mode), then decode them explicitly.
        with open(filename, 'rb') as f:
            return f.read().decode(charset)

To go from Unicode into a specific charset such as UTF-8, you can use the
:meth:`unicode.encode` method::

    def write_file(filename, contents, charset='utf-8'):
        # Encode the Unicode text first, then write the raw bytes.
        with open(filename, 'wb') as f:
            f.write(contents.encode(charset))
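
A round trip through a temporary file shows that text written with one
charset decodes back to the same Unicode string when read with that
charset. This sketch targets Python 3 and uses :mod:`tempfile`; the two
helpers open their files in binary mode so that ``decode`` and
``encode`` operate on bytes.

```python
import os
import tempfile


def read_file(filename, charset='utf-8'):
    # Read raw bytes, then decode them with the known charset.
    with open(filename, 'rb') as f:
        return f.read().decode(charset)


def write_file(filename, contents, charset='utf-8'):
    # Encode the Unicode text to bytes before writing them out.
    with open(filename, 'wb') as f:
        f.write(contents.encode(charset))


if __name__ == '__main__':
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        write_file(path, u'Hänsel und Gretel')
        with open(path, 'rb') as f:
            print(f.read())        # the raw UTF-8 bytes on disk
        print(read_file(path))     # decoded back to Unicode
    finally:
        os.remove(path)
```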

Configuring Editors
-------------------

Most editors save as UTF-8 by default nowadays, but in case your editor is
not configured to do this, you have to change it. Here are some common
ways to set your editor to store files as UTF-8:

- Vim: put ``set enc=utf-8`` in your ``.vimrc`` file.

- Emacs: either use an encoding cookie or put this into your ``.emacs``
  file::

    (prefer-coding-system 'utf-8)
    (setq default-buffer-file-coding-system 'utf-8)

- Notepad++:

  1. Go to *Settings -> Preferences ...*
  2. Select the "New Document/Default Directory" tab
  3. Select "UTF-8 without BOM" as encoding

  It is also recommended to use the Unix newline format; you can select
  it in the same panel, but this is not a requirement.
