-
Notifications
You must be signed in to change notification settings - Fork 7
Review template quoting, sanitize templates #545
Comments
valya: Usage of encodeURIComponent function here is inappropriate since it encodes entire URI, rather then parameter values. For instance, let consider two calls from JS code shown in das_table.tmpl (I made stand-alone app to show this feature): In first case we encode all parameters with encodeURIComponent inside of das_table.tmpl, e.g. {{{ }}} and in this case the server get the following list of arguments {{{ input kwargs {'input=find data&idx=0&limit=10': ''}127.0.0.1 - - 966a4d2676b7013bd1fb3e238092cbc550322419Oct/2010:15:05:42] "GET /asearch?input%3Dfind%20data%26idx%3D0%26limit%3D10 HTTP/1.1" 200 236 "" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5" }}} In this case, all parameters are smashed into a single string and passed along to the server, while using urllib.quote_plus {{{ }}} we get correct argument list on a server side {{{ input kwargs {'input': 'find data', 'limit': '10', 'idx': '0'}127.0.0.1 - - 966a4d2676b7013bd1fb3e238092cbc550322419Oct/2010:15:06:30] "GET /asearch?input=find+data&idx=0&limit=10 HTTP/1.1" 200 236 "" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5" }}} So, without doubts parameters need to be checked and properly encoded, but in this case usage of JS function is inappropriate. |
valya: Replying to [ticket:545 valya]:
Lassi, I walk through templates and don't see obvious places which concerns me. I also not sure what do you mean by asking to sanitize every template. The data passed to templates and internal data of DAS server. For instance if in server I define a variable foo=1 and pass it into template which will use as $foo, does it require sanitization? I would appreciate if you'll pick up two distinguished examples and ask specific question which bother you about parameters passed to templates. I provided a comment about usage encodeURIComponent in das_table template. All templates are used to build HTML pages, some of them construct HTML links and my understanding that only those parts need to be sanitized to ensure that constructed HTML links are properly encoded. |
valya: Lassi I would appreciate if you provide your feedback to my comments. In fact, to make your life easier I did clean the code and remove das_table.tmpl and associated code from DAS web server due to the fact that I no longer use this template. If you can find another example of inappropriate encoding in templates it would be helpful to me. |
lat: Everywhere where you produce HTML, you should sanitise / quote the value unless you know for sure it cannot be unsafe (e.g. it's an integer). HTML has two quoting contexts, URLs ({{{%nn}}}) and HTML ({{{&}}}), with subtly different rules what to quote. Equally whenever you use javascript to generate HTML, you need to quote. Then finally whenever you make requests from javascript via XHR, you again need to quote the arguments, unless you can without doubt prove that the argument is safe without quoting (e.g. integer). Note that lack of quoting has been used in numerous attacks. Remember the attacks where people used funny searches - which just happened to <script> tags - to cause later results for "most recent searches" to cause arbitrary javascript to run, and worse yet, it runs in the security context of the originating page. It's not enough to say "it came from core server, it must be safe", if it's possible for someone to inject the data to the core server in the first place, either in form of a query, or database used, or some upstream source (e.g. funny file name). So the general rule is that everything must be quoted by default, and you need to show positive proof why it is not necessary to quote the material. |
valya: Lassi I am well aware of this general principles. What I asked is slightly different. Let me make concrete example without asking to look at specific template. All our HTML is constructed via template. The template parameters are passed by server in variety of forms, e.g. strings, ints, lists, dicts, etc.. All of them are not part of the request and rather the results of it. The server get/construct data upon request from its back-end/other sources. The data is the python data which are passed to the template. My question is do you want to sanitize all python data we pass to template or not? For example, the server has the following part: {{{ The template on another hand has code logic: {{{ So what do you want to sanitize here? The ''$key, $val'' or you don't need to do that? My understanding that you want to sanitize parts which construct either URLs or JavaScript, but not the HTML body. So if I have code like {{{ then both ''$mykey'' and ''$myval'' need to be sanitized. It would be nice if we will discuss this particular example to clarify things. Feel free to modify it to specify patterns/problems. |
lat: Thanks for the example. You should sanitise everything. Only if you know for sure the data is safe you can omit it. In particular, if you are representing data from another server, then by definition you cannot be sure of the contents, and you should sanitise it. An example of safe not to sanitise is that say you page the data, and you set the page number in your own server code, and it cannot ever be anything other than an integer, and it's 100% obvious from inspecting the code that is the case, then you could not sanitise it. But if the page number is coming from the client XHR request, then you do need to sanitise it. So by default you sanitise absolutely everything - HTML, URL (when generating e.g. attributes) and javascript (if you generate any) and in browser side, anything entered into URL arguments when making XHR requests or building URLs to refer to (=~ use encodeURIComponent()). |
lat: So quoting your example: {{{ And: {{{ In javascript: {{{ |
lat: In short, always sanitise everything :-) |
valya: I started this work and found it's going to be real fun. Here is example when I cannot quote: {{{ Another example {{{
$results
}}}
So, what I want to say that just looking at templates it is not obvious to blindly require to quote every single variable with dollar prefix. What I can do is provide comments in template about variable which does not require the quoting. |
lat: Yes, quoting is fun, especially avoiding double quoting. If you can prove a value is without shadow of doubt an integer, it doesn't need to be quoted. That requires it's not tainted, e.g. the value didn't originate from the client request in any way (e.g. current page number), or from upstream data source. Looking at the source code it must be without any doubt clear it's obviously an integer, and cannot ever have non-integer values. As you suggest it's helpful to have comments explaining what the data is. Your second example is a good example why it's better to design the application so it has single pass over data, and quite possibly, prefers json/xml over html. So I'd recommend to change the design so that it's always clear if the data is tainted or untainted. Mixing content like done in your example is a good example to avoid: it's unclear what state the data is in. It would be very much more difficult for anyone reading the code to understand the multi-layer templating. |
valya: Replying to [comment:11 lat]:
The proof can be easily done inside template by using
I tend to agree, even it's very seducing to dynamically create content using nested templates. |
lat: Yes asserts are fine if they work inside the template engine. As for page contents, I tend to prefer to use javascript for rendering :-) |
valya: One more question. I show on a web page the data-records via the following template (das_code.tmpl) {{{ $code }}} with server code like this one: {{{ The "code" here represents DAS records (I should probably rename it for clarity, but it's not a point for this question), which I don't know a-priory, since DAS is data agnostic. All records are JSON and retrieved from various DAS collections (MongoDB analog of tables). I represents them in different format to allow users to see them appropriately. Obviously I don't want to quote them, but at the same time I don't know their structure. How to address this issue, does the usage of pre tag, e.g. " $code", is sufficient in this case? |
lat: I am not sure I understand the question, but if 'code', i.e. result of pformat, is arbitrary text data, you need to sanitise it. Using doesn't affect text interpretation, only appearance/formatting (mono-width, follow line breaks), so that's immaterial. |
valya: Let me make example. DAS aggregates data from different data-services. One data-service, e.g. DBS, yields in DAS the following document {{{ and another data-service, e.g. PhEDEx, yields in DAS the following record: {{{ DAS does not know a-priory the structure of those documents, we just store records in DB. On a web interface we want to see our aggregated records. The server gets data from its DB (MongoDB) and send to templates. I provided templates to view records in different representations (using pformat, etc.) The point is that I don't know in my code the structure of the record, it can be a dict, a list, a dict with lists, a dict with another dict which by itself can contain strings, integers, lists, etc. What I know is that all records are JSON. My question were:
|
lat: If you are putting arbitrary text into a html page, you have to sanitise the text string. In your case it happens the text is actually json. That doesn't really affect anything, as far as I can tell. |
valya: Yes, but as a user you want to see this {{{ instead of this {{{ So it is UI problem. May be we should reconsider what we should display to end-users and provide a link to method which will return JSON docs, instead of HTML. |
lat: I am very confused. Sanitising to HTML does not convert the text to the form you quote. It will replace " with ", < with < and so on. |
meloam: Replying to [comment:19 lat]:
I think val is using urlencoding and not escaping html entities. One generates ampersands, the other makes percents. If he's using these strings to be displayed in html, he needs html escaping. If it goes in a url, it needs url encoding. |
valya: I used python urllib.quote function. {{{ {{{ So, do we have python library for sanitization? |
meloam: Replying to [comment:21 valya]:
the cgi module has an escape() function, but I don't know if you want to pull that dependency in. I'm sure you and lassi could come up with a suitable replacement, the rules are pretty simple. |
valya: I can use cgi.escape which says
Do we need to escape (") character? If so all JSON will look like { "test" : 1} on a page. |
meloam: Replying to [comment:23 valya]:
It won't. Web browsers universally interpret " to mean ' when rendering the body of a webpage. There's a whole slew of HTML special entities that get interpreted that way. |
meloam: Replying to [comment:24 meloam]:
Sorry, I meant that " gets translated to ", not ' .. (how do you quote a quotation character??) |
lat: As per the earlier reply, there are two distinct contexts for sanitisation in HTML (html context, url context), and you need to use quoting primitives according to the context of contents. There's no particular need to sanitise double or single quotes, that was just an example. You should use existing library functions for sanitisation, not roll your own. IIRC the routines are cgi.escape() for HTML context and urllib.quote_plus() for URL context. |
lat: OK, right, so there's three different sanitisation contexts, one more for HTML non-URL attributes. Anyway, still, should use existing quoting functions if at all possible. Last thing I want us to debug is bugs in someone's sanitisation routines. |
valya: Thanks for clarifications. Now it becomes clear how I can accomplish this. Here is layout. I made a new function {{{ which can be used to sanitize non-URL content. The templates will get it as simple as {{{ $quote($jsoncode) }}} and I'll use urlllib.quote_plus for sanitization of URL attributes in HTML and encodeURIComponent for JS parts. I will also provide appropriate assert statements to check passed arguments types. |
valya: (In adb3d0e) Sanitize DAS templates, fixes #545 Signed-off-by: Valentin Kuznetsov vkuznet@gmail.com |
From #290
I didn't understand the addition of urllib quoting in, for example, das_table.tmpl. Shouldn't you use encodeURIComponent in javascript code / arguments, and urllib when quoting something originating from DAS server itself? To me it seems you are now sometimes quoting javascript itself, not the javascript variable value.
Also I note here that the quoting wasn't added universally everywhere - not in all templates, and not even systematically in the one example I happened to quote, das_table.tmpl. As I wrote before, it looks like every template needs to be sanitised. I can't easily tell which values are safe.
The text was updated successfully, but these errors were encountered: