Skip to content
This repository has been archived by the owner on May 28, 2021. It is now read-only.

Review template quoting, sanitize templates #545

Closed
vkuznet opened this issue Oct 7, 2010 · 29 comments
Closed

Review template quoting, sanitize templates #545

vkuznet opened this issue Oct 7, 2010 · 29 comments
Assignees

Comments

@vkuznet
Copy link
Contributor

vkuznet commented Oct 7, 2010

From #290

I didn't understand the addition of urllib quoting in, for example, das_table.tmpl. Shouldn't you use encodeURIComponent in javascript code / arguments, and urllib when quoting something originating from DAS server itself? To me it seems you are now sometimes quoting javascript itself, not the javascript variable value.

Also I note here that the quoting wasn't added universally everywhere - not in all templates, and not even systematically in the one example I happened to quote, das_table.tmpl. As I wrote before, it looks like every template needs to be sanitised. I can't easily tell which values are safe.

@vkuznet
Copy link
Contributor Author

vkuznet commented Oct 12, 2010

valya: Usage of encodeURIComponent function here is inappropriate since it encodes entire URI, rather then parameter values. For instance, let consider two calls from JS code shown in das_table.tmpl (I made stand-alone app to show this feature):

In first case we encode all parameters with encodeURIComponent inside of das_table.tmpl, e.g.

{{{
initialRequest: encodeURIComponent("input=find data&idx=0&limit=10")

}}}

and in this case the server get the following list of arguments

{{{

input kwargs {'input=find data&idx=0&limit=10': ''}

127.0.0.1 - - 966a4d2676b7013bd1fb3e238092cbc550322419Oct/2010:15:05:42] "GET /asearch?input%3Dfind%20data%26idx%3D0%26limit%3D10 HTTP/1.1" 200 236 "" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5"

}}}

In this case, all parameters are smashed into a single string and passed along to the server, while using urllib.quote_plus

{{{
initialRequest: encodeURIComponent("input="+urllib.quote_plus("find data") + "&idx=0&limit=10")

}}}

we get correct argument list on a server side

{{{

input kwargs {'input': 'find data', 'limit': '10', 'idx': '0'}

127.0.0.1 - - 966a4d2676b7013bd1fb3e238092cbc550322419Oct/2010:15:06:30] "GET /asearch?input=find+data&idx=0&limit=10 HTTP/1.1" 200 236 "" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5"

}}}

So, without doubts parameters need to be checked and properly encoded, but in this case usage of JS function is inappropriate.

@vkuznet
Copy link
Contributor Author

vkuznet commented Oct 13, 2010

valya: Replying to [ticket:545 valya]:

Also I note here that the quoting wasn't added universally everywhere - not in all templates, and not even systematically in the one example I happened to quote, das_table.tmpl. As I wrote before, it looks like every template needs to be sanitised. I can't easily tell which values are safe.

Lassi, I walk through templates and don't see obvious places which concerns me. I also not sure what do you mean by asking to sanitize every template. The data passed to templates and internal data of DAS server. For instance if in server I define a variable foo=1 and pass it into template which will use as $foo, does it require sanitization? I would appreciate if you'll pick up two distinguished examples and ask specific question which bother you about parameters passed to templates. I provided a comment about usage encodeURIComponent in das_table template. All templates are used to build HTML pages, some of them construct HTML links and my understanding that only those parts need to be sanitized to ensure that constructed HTML links are properly encoded.

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 12, 2010

valya: Lassi I would appreciate if you provide your feedback to my comments. In fact, to make your life easier I did clean the code and remove das_table.tmpl and associated code from DAS web server due to the fact that I no longer use this template. If you can find another example of inappropriate encoding in templates it would be helpful to me.

@ghost
Copy link

ghost commented Nov 18, 2010

lat: Everywhere where you produce HTML, you should sanitise / quote the value unless you know for sure it cannot be unsafe (e.g. it's an integer). HTML has two quoting contexts, URLs ({{{%nn}}}) and HTML ({{{&}}}), with subtly different rules what to quote.

Equally whenever you use javascript to generate HTML, you need to quote.

Then finally whenever you make requests from javascript via XHR, you again need to quote the arguments, unless you can without doubt prove that the argument is safe without quoting (e.g. integer).

Note that lack of quoting has been used in numerous attacks. Remember the attacks where people used funny searches - which just happened to <script> tags - to cause later results for "most recent searches" to cause arbitrary javascript to run, and worse yet, it runs in the security context of the originating page. It's not enough to say "it came from core server, it must be safe", if it's possible for someone to inject the data to the core server in the first place, either in form of a query, or database used, or some upstream source (e.g. funny file name).

So the general rule is that everything must be quoted by default, and you need to show positive proof why it is not necessary to quote the material.

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 19, 2010

valya: Lassi I am well aware of this general principles. What I asked is slightly different. Let me make concrete example without asking to look at specific template.

All our HTML is constructed via template. The template parameters are passed by server in variety of forms, e.g. strings, ints, lists, dicts, etc.. All of them are not part of the request and rather the results of it. The server get/construct data upon request from its back-end/other sources. The data is the python data which are passed to the template. My question is do you want to sanitize all python data we pass to template or not?

For example, the server has the following part:

{{{
def bla(self, ...):
mydata = dict(mydict={...}, mylist=[...], mykey=something, myval=something)
return template('template_name', data=mydata)
}}}

The template on another hand has code logic:

{{{
#for key, val in $data.items()

$key$val #end for }}}

So what do you want to sanitize here? The ''$key, $val'' or you don't need to do that?

My understanding that you want to sanitize parts which construct either URLs or JavaScript, but not the HTML body. So if I have code like

{{{
<a href="http://a.b.com/method?$mykey=$myval
}}}

then both ''$mykey'' and ''$myval'' need to be sanitized.

It would be nice if we will discuss this particular example to clarify things. Feel free to modify it to specify patterns/problems.

@ghost
Copy link

ghost commented Nov 19, 2010

lat: Thanks for the example. You should sanitise everything. Only if you know for sure the data is safe you can omit it. In particular, if you are representing data from another server, then by definition you cannot be sure of the contents, and you should sanitise it.

An example of safe not to sanitise is that say you page the data, and you set the page number in your own server code, and it cannot ever be anything other than an integer, and it's 100% obvious from inspecting the code that is the case, then you could not sanitise it. But if the page number is coming from the client XHR request, then you do need to sanitise it.

So by default you sanitise absolutely everything - HTML, URL (when generating e.g. attributes) and javascript (if you generate any) and in browser side, anything entered into URL arguments when making XHR requests or building URLs to refer to (=~ use encodeURIComponent()).

@ghost
Copy link

ghost commented Nov 19, 2010

lat: So quoting your example:

{{{
#for key, val in $data.items()

$quote($key)$quote($val) #end for }}}

And:

{{{
<a href="http://a.b.com/method?$urlquote($mykey)=$urlquote($myval)
}}}

In javascript:

{{{
var name = ...;
var href = "/some/base?name=" + encodeURIComponent(name);
}}}

@ghost
Copy link

ghost commented Nov 19, 2010

lat: In short, always sanitise everything :-)

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 29, 2010

valya: I started this work and found it's going to be real fun. Here is example when I cannot quote:

{{{

#set total = $idx+$limit
#if $total>$nrows
#set total=$nrows
#end if
#set startidx=$idx+1
$startidx-$total

}}}
the idx/limit are passed to template as integers.

Another example

{{{

$results

}}}
here "$results" are passed by DAS server code from another template. So the flow is the following

  • DAS get some data and pass it to template print time in human readable format #1
  • it assigns variable "results" to the template itself, such that results is a string which contains HTML snippet (and may be JS)
  • I pass "results" into new template where it gets inserted into div tag. If I'll quote results in aforementioned div tag I can screw my HTML snippet.

So, what I want to say that just looking at templates it is not obvious to blindly require to quote every single variable with dollar prefix. What I can do is provide comments in template about variable which does not require the quoting.

@ghost
Copy link

ghost commented Nov 29, 2010

lat: Yes, quoting is fun, especially avoiding double quoting.

If you can prove a value is without shadow of doubt an integer, it doesn't need to be quoted. That requires it's not tainted, e.g. the value didn't originate from the client request in any way (e.g. current page number), or from upstream data source. Looking at the source code it must be without any doubt clear it's obviously an integer, and cannot ever have non-integer values. As you suggest it's helpful to have comments explaining what the data is.

Your second example is a good example why it's better to design the application so it has single pass over data, and quite possibly, prefers json/xml over html. So I'd recommend to change the design so that it's always clear if the data is tainted or untainted. Mixing content like done in your example is a good example to avoid: it's unclear what state the data is in. It would be very much more difficult for anyone reading the code to understand the multi-layer templating.

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 29, 2010

valya: Replying to [comment:11 lat]:

Yes, quoting is fun, especially avoiding double quoting.

If you can prove a value is without shadow of doubt an integer, it doesn't need to be quoted. That requires it's not tainted, e.g. the value didn't originate from the client request in any way (e.g. current page number), or from upstream data source. Looking at the source code it must be without any doubt clear it's obviously an integer, and cannot ever have non-integer values. As you suggest it's helpful to have comments explaining what the data is.

The proof can be easily done inside template by using
{{{
assert isinstance(idx, int)
}}}
statements, I think it would be sufficient.

Your second example is a good example why it's better to design the application so it has single pass over data, and quite possibly, prefers json/xml over html. So I'd recommend to change the design so that it's always clear if the data is tainted or untainted. Mixing content like done in your example is a good example to avoid: it's unclear what state the data is in. It would be very much more difficult for anyone reading the code to understand the multi-layer templating.

I tend to agree, even it's very seducing to dynamically create content using nested templates.

@ghost
Copy link

ghost commented Nov 30, 2010

lat: Yes asserts are fine if they work inside the template engine. As for page contents, I tend to prefer to use javascript for rendering :-)

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 30, 2010

valya: One more question. I show on a web page the data-records via the following template (das_code.tmpl)

{{{

$code

}}}

with server code like this one:

{{{
record = self.dasmgr.mapping.api_info(name)
show = kwargs.get('show', 'json')
page = "DAS mapping record"
if show == 'json':
jsoncode = {'jsoncode': json2html(record, "")}
page += self.templatepage('das_json', **jsoncode)
elif show == 'code':
code = pformat(record, indent=1, width=100)
page += self.templatepage('das_code', code=code)
else:
code = yaml.dump(record, width=100, indent=4,
default_flow_style=False)
page += self.templatepage('das_code', code=code)
}}}

The "code" here represents DAS records (I should probably rename it for clarity, but it's not a point for this question), which I don't know a-priory, since DAS is data agnostic. All records are JSON and retrieved from various DAS collections (MongoDB analog of tables). I represents them in different format to allow users to see them appropriately. Obviously I don't want to quote them, but at the same time I don't know their structure. How to address this issue, does the usage of pre tag, e.g. "

$code
", is sufficient in this case?

@ghost
Copy link

ghost commented Nov 30, 2010

lat: I am not sure I understand the question, but if 'code', i.e. result of pformat, is arbitrary text data, you need to sanitise it. Using

 doesn't affect text interpretation, only appearance/formatting (mono-width, follow line breaks), so that's immaterial.

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 30, 2010

valya: Let me make example. DAS aggregates data from different data-services. One data-service, e.g. DBS, yields in DAS the following document

{{{
{'test': 1}
}}}

and another data-service, e.g. PhEDEx, yields in DAS the following record:

{{{
{'test': 1, 'site': [{'se':'A'}, {'se': 'B'}]
}}}

DAS does not know a-priory the structure of those documents, we just store records in DB.

On a web interface we want to see our aggregated records. The server gets data from its DB (MongoDB) and send to templates. I provided templates to view records in different representations (using pformat, etc.) The point is that I don't know in my code the structure of the record, it can be a dict, a list, a dict with lists, a dict with another dict which by itself can contain strings, integers, lists, etc. What I know is that all records are JSON. My question were:

  • how I can put JSON into our HTML?
  • should we sanitize the JSON data? if so, how to generalize sanitization of arbitrary JSON structure?

@ghost
Copy link

ghost commented Nov 30, 2010

lat: If you are putting arbitrary text into a html page, you have to sanitise the text string. In your case it happens the text is actually json. That doesn't really affect anything, as far as I can tell.

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 30, 2010

valya: Yes, but as a user you want to see this

{{{
{
"test":1
}
}}}

instead of this

{{{
%7B%0A%20%22test%22%3A1%20%0A%7D
}}}

So it is UI problem. May be we should reconsider what we should display to end-users and provide a link to method which will return JSON docs, instead of HTML.

@ghost
Copy link

ghost commented Nov 30, 2010

lat: I am very confused. Sanitising to HTML does not convert the text to the form you quote. It will replace " with ", < with < and so on.

@PerilousApricot
Copy link
Member

meloam: Replying to [comment:19 lat]:

I am very confused. Sanitising to HTML does not convert the text to the form you quote. It will replace " with ", < with < and so on.

I think val is using urlencoding and not escaping html entities. One generates ampersands, the other makes percents. If he's using these strings to be displayed in html, he needs html escaping. If it goes in a url, it needs url encoding.

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 30, 2010

valya: I used python urllib.quote function.

{{{
print sss
{
"test":1
}
}}}

{{{
print urllib.quote(sss)
%7B%0A%20%22test%22%3A1%20%0A%7D
}}}

So, do we have python library for sanitization?

@PerilousApricot
Copy link
Member

meloam: Replying to [comment:21 valya]:

I used python urllib.quote function.

{{{
print sss
{
"test":1
}
}}}

{{{
print urllib.quote(sss)
%7B%0A%20%22test%22%3A1%20%0A%7D
}}}

So, do we have python library for sanitization?

the cgi module has an escape() function, but I don't know if you want to pull that dependency in. I'm sure you and lassi could come up with a suitable replacement, the rules are pretty simple.

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 30, 2010

valya: I can use cgi.escape which says

Replace special characters "&", "<" and ">" to HTML-safe sequences.
If the optional flag quote is true, the quotation mark character (")
is also translated.

Do we need to escape (") character? If so all JSON will look like

{ "test" : 1}

on a page.

@PerilousApricot
Copy link
Member

meloam: Replying to [comment:23 valya]:

I can use cgi.escape which says

Replace special characters "&", "<" and ">" to HTML-safe sequences.
If the optional flag quote is true, the quotation mark character (")
is also translated.

Do we need to escape (") character? If so all JSON will look like

{ "test" : 1}

on a page.

It won't. Web browsers universally interpret " to mean ' when rendering the body of a webpage. There's a whole slew of HTML special entities that get interpreted that way.

@PerilousApricot
Copy link
Member

meloam: Replying to [comment:24 meloam]:

Replying to [comment:23 valya]:

I can use cgi.escape which says

Replace special characters "&", "<" and ">" to HTML-safe sequences.
If the optional flag quote is true, the quotation mark character (")
is also translated.

Do we need to escape (") character? If so all JSON will look like

{ "test" : 1}

on a page.

It won't. Web browsers universally interpret " to mean ' when rendering the body of a webpage. There's a whole slew of HTML special entities that get interpreted that way.

Sorry, I meant that " gets translated to ", not ' .. (how do you quote a quotation character??)

@ghost
Copy link

ghost commented Nov 30, 2010

lat: As per the earlier reply, there are two distinct contexts for sanitisation in HTML (html context, url context), and you need to use quoting primitives according to the context of contents. There's no particular need to sanitise double or single quotes, that was just an example. You should use existing library functions for sanitisation, not roll your own. IIRC the routines are cgi.escape() for HTML context and urllib.quote_plus() for URL context.

@PerilousApricot
Copy link
Member

meloam: Replying to [comment:26 lat]:

As per the earlier reply, there are two distinct contexts for sanitisation in HTML (html context, url context), and you need to use quoting primitives according to the context of contents. There's no particular need to sanitise double or single quotes, that was just an example. You should use existing library functions for sanitisation, not roll your own. IIRC the routines are cgi.escape() for HTML context and urllib.quote_plus() for URL context.

There's certainly no need to escape quotes if you're just writing to the body of a tag, but it doesn't hurt anything, and it keeps you from having to be super careful about remembering to do the extra escaping when you put user-data into attributes of tags (for example, USERTEXT could end up being bad if the user puts the test as:

" /><script lang=blah blah blah,etc"

@ghost
Copy link

ghost commented Nov 30, 2010

lat: OK, right, so there's three different sanitisation contexts, one more for HTML non-URL attributes. Anyway, still, should use existing quoting functions if at all possible. Last thing I want us to debug is bugs in someone's sanitisation routines.

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 30, 2010

valya: Thanks for clarifications. Now it becomes clear how I can accomplish this. Here is layout. I made a new function

{{{
def quote(data):
"""
Sanitize the data using cgi.escape.
"""
if isinstance(data, int) or isinstance(data, float):
res = data
else:
res = cgi.escape(str(data), quote=True)
return res
}}}

which can be used to sanitize non-URL content. The templates will get it as simple as

{{{
#from DAS.web.utils import quote
#assert isinstance($jsoncode, str)

$quote($jsoncode)

}}}

and I'll use urlllib.quote_plus for sanitization of URL attributes in HTML and encodeURIComponent for JS parts. I will also provide appropriate assert statements to check passed arguments types.

@vkuznet
Copy link
Contributor Author

vkuznet commented Nov 30, 2010

valya: (In adb3d0e) Sanitize DAS templates, fixes #545

Signed-off-by: Valentin Kuznetsov vkuznet@gmail.com

@ghost ghost assigned vkuznet Jul 24, 2012
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants