Using Unicode.translate to speed up html and js escaping #53

Merged
merged 0 commits into from Sep 18, 2012

Projects

None yet

4 participants

@etianen
Contributor
etianen commented May 10, 2012

Django is currently making use of inefficient repeated called to .replace() to perform html and js string escaping. Given that these functions are used all over the codebase, and typically called a hundred or so times per request, this is a sweet spot for some simple optimisation.

By replacing the repeated .replace() calls with a single call to .translate(), we improve efficiency with no side-effects (so far as I can see). All tests pass, certainly.

A discussion on the relative benefits of .replace() versus .translate() can be found here:

http://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python

(Benchmarks in the top answer)

@alex
Member
alex commented May 10, 2012

I actually benchmarked this a while ago and was shocked to find that there didn't appear to be a difference (this would have been on CPython 2.6 or 2.7). Have you run benchmarks specifically on Django with this method to see if there's a difference?

@etianen
Contributor
etianen commented May 10, 2012

So I knocked up this benchmark.

https://gist.github.com/2656581

I ran it using both the current Django master, and my branch. This was done using CPython 2.7 on a rather powerful MacBook Pro laptop.

django/django:

escape_text      : 0.537447929382
escape_html      : 0.605153083801
escapejs_text    : 10.5186798573
escapejs_html    : 10.7592608929

django-html-escape

escape_text      : 0.981202125549
escape_html      : 1.47704315186
escapejs_text    : 0.941799879074
escapejs_html    : 1.29069900513

It would seem that using translate actually makes the escape() function slower, but substantially speeds up the escapejs() function. This is presumably because the repeated iteration in escapejs is the killer, not actually an inherent slowness in repeated called to replace().

What's really odd is that when I was benchmarking this in Python 3 a while back, I found translate() to be faster than chained replace() calls.

I guess the solution here is to only use translate() in the escapejs() function. I've updated my pull request accordingly.

@aaugustin
Member

@simukis We care about code readability, but we don't consider wrapping at 80 cols to be an important factor. We also avoid committing cosmetic changes.

@etianen A 10-times speedup in escape_js is definitely interesting.

@etianen
Contributor
etianen commented Sep 8, 2012

So you'll pull? Go on... let me feel useful! Run the benchmarks yourself with and without the pull request:

https://gist.github.com/2656581

@apollo13
Member

@etianen The pull request isn't mergeable in it's current form, also make sure it works on Python 3 too, thx!

@etianen etianen merged commit 70248cc into django:master Sep 18, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment