Measuring and Increasing Performance
====================================
"Premature optimization is the root of all evil (or at least most of
it) in programming." -- Donald Knuth
In other words, know thy code! The only way to find performance
bottlenecks is to profile your code. Unfortunately, the situation is
a bit more complex in Python than you would like it to be: see
http://docs.python.org/lib/profile.html. Briefly, there are three
(!?) standard profiling systems that come with Python: profile,
cProfile (only since Python 2.5!), and hotshot (though note that
profile and cProfile are Python and C implementations of the same
API). There is also a separately maintained one called statprof,
which I nominally maintain.
The ones included with Python are deterministic profilers, while
statprof is a statistical profiler. What's the difference? To steal
from the Python docs:
Deterministic profiling is meant to reflect the fact that all function
call, function return, and exception events are monitored, and precise
timings are made for the intervals between these events (during which
time the user's code is executing). In contrast, statistical profiling
randomly samples the effective instruction pointer, and deduces where
time is being spent. The latter technique traditionally involves less
overhead (as the code does not need to be instrumented), but provides
only relative indications of where time is being spent.
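To make the sampling idea concrete, here is a toy statistical profiler
(a sketch only -- statprof itself uses timer signals, not a thread;
all names here are made up for illustration). A background thread
periodically grabs the main thread's current stack frame and tallies
which function it was observed in: ::

   import collections
   import sys
   import threading
   import time

   samples = collections.Counter()
   stop_sampling = False

   def sampler(target_thread_id, interval=0.001):
       # Periodically sample the target thread's current frame.
       while not stop_sampling:
           frame = sys._current_frames().get(target_thread_id)
           if frame is not None:
               samples[frame.f_code.co_name] += 1
           time.sleep(interval)

   def busy_work():
       # A made-up CPU-bound function to be "profiled".
       total = 0
       for i in range(3000000):
           total = total + i
       return total

   main_id = threading.get_ident()
   thread = threading.Thread(target=sampler, args=(main_id,))
   thread.start()
   busy_work()
   stop_sampling = True
   thread.join()

   # Functions that ran longer were sampled more often; the counts
   # give only relative indications of where time was spent.
   for name, count in samples.most_common():
       print(name, count)

Note that no tracing code is inserted into busy_work at all, which is
exactly why statistical profiling has low overhead.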
Let's go to the examples. Suppose we have two functions 'count1'
and 'count2', and we want to run both and see where time is spent.
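The 'count' module itself isn't reproduced in this text; a
hypothetical count.py consistent with the profiler results in the
examples (count2 doing roughly three times the work of count1) might
look like this: ::

   # A hypothetical count.py -- the real one isn't shown here.
   # count2 does about three times as much work as count1, matching
   # the ratio visible in the profiler output.

   def count1():
       total = 0
       for i in range(5000000):
           total = total + i
       return total

   def count2():
       total = 0
       for i in range(15000000):
           total = total + i
       return total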
-----
Here's some example hotshot code: ::

   import hotshot, hotshot.stats

   prof = hotshot.Profile('hotshot.prof')
   prof.runcall(count1)
   prof.runcall(count2)
   prof.close()

   stats = hotshot.stats.load('hotshot.prof')
   stats.sort_stats('time', 'calls')
   stats.print_stats(20)
and the resulting output: ::

         2 function calls in 5.769 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    4.335    4.335    4.335    4.335 count.py:8(count2)
        1    1.434    1.434    1.434    1.434 count.py:1(count1)
        0    0.000             0.000          profile:0(profiler)
-----
Here's some example cProfile code: ::

   def runboth():
       count1()
       count2()

   import cProfile, pstats
   cProfile.run('runboth()', 'cprof.out')

   p = pstats.Stats('cprof.out')
   p.sort_stats('time').print_stats(10)
and the resulting output: ::

   Wed Jun 13 00:11:55 2007    cprof.out

         7 function calls in 5.643 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    3.817    3.817    4.194    4.194 count.py:8(count2)
        1    1.282    1.282    1.450    1.450 count.py:1(count1)
        2    0.545    0.272    0.545    0.272 {range}
        1    0.000    0.000    5.643    5.643 run-cprofile:8(runboth)
        1    0.000    0.000    5.643    5.643 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
-----
And here's an example of statprof, the statistical profiler: ::

   import statprof

   statprof.start()
   count1()
   count2()
   statprof.stop()
   statprof.display()
And the output: ::

     %   cumulative      self
    time    seconds   seconds  name
   74.66      4.10      4.10  count.py:8:count2
   25.34      1.39      1.39  count.py:1:count1
    0.00      5.49      0.00  run-statprof:2:<module>
   ---
   Sample count: 296
   Total time: 5.490000 seconds
Which profiler should you use?
------------------------------
statprof used to report more accurate numbers than hotshot or
cProfile, because hotshot and cProfile had to instrument the code
(insert tracing statements, basically). However, the numbers shown
above are pretty similar to each other and I'm not sure there's much
of a reason to choose between them any more. So, I recommend starting
with cProfile, because it's the officially supported one.
One note -- none of these profilers really work all that well with
threads, for a variety of reasons. You're best off doing performance
measurements on non-threaded code.
Measuring code snippets with timeit
-----------------------------------
There's also a simple timing tool called timeit: ::

   from timeit import Timer
   from count import *

   t1 = Timer("count1()", "from count import count1")
   print 'count1:', t1.timeit(number=1)

   t2 = Timer("count2()", "from count import count2")
   print 'count2:', t2.timeit(number=1)
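timeit can also repeat a measurement several times via
Timer.repeat(); taking the minimum of the repeats is the usual way to
reduce noise from other processes on the machine. A small
self-contained sketch (the snippet being timed is arbitrary): ::

   from timeit import Timer

   # Run the snippet 1000 times per trial, for 3 trials; the minimum
   # is the least noisy estimate of how fast the snippet can run.
   t = Timer("sum(range(1000))")
   timings = t.repeat(repeat=3, number=1000)
   print('best of 3: %g' % min(timings))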
Speeding Up Python
==================
There are a couple of options for speeding up Python.
psyco
-----
(Taken almost verbatim from the `psyco introduction <http://psyco.sourceforge.net/introduction.html>`__!)
psyco is a specializing compiler that lets you run your existing
Python code much faster, with *absolutely no change* in your source
code. It acts like a just-in-time compiler by rewriting several
versions of your code blocks and then optimizing them by specializing
the variables they use.
The main benefit is that you get a 2-100x speed-up with an unmodified Python
interpreter and unmodified source code. (You just need to import psyco.)
The main drawbacks are that it only runs on i386-compatible processors
(so, not PPC Macs) and it's a bit of a memory hog.
For example, if you use the prime number generator code (see
`Idiomatic Python <idiomatic-python.txt>`__) to generate all primes
under 100000, it takes about 10.4 seconds on my development server.
With psyco, it takes about 1.6 seconds (that's about a 6x speedup).
Even when doing less numerical stuff, I see at least a 2x speedup.
Installing psyco
~~~~~~~~~~~~~~~~
(Note: psyco is an extension module and does not come in pre-compiled
form. Therefore, you will need to have a Python-compatible C compiler
installed in order to install psyco.)
Grab the latest psyco snapshot from here: ::

   http://psyco.sourceforge.net/psycoguide/sources.html

unpack it, and run 'python setup.py install'.
Using psyco
~~~~~~~~~~~
Put the following code at the top of your __main__ Python script: ::

   try:
       import psyco
       psyco.full()
   except ImportError:
       pass
...and you're done. (Yes, it's magic!)
The only place where psyco won't help you much is when you have
already recoded the CPU-intensive component of your code into an
extension module.
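If you'd rather not compile your whole program, psyco can also bind
individual functions with psyco.bind(). Here's a sketch (hot_loop is
a made-up stand-in for your real hot spot) that degrades gracefully
when psyco isn't installed, just like the snippet above: ::

   def hot_loop(n):
       # A made-up CPU-bound function standing in for your hot spot.
       total = 0
       for i in range(n):
           total = total + i * i
       return total

   try:
       import psyco
       psyco.bind(hot_loop)   # compile only this function
   except ImportError:
       pass                   # no psyco: run at normal speed

   print(hot_loop(100))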
pyrex
-----
pyrex is a Python-like language used to create C modules for Python.
You can use it for two purposes: to increase performance by
(re)writing your code in C (but with a friendly extension language),
and to make C libraries available to Python.
In the context of speeding things up, here's an example program: ::

   def primes(int maxprime):
       cdef int n, k, i
       cdef int p[100000]
       result = []
       k = 0
       n = 2
       while n < maxprime:
           i = 0
           # test against previous primes
           while i < k and n % p[i] <> 0:
               i = i + 1
           # prime? if so, save.
           if i == k:
               p[k] = n
               k = k + 1
               result.append(n)
           n = n + 1
       return result
To compile this, you would execute: ::

   pyrexc primes.pyx
   gcc -c -fPIC -I /usr/local/include/python2.5 primes.c
   gcc -shared primes.o -o primes.so
Or, more nicely, you can write a setup.py using some of the Pyrex
helper functions: ::

   from distutils.core import setup
   from distutils.extension import Extension
   from Pyrex.Distutils import build_ext    # <--

   setup(
       name = "primes",
       ext_modules = [
           Extension("primes", ["primes.pyx"], libraries = [])
       ],
       cmdclass = {'build_ext': build_ext}
   )
A few notes:
- 'cdef' is a C definition statement
- this is a "python-alike" language but not Python, per se ;)
- pyrex does handle a lot of the nasty C extension stuff for you.
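For comparison, here is the same algorithm in plain Python (a list
replaces the fixed C array). The Pyrex version wins mainly because
'cdef' turns the loop counters and the p array into raw C types: ::

   def primes(maxprime):
       # Plain-Python version of the Pyrex example: no cdef, so
       # every loop variable is a full Python object.
       p = []
       result = []
       n = 2
       while n < maxprime:
           i = 0
           # test against previous primes
           while i < len(p) and n % p[i] != 0:
               i = i + 1
           # prime? if so, save.
           if i == len(p):
               p.append(n)
               result.append(n)
           n = n + 1
       return result

   print(primes(20))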
There's an excellent guide to Pyrex available online here:
http://ldots.org/pyrex-guide/.
I haven't used Pyrex much myself, but I have a friend who swears by
it. My concerns are that it's a "C/Python-alike" language but not C
or Python, and I have already memorized too many weird rules about too
many languages!
We'll encounter Pyrex a bit further down the road in the context of
linking existing C/C++ code into your own code.
.. @CTB will we?? ;)