-
Notifications
You must be signed in to change notification settings - Fork 292
/
performance.pod6
232 lines (159 loc) · 10.6 KB
/
performance.pod6
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
=begin pod
=TITLE Performance
=SUBTITLE Measuring and improving run-time or compile-time performance
This page is about anything to do with L<computer performance|https://en.wikipedia.org/wiki/Computer_performance>
in the context of Perl 6.
=head1 First, clarify the problem
B<Make sure you're not wasting time on the wrong code>: start by identifying your
L<"critical 3%"|https://en.wikiquote.org/wiki/Donald_Knuth> by "profiling" as explained below.
=head2 Time with C<now - INIT now>
Expressions of the form C<now - INIT now>, where C<INIT> is a
L<phase in the running of a Perl 6 program|/language/phasers>, provide a great idiom for timing code snippets.
Use the C<m: your code goes here> L<#perl6 channel evalbot|/language/glossary#camelia>
to write lines like:
m: say now - INIT now
rakudo-moar abc1234: OUTPUT«0.0018558»
The C<now> to the left of C<INIT> runs 0.0018558 seconds I<later> than the C<now> to the right of the C<INIT>
because the latter occurs during L<the INIT phase|/language/phasers#INIT>.
=head2 Profile with C<prof-m: your code goes here>
Enter C<prof-m: your code goes here> in the L<#perl6 channel|/language/glossary#IRC>
to invoke a Perl 6 compiler with a C<--profile> option.
The evalbot's output includes a link to L<profile info|https://en.wikipedia.org/wiki/Profiling_(computer_programming)>:
=begin code :allow< L >
yournick prof-m: say 'hello world'
camelia prof-m 273e89: OUTPUT«hello world...»
.. Prof: L<http://p.p6c.org/20f9e25>
=end code
Click on the profile info link to see a profile for C<say 'hello world'>.
To learn how to interpret the profile info, ask questions on channel.
=head2 Profile locally
When using the L<MoarVM|http://moarvm.org> backend the L<Rakudo|http://rakudo.org> compiler's C<--profile>
command line option writes profile information as an HTML file. However, if the profile is too big it can
be slow to open in a browser. In that case, if you use the C<--profile-filename=file.extension> option with
an extension of C<.json>, you can use the L<Qt viewer|https://github.com/tadzik/p6profiler-qt> on the resulting
JSON file.
Another option (especially useful for profiles too big even for the Qt viewer) is to use an extension of C<.sql>.
This will write the profile data as a series of SQL statements, suitable for opening in SQLite.
# create a profile
perl6 --profile --profile-filename=demo.sql -e 'say (^20).combinations(3).elems'
# create a SQLite database
sqlite3 demo.sql
# load the profile data
sqlite> .read demo.sql
# the query below is equivalent to the default view of the "Routines" tab in the HTML file
sqlite> select
...> case when r.name = "" then "<anon>" else r.name end,
...> r.file,
...> r.line,
...> sum(entries) as entries,
...> sum(case when rec_depth = 0 then inclusive_time else 0 end) as inclusive_time,
...> sum(exclusive_time) as exclusive_time
...> from
...> callees c,
...> routines r
...> where
...> c.id = r.id
...> group by
...> c.id
...> order by
...> inclusive_time desc;
To learn how to interpret the profile info, use the C<prof-m: your code goes here> evalbot (explained
above) and ask questions on channel.
=head2 Profile compiling
The Rakudo compiler's C<--profile-compile> option profiles the time and memory used to compile code.
=head2 Create or view benchmarks
Use L<perl6-bench|https://github.com/japhb/perl6-bench>.
If you run perl6-bench for multiple compilers (typically versions of Perl 5, Perl 6, or NQP)
then results for each are visually overlaid on the same graphs to provide for quick and easy comparison.
=head2 Share problems
Once you've used the above techniques to pinpoint code and performance that really matters you're in
a good place to share problems, one at a time:
=item For each problem you see, distill it down to a one-liner or short public gist of code that
either already includes performance numbers or is small enough that it can be profiled using
C<prof-m: your code or gist URL goes here>.
=item Think about the minimum speed increase (or ram reduction or whatever) you need/want.
What if it took a month for folk to help you achieve that? A year?
=item Let folk know if your Perl 6 use-case is in a production setting or just for fun.
=head1 Solve problems
This bears repeating: B<make sure you're not wasting time on the wrong code>.
Start by identifying the L<"critical 3%"|https://en.wikiquote.org/wiki/Donald_Knuth> of your code.
=head2 Line by line
A quick, fun, productive way to try improve code line-by-line is to collaborate with
others using the L<#perl6|/language/glossary#IRC> evalbot
L<camelia|/language/glossary#camelia>.
=head2 Routine by routine
With multidispatch you can drop in new variants of routines "alongside" existing ones:
# existing code generically matches a two arg foo call:
multi sub foo(Any $a, Any $b) { ... }
# new variant takes over for a foo("quux", 42) call:
multi sub foo("quux", Int $b) { ... }
The call overhead of having multiple C<foo> definitions is generally insignificant (though see discussion
of C<where> below), so if your new definition handles its particular case more quickly/leanly than the
previously existing set of definitions then you probably just made your code that much faster/leaner for that case.
=head2 Speed up type-checks and call resolution
Most L<C<where> clauses|/type/Signature#Type_Constraints> – and thus most
L<subsets|https://design.perl6.org/S12.html#Types_and_Subtypes> – force dynamic (run-time)
type checking and call resolution for any call it I<might> match. This is slower, or at least later,
than compile-time.
Method calls are generally resolved as late as possible, so dynamically, at run-time,
whereas sub calls are generally resolvable statically, at compile-time.
=head2 Choose better algorithms
One of the most reliable techniques for making large performance improvements regardless of language or compiler
is to pick an algorithm better suited to your needs.
A classic example is L<Boyer-Moore|https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm>.
To match a small string in a large string, one obvious way to do it is to compare the first character of the
two strings and then, if they match, compare the second characters, or, if they don't match, compare the first
character of the small string with the second character in the large string, and so on. In contrast, the
Boyer-Moore algorithm starts by comparing the *last* character of the small string with the correspondingly
positioned character in the large string. For most strings the Boyer-Moore algorithm is close to N times
faster algorithmically, where N is the length of the small string.
The next couple sections discuss two broad categories for algorithmic improvement that are especially
easy to accomplish in Perl 6. For more on this general topic, read the wikipedia page on
L<algorithmic efficiency|https://en.wikipedia.org/wiki/Algorithmic_efficiency>,
especially the See also section near the end.
=head3 Change sequential/blocking code to parallel/non-blocking
This is another very important class of algorithmic improvement.
See the slides for
L<Parallelism, Concurrency, and Asynchrony in Perl 6|http://jnthn.net/papers/2015-yapcasia-concurrency.pdf#page=17>
and/or L<the matching video|https://www.youtube.com/watch?v=JpqnNCx7wVY&list=PLRuESFRW2Fa77XObvk7-BYVFwobZHdXdK&index=8>.
=head2 Use existing high performance code
Is there an existing high (enough) performance implementation of what you're trying to speed up / slim down?
There are a lot of C libs out there.
L<NativeCall|/language/nativecall> makes it easy to create wrappers for C libs (there's experimental
support for C++ libs too) such as L<Gumbo|https://github.com/Skarsnik/perl6-gumbo>.
(Data marshalling and call handling is somewhat poorly optimized at the time of writing this but for many
applications that won't matter.)
Perl 5's compiler can be treated as a C lib. Mix in Perl 6 types, the L<MOP|/language/mop>, and some hairy
programming that someone else has done for you, and the upshot is that you can conveniently
L<use Perl 5 modules in Perl 6|http://stackoverflow.com/a/27206428/1077672>.
More generally, Perl 6 is designed for smooth interop with other languages and there are a number
of L<modules aimed at providing convenient use of libs from other langs|https://modules.perl6.org/#q=inline>.
=head2 Make the Rakudo compiler generate faster code
The focus to date (Feb 2016) regarding the compiler has been correctness, not how fast it generates code or,
more importantly, how fast or lean the code it generates runs. But that's expected to change somewhat this
year and beyond. You can talk to compiler devs on the freenode IRC channels #perl6 and #moarvm about what
to expect. Better still you can contribute yourself:
=item Rakudo is largely written in Perl 6. So if you can write Perl 6, then you can hack on the compiler,
including optimizing any of the large body of existing high-level code that impacts the speed of your code
(and everyone else's).
=item Most of the rest of the compiler is written in a small language called
L<NQP|https://github.com/perl6/nqp> that's basically a subset of Perl 6.
If you can write Perl 6 you can fairly easily learn to use and improve the mid-level NQP code too,
at least from a pure language point of view. To dig in to NQP and Rakudo's guts, start with
L<NQP and internals course|http://edumentab.github.io/rakudo-and-nqp-internals-course/>.
=item If low-level C hacking is your idea of fun, checkout L<MoarVM|http://moarvm.org> and visit the
freenode IRC channel #moarvm (L<logs|https://irclog.perlgeek.de/moarvm/>).
=head2 Still need more ideas?
There are endless performance topics.
Some known current Rakudo performance weaknesses not yet covered in this page include use of gather/take,
use of junctions, regexes, and string handling in general.
If you think some topic needs more coverage on this page please submit a PR or tell someone your idea.
Thanks. :)
=head1 Not getting the results you need/want?
If you've tried everything on this page to no avail, please consider discussing things with a
compiler dev on #perl6 so we can learn from your use-case and what you've found out about it so far.
Once you know one of the main devs knows of your plight, allow enough time for an informed response
(a few days or weeks depending on the exact nature of your problem and potential solutions).
If I<that> hasn't worked out either, please consider filing an issue discussing your experience at
L<our user experience repo|https://github.com/perl6/user-experience/issues> before moving on. Thanks. :)
=end pod