forked from gnoack/fn
PERFORMANCE
A little profiling with time ./repl -t.
Sunday:
13.645 real -- unoptimized
13.363 real -- when methods are stored in dictionaries
13.295 real -- when in compilation, labels are stored in dictionaries
13.183 real -- after removing maps.fn
13.194 real -- after switching to open addressing.. (hah!)
2.196 real -- after switching to dframes for frames and dict for global env.
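For reference, a minimal sketch of what "open addressing" means here: a fixed-size table with linear probing, where colliding keys go into the next free slot instead of a per-bucket chain. The names (dict_t, dict_put, dict_get), the table size and the hash function are all hypothetical and not taken from the interpreter's source. As the numbers above show, this change did not noticeably move the benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 16  /* power of two, so wrapping is a cheap bit mask */

typedef struct {
    const char *key;   /* NULL marks an empty slot */
    int value;
} slot_t;

typedef struct {
    slot_t slots[TABLE_SIZE];
    size_t used;
} dict_t;

/* djb2-style string hash (an arbitrary choice for this sketch). */
static size_t hash_str(const char *s) {
    size_t h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static void dict_init(dict_t *d) {
    for (size_t i = 0; i < TABLE_SIZE; i++) d->slots[i].key = NULL;
    d->used = 0;
}

/* Insert or update. Returns 1 on success, 0 if the table is full. */
static int dict_put(dict_t *d, const char *key, int value) {
    size_t i = hash_str(key) & (TABLE_SIZE - 1);
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        slot_t *s = &d->slots[i];
        if (s->key == NULL) {              /* free slot: claim it */
            s->key = key;
            s->value = value;
            d->used++;
            return 1;
        }
        if (strcmp(s->key, key) == 0) {    /* existing key: overwrite */
            s->value = value;
            return 1;
        }
        i = (i + 1) & (TABLE_SIZE - 1);    /* linear probe to next slot */
    }
    return 0;
}

/* Look up a key. Returns 1 and stores the value in *out if found. */
static int dict_get(const dict_t *d, const char *key, int *out) {
    size_t i = hash_str(key) & (TABLE_SIZE - 1);
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        const slot_t *s = &d->slots[i];
        if (s->key == NULL) return 0;      /* empty slot ends the probe */
        if (strcmp(s->key, key) == 0) {
            *out = s->value;
            return 1;
        }
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}
```

The sketch has no deletion and no resizing; a real dictionary would need both.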
Monday:
1.217 real -- after switching to dictionaries for macro expanders and
deleting a couple of unused modules (and tests)
1.069 real -- after destructuring dynamic lambda lists in an optimized way.
The above measurements are all without gcc optimizations. With -O1,
the tests run well under 2 seconds, but with -O2 the interpreter
segfaults. Maybe I should git bisect on that to see how I introduced
it.
[ Cheated: 0.400 real -- after fixing the -O2 issue. ]
0.790 real -- after fixing value_eq to be a lot quicker.
Oct 13, we're back to 1.012 -- why?
A little more profiling with
(dolist (fn meeeh) (with-timer (compile-fn fn)))
Compile time for: (make-assembler, make-byte-stream, assemble)
Oct 17, with push and pop: (579, 521, 375)
Oct 17, without push after reads: (516, 454, 327)
Oct 17, without push after loads: (495, 434, 304)
Oct 17, without pop/push around writes: (495, 430, 301) -- little improvement.
Oct 17, without push after calls: (465, 403, 281)
Oct 17, without push after make-lambda: (451, 393, 273)
Oct 17, without pop before jump-if-true: (439, 370, 270)
Oct 17, with inlined compile-sequence: (415, 346, 253)
Oct 17, without pop before return: (406, 339, 247)
Oct 17, after random refactorings: (350, 337, 244) -- make-assembler got shorter.
Oct 17, after quick-calling simple procs: (292, 279, 202) -- NB (105, 99, 72) with gcc -O3 :)
Oct 17, after disabling interpreter debug mode: (286, 273, 199) -- Oops.
Oct 17, after flagging out unneeded checks: (263, 252, 183)
NB -- when disabling the checks in mem_get, mem_set, you go directly to (149, 140, 102)
-- we are essentially wasting (114, 112, 81) on bounds checks.
-- these checks are there for safety, it will be tricky to get rid of them.
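To illustrate the trade-off, here is a sketch of a bounds-checked accessor next to an unchecked macro that a build flag can switch back to the checked path. The names mem_get, mem_set, MEM_GET and MEM_SET appear in these notes, but the block layout, the word_t type and the SAFE_MEMORY flag below are assumptions made up for this example, not the interpreter's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef long word_t;          /* hypothetical cell type */

typedef struct {
    size_t size;              /* number of valid cells */
    word_t cells[8];
} block_t;                    /* hypothetical memory block layout */

/* Checked read: pays for an index comparison on every access. */
static word_t mem_get(const block_t *b, size_t index) {
    if (index >= b->size) {
        fprintf(stderr, "mem_get: index %zu out of bounds\n", index);
        abort();
    }
    return b->cells[index];
}

/* Checked write, same idea. */
static void mem_set(block_t *b, size_t index, word_t value) {
    if (index >= b->size) {
        fprintf(stderr, "mem_set: index %zu out of bounds\n", index);
        abort();
    }
    b->cells[index] = value;
}

/* SAFE_MEMORY is a made-up flag: define it to route the macros
 * through the checked functions, leave it undefined for raw access. */
#ifdef SAFE_MEMORY
#define MEM_GET(b, i)    mem_get((b), (i))
#define MEM_SET(b, i, v) mem_set((b), (i), (v))
#else
#define MEM_GET(b, i)    ((b)->cells[(i)])
#define MEM_SET(b, i, v) ((b)->cells[(i)] = (v))
#endif
```

The unchecked macros are only safe at call sites where the index is already known to be valid, which is why removing the checks everywhere is tricky.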
To be continued...
Performance measured by:
(load-file "examples/grammar-utils.fn")
(with-timer (load-grammar! "examples/lisp.g"))
2013-06-05
With retptrs on stack: 2612-2689ms
With interpreter_state_t as cache of frame: 2712-2830ms
After flattening out code tuples in procedures: 2168-2197ms
After using MEM_GET and MEM_SET macros in some places: 1738-1806ms
After using MEM_GET and MEM_SET macros in even more places: 1742-1773ms
After rearranging fields of compiled-procedure: 1735-1792ms (no improvement...)
After using even more MEM_GETs and MEM_SETs: 1640-1718
After using MEM_GET instead of mem_get in is_cons(): 1440-1471
After using native memcpy, memcmp: 1250-1273
2013-09-14
After switching to per-frame stacks: 1543-1545
2013-11-20
Before switching to “char* argv, int argc”-style native functions: 1319-1325
After switching to “char* argv, int argc”-style native functions: 1088-1099
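A rough sketch of the argv/argc idea: every native function gets the same signature, a pointer to its arguments plus a count, so the interpreter can pass a pointer straight into its value stack without copying or per-arity dispatch. The previous calling convention is not described in these notes, and value_t, native_fn, native_add and call_native are hypothetical names invented for this example.

```c
#include <assert.h>

typedef long value_t;   /* hypothetical stand-in for an interpreter value */

/* One uniform signature for all native functions. */
typedef value_t (*native_fn)(value_t *argv, int argc);

/* Example native function: sums however many arguments it receives. */
static value_t native_add(value_t *argv, int argc) {
    value_t sum = 0;
    for (int i = 0; i < argc; i++) sum += argv[i];
    return sum;
}

/* The arguments are assumed to be the top argc entries of the value
 * stack, so the call just passes a pointer into the stack. */
static value_t call_native(native_fn fn, value_t *stack, int stack_top,
                           int argc) {
    return fn(&stack[stack_top - argc], argc);
}
```

With a stack holding 10, 1, 2, 3 and stack_top at 4, call_native(native_add, stack, 4, 3) sums only the top three entries.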
2014-11-02 - 2014-11-04 (measurement on ARM, so quite a bit slower :))
Measured by loading Lisp grammar: ./fn -S examples/performance.fn
Before switching to <DefinedVar/UndefinedVar>: 2978-3047 (2014-11-02)
After switching to <DefinedVar/UndefinedVar>: 2092-2150 (2014-11-04)
Neat. :) This is cutting roughly a third of the time again. :)
PS: With -O3, this becomes 1182-1331
2014-12-10 (ARM)
Measured by loading Lisp grammar: ./fn -S examples/performance.fn
After introducing various structs that allow skipping runtime CHECKs,
and after introducing the C-based bytecode compiler: 1795-1939
2014-12-10 (ARM)
Loading Lisp grammars now uses the C-based bytecode compiler;
./fn -S examples/performance.fn is now at 899-921
2014-12-14 (ARM)
./fn -S examples/performance.fn is now at 936-983
after introducing a bytecode for mem-get and mem-set:
./fn -S examples/performance.fn is now at 898-918
2015-01-03 (ARM)
Various changes; in particular: Removed nested lambda-lists,
and made varargs calls not rely on lambda-list destructuring any more.
./fn -S examples/performance.fn is now at 903-933