forked from dmitryvk/sbcl-win32-threads
/
OPTIMIZATIONS
205 lines (179 loc) · 6.76 KB
/
OPTIMIZATIONS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
#1
(defun mysl (s)
(declare (simple-string s))
(declare (optimize (speed 3) (safety 0) (debug 0)))
(let ((c 0))
(declare (fixnum c))
(dotimes (i (length s))
(when (eql (aref s i) #\1)
(incf c)))
c))
* On X86 I is represented as a tagged integer.
* Unnecessary move:
3: SLOT S!11[EDX] {SB-C::VECTOR-LENGTH 1 7} => t23[EAX]
4: MOVE t23[EAX] => t24[EBX]
--------------------------------------------------------------------------------
#2
(defun quux (v)
(declare (optimize (speed 3) (safety 0) (space 2) (debug 0)))
(declare (type (simple-array double-float 1) v))
(let ((s 0d0))
(declare (type double-float s))
(dotimes (i (length v))
(setq s (+ s (aref v i))))
s))
* Python does not combine + with AREF, so generates extra move and
allocates a register.
* On X86 Python thinks that all FP registers are directly accessible
and emits costy MOVE ... => FR1.
--------------------------------------------------------------------------------
#3
(defun bar (n)
(declare (optimize (speed 3) (safety 0) (space 2))
(type fixnum n))
(let ((v (make-list n)))
(setq v (make-array n))
(length v)))
* IR1 does not optimize away (MAKE-LIST N).
--------------------------------------------------------------------------------
#4
(defun bar (v1 v2)
(declare (optimize (speed 3) (safety 0) (space 2))
(type (simple-array base-char 1) v1 v2))
(dotimes (i (length v1))
(setf (aref v2 i) (aref v1 i))))
VOP DATA-VECTOR-SET/SIMPLE-STRING V2!14[EDI] t32[EAX] t30[S2]>t33[CL]
=> t34[S2]<t35[AL]
MOV #<TN t33[CL]>, #<TN t30[S2]>
MOV BYTE PTR [EDI+EAX+1], #<TN t33[CL]>
MOV #<TN t35[AL]>, #<TN t33[CL]>
MOV #<TN t34[S2]>, #<TN t35[AL]>
* The value of DATA-VECTOR-SET is not used, so there is no need in the
last two moves.
* And why two moves?
--------------------------------------------------------------------------------
#8
(defun foo (d)
(declare (optimize (speed 3) (safety 0) (debug 0)))
(declare (type (double-float 0d0 1d0) d))
(loop for i fixnum from 1 to 5
for x1 double-float = (sin d) ;;; !!!
do (loop for j fixnum from 1 to 4
sum x1 double-float)))
Without the marked declaration Python will use boxed representation for X1.
This is equivalent to
(let ((x nil))
(setq x 0d0)
;; use of X as DOUBLE-FLOAT
)
The initial binding is effectless, and without it X is of type
DOUBLE-FLOAT. Unhopefully, IR1 does not optimize away effectless
SETs/bindings, and IR2 does not perform type inference.
--------------------------------------------------------------------------------
#9 "Multi-path constant folding"
(defun foo (x)
(if (= (cond ((irgh x) 0)
((buh x) 1)
(t 2))
0)
:yes
:no))
This code could be optimized to
(defun foo (x)
(cond ((irgh x) :yes)
((buh x) :no)
(t :no)))
--------------------------------------------------------------------------------
#11
(inverted variant of #9)
(lambda (x)
(let ((y (sap-alien x c-string)))
(list (alien-sap y)
(alien-sap y))))
It could be optimized to
(lambda (x) (list x x))
(if Y were used only once, the current compiler would optimize it)
--------------------------------------------------------------------------------
#12
(typep (truly-the (simple-array * (*)) x) 'simple-vector)
tests lowtag.
--------------------------------------------------------------------------------
#13
FAST-+/FIXNUM and similar should accept unboxed arguments in interests
of representation selection. Problem: inter-TN dependencies.
--------------------------------------------------------------------------------
#14
The derived type of (/ (THE (DOUBLE-FLOAT (0D0)) X) (THE (DOUBLE-FLOAT
1D0) Y)) is (DOUBLE-FLOAT 0.0d0). While it might be reasonable, it is
better to derive (OR (MEMBER 0.0d0) (DOUBLE-FLOAT (0.0d0))).
--------------------------------------------------------------------------------
#15
On the alpha, the system is reluctant to refer directly to a constant bignum,
preferring to load a large constant through a slow sequence of instructions,
then cons up a bignum for it:
(LAMBDA (A)
(DECLARE (OPTIMIZE (SAFETY 1) (SPEED 3) (DEBUG 1))
(TYPE (INTEGER -10000 10000) A)
(IGNORABLE A))
(CASE A
((89 125 16) (ASH A (MIN 18 -706)))
(T (DPB -3 (BYTE 30 30) -1))))
--------------------------------------------------------------------------------
#16
(do ((i 0 (1+ i)))
((= i (the (integer 0 100) n)))
...)
It is commonly expected for Python to derive (FIXNUMP I). (If ``='' is
replaced with ``>='', Python will do.)
--------------------------------------------------------------------------------
#17
Type tests for (ARRAY BIT), (ARRAY T) and similar go through full
%TYPEP, even though it is relatively simple to establish the arrayness
of an object and also to obtain the element type of an array. As of
sbcl-0.8.12.30, this affects at least DUMP-OBJECT through
COMPOUND-OBJECT-P, and (LABELS MAYBE-EMIT-MAKE-LOAD-FORMS GROVEL)
through TYPEP UNBOXED-ARRAY, within the compiler itself.
--------------------------------------------------------------------------------
#18
(lambda (x) (declare (null x)) (sxhash x)) goes through SYMBOL-HASH
rather than either constant-folding or manipulating NIL-VALUE or
NULL-TN directly.
--------------------------------------------------------------------------------
#19
(let ((dx (if (foo)
(list x)
(list y z))))
(declare (dynamic-extent dx))
...)
DX is not allocated on stack.
--------------------------------------------------------------------------------
#20
(defun-with-dx foo (x)
(flet ((make (x)
(let ((l (list nil nil)))
(setf (first l) x)
(setf (second l) (1- x))
l)))
(let ((l (make x)))
(declare (dynamic-extent l))
(mapc #'print l))))
Result of MAKE is not stack allocated, which means that
stack-allocation of structures is impossible.
--------------------------------------------------------------------------------
#21
(defun-with-dx foo ()
(let ((dx (list (list 1 2) (list 3 4)
(declare (dynamic-extent dx))
...)))))
External list in DX is allocated on stack, but internal are not.
--------------------------------------------------------------------------------
#22
IR2 does not perform unused code flushing.
--------------------------------------------------------------------------------
#23
Python does not know that &REST lists are LISTs (and cannot derive it).
--------------------------------------------------------------------------------
#24
a. Iterations on &REST lists, returning them as VALUES could be
rewritten with &MORE vectors.
b. Implement local unknown-values mv-call (useful for fast type checking).