Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Fetching contributors…

Cannot retrieve contributors at this time

180 lines (154 sloc) 5.824 kb
#1
(defun mysl (s)
(declare (simple-string s))
(declare (optimize (speed 3) (safety 0) (debug 0)))
(let ((c 0))
(declare (fixnum c))
(dotimes (i (length s))
(when (eql (aref s i) #\1)
(incf c)))
c))
* On X86 I is represented as a tagged integer.
* Unnecessary move:
3: SLOT S!11[EDX] {SB-C::VECTOR-LENGTH 1 7} => t23[EAX]
4: MOVE t23[EAX] => t24[EBX]
--------------------------------------------------------------------------------
#2
(defun quux (v)
(declare (optimize (speed 3) (safety 0) (space 2) (debug 0)))
(declare (type (simple-array double-float 1) v))
(let ((s 0d0))
(declare (type double-float s))
(dotimes (i (length v))
(setq s (+ s (aref v i))))
s))
* Python does not combine + with AREF, so generates extra move and
allocates a register.
* On X86 Python thinks that all FP registers are directly accessible
and emits costy MOVE ... => FR1.
--------------------------------------------------------------------------------
#3
(defun bar (n)
(declare (optimize (speed 3) (safety 0) (space 2))
(type fixnum n))
(let ((v (make-list n)))
(setq v (make-array n))
(length v)))
* IR1 does not optimize away (MAKE-LIST N).
--------------------------------------------------------------------------------
#4
(defun bar (v1 v2)
(declare (optimize (speed 3) (safety 0) (space 2))
(type (simple-array base-char 1) v1 v2))
(dotimes (i (length v1))
(setf (aref v2 i) (aref v1 i))))
VOP DATA-VECTOR-SET/SIMPLE-STRING V2!14[EDI] t32[EAX] t30[S2]>t33[CL]
=> t34[S2]<t35[AL]
MOV #<TN t33[CL]>, #<TN t30[S2]>
MOV BYTE PTR [EDI+EAX+1], #<TN t33[CL]>
MOV #<TN t35[AL]>, #<TN t33[CL]>
MOV #<TN t34[S2]>, #<TN t35[AL]>
* The value of DATA-VECTOR-SET is not used, so there is no need in the
last two moves.
* And why two moves?
--------------------------------------------------------------------------------
#6
09:49:05 <jtra> I have found a case in those where suboptimal code is
generate with nested loops, it might be moderately easy to fix that
09:49:28 <jtra> see
http://www.bagley.org/~doug/shootout/bench/nestedloop/nestedloop.cmucl
09:50:30 <jtra> if you add declarations to dotimes, generated code is
almost optimal, but most inner loops run out of registers and use
memory location for iteration variable
;;; -*- mode: lisp -*-
;;; http://www.bagley.org/~doug/shootout/
;;; from Friedrich Dominicus
(defun main ()
(let ((n (parse-integer (or (car (last extensions:*command-line-strings*)) "1")))
(x 0))
(declare (fixnum n)
(fixnum x)
(optimize (speed 3) (debug 0) (safety 0)))
(dotimes (a n)
(dotimes (b n)
(dotimes (c n)
(dotimes (d n)
(dotimes (e n)
(dotimes (f n)
(incf x)))))))
(format t "~A~%" x)))
--------------------------------------------------------------------------------
#8
(defun foo (d)
(declare (optimize (speed 3) (safety 0) (debug 0)))
(declare (type (double-float 0d0 1d0) d))
(loop for i fixnum from 1 to 5
for x1 double-float = (sin d) ;;; !!!
do (loop for j fixnum from 1 to 4
sum x1 double-float)))
Without the marked declaration Python will use boxed representation for X1.
This is equivalent to
(let ((x nil))
(setq x 0d0)
;; use of X as DOUBLE-FLOAT
)
The initial binding is effectless, and without it X is of type
DOUBLE-FLOAT. Unhopefully, IR1 does not optimize away effectless
SETs/bindings, and IR2 does not perform type inference.
--------------------------------------------------------------------------------
#9 "Multi-path constant folding"
(defun foo (x)
(if (= (cond ((irgh x) 0)
((buh x) 1)
(t 2))
0)
:yes
:no))
This code could be optimized to
(defun foo (x)
(cond ((irgh x) :yes)
((buh x) :no)
(t :no)))
--------------------------------------------------------------------------------
#11
(inverted variant of #9)
(lambda (x)
(let ((y (sap-alien x c-string)))
(list (alien-sap y)
(alien-sap y))))
It could be optimized to
(lambda (x) (list x x))
(if Y were used only once, the current compiler would optimize it)
--------------------------------------------------------------------------------
#12
(typep (truly-the (simple-array * (*)) x) 'simple-vector)
tests lowtag.
--------------------------------------------------------------------------------
#13
FAST-+/FIXNUM and similar should accept unboxed arguments in interests
of representation selection. Problem: inter-TN dependencies.
--------------------------------------------------------------------------------
#14
The derived type of (/ (THE (DOUBLE-FLOAT (0D0)) X) (THE (DOUBLE-FLOAT
1D0) Y)) is (DOUBLE-FLOAT 0.0d0). While it might be reasonable, it is
better to derive (OR (MEMBER 0.0d0) (DOUBLE-FLOAT (0.0d0))).
--------------------------------------------------------------------------------
#15
On the alpha, the system is reluctant to refer directly to a constant bignum,
preferring to load a large constant through a slow sequence of instructions,
then cons up a bignum for it:
(LAMBDA (A)
(DECLARE (OPTIMIZE (SAFETY 1) (SPEED 3) (DEBUG 1))
(TYPE (INTEGER -10000 10000) A)
(IGNORABLE A))
(CASE A
((89 125 16) (ASH A (MIN 18 -706)))
(T (DPB -3 (BYTE 30 30) -1))))
--------------------------------------------------------------------------------
#16
(do ((i 0 (1+ i)))
((= i (the (integer 0 100) n)))
...)
It is commonly expected for Python to derive (FIXNUMP I). (If ``='' is
replaced with ``>='', Python will do.)
--------------------------------------------------------------------------------
Jump to Line
Something went wrong with that request. Please try again.