digikar99/cl-cblas

cl-cblas

C2FFI / cl-autowrap based wrapper for CBLAS.

Recommended installation: OpenBLAS, which should be available through your package manager. See specs/cblas.h for the API (taken from netlib).

Unlike the FORTRAN BLAS bindings, CBLAS provides C bindings, which can be easier to work with for two reasons: (i) functions operating on matrices take a LAYOUT parameter, so both row-major and column-major matrices are supported; (ii) several high-level functions do not require WORK parameters.
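To make the LAYOUT distinction concrete, here is a minimal sketch in plain Common Lisp, with no call into CBLAS. The helpers `row-major-offset` and `col-major-offset` are hypothetical names introduced only for illustration; they are not part of cl-cblas. The sketch shows how the same flat storage is read under each layout.

```lisp
;; Hypothetical helpers (not part of cl-cblas): map a (row, col) index
;; to its offset in flat storage under each layout.
(defun row-major-offset (row col n-cols)
  (+ (* row n-cols) col))

(defun col-major-offset (row col n-rows)
  (+ (* col n-rows) row))

;; A 2x3 Lisp array; Common Lisp stores its elements in row-major order.
(defparameter *a*
  (make-array '(2 3) :element-type 'double-float
                     :initial-contents '((1d0 2d0 3d0)
                                         (4d0 5d0 6d0))))

;; Element (1, 0) read through the row-major interpretation.
(row-major-aref *a* (row-major-offset 1 0 3))   ; => 4.0d0

;; Reading the same storage as a column-major 3x2 matrix gives the
;; transpose: its element (0, 1) is the original element (1, 0).
(row-major-aref *a* (col-major-offset 0 1 3))   ; => 4.0d0
```

Since Lisp arrays are row-major, passing the row-major layout value lets a matrix routine consume them without a transposition step; a column-major-only interface would see the transpose instead.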

In addition, the cl-autowrap generated bindings expect pointer arguments. These map naturally onto displaced arrays, which both numcl and dense-numericals rely on.
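As a rough illustration of why displaced arrays pair well with pointer arguments, the sketch below follows a displacement chain down to the underlying array and accumulates the element offset. The helper `underlying-array-and-offset` is a hypothetical name, not something cl-cblas exports, and the code uses only standard Common Lisp.

```lisp
;; Hypothetical helper (not part of cl-cblas): follow the displacement
;; chain and return the innermost array plus the accumulated offset,
;; measured in elements.
(defun underlying-array-and-offset (array)
  (let ((offset 0))
    (loop
      (multiple-value-bind (target index) (array-displacement array)
        (if target
            (setf array target
                  offset (+ offset index))
            (return (values array offset)))))))

(defparameter *base*
  (make-array 10 :element-type 'double-float :initial-element 1d0))

;; A view of elements 4..8 of *BASE*; no data is copied.
(defparameter *view*
  (make-array 5 :element-type 'double-float
                :displaced-to *base*
                :displaced-index-offset 4))

(multiple-value-bind (storage offset) (underlying-array-and-offset *view*)
  (list (eq storage *base*) offset))   ; => (T 4)
```

A pointer to the view's first element would then be the storage pointer advanced by the offset times the element size (8 bytes for `double-float`). How that pointer is obtained depends on the FFI in use, so that step is left out of the sketch.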

Other Solutions

CBLAS is mainly useful for small arrays (of size 10-100), where the overhead of runtime dispatch or function calls is comparable to the cost of the computation itself. For larger arrays, the following well-established libraries should be sufficient.

clml

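clml also ships with BLAS bindings, but these can introduce a fair amount of code bloat even after inlining. The following disassembly shows this: the call through cblas compiles to 324 bytes, while the equivalent call through clml.blas compiles to 990 bytes.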

CL-USER> (declaim (inline array-storage)
                  (ftype (function (cl:array) (cl:simple-array * 1)) array-storage))
(defun array-storage (array)
  (declare (ignorable array)
           (optimize speed))
  (loop :with array := array
        :do (locally (declare #+sbcl (sb-ext:muffle-conditions sb-ext:compiler-note))
              (typecase array
                ((cl:simple-array * (*)) (return array))
                (cl:simple-array (return #+sbcl (sb-ext:array-storage-vector array)
                                         #+ccl (ccl::%array-header-data-and-offset array)
                                         #-(or sbcl ccl)
                                         (error "Don't know how to obtain ARRAY-STORAGE on ~S"
                                                (lisp-implementation-type))))
                (t (setq array (cl:array-displacement array)))))))
ARRAY-STORAGE
CL-USER> (disassemble (lambda (x)
                        (declare (optimize speed)
                                 (type (array double-float 1) x))
                        (cffi:with-pointer-to-vector-data (ptrx (array-storage x))
                          (cblas:dasum (array-total-size x) ptrx 1))))
; disassembly for (LAMBDA (X))
; Size: 324 bytes. Origin: #x53E3D304                         ; (LAMBDA (X))
; 304:       488BD6           MOV RDX, RSI
; 307:       EB42             JMP L3
; 309:       0F1F8000000000   NOP
; 310: L0:   4C8D72F1         LEA R14, [RDX-15]
; 314:       41F6C60F         TEST R14B, 15
; 318:       750B             JNE L1
; 31A:       458A36           MOV R14B, [R14]
; 31D:       4180EE81         SUB R14B, -127
; 321:       4180FE65         CMP R14B, 101
; 325: L1:   0F82E0000000     JB L15
; 32B:       80FA17           CMP DL, 23
; 32E:       0F84CF000000     JEQ L14
; 334:       488B4A19         MOV RCX, [RDX+25]
; 338:       4881F917011050   CMP RCX, #x50100117             ; NIL
; 33F:       0F85B5000000     JNE L13
; 345:       488BC1           MOV RAX, RCX
; 348: L2:   488BD0           MOV RDX, RAX
; 34B: L3:   4C8D72F1         LEA R14, [RDX-15]
; 34F:       41F6C60F         TEST R14B, 15
; 353:       750B             JNE L4
; 355:       458A36           MOV R14B, [R14]
; 358:       4180EE85         SUB R14B, -123
; 35C:       4180FE61         CMP R14B, 97
; 360: L4:   73AE             JNB L0
; 362:       488BDA           MOV RBX, RDX
; 365: L5:   448B73F1         MOV R14D, [RBX-15]
; 369:       4180EE8D         SUB R14B, -115
; 36D:       4180FE58         CMP R14B, 88
; 371:       0F8780000000     JNBE L12
; 377:       488D4B01         LEA RCX, [RBX+1]
; 37B:       448B76F1         MOV R14D, [RSI-15]
; 37F:       4180FE81         CMP R14B, -127
; 383:       7406             JEQ L6
; 385:       4180FEE9         CMP R14B, -23
; 389:       7263             JB L11
; 38B: L6:   488B4629         MOV RAX, [RSI+41]
; 38F:       48D1F8           SAR RAX, 1
; 392: L7:   4C63F0           MOVSX R14, EAX
; 395:       4939C6           CMP R14, RAX
; 398:       7551             JNE L10
; 39A:       4C8BF4           MOV R14, RSP
; 39D:       4883E4F0         AND RSP, -16
; 3A1:       488BF8           MOV RDI, RAX
; 3A4:       488BF1           MOV RSI, RCX
; 3A7:       BA01000000       MOV EDX, 1
; 3AC:       31C0             XOR EAX, EAX
; 3AE:       FF142548220050   CALL QWORD PTR [#x50002248]     ; cblas_dasum
; 3B5:       498BE6           MOV RSP, R14
; 3B8:       4D896D28         MOV [R13+40], R13               ; thread.pseudo-atomic-bits
; 3BC:       498B5570         MOV RDX, [R13+112]              ; thread.mixed-tlab
; 3C0:       4883C210         ADD RDX, 16
; 3C4:       493B5578         CMP RDX, [R13+120]
; 3C8:       7771             JNBE L17
; 3CA:       49895570         MOV [R13+112], RDX              ; thread.mixed-tlab
; 3CE:       4883C2FF         ADD RDX, -1
; 3D2: L8:   66C742F11D01     MOV WORD PTR [RDX-15], 285
; 3D8:       4D316D28         XOR [R13+40], R13               ; thread.pseudo-atomic-bits
; 3DC:       7402             JEQ L9
; 3DE:       CC09             INT3 9                          ; pending interrupt trap
; 3E0: L9:   F20F1142F9       MOVSD [RDX-7], XMM0
; 3E5:       488BE5           MOV RSP, RBP
; 3E8:       F8               CLC
; 3E9:       5D               POP RBP
; 3EA:       C3               RET
; 3EB: L10:  CC63             INT3 99                         ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 3ED:       02               BYTE #X02                       ; RAX(s)
; 3EE: L11:  488B46F9         MOV RAX, [RSI-7]
; 3F2:       48D1F8           SAR RAX, 1
; 3F5:       EB9B             JMP L7
; 3F7: L12:  CC49             INT3 73                         ; OBJECT-NOT-SIMPLE-SPECIALIZED-VECTOR-ERROR
; 3F9:       0C               BYTE #X0C                       ; RBX(d)
; 3FA: L13:  488B4209         MOV RAX, [RDX+9]
; 3FE:       E945FFFFFF       JMP L2
; 403: L14:  B817011050       MOV EAX, #x50100117             ; NIL
; 408:       CC59             INT3 89                         ; OBJECT-NOT-ARRAY-ERROR
; 40A:       00               BYTE #X00                       ; RAX(d)
; 40B: L15:  488975F8         MOV [RBP-8], RSI
; 40F:       4883EC10         SUB RSP, 16
; 413:       B902000000       MOV ECX, 2
; 418:       48892C24         MOV [RSP], RBP
; 41C:       488BEC           MOV RBP, RSP
; 41F:       B8E24E3650       MOV EAX, #x50364EE2             ; #<FDEFN ARRAY-STORAGE-VECTOR>
; 424:       FFD0             CALL RAX
; 426:       488B75F8         MOV RSI, [RBP-8]
; 42A:       488BDA           MOV RBX, RDX
; 42D:       E933FFFFFF       JMP L5
; 432: L16:  FF24256800A052   JMP QWORD PTR [#x52A00068]      ; SB-VM::ALLOC-TRAMP
; 439:       CC10             INT3 16                         ; Invalid argument count trap
; 43B: L17:  6A10             PUSH 16
; 43D:       E8F0FFFFFF       CALL L16
; 442:       5A               POP RDX
; 443:       80CA0F           OR DL, 15
; 446:       EB8A             JMP L8
NIL
CL-USER> (disassemble (lambda (x)
                        (declare (optimize speed)
                                 (type (simple-array double-float 1) x))
                        (clml.blas:dasum (array-total-size x) x 1)))
; disassembly for (LAMBDA (X))
; Size: 990 bytes. Origin: #x53BB6FE4                         ; (LAMBDA (X))
; 6FE4:       4C8B5AF9         MOV R11, [RDX-7]
; 6FE8:       498BC3           MOV RAX, R11
; 6FEB:       48D1F8           SAR RAX, 1
; 6FEE:       4C63C0           MOVSX R8, EAX
; 6FF1:       4939C0           CMP R8, RAX
; 6FF4:       0F858A030000     JNE L25
; 6FFA:       4C895DF0         MOV [RBP-16], R11
; 6FFE:       4D8BF3           MOV R14, R11
; 7001:       4C8975F8         MOV [RBP-8], R14
; 7005:       4883EC10         SUB RSP, 16
; 7009:       B902000000       MOV ECX, 2
; 700E:       48892C24         MOV [RSP], RBP
; 7012:       488BEC           MOV RBP, RSP
; 7015:       B8C2BD4750       MOV EAX, #x5047BDC2            ; #<FDEFN F2CL-LIB::FIND-ARRAY-DATA>
; 701A:       FFD0             CALL RAX
; 701C:       7208             JB L0
; 701E:       BF17011050       MOV EDI, #x50100117            ; NIL
; 7023:       488BDC           MOV RBX, RSP
; 7026: L0:   488BE3           MOV RSP, RBX
; 7029:       4C8B5DF0         MOV R11, [RBP-16]
; 702D:       4C8B75F8         MOV R14, [RBP-8]
; 7031:       488BDA           MOV RBX, RDX
; 7034:       488BF7           MOV RSI, RDI
; 7037:       4C8D43F1         LEA R8, [RBX-15]
; 703B:       41F6C00F         TEST R8B, 15
; 703F:       7504             JNE L1
; 7041:       418038D5         CMP BYTE PTR [R8], -43
; 7045: L1:   0F8536030000     JNE L24
; 704B:       4C8BC6           MOV R8, RSI
; 704E:       49D1F8           SAR R8, 1
; 7051:       4D63C8           MOVSX R9, R8D
; 7054:       4D39C1           CMP R9, R8
; 7057:       7504             JNE L2
; 7059:       40F6C601         TEST SIL, 1
; 705D: L2:   0F851B030000     JNE L23
; 7063:       31D2             XOR EDX, EDX
; 7065:       4531D2           XOR R10D, R10D
; 7068:       31FF             XOR EDI, EDI
; 706A:       660F57C9         XORPD XMM1, XMM1
; 706E:       488B1513FFFFFF   MOV RDX, [RIP-237]             ; 0.0
; 7075:       660F57C9         XORPD XMM1, XMM1
; 7079:       4D85DB           TEST R11, R11
; 707C:       0F8E28020000     JLE L9
; 7082:       48B900000000ABAAAA2A MOV RCX, 3074457347049914368
; 708C:       498BC3           MOV RAX, R11
; 708F:       48F7E1           MUL RAX, RCX
; 7092:       4883E2FE         AND RDX, -2
; 7096:       486BD206         IMUL RDX, RDX, 6
; 709A:       498BC3           MOV RAX, R11
; 709D:       4829D0           SUB RAX, RDX
; 70A0:       4C8BD0           MOV R10, RAX
; 70A3:       4585D2           TEST R10D, R10D
; 70A6:       0F8550020000     JNE L18
; 70AC: L3:   498D4202         LEA RAX, [R10+2]
; 70B0:       488BF8           MOV RDI, RAX
; 70B3:       498BC6           MOV RAX, R14
; 70B6:       4829F8           SUB RAX, RDI
; 70B9:       4883C00C         ADD RAX, 12
; 70BD:       488BC8           MOV RCX, RAX
; 70C0:       48D1F9           SAR RCX, 1
; 70C3:       4C63C1           MOVSX R8, ECX
; 70C6:       4939C8           CMP R8, RCX
; 70C9:       0F8527020000     JNE L17
; 70CF:       48D1F8           SAR RAX, 1
; 70D2:       48B900000000ABAAAA2A MOV RCX, 3074457347049914368
; 70DC:       48F7E9           IMUL RCX
; 70DF:       48D1E2           SHL RDX, 1
; 70E2:       488BC2           MOV RAX, RDX
; 70E5:       85D2             TEST EDX, EDX
; 70E7:       B900000000       MOV ECX, 0
; 70EC:       480F4FC8         CMOVNLE RCX, RAX
; 70F0:       4C8BC9           MOV R9, RCX
; 70F3:       488BD7           MOV RDX, RDI
; 70F6:       E975010000       JMP L5
; 70FB:       0F1F440000       NOP
; 7100: L4:   488D42FE         LEA RAX, [RDX-2]
; 7104:       488D3C06         LEA RDI, [RSI+RAX]
; 7108:       483B7BF9         CMP RDI, [RBX-7]
; 710C:       0F8384020000     JNB L27
; 7112:       F20F1054BB01     MOVSD XMM2, [RBX+RDI*4+1]
; 7118:       660F541580FEFFFF ANDPD XMM2, [RIP-384]          ; [#x53BB6FA0]
; 7120:       F20F58D1         ADDSD XMM2, XMM1
; 7124:       488D4202         LEA RAX, [RDX+2]
; 7128:       488BC8           MOV RCX, RAX
; 712B:       48D1F9           SAR RCX, 1
; 712E:       4C63C1           MOVSX R8, ECX
; 7131:       4939C8           CMP R8, RCX
; 7134:       0F85B6010000     JNE L16
; 713A:       4883C0FE         ADD RAX, -2
; 713E:       488D3C06         LEA RDI, [RSI+RAX]
; 7142:       483B7BF9         CMP RDI, [RBX-7]
; 7146:       0F834E020000     JNB L28
; 714C:       F20F105CBB01     MOVSD XMM3, [RBX+RDI*4+1]
; 7152:       660F541D46FEFFFF ANDPD XMM3, [RIP-442]          ; [#x53BB6FA0]
; 715A:       F20F58DA         ADDSD XMM3, XMM2
; 715E:       488D4204         LEA RAX, [RDX+4]
; 7162:       488BC8           MOV RCX, RAX
; 7165:       48D1F9           SAR RCX, 1
; 7168:       4C63C1           MOVSX R8, ECX
; 716B:       4939C8           CMP R8, RCX
; 716E:       0F8576010000     JNE L15
; 7174:       4883C0FE         ADD RAX, -2
; 7178:       488D3C06         LEA RDI, [RSI+RAX]
; 717C:       483B7BF9         CMP RDI, [RBX-7]
; 7180:       0F8318020000     JNB L29
; 7186:       F20F1064BB01     MOVSD XMM4, [RBX+RDI*4+1]
; 718C:       660F54250CFEFFFF ANDPD XMM4, [RIP-500]          ; [#x53BB6FA0]
; 7194:       F20F58E3         ADDSD XMM4, XMM3
; 7198:       488D4206         LEA RAX, [RDX+6]
; 719C:       488BC8           MOV RCX, RAX
; 719F:       48D1F9           SAR RCX, 1
; 71A2:       4C63C1           MOVSX R8, ECX
; 71A5:       4939C8           CMP R8, RCX
; 71A8:       0F8536010000     JNE L14
; 71AE:       4883C0FE         ADD RAX, -2
; 71B2:       488D3C06         LEA RDI, [RSI+RAX]
; 71B6:       483B7BF9         CMP RDI, [RBX-7]
; 71BA:       0F83E2010000     JNB L30
; 71C0:       F20F105CBB01     MOVSD XMM3, [RBX+RDI*4+1]
; 71C6:       660F541DD2FDFFFF ANDPD XMM3, [RIP-558]          ; [#x53BB6FA0]
; 71CE:       F20F58DC         ADDSD XMM3, XMM4
; 71D2:       488D4208         LEA RAX, [RDX+8]
; 71D6:       488BC8           MOV RCX, RAX
; 71D9:       48D1F9           SAR RCX, 1
; 71DC:       4C63C1           MOVSX R8, ECX
; 71DF:       4939C8           CMP R8, RCX
; 71E2:       0F85F6000000     JNE L13
; 71E8:       4883C0FE         ADD RAX, -2
; 71EC:       488D3C06         LEA RDI, [RSI+RAX]
; 71F0:       483B7BF9         CMP RDI, [RBX-7]
; 71F4:       0F83AC010000     JNB L31
; 71FA:       F20F1064BB01     MOVSD XMM4, [RBX+RDI*4+1]
; 7200:       660F542598FDFFFF ANDPD XMM4, [RIP-616]          ; [#x53BB6FA0]
; 7208:       F20F58E3         ADDSD XMM4, XMM3
; 720C:       488D420A         LEA RAX, [RDX+10]
; 7210:       488BC8           MOV RCX, RAX
; 7213:       48D1F9           SAR RCX, 1
; 7216:       4C63C1           MOVSX R8, ECX
; 7219:       4939C8           CMP R8, RCX
; 721C:       0F85B6000000     JNE L12
; 7222:       4883C0FE         ADD RAX, -2
; 7226:       488D3C06         LEA RDI, [RSI+RAX]
; 722A:       483B7BF9         CMP RDI, [RBX-7]
; 722E:       0F8376010000     JNB L32
; 7234:       F20F104CBB01     MOVSD XMM1, [RBX+RDI*4+1]
; 723A:       660F540D5EFDFFFF ANDPD XMM1, [RIP-674]          ; [#x53BB6FA0]
; 7242:       F20F58CC         ADDSD XMM1, XMM4
; 7246:       488D420C         LEA RAX, [RDX+12]
; 724A:       488BC8           MOV RCX, RAX
; 724D:       48D1F9           SAR RCX, 1
; 7250:       4C63C1           MOVSX R8, ECX
; 7253:       4939C8           CMP R8, RCX
; 7256:       757A             JNE L11
; 7258:       488BD0           MOV RDX, RAX
; 725B:       498D41FE         LEA RAX, [R9-2]
; 725F:       488BC8           MOV RCX, RAX
; 7262:       48D1F9           SAR RCX, 1
; 7265:       4C63C1           MOVSX R8, ECX
; 7268:       4939C8           CMP R8, RCX
; 726B:       755F             JNE L10
; 726D:       4C8BC8           MOV R9, RAX
; 7270: L5:   4D85C9           TEST R9, R9
; 7273:       0F8587FEFFFF     JNE L4
; 7279: L6:   4D896D28         MOV [R13+40], R13              ; thread.pseudo-atomic-bits
; 727D:       498B5570         MOV RDX, [R13+112]             ; thread.mixed-tlab
; 7281:       4883C210         ADD RDX, 16
; 7285:       493B5578         CMP RDX, [R13+120]
; 7289:       0F871F010000     JNBE L33
; 728F:       49895570         MOV [R13+112], RDX             ; thread.mixed-tlab
; 7293:       4883C2FF         ADD RDX, -1
; 7297: L7:   66C742F11D01     MOV WORD PTR [RDX-15], 285
; 729D:       4D316D28         XOR [R13+40], R13              ; thread.pseudo-atomic-bits
; 72A1:       7402             JEQ L8
; 72A3:       CC09             INT3 9                         ; pending interrupt trap
; 72A5: L8:   F20F114AF9       MOVSD [RDX-7], XMM1
; 72AA: L9:   BF17011050       MOV EDI, #x50100117            ; NIL
; 72AF:       488BF7           MOV RSI, RDI
; 72B2:       488975F0         MOV [RBP-16], RSI
; 72B6:       488D5D10         LEA RBX, [RBP+16]
; 72BA:       B908000000       MOV ECX, 8
; 72BF:       F9               STC
; 72C0:       488D65F0         LEA RSP, [RBP-16]
; 72C4:       488B6D00         MOV RBP, [RBP]
; 72C8:       FF73F8           PUSH QWORD PTR [RBX-8]
; 72CB:       C3               RET
; 72CC: L10:  48D1F8           SAR RAX, 1
; 72CF:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 72D1:       02               BYTE #X02                      ; RAX(s)
; 72D2: L11:  48D1F8           SAR RAX, 1
; 72D5:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 72D7:       02               BYTE #X02                      ; RAX(s)
; 72D8: L12:  48D1F8           SAR RAX, 1
; 72DB:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 72DD:       02               BYTE #X02                      ; RAX(s)
; 72DE: L13:  48D1F8           SAR RAX, 1
; 72E1:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 72E3:       02               BYTE #X02                      ; RAX(s)
; 72E4: L14:  48D1F8           SAR RAX, 1
; 72E7:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 72E9:       02               BYTE #X02                      ; RAX(s)
; 72EA: L15:  48D1F8           SAR RAX, 1
; 72ED:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 72EF:       02               BYTE #X02                      ; RAX(s)
; 72F0: L16:  48D1F8           SAR RAX, 1
; 72F3:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 72F5:       02               BYTE #X02                      ; RAX(s)
; 72F6: L17:  48D1F8           SAR RAX, 1
; 72F9:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 72FB:       02               BYTE #X02                      ; RAX(s)
; 72FC: L18:  4D8BCA           MOV R9, R10
; 72FF:       BA02000000       MOV EDX, 2
; 7304:       EB58             JMP L20
; 7306:       660F1F840000000000 NOP
; 730F:       90               NOP
; 7310: L19:  488D42FE         LEA RAX, [RDX-2]
; 7314:       488D3C06         LEA RDI, [RSI+RAX]
; 7318:       483B7BF9         CMP RDI, [RBX-7]
; 731C:       0F839C000000     JNB L34
; 7322:       F20F1054BB01     MOVSD XMM2, [RBX+RDI*4+1]
; 7328:       660F541570FCFFFF ANDPD XMM2, [RIP-912]          ; [#x53BB6FA0]
; 7330:       F20F58CA         ADDSD XMM1, XMM2
; 7334:       488D4202         LEA RAX, [RDX+2]
; 7338:       488BC8           MOV RCX, RAX
; 733B:       48D1F9           SAR RCX, 1
; 733E:       4C63C1           MOVSX R8, ECX
; 7341:       4939C8           CMP R8, RCX
; 7344:       7532             JNE L22
; 7346:       488BD0           MOV RDX, RAX
; 7349:       498D41FE         LEA RAX, [R9-2]
; 734D:       488BC8           MOV RCX, RAX
; 7350:       48D1F9           SAR RCX, 1
; 7353:       4C63C1           MOVSX R8, ECX
; 7356:       4939C8           CMP R8, RCX
; 7359:       7517             JNE L21
; 735B:       4C8BC8           MOV R9, RAX
; 735E: L20:  4D85C9           TEST R9, R9
; 7361:       75AD             JNE L19
; 7363:       4983FE0C         CMP R14, 12
; 7367:       0F8C0CFFFFFF     JL L6
; 736D:       E93AFDFFFF       JMP L3
; 7372: L21:  48D1F8           SAR RAX, 1
; 7375:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 7377:       02               BYTE #X02                      ; RAX(s)
; 7378: L22:  48D1F8           SAR RAX, 1
; 737B:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 737D:       02               BYTE #X02                      ; RAX(s)
; 737E: L23:  CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 7380:       18               BYTE #X18                      ; RSI(d)
; 7381: L24:  CC33             INT3 51                        ; OBJECT-NOT-SIMPLE-ARRAY-DOUBLE-FLOAT-ERROR
; 7383:       0C               BYTE #X0C                      ; RBX(d)
; 7384: L25:  498BC3           MOV RAX, R11
; 7387:       48D1F8           SAR RAX, 1
; 738A:       CC63             INT3 99                        ; OBJECT-NOT-SIGNED-BYTE-32-ERROR
; 738C:       02               BYTE #X02                      ; RAX(s)
; 738D: L26:  FF24256800A052   JMP QWORD PTR [#x52A00068]     ; SB-VM::ALLOC-TRAMP
; 7394:       CC10             INT3 16                        ; Invalid argument count trap
; 7396: L27:  CC24             INT3 36                        ; INVALID-VECTOR-INDEX-ERROR
; 7398:       0C               BYTE #X0C                      ; RBX(d)
; 7399:       1D               BYTE #X1D                      ; RDI(a)
; 739A: L28:  CC24             INT3 36                        ; INVALID-VECTOR-INDEX-ERROR
; 739C:       0C               BYTE #X0C                      ; RBX(d)
; 739D:       1D               BYTE #X1D                      ; RDI(a)
; 739E: L29:  CC24             INT3 36                        ; INVALID-VECTOR-INDEX-ERROR
; 73A0:       0C               BYTE #X0C                      ; RBX(d)
; 73A1:       1D               BYTE #X1D                      ; RDI(a)
; 73A2: L30:  CC24             INT3 36                        ; INVALID-VECTOR-INDEX-ERROR
; 73A4:       0C               BYTE #X0C                      ; RBX(d)
; 73A5:       1D               BYTE #X1D                      ; RDI(a)
; 73A6: L31:  CC24             INT3 36                        ; INVALID-VECTOR-INDEX-ERROR
; 73A8:       0C               BYTE #X0C                      ; RBX(d)
; 73A9:       1D               BYTE #X1D                      ; RDI(a)
; 73AA: L32:  CC24             INT3 36                        ; INVALID-VECTOR-INDEX-ERROR
; 73AC:       0C               BYTE #X0C                      ; RBX(d)
; 73AD:       1D               BYTE #X1D                      ; RDI(a)
; 73AE: L33:  6A10             PUSH 16
; 73B0:       E8D8FFFFFF       CALL L26
; 73B5:       5A               POP RDX
; 73B6:       80CA0F           OR DL, 15
; 73B9:       E9D9FEFFFF       JMP L7
; 73BE: L34:  CC24             INT3 36                        ; INVALID-VECTOR-INDEX-ERROR
; 73C0:       0C               BYTE #X0C                      ; RBX(d)
; 73C1:       1D               BYTE #X1D                      ; RDI(a)
NIL

gsl

The GNU Scientific Library is another alternative to (C)BLAS, but its functions operate on its own data types, which introduces overhead when translating Lisp arrays to the GSL-native wrappers.

gsll

GSLL also ships with a BLAS wrapper, but (i) its functions are generic functions, and (ii) even if one uses static-dispatch, the wrappers are built around grid:foreign-array, which introduces a level of indirection.

magicl

magicl ships with BLAS and LAPACK bindings, but these are FORTRAN bindings. In addition, the high-level bindings that magicl generates through the magicl/ext-blas and magicl/ext-lapack systems assume that the arguments are undisplaced simple-arrays.
