# Step-by-step tutorials

<img alt="Logo" src="https://hikettei.github.io/cl-waffe-docs/cl-waffe.png" width="45%">

cl-waffe2 is yet another extensible tensor library and deep learning framework for Common Lisp. It provides APIs for computing, building, compiling, and optimizing various neural network models. This notebook is divided into several sections, each of which is a tutorial. For installation instructions and API references, see the [official documentation](https://hikettei.github.io/cl-waffe2/).

### Prerequisite: common-lisp-jupyter

This Jupyter notebook depends on [common-lisp-jupyter](https://github.com/yitzchak/common-lisp-jupyter) to execute cells, so you have to install it in advance (see `Readme.md` for installation instructions).

### Loading cl-waffe2

Let's get started by loading the cl-waffe2 package.

As of this writing (2023/08/29), cl-waffe2 is still under development and is therefore not yet available from Quicklisp. So the first thing you have to do is clone cl-waffe2 from [the original GitHub repository](https://github.com/hikettei/cl-waffe2.git) and load the `cl-waffe2.asd` file.

```sh
$ git clone https://github.com/hikettei/cl-waffe2.git
$ cd cl-waffe2
$ jupyter lab
```


```lisp
;; The path is relative to this notebook's directory; adjust it if needed.
(load "../../../cl-waffe2.asd")

(asdf:load-system :cl-waffe2 :silent t)
```

```
T

T
```



## Packaging and naming conventions

Because cl-waffe2 splits its features across several packages, it is highly recommended to define a playground package that uses the ones you need.

### Core System

| package | description |
| ------- | ----------- |
| :cl-waffe2/vm | cl-waffe2 VM and IR |
| :cl-waffe2/vm.nodes | macros for defining networks, such as `defnode` (AbstractNode) and `defmodel` (Composite) |
| :cl-waffe2/vm.generic-tensor | provides AbstractTensor |

### APIs

| package | description |
| ------- | ----------- |
| :cl-waffe2 | provides various utilities for configuring and building networks |
| :cl-waffe2/base-impl | provides fundamental APIs like `!add` or `!reshape` |
| :cl-waffe2/distributions | initializes tensors sampled from various distributions |
| :cl-waffe2/nn | provides standard implementations of neural networks, such as regressions and CNNs |
| :cl-waffe2/optimizers | provides standard implementations of optimizers like SGD or Adam |

### Backends

| package | description |
| ------- | ----------- |
| :cl-waffe2/backends.lisp | A backend that runs on ANSI Common Lisp |
| :cl-waffe2/backends.cpu | Basically LispTensor, but with the SIMD extension and OpenBLAS enabled |
| :cl-waffe2/backends.jit.cpu | (Experimental) A compiler from cl-waffe2 to C |
| :cl-waffe2/backends.jit.lisp | (To be deleted) A compiler from cl-waffe2 to Lisp |

```lisp
(defpackage :section-0-basic
  (:use
   :cl
   :cl-waffe2
   :cl-waffe2/base-impl
   :cl-waffe2/vm
   :cl-waffe2/vm.nodes
   :cl-waffe2/vm.generic-tensor
   :cl-waffe2/distributions))

(in-package :section-0-basic)
```

```
#<PACKAGE "SECTION-0-BASIC">

#<PACKAGE "SECTION-0-BASIC">
```

## Fundamental Data Type

In mathematics, and in linear algebra in particular, a single number is called a scalar, a one-dimensional array of numbers a vector, and a two-dimensional array a matrix:

$$ 1 $$

$$ (a_1,a_2,\dots,a_n) $$

$$
\begin{pmatrix}
1 & 2 \\
3 & 4
\end{pmatrix}
$$
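Standard Common Lisp arrays already follow the same hierarchy through their rank. The snippet below uses only plain CL (no cl-waffe2) to build a scalar, a vector, and a matrix as arrays of rank 0, 1, and 2. It is only an illustration of the terminology, not a description of how cl-waffe2 stores tensors internally.

```lisp
;; Plain Common Lisp only; no cl-waffe2 functions are used here.
;; A scalar, a vector, and a matrix as arrays of rank 0, 1, and 2.
(defparameter *scalar* (make-array '() :initial-element 1))
(defparameter *vector* (vector 1 2 3))
(defparameter *matrix* #2A((1 2)
                           (3 4)))

;; Print the rank and the dimensions of each array.
(dolist (x (list *scalar* *vector* *matrix*))
  (format t "rank ~a, dimensions ~a~%" (array-rank x) (array-dimensions x)))
```

This prints `rank 0, dimensions NIL`, `rank 1, dimensions (3)`, and `rank 2, dimensions (2 2)`. The single element of the rank-0 array is read with `(aref *scalar*)`.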

For convenience, we call all of them `Tensor`s and represent them with a CLOS class, `AbstractTensor`, which wraps them. To create a new `AbstractTensor`, use the function [make-tensor](https://hikettei.github.io/cl-waffe2/generic-tensor/#function-make-tensor). In addition, the package [:cl-waffe2/distributions](https://hikettei.github.io/cl-waffe2/distributions/) lets you create tensors sampled from various distributions. In the simplest case, a `ScalarTensor` is created like this:

```lisp
(make-tensor 1)
```

```
{SCALARTENSOR[float]
    1.0
  :facet :exist
  :requires-grad NIL
  :backward NIL}
```

```lisp
;; A 10x10 tensor filled with 1.0
(make-tensor '(10 10) :initial-element 1.0)
```

```
{CPUTENSOR[float] :shape (10 10)
  ((1.0 1.0 1.0 ~ 1.0 1.0 1.0)
   (1.0 1.0 1.0 ~ 1.0 1.0 1.0)
        ...
   (1.0 1.0 1.0 ~ 1.0 1.0 1.0)
   (1.0 1.0 1.0 ~ 1.0 1.0 1.0))
  :facet :exist
  :requires-grad NIL
  :backward NIL}
```

```lisp
;; randn samples each element from a Gaussian distribution
(randn `(10 10))
```

```
{CPUTENSOR[float] :shape (10 10)
  ((0.1371898    0.98653716   -0.41064423  ~ -2.1985002   0.64959      0.37606597)
   (-0.990578    1.8650554    1.549668     ~ 2.007281     0.2079959    1.2413626)
                 ...
   (-0.14403512  -0.13514197  -0.2724243   ~ 0.59860396   0.87737995   -0.39738473)
   (0.3234897    1.256502     -0.66197085  ~ 1.356871     0.502143     0.6284781))
  :facet :exist
  :requires-grad NIL
  :backward NIL}
```

```lisp
;; Accessing its storage
(tensor-vec (make-tensor 1))
```

```
1.0
```

```lisp
;; make-input creates an InputTensor, a lazily allocated tensor:
;; it is not allocated until it is needed (i.e. when tensor-vec is called).
;; InputTensors are used in many places: storing results, tracing networks, and more.
(make-input `(2 2) nil)
(tensor-vec (make-input `(2 2) nil))
```

```
{CPUTENSOR[float] :shape (2 2) :named ChainTMP679
  <<Not-Embodied (2 2) Tensor>>
  :facet :input
  :requires-grad NIL
  :backward NIL}

#(0.0 0.0 0.0 0.0)
```

## Digging Deeper: AbstractTensor

<div align="center">
    <img alt="Logo" src="../assets/AbstractTensor.png" width="45%">
</div>

Any data type used as `storage` (e.g. a `fixnum`, a standard Common Lisp array, or anything else) can be wrapped in an `AbstractTensor`, which provides further information:

- `:requires-grad` indicates whether the tensor requires gradients.

- `:backward` stores the computation graph, i.e. the node that produced the tensor.

- `:facet` indicates whether the tensor is already allocated: `:exist` means allocated, while `:input` means not yet allocated (e.g. created by `(make-input ...)`).

and so on.
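To make the slots above concrete, here is a toy model in plain CLOS. It is a hypothetical sketch, not cl-waffe2's actual `AbstractTensor` definition: the class `toy-tensor` and the function `toy-storage` are made-up names, and the storage is always a flat vector of zeros. It mimics the lazy allocation of `make-input`: the storage is created on first access, and the facet changes from `:input` to `:exist` at that point.

```lisp
;; Hypothetical toy model of the slots described above.
;; TOY-TENSOR and TOY-STORAGE are illustrative names, not cl-waffe2 APIs.
(defclass toy-tensor ()
  ((storage       :initarg :storage       :initform nil)
   (shape         :initarg :shape         :reader toy-shape)
   (requires-grad :initarg :requires-grad :initform nil :reader toy-requires-grad)
   (backward      :initarg :backward      :initform nil :reader toy-backward)
   (facet         :reader toy-facet)))

(defmethod initialize-instance :after ((tensor toy-tensor) &key)
  ;; Storage supplied up front means :exist; otherwise the tensor is an :input.
  (setf (slot-value tensor 'facet)
        (if (slot-value tensor 'storage) :exist :input)))

(defun toy-storage (tensor)
  "Return TENSOR's storage, allocating a flat vector of zeros on first access."
  (with-slots (storage shape facet) tensor
    (unless storage
      (setf storage (make-array (reduce #'* shape) :initial-element 0.0)
            facet   :exist))
    storage))

;; Usage: a 2x2 tensor is not allocated until its storage is requested.
(let ((x (make-instance 'toy-tensor :shape '(2 2))))
  (format t "before: ~s~%" (toy-facet x))    ; prints before: :INPUT
  (format t "storage: ~s~%" (toy-storage x)) ; prints storage: #(0.0 0.0 0.0 0.0)
  (format t "after: ~s~%" (toy-facet x)))    ; prints after: :EXIST
```

Deciding the facet from whether storage was supplied keeps the two states consistent; the real library tracks much more than this.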

Note that converting between `storage` and `AbstractTensor` is easily done with the `change-facet` function, which works like `coerce` in standard CL. Moreover, users can add support for arbitrary combinations of `storage` and `AbstractTensor` by defining [convert-tensor-facet](https://hikettei.github.io/cl-waffe2/utils/#generic-convert-tensor-facet) methods if needed.

```lisp
;; Usage
(change-facet <Target> :direction <Symbol indicating the target type>)

;; Example: AbstractTensor -> CL Array
(change-facet (make-tensor `(3 3)) :direction 'array)
```

```lisp
;; CL Array -> AbstractTensor
(change-facet #2A((1 2 3)
                  (4 5 6)
                  (7 8 9))
              :direction 'AbstractTensor)
```

```
{CPUTENSOR[int32] :shape (3 3)
  ((1 2 3)
   (4 5 6)
   (7 8 9))
  :facet :exist
  :requires-grad NIL
  :backward NIL}
```

```lisp
;; AbstractTensor -> Array
(change-facet (ax+b `(3 3) 0 1) :direction 'array)
```

```
#2A((1.0 1.0 1.0) (1.0 1.0 1.0) (1.0 1.0 1.0))
```

```lisp
;; change-facet does not copy the tensor: the storage is shared before and after the call.
;; This makes it handy for temporarily editing an AbstractTensor as a CL array.
;; The with-facet macro first calls change-facet and, in this case, binds the result to a*.

;; Fill the diagonal of a 3x3 matrix of ones with 0.0
(let ((a (ax+b `(3 3) 0 1)))
  (with-facet (a* (a :direction 'array))
    (setf (aref a* 0 0) 0.0)
    (setf (aref a* 1 1) 0.0)
    (setf (aref a* 2 2) 0.0))
  a)
```

```
{CPUTENSOR[float] :shape (3 3)
  ((0.0 1.0 1.0)
   (1.0 0.0 1.0)
   (1.0 1.0 0.0))
  :facet :exist
  :requires-grad NIL
  :backward NIL}
```
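Sharing storage instead of copying is a familiar idea in standard Common Lisp as well: a displaced array is a second array object that reads and writes the elements of another array. The plain-CL sketch below is only an analogy for the no-copy behaviour described in the comments above; it does not claim that cl-waffe2 uses displaced arrays internally.

```lisp
;; Plain Common Lisp analogy: two array objects sharing one storage.
(defparameter *flat* (make-array 9 :initial-element 1.0))

;; A 3x3 view displaced onto *FLAT*; no elements are copied.
(defparameter *view* (make-array '(3 3) :displaced-to *flat*))

;; Zero the diagonal through the 2D view, as the with-facet cell does.
(dotimes (i 3)
  (setf (aref *view* i i) 0.0))

;; The change is visible through the flat vector too (row-major order).
*flat* ; => #(0.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 0.0)
```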

## Performing operations

Now that you know how to create an `AbstractTensor`, let's move on to combining several `AbstractTensor`s and applying operations to them. The function `!add` computes the sum of two given tensors, but the function itself only returns an `InputTensor` with no allocation. This is because all operations in cl-waffe2 are **lazily evaluated** and **compiled later**.

```lisp
(!add 1 1)
```

```
{SCALARTENSOR[int32]  :named ChainTMP767
  :vec-state [maybe-not-computed]
  <<Not-Embodied (1) Tensor>>
  :facet :input
  :requires-grad NIL
  :backward <Node: SCALARANDSCALARADD-SCALARTENSOR (A[SCAL] B[SCAL] -> A[SCAL]
                                                    WHERE SCAL = 1)>}
```

Looking at its `:backward` slot, you can see that the node `ScalarAndScalarAdd-ScalarTensor` is registered as the previous node, and `:facet` is set to `:input`, indicating that the result isn't allocated yet. In order to execute the tensor, you have to compile it.
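The idea of building a graph first and computing it later can be illustrated with a toy model in plain Common Lisp. Everything here (`lazy-node`, `lazy-add`, `run-lazy`) is hypothetical and far simpler than cl-waffe2's nodes; the `computed-p` flag only loosely mirrors the `:vec-state` shown above.

```lisp
;; Hypothetical toy model of lazy evaluation; not cl-waffe2 code.
(defstruct lazy-node
  op                 ; function applied to the values of the inputs
  inputs             ; list of lazy-nodes or plain numbers
  (value nil)        ; cached result once computed
  (computed-p nil))  ; has this node been evaluated yet?

(defun lazy-add (a b)
  "Record an addition without performing it."
  (make-lazy-node :op #'+ :inputs (list a b)))

(defun run-lazy (x)
  "Evaluate X recursively, computing each node at most once."
  (if (lazy-node-p x)
      (progn
        (unless (lazy-node-computed-p x)
          (setf (lazy-node-value x)
                (apply (lazy-node-op x) (mapcar #'run-lazy (lazy-node-inputs x)))
                (lazy-node-computed-p x) t))
        (lazy-node-value x))
      x))

;; Building the graph performs no arithmetic...
(defparameter *graph* (lazy-add 1 1))
(lazy-node-computed-p *graph*) ; => NIL
;; ...until it is explicitly run.
(run-lazy *graph*)             ; => 2
(lazy-node-computed-p *graph*) ; => T
```

In this toy, `run-lazy` plays roughly the role of executing the graph; cl-waffe2 additionally compiles the graph before running it.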

### proceed

The function `proceed` (and the other `proceed-XXX` functions) executes the node with little compile-time work, like an interpreter: it evaluates the AST as it is, which makes it well suited to quick development cycles.

- `proceed`: Runs the given computation node. The result is linked to a `ProceedNode` (shown as `PROCEEDNODE-T` in the outputs below), so you can keep applying further operations to it.
- `proceed-backward`: Runs the given computation node, then runs backpropagation.
- `proceed-time`: Measures the execution time, excluding compilation time. It reports the time both with and without allocation.
- `proceed-bench`: Like the other `proceed-XXX` functions, but compiles the node into `cl-waffe2 IR`, runs it repeatedly, and reports the time of each instruction to help you find bottlenecks.

```lisp
;; Proceed
(proceed (!add 1 1))
```

```
{SCALARTENSOR[int32]  :named ChainTMP786
  :vec-state [computed]
    2
  :facet :input
  :requires-grad NIL
  :backward <Node: PROCEEDNODE-T (A[~] -> A[~])>}
```

```lisp
;; Nested proceed: the result of proceed can be used in further operations
(proceed (!sin (proceed (!sin 1))))
```

```
{SCALARTENSOR[float]  :named ChainTMP822
  :vec-state [computed]
    0.7456241
  :facet :input
  :requires-grad NIL
  :backward <Node: PROCEEDNODE-T (A[~] -> A[~])>}
```
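As a quick sanity check, the nested result above agrees with the same arithmetic done eagerly in plain Common Lisp, without cl-waffe2:

```lisp
;; Plain CL, evaluated immediately: sin(sin(1)) in single-float precision.
;; The result is approximately 0.7456241, matching the proceed output.
(sin (sin 1.0))
```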

```lisp
;; Proceed-backward
(let ((x (parameter (ax+b `(3 3) 0 2))))
  (proceed-backward
   (!sum
    (!mul x 3.0)))
  (grad x))
```

```
{CPUTENSOR[float] :shape (3 3)
  ((3.0 3.0 3.0)
   (3.0 3.0 3.0)
   (3.0 3.0 3.0))
  :facet :exist
  :requires-grad NIL
  :backward NIL}
```
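The gradient above can be checked by hand. ``(ax+b `(3 3) 0 2)`` fills every entry with $x_{ij} = 2$, and the loss is $y = \sum_{i,j} 3x_{ij}$, so for each of the nine entries

$$ \frac{\partial y}{\partial x_{ij}} = \frac{\partial}{\partial x_{ij}} \sum_{k,l} 3x_{kl} = 3 $$

independently of the value of $x_{ij}$, which matches the tensor of `3.0`s returned by `(grad x)`.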

```lisp
;; Proceed-time
(proceed-time (!mul 2.0 2.0))
```

```
{SCALARTENSOR[float]  :named ChainTMP1275
  :vec-state [computed]
    4.0
  :facet :input
  :requires-grad NIL
  :backward <Node: PROCEEDNODE-T (A[~] -> A[~])>}

Proceed-Time: With allocation time:
Evaluation took:
  0.052 seconds of real time
  0.009878 seconds of total run time (0.009537 user, 0.000341 system)
  19.23% CPU
  27 lambdas converted
  120,365,948 processor cycles
  1 page fault
  3,789,808 bytes consed

Proceed-Time: Without allocation time:
Evaluation took:
  0.000 seconds of real time
  0.000021 seconds of total run time (0.000021 user, 0.000000 system)
  100.00% CPU
  45,310 processor cycles
  0 bytes consed
```


```lisp
;; proceed-bench
(proceed-bench (!sum (ax+b `(10 10) 0 1)) :n-sample 10000)
```

```
{CPUTENSOR[float] :shape (1 1) -> :view (<(BROADCAST 1)> <(BROADCAST 1)>) -> :visible-shape (1 1) :named ChainTMP1297
  ((100.0))
  :facet :input
  :requires-grad NIL
  :backward NIL}

[Sorted by Instructions]
 Time(s)   |   Instruction ( * - Beyonds the average execution time)
0.009109   | <WfInst[Compiled: SCALARMUL-CPUTENSOR] : TID1298 <= op(TID1298(1 1) <Input>TID1300(1))>
0.004744   | <WfInst[Compiled: VIEWTENSORNODE-T]    : TID1309 <= op(TID1309(10 10) TID1298(1 1))>
0.055617*  | <WfInst[Compiled: ADDNODE-CPUTENSOR]   : TID1309 <= op(TID1309(10 10) <Input>TID1295(10 10))>
0.004947   | <WfInst[Compiled: VIEWTENSORNODE-T]    : TID1331 <= op(TID1331(1 1) TID1309(10 10))>

4 Instructions | 5 Tensors

 Total Time: 0.074417 sec

[Sorted by topK]
 Instruction                           | Total time (s) | Time/Total (n-sample=10000)
<WfInst[Compiled: ADDNODE-CPUTENSOR]   | 0.055617       | 74.73695%
<WfInst[Compiled: VIEWTENSORNODE-T]    | 0.009691       | 13.022561%
<WfInst[Compiled: SCALARMUL-CPUTENSOR] | 0.009109       | 12.240482%
```
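The `[Sorted by topK]` table appears to aggregate the per-instruction rows by node name: the two `VIEWTENSORNODE-T` rows (0.004744 s + 0.004947 s) add up to its 0.009691 s. Under that assumption, the plain-CL sketch below recomputes the table from the times printed above; `*timings*` and `top-k` are hypothetical names, not part of cl-waffe2.

```lisp
;; Per-instruction times (seconds) copied from the proceed-bench output above.
(defparameter *timings*
  '(("SCALARMUL-CPUTENSOR" . 0.009109d0)
    ("VIEWTENSORNODE-T"    . 0.004744d0)
    ("ADDNODE-CPUTENSOR"   . 0.055617d0)
    ("VIEWTENSORNODE-T"    . 0.004947d0)))

(defun top-k (timings)
  "Sum times per instruction name and sort by total, descending.
Each result row is (name total-seconds percent-of-grand-total)."
  (let ((totals (make-hash-table :test #'equal))
        (grand-total (reduce #'+ timings :key #'cdr)))
    ;; Accumulate the time of every row under its instruction name.
    (loop for (name . secs) in timings
          do (incf (gethash name totals 0) secs))
    ;; Turn the hash table into rows and sort them by total time.
    (sort (loop for name being the hash-keys of totals using (hash-value secs)
                collect (list name secs (* 100 (/ secs grand-total))))
          #'> :key #'second)))

(top-k *timings*)
```

Up to floating-point rounding, the rows come out as `ADDNODE-CPUTENSOR` (about 74.74%), `VIEWTENSORNODE-T` (about 13.02%), and `SCALARMUL-CPUTENSOR` (about 12.24%), in the same order as the table.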


While `proceed` is convenient for debugging during development, once you move on to phases such as training you can use the `build` function to create a fast network. At the cost of a little compilation overhead, `build` compiles the given node into a sequence of `cl-waffe2 IR` instructions, which are executed on the cl-waffe2 VM (`:cl-waffe2/vm`). After the first calls to `forward` and `backward`, cl-waffe2 performs no further allocations during training, so it runs much faster.

```lisp
(let ((x (parameter (ax+b `(10 10) 0 1)))
      (y (parameter (ax+b `(10 10) 0 3))))
  (let ((compiled-model (build (!sum (!mul x y)))))
    (format t "[Forward]: ~%~a~%" (forward compiled-model))
    (backward compiled-model)
    (format t "~%[X.grad]:~%~a~%[Y.grad]:~%~a~%" (grad x) (grad y))
    nil))
```

```
NIL

[Forward]:
{CPUTENSOR[float] :shape (1 1) -> :view (<(BROADCAST 1)> <(BROADCAST 1)>) -> :visible-shape (1 1) :named ChainTMP6125
  ((300.0))
  :facet :input
  :requires-grad NIL
  :backward NIL}

[X.grad]:
{CPUTENSOR[float] :shape (10 10)
  ((3.0 3.0 3.0 ~ 3.0 3.0 3.0)
   (3.0 3.0 3.0 ~ 3.0 3.0 3.0)
        ...
   (3.0 3.0 3.0 ~ 3.0 3.0 3.0)
   (3.0 3.0 3.0 ~ 3.0 3.0 3.0))
  :facet :exist
  :requires-grad NIL
  :backward NIL}
[Y.grad]:
{CPUTENSOR[float] :shape (10 10)
  ((1.0 1.0 1.0 ~ 1.0 1.0 1.0)
   (1.0 1.0 1.0 ~ 1.0 1.0 1.0)
        ...
   (1.0 1.0 1.0 ~ 1.0 1.0 1.0)
   (1.0 1.0 1.0 ~ 1.0 1.0 1.0))
  :facet :exist
  :requires-grad NIL
  :backward NIL}
```
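The printed values follow directly from the definitions. `x` is filled with 1 and `y` with 3, and the model computes $z = \sum_{i,j} x_{ij} y_{ij}$ over $10 \times 10 = 100$ entries:

$$ z = \sum_{i,j} x_{ij} y_{ij} = 100 \cdot (1 \cdot 3) = 300 $$

$$ \frac{\partial z}{\partial x_{ij}} = y_{ij} = 3, \qquad \frac{\partial z}{\partial y_{ij}} = x_{ij} = 1 $$

These agree with the `[Forward]`, `[X.grad]`, and `[Y.grad]` outputs above.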


## Changing devices

```lisp
(with-devices (&rest devices)
    body)
```

cl-waffe2 is designed so that users can easily extend or change the devices that execute its operations (AbstractNodes).

You can check the list of available backends and their status with `(show-backends)`.