# __Tensor core documentation__
---

<font face="Consolas">The Tensor core is the basic data structure in Neutron. It holds a handle to one of two underlying data structures: a numpy array on the CPU or a Quark on the GPU. Numpy is a widely used Python package for scientific computing (it is a third-party package, not part of the standard library), and Quark is the basic element of the CUDA-accelerated computation. This document shows how the Tensor class controls both CPU and GPU data and how it supports the operations a deep learning framework needs.</font>


<font face="Consolas">The Quark data structure is defined as a struct in 'src/array.h'. It contains four fields that the back-end, which is the GPU, uses during computation. 

```Python
import ctypes


class Quark(ctypes.Structure):
    """
        C++ back-end data structure. Contains the data pointer (the numpy dtype
        has to be float32, otherwise the calculation goes wrong), the device,
        the data shape pointer and the dimension.
    """
    _fields_ = [('data', ctypes.POINTER(ctypes.c_float)),  # element buffer
                ('device', ctypes.c_int),                    # CPU or GPU
                ('shape', ctypes.POINTER(ctypes.c_int)),     # extent of each axis
                ('dim', ctypes.c_int)]                       # number of axes
```
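
To see how the four fields fit together, here is a small host-only sketch. It fills the same structure from a plain ctypes buffer instead of a numpy array (an assumption, to keep the example dependency-free), and nothing is copied to a GPU. The `CPU` and `GPU` values are hypothetical placeholders for Neutron's real device codes.

```python
import ctypes

CPU, GPU = 0, 1  # hypothetical device codes; the real values live in Neutron


class Quark(ctypes.Structure):
    _fields_ = [('data', ctypes.POINTER(ctypes.c_float)),
                ('device', ctypes.c_int),
                ('shape', ctypes.POINTER(ctypes.c_int)),
                ('dim', ctypes.c_int)]


# A 2x3 float32 buffer stored row by row, standing in for a numpy array.
values = (ctypes.c_float * 6)(0, 1, 2, 3, 4, 5)
dims = (ctypes.c_int * 2)(2, 3)

q = Quark()
q.data = ctypes.cast(values, ctypes.POINTER(ctypes.c_float))
q.device = CPU  # the buffer still lives in host memory here
q.shape = ctypes.cast(dims, ctypes.POINTER(ctypes.c_int))
q.dim = len(dims)

# Read the fields back the way the back-end would.
shape = tuple(q.shape[i] for i in range(q.dim))
last = q.data[shape[0] * shape[1] - 1]

# Why float32 matters: a double read through a float pointer changes value.
as_double = (ctypes.c_double * 1)(1.0)
misread = ctypes.cast(as_double, ctypes.POINTER(ctypes.c_float))[0]
```

The last two lines illustrate the float32 requirement: the pointer type decides how the bytes are interpreted, so a buffer with another dtype is read as different numbers.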

<font face="Consolas">The most important attribute of the Tensor class is the handle. In everyday English, a handle is something easy to grab that gives you access to whatever is attached to it. Neutron uses the same idea: through the handle you can reach the data, its shape, its number of dimensions and its device information.

```Python
class Tensor:
    """
        Python front-end data structure.
        The most important attribute is handle. The handle points to the real
        data structure. It manages the GPU data structure (Quark) and the CPU
        data structure (numpy).

        When you instantiate a Tensor, pass the following parameters:
        1. data: numpy array whose dtype is np.float32.
        2. device: CPU or GPU.
        3. require_grad: whether to compute the gradient.
    """
    __slots__ = ['handle','device','require_grad']

    def __init__(self, data, device=CPU, require_grad=False):
        self.device = device
        self.require_grad = require_grad
        self.handle = self.configureHandle(self, data, device)
```

<font face="Consolas">The shape and the data are exposed through property functions.
```Python
    @property
    def shape(self):  # get data shape as a tuple, for either handle type
        if isinstance(self.handle, ndarray):
            return self.handle.shape
        return tuple(self.handle.shape[idx] for idx in range(self.handle.dim))
    
    @property
    def data(self):  # view the Quark buffer as a numpy array
        # handle.data must point to host memory for the view to be readable
        assert self.device == GPU, "data requires a Quark handle (device == GPU)"
        return np.ctypeslib.as_array(self.handle.data, shape=self.shape)
```
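
The two properties boil down to indexing the shape pointer `dim` times and wrapping the data pointer in a numpy view. Below is a minimal host-only sketch of that idea. The local variables stand in for the Quark fields and are assumptions made for illustration; no Neutron code is involved.

```python
import ctypes
import numpy as np

# Host buffers standing in for a Quark's `data` and `shape` fields.
source = np.arange(6, dtype=np.float32).reshape(2, 3)
data_ptr = source.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
dims = (ctypes.c_int * source.ndim)(*source.shape)
shape_ptr = ctypes.cast(dims, ctypes.POINTER(ctypes.c_int))
dim = source.ndim

# Same logic as the `shape` property: index the pointer `dim` times.
shape = tuple(shape_ptr[i] for i in range(dim))

# Same call as the `data` property: a view over the pointer, not a copy.
view = np.ctypeslib.as_array(data_ptr, shape=shape)
view[0, 0] = 42.0  # writes through to `source`
```

Because the result is a view, it is only safe while the pointer refers to valid host memory.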


<font face="Consolas">

configureHandle() sets the value of self.handle. It has two modes, CPU mode and GPU mode. getNumpyHandle() simply returns the numpy array. getQuarkHandle() is a little more complex.
The routine of getQuarkHandle():
1. Get the numpy array's data, shape and dim.
2. Create a Quark and initialize it.
3. Allocate GPU memory and copy the data to it (this uses some CUDALib functions; please refer to the CUDALib documents).
4. Return the Quark data structure.

</font>

```Python
    # configure the handle attribute; declared static, so the caller passes
    # the Tensor instance explicitly as the first argument
    @staticmethod
    def configureHandle(self, data, device):
        if device == GPU:
            return self.getQuarkHandle(data)
        elif device == CPU:
            return self.getNumpyHandle(data)
        raise ValueError("device should be CPU or GPU")
    
    # get the Quark data structure handle
    @staticmethod
    def getQuarkHandle(numpy_data):
        assert isinstance(numpy_data, ndarray), "input data should be numpy array"
        assert numpy_data.dtype == np.float32, "input data should be float32"
        data = numpy_data
        arr = Quark()
        arr.data = data.ctypes.data_as(ctypes.POINTER(c_float))
        arr.device = GPU
        arr.shape = getShape(ctypes.c_int, data.shape)
        arr.dim = len(data.shape)
         
        # start to allocate and copy data to GPU
        size = CUDALib.getSize(arr.dim, arr.shape)
        dev_ptr = CUDALib.AllocateDeviceData(size)
        CUDALib.CopyDataFromTo(arr.data, dev_ptr, CPU, GPU, size)
        arr.data = dev_ptr  # the Quark now points at the device copy
        return arr
        
    # get the numpy data structure handle
    @staticmethod
    def getNumpyHandle(numpy_data):
        assert isinstance(numpy_data, ndarray), "input data should be numpy array"
        return numpy_data
```
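
If you have no GPU at hand, the upload routine can still be walked through on the host. The sketch below replaces the device with ordinary host buffers. The functions `get_size`, `allocate_device` and `copy` are hypothetical stand-ins written for this example, not the real CUDALib API, and counting the size in elements is an assumption.

```python
import ctypes

FloatPtr = ctypes.POINTER(ctypes.c_float)

_device_buffers = []  # keeps the fake "device" allocations alive


def get_size(dim, shape):
    """Number of elements: the product of the first `dim` extents."""
    size = 1
    for i in range(dim):
        size *= shape[i]
    return size


def allocate_device(size):
    """Stand-in for a device allocation: a zeroed host buffer of `size` floats."""
    buf = (ctypes.c_float * size)()
    _device_buffers.append(buf)
    return ctypes.cast(buf, FloatPtr)


def copy(src, dst, size):
    """Stand-in for a host/device copy: a plain memmove of `size` floats."""
    ctypes.memmove(dst, src, size * ctypes.sizeof(ctypes.c_float))


# Gather data, shape and dim from the host array.
host = (ctypes.c_float * 6)(1, 2, 3, 3, 4, 5)
shape = (ctypes.c_int * 2)(2, 3)
dim = 2

# Allocate the "device" buffer and copy the host data into it.
size = get_size(dim, shape)
dev_ptr = allocate_device(size)
copy(host, dev_ptr, size)

# The handle would now point at the copy, so later host edits do not reach it.
host[0] = -1.0
uploaded = [dev_ptr[i] for i in range(size)]
```

The final lines show why the pointer swap matters: once the data has been copied, the host array and the uploaded buffer are independent.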

<font face="Consolas">

It also lets users transfer data between the CPU and the GPU. A transfer reads the data, shape and dim from the current handle and rebuilds the opposite data structure.

```Python
    # transfer the data from the gpu to the cpu
    def cpu(self):
        if self.device == GPU:
            size = CUDALib.getSize(self.handle.dim, self.handle.shape)
            host_ptr = CUDALib.AllocateHostData(size)
            CUDALib.CopyDataFromTo(self.handle.data, host_ptr, GPU, CPU, size)
            # the handle stays a Quark; only its data pointer now refers to
            # host memory, and self.device is left unchanged
            self.handle.data = host_ptr
        return self
    
    # transfer the data from the cpu to the gpu
    def gpu(self):
        if self.device == CPU and isinstance(self.handle, ndarray):
            self.handle = self.getQuarkHandle(self.handle)
            self.device = GPU  # keep the device flag in sync with the new handle
        return self
```
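
Both methods return self, so transfers can be chained. The toy class below only tracks where a buffer lives and how often it was copied. `TransferSketch` is a hypothetical stand-in, and updating the device flag on every transfer is an assumption about the intended behaviour rather than a description of the methods above.

```python
CPU, GPU = 0, 1  # hypothetical device codes


class TransferSketch:
    """Stand-in for Tensor that only tracks where its buffer lives."""

    def __init__(self, values, device=CPU):
        self.values = list(values)
        self.device = device
        self.copies = 0  # counts simulated host/device copies

    def gpu(self):
        if self.device == CPU:  # copy only when the data is on the host
            self.copies += 1
            self.device = GPU
        return self  # returning self allows chaining

    def cpu(self):
        if self.device == GPU:  # copy only when the data is on the device
            self.copies += 1
            self.device = CPU
        return self


# The second gpu() call is a no-op because the data is already there.
t = TransferSketch([1.0, 2.0, 3.0]).gpu().gpu().cpu()
```

Guarding each transfer with a device check keeps repeated calls cheap, since a copy between host and device is the expensive part.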

#### __Use GPU and Tensor to compute__

<font face="Consolas">First, import the required packages.

```Python
from core import Tensor
from core import *
import numpy as np
from ctypes import *
```

<font face="Consolas">Next, create two input numpy arrays and one output numpy array. Then create three GPU tensors by passing device=GPU; the back-end allocates GPU memory and copies the numpy array data into it.

```Python
a = np.array([[1, 2, 3], [3, 4, 5]], dtype=np.float32)
b = np.array([[2, 2, 9], [1, 1, 1]], dtype=np.float32)
c = np.zeros((2, 3), dtype=np.float32)

input1 = Tensor(a, device=GPU)
input2 = Tensor(b, device=GPU)
output = Tensor(c, device=GPU)
```
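
Since the back-end reads the buffer as raw float32 values, it can help to normalize inputs before wrapping them in a Tensor. The helper below is a hypothetical convenience, not part of Neutron. It uses only numpy and assumes the back-end expects C-contiguous memory.

```python
import numpy as np


def as_float32(values):
    """Return a C-contiguous float32 array, copying only when needed."""
    return np.ascontiguousarray(values, dtype=np.float32)


ints = np.array([[1, 2, 3], [3, 4, 5]])           # default integer dtype
transposed = np.ones((3, 2), dtype=np.float32).T  # float32, not C-contiguous

a = as_float32(ints)
b = as_float32(transposed)
```

Arrays that are already float32 and C-contiguous pass through without a copy, so the helper is cheap to call on every input.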

<font face="Consolas">Now call the cudaAdd() function defined in core/_CUDA_OP.py to run the addition on the GPU.

```Python
cudaAdd(input1, input2, output)
```

<font face="Consolas">The output data is still on the GPU, so you can't print it directly. Fortunately, the Tensor class provides an easy interface for transferring GPU data to the CPU.

```Python
output.cpu()
print(output)
```

```
Tensor([[ 3.  4. 12.]
 [ 4.  5.  6.]], shape=(2, 3), dtype=Tensor.float32)
```
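
A CPU reference is a quick way to check a result like the one printed above. The sketch below recomputes the sum with numpy alone. The `gpu_result` array is typed in by hand from the printed values, since this example does not call Neutron.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5]], dtype=np.float32)
b = np.array([[2, 2, 9], [1, 1, 1]], dtype=np.float32)

# Element-wise sum on the CPU, the reference the GPU output should match.
expected = a + b

# In a real session this would be the array read back after output.cpu();
# here the printed values are copied in by hand.
gpu_result = np.array([[3, 4, 12], [4, 5, 6]], dtype=np.float32)

matches = np.allclose(gpu_result, expected)
```

The same pattern works for other operators: compute the numpy equivalent and compare it with the values copied back from the GPU.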


<font face="Consolas">Many more GPU operators are defined in core/_CUDA_OP.py. To try them, follow the same pattern as the example above.