# Example: Using Chyqmom4
This example shows how to use Chyqmom4. Chyqmom9 and Chyqmom27 follow a very similar process.

## Input Data
To begin, we need to create an input data object that contains the moments. The ```Chyqmom4``` class takes a very specific data format as input: a 2D ```numpy.ndarray```. For the sake of the GPUs, the array must also be contiguous in memory, and its values must be of type ```numpy.float32```.

The simplest way to ensure that an array satisfies these prerequisites is to use PyCUDA's ```aligned_empty``` function to create an empty numpy array that is contiguous in memory. Alternatively, ```numpy.ndarray``` provides a simple way to check whether an existing array is already contiguous.

**NOTE**
It is important for the GPUs that the array is contiguous in C style (row-major: the values of each row are adjacent in memory).

In [17]:
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit

arr_size = (3, 3)
array = cuda.aligned_empty(arr_size, dtype=np.float32)

print("Is array continuous in memory? ", array.data.contiguous)          # True
print("Is array continuous in C style? ",array.data.c_contiguous)        # True
print("Is array continuous in Fortran style? ",array.data.f_contiguous)  # False

Is array continuous in memory?  True
Is array continuous in C style?  True
Is array continuous in Fortran style?  False
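
If the moments already live in an ordinary NumPy array, the same properties can be checked through the array's `flags` attribute, and `np.ascontiguousarray` can produce a copy that satisfies them. The following is a minimal sketch using only NumPy. The helper name `to_gpu_ready` is hypothetical and not part of qbmmlib, and it only addresses the dimension, layout, and dtype requirements stated above, not any extra alignment that `aligned_empty` may provide.

```python
import numpy as np

def to_gpu_ready(arr):
    """Return a 2D, C-contiguous float32 array (hypothetical helper, not in qbmmlib)."""
    arr = np.asarray(arr)
    if arr.ndim != 2:
        raise ValueError("expected a 2D array, got {} dimension(s)".format(arr.ndim))
    # ascontiguousarray copies only when the dtype or memory layout must change
    return np.ascontiguousarray(arr, dtype=np.float32)

# A Fortran-ordered float64 array violates both the layout and dtype requirements
raw = np.asfortranarray(np.arange(12, dtype=np.float64).reshape(3, 4))
print(raw.flags["C_CONTIGUOUS"], raw.dtype)      # False float64

ready = to_gpu_ready(raw)
print(ready.flags["C_CONTIGUOUS"], ready.dtype)  # True float32
```

The values are unchanged by the conversion; only the memory layout and element type differ.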


Helper functions are provided in ```qbmmlib.gpu.util``` for creating a dummy input moment array for examples and debugging. For Chyqmom4, the one we want is ```init_moment_6()```.

In [18]:
import qbmmlib.gpu.util as util

num_moments = int(1e4)
dummy_moment = util.init_moment_6(num_moments)

## Initialize a Chyqmom4 instance
The Chyqmom4 class is located in the ```qbmmlib.gpu.chyqmom4``` module. To create an instance, we provide two parameters: the number of GPU devices we wish to use, and the number of inputs we expect. Note that we do not actually pass in the data during initialization.

In [19]:
from qbmmlib.gpu.chyqmom4 import Chyqmom4

num_device = 1
C = Chyqmom4(num_device, num_moments)

Now that an instance of the class has been initialized without error, we can set the input moments. The size of the moment array must match the value we specified when initializing the instance.

In [20]:
C.set_args(dummy_moment)

That is all there is to the initialization! The input moments can be changed any number of times, as long as the size matches. However, a new instance must be created if you want to change either the number of GPUs used or the number of moments in the expected input.

## Running Chyqmom4
Once the class is initialized and its input argument is set, it is ready for execution. Here we set up a simple timer to measure its performance.

In [21]:
import time

start_time = time.perf_counter()    
# Run chyqmom4 on the specified input 
res = C.run()
stop_time = time.perf_counter()

run_time = (stop_time - start_time) * 1e3 # ms
print("[Chyqmom4] input moment size: {} \nN GPU: {}, N Stream: {} \ntime (ms): {:.4f}".format(
    C.in_size, C.num_device, C.num_stream, run_time))

[Chyqmom4] input moment size: 10000 
N GPU: 1, N Stream: 1 
time (ms): 1.0899
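
A single timed call may include one-off costs from the first execution, so a measurement taken after a warm-up call and repeated several times is often more stable. The following is a generic sketch of that pattern. `time_best_of` is a hypothetical helper, not part of qbmmlib, and a stand-in function replaces `C.run` so that the block runs without a GPU. Whether `run()` blocks until the GPU has finished is not stated in this example, so wall-clock timings should be treated as approximate.

```python
import time

def time_best_of(fn, repeats=5, warmup=1):
    """Return the shortest wall-clock time of fn() in milliseconds (hypothetical helper)."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        elapsed = (time.perf_counter() - start) * 1e3  # ms
        best = min(best, elapsed)
    return best

# Stand-in workload so the sketch runs anywhere; with the instance from
# this example, one would pass C.run instead.
def stand_in():
    return sum(i * i for i in range(10000))

print("best of 5 (ms): {:.4f}".format(time_best_of(stand_in)))
```

Taking the minimum rather than the mean reduces the influence of unrelated system activity on the reported time.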


The output weights and abscissas are stored in three members of the class: the weights in ```w_chunk_host```, and the (x, y) abscissas in ```x_chunk_host``` and ```y_chunk_host```, respectively. Note that these members will contain zeros if they are accessed before the ```run()``` method is called.

In [22]:
print("Weight: ")
print(C.w_chunk_host)

Weight: 
[[array([[0.25, 0.25, 0.25, ..., 0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25, ..., 0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25, ..., 0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25, ..., 0.25, 0.25, 0.25]], dtype=float32)]]
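
The printed value is a nested list holding a single array with four rows. Assuming, from this output alone, that the members are two-level nested lists of arrays with shape (4, n) and that the chunks cover consecutive inputs, they could be merged into one array as sketched below. `merge_chunks` is a hypothetical helper, not part of qbmmlib, and the block uses mock data rather than real results.

```python
import numpy as np

def merge_chunks(chunks):
    """Concatenate a two-level nested list of (rows, n) arrays along axis 1.

    Hypothetical helper; the nesting and chunk order are assumptions.
    """
    flat = [np.asarray(a) for outer in chunks for a in outer]
    return np.concatenate(flat, axis=1)

# Mock data shaped like the output above: one outer entry, one inner entry
mock_w = [[np.full((4, 10), 0.25, dtype=np.float32)]]
w = merge_chunks(mock_w)
print(w.shape)        # (4, 10)
print(w.sum(axis=0))  # each column of these mock weights sums to 1.0
```

With the instance from this example, the call would be `merge_chunks(C.w_chunk_host)`, under the same assumptions about the layout.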
