#### Testing combinations with starts and stops as indices

This notebook is intended to be used with the oamap version (we don't intend to use it in the end, but let's go with it for now).

##### Idea for calculation

Using the kernel structure from test_combinations, we can generate the combinations arrays for every event.

For that, each event is assigned a block (i.e. each block in the grid serves as an event), and the combinations are calculated for each block using `starts[block_index]` and the event length.
The results are stored in the `left` and `right` arrays.
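To make the target concrete, here is a minimal CPU sketch of what `left` and `right` are expected to contain, assuming "combinations" means all ordered pairs within an event (which is what `pairs_lengths = lengths*lengths` later in the notebook suggests). The function name `reference_combinations` and the toy input are hypothetical; the result could serve as a reference to compare the GPU output against.

```python
import numpy

def reference_combinations(starts, lengths):
    """All ordered pairs (i, j) of global item indices within each event, on the CPU."""
    left, right = [], []
    for s, n in zip(starts, lengths):
        idx = numpy.arange(s, s + n)        # global indices of this event's items
        left.append(numpy.repeat(idx, n))   # i varies slowly: 0,0,..,1,1,..
        right.append(numpy.tile(idx, n))    # j varies quickly: 0,1,..,0,1,..
    return (numpy.concatenate(left).astype(numpy.int32),
            numpy.concatenate(right).astype(numpy.int32))

# Hypothetical toy input: three events with 2, 0 and 3 items
starts = numpy.array([0, 2, 2], dtype=numpy.int32)
lengths = numpy.array([2, 0, 3], dtype=numpy.int32)
left, right = reference_combinations(starts, lengths)
print(left)
print(right)
```

Each event contributes `n*n` entries, so the output length equals the sum of the squared event lengths.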

In [1]:
import pycuda.autoinit
import pycuda
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule
import numpy

In [2]:
# The generation step. Let's form a random integer array, from which we will form the starts
# and stops arrays
base_len = 32
base_arr = numpy.random.randint(4, size=base_len)

In [3]:
cumul_arr = numpy.cumsum(base_arr)

In [4]:
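# Cast to int32 so the arrays match the `int*` kernel parameters on every platform
start = cumul_arr[:-1].astype(numpy.int32)
stop = cumul_arr[1:].astype(numpy.int32)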

In [5]:
start[:5]

array([0, 0, 2, 5, 8], dtype=int32)

In [6]:
stop[:5]

array([ 0,  2,  5,  8, 10], dtype=int32)
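Note that slicing the cumulative sum this way yields `base_len - 1` events: the first entry of `base_arr` never becomes an event, because `cumul_arr[0]` already includes it. If every entry should become an event, one common construction prepends a zero before slicing. The sketch below uses a hypothetical `counts` array in place of `base_arr`.

```python
import numpy

counts = numpy.array([2, 0, 3, 1], dtype=numpy.int32)   # hypothetical items per event

# offsets[k] is the index of the first item of event k; offsets[-1] is the total item count
offsets = numpy.concatenate(([0], numpy.cumsum(counts))).astype(numpy.int32)
starts = offsets[:-1]
stops = offsets[1:]

print(starts, stops)
```

With this layout `stops - starts` reproduces `counts` exactly, and there is one start/stop pair per entry.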

In [7]:
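lengths = (stop - start).astype(numpy.int32)
cumul_lengths = numpy.cumsum(lengths)
# Number of ordered pairs per event
pairs_lengths = (lengths * lengths).astype(numpy.int32)
# Number of events, passed to the kernel as a one-element array.
# There are base_len - 1 events, because start and stop each drop one entry of cumul_arr.
lengths_arr = numpy.array([len(lengths)]).astype(numpy.int32)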

In [8]:
lengths

array([0, 2, 3, 3, 2, 3, 1, 2, 2, 1, 3, 3, 0, 0, 1, 1, 1, 3, 2, 2, 3, 2, 2,
       2, 1, 3, 3, 0, 0, 0, 0])

In [9]:
start

array([ 0,  0,  2,  5,  8, 10, 13, 14, 16, 18, 19, 22, 25, 25, 25, 26, 27,
       28, 31, 33, 35, 38, 40, 42, 44, 45, 48, 51, 51, 51, 51], dtype=int32)

##### Error 

The `left` and `right` arrays are not changing at all: they remain all zeros after the kernel runs. I can't figure out why this is happening.

In [32]:
# Now let's form the cuda function

mod = SourceModule('''
__global__ void comb_events(int* left,int* right,int* start,int* length,int* lengths, int* pairs_lengths)
{
    unsigned int block_id = blockIdx.x;
    unsigned int thread_idx = threadIdx.x + blockIdx.x*blockDim.x;
    unsigned int thread_idy = threadIdx.y + blockIdx.y*blockDim.y;
    if (block_id <length[0] && thread_idx<lengths[block_id] && thread_idy<lengths[block_id])
    {
        //left[block_id*lengths[0]+thread_idx*pairs_lengths[block_id]+thread_idy] = thread_idx + start[block_id];
        //right[block_id*lengths[0]+thread_idx*pairs_lengths[block_id]+thread_idy] = thread_idy + start[block_id];
        
        // At this step, we ought to get start[block_id] in *left. But we get all zeros instead. Why??
        left[block_id*pairs_lengths[block_id] + thread_idx*lengths[block_id] + thread_idy] = start[block_id];
    }
}
''')


In [33]:
func = mod.get_function("comb_events")

In [34]:
left = numpy.zeros(sum(pairs_lengths).astype(numpy.int32))
right = numpy.zeros(sum(pairs_lengths).astype(numpy.int32))
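One thing worth checking here, independently of whether the kernel writes anything: `numpy.zeros` defaults to `float64`, while the kernel declares `left` and `right` as `int*`. The CPU-only sketch below mimics what a 32-bit integer write into a `float64` buffer looks like from the NumPy side. It is an illustration of the dtype mismatch, not a claim about the exact bytes the GPU would produce.

```python
import numpy

buf = numpy.zeros(4)                 # default dtype is float64, 8 bytes per element
as_int = buf.view(numpy.int32)       # the same bytes, seen as 32-bit integers
as_int[0] = 5                        # a 32-bit integer write, as an `int*` kernel would do

print(buf.dtype, buf[0])             # float64 and a tiny subnormal number, not 5.0

# Allocating with the matching dtype avoids the reinterpretation
left_int = numpy.zeros(4, dtype=numpy.int32)
left_int[0] = 5
print(left_int.dtype, left_int[0])
```

Passing `dtype=numpy.int32` to `numpy.zeros` would keep the host buffers consistent with the kernel signature.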

In [35]:
# One block per event: there are len(lengths) events
func(cuda.InOut(left), cuda.InOut(right), cuda.In(start), cuda.In(lengths_arr),
    cuda.In(lengths),cuda.In(pairs_lengths), block=(4, 4, 1), grid=(len(lengths), 1))
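One plausible explanation for the missing writes can be checked without a GPU by replaying the kernel's index arithmetic on the CPU. With one block per event and `block=(4, 4, 1)`, the kernel computes `thread_idx = threadIdx.x + blockIdx.x*blockDim.x`, so every block after the first has `thread_idx >= 4`, while the event lengths here are at most 3. The helper `threads_passing_guard` is a hypothetical name; the sketch only counts how many threads would satisfy the `if` condition.

```python
import numpy

def threads_passing_guard(lengths, block_dim=4):
    """Count (block, thread) combinations that satisfy the kernel's `if` condition."""
    passing = 0
    for block_id in range(len(lengths)):                  # one block per event
        for tx in range(block_dim):                       # threadIdx.x
            for ty in range(block_dim):                   # threadIdx.y
                thread_idx = tx + block_id * block_dim    # as computed in the kernel
                thread_idy = ty                           # blockIdx.y is always 0 here
                if thread_idx < lengths[block_id] and thread_idy < lengths[block_id]:
                    passing += 1
    return passing

# First few event lengths from the notebook output above
lengths_head = numpy.array([0, 2, 3, 3, 2, 3, 1, 2], dtype=numpy.int32)
print(threads_passing_guard(lengths_head))   # -> 0
```

Under these assumptions only block 0 could ever pass the guard, and in this run its event has length 0, so no thread writes to `left`.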

In [36]:
# Why all zeros?
left

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.])
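If the guard in `comb_events` is indeed what blocks the writes, a revised indexing scheme might look like the CPU emulation below. It is a sketch under three assumptions: the thread indices are taken locally (`threadIdx.x` and `threadIdx.y` without the `blockIdx` term), each event writes at an offset given by the exclusive prefix sum of the per-event pair counts (a hypothetical extra `offsets` array, since `block_id*pairs_lengths[block_id]` is not a running total), and the output buffers are `int32`. The function name `emulate_fixed_kernel` and the toy input are also hypothetical.

```python
import numpy

def emulate_fixed_kernel(starts, lengths, block_dim=4):
    """CPU emulation of a revised indexing scheme, with one block per event."""
    pairs = lengths.astype(numpy.int64) ** 2
    # offsets[k]: position in left/right where event k's pairs begin
    offsets = numpy.concatenate(([0], numpy.cumsum(pairs)[:-1]))
    left = numpy.zeros(pairs.sum(), dtype=numpy.int32)
    right = numpy.zeros(pairs.sum(), dtype=numpy.int32)
    for block_id in range(len(lengths)):
        n = int(lengths[block_id])
        for tx in range(block_dim):          # local threadIdx.x, no blockIdx term
            for ty in range(block_dim):      # local threadIdx.y
                if tx < n and ty < n:
                    out = offsets[block_id] + tx * n + ty
                    left[out] = starts[block_id] + tx
                    right[out] = starts[block_id] + ty
    return left, right

starts_toy = numpy.array([0, 2, 2], dtype=numpy.int32)    # hypothetical toy events
lengths_toy = numpy.array([2, 0, 3], dtype=numpy.int32)
left_toy, right_toy = emulate_fixed_kernel(starts_toy, lengths_toy)
print(left_toy)
print(right_toy)
```

This layout only covers events with at most `block_dim` items per axis; longer events would need a larger block or a loop inside each thread.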