#### Testing combinations with starts and stops as indices

This is intended to be used with the oamap version (I know we don't intend to use it in the end, but let's go with it for now).

##### Idea for calculation

Using the kernel structure from test_combinations, we can generate the combinations arrays for every event.

For that, each event is assigned a block (i.e. each block in the grid serves as one event), and the combinations are calculated for each block using `starts[block_index]` and the event length.
These are appended to the `left` and `right` arrays.
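
To have something to compare the GPU result against, the idea above can be sketched on the CPU with NumPy alone. This is a minimal reference under the assumption that every ordered pair within an event is wanted (including self-pairs, which matches `lengths*lengths` pairs per event); the function name `combinations_reference` and the small input arrays are hypothetical.

```python
import numpy

def combinations_reference(starts, lengths):
    """CPU reference: for each event, emit every ordered (i, j) pair of its element indices."""
    left_parts = [numpy.empty(0, dtype=numpy.int32)]
    right_parts = [numpy.empty(0, dtype=numpy.int32)]
    for start, n in zip(starts, lengths):
        n = int(n)
        idx = numpy.arange(int(start), int(start) + n, dtype=numpy.int32)
        left_parts.append(numpy.repeat(idx, n))   # row index, varies slowly
        right_parts.append(numpy.tile(idx, n))    # column index, varies quickly
    return numpy.concatenate(left_parts), numpy.concatenate(right_parts)

# Hypothetical input: three events, the last one empty.
starts = numpy.array([3, 6, 8])
lengths = numpy.array([3, 2, 0])
left_ref, right_ref = combinations_reference(starts, lengths)
print(left_ref)
print(right_ref)
```

The row-major layout (`left` from the row, `right` from the column) mirrors the `thread_idx * n + thread_idy` indexing that a one-block-per-event kernel would use.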

In [1]:
import pycuda.autoinit
import pycuda
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule
import numpy

In [2]:
# The generation step. Let's form a random integer array, from which we will form
# the starts and stops arrays
base_len = 32
base_arr = numpy.random.randint(4, size=base_len)

In [3]:
cumul_arr = numpy.cumsum(base_arr)

In [4]:
start = cumul_arr[:-1]
stop = cumul_arr[1:]
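
Note that slicing `cumul_arr[:-1]` and `cumul_arr[1:]` yields `base_len - 1` events, one fewer than the number of entries in `base_arr`. If one event per entry is wanted (for example, to match a grid of `base_len` blocks), a leading zero can be prepended to the cumulative sum. A small sketch with hypothetical counts; the name `offsets` is an assumption, not something used elsewhere in this notebook:

```python
import numpy

base_arr = numpy.array([3, 3, 2, 2, 3])  # hypothetical per-event counts

# offsets[i] is where event i begins; offsets[i + 1] is where it ends.
offsets = numpy.zeros(len(base_arr) + 1, dtype=numpy.int32)
offsets[1:] = numpy.cumsum(base_arr)

start = offsets[:-1]
stop = offsets[1:]
print(start, stop, stop - start)
```

Building the arrays this way also fixes their dtype to `int32`, which is what a kernel taking `int*` arguments would expect.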

In [5]:
start[:5]

array([ 3,  6,  8, 10, 13])

In [6]:
stop[:5]

array([ 6,  8, 10, 13, 14])

In [7]:
lengths = stop-start
cumul_lengths = numpy.cumsum(lengths)
pairs_lengths = lengths*lengths
pairs_lengths = pairs_lengths.astype(numpy.int32)
lengths_arr = numpy.array([base_len]).astype(numpy.int32)
lengths = lengths.astype(numpy.int32)

In [8]:
lengths

array([3, 2, 2, 3, 1, 2, 2, 3, 0, 2, 3, 0, 3, 3, 1, 3, 2, 2, 2, 3, 1, 2, 3,
       3, 3, 1, 3, 3, 0, 3, 0], dtype=int32)

In [9]:
start

array([ 3,  6,  8, 10, 13, 14, 16, 18, 21, 21, 23, 26, 26, 29, 32, 33, 36,
       38, 40, 42, 45, 46, 48, 51, 54, 57, 58, 61, 64, 64, 67])

##### Error 

The `left` and `right` arrays aren't being filled as expected: apart from a few tiny values at the start, they remain zeros. I can't figure out why this is happening.

In [10]:
# Now let's form the cuda function

mod = SourceModule('''
__global__ void comb_events(int* left,int* right,int* start,int* lengths,int* pairs_lengths)
{
    unsigned int block_id = blockIdx.x;
    unsigned int thread_idx = threadIdx.x + block_id*gridDim.x*blockDim.x;
    unsigned int thread_idy = threadIdx.y + block_id*blockDim.x*gridDim.x;
    if (block_id <lengths[0] && thread_idx<pairs_lengths[block_id] && thread_idy<pairs_lengths[block_id])
    {
        /* The actual code:
        left[block_id*lengths[0]+thread_idx*pairs_lengths[block_id]+thread_idy] = thread_idx + start[block_id];
        right[block_id*lengths[0]+thread_idx*pairs_lengths[block_id]+thread_idy] = thread_idy + start[block_id];
        */
        // TEST Code: we ought to get the thread indices in *left. But we get all zeros instead. Why??
        left[block_id*lengths[0] + thread_idx*pairs_lengths[block_id] + thread_idy] = thread_idx;
    }
}
''')
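
One possible reading of the kernel above: adding `block_id*gridDim.x*blockDim.x` to `threadIdx.x` turns the index into a large global value for every block except block 0 (with 32 blocks of width 4, block 1 already starts at 128), so the bounds check would reject all threads outside the first block. The sketch below emulates, in plain Python, what a one-block-per-event launch might look like with block-local thread indices and a per-event output offset. It is an assumption about the intended behaviour, not the author's kernel; `emulate_comb_events` and `offsets` are hypothetical names.

```python
import numpy

def emulate_comb_events(starts, lengths, block_dim=(4, 4)):
    """Pure-Python emulation of a one-block-per-event launch with block-local indices."""
    lengths = numpy.asarray(lengths, dtype=numpy.int32)
    pairs = lengths * lengths
    # Exclusive cumulative sum: where each event's pairs begin in the output.
    offsets = numpy.concatenate(([0], numpy.cumsum(pairs)[:-1])).astype(numpy.int32)
    left = numpy.zeros(int(pairs.sum()), dtype=numpy.int32)
    right = numpy.zeros_like(left)
    for block_id in range(len(lengths)):       # plays the role of blockIdx.x
        n = int(lengths[block_id])
        for tx in range(block_dim[0]):         # threadIdx.x, local to the block
            for ty in range(block_dim[1]):     # threadIdx.y, local to the block
                if tx < n and ty < n:          # guard against threads beyond the event
                    out = int(offsets[block_id]) + tx * n + ty
                    left[out] = starts[block_id] + tx
                    right[out] = starts[block_id] + ty
    return left, right

# Hypothetical input: two events of lengths 3 and 2.
left_emu, right_emu = emulate_comb_events([3, 6], [3, 2])
print(left_emu)
print(right_emu)
```

The emulation only covers events no longer than the block dimensions, which holds here because the lengths come from `randint(4)` and the block is 4 by 4.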

In [11]:
func = mod.get_function("comb_events")

In [12]:
left = numpy.zeros(sum(pairs_lengths).astype(numpy.int32))
right = numpy.zeros(sum(pairs_lengths).astype(numpy.int32))
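
One thing worth checking at this step: `numpy.zeros` defaults to `float64`, and the `.astype(numpy.int32)` here is applied to the size, not to the array. If a kernel then writes 32-bit integers into that buffer, reading it back as `float64` would show tiny denormal numbers rather than the integers written. The following NumPy-only sketch (no GPU involved; `buf` and the written values are hypothetical) illustrates the effect by reinterpreting the same memory as `int32`:

```python
import numpy

# numpy.zeros defaults to float64; casting the *size* to int32 does not change that.
buf = numpy.zeros(numpy.int32(5))
assert buf.dtype == numpy.float64

# Emulate code that treats the same memory as int* and writes small integers.
as_int32 = buf.view(numpy.int32)
as_int32[:9] = [0, 0, 0, 1, 1, 1, 2, 2, 2]

print(buf)  # tiny denormal floats, not 0, 1, 2

# Allocating with an explicit dtype keeps the host and device views consistent.
left = numpy.zeros(5, dtype=numpy.int32)
```

The same consideration may apply to `start`, whose printed output carries no `dtype=int32` suffix and so is likely the platform default integer type rather than a 32-bit one.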

In [13]:
func(cuda.InOut(left), cuda.InOut(right), cuda.In(start), cuda.In(lengths_arr),
    cuda.In(lengths), block=(4, 4, 1), grid=(base_len, 1))

In [14]:
left

array([  0.00000000e+000,   2.12199579e-314,   2.12199579e-314,
         4.24399158e-314,   9.88131292e-324,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+