# Parent generation from starts and stops arrays

Based on Oamap's parent generation function. It assigns to each particle the index of the event it belongs to, and stores the result in the `pointers` array.

The Python code is:
```python
def parent(starts, stops, pointers):
    for i in range(len(starts)):
        pointers[starts[i]:stops[i]] = i
```
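Here is a small worked example of the loop above on hypothetical data: three events holding 2, 0 and 3 particles. The empty middle event writes nothing, so index `1` never appears in the output. If the events are stored contiguously (`starts[0] == 0` and `starts[i+1] == stops[i]`), `numpy.repeat` gives the same array without a Python loop. The vectorized form is only an alternative shown for comparison. It is not part of the original function.

```python
import numpy

def parent(starts, stops, pointers):
    # Write the event index i into every slot belonging to event i
    for i in range(len(starts)):
        pointers[starts[i]:stops[i]] = i

# Hypothetical toy data: three events with 2, 0 and 3 particles
starts = numpy.array([0, 2, 2], dtype=numpy.int32)
stops = numpy.array([2, 2, 5], dtype=numpy.int32)

pointers = numpy.empty(stops[-1], dtype=numpy.int32)
parent(starts, stops, pointers)
print(pointers.tolist())  # [0, 0, 2, 2, 2]

# Vectorized alternative (assumes contiguous events starting at 0):
# repeat each event index once per particle in that event
vectorized = numpy.repeat(numpy.arange(len(starts), dtype=numpy.int32), stops - starts)
print(numpy.array_equal(pointers, vectorized))  # True
```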


```python
import numpy
import pycuda.autoinit              # creates a CUDA context on the default device
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
```

```python
# Generate random data, as usual
NUMEVENTS = 320   # Number of events to simulate
AVENUMJETS = 32   # Average number of jets per event

numjets = numpy.random.poisson(AVENUMJETS, NUMEVENTS).astype(numpy.int32)  # Number of jets in each event
jets_stops = numpy.cumsum(numjets).astype(numpy.int32)                     # Stops array (exclusive end of each event)
jets_starts = numpy.zeros_like(jets_stops)                                 # Starts array (first index of each event)
jets_starts[1:] = jets_stops[:-1]
```
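The starts/stops construction above can be checked on a tiny, hypothetical `numjets`. `stops` is the running total. `starts` is `stops` shifted right by one with a leading zero. `stops - starts` gives back the per-event counts, which is the same difference the notebook later computes on the GPU as `gpu_counts`.

```python
import numpy

# Hypothetical per-event jet counts (one event is empty)
numjets = numpy.array([3, 1, 0, 2], dtype=numpy.int32)

stops = numpy.cumsum(numjets).astype(numpy.int32)   # [3 4 4 6]
starts = numpy.zeros_like(stops)
starts[1:] = stops[:-1]                             # [0 3 4 4]

# Recover the counts from the two offset arrays
counts = stops - starts                             # [3 1 0 2]
print(starts.tolist(), stops.tolist(), counts.tolist())
```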

```python
# Allocate pointer arrays to hold the results:
# one for the sequential version and one for CUDA, to check against each other
pointer_seq = numpy.empty(jets_stops[-1], dtype=numpy.int32)
pointer_cuda = numpy.empty(jets_stops[-1], dtype=numpy.int32)
```
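`numpy.empty` leaves the buffers uninitialized, so a slot the kernel never writes could still hold a value that happens to look valid. One option, which is an assumption and not something the original notebook does, is to fill the output with a sentinel such as `-1` before the run. Any unwritten slot then shows up in a simple check. The sketch below uses the sequential `parent` and hypothetical toy data.

```python
import numpy

def parent(starts, stops, pointers):
    for i in range(len(starts)):
        pointers[starts[i]:stops[i]] = i

starts = numpy.array([0, 2, 2], dtype=numpy.int32)
stops = numpy.array([2, 2, 5], dtype=numpy.int32)

# Sentinel -1 marks "not yet written"; valid event indices are >= 0
pointers = numpy.full(stops[-1], -1, dtype=numpy.int32)
parent(starts, stops, pointers)
all_written = bool((pointers >= 0).all())

# Simulate a faulty run that skips the last event
partial = numpy.full(stops[-1], -1, dtype=numpy.int32)
parent(starts[:-1], stops[:-1], partial)
partial_written = bool((partial >= 0).all())
print(all_written, partial_written)  # True False
```

In the notebook, the equivalent change would be to create `pointer_cuda` with `numpy.full(jets_stops[-1], -1, dtype=numpy.int32)` before calling `gpuarray.to_gpu`.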

```python
# Sequential evaluation
def parent(starts, stops, pointers):
    for i in range(len(starts)):
        pointers[starts[i]:stops[i]] = i
parent(jets_starts, jets_stops, pointer_seq)
```

```python
# Copy the arrays to the GPU
gpu_starts = gpuarray.to_gpu(jets_starts)
gpu_stops = gpuarray.to_gpu(jets_stops)
gpu_pointer = gpuarray.to_gpu(pointer_cuda)

# Compute the counts array (number of jets per event) on the GPU
gpu_counts = gpu_stops - gpu_starts
```

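#### Idea behind the calculation

The approach borrows the idea from combinations (a Cartesian product). The kernel launches a 2D grid of threads. Index `i` runs over all events, and index `j` runs over the positions inside event `i`, that is `0 <= j < counts[i]`. This range corresponds to `starts[i]:stops[i]` shifted by `starts[i]`.

For each `(i, j)` pair, `starts[i] + j` is the offset in `pointer` of the `j`-th particle of event `i`, so the thread simply stores the event index `i` at that location. Threads with `i >= NUMEVENTS` or `j >= counts[i]` do nothing.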
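To make the `(i, j)` mapping concrete, here is a plain-Python emulation of the 2D launch. `emulate_parent_kernel` is a hypothetical helper, not part of PyCUDA. Its nested loops play the roles of the block and thread coordinates, and each simulated thread applies the same bounds checks as the kernel. The grid and block sizes are small illustrative values. A real GPU runs these threads in parallel and in no particular order. That is safe here because every in-range `(i, j)` writes a distinct slot.

```python
import numpy

def emulate_parent_kernel(starts, counts, pointer, numevents, block, grid):
    # Hypothetical CPU stand-in for the CUDA launch: visit every simulated thread
    for bx in range(grid[0]):
        for by in range(grid[1]):
            for tx in range(block[0]):
                for ty in range(block[1]):
                    i = bx * block[0] + tx  # event index (x dimension)
                    j = by * block[1] + ty  # position inside the event (y dimension)
                    # Same guards as the kernel; `and` short-circuits, so counts[i] is only read for valid i
                    if i < numevents and j < counts[i]:
                        pointer[starts[i] + j] = i

starts = numpy.array([0, 2, 2], dtype=numpy.int32)
counts = numpy.array([2, 0, 3], dtype=numpy.int32)

# 2x2 threads per block and 2x2 blocks: covers 4 events and 4 positions per event
pointer = numpy.full(int(counts.sum()), -1, dtype=numpy.int32)
emulate_parent_kernel(starts, counts, pointer, 3, block=(2, 2), grid=(2, 2))
print(pointer.tolist())  # [0, 0, 2, 2, 2]

# Too few threads in y (only j = 0, 1): the third particle of event 2 is never written
short = numpy.full(int(counts.sum()), -1, dtype=numpy.int32)
emulate_parent_kernel(starts, counts, short, 3, block=(2, 2), grid=(2, 1))
print(short.tolist())  # [0, 0, 2, 2, -1]
```

The second call shows why the `y` extent of the grid must cover the largest event. Threads past `counts[i]` are harmless, but missing threads leave slots unwritten.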

The CUDA kernel:

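```python
# i indexes events (x dimension), j indexes positions inside an event (y dimension)
mod = SourceModule('''
__global__ void parent(int* starts, int* pointer, int* NUMEVENTS, int* counts)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;   // event index
    int j = blockIdx.y*blockDim.y + threadIdx.y;   // position within event i
    // Skip threads outside the event range or past the end of event i
    if (i < NUMEVENTS[0] && j < counts[i])
    {
        pointer[starts[i] + j] = i;
    }
}
''')
```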
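The 2D launch needs a `y` extent at least as large as the biggest event, so many threads sit idle for smaller events. A different formulation, which is not the one used in this notebook, gives each particle its own thread and lets that thread find its event by binary search over `stops`. Assuming the events are contiguous (`starts[0] == 0`, `starts[i+1] == stops[i]`), the parent of particle `k` is the number of `stops` entries that are `<= k`. The sketch below computes this on the CPU with `numpy.searchsorted`, which performs the same search each such thread would do on its own. `parent_by_search` is a hypothetical name.

```python
import numpy

def parent_by_search(stops):
    # Hypothetical helper: one "thread" per particle k = 0 .. stops[-1]-1
    particles = numpy.arange(stops[-1])
    # side="right" returns the count of stops <= k, i.e. the event containing k;
    # it also skips past empty events whose stop equals k
    return numpy.searchsorted(stops, particles, side="right").astype(numpy.int32)

stops = numpy.array([2, 2, 5], dtype=numpy.int32)
print(parent_by_search(stops).tolist())  # [0, 0, 2, 2, 2]
```

This trades the wasted `y` threads for an `O(log NUMEVENTS)` search per particle. Which approach is faster on a given GPU would need measuring.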

```python
# Fetch a handle to the compiled kernel
func = mod.get_function("parent")
```

```python
# Additional GPU data needed: a one-element array the kernel reads as NUMEVENTS[0]
arr_numevents = numpy.array([NUMEVENTS], dtype=numpy.int32)
```

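```python
# Evaluate: one thread per (event, position) pair, 32x32 threads per block.
# Size the grid so that x covers all events and y covers the largest event.
block = (32, 32, 1)
max_count = max(1, int(numjets.max()))
grid = ((NUMEVENTS + block[0] - 1) // block[0],
        (max_count + block[1] - 1) // block[1],
        1)
func(gpu_starts, gpu_pointer, cuda.In(arr_numevents), gpu_counts, block=block, grid=grid)
```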

```python
# Compare with the sequential result
# First copy the data back to the host
host_pointer_data = gpu_pointer.get()
# Element-wise comparison; raises AssertionError if the arrays differ
assert numpy.array_equal(host_pointer_data, pointer_seq)
```
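A check of the form `a.all() == b.all()` only compares two booleans, each saying whether its array is entirely non-zero, so it cannot detect a mismatch. As soon as both arrays contain a zero (for example from event `0`), both sides are `False` and the assertion passes for any pair of arrays. An element-wise comparison such as `numpy.array_equal` is needed. The arrays below are hypothetical.

```python
import numpy

expected = numpy.array([0, 0, 1, 1, 2], dtype=numpy.int32)
wrong = numpy.array([0, 0, 2, 2, 2], dtype=numpy.int32)

# .all() reduces each array to a single bool: both are False because of the zeros
reduced_match = bool(expected.all() == wrong.all())
# Element-wise comparison catches the difference
exact_match = numpy.array_equal(expected, wrong)
print(reduced_match, exact_match)  # True False
```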

```python
# Print some values
for i in range(6):
    print("\nEvent: {} \n Pointer: {}".format(i, host_pointer_data[jets_starts[i]:jets_stops[i]]))
```


Event: 0 
 Pointer: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

Event: 1 
 Pointer: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1]

Event: 2 
 Pointer: [2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

Event: 3 
 Pointer: [3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3]

Event: 4 
 Pointer: [4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4]

Event: 5 
 Pointer: [5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5]
