# Reusing image memory versus re-allocation
This notebook demonstrates how reusing image memory can make processing with pyclesperanto faster than allocating a new output image for every operation.

Let's start by allocating an "image stack" of 100 x 1024 x 1024 voxels, which is about 100 million values:

In [1]:
import pyclesperanto_prototype as cle
import numpy as np
import time

# config
num_iterations = 10
num_tests = 10

# generate data: about 100 million float64 voxels (800 MiB in NumPy)
image = np.random.random([100, 1024, 1024])
print("Image size: " + str(image.shape))

# push image to GPU memory
flip = cle.push_zyx(image)
# create output image in GPU memory of the same size
flop = cle.create_like(flip)

Image size: (100, 1024, 1024)
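
To put a number on the size of this stack, the following minimal sketch computes its footprint from the shape alone. `np.random.random` returns 64-bit floats, so the host array takes 8 bytes per voxel. The 4 bytes per voxel used for the GPU copy is an assumption (that the image is stored as 32-bit floats after pushing); the helper `size_in_mib` is a hypothetical name introduced here for illustration.

```python
import math

# Shape of the image stack generated above
shape = (100, 1024, 1024)
num_voxels = math.prod(shape)

BYTES_FLOAT64 = 8  # np.random.random produces float64 values
BYTES_FLOAT32 = 4  # assumption: the GPU copy is stored as float32


def size_in_mib(num_elements, bytes_per_element):
    """Return the memory footprint in MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_elements * bytes_per_element / 1024**2


host_mib = size_in_mib(num_voxels, BYTES_FLOAT64)
device_mib = size_in_mib(num_voxels, BYTES_FLOAT32)

print("Voxels: " + str(num_voxels))
print("Host array (float64): " + str(host_mib) + " MiB")
print("GPU image (float32, assumed): " + str(device_mib) + " MiB")
```

Since `flip` and `flop` are both the same size, the flip-flop setup holds two such images in GPU memory at once.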


## Reusing memory: The flip-flop strategy
A workflow that reuses images needs the least image memory. For example, a chain of operations `a`, `b`, and `c`, each reading from its first argument and writing to its second, can process two images `flip` and `flop` in this order:

```
a(flip, flop)
b(flop, flip)
c(flip, flop)
```

In that way, no image memory needs to be allocated while processing this little workflow.
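
The alternating pattern can be written as a small generic helper. The sketch below is not part of pyclesperanto; `run_flip_flop` and the three stand-in operations are hypothetical names, and plain Python lists take the place of GPU images so the example runs anywhere. It assumes every operation has the signature `operation(source, destination)` and writes its result into `destination` in place.

```python
def run_flip_flop(operations, flip, flop):
    """Apply each operation(source, destination) in turn, swapping the buffers.

    Returns the buffer that holds the final result.
    """
    source, destination = flip, flop
    for operation in operations:
        operation(source, destination)
        # the buffer just written becomes the input of the next operation
        source, destination = destination, source
    return source


# Hypothetical stand-in operations that write into an existing list
def add_one(source, destination):
    for i, value in enumerate(source):
        destination[i] = value + 1


def double(source, destination):
    for i, value in enumerate(source):
        destination[i] = value * 2


def negate(source, destination):
    for i, value in enumerate(source):
        destination[i] = -value


flip = [1.0, 2.0, 3.0]
flop = [0.0] * len(flip)

result = run_flip_flop([add_one, double, negate], flip, flop)
print(result)
print(result is flop)
```

With an odd number of operations the result ends up in `flop`, with an even number in `flip`, which is why the helper returns the buffer instead of leaving the caller to work it out.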

Let's simulate that with a loop:

In [2]:
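def reuse_memory(flip, flop):
    """Run the flip-flop loop on two pre-allocated images and print the elapsed time."""
    # inner loop with time measurement
    start = time.time()
    for i in range(num_iterations):
        # read from flip, write the result into the existing flop image
        cle.maximum_sphere(flip, flop, radius_x=10, radius_y=10, radius_z=0)
        # read from flop, write the result back into the existing flip image
        cle.minimum_sphere(flop, flip, radius_x=10, radius_y=10, radius_z=0)
    end = time.time()

    print("Flip-flop took " + str(end - start) + "s")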

In [3]:
reuse_memory(flip, flop)

Flip-flop took 0.28287529945373535s
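
Because the function above only prints its measurement, comparing several runs means reading numbers by eye. A hedged alternative is a helper that returns the elapsed times so they can be summarized. The names `time_repeated`, `summarize`, and `dummy_workload` are hypothetical and not part of pyclesperanto; the dummy workload is a CPU-only stand-in for the GPU loop.

```python
import statistics
import time


def time_repeated(workload, num_tests=10):
    """Call workload() num_tests times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(num_tests):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return minimum, median and maximum of a list of durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def dummy_workload():
    # stand-in for a GPU loop; any zero-argument callable works here
    sum(i * i for i in range(10_000))


durations = time_repeated(dummy_workload, num_tests=5)
summary = summarize(durations)
print("Median: " + str(summary["median"]) + "s")
```

In the notebook, the workload could be passed as `lambda: reuse_memory(flip, flop)`. The median is less sensitive than the mean to a slow first call.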


## Re-allocating memory
If we allocate a new output image in every step, we spend additional time on allocation.

Let's see how much! We use the same loop as above and call the same functions, but we call them differently:

In [4]:
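def reallocate_memory(flip, flop):
    """Run the same loop, but let every call allocate a new output image."""
    # inner loop with time measurement
    start = time.time()
    for i in range(num_iterations):
        # no destination is passed, so each call returns a newly allocated image;
        # flip and flop are rebound locally and the passed-in flop is never read
        flop = cle.maximum_sphere(flip, radius_x=10, radius_y=10, radius_z=0)
        flip = cle.minimum_sphere(flop, radius_x=10, radius_y=10, radius_z=0)
    end = time.time()

    print("Re-alloc took " + str(end - start) + "s")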

In [5]:
reallocate_memory(flip, flop)

Re-alloc took 7.786501169204712s
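
The two timings printed above can be compared directly. The short calculation below uses only those two measured values and `num_iterations` from the configuration cell. Each comes from a single run, so the ratio is an indication and not a benchmark.

```python
# Values copied from the two outputs above (single runs)
flip_flop_seconds = 0.28287529945373535
realloc_seconds = 7.786501169204712
num_iterations = 10  # same value as in the configuration cell

# how many times longer the re-allocating loop took
slowdown = realloc_seconds / flip_flop_seconds

# extra time per loop iteration (each iteration runs two operations)
extra_per_iteration = (realloc_seconds - flip_flop_seconds) / num_iterations

print("Re-allocation was " + str(round(slowdown, 1)) + " times slower")
print("Extra time per iteration: " + str(round(extra_per_iteration, 3)) + "s")
```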


In [6]:
flip = None
flop = None

## Intel UHD GPU
We execute the same experiments `num_tests` times in a loop:

In [7]:
# initialize GPU
cle.select_device("Intel")
print("Used GPU: " + cle.get_device().name)

# push image to GPU memory
flip = cle.push_zyx(image)
# create output image in GPU memory of the same size
flop = cle.create_like(flip)

print("Image size in GPU: " + str(flip.shape))

Used GPU: Intel(R) UHD Graphics 620
Image size in GPU: (100, 1024, 1024)


In [8]:
for j in range(0, num_tests):
    reuse_memory(flip, flop)

LogicError: when processing argument #1 (1-based): clSetKernelArg failed: INVALID_MEM_OBJECT

In [9]:
for j in range(0, num_tests):
    reallocate_memory(flip, flop)

LogicError: when processing argument #1 (1-based): clSetKernelArg failed: INVALID_MEM_OBJECT

## NVidia RTX GPU
We execute the same experiments `num_tests` times in a loop:

In [10]:
# initialize GPU
cle.select_device("RTX")
print("Used GPU: " + cle.get_device().name)

# push image to GPU memory
flip = cle.push_zyx(image)
# create output image in GPU memory of the same size
flop = cle.create_like(flip)

print("Image size in GPU: " + str(flip.shape))

Used GPU: GeForce RTX 2080 Ti
Image size in GPU: (100, 1024, 1024)


In [11]:
for j in range(0, num_tests):
    reuse_memory(flip, flop)

LogicError: when processing argument #1 (1-based): clSetKernelArg failed: INVALID_MEM_OBJECT

In [10]:
for j in range(0, num_tests):
    reallocate_memory(flip, flop)

LogicError: when processing argument #1 (1-based): clSetKernelArg failed: INVALID_MEM_OBJECT

## Conclusions
As you can see, 