# Backward transform

The real-to-complex transforms are not documented very well. Our guess is that `r2cf` and `r2cb` are the forward and backward transforms. Both take **two** pointer arguments for the real data (the input of `r2cf`, the output of `r2cb`). There are two reasonable possibilities: either the even- and odd-indexed samples, or the lower and upper halves of the data, are passed through the two pointers. It turns out to be even/odd, which also makes the most sense, since a radix-2 decimation-in-time FFT splits its input the same way. To see this, we first test the backward transform.
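The even/odd split is natural because a length-n DFT can be assembled from the DFTs of the even- and odd-indexed samples plus a twiddle factor. The following is a minimal NumPy sketch of that identity only; the helper name `rfft_even_odd` is hypothetical, it uses `np.fft.fft` for the two half-length transforms, and it is not a description of how the generated codelet is organized internally.

```python
import numpy as np

def rfft_even_odd(x):
    """Hypothetical helper: rfft(x) from even/odd samples (radix-2 DIT)."""
    n = len(x)
    assert n % 2 == 0, "length must be even"
    E = np.fft.fft(x[0::2])          # length n/2 DFT of the even samples
    O = np.fft.fft(x[1::2])          # length n/2 DFT of the odd samples
    k = np.arange(n // 2 + 1)        # rfft keeps n//2 + 1 coefficients
    twiddle = np.exp(-2j * np.pi * k / n)
    # E and O are periodic with period n/2, so wrap the index for k = n/2.
    return E[k % (n // 2)] + twiddle * O[k % (n // 2)]

x = np.arange(16, dtype=np.float64)
print(np.allclose(rfft_even_odd(x), np.fft.rfft(x)))  # True
```

Splitting into lower and upper halves instead would not give two independent sub-transforms this directly, which is one reason to expect the even/odd layout.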

Clean the Noodles cache:

```python
# IPython shell escape: remove the cached Noodles database
!rm -f lib/db
```

```python
import pyopencl as cl
import numpy as np
from copy import copy
from genfft.opencl import run, single_stage_r2c, default_config
```

Generate the codelet and set up OpenCL:

```python
ctx = cl.create_some_context()
cfg = copy(default_config)
# OpenCL source of a size-16 backward (complex-to-real) codelet
codelet = run(single_stage_r2c(cfg, 16, direction='b', sign=1))
prog = cl.Program(ctx, codelet).build()
queue = cl.CommandQueue(ctx)
```

Create an array of complex values whose backward transform we know, and hope that FFTW uses the same data layout as the NumPy FFT. Important: make sure the data is in single precision (`complex64`, i.e. pairs of `float32` values), since the codelet works on `float` data.
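As a quick sanity check of that layout, the sketch below inspects the array on the host only; nothing in it touches OpenCL or the codelet. It shows the number of coefficients, the byte size the buffer will have, and the interleaved real/imaginary order that the kernel sees in memory.

```python
import numpy as np

x = np.arange(16, dtype='float32')
# astype pins single precision; depending on the NumPy version, rfft of
# float32 data may otherwise come back as complex128.
y = np.fft.rfft(x).astype('complex64')

print(len(y))      # 9: a length-16 real FFT keeps 16//2 + 1 coefficients
print(y.nbytes)    # 72: each complex64 value is two float32 values (8 bytes)

# Reinterpreting the same memory as float32 exposes the interleaved layout
# re[0], im[0], re[1], im[1], ... that the codelet reads through its two
# complex pointers.
yf = y.view(np.float32)
print(yf.shape)                                  # (18,)
print(yf[0] == y[0].real, yf[1] == y[0].imag)    # True True
```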

```python
x = np.arange(16, dtype='float32')
y = np.fft.rfft(x).astype('complex64')
```

```python
mf = cl.mem_flags
y_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=y)
# Each output buffer holds half of the 16 real samples (8 float32 values)
x0_g = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes//2)
x1_g = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes//2)
```

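We can slice a `cl.Buffer` object to get pointer offsets, but note that these slices are in **bytes**: `y_g[4:]` starts one `float32` after `y_g`, at the imaginary part of the first coefficient. The stride for the real data is 1, and for the complex data it is 2, since the real and imaginary parts are interleaved. The last three arguments give the number of times the transform is repeated and the respective strides of that outer loop. (Slicing creates an OpenCL sub-buffer, and a device whose `CL_DEVICE_MEM_BASE_ADDR_ALIGN` is larger than 4 bytes may reject such an offset with `CL_MISALIGNED_SUB_BUFFER_OFFSET`.)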
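To make the pointer and stride arguments concrete, here is a pure-NumPy model of how such a call addresses memory. The function `r2cb_reference` is hypothetical and only emulates the argument semantics described above (two real output pointers for even/odd samples, two complex input pointers, three strides, then a repeat count and two outer-loop strides); it is not the generated kernel. Offsets and strides count `float32` elements, so `yf[1:]` plays the role of `y_g[4:]`. Which outer stride applies to which pointer is an assumption modeled on FFTW's convention, and irrelevant here because the repeat count is 1.

```python
import numpy as np

def r2cb_reference(R0, R1, Cr, Ci, rs, csr, csi, v, ivs, ovs, n=16):
    """Hypothetical NumPy model of an r2cb-style codelet call.

    Strides/offsets count float32 elements, not bytes.
    Assumption: ivs advances the complex input, ovs the real output.
    """
    m = n // 2 + 1               # number of complex coefficients
    k = np.arange(m)
    half = np.arange(n // 2)     # indices into each real half
    for j in range(v):           # outer loop: v independent transforms
        c = Cr[j * ivs + k * csr] + 1j * Ci[j * ivs + k * csi]
        r = np.fft.irfft(c, n) * n          # unnormalized backward transform
        R0[j * ovs + half * rs] = r[0::2]   # even-indexed samples
        R1[j * ovs + half * rs] = r[1::2]   # odd-indexed samples

x = np.arange(16, dtype=np.float32)
y = np.fft.rfft(x).astype(np.complex64)
yf = y.view(np.float32)          # re0, im0, re1, im1, ...
x0 = np.zeros(8, dtype=np.float32)
x1 = np.zeros(8, dtype=np.float32)
# Same argument values as the r2cb_16 call: Ci starts one float after Cr.
r2cb_reference(x0, x1, yf, yf[1:], 1, 2, 2, 1, 1, 1)
print(np.c_[x0, x1].flatten() / 16)
```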

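```python
prog.r2cb_16(queue, (1,), None,
             x0_g, x1_g,            # real output: even / odd samples
             y_g, y_g[4:],          # complex input: real / imaginary parts
             np.int32(1), np.int32(2), np.int32(2),   # real, real-part, imag-part strides
             np.int32(1), np.int32(1), np.int32(1))   # repetitions, outer-loop strides
```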

Now we read the results back into `x0` and `x1`.

```python
x0 = np.zeros(8, dtype='float32')
x1 = np.zeros(8, dtype='float32')
# enqueue_copy blocks by default, so x0 and x1 are filled when it returns
cl.enqueue_copy(queue, x0, x0_g)
cl.enqueue_copy(queue, x1, x1_g)
```

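These are the even- and odd-indexed samples of the real data, so we can interleave them by stacking them as columns with `np.c_` and flattening. We divide by 16 because FFTW's transforms are unnormalized: a forward followed by a backward transform of length n multiplies the data by n.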
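A small standalone check of what `np.c_[...].flatten()` does, on made-up even/odd halves rather than the kernel output:

```python
import numpy as np

even = np.array([0, 2, 4, 6], dtype=np.float32)
odd = np.array([1, 3, 5, 7], dtype=np.float32)

# np.c_ stacks the two 1-D arrays as columns, shape (4, 2);
# row-major flattening then alternates even, odd, even, odd, ...
joined = np.c_[even, odd].flatten()
print(joined)  # [0. 1. 2. 3. 4. 5. 6. 7.]

# Equivalent explicit form: write each half into a strided slice.
out = np.empty(even.size + odd.size, dtype=np.float32)
out[0::2] = even
out[1::2] = odd
print(np.array_equal(joined, out))  # True
```

The strided-slice form mirrors what the kernel's two output pointers represent, while `np.c_` is just the more compact spelling.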

```python
# Interleave even/odd samples and undo the factor n = 16 of the unnormalized round trip
np.c_[x0, x1].flatten() / 16
```

```
array([ 0.       ,  1.0000002,  1.9999998,  3.0000005,  4.       ,
        5.       ,  6.       ,  7.       ,  8.       ,  9.       ,
       10.       , 11.       , 12.       , 13.       , 14.       ,
       15.       ], dtype=float32)
```

## Forward transform

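The forward transform should now be easy. This time we pass the real input as a single buffer: `x_g` points at the even-indexed samples and `x_g[4:]` (one `float32` further) at the odd-indexed ones, so the real stride becomes 2. The output needs room for 16/2 + 1 = 9 `complex64` values, which is `x.nbytes + 8` bytes.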
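The same trick can be modeled on the host. The sketch below uses a hypothetical helper `r2cf_reference` that emulates only the argument semantics of a single forward call (repeat count 1): sample 2i is read from `R0[i*rs]`, sample 2i+1 from `R1[i*rs]`, and coefficient k is written to `Cr[k*csr]` and `Ci[k*csi]`. Passing `x` and `x[1:]` with a stride of 2 corresponds to `x_g` and `x_g[4:]`; it is an illustration of the addressing, not the generated kernel.

```python
import numpy as np

def r2cf_reference(R0, R1, Cr, Ci, rs, csr, csi, n=16):
    """Hypothetical NumPy model of a single r2cf-style codelet call.

    Offsets and strides count float32 elements, not bytes.
    """
    half = np.arange(n // 2)
    r = np.empty(n)
    r[0::2] = R0[half * rs]      # even-indexed samples
    r[1::2] = R1[half * rs]      # odd-indexed samples
    c = np.fft.rfft(r)           # unnormalized forward transform
    k = np.arange(n // 2 + 1)
    Cr[k * csr] = c.real
    Ci[k * csi] = c.imag

x = np.arange(16, dtype=np.float32)
out = np.zeros(x.size + 2, dtype=np.float32)  # 18 floats = x.nbytes + 8 bytes
# One input array, two views: R1 starts one float later, and rs = 2
# makes R0 see x[0::2] and R1 see x[1::2]. Likewise for the output.
r2cf_reference(x, x[1:], out, out[1:], 2, 2, 2)
y = out.view(np.complex64)
print(np.abs(np.fft.rfft(x) - y).max())
```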

```python
# OpenCL source of a size-16 forward (real-to-complex) codelet
codelet = run(single_stage_r2c(cfg, 16, direction='f'))
prog = cl.Program(ctx, codelet).build()
queue = cl.CommandQueue(ctx)
```

```python
mf = cl.mem_flags
x = np.arange(16, dtype='float32')
x_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
# 9 complex64 coefficients = 72 bytes = x.nbytes + 8
y_g = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes + 8)
```

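```python
prog.r2cf_16(queue, (1,), None,
             x_g, x_g[4:],          # real input: even / odd samples, stride 2
             y_g, y_g[4:],          # complex output: real / imaginary parts
             np.int32(2), np.int32(2), np.int32(2),   # real, real-part, imag-part strides
             np.int32(1), np.int32(1), np.int32(1))   # repetitions, outer-loop strides
```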

```python
y = np.zeros(9, dtype='complex64')
cl.enqueue_copy(queue, y, y_g)
# NumPy's irfft applies the 1/n normalization, so this should give back x
np.fft.irfft(y)
```

```
array([ 0.        ,  1.00000002,  1.99999978,  3.0000002 ,  3.99999976,
        5.0000003 ,  6.00000002,  7.00000012,  8.        ,  8.99999988,
        9.99999998, 10.9999997 , 12.00000024, 12.9999998 , 14.00000022,
       14.99999998])
```

```python
np.abs(np.fft.rfft(x) - y).max()
```

```
1.7893790626999362e-06
```

Whoohoo!