# `parllel`

`parllel` is a modular, flexible framework for developing performant algorithms in Reinforcement Learning.

Rather than providing a library of algorithm implementations, it provides primitive types that are useful for RL research and that make it easier to optimize algorithms for speed. `parllel` supports recurrent agents and algorithms, visual RL, multi-agent RL, and RL on graphs and point clouds.

## Arrays

One of the most fundamental types in `parllel` is the `Array`. It's similar to a `numpy` array, but is intended for data storage rather than math operations.

In [None]:
import numpy as np
from parllel import Array

array = Array(batch_shape=(5, 4), dtype=np.float32)  # use batch_shape instead of shape
array[:] = np.arange(4)
print(array)

To do math operations, we can get a view as an ndarray (this operation does not copy the data).

In [None]:
ndarray = array.to_ndarray()
print(ndarray.sum(axis=-1))
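To make the view-versus-copy distinction concrete, here is a minimal sketch that uses plain `numpy` only (it does not touch `parllel`). It shows that basic slicing shares memory with the original buffer, while integer-array indexing produces an independent copy. The variable names are illustrative.

```python
import numpy as np

base = np.zeros((5, 4), dtype=np.float32)

# Basic slicing returns a view: writes through it reach the original buffer.
view = base[1:3]
view[...] = 1.0

# Advanced (integer-array) indexing returns a copy: writes do not propagate.
picked = base[np.array([0, 4])]
picked[...] = 9.0

shares_view = np.shares_memory(base, view)
shares_picked = np.shares_memory(base, picked)
row_sums = base.sum(axis=-1)

print(shares_view, shares_picked)
print(row_sums)
```

Only rows 1 and 2 of `base` change, because the write through `picked` went to a separate buffer.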

#### padding
In RL, we often need to save state between batches/iterations. Since this state is usually associated with time (e.g. `next_observation` or `previous_action`), a convenient place to store it is in the array itself. For this, we use the `padding` argument.

In [None]:
array = Array(batch_shape=(5, 4), dtype=np.float32, padding=1)
array[:] = np.arange(4)
array[5] = [4, 5, 6, 7]  # note that this appears to be out of bounds!
print(array)
print(array[5])
print(array[array.last + 1])
assert array.last + 1 == 5

The expression `array.last + 1` is just syntactic sugar that makes it clear we are writing beyond the end of the array.

The values written into the padding are not "visible" to normal operations, or when converting to a numpy array. If we want to access them in the next iteration, we can call `rotate()`.

In [None]:
array[...] = 0
array.rotate()
print(array[0])  # [4, 5, 6, 7] has been copied to the 0th position in the array
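One way to picture padding and `rotate()` is as a larger hidden buffer with extra rows on either side of the visible ones. The sketch below is a toy model built on `numpy`, not `parllel`'s actual implementation. The class `PaddedBuffer` and its methods are hypothetical, and the rotation rule (copy the final `2 * padding` rows to the start) is an assumption chosen to reproduce the behaviour shown above.

```python
import numpy as np


class PaddedBuffer:
    """Toy stand-in for a padded array (hypothetical, not parllel's code)."""

    def __init__(self, length, width, padding):
        self.length = length
        self.padding = padding
        # Allocate `padding` hidden rows before and after the visible rows.
        self._base = np.zeros((length + 2 * padding, width), dtype=np.float32)

    @property
    def visible(self):
        # View (no copy) of the rows that normal indexing would expose.
        return self._base[self.padding : self.padding + self.length]

    def write(self, t, value):
        # Logical index t may range from -padding to length + padding - 1.
        self._base[t + self.padding] = value

    def read(self, t):
        return self._base[t + self.padding]

    def rotate(self):
        # Assumed rule: carry the final rows (last visible rows plus trailing
        # padding) over to the start (leading padding plus first visible rows).
        n = 2 * self.padding
        self._base[:n] = self._base[-n:]


buf = PaddedBuffer(length=5, width=4, padding=1)
buf.visible[:] = np.arange(4)
buf.write(5, [4, 5, 6, 7])  # one step past the end, lands in the padding
buf.visible[:] = 0          # clears only the visible rows
buf.rotate()
first_row = buf.read(0)
print(first_row)
```

Because the padding rows are never part of `visible`, clearing the visible rows leaves the value written at index 5 intact until `rotate()` moves it to index 0.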

#### full_size
For replay buffers, for example, we may want to allocate a lot of memory while keeping only a small window visible for collecting samples from the environment. This window then slides along the entire replay buffer until it's full. We do this using the `full_size` argument.

In [None]:
array = Array(batch_shape=(5, 4), dtype=np.float32, full_size=10)  # replaces leading batch dimension, e.g. 5
array[...] = 7
array.rotate()
array[...] = 42
array.rotate()
print(array)
array.rotate()
print(array)
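The sliding window can be sketched as an offset into a larger buffer that advances by one window length on each rotation and wraps around at the end. This is a toy `numpy` model, not `parllel`'s implementation. The class `WindowedBuffer` is hypothetical, and it assumes that the full size is a multiple of the window length.

```python
import numpy as np


class WindowedBuffer:
    """Toy sliding window over a larger buffer (hypothetical sketch)."""

    def __init__(self, window, full_size, width):
        # Assumption of this sketch: the window tiles the buffer exactly.
        assert full_size % window == 0
        self._base = np.zeros((full_size, width), dtype=np.float32)
        self.window = window
        self.offset = 0

    @property
    def visible(self):
        # View of the currently visible window (no copy).
        return self._base[self.offset : self.offset + self.window]

    def rotate(self):
        # Slide the window forward, wrapping to the start when full.
        self.offset = (self.offset + self.window) % self._base.shape[0]


buf = WindowedBuffer(window=5, full_size=10, width=4)
buf.visible[...] = 7
buf.rotate()
buf.visible[...] = 42
buf.rotate()                        # wraps back to the first window
first_window = buf.visible.copy()   # filled with 7
buf.rotate()
second_window = buf.visible.copy()  # filled with 42
print(first_window[0], second_window[0])
```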

`padding` and `full_size` can be combined arbitrarily.

#### next & previous
RL is often concerned with comparing a value to its past (or future) values. One example is a replay buffer for SAC, where we need both the observation and the next observation to compute the loss for Q-learning. Because Arrays keep track of their indices, we can conveniently access these through the `next` and `previous` attributes.

In [None]:
array = Array(batch_shape=(5, 4), dtype=np.float32, padding=1)
array[...] = np.arange(np.prod(array.shape)).reshape(array.shape)
array[-1] = np.arange(-4, 0)
print(array)
print("previous:\n", array.previous)
print("next:\n", array.next)

This also works with slices and elements of the array, not just with the entire array.

In [None]:
print(array[2])
print("previous: ", array[2].previous)
print("next: ", array[2].next)
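Conceptually, `previous` and `next` can be thought of as the same window shifted by one step within a padded buffer. The sketch below illustrates that idea with plain `numpy` slices. It is an assumption about the general technique, not a description of how `parllel` implements these attributes, and the names `current`, `previous`, and `following` are illustrative.

```python
import numpy as np

length, padding = 5, 1
# Padded base buffer: one hidden row before and after the visible rows.
base = np.arange((length + 2 * padding) * 4, dtype=np.float32).reshape(-1, 4)

# Three views of equal length, each shifted by one time step.
current = base[padding : padding + length]
previous = base[padding - 1 : padding - 1 + length]
following = base[padding + 1 : padding + 1 + length]

print("current[2]:  ", current[2])
print("previous[2]: ", previous[2])   # same data as current[1]
print("following[2]:", following[2])  # same data as current[3]
```

All three are views of `base`, so no observation is stored twice. The padding rows supply the value before the first step and after the last one.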


#### storage
In RL, we often want to run several environments in parallel to collect samples faster. To avoid copies, these environments can write directly to Arrays in shared memory.

In [None]:
import multiprocessing as mp

array = Array(batch_shape=(5, 4), dtype=np.int32, storage="shared")

# For reference, the helper imported below does the following:
#
#     def write_to_piped_array(pipe):
#         arr = pipe.recv()
#         arr[...] = 42
from parllel.patterns import write_to_piped_array

parent_pipe, child_pipe = mp.Pipe()
p = mp.Process(target=write_to_piped_array, args=(child_pipe,))
p.start()
parent_pipe.send(array[np.array([1, 3]), np.array([0, 2])])  # unlike for ndarrays, this does not produce a copy
p.join()

print(array)

array.close()  # always close arrays allocated in shared memory
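For comparison, the underlying idea of writing into shared memory without copying can be sketched with the standard library's `multiprocessing.shared_memory` and plain `numpy`. This is not how `parllel` is implemented. To keep the sketch short and deterministic, the second handle is attached by name within the same process, which is what a worker process would do after receiving the block's name.

```python
import numpy as np
from multiprocessing import shared_memory

shape, dtype = (5, 4), np.int32
nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize

# Owner side: allocate the block and wrap it in an ndarray (no copy).
shm = shared_memory.SharedMemory(create=True, size=nbytes)
owner = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
owner[...] = 0

# Worker side: attach to the same block by name and write into it.
attached = shared_memory.SharedMemory(name=shm.name)
worker_view = np.ndarray(shape, dtype=dtype, buffer=attached.buf)
worker_view[1, 0] = 42
worker_view[3, 2] = 42

# The owner sees the writes immediately because both arrays share memory.
result = owner.copy()
print(result)

# Release the ndarray views before closing, then free the block.
del worker_view
attached.close()
del owner
shm.close()
shm.unlink()
```

As with Arrays in shared storage, the block must be closed explicitly. Here `unlink()` additionally frees the memory once every handle has been closed.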