# `parllel`

`parllel` is a modular, flexible framework for developing performant algorithms in Reinforcement Learning.

Rather than offering a library of algorithm implementations, it provides primitive types that are useful for RL research and that make it easier to optimize algorithms for speed. `parllel` supports recurrent agents/algorithms, visual RL, multi-agent RL, and RL on graphs/pointclouds.

## Arrays

One of the most fundamental types in `parllel` is the `Array`. It's similar to a `numpy` array, but is intended for data storage rather than math operations.

In [6]:
import numpy as np
from parllel import Array

array = Array(batch_shape=(5, 4), dtype=np.float32)  # use batch_shape instead of shape
array[:] = np.arange(4)
print(array)

Array([[0.,1.,2.,3.],
       [0.,1.,2.,3.],
       [0.,1.,2.,3.],
       [0.,1.,2.,3.],
       [0.,1.,2.,3.]], storage=local, dtype=float32, padding=0)


To do math operations, we can get a view of the data as a `numpy` `ndarray`. This operation does not copy the data.

In [7]:
ndarray = array.to_ndarray()
print(ndarray.sum(axis=-1))

[6. 6. 6. 6. 6.]
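
What "view, not copy" means can be illustrated with plain `numpy`, independent of `parllel`. The sketch below uses only standard `numpy` calls; the names `base`, `view`, and `copied` are made up for illustration. A write through a view is visible in the original buffer, while a copy is detached from it.

```python
import numpy as np

base = np.zeros((5, 4), dtype=np.float32)
view = base[1:3]           # basic slicing returns a view onto the same memory
copied = base[1:3].copy()  # an explicit copy owns separate memory

view[:] = 1.0              # writes through to `base`

print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False
print(base.sum())                      # 8.0
```

Because the copy was taken before the write, `copied` still holds zeros.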


In RL, we often need to save state between batches/iterations. Since this state is often associated with time (e.g. `next_observation`, `previous_action`, etc.), a convenient place to store it is in the array itself. For this, we use padding.

In [9]:
array = Array(batch_shape=(5, 4), dtype=np.float32, padding=1)
array[:] = np.arange(4)
array[5] = [4, 5, 6, 7]  # note that this appears to be out of bounds!
print(array)
print(array[5])
print(array[array.last + 1])

Array([[0.,1.,2.,3.],
       [0.,1.,2.,3.],
       [0.,1.,2.,3.],
       [0.,1.,2.,3.],
       [0.,1.,2.,3.]], storage=local, dtype=float32, padding=1)
Array([4.,5.,6.,7.], storage=local, dtype=float32, padding=1)
Array([4.,5.,6.,7.], storage=local, dtype=float32, padding=1)


The expression `array.last + 1` is syntactic sugar that makes it clear we are deliberately indexing one step past the end of the array.

The values written into the padding are not "visible" to normal operations, or when converting to a numpy array. If we want to access them in the next iteration, we can call `rotate()`.

In [11]:
array[...] = 0
array.rotate()
print(array[0])  # [4, 5, 6, 7] has been copied to the 0th position in the array

Array([4.,5.,6.,7.], storage=local, dtype=float32, padding=1)
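
To build intuition for padding and `rotate()`, here is a minimal sketch in plain `numpy`. It is not `parllel`'s implementation: the class `PaddedBuffer`, its methods, and the exact rotation rule (the last `2 * padding` storage rows are copied to the front) are assumptions chosen to reproduce the behavior shown above.

```python
import numpy as np


class PaddedBuffer:
    """Toy padded buffer in plain numpy (hypothetical, not parllel's Array)."""

    def __init__(self, length, feature_shape, padding, dtype=np.float32):
        self.length = length
        self.padding = padding
        # `padding` extra rows are allocated before and after the visible rows.
        self._full = np.zeros((length + 2 * padding, *feature_shape), dtype=dtype)

    @property
    def visible(self):
        # A view (no copy) of the rows that normal operations see.
        return self._full[self.padding : self.padding + self.length]

    def _offset(self, t):
        # Map a logical index in [-padding, length + padding) to a storage row.
        if not -self.padding <= t < self.length + self.padding:
            raise IndexError(f"index {t} is outside the padded range")
        return t + self.padding

    def read(self, t):
        return self._full[self._offset(t)]

    def write(self, t, value):
        self._full[self._offset(t)] = value

    def rotate(self):
        # Assumed semantics: the last 2 * padding storage rows become the first
        # 2 * padding rows, so logical index `length` moves to logical index 0.
        if self.padding > 0:
            self._full[: 2 * self.padding] = self._full[-2 * self.padding :].copy()


buf = PaddedBuffer(length=5, feature_shape=(4,), padding=1)
buf.visible[:] = np.arange(4)
buf.write(5, [4, 5, 6, 7])   # one step past the last visible row
buf.visible[...] = 0         # the padding row is untouched
buf.rotate()
print(buf.read(0))           # [4. 5. 6. 7.]
```

In this sketch the visible region is just a slice of a larger allocation, which is why values written into the padding survive operations on the visible rows and can be carried into the next iteration with a single small copy.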
