This is an introductory tutorial to the main features of plangym.

## Working with states

### `reset` and `step` return the environment state

The main difference from the `gym` API is that plangym treats the environment state as being as important as observations, rewards, and terminal flags. For this reason, plangym includes the state in the tuples that the environment returns from `step` and `reset`:

- The `reset` method returns a `(state, observation)` tuple unless you pass `return_state=False` as an argument.

- When `step` is called with the environment state as an argument, it returns a tuple containing `(state, obs, reward, end, info)`.

In [1]:
import plangym

env = plangym.make("CartPole-v0")
action = env.action_space.sample()

state, obs = env.reset()
state, obs, reward, end, info = env.step(action, state)

However, if you don't provide the environment state when calling `step`, the returned tuple will match the standard `gym` interface:

In [2]:
env = plangym.make("CartPole-v0")
action = env.action_space.sample()

obs = env.reset(return_state=False)
obs, reward, end, info = env.step(action)
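The two return conventions can be illustrated without plangym installed. The following is a minimal sketch, assuming a hypothetical `ToyEnv` class that only imitates the behaviour described above; it is not plangym's implementation, and its dynamics, reward, and `info` values are made up for illustration.

```python
class ToyEnv:
    """Hypothetical stand-in that mimics the return conventions described above."""

    def __init__(self):
        self._state = [0.0, 0.0]

    def get_state(self):
        # Return a copy so callers cannot mutate the internal state.
        return list(self._state)

    def set_state(self, state):
        self._state = list(state)

    def reset(self, return_state=True):
        self._state = [0.0, 0.0]
        obs = list(self._state)
        return (self.get_state(), obs) if return_state else obs

    def step(self, action, state=None):
        # When a state is supplied, step from that state instead of the current one.
        if state is not None:
            self.set_state(state)
        self._state = [value + action for value in self._state]
        obs = list(self._state)
        reward, end, info = 1.0, False, {}
        if state is not None:
            return self.get_state(), obs, reward, end, info
        return obs, reward, end, info


env = ToyEnv()
state, obs = env.reset()
with_state = env.step(1.0, state)   # 5-tuple: state is included
without_state = env.step(1.0)       # 4-tuple: gym-style interface
print(len(with_state), len(without_state))
```

In this sketch the presence of the `state` argument alone decides which tuple shape comes back, which is the behaviour the two cells above show for plangym.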

### Accessing and modifying the environment state

You can get a copy of the environment's state by calling `env.get_state()`:

In [3]:
state = env.get_state()
state

array([ 0.03145539,  0.17749025,  0.01348916, -0.25611924])

And set the environment state using `env.set_state(state)`:

In [4]:
env.set_state(state)
assert (state == env.get_state()).all()
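A common reason to save and restore state is to try several actions from the same starting point. The sketch below shows that pattern with a hypothetical `CounterEnv` whose state is a single integer; the class and its dynamics are assumptions made for illustration and only borrow the `get_state`/`set_state` names from the cells above.

```python
class CounterEnv:
    """Hypothetical toy environment; its state is a single integer position."""

    def __init__(self):
        self._position = 0

    def get_state(self):
        return self._position

    def set_state(self, state):
        self._position = state

    def step(self, action):
        # Deterministic dynamics: the action is added to the position.
        self._position += action
        return self._position


env = CounterEnv()
env.step(3)
snapshot = env.get_state()

# Branch from the same snapshot once per candidate action.
outcomes = {}
for action in (-1, 1):
    env.set_state(snapshot)
    outcomes[action] = env.step(action)

print(outcomes)  # {-1: 2, 1: 4}
```

Because the snapshot is restored before each step, every branch starts from the same position regardless of what earlier branches did.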

## Step vectorization

All plangym environments offer a `step_batch` method that steps a batch of states and actions in a single vectorized call.

Calling `step_batch` with a list of states and actions will return a tuple of lists containing the step data for each of the states and actions provided.

In [7]:
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]

data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(new_states), type(observs)

(list, list)
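Conceptually, a batched step pairs each state with its action, steps once per pair, and then regroups the per-step tuples into one list per field. The following is a rough sketch of that idea, assuming a hypothetical `TinyEnv` and a free function named `step_batch`; plangym's actual implementation may differ.

```python
class TinyEnv:
    """Hypothetical environment whose state is a single float."""

    def step(self, action, state):
        new_state = state + action
        obs = new_state
        reward, end, info = 1.0, new_state > 2.0, {}
        return new_state, obs, reward, end, info


def step_batch(env, states, actions):
    """Step each (state, action) pair and return one list per returned field."""
    results = [env.step(action, state) for state, action in zip(states, actions)]
    # zip(*results) transposes a list of 5-tuples into 5 columns.
    new_states, observs, rewards, ends, infos = (list(column) for column in zip(*results))
    return new_states, observs, rewards, ends, infos


states = [0.0, 1.0, 2.0]
actions = [1.0, 1.0, 1.0]
new_states, observs, rewards, ends, infos = step_batch(TinyEnv(), states, actions)
print(new_states, ends)  # [1.0, 2.0, 3.0] [False, False, True]
```

The transposition step is what turns per-transition tuples into the "tuple of lists" layout described above.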

### Parallel step vectorization using multiprocessing

Passing the argument `n_workers` to `plangym.make` will return an environment that steps a batch of actions and states in parallel using multiprocessing.

In [9]:
env = plangym.make("CartPole-v0", n_workers=2)
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]

data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(env), type(new_states), type(observs)

(plangym.parallel.ParallelEnvironment, list, list)
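The source does not describe how the batch is divided among workers. One plausible approach, sketched here purely as an assumption, is to split the batch into contiguous chunks of near-equal size, hand one chunk to each worker, and concatenate the results in the original order. The helper name `split_batch` is hypothetical and is not part of plangym.

```python
def split_batch(items, n_workers):
    """Split items into n_workers contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for worker in range(n_workers):
        # The first `extra` workers each take one additional item.
        size = base + (1 if worker < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


batch = list(range(10))
chunks = split_batch(batch, 3)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Concatenating the chunks in order restores the original batch ordering.
merged = [item for chunk in chunks for item in chunk]
print(merged == batch)  # True
```

Keeping the chunks contiguous makes it straightforward to return results in the same order as the input states and actions.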

### Step vectorization using ray

You can also use ray actors to step the environment in parallel when calling `step_batch` by passing `ray=True` to `plangym.make`:

In [10]:
import ray
ray.init()

env = plangym.make("CartPole-v0", n_workers=2, ray=True)
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]

data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(env), type(new_states), type(observs)

2021-12-13 10:01:47,772 INFO services.py:1247 -- View the Ray dashboard at http://127.0.0.1:8265

(plangym.ray.RayEnv, list, list)