Reinforce.jl is an interface for Reinforcement Learning. It is intended to connect modular environments, policies, and solvers with a simple interface.
Packages which build on Reinforce:
- AtariAlgos: Environment which wraps Atari games using ArcadeLearningEnvironment
- OpenAIGym: Wrapper for OpenAI's python package: gym
New environments are created by subtyping `AbstractEnvironment` and implementing a few methods:

- `reset!(env) -> env`
- `actions(env, s) -> A`
- `step!(env, s, a) -> (r, s′)`
- `finished(env, s′) -> Bool`

and optional overrides:

- `state(env) -> s`
- `reward(env) -> r`

which map to `env.state` and `env.reward` respectively when unset.
- `ismdp(env) -> Bool`

An environment may be fully observable (MDP) or partially observable (POMDP). In the case of a partially observable environment, the state `s` is really an observation `o`. To maintain consistency, we call everything a state, and assume that an environment is free to maintain additional (unobserved) internal state. The `ismdp` query returns true when the environment is an MDP, and false otherwise.
- `maxsteps(env) -> Int`

The terminating condition of an episode is controlled by `maxsteps() || finished()`. Its default value is `0`, which indicates no step limit.

A minimal example environment for testing purposes is sketched below.
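The following sketch is illustrative only: the `CorridorEnv` type, its dynamics, and its reward scheme are assumptions made to demonstrate the interface and are not part of Reinforce. The agent starts at position 1 of a one-dimensional corridor and is rewarded for reaching position `N`:

```julia
using Reinforce

# A hypothetical 1-D corridor environment (not part of Reinforce):
# the agent starts at position 1 and must walk right to position N.
mutable struct CorridorEnv <: AbstractEnvironment
    N::Int            # goal position
    state::Int        # current position; the default state(env) returns env.state
    reward::Float64   # last reward;      the default reward(env) returns env.reward
end
CorridorEnv(N = 10) = CorridorEnv(N, 1, 0.0)

function Reinforce.reset!(env::CorridorEnv)
    env.state = 1
    env.reward = 0.0
    env
end

Reinforce.actions(env::CorridorEnv, s) = [-1, 1]   # step left or step right

function Reinforce.step!(env::CorridorEnv, s, a)
    env.state = clamp(s + a, 1, env.N)
    env.reward = env.state == env.N ? 1.0 : 0.0
    env.reward, env.state
end

Reinforce.finished(env::CorridorEnv, s′) = s′ == env.N

# optional overrides
Reinforce.ismdp(env::CorridorEnv) = true        # the position is fully observed
Reinforce.maxsteps(env::CorridorEnv) = 100      # cap an episode at 100 steps
```

Because the fields are named `state` and `reward`, the default `state(env)` and `reward(env)` methods need no custom overrides here.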
TODO: more details and examples
Agents/policies are created by subtyping `AbstractPolicy` and implementing `action`.
The built-in random policy is a short example:
```julia
struct RandomPolicy <: AbstractPolicy end
action(π::RandomPolicy, r, s, A) = rand(A)
```
Here `A` is the action space. The `action` method maps the last reward and current state to the next chosen action: `(r, s) -> a`.
An optional `reset!(π::AbstractPolicy) -> π` method can also be implemented; see the sketch below.
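As a sketch of how the two hooks fit together (the `StickyPolicy` type and its behaviour are invented for illustration, not part of Reinforce), a policy that commits to one random action per episode could look like:

```julia
using Reinforce

# Hypothetical policy that picks a random action at the start of each episode
# and repeats it until reset! is called (illustrative only).
mutable struct StickyPolicy <: AbstractPolicy
    a::Any   # action committed to for the current episode; nothing until chosen
end
StickyPolicy() = StickyPolicy(nothing)

function Reinforce.action(π::StickyPolicy, r, s, A)
    π.a === nothing && (π.a = rand(A))   # commit to a random action once
    π.a
end

function Reinforce.reset!(π::StickyPolicy)
    π.a = nothing   # forget the committed action between episodes
    π
end
```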
Iterate through episodes using the `Episode` iterator. A `(s, a, r, s′)` tuple is returned from each step of the episode:
```julia
ep = Episode(env, π)
for (s, a, r, s′) in ep
    # do some custom processing of the sars-tuple
end
R = ep.total_reward
T = ep.niter
```
There is also a convenience method, `run_episode`. The following is equivalent to the last example:
```julia
R = run_episode(env, π) do
    # anything you want... this section is called after each step
end
```
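Tying the pieces together, here is a hedged sketch that evaluates the built-in `RandomPolicy` on the hypothetical `CorridorEnv` from the environment example above; the episode count and use of `Statistics.mean` are illustrative assumptions:

```julia
using Reinforce, Statistics

# Evaluate the built-in RandomPolicy on the hypothetical CorridorEnv sketched earlier.
env = CorridorEnv()
π = RandomPolicy()

returns = map(1:100) do _
    reset!(env)                  # start each episode from the initial state
    run_episode(env, π) do
        # no per-step processing needed for this simple evaluation
    end
end
println("mean return over 100 episodes: ", mean(returns))
```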