striatum

A reinforcement learning test bed for comparing multiple policies, environments and agents, fully compatible with OpenAI Gym (gym.openai.com).

Basic usage

from striatum import TestBed
from striatum.policies import EpsilonGreedy
from striatum.environments import MultiArmedBandit
from striatum.analyses import AverageRewardPerStep, PercentageOptimalAction

test = TestBed({'policy': EpsilonGreedy(epsilon=0.1),
                'env': MultiArmedBandit(n_arms=10)},
               analyses=[AverageRewardPerStep(),
                         PercentageOptimalAction()])

test.run(n_steps=1_000, n_episodes=1_000).plot()
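To make the two analyses concrete, here is a minimal NumPy sketch of what such an experiment computes. It is not striatum's implementation: the Gaussian arm values (true values drawn from N(0, 1), unit-variance rewards), the sample-average value estimates and the helper name run_epsilon_greedy are all assumptions made for illustration.

```python
import numpy as np


def run_epsilon_greedy(n_arms=10, epsilon=0.1, n_steps=1_000, n_episodes=100, seed=0):
    """Hypothetical helper: epsilon-greedy on Gaussian bandits, averaged over episodes."""
    rng = np.random.default_rng(seed)
    rewards = np.zeros((n_episodes, n_steps))
    optimal = np.zeros((n_episodes, n_steps), dtype=bool)
    for ep in range(n_episodes):
        # Assumption: a fresh bandit per episode, true arm values drawn from N(0, 1)
        q_true = rng.normal(0.0, 1.0, n_arms)
        best_arm = int(np.argmax(q_true))
        q_est = np.zeros(n_arms)   # sample-average value estimates
        counts = np.zeros(n_arms)  # number of pulls per arm
        for t in range(n_steps):
            if rng.random() < epsilon:
                arm = int(rng.integers(n_arms))  # explore: uniform random arm
            else:
                arm = int(np.argmax(q_est))      # exploit: ties go to the lowest index
            reward = rng.normal(q_true[arm], 1.0)
            counts[arm] += 1
            q_est[arm] += (reward - q_est[arm]) / counts[arm]  # incremental mean update
            rewards[ep, t] = reward
            optimal[ep, t] = arm == best_arm
    # Average across episodes to get one value per step
    avg_reward_per_step = rewards.mean(axis=0)
    pct_optimal_action = 100.0 * optimal.mean(axis=0)
    return avg_reward_per_step, pct_optimal_action


if __name__ == "__main__":
    avg_reward, pct_opt = run_epsilon_greedy(n_steps=500, n_episodes=200)
    print(avg_reward[-10:].mean(), pct_opt[-10:].mean())
```

Both curves are averaged over episodes at each step, which is why a larger n_episodes gives smoother plots.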

Emphasis on generative processes

Most experiments can be described as a generative process. striatum incorporates this using dask custom graphs together with scikit-learn's double-underscore notation. For example, consider the experiment above, but with the number of arms varying between episodes.

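import numpy as np

def epsilon(n_arms):
    # Exploration rate grows with the number of arms
    return n_arms / 100

test = TestBed({'policy': EpsilonGreedy(),
                'env': MultiArmedBandit(),
                # Draw the number of arms anew for every episode
                'env__n_arms': (np.random.choice, [9, 10, 11]),
                # Derive epsilon from the sampled number of arms
                'policy__epsilon': (epsilon, 'env__n_arms')},
               analyses=[AverageRewardPerStep(),
                         PercentageOptimalAction()])

test.run(n_steps=1_000, n_episodes=1_000).plot()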
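The double-underscore keys follow scikit-learn's convention of addressing a component's parameter as component__param. As a rough sketch (split_params is a hypothetical helper, not part of striatum), such keys can be routed to their components by splitting at the first "__", the same way scikit-learn's set_params does:

```python
from collections import defaultdict


def split_params(flat):
    """Hypothetical helper: group 'component__param' keys into one kwargs dict per component."""
    nested = defaultdict(dict)
    for key, value in flat.items():
        # Split at the first "__": 'env__n_arms' -> ('env', 'n_arms')
        component, sep, param = key.partition("__")
        if not sep:
            raise ValueError(f"expected 'component__param', got {key!r}")
        nested[component][param] = value
    return dict(nested)


# Usage: route sampled values to the components they configure
kwargs = split_params({"env__n_arms": 10, "policy__epsilon": 0.1})
print(kwargs)  # {'env': {'n_arms': 10}, 'policy': {'epsilon': 0.1}}
```

Splitting only at the first "__" leaves room for deeper nesting, e.g. a key like policy__schedule__decay would pass schedule__decay on to the policy.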

For each episode (here, n_episodes=1_000 times), the graph defined by the dictionary passed to TestBed is resolved as follows:

>>> env__n_arms = np.random.choice([9, 10, 11])
>>> policy__epsilon = epsilon(env__n_arms)
>>> policy = EpsilonGreedy(epsilon=policy__epsilon)
>>> env = MultiArmedBandit(n_arms=env__n_arms)
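The resolution order above follows from the dependencies in the graph: policy__epsilon cannot be computed before env__n_arms. A minimal pure-Python resolver for dask-style graphs (a task is a tuple whose first element is callable, and a string equal to a graph key stands for that key's value) might look like the sketch below. It is a hypothetical illustration, not striatum's or dask's code; it uses random.choice instead of np.random.choice to stay dependency-free and does not detect cycles.

```python
import random


def resolve(graph):
    """Hypothetical sketch: evaluate every key of a dask-style task graph once."""
    cache = {}

    def evaluate(obj):
        if isinstance(obj, tuple) and obj and callable(obj[0]):
            func, *args = obj
            return func(*(evaluate(a) for a in args))  # task: call with resolved args
        if isinstance(obj, list):
            return [evaluate(x) for x in obj]           # recurse into lists
        if isinstance(obj, str) and obj in graph:
            return get(obj)                             # reference to another key
        return obj                                      # literal value

    def get(key):
        # Memoize so a key shared by several tasks is sampled only once per resolution
        if key not in cache:
            cache[key] = evaluate(graph[key])
        return cache[key]

    return {key: get(key) for key in graph}


def epsilon(n_arms):
    return n_arms / 100


graph = {
    "env__n_arms": (random.choice, [9, 10, 11]),
    "policy__epsilon": (epsilon, "env__n_arms"),
}
values = resolve(graph)  # a fresh resolution per episode gives a fresh n_arms
print(values)
```

Memoizing per resolution matters here: without it, env__n_arms could be sampled twice and the environment and the policy's epsilon would disagree.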

Flexible analyses

def epsilon(n_arms):
    return n_arms / 100

test = TestBed({'policy': EpsilonGreedy(),
                'env': MultiArmedBandit(),
                'env__n_arms': (np.random.choice, [9, 10, 11]),
                'policy__epsilon': (epsilon, 'env__n_arms')},
               # Compute each analysis separately for every sampled value of env__n_arms
               analyses=[AverageRewardPerStep(by='env__n_arms'),
                         PercentageOptimalAction(by='env__n_arms')])

test.run(n_steps=1_000, n_episodes=1_000).plot()
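Passing by='env__n_arms' presumably produces one curve per sampled number of arms. A minimal sketch of that grouping, assuming a hypothetical record layout in which each episode stores its resolved graph values plus a reward trace (average_reward_by is not a striatum function):

```python
from collections import defaultdict

import numpy as np


def average_reward_by(records, by):
    """Hypothetical helper: group per-episode reward traces by one graph key, then average per step.

    records: list of dicts, one per episode, each holding the resolved graph
    values plus a 'rewards' array of length n_steps (assumed layout).
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[by]].append(record["rewards"])
    # Stack the traces of each group and average across its episodes
    return {value: np.mean(np.stack(traces), axis=0) for value, traces in groups.items()}


records = [
    {"env__n_arms": 9, "rewards": np.array([0.0, 1.0])},
    {"env__n_arms": 9, "rewards": np.array([1.0, 1.0])},
    {"env__n_arms": 10, "rewards": np.array([0.5, 0.5])},
]
curves = average_reward_by(records, by="env__n_arms")
print(curves)  # {9: array([0.5, 1. ]), 10: array([0.5, 0.5])}
```

Because every resolved graph value is kept alongside the results, any key (not just env__n_arms) could serve as the grouping variable.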

Etymology (why the name?)

Functionally, the striatum coordinates multiple aspects of cognition, including both motor and action planning, decision-making, motivation, reinforcement, and reward perception.

Source: multiple references, all of which can be found at https://en.wikipedia.org/wiki/Striatum

dsevero/striatum

Folders and files

Latest commit

History

Repository files navigation

striatum

Basic usage

Emphasis on generative processes

Flexible analyses

Etymology (why the name?)

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages