rl-rpsr

Code for "Reconciling Rewards with Predictive State Representations", A. Baisero and C. Amato, IJCAI 2021.

Instructions

Required Packages

The following packages are required, and can be found on GitHub:

  • rl-parsers: https://github.com/abaisero/rl-parsers
  • gym-pomdps: https://github.com/abaisero/gym-pomdps

Preferably, create a dedicated environment, install the required packages first and then this one, and only then run the experiments. For each package, in that same order (a consolidated setup sketch follows these steps):

  1. Move to the package directory and install with

    python -m pip install .
    
  2. Run the tests to verify that the installation succeeded

    python -m unittest discover
    
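
Putting the above together, a minimal setup sketch might look as follows. The clone URLs follow the GitHub links above; the virtual-environment name is an illustrative assumption, not prescribed by this repository.

    # Create and activate a dedicated environment (name is illustrative).
    python -m venv rl-rpsr-env
    source rl-rpsr-env/bin/activate

    # Clone and install the packages in dependency order, running each
    # test suite right after installation.
    for pkg in rl-parsers gym-pomdps rl-rpsr; do
        git clone https://github.com/abaisero/"$pkg".git
        (cd "$pkg" && python -m pip install . && python -m unittest discover)
    done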

Experiment Code

The code to replicate the experiments from the IJCAI 2021 paper is contained in the experiments/ folder. Move there, then run the following commands. NOTE: in these scripts, bsr stands for belief-state representation (i.e. POMDP), psr stands for predictive-state representation, and rpsr stands for reward-predictive state representation.

The file pomdps.all.txt lists all the POMDPs tested to verify the significance of the PSR accuracy problem, while pomdps.ijcai21.txt lists the 6 domains whose PSRs are inaccurate, for which value iteration was feasible, and which form the basis of the more thorough evaluation.
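
Each script reads the domain list from standard input; the "<file command" form used below is ordinary shell input redirection, written before the command rather than after it. The following two invocations are therefore equivalent, and the third runs only the first domain in the list (assuming one domain per line, as the list files suggest):

    ./search.local < pomdps.ijcai21.txt
    <pomdps.ijcai21.txt ./search.local

    # Run only the first listed domain.
    head -n 1 pomdps.ijcai21.txt | ./search.local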

  1. Search for core sets of PSRs and R-PSRs:

    <pomdps.ijcai21.txt ./search.local
    

    This will compute core sets, print their ranks, and store them in cores/.

  2. Compute reward errors of PSRs and R-PSRs w.r.t. the POMDP rewards:

    <pomdps.ijcai21.txt ./info.local
    

    This will compute error measures, print them to standard output, and store them in infos/.

  3. Run POMDP-VI, PSR-VI and R-PSR-VI:

    <pomdps.ijcai21.txt ./vi.local
    

    This will run the value iteration algorithms for 150 iterations, and store the resulting value functions in vfs/. This is the slowest step; it will take many hours on a single machine (one way to split the work across machines is sketched after this list).

  4. Plot a quasi-Bellman-error measure to check for convergence of the value functions:

    <pomdps.ijcai21.txt ./plot.local
    

    This will plot the convergence properties of the value functions, and store them in plots/.

  5. Evaluate the value functions' respective policies:

    <pomdps.ijcai21.txt ./eval.local
    

    This will run the Random, POMDP-VI, PSR-VI and R-PSR-VI policies for 100 episodes of 1000 steps each, calculate the true and estimated returns, and store them in evals/.

  6. Compile the evaluation results into tables:

    <pomdps.ijcai21.txt ./tables | tee tables.tex
    

    This will aggregate the results obtained by the evaluation step, print them as LaTeX tables to standard output, and save them to tables.tex.
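
Since each script consumes one domain per input line, the slow value-iteration step (step 3) can be spread across machines by splitting the domain list. A minimal sketch follows; the chunk size is an illustrative choice, and the part.* files are not part of the repository:

    # Split the 6-domain list into 2-line chunks: part.aa, part.ab, part.ac.
    split -l 2 pomdps.ijcai21.txt part.

    # On each machine (or in a separate shell), run VI on one chunk.
    <part.aa ./vi.local
    <part.ab ./vi.local
    <part.ac ./vi.local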
