[bug report] CSV table calculated by ./replay/gather_results.py is wrong (reward underestimated) #51

ncble (Collaborator) opened this issue on Jun 14, 2019

Describe the bug
In the current master version, the script ./replay/gather_results.py is used to gather experiment results and produce a CSV table. Recently, when I tried to reproduce the results of your paper, Decoupling feature extraction from policy learning ..., I found that my reward results are much better than yours (especially Table 3 on page 7). This table compares the mean reward performance in RL (using PPO) in the robotic arm (random target) environment (aka KukaButton with random target). I dug into the code and found the bug:

# make sure there is data here, and that it is above the minimum timestep threshold
if run_acc is not None and (args.min_timestep is None or np.sum(run_acc[:, 0]) > args.min_timestep):
    run_acc[:, 1] = run_acc[:, 1] / len(monitor_files)  # BUG: run_acc is int64, so this floors the quotients
    run_acc[:, 0] = np.cumsum(run_acc[:, 0])

@kalifou has already confirmed this problem.

Explanation
It's a rounding problem. Line 138 actually performs floor division rather than float division (at least for KukaButton, and possibly for other environments too): run_acc is an array of dtype int64, so the in-place slice assignment casts the float quotients back to int64, truncating them toward zero.
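
A minimal fix sketch: cast the array to float64 before the in-place division. The variable contents below are hypothetical stand-ins for the real data (in gather_results.py, run_acc is loaded as int64 and monitor_files is a list of file paths), so this only illustrates the cast, not the actual loading code.

import numpy as np

# Hypothetical stand-ins for the real inputs
run_acc = np.array([[1000, 3], [2000, 7], [1500, 5]], dtype=np.int64)
monitor_files = ["run_0.monitor.csv", "run_1.monitor.csv"]

# Fix: cast to float64 first, so the in-place assignment no longer
# truncates the quotients back to int64
run_acc = run_acc.astype(np.float64)
run_acc[:, 1] = run_acc[:, 1] / len(monitor_files)  # true float division
run_acc[:, 0] = np.cumsum(run_acc[:, 0])
print(run_acc)  # the reward column keeps its .5 fractions instead of being floored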

Code example
The problem actually comes from numpy. The following code reproduces this phenomenon:

import numpy as np
A = np.arange(10, dtype=np.int64)
print(A[:]/10) # np.array([0.0, 0.1, ..., 0.9])
A[:] = A[:] / 10 
print(A) # np.array([0, 0, ..., 0])

One funny thing is that A = A / 10 works (no flooring), but A[:] = A[:] / 10 does not: plain assignment rebinds A to the new float64 array returned by the division, whereas slice assignment writes the float results back into the existing int64 buffer, casting them to integers.
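
To make the distinction concrete, here is a self-contained illustration of NumPy's casting rules (standard NumPy behaviour, not specific to this repo):

import numpy as np

A = np.arange(10, dtype=np.int64)
A = A / 10            # rebinds A to the NEW float64 array returned by the division
print(A.dtype, A[1])  # float64 0.1

A = np.arange(10, dtype=np.int64)
A[:] = A / 10         # writes into the EXISTING int64 buffer: floats are cast back
print(A.dtype, A[1])  # int64 0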

Solution

Use my code instead: ./replay/postprocessing_logs.py (temporary name). My code can directly produce the LaTeX table and handles most situations with heterogeneous data: different numbers of experiments, different run lengths, configurable "checkpoints" (timesteps), and different SRL models.

The following picture is a demo of my code:

  • when there is only one experiment, no confidence interval is shown
  • when there are several experiments, a 95% confidence interval is estimated (see the sketch after this list)
  • when the RL training of an SRL model was stopped accidentally, a "-" is inserted
  • there is no need to specify the SRL models; the folders are searched automatically
  • the "checkpoints" [1e6, 2e6, 3e6, 4e6, 5e6] can be changed by the user (use M for million, K for thousand)
  • the result is saved to a .tex file (LaTeX table)

[latex_demo: screenshot of the generated LaTeX table]
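
For reference, a minimal sketch of how the 95% confidence interval could be estimated from several runs. This is an assumption about the method; postprocessing_logs.py may compute it differently, and mean_ci95 is a hypothetical helper, not part of the script.

import numpy as np
from scipy import stats

def mean_ci95(rewards):
    # `rewards` holds one mean-reward value per experiment/seed
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean()
    if rewards.size < 2:
        return mean, None  # single experiment: no confidence interval
    sem = stats.sem(rewards)  # standard error of the mean
    half_width = sem * stats.t.ppf(0.975, df=rewards.size - 1)
    return mean, half_width

print(mean_ci95([3.1]))            # (3.1, None)
print(mean_ci95([3.1, 2.8, 3.4]))  # mean plus t-based 95% half-width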

Question

Are there similar problems elsewhere in the toolbox?
