[bug report] CSV table calculated by ./replay/gather_results.py is wrong (reward underestimated) #51

ncble (Collaborator) opened this issue on Jun 14, 2019

Describe the bug
In the current master version, the script ./replay/gather_results.py is used to gather experiment results and produce a CSV table. Recently, when I tried to reproduce the results of your paper, Decoupling feature extraction from policy learning ..., I found that my reward results are much better than yours (especially Table 3 on page 7). This table compares the mean reward performance in RL (using PPO) in the robotic arm (random target) environment (aka KukaButton with random target). I dug into the code and found the bug:

# make sure there is data here, and that it is above the minimum timestep threshold
if run_acc is not None and (args.min_timestep is None or np.sum(run_acc[:, 0]) > args.min_timestep):
    run_acc[:, 1] = run_acc[:, 1] / len(monitor_files)  # BUG: run_acc is int64, so this floors the quotients
    run_acc[:, 0] = np.cumsum(run_acc[:, 0])

@kalifou has already confirmed this problem.

Explanation
It's a rounding problem. Line 138 actually performs floor division rather than float division (at least for KukaButton, and possibly for other environments too): run_acc is an array of dtype int64, so the in-place slice assignment casts the float quotients back to int64, truncating them toward zero.
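
A minimal fix sketch: cast the array to float64 before the in-place division. The variable contents below are hypothetical stand-ins for the real data (in gather_results.py, run_acc is loaded as int64 and monitor_files is a list of file paths), so this only illustrates the cast, not the actual loading code.

import numpy as np

# Hypothetical stand-ins for the real inputs
run_acc = np.array([[1000, 3], [2000, 7], [1500, 5]], dtype=np.int64)
monitor_files = ["run_0.monitor.csv", "run_1.monitor.csv"]

# Fix: cast to float64 first, so the in-place assignment no longer
# truncates the quotients back to int64
run_acc = run_acc.astype(np.float64)
run_acc[:, 1] = run_acc[:, 1] / len(monitor_files)  # true float division
run_acc[:, 0] = np.cumsum(run_acc[:, 0])
print(run_acc)  # the reward column keeps its .5 fractions instead of being floored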

Code example
The problem actually comes from numpy. The following code reproduces this phenomenon:

import numpy as np
A = np.arange(10, dtype=np.int64)
print(A[:]/10) # np.array([0.0, 0.1, ..., 0.9])
A[:] = A[:] / 10 
print(A) # np.array([0, 0, ..., 0])

One funny thing is that A = A / 10 works (no flooring), but A[:] = A[:] / 10 does not: plain assignment rebinds A to the new float64 array returned by the division, whereas slice assignment writes the float results back into the existing int64 buffer, casting them to integers.
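
To make the distinction concrete, here is a self-contained illustration of NumPy's casting rules (standard NumPy behaviour, not specific to this repo):

import numpy as np

A = np.arange(10, dtype=np.int64)
A = A / 10            # rebinds A to the NEW float64 array returned by the division
print(A.dtype, A[1])  # float64 0.1

A = np.arange(10, dtype=np.int64)
A[:] = A / 10         # writes into the EXISTING int64 buffer: floats are cast back
print(A.dtype, A[1])  # int64 0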

Solution

Use my code instead: ./replay/postprocessing_logs.py (temporary name). My code can directly produce the LaTeX table and handles most situations with heterogeneous data: different numbers of experiments, different run lengths, configurable "checkpoints" (timesteps), and different SRL models.

The following picture is a demo of my code:

  • when there is only one experiment, no confidence interval is shown
  • when there are several experiments, a 95% confidence interval is estimated (see the sketch after this list)
  • when the RL training of an SRL model was stopped accidentally, a "-" is inserted
  • there is no need to specify the SRL models; the folders are searched automatically
  • the "checkpoints" [1e6, 2e6, 3e6, 4e6, 5e6] can be changed by the user (use M for million, K for thousand)
  • the result is saved to a .tex file (LaTeX table)

[latex_demo: screenshot of the generated LaTeX table]
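
For reference, a minimal sketch of how the 95% confidence interval could be estimated from several runs. This is an assumption about the method; postprocessing_logs.py may compute it differently, and mean_ci95 is a hypothetical helper, not part of the script.

import numpy as np
from scipy import stats

def mean_ci95(rewards):
    # `rewards` holds one mean-reward value per experiment/seed
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean()
    if rewards.size < 2:
        return mean, None  # single experiment: no confidence interval
    sem = stats.sem(rewards)  # standard error of the mean
    half_width = sem * stats.t.ppf(0.975, df=rewards.size - 1)
    return mean, half_width

print(mean_ci95([3.1]))            # (3.1, None)
print(mean_ci95([3.1, 2.8, 3.4]))  # mean plus t-based 95% half-width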

Question

Are there similar problems elsewhere in the toolbox?
