RARL attains higher reward than the control against adversaries of all strengths, and the disparity grows as the adversary is adjusted to be weaker. With no adversarial action at all, both the control and RARL can stay up indefinitely, so their rewards are equally maxed at 1000, apart from a few outlying failures due to random initialization. These results show that adversarial training does increase performance in adversarial environments (robustness against the adversary) without sacrificing performance in non-adversarial ones.
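For reference, below is a minimal sketch of the evaluation sweep behind these numbers. It assumes a gym-style environment whose `step` takes a (protagonist, adversary) action pair and policies exposing an `act` method; the names here (`evaluate`, `adv_strength`, `control_policy`, `rarl_policy`) are hypothetical placeholders, not this repo's actual API.

```python
import numpy as np

def evaluate(env, protagonist, adversary, adv_strength,
             episodes=50, max_steps=1000):
    """Mean and std of episode reward for `protagonist` while the
    adversary's action is scaled by `adv_strength` (0.0 disables it)."""
    returns = []
    for _ in range(episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):  # +1 reward per step caps an episode at 1000
            pro_action = protagonist.act(obs)
            adv_action = adv_strength * adversary.act(obs)
            obs, reward, done, _ = env.step((pro_action, adv_action))
            total += reward
            if done:
                break
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))

# Usage sketch: sweep adversary strength for both policies and compare.
# for strength in (0.0, 0.25, 0.5, 0.75, 1.0):
#     for name, policy in (("control", control_policy), ("RARL", rarl_policy)):
#         mean, std = evaluate(env, policy, adversary, strength)
#         print(f"{name} @ strength {strength}: {mean:.0f} +/- {std:.0f}")
```

Setting `adv_strength` to 0.0 reproduces the non-adversarial case where both policies should reach the 1000 cap.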
Models, TensorBoard logs, and evaluation logs are zipped and attached.