An implementation of optimistic least-squares policy iteration (O-LSPI) for the classic discrete-time linear quadratic regulation (LQR) problem, as published in the paper:
Bo Pang, and Zhong-Ping Jiang. "Robust reinforcement learning: A case study in linear quadratic regulation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 10. 2021.
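For orientation, here is a minimal sketch of plain (non-optimistic) least-squares policy iteration for discrete-time LQR in Python/NumPy. It is not the paper's O-LSPI algorithm. Each iteration evaluates the current gain K exactly by fitting a quadratic Q-function Q(x, u) = [x; u]^T H [x; u] to the Bellman equation with least squares, then improves the gain as K = H_uu^{-1} H_ux. The example system, cost weights, exploration noise, and all function names (`collect_data`, `lspi_lqr`, `dare_gain`, `quad_features`, `unpack_sym`) are hypothetical choices for illustration. The data are assumed noise-free and the initial gain K0 is assumed stabilizing.

```python
import numpy as np

def quad_features(z):
    # phi(z) with z^T H z = phi(z) @ theta, theta = upper triangle of H (row by row)
    n = z.size
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def unpack_sym(theta, n):
    # Rebuild symmetric H from its upper-triangular entries (same ordering)
    H = np.zeros((n, n))
    idx = [(i, j) for i in range(n) for j in range(i, n)]
    for k, (i, j) in enumerate(idx):
        H[i, j] = H[j, i] = theta[k]
    return H

def collect_data(A, B, K0, T, rng, noise=1.0):
    # Roll out the behavior policy u_t = -K0 x_t + e_t with Gaussian exploration e_t
    n, m = B.shape
    x = rng.standard_normal(n)
    X, U, Xn = [], [], []
    for _ in range(T):
        u = -K0 @ x + noise * rng.standard_normal(m)
        x_next = A @ x + B @ u
        X.append(x); U.append(u); Xn.append(x_next)
        x = x_next
    return np.array(X), np.array(U), np.array(Xn)

def lspi_lqr(X, U, Xn, Qc, Rc, K0, iters=30, tol=1e-10):
    n, m = X.shape[1], U.shape[1]
    K = K0.copy()
    # Stage cost x^T Qc x + u^T Rc u for every sample
    cost = (np.einsum('ti,ij,tj->t', X, Qc, X)
            + np.einsum('ti,ij,tj->t', U, Rc, U))
    for _ in range(iters):
        # Policy evaluation (off-policy, same data every iteration):
        # phi(z_t) @ theta - phi(z'_t) @ theta = cost_t, z'_t = [x_{t+1}; -K x_{t+1}]
        Phi = np.array([quad_features(np.concatenate([x, u]))
                        - quad_features(np.concatenate([xn, -K @ xn]))
                        for x, u, xn in zip(X, U, Xn)])
        theta = np.linalg.lstsq(Phi, cost, rcond=None)[0]
        H = unpack_sym(theta, n + m)
        # Policy improvement: argmin_u Q(x, u) gives u = -H_uu^{-1} H_ux x
        K_new = np.linalg.solve(H[n:, n:], H[n:, :n])
        if np.linalg.norm(K_new - K) < tol:
            return K_new, H
        K = K_new
    return K, H

def dare_gain(A, B, Qc, Rc, iters=2000):
    # Model-based reference: iterate the Riccati recursion, return the optimal gain
    P = Qc.copy()
    for _ in range(iters):
        G = Rc + B.T @ P @ B
        P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
    return np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)

# Hypothetical example system (not the one used in the paper)
rng = np.random.default_rng(0)
A = np.array([[0.95, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.eye(1)
K0 = np.zeros((1, 2))  # A is Schur stable, so K0 = 0 is stabilizing
X, U, Xn = collect_data(A, B, K0, T=100, rng=rng)
K_lspi, H = lspi_lqr(X, U, Xn, Qc, Rc, K0)
K_star = dare_gain(A, B, Qc, Rc)
print(np.round(K_lspi, 4), np.round(K_star, 4))
```

Because the data are generated once by a behavior policy and reused for every gain, the evaluation step is off-policy. With exact data and a persistently exciting input, the learned gain should agree with the Riccati gain up to round-off; the robustness of O-LSPI to errors in this evaluation step is the subject of the paper and is not modeled here.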
Implements the main O-LSPI algorithm.
Implements the data collection step for the learning algorithm.
Collects the data for the experiment in the paper.
Draws Fig. 1 of the paper.
Auxiliary functions for vector/matrix conversions.
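The vector/matrix conversions used in least-squares LQR learning typically let a quadratic form be written as a linear function of unknown parameters. The Python sketch below assumes one common convention, which may differ from the repository's own helpers: `vecs(P)` stacks the upper triangle of a symmetric matrix with off-diagonal entries doubled, and `vecv(x)` stacks the monomials x_i x_j (i <= j) in the same order, so that x^T P x = vecs(P) @ vecv(x). The names `vecs`, `vecv`, and `vecs_inv` are hypothetical.

```python
import numpy as np

def vecs(P):
    # Upper triangle of symmetric P, row by row, off-diagonal entries doubled
    n = P.shape[0]
    return np.array([P[i, j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def vecv(x):
    # Quadratic monomials x_i * x_j for i <= j, same ordering as vecs
    n = x.size
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def vecs_inv(v, n):
    # Rebuild symmetric P from vecs(P), undoing the off-diagonal doubling
    P = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = v[k] if i == j else v[k] / 2.0
            k += 1
    return P

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
P = M + M.T                      # random symmetric test matrix
x = rng.standard_normal(3)
print(np.isclose(x @ P @ x, vecs(P) @ vecv(x)))  # True
print(np.allclose(vecs_inv(vecs(P), 3), P))      # True
```

Putting the factor 2 on the matrix side keeps `vecv` a plain list of monomials, which is convenient when each data sample contributes one row of a regression matrix.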