Reinforcement Learning Algorithms for Episodic MDPs With Finite SA-Spaces

Source code for the experiments in:
UBEV - A More Practical Algorithm for Episodic RL with Near-Optimal PAC and Regret Guarantees
Christoph Dann, Tor Lattimore, Emma Brunskill
https://arxiv.org/abs/1703.07710

For Python implementations of some of the algorithms, see https://github.com/iosband/TabulaRL/