Compare with the candle reinforcement-learning example: https://github.com/huggingface/candle/tree/main/candle-examples/examples/reinforcement-learning