v1.1.8

@FilippoAiraldi released this 26 Oct 16:52

Changes

Major

  • upgraded the csnlp dependency to csnlp==1.5.8
  • reworked the inner computations of both LstdQlearningAgent and LstdDpgAgent for better performance and closer adherence to the theory
  • reworked the inner workings of callbacks: they are now stored in an internal dict, which makes them easier to debug
  • fixed a disruptive bug in the computation of the parameters' bounds for constrained updates
  • implemented the mpcrl.optim sub-module, which provides different optimizers (see the sketch after this list), such as
    • Stochastic Gradient Descent
    • Newton's Method
    • Adam
    • RMSprop
  • switched the solver for the parameters' constrained updates to OSQP (QRQP was suffering from scaling issues)
  • removed the LearningRate class
  • implemented schedulers.Chain, which allows chaining multiple schedulers into a single one (a conceptual sketch also follows this list)
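
As a rough illustration of the kind of update rule the optimizers in mpcrl.optim implement, here is a small, library-agnostic sketch of an RMSprop-style step; the function name, arguments, and defaults below are purely illustrative and are not mpcrl's API.

    import numpy as np

    def rmsprop_step(theta, grad, sq_avg, lr=1e-2, alpha=0.99, eps=1e-8):
        # Running average of squared gradients, then a step scaled by its
        # square root (eps avoids division by zero).
        sq_avg = alpha * sq_avg + (1.0 - alpha) * grad**2
        theta = theta - lr * grad / (np.sqrt(sq_avg) + eps)
        return theta, sq_avg

    # Toy usage: minimize f(theta) = ||theta||^2 starting from [1, -2].
    theta = np.array([1.0, -2.0])
    sq_avg = np.zeros_like(theta)
    for _ in range(100):
        grad = 2.0 * theta
        theta, sq_avg = rmsprop_step(theta, grad, sq_avg)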
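
Likewise, here is a conceptual sketch of what chaining schedulers enables, namely running one scheduler for a number of steps and then handing over to the next. This is a stand-alone toy, not mpcrl's implementation, and the step/value interface it assumes may differ from the library's actual schedulers.

    class ConstantSketch:
        # Toy scheduler whose value never changes.
        def __init__(self, value):
            self.value = value

        def step(self):
            pass

    class ExponentialSketch:
        # Toy scheduler that multiplies its value by a factor at each step.
        def __init__(self, init_value, factor):
            self.value = init_value
            self.factor = factor

        def step(self):
            self.value *= self.factor

    class ChainSketch:
        # Runs each scheduler for a fixed number of steps, then moves on.
        def __init__(self, schedulers, lengths):
            self.schedulers, self.lengths = list(schedulers), list(lengths)
            self._idx = self._count = 0

        @property
        def value(self):
            return self.schedulers[self._idx].value

        def step(self):
            self._count += 1
            if self._count >= self.lengths[self._idx] and self._idx < len(self.schedulers) - 1:
                self._idx, self._count = self._idx + 1, 0
            else:
                self.schedulers[self._idx].step()

    # Constant learning rate for 10 steps, then exponential decay.
    chain = ChainSketch([ConstantSketch(0.1), ExponentialSketch(0.1, 0.9)], [10, 90])
    for _ in range(20):
        chain.step()
    print(chain.value)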

Minor

  • added the possibility of passing an integer argument as experience, which creates a buffer of the specified size (see the sketch after this list)
  • improvements to mpcrl.util.math
  • improvements to wrappers.agents.Log, which now uses lazy logging (see the example after this list)
  • fixed bugs in the on_episode_end and on_episode_start callback hooks
  • improvements to examples
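
For the integer experience argument mentioned above, the resulting behaviour is conceptually that of a fixed-size FIFO buffer. Below is a minimal stand-in using only the standard library; it is not necessarily how mpcrl implements it, and the experience=1000 setting in the comment is a hypothetical example.

    from collections import deque

    # Hypothetically, passing experience=1000 to an agent would correspond to
    # a transition buffer with this capacity.
    buffer = deque(maxlen=1000)
    for t in range(1500):
        buffer.append(("state", "action", "reward", t))  # placeholder transition
    assert len(buffer) == 1000  # the oldest 500 transitions were discarded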
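
On the lazy-logging change to wrappers.agents.Log: with Python's standard logging module, "lazy" logging means passing format arguments to the logger instead of pre-formatting the message, so the string is only built if the record is actually emitted. A generic illustration, not code taken from mpcrl:

    import logging

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("demo")

    value = 42.0
    # Eager: the f-string is formatted even though DEBUG records are filtered out.
    logger.debug(f"current value = {value}")
    # Lazy: "%s" formatting is deferred and skipped while DEBUG is disabled.
    logger.debug("current value = %s", value)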