Summary
Add support for an entropy loss term. This requires adding an entropy loss coefficient to the trainer.algorithm configuration. Supporting it in the loss calculation would involve computing the entropy with gradients enabled in ModelWrapper and propagating that value to PolicyWorkerBase, where it is combined with the policy loss.
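As a rough illustration of the arithmetic involved, here is a minimal sketch in plain Python. It assumes per-token logits, a mask that selects response tokens, and the common "entropy bonus" convention, where the weighted entropy is subtracted from the policy loss. The names `entropy_coef`, `token_entropy`, `masked_mean_entropy`, and `total_loss` are hypothetical and are not existing config keys or methods of ModelWrapper or PolicyWorkerBase. In the real trainer this would operate on tensors with autograd (for example, outside any `torch.no_grad()` block) so that gradients flow through the entropy term.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: shift by the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_entropy(logits: Sequence[float]) -> float:
    # H(p) = -sum_i p_i * log p_i for one token's distribution over the vocabulary.
    logp = log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp in logp)


def masked_mean_entropy(batch_logits: Sequence[Sequence[float]], mask: Sequence[int]) -> float:
    # Average entropy over response tokens only (mask == 1), skipping prompt/padding.
    total, count = 0.0, 0
    for logits, m in zip(batch_logits, mask):
        if m:
            total += token_entropy(logits)
            count += 1
    return total / count if count else 0.0


def total_loss(policy_loss: float, entropy: float, entropy_coef: float) -> float:
    # Hypothetical combination: entropy is a bonus, so a larger entropy lowers the loss.
    return policy_loss - entropy_coef * entropy
```

For a uniform distribution over four tokens, `token_entropy([0.0, 0.0, 0.0, 0.0])` equals `math.log(4)`, the maximum possible entropy for that vocabulary size, while a sharply peaked distribution gives an entropy close to zero. Setting the coefficient to 0 recovers the plain policy loss, which would make a default of 0 in the configuration backward compatible.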