Uses reinforcement learning to train a piece of floor to balance a ball on top of it. The agent is trained with the PPO (Proximal Policy Optimization) algorithm. Training ran for 500,000 steps and took about half an hour using the TensorFlow API on a CPU.
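For readers unfamiliar with PPO, the core idea is a clipped surrogate objective that limits how far each policy update can move from the previous policy. The snippet below is a minimal, illustrative sketch of that objective in plain Python. It is not taken from this repository's training code; the function name `ppo_clipped_objective`, its parameters, and the default `epsilon=0.2` are assumptions made for the example.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch.

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    epsilon:    clip range (0.2 is a commonly used value, assumed here)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal in length")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

With a positive advantage, a ratio above `1 + epsilon` earns no extra credit, so the policy has no incentive to overshoot; with a negative advantage, the penalty is never smaller than the clipped term. A trainer would maximize this quantity (or minimize its negative) with respect to the policy parameters.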

JaxSulav/Rebalance-----ML-in-Unity

