Random search, hill climbing, policy gradient for CartPole

Simple reinforcement learning algorithms implemented for the CartPole environment in OpenAI Gym.

This code goes along with my post about learning CartPole, which is inspired by an OpenAI request for research.

## Algorithms implemented

Random search: Keep sampling random weights in the range [-1, 1] and greedily keep the set that earns the highest reward.
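The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in `cartpole-random.py`: the `evaluate` callback is a hypothetical stand-in for "run one CartPole episode with these weights and return the total reward", and the linear threshold policy, the four-parameter default, and the trial count are assumptions made for the example.

```python
import random


def linear_policy(weights, observation):
    """Return action 1 if the weighted sum of the observation is positive, else 0.

    Assumption: a linear threshold policy over the observation vector.
    """
    score = sum(w * o for w, o in zip(weights, observation))
    return 1 if score > 0 else 0


def random_search(evaluate, n_params=4, n_trials=1000, seed=0):
    """Sample weights uniformly from [-1, 1] and keep the best-scoring set.

    `evaluate` is a hypothetical callback: it takes a list of weights and
    returns the total reward those weights earned (for example, over one episode).
    """
    rng = random.Random(seed)
    best_weights, best_reward = None, float("-inf")
    for _ in range(n_trials):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
        reward = evaluate(weights)
        # Greedy step: only replace the incumbent when the new sample scores higher.
        if reward > best_reward:
            best_weights, best_reward = weights, reward
    return best_weights, best_reward
```

With a real environment, `evaluate` would step through an episode, calling `linear_policy(weights, observation)` at each step and summing the rewards.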

Hill climbing: Start from a random initialization, add a little noise to the weights every iteration, and keep the new weights only if they improve the reward.
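A rough sketch of this procedure is shown below. It is not the code in `cartpole-hill.py`: `evaluate` is again a hypothetical callback that returns the total reward for a set of weights, and the uniform noise, the `noise_scale` value, and the iteration count are assumptions chosen for illustration.

```python
import random


def hill_climb(evaluate, n_params=4, n_iters=1000, noise_scale=0.1, seed=0):
    """Perturb the current weights with small noise and keep improvements only.

    `evaluate` is a hypothetical callback: weights in, total reward out.
    `noise_scale` (an assumed value) bounds the size of each perturbation.
    """
    rng = random.Random(seed)
    # Random initialization in [-1, 1].
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
    best_reward = evaluate(weights)
    for _ in range(n_iters):
        # Candidate = current weights plus a little uniform noise per parameter.
        candidate = [w + noise_scale * rng.uniform(-1.0, 1.0) for w in weights]
        reward = evaluate(candidate)
        # Accept the candidate only if it scored strictly higher.
        if reward > best_reward:
            weights, best_reward = candidate, reward
    return weights, best_reward
```

Unlike random search, each candidate here depends on the current best weights, so progress can stall if no small perturbation changes the reward.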

Policy gradient: Use a softmax policy and estimate a value function from discounted Monte Carlo returns. Update the policy to favor state-action pairs whose return is higher than the average return from that state. Read my post about learning CartPole for a better explanation of this.
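The pieces named above (softmax policy, discounted returns, and an advantage relative to a value estimate) can be sketched as follows. This is an illustrative sketch in plain Python, not the code in `cartpole-policygradient.py`: the linear softmax policy, the `gamma` and `learning_rate` defaults, and all function names are assumptions, and the value estimates are taken as given rather than learned.

```python
import math


def softmax(logits):
    """Convert logits to action probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.97):
    """Monte Carlo return at each step: G_t = r_t + gamma * G_{t+1}.

    The default gamma is an assumed value for illustration.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, value_estimates):
    """How much better each observed return was than the estimated state value."""
    return [g - v for g, v in zip(returns, value_estimates)]


def policy_gradient_step(weights, states, actions, advs, learning_rate=0.01):
    """One gradient-ascent step for a linear softmax policy (an assumed policy form).

    `weights` holds one weight vector per action. For a linear softmax policy,
    d log pi(a|s) / d weights[k] = (1[k == a] - pi(k|s)) * s.
    """
    new_weights = [row[:] for row in weights]
    for state, action, adv in zip(states, actions, advs):
        logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
        probs = softmax(logits)
        for k, row in enumerate(new_weights):
            indicator = 1.0 if k == action else 0.0
            for i, s in enumerate(state):
                row[i] += learning_rate * adv * (indicator - probs[k]) * s
    return new_weights
```

Actions with a positive advantage have their probability increased in the states where they were taken, and actions with a negative advantage have it decreased.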

