Some basic examples for reinforcement learning

Installing Anaconda and Gymnasium

Download and install Anaconda.
Create a conda environment for managing dependencies and activate it (listing python installs the env's own interpreter and pip):

conda create -n conda_env python
conda activate conda_env
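Optional: a quick way to confirm that the Python you are running belongs to the activated env, so that pip installs will land there. This is a minimal sketch that assumes the env name `conda_env` from above and relies on the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` variables that `conda activate` sets; the helper name `running_in_conda_env` is only illustrative.

```python
import os
import sys


def running_in_conda_env(name):
    """Return True if the current interpreter belongs to the conda env `name`.

    Relies on CONDA_DEFAULT_ENV (env name) and CONDA_PREFIX (env path), which
    `conda activate` sets; both are absent when no env is active.
    """
    active = os.environ.get("CONDA_DEFAULT_ENV")
    prefix = os.environ.get("CONDA_PREFIX")
    if active != name or prefix is None:
        return False
    # sys.prefix points at the env directory when the env's own python is running
    return os.path.realpath(sys.prefix) == os.path.realpath(prefix)


if __name__ == "__main__":
    print("python:", sys.executable)
    print("inside conda_env:", running_in_conda_env("conda_env"))
```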

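Install Gymnasium (with the env activated, packages installed by pip also go into the conda env). The quotes stop shells such as zsh from expanding the brackets:

pip install "gymnasium[all]"
pip install "gymnasium[accept-rom-license]"

# If box2d-py fails to build, install swig and rerun the pip commands above.
conda install swig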
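After installing, you can check that the environment IDs used in these examples are registered. The sketch below is a hypothetical helper, not part of the repo: `missing_env_ids` accepts any mapping of IDs, and the usage passes Gymnasium's `gym.envs.registry`. Atari IDs such as Pong-ram-v0 are registered by the separate ale-py package, so whether they appear may depend on the installed versions.

```python
def missing_env_ids(required_ids, registry):
    """Return the IDs from `required_ids` that are not keys of `registry`, in order."""
    return [env_id for env_id in required_ids if env_id not in registry]


if __name__ == "__main__":
    try:
        import gymnasium as gym
    except ImportError:
        print("gymnasium is not installed in this interpreter")
    else:
        # gym.envs.registry maps environment IDs to their registration specs
        missing = missing_env_ids(["CartPole-v0", "Acrobot-v1"], gym.envs.registry)
        print("gymnasium", gym.__version__, "missing:", missing or "none")
```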

Install ai2thor if you want to run navigation_agent.py

pip install ai2thor==2.4.10

Install PyTorch with either conda or pip (pick one; the conda command installs a CUDA 12.1 build for NVIDIA GPUs):

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

pip install torch torchvision torchaudio
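Which build you end up with depends on the command you chose and your platform. A small check like this one (an illustrative helper, not from the repo) reports whether PyTorch can see a CUDA GPU, using `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def describe_torch_device():
    """Return the device PyTorch would use ("cpu" or "cuda (...)"), or a note if torch is missing."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # name of the first visible GPU, e.g. for confirming the CUDA build works
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"


if __name__ == "__main__":
    print("PyTorch device:", describe_torch_device())
```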

Install other dependencies

pip install numpy pandas matplotlib
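To see at a glance which of the packages above ended up in the active env, a sketch using the standard library's `importlib.metadata` could look like this. The package list mirrors the pip commands in this README; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names as passed to pip above (ai2thor is optional and pinned to 2.4.10).
PACKAGES = ["gymnasium", "torch", "numpy", "pandas", "matplotlib", "ai2thor"]


def installed_versions(names):
    """Map each distribution name to its installed version string, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```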

Examples

Play with the environment and visualize the agent behaviour

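import gymnasium as gym
render = True  # set to False to run without opening a display window
if render:
    env = gym.make('CartPole-v0', render_mode='human')
else:
    env = gym.make('CartPole-v0')
observation, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # take a random action
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:  # the pole fell or the time limit was reached
        observation, info = env.reset()  # start a new episode
env.close()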

Random play with CartPole-v0

import gymnasium as gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation, info = env.reset()  # reset() returns (observation, info)
    for t in range(100):
        print(observation)
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:  # stop this episode once it has ended
            print(f"Episode finished after {t + 1} timesteps")
            break
env.close()
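The loop above follows Gymnasium's API: `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`, where `terminated` means the episode reached a terminal state (e.g. the pole fell) and `truncated` means it was cut off, e.g. by a time limit. The sketch below packages that loop into a reusable `run_episode` helper and, so it runs without Gymnasium, exercises it on `ToyEnv`, a hypothetical stand-in that mimics the same signatures. Both names are illustrative.

```python
import random


class ToyEnv:
    """Hypothetical stand-in with the same reset/step signatures as a Gymnasium env.

    Each step gives reward 1.0 (like CartPole). The episode terminates when a
    random draw falls below `p_fail`, and is truncated after `max_steps` steps.
    """

    def __init__(self, p_fail=0.1, max_steps=200):
        self.p_fail = p_fail
        self.max_steps = max_steps
        self.rng = random.Random()
        self.t = 0

    def reset(self, seed=None):
        if seed is not None:
            self.rng.seed(seed)
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.rng.random() < self.p_fail
        truncated = not terminated and self.t >= self.max_steps
        return self.t, 1.0, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Roll out one episode with `policy(observation) -> action`; return (total_reward, length)."""
    observation, info = env.reset(seed=seed)
    total_reward, length = 0.0, 0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        length += 1
        if terminated or truncated:
            return total_reward, length
```

With Gymnasium installed, the same helper should work on a real environment, e.g. `env = gym.make('CartPole-v0')` followed by `run_episode(env, lambda obs: env.action_space.sample(), seed=0)`.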

Example code for a random agent; pass the environment ID (e.g. Pong-ram-v0, Acrobot-v1, Breakout-v0) as the argument:

python my_random_agent.py Pong-ram-v0
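A hypothetical sketch of how a script that takes the environment ID on the command line could be structured. This is not the contents of my_random_agent.py; `parse_args`, `main` and the `--episodes` flag are illustrative assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the environment ID (and an optional episode count) from the command line."""
    parser = argparse.ArgumentParser(description="Run a uniformly random agent.")
    parser.add_argument("env_id", help="Gymnasium environment ID, e.g. Pong-ram-v0")
    parser.add_argument("--episodes", type=int, default=3, help="number of episodes to play")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    import gymnasium as gym  # deferred so parse_args works without Gymnasium installed

    env = gym.make(args.env_id)
    env.action_space.seed(0)  # make the sampled actions reproducible
    for episode in range(args.episodes):
        observation, info = env.reset(seed=episode)
        episode_return, done = 0.0, False
        while not done:
            action = env.action_space.sample()
            observation, reward, terminated, truncated, info = env.step(action)
            episode_return += reward
            done = terminated or truncated
        print(f"episode {episode}: return {episode_return}")
    env.close()
```

Calling `main(["CartPole-v0", "--episodes", "2"])` from a script's entry point would play two random episodes and print their returns.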

A very naive learning agent for CartPole-v0 or Acrobot-v1:

python my_learning_agent.py CartPole-v0
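One simple way to get a learnable agent for a two-action task like CartPole is hill climbing over the weights of a linear policy. The sketch below is an assumption for illustration, not necessarily what my_learning_agent.py does; `linear_policy`, `hill_climb` and `average_return` are hypothetical names, and `linear_policy` only emits actions 0 and 1, so as written it does not suit Acrobot-v1's three actions.

```python
import random


def linear_policy(weights, observation):
    """Return action 1 if the weighted sum of the observation is positive, else action 0."""
    return 1 if sum(w * x for w, x in zip(weights, observation)) > 0 else 0


def hill_climb(evaluate, dim, iterations=100, noise=0.1, seed=0):
    """Naive hill climbing: perturb the best weights with Gaussian noise and keep improvements."""
    rng = random.Random(seed)
    best = [0.0] * dim
    best_score = evaluate(best)
    for _ in range(iterations):
        candidate = [w + rng.gauss(0.0, noise) for w in best]
        score = evaluate(candidate)
        if score > best_score:  # accept only strict improvements
            best, best_score = candidate, score
    return best, best_score


def average_return(env, weights, episodes=5):
    """Average return of the linear policy over a few seeded episodes of a Gymnasium env."""
    total = 0.0
    for episode in range(episodes):
        observation, info = env.reset(seed=episode)
        done = False
        while not done:
            action = linear_policy(weights, observation)
            observation, reward, terminated, truncated, info = env.step(action)
            total += reward
            done = terminated or truncated
    return total / episodes
```

Usage with Gymnasium: `env = gym.make('CartPole-v0')`, then `weights, score = hill_climb(lambda w: average_return(env, w), dim=4)`. Because `average_return` reseeds every episode, a given weight vector always gets the same score, which keeps the accept/reject comparison meaningful.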

Playing Pong on the CPU (with a great blog post). A pretrained model, pong_model_bolei.p (trained for 20,000 episodes), is included; load it by setting save_file in the script to that file.
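Policy-gradient training, which the name pg-pong suggests, weights each action by the discounted return that followed it, G_t = r_t + gamma * G_{t+1}. The sketch below computes those returns; the option to restart the sum at every nonzero reward reflects that in Pong each nonzero reward marks the end of a rally. It is an illustration, not the code in pg-pong.py, and `discounted_returns` and `reset_on_nonzero` are hypothetical names.

```python
def discounted_returns(rewards, gamma=0.99, reset_on_nonzero=False):
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed backwards in time."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if reset_on_nonzero and rewards[t] != 0:
            running = 0.0  # a point was scored: do not carry credit across rallies
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, `discounted_returns([0, 0, 1], gamma=0.5)` gives `[0.25, 0.5, 1.0]`: the reward at the last step is credited to earlier actions with geometrically shrinking weight.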

python pg-pong.py

Random navigation agent in AI2THOR

python navigation_agent.py

ucla-rlcourse/RLexample

Folders and files

Latest commit

History

Repository files navigation

Some basic examples for reinforcement learning

Installing Anaconda and Gymnasium

Examples

About

Resources

Stars

Watchers

Forks

Languages