# Reinforcement learning using Stable Baselines3

We first need to install the dependencies listed in `requirements.txt`. Please use Python 3.10 to avoid compatibility issues. I recommend a virtual environment: on Windows, create one with `python -m venv .\venv`, activate it with `.\venv\Scripts\Activate.ps1`, and then install the dependencies with `pip install -r requirements.txt`. Once that's done, let's start by importing the modules we are going to use.

In [None]:
import gym

import numpy as np
from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy

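Then we create our DQN (Deep Q-Network) model, a reinforcement learning algorithm that uses a neural network to estimate the value of each action in a given state. Here `exploration_final_eps=0.1` sets the final probability of taking a random action, and `target_update_interval=250` updates the target network every 250 environment steps.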

In [None]:
model = DQN('MlpPolicy', 'LunarLander-v2', verbose=1, exploration_final_eps=0.1, target_update_interval=250)
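
To make `exploration_final_eps` more concrete, here is a minimal sketch of a linear epsilon schedule of the kind DQN uses for exploration. The function name `linear_epsilon` is hypothetical, and the starting value of 1.0 and the decay fraction of 0.1 are assumptions (I believe they match the library defaults, but please check the documentation for your version).

```python
def linear_epsilon(step, total_timesteps, initial_eps=1.0, final_eps=0.1, fraction=0.1):
    """Exploration rate at a given step under a linear decay.

    Epsilon falls from initial_eps to final_eps over the first
    `fraction` of training, then stays at final_eps.
    (initial_eps and fraction are assumed values, not taken from the notebook.)
    """
    decay_steps = fraction * total_timesteps
    progress = min(step / decay_steps, 1.0)  # clamp once the decay phase is over
    return initial_eps + progress * (final_eps - initial_eps)


# Exploration rate at a few points of a 100,000-step training run
for step in (0, 5_000, 10_000, 100_000):
    print(step, round(linear_epsilon(step, total_timesteps=100_000), 3))
```

Under these assumptions the agent acts almost entirely at random at the start and still takes a random action 10% of the time for the rest of training.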

Next, we define a function that lets us watch our model play one episode.

In [None]:
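def render_model(model):
    # Uses the classic gym API: reset() returns obs, step() returns a 4-tuple
    env = gym.make('LunarLander-v2')
    obs = env.reset()
    done = False
    while not done:
        # Ask the model for an action given the current observation
        action, _states = model.predict(obs)
        obs, rewards, done, info = env.step(action)
        env.render()
    env.close()
    # Note: this is the reward of the final step only, not the episode total
    print(f"{rewards = }")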
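
The loop above prints only the reward of the last step. If we want the total reward of the episode, we can accumulate it inside the same loop. The sketch below shows the idea on a toy environment so that it runs without gym; `ToyEnv` and `run_episode` are hypothetical names, and the step rewards are made up.

```python
class ToyEnv:
    """Hypothetical stand-in for a gym environment (classic 4-tuple step API)."""

    def __init__(self, step_rewards):
        self.step_rewards = list(step_rewards)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step index

    def step(self, action):
        reward = self.step_rewards[self.t]
        self.t += 1
        done = self.t >= len(self.step_rewards)
        return self.t, reward, done, {}


def run_episode(env, policy):
    """Play one episode and return (total_reward, last_step_reward)."""
    obs = env.reset()
    done = False
    total_reward = 0.0
    reward = 0.0
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward  # sum the rewards over the whole episode
    return total_reward, reward


# Made-up step rewards: small penalties, then a large bonus on the last step
total, last = run_episode(ToyEnv([-1.0, -0.5, 2.0, 100.0]), policy=lambda obs: 0)
print(f"{total = }, {last = }")
```

The same accumulation could be added to `render_model` by summing `rewards` on every iteration of the `while` loop.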

Let's see how our model performs without training.

In [None]:
# Separate env for evaluation
eval_env = gym.make('LunarLander-v2')

# Random Agent, before training
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
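
To see what these two numbers summarize, here is a small sketch that computes them by hand from a list of episode returns. The returns are made up for illustration, `summarize_returns` is a hypothetical helper, and I am assuming the standard deviation is the population one (as far as I know, that is what `evaluate_policy` reports).

```python
import statistics


def summarize_returns(episode_returns):
    """Return the mean and population standard deviation of episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)  # population std (assumption)
    return mean, std


# Made-up total rewards from five evaluation episodes
returns = [-120.0, -80.0, -100.0, -60.0, -140.0]
mean_reward, std_reward = summarize_returns(returns)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

A large standard deviation relative to the mean tells us that the agent's performance varies a lot from one episode to the next, so 10 episodes give only a rough estimate.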

We can take a look at the model in action.

In [None]:
render_model(model)

We can see that our untrained model did not yield very good results (it did not do anything!). So let's train it.

In [None]:
# Train the agent
model.learn(total_timesteps=100_000)

Now, we can see the performance of our model as follows.

In [None]:
# Evaluate the trained agent
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

Looks pretty good! Let's see it in action.

In [15]:
render_model(model)

rewards = 100


We can also save the model and load it again later, for example in production.

In [None]:
model.save("dqn_lunar_model")  # save the trained model to disk
del model  # delete the trained model to demonstrate loading
model = DQN.load("dqn_lunar_model")  # then load it elsewhere, e.g. in production