In [1]:
import time
import gym
import gym_snake
from stable_baselines3.common.logger import configure

# Read This First
This notebook shows how we train and test an agent on the OpenAI Gym Snake environment. The Train Params and Test Params sections below are the two you will most often need to change. As our gym gets more complicated, add more params to these sections.

## Train


### Params

In [2]:
train_timesteps = 10000  # Number of steps to train the snake for. One step is one action taken by the snake.
visualize_training = False  # Set to True to see the game moves in pygame

### Setup

In [3]:
# Create Gym Environment for training
env = gym.make(
    'snake-v0',
    use_pygame=visualize_training  # Render the game in pygame only when visualize_training is True
)

pygame 2.0.2 (SDL 2.0.16, Python 3.9.7)
Hello from the pygame community. https://www.pygame.org/contribute.html


In [4]:
# Create Model
from stable_baselines3 import A2C  # Can try using different models
model = A2C(
    'MlpPolicy',
    env, 
    verbose=1
)  # TODO: make sure this uses GPU when on server

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
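
The output above shows the model running on the CPU, and the TODO asks for GPU use on the server. A minimal sketch of picking the device explicitly follows; `pick_device` is a hypothetical helper name, and the sketch assumes PyTorch (which Stable-Baselines3 is built on) is the source of truth for GPU availability. It falls back to `"cpu"` if PyTorch cannot be imported.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # Stable-Baselines3 is built on PyTorch
    except ImportError:
        # Without PyTorch we cannot query the GPU, so stay on the CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Selected device:", device)
```

The result could then be passed as `A2C('MlpPolicy', env, verbose=1, device=device)`. The `device` argument defaults to `"auto"`, which should already select a GPU when one is visible, so an explicit choice is mainly useful for forcing the CPU or for making the selection obvious in the logs.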


### Run
This is the actual training step, where the model runs in the environment. The cell can be re-run to train the same model further.

In [5]:
t0 = time.time()
model.learn(
    total_timesteps=train_timesteps  # Number of actions the model should take in learning
)
t1 = time.time()
print(f"Finished training in {round(t1 - t0, 2)} seconds")

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 25.8     |
|    ep_rew_mean        | -0.947   |
| time/                 |          |
|    fps                | 1146     |
|    iterations         | 100      |
|    time_elapsed       | 0        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -1.38    |
|    explained_variance | 0.317    |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | -0.257   |
|    value_loss         | 0.0611   |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 28.3     |
|    ep_rew_mean        | -0.914   |
| time/                 |          |
|    fps                | 1165     |
|    iterations         | 200      |
|    time_elapsed       | 0        |
|    total_timesteps    | 1000     |
| train/                |          |
|

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 61.3      |
|    ep_rew_mean        | -0.75     |
| time/                 |           |
|    fps                | 1186      |
|    iterations         | 1500      |
|    time_elapsed       | 6         |
|    total_timesteps    | 7500      |
| train/                |           |
|    entropy_loss       | -1.03     |
|    explained_variance | -1.35e+04 |
|    learning_rate      | 0.0007    |
|    n_updates          | 1499      |
|    policy_loss        | -0.152    |
|    value_loss         | 0.00857   |
-------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 70.3      |
|    ep_rew_mean        | -0.75     |
| time/                 |           |
|    fps                | 1188      |
|    iterations         | 1600      |
|    time_elapsed       | 6         |
|    total_timesteps    | 8000      |
| train/    

## Test
Test the model to see how well it performs. The run can optionally be visualized in pygame.

### Params

In [6]:
test_timesteps = 1000  # Number of steps to test the snake for. One step is one action taken by the snake.
visualize_testing = True  # Set to True to see the game moves in pygame. Should be False when run on the server.
model_filename = ""  # Filename to save the model under. If empty, the model is not saved. Only save production models.

### Setup


In [7]:
env = gym.make(
    'snake-v0',
    use_pygame=visualize_testing
)
obs = env.reset()

### Run

In [8]:
scores = []
for i in range(test_timesteps):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        scores.append(env.game.score)
        obs = env.reset()
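
The loop above counts environment steps rather than games, so a run can end with few or no completed episodes and `scores` may stay empty. A minimal sketch of an episode-based alternative follows, assuming the same 4-tuple `step` API used above. `StubEnv`, `run_episodes`, and the constant policy are hypothetical stand-ins so the snippet runs without `gym_snake`.

```python
class StubEnv:
    """Hypothetical stand-in for the snake env: each episode ends after a fixed number of steps."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.steps = 0
        self.score = 0

    def reset(self):
        self.steps = 0
        self.score = 0
        return 0  # dummy observation

    def step(self, action):
        self.steps += 1
        self.score += 1  # pretend every step scores one point
        done = self.steps >= self.episode_length
        return 0, 1.0, done, {}


def run_episodes(env, policy, n_episodes, max_steps_per_episode=1000):
    """Play n_episodes games and return the final score of each one."""
    scores = []
    for _ in range(n_episodes):
        obs = env.reset()
        # The step cap guards against a policy that never ends the game
        for _ in range(max_steps_per_episode):
            obs, reward, done, info = env.step(policy(obs))
            if done:
                break
        scores.append(env.score)
    return scores


demo_scores = run_episodes(StubEnv(), policy=lambda obs: 0, n_episodes=3)
print(demo_scores)
```

With the real model, the policy could be `lambda obs: model.predict(obs, deterministic=True)[0]`, and the score would be read from `env.game.score` as in the cell above.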

### Analyze

In [9]:
import numpy as np

s_arr = np.array(scores)
print("Number of completed games: ", len(s_arr))

if len(s_arr) > 0:
    print("High Score over all games: ", np.max(s_arr))
    print("Mean Score over all games: ", np.average(s_arr))
    print("Median Score over all games: ", np.median(s_arr))

Number of completed games:  0
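
The same summary can be sketched with only the standard library, which also makes the empty-scores case explicit. `summarize_scores` is a hypothetical helper name, and the sample scores are made up for illustration.

```python
import statistics


def summarize_scores(scores):
    """Return game count, high score, mean and median for a list of scores."""
    if not scores:
        # No game finished, so there is nothing to summarize
        return {"games": 0}
    return {
        "games": len(scores),
        "high": max(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }


summary = summarize_scores([1, 2, 2, 7])
print(summary)
```

In this made-up sample the mean is 3 while the median is 2, which is why it can be useful to report both: a single high-scoring game pulls the mean up but leaves the median unchanged.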


### Save

In [10]:
if len(model_filename) > 0:
    model.save(model_filename)