# Deep Reinforcement Learning Nanodegree, Project 1

---

This notebook uses the Unity ML-Agents environment for the first project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

In [None]:
import sys
from collections import deque

import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from unityagents import UnityEnvironment

## Environment configuration

**_Before running the code cells below_**, set `ENVIRONMENT_PATH` to the location of the Unity environment (the `../environments/` folder is a convenient place for it).

In [None]:
ENVIRONMENT_PATH = "../environments/Banana.app"

In [None]:
SRC_PATH = "../src"
MODEL_CHECKPOINT_PATH = "../models/drlnd_p1_model.pth"

In [None]:
sys.path.append(SRC_PATH)

In [None]:
from agents import DQNAgent
from environments import UnityEnvWrapper

## Create and explore the Unity environment

In [None]:
unity_env = UnityEnvironment(file_name=ENVIRONMENT_PATH)

Environments contain **_brains_**, which are responsible for deciding the actions of their associated agents. Here we take the first available brain and set it as the default brain we will control from Python.

In [None]:
brain_name = unity_env.brain_names[0]
brain = unity_env.brains[brain_name]

In [None]:
# reset the environment
env_info = unity_env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

## Training an agent

We wrap the Unity environment in a Gym-style interface (`reset`, `step`, `close`), so the agent's implementation does not need to change.

In [None]:
env = UnityEnvWrapper(unity_env)
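
To illustrate what such a wrapper might involve, here is a minimal sketch. It is not the `UnityEnvWrapper` from `../src`, whose code is not shown in this notebook and may differ. The names `UnityToGymSketch` and `FakeUnityEnv` are hypothetical; the fake environment only mimics the `brain_names`, `vector_observations`, `rewards`, and `local_done` fields so the example runs without Unity installed.

```python
from types import SimpleNamespace

import numpy as np


class UnityToGymSketch:
    """Hypothetical adapter exposing reset/step/close for a single agent."""

    def __init__(self, unity_env, agent_index=0):
        self.unity_env = unity_env
        self.brain_name = unity_env.brain_names[0]  # first available brain
        self.agent_index = agent_index

    def _unpack(self, env_info):
        # Pull this agent's observation, reward and done flag out of the brain info.
        i = self.agent_index
        state = np.asarray(env_info.vector_observations[i])
        return state, env_info.rewards[i], env_info.local_done[i]

    def reset(self, train_mode=True):
        env_info = self.unity_env.reset(train_mode=train_mode)[self.brain_name]
        return self._unpack(env_info)[0]  # only the initial state

    def step(self, action):
        env_info = self.unity_env.step(action)[self.brain_name]
        return self._unpack(env_info)  # (next_state, reward, done)

    def close(self):
        self.unity_env.close()


class FakeUnityEnv:
    """Stand-in for a Unity environment (assumption, for illustration only)."""

    brain_names = ["FakeBrain"]

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0

    def _info(self, reward):
        info = SimpleNamespace(
            vector_observations=[[float(self.t), 0.0]],
            rewards=[reward],
            local_done=[self.t >= self.episode_length],
        )
        return {"FakeBrain": info}

    def reset(self, train_mode=True):
        self.t = 0
        return self._info(0.0)

    def step(self, action):
        self.t += 1
        return self._info(1.0)

    def close(self):
        pass


# Same loop shape as the "watch a trained agent" cell, with a fixed action.
sketch_env = UnityToGymSketch(FakeUnityEnv(episode_length=3))
sketch_state = sketch_env.reset(train_mode=False)
sketch_score = 0.0
while True:
    sketch_state, sketch_reward, sketch_done = sketch_env.step(0)
    sketch_score += sketch_reward
    if sketch_done:
        break
sketch_env.close()
```

Because the adapter returns a plain `(next_state, reward, done)` tuple, any agent written against that interface can be reused without knowing about brains or `env_info` objects.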

Instantiate a `DQNAgent` and train it on the wrapped environment:

In [None]:
agent = DQNAgent(state_size=state_size, action_size=action_size, seed=0)

In [None]:
scores = agent.learn(environment=env, model_checkpoint_path=MODEL_CHECKPOINT_PATH)

### Scores

In [None]:
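fig, ax = plt.subplots()
x = np.arange(len(scores))
# raw per-episode scores
ax.plot(x, scores, label='Score')
# 100-episode rolling mean (NaN for the first 99 episodes)
ax.plot(x, pd.Series(scores).rolling(100).mean(), label='100-episode mean')
ax.set_ylabel('Score')
ax.set_xlabel('Episode #')
ax.legend()
plt.show()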
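
The rolling mean plotted above can also be used to find the first episode at which the average score reaches a target. The sketch below is an illustration only: `first_solved_episode` is a hypothetical helper, and the default `target=13.0` is an assumption (a commonly cited threshold for this environment), not a value stated in this notebook.

```python
import numpy as np
import pandas as pd


def first_solved_episode(scores, target=13.0, window=100):
    """Return the zero-based index of the first episode whose trailing
    `window`-episode mean is at least `target`, or None if never reached."""
    rolling = pd.Series(scores, dtype=float).rolling(window).mean()
    # The first window-1 entries are NaN and compare as False.
    hits = np.flatnonzero(rolling.to_numpy() >= target)
    return int(hits[0]) if hits.size else None


# Small synthetic example with a short window, not real training scores.
example_index = first_solved_episode([1, 2, 3, 10, 10, 10], target=7.0, window=3)
```

With the notebook's own data, the call would be `first_solved_episode(scores)`.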

## Load model and watch a trained agent

In [None]:
agent = DQNAgent(state_size=state_size, action_size=action_size).load_model(MODEL_CHECKPOINT_PATH)

In [None]:
state = env.reset(train_mode=False)
score = 0
while True:
    action = agent.act(state)
    next_state, reward, done = env.step(action)
    score += reward
    state = next_state
    if done:
        break
    
print("Score: {}".format(score))

In [None]:
env.close()