# Step 1. Loading Environment

Before running this notebook, please make sure you have completed every installation step in the [preparation document](https://github.com/ZeratuuLL/Reinforcement-Learning/blob/master/Navigation/Preparation.md).

Running the following blocks initializes the environment. A window should appear that shows what the agent sees.

In [None]:
from unityagents import UnityEnvironment
import numpy as np

In [None]:
env = UnityEnvironment(file_name="/Users/lifengwei/gym/deep-reinforcement-learning/p1_navigation/Banana.app")

In [None]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

In [None]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

# Step 2. Watching a random agent

In [None]:
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
step = 0
while True:
    step += 1
    action = np.random.randint(action_size)        # select an action
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}, Total step is {}".format(score,step))
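The interaction loop above can also be wrapped in a function so that the same code serves both the random agent and a trained one. The following is only a minimal sketch: `run_episode` and `StubEnv` are hypothetical names that do not appear in this notebook, and `StubEnv` merely imitates the few fields the loop reads (`vector_observations`, `rewards`, `local_done`) so that the example runs without Unity.

```python
import random
from types import SimpleNamespace


class StubEnv:
    """Hypothetical stand-in that imitates the small part of the environment API used above."""

    def __init__(self, brain_name="StubBrain", episode_length=5):
        self.brain_name = brain_name
        self.episode_length = episode_length
        self.t = 0

    def _info(self, reward):
        # Mirror the three fields that the loop reads from env_info.
        info = SimpleNamespace(
            vector_observations=[[float(self.t)]],
            rewards=[reward],
            local_done=[self.t >= self.episode_length],
        )
        return {self.brain_name: info}

    def reset(self, train_mode=True):
        self.t = 0
        return self._info(0.0)

    def step(self, action):
        self.t += 1
        return self._info(1.0)  # constant reward, for illustration only


def run_episode(env, brain_name, policy, train_mode=False):
    """Play one episode with policy(state) -> action and return (score, steps)."""
    env_info = env.reset(train_mode=train_mode)[brain_name]
    state = env_info.vector_observations[0]
    score, steps = 0.0, 0
    while True:
        steps += 1
        env_info = env.step(policy(state))[brain_name]  # send the action
        state = env_info.vector_observations[0]         # roll over the state
        score += env_info.rewards[0]                    # update the score
        if env_info.local_done[0]:                      # episode finished
            break
    return score, steps


stub = StubEnv()
score, steps = run_episode(stub, stub.brain_name, lambda state: random.randrange(4))
print("Score: {}, Total step is {}".format(score, steps))
```

With the real environment, the call would presumably look like `run_episode(env, brain_name, lambda s: np.random.randint(action_size))`, though that has not been tested here.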

When finished, you can close the environment. The call is commented out here because the same environment is reused in Step 4.

In [None]:
#env.close()

# Step 3. Set up a trainable agent

The code for the agent and the neural network lives in two separate files, `Net.py` and `RL_Agent.py`. Make sure both files are in the same directory as this .ipynb file, then run each of the next two blocks **twice**.

The first run loads the file's code into the cell; the second run actually executes it.

I tried importing the files directly, but that always raised an error.

In [None]:
%load Net.py

In [None]:
%load RL_Agent.py

In [None]:
agent = Agent(state_size, action_size, 0)
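If a plain `import` keeps failing, one possible alternative to `%load` is to load a file by its path with the standard-library `importlib`. This is only a sketch and has not been tried on `Net.py` or `RL_Agent.py`: `load_module_from_path` is a hypothetical helper, and the demonstration uses a throwaway file. If `RL_Agent.py` relies on names that `Net.py` defines without importing them, this approach would likely fail in the same way a direct import does.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, file_path):
    """Load a .py file as a module object, given its path (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Demonstration on a throwaway file; in the notebook the path would point to
# Net.py or RL_Agent.py instead (an untested assumption).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_module.py")
    with open(demo_path, "w") as handle:
        handle.write("ANSWER = 42\n")
    demo = load_module_from_path("demo_module", demo_path)

print(demo.ANSWER)
```

Names defined in the file are then reached through the module object, for example `demo.ANSWER` here.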

# Step 4. Load the trained agent and watch!

Download the trained weights and save them in the same directory as this .ipynb file. Then you can start watching!

In [None]:
agent.qnetwork_local.load_state_dict(torch.load('checkpoint.pth'))

In [None]:
env_info = env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    state = torch.from_numpy(state).float().unsqueeze(0).to(device)        
    action_values = agent.qnetwork_local(state)
    action = int(np.argmax(action_values.cpu().data.numpy()))  # select the greedy action as a plain int
    env_info = env.step(action)[brain_name]        # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}".format(score))
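The loop above always picks the action with the largest predicted value, that is, it acts greedily with respect to the network's output. A minimal NumPy-only sketch of that selection step follows; `greedy_action` is a hypothetical helper and the Q-values are made up for illustration.

```python
import numpy as np


def greedy_action(action_values):
    """Return the index of the largest action value as a plain Python int."""
    # np.argmax flattens its input by default, so a (1, n) batch works as well.
    return int(np.argmax(np.asarray(action_values)))


# Made-up values for four actions, shaped (1, 4) like a batched network output.
example_q = np.array([[0.1, 0.7, -0.2, 0.3]])
print(greedy_action(example_q))
```

Converting the result to `int` keeps the action a built-in Python integer rather than a NumPy scalar.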

When you are done watching, you can close the environment.

In [None]:
env.close()