# Navigation 
## Installing Unity
You can follow the [Installation Instructions](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md)
from the ML Agents repo to get started.

##### This Banana environment works only with version 0.4 of the ML Agents Python API, so install that version or downgrade using ```pip install mlagents==0.4```

*Note*: This works only with Python 3.6 for now, so create a dedicated environment with ```conda create -n <env_name> python=3.6```
and set up TensorFlow in it. You also need to edit the *setup.py* files from the ML Agents repo to fix compatibility
issues. If you have not done so, see [my edited repo](https://github.com/Syzygianinfern0/ml-agents) for the changes.

Let's get started!!

# Imports

In [8]:
from unityagents import UnityEnvironment
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from time import sleep
from IPython.display import clear_output
%matplotlib inline

In [3]:
unity_env = UnityEnvironment('Banana_Windows_x86_64/Banana.exe')

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: BananaBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 37
        Number of stacked Vector Observation: 1
        Vector Action space type: discrete
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


# Understanding Unity
Environments contain brains, which are responsible for deciding the actions of their associated agents. Here we take
the first brain available and set it as the default brain we will control from Python.

In [4]:
brain_name = unity_env.brain_names[0]
brain = unity_env.brains[brain_name]

In [5]:
# reset the environment
env_info = unity_env.reset(train_mode=True)[brain_name]

# number of agents in the environment
print('Number of agents:', len(env_info.agents))

# number of actions
action_size = brain.vector_action_space_size
print('Number of actions:', action_size)

# examine the state space 
state = env_info.vector_observations[0]
print('States look like:', state)
state_size = len(state)
print('States have length:', state_size)

Number of agents: 1
Number of actions: 4
States look like: [1.         0.         0.         0.         0.84408134 0.
 0.         1.         0.         0.0748472  0.         1.
 0.         0.         0.25755    1.         0.         0.
 0.         0.74177343 0.         1.         0.         0.
 0.25854847 0.         0.         1.         0.         0.09355672
 0.         1.         0.         0.         0.31969345 0.
 0.        ]
States have length: 37


# Random Agent
Now that we know how the Unity API maps onto the concepts we used in OpenAI Gym (reset, step, observation, reward, done),
all we need to do is port our OpenAI Gym code to Unity, and it should work with only minor changes.

In [6]:
env_info = unity_env.reset(train_mode=False)[brain_name] # reset the environment
state = env_info.vector_observations[0]            # get the current state
score = 0                                          # initialize the score
while True:
    action = np.random.randint(action_size)        # select an action
    env_info = unity_env.step(action)[brain_name]  # send the action to the environment
    next_state = env_info.vector_observations[0]   # get the next state
    reward = env_info.rewards[0]                   # get the reward
    done = env_info.local_done[0]                  # see if episode has finished
    score += reward                                # update the score
    state = next_state                             # roll over the state to next time step
    if done:                                       # exit loop if episode finished
        break
    
print("Score: {}".format(score))

Score: 0.0
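
To make the port from OpenAI Gym more concrete, here is a minimal sketch of an adapter that exposes the same `reset()` / `step()` interface that Gym code expects. The class name `GymStyleWrapper` and the helper `run_random_episode` are hypothetical names introduced only for this example; they are not part of `unityagents`. The sketch assumes a single agent and a single brain, as in the cells above, and uses only the attributes already shown there (`vector_observations`, `rewards`, `local_done`).

```python
import numpy as np


class GymStyleWrapper:
    """Hypothetical adapter: gives a unityagents-style environment a
    Gym-like reset()/step() interface (single agent, single brain)."""

    def __init__(self, unity_env, brain_name, train_mode=True):
        self.env = unity_env
        self.brain_name = brain_name
        self.train_mode = train_mode

    def reset(self):
        # Unity returns a dict keyed by brain name; take index 0 for the only agent
        info = self.env.reset(train_mode=self.train_mode)[self.brain_name]
        return np.asarray(info.vector_observations[0])

    def step(self, action):
        info = self.env.step(action)[self.brain_name]
        next_state = np.asarray(info.vector_observations[0])
        reward = info.rewards[0]
        done = info.local_done[0]
        # Empty dict stands in for Gym's `info` return value
        return next_state, reward, done, {}

    def close(self):
        self.env.close()


def run_random_episode(env, action_size):
    """Play one episode with uniformly random actions and return the score."""
    state = env.reset()
    score, done = 0.0, False
    while not done:
        action = np.random.randint(action_size)
        state, reward, done, _ = env.step(action)
        score += reward
    return score
```

With the objects from this notebook, usage would look like `env = GymStyleWrapper(unity_env, brain_name, train_mode=False)` followed by `run_random_episode(env, action_size)`.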


In [7]:
unity_env.close()

# Let's Begin!
I plan to use a Double DQN for this environment. In a later attempt, I may combine a Dueling DQN with a policy-based
approach. I have put the model in the `assets` package, since we will not be rebuilding the basics again here.

In [9]:
from assets.models.dqn import DQN
agent = DQN(action_space=action_size,
            observation_space=state_size)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 24)                912       
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 100       
Total params: 1,012
Trainable params: 1,012
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 24)                912       
_________________________________________________________________
dense_3 (Dense)              (None, 4)                 100       
Total params: 1,012
Trainable params: 1,012
Non-trainable params: 0
_________________________________________________________________
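
The two identical layer summaries above are consistent with a Double DQN setup that keeps an online network and a target network. The `DQN` class itself is not shown in this notebook, so the following is only a NumPy sketch of how Double DQN targets are commonly computed, not the actual code in `assets.models.dqn`. The function name `double_dqn_targets` and the default `gamma=0.99` are assumptions made for illustration.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Sketch of Double DQN targets for a batch of transitions.

    q_online_next / q_target_next: arrays of shape (batch, n_actions) holding
    the Q-values each network predicts for the next states.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    q_online_next = np.asarray(q_online_next, dtype=np.float64)
    q_target_next = np.asarray(q_target_next, dtype=np.float64)

    # The online network selects the greedy action for each next state...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...and the target network evaluates that chosen action.
    batch_index = np.arange(len(best_actions))
    next_values = q_target_next[batch_index, best_actions]

    # Terminal transitions (done == 1) get no bootstrapped value.
    return rewards + gamma * (1.0 - dones) * next_values
```

Separating action selection (online network) from action evaluation (target network) is what distinguishes this from a plain DQN target, which takes the maximum over the target network's own Q-values.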


# Training
How will our first shot at training in Unity work? I hope it does well.
