# Unity ML-Agents Toolkit for Google Colab
## Environment Basics

This notebook walks through the basic functions of the Python API for the Unity ML-Agents Toolkit in a Google Colab environment. For instructions on building a Unity environment, see [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Create-New.md).

## 1. Install required system-level packages and download the repositories to the workspace


In [2]:
!git clone --branch release_1 https://github.com/Unity-Technologies/ml-agents.git 
!git clone https://github.com/ugurkanates/MLAgents-Google-Collab.git
!apt install python-opengl
!apt install ffmpeg
!apt install xvfb
!apt install x11-utils


Cloning into 'ml-agents'...
remote: Enumerating objects: 86, done.

## 2. Installation of Python packages 


In [3]:
!pip install pyvirtualdisplay
!pip install -e ml-agents/ml-agents-envs/
!pip install -e ml-agents/ml-agents/
!pip install -e ml-agents/gym-unity/


Collecting pyvirtualdisplay
  Downloading https://files.pythonhosted.org/packages/69/ec/8221a07850d69fa3c57c02e526edd23d18c7c05d58ed103e3b19172757c1/PyVirtualDisplay-0.2.5-py2.py3-none-any.whl
Collecting EasyProcess
  Downloading https://files.pythonhosted.org/packages/48/3c/75573613641c90c6d094059ac28adb748560d99bd27ee6f80cce398f404e/EasyProcess-0.3-py2.py3-none-any.whl
Installing collected packages: EasyProcess, pyvirtualdisplay
Successfully installed EasyProcess-0.3 pyvirtualdisplay-0.2.5
Obtaining file:///content/ml-agents/ml-agents-envs
Installing collected packages: mlagents-envs
  Running setup.py develop for mlagents-envs
Successfully installed mlagents-envs
Obtaining file:///content/ml-agents/ml-agents
Installing collected packages: mlagents
  Running setup.py develop for mlagents
Successfully installed mlagents
Obtaining file:///content/ml-agents/gym-unity
Installing collected packages: gym-unity
  Running setup.py develop for gym-unity
Successfully installed gym-unity


*The kernel must be restarted after installation so that the runtime recognizes the newly installed packages.*

*Alternatively, comment out the cell below and restart the runtime manually.*

In [0]:
import os
# Terminate the kernel process so that Colab restarts the runtime
os._exit(0)
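
Once the runtime has restarted, it can be useful to confirm that the freshly installed packages are visible before moving on. The following is a minimal sketch using only the standard library; `missing_modules` and `REQUIRED_MODULES` are hypothetical names introduced for illustration, and the module list is taken from the import cell of this notebook.

```python
import importlib.util

# Top-level module names this notebook imports later on.
# (Assumption: adjust the list if your setup differs.)
REQUIRED_MODULES = ["pyvirtualdisplay", "mlagents_envs"]


def missing_modules(names):
    """Return the module names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not importable yet (has the runtime been restarted?):", missing)
else:
    print("All required modules can be located.")
```

`find_spec` only locates a module without importing it, so the check is cheap and has no side effects for top-level names.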

## 3. Import Python packages


In [0]:
from pyvirtualdisplay import Display
import sys
import os
import random
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import clear_output 
import mlagents_envs
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfig, EngineConfigurationChannel

*Name of the Unity environment binary to launch*

*Use a headless (server) build when building the Unity executable*

In [0]:
env_name = "MLAgents-Google-Collab/headless_collab/basic.x86_64"  


*Give the launch file permission to run as an executable*


In [4]:
!chmod -R 755 {env_name}
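
The permission change can also be made from Python instead of through a shell command. The sketch below is one possible approach, not the notebook's original method; `make_executable` is a hypothetical helper. Unlike `chmod 755`, it only adds the execute bits and leaves the remaining permission bits untouched.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others, keeping other bits."""
    current_mode = os.stat(path).st_mode
    os.chmod(path, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Example usage in the notebook (env_name is defined in an earlier cell):
# make_executable(env_name)
```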


*Check Python Version*

In [0]:
if (sys.version_info[0] < 3):
    raise Exception("ERROR: ML-Agents Toolkit (v0.3 onwards) requires Python 3")

*Create a virtual display so the environment can run on a headless server*

In [0]:
dis = Display(visible=0, size=(400, 400))
dis.start()


## 4. Set the port number for Unity - Python socket communication


In [0]:
PORT_NUMBER = 3055
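
If the chosen port is already taken, for example by an environment left running from an earlier cell, launching the environment is likely to fail. The sketch below checks whether a local TCP port can be bound, on the assumption that the environment binds a port derived from `base_port` on the local machine. `is_port_free` and `first_free_port` are hypothetical helpers, not part of the ML-Agents API.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start, attempts=20):
    """Return the first free port in [start, start + attempts), or None."""
    for candidate in range(start, start + attempts):
        if is_port_free(candidate):
            return candidate
    return None


# Example usage: PORT_NUMBER = first_free_port(3055)
```

The check is only a snapshot: another process could still claim the port between the check and the launch.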


*Set the time scale of the engine*

In [0]:
engine_configuration_channel = EngineConfigurationChannel()
engine_configuration_channel.set_configuration_parameters(time_scale = 3.0)

*Initialize the launcher*

In [0]:
env = UnityEnvironment(file_name=env_name, side_channels=[engine_configuration_channel], base_port=PORT_NUMBER)

*Reset the environment*

In [0]:
env.reset()

*Set the default behaviour spec to work with*

In [0]:
behavior_name = env.get_behavior_names()[0]
behavior_spec = env.get_behavior_spec(behavior_name)
observation_space = behavior_spec.observation_shapes[0]
action_space = behavior_spec.action_shape

*Get the state of the agents*

In [0]:
step_result = env.get_steps(behavior_name)
num_agents = len(step_result[0].obs[0])

*Examine the number of observations per Agent*

In [13]:
print("Number of observations : ", observation_space[0], "Number of Actions : ",action_space)


Number of observations :  20 Number of Actions :  (3,)


*Check whether the agent provides any visual observations*

In [14]:
vis_obs = any([len(shape) == 3 for shape in behavior_spec.observation_shapes])
print("Is there a visual observation ?", vis_obs)

Is there a visual observation ? False
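
The check above treats any observation whose shape has three dimensions as visual. The same rule can be used to separate observation indices into visual and vector groups. This is a minimal sketch on plain tuples: `split_observation_shapes` is a hypothetical helper and the example shapes are made up for illustration.

```python
def split_observation_shapes(observation_shapes):
    """Split observation indices into visual (rank 3) and other ones."""
    visual = [i for i, shape in enumerate(observation_shapes) if len(shape) == 3]
    vector = [i for i, shape in enumerate(observation_shapes) if len(shape) != 3]
    return visual, vector


# Hypothetical shapes: one 84x84 RGB camera plus a 20-element vector.
example_shapes = [(84, 84, 3), (20,)]
visual_idx, vector_idx = split_observation_shapes(example_shapes)
print("visual:", visual_idx, "vector:", vector_idx)
```

In the notebook, the helper could be called with `behavior_spec.observation_shapes`.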


*Examine the visual observations*

In [15]:
if vis_obs:
    vis_obs_index = next(i for i,v in enumerate(behavior_spec.observation_shapes) if len(v) == 3)
    print("Agent visual observation looks like:")
    obs = step_result[0].obs[vis_obs_index]
    plt.imshow(obs[0,:,:,:])
else:
    print("First Agent observation looks like: \n{}".format(step_result[0].obs[0]))


First Agent observation looks like: 
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]


## 5. Utility Functions

*Returns True if any agent has reached a terminal state*


In [0]:
def check_done(
    step_result : tuple
):
    # step_result[1] holds the agents that terminated during the last step
    return len(step_result[1]) != 0

*Simple function to plot graphs using matplotlib*

In [0]:
def plot(
    frame_idx: int, 
    scores: list
):
    clear_output(True)
    plt.figure(figsize=(20, 5))
    plt.subplot(131)
    plt.title('frame %s. score: %s' % (frame_idx, np.mean(scores[-10:])))
    plt.plot(scores)
    plt.show()
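
The plot title reports the mean of the last ten scores. That running statistic can be isolated in a small helper, sketched here with the standard library only. `recent_mean` is a hypothetical name, and unlike `np.mean` it raises an error on an empty list instead of returning `nan`.

```python
from statistics import mean


def recent_mean(scores, window=10):
    """Mean of the last `window` scores (all of them if fewer are available)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return mean(scores[-window:])
```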

## 6. Example of an agent taking random actions in the environment

*Set how many random episodes to play*


In [0]:
RANDOM_EPISODE_NUMBER = 10

*Set plotting to True to display graphs in the notebook*

In [0]:
PLOT_GRAPHICS = True

In [22]:
total_frames = 0 
total_rewards = list()
for episode in range(RANDOM_EPISODE_NUMBER):
    env.reset()
    step_result = env.get_steps(behavior_name)
    done = False
    episode_rewards = 0
    while not done:
        total_frames += 1
        if behavior_spec.is_action_continuous():
            action = np.random.randn(num_agents, action_space)       
        if behavior_spec.is_action_discrete():
            branch_size = behavior_spec.discrete_action_branches
            action = np.column_stack([np.random.randint(0,branch_size[i],size=num_agents) for i in range(len(branch_size))])

        env.set_actions(behavior_name,action)
        env.step()
        step_result = env.get_steps(behavior_name)
        done = check_done(step_result)
        if not done:
            next_state = step_result[0].obs[0] 
            episode_rewards += step_result[0].reward[0]
        else:
            next_state = step_result[1].obs[0]
            episode_rewards += step_result[1].reward[0]
    
    print("Total reward this episode: {}".format(episode_rewards))
    total_rewards.append(episode_rewards)

Total reward this episode: -0.5599999818950891
Total reward this episode: 0.030000004917383194
Total reward this episode: 0.9100000113248825
Total reward this episode: -0.4299999848008156
Total reward this episode: -0.8299999497830868
Total reward this episode: -0.49999998323619366
Total reward this episode: -0.8899999745190144
Total reward this episode: 0.6300000175833702
Total reward this episode: -0.1399999912828207
Total reward this episode: -0.009999994188547134
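
The discrete branch of the loop above draws one integer per action branch for every agent. The sketch below reproduces that sampling scheme in pure Python to make the resulting layout explicit: one row per agent, one column per branch. `sample_discrete_actions` is a hypothetical helper, and the branch sizes in the usage line are illustrative only.

```python
import random


def sample_discrete_actions(num_agents, branch_sizes, rng=random):
    """Return one row per agent with one integer per action branch.

    Each entry is drawn uniformly from range(branch_size), which matches
    the exclusive upper bound of np.random.randint.
    """
    return [
        [rng.randrange(size) for size in branch_sizes]
        for _ in range(num_agents)
    ]


# Illustrative call: 2 agents, two branches with 3 and 2 choices.
print(sample_discrete_actions(2, (3, 2)))
```

To use the result with the loop above, which passes a NumPy array to `env.set_actions`, wrap it with `np.array(...)` first.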
