## Create the Environment Binary File
Download the Unity project from: https://drive.google.com/file/d/1Zjan3xYI-agsXPgfM9AVHc3ySF9j70Rr/view?usp=sharing

* Open the project in Unity and go to 'Build Settings' in the 'File' menu
* Select the target platform and architecture
* Click on 'Build'
* Save the build as 'Roller_Ball_Build'
* Copy this notebook into the directory that contains the binary file

## Installation:
Python version: 3.8.10
ML Agents:

    pip install mlagents==0.16.1
    

ML Agents documentation: https://github.com/Unity-Technologies/ml-agents/blob/release_2/docs/Python-API.md


In [1]:
import mlagents
from mlagents_envs.environment import UnityEnvironment as UE
import numpy as np

### Loading the environment

In [2]:
# Use a different worker_id if the current one is already in use by another environment
env = UE(file_name='Roller_Ball_Build', seed=1, side_channels=[], worker_id=0)
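A busy worker usually means that its communication port is already taken. The sketch below looks for a free `worker_id` before the environment is launched. It assumes that a built executable listens on port 5005 plus `worker_id`; the constant `BASE_PORT` and the helpers `port_is_free` and `first_free_worker_id` are hypothetical names used only for this illustration and are not part of ML-Agents.

```python
import socket

# Assumption: built executables use base port 5005, offset by worker_id.
BASE_PORT = 5005


def port_is_free(port, host="localhost"):
    """Return True if nothing is currently bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_worker_id(max_workers=16):
    """Return the smallest worker_id whose assumed port is free."""
    for worker_id in range(max_workers):
        if port_is_free(BASE_PORT + worker_id):
            return worker_id
    raise RuntimeError("no free worker port found")
```

The result could then be passed as `worker_id=first_free_worker_id()` when creating the environment. The check is only a hint, since another process may claim the port between the check and the launch.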

In [3]:
env.reset()

### Get Behavior Names

In [4]:
behavior_names = env.get_behavior_names()
behavior_name = behavior_names[0]
print(behavior_name)

RollerBall?team=0


### Get Behavior Specifications

In [5]:
spec = env.get_behavior_spec(behavior_name = behavior_name)
print("Number of observations : ", len(spec.observation_shapes))

Number of observations :  1


### Checking if Action is Continuous or Discrete

In [6]:
if spec.is_action_continuous():
  print("The action is continuous")

if spec.is_action_discrete():
  print("The action is discrete")

The action is continuous


### Sample Observation

In [7]:
decision_steps, terminal_steps = env.get_steps(behavior_name = behavior_name)
print(decision_steps.obs)

[array([[3.9974775, 0.5      , 2.1941023, 0.       , 0.5      , 0.       ,
        0.       , 0.       ]], dtype=float32)]
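The eight values can be easier to read once they are split into named parts. The sketch below assumes the layout used in the standard RollerBall tutorial (target position, agent position, then the agent's x and z velocity); this notebook does not confirm that layout, so check it against the agent script in the Unity project. The helper `split_observation` is a hypothetical name.

```python
import numpy as np

# The observation printed above, copied as a literal.
obs = np.array(
    [[3.9974775, 0.5, 2.1941023, 0.0, 0.5, 0.0, 0.0, 0.0]], dtype=np.float32
)


def split_observation(obs_row):
    """Split one 8-value row into named parts (the layout is an assumption)."""
    return {
        "target_position": obs_row[0:3],
        "agent_position": obs_row[3:6],
        "agent_velocity_xz": obs_row[6:8],
    }


parts = split_observation(obs[0])
# Straight-line distance between the agent and the target.
distance = float(
    np.linalg.norm(parts["target_position"] - parts["agent_position"])
)
print(parts)
print("Distance to target:", distance)
```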


### Take Random Actions and get Rewards

In [8]:
for episode in range(30):
  env.reset()
  decision_steps, terminal_steps = env.get_steps(behavior_name)
  tracked_agent = -1 # -1 indicates not yet tracking
  done = False # For the tracked_agent
  episode_rewards = 0 # For the tracked_agent
  while not done:
    # Track the first agent we see if not tracking
    # Note : len(decision_steps) = [number of agents that requested a decision]
    if tracked_agent == -1 and len(decision_steps) >= 1:
      tracked_agent = decision_steps.agent_id[0]
    # Generate an action for all agents
    action = spec.create_empty_action(len(decision_steps))
    action = np.random.normal(0, 1, size = np.shape(action))
    # Set the actions
    env.set_actions(behavior_name, action)
    # Move the simulation forward
    env.step()
    # Get the new simulation results
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    if tracked_agent in decision_steps: # The agent requested a decision
      episode_rewards += decision_steps[tracked_agent].reward
    if tracked_agent in terminal_steps: # The agent terminated its episode
      episode_rewards += terminal_steps[tracked_agent].reward
      done = True
  print(f"Total rewards for episode {episode+1} is {episode_rewards}")

Total rewards for episode 1 is 1.0
Total rewards for episode 2 is 1.0
Total rewards for episode 3 is 0.0
Total rewards for episode 4 is 0.0
Total rewards for episode 5 is 0.0
Total rewards for episode 6 is 0.0
Total rewards for episode 7 is 1.0
Total rewards for episode 8 is 0.0
Total rewards for episode 9 is 1.0
Total rewards for episode 10 is 0.0
Total rewards for episode 11 is 0.0
Total rewards for episode 12 is 0.0
Total rewards for episode 13 is 1.0
Total rewards for episode 14 is 0.0
Total rewards for episode 15 is 0.0
Total rewards for episode 16 is 0.0
Total rewards for episode 17 is 0.0
Total rewards for episode 18 is 0.0
Total rewards for episode 19 is 0.0
Total rewards for episode 20 is 0.0
Total rewards for episode 21 is 1.0
Total rewards for episode 22 is 0.0
Total rewards for episode 23 is 1.0
Total rewards for episode 24 is 0.0
Total rewards for episode 25 is 0.0
Total rewards for episode 26 is 0.0
Total rewards for episode 27 is 0.0
Total rewards for episode 28 is 1.0
T
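The loop above draws actions from a standard normal distribution, which is unbounded. A possible variant is sketched below: it clips the samples to [-1, 1] and casts them to `float32`. Whether this environment expects or enforces that range is an assumption, and `random_continuous_actions` is a hypothetical helper; `action_size=2` in the example call is also assumed.

```python
import numpy as np


def random_continuous_actions(n_agents, action_size, rng=None):
    """Sample Gaussian actions, clip them to [-1, 1], and return float32."""
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.normal(0.0, 1.0, size=(n_agents, action_size))
    return np.clip(actions, -1.0, 1.0).astype(np.float32)


# Example call with one agent and an assumed action size of 2.
actions = random_continuous_actions(
    n_agents=1, action_size=2, rng=np.random.default_rng(1)
)
print(actions.shape, actions.dtype)
```

Inside the episode loop, the call would take `len(decision_steps)` and `spec.action_size` as arguments so that the array matches the shape `env.set_actions` expects.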

In [9]:
env.close()
print("Closed environment")

Closed environment
