# A short tutorial for SMAC

Author: Junyoung Park (Junyoungpark@kaist.ac.kr)  
The original implementation is taken from [this link](https://github.com/oxwhirl/smac/blob/master/smac/examples/random_agents.py).

* This code will not run properly in a Jupyter notebook. A runnable script is provided in `random_agnet.py`.

In [1]:
from smac.env import StarCraft2Env # Import SMAC env
import numpy as np

## Define a game runner

`StarCraft2Env` is an RL-friendly Python interface to StarCraft II (SC2), developed by the SMAC team. Among the various input parameters of `StarCraft2Env`, we will tweak the following:
* __window_size_x__, __window_size_y__ (floats): These two parameters specify the window size of the SC2 launcher. We use 1920/3 and 1080/3 (one third of Full HD resolution), respectively.
* __difficulty__ (integer cast to a string): This parameter specifies the difficulty of the SC2 environment. You can tweak this parameter while you train your model. At test time, you will fix this value as **
* __map_name__ (string): This parameter specifies the scenario of the environment. We will use **

In [2]:
env = StarCraft2Env(map_name="8m",
                    window_size_x=1920/3,
                    window_size_y=1080/3)

## Retrieving meta information about the SC2 env

`StarCraft2Env` provides a function that is useful for specifying the input shapes and other parameters of your model. You can access the meta information of `StarCraft2Env` by calling `env.get_env_info()`.

In [3]:
env_info = env.get_env_info()
n_actions = env_info["n_actions"]
n_agents = env_info["n_agents"]

print("The action of each agent: {}".format(n_actions))
print("How many agent will you control (at beginning): {} ".format(n_agents))

The action of each agent: 14
How many agent will you control (at beginning): 8 
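
To show how this meta information can size a model, here is a minimal sketch. It assumes the returned dict also carries the keys `obs_shape`, `state_shape`, and `episode_limit` (check this against your installed SMAC version). `example_env_info` is a hand-written stand-in so the snippet runs without SC2: only `n_actions` and `n_agents` come from the output above, and the other values are placeholders. `model_dims` is a hypothetical helper, not part of SMAC.

```python
# Hand-written stand-in for the dict returned by env.get_env_info().
# n_actions and n_agents match the 8m output above; the rest are placeholders.
example_env_info = {
    "n_actions": 14,
    "n_agents": 8,
    "obs_shape": 80,       # placeholder: length of one agent's observation
    "state_shape": 168,    # placeholder: length of the global state
    "episode_limit": 120,  # placeholder: maximum steps per episode
}


def model_dims(env_info):
    """Hypothetical helper: derive layer sizes from the env meta information."""
    return {
        # each agent's network reads one observation vector...
        "agent_input_dim": env_info["obs_shape"],
        # ...and outputs one score per action
        "agent_output_dim": env_info["n_actions"],
        # a centralized component could read the global state instead
        "central_input_dim": env_info["state_shape"],
        # size of all agents' one-hot actions concatenated together
        "joint_action_dim": env_info["n_agents"] * env_info["n_actions"],
    }


dims = model_dims(example_env_info)
print(dims["agent_output_dim"], dims["joint_action_dim"])  # 14 112
```

With a real environment you would pass `env.get_env_info()` in place of `example_env_info`.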


In [4]:
n_episodes = 50 # Number of episodes

## Interfacing neural networks (or any other tensor-output learning method) with SC2

`avail_actions` gives you the agent-specific action mask: a binary list with one entry per action in the action space. Its length varies with the scenario. In general (unless you have modified `StarCraft2Env`), the dimension of the action space is
$$\text{Dimension of action space} = 1+1+4+ \text{number of enemy agents}$$

* where the first 1 is for __no_operation__, which does nothing
* where the second 1 is for __stop__ (in SC2, stop ceases whatever the agent is doing)
* where the 4 is for __move__ (depending on the action, the agent moves North, South, East, or West). Note that the distance moved is determined by `StarCraft2Env`
* where $\text{number of enemy agents}$ is for attacking: one action per enemy, denoting which enemy the agent attacks.
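
The formula above can be sketched in a few lines. This is only an illustration: `action_space_size` and `describe_action` are hypothetical helpers, and the snippet assumes that action indices follow the order of the list above (no-op, stop, the four moves, then one attack action per enemy, with enemies counted from 0).

```python
# Fixed part of the action space: no-op, stop, and the four move directions.
FIXED_ACTIONS = ["no-op", "stop", "move North", "move South", "move East", "move West"]


def action_space_size(n_enemies):
    """Dimension of the action space: 1 + 1 + 4 + number of enemy agents."""
    return len(FIXED_ACTIONS) + n_enemies


def describe_action(action_id, n_enemies):
    """Map an action index to a readable name (index order is an assumption)."""
    if not 0 <= action_id < action_space_size(n_enemies):
        raise ValueError("action_id {} is out of range".format(action_id))
    if action_id < len(FIXED_ACTIONS):
        return FIXED_ACTIONS[action_id]
    # Remaining indices select which enemy to attack.
    return "attack enemy {}".format(action_id - len(FIXED_ACTIONS))


print(action_space_size(8))   # 14
print(describe_action(1, 8))  # stop
print(describe_action(6, 8))  # attack enemy 0
```

For the `8m` map used in this tutorial there are 8 enemies, which gives the 14 actions reported by `env.get_env_info()`.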

For instance, if the `avail_actions` of an agent is [0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
then we can infer that this agent __can__ do 'stop', 'move North', 'move South', 'move East', and 'move West', and __cannot__ do 'no-op' or attack any enemy.
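
When a learned model produces one score per action (for example Q-values or logits), a common way to respect this mask is to ignore the scores of unavailable actions before taking the best one. The sketch below is one such approach and is not taken from the original script; `masked_greedy_action` is a hypothetical helper and the scores are made-up numbers for illustration.

```python
def masked_greedy_action(scores, avail_actions):
    """Return the index of the highest-scoring action whose mask entry is 1."""
    if len(scores) != len(avail_actions):
        raise ValueError("scores and avail_actions must have the same length")
    best_id = None
    for action_id, (score, avail) in enumerate(zip(scores, avail_actions)):
        if not avail:
            continue  # never pick an action the environment forbids
        if best_id is None or score > scores[best_id]:
            best_id = action_id
    if best_id is None:
        raise ValueError("no available action in the mask")
    return best_id


# The mask from the example above: only stop and the four moves are available.
avail_actions = [0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Made-up model outputs; the two largest scores belong to unavailable actions.
scores = [9.0, 0.1, 0.3, 0.7, 0.2, 0.0, 5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0]

print(masked_greedy_action(scores, avail_actions))  # 3
```

The same idea carries over to array or tensor outputs by setting the unavailable entries to a very negative value before taking the argmax.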

In [5]:
for e in range(n_episodes):
    env.reset() # Reset the environment for a clean start.
    terminated = False
    episode_reward = 0

    while not terminated:
        obs = env.get_obs() # a list containing each agent's observation
        state = env.get_state() # the global state, reserved for centralized use (e.g. centralized training)

        actions = []
        for agent_id in range(n_agents):
            
            avail_actions = env.get_avail_agent_actions(agent_id) # per-agent action mask (1 = available)
            avail_actions_ind = np.nonzero(avail_actions)[0] # indices of the available actions
            action = np.random.choice(avail_actions_ind)
            actions.append(action)

        reward, terminated, _ = env.step(actions)
        episode_reward += reward
    print("Total reward in episode {} = {}".format(e, episode_reward))

env.close() # Closes all SC2 connections (and the SC2 launcher). Make sure to call this after the entire run has finished!

Total reward in episode 0 = 1.875
Total reward in episode 1 = 1.3125
Total reward in episode 2 = 1.3125
Total reward in episode 3 = 1.6875
Total reward in episode 4 = 2.4375
Total reward in episode 5 = 1.6875
Total reward in episode 6 = 2.625
Total reward in episode 7 = 1.3125
Total reward in episode 8 = 1.125
Total reward in episode 9 = 2.0625
Total reward in episode 10 = 1.875
Total reward in episode 11 = 1.875
Total reward in episode 12 = 1.6875
Total reward in episode 13 = 1.875
Total reward in episode 14 = 1.3125
Total reward in episode 15 = 2.4375
Total reward in episode 16 = 1.125
Total reward in episode 17 = 1.5
Total reward in episode 18 = 2.625
Total reward in episode 19 = 1.6875
Total reward in episode 20 = 2.25
Total reward in episode 21 = 1.3125
Total reward in episode 22 = 2.4375
Total reward in episode 23 = 1.6875
Total reward in episode 24 = 1.125
Total reward in episode 25 = 2.0625
Total reward in episode 26 = 2.0625
Total reward in episode 27 = 2.0625
Total reward in 