## Solving a Maze with AllenAct



## 0. Setup

### 0.1
I used a DLVM on the Azure cloud.

https://azure.microsoft.com/ja-jp/services/virtual-machines/data-science-virtual-machines/

### 0.2 

Once the VM is up, install AllenAct with the following commands.

conda info -e

conda activate py37_pytorch

mkdir study

cd study

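The installation page https://allenact.org/installation/installation-allenact/
suggests the git clone below, but it produced an error for me (the SSH form of the URL needs an SSH key registered with GitHub):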

git clone git@github.com:allenai/allenact.git

so I cloned over HTTPS instead:

git clone https://github.com/allenai/allenact.git

Then install the dependencies:
    
cd allenact

pip install -r requirements.txt

### 0.3

Next, launch Jupyter Notebook.

jupyter notebook

Note: run this from inside the `allenact` folder, because the notebook reads folders such as `plugins` from there.
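
Because the imports used later resolve `plugins`, `core`, and `utils` relative to the working directory, it can help to check the location before starting. The following is a minimal sketch and not part of AllenAct; the helper name `missing_repo_dirs` and the list of required folders are assumptions based on the imports used in this notebook.

```python
import os
from typing import List, Sequence

# Top-level packages imported by this notebook (assumed to be folders
# directly under the cloned `allenact` repository).
REQUIRED_DIRS = ("plugins", "core", "utils")


def missing_repo_dirs(root: str, required: Sequence[str] = REQUIRED_DIRS) -> List[str]:
    """Return the required folders that do not exist under `root`."""
    return [name for name in required if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    missing = missing_repo_dirs(os.getcwd())
    if missing:
        print("Not in the allenact folder? Missing:", ", ".join(missing))
    else:
        print("Working directory looks like the allenact repository root.")
```

If anything is reported missing, `cd` into the cloned `allenact` folder before running `jupyter notebook`.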




## 1. Import the required packages

These imports read from the `plugins` folder, so place this file (NavigationMiniGrid_ogawa.ipynb) directly under the `allenact` folder.

In [1]:
from typing import Dict, Optional, List, Any

import gym
from gym_minigrid.envs import EmptyRandomEnv5x5
from torch import nn
from torch import optim
from torch.optim.lr_scheduler import LambdaLR

from plugins.minigrid_plugin.minigrid_models import MiniGridSimpleConvRNN
from plugins.minigrid_plugin.minigrid_sensors import EgocentricMiniGridSensor
from plugins.minigrid_plugin.minigrid_tasks import MiniGridTaskSampler, MiniGridTask
from core.algorithms.onpolicy_sync.losses.ppo import PPO, PPOConfig
from core.base_abstractions.experiment_config import ExperimentConfig, TaskSampler
from core.base_abstractions.sensor import SensorSuite
from utils.experiment_utils import TrainingPipeline, Builder, PipelineStage, LinearDecay


In [4]:
import torch 
print(torch.__version__)

1.4.0


## 2. Configure the experiment

In [2]:
class MiniGridTutorialExperimentConfig(ExperimentConfig):

    # [1]
    @classmethod
    def tag(cls) -> str:
        return "MiniGridTutorial_for_Japanese"
    '''Sets the experiment tag (optional).'''
    
    # [2]
    @staticmethod
    def make_env(*args, **kwargs):
        return EmptyRandomEnv5x5()
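    '''
    Uses EmptyRandomEnv5x5() from gym_minigrid (GitHub) as the experiment environment.
    Each episode of this task lasts at most 100 steps; it counts as a failure if the agent has not reached the goal within 100 steps.
    
    https://github.com/maximecb/gym-minigrid/search?q=EmptyRandomEnv5x5%28%29&unscoped_q=EmptyRandomEnv5x5%28%29
    '''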
    
    # [3]
    @classmethod
    def make_sampler_fn(cls, **kwargs) -> TaskSampler:
        return MiniGridTaskSampler(**kwargs)
    '''Uses MiniGridTaskSampler from plugins.minigrid_plugin.minigrid_tasks.
    https://github.com/allenai/allenact/blob/master/plugins/minigrid_plugin/minigrid_tasks.py
    '''
    
    # [4] 
    '''Defines how the TaskSampler samples tasks for training, validation, and testing. Treat it as boilerplate.'''
    def train_task_sampler_args(
        self,
        process_ind: int,
        total_processes: int,
        devices: Optional[List[int]] = None,
        seeds: Optional[List[int]] = None,
        deterministic_cudnn: bool = False,
    ) -> Dict[str, Any]:
        return self._get_sampler_args(process_ind=process_ind, mode="train")

    def valid_task_sampler_args(
        self,
        process_ind: int,
        total_processes: int,
        devices: Optional[List[int]] = None,
        seeds: Optional[List[int]] = None,
        deterministic_cudnn: bool = False,
    ) -> Dict[str, Any]:
        return self._get_sampler_args(process_ind=process_ind, mode="valid")

    def test_task_sampler_args(
        self,
        process_ind: int,
        total_processes: int,
        devices: Optional[List[int]] = None,
        seeds: Optional[List[int]] = None,
        deterministic_cudnn: bool = False,
    ) -> Dict[str, Any]:
        return self._get_sampler_args(process_ind=process_ind, mode="test")

    # [5]
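    '''Defines the arguments for the task samplers created by MiniGridTaskSampler.
    - Training keeps stepping until the total number of steps defined later is reached
      (the environment is reset automatically on reaching the goal or on failure).
      The number of tasks is therefore not known in advance; tasks repeat until the total step count is reached.
    - Validation and testing run max_tasks tasks per sampler at each saved checkpoint and average the results
      (20 tasks for validation, 40 for testing).
    '''
    def _get_sampler_args(self, process_ind: int, mode: str) -> Dict[str, Any]:
        """Configures task sampling for training, validation, and testing.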
        # Parameters
        process_ind : index of the current task sampler
        mode:  one of `train`, `valid`, or `test`
        """
        if mode == "train":
            max_tasks = None  # number of tasks (None: no limit during training)
            task_seeds_list = None  # no predefined random seeds for training
            deterministic_sampling = False  # randomly sample tasks in training
        else:
            max_tasks = 20 + 20 * (mode == "test")  # number of tasks (20 for validation, 40 for testing)

            # one seed for each task to sample:
            # - ensures different seeds for each sampler, and
            # - ensures a deterministic set of sampled tasks.
            task_seeds_list = list(
                range(process_ind * max_tasks, (process_ind + 1) * max_tasks)
            )

            deterministic_sampling = (
                True  # deterministically sample task in validation/testing
            )

        return dict(
            max_tasks=max_tasks,  # see above
            # builder for third-party environment (defined below)
            env_class=self.make_env,
            sensors=self.SENSORS,  # sensors used to return observations to the agent
            env_info=dict(),  # parameters for environment builder (none for now)
            task_seeds_list=task_seeds_list,  # see above
            deterministic_sampling=deterministic_sampling,  # see above
        )
    
    
    # [6]
    SENSORS = [
        EgocentricMiniGridSensor(agent_view_size=5, view_channels=3),
    ]

    '''Sets the sensors used for state observation. This class is already provided in the plugins folder:
    from plugins.minigrid_plugin.minigrid_sensors import EgocentricMiniGridSensor
    https://github.com/allenai/allenact/blob/master/plugins/minigrid_plugin/minigrid_sensors.py
    
    view_channels=3 sets what is observed for each maze tile.
    With 3 channels these are:
    - the tile type (wall, floor, door, goal, etc.; 11 kinds)
    - the tile color (red, green, etc.; 6 kinds)
    - the tile state (open, closed, locked; 3 kinds)
    See the end of this notebook for details.
    
    
    agent_view_size=5 is the side length (width and depth) of the square area the agent can observe.
    The view is egocentric: the area lies in front of the agent (A), which sits at the center of the nearest row.
    With A facing up, the observed area is:
    
    * * * * *
    * * * * *
    * * * * *
    * * * * *
    * * A * *
    
    Tiles behind the agent are never observed (partial observability),
    and tiles hidden behind a wall or similar obstacle cannot be observed either.
    
    The actions available to the agent are "left", "right", and "forward":
    turn left in place, turn right in place, and move one step forward.
    '''

    # [7]
    @classmethod
    def create_model(cls, **kwargs) -> nn.Module:
        return MiniGridSimpleConvRNN(
            action_space=gym.spaces.Discrete(
                len(MiniGridTask.class_action_names())),
            observation_space=SensorSuite(cls.SENSORS).observation_spaces,
            num_objects=cls.SENSORS[0].num_objects,
            num_colors=cls.SENSORS[0].num_colors,
            num_states=cls.SENSORS[0].num_states,
        )
    '''AllenAct provides RNNActorCritic(), and this model adapts it for the maze.
    https://github.com/allenai/allenact/blob/master/plugins/minigrid_plugin/minigrid_models.py#L141
    
    RNNActorCritic() is an Actor-Critic deep learning model with memory implemented by a recurrent neural network (RNN).
    Because this maze task is partially observable, the RNN memory lets the model retain information about its own past states and past observations.
    '''

    # [8]
    @classmethod
    def machine_params(cls, mode="train", **kwargs) -> Dict[str, Any]:
        return {
            "nprocesses": 128 if mode == "train" else 16,
            "gpu_ids": [],
        }
    '''
    Defines the number of subprocesses to use:
    128 for training and 16 for validation and testing.
    The gpu_ids list is left empty so that only the CPU is used, not a GPU.
    '''
    
    
    # [9]
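    '''Builds the training, validation, and testing pipeline for deep reinforcement learning.
    PPO is used here, and its various settings are given below.
    Adam optimizes the network parameters, and the learning rate is decayed linearly.
    '''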
    @classmethod
    def training_pipeline(cls, **kwargs) -> TrainingPipeline:
        ppo_steps = int(150000)
        return TrainingPipeline(
            named_losses=dict(ppo_loss=PPO(**PPOConfig)),  # type:ignore
            pipeline_stages=[
                PipelineStage(loss_names=["ppo_loss"],
                              max_stage_steps=ppo_steps)
            ],
            optimizer_builder=Builder(optim.Adam, dict(lr=1e-4)),
            num_mini_batch=4,
            update_repeats=3,
            max_grad_norm=0.5,
            num_steps=16,
            gamma=0.99,
            use_gae=True,
            gae_lambda=0.95,
            advance_scene_rollout_period=None,
            save_interval=10000,  # run validation (or testing) roughly once every 10,000 steps
            metric_accumulate_interval=1,
            lr_scheduler_builder=Builder(
                LambdaLR, {"lr_lambda": LinearDecay(
                    steps=ppo_steps)}  # type:ignore
            ),
        )


Create a folder `minigrid_jp` and place in it a Python file `minigrid_tutorial_jp.py` containing the MiniGridTutorialExperimentConfig class above together with the imports that precede it.
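
To make the seed assignment in `_get_sampler_args` concrete, here is a small standalone sketch of the same arithmetic. The function name `eval_task_seeds` is hypothetical; the formula is the one used in the class above. With 16 evaluation processes it gives 16 × 20 = 320 validation tasks and 16 × 40 = 640 test tasks, which agrees with the `tasks 320` and `tasks 640` entries in the run logs.

```python
from typing import List


def eval_task_seeds(process_ind: int, mode: str) -> List[int]:
    """Seeds for one validation/test sampler, mirroring _get_sampler_args."""
    if mode not in ("valid", "test"):
        raise ValueError("seeds are only predefined for 'valid' and 'test'")
    max_tasks = 20 + 20 * (mode == "test")  # 20 for valid, 40 for test
    return list(range(process_ind * max_tasks, (process_ind + 1) * max_tasks))


if __name__ == "__main__":
    nprocesses = 16  # value returned by machine_params for valid/test
    for mode in ("valid", "test"):
        all_seeds = [s for p in range(nprocesses) for s in eval_task_seeds(p, mode)]
        # Each sampler gets a disjoint block, so no seed is repeated.
        print(mode, len(all_seeds), len(set(all_seeds)))
```

Sampler 12, for example, receives seeds 240 to 259 in validation and seeds starting at 480 in testing, the same lists that appear in the worker arguments printed in the logs.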


## 3. Run the experiment

In [3]:
# Check the number of CPU cores
import multiprocessing

print(multiprocessing.cpu_count())  # 6 in this case

6


In [5]:
!python main.py minigrid_tutorial_jp -b minigrid_jp -m 4 -o minigrid_jp -s 12345 

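# main.py has many arguments with default values, so it is run inline as a script here.

# Argument notes
# -b minigrid_jp is the folder containing minigrid_tutorial_jp; it sits directly under allenact
# -m 4 is the number of subprocesses (max_sampler_processes_per_worker in the log below), set to 4
# -o is the output folder for results; it is the same folder, minigrid_jp, that contains minigrid_tutorial_jp
# -s is the random seed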
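
The worker labels in the log output, such as `1(32-63)` for training and `3(12-15)` for validation, suggest that the samplers are divided into contiguous blocks across the `-m` processes. The sketch below is a simplified model of that split, not AllenAct's actual partitioning code; the function name `partition_samplers` and the handling of remainders are assumptions.

```python
from typing import List, Tuple


def partition_samplers(num_samplers: int, max_processes: int) -> List[Tuple[int, int]]:
    """Split sampler indices 0..num_samplers-1 into contiguous (first, last) blocks."""
    if num_samplers <= 0 or max_processes <= 0:
        raise ValueError("both arguments must be positive")
    num_blocks = min(num_samplers, max_processes)
    base, extra = divmod(num_samplers, num_blocks)
    blocks = []
    start = 0
    for i in range(num_blocks):
        # Assumption: any remainder is spread over the first blocks.
        size = base + (1 if i < extra else 0)
        blocks.append((start, start + size - 1))
        start += size
    return blocks


if __name__ == "__main__":
    print(partition_samplers(128, 4))  # training: 128 samplers, -m 4
    print(partition_samplers(16, 4))   # validation/test: 16 samplers, -m 4
```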


09/13 19:58:21 INFO: Running with args Namespace(checkpoint=None, deterministic_agents=False, deterministic_cudnn=False, experiment='minigrid_tutorial_jp', experiment_base='minigrid_jp', extra_tag='', gp=None, log_level='info', max_sampler_processes_per_worker=4, output_dir='minigrid_jp', restart_pipeline=False, seed=12345, skip_checkpoints=0, test_date=None)	[main.py: 242]
09/13 19:58:22 INFO: Git diff saved to minigrid_jp/used_configs/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21	[runner.py: 446]
09/13 19:58:22 INFO: Config files saved to minigrid_jp/used_configs/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21	[runner.py: 478]
09/13 19:58:22 INFO: Using 1 train workers on devices [device(type='cpu')]	[runner.py: 144]
09/13 19:58:22 INFO: Started 1 train processes	[runner.py: 274]
09/13 19:58:22 INFO: Using 1 valid workers on devices [device(type='cpu')]	[runner.py: 144]
09/13 19:58:22 INFO: Started 1 valid processes	[runner.py: 300]
09/13 19:58:23 INFO: valid 0 args {'config': 

09/13 19:58:26 INFO: Starting 1(32-63)-th VectorSampledTask worker with args [{'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7fdb048a65d0>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fdb0483d650>}, {'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7fdb048a65d0>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fdb0483d650>}, {'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor obje

09/13 19:58:26 INFO: Starting 2(64-95)-th VectorSampledTask worker with args [{'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7fdb048a65d0>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fdb0483d650>}, {'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7fdb048a65d0>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fdb0483d650>}, {'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor obje

09/13 19:58:26 INFO: Starting 3(96-127)-th VectorSampledTask worker with args [{'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7fdb048a65d0>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fdb0483d650>}, {'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7fdb048a65d0>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fdb0483d650>}, {'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor obj

09/13 19:58:27 INFO: Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Starting 1-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Starting 2-

09/13 19:58:27 INFO: Starting 25-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Starting 26-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Starting 

09/13 19:58:27 INFO: Starting 15-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b50>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f08c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10350>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Starting 16-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b50>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f08c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10350>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Startin

09/13 19:58:27 INFO: Starting 1-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Starting 2-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Starting 3-

09/13 19:58:27 INFO: Starting 10-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Starting 11-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:27 INFO: Startin

09/13 19:58:28 INFO: Starting 27-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:28 INFO: Starting 28-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9c97f10b10>, 'max_tasks': None, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9c97f06c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f9c97f10310>], 'env_info': {}, 'task_seeds_list': None, 'deterministic_sampling': False}	[vector_sampled_tasks.py: 974]
09/13 19:58:28 INFO: Starting 

09/13 19:58:42 INFO: Starting 3(12-15)-th VectorSampledTask worker with args [{'max_tasks': 20, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7fdb048a65d0>], 'env_info': {}, 'task_seeds_list': [240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259], 'deterministic_sampling': True, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fdb0483e610>}, {'max_tasks': 20, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7fdb048a58c0>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7fdb048a65d0>], 'env_info': {}, 'task_seeds_list': [260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279], 'deterministic_sampling': True, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fdb0483e610>}, {'

09/13 19:58:44 INFO: Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f980973cbd0>, 'max_tasks': 20, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9809734c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f980973c3d0>], 'env_info': {}, 'task_seeds_list': [240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259], 'deterministic_sampling': True}	[vector_sampled_tasks.py: 974]
09/13 19:58:44 INFO: Starting 1-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f980973cbd0>, 'max_tasks': 20, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f9809734c20>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f980973c3d0>], 'env_info': {}, 'task_seeds_list': [260, 261, 

09/13 19:59:28 INFO: train 43008 steps 0 offpolicy: ep_length 4.39 reward 0.961 success 1 lr 7.27e-05 pipeline_stage 0 ppo_loss/action -0.00408 ppo_loss/entropy -0.277 ppo_loss/ppo_total -0.00663 ppo_loss/value 0.000451 total_loss -0.00663 elapsed_time 3.27s approx_fps 626	[runner.py: 588]
09/13 19:59:31 INFO: train 45056 steps 0 offpolicy: ep_length 4.08 reward 0.963 success 1 lr 7.13e-05 pipeline_stage 0 ppo_loss/action -0.00429 ppo_loss/entropy -0.229 ppo_loss/ppo_total -0.00646 ppo_loss/value 0.000235 total_loss -0.00646 elapsed_time 2.73s approx_fps 751	[runner.py: 588]
09/13 19:59:33 INFO: train 47104 steps 0 offpolicy: ep_length 4.35 reward 0.961 success 1 lr 7e-05 pipeline_stage 0 ppo_loss/action -0.00416 ppo_loss/entropy -0.223 ppo_loss/ppo_total -0.00625 ppo_loss/value 0.000288 total_loss -0.00625 elapsed_time 2.64s approx_fps 776	[runner.py: 588]
09/13 19:59:36 INFO: train 49152 steps 0 offpolicy: ep_length 4.58 reward 0.959 success 1 lr 6.86e-05 pipeline_stage 0 ppo_loss/ac

09/13 20:00:32 INFO: train 88064 steps 0 offpolicy: ep_length 4.06 reward 0.963 success 1 lr 4.27e-05 pipeline_stage 0 ppo_loss/action -0.00276 ppo_loss/entropy -0.122 ppo_loss/ppo_total -0.00389 ppo_loss/value 0.000175 total_loss -0.00389 elapsed_time 2.93s approx_fps 699	[runner.py: 588]
09/13 20:00:35 INFO: train 90112 steps 0 offpolicy: ep_length 4.11 reward 0.963 success 1 lr 4.13e-05 pipeline_stage 0 ppo_loss/action -0.00278 ppo_loss/entropy -0.111 ppo_loss/ppo_total -0.00383 ppo_loss/value 0.000129 total_loss -0.00383 elapsed_time 2.98s approx_fps 688	[runner.py: 588]
09/13 20:00:38 INFO: train 92160 steps 0 offpolicy: ep_length 4.07 reward 0.963 success 1 lr 3.99e-05 pipeline_stage 0 ppo_loss/action -0.00146 ppo_loss/entropy -0.115 ppo_loss/ppo_total -0.00254 ppo_loss/value 0.000141 total_loss -0.00254 elapsed_time 2.95s approx_fps 694	[runner.py: 588]
09/13 20:00:39 INFO: valid worker 0 loading checkpoint from minigrid_jp/checkpoints/MiniGridTutorial_for_Japanese/2020-09-13_19

09/13 20:01:40 INFO: train 133120 steps 0 offpolicy: ep_length 3.98 reward 0.964 success 1 lr 1.26e-05 pipeline_stage 0 ppo_loss/action -0.000961 ppo_loss/entropy -0.0909 ppo_loss/ppo_total -0.00182 ppo_loss/value 0.000103 total_loss -0.00182 elapsed_time 3.08s approx_fps 664	[runner.py: 588]
09/13 20:01:41 INFO: valid worker 0 loading checkpoint from minigrid_jp/checkpoints/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21/exp_MiniGridTutorial_for_Japanese__stage_00__steps_000000133120.pt	[engine.py: 299]
09/13 20:01:42 INFO: valid 133120 steps: ep_length 3.8625 reward 0.965237500000003 success 1.0 tasks 320 checkpoint minigrid_jp/checkpoints/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21/exp_MiniGridTutorial_for_Japanese__stage_00__steps_000000133120.pt	[runner.py: 507]
09/13 20:01:44 INFO: train 135168 steps 0 offpolicy: ep_length 3.89 reward 0.965 success 1 lr 1.13e-05 pipeline_stage 0 ppo_loss/action -0.000839 ppo_loss/entropy -0.0864 ppo_loss/ppo_total -0.00166 ppo_loss/value 

09/13 20:02:08 INFO: train worker 0 Closed.	[engine.py: 515]
09/13 20:02:09 INFO: valid worker 0 loading checkpoint from minigrid_jp/checkpoints/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21/exp_MiniGridTutorial_for_Japanese__stage_00__steps_000000151552.pt	[engine.py: 299]
09/13 20:02:10 INFO: valid 151552 steps: ep_length 3.88125 reward 0.965068750000003 success 1.0 tasks 320 checkpoint minigrid_jp/checkpoints/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21/exp_MiniGridTutorial_for_Japanese__stage_00__steps_000000151552.pt	[runner.py: 507]
09/13 20:02:10 INFO: Done	[runner.py: 769]
09/13 20:02:10 INFO: Joining train 0	[runner.py: 812]
09/13 20:02:10 INFO: Closed train 0	[runner.py: 812]
09/13 20:02:10 INFO: Closing valid 0	[runner.py: 812]
09/13 20:02:10 INFO: Joining valid 0	[runner.py: 812]
09/13 20:02:10 INFO: KeyboardInterrupt. Terminating valid worker 0	[engine.py: 1533]
09/13 20:02:10 INFO: SingleProcessVectorSampledTask 0 closing.	[vector_sampled_tasks.py: 959]
09/13 20:
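
The `reward` values in the log can be related to `ep_length`. In gym-minigrid, reaching the goal yields `1 - 0.9 * (step_count / max_steps)`; the sketch below applies that formula with `max_steps=100`, the step limit mentioned in section 2. The helper name `success_reward` is hypothetical. Because the formula is linear in the step count, applying it to the mean episode length gives the mean reward whenever every episode succeeds.

```python
def success_reward(step_count: float, max_steps: int = 100) -> float:
    """Reward for reaching the goal after `step_count` steps (gym-minigrid formula)."""
    return 1.0 - 0.9 * (step_count / max_steps)


if __name__ == "__main__":
    # Mean validation episode lengths taken from the log above.
    for ep_length in (3.8625, 3.88125):
        print(ep_length, success_reward(ep_length))
```

For the two validation entries in the log (ep_length 3.8625 and 3.88125), this gives about 0.96524 and 0.96507, in agreement with the logged rewards.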

In [6]:
# Show the training and validation results in TensorBoard
!tensorboard --logdir minigrid_jp


TensorFlow installation not found - running with reduced feature set.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.3.0 at http://localhost:6006/ (Press CTRL+C to quit)
^C
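
The `lr` column in the training log falls steadily as training proceeds, which is the linear decay configured in `training_pipeline`. The sketch below illustrates the idea in plain Python, assuming that `LinearDecay(steps=...)` yields a multiplier going from 1.0 at step 0 to 0.0 at `steps`, which `LambdaLR` then applies to the base learning rate. The function name `linear_decay_factor` is hypothetical and this is not AllenAct's implementation.

```python
def linear_decay_factor(step: int, total_steps: int = 150_000) -> float:
    """Multiplier on the base learning rate, clamped to [0, 1]."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - progress


BASE_LR = 1e-4  # lr passed to optim.Adam in training_pipeline

if __name__ == "__main__":
    for step in (0, 40_960, 75_000, 150_000):
        print(step, BASE_LR * linear_decay_factor(step))
```

Under this assumption the factor at 40,960 steps is about 0.727, i.e. a learning rate of about 7.27e-05, close to the values logged around that point in training.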


## Running the test and checking the results

In [7]:
# Run the test
# Pass the training checkpoint timestamp (its folder name), e.g. 2020-09-13_19-58-21, to -t
!python main.py minigrid_tutorial_jp -b minigrid_jp -m 4 -o minigrid_jp -s 12345 -t 2020-09-13_19-58-21


09/13 20:21:05 INFO: Running with args Namespace(checkpoint=None, deterministic_agents=False, deterministic_cudnn=False, experiment='minigrid_tutorial_jp', experiment_base='minigrid_jp', extra_tag='', gp=None, log_level='info', max_sampler_processes_per_worker=4, output_dir='minigrid_jp', restart_pipeline=False, seed=12345, skip_checkpoints=0, test_date='2020-09-13_19-58-21')	[main.py: 242]
09/13 20:21:05 INFO: Using 1 test workers on devices [device(type='cpu')]	[runner.py: 144]
09/13 20:21:05 INFO: Started 1 test processes	[runner.py: 349]
09/13 20:21:05 INFO: Running test on 15 steps [10240, 20480, 30720, 40960, 51200, 61440, 71680, 81920, 92160, 102400, 112640, 122880, 133120, 143360, 151552]	[runner.py: 357]
09/13 20:21:05 INFO: Saving metrics in minigrid_jp/metrics/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21/metrics__test_2020-09-13_20-21-05.json	[runner.py: 372]
09/13 20:21:05 INFO: Using checkpoint start time 2020-09-13_19-58-21	[runner.py: 389]
09/13 20:21:07 INFO: test 

09/13 20:21:07 INFO: Starting 3(12-15)-th VectorSampledTask worker with args [{'max_tasks': 40, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f0bca296830>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f0bca297650>], 'env_info': {}, 'task_seeds_list': [480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519], 'deterministic_sampling': True, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f0bca22e690>}, {'max_tasks': 40, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f0bca296830>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f0bca297650>], 'env_info': {}, 'task_seeds_list': [520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 5

09/13 20:21:09 INFO: Starting 1-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f36cedebcd0>, 'max_tasks': 40, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f36cede6b90>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors.EgocentricMiniGridSensor object at 0x7f36cedeb4d0>], 'env_info': {}, 'task_seeds_list': [360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399], 'deterministic_sampling': True}	[vector_sampled_tasks.py: 974]
09/13 20:21:09 INFO: Starting 2-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f36cedebcd0>, 'max_tasks': 40, 'env_class': <function MiniGridTutorialExperimentConfig.make_env at 0x7f36cede6b90>, 'sensors': [<plugins.minigrid_plugin.minigrid_sensors

09/13 20:21:20 INFO: worker 0: 624 tasks pending ([39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39])	[engine.py: 1382]
09/13 20:21:21 INFO: test worker 0 loading checkpoint from minigrid_jp/checkpoints/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21/exp_MiniGridTutorial_for_Japanese__stage_00__steps_000000081920.pt	[engine.py: 299]
09/13 20:21:21 INFO: test 71680 steps: ep_length 4.2 reward 0.962 success 1 tasks 640 checkpoint minigrid_jp/checkpoints/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21/exp_MiniGridTutorial_for_Japanese__stage_00__steps_000000071680.pt	[runner.py: 625]
09/13 20:21:21 INFO: worker 0: 624 tasks pending ([39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39])	[engine.py: 1382]
09/13 20:21:22 INFO: test worker 0 loading checkpoint from minigrid_jp/checkpoints/MiniGridTutorial_for_Japanese/2020-09-13_19-58-21/exp_MiniGridTutorial_for_Japanese__stage_00__steps_000000092160.pt	[engine.py: 299]
09/13 20:21:22 INFO: test 81920 steps: ep_l
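
The 15 checkpoint step counts listed in the test log (10240, 20480, ..., 143360, 151552) can be reproduced from the settings in `training_pipeline` and `machine_params`. The sketch below is a simplified model, not AllenAct's actual checkpointing code: it assumes each update collects `nprocesses * num_steps` steps, that a checkpoint is written once at least `save_interval` steps have passed since the last one, and that a final checkpoint is written when training ends. The function name `checkpoint_steps` is hypothetical.

```python
from typing import List


def checkpoint_steps(
    total_steps: int, nprocesses: int, num_steps: int, save_interval: int
) -> List[int]:
    """Step counts at which checkpoints are saved under the simplified model."""
    steps_per_update = nprocesses * num_steps  # 128 * 16 = 2048 in this notebook
    saved = []
    steps = 0
    last_saved = 0
    while steps < total_steps:
        steps += steps_per_update
        # Save when the interval has elapsed, or at the end of training.
        if steps - last_saved >= save_interval or steps >= total_steps:
            saved.append(steps)
            last_saved = steps
    return saved


if __name__ == "__main__":
    print(checkpoint_steps(total_steps=150_000, nprocesses=128, num_steps=16, save_interval=10_000))
```

This also shows why training stops at 151552 rather than exactly 150000 steps: the step count advances in units of 2048, and 151552 is the first multiple of 2048 that reaches 150000.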

In [8]:
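# Show the test results in TensorBoard

# Early results are poor because each test uses the parameters from that point in training
# The result that matters is the last one, after learning has converged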

!tensorboard --logdir minigrid_jp


TensorFlow installation not found - running with reduced feature set.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.3.0 at http://localhost:6006/ (Press CTRL+C to quit)
^C


That's all.

## Appendix

In [None]:
'''Details of the state-observation sensors

https://github.com/maximecb/gym-minigrid/blob/master/gym_minigrid/minigrid.py

Tile types
    OBJECT_TO_IDX = {
    'unseen'        : 0,
    'empty'         : 1,
    'wall'          : 2,
    'floor'         : 3,
    'door'          : 4,
    'key'           : 5,
    'ball'          : 6,
    'box'           : 7,
    'goal'          : 8,
    'lava'          : 9,
    'agent'         : 10,
    }


Colors
    COLOR_TO_IDX = {
    'red'   : 0,
    'green' : 1,
    'blue'  : 2,
    'purple': 3,
    'yellow': 4,
    'grey'  : 5,
    }

States
    STATE_TO_IDX = {
    'open'  : 0,
    'closed': 1,
    'locked': 2,
    }
'''
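
As a small illustration of how these tables are used, the sketch below decodes one observed tile, a (type, color, state) triple as produced with `view_channels=3`, back into readable names. The dictionaries are copied from the listing above; the helper name `decode_tile` is hypothetical and not part of gym-minigrid or AllenAct.

```python
from typing import Dict, Tuple

OBJECT_TO_IDX = {
    "unseen": 0, "empty": 1, "wall": 2, "floor": 3, "door": 4, "key": 5,
    "ball": 6, "box": 7, "goal": 8, "lava": 9, "agent": 10,
}
COLOR_TO_IDX = {"red": 0, "green": 1, "blue": 2, "purple": 3, "yellow": 4, "grey": 5}
STATE_TO_IDX = {"open": 0, "closed": 1, "locked": 2}


def invert(mapping: Dict[str, int]) -> Dict[int, str]:
    """Build the reverse lookup (index -> name)."""
    return {idx: name for name, idx in mapping.items()}


IDX_TO_OBJECT = invert(OBJECT_TO_IDX)
IDX_TO_COLOR = invert(COLOR_TO_IDX)
IDX_TO_STATE = invert(STATE_TO_IDX)


def decode_tile(tile: Tuple[int, int, int]) -> Tuple[str, str, str]:
    """Translate a (type, color, state) index triple into names."""
    obj, color, state = tile
    return IDX_TO_OBJECT[obj], IDX_TO_COLOR[color], IDX_TO_STATE[state]


if __name__ == "__main__":
    print(decode_tile((8, 1, 0)))
    print(decode_tile((2, 5, 0)))
```

Note that the state channel is only meaningful for doors; for other tiles the value 0 simply decodes as 'open'.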