Merge pull request #36 from allenai/allenact_0.5.0
2022 Leaderboards and Embodied CLIP Model
Lucaweihs committed Mar 25, 2022
2 parents 64d8b57 + 9b6ee01 commit 9b58a1f
Showing 23 changed files with 163 additions and 88 deletions.
36 changes: 22 additions & 14 deletions README.md
@@ -229,8 +229,11 @@ be used for evaluation.

We are tracking challenge participant entries using the [AI2 Leaderboard](https://leaderboard.allenai.org/). The team with the best submission made to either of the below leaderboards by May 31st (midnight, [anywhere on earth](https://time.is/Anywhere_on_Earth)) will be announced at the [CVPR'21 Embodied-AI Workshop](https://embodied-ai.org/) and invited to produce a video describing their approach.

**Submission leaderboard links will be announced soon (late Feb 2022). Please check back here.** Our 2021
leaderboard links can be found [here](https://leaderboard.allenai.org/ithor_rearrangement_1phase) and [here](https://leaderboard.allenai.org/ithor_rearrangement_2phase). Note
In particular, our 2022 leaderboard links can be found at
* [**2022 1-phase leaderboard**](https://leaderboard.allenai.org/ithor_rearrangement_1phase_2022) and
* [**2022 2-phase leaderboard**](https://leaderboard.allenai.org/ithor_rearrangement_2phase_2022).

Our older (2021) leaderboards also remain available indefinitely ([previous 2021 1-phase leaderboard](https://leaderboard.allenai.org/ithor_rearrangement_1phase), [previous 2021 2-phase leaderboard](https://leaderboard.allenai.org/ithor_rearrangement_2phase)). Note
that our 2021 challenge uses a different dataset and an older version of AI2-THOR, so results will not be
directly comparable.

@@ -531,18 +534,23 @@ allenact -o rearrange_out -b . baseline_configs/two_phase/two_phase_rgb_resnet_p

### 💪 Pretrained Models

We currently provide the following pretrained models (see [our paper](https://arxiv.org/abs/2103.16544) for details
on these models):

| Model | % Fixed Strict (Test, on 2021 dataset) | Pretrained Model |
|------------|:--------------------------------------:|:----------:|
| [1-Phase ResNet18+ANM IL](baseline_configs/one_phase/one_phase_rgb_resnet_frozen_map_dagger.py) | 8.9% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetFrozenMapDagger_40proc__stage_00__steps_000040060240.pt) |
| [1-Phase ResNet18 IL](baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py) | 6.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt) |
| [1-Phase ResNet18 PPO](baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py) | 5.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetPPO__stage_00__steps_000060068000.pt) |
| [1-Phase Simple IL](baseline_configs/one_phase/one_phase_rgb_dagger.py) | 4.8% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBDagger_40proc__stage_00__steps_000065070800.pt) |
| [1-Phase Simple PPO](baseline_configs/one_phase/one_phase_rgb_ppo.py) | 4.6% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBPPO__stage_00__steps_000010010730.pt) |
| [2-Phase ResNet18+ANM IL+PPO](baseline_configs/two_phase/two_phase_rgb_resnet_frozen_map_ppowalkthrough_ilunshuffle.py) | 1.44% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetFrozenMapPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000075000985.pt) |
| [2-Phase ResNet18 IL+PPO](baseline_configs/two_phase/two_phase_rgb_resnet_ppowalkthrough_ilunshuffle.py) | 0.66% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000020028800.pt) |
In the table below we provide a collection of pretrained models from:

1. [Our CVPR'21 paper introducing this challenge](https://arxiv.org/abs/2103.16544), and
2. [Our CVPR'22 paper which showed that using CLIP visual encodings can dramatically improve model performance across embodied tasks](https://arxiv.org/abs/2111.09888).

We have only evaluated a subset of these models on our 2022 dataset.

| Model | % Fixed Strict (2022 dataset, test) | % Fixed Strict (2021 dataset, test) | Pretrained Model |
|------------|:-----------------------------------:|:-----------------------------------:|:----------:|
| [1-Phase Embodied CLIP ResNet50 IL](baseline_configs/one_phase/one_phase_rgb_clipresnet50_dagger.py) | **19.1%** | **17.3%** | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBClipResNet50Dagger_40proc__stage_00__steps_000065083050.pt) |
| [1-Phase ResNet18+ANM IL](baseline_configs/one_phase/one_phase_rgb_resnet_frozen_map_dagger.py) | - | 8.9% | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetFrozenMapDagger_40proc__stage_00__steps_000040060240.pt) |
| [1-Phase ResNet18 IL](baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py) | - | 6.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt) |
| [1-Phase ResNet18 PPO](baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py) | - | 5.3% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBResNetPPO__stage_00__steps_000060068000.pt) |
| [1-Phase Simple IL](baseline_configs/one_phase/one_phase_rgb_dagger.py) | - | 4.8% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBDagger_40proc__stage_00__steps_000065070800.pt) |
| [1-Phase Simple PPO](baseline_configs/one_phase/one_phase_rgb_ppo.py) | - | 4.6% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/one-phase/exp_OnePhaseRGBPPO__stage_00__steps_000010010730.pt) |
| [2-Phase ResNet18+ANM IL+PPO](baseline_configs/two_phase/two_phase_rgb_resnet_frozen_map_ppowalkthrough_ilunshuffle.py) | **0.53%** | **1.44%** | [(link)](https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetFrozenMapPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000075000985.pt) |
| [2-Phase ResNet18 IL+PPO](baseline_configs/two_phase/two_phase_rgb_resnet_ppowalkthrough_ilunshuffle.py) | - | 0.66% | [(link)](https://s3.console.aws.amazon.com/s3/object/prior-model-weights?prefix=embodied-ai/rearrangement/two-phase/exp_TwoPhaseRGBResNetPPOWalkthroughILUnshuffle_40proc-longtf__stage_00__steps_000020028800.pt) |
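
Each "(link)" above points directly at a PyTorch `.pt` checkpoint. As a minimal sketch (standard library only; the URL is the Embodied CLIP row's link copied from the table, and the target directory is the `pretrained_model_ckpts` directory described just below), downloading a checkpoint looks like:

```python
import os
import urllib.request

# Direct S3 link for the "1-Phase Embodied CLIP ResNet50 IL" row above.
CKPT_URL = (
    "https://prior-model-weights.s3.us-east-2.amazonaws.com/embodied-ai/"
    "rearrangement/one-phase/"
    "exp_OnePhaseRGBClipResNet50Dagger_40proc__stage_00__steps_000065083050.pt"
)

os.makedirs("pretrained_model_ckpts", exist_ok=True)
target = os.path.join("pretrained_model_ckpts", os.path.basename(CKPT_URL))
if not os.path.exists(target):
    # Checkpoints are large; urlretrieve streams to disk rather than
    # holding the whole file in memory.
    urllib.request.urlretrieve(CKPT_URL, target)
print(f"Checkpoint ready at {target}")
```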

These models can be downloaded from the above links and should be placed into the `pretrained_model_ckpts` directory.
You can then, for example, run inference for the _1-Phase ResNet18 IL_ model using AllenAct by running:
54 changes: 38 additions & 16 deletions baseline_configs/one_phase/one_phase_rgb_base.py
@@ -4,7 +4,11 @@
from allenact.base_abstractions.sensor import SensorSuite, Sensor

try:
from allenact.embodiedai.sensors.vision_sensors import DepthSensor
from allenact.embodiedai.sensors.vision_sensors import (
DepthSensor,
IMAGENET_RGB_MEANS,
IMAGENET_RGB_STDS,
)
except ImportError:
raise ImportError("Please update to allenact>=0.4.0.")

@@ -17,20 +21,38 @@


class OnePhaseRGBBaseExperimentConfig(RearrangeBaseExperimentConfig, ABC):
SENSORS = [
RGBRearrangeSensor(
height=RearrangeBaseExperimentConfig.SCREEN_SIZE,
width=RearrangeBaseExperimentConfig.SCREEN_SIZE,
use_resnet_normalization=True,
uuid=RearrangeBaseExperimentConfig.EGOCENTRIC_RGB_UUID,
),
UnshuffledRGBRearrangeSensor(
height=RearrangeBaseExperimentConfig.SCREEN_SIZE,
width=RearrangeBaseExperimentConfig.SCREEN_SIZE,
use_resnet_normalization=True,
uuid=RearrangeBaseExperimentConfig.UNSHUFFLED_RGB_UUID,
),
]
@classmethod
def sensors(cls) -> Sequence[Sensor]:
cnn_type, pretraining_type = cls.CNN_PREPROCESSOR_TYPE_AND_PRETRAINING
if pretraining_type.strip().lower() == "clip":
from allenact_plugins.clip_plugin.clip_preprocessors import (
ClipResNetPreprocessor,
)

mean = ClipResNetPreprocessor.CLIP_RGB_MEANS
stdev = ClipResNetPreprocessor.CLIP_RGB_STDS
else:
mean = IMAGENET_RGB_MEANS
stdev = IMAGENET_RGB_STDS

return [
RGBRearrangeSensor(
height=RearrangeBaseExperimentConfig.SCREEN_SIZE,
width=RearrangeBaseExperimentConfig.SCREEN_SIZE,
use_resnet_normalization=True,
uuid=RearrangeBaseExperimentConfig.EGOCENTRIC_RGB_UUID,
mean=mean,
stdev=stdev,
),
UnshuffledRGBRearrangeSensor(
height=RearrangeBaseExperimentConfig.SCREEN_SIZE,
width=RearrangeBaseExperimentConfig.SCREEN_SIZE,
use_resnet_normalization=True,
uuid=RearrangeBaseExperimentConfig.UNSHUFFLED_RGB_UUID,
mean=mean,
stdev=stdev,
),
]

@classmethod
def make_sampler_fn(
@@ -47,7 +69,7 @@ def make_sampler_fn(
**kwargs,
) -> RearrangeTaskSampler:
"""Return a RearrangeTaskSampler."""
sensors = cls.SENSORS if sensors is None else sensors
sensors = cls.sensors() if sensors is None else sensors
if "mp_ctx" in kwargs:
del kwargs["mp_ctx"]
assert not cls.RANDOMIZE_START_ROTATION_DURING_TRAINING
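
The substantive change in `one_phase_rgb_base.py` is that the RGB sensors now select normalization statistics matching the visual encoder's pretraining: CLIP-pretrained encoders get CLIP's statistics, everything else keeps ImageNet's. A self-contained sketch of the same dispatch; the constants below are the standard ImageNet and OpenAI CLIP normalization values, written out here as an assumption since the real config imports them from `allenact` and the CLIP plugin (the `None` handling is likewise this sketch's assumption):

```python
from typing import Optional, Sequence, Tuple

# Standard ImageNet statistics (what torchvision's ResNets were trained with);
# the real config imports these from allenact's vision sensors module.
IMAGENET_RGB_MEANS: Sequence[float] = (0.485, 0.456, 0.406)
IMAGENET_RGB_STDS: Sequence[float] = (0.229, 0.224, 0.225)

# Statistics used when OpenAI's CLIP image encoders were trained; the real
# config imports these from the ClipResNetPreprocessor in allenact_plugins.
CLIP_RGB_MEANS: Sequence[float] = (0.48145466, 0.4578275, 0.40821073)
CLIP_RGB_STDS: Sequence[float] = (0.26862954, 0.26130258, 0.27577711)


def normalization_for(
    cnn_and_pretraining: Optional[Tuple[str, str]]
) -> Tuple[Sequence[float], Sequence[float]]:
    """Pick (mean, stdev) matching the frozen encoder's pretraining corpus."""
    if cnn_and_pretraining is not None:
        _cnn_type, pretraining = cnn_and_pretraining
        if pretraining.strip().lower() == "clip":
            return CLIP_RGB_MEANS, CLIP_RGB_STDS
    # Everything else (ImageNet encoders, or no preprocessor at all) keeps
    # the ImageNet statistics.
    return IMAGENET_RGB_MEANS, IMAGENET_RGB_STDS


assert normalization_for(("RN50", "clip")) == (CLIP_RGB_MEANS, CLIP_RGB_STDS)
assert normalization_for(("RN18", "imagenet"))[0] == IMAGENET_RGB_MEANS
```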
12 changes: 12 additions & 0 deletions baseline_configs/one_phase/one_phase_rgb_clipresnet50_dagger.py
@@ -0,0 +1,12 @@
from baseline_configs.one_phase.one_phase_rgb_il_base import (
OnePhaseRGBILBaseExperimentConfig,
)


class OnePhaseRGBClipResNet50DaggerExperimentConfig(OnePhaseRGBILBaseExperimentConfig):
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = ("RN50", "clip")
IL_PIPELINE_TYPE = "40proc"

@classmethod
def tag(cls) -> str:
return f"OnePhaseRGBClipResNet50Dagger_{cls.IL_PIPELINE_TYPE}"
2 changes: 1 addition & 1 deletion baseline_configs/one_phase/one_phase_rgb_dagger.py
@@ -4,7 +4,7 @@


class OnePhaseRGBDaggerExperimentConfig(OnePhaseRGBILBaseExperimentConfig):
USE_RESNET_CNN = False
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = None
IL_PIPELINE_TYPE = "40proc"

@classmethod
14 changes: 8 additions & 6 deletions baseline_configs/one_phase/one_phase_rgb_il_base.py
@@ -3,7 +3,7 @@
import torch

from allenact.algorithms.onpolicy_sync.losses.imitation import Imitation
from allenact.base_abstractions.sensor import ExpertActionSensor
from allenact.base_abstractions.sensor import ExpertActionSensor, Sensor
from allenact.utils.experiment_utils import PipelineStage
from allenact.utils.misc_utils import all_unique
from baseline_configs.one_phase.one_phase_rgb_base import (
@@ -85,13 +85,15 @@ def il_training_params(label: str, training_steps: int):


class OnePhaseRGBILBaseExperimentConfig(OnePhaseRGBBaseExperimentConfig):
SENSORS = [
*OnePhaseRGBBaseExperimentConfig.SENSORS,
ExpertActionSensor(len(RearrangeBaseExperimentConfig.actions())),
]

IL_PIPELINE_TYPE: Optional[str] = None

@classmethod
def sensors(cls) -> Sequence[Sensor]:
return [
*super(OnePhaseRGBILBaseExperimentConfig, cls).sensors(),
ExpertActionSensor(len(RearrangeBaseExperimentConfig.actions())),
]

@classmethod
def _training_pipeline_info(cls, **kwargs) -> Dict[str, Any]:
"""Define how the model trains."""
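
The pattern recurring throughout this commit — replacing the `SENSORS` class attribute with a `sensors()` classmethod, as in `one_phase_rgb_il_base.py` above — matters because a class attribute is built once at class-definition time and so cannot depend on values a subclass overrides, while a classmethod is late-bound through `cls`. A toy sketch (names are illustrative, not from the repo) of the difference:

```python
from typing import Sequence


class AttrBase:
    FLAVOR = "imagenet"
    # Built once, right here, with AttrBase.FLAVOR baked in.
    SENSORS = [f"rgb-normalized-for-{FLAVOR}"]


class AttrChild(AttrBase):
    FLAVOR = "clip"  # Too late: SENSORS above was already constructed.


class MethodBase:
    FLAVOR = "imagenet"

    @classmethod
    def sensors(cls) -> Sequence[str]:
        # Evaluated at call time, so cls.FLAVOR is the *subclass's* value.
        return [f"rgb-normalized-for-{cls.FLAVOR}"]


class MethodChild(MethodBase):
    FLAVOR = "clip"


assert AttrChild.SENSORS == ["rgb-normalized-for-imagenet"]  # stale value
assert MethodChild.sensors() == ["rgb-normalized-for-clip"]  # late-bound
```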
2 changes: 1 addition & 1 deletion baseline_configs/one_phase/one_phase_rgb_ppo.py
@@ -9,7 +9,7 @@


class OnePhaseRGBPPOExperimentConfig(OnePhaseRGBBaseExperimentConfig):
USE_RESNET_CNN = False
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = None

@classmethod
def tag(cls) -> str:
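
The one-line changes in `one_phase_rgb_dagger.py` and `one_phase_rgb_ppo.py` replace the boolean `USE_RESNET_CNN` with `CNN_PREPROCESSOR_TYPE_AND_PRETRAINING`, which encodes *which* architecture and *which* pretraining corpus in a single `Optional[Tuple[str, str]]`, with `None` meaning raw pixels into the simple conv model. A toy dispatcher (illustrative only) mirroring how the configs interpret the attribute:

```python
from typing import Optional, Tuple


def describe_visual_pipeline(spec: Optional[Tuple[str, str]]) -> str:
    """Toy dispatcher mirroring how the configs read the new attribute."""
    if spec is None:
        # e.g. the "Simple" PPO/DAgger configs: raw RGB into a conv RNN.
        return "no preprocessor: raw RGB -> RearrangeActorCriticSimpleConvRNN"
    cnn_type, pretraining = spec
    if pretraining == "imagenet":
        return f"frozen torchvision {cnn_type} with ImageNet weights"
    if pretraining == "clip":
        return f"frozen CLIP {cnn_type} image encoder"
    raise NotImplementedError(spec)


assert "Simple" in describe_visual_pipeline(None)
assert "RN18" in describe_visual_pipeline(("RN18", "imagenet"))
assert "CLIP" in describe_visual_pipeline(("RN50", "clip"))
```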
2 changes: 1 addition & 1 deletion baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py
@@ -4,7 +4,7 @@


class OnePhaseRGBResNetDaggerExperimentConfig(OnePhaseRGBILBaseExperimentConfig):
USE_RESNET_CNN = True
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = ("RN18", "imagenet")
IL_PIPELINE_TYPE = "40proc"

@classmethod
baseline_configs/one_phase/one_phase_rgb_resnet_frozen_map_dagger.py
@@ -1,10 +1,11 @@
import os
from typing import Sequence

import gym
import torch
from torch import nn

from allenact.base_abstractions.sensor import SensorSuite
from allenact.base_abstractions.sensor import SensorSuite, Sensor
from allenact.embodiedai.mapping.mapping_models.active_neural_slam import (
ActiveNeuralSLAM,
)
@@ -27,7 +28,9 @@
class OnePhaseRGBResNetFrozenMapDaggerExperimentConfig(
OnePhaseRGBILBaseExperimentConfig
):
USE_RESNET_CNN = False # Not necessary as we're handling things in the model
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = (
None # Not necessary as we're handling things in the model
)
IL_PIPELINE_TYPE = "40proc"

ORDERED_OBJECT_TYPES = list(sorted(PICKUPABLE_OBJECTS + OPENABLE_OBJECTS))
@@ -43,10 +46,11 @@ class OnePhaseRGBResNetFrozenMapDaggerExperimentConfig(
resolution_in_cm=5,
)

SENSORS = OnePhaseRGBILBaseExperimentConfig.SENSORS + [
RelativePositionChangeTHORSensor(),
MAP_RANGE_SENSOR,
]
@classmethod
def sensors(cls) -> Sequence[Sensor]:
return list(
super(OnePhaseRGBResNetFrozenMapDaggerExperimentConfig, cls).sensors()
) + [RelativePositionChangeTHORSensor(), cls.MAP_RANGE_SENSOR,]

@classmethod
def tag(cls) -> str:
@@ -63,7 +67,7 @@ def create_model(cls, **kwargs) -> nn.Module:
)

observation_space = (
SensorSuite(cls.SENSORS).observation_spaces
SensorSuite(cls.sensors()).observation_spaces
if kwargs.get("sensor_preprocessor_graph") is None
else kwargs["sensor_preprocessor_graph"].observation_spaces
)
2 changes: 1 addition & 1 deletion baseline_configs/one_phase/one_phase_rgb_resnet_ppo.py
@@ -2,7 +2,7 @@


class OnePhaseRGBResNetPPOExperimentConfig(OnePhaseRGBPPOExperimentConfig):
USE_RESNET_CNN = True
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING = ("RN18", "imagenet")

@classmethod
def tag(cls) -> str:
67 changes: 48 additions & 19 deletions baseline_configs/rearrange_base.py
@@ -1,7 +1,7 @@
import copy
import platform
from abc import abstractmethod
from typing import Optional, List, Sequence, Dict, Any
from typing import Optional, List, Sequence, Dict, Any, Tuple

import ai2thor.platform
import gym.spaces
@@ -64,7 +64,7 @@ class RearrangeBaseExperimentConfig(ExperimentConfig):
# Training parameters
TRAINING_STEPS = int(75e6)
SAVE_INTERVAL = int(1e6)
USE_RESNET_CNN = False
CNN_PREPROCESSOR_TYPE_AND_PRETRAINING: Optional[Tuple[str, str]] = None

# Sensor info
SENSORS: Optional[Sequence[Sensor]] = None
@@ -93,6 +93,10 @@ class RearrangeBaseExperimentConfig(ExperimentConfig):
)
)

@classmethod
def sensors(cls) -> Sequence[Sensor]:
return cls.SENSORS

@classmethod
def actions(cls):
other_move_actions = (
@@ -119,24 +123,50 @@
@classmethod
def resnet_preprocessor_graph(cls, mode: str) -> SensorPreprocessorGraph:
def create_resnet_builder(in_uuid: str, out_uuid: str):
return ResNetPreprocessor(
input_height=cls.THOR_CONTROLLER_KWARGS["height"],
input_width=cls.THOR_CONTROLLER_KWARGS["width"],
output_width=7,
output_height=7,
output_dims=512,
pool=False,
torchvision_resnet_model=torchvision.models.resnet18,
input_uuids=[in_uuid],
output_uuid=out_uuid,
)
cnn_type, pretraining_type = cls.CNN_PREPROCESSOR_TYPE_AND_PRETRAINING
if pretraining_type == "imagenet":
assert cnn_type in [
"RN18",
"RN50",
], "Only allow using RN18/RN50 with `imagenet` pretrained weights."
return ResNetPreprocessor(
input_height=cls.THOR_CONTROLLER_KWARGS["height"],
input_width=cls.THOR_CONTROLLER_KWARGS["width"],
output_width=7,
output_height=7,
output_dims=512 if "18" in cnn_type else 2048,
pool=False,
torchvision_resnet_model=getattr(
torchvision.models, f"resnet{cnn_type.replace('RN', '')}"
),
input_uuids=[in_uuid],
output_uuid=out_uuid,
)
elif pretraining_type == "clip":
from allenact_plugins.clip_plugin.clip_preprocessors import (
ClipResNetPreprocessor,
)
import clip

# Let's make sure we download the clip model now
# so we don't download it on every spawned process
clip.load(cnn_type, "cpu")

return ClipResNetPreprocessor(
rgb_input_uuid=in_uuid,
clip_model_type=cnn_type,
pool=False,
output_uuid=out_uuid,
)
else:
raise NotImplementedError

img_uuids = [cls.EGOCENTRIC_RGB_UUID, cls.UNSHUFFLED_RGB_UUID]
return SensorPreprocessorGraph(
source_observation_spaces=SensorSuite(
[
sensor
for sensor in cls.SENSORS
for sensor in cls.sensors()
if (mode == "train" or not isinstance(sensor, ExpertActionSensor))
]
).observation_spaces,
@@ -194,7 +224,7 @@ def machine_params(cls, mode="train", **kwargs) -> MachineParams:
devices=devices,
sampler_devices=sampler_devices,
sensor_preprocessor_graph=cls.resnet_preprocessor_graph(mode=mode)
if cls.USE_RESNET_CNN
if cls.CNN_PREPROCESSOR_TYPE_AND_PRETRAINING is not None
else None,
)

@@ -255,7 +285,6 @@ def stagewise_task_sampler_args(
thor_platform: Optional[ai2thor.platform.BaseLinuxPlatform] = None
if platform.system() == "Linux":
try:
raise IOError
x_displays = get_open_x_displays(throw_error_if_empty=True)

if devices is not None and len(
@@ -289,7 +318,7 @@
},
}

sensors = kwargs.get("sensors", copy.deepcopy(cls.SENSORS))
sensors = kwargs.get("sensors", copy.deepcopy(cls.sensors()))
kwargs["sensors"] = sensors

sem_sensor = next(
@@ -452,10 +481,10 @@ def training_pipeline(cls, **kwargs) -> TrainingPipeline:

@classmethod
def create_model(cls, **kwargs) -> nn.Module:
if not cls.USE_RESNET_CNN:
if cls.CNN_PREPROCESSOR_TYPE_AND_PRETRAINING is None:
return RearrangeActorCriticSimpleConvRNN(
action_space=gym.spaces.Discrete(len(cls.actions())),
observation_space=SensorSuite(cls.SENSORS).observation_spaces,
observation_space=SensorSuite(cls.sensors()).observation_spaces,
rgb_uuid=cls.EGOCENTRIC_RGB_UUID,
unshuffled_rgb_uuid=cls.UNSHUFFLED_RGB_UUID,
)
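
A subtle detail in the CLIP branch of `rearrange_base.py` above: `clip.load(cnn_type, "cpu")` is called while constructing the preprocessor graph purely to pull the weights into the local cache once, before worker processes spawn, so the workers don't each trigger their own download. The same warm-the-cache-then-fork idea as a self-contained sketch, with a dummy file write standing in for the real download:

```python
import multiprocessing as mp
import os
import tempfile

CACHE = os.path.join(tempfile.gettempdir(), "demo-model-cache", "weights.bin")


def ensure_weights_cached() -> str:
    """Stand-in for clip.load(...): fetch weights once, reuse on later calls."""
    if not os.path.exists(CACHE):
        os.makedirs(os.path.dirname(CACHE), exist_ok=True)
        with open(CACHE, "wb") as f:
            f.write(b"\x00" * 1024)  # pretend this is a slow network download
    return CACHE


def worker(idx: int) -> str:
    # Every worker finds the cache already warm; no duplicate downloads.
    return ensure_weights_cached()


if __name__ == "__main__":
    ensure_weights_cached()  # warm the cache in the main process first
    with mp.Pool(4) as pool:
        assert all(os.path.exists(p) for p in pool.map(worker, range(4)))
```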
