
Removing TensorFlow Trainers #4707

Merged (11 commits, Dec 15, 2020)
1 change: 1 addition & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to
### Major Changes
#### com.unity.ml-agents (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- TensorFlow trainers have been deprecated; please use the Torch trainers instead. (#4707)

### Minor Changes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
4 changes: 1 addition & 3 deletions docs/ML-Agents-Overview.md
@@ -372,7 +372,7 @@ your agent's behavior:
below).
- `rnd`: represents an intrinsic reward signal that encourages exploration
in sparse-reward environments that is defined by the RND module (see
below). (Not available for TensorFlow trainers)
below).

### Deep Reinforcement Learning

@@ -437,8 +437,6 @@ of the trained model is used as intrinsic reward. The more an Agent visits a state, the
more accurate the predictions and the lower the rewards, which encourages the Agent to
explore new states with higher prediction errors.

__Note:__ RND is not available for TensorFlow trainers (only PyTorch trainers)
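As an aside, the RND mechanism described above fits in a short PyTorch sketch (illustrative only; network sizes and names are assumptions, not the ml-agents implementation):

```python
import torch
from torch import nn

obs_size, feat_size = 8, 32

# Fixed, randomly initialized target network; it is never trained.
target = nn.Sequential(nn.Linear(obs_size, 64), nn.ReLU(), nn.Linear(64, feat_size))
for p in target.parameters():
    p.requires_grad = False

# Predictor network, trained to reproduce the target's outputs.
predictor = nn.Sequential(nn.Linear(obs_size, 64), nn.ReLU(), nn.Linear(64, feat_size))
optimizer = torch.optim.Adam(predictor.parameters(), lr=3e-4)

def rnd_intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    # Prediction error is the intrinsic reward: frequently visited states
    # become predictable, so their reward shrinks.
    error = (predictor(obs) - target(obs)).pow(2).mean(dim=-1)
    optimizer.zero_grad()
    error.mean().backward()
    optimizer.step()
    return error.detach()
```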

### Imitation Learning

It is often more intuitive to simply demonstrate the behavior we want an agent
2 changes: 1 addition & 1 deletion docs/Training-Configuration-File.md
@@ -32,7 +32,7 @@ choice of the trainer (which we review in subsequent sections).
| `time_horizon` | (default = `64`) How many steps of experience to collect per-agent before adding it to the experience buffer. When this limit is reached before the end of an episode, a value estimate is used to predict the overall expected reward from the agent's current state. As such, this parameter trades off between a less biased, but higher-variance estimate (long time horizon) and a more biased, but lower-variance estimate (short time horizon). In cases where there are frequent rewards within an episode, or episodes are prohibitively long, a smaller value can be preferable. This number should be large enough to capture all the important behavior within a sequence of an agent's actions. <br><br> Typical range: `32` - `2048` |
| `max_steps` | (default = `500000`) Total number of steps (i.e., observations collected and actions taken) that must be taken in the environment (or across all environments if using multiple in parallel) before ending the training process. If you have multiple agents with the same behavior name within your environment, all steps taken by those agents will contribute to the same `max_steps` count. <br><br>Typical range: `5e5` - `1e7` |
| `keep_checkpoints` | (default = `5`) The maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the `checkpoint_interval` option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. |
| `checkpoint_interval` | (default = `500000`) The number of experiences collected between each checkpoint by the trainer. A maximum of `keep_checkpoints` checkpoints are saved before old ones are deleted. Each checkpoint saves the `.onnx` (and `.nn` if using TensorFlow) files in `results/` folder.|
| `checkpoint_interval` | (default = `500000`) The number of experiences collected between each checkpoint by the trainer. A maximum of `keep_checkpoints` checkpoints are saved before old ones are deleted. Each checkpoint saves the `.onnx` files in the `results/` folder (see the sketch below).|
| `init_path` | (default = None) Initialize trainer from a previously saved model. Note that the prior run should have used the same trainer configurations as the current run, and have been saved with the same version of ML-Agents. <br><br>You should provide the full path to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`. This option is provided in case you want to initialize different behaviors from different runs; in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize all models from the same run. |
| `threaded` | (default = `true`) By default, model updates can happen while the environment is being stepped. This violates the [on-policy](https://spinningup.openai.com/en/latest/user/algorithms.html#the-on-policy-algorithms) assumption of PPO slightly in exchange for a training speedup. To maintain the strict on-policyness of PPO, you can disable parallel updates by setting `threaded` to `false`. There is usually no reason to turn `threaded` off for SAC. |
| `hyperparameters -> learning_rate` | (default = `3e-4`) Initial learning rate for gradient descent. Corresponds to the strength of each gradient descent update step. This should typically be decreased if training is unstable, and the reward does not consistently increase. <br><br>Typical range: `1e-5` - `1e-3` |
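To make the interaction between `keep_checkpoints` and `checkpoint_interval` concrete, here is a small sketch (hypothetical helper, not ml-agents code):

```python
# Which checkpoints remain on disk after collecting `total_steps` experiences?
def surviving_checkpoints(total_steps: int, interval: int, keep: int) -> list:
    saved = list(range(interval, total_steps + 1, interval))
    return saved[-keep:]  # the oldest checkpoints beyond `keep` are deleted

# With the defaults (interval=500000, keep=5), after 5M experiences only the
# checkpoints at 3.0M, 3.5M, 4.0M, 4.5M and 5.0M remain.
print(surviving_checkpoints(5_000_000, 500_000, 5))
```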
3 changes: 0 additions & 3 deletions docs/Training-ML-Agents.md
@@ -317,9 +317,6 @@ behaviors:
      save_steps: 50000
      swap_steps: 2000
      team_change: 100000

    # use TensorFlow backend
    framework: tensorflow
```

Here is an equivalent file if we use an SAC trainer instead. Notice that the
17 changes: 1 addition & 16 deletions docs/Unity-Inference-Engine.md
@@ -19,19 +19,6 @@ Graphics Emulation is set to **OpenGL(ES) 3.0 or 2.0 emulation**. Also there
might be non-fatal build-time errors when the target platform includes a Graphics API
that does not support **Unity Compute Shaders**.

## Supported formats

There are currently two supported model formats:

- Barracuda (`.nn`) files use a proprietary format produced by the
[`tensorflow_to_barracuda.py`]() script.
- ONNX (`.onnx`) files use an
[industry-standard open format](https://onnx.ai/about.html) produced by the
[tf2onnx package](https://github.com/onnx/tensorflow-onnx).

Export to ONNX is used if using PyTorch (the default). To enable it
while using TensorFlow, make sure `tf2onnx>=1.6.1` is installed in pip.
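With the Barracuda path removed, export goes straight from PyTorch to ONNX. A generic sketch of that flow (stand-in network and file name; standard `torch.onnx.export` usage rather than the actual ml-agents serialization code):

```python
import torch
from torch import nn

# Stand-in policy; the real graph comes from the trained model.
policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
dummy_obs = torch.zeros(1, 8)  # example input that fixes the exported shapes

torch.onnx.export(
    policy, dummy_obs, "policy.onnx",
    input_names=["obs"], output_names=["action"],
)
```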

## Using the Unity Inference Engine

When using a model, drag the model file into the **Model** field in the
@@ -56,7 +43,5 @@ If you wish to run inference on an externally trained model, you should use
Barracuda directly, instead of trying to run it through ML-Agents.

## Model inference outside of Unity
We do not provide support for inference anywhere outside of Unity. The
`frozen_graph_def.pb` and `.onnx` files produced by training are open formats
for TensorFlow and ONNX respectively; if you wish to convert these to another
We do not provide support for inference anywhere outside of Unity. The `.onnx` files produced by training use the open ONNX format; if you wish to convert these files to another
format or run inference with them, refer to their documentation.
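For the pointer above, a minimal sketch of loading such a file with ONNX Runtime (path and observation size are assumptions; exported ML-Agents models may expose several named inputs):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("results/MyRun/MyBehavior.onnx")
obs = np.zeros((1, 8), dtype=np.float32)  # one dummy observation
feeds = {session.get_inputs()[0].name: obs}
print(session.run(None, feeds))  # all model outputs
```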
4 changes: 0 additions & 4 deletions ml-agents/mlagents/tf_utils/__init__.py

This file was deleted.

60 changes: 0 additions & 60 deletions ml-agents/mlagents/tf_utils/tf.py

This file was deleted.

21 changes: 18 additions & 3 deletions ml-agents/mlagents/trainers/cli_utils.py
@@ -4,6 +4,21 @@
from mlagents.trainers.exception import TrainerConfigError
from mlagents_envs.environment import UnityEnvironment
import argparse
from mlagents_envs import logging_util

logger = logging_util.get_logger(__name__)


class RaiseDeprecationWarning(argparse.Action):
    """
    Internal custom Action that raises a warning when the argument is used.
    """

    def __init__(self, nargs=0, **kwargs):
        super().__init__(nargs=nargs, **kwargs)

    def __call__(self, arg_parser, namespace, values, option_string=None):
        logger.warning(
            f"The command line argument {option_string} is deprecated and has no effect."
        )
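As a quick illustration (not part of the PR), wiring this Action into a throwaway parser shows the flag still parses but only logs:

```python
# Illustration only: the deprecated flag is accepted but inert apart from the log.
demo_parser = argparse.ArgumentParser()
demo_parser.add_argument("--tensorflow", action=RaiseDeprecationWarning, default=False)
demo_parser.parse_args(["--tensorflow"])
# warning: "The command line argument --tensorflow is deprecated and has no effect."
```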


class DetectDefault(argparse.Action):
@@ -171,14 +186,14 @@ def _create_parser() -> argparse.ArgumentParser:
    argparser.add_argument(
        "--torch",
        default=False,
        action=DetectDefaultStoreTrue,
        action=RaiseDeprecationWarning,
        help="(Deprecated) Use the PyTorch framework. Note that this option is not required anymore as PyTorch is the"
        " default framework, and will be removed in the next release.",
    )
    argparser.add_argument(
        "--tensorflow",
        default=False,
        action=DetectDefaultStoreTrue,
        action=RaiseDeprecationWarning,
Contributor:

Should we remove this one altogether? (Deprecated) usually means that it still works but will be removed in the future, whereas this flag won't do anything

Contributor Author (@vincentpierre, Dec 11, 2020):

I see 3 options:

- Remove --tensorflow and --torch (I think users will complain that a command line that used to work does not anymore)
- Raise a specific error when --tensorflow is used to signal to the user that the argument has been removed
- Only raise a warning when used (the warning can say "feature removed" instead of "feature deprecated")

I have a preference for option 2 but not a strong one. What do you think?

Contributor:

I think keeping it around with a warning for one more release is good practice. For the wording: "tensorflow" is "removed", while "--tensorflow" and "--torch" are deprecated and have no effect.

help="(Deprecated) Use the TensorFlow framework instead of PyTorch. Install TensorFlow "
"before using this option.",
)
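For reference, option 2 from the thread above might look roughly like this sketch (hypothetical class name; the merged PR keeps the warning-only behavior):

```python
# Sketch of the rejected alternative: fail loudly instead of warning.
class RaiseRemovedError(argparse.Action):
    def __init__(self, nargs=0, **kwargs):
        super().__init__(nargs=nargs, **kwargs)

    def __call__(self, arg_parser, namespace, values, option_string=None):
        # argparse converts ArgumentError into a usage error and exits.
        raise argparse.ArgumentError(
            self, f"{option_string} has been removed; only the PyTorch trainers are available."
        )
```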
7 changes: 1 addition & 6 deletions ml-agents/mlagents/trainers/learn.py
@@ -10,7 +10,6 @@

import mlagents.trainers
import mlagents_envs
from mlagents import tf_utils
from mlagents.trainers.trainer_controller import TrainerController
from mlagents.trainers.environment_parameter_manager import EnvironmentParameterManager
from mlagents.trainers.trainer import TrainerFactory
@@ -21,7 +20,7 @@
    GaugeWriter,
    ConsoleWriter,
)
from mlagents.trainers.cli_utils import parser, DetectDefault
from mlagents.trainers.cli_utils import parser
from mlagents_envs.environment import UnityEnvironment
from mlagents.trainers.settings import RunOptions

@@ -135,8 +134,6 @@ def run_training(run_seed: int, options: RunOptions) -> None:
        param_manager=env_parameter_manager,
        init_path=maybe_init_path,
        multi_gpu=False,
        force_torch="torch" in DetectDefault.non_default_args,
        force_tensorflow="tensorflow" in DetectDefault.non_default_args,
    )
    # Create controller and begin training.
    tc = TrainerController(
@@ -242,8 +239,6 @@ def run_cli(options: RunOptions) -> None:
        log_level = logging_util.DEBUG
    else:
        log_level = logging_util.INFO
    # disable noisy warnings from tensorflow
    tf_utils.set_warnings_enabled(False)

    logging_util.set_log_level(log_level)
